Monday, December 2, 2013

How do we think?


No, I'm not talking about how our neurons communicate.  Nor am I talking about what we think about (there are plenty of blogs on that topic).  Rather, I want to discuss the mental process of moving from an elementary idea or an ill-understood problem to a well-structured proposal or solution.  I think we can and should use much the same process in at least two somewhat different circumstances:
  • Elaborating a very high-level, briefly expressed requirement into a definition complete enough for implementation.
  • Understanding the details and underpinnings of a bug well enough that you can decide what to do about it.
Even though this blog is directed at engineers, I suggest that we would do well to keep the principles of the scientific method in mind.  What do I mean by that?  Well, when we look at the incredible simplicity and beauty of formulas like F=ma, E=IR, and E=mc², it is very tempting to think that these laws sprang fully formed from the brains of their authors.  As brilliant as Newton, Ohm, and Einstein were, the reality is much messier than that.  First of all, in the broad sweep of history, it took generations to move past the idea of Earth, Air, Fire, and Water to the point where we could even contemplate F=ma.  (Newton said, "If I have seen further, it is by standing on the shoulders of giants.")  At the individual level, even those geniuses took many years to fully understand what was going on and to distill that understanding into a simple and coherent statement of truth.

So, my first message: give yourself permission to do what Newton and Einstein did - i.e. give yourself permission not to completely understand the problem at the beginning of your exploration.

OK, so what does that mean in practice?  Just as most modern software development life cycle methodologies build on the principle of iterative refinement, so should our thinking process.  Don't try to come to a conclusion too quickly - give yourself time to understand the problem space more completely, and to get an idea of what is going on and what you even need to observe, before you try to design a rigorous experiment or formulate a coherent proposal.

In the language of science and mathematics, you need to do some relatively unstructured playing and observing before you can:
  • Design the experiment (determine what steps/operations you need to perform).
  • Decide what the independent variables are (the things you will deliberately vary during the experiment, because you are trying to determine their impact).
  • Decide what the dependent variables are (the things you will carefully observe during the experiment).
  • In the end, come to a conclusion about causes and effects and what to do about them.
I've been a bit abstract so far, so let me give some examples, which I will draw from ControlPoint development.

Analyzing a bug

We had two or three customers complain that ControlPoint was marking many (but not all) of their SharePoint users as having logged on recently, even though they had not.  (As an interesting aside: Active Directory maintains this information, and these customers were using it to identify inactive accounts that might need to be shut down - ControlPoint's behavior was interfering with that...)

Of course, the first step was to confirm that the application really was causing this, and that it wasn't something else in the environment (cause and effect can be a slippery thing at times).  We were able to recreate the problem, although not consistently, so we were comfortable that we were at least triggering it.  However, we didn't yet know why it happened on one server but not another.

So, the next question was whether the application was doing this explicitly.  Logically, that seemed unlikely - doing so would mean we would need the credentials for all of those accounts, which we didn't have.  And, empirically, we did not find any code that obviously performed a logon.

This meant that the logins were probably a side effect of something else we were doing - but what was that, and was there anything we could do about it?

At about this time we realized (partly through logic and partly through observation) that any time Active Directory recorded that a user had logged in recently, the Event Viewer recorded a corresponding login event in the Security log.  This gave us a much more useful (in the sense of being more immediate) way of observing the effect - we had now identified our dependent variable for experimentation.  (Note that at this point we were still exploring, and weren't ready for a rigorous experiment yet, because we hadn't identified the independent variables and didn't yet have a hypothesis to test.)  This allowed us to confirm something we suspected: that the problem was occurring somewhere in our nightly discovery process.  But what part of discovery, and what specific action, was causing the effect?
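
To make that observation concrete: the post doesn't show how we watched for those events, but a minimal sketch of that kind of check (in C#, with a hypothetical test account, and assuming administrative rights to read the Security log) might look like this:

    using System;
    using System.Diagnostics;

    class LogonEventCheck
    {
        static void Main()
        {
            // Scan the Security event log for successful logon events (event ID 4624)
            // that mention a test account.  This is the dependent variable: if the
            // nightly discovery touches the account, a fresh logon event shows up here.
            var securityLog = new EventLog("Security");

            foreach (EventLogEntry entry in securityLog.Entries)
            {
                if (entry.InstanceId == 4624 && entry.Message.Contains(@"CONTOSO\testuser1"))
                {
                    Console.WriteLine("{0}: logon recorded for test account", entry.TimeGenerated);
                }
            }
        }
    }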

We observed that, from the perspective of ControlPoint, there are a number of different user types that we treat differently: there is the user who is running ControlPoint; there are users who are authorized to run ControlPoint and, as a subclass of those, business admins; and finally there are the ordinary users of SharePoint who have been given rights to SharePoint (but not ControlPoint).  So we formed the hypothesis that the type of user might be what differed between the cases where we did and did not observe the problem - we now had the independent variable for our experiment, and we were ready to graduate from relatively unstructured experimenting and observation to a carefully structured experiment.

The experiment took the following shape: create brand new accounts (since existing accounts can be treated somewhat differently by SharePoint) with the following characteristics:
  • Farm admin who is also a ControlPoint (ordinary) admin
  • Farm admin who is also a ControlPoint business admin
  • Site Collection admin who is a ControlPoint (ordinary) admin
  • Site Collection admin who is a ControlPoint business admin
  • User with Full Control who is a ControlPoint business admin
  • User with Full Control who has no rights to ControlPoint 
Having distinct, separate accounts allowed us to clearly identify the impact of the type of account - in other words, the account type was our independent variable.

We ran discovery, and then observed which accounts triggered a login event in the Event Viewer log.  In the end, this showed that the trigger was any user with rights to use ControlPoint, which in turn allowed us to narrow it down to a WindowsIdentity system call that we were using to determine which groups the user belonged to, so that we could work out what rights the user might be getting from those groups.  (It appears that the operating system does an implicit impersonation of the user, which in turn amounts to a login...)
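
The post doesn't show the offending code, but the pattern it describes is easy to sketch: in .NET, constructing a WindowsIdentity from a user principal name performs an implicit (S4U-style) logon on that user's behalf, which is the kind of activity Active Directory records as a recent login.  A minimal sketch, with a hypothetical account name and assuming a domain-joined machine:

    using System;
    using System.Security.Principal;

    class GroupLookupViaIdentity
    {
        static void Main()
        {
            // Building a WindowsIdentity from a UPN makes the OS impersonate the user
            // behind the scenes - an implicit logon that Active Directory then records.
            var identity = new WindowsIdentity("testuser1@contoso.local");

            // The Groups property is what we were actually after: the user's group
            // memberships, from which inherited rights can be derived.
            foreach (IdentityReference group in identity.Groups)
            {
                Console.WriteLine(group.Translate(typeof(NTAccount)).Value);
            }
        }
    }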

Armed with that knowledge, we were able to come up with a different mechanism to get the list of groups, and thereby avoid the impersonation/login for each of the users.
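
The post doesn't say which alternative mechanism we chose, so the following is just one well-known option rather than the actual fix: read the user's group SIDs from the constructed tokenGroups attribute in Active Directory.  That is a directory read rather than a logon, so no login event is generated.  A sketch with a hypothetical LDAP path (requires a reference to System.DirectoryServices):

    using System;
    using System.DirectoryServices;
    using System.Security.Principal;

    class GroupLookupViaDirectory
    {
        static void Main()
        {
            // Read the user's group SIDs directly from AD via the tokenGroups attribute.
            // This is a simple directory read - no impersonation, and no logon event.
            using (var user = new DirectoryEntry("LDAP://CN=Test User 1,OU=Staff,DC=contoso,DC=local"))
            {
                user.RefreshCache(new[] { "tokenGroups" });

                foreach (byte[] sidBytes in user.Properties["tokenGroups"])
                {
                    var sid = new SecurityIdentifier(sidBytes, 0);
                    Console.WriteLine(sid.Translate(typeof(NTAccount)).Value);
                }
            }
        }
    }

Translating a SID to an NTAccount can fail for some built-in groups, so production code would want to handle IdentityNotMappedException; the point here is simply that group membership can be obtained without impersonating the user.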

Note that the process above unfolded over the course of a couple of weeks – clarity does not come in a flash (even Archimedes’ moment of “Eureka” followed a lot of thinking!).

As a side note, there is another non-technical technique you should use here: give the problem some thought, explore the details and the alternatives, and then intentionally set it aside, ideally at least overnight.  Your brain has a remarkable background processor that works on problems while your attention is elsewhere – when you come back to the problem, it is often a lot clearer than when you set it aside.

Responding to requirements

Consider the requirement “We need to be able to duplicate workflows”.  As developers, our first objective is to elaborate this into enough detail that we know what we need to build, that we can ensure we have the same idea of what is needed as the product owner, and that we can fairly accurately estimate the effort involved.  Normally, that means we need to understand workflows well enough to give the product owner some useful background, to ask intelligent, targeted questions, and ultimately to propose a set of functionality that delivers useful value to the customer (the goal of agile, of course) while providing value to the company.

Unless you are already an expert in SharePoint workflow, getting to that understanding requires some time exploring, reading, and experimenting – the result may be the understanding that the following factors affect the duplication functionality:
  • Version of SharePoint (2007, 2010, 2013), and compatibility among versions (e.g. 2013 supports 2010 style workflows, but also supports an entirely new workflow architecture)
  • Types of workflow (Out of the Box, individually defined, reusable)
  • The elements of the workflow, i.e. the definition, the association with a particular library, the instances of the workflow, the history list, the task list
  • Versioning of the workflow definition (i.e. an older instance might still be running with an older version of the workflow rules, and a newer instance running with a revised set of workflow rules)
Given this growing understanding of workflows, this exercise is not going to culminate in a hypothesis and a rigorous experiment, but it does lead to a step of increasing rigor in our understanding and our questions.  At this point, we are ready to articulate more completely what resources/artifacts are involved in a workflow, and which of them are shared among different workflows and therefore need consideration when duplicating.  We’re now prepared to discuss with the product owner whether the initial implementation might be limited to 2010 and 2007 style workflows.  And we can think about what it means to duplicate a workflow that involves resources shared with other workflows.

What is common among those examples?
  • Recognizing that you won’t always understand the problem space up front
  • Recognizing that your understanding will improve iteratively
  • Setting a goal of increasing the precision of your understanding with each iteration
