Tuesday, June 05, 2007

Bohrbugs and Heisenbugs

As promised, here are views of the newly decluttered and rearranged office/guest room. Mark hung the wall cabinets and shelves on the north wall and in the closet on the south side. Seiri and Seiton (sorting and setting in order, the first two steps of 5S) are harder work than hanging cabinets or shelves. ;-)
We finally have usable workspace that does not require moving piles of stuff from one surface to another. The walls, now that we can see them, sure look white. I need to hang up some of the artwork stored in the closet.

We have been experiencing sporadic internet outages in the months since Time Warner Cable took over from Adelphia. Calls to customer service elicited nothing more than instructions to reboot the cable modem and the computer. That was inconvenient, but usually solved the problem.

However, since last weekend, we have had only sporadic internet connectivity. That is, we have had an internet outage broken by only very short periods of connectivity. First, we were told that our area was experiencing an outage and that there was nothing to do but wait it out.

Second, they said that the problem was in our wiring and that they would send someone out to repair it on Thursday. Third, we were told that the outages were due to old Adelphia software that had not been deleted from their end. They said they deleted the Adelphia software and installed Time Warner software.

Alas, hours later, the internet failed yet again. I called customer service back. After once again wasting time convincing a tier 1 troubleshooter (the gatekeeper) that the problem was serious, I was transferred to a tier 2 troubleshooter. Again, she claimed to have deleted the old Adelphia software and installed the Time Warner software. How does software reinstall itself? This time, she gave me a ticket number that I can read to the gatekeeper to be transferred immediately to tier 2. I wonder when I will be allowed to talk to tier 3. (In grad school, I used to get tier 3 automatically at my office because they recognized the phone number.)

What does any of this have to do with the title? Is it a Bohrbug or a Heisenbug? A Bohrbug can always be replicated with a specific combination of events. A Heisenbug occurs sporadically, in response to computer and network states that are poorly understood and hard to replicate. We are not sure yet; the tier 2 troubleshooter gave us a ticket for sporadic outages. Just in case, we are holding on to the appointment for someone to come test our line.

In grad school, my research group discovered a floating-point error in a new computer chip. Molecular dynamics simulation code from our group became a standard floating-point test for the computer manufacturer. It would be nice to be able to pick up the phone and speak to a tier 3 or higher engineer/troubleshooter whenever I have a problem. But I would not want to run into so many problems that I got flagged as someone who only calls about extremely difficult ones. Such is life.

Iris has been reading Nancy Drew books obsessively. She can read one each weeknight and two on a weekend day. She asked me if I wanted to be a sleuth. I told her that I am one.

She said, "No, you are a scientist."

I told her that, sometimes, I have to investigate what went wrong and how to right things again. That makes me a technical sleuth. (I didn't get into a discussion with her about how all science is investigatory. We didn't have that much time before school.)

I served on an independent review team (IRT) this spring for a project that is over budget and behind schedule. (I finally had a chance to brief a general but had to turn it down due to schedule conflicts; I had to be at three meetings and a birthday party, on two coasts, that day.) The general also wanted us to outbrief the reviewees as a courtesy.

This morning we did that (only 1 kilometer from Iris' school!), presenting our recommendations and giving the reviewees a chance to respond. The review findings will then be passed up the chain of command. Maybe we will have a positive effect; maybe not. I have never served as a member of an IRT before, though I have done a technical study for one.

One thing that struck me while sitting through the presentations made to the IRT during our investigation is how often the same problems recur. They happen so reliably, across a variety of projects, that I wonder if the true problem is that we do not properly plan for them.

For instance, take that budget spending chart with dates along the horizontal axis (project start at left, end at right) and dollars along the vertical axis. The plan is a straight diagonal line from 0 at the bottom left to the budget limit at the top right. The actual spending is plotted over the straight line. Any deviation must be corrected to get back to the straight line. Let's not discuss whether a straight line is realistic for software delivery schedules.

Year after year, we saw them fall behind in July. I asked why and was told it was because of summer vacations. There were also smaller dips during the holidays and spring break. So why don't they adjust their spending plan and work schedule to account for the fact that people have lives outside of work? We are treating a Bohrbug as a Heisenbug.

That was one of my recommendations. But it got reworded into something I could no longer recognize.
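Here is a minimal sketch, in Python, of what adjusting the spending plan for predictable dips could look like: instead of a straight line, the planned spend is allocated in proportion to the staff expected to be available each month. The monthly availability factors and the helper names are hypothetical, chosen only for illustration; they are not figures from the review.

```python
from itertools import accumulate

# Hypothetical fraction of normal staffing available in each month, Jan..Dec.
# Illustrative assumptions only: dips for spring break (Mar), summer (Jul, Aug)
# and the winter holidays (Nov, Dec).
AVAILABILITY = [1.0, 1.0, 0.95, 1.0, 1.0, 1.0, 0.8, 0.9, 1.0, 1.0, 0.95, 0.85]


def linear_plan(budget, months=12):
    """Cumulative planned spend at the end of each month for the straight-line chart."""
    return [budget * (m + 1) / months for m in range(months)]


def availability_weighted_plan(budget, availability):
    """Cumulative planned spend proportional to cumulative staff availability."""
    total = sum(availability)
    return [budget * c / total for c in accumulate(availability)]


def monthly_increments(cumulative):
    """Convert a cumulative spending curve into per-month planned spend."""
    return [b - a for a, b in zip([0.0] + cumulative[:-1], cumulative)]


if __name__ == "__main__":
    budget = 1_200_000
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    linear = monthly_increments(linear_plan(budget))
    weighted = monthly_increments(availability_weighted_plan(budget, AVAILABILITY))
    for name, lin, wtd in zip(months, linear, weighted):
        print(f"{name}: straight line {lin:>10,.0f}   availability-weighted {wtd:>10,.0f}")
```

Both curves end at the full budget. The weighted plan simply expects less spending in July, so a predictable summer dip no longer shows up as a deviation that "must be corrected."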
