I was cruising down the highway at around 110 to 120 kmph. The car was capable of 150+ kmph, but I chose to keep it below 120, because a thought at the back of my mind always tells me that it will not stop quickly or hold its direction well if I need it to. When I am very sure that I have an open and straight road, I test the limits of the car, but I quickly pull back to manageable speeds when a turn comes in sight. During one of those high-speed bursts at 160 kmph, a sports car overtook me. It did not just overtake; it zoomed past and disappeared out of sight. Enjoying speed, I realised, is not so much about the road as about the control a vehicle gives its driver. Sports cars don’t just go fast: they turn well, stop quickly and have lots of safety features to protect occupants in a crash. Ram a sports car into a wall at high speed and you stand a far better chance of walking away. If I take my passenger car downhill at 200 kmph (which I still can), that is insanity; it is not going fast.

It was while I was having these thoughts that I stumbled on an article pointing out that developers who chase speed often compromise on safety. In software development there are plenty of aspects to take care of. In simple terms, it is people with various skill sets taking a problem and solving it using computers. You have analysts, developers, designers, operations and so on. The very fact that different people are involved means there is a lot of communication, and when there is a lot of communication between people of different skill sets, there is translation loss. If there is translation loss, there will be misunderstanding and rework. If you need to rework often, then the speed at which you can code matters. And if speed matters, you had better be safe.

A test harness consisting of unit, integration and functional tests, together with static analysis, performance checks, automated deployments and coding practices, forms the safety package for software development. As the code base grows and the number of people increases, the safety checks become more important. It will always be tempting to skip the process and get something out quickly, but the price to pay later will be high. There is nothing prudent in crash landing.

Another aspect of speed that is often compromised is sustainability. The common example given for agility and speed is the cheetah. A cheetah can maintain its top speed only for a short burst, commonly put at well under a minute, and the sprint is followed by a long dip in physical activity. Any activity that requires a spike in output is followed by a dip. There is no such thing as sustainable peak performance.

Violating the safety or sustainability of speed takes control out of the equation. It makes sense only if we are crashworthy and have the energy and resources to get back to normal. Speed for the sake of speed will thrill and eventually kill.

 

 

Toyota introduced the andon cord on its manufacturing lines so that people could stop the production line and alert those around them when an abnormal situation arises. It is a cord that hangs overhead, within easy reach of anyone on the line, and pulling it immediately gathers the attention of the others. While the rest of the industry treated the production line as something that should never be stopped and left defects to quality control, Toyota put power in the hands of the people on the line to make the call and improve quality by attacking the problem at its source.

The idea was then widely adopted in different forms, such as ‘Help’ buttons, across various manufacturing sectors. It was easy to adopt in manufacturing because we can see what is going wrong and gather together to fix the problem immediately. In software development it is not obvious when to pull the andon cord, or how to collect people’s thoughts on what to solve.

Dev huddles are the answer to the andon cord in software development. Huddles are very common in team sports: team members quickly huddle to celebrate a goal or discuss a plan. Over time the team also learns to communicate very effectively, in fewer words and less time.

When to call for a dev huddle?
We need to call for a dev huddle when:

  • Our programming has deteriorated from a state of flow to brute force or trial and error. Manuals did not help to resolve the problem, and even quick help from another team member did not.
  • We find badly written code and need to bring it to the team’s attention while it is still fresh in mind.
  • We are about to make a major change in the code base and everybody needs to be aware of the incoming change so they are not surprised.

How to make sure that we don’t waste others’ time?

Interrupting the team is expensive, so before calling for a dev huddle we need to make sure that we have all our show-and-tell pieces ready for discussion. For example, when stuck on a problem, quickly jot down on the whiteboard, in diagrams or words, the problem and the steps already tried that did not work; then call for the huddle. If a resolution or direction is not found within 10 to 15 minutes, break the huddle and go back to research the problem in detail.

Why should dev huddles work?

Dev huddles work on the idea of crowd wisdom: in most cases, the combined output of a group is better than that of the best individual in it. Ideas and solutions can come from anyone, provided they are given a good explanation of the problem. Dev huddles also help in sharing knowledge in a terse and effective manner.
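To make the crowd-wisdom claim a little more concrete, here is a minimal sketch in Python. It assumes the huddle is producing a numeric estimate (say, days of effort) and that the group simply averages its members' guesses; the function name and the numbers are hypothetical, made up for illustration. What it shows is that the averaged estimate is never worse than the average individual, and often, though not always, beats even the best individual.

```python
from statistics import mean


def crowd_vs_individuals(estimates, truth):
    """Compare the error of the group's averaged estimate with individual errors.

    `estimates` are the individual guesses, `truth` is the value found later.
    (Hypothetical helper for illustration only.)
    """
    individual_errors = [abs(e - truth) for e in estimates]
    crowd_error = abs(mean(estimates) - truth)
    return {
        "crowd_error": crowd_error,
        "mean_individual_error": mean(individual_errors),
        "best_individual_error": min(individual_errors),
    }


# Made-up huddle: five developers guess the effort in days; it actually took 10.
# Guesses fall on both sides of the truth, so the errors largely cancel out.
balanced = crowd_vs_individuals([6, 8, 9, 13, 15], truth=10)

# Made-up counter-case: everyone except one person overestimates,
# so the best individual beats the group average.
skewed = crowd_vs_individuals([10, 20, 30], truth=10)

if __name__ == "__main__":
    print(balanced)
    print(skewed)
```

The first case is the one a huddle hopes for: people err in different directions and the average lands close to the truth. The second is a reminder that averaging does not help when everyone is wrong in the same direction, which is one more reason to explain the problem well before asking for opinions.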

I got a chance to walk on the Golden Gate Bridge and could not stop admiring the beauty of the bridge and its surroundings. A walk across it shows you what the bridge is subjected to: I arrived at one end when it was bright and sunny, and by the time I reached the other end it was misty and foggy. It is constantly pounded by waves and high-speed wind, and it carries a lot of traffic. It left me curious about the engineering behind it, and after a while I stumbled on a Nat Geo documentary titled ‘Impossible bridges – Golden Gate’.

Engineers thought it was impossible to build such a bridge because the water was too deep, the winds too fast and the distance too long to span. Even if it could be built, people wondered how it could be sustained. Nat Geo’s documentary showed how the crew worked over the years to strengthen the bridge and rework the deck without much disruption to traffic. There were innovations like suspended travelling platforms for working on the cables. The bridge withstood an earthquake and was strengthened to withstand stronger ones. At the end of the documentary, the narrator says he is sure that every part of the bridge has been replaced at least twice but that, thanks to the engineers’ skill, people never knew they were looking at a new bridge.

The same holds for long-running software projects. Much software changes a great deal internally over time as the people working on it change, and what makes that possible is the quality Richard Gabriel calls habitability in his book ‘Patterns of Software’. If the engineers who took over the maintenance of the Golden Gate had not replaced its parts over the years, structural erosion would have left us with a big heap of metal lying on the sea bed. The software equivalent of that heap is the ‘big ball of mud’ that results from software rot.

The biggest reason given for software rot is that there is not enough time. Architecture is a long-term concern, a little beyond the learning horizon that Peter Senge describes in his book ‘The Fifth Discipline’. We are not able to observe the effects of software architecture in the near term, because urgent business needs keep coming up and because architecture is never tangible the way a bridge is; visualising it requires experience and skill. The advantage software has is that it can be made more robust even after it has been built hastily.

How do we find out that the architecture & design need a revamp?

  • If a change to the software looks simple and trivial but takes a bafflingly disproportionate amount of time to make.
  • If it is difficult to visualise the code and explain it to others in simple abstractions.
  • If a technology upgrade looks infeasible without a rewrite.
  • If there is too much dependency on individuals and context.

The revamp is not just about modular design and clean code. It also needs adequate test coverage and a robust continuous delivery mechanism, much as the Golden Gate’s engineers used suspended travelling platforms to carry out repair work on the cables without affecting traffic. A successful, well-maintained app that takes advantage of new technologies and changes hands will almost certainly have been rewritten many times over without anyone noticing.