Wednesday, May 16, 2012

Egoless Practice: Becoming the Best in your Field

[Full post on other blog.]

Jerry Weinberg coined the term "egoless programming" in his 1971 book "The Psychology of Computer Programming". Jerry describes the practice and mindset there, and in 1977 co-wrote with Daniel Freedman the definitive manual for practitioners: "Handbook of Walkthroughs, Inspections, and Technical Reviews: Evaluating Programs, Projects, and Products".

Is there a precise definition of "egoless programming" that could be expanded to a generic Professional Behaviour of "egoless practice"?

Johanna Rothman is quoted by Jeff Atwood, presumably from a book, as saying:
Egoless programming occurs when a technical peer group uses frequent and often peer reviews to find defects in software under development. The objective is for everyone to find defects, including the author, not to prove the work product has no defects. [my italics]
When asked for a modern definition, Jerry pointed at Jeff's Ten Commandments of Egoless Programming.

The field of Reliability Engineering is aimed at creating near-Perfect (i.e. highly reliable) operation from imperfect parts and sub-systems. This approach can work very well, even when maintenance and fixes can't be done: the NASA Mars Rovers, Spirit and Opportunity, exceeded their 90-day design life more than twenty-fold, with Spirit working from 2004 to 2010 and Opportunity still going.
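The core trick is redundancy: combining parts that each fail some of the time into a whole that almost never does. A minimal sketch of the arithmetic, assuming independent failures (a large simplification) and made-up part reliabilities:

# Toy illustration: near-perfect reliability from imperfect parts.
# Assumes part failures are independent (a simplifying assumption).
# A redundant sub-system works if at least one of its n parts works.

def system_reliability(part_reliability: float, n_redundant: int) -> float:
    """P(at least one of n independent parts works)."""
    return 1 - (1 - part_reliability) ** n_redundant

for n in (1, 2, 3):
    print(f"{n} part(s) at 90% each -> system {system_reliability(0.90, n):.3%}")
# 1 part(s) at 90% each -> system 90.000%
# 2 part(s) at 90% each -> system 99.000%
# 3 part(s) at 90% each -> system 99.900%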

A working definition (which, unfortunately, has many parts).

Egoless Practice is:
  • a Professional Behaviour
  • designed to 
  • routinely and reliably achieve
  • as Perfect as Possible outcomes
  • for the Client or Service Recipient
  • by knowledgeable and skilful
  • Practitioners
  • supported by systems, processes and procedures
  • that actively monitor, examine and report performances,
  • for both failures and successes,
  • to systematically and without-backsliding improve 
  • Quality, Performance and Process
  • of Individuals, Teams and Organisations.
"To Err is Human" isn't a mere aphorism, it is an Iron-Clad Law.

It's the basis of the unending, relentless Professional Challenge:
  • we're not machines,
  • we cannot ever exactly repeat a process, not even twice, let alone the many times every day needed in Professional Practice, and
  • our Minds and Bodies are always letting us down or tricking us in some way.
Simply stated: We are constantly making mistakes, inadvertently or not.

Monday, May 14, 2012

The unnoticed Crisis in Healthcare

This paper on solving the Quality of Care crisis in Healthcare, "An NTSB for Healthcare", made me wonder why nobody was talking about another long-running, endemic Crisis in Healthcare:
In trying to spend less, it ends up costing more to provide less service, of worse quality. The more we try to cut costs, the more it will cost, and there is no simple way out: the system is locked into this craziness.
Doing "more of the same" not only cannot break us out of the rut, it pushes us deeper into it.
W. Edwards Deming, the person responsible for the Quality Improvement movement in Japan that also forced a revolution in manufacturing in the United States in the 1980s, was very clear on this:
  • When people and organizations focus primarily on quality, defined by the ratio (Results of Work Effort / Total Effort; see the sketch below this list), quality tends to increase and costs fall over time.
  • However, when people and organizations focus primarily on costs, costs tend to rise and quality declines over time.
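A minimal worked example of Deming's ratio, with invented numbers purely to show the direction of the effect:

# Deming's quality ratio: Quality = Results of Work Effort / Total Effort.
# Invented numbers: a cost-focused cut to effort also cuts results
# (more defects, more rework), so quality falls AND each unit of
# result ends up costing more effort than before.

def quality(results: float, total_effort: float) -> float:
    return results / total_effort

before = quality(results=100, total_effort=100)  # 1.000
# Cut effort by 20%, but results fall 30% from errors and rework:
after = quality(results=70, total_effort=80)     # 0.875: quality fell
effort_per_result_before = 100 / 100             # 1.00
effort_per_result_after = 80 / 70                # ~1.14: costs rose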
Turning around any system spiralling out of control cannot be done by "more of the same"; it needs careful attention to causes and the underlying systems. As Quality Improvement has repeatedly shown, focussing on "Doing Things Right First Time, Every Time" is a remarkably effective means of effecting even very large turn-arounds.

No sane Politician or Healthcare Administrator/Bureaucrat would intentionally do or allow this downward spiral, but the effect is slow and insidious in onset, and irreversible once started. That we have gone there in many US Hospital Systems is more than adequately documented in Dr. Otis Brawley's "How We Do Harm".

The good news is that there are a number of high-profile Hospital Systems that seem to have avoided this crisis: Mayo Clinic, Cleveland Clinic, Dartmouth-Hitchcock Medical Center, Denver Health, Geisinger Health System and Intermountain Healthcare.

All these Hospital Systems make Quality of Care their priority and indirectly achieve much better financial outcomes (20-30% less per service), illustrating Dr. Deming's assertion. Such "Systemic Quality" techniques yield not only better Patient Outcomes but also Optimal Care Costs, because the Active Learning underpinning them necessarily includes Process and Performance Improvement.

You'd expect decision makers would want to lower costs, improve Quality and improve staff morale and employment conditions. And I believe they would, if they properly understood both the problem and the solution.

The central challenge with the theory posited here is twofold:
  • proving it's more than a theory, and
  • convincing Decision Makers of the problem and that "more of the same" cannot be a solution.
The definitive theoretical work on how this counter-intuitive effect presents in Computing, Virtual Memory "Thrashing", started in 1968 with Peter Denning's first paper on "Working Set" theory. It's no overstatement to say that without this work (theory + proof-in-practice), computers as we know them could not exist.
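A toy model (not Denning's actual mathematics, and with invented numbers) gives the flavour: with fixed physical memory, each extra process shrinks every process's resident set, and once resident sets fall below the working-set size, page-fault overhead swamps useful work:

# Toy model of Virtual Memory thrashing; all numbers are made up.
TOTAL_FRAMES = 1000    # physical memory, in page frames
WORKING_SET = 120      # frames each process needs to run without faulting
FAULT_PENALTY = 50.0   # slowdown factor once a process is starved of frames

def throughput(n_processes: int) -> float:
    frames_each = TOTAL_FRAMES / n_processes
    if frames_each >= WORKING_SET:
        return float(n_processes)           # ideal: more processes, more work
    starvation = WORKING_SET / frames_each  # how far below the working set
    return n_processes / (1 + FAULT_PENALTY * (starvation - 1))

for n in (2, 4, 8, 9, 12, 20):
    print(n, round(throughput(n), 2))
# Throughput climbs until about 8 processes (1000/120), then collapses:
# past the critical point, adding work REDUCES the work done.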

A related computing theory, the "Universal Scalability Law" (USL), applies far beyond computers, as shown by this piece on Projects and "The Mythical Man-Month" (adding more people to a late project makes it later).
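The USL itself is a simple formula: relative capacity C(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1)), where sigma captures contention (queueing for shared resources) and kappa captures coherency (pairwise communication) costs. A short sketch, with parameter values invented for illustration:

# Neil Gunther's Universal Scalability Law (USL).
# The kappa term grows as N*(N-1), so past a peak, adding workers
# REDUCES total capacity: the "Mythical Man-Month" effect.

def usl_capacity(n: int, sigma: float, kappa: float) -> float:
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

sigma, kappa = 0.05, 0.002   # illustrative values only
for n in (1, 5, 10, 20, 22, 30, 50):
    print(n, round(usl_capacity(n, sigma, kappa), 2))
# Capacity peaks near N = sqrt((1 - sigma)/kappa), about 22 here,
# then falls: the 50-worker system does LESS than the 22-worker one.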

This is the counter-intuitive world that in Computing we call "Thrashing", in Catastrophe Theory a "tipping point" and in everyday parlance "past the point of no return" or "starting down a slippery slope". Even sometimes, "in a flat spin", meaning "with no way out".

These all occur when a system or thing is irreversibly pushed past a critical point or limit and then the rules of the game change. Much like stretching out the small spring from a retractable ballpoint pen renders it useless. It cannot be properly remade because the steel has been stretched permanently past its elastic limit. There's a different effect in "Memory Metals" which return to their original shape when heated, but you can't make springs out of them, only automobile body panels.

There is a huge variety of examples of this, which break down roughly into 4 types:
  • Dynamic systems that exceed a critical threshold. E.g.:
    • a car, motorcycle, pushbike or skateboard "fishtails" or "tank slaps".
    • Aeroplanes and rockets experience violent, uncontrolled oscillations like a flag: "flutter".
    • Ice Skating has the term "Death Spiral": a person can't get up from this without help once locked in.
  • Dynamic systems that go below a threshold. E.g.:
    • Aircraft in "the Region of Reverse Command" or "behind the Power Curve". When flying too slowly, with the nose pointing too high, planes behave more like kites. Applying more power pushes the nose higher and the plane flies slower. Reducing power puts you into a stall, and if you're on take-off, you crash.
    • Riding a pushbike or motorcycle to a stop without putting your feet down (especially when you can't reach the ground). Because the gyroscopic forces from the wheels are no longer holding you up, you can quickly overbalance and be unable to right yourself.
  • Static systems that exceed a critical limit. E.g.:
    • The overstretched spring cannot be put back.
    • A paper-clip can be straightened, but never properly reformed without weak spots.
    • Plastic items or toys that are bent too far and crease, forming weak spots.
    • Letting Ice Cream melt. All the bubbles escape, it separates and won't reform easily.
    • Plastic Film and Duct or Gaffer Tape: very strong until a small nick is made in it. Then it tears easily for as long as force is applied.
    • Nylon fabric in a tent or flag: very strong until nicked, then will tear along its whole length.
    • Touching the inside of a tent during rain makes it leak. While the surface tension isn't broken, water doesn't drip. Once broken, the drip cannot be stopped.
  • Static systems that go below a critical limit. E.g.:
    • Chocolate that is cooled too far assumes a white, powdery appearance as the fats/oils separate out.
    • A drop of water in a hot skillet will happily float on a cushion of steam if small enough to start with, but when it gets too small it becomes unstable and explodes when it touches the skillet.
    • Soap bubbles support themselves and happily float around in the air, until the film becomes too thin (from evaporation or flow) to withstand the slightly higher internal air-pressure and they explode.
    • Snow Skis and sled runners 'glide' by melting snow with pressure. In extreme cold, this effect stops because the pressure doesn't melt the snow. Unseasonably cold weather was a contributing factor in the failure of Robert Scott's Antarctic expedition. The man-hauled sledges became very hard to pull when they stopped sliding over the snow and started digging in like they were in sand.
There are some other dynamic systems that most drivers are very aware of:
  • Overbraking leads to the tyres skidding as the friction melts the rubber and you're suddenly sliding on a thin film of liquid rubber. For drivers encountering this for the first time, the thought of releasing the brakes, not pushing harder, is usually terrifying. "ABS" braking solves this by automatically releasing the brakes and re-applying them.
  • The opposite effect is high-powered cars spinning their wheels when accelerating. The wheels continue to slide until power is reduced enough to regain traction.
  • Cornering or swerving too fast, usually in slippery conditions like ice, mud or rain, results in some or all the wheels losing traction. There are no good recovery techniques for an all-wheel slide. When only the back wheels have lost traction, the classic "steer into the slide" technique works - which for those new to it, is usually counter-intuitive.
In all these situations, once "traction is lost", control is lost unless specific recovery measures are taken.
Once a rubber tyre starts to slide, it will continue to slide, even at speeds where it previously had traction.
Recovery isn't a matter of backing off just a little, but often quite a lot, until the rubber stops melting or sliding. Once traction is restored, the tyre will again stay adhering until the critical limit is reached again. "Good car control" is often staying just below the critical limit, maintaining maximum friction without slipping.

These counter-intuitive effects are well understood in General Systems Theory. One of the essential understandings is that to optimise whole system output, at most one sub-system can be optimised. All others have to run with some "slack" to allow the best outcomes of the whole system.
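Elementary queueing theory makes the need for slack concrete. In the simplest single-server queue (M/M/1), the mean time a job spends in the system is W = 1/(mu - lambda), where mu is the service rate and lambda the arrival rate. The rates below are made up, but the shape of the curve is general:

# Why sub-systems need "slack": the classic M/M/1 queue.
# As utilisation (lambda/mu) approaches 100%, time in the system
# grows without bound; running every unit "fully utilised"
# destroys whole-system performance.

def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    assert arrival_rate < service_rate, "queue is unstable at >= 100% load"
    return 1.0 / (service_rate - arrival_rate)

mu = 10.0  # e.g. a clinic that can serve 10 patients/hour (illustrative)
for utilisation in (0.50, 0.80, 0.90, 0.95, 0.99):
    w = mean_time_in_system(utilisation * mu, mu)
    print(f"{utilisation:.0%} utilised -> {w * 60:.0f} min in system")
# 50% -> 12 min, 80% -> 30 min, 90% -> 60 min, 95% -> 120 min, 99% -> 600 min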

The necessary ingredients to create a system which can sink into "Reverse Command" type dysfunction are two opposing system response curves:
  • The "normal" response curve where increasing staff numbers (i.e.higher staff costs, more time per patient and more individual "slack" time) results in more throughput, but at the cost of lower "cost effectiveness" per patient, and
  • The "stressed" response curve, where low staff numbers creates higher absentee and sickness rates, increases Medical Errors and Adverse Events, increases staff-overtime for those able to work, increased time-pressure creates more stressed staff, reduces their job satisfaction and radically increases turn-over. Because the total demand for care has not reduced, extra staff have to be found: either through overtime, substitution of under-qualified staff or hiring expensive Agency staff. Overly tired staff not only work slower, but miscommunicate more, are worse at detecting errors and omissions  and make inordinately more clerical errors, requiring extra time to correct.
There is an Optimum Staff Cost point: the most cases are treated for the lowest staff costs.
Attempting to reduce staff costs below this point is counter-productive. The "stressed" response curve takes over and increases staff costs whilst the overworked staff produce significantly worse outcomes.
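A toy model of those two curves, with invented numbers, shows both the optimum and how steeply total costs climb once staffing is cut below it:

# Toy model of the two opposing response curves; all numbers invented.
# Below the staffing level demand requires, "stress" costs (overtime,
# agency staff, errors, rework, turnover) grow faster than the payroll
# saving, so total cost has a minimum.

REQUIRED = 100        # staff needed to meet demand comfortably
SALARY = 1.0          # cost per staff member, arbitrary units
STRESS_PENALTY = 3.0  # cost multiplier on the shortfall

def total_cost(staff: int) -> float:
    shortfall = max(0, REQUIRED - staff)
    stress_cost = STRESS_PENALTY * shortfall ** 1.5  # grows non-linearly
    return staff * SALARY + stress_cost

for s in (70, 80, 90, 100, 110):
    print(s, round(total_cost(s), 1))
# 70 -> 563.0, 80 -> 348.3, 90 -> 184.9, 100 -> 100.0, 110 -> 110.0
optimum = min(range(70, 121), key=total_cost)
print("optimum staffing:", optimum)  # cutting below this raises total cost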

The problem with large Healthcare and Hospital systems is that nobody is tracking the dysfunction curve, only the headline "staff costs".

Generally, you can reduce average treatment costs by reducing staff, at the cost of increasing patient wait time. But the average cost of patient treatment increases with very long wait times. If you don't know what those limits are, and aren't providing staffing levels sufficient to keep below them, then the institution will blindly wander into the counter-productive zone.

An example from Dr. Brawley: a breast-cancer case caught early in stage-1 may be treated for $30,000 with very good patient outcomes. Each year the person has to wait for treatment, the cost of treatment increases and their outcomes worsen. Brawley tells the story of a woman who waited for 7-8 years until her breast cancer had reached stage-4. The treating oncologist estimated a treatment cost of $150,000 at that point, five times the early-treatment figure, and a survival of only 12-24 months...

At every level, delaying her treatment made it more expensive, yet "the system" had no ability to properly measure and model these costs, nor even to notice that it had been 'efficient' but not 'effective' when she did finally consume a massive lump of funding.

Because these events go unnoticed and unreported, total System Costs are much higher than they need be.
But without measuring them, who's going to believe it?
And if you don't believe it, why would you measure?

Teams and Departments can suffer similar system breakdowns in their culture, as described in this piece on the "Blame Spiral".
The crucial point is that the "Do it Right, First Time" Quality Improvement methodology, because it is based on real measurement and relevant reporting, catches these issues early and prevents minor culture issues from descending into massive dysfunction.