When “Failure Culture” Fails

I asked everybody in the room who had ever blown up production to raise their hand. Almost every hand in the room went up. Photo by Lisa Sweeney.

A week and a half ago I gave a lightning talk at SCaLE 16x titled “You’re a Failure! Now what?”

The premise of the talk was that those of us in tech, and especially those of us in software, are far more comfortable with and used to failure than people in most other professions. I noted early in the talk that Boeing doesn’t get to tell you that they can’t duplicate your series of crashes, so it must be your environment; please change the environment while they see whether they can work on a patch for your fleet of regularly crashing airplanes. Your heart surgeon doesn’t get to tell you to wait a few days for an update to your botched surgery. A lawyer doesn’t tell a client that it’s OK, he’ll get the right verdict in release 2.2.1. That bridge designer in Miami doesn’t get to explain that they’ll correct it in the next sprint. But those kinds of statements are pretty normal to those of us in software. Even people working on software for interplanetary probes at JPL regularly update the software on their distant robotic devices.

The problem is that sometimes we encounter the real world, where the expectations are wildly different from what we consider normal.

“Failure culture” works for software, not other places

Which brings us to today’s Bloomberg Businessweek piece about Elon Musk and Tesla.

Bloomberg’s assertion is that Elon Musk is borrowing the production techniques championed by Henry Ford, which are the same ones that nearly led to the demise of US auto manufacturing in the 1970s. Ford’s approach — since proven wildly wrong — was to keep the line running at full speed all the time, not slowing for anything. If that meant that some products weren’t done right (which was often the case) they could be repaired later. Running the line at full speed with no stops was the religion, and right or wrong every other process followed from that.

When Toyota (and others) nearly decimated the US car industry, it was in large part because they took a different approach. Borrowing from W. Edwards Deming, one of the people who figured out how to build planes most efficiently for the US in World War II, they imposed a different religion. Their religion was “no defects.” If the line had to slow or stop to make sure everything was done right the first time, then the line stopped. Every employee on the line could pull a cord or hit a switch to stop the line. And if a defect was found, the line stopped until they figured out how that defect came to be in the first place (called “root cause analysis”) and whatever caused it was changed or fixed so that it could never happen again.

The brilliance of this approach is that the factories became learning machines long before we had ever thought of such a thing in the computer world. Every minor problem was an opportunity to learn what you were doing wrong, fix it once at the source, and never deal with it again. In the meantime, GM and others were rolling cars off the line in Detroit and never learning a thing about why those cars had problems. The postwar environment (in which pretty much all competitors were bombed flat) allowed them to be lazy and disregard rigorous quality approaches. They’d build cars that were “good enough” and if necessary ship them off to a different facility to fix them. But the line would never stop or slow.

It turns out that stopping the line from time to time and fixing the root cause of a problem is a lot cheaper than fixing the same problems over and over again in cars that have already rolled off the line and may even need to be partially disassembled to be fixed. Not only that, but cars that are built right the first time, rather than being built wrong and repaired, tend to have fewer problems that show up later. They need fewer visits to the mechanic, not just early in their lives but for years or decades. People, it turns out, care about having things that work, especially when the thing in question is the second-largest purchase most people will ever make after their residence.

“Ship quickly and fix it later” works OK in software. It may or may not be the most desirable thing to do depending on who the customer is, but patching internet-connected software is relatively easy. Fixing a built-in problem deep in the bowels of a complex mechanical device like a car or a washing machine is a whole different process, requiring repair visits, wasted time, and considerable cost. Failure culture fails completely in those environments.

You can’t run a hardware business like a software business

Musk’s background is in software, as is mine. His preferred method of operating is to break things and fix them. I understand this. I do wonder if he understands how poorly that translates into manufacturing and servicing complex products, especially in light of the many state and federal “lemon laws” that can leave the manufacturer holding the bag if something is delivered with problems that can’t be fixed within a reasonable time. In that respect too, this isn’t software.

So it’s a concern that Tesla is not worried about building things right the first time. It’s a problem that they already have a rework facility near their factory to repair things that weren’t done right on the production line. It is ironic that the factory where Tesla is doing this kind of retrograde production is the former site of the NUMMI plant, a joint venture between GM and Toyota where General Motors learned a lot about all the things we taught the Japanese and then forgot ourselves.

Tesla has convinced a lot of people, including the investors who keep them afloat, that they have something nobody else has. But that’s not particularly true. The largest manufacturer of electric vehicles in the world is Renault-Nissan. You won’t see many of those on US roads besides the occasional Nissan Leaf, but the biggest market for EVs is currently China and that’s where most of them are going. Mostly they’re pretty “normal” cars for normal people. They’re made for people for whom a few hours or days lost at a repair facility matters a lot. Those are the people who dumped GM, Ford and Chrysler in droves back in the ’70s and flocked to Toyota, Nissan, Honda and others whose cars worked right off the lot and somehow never needed to go back.

While there’s no denying the awesome design and performance that Tesla has packed into their cars, the basic technologies are pretty well understood. The Model S has shattered our notion of an electric car being a glorified golf cart and did it in a spectacular manner. For electric cars to succeed, somebody had to do that, and if nothing else we owe Musk for that. But few people will spend $100k on a super-performance, super-luxury car, electric or otherwise. To survive, Tesla has to compete in the market for more mundane vehicles. And as my friend Vitaliy notes, Tesla is not profitable and has a tough road ahead of it to get there [1].

In that market for the mundane, they’ll be competing with Renault-Nissan and virtually every other carmaker on earth, all of whom are now exploring electrics. Most of those will deliver cars with appliance-like reliability from day one. We Americans tend to think of our cars as far more than appliances, but one thing the past several decades have proven is that, with the exception of small numbers of enthusiasts with time and money to spare, appliance-like reliability is not something we will compromise on. We want more than an appliance, but it has to be an appliance too.

This is where I worry for Tesla. Because as much as I’m not a car guy, I want them to do well. But I know the realities of mass manufacturing, and these days you have to get it right the first time. Fail and fix doesn’t work when you’re building things. The world has moved on from there.

[1] Vitaliy’s other piece about Tesla is here; it requires a free login.