Additive vs Reductive TPM work

In my previous post, I wrote about the difference between Software/Application Technical Program Managers (sTPMs) and Infrastructure Technical Program Managers (iTPMs). In thinking over what I wrote, I realized there was another thread that I hadn’t considered before, one that is not explicitly tied to software or infrastructure. TPMs may be doing either additive work (building new stuff) or reductive work (simplifying, or in some cases outright eliminating stuff). The latter has been more frequently associated with iTPMs than sTPMs, but that’s not universally true. An era like the one we are living through, with its greater emphasis on cost reduction and efficiency, tends to favor reductive work across all specialties.

So, what are these two kinds of work, how are they similar, how do they differ, and why is one far more commonly referenced?

Additive: building something new

We all know this one, and it’s the default across the TPM world. In fact, virtually all TPM interviews and screens will ask about it, and most TPM career ladders and job descriptions favor it as well. This should be no surprise, as the TPM role came into wide usage during the unprecedented period of Zero Interest Rate Policy (ZIRP) and its associated investment and growth. During this period, reductive work (eliminating things that didn’t make sense, cost too much money, or weren’t attracting enough users) just didn’t happen very much. When the cost of capital is roughly zero, almost everything you can build makes sense, and the savings aren’t a big deal. Companies like my former employer, which had a major capital spending component and a historic focus on cost containment (“Frugality” being one of the Amazon Leadership Principles), still cared to a greater degree than most, but even so most people were building, not optimizing or reducing.

So most TPMs were building. Approaches like “move fast and break things” worked. And the TPM role that evolved during that era came to reflect this.

This is true of many engineering specialties over the past 10-15 years. However, most specialties pre-dated ZIRP, and the practices embedded in them include knowing how to reduce or simplify. Still, many who came of age professionally from 2010 onward are struggling with this change.

For TPMs, this is reflected in hiring practices like the ones I described in my discussion of iTPMs: interviews that assume that your program was a “build” program, and don’t even have the metrics to evaluate a TPM whose work was more reductive.

Reductive: simplifying, optimizing, or outright eliminating something that exists

I cited one example in my previous post. “I built and ran a program that improved infra utilization by 30%, saved AWS $42m in hard infra spending, and at least as much in soft and human costs during the time I was there (and much more in the two years since).”

This was a significant undertaking that went through multiple six-month phases, each of which was a fairly substantial program on its own. It involved everybody from the teams that ordered, delivered, and installed physical hardware, and the other internal infra teams that delivered other resources such as EC2 capacity, to the infra operations teams that had to configure it all, to the capacity management team (which I was nominally part of) that managed it. And of course, I had to work with all our internal customers to ensure they were getting what they needed and that it was being delivered as they needed it, across the right mix of our three different possible platforms. Nothing about this was simple or easy, and nothing was possible without coordinating the activities of multiple teams around the world.

From a software/operations perspective, I simplified and eliminated more than I added. I didn’t build a new platform to allow us to manage this. Rather, I improved existing tooling, simplified configurations down from a complex set of options to a few standard ones, and eliminated processes and systems that slowed down our work. The key challenge was not “how to build and deliver something new and complex” but rather “how to break down something that existed, in a way that improved our capabilities, while not causing problems.” I spent a lot of time thinking about Chesterton’s fence, and reminding myself that I should not remove something before I understood why it was there in the first place. Context is everything.

[Aside: Amazon’s Principal Engineers have a set of tenets they strive to embody. One of the ones I like most is “respect what came before.” A reminder that the people who preceded you were not idiots, that what they built made sense for the environment they built it in, and that you would do best to understand what they built, even if you ultimately conclude that it no longer makes sense today.]

You may build to reduce!

Throughout my program, tooling was built and improved, but mostly with ourselves as the customer and rarely as a large, complex endeavor. A common “user story” would be “I spend 30 minutes every time I need to manually perform a task, but I could spend a week to automate it so it takes me 30 seconds on the command line every time.” Or one I came up with myself: “we are unable to move capacity from one customer to another because historical/organizational tagging of our resources divided a large pool of identical resources into smaller dedicated pools, so let’s take the 3-4 days to get rid of unnecessary dedicated pools, so we can more efficiently utilize what we have.”
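The first user story is really a break-even calculation, and a rough sketch makes the tradeoff concrete. The function name is hypothetical, and the 40-hour figure is my assumption for what “a week” of effort means; only the 30 minutes and 30 seconds come from the story itself.

```python
import math


def break_even_runs(build_hours, manual_seconds, automated_seconds):
    """Return how many runs it takes for the automation effort to pay for itself."""
    saved_per_run = manual_seconds - automated_seconds
    if saved_per_run <= 0:
        raise ValueError("automation must be faster than the manual task")
    # Total build cost in seconds divided by seconds saved per run, rounded up.
    return math.ceil(build_hours * 3600 / saved_per_run)


# Assumption: "a week" of automation work is 40 working hours.
runs = break_even_runs(build_hours=40, manual_seconds=30 * 60, automated_seconds=30)
print(f"Automation pays for itself after {runs} runs")
```

Under those assumptions the automation pays off after 82 runs. For a task performed a few times a day across a team, that is a matter of weeks, which is why this kind of small tooling investment was so often worth making.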

This kind of work was typical at AWS, where there is a lot of emphasis on compounding 1-2% improvements over time. In fact, almost all of the 30% improvement I made was the result of compounded small improvements that could only be found by looking across the boundaries that separated teams and figuring out how to break down processes and systems that entrenched less-than-optimal ways of doing things. Sometimes we had to write new software to do this, but often it was just a matter of changing configurations and, more importantly, changing expectations of how things should work. That’s prime TPM territory, but it’s not about building something completely new.
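To give a feel for what compounding means here, this small sketch counts how many equal-sized improvements it takes to reach a 30% overall gain. It is an illustration only: it assumes each improvement multiplies the previous result, which is a simplification and not a description of how the actual program was measured. The function name is hypothetical.

```python
import math


def steps_needed(target_gain, step_gain):
    """Smallest n such that (1 + step_gain) ** n >= 1 + target_gain."""
    return math.ceil(math.log(1 + target_gain) / math.log(1 + step_gain))


# Assumption: every improvement is the same size and gains multiply.
for step in (0.01, 0.02):
    n = steps_needed(0.30, step)
    print(f"{step:.0%} improvements needed for a 30% gain: {n}")
```

Under that assumption it takes 27 improvements of 1%, or 14 of 2%. Spread over multiple six-month phases, that is a steady stream of small wins instead of one big launch.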

Additive/Reductive TPM activities matrix

Activities/Skills | Additive TPM | Reductive TPM
Architecture | Drives creation of a system architecture from scratch, considering constraints that are mostly a result of current requirements or organizational practices. | Drives and manages understanding a complex and poorly-documented architecture. Then changes, simplifies or modifies it in a way that does not cause customer impact.
Product or Process? | Mostly product focused, but will often be involved in defining process as necessary to support the product. | Will be very focused on reliable engineering or DevOps processes and ongoing simplification/improvement.
Application or Infra? | More commonly application-focused. | More commonly infra-focused.
Cost/Efficiency | Ensures use of the most efficient/cost effective practices and infrastructure based on the latest information. | Can understand a costly, inefficient infrastructure and incrementally reduce the inefficiencies and/or drive re-architecture.
Work with engineering | Directs and prioritizes work on new features and products to address customer needs as understood today, and for future anticipated needs. | Directs and prioritizes work on organizational (not team-specific) tech debt, architectural inefficiency, and products that no longer best address customer needs, though they may be in heavy use.
Executive presence | Arguing for new solutions/features, often for new teams or leadership positions. | Arguing to eliminate things that some may have built their careers on, but are no longer appropriate.
“Big Question” | What should we build and how? | What are we doing that no longer makes sense, and how do we optimally eliminate it?

Do you really need a TPM for that?

As with anything a TPM does, it depends on the scale of the effort, the number of cross-team dependencies, and the complexity of the architecture and organization. I’ve seen entire products built and launched with no TPM on the team, because the product was built in a single organization with limited dependencies. That was pretty typical at AWS. I’ve also seen products launched that would have failed miserably without a TPM to coordinate complex dependencies. The same is true for simplifying and reducing. A team can very easily simplify its own environment with no help from a TPM. But if you want to do something that impacts a lot of teams, like streamlining your network structure, consolidating work from dozens of independent home-grown environments onto a standardized platform, or even something very basic like beginning enforcement of TLS 1.3 across all services or getting rid of other bad security practices, you will likely need a TPM to run things. As I noted in the matrix above, the reductive TPM has the added burden of trying to convince people to eliminate or simplify away products and features that they may have built their careers on. That requires a type of political savvy that is different from what you need when arguing for spending money on something that a person might conceivably build their future career on!
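The TLS example is a good illustration of why the difficulty is organizational and not technical. For any single service, the change can be as small as the sketch below, which uses Python’s standard `ssl` module as a stand-in; the function name is hypothetical, and real services will have their own languages and configuration mechanisms.

```python
import ssl


def make_tls13_only_context():
    """Build a client-side context that refuses anything older than TLS 1.3."""
    context = ssl.create_default_context()
    # Connections to peers that cannot negotiate TLS 1.3 will fail the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


context = make_tls13_only_context()
print(context.minimum_version.name)
```

Each team can make a change like this in minutes. Finding every service that still depends on an older protocol, sequencing the rollout so that nothing breaks, and getting dozens of teams to prioritize it is the part that needs a TPM.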

In my experience, a few things need to happen before a “reductive” TPM makes sense:

  1. You need to be large enough to need a TPM in the first place. Small organizations generally don’t need TPMs in any role.
  2. You need to have a sufficiently complex environment that coordination of reductive activities across teams, organizations, and in some cases even customers, is necessary. It has to be bigger than the routine refactoring and restructuring that should take place in engineering teams without a TPM.
  3. You need to be beyond the “move fast and break things” phase of your company’s existence. Startups rarely need TPMs, and generally aren’t focused on reductive activities.
  4. A complex infrastructure is a common setting for reductive work. There isn’t complete overlap between infra and reductive TPMs, but I find a lot of reductive work in the infra space. This makes sense, as infra is generally more stable over time than the applications that run on it, and therefore iTPMs are more likely to be working on reduction, simplification and efficiency than are sTPMs.
  5. Applications or application families that have grown by acquisition, or by otherwise melding together disparate pieces that were never intended to work together, usually need reductive restructuring.
  6. In the past few years I’ve seen a lot of reductive TPMing at companies that have pivoted from user growth to earnings growth as their strategic focus.

This is not an exhaustive list, but it describes some of the boundaries around the nature of a reductive program, and the kinds of environment where one makes sense.

Reductive TPMs may not match your conventional (additive) TPM filters

Mostly we hire for “what did you build.” But those may not be the skills you need if you’re looking for somebody to simplify, replace, or eliminate something. I won’t go into too many details here, because what I wrote before applies. If you spend your interview time on “what did you build?” and expect somebody to describe a green-field architecture they came up with, you’ll probably miss the skill set that is required for simplifying or breaking things down.

You’ll note that many of the baseline technical skills for additive and reductive TPMs are similar. We have to understand architectural tradeoffs, deal with cost and efficiency issues, work with executive management, etc. But those skills express themselves in different ways, and filters designed to select the best “adders” will often eliminate “reducers” even when their baseline skills are similar or better.

As I’ve been looking for work opportunities recently, it’s become clear that “reductive” work is very much in demand right now. If your company is anywhere beyond the startup stage, you probably need at least a few people who have strong reductive skills. As with all skills, you need to look for them explicitly; you can’t hire for something else and hope for the best.

Michael Gat