A piece I read recently about how “Systems Design Explains The World” reminded me of a story. Well, a bunch of stories. And since I was recently asked for a story by a very special person and didn’t have one ready, here it goes.
Really, you should read that blog post and the ones it links to, but if you don’t want to, here is the critical excerpt that smacked me in the face. Fair credit: it’s only a tiny piece of what he discusses and you should really read the whole thing.
There were two groups of misfits:
- People who maxed out as a senior engineer (building things) but didn’t seem to want to, or be able to, make it to staff engineer (translating business problems).
- People who were ranked at junior levels, but were much better at translating business problems than at fixing bugs.
Group #1 was already formally accounted for: the official word was that most employees should never expect to get past Senior Engineer. That was the whole point of calling that level Senior.
People in group #2 weren’t supposed to exist. They were doing some hard jobs – translating business problems into designs – with great expertise, but these accomplishments weren’t interesting to the junior-level promotion committees, who had been trained to look for “exactly one level up” attributes like deep technical knowledge in one or two specific areas, a history of rapid and numerous bug fixes, small independent launches, and so on.
As has often been the case in my life, I was in a category of people who were not supposed to exist. I was that “Type II misfit.” How I got there is the story.
When I joined the tech business it was a much smaller place, and it was impossible to hire enough people with solid technical backgrounds. So most large companies ran things called “training programs” to get people without the right tech background up to speed. My company made a point of hiring people with limited tech backgrounds. The approach was based on the recognition that most of our programming problems were not very hard, but that the larger problems we were solving could be very challenging. They hired for general problem-solving ability rather than tech skill, and they put you through 6 months of hell to come up to speed on enough tech to get started.
I had the tech background, but everybody went through (more or less) the same training. I realized later that this was also a great filter for the company. When it was over they knew who was good at what. Most of us went into software development roles, but some went into data center ops, some worked on internal customer service teams, and others went on to specialties like database management or systems programming. You had six months to demonstrate which role you’d be best suited for.
Training was a full-time job. You spent 8 hours a day paying your dues doing grunt work in the data center, then 4 hours a day in training. Depending on which shift you were on that week, you worked 8-4 and trained 4-8, trained 12-4 then worked 4-midnight, or trained 8-midnight then worked through to 8am. If you had a weekend shift, you worked two 12-hour shifts on the weekend, normal shifts on 2 other days, and did one day of solid training.
We still went out for beers after work, even after getting out at 8am. It was NYC, there were lots of people working a variety of night shifts, and the bars open early. The bar that we went to is still there, and from what I’m told still does a solid early-morning business for Wall Streeters who are up all night. [There are plenty of bars in NYC that will lock the doors and stop serving at 4am as the law requires, but allow you to remain and “finish your drink” until they re-open at 6. If you’re in such a place, please be polite and pick up your feet when asked so they can mop the floor.]
I graduated from the training program two months early. They had just increased the class size and didn’t need that many people in the data center. I was one of the few with a CS background and I could accelerate through the training quickly once they put me on it full time since all I needed to pick up was some language and environment-specific details. Then I got moved uptown into a development and ops support job.
It was different then. We always deployed on Fridays, because as a financial institution we were closed weekends, and that meant we would always have a full weekend to recover if things really screwed up. That was preferable to having to be shut down for a day during the week. Code was monolithic, running a single version on a single platform. Ops was relatively straightforward. I had to wear a tie. The programming we did was not incredibly technical and we worked in straightforward languages. Programming for financial operations is mostly reads, sorts, selects, and occasionally updates. We used standard libraries for everything complex. Very little of my CS background was critical. I’ve never in my life actually implemented a sort or search algorithm. Has anybody?
I spent six months doing small maintenance tasks, feature improvements, some reporting, and ops support. I got to support our Tokyo operation’s first major audit by the Japanese Ministry of Finance. They were surprised at how quickly we were able to collect and present the data they wanted. It was the last time I had a role with a significant ops support responsibility.
I worked independently. The guy who had previously supported our Tokyo office ops had quit and I moved into the role. My boss brought me up to speed and turned me loose. Colleagues working on our “primary” (US markets) systems helped me figure out how to manage what was effectively a limited-feature fork of their larger environment. There wasn’t anybody else managing our daily “Tokyo flow” and time differences meant that the “overnight” processing that was the bulk of my work happened during our early morning. I had a fairly independent schedule and role.
Part of my job was being the one who was pulled in to support major projects being done by the New York systems teams, to ensure that whatever changes were being made would be properly reflected in our Tokyo environment and not break anything. The expectation was that in most cases, changes made to the major systems would be incorporated by all the geographic forks unless there was a compelling reason to build a localized solution. The New York systems teams did much of the work, but testing and validation had to include the local tech ops people, which for Tokyo was me!
Unlike most of my classmates who were assigned to teams with a narrow focus, my function was more as a generalist than a specialist. I had to have decent knowledge of all the systems used by our relatively small office rather than highly-detailed knowledge of a single system used in multiple offices. I learned the breadth of all our major systems, rather than one of them in depth.
Along the way I really impressed somebody who was managing a very disruptive change to the way we processed and recorded end-of-day balances and positions. To put the change into effect, every system that updated the database first had to be modified to go through standard routines rather than updating it directly. Only then could the database be rewritten to meet the new requirements and the standard routines upgraded to match. It was a pretty ingenious change that reduced our database storage by over 90%, with minimal impact on real-world use cases. The changes to the systems took place over the course of months, but there was still a lot that had to happen all at once at the end. I learned a lot just from watching.
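For readers who want to see the shape of that trick, here is a minimal Python sketch. Everything in it is my own illustration, not the original system: the class names, the `BalanceRoutines` wrapper, and especially the idea that the new layout stored a row only when a balance changed are assumptions made up for the example. The point is only that once every caller goes through the same routines, the storage underneath can be swapped without the callers noticing.

```python
import bisect
from datetime import date, timedelta


class SnapshotStore:
    """Assumed 'old' layout: one stored row per account per day."""

    def __init__(self):
        self._rows = {}  # (account, day) -> balance

    def put(self, account, day, balance):
        self._rows[(account, day)] = balance

    def get(self, account, day):
        return self._rows[(account, day)]

    def row_count(self):
        return len(self._rows)


class ChangeOnlyStore:
    """Assumed 'new' layout: a row is stored only when the balance changes."""

    def __init__(self):
        self._days = {}      # account -> sorted list of days with a change
        self._balances = {}  # account -> balances, parallel to _days

    def put(self, account, day, balance):
        # Assumes postings for an account arrive in date order.
        days = self._days.setdefault(account, [])
        balances = self._balances.setdefault(account, [])
        if days and days[-1] == day:
            balances[-1] = balance       # re-posting the same day overwrites
        elif not balances or balances[-1] != balance:
            days.append(day)             # record only a genuine change
            balances.append(balance)

    def get(self, account, day):
        # The balance for a day is the latest change on or before that day.
        days = self._days.get(account, [])
        i = bisect.bisect_right(days, day) - 1
        if i < 0:
            raise KeyError((account, day))
        return self._balances[account][i]

    def row_count(self):
        return sum(len(days) for days in self._days.values())


class BalanceRoutines:
    """Hypothetical 'standard routines': the only way callers touch the store."""

    def __init__(self, store):
        self._store = store

    def post_end_of_day(self, account, day, balance):
        self._store.put(account, day, balance)

    def end_of_day(self, account, day):
        return self._store.get(account, day)


def run_month(routines):
    """Post 30 days of balances for one account, then read them all back."""
    start = date(1990, 1, 1)  # arbitrary date for the example
    for offset in range(30):
        balance = 100 if offset < 10 else 250 if offset < 25 else 90
        routines.post_end_of_day("ACC-1", start + timedelta(days=offset), balance)
    return [routines.end_of_day("ACC-1", start + timedelta(days=offset))
            for offset in range(30)]
```

In this toy run the calling code is identical for both stores and reads back the same 30 balances, while the snapshot layout holds 30 rows and the change-only layout holds 3. The numbers are illustrative only and say nothing about how the real saving was achieved.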
The change to the databases and the standard routines that updated them had to be done as a “big bang” implementation in one weekend. We created two parallel streams: the old and the new. We would do a full night’s processing run, then change over the database and run it again. Every report, balance and position had to match the second time around, or the whole thing would get rolled back. No other changes were permitted anywhere. Everything was locked down (or so we thought).
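The go/no-go rule in the paragraph above amounts to a parallel-run reconciliation, which can be sketched in a few lines of Python. This is a hedged illustration, not what we actually ran: the function names, the report names, and the dictionary-of-dictionaries shape for a night's output are all hypothetical.

```python
from decimal import Decimal


def reconcile(old_run, new_run):
    """Compare two processing runs and list every value that differs.

    Each run is assumed to map report name -> {row key -> value}.
    Returns (report, key, old_value, new_value) tuples; None marks a
    row that is missing from one side.
    """
    mismatches = []
    for report in sorted(set(old_run) | set(new_run)):
        old_rows = old_run.get(report, {})
        new_rows = new_run.get(report, {})
        for key in sorted(set(old_rows) | set(new_rows)):
            old_value = old_rows.get(key)
            new_value = new_rows.get(key)
            if old_value != new_value:
                mismatches.append((report, key, old_value, new_value))
    return mismatches


def cutover_decision(mismatches):
    """Any mismatch at all means the whole change is rolled back."""
    return "keep new database" if not mismatches else "roll back"


# Made-up sample data: the run against the old database...
old_run = {
    "balances": {"ACC-1": Decimal("100.00"), "ACC-2": Decimal("250.50")},
    "positions": {"ACC-1/XYZ": Decimal("40")},
}
# ...and a second run against the new database with one bad value.
bad_run = {
    "balances": {"ACC-1": Decimal("100.00"), "ACC-2": Decimal("250.05")},
    "positions": {"ACC-1/XYZ": Decimal("40")},
}
```

Comparing `old_run` with itself yields no mismatches and a "keep" decision, while comparing it with `bad_run` flags the single `ACC-2` balance and forces a rollback. Listing every mismatch, rather than stopping at the first, is what lets somebody look critically at a failure before backing everything out.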
Time zones meant that Tokyo went first. That made me the guinea pig. Thanks to the magic of time zones, I was able to advise everybody on the team by mid-afternoon on Friday that the full systems run had been completed twice and that everything matched up. The London stream went next and also ran well. We were in great shape.
Well, except for the part where a “star programmer” on the NY trading system decided that he had to make a change to reports that a “star trader” needed, and that he was sure it wouldn’t have any impact. Somehow, he also happened to have a senior person’s override code to approve it despite the lockdown. But that “star” introduced errors into some of the back-end reporting for New York (Tokyo and London didn’t use that particular functionality, so it didn’t show up for us), which resulted in our test suite flagging discrepancies across a range of reporting products and almost caused the whole thing to be backed out.
Fortunately somebody looked more critically, noticed that there had in fact been a change to a program and asked “hey, who the hell approved this?” After some confusion, the unauthorized change was backed out, the whole reporting stream was re-run, and it checked out. A lot of us were really, really pissed that the “star” kept his job. The fact that he did taught me a lot of lessons about office politics. The fact that he could get a change through when none were being permitted taught me a lot about permissions, security, and how to improve process to make sure such a thing could never happen again. (The follow-up was the first in a long list of Correction of Errors or Incident reviews that I’ve been part of.)
Not long after that, a new team was assembled to build a Tokyo-native development and ops team. It was long overdue given the rapid growth we had experienced. Having a single guy (me!) half a world away running all tech ops may have made sense when the office was new, but wasn’t adequate for the expected growth. I trained them and got them ready to take on my role along with other tasks, then had little to do.
The manager of the Positions/Balances database redesign had moved on to a new role and asked me to join her team. It was also internationally focused, but mostly on smaller geographies where we didn’t have a physical presence and instead worked through intermediaries. I was to figure out how to automate procedures that previously had been managed with faxes and odd-hour phone calls. Two weeks later, I was in London.
Much of the work I needed to do leveraged systems that had been developed by our London team for local use. London was much larger than Tokyo, and already had a long-established local systems team developing applications for the London and Euro markets that extended the capabilities of our core New York systems. The idea was that I would learn from them while building new pieces of software. In retrospect, I used a lot less of their work than anybody thought I would. I could have gotten all I needed from the trip in about a week, but I was there for six. I wrote some of my best code in that job during that period, and also learned about the things I should never do, like build user interfaces.
[For whatever reason, user interfaces that make sense to me are useless to anybody else. Most “professional” user interfaces are painful for me to use. Things designers tell me make for a great “user experience” I find hateful. This has only gotten worse over the years, which is why I own so many broken keyboards and my home computer runs Linux, mostly in terminal/command line mode.]
I spent my 1-year work anniversary drinking beers in London with a bunch of Lebanese guys I had met in a pub near my hotel rather than with my classmates who had gathered at our old “8am after the night shift” bar. It was a fun summer and while I could have done a much shorter trip, I also learned a lot about working cross-border and cross-culture by spending as much time there as I did. Shortly after my 23rd birthday, I flew back to New York.
I spent a couple of months finishing what I had started, testing it and working through the bugs. The tech world was smaller then. When I was experiencing problems between my software, the Hayes modem that we expected our counterparts overseas to use, and Microsoft drivers, I could just pick up the phone, call Hayes, call Microsoft, ask for “whoever is working on X,” find the right people and figure it out with them. I worked with an outside programmer whose office was in WTC Tower 2, the only time I had ever been in one of those lost buildings.
It was a lot of coordination, understanding the larger problems, and working around them based on the realities of what we needed to achieve and the limitations of the technologies and communication infrastructure. There was no map and no process for getting from where we were to where we needed to go, so I made up my own. By October it was done, and we were ready to deploy. Italy was first. I was off to Milan.
Milan set the tone for the next 6-12 months of deployments. I arrived on an early-morning flight at the start of fashion week after being forced to spend my first night at a Zurich airport hotel because of a lack of hotel rooms in Milan. This would also become a recurring theme. Almost every deployment we did was logistically at the worst possible time for the place we were doing it. Along with the rest of my team, I learned a lot about the kinds of questions to ask when trying to schedule something in an unfamiliar location.
I was the sole technical member of a team of business development and market operations people. We gave presentations, we did training, and we helped our counterparts understand the docs, which were all in English, and translate them as needed. I troubleshot their PC and network/modem connection. We worked with our software, showing them how to connect to our systems remotely, download information, process it locally and upload results reliably. By the end of the week, faxes were a thing of the past.
My role was to hold together all the pieces, to be the “glue” on the technical side that ensured that the software I wrote, the software the guy at WTC 2 wrote, systems software on a mainframe in New York, a leased network, and all the other pieces, worked together. It was about figuring out how to solve problems as they came up and address them in the most flexible and agile manner possible. There was no real model and no real metrics to use as a guide. I wasn’t in a position to do much real development work when I was at a partner site, so the key was to know who to call and to be able to accurately specify what was needed.
We were working with a bank that had emerged from the bank at the center of the Vatican Bank Scandal. Its former chairman inexplicably jumped off a bridge in London with a rope around his neck and bricks in his pockets, a few years after the pope had died mysteriously a month into the job. Several people who had been part of the organization disappeared quickly or found themselves in Italian prison.
All this made for an interesting environment. On the one hand, they were eager for the business and an opportunity to be part of something forward-looking that would be a break from their past. On the other hand, it was… weird. Our primary contact was a hustler we nicknamed “Lucky” because he always seemed to be focused on other opportunities and possibilities. He spent most of his time monitoring the prices of gold and South African shares. Several months later, he disappeared. I decided not to ask questions.
[Years later, he resurfaced in Switzerland. Today, LinkedIn shows him running his own asset management firm, so my worst guesses at the time were apparently wrong.]
The same repeated elsewhere. I held things together dealing with the kinds of messy problems that aren’t solved through the application of algorithms and technical knowledge. Not that those hurt, but they just weren’t the key to success. Over time, the “on-site” team was reduced to just me, and occasionally one business development person who would come along for the first day of introductions, wining and dining.
I have great memories, but none of them are about programming. There was the time that I, by far the most junior person on the team, ended up in the most expensive hotel in Paris while everybody else was in a cheaper place down the street. (I was added to the team last, and by then the cheap rooms were gone, so travel put me in the more expensive place.) They got me back by drinking in the bar every night and charging it to my room. My boss was not amused at the $2000 bar bill. I learned a lot about dealing with colleagues, and I learned that Paris just before Christmas is a wonderful place to be, but a terrible place to try to get work done.
My boss was also not amused when I called from Melbourne on my first day there to explain that they had sent me during Melbourne Cup week, that starting the next day, the entire country would be drunk for the rest of the week, and by the way I’m going to have to extend my stay another week to get anything done. As usual, I was there at the worst possible time, but we made it work by scheduling a last-minute stop through Auckland to visit our bank there on the way back, after extending my Melbourne stay through the middle of the following week.
But through this I built a lot of trust with both my internal and external partners, and we did something that a year prior nobody thought was possible.
Years later, a manager I worked with said my greatest strength was my ability to sit at the back of the room, take everything in, figure out the direction things were going, and only say something when the team needed to be put back on course, or when they hit a stumbling block that — he thought — I had probably anticipated before the meeting even started and had prepared for. (He exaggerated, I’m not that good.) That’s the skill I learned in those few early years.
During all this I had been promoted, ahead of the normal schedule. There’s a downside to that. I had not done the more detailed coding and ops work that everybody else who came up with me and had been slotted into more typical developer roles had done. When it was over, I moved into a related role with another startup division that was using variants on the systems I had built. In that role I learned one of the most important lessons I’ve ever had about where and when engineering and technology matter.
The manager of that division was a youngish British banker who had come over to New York for the opportunity. He was really smart about how we could use our technology to disrupt the staid world he had come out of. But there was a very cold realism to his approach. After one particularly successful customer launch, he reminded us: “Our customers don’t come to us because we have the best technology, they come to us because we do a better job and provide a better value. They’re happy to see us using great technology and are impressed with our mastery of it, but in the end, if we can provide the same level of service by putting a million monkeys in front of a million typewriters, they’d be just as interested in the value proposition.”
[Years later, I realized that his idea may be the first known example of a serverless architecture.]
I have never forgotten the “million monkeys” analogy. I credit him for my tendency to always be skeptical of technology even as I work to master it. I’ve saved my clients and employers millions of dollars by warning them off of expensive “investments” that would only generate value to the vendor, and resume-padding material for the people working on it.
I’ve pissed off hundreds of software and services sales people by keeping my clients focused on the question of “what problem does this solve at what cost?” rather than “is this cool tech?” or even “could this work?” To this day, I use less tech in my home than a lot of people who aren’t in the tech business, because, really, what problem is it solving?
I also learned something from him about pricing. We were delivering a better and more accurate service, and were able to do it cheaper than the competition. Some thought we could charge less and just take over the market. But we charged more. When you’re doing a better job, you charge more. If you don’t, the smart customers will wonder why. If you’re able to do it at lower internal cost, that just means more profit to you.
In that business, competitors eventually caught up and prices/profitability declined, but I learned not to under-price, and the importance of using higher prices as a signal of value. This has held true for me working independently. I was never more in demand than when I raised my rates.
My further success in that job was short-lived. It’s tough to come back to the “standard progression” when you’ve gone off in a completely different direction for 2-3 years. I had spent my time working on very messy problems of very high importance, but where rigorous engineering formulas failed or just weren’t applicable. I loved every minute of it, but it came back to haunt me when financial markets declined, budgets were cut, and the focus moved to incremental efficiencies in existing products and refactoring code. That is to say, work that required the skills I had mostly not developed while doing other things.
I became the Type II misfit cited at the start. It’s not because I couldn’t be a great developer/engineer. Whenever I had to dive into hard engineering again I did well. But once you jump off that train and start dealing with the incredibly messy problems of inserting systems into different organizations in different countries with different expectations of how they would and should work, it’s hard to go back. I had been doing systems design, deployments, project management and customer relationships since I was a year out of school and I was good at it, but I moved there well ahead of the normal progression one would expect. Like other “Glue” types in the Type II Misfit category, I was in an odd position: an early/mid-level employee with senior or better skills in many critical but less technical areas, and far less rigorous engineering experience. After two other relatively brief jobs in New York, I took a few months off to ski, then on the advice of one of my first managers I applied to business school and never looked back.
[OK, business school may have been a mistake for other reasons, but the decision to refocus was not.]
I do still wonder what might have been had I been slotted into a different job early, or had I decided to take a step or two back to return to hardcore programming. I was good at it and might have done better for myself, but it’s impossible to know. Of the members of my original training class, I’m one of only two who remained in a tech-related field over the long haul (the other remained in that organization for his entire career and eventually occupied the CTO role).
As a technical program manager I’m still very much in the “glue” category, and I still wonder where the future will take me. “Glue” has had a lot of different names in my lifetime and “TPM” is just the most recent one. I’m long past the point of going back to engineering, and I’m also not likely to want to move into more senior management. But “glue” is currently paying very well, so staying where I am is entirely viable.
Maybe that’s OK.