Anti-Patterns in Tech Cost Management: Bad Cloud Strategy

Series Index

Introduction
Anti-Pattern 1: Not considering scale
Anti-Pattern 2: Bad cloud strategy
Anti-Pattern 3: Inability to assign/attribute costs
Anti-Pattern 4: No metrics or bad metrics
Anti-Pattern 5: Not designing it in
Anti-Pattern 6: Cost management as a standalone
Anti-Pattern 7/8: No ongoing reviews (current and potential)
Anti-Pattern 9: Across the board cuts
Anti-Pattern 10: “A tool will solve the problem!”
Anti-Pattern Bonus: Don’t do rewards programs!
A few thoughts: Three things you can do right now, for yourself and your team
Wrap up

It is likely that you are using a public cloud provider for something, unless your initials are “DHH,” in which case I’ll address your concerns later.

Even businesses that have rationally determined that they should not use the cloud for some of their workloads or storage will generally use it for something. It’s something you must have a strategy for, and a bad one will cost you.

You won’t save money

The cloud is not about saving money! Over time, the additional capabilities of cloud providers may, if used correctly, save you money, but more likely they will allow you to do more or different things, for the same money. A tech director or CTO who justifies a cloud move on the basis of cost savings is making a potentially career-destroying move.

Choice of cloud provider is important. The historical default choice, AWS, may not be the best choice for everybody. It’s big, it’s complex, and it’s initially harder to use than many others. Historically, AWS has been less willing than others to make deals to attract or retain customers, and less willing to budge on certain charges (network egress, for example). You need to define your requirements and benchmark, then decide.

Multi-cloud is a mistake

One mistake I’ve seen, even among very successful companies, is the decision to not decide. Instead, companies declare that they are cloud-neutral and proceed on the assumption that they can easily add or move to another provider later. This is always wrong.

While newer DevOps tools may make it easier than it once was to move a generic application to another cloud provider, it is never easy or trivial. Last year, at DevOps Vancouver, I saw an excellent presentation in which the speaker walked through the work that was necessary to move a simple application from AWS to GCP. In an enterprise of any size, you can increase that effort by several orders of magnitude. And that is the effort just to move the workload. Add another order of magnitude in ongoing work to keep the infrastructures on the two clouds aligned and in sync over time.

Many companies carefully benchmark things like compute instances and storage services to understand how best to use them. With two providers, you’ll have to do that on two clouds, and also benchmark the two clouds against each other, to keep track of which instances on one cloud are equivalent to which instances on the other, where the bottlenecks show up on each (they are guaranteed to be different!), and so on. Then you’ll have to build automation that can easily build either. Sound daunting? It is.
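To give a feel for the bookkeeping involved, here is a minimal sketch of tracking instance equivalence across two clouds. Everything in it is hypothetical: the cloud names, the instance names, the scores, and the prices are made up for illustration and do not describe any real provider. It assumes you have already reduced each benchmark run to a single throughput score, which real benchmarking rarely allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    cloud: str          # hypothetical provider label
    instance: str       # hypothetical instance type name
    score: float        # made-up throughput score, higher is better
    hourly_cost: float  # made-up price in dollars per hour


def closest_equivalent(target, candidates):
    """Return the result from a different cloud whose score is nearest to target's."""
    others = [c for c in candidates if c.cloud != target.cloud]
    if not others:
        raise ValueError("no results from another cloud to compare against")
    return min(others, key=lambda c: abs(c.score - target.score))


def cost_per_unit(result):
    """Dollars per hour for each unit of benchmark score."""
    return result.hourly_cost / result.score


# Illustrative numbers only; replace with your own measurements.
RESULTS = [
    BenchResult("cloud_a", "a.medium", 100.0, 0.10),
    BenchResult("cloud_a", "a.large", 210.0, 0.20),
    BenchResult("cloud_b", "b.small", 90.0, 0.08),
    BenchResult("cloud_b", "b.big", 190.0, 0.21),
]

target = RESULTS[1]
match = closest_equivalent(target, RESULTS)
print(f"{target.instance} is closest to {match.instance}")
print(f"cost per unit: {cost_per_unit(target):.6f} vs {cost_per_unit(match):.6f}")
```

Even this toy version needs a fresh set of measurements every time either provider changes an instance family or a price, and it says nothing about where the bottlenecks differ. That ongoing upkeep is the real cost.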

Cloud is not a data center

One of the big mistakes of trying to be “cloud neutral” is that keeping your platforms and applications “generic” forces you to avoid the advanced features of most clouds, which are one of the big advantages of using cloud in the first place.

Forest Brazeal, who is far smarter than I am, wrote a great piece about when you should or shouldn’t go to the cloud. I’m not going to steal his words, but encourage you to read them directly. Forest also quite effectively addresses DHH’s rants against cloud.

However, I have some comments. First, as you’ll see in my slide, I’ve replaced some of the wording on his matrix. Since he’ll be at SCaLE, I hope he’ll be present to comment on them. To me, the x axis on that matrix is “rate of change” in a company, not “rate of growth.” While the two are correlated, they are not the same. I have seen some companies (particularly in entertainment) who have lots of change on an ongoing basis, even when growth is tepid. I’ve seen others (healthcare comes to mind) where there is strong growth, but where change is always slow.

In my opinion, if your environment doesn’t change much over time, the cloud kind of tilts against you. Scaling does not necessarily imply much change, which is why I explicitly note the difference.

So should you cloud?

In general, yes. But there are cases where it doesn’t make sense. Recently I became aware of this blog post by a senior technologist at Ahrefs, who are an SEO company. Almost all the criticism in their comments, and in the forum where we discussed it, focused on what they were giving up by not being in the cloud: the advantages of scaling their infrastructure to their workloads, using different types of storage, etc.

I thought the logic was bad, because their calculations assumed that they would build the application in the cloud exactly as they had on-prem. That would be a horrible choice. To take advantage of AWS, you’d almost certainly architect it differently.

I looked through their tech blog, and then I looked at their website. I tried to understand what they did and why. The blog post did not articulate the reasons for their choice of architecture, but it was possible to understand the environment with some effort. Their cost estimates were wildly wrong (because you would build it differently on AWS), but their decision was still correct.

To support their SEO business they operate the third largest fleet of web crawlers on the planet, after Google and Bing. A key point of my presentation is “you are not Google!” But within their space, they are at least comparable.

Running 3,600 servers with 729k CPU cores, 5 PB of RAM, 33 PB of HDD, and 522 PB of direct-attached SSD all-out, 24×7, is unusual. Per their tech blog, they had to invent their own database just to store and index every website and link they came across. (A problem Google solved with BigTable.) Their core backend is highly optimized and changes slowly. None of this screams “you should move to the cloud and go serverless.” It all argues for the opposite.
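To put those fleet totals in perspective, here is a quick back-of-the-envelope calculation of the per-server averages. It uses only the figures quoted above and assumes decimal units (1 PB = 1,000 TB). These are simple averages; the actual fleet is presumably not uniform.

```python
# Fleet totals as quoted in the text above.
SERVERS = 3600
CPU_CORES = 729_000
RAM_PB = 5
HDD_PB = 33
SSD_PB = 522

TB_PER_PB = 1000  # assumption: decimal units


def per_server(total):
    """Average amount of a resource per server across the whole fleet."""
    return total / SERVERS


cores_per_server = per_server(CPU_CORES)
ram_tb_per_server = per_server(RAM_PB * TB_PER_PB)
hdd_tb_per_server = per_server(HDD_PB * TB_PER_PB)
ssd_tb_per_server = per_server(SSD_PB * TB_PER_PB)

print(f"cores per server:  {cores_per_server:.1f}")
print(f"RAM per server:    {ram_tb_per_server:.2f} TB")
print(f"HDD per server:    {hdd_tb_per_server:.2f} TB")
print(f"SSD per server:    {ssd_tb_per_server:.1f} TB")
```

On those numbers, the average machine has roughly 200 cores, well over a terabyte of RAM, and about 145 TB of local SSD. That is dense, storage-heavy hardware running flat out, which is not a shape that maps neatly onto general-purpose cloud instances.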

Also, unlike DHH, they are not dogmatic. Their customer-facing front end runs on AWS. They use it where it’s appropriate. Which is what you should do even if your environment is far more cloud-friendly than theirs.

Key Takeaway

Ignore the people on the sidelines throwing out “generic” reasons you should use the cloud, and definitely ignore anybody claiming that immediate dollar savings are an expected outcome. Understand your own business. Understand where it does and doesn’t fit. Even mainframes are the right choice in some cases. Understand that for some applications, getting the advantages of the cloud will require a major re-architecture. Then choose wisely.