Anti-Patterns in Tech Cost Management: Bad Cloud Strategy
Series Index
Introduction
Anti-Pattern 1: Not considering scale
Anti-Pattern 2: Bad cloud strategy
Anti-Pattern 3: Inability to assign/attribute costs
Anti-Pattern 4: No metrics or bad metrics
Anti-Pattern 5: Not designing it in
Anti-Pattern 6: Cost management as a standalone
Anti-Pattern 7/8: No ongoing reviews (current and potential)
Anti-Pattern 9: Across the board cuts
Anti-Pattern 10: “A tool will solve the problem!”
Anti-Pattern Bonus: Don’t do rewards programs!
A few thoughts: Three things you can do right now, for yourself and your team
Wrap up
It is likely that you are using a public cloud provider for something, unless your initials are “DHH,” in which case I’ll address your concerns later.
Even businesses that have rationally determined that they should not use the cloud for some of their workloads or storage will generally use it for something. It’s something you must have a strategy for, and a bad one will cost you.
You won’t save money
The cloud is not about saving money! Over time, the additional capabilities of cloud providers may, if used correctly, save you money, but more likely they will allow you to do more or different things, for the same money. A tech director or CTO who justifies a cloud move on the basis of cost savings is making a potentially career-destroying move.
Choice of cloud provider is important. The historical default choice — AWS — may not be the best choice for everybody. It’s big, it’s complex, and it’s initially harder to use than many others. Historically they’ve been less willing to make deals to attract or retain customers. They’ve been less willing to budge on certain charges (network egress, for example) than others. You need to define your requirements and benchmark, then decide.
Multi-cloud is a mistake
One mistake I’ve seen, even among very successful companies, is the decision to avoid a decision. Companies will declare that they are “cloud-neutral,” avoid the use of cloud-specific features that might really help them, and justify this by claiming they can easily add or move to another provider later. This is always wrong.
While newer DevOps tools may make it easier than it once was to move a generic application to another cloud provider, it is never easy or trivial. Last year, at DevOps Vancouver, I saw an excellent presentation in which the speaker walked through the work that was necessary to move a simple application from AWS to GCP. In an enterprise of any size, you can increase that effort by several orders of magnitude. And that is the effort just to move a single workload. Add another order of magnitude in ongoing work to keep the infrastructures on the two clouds aligned and in sync over time. Then repeat for every application.
Many companies carefully benchmark things like compute instances and storage services to understand how to best use them. With two providers, you’ll have to do that on two clouds, benchmark the two clouds against each other, understand where the bottlenecks show up on each (they are guaranteed to be different!) etc. Then you’ll have to create automation that can easily build both in a consistent way. Sound daunting? It is.
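To make the arithmetic concrete, here is a minimal sketch that just counts the benchmarking work. The provider names and the list of components are placeholders I picked for illustration, and the model is deliberately crude: one benchmark per component per cloud, plus one head-to-head comparison per component for each pair of clouds.

```python
from itertools import combinations

def benchmark_tasks(providers, components):
    """Enumerate benchmark work items for a set of cloud providers.

    Returns (per_cloud, cross_cloud):
      per_cloud   - one benchmark for each (provider, component) pair
      cross_cloud - one comparison for each pair of providers, per component
    """
    per_cloud = [(p, c) for p in providers for c in components]
    cross_cloud = [
        (a, b, c)
        for a, b in combinations(providers, 2)
        for c in components
    ]
    return per_cloud, cross_cloud

# Hypothetical component list, for illustration only.
components = ["compute", "block storage", "object storage", "network"]

single_per, single_cross = benchmark_tasks(["aws"], components)
dual_per, dual_cross = benchmark_tasks(["aws", "gcp"], components)

single_total = len(single_per) + len(single_cross)
dual_total = len(dual_per) + len(dual_cross)

print(f"one cloud:  {single_total} benchmark tasks")
print(f"two clouds: {dual_total} benchmark tasks")
```

Even in this toy model, adding a second provider triples the task count rather than doubling it, and that is before you write any of the automation to build both environments consistently.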
Cloud is not a data center
One of the big mistakes of trying to be “cloud neutral” is that keeping your platforms and applications “generic” forces you to avoid the advanced features of most clouds, which are one of the big advantages of using cloud in the first place.
Forrest Brazeal, who is far smarter than I am, wrote a great piece about when you should or shouldn’t go to the cloud. I’m not going to steal his words, but encourage you to read them directly. Forrest also quite effectively addresses DHH’s rants against cloud.
However, I have some comments. First, as you’ll see in my slide, I’ve replaced some of the wording on his matrix. Since he’ll be at SCaLE, I hope he’ll be present to comment on them. [Follow up: We spoke privately and he agreed with me.] To me, the x axis on that matrix is “rate of I.T. change” in a company, not “rate of growth.” While the two are correlated, they are not the same. “Scaling up” does not imply change in the environment. In fact, scaling up automatically is one of the major benefits of the cloud! I have seen companies (particularly in entertainment) who have lots of change on an ongoing basis, even when growth is tepid. I’ve seen others (healthcare comes to mind) where there is strong growth due to demographics, but where more fundamental I.T. change is always slow.
If your environment doesn’t change much over time, the cloud kind of tilts against you.
So should you cloud?
In general, yes. But there are cases where it doesn’t make sense. Recently I became aware of this blog post by a senior technologist at Ahrefs, an SEO company. Almost all the criticism in their comments, and in the forum where we discussed it, focused on what they were giving up by not being in the cloud: the advantages of scaling their infrastructure to their workloads, using different types of storage, etc.
I thought the post’s logic was bad, because its calculations assumed that Ahrefs would build the application in the cloud exactly as they had on-prem. That would be a horrible choice. To take advantage of AWS, you’d almost certainly architect it differently.
I looked through their tech blog, and then I looked at their website. I tried to understand what they did and why. The blog post did not articulate the reasons for their choice of architecture, but it was possible to understand the environment with some effort. Their cost estimates were probably wrong (because you would build it differently on AWS), but their decision was still correct.
An environment of 3600 servers, 729k CPU cores, 5 PB RAM, 33 PB HDD, and 522 PB direct-attached SSD, running all-out 24×7, is unusual. Per their tech blog, they had to invent their own database just to store and index every website and link they came across. (A problem Google solved with Bigtable.) Their core backend is highly optimized and changes slowly. None of this screams “you should move to the cloud and go serverless.” It all argues for the opposite.
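A back-of-the-envelope sketch shows why “running all-out, 24×7” matters so much. Every number below is made up for illustration (these are not Ahrefs’ figures or any provider’s real prices), and the model ignores egress, staffing, discounts, and everything else that makes real comparisons hard. It assumes only that on-prem hardware must be provisioned for peak, while an idealized elastic cloud bills for the capacity you actually use.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average number of hours in a month

@dataclass
class Workload:
    peak_servers: int       # servers needed at peak load
    avg_utilization: float  # average fraction of peak actually needed (0..1)

def on_prem_monthly(w: Workload, cost_per_server_month: float) -> float:
    """On-prem is sized for peak, so utilization does not reduce the bill."""
    return w.peak_servers * cost_per_server_month

def cloud_monthly(w: Workload, cost_per_server_hour: float) -> float:
    """Idealized elastic cloud: pay only for the average capacity in use."""
    return (w.peak_servers * w.avg_utilization
            * cost_per_server_hour * HOURS_PER_MONTH)

def break_even_utilization(cost_per_server_month: float,
                           cost_per_server_hour: float) -> float:
    """Utilization above which on-prem is cheaper in this toy model."""
    return cost_per_server_month / (cost_per_server_hour * HOURS_PER_MONTH)

# Hypothetical prices, chosen only to make the shape of the trade-off visible.
ON_PREM_PER_SERVER_MONTH = 400.0  # amortized hardware, power, space
CLOUD_PER_SERVER_HOUR = 1.00      # comparable on-demand instance

steady = Workload(peak_servers=100, avg_utilization=0.95)  # flat-out, 24x7
bursty = Workload(peak_servers=100, avg_utilization=0.20)  # spiky demand

for name, w in [("steady", steady), ("bursty", bursty)]:
    on_prem = on_prem_monthly(w, ON_PREM_PER_SERVER_MONTH)
    cloud = cloud_monthly(w, CLOUD_PER_SERVER_HOUR)
    print(f"{name}: on-prem ${on_prem:,.0f}/mo, cloud ${cloud:,.0f}/mo")

threshold = break_even_utilization(ON_PREM_PER_SERVER_MONTH,
                                   CLOUD_PER_SERVER_HOUR)
print(f"break-even utilization: {threshold:.0%}")
```

With these invented prices, the bursty workload is far cheaper in the cloud and the steady one is far cheaper on-prem. The point is not the specific numbers but that the answer flips with utilization, which is why a lift-and-shift cost estimate tells you little about a workload that would be architected differently in the cloud.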
Also, unlike DHH, they are not dogmatic. Their customer-facing front end runs on AWS. They use it where it’s appropriate. Which is what you should always do. In extremes that will mean 0% cloud. In others it will mean 100% cloud. Most of us live somewhere in between.
Key Takeaway
Ignore the people on the sidelines throwing out “generic” reasons you should use the cloud, and definitely ignore anybody claiming that immediate dollar savings are an expected outcome. Understand your own business. Understand where it does and doesn’t fit. Even mainframes are the right choice in some cases. Understand that for some applications, getting the advantages of the cloud will require a major re-architecture. Then choose wisely.