September 23, 2021

The Trillion-Dollar Paradoxical Arguments of a16z

A couple of weeks ago, a blog post from Andreessen Horowitz (Motto: Over 3/4 of our partners have blocked Corey on Twitter!) talked about “The Cost of Cloud, a Trillion Dollar Paradox.” Its central thesis is that companies are leaving money on the table by not repatriating workloads to data centers, and a16z proceeded to lay out an economic case for doing just that.

Needless to say, the internet took issue with this. Responses ranged from well-thought-out tweet threads, to incoherent comments left on various forums, to Hacker News remaining Hacker News.

I encourage you to read that post — not much of what I’m about to say next will make sense without it, and I’d like the authors’ points to be taken fairly, as they’ve structured them, instead of “through the lens of my own interpretation.”

Their data is valuable and seems to be correct; I haven’t discovered anything they’ve said on the numbers side that I object to. Rather, my concerns with the report stem from the lack of context for the primary example they cite, the lies that companies tell themselves internally, and the assumption that spend on cloud is fungible with spend on one’s own data centers.

Why Dropbox is a terrible example for building the cloud repatriation case

Dropbox is the poster child for cloud repatriation, in no small part because there are so few companies actually repatriating workloads. In its S-1, Dropbox claimed a $75 million savings in cloud spend by moving some workloads to its own data centers. This is a heady number.

But in the two years surrounding Dropbox’s IPO, it spent $200 million in capital expenditure, which doesn’t factor directly into cost of goods sold (COGS). This is a matter of “which pocket does the money come out of?” I sincerely doubt that there was an actual spend reduction of $75 million over two years when you factor in the amortized cost of the data centers, the servers therein, and the additional staff you need to run those facilities effectively.
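To make the “which pocket” point concrete, here is a rough back-of-the-envelope sketch. The $75 million and $200 million figures come from the paragraphs above; everything else is an assumption made purely for illustration: the five-year straight-line amortization schedule, the attribution of all of that capex to the build-out, and the `annual_staff_cost` parameter are hypothetical, not anything Dropbox has reported.

```python
# Figures quoted in the post, both covering roughly two years.
CLAIMED_SAVINGS = 75_000_000   # cloud spend reduction claimed in the S-1
CAPEX = 200_000_000            # capital expenditure around the IPO


def net_savings(claimed, capex, amortization_years, years, annual_staff_cost=0):
    """Claimed savings minus straight-line amortized capex and extra staff cost."""
    amortized = capex / amortization_years * years
    return claimed - amortized - annual_staff_cost * years


# ASSUMPTION: 5-year straight-line amortization, all capex attributed to the
# repatriation, and no additional staff cost. None of this is from the S-1.
result = net_savings(CLAIMED_SAVINGS, CAPEX, amortization_years=5, years=2)
print(f"{result:,.0f}")  # -5,000,000
```

Under those assumed inputs the savings are already slightly negative before a single extra salary is counted. Different amortization schedules will move the number, which is rather the point: the headline figure depends heavily on what you choose to leave out.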

Further, Dropbox’s migration was the correct move for the company along several axes. It was reportedly a single, very large, very well-understood workload: storing user files. That use case wasn’t scaling in dramatic swings up and down, the capacity growth was easy to predict, and Dropbox engineers certainly understood it start to finish. That use case didn’t lend itself particularly well to S3’s economic model at the time (the decision to migrate this workload predated Infrequent Access tiering as well as a host of analytics offerings).

Lastly, Dropbox was basically out of ideas around product innovation. It started out as “a folder that synced everywhere,” and the industry loved Dropbox for it. Had it not repatriated that workload, its engineering resources would have instead focused on other problems. We may well have been cursed with its bloated, unwanted desktop app and its basically unused collaboration facilities far sooner.

At that point, Dropbox innovation had either stalled or taken some dark turns as viewed externally. Saving money on its “solved problem” workload by running the storage workload elsewhere made sense. That’s not most companies, and that’s not most workloads. Dropbox itself announced it was moving a 34PB data warehouse to AWS last year.

Lies companies tell themselves, pt. 1: Cutting costs is growth, and cloud is purely infrastructure

Engineering skill is the limiting factor today in how much companies can achieve. Developers are hard to find and expensive to hire, and most of the money companies raise from VCs goes toward hiring. Companies that are significantly focused on cutting costs are generally companies that find themselves in decline.

One of the more surprising things that I learned when I started fixing AWS bills for customers was that the need was rarely “cut the bill as deeply as possible” so much as it was “optimize the spend to what makes sense to our environment.” Any suggestions that require significant engineering effort (and it’s hard to think of a heavier lift than migrating to or from a data center!) are generally tabled until other strategic factors are in play.

It’s also worth pointing out that the differentiated services that the major cloud providers offer go well beyond a pile of VMs, some disks, and a network. There are managed database offerings, Machine Learning® powered suites of tools, and other various bits and bobs — but let’s pretend for a minute that those things aren’t true. Should there be an event that impacts the availability of the cloud provider’s network, its storage offerings, or its ability to keep VMs online, it’s going to be better situated to diagnose and resolve the issue in less time than any of its customers would be.

Let’s also state an uncomfortable truth: When your provider takes an outage (and it always, always will, given enough time), there’s some cold comfort in being down at the same time as a good portion of the internet. If AWS or GCP has a bad day, so do a lot of companies. If just your data center is taking an outage on an otherwise slow news day, your reputation for reliability will definitely take a hit.

Lies companies tell themselves, pt. 2: Repatriating will pay off big-time in the stock market

Let’s assume that everything the a16z analysis says about margins, markups, and recapturing value is absolutely true. If you repatriate $100 million a year in cloud spend, you’ll save $50 million a year after the migration is complete. I strongly disagree with the estimate, but let’s pretend that it’s true.

Will the additional $50 million you’ve saved in COGS actually improve the price of your company’s stock? In theory, it absolutely should, but look at the markets that we’re seeing. Amazon’s stock has been largely flat over the past year, during a time when its ability to deliver anything to anyone sent its utility and revenues into the stratosphere.

There are companies whose market capitalizations now exceed most analysts’ consensus of their total addressable markets, and that strikes me as more than a little bizarre. The reason I mention it is that it’s exceedingly unclear to me that a meaningful shift in a company’s margins is guaranteed to have a positive result on its stock price.

The markets seem increasingly divorced from underlying fundamentals, but insofar as the principles should hold true, I can’t shake the feeling that a company would be better served by taking the capex and build-out costs of migrating to a new data center and instead investing that money in releasing offerings and features that expand its revenues. Companies chase growth; when that shifts, they’re not long for this world.
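For a sense of scale, here is a minimal payback sketch built on the hypothetical above ($100 million a year in cloud spend, $50 million a year saved once the migration is done). The one-time migration cost is a made-up number chosen only to illustrate the arithmetic; neither this post nor the a16z post supplies one.

```python
ANNUAL_SAVINGS = 50_000_000  # the post's "let's pretend it's true" figure

# ASSUMPTION: hypothetical one-time cost of the build-out and migration.
HYPOTHETICAL_MIGRATION_COST = 150_000_000


def payback_years(one_time_cost, annual_savings):
    """Years of steady-state savings needed to recover a one-time cost."""
    return one_time_cost / annual_savings


print(payback_years(HYPOTHETICAL_MIGRATION_COST, ANNUAL_SAVINGS))  # 3.0
```

That is three years of savings spent just getting back to zero, which is also three years of that money not funding features that could grow revenue.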

Why your money won’t go as far with cloud repatriation

The a16z post goes on to talk about the margins that cloud providers run on, but it’s hard to get to those numbers without a giant raft of assumptions. Let’s take hard drives as an example.

It’s a pretty clear bet that AWS pays less for hard drives with enterprise performance and reliability characteristics than your company will; volume discounting and well-established supplier relationships make that a near certainty. If AWS slaps that hard drive into a server that’s part of EBS, it’ll present that drive to you in the form of a volume, in any size you’d like. Assuming it’s a gp3 volume in us-east-1, it’s going to cost you 8¢ per GB per month.
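As a quick illustration of what that price means in practice, the sketch below multiplies the 8¢ per GB per month figure quoted above by a volume size. The 500 GB size is an arbitrary example, and the calculation deliberately ignores any charges for extra provisioned IOPS or throughput, snapshots, and data transfer.

```python
GP3_USD_PER_GB_MONTH = 0.08  # gp3 in us-east-1, as quoted in the post


def gp3_monthly_storage_cost(size_gb, price_per_gb=GP3_USD_PER_GB_MONTH):
    """Storage-only monthly cost in USD for a volume of the given size."""
    return round(size_gb * price_per_gb, 2)


# ASSUMPTION: a 500 GB volume, picked only as an example size.
print(gp3_monthly_storage_cost(500))  # 40.0
```

That monthly figure already includes the drive, the server it sits in, the replacement when it fails, and the network in front of it.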

If that hard drive fails and needs to be replaced, you’ll never notice or care. EBS drive failures manifest as very brief latency spikes, to the point that if you’re not looking for it, it’s incredibly easy to miss that it happened.

AWS has also already made the networking investment necessary to ensure that your instance can send data to and from that hard drive at appropriate speeds. This is a non-trivial exercise when doing data center build-outs, but you don’t have to think about it at all in the context of a cloud provider.

These are just a couple of the myriad examples, and all of this speaks to costs that you don’t have to incur yourself.

Hey, it’s not all bad

All things considered, “The Cost of Cloud, a Trillion Dollar Paradox” is a decent blog post, and we’re better as an industry for it being published. Though I disagree mightily with some of the conclusions authors Sarah Wang and Martin Casado reach, their methodology is sound and their data valuable.

The post The Trillion-Dollar Paradoxical Arguments of a16z appeared first on Last Week in AWS.