September 22, 2021


Counting Twitter Followers over Time, the Corey Quinn Way

One of my favorite things to do at the end of the year is dive into a bit of writing code myself. I’m incredibly bad at it, which—while frustrating from an execution perspective—makes posts like this way more entertaining.

My lingua franca is “Python.” This shouldn’t be misconstrued as me saying, “I’m good at Python.” Rather, “I’m less bad at Python than I am at most other languages.”

The toy problem I set out to solve this time was this: How do I track my Twitter follower growth over time? The native Twitter analytics tools don’t do a good job of showing this beyond a three-month window, nor do they offer daily granularity. A bunch of social media analytics companies purport to do this same thing, but they’re all fairly sketchy looking and/or are dripping with Big Enterprise positioning; either way, not really the sort of thing I want to trust with access to my Twitter account.

In its ongoing war against developers on its platform, Twitter has declared something of a truce. Recently, they released some v2 APIs that offer exactly what I want, so in practice what I want to do becomes super straightforward. In effect:

1. Query the Twitter API for the number of followers an arbitrary account has.
2. Take that result and write it to a database.
3. Write this blog post about it.
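The first two steps above fit in a page of Python. This is a minimal sketch, not my actual code: it assumes a Twitter v2 bearer token in the `TWITTER_BEARER_TOKEN` environment variable, a DynamoDB table named `follower-counts` keyed on username and date, and AWS credentials for the write. All of those names are illustrative, not gospel.

```python
import datetime
import json
import os
import urllib.request


def fetch_follower_count(username: str, bearer_token: str) -> int:
    """Query the Twitter v2 users-by-username endpoint for the public follower count."""
    url = (
        f"https://api.twitter.com/2/users/by/username/{username}"
        "?user.fields=public_metrics"
    )
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["data"]["public_metrics"]["followers_count"]


def build_item(username: str, count: int, when: datetime.date) -> dict:
    """Shape a DynamoDB item: username as the partition key, date as the sort key."""
    return {
        "username": {"S": username},
        "date": {"S": when.isoformat()},
        "followers": {"N": str(count)},
    }


def record(username: str) -> None:
    import boto3  # boto3 ships in the Lambda Python runtime

    count = fetch_follower_count(username, os.environ["TWITTER_BEARER_TOKEN"])
    item = build_item(username, count, datetime.date.today())
    boto3.client("dynamodb").put_item(TableName="follower-counts", Item=item)


# record("QuinnyPig")  # requires AWS credentials and a bearer token
```

One row per account per day means the “growth over time” query later is just a sort-key range read, no scans required.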

Now, that’s not much of a challenge; how do we make this more cloud relevant, more complicated, and a better story?

That’s right, we’re going to do this with the magic of…

Serverless

The point of Serverless isn’t to solve real business problems. It’s to write meandering blog posts like this one about how you solved a toy problem with way more work, then highlight how much money you saved in raw infrastructure while forgetting that you spent weeks of your (presumably not inexpensive) time learning the fundamentals of the platform you’re attempting to (mis)use.

This, in time, becomes a job known as “Serverless Advocate”—mostly because “Twitter Shitposter” is usually shot down by your company’s business card printing approval process.

I usually use the Serverless Framework for my Serverless projects, so let’s try something different. A quick “what won’t the Serverless Advocates shut up about this week” Twitter check later shows that the CDK is all the rage. I’ve taken glances at the CDK before and come away more confused than when I started.

Let’s try the CDK

This time I know better. So I studiously avoid AWS’s official documentation, choosing instead to look at a number of blogs, tutorials, and workshops along the way.

My takeaway here is that there’s a fundamental disconnect between the folks who write about using the CDK in the sense of “here’s how you instantiate an S3 bucket with the following attributes” and the folks looking to learn about the CDK in the sense of “yeah, I don’t care about individual resources right now; how does the CDK relate to the actual code I’m writing—in this case to query Twitter?”

The CDK apparently is going to be the Most Important Part of your application, as initializing a CDK project declaratively lays out your entire structure for the application in a folder structure it unilaterally spits out. It takes the time to build out stubs for unit testing, mumbles about where your application code should live, and is suspiciously silent on questions such as: “Okay, now that my infrastructure is built and I’m set with the CDK bits, how do I update the application code and deploy it repeatedly without potentially breaking working infrastructure?”

Having a clean separation between things like “my DynamoDB table” and “the application code” is important because some changes to the former result in “deleting the table (along with all of its historical data) and recreating it,” which for non-trivial use cases seems…less than ideal.

My solution was to give up in frustration and toss the CDK out entirely as “either not baked yet or not for me” and instead delve into the Serverless Application Model and its attendant CLI (sam-cli).

Well that sucked; let’s try the sam-cli

The sam-cli is appealing in some ways. It appears as if AWS took a look at the Serverless Framework, decided not to acquire them like a more pragmatic company might have, and instead tried to implement something similar and vastly more limited themselves. (“It only supports AWS resources!” “We won’t support secure strings in AWS Systems Manager Parameter Store!” “We’ll draw confusing boundaries between whether this is a tool or a mental model!”)

I will say that a quick sam init got me up and running with a straightforward skeleton of a Python application. I then edited the relevant files and came up with a template that spelled out what I needed: a Lambda function, a CloudWatch scheduled event to invoke it, and a DynamoDB table to which it could record data.
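For the curious, a template covering those three resources looks roughly like this. The resource names, handler, and key schema are my hypothetical reconstruction, not what actually shipped:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  FollowerCounter:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.9
      Events:
        Daily:
          Type: Schedule            # a CloudWatch scheduled event
          Properties:
            Schedule: rate(1 day)

  FollowerTable:
    Type: AWS::DynamoDB::Table      # SimpleTable won't do: it can't express a sort key
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: username
          AttributeType: S
        - AttributeName: date
          AttributeType: S
      KeySchema:
        - AttributeName: username
          KeyType: HASH
        - AttributeName: date
          KeyType: RANGE
```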

It took a bit of work to get the IAM permissions right; the default “full control to all DynamoDB tables in this AWS account” was hilariously overbroad whereas getting it scoped to a single table took a disturbing amount of digging and tweaking. This would imply to my mind that “narrowly scoped IAM permissions” aren’t really a thing in the Serverless world, and I dread what that’s going to mean for the future.
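After that digging, the least-bad option I found was SAM’s policy templates, which can at least pin CRUD access to one named table. A sketch, assuming a table resource called `FollowerTable`:

```yaml
  FollowerCounter:
    Type: AWS::Serverless::Function
    Properties:
      # ...handler, runtime, events as before...
      Policies:
        - DynamoDBCrudPolicy:           # SAM policy template, scoped to one table
            TableName: !Ref FollowerTable
```

It’s still broader than a hand-rolled policy (it grants writes alongside reads), but it beats `dynamodb:*` on `Resource: "*"`.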

A series of sam build and sam deploy steps got this working reasonably well. I could even sam local invoke and test my code with a locally generated event.

The missing parts

I wound up doing this in VS Code, as a departure from my beloved Vim. The CDK Explorer was less than helpful, and the SAM application stuff was also a bit hit-or-miss. I use aws-vault to assume roles in various profiles; VS Code ignores that entirely and deploys to my default account, which is less than awesome.

I use AWS Systems Manager Parameter Store to retain things like API keys and various other strings; the SecureString form is apparently not supported in sam-cli (but is in the Serverless Framework).
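One workaround, since the deploy tooling won’t resolve SecureStrings for you: have the function fetch and decrypt them itself at runtime. A sketch, assuming a hypothetical parameter named `/twitter/bearer-token` and an execution role allowed to read it:

```python
def get_secure_param(name: str) -> str:
    """Read a SecureString from Parameter Store, decrypted via its KMS key."""
    import boto3  # boto3 ships in the Lambda Python runtime

    ssm = boto3.client("ssm")
    resp = ssm.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]


# bearer_token = get_secure_param("/twitter/bearer-token")  # needs AWS credentials
```

This also sidesteps baking secrets into the template, at the cost of an SSM call per cold start unless you cache it.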

Neither the CDK nor the SAM tooling, nor the process of getting this thing shipped, ever really delved into how to maintain it going forward: CI/CD, having multiple folks working on it, or recording basics such as “which account is this currently deployed into?” These are bigger questions than these tools are presumably scoped to answer. But when I come back to this thing in six months, I’m not going to remember what I was doing; I’ll get to play Cloud Archeologist to uncover the answer.

It’s also unclear what files should or should not be included in what I put up on GitHub. The samconfig.toml file has my-account-specific entries in it (e.g., which S3 bucket it uses for the deploy), so it’s important to keep this file. But I don’t really want to expose that to the public. Which files should I exclude? It’s patently hazy.
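One plausible answer, though nothing in the tooling says so, is to treat samconfig.toml like any other environment-specific file and keep it out of the repo entirely:

```gitignore
# Build artifacts and account-specific deploy settings
.aws-sam/
samconfig.toml
```

The downside is that six-months-from-now me loses the “which bucket, which region” breadcrumbs along with the account details.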

What’s next

So, the code itself is currently available on GitHub. It’s pretty bad. But that’s okay: It works!

This is the “shove the Twitter follower count into DynamoDB” portion. Until it’s run for a few months, there won’t be much point in building any form of consumption dashboard to show the results (using, say, Grafana, Tableau, or absolutely not Amazon QuickSight). As a result, this is just the data-gathering portion. If anyone has ideas for visualizing the data, I’m all ears!

Relatedly, it’s entirely possible that DynamoDB is the wrong choice for the job. Given the simplicity of the data model, I can dump it into a CSV fairly easily, and then import it into basically anything when the time comes. Maybe this is a time series story? Perhaps an SQLite file stored in S3? Perhaps a “statistics” RDS instance that it speaks to? Route 53 is always an option, but it’s hard doing time series data with it at the moment.
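That escape hatch is genuinely cheap. A sketch of the “dump it into a CSV” exit strategy, assuming the hypothetical username/date/followers layout from earlier and boto3 for the actual scan:

```python
import csv
import io


def items_to_csv(items: list) -> str:
    """Flatten DynamoDB-shaped items into CSV text, sorted by date."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["username", "date", "followers"])
    for item in sorted(items, key=lambda i: i["date"]["S"]):
        writer.writerow(
            [item["username"]["S"], item["date"]["S"], item["followers"]["N"]]
        )
    return buf.getvalue()


def dump_table(table_name: str) -> str:
    import boto3  # boto3 ships in the Lambda Python runtime

    paginator = boto3.client("dynamodb").get_paginator("scan")
    items = [i for page in paginator.paginate(TableName=table_name) for i in page["Items"]]
    return items_to_csv(items)


# print(dump_table("follower-counts"))  # requires AWS credentials
```

From there, SQLite, a time series database, or even a spreadsheet can take over without a migration story.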

Picking a software license is always a challenge. I, of course, selected MongoDB’s SSPL so that AWS couldn’t turn this thing into a managed service offering that tells people how many Twitter followers they have over time.

“That is patently ridiculous” you might think. “There is zero chance of that happening. If anything, they’d build their own thing, so isn’t this a desperate cry for attention and relevance because open source isn’t really a business model?” I suggest that you direct those very valid objections to the Graylog folks.

Until then, help me test whether this thing works by following me on Twitter if you’re not already. Otherwise, feel free to reach out and tell me how terrible my code is and what you would have done differently. Pull requests welcome; pull requests with humorous comments are appreciated.

The post Counting Twitter Followers over Time, the Corey Quinn Way appeared first on Last Week in AWS.