I Built an MMORPG on AWS Lambda. Here's What It Forced Me to Throw Away.

· 6 min · hackerquest.online

Every tutorial on building a multiplayer game starts the same way: pick a persistent server. Node with WebSockets, maybe Colyseus, maybe a dedicated Go or Rust process. Spin up EC2. Put it behind a load balancer. Welcome to the club.

I did not do any of that. HackerQuest is a browser-based hacking MMORPG with PvP, player trading, a dark-web marketplace, 21 NPCs with branching dialogue, and a persistent world. The whole backend runs on 15 Lambda functions, one DynamoDB table, and a Cognito user pool. There is no server sitting idle. When no one is playing, the bill is zero.

This works. It also forced me to redesign a handful of things I assumed a game "just has." Here is what the architecture looks like, and where it stops making sense.

Why Lambda Is Fine for This Genre (and Terrible for Others)

The games people use to argue against serverless are the wrong games. Nobody is suggesting you host Counter-Strike on Lambda. Twitch-reaction PvP needs a persistent process holding state in memory at sub-100ms tick rates. Lambda is not that.

HackerQuest is not that either. It is turn-based at heart. A player types scan, the server runs a probability roll, writes the result to DynamoDB, returns JSON. The next action might come 10 seconds later or 10 minutes later. Between actions, there is nothing to keep warm. The game genre I grew up on -- RuneScape, Torn City, EVE Online's market game loop -- is almost perfectly shaped for pay-per-request compute.

The test I use: can the state of the game be reconstructed from scratch by reading the database, with no in-memory context from the last tick? If yes, Lambda works. If no, you need a process.

Server Authority When the Client Is a Sandbox

A browser game is a JavaScript bundle the player can open the devtools on. If there is a variable called balance and the client trusts it, that variable is already 9,999,999.

The rule in this codebase is blunt: the client renders, the server decides. Every action that changes state goes through a Lambda. The client sends "I want to hack 10.0.41.41" -- the Lambda checks if the target exists, pulls the player's hacking skill, rolls the success probability server-side, writes the outcome, and returns the result. The client animates it. If someone edits the JS to claim they have hacking 999, the Lambda reads their real level from DynamoDB and ignores the claim.
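That flow can be sketched in a few lines. This is a minimal illustration, not the real codebase: the backend language, the function names, and the probability formula are all assumptions; the point is that the skill comes from the stored record and the roll happens server-side.

```python
import random

def success_chance(skill: int, target_difficulty: int) -> float:
    # Illustrative formula: skill vs. difficulty, clamped so nothing is
    # ever a guaranteed hit or a guaranteed miss.
    raw = 0.5 + (skill - target_difficulty) * 0.05
    return max(0.05, min(0.95, raw))

def resolve_hack(player_record: dict, target: dict, rng=random.random) -> dict:
    # The client's claimed skill is never read; only the stored record counts.
    skill = player_record["hacking_skill"]
    p = success_chance(skill, target["difficulty"])
    outcome = "success" if rng() < p else "failure"
    return {"target": target["ip"], "roll_chance": p, "outcome": outcome}
```

The `rng` parameter exists only to make the roll testable; in a real handler the result would be written to the database before the response goes back to the client.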

The wrinkle is offline mode. HackerQuest has a BACKEND_ENABLED flag that lets the game run entirely in localStorage for solo play. That is a different universe. Offline, the client is the authority and the player is cheating themselves, which is fine. The moment the flag flips and a player is on the leaderboard, the server is the only source of truth and the client's copy of state is a cache that exists to make the UI feel fast.

Building both modes from day one was more work, but it meant the game was playable while the backend was still being written. I could ship the frontend to S3 and let people mess with the mechanics before I had a single Lambda deployed.

One DynamoDB Table for the Entire Game

The instinct when you have a player entity, an item entity, a chat message entity, a leaderboard, and a marketplace listing is to make five tables. Don't. You end up paying for provisioned throughput on five tables, writing join logic in application code, and dealing with transactions across them.

HackerQuest uses a single DynamoDB table in PAY_PER_REQUEST mode with three global secondary indexes. The partition key is a generic PK string, the sort key is SK. A player is PK=PLAYER#abc123, SK=PROFILE. That player's inventory items are PK=PLAYER#abc123, SK=ITEM#xyz, which means fetching all their items is one query. A chat message is PK=CHAT#global, SK=TS#1745020301, so fetching the last 50 messages is a range query on SK in reverse order with Limit=50.
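Those key patterns are easy to centralize. A sketch of the key builders and the chat query parameters, assuming Python on the backend; the helper names are mine, and the query dict uses the shape of DynamoDB's low-level Query API:

```python
def player_profile_key(player_id: str) -> dict:
    return {"PK": f"PLAYER#{player_id}", "SK": "PROFILE"}

def inventory_item_key(player_id: str, item_id: str) -> dict:
    # Same partition as the profile, so one Query fetches player + items.
    return {"PK": f"PLAYER#{player_id}", "SK": f"ITEM#{item_id}"}

def latest_chat_query(channel: str, limit: int = 50) -> dict:
    # Parameters for a DynamoDB Query call: one partition, SK descending.
    return {
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"CHAT#{channel}"}},
        "ScanIndexForward": False,  # reverse SK order: newest first
        "Limit": limit,
    }
```

Keeping every key format in one module means the access patterns are documented in code rather than scattered across handlers.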

Leaderboards are where the GSIs earn their keep. GSI1 flips the schema so PK=LEADERBOARD#hacking and SK is a zero-padded skill score. Fetching the top 100 ranked hackers is a single query with no application-side sorting. The marketplace listings use a second GSI keyed on item category. The third handles active PvP bounties.
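The zero-padding is what makes this work: DynamoDB sorts string sort keys lexicographically, so the score has to be padded to a fixed width for string order to match numeric order. A sketch, with attribute names and padding width as assumptions:

```python
def leaderboard_sk(score: int, width: int = 10) -> str:
    # Zero-pad so lexicographic SK order matches numeric score order:
    # "SCORE#0000000100" sorts after "SCORE#0000000099".
    return f"SCORE#{score:0{width}d}"

def leaderboard_keys(skill: str, score: int) -> dict:
    # GSI attributes written onto the player item; names are illustrative.
    return {"GSI1PK": f"LEADERBOARD#{skill}", "GSI1SK": leaderboard_sk(score)}
```

Without the padding, "SCORE#100" would sort before "SCORE#99" and the leaderboard query would return garbage.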

Total monthly spend on storage plus request charges at the current player count: a dollar or two. If the game gets 10x bigger, the bill goes up roughly 10x, and I do not have to rewrite anything.

Energy Regen Without a Cron Job

Every RPG has a stamina bar that refills over time. The obvious way to build it is a scheduled job that runs every minute and adds a point to everyone's energy. Do not do this. It scales badly, it wakes up a Lambda for players who are not online, and it is going to drift.

Lazy regen is the move. The player record stores energy and energy_updated_at. When a request comes in, the Lambda computes elapsed = now - energy_updated_at, adds floor(elapsed / 180) points (capped at max), and writes the result with the new timestamp. A player who logs back in 6 hours later gets their full refill on the next request. A player who never logs in costs zero.

The math lives in one function in one file. No cron, no queue, no drift. Scheduled jobs still exist in the stack -- mining payouts every 15 minutes, botnet decay every hour, leaderboard snapshots every hour -- but those are actual scheduled mutations that cannot be computed lazily. Anything that can be derived from a timestamp should be.
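A minimal sketch of that lazy-regen function, assuming Python and Unix-second timestamps; the one subtlety worth showing is that the stored timestamp should only advance by whole regen intervals, so fractional progress toward the next point is not thrown away:

```python
REGEN_SECONDS = 180  # one energy point every 3 minutes, per the article

def apply_lazy_regen(energy: int, energy_updated_at: int,
                     now: int, max_energy: int) -> tuple[int, int]:
    """Compute current energy purely from elapsed time; no cron involved."""
    elapsed = now - energy_updated_at
    gained = elapsed // REGEN_SECONDS
    new_energy = min(max_energy, energy + gained)
    # Advance only by the whole intervals consumed, so 170 leftover seconds
    # still count toward the next point on the following request.
    new_ts = energy_updated_at + gained * REGEN_SECONDS
    return new_energy, new_ts
```

The Lambda calls this on every authenticated request and writes both values back atomically with whatever else the action changes.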

The Cognito Trigger That Cost Me an Afternoon

New users sign up through Cognito. When they confirm their email, I want a Lambda to fire and create their initial DynamoDB player record. Cognito calls this a post-confirmation trigger. The SAM template should wire it up.

It did not. The first time I tried to deploy the full stack with the trigger defined in the same template as the user pool, SAM complained about a circular dependency. The Lambda needs permission to write to DynamoDB, the user pool needs the Lambda ARN as a trigger, and somewhere in CloudFormation's dependency graph, those two chase their own tail.

The fix is not elegant but it works: deploy the Lambda and the user pool separately in SAM, then attach the trigger in a second step with a one-line AWS CLI call.

aws cognito-idp update-user-pool \
  --user-pool-id us-east-1_GgMLLnvXB \
  --lambda-config PostConfirmation=arn:aws:lambda:us-east-1:692859945539:function:hq-auth-trigger

The Lambda had the cognito-idp.amazonaws.com invoke permission from the original deploy, so it just worked. I wrapped the command in the deploy script with a guard that only runs it when the trigger is not already attached. If I had done this in the template from the start, I would still be fighting CloudFormation.
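That guard can be sketched as a deploy-script fragment. This is an illustration, not the author's actual script; it assumes bash and uses the standard describe-user-pool call with a JMESPath query to check whether the trigger is already attached:

```shell
#!/usr/bin/env bash
set -euo pipefail

POOL_ID="us-east-1_GgMLLnvXB"
FN_ARN="arn:aws:lambda:us-east-1:692859945539:function:hq-auth-trigger"

# Read the currently attached post-confirmation trigger, if any.
current=$(aws cognito-idp describe-user-pool \
  --user-pool-id "$POOL_ID" \
  --query 'UserPool.LambdaConfig.PostConfirmation' \
  --output text)

# Only attach when it is missing or pointing somewhere else.
if [ "$current" != "$FN_ARN" ]; then
  aws cognito-idp update-user-pool \
    --user-pool-id "$POOL_ID" \
    --lambda-config PostConfirmation="$FN_ARN"
fi
```

Making the step idempotent matters because the deploy script runs on every push, not just the first one.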

When This Architecture Stops Working

Be honest about the limits. I would not build any of the following on Lambda: a twitch-reaction shooter that needs a persistent process ticking at sub-100ms rates, a physics simulation that has to keep running between player inputs, or any game whose world state lives in memory and cannot be rebuilt from the database between requests.

HackerQuest avoids all of these by being a terminal game. The world is a set of database rows. Actions are discrete. Nobody cares if there is 200ms of latency before their hack command resolves -- it reads more cinematic when there is.

The whole backend -- 15 Lambdas, one table, one user pool, three scheduled jobs -- fits in a single SAM template. It deploys in under three minutes. I can tear the entire stack down and redeploy it from scratch while making coffee. For the kind of game this is, that is worth more than any persistent-server feature I gave up.