GOOSEMAN(1) GOOSEMAN(1)

NAME

gnome-trader

PROBLEM STATEMENT

The RuneScape gold market runs on human latency. Buyer and seller meet in-game, do the trade by hand, and settle payment off-platform. Gnome Trader closes that loop in software: payment in SOL, delivery by an always-on swarm of bots.

fig. 1 · gnome-trader · end-to-end purchase and delivery

A purchase from the shop end of the system: the customer picks a seller and an amount, pays from their Phantom wallet, and a bot in the seller's swarm delivers the gold in-game within seconds of on-chain confirmation. The same flow would normally take a person twenty minutes to a day, depending on when the seller wakes up, and fails routinely when the payment is reversed, the seller cancels, or the buyer never shows up.

ARCHITECTURE

Four backend services sit around a Redis state hub and a Postgres durable store. A fleet of Java bots connects over WebSocket, and a React/TypeScript client is served by the main API.

   ┌──────────────┐                         ┌──────────────┐
   │   Browser    │                         │   Main API   │
   │  (React UI)  │───────── http ─────────▶│   (FastAPI)  │
   └──────────────┘                         └──────────────┘

   ┌──────────────┐                         ┌──────────────┐
   │   Browser    │                         │  Client WS   │
   │  (React UI)  │◀──────── ws push ───────│              │
   └──────────────┘                         └──────────────┘

   ┌──────────────┐                         ┌──────────────┐
   │  Java Bots   │                         │    Bot WS    │
   │  (In-Game)   │◀────── websocket ──────▶│              │
   └──────────────┘                         └──────────────┘

   ┌──────────────┐                         ┌──────────────┐
   │   Helius     │                         │   Orch API   │
   │  (Webhooks)  │──────── webhook ───────▶│              │
   └──────────────┘                         └──────────────┘

   ┌─ shared substrate ──────────────────────────────────────┐
   │  Redis     · in-flight state · jobs · claims · queues   │
   │  Postgres  · durable state · users · orders · payments  │
   └─────────────────────────────────────────────────────────┘
fig. 2 · system architecture

Main API is the user-facing FastAPI. It handles Solana-signature authentication, orders, seller and swarm configuration, and admin operations. It owns the writes to Postgres and creates the initial gold reservations in Redis.

Orch API is the coordinator. It receives Helius webhooks confirming on-chain Solana payments, creates the Redis job keys that the delivery side reads, and runs the cron jobs that pay sellers out, refresh the SOL price, and drive a three-stage graceful shutdown. A background loop reaps orphaned claims when bots crash.

Bot WS is the real-time hub for the Java bots. Each bot opens a single WebSocket on startup and that connection carries everything in both directions. The bot heartbeats and polls every five seconds, sends a fresh inventory whenever its contents change, reports successful and failed deliveries, and emits grid positions and terminal log lines that show up live in the customer's browser as the trade unfolds. The server pushes back the bot's swarm configuration on connect, its directive on every poll (active, paused, stopped, or shutdown), and any job the bot should pick up. Jobs are claimed atomically through Lua scripts in Redis, with a twenty-second TTL so a crashed bot releases its work automatically and another bot can take it over.

Client WS is a thin WebSocket gateway that forwards messages from a Redis stream out to the customer's browser, so the customer sees the bot moving on a grid and a live terminal log as the delivery happens.

The substrate band at the bottom of the diagram is shared by every service. Redis is the source of truth for in-flight state: which jobs exist, which bots are alive, who is currently claiming what. Postgres is the durable store for users, orders, payments, payouts, and seller stats. Every backend service in the rows above reads from and writes to both.

LIFECYCLE

An order moves through four phases: purchase, payment confirmation, delivery, and payout. The first two and the last are mostly one-step transitions driven by user input, webhooks, and cron. The interesting phase is delivery, where multiple actors run concurrently and the system has to handle crashes, lost messages, and multi-bot coordination.

Purchase. The customer picks a seller and an amount on the shop. Main API reserves the gold in Redis with a TTL, creates Order and Payment records in Postgres, and returns a Solana payment instruction. The customer's Phantom wallet prompts them to sign.

Payment confirmation. Once the transaction is on-chain, Helius posts a webhook to Orch API. Orch verifies the amount against the Payment record (within a tolerance window, idempotent on the on-chain signature), advances the Order to PENDING_DELIVERY, and creates the Redis job keys that the delivery side reads.

Delivery. From here a bot in the seller's swarm picks up the work. The diagrams below come from the bot-ws-v2 docs and show the protocol as it actually runs.

  BOT                           SERVER                          REDIS
   │                               │                               │
   │  ── poll ──────────────────►  │                               │
   │                               │  ── find_and_claim_job ────►  │
   │                               │  ◄─ claim created (TTL=20s) ─ │
   │  ◄─ state {job: {...}} ─────  │                               │
   │                               │                               │
   │  [Trade gold in-game]         │                               │
   │                               │                               │
   │  ── items_update {Coins} ──►  │  ── HSET bot:items ────────►  │
   │  ◄─ ack ────────────────────  │                               │
   │                               │                               │
   │  ── delivery_complete ─────►  │  ── Lua: verify claim ─────►  │
   │                               │  ── HINCRBY job:delivered ─►  │
   │                               │  ── DEL job:claimed ───────►  │
   │  ◄─ ack {complete: true} ───  │                               │
   │                               │                               │
fig. 3 · successful delivery

When a bot crashes mid-delivery, the system recovers itself through Redis TTLs. The crashed bot's claim expires after twenty seconds and the work returns to the queue for another bot to take.

  BOT A                         SERVER                          REDIS
   │                               │                               │
   │  ── poll ──────────────────►  │                               │
   │  ◄─ state {job: 5000g} ─────  │  job:claimed = Bot A          │
   │                               │                               │
   │  [CRASH]                      │                               │
   │     ✗                         │                               │
   │                               │         [20s passes]          │
   │                               │  ◄─ job:claimed expired ────  │
   │                               │                               │
  BOT B                            │                               │
   │  ── poll ──────────────────►  │                               │
   │                               │  ── find_and_claim_job ────►  │
   │                               │     available = 5000 - 0      │
   │                               │  ── create job:claimed ────►  │
   │  ◄─ state {job: 5000g} ─────  │     (Bot B now owns claim)    │
   │                               │                               │
   │  [Bot B continues delivery]   │                               │
   │                               │                               │
fig. 4 · self-healing on bot crash

When a single order is too large for one bot's inventory, the order is split across multiple bots. Each bot claims as much as it can carry, and the next poll from another bot picks up the remainder.

  BOT A (3000g)                 SERVER                          REDIS
   │                               │                               │
   │  ── poll ──────────────────►  │  job:items = 5000             │
   │                               │  job:delivered = 0            │
   │                               │  available = 5000             │
   │                               │  claim = min(3000, 5000)      │
   │  ◄─ state {job: 3000g} ─────  │  job:claimed = {Bot A, 3000}  │
   │                               │                               │
   │  [Deliver 3000g]              │                               │
   │  ── delivery_complete ─────►  │  job:delivered = 3000         │
   │  ◄─ ack {complete: false} ──  │  (partial, 2000 remaining)    │
   │                               │                               │
  BOT B (4000g)                    │                               │
   │  ── poll ──────────────────►  │  available = 5000 - 3000      │
   │                               │  claim = min(4000, 2000)      │
   │  ◄─ state {job: 2000g} ─────  │  job:claimed = {Bot B, 2000}  │
   │                               │                               │
   │  [Deliver 2000g]              │                               │
   │  ── delivery_complete ─────►  │  job:delivered = 5000         │
   │  ◄─ ack {complete: true} ───  │  ORDER COMPLETE → cleanup     │
   │                               │                               │
fig. 5 · partial delivery, multi-bot

Payout. When all the gold is delivered, the Order is marked COMPLETE and a payout:{order_id} key is set in Redis. A cron in Orch API picks it up once per minute, sends ninety-five percent of the SOL to the seller's wallet, and records the result in the Payout table. After three failed attempts the payout is marked FAILED.
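
The payout arithmetic and retry rule can be sketched as follows. This is a minimal illustration, not the Orch API code: it assumes amounts are tracked in integer lamports and that the seller's share is rounded down. The names seller_share, Payout, and record_attempt, and the PENDING and PAID status strings, are hypothetical. Only the ninety-five percent share, the three-attempt limit, and the FAILED status come from the text above.

```python
from dataclasses import dataclass

LAMPORTS_PER_SOL = 1_000_000_000  # fixed by Solana
SELLER_SHARE_PERCENT = 95         # from the text: 95% goes to the seller
MAX_ATTEMPTS = 3                  # from the text: three failures => FAILED


def seller_share(paid_lamports: int) -> int:
    """Seller's cut in lamports, rounded down (the rounding rule is an assumption)."""
    return paid_lamports * SELLER_SHARE_PERCENT // 100


@dataclass
class Payout:
    order_id: str
    lamports: int
    attempts: int = 0
    status: str = "PENDING"  # hypothetical status name


def record_attempt(payout: Payout, sent_ok: bool) -> Payout:
    """Apply the outcome of one cron-driven send attempt to the payout record."""
    if payout.status != "PENDING":
        return payout  # already settled, nothing to do
    payout.attempts += 1
    if sent_ok:
        payout.status = "PAID"  # hypothetical status name
    elif payout.attempts >= MAX_ATTEMPTS:
        payout.status = "FAILED"
    return payout
```

Integer lamports avoid floating-point drift on the seller's share; the remaining five percent is whatever is left after the floor division.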

EVOLUTION

Gnome Trader went through four versions over two years. Each one was a full rebuild rather than a refactor, because each version taught me enough about what did not work that starting over was easier than retrofitting.

v1 (Jan 2024). Single bot, simple web API, long polling. Built to prove that a Solana payment could trigger an automated in-game delivery without anyone in the loop. It made the first sale and the concept was viable.

fig. 6 · v1 demo

v2. The first multi-tenant version. Other sellers could install my bot through the RuneMate client and connect their own swarm without exposing their RuneScape credentials. Buyers got a real UI with a status terminal, an order history, and transaction details.

fig. 7 · v2 demo
fig. 8 · v2 interface

v3. A full rebuild around Redis and WebSockets, and the version that introduced per-seller shop pages so each seller had their own storefront. Long polling could not keep up with the order volume the marketplace was generating. Most of the system became event-driven, but delivery processing still ran through a Celery worker. To get atomic payment handling out of a non-atomic system I built a bloated workaround that worked but only processed one order at a time. When I finished v3 I immediately knew it needed to go in the bin.

fig. 9 · v3 interface

What changed between v3 and v4 was a realisation more than a feature. Halfway through v3 I knew I was in voodoo territory. There had to be a simpler way, because every other system that handles concurrent state must have the same problem. That led me to read about Redis atomics and to the discovery that Redis is far more than key-value storage. Atomic Lua scripts and TTL-driven state turned the workarounds into a few lines of correct, race-free code.

v4 (Jan 2026). The version this page documents. Heavier reliance on Redis still, with more state moved into TTL keys and the cleanup logic that had grown around v3 simply gone. Redis streams replaced Google Pub/Sub for in-process event handling. The order, payment, and delivery flows were refactored around atomic Lua scripts. The codebase became dramatically leaner without losing any functionality. It was striking how often a smarter design turned out to be the smaller one.

ENGINEERING DECISIONS

The crux of Gnome Trader was the bot WebSocket. v4 went through about five refactors of the bot WS alone, looking for the right compromise between push and pull, and the right split of authority between the bot and the server. Most of the decisions below are downstream of those refactors.

WebSocket transport with bot-driven polling. The final design uses a long-lived WebSocket but the bot polls over it every five seconds, and the server reads from Redis to reply. Plain HTTP long-polling was tried and could not deliver the snappy responsiveness needed under load. A fully server-pushed WebSocket meant the server had to manage every bot's state machine, which broke down the moment the connection died, and WebSockets die all the time. Polling over a warm WebSocket combines the millisecond delivery latency of a socket with the simpler state model of a pull-based client. The server stays stateless between polls.

Atomic claim creation via Lua. Multiple bots polling simultaneously can race for the same job. Lua scripts run atomically inside Redis, so the claim operation (read available, write claim, return result) is one indivisible step. No double-claims, no missed claims, no application-level locks.
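
To make the "one indivisible step" concrete, here is a small in-memory model of the claim operation. It is a sketch under stated assumptions, not the production Lua: a threading.Lock stands in for the atomicity Redis gives a Lua script, plain dicts stand in for the job keys, and the model allows one claim per job at a time, which is how I read the diagrams. JobBoard, add_job, and race are hypothetical names; find_and_claim_job is borrowed from the diagrams.

```python
import threading
from typing import Optional


class JobBoard:
    """In-memory stand-in for the Redis keys touched by the claim step."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of Lua atomicity
        self.total = {}      # job_id -> gold ordered
        self.delivered = {}  # job_id -> gold delivered so far
        self.claimed = {}    # job_id -> (bot_id, amount)

    def add_job(self, job_id: str, amount: int) -> None:
        with self._lock:
            self.total[job_id] = amount
            self.delivered[job_id] = 0

    def find_and_claim_job(self, bot_id: str, inventory: int) -> Optional[tuple]:
        """Read available, write claim, return result, all under one lock."""
        with self._lock:
            for job_id, total in self.total.items():
                if job_id in self.claimed:
                    continue  # another bot already holds this job
                available = total - self.delivered[job_id]
                amount = min(inventory, available)
                if amount > 0:
                    self.claimed[job_id] = (bot_id, amount)
                    return job_id, amount
            return None


def race(board: JobBoard, bots: int, inventory: int) -> list:
    """Let several bots poll at once and collect the claims that succeeded."""
    results = []
    results_lock = threading.Lock()

    def poll(bot_id: str) -> None:
        claim = board.find_and_claim_job(bot_id, inventory)
        if claim is not None:
            with results_lock:
                results.append((bot_id, *claim))

    threads = [threading.Thread(target=poll, args=(f"bot-{i}",)) for i in range(bots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With one 5000g job on the board and eight bots racing, exactly one claim comes back. Splitting the read and the write into separate locked sections would reopen the double-claim window that the single script closes.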

Self-healing through TTLs. Every piece of bot state has a TTL. Heartbeat, session, and items expire after ten seconds. Claims expire after twenty. A bot that crashes or disconnects simply lets its state expire, and the work falls back into the pool for another bot. There is no "is this bot still alive?" check, no dead-letter queue, no explicit cleanup path. Failure becomes a no-op.

  BOT                           SERVER                          REDIS
   │                               │                               │
   │  [DISCONNECT]                 │                               │
   │                               │                               │
   │                               │         [10s passes]          │
   │                               │  ◄─ bot:items expired ──────  │
   │                               │  ◄─ bot:session expired ────  │
   │                               │                               │
   │                               │         [20s passes]          │
   │                               │  ◄─ job:claimed expired ────  │
   │                               │     (claim auto-released)     │
   │                               │                               │
   │  [RECONNECT]                  │                               │
   │  ── connect ───────────────►  │  ── create session ────────►  │
   │  ◄─ init_ok ────────────────  │                               │
   │                               │                               │
   │  ── items_update {Coins} ──►  │  ── HSET bot:items ────────►  │
   │  ◄─ ack ────────────────────  │     (inventory restored)      │
   │                               │                               │
   │  ── poll ──────────────────►  │  ── find_and_claim_job ────►  │
   │                               │     (can claim again)         │
   │                               │                               │
fig. 10 · reconnect after disconnect
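
The expiry behaviour in the diagram above can be modelled with a tiny TTL store and an injectable clock. This is a sketch, assuming lazy expiry on read and a claim that is written once and not refreshed. TTLStore, FakeClock, and set_if_absent are hypothetical names, and the key suffixes in the usage note are made up. The ten- and twenty-second TTLs come from the text.

```python
import time
from typing import Callable, Optional

HEARTBEAT_TTL = 10  # seconds: heartbeat, session, and items (from the text)
CLAIM_TTL = 20      # seconds: job claims (from the text)


class FakeClock:
    """Manually advanced clock so expiry can be exercised without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TTLStore:
    """Key store where every key carries an expiry, in the spirit of Redis SET with EX."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

    def set_if_absent(self, key: str, value: str, ttl: float) -> bool:
        """Create the key only if no live value exists, like SET with NX and EX."""
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True
```

If bot A writes a session key and claims a job at time zero and then goes silent, the session key is gone after ten seconds and the claim after twenty. At that point set_if_absent on the same claim key succeeds for bot B. No code ran on behalf of the dead bot at any point.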

Idempotency by state, not by message ID. When a delivery_complete ACK is lost and the bot retries, the server has already released the claim, so the duplicate returns not_your_claim. The bot reads that as success because it knows the in-game trade happened. This works because of an authority split: the bot owns in-game truth (it executed the trade), the server owns the order ledger. They never overrule each other. No msg_id dedup table, no replay buffers, no idempotency keys.

  BOT                           SERVER                          REDIS
   │                               │                               │
   │  ── delivery_complete ─────►  │  ── verify & release claim ►  │
   │  ◄─ ack ─────────────── ✗     │  (claim deleted)              │
   │     (lost)                    │                               │
   │                               │                               │
   │  [wait 5 seconds]             │                               │
   │                               │                               │
   │  ── delivery_complete ─────►  │  ── verify claim ──────────►  │
   │                               │  ◄─ claim doesn't exist ────  │
   │  ◄─ error: not_your_claim ──  │                               │
   │                               │                               │
   │  [Bot knows trade succeeded   │                               │
   │   in-game, interprets as OK]  │                               │
   │                               │                               │
   │  ── poll ──────────────────►  │  (fresh state, job gone)      │
   │                               │                               │
fig. 11 · retry after lost ACK
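
The retry path above can be sketched as two functions, one per side of the authority split. This is an illustration under assumptions, not the real handlers: claims and delivered are plain dicts standing in for Redis, the reply shapes are guesses modelled on the diagram, and complete_delivery, bot_report, and the max_tries parameter are hypothetical.

```python
from typing import Callable, Optional


def complete_delivery(claims: dict, delivered: dict, job_id: str, bot_id: str) -> dict:
    """Server side: verify the claim, credit the delivery, release the claim."""
    claim = claims.get(job_id)
    if claim is None or claim[0] != bot_id:
        # No live claim for this bot: either a replay or someone else's job.
        return {"type": "error", "code": "not_your_claim"}
    _, amount = claim
    delivered[job_id] = delivered.get(job_id, 0) + amount
    del claims[job_id]
    return {"type": "ack"}


def bot_report(
    send: Callable[[str, str], Optional[dict]],
    job_id: str,
    bot_id: str,
    traded_in_game: bool,
    max_tries: int = 3,
) -> bool:
    """Bot side: resend until a reply arrives; the bot decides what the reply means."""
    for _ in range(max_tries):
        reply = send(job_id, bot_id)
        if reply is None:
            continue  # reply lost in transit, retry on the next tick
        if reply["type"] == "ack":
            return True
        if reply.get("code") == "not_your_claim":
            # The ledger already moved on. The bot owns in-game truth,
            # so this counts as success only if the trade really happened.
            return traded_in_game
        return False
    return False
```

The duplicate never credits the order twice, because the second call finds no claim to verify. The server needs no memory of past messages for that to hold.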

Partial deliveries via min(inventory, remaining). A single order can be larger than one bot's coin inventory. Instead of failing, the claim is the smaller of what this bot can carry and what the order still needs. Big orders naturally span multiple bots without coordination.
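
The claim rule is small enough to write out. This sketch replays successive polls, assuming each bot finishes its delivery before the next one polls, as in fig. 5. The function names claim_amount and split_order are hypothetical.

```python
def claim_amount(inventory: int, total: int, delivered: int) -> int:
    """Gold this bot should claim: the smaller of what it carries and what is left."""
    remaining = max(total - delivered, 0)
    return min(inventory, remaining)


def split_order(total: int, inventories: list) -> list:
    """Return the claim each bot would receive, in polling order."""
    delivered = 0
    claims = []
    for inventory in inventories:
        amount = claim_amount(inventory, total, delivered)
        claims.append(amount)
        delivered += amount
    return claims


print(split_order(5000, [3000, 4000]))  # [3000, 2000], the numbers from fig. 5
```

A bot that polls after the order is fully covered gets a claim of zero, so no bot needs to know how many others are working the same order.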

Webhook idempotency keyed on Solana signature. Helius can post the same transaction more than once. Orch stores the on-chain signature on the Payment record and ignores duplicates. Amount matching uses a tolerance window (the larger of one percent of expected or 500,000 lamports, capped at 10 million) to absorb fee rounding.
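
The tolerance rule reads as a clamp: one percent of the expected amount, floored at 500,000 lamports and capped at 10 million. Below is a minimal sketch of that rule and the duplicate check. It assumes the tolerance applies symmetrically to over- and underpayment, and it uses an in-memory set where Orch stores the signature on the Payment record. The function names and return strings are hypothetical.

```python
MIN_TOLERANCE = 500_000     # lamports, from the text
MAX_TOLERANCE = 10_000_000  # lamports, from the text


def tolerance_lamports(expected: int) -> int:
    """Larger of 1% of expected or the floor, then capped."""
    return min(max(expected // 100, MIN_TOLERANCE), MAX_TOLERANCE)


def amount_matches(expected: int, received: int) -> bool:
    """True if the received amount is within the window (symmetry is an assumption)."""
    return abs(received - expected) <= tolerance_lamports(expected)


def handle_webhook(seen_signatures: set, signature: str, expected: int, received: int) -> str:
    """Process one webhook delivery; repeated signatures are ignored."""
    if signature in seen_signatures:
        return "duplicate"
    if not amount_matches(expected, received):
        return "amount_mismatch"
    seen_signatures.add(signature)
    return "confirmed"
```

For an expected payment of 0.1 SOL (100,000,000 lamports) the window is 1,000,000 lamports. For 0.01 SOL the floor applies and the window is 500,000. For 2 SOL the cap applies and the window is 10,000,000.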

Background reaper for orphaned claims. The TTL system handles most failures. The reaper handles what slips through: a loop in Orch scans Redis every couple of seconds, marks orders FAILED when their job keys expire mid-delivery, and restores any claimed items back to the job before deleting the orphan claim.

STACK

backend
FastAPI (Python), asyncpg, async workflows
frontend
React 19, TypeScript, Vite, WebTUI CSS kit (terminal-styled UI)
data
PostgreSQL via SQLModel, Redis (atomic Lua, streams, TTL keys)
real-time
WebSockets
blockchain
Solana (solders / solana-py on server, @solana/web3.js on client), Phantom wallet, Helius transaction webhooks
bots
Java, RuneMate botting framework
infra
Google Cloud Run, Cloud Build, Pulumi, Google Secret Manager

POSTMORTEM

Summary. Gnome Trader was shut down in 2026, not because the software stopped working, but because the platform it depended on became unstable enough that the system's reliability requirements could no longer be met.

The dependency chain. Every layer of Gnome Trader ultimately depended on the RuneScape game client, which is controlled by Jagex. The bots ran inside the RuneMate botting client, which is itself an arms-race opponent of Jagex. When the dependency chain looks like that, a hostile end state at the top cascades all the way down. There is no version of "build it tighter" that fixes a hostile platform.

Routine update churn. Even ignoring anti-bot measures, the weekly RuneScape update cycle was already a structural problem. Any breaking change in the game forced RuneMate to ship a fix, and during the gap between change and fix every bot in the swarm could behave unpredictably. The platform had a safe-shutdown system that stopped accepting new orders and drained the in-flight ones, but activating it depended on a human reading the RuneMate team's stability reports. The right next step would have been to watch the RuneLite git repository for breaking changes and trigger the shutdown automatically. I never got there.

Jagex's anti-botting breakthrough. Through late 2025 and into 2026, Jagex appears to have finally developed an effective anti-botting layer after twenty years of trying. The public ban statistics tell the story: real-world-trading bans went from around 4,000 in December to 44,000 in April (see aggrgtr.com/player-support). That is an 11x increase in a few months. The botting ecosystem absorbed an enormous amount of damage, and the cadence of game-side updates accelerated in parallel.

Why the blast radius mattered. Botting in general can survive this kind of instability because the cost of failure is bounded: a banned account, lost XP, a wasted afternoon. Gnome Trader's blast radius was different. A failure mode could mean a customer paid SOL and lost it, or a seller delivered gold and never got paid. The system was designed around hard guarantees of correctness exactly because of this asymmetry: idempotent payments, atomic claims, automatic refunds. But guarantees in software can only protect against software failures. They cannot protect against an upstream platform being adversarial.

What I would do differently. The technical decision I would not change is the v4 architecture: the WebSocket bus, the atomic claim model, the TTL self-healing. Those held up under the entire operational window. The strategic decision I would change is the dependency. Building a platform on top of an adversarial third-party game was the original sin, and no amount of engineering rigour at lower layers can compensate for it.

Future. The WebSocket and orchestration core is general. It is not bound to RuneScape, RuneMate, or Solana in any meaningful way. The same shape would work for any system where a fleet of clients claims work atomically and reports completion in real time. I will revisit Gnome Trader if the OSRS botting ecosystem stabilises, or if a similar problem appears with a more cooperative platform underneath.

SCREENSHOTS

A few views of the v4 interface, terminal-styled via the WebTUI CSS kit.

Gnome Trader landing page with live sales feed
fig. 12 · landing page
Admin orders view with payment, payout, and order log per order
fig. 13 · orders dashboard
Swarm configuration list with directives, statuses, and quantities
fig. 14 · swarm list
Edit swarm dialog with name, price, min/max, level cap, and location
fig. 15 · edit swarm
Edit location dialog with world, name, and trade area coordinates
fig. 16 · edit location

SEE ALSO