0058 — Monetization, admin authorization, and webhook security (Group M)

Status

Accepted. Supersedes ADR-0027 on the admin authorization model.

Context

Five audit findings cluster around the auth/billing/admin perimeter. R-V3 (free-tier visibility of unrevealed Nexus warp) is closed under Group J's worldgen pipeline cleanup; the four remaining findings sit on four orthogonal surfaces:

  • Webhook security (A-D3): PayPal webhook signature validation had an env-var bypass and a log-only failure path. Either of those, alone, lets a forged webhook commit subscription mutations.
  • Admin authorization (A-F2): the flat User.is_admin boolean was the only gate. A compromised admin account had unbounded blast radius — could create free Region Owner subscriptions, replay arbitrary webhooks, terminate regions.
  • Subscription consistency (A-F1): subscription-tier upgrades during an active ARIA dialogue produced inconsistent multiplier values mid-exchange.
  • Multi-regional takeover races (A-F3): concurrent takeover claims for the same region had no documented serialization; losing claims could leave escrow held without a corresponding ownership state.

Decision

A-D3 — PayPal webhook signature validation: mandatory + binding

The webhook handler validates every incoming PayPal webhook in production. Validation has three checks; all must pass:

  1. Signature verification against PayPal's webhook key (PSP-side rotation handled per PayPal's standard procedure).
  2. Timestamp window: the webhook's event_time must be within 5 minutes of server now(). Older or far-future timestamps are rejected, which bounds the replay-attack window to 5 minutes; the idempotency check below covers replays inside that window.
  3. Idempotency: the webhook's event_id is checked against a processed_webhook_events table (UNIQUE INDEX (event_id)). If the event has been processed before, the handler returns HTTP 200 (PayPal expects success on duplicate delivery) but does not re-apply the mutation.

Validation failure → HTTP 401. Not log-and-allow. The processed_webhook_events row is inserted inside the same transaction as the subscription mutation, so a successful insert means the mutation also committed.
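The timestamp and idempotency checks above can be sketched as follows. This is a minimal illustration, not the gameserver's handler: it uses sqlite3 as a stand-in database, takes the signature result as a precomputed boolean (PayPal's verification call is out of scope here), and the names handle_webhook, init_schema, and apply_mutation are hypothetical. It assumes event_time is a timezone-aware ISO 8601 string.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Callable

MAX_SKEW = timedelta(minutes=5)  # timestamp window from check 2


def init_schema(conn: sqlite3.Connection) -> None:
    # Table and unique index as described in check 3.
    conn.execute("CREATE TABLE processed_webhook_events (event_id TEXT NOT NULL)")
    conn.execute(
        "CREATE UNIQUE INDEX ux_processed_event_id "
        "ON processed_webhook_events (event_id)"
    )


def handle_webhook(
    conn: sqlite3.Connection,
    event: dict,
    signature_ok: bool,  # assumption: result of the PSP signature check
    apply_mutation: Callable[[sqlite3.Connection, dict], None],
    now: datetime,
) -> int:
    """Return the HTTP status the handler would send."""
    # Check 1: signature. A failure is rejected, never logged and allowed.
    if not signature_ok:
        return 401
    # Check 2: event_time within 5 minutes of server time, in either direction.
    event_time = datetime.fromisoformat(event["event_time"].replace("Z", "+00:00"))
    if abs(now - event_time) > MAX_SKEW:
        return 401
    # Check 3: idempotency. The insert and the mutation share one transaction;
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        try:
            conn.execute(
                "INSERT INTO processed_webhook_events (event_id) VALUES (?)",
                (event["event_id"],),
            )
        except sqlite3.IntegrityError:
            return 200  # duplicate delivery: acknowledge, do not re-apply
        apply_mutation(conn, event)
    return 200
```

Because the insert runs first inside the transaction, an exception from apply_mutation rolls the processed_webhook_events row back as well, so a later redelivery of the same event is not mistaken for a duplicate.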

Bypass code path (per A-D3 user pick): the production binary contains zero bypass code. Dev/test/CI uses a separate config-gated path, gated at module-load time by if settings.environment != 'production'. The gameserver fails to start with a clear BypassFlagInProductionError if PAYPAL_WEBHOOK_BYPASS is set while environment == 'production'. This is a fail-fast import-time check, not a runtime branch: even if a misconfiguration is deployed, the process never starts serving traffic.
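A fail-fast startup check of this kind might look like the sketch below. Only BypassFlagInProductionError and PAYPAL_WEBHOOK_BYPASS come from this ADR; the function name and the choice to treat any presence of the variable as "set" are assumptions for illustration.

```python
from typing import Mapping


class BypassFlagInProductionError(RuntimeError):
    """Raised at startup when the bypass flag is present in production."""


def assert_no_bypass_in_production(environment: str, env: Mapping[str, str]) -> None:
    # Assumption: the variable counts as "set" whenever it is present,
    # regardless of its value.
    if environment == "production" and "PAYPAL_WEBHOOK_BYPASS" in env:
        raise BypassFlagInProductionError(
            "PAYPAL_WEBHOOK_BYPASS must not be set when environment == 'production'"
        )


# Intended call site (at module import, before any route is registered):
#     assert_no_bypass_in_production(settings.environment, os.environ)
```

Calling the check at import time means the exception aborts process startup rather than surfacing on the first webhook request.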

A-F2 — Admin authorization: fine-grained scopes + audit log

The flat User.is_admin boolean is replaced with a scope-based authorization model. Per the user pick, granularity is fine-grained (15–20 scopes), with no two-person approval rule — accountability is delivered through a comprehensive audit log + an admin-action review queue.

Scopes (Launch list):

Scope                             Capability
admin.players.view                Read player records, ARIA logs, conversation history (subject to GDPR-anonymized fields).
admin.players.suspend             Suspend a player account (in-game or full-platform).
admin.players.adjust_rep          Manually adjust personal or faction rep (audit-noted).
admin.players.transfer_assets     Move assets between accounts (player-merge, account-recovery cases).
admin.subscriptions.view          Read subscription state.
admin.subscriptions.modify        Create / cancel / change-tier subscriptions outside the PayPal flow (e.g., comp accounts).
admin.subscriptions.refund        Issue refunds outside PayPal's interface.
admin.webhooks.view               Inspect webhook event log.
admin.webhooks.replay             Replay a previously-rejected or held webhook event.
admin.regions.view                Read region state.
admin.regions.create              Provision a new region outside the standard Region-Owner subscription flow.
admin.regions.terminate           Force-terminate a region (cleanup-cascade trigger).
admin.regions.transfer_ownership  Transfer region ownership (account-recovery cases).
admin.aria.audit                  Read ARIA security logs (90-day raw window per ADR-0057 A-V3).
admin.multi_account.review        Decide on MultiAccountCluster review-queue entries (per ADR-0056).
admin.bang.regenerate             Trigger a region or galaxy regeneration via the bang integration.
admin.scopes.grant                Grant scopes to other admin users. (This is the meta-scope; the bootstrap superadmin starts with it.)
admin.scopes.revoke               Revoke scopes from admin users.
admin.audit.view                  Read the admin-action audit log.

Scopes are stored on a join table AdminScopeGrant (user_id, scope, granted_by, granted_at). The legacy User.is_admin boolean is retained as a derived view (is_admin = EXISTS scope grant) for backwards compatibility but is no longer the authorization gate.
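As an illustration of the grant model, the sketch below keeps AdminScopeGrant rows in memory and derives is_admin from them. The field names follow the join table above; has_scope, require_scope, and MissingScopeError are hypothetical names, and a real implementation would query the table rather than scan a list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class AdminScopeGrant:
    user_id: int
    scope: str
    granted_by: int
    granted_at: datetime


class MissingScopeError(PermissionError):
    """Carries the missing scope name so a route can report it in a 403 body."""

    def __init__(self, scope: str) -> None:
        super().__init__(f"missing scope: {scope}")
        self.scope = scope


def has_scope(grants: Sequence[AdminScopeGrant], user_id: int, scope: str) -> bool:
    return any(g.user_id == user_id and g.scope == scope for g in grants)


def is_admin(grants: Sequence[AdminScopeGrant], user_id: int) -> bool:
    # Legacy derived view: true if the user holds at least one scope grant.
    return any(g.user_id == user_id for g in grants)


def require_scope(grants: Sequence[AdminScopeGrant], user_id: int, scope: str) -> None:
    # The authorization gate is the specific scope, never is_admin.
    if not has_scope(grants, user_id, scope):
        raise MissingScopeError(scope)
```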

Audit log + review queue (no two-person rule per A-F2 user pick):

  • Every admin action writes an AdminActionLog row: admin_user_id, scope_used, action, target_type, target_id, payload_snapshot, result, at.
  • A daily admin-review queue surfaces high-impact actions (anything in admin.subscriptions.*, admin.webhooks.replay, admin.regions.terminate, admin.scopes.*) for retrospective review by another admin (any holder of admin.audit.view). The review acknowledgement is itself logged.
  • Actions are not blocked pending review. The model is: trust admin selection + scope assignment up-front; catch deviations through audit, not through gating.
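The review-queue selection described above could be sketched like this. The AdminActionLog fields and the high-impact scope patterns are taken from this section; is_high_impact and review_queue are hypothetical helper names, and matching the patterns with fnmatch is an assumption about how the wildcard entries are interpreted.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Iterable, List

# High-impact scopes listed for the daily review queue.
HIGH_IMPACT_PATTERNS = (
    "admin.subscriptions.*",
    "admin.webhooks.replay",
    "admin.regions.terminate",
    "admin.scopes.*",
)


@dataclass(frozen=True)
class AdminActionLog:
    admin_user_id: int
    scope_used: str
    action: str
    target_type: str
    target_id: str
    payload_snapshot: str
    result: str
    at: datetime


def is_high_impact(scope_used: str) -> bool:
    # Case-sensitive glob match against the listed patterns.
    return any(fnmatchcase(scope_used, p) for p in HIGH_IMPACT_PATTERNS)


def review_queue(rows: Iterable[AdminActionLog]) -> List[AdminActionLog]:
    # Retrospective only: rows are selected for review, never blocked.
    return [row for row in rows if is_high_impact(row.scope_used)]
```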

Bootstrap: the first user is granted admin.scopes.grant + admin.scopes.revoke + admin.audit.view via a one-time migration. From there, scopes are assigned through the admin UI. The bootstrap superadmin can grant themselves additional scopes (audit-logged); the alternative — requiring a peer to grant — bricks single-operator deployments.

A-F1 — Subscription tier snapshot at ARIA dialogue start

The ARIA dialogue handler reads subscription_tier and aria_bonus_multiplier exactly once at dialogue start. The values are carried as a snapshot through the entire exchange (input handling → provider call → response → persist). The next exchange reads fresh values.

Concretely: the dialogue context object (per ../SYSTEMS/aria-dialogue.md) gains an auth_snapshot field populated at step [1] (request received). Steps [2]–[10] of the dialogue flow read from the snapshot, never from the live Player row. A subscription upgrade landing mid-dialogue therefore takes effect on the next exchange; the in-flight exchange completes consistently with the tier the player held when it started.

This rule applies to the consciousness multiplier (per ADR-0017) too: the multiplier is computed from the snapshot's aria_consciousness_level, so a level-up happening mid-exchange shows up on the next exchange. The continuous-multiplier semantics from ADR-0057 A-D2 still apply at the per-exchange granularity.
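A minimal sketch of the snapshot idea, assuming an immutable value object copied from the Player row at request entry. The three field names come from this section; the AuthSnapshot and Player classes, take_snapshot, and the tier values used in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Stand-in for the live Player row, which may change mid-exchange."""
    subscription_tier: str
    aria_bonus_multiplier: float
    aria_consciousness_level: int


@dataclass(frozen=True)
class AuthSnapshot:
    """Values read exactly once at dialogue start; immutable afterwards."""
    subscription_tier: str
    aria_bonus_multiplier: float
    aria_consciousness_level: int


def take_snapshot(player: Player) -> AuthSnapshot:
    # Copy by value so later writes to the Player row cannot leak in.
    return AuthSnapshot(
        subscription_tier=player.subscription_tier,
        aria_bonus_multiplier=player.aria_bonus_multiplier,
        aria_consciousness_level=player.aria_consciousness_level,
    )
```

For example, if a player on a tier named "free" is upgraded after take_snapshot has run, the snapshot still reports "free" for the rest of that exchange, and the next call to take_snapshot picks up the new tier.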

A-F3 — Multi-regional takeover race serialization

Region takeover claims serialize through a SELECT FOR UPDATE lock on the Region row at claim time. The flow:

  1. Client posts a takeover claim with the offer amount.
  2. Server opens a transaction; SELECT * FROM regions WHERE id = :region_id FOR UPDATE.
  3. Inside the lock, check whether another claim has already been accepted (Region.takeover_state IN ('claim_won', 'transferring')). If so, this claim loses — write a TakeoverIntent row with state = 'lost', refund the offer escrow to the claimant's wallet, commit. Return ERR_REGION_TAKEOVER_LOST.
  4. Otherwise this claim wins — write TakeoverIntent.state = 'won', set Region.takeover_state = 'claim_won', commit. Return success.
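The four steps above can be sketched end to end as follows. This is an illustration under loud assumptions: sqlite3 has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE (a database-wide write lock) stands in for the row lock; the table layouts, the wallets table, and the "OK" return value are hypothetical; and the offer is assumed to have been moved into escrow before the call. The connection must be opened with isolation_level=None so the transaction is controlled explicitly.

```python
import sqlite3

# Hypothetical minimal schema for the sketch.
SCHEMA = """
CREATE TABLE regions (id INTEGER PRIMARY KEY, takeover_state TEXT);
CREATE TABLE wallets (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE takeover_intents (
    region_id INTEGER, claimant_id INTEGER, offer INTEGER, state TEXT
);
"""


def claim_takeover(conn: sqlite3.Connection, region_id: int,
                   claimant_id: int, offer: int) -> str:
    conn.execute("BEGIN IMMEDIATE")  # stand-in for SELECT ... FOR UPDATE
    try:
        row = conn.execute(
            "SELECT takeover_state FROM regions WHERE id = ?", (region_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown region {region_id}")
        if row[0] in ("claim_won", "transferring"):
            # Losing claim: record the loss and refund in the same transaction.
            conn.execute(
                "INSERT INTO takeover_intents VALUES (?, ?, ?, 'lost')",
                (region_id, claimant_id, offer),
            )
            conn.execute(
                "UPDATE wallets SET balance = balance + ? WHERE user_id = ?",
                (offer, claimant_id),
            )
            conn.execute("COMMIT")
            return "ERR_REGION_TAKEOVER_LOST"
        # Winning claim: record the win and mark the region.
        conn.execute(
            "INSERT INTO takeover_intents VALUES (?, ?, ?, 'won')",
            (region_id, claimant_id, offer),
        )
        conn.execute(
            "UPDATE regions SET takeover_state = 'claim_won' WHERE id = ?",
            (region_id,),
        )
        conn.execute("COMMIT")
        return "OK"
    except Exception:
        conn.execute("ROLLBACK")
        raise
```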

The escrow refund for losing claims happens inside the same transaction as the loss decision, so money is never held in escrow without a corresponding claim state. State machine on TakeoverIntent:

pending  ──claim_accepted──▶  won         (region transfer follows)
pending  ──claim_rejected──▶  lost        (escrow refunded in same tx)
won      ──transfer_done──▶   transferred
won      ──transfer_failed──▶ failed      (escrow refunded; region stays with old owner)

Every transition is atomic. The losing-claim refund timing was the original gap A-F3 called out; the transactional refund closes it.
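The TakeoverIntent state machine can be encoded as a lookup table, which makes illegal transitions easy to reject. The states and event names are those in the diagram; TRANSITIONS, REFUNDING_STATES, and next_state are hypothetical names for this sketch.

```python
# (current state, event) -> next state, as in the diagram.
TRANSITIONS = {
    ("pending", "claim_accepted"): "won",
    ("pending", "claim_rejected"): "lost",
    ("won", "transfer_done"): "transferred",
    ("won", "transfer_failed"): "failed",
}

# States whose entry must be accompanied by an escrow refund.
REFUNDING_STATES = frozenset({"lost", "failed"})


def next_state(state: str, event: str) -> str:
    """Return the state reached by applying event, or raise on an illegal move."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} --{event}-->") from None
```

A caller would apply next_state and, when the result is in REFUNDING_STATES, issue the refund inside the same transaction that persists the new state.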

Consequences

  • Webhook handler: the processed_webhook_events table is new. Insert + subscription-mutation share a transaction; PayPal's at-least-once delivery semantics are handled by the idempotency check.
  • Admin model migration: existing admins (those with User.is_admin = true) get a default scope grant of all 19 scopes via a one-time migration. Operators trim down from there. Backfill is forward-only; the legacy boolean stays as a computed property.
  • Admin UI: per ../OPERATIONS/admin-ui.md, the page surface for scope management lives at /admin/scopes. The audit-log and review-queue pages live at /admin/audit and /admin/review-queue. Scope-gated routes return 403 with the missing scope name in the response body.
  • ARIA dialogue context: the auth_snapshot field is additive to the existing dialogue context. Implementation is one read at request entry.
  • Takeover races: the SELECT FOR UPDATE adds a serialization point under contention. For Launch-scale regions, contention is rare (takeover events are user-initiated and relatively infrequent); the lock's contention cost is acceptable. If hot-region takeovers later become a measurable issue, optimistic-lock-with-retry would be the natural follow-up; that work is deferred until the problem is measured.
  • Audit log volume: every admin action persists. A separate retention rule applies (default: 5 years for compliance). The log is append-only; no in-place edits.
  • No two-person rule means the audit + review queue is the load-bearing accountability layer. Operators must commit to actually reviewing the queue. A sweep-test alarm fires if the number of review-queue entries left unacknowledged for 30 days or more crosses a threshold (target: review acknowledged within 7 days).

Alternatives considered

  • Coarse-grained roles instead of scopes. Considered (was the recommended pick); rejected per user direction. Fine-grained gives operators precise dial-up control on what each admin can do; coarse roles bundle capabilities and require role proliferation when needs diverge.
  • Two-person rule on high-impact actions. Considered (was the recommended pick); rejected per user direction. The audit log + review queue + careful scope assignment are the load-bearing controls. Avoids ops bottlenecks where pairing is impractical (off-hours incident response, single-operator deployments).
  • Webhook bypass guarded at runtime instead of import-time. Rejected — runtime branches still ship the bypass code in the production binary. Fail-fast at startup means the bypass can never be reached in prod.
  • Optimistic-lock retry on takeover race instead of FOR UPDATE. Rejected for Launch — FOR UPDATE is simpler, contention is rare. Optimistic-lock is the future-tunable if hotspot regions emerge.