0058: Monetization, admin authorization, and webhook security (Group M)
Status
Accepted. Supersedes ADR-0027 on the admin authorization model.
Context
Five audit findings cluster around the auth/billing/admin perimeter. R-V3 (free-tier visibility of unrevealed Nexus warp) is closed under Group J's worldgen pipeline cleanup; the four remaining sit on three orthogonal surfaces:
- Webhook security (A-D3): PayPal webhook signature validation had an env-var bypass and a log-only failure path. Either of those, alone, lets a forged webhook commit subscription mutations.
- Admin authorization (A-F2): the flat User.is_admin boolean was the only gate. A compromised admin account had unbounded blast radius: it could create free Region Owner subscriptions, replay arbitrary webhooks, and terminate regions.
- Subscription consistency (A-F1): subscription-tier upgrades during an active ARIA dialogue produced inconsistent multiplier values mid-exchange.
- Multi-regional takeover races (A-F3): concurrent takeover claims for the same region had no documented serialization; losing claims could leave escrow held without a corresponding ownership state.
Decision
A-D3: PayPal webhook signature validation (mandatory + binding)
The webhook handler validates every incoming PayPal webhook in production. Validation has three checks; all must pass:
- Signature verification against PayPal's webhook key (PSP-side rotation handled per PayPal's standard procedure).
- Timestamp window: the webhook's event_time must be within 5 minutes of server now(). Older or far-future timestamps are rejected, which closes the replay-attack window.
- Idempotency: the webhook's event_id is checked against a processed_webhook_events table (UNIQUE INDEX (event_id)). If the event has been processed before, the handler returns HTTP 200 (PayPal expects success on duplicate delivery) but does not re-apply the mutation.
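A signature or timestamp failure returns HTTP 401; it is never log-and-allow. A duplicate event_id is not a failure: it is acknowledged with HTTP 200 and skipped. The processed_webhook_events row is inserted inside the same transaction as the subscription mutation, so a successful insert means the mutation also committed.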
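The three checks and the shared transaction can be sketched as follows. This is a minimal illustration, not the gameserver's handler: the `signature_ok` callable stands in for the real PayPal signature verification (which is not reproduced here), SQLite stands in for the production database, and the `subscriptions` table, the `tier` and `user_id` event fields, and the `region_owner` tier name are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable

MAX_SKEW = timedelta(minutes=5)  # timestamp window from the decision above


def handle_webhook(
    conn: sqlite3.Connection,
    event: dict,
    signature_ok: Callable[[dict], bool],  # hypothetical verifier hook
    now: datetime,
) -> int:
    """Return the HTTP status the handler would send."""
    # Check 1: signature. Failure is a hard reject, never log-and-allow.
    if not signature_ok(event):
        return 401
    # Check 2: event_time must be within 5 minutes of server time, either way.
    event_time = datetime.fromisoformat(event["event_time"])
    if abs(now - event_time) > MAX_SKEW:
        return 401
    # Check 3: idempotency. The insert and the mutation share one transaction.
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO processed_webhook_events (event_id) VALUES (?)",
                (event["event_id"],),
            )
            conn.execute(
                "UPDATE subscriptions SET tier = ? WHERE user_id = ?",
                (event["tier"], event["user_id"]),
            )
    except sqlite3.IntegrityError:
        return 200  # duplicate delivery: acknowledge, do not re-apply
    return 200


def make_webhook_db() -> sqlite3.Connection:
    """Build an in-memory schema for the example (names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_webhook_events (event_id TEXT NOT NULL)")
    conn.execute(
        "CREATE UNIQUE INDEX ux_event_id ON processed_webhook_events (event_id)"
    )
    conn.execute("CREATE TABLE subscriptions (user_id INTEGER PRIMARY KEY, tier TEXT)")
    conn.execute("INSERT INTO subscriptions VALUES (1, 'free')")
    conn.commit()
    return conn
```

Relying on the unique index rather than a read-then-insert check means two concurrent deliveries of the same event cannot both apply the mutation: one insert wins, and the other rolls back its whole transaction.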
Bypass code path (per A-D3 user pick): production binary contains zero bypass code. Dev/test/CI uses a separate config-gated path, gated at module-load time by if settings.environment != 'production'. The gameserver fails to start with a clear BypassFlagInProductionError if PAYPAL_WEBHOOK_BYPASS is set while environment == 'production'. This is a fail-fast import-time check, not a runtime branch — even if a misconfiguration deployed, the process never starts serving traffic.
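A minimal sketch of the fail-fast check, assuming that "set" means the variable is present in the environment at all (whatever its value). The function name `check_bypass_flag` is hypothetical; only `BypassFlagInProductionError`, `PAYPAL_WEBHOOK_BYPASS`, and the `environment` setting come from the decision above.

```python
from typing import Mapping


class BypassFlagInProductionError(RuntimeError):
    """Raised at startup when the bypass flag is present in production."""


def check_bypass_flag(environment: str, env: Mapping[str, str]) -> bool:
    """Return True if the dev/test bypass path may be loaded.

    Intended to run once at module import, before any route is registered.
    """
    flag_set = "PAYPAL_WEBHOOK_BYPASS" in env  # assumption: presence == set
    if environment == "production":
        if flag_set:
            raise BypassFlagInProductionError(
                "PAYPAL_WEBHOOK_BYPASS must not be set when "
                "environment == 'production'"
            )
        return False  # production never loads the bypass path
    return flag_set
```

Called at the top of the webhook module with something like `check_bypass_flag(settings.environment, os.environ)`, an exception here aborts the import, so the process exits before it can serve a request.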
A-F2: Admin authorization (fine-grained scopes + audit log)
The flat User.is_admin boolean is replaced with a scope-based authorization model. Per the user pick, granularity is fine-grained (15–20 scopes), with no two-person approval rule — accountability is delivered through a comprehensive audit log + an admin-action review queue.
Scopes (Launch list):
| Scope | Capability |
|---|---|
| admin.players.view | Read player records, ARIA logs, conversation history (subject to GDPR-anonymized fields). |
| admin.players.suspend | Suspend a player account (in-game or full-platform). |
| admin.players.adjust_rep | Manually adjust personal or faction rep (audit-noted). |
| admin.players.transfer_assets | Move assets between accounts (player-merge, account-recovery cases). |
| admin.subscriptions.view | Read subscription state. |
| admin.subscriptions.modify | Create / cancel / change-tier subscriptions outside the PayPal flow (e.g., comp accounts). |
| admin.subscriptions.refund | Issue refunds outside PayPal's interface. |
| admin.webhooks.view | Inspect webhook event log. |
| admin.webhooks.replay | Replay a previously-rejected or held webhook event. |
| admin.regions.view | Read region state. |
| admin.regions.create | Provision a new region outside the standard Region-Owner subscription flow. |
| admin.regions.terminate | Force-terminate a region (cleanup-cascade trigger). |
| admin.regions.transfer_ownership | Transfer region ownership (account-recovery cases). |
| admin.aria.audit | Read ARIA security logs (90-day raw window per ADR-0057 A-V3). |
| admin.multi_account.review | Decide on MultiAccountCluster review-queue entries (per ADR-0056). |
| admin.bang.regenerate | Trigger a region or galaxy regeneration via the bang integration. |
| admin.scopes.grant | Grant scopes to other admin users. (This is the meta-scope; the bootstrap superadmin starts with it.) |
| admin.scopes.revoke | Revoke scopes from admin users. |
| admin.audit.view | Read the admin-action audit log. |
Scopes are stored on a join table AdminScopeGrant (user_id, scope, granted_by, granted_at). The legacy User.is_admin boolean is retained as a derived view (is_admin = EXISTS scope grant) for backwards compatibility but is no longer the authorization gate.
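To illustrate the join-table model, here is a small in-memory sketch of a scope check and the derived is_admin flag. The `AdminScopeGrant` fields follow the column list above; `MissingScopeError`, `has_scope`, and `require_scope` are hypothetical names introduced for the example, and a real implementation would query the table rather than scan a list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AdminScopeGrant:
    user_id: int
    scope: str
    granted_by: int
    granted_at: datetime


class MissingScopeError(PermissionError):
    """Hypothetical error; a route layer could map it to HTTP 403."""

    def __init__(self, scope: str) -> None:
        super().__init__(f"missing scope: {scope}")
        self.scope = scope  # lets the caller name the missing scope


def has_scope(grants: list[AdminScopeGrant], user_id: int, scope: str) -> bool:
    return any(g.user_id == user_id and g.scope == scope for g in grants)


def is_admin(grants: list[AdminScopeGrant], user_id: int) -> bool:
    """Derived legacy flag: true when the user holds at least one grant."""
    return any(g.user_id == user_id for g in grants)


def require_scope(grants: list[AdminScopeGrant], user_id: int, scope: str) -> None:
    """The authorization gate: raise unless the exact scope is granted."""
    if not has_scope(grants, user_id, scope):
        raise MissingScopeError(scope)
```

Note that `is_admin` stays true for a user with a single view-only scope, which is why it can no longer serve as the gate for any specific action.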
Audit log + review queue (no two-person rule per A-F2 user pick):
- Every admin action writes an AdminActionLog row: admin_user_id, scope_used, action, target_type, target_id, payload_snapshot, result, at.
- A daily admin-review queue surfaces high-impact actions (anything in admin.subscriptions.*, admin.webhooks.replay, admin.regions.terminate, admin.scopes.*) for retrospective review by another admin (any holder of admin.audit.view). The review acknowledgement is itself logged.
- Actions are not blocked pending review. The model is: trust admin selection + scope assignment up-front; catch deviations through audit, not through gating.
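The review-queue selection can be sketched as a filter over log rows. This is an illustration under assumptions: the row fields mirror the AdminActionLog column list above, the field types are guesses, and the use of shell-style wildcard matching for the `.*` patterns is a choice made for the example rather than something the ADR specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase

# High-impact scope patterns named in the decision above.
HIGH_IMPACT_PATTERNS = (
    "admin.subscriptions.*",
    "admin.webhooks.replay",
    "admin.regions.terminate",
    "admin.scopes.*",
)


@dataclass(frozen=True)
class AdminActionLog:
    admin_user_id: int
    scope_used: str
    action: str
    target_type: str
    target_id: str
    payload_snapshot: dict
    result: str
    at: datetime


def is_high_impact(scope: str) -> bool:
    """True when the scope matches any high-impact pattern."""
    return any(fnmatchcase(scope, pattern) for pattern in HIGH_IMPACT_PATTERNS)


def review_queue(log: list[AdminActionLog]) -> list[AdminActionLog]:
    """Select the rows a second admin should review; nothing is blocked."""
    return [row for row in log if is_high_impact(row.scope_used)]
```

Under this reading, a read-only scope such as admin.subscriptions.view also lands in the queue, because the decision says "anything in" that namespace.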
Bootstrap: the first user is granted admin.scopes.grant + admin.scopes.revoke + admin.audit.view via a one-time migration. From there, scopes are assigned through the admin UI. The bootstrap superadmin can grant themselves additional scopes (audit-logged); the alternative — requiring a peer to grant — bricks single-operator deployments.
A-F1: Subscription tier snapshot at ARIA dialogue start
The ARIA dialogue handler reads subscription_tier and aria_bonus_multiplier exactly once at dialogue start. The values are carried as a snapshot through the entire exchange (input handling → provider call → response → persist). The next exchange reads fresh values.
Concretely: the dialogue context object (per ../SYSTEMS/aria-dialogue.md) gains an auth_snapshot field populated at step [1] (request received). Steps [2]–[10] of the dialogue flow read from the snapshot, never from the live Player row. A subscription upgrade landing mid-dialogue therefore takes effect on the next exchange — the in-flight exchange completes consistently with the tier the player paid for at start-time.
This rule applies to the consciousness multiplier (per ADR-0017) too: the multiplier is computed from the snapshot's aria_consciousness_level, so a level-up happening mid-exchange shows up on the next exchange. The continuous-multiplier semantics from ADR-0057 A-D2 still apply at the per-exchange granularity.
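The snapshot rule can be sketched with an immutable value object. The `auth_snapshot` field and the three snapshotted attributes come from the text above; the `Player`, `AuthSnapshot`, and `DialogueContext` class shapes and the `begin_exchange` helper are assumptions for illustration, not the actual dialogue context definition.

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Stand-in for the live Player row, which may change at any time."""
    subscription_tier: str
    aria_bonus_multiplier: float
    aria_consciousness_level: int


@dataclass(frozen=True)
class AuthSnapshot:
    """Values frozen at step [1]; later steps read only from here."""
    subscription_tier: str
    aria_bonus_multiplier: float
    aria_consciousness_level: int


@dataclass
class DialogueContext:
    player_id: int
    auth_snapshot: AuthSnapshot


def begin_exchange(player_id: int, player: Player) -> DialogueContext:
    """Step [1]: read the live row exactly once and freeze the values."""
    snapshot = AuthSnapshot(
        subscription_tier=player.subscription_tier,
        aria_bonus_multiplier=player.aria_bonus_multiplier,
        aria_consciousness_level=player.aria_consciousness_level,
    )
    return DialogueContext(player_id=player_id, auth_snapshot=snapshot)
```

Making the snapshot frozen turns an accidental mid-exchange write into an immediate error instead of a silent inconsistency.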
A-F3: Multi-regional takeover race serialization
Region takeover claims serialize through a SELECT FOR UPDATE lock on the Region row at claim time. The flow:
- Client posts a takeover claim with the offer amount.
- Server opens a transaction; SELECT * FROM regions WHERE id = :region_id FOR UPDATE.
- Inside the lock, check whether another claim has already been accepted (Region.takeover_state IN ('claim_won', 'transferring')). If so, this claim loses: write a TakeoverIntent row with state = 'lost', refund the offer escrow to the claimant's wallet, commit. Return ERR_REGION_TAKEOVER_LOST.
- Otherwise this claim wins: write TakeoverIntent.state = 'won', set Region.takeover_state = 'claim_won', commit. Return success.
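The claim flow above can be sketched end to end. SQLite has no SELECT ... FOR UPDATE, so this example swaps in BEGIN IMMEDIATE, which takes a database-wide write lock; that is coarser than a row lock, but it serializes competing claims in the same way. The table layouts, the `wallets` table, the initial `'none'` state, and the assumption that the offer was already debited into escrow before the claim is resolved are all illustrative.

```python
import sqlite3

ERR_REGION_TAKEOVER_LOST = "ERR_REGION_TAKEOVER_LOST"


def claim_region(conn: sqlite3.Connection, region_id: int,
                 claimant_id: int, offer: int) -> str:
    """Resolve one takeover claim; the offer is assumed already in escrow."""
    conn.execute("BEGIN IMMEDIATE")  # stand-in for SELECT ... FOR UPDATE
    try:
        (state,) = conn.execute(
            "SELECT takeover_state FROM regions WHERE id = ?", (region_id,)
        ).fetchone()
        if state in ("claim_won", "transferring"):
            # Loss decision and escrow refund commit together.
            conn.execute(
                "INSERT INTO takeover_intents VALUES (?, ?, ?, 'lost')",
                (region_id, claimant_id, offer),
            )
            conn.execute(
                "UPDATE wallets SET balance = balance + ? WHERE user_id = ?",
                (offer, claimant_id),
            )
            conn.execute("COMMIT")
            return ERR_REGION_TAKEOVER_LOST
        conn.execute(
            "INSERT INTO takeover_intents VALUES (?, ?, ?, 'won')",
            (region_id, claimant_id, offer),
        )
        conn.execute(
            "UPDATE regions SET takeover_state = 'claim_won' WHERE id = ?",
            (region_id,),
        )
        conn.execute("COMMIT")
        return "ok"
    except Exception:
        conn.execute("ROLLBACK")  # never leave a half-applied decision
        raise


def make_takeover_db() -> sqlite3.Connection:
    """In-memory schema for the example; autocommit so BEGIN is explicit."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE regions (id INTEGER PRIMARY KEY, takeover_state TEXT);
        CREATE TABLE wallets (user_id INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE takeover_intents (
            region_id INTEGER, claimant_id INTEGER, offer INTEGER, state TEXT);
        INSERT INTO regions VALUES (1, 'none');
        INSERT INTO wallets VALUES (10, 0), (11, 0);
    """)
    return conn
```

On PostgreSQL or MySQL the first statement would instead be the row-level SELECT ... FOR UPDATE from the flow above, which leaves claims on other regions unblocked.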
The escrow refund for losing claims happens inside the same transaction as the loss decision — the row is never in a state where money is held without a corresponding state. State machine on TakeoverIntent:
pending ──claim_accepted──▶ won (region transfer follows)
pending ──claim_rejected──▶ lost (escrow refunded in same tx)
won ──transfer_done──▶ transferred
won ──transfer_failed──▶ failed (escrow refunded; region stays with old owner)
Every transition is atomic. The losing-claim refund timing was the original gap A-F3 called out; the transactional refund closes it.
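The state machine above is small enough to encode as a lookup table. In this sketch the states and event names are taken from the diagram; the `advance` helper and the idea of returning a refund flag alongside the new state are assumptions for illustration.

```python
# (current state, event) -> next state, as drawn in the diagram above.
TRANSITIONS = {
    ("pending", "claim_accepted"): "won",
    ("pending", "claim_rejected"): "lost",
    ("won", "transfer_done"): "transferred",
    ("won", "transfer_failed"): "failed",
}

# Entering either of these states refunds the escrow in the same transaction.
REFUNDING_STATES = {"lost", "failed"}


def advance(state: str, event: str) -> tuple[str, bool]:
    """Return (next_state, refund_escrow) or raise on an illegal transition."""
    try:
        next_state = TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}") from None
    return next_state, next_state in REFUNDING_STATES
```

Keeping the refund flag next to the transition makes it harder to add a new terminal state without deciding what happens to the escrow.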
Consequences
- Webhook handler: the processed_webhook_events table is new. Insert + subscription-mutation share a transaction; PayPal's at-least-once delivery semantics are handled by the idempotency check.
- Admin model migration: existing admins (those with User.is_admin = true) get a default scope grant of all 19 scopes via a one-time migration. Operators trim down from there. Backfill is forward-only; the legacy boolean stays as a computed property.
- Admin UI: per ../OPERATIONS/admin-ui.md, the page surface for scope management lives at /admin/scopes. The audit-log and review-queue pages live at /admin/audit and /admin/review-queue. Scope-gated routes return 403 with the missing scope name in the response body.
- ARIA dialogue context: the auth_snapshot field is additive to the existing dialogue context. Implementation is one read at request entry.
- Takeover races: the SELECT FOR UPDATE adds a serialization point under contention. For Launch-scale regions, contention is rare (takeover events are user-initiated and relatively infrequent); the lock's contention cost is acceptable. If hot-region takeovers later become a measurable issue, optimistic-lock-with-retry would be the natural follow-up; it is punted until measured.
- Audit log volume: every admin action persists. A separate retention rule applies (default: 5 years for compliance). The log is append-only; no in-place edits.
- No two-person rule means the audit + review queue is the load-bearing accountability layer. Operators must commit to actually reviewing the queue. A sweep-test alarm fires if the review queue's 30-day-old unacknowledged count crosses a threshold (target: review acknowledged within 7 days).
Alternatives considered
- Coarse-grained roles instead of scopes. Considered (was the recommended pick); rejected per user direction. Fine-grained gives operators precise dial-up control on what each admin can do; coarse roles bundle capabilities and require role proliferation when needs diverge.
- Two-person rule on high-impact actions. Considered (was the recommended pick); rejected per user direction. The audit log + review queue + careful scope assignment are the load-bearing controls. Avoids ops bottlenecks where pairing is impractical (off-hours incident response, single-operator deployments).
- Webhook bypass guarded at runtime instead of import-time. Rejected — runtime branches still ship the bypass code in the production binary. Fail-fast at startup means the bypass can never be reached in prod.
- Optimistic-lock retry on takeover race instead of FOR UPDATE. Rejected for Launch: FOR UPDATE is simpler and contention is rare. Optimistic locking is the future tunable if hotspot regions emerge.
Related
- ADR-0017: consciousness-level scale (snapshot rule applies at dialogue start).
- ADR-0050: TakeoverIntent row lifecycle.
- ADR-0053: periodic-service surface used by audit-log retention sweep.
- ADR-0056: MultiAccountCluster review queue (consumes admin.multi_account.review).
- ADR-0057: ARIA security log (consumes admin.aria.audit).
- ../OPERATIONS/monetization.md: PayPal webhook flow + takeover race.
- ../OPERATIONS/admin-ui.md: admin app, scope-management UI, audit/review pages.
- ../SYSTEMS/aria-dialogue.md: auth_snapshot in dialogue context.
- ../DATA_MODELS/player.md: User row, derived is_admin.
- ../DATA_MODELS/gameplay.md: AdminScopeGrant, AdminActionLog, processed_webhook_events, TakeoverIntent state machine extension.