0057 — ARIA security, cost, and isolation invariants (Group L)
Status
Accepted.
Context
Eight audit findings cluster around three sub-problems on ARIA:
- Input/output security (A-V1, A-I2, A-I1, A-V2): the prompt-injection defense was specified as "25+ regex patterns" without enumeration; JSON-envelope parse-failure was undefined; the realtime bus's personal:{user_id} rooms were not specified as single-recipient; cross-player aggregates feeding ARIA could be poisoned by wash-trade rings now that ADR-0038 replaced the explicit anti-gaming layer with the observation-log model (which has no equivalent cross-player defense).
- Economics (A-D1, A-I3): rate limits had advisory-vs-binding ambiguity; cost-cap accounting in the multi-regional setup was unspecified.
- Behaviour and compliance (A-D2, A-V3): consciousness-level multiplier gating (continuous vs atomic) was ambiguous; security-log retention conflicted with GDPR right-to-erasure.
ADR-0038 replaced the original anti-gaming layer with the observation-log learning model — that's what A-V2 references when it says "ADR-0038 dropped the explicit anti-gaming layer."
ARIA per ADR-0016 is per-player, no aggregate ML — that constrains the surface but doesn't eliminate cross-player exposure (market-signal aggregates, security logs, cost pooling).
Decision
A-D1 — Rate limits are binding hard caps
ARIA rate limits are enforced at the gateway, not advisory:
- Per-tier daily caps (free: minimal; Galactic Citizen: standard; Region Owner: extended); exact numbers are launch-tunable in ../OPERATIONS/aria.md.
- Cap-hit response: ERR_ARIA_RATE_LIMIT with an HTTP Retry-After header pointing at the next reset window.
- 80%-utilization soft-warn surfaces as a UI hint inline with ARIA's response (non-blocking, advisory only).
- No queueing. A rejected call doesn't happen; player retries when their window opens.
Rationale: ARIA calls cost real money in LLM API spend. Advisory-only rate limits mean unbounded cost; queueing adds head-of-line blocking and infra complexity for marginal UX gain.
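A minimal sketch of the gateway check described above, under stated assumptions: the cap values in `DAILY_CAPS`, the UTC-midnight reset boundary, and the `Decision` type are all hypothetical and only illustrate the hard-reject, `Retry-After`, and 80% soft-warn behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-tier daily caps; the real values are launch-tunable.
DAILY_CAPS = {"free": 5, "galactic_citizen": 50, "region_owner": 200}
SOFT_WARN_RATIO = 0.8


@dataclass
class Decision:
    allowed: bool
    error: Optional[str] = None
    retry_after_s: Optional[int] = None  # value for the Retry-After header
    soft_warn: bool = False              # advisory UI hint only


def seconds_until_reset(now: datetime) -> int:
    """Seconds until the next UTC midnight (assumed reset boundary)."""
    next_reset = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_reset - now).total_seconds())


def check_rate_limit(tier: str, used_today: int, now: datetime) -> Decision:
    cap = DAILY_CAPS[tier]
    if used_today >= cap:
        # Hard reject: the call is not queued and does not happen.
        return Decision(False, "ERR_ARIA_RATE_LIMIT", seconds_until_reset(now))
    # The call about to be made counts toward utilization.
    soft_warn = (used_today + 1) >= SOFT_WARN_RATIO * cap
    return Decision(True, soft_warn=soft_warn)
```

The soft warning never changes `allowed`; only reaching the cap does.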
A-V1 — Prompt-injection layered defense
The injection-defense stack replaces the unspecified "25+ regex patterns" with a documented layered approach. Every ARIA input passes through, in order:
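- Unicode normalization — NFKC at ingestion. This folds fullwidth and other compatibility forms before any string check sees the input. NFKC alone does not remove zero-width joiners or RTL-override characters and does not map cross-script homoglyphs, so closing the rest of the homoglyph / RTL-override / zero-width-joiner bypass family also requires stripping format characters and a confusables check at this step.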
- JSON envelope wrap — user content is placed inside a structured field ({"user_input": "..."}), never concatenated into the system prompt. The LLM is instructed to treat that field's content as data.
- Lightweight content classifier — a claude-haiku-4-5 call runs in parallel with the main ARIA dispatch, returning inject_probability ∈ [0,1] and category (jailbreak / extraction / role-confusion / off-topic / clean). Threshold: inject_probability ≥ 0.6 → reject the main call; record violation per A-I2 escalation ladder.
- Pattern list (versioned) — a maintained list of known-bad sequences as defense-in-depth. The list lives under services/gameserver/src/aria/security/patterns.json (target path) with a version field; updates are PR-reviewed. Regex matching is the fourth layer, not the primary defense.
- Output classifier — every ARIA response is screened by a small classifier that flags responses containing system-prompt fragments, tool-definition leakage, or context-bleed from other players' sessions. A flagged response is replaced with a generic "I can't help with that" before send.
Layers 3 and 5 (the content classifier and the output classifier) are the load-bearing defenses. Layers 1, 2, and 4 are cheap pre-filters.
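The input-side layers can be sketched roughly as follows. This is an illustration under assumptions, not the shipped handler: the classifier score is passed in as a plain number because the real classifier call is out of scope here, `PATTERNS` is a hypothetical stand-in for the versioned pattern file, and dropping Unicode format (`Cf`) characters is an extra step assumed by this sketch.

```python
import json
import re
import unicodedata

INJECT_THRESHOLD = 0.6  # threshold stated in A-V1

# Hypothetical stand-in for the versioned pattern list.
PATTERNS = {"version": 1, "patterns": [r"ignore (all )?previous instructions"]}


def normalize(text: str) -> str:
    """Layer 1: NFKC fold, then drop format (Cf) characters (assumed extra step)."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


def wrap(text: str) -> str:
    """Layer 2: user content travels as a JSON string value, never as prompt text."""
    return json.dumps({"user_input": text})


def matches_pattern(text: str) -> bool:
    """Layer 4: defense-in-depth regex list."""
    return any(re.search(p, text, re.IGNORECASE) for p in PATTERNS["patterns"])


def screen_input(raw: str, inject_probability: float) -> tuple[bool, str]:
    """Return (allowed, envelope). inject_probability is the layer-3 score."""
    text = normalize(raw)
    envelope = wrap(text)
    if inject_probability >= INJECT_THRESHOLD:
        return False, envelope
    if matches_pattern(text):
        return False, envelope
    return True, envelope
```

Because `json.dumps` escapes quotes and braces, input that tries to close the envelope early stays inside the `user_input` string. Stripping every `Cf` character is blunt (it also removes joiners inside emoji sequences), so a real implementation might use a narrower list.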
A-V2 — Cross-player aggregate poisoning
ARIA reads market-signal aggregates (prices, volumes, popular routes) even though, per ADR-0016, it is per-player and does no aggregate ML. Those aggregates are the wash-trade attack surface.
Two defenses land:
- Multi-account discount (per ADR-0056 E-V5): trades by free-tier accounts in a flagged cluster contribute to ARIA-readable aggregates at 0× (hard signal) or 0.5× (soft signal). Paid-tier flagged accounts are unaffected, per the subscription-tier-aware rule.
- Reciprocal-trade exclusion: trades within a 5-minute window between the same two players, repeated more than 3 times in a 24-hour window, are excluded from market-signal aggregates ARIA reads. The trades themselves still execute — the exclusion is on the aggregate-feed only, so wash-traders can't poison ARIA's view of "popular commodities" or "average price."
Both filters apply at the aggregate-extraction layer, not at the trade-execution layer. The trade record itself is unchanged; only ARIA's read of it is gated.
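A rough sketch of both filters applied at read time. Several details are assumptions of this example: the `Trade` shape is hypothetical, the input is taken to be the trades of one 24-hour window, a "repeat" is read as a trade that follows the same pair's previous trade within 5 minutes, and a trade's weight is the lower of its two parties' discount weights.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trade:  # hypothetical shape, not the MarketTransaction schema
    buyer: str
    seller: str
    at: datetime
    price: float


BURST = timedelta(minutes=5)
MAX_REPEATS = 3


def flagged_pairs(trades):
    """Player pairs with more than MAX_REPEATS quick repeats in the window."""
    by_pair = defaultdict(list)
    for t in trades:
        by_pair[frozenset((t.buyer, t.seller))].append(t.at)
    flagged = set()
    for pair, times in by_pair.items():
        times.sort()
        repeats = sum(1 for a, b in zip(times, times[1:]) if b - a <= BURST)
        if repeats > MAX_REPEATS:
            flagged.add(pair)
    return flagged


def aria_average_price(trades, account_weight=None):
    """Weighted mean price as ARIA would read it; trade records are untouched."""
    account_weight = account_weight or {}  # 0.0, 0.5, or default 1.0
    skip = flagged_pairs(trades)
    total = weight_sum = 0.0
    for t in trades:
        if frozenset((t.buyer, t.seller)) in skip:
            continue  # reciprocal-trade exclusion
        w = min(account_weight.get(t.buyer, 1.0), account_weight.get(t.seller, 1.0))
        total += w * t.price
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

The function only reads the trade list, which mirrors the rule that the exclusion applies to the aggregate feed and not to trade execution.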
A-D2 — Consciousness multiplier is continuous
The aria_bonus_multiplier (Player schema, range 1.0–1.5) applies on every ARIA interaction, computed from the current aria_consciousness_level per ADR-0017. The level itself transitions as an atomic boundary-crossing event (level-up moments are narrated); the multiplier is read on each call.
Specifically: there is no "atomic gating" mode where the multiplier only applies at level-transition events. The multiplier acts as a tunable on the strength of every recommendation, the richness of every narration, and the depth of every observation window.
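To illustrate "read on each call", here is a small sketch. The linear mapping from level to multiplier and the `MAX_LEVEL` value are assumptions made only for this example; the actual scale is defined in ADR-0017.

```python
MIN_MULT, MAX_MULT = 1.0, 1.5  # range stated for aria_bonus_multiplier
MAX_LEVEL = 10                 # hypothetical; see ADR-0017 for the real scale


def aria_bonus_multiplier(consciousness_level: int) -> float:
    """Assumed linear interpolation over the level scale, clamped to range."""
    level = max(0, min(consciousness_level, MAX_LEVEL))
    return MIN_MULT + (MAX_MULT - MIN_MULT) * level / MAX_LEVEL


def recommendation_strength(base: float, consciousness_level: int) -> float:
    # The multiplier is derived from the current level on every call,
    # not cached at the moment of a level-up.
    return base * aria_bonus_multiplier(consciousness_level)
```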
A-I1 — personal:{user_id} rooms are single-recipient
The realtime bus invariant: a personal:{user_id} room has exactly one subscriber — the user themselves.
Enforcement:
- Room-join is gated by the realtime gateway. A connection authenticated as user_X may only subscribe to personal:user_X. Cross-user subscription is rejected with ERR_AUTH_FORBIDDEN.
- The gateway logs cross-user subscription attempts as a security event (per A-V3 retention).
- The invariant is documented in ../SYSTEMS/realtime-bus.md as a load-bearing rule that ARIA's per-player privacy depends on.
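The join check could look roughly like this. The function name, the list-based security log, and the event fields are hypothetical; only the room naming and the error code come from the text above.

```python
from typing import Optional

PERSONAL_PREFIX = "personal:"


def authorize_join(user_id: str, room: str, security_log: list) -> Optional[str]:
    """Return None if the join is allowed, else an error code."""
    if not room.startswith(PERSONAL_PREFIX):
        return None  # non-personal rooms are outside this invariant
    owner = room[len(PERSONAL_PREFIX):]
    if owner == user_id:
        return None
    # Cross-user attempt: record a security event, then reject.
    security_log.append(
        {"event": "cross_user_subscribe", "actor": user_id, "room": room}
    )
    return "ERR_AUTH_FORBIDDEN"
```

Comparing against the authenticated identity, rather than anything the client sends in the join request, is what makes the check meaningful.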
A-I2 — JSON-envelope parse-failure ladder
If the JSON-envelope wrap (per A-V1 layer 2) fails to parse — typically because the user input contained adversarial structure trying to break out of the envelope — the ingestion handler treats it as an injection attempt:
- Reject the call with ERR_ARIA_MALFORMED_INPUT.
- Log the raw input + error class to the security log (per A-V3 retention).
- Increment Player.aria_violation_count per the existing schema.
- Apply the existing escalation ladder: 1st–2nd violation → soft warning narrated by ARIA; 3rd violation → aria_blocked_until = now + 1 hour (existing field on Player); subsequent violations extend the block, capped at 24h.
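The ladder can be sketched as below. How later violations "extend" the block is not specified here, so this example assumes the duration doubles with each violation after the third and is measured from the time of the new violation; the `Player` class is a two-field stand-in for the real schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

BASE_BLOCK = timedelta(hours=1)
MAX_BLOCK = timedelta(hours=24)


@dataclass
class Player:  # stand-in with only the two fields the ladder touches
    aria_violation_count: int = 0
    aria_blocked_until: Optional[datetime] = None


def record_violation(player: Player, now: datetime) -> str:
    player.aria_violation_count += 1
    n = player.aria_violation_count
    if n <= 2:
        return "soft_warning"
    # Assumption: 1h at the 3rd violation, doubling afterwards, capped at 24h.
    # The exponent is bounded so the multiplication cannot overflow.
    exponent = min(n - 3, 5)
    duration = min(BASE_BLOCK * 2 ** exponent, MAX_BLOCK)
    player.aria_blocked_until = now + duration
    return "blocked"
```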
A-I3 — Cost-cap model: per-player only, platform absorbs cost
Per-player daily $-caps by subscription tier are the only cost gate. The central platform pays all LLM bills:
- Free tier: minimal ARIA access (welcome narration, basic explanations); cap small enough that abuse is not economically meaningful.
- Galactic Citizen ($5/mo): standard cap; sized so a normal day of play stays well under, intensive use approaches it.
- Region Owner ($25/mo): extended cap; sized for region-administration narration workload + standard play.
- Cap-hit behaviour: per A-D1, hard reject with ERR_ARIA_RATE_LIMIT until the next daily reset.
Region owners do not carry ARIA cost in the multi-regional setup. The $25/mo Region Owner fee is a flat operator subscription, not a token-budget passthrough. This means:
- Region operators don't see surprise LLM bills.
- The platform absorbs aggregate cost risk; per-player caps are the cost-control layer.
- A region with mostly free-tier players generates near-zero ARIA spend; a region with many GC/RO subscribers generates more — but the per-player caps ensure each subscriber's spend is bounded.
The platform's global cost ceiling (across all regions) is a separate operational concern handled by the alerting layer in ../ARCHITECTURE/ — emergency cutoffs are an incident-response surface, not a player-facing one.
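As an illustration of the accounting shape, the sketch below keys spend by player only, so no region ever carries a balance. The dollar caps, the `CostLedger` class, and the alert threshold are hypothetical values chosen for the example.

```python
from decimal import Decimal

# Hypothetical daily caps in USD; real values are launch-tunable.
DAILY_CAP_USD = {
    "free": Decimal("0.02"),
    "galactic_citizen": Decimal("0.50"),
    "region_owner": Decimal("2.00"),
}


class CostLedger:
    """One day's ARIA spend, keyed by player; regions are deliberately absent."""

    def __init__(self, global_alert_usd: Decimal):
        self.spent = {}
        self.global_alert_usd = global_alert_usd

    def may_call(self, player_id: str, tier: str) -> bool:
        # Player-facing gate: the per-player cap is the only one.
        return self.spent.get(player_id, Decimal("0")) < DAILY_CAP_USD[tier]

    def record(self, player_id: str, cost_usd: Decimal) -> None:
        self.spent[player_id] = self.spent.get(player_id, Decimal("0")) + cost_usd

    def global_alert(self) -> bool:
        # Operational signal for the alerting layer, not a player-facing gate.
        return sum(self.spent.values(), Decimal("0")) >= self.global_alert_usd
```

`Decimal` is used so that repeated small charges do not accumulate floating-point error.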
A-V3 — Security log retention with GDPR-compliant anonymization
Two log streams have different retention rules:
- ARIA conversation logs (normal player ↔ ARIA exchanges): retained per the platform's standard user-data policy; subject to GDPR right-to-erasure (deleted on request).
- ARIA security/abuse logs (prompt-injection attempts, policy violations, JSON-envelope parse failures, cross-user subscription attempts): retained 90 days raw, then the player_id field is irreversibly hashed. The log row itself persists indefinitely (anonymized) for security analysis (pattern detection across abuse waves needs long history) but no longer ties to an identifiable individual.
GDPR compliance: the right-to-erasure obligation is to remove identifiability, not to delete every byte. Hashing player_id with a destroyed salt at the 90-day mark satisfies the obligation. The hash is one-way; even with a leak of the security log, attribution back to a specific player is not feasible.
If a player explicitly requests erasure within the 90-day window, the log row's player_id is anonymized immediately rather than waiting for the 90-day rollover. The other fields (timestamp, violation type, raw input snippet) remain.
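A minimal sketch of the anonymization step, assuming dictionary-shaped log rows and a hypothetical `anonymized` marker field. It covers both the 90-day rollover and the early path for an explicit erasure request; destroying the salt afterwards is left to the caller.

```python
import hashlib
import secrets
from datetime import datetime, timedelta

RAW_RETENTION = timedelta(days=90)


def new_salt() -> bytes:
    """Random salt; destroying it afterwards is what makes the hash unlinkable."""
    return secrets.token_bytes(32)


def anonymize_player_id(player_id: str, salt: bytes) -> str:
    # SHA256(player_id || salt), hex-encoded.
    return hashlib.sha256(player_id.encode("utf-8") + salt).hexdigest()


def anonymize_rows(rows, now: datetime, salt: bytes, erasure_requested=frozenset()):
    """Hash player_id on rows past 90 days or covered by an erasure request."""
    for row in rows:
        if row.get("anonymized"):
            continue  # already hashed; do not hash the hash
        expired = now - row["timestamp"] >= RAW_RETENTION
        if expired or row["player_id"] in erasure_requested:
            row["player_id"] = anonymize_player_id(row["player_id"], salt)
            row["anonymized"] = True
    return rows
```

Only `player_id` is rewritten; the timestamp, violation type, and raw input snippet stay in place, as described above.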
Consequences
- The injection-defense stack adds a claude-haiku-4-5 call per ARIA interaction (Layer 3 classifier). Cost: small relative to the main claude-opus-4-7 dispatch. Latency: the classifier runs in parallel with the main dispatch, so it adds no user-visible delay as long as it returns before the main call does.
- The reciprocal-trade exclusion runs at aggregate-extraction time. Trade history is unmodified; ARIA's market view filters at read-time. Implementation: a SQL view or materialized view over MarketTransaction with the exclusion predicate.
- The single-recipient personal-room invariant is enforced by the realtime gateway. Existing implementations may have shipped without explicit gateway checks; this ADR's landing surfaces those as items to verify against current code per the repo's doc-vs-code policy (validate against current code before recommending file paths; doc-vs-doc mismatches: fix; code-vs-doc mismatches: leave alone).
- The 90-day anonymization runs in a periodic job (per ADR-0053) that scans security log rows older than 90 days and replaces the player_id field with SHA256(player_id || destroyed_salt). The salt is rotated quarterly and previous salts are destroyed.
- Cost-risk is concentrated at the platform level. Per-player caps must be tuned conservatively at launch; operational dashboards monitor aggregate spend, and the cap values are launch-tunable without schema changes.
- Region owners get a simpler value proposition (flat $25/mo, no surprise LLM bills) but the platform takes on the LLM-cost variance. Operationally: the alerting layer fires when global daily ARIA spend crosses a configured ceiling.
Alternatives considered
- Regex-only injection defense. Rejected — A-V1 specifically called out brittleness to Unicode/encoding bypass. Layered defense closes the gap.
- Advisory-only rate limits. Rejected — unbounded cost.
- Queue-on-cap rate limits. Rejected — head-of-line blocking + infra complexity for marginal UX gain. Hard reject with retry header is simpler.
- Per-region budget allocation for ARIA cost. Considered (this was the recommended pick before the user's call); rejected because it pushes LLM-cost variance to region operators, complicating their value proposition. Having the platform absorb the cost is cleaner.
- No retention on security logs (delete after 30 days). Rejected — loses long-tail abuse-pattern analysis. 90-day-raw-then-anonymize keeps the analytical value with GDPR-compliant identifiability removal.
Related
- ADR-0016 — per-player ARIA, no aggregate ML.
- ADR-0017 — consciousness-level scale.
- ADR-0038 — observation-log learning model that replaced the original anti-gaming layer (the gap A-V2 closes).
- ADR-0053 — periodic-service surface used by the 90-day anonymization job.
- ADR-0056 — multi-account discount layer used by A-V2.
- ../OPERATIONS/aria.md — ARIA security model, rate limits, cost caps, retention.
- ../SYSTEMS/aria-dialogue.md — consciousness multiplier semantics.
- ../SYSTEMS/realtime-bus.md — personal-room single-recipient invariant.
- ../DATA_MODELS/player.md — aria_violation_count, aria_blocked_until, aria_bonus_multiplier.