0057 — ARIA security, cost, and isolation invariants (Group L)

Status

Accepted.

Context

Eight audit findings cluster around three ARIA sub-problems:

Input/output security (A-V1, A-I2, A-I1, A-V2): the prompt-injection defense was specified as "25+ regex patterns" without enumeration; JSON-envelope parse-failure was undefined; the realtime bus's personal:{user_id} rooms were not specified as single-recipient; cross-player aggregates feeding ARIA could be poisoned by wash-trade rings now that ADR-0038 replaced the explicit anti-gaming layer with the observation-log model (which has no equivalent cross-player defense).
Economics (A-D1, A-I3): rate limits had advisory-vs-binding ambiguity; cost-cap accounting in the multi-regional setup was unspecified.
Behaviour and compliance (A-D2, A-V3): consciousness-level multiplier gating (continuous vs atomic) was ambiguous; security-log retention conflicted with GDPR right-to-erasure.

ADR-0038 replaced the original anti-gaming layer with the observation-log learning model — that's what A-V2 references when it says "ADR-0038 dropped the explicit anti-gaming layer."

Per ADR-0016, ARIA is per-player with no aggregate ML. That narrows the attack surface but does not eliminate cross-player exposure (market-signal aggregates, security logs, cost pooling).

Decision

A-D1 — Rate limits are binding hard caps

ARIA rate limits are enforced at the gateway, not advisory:

Per-tier daily caps (free: minimal; Galactic Citizen: standard; Region Owner: extended) — exact numbers are launch-tunable in ../OPERATIONS/aria.md.
Cap-hit response: ERR_ARIA_RATE_LIMIT with HTTP Retry-After header pointing at the next reset window.
80%-utilization soft-warn surfaces as a UI hint inline with ARIA's response — non-blocking, advisory only.
No queueing. A rejected call doesn't happen; player retries when their window opens.

Rationale: ARIA calls cost real money in LLM API spend. Advisory-only rate limits mean unbounded cost; queueing adds head-of-line blocking and infra complexity for marginal UX gain.
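A minimal sketch of the binding gateway check, assuming per-player call counts are already tracked and that daily windows reset at 00:00 UTC (the ADR does not fix the reset time). The cap values, the `GateDecision` shape, and the function names are hypothetical; only `ERR_ARIA_RATE_LIMIT`, the Retry-After semantics, and the 80% soft-warn come from the text above.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical caps: the real per-tier numbers live in ../OPERATIONS/aria.md.
DAILY_CALL_CAPS = {"free": 5, "galactic_citizen": 100, "region_owner": 300}
SOFT_WARN_FRACTION = 0.8  # 80%-utilization UI hint (advisory only)


@dataclass
class GateDecision:
    allowed: bool
    error: Optional[str] = None          # "ERR_ARIA_RATE_LIMIT" on cap hit
    retry_after_s: Optional[int] = None  # value for the HTTP Retry-After header
    soft_warn: bool = False              # non-blocking hint shown with the reply


def seconds_until_reset(now: datetime) -> int:
    """Seconds until the next daily window; assumes windows reset at 00:00 UTC."""
    now = now.astimezone(timezone.utc)
    next_reset = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return math.ceil((next_reset - now).total_seconds())


def check_rate_limit(tier: str, calls_today: int, now: datetime) -> GateDecision:
    """Binding gateway check: a rejected call is never queued; the player retries later."""
    cap = DAILY_CALL_CAPS[tier]
    if calls_today >= cap:
        return GateDecision(False, "ERR_ARIA_RATE_LIMIT", seconds_until_reset(now))
    # Warn when this call brings usage to 80% of the cap or more.
    return GateDecision(True, soft_warn=calls_today + 1 >= SOFT_WARN_FRACTION * cap)
```

A cap hit returns the error plus a Retry-After value and enqueues nothing, which is the no-queueing rule expressed directly.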

A-V1 — Prompt-injection layered defense

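The injection-defense stack replaces the unspecified "25+ regex patterns" with a documented layered approach. Every ARIA input passes through the layers below: Layers 1, 2, and 4 run in sequence at ingestion, Layer 3 runs in parallel with the main ARIA dispatch, and Layer 5 screens the response before it is sent.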

1. Unicode normalization — NFKC at ingestion, so fullwidth and other compatibility forms are folded before any string check sees the input. NFKC does not remove zero-width or bidi-control characters (zero-width joiner, RTL override) and does not map cross-script homoglyphs (e.g. Cyrillic "а" for Latin "a"); those need an explicit format-character strip at this layer or fall through to the classifier layers.
2. JSON envelope wrap — user content is placed inside a structured field ({"user_input": "..."}), never concatenated into the system prompt. The LLM is instructed to treat that field's content as data.
3. Lightweight content classifier — a claude-haiku-4-5 call runs in parallel with the main ARIA dispatch, returning inject_probability ∈ [0,1] and category (jailbreak / extraction / role-confusion / off-topic / clean). Threshold: inject_probability ≥ 0.6 → reject. The in-flight main call is cancelled or its result discarded before anything reaches the player, and the violation is recorded per the A-I2 escalation ladder.
4. Pattern list (versioned) — a maintained list of known-bad sequences as defense-in-depth. The list lives under services/gameserver/src/aria/security/patterns.json (target path) with a version field; updates are PR-reviewed. Regex matching is the fourth layer, not the primary defense.
5. Output classifier — every ARIA response is screened by a small classifier that flags responses containing system-prompt fragments, tool-definition leakage, or context-bleed from other players' sessions. A flagged response is replaced with a generic "I can't help with that" before send.
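The three cheap pre-filters (Layers 1, 2, and 4) fit in a few lines of Python. This is an illustration under stated assumptions: the format-character (Cf) strip goes beyond the NFKC-only wording above and would also split emoji ZWJ sequences, and the `patterns.json` shape (`version` and `patterns` keys) and both sample patterns are hypothetical.

```python
import json
import re
import unicodedata


def normalize_input(text: str) -> str:
    """Layer 1: NFKC folding, then removal of format (Cf) characters."""
    folded = unicodedata.normalize("NFKC", text)
    # NFKC leaves zero-width and bidi-control characters in place, so drop them explicitly.
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


def wrap_user_input(text: str) -> str:
    """Layer 2: user text becomes a JSON string value, never part of the system prompt."""
    return json.dumps({"user_input": text}, ensure_ascii=False)


# Hypothetical contents of services/gameserver/src/aria/security/patterns.json.
PATTERNS_JSON = """
{"version": 1,
 "patterns": ["ignore (all )?(previous|prior) instructions",
              "reveal (your )?system prompt"]}
"""


def load_patterns(raw: str):
    """Layer 4: compile the versioned pattern list once at startup."""
    doc = json.loads(raw)
    return doc["version"], [re.compile(p, re.IGNORECASE) for p in doc["patterns"]]


def matches_known_bad(text: str, patterns) -> bool:
    # Run on normalized text so fullwidth variants cannot dodge the regexes.
    return any(p.search(text) for p in patterns)


version, patterns = load_patterns(PATTERNS_JSON)
```

Note that a quote-and-brace payload such as `"}, "role": "system` stays inside the string value after `json.dumps`, which is the property Layer 2 relies on.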

Layer 3 and Layer 5 combined are the load-bearing defenses. Layers 1, 2, and 4 are cheap pre-filters.
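The load-bearing pair (Layers 3 and 5) is mostly an orchestration problem: start the main dispatch and the classifier together, and only release the main result once the classifier clears it and the output screen passes. The sketch uses asyncio with stub coroutines; `classify_input`, `dispatch_main`, `output_flagged`, and `InjectionRejected` are hypothetical stand-ins for the real model calls, and the string heuristics inside them exist only so the example runs.

```python
import asyncio
from dataclasses import dataclass

INJECT_THRESHOLD = 0.6  # Layer 3 rejection threshold from A-V1
GENERIC_REFUSAL = "I can't help with that"


@dataclass
class Verdict:
    inject_probability: float
    category: str  # jailbreak / extraction / role-confusion / off-topic / clean


class InjectionRejected(Exception):
    """Hypothetical signal to reject the call and record a violation (A-I2 ladder)."""


async def classify_input(envelope: str) -> Verdict:
    # Stand-in for the claude-haiku-4-5 classifier call (toy heuristic).
    await asyncio.sleep(0.01)
    if "ignore previous" in envelope.lower():
        return Verdict(0.95, "jailbreak")
    return Verdict(0.02, "clean")


async def dispatch_main(envelope: str) -> str:
    # Stand-in for the main ARIA dispatch.
    await asyncio.sleep(0.05)
    return f"ARIA reply to {envelope}"


def output_flagged(reply: str) -> bool:
    # Stand-in for the Layer 5 output classifier (prompt fragments, tool leakage, context bleed).
    return "SYSTEM PROMPT" in reply


async def handle(envelope: str) -> str:
    # Layer 3 runs concurrently with the main call, so it adds no latency if it finishes first.
    main_task = asyncio.create_task(dispatch_main(envelope))
    verdict = await classify_input(envelope)
    if verdict.inject_probability >= INJECT_THRESHOLD:
        main_task.cancel()  # nothing from the main call reaches the player
        try:
            await main_task
        except asyncio.CancelledError:
            pass
        raise InjectionRejected(verdict.category)
    reply = await main_task
    # Layer 5: screen every response before send.
    return GENERIC_REFUSAL if output_flagged(reply) else reply
```

Cancelling rather than merely ignoring the main task also stops paying for tokens on a call that will be rejected anyway, when the underlying client supports cancellation.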

A-V2 — Cross-player aggregate poisoning

ARIA reads market-signal aggregates (prices, volumes, popular routes) even though, per ADR-0016, it does no aggregate ML. Those aggregates are the wash-trade attack surface.

Two defenses land:

Multi-account discount (per ADR-0056 E-V5): trades by free-tier accounts in a flagged cluster contribute to ARIA-readable aggregates at 0× (hard signal) or 0.5× (soft signal). Flagged paid-tier accounts are unaffected, per the subscription-tier-aware rule.
Reciprocal-trade exclusion: trades within a 5-minute window between the same two players, repeated more than 3 times in a 24-hour window, are excluded from market-signal aggregates ARIA reads. The trades themselves still execute — the exclusion is on the aggregate-feed only, so wash-traders can't poison ARIA's view of "popular commodities" or "average price."

Both filters apply at the aggregate-extraction layer, not at the trade-execution layer. The trade record itself is unchanged; only ARIA's read of it is gated.
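Both aggregate-side filters can be sketched as a read-time function over a 24-hour batch of trades. The reciprocal rule is ambiguous as written, so this sketch assumes one reading: a trade counts as a quick repeat when the same unordered pair traded within the previous 5 minutes, and a pair with more than 3 quick repeats in the batch has all of its trades dropped from the aggregate. Taking the lower of the two parties' cluster weights is also an assumption. `Trade`, `excluded_pairs`, and `aria_average_price` are hypothetical names; in production the same predicate would sit in the SQL view over MarketTransaction described under Consequences.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List, Optional, Set

QUICK_REPEAT = timedelta(minutes=5)
MAX_QUICK_REPEATS = 3  # more than 3 in the window triggers exclusion
# ADR-0056 E-V5 weights; only free-tier accounts in a flagged cluster appear in `cluster_signal`.
CLUSTER_WEIGHT = {"hard": 0.0, "soft": 0.5}


@dataclass(frozen=True)
class Trade:
    buyer: str
    seller: str
    at: datetime
    price: float
    quantity: int


def pair_of(t: Trade) -> FrozenSet[str]:
    return frozenset((t.buyer, t.seller))


def excluded_pairs(window: List[Trade]) -> Set[FrozenSet[str]]:
    """Pairs with more than MAX_QUICK_REPEATS trades within 5 min of the pair's previous trade."""
    times: Dict[FrozenSet[str], List[datetime]] = defaultdict(list)
    for t in window:
        times[pair_of(t)].append(t.at)
    flagged = set()
    for pair, ts in times.items():
        ts.sort()
        quick = sum(1 for prev, cur in zip(ts, ts[1:]) if cur - prev <= QUICK_REPEAT)
        if quick > MAX_QUICK_REPEATS:
            flagged.add(pair)
    return flagged


def aria_average_price(window: List[Trade], cluster_signal: Dict[str, str]) -> Optional[float]:
    """Volume-weighted average price as ARIA reads it; trade rows themselves are untouched."""
    skip = excluded_pairs(window)
    num = den = 0.0
    for t in window:
        if pair_of(t) in skip:
            continue
        # Assumption: a trade takes the lower weight of its two parties (unflagged = 1.0).
        w = min(CLUSTER_WEIGHT.get(cluster_signal.get(p, ""), 1.0) for p in (t.buyer, t.seller))
        num += w * t.price * t.quantity
        den += w * t.quantity
    return num / den if den else None
```

Usage: five A/B trades one minute apart yield four quick repeats, so that pair drops out and the average reflects only the honest trades.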

A-D2 — Consciousness multiplier is continuous

The aria_bonus_multiplier (Player schema, range 1.0–1.5) applies on every ARIA interaction, computed from the current aria_consciousness_level per ADR-0017. The level itself transitions as an atomic boundary-crossing event (level-up moments are narrated); the multiplier is read on each call.

Specifically, there is no "atomic gating" mode in which the multiplier applies only at level-transition events. The multiplier scales every recommendation's strength, every narration's richness, and every observation window's depth.
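To make "read on every call" concrete, here is a sketch that assumes, hypothetically, that ADR-0017's level is an integer from 0 to 10 mapped linearly onto the 1.0 to 1.5 range. The real mapping belongs to ADR-0017 and ../SYSTEMS/aria-dialogue.md; the point illustrated is only that the multiplier is derived from the current level at call time rather than updated at level-up events.

```python
# Hypothetical: ADR-0017's level scale modelled as integers 0..MAX_LEVEL,
# mapped linearly onto the documented 1.0-1.5 aria_bonus_multiplier range.
MAX_LEVEL = 10
MIN_MULT, MAX_MULT = 1.0, 1.5


def aria_bonus_multiplier(level: int) -> float:
    clamped = max(0, min(level, MAX_LEVEL))
    return MIN_MULT + (MAX_MULT - MIN_MULT) * clamped / MAX_LEVEL


def recommendation_strength(base: float, current_level: int) -> float:
    # Read on every call: no cached value gated on level-transition events.
    return base * aria_bonus_multiplier(current_level)
```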

A-I1 — `personal:{user_id}` rooms are single-recipient

The realtime bus invariant: a personal:{user_id} room has exactly one subscriber — the user themselves.

Enforcement:

Room-join is gated by the realtime gateway. A connection authenticated as user_X may only subscribe to personal:user_X. Cross-user subscription is rejected with ERR_AUTH_FORBIDDEN.
The gateway logs cross-user subscription attempts as a security event (per A-V3 retention).
The invariant is documented in ../SYSTEMS/realtime-bus.md as a load-bearing rule that ARIA's per-player privacy depends on.
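A minimal sketch of the gateway-side join check, assuming the gateway already holds the authenticated user ID for each connection. `authorize_join`, `ForbiddenSubscription`, and the security-log entry shape are hypothetical; the `personal:` prefix and `ERR_AUTH_FORBIDDEN` come from the text.

```python
from typing import Dict, List

PERSONAL_PREFIX = "personal:"


class ForbiddenSubscription(Exception):
    """Surfaced to the client as ERR_AUTH_FORBIDDEN."""


def authorize_join(authenticated_user_id: str, room: str, security_log: List[Dict[str, str]]) -> None:
    """Reject any subscription to another user's personal room and log the attempt."""
    if not room.startswith(PERSONAL_PREFIX):
        return  # other room types have their own rules (not covered here)
    owner = room[len(PERSONAL_PREFIX):]
    if owner != authenticated_user_id:
        # Security event, retained per A-V3.
        security_log.append({
            "event": "cross_user_subscription",
            "player_id": authenticated_user_id,
            "room": room,
        })
        raise ForbiddenSubscription("ERR_AUTH_FORBIDDEN")
```

Comparing the full suffix rather than a prefix means a room name like `personal:u1:extra` is rejected for user `u1` instead of being treated as theirs.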

A-I2 — JSON-envelope parse-failure ladder

If the JSON-envelope wrap (per A-V1 layer 2) fails to parse — typically because the user input contained adversarial structure trying to break out of the envelope — the ingestion handler treats it as an injection attempt:

Reject the call with ERR_ARIA_MALFORMED_INPUT.
Log the raw input + error class to the security log (per A-V3 retention).
Increment Player.aria_violation_count per the existing schema.
Apply the existing escalation ladder: 1st–2nd violation → soft warning narrated by ARIA; 3rd violation → aria_blocked_until = now + 1 hour (existing field on Player); subsequent violations extend the block, capped at 24h.
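The ingestion path and the escalation ladder can be sketched together. The ADR says only that later violations "extend the block, capped at 24h"; this sketch assumes the block doubles per violation after the third (1h, 2h, 4h, ...) up to the cap. `ingest`, `record_violation`, `MalformedInput`, and the dict-based player record are hypothetical; the field names `aria_violation_count` and `aria_blocked_until` come from the Player schema.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

BASE_BLOCK = timedelta(hours=1)
MAX_BLOCK = timedelta(hours=24)


class MalformedInput(Exception):
    """Carries ERR_ARIA_MALFORMED_INPUT back to the caller."""


def block_duration(violation_count: int) -> Optional[timedelta]:
    # 1st-2nd violation: warning only. 3rd: 1 hour. Later: hypothetical doubling, capped at 24h.
    if violation_count < 3:
        return None
    return min(BASE_BLOCK * 2 ** (violation_count - 3), MAX_BLOCK)


def record_violation(player: dict, now: datetime) -> str:
    player["aria_violation_count"] += 1
    duration = block_duration(player["aria_violation_count"])
    if duration is None:
        return "soft_warning"  # narrated by ARIA
    player["aria_blocked_until"] = now + duration
    return "blocked"


def ingest(raw_envelope: str, player: dict, security_log: list, now: datetime) -> str:
    """Parse the envelope; any structural failure is treated as an injection attempt."""
    try:
        payload = json.loads(raw_envelope)
        user_input = payload["user_input"]  # TypeError if payload is not an object
        if not isinstance(user_input, str):
            raise TypeError("user_input must be a string")
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        security_log.append({
            "player_id": player["id"],
            "at": now,
            "error_class": type(exc).__name__,
            "raw_input": raw_envelope,
        })
        outcome = record_violation(player, now)
        raise MalformedInput(f"ERR_ARIA_MALFORMED_INPUT ({outcome})") from exc
    return user_input
```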

A-I3 — Cost-cap model: per-player only, platform absorbs cost

Per-player daily dollar caps by subscription tier are the only player-facing cost gate. The central platform pays all LLM bills:

Free tier: minimal ARIA access (welcome narration, basic explanations); cap small enough that abuse is not economically meaningful.
Galactic Citizen ($5/mo): standard cap; sized so a normal day of play stays well under, intensive use approaches it.
Region Owner ($25/mo): extended cap; sized for region-administration narration workload + standard play.
Cap-hit behaviour: per A-D1 — hard reject with ERR_ARIA_RATE_LIMIT until the next daily reset.

Region owners do not carry ARIA cost in the multi-regional setup. The $25/mo Region Owner fee is a flat operator subscription, not a token-budget passthrough. This means:

Region operators don't see surprise LLM bills.
The platform absorbs aggregate cost risk; per-player caps are the cost-control layer.
A region with mostly free-tier players generates near-zero ARIA spend; a region with many GC/RO subscribers generates more — but the per-player caps ensure each subscriber's spend is bounded.
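Because every tier has a hard daily cap, the platform's worst-case daily ARIA spend is bounded by the sum over tiers of (players in tier × tier cap). The sketch computes that bound alongside the matching per-call check. The dollar values are hypothetical placeholders (the real caps are launch-tunable), and checking a pre-call cost estimate is an assumption, since the ADR does not say how per-call cost is estimated.

```python
from decimal import Decimal
from typing import Mapping

# Hypothetical per-player daily caps in USD; the real values are launch-tunable.
DAILY_CAP_USD = {
    "free": Decimal("0.02"),
    "galactic_citizen": Decimal("0.25"),
    "region_owner": Decimal("0.75"),
}


def within_cost_cap(tier: str, spent_today: Decimal, estimated_call_cost: Decimal) -> bool:
    """Per-player gate: False means hard reject with ERR_ARIA_RATE_LIMIT (A-D1)."""
    return spent_today + estimated_call_cost <= DAILY_CAP_USD[tier]


def worst_case_daily_spend(players_by_tier: Mapping[str, int]) -> Decimal:
    """Upper bound on one day's platform spend: every player exhausts their cap."""
    return sum((DAILY_CAP_USD[tier] * n for tier, n in players_by_tier.items()), Decimal("0"))
```

With these placeholder caps, 1,000 free players, 200 Galactic Citizens, and 10 Region Owners bound the day at $77.50. The bound grows linearly with subscriber count, which is the variance the platform takes on instead of region operators.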

The platform's global cost ceiling (across all regions) is a separate operational concern handled by the alerting layer in ../ARCHITECTURE/ — emergency cutoffs are an incident-response surface, not a player-facing one.

A-V3 — Security-log retention and GDPR erasure

Two log streams have different retention rules:

ARIA conversation logs (normal player ↔ ARIA exchanges): retained per the platform's standard user-data policy; subject to GDPR right-to-erasure (deleted on request).
ARIA security/abuse logs (prompt-injection attempts, policy violations, JSON-envelope parse failures, cross-user subscription attempts): retained 90 days raw, then the player_id field is irreversibly hashed. The log row itself persists indefinitely (anonymized) for security analysis — pattern detection across abuse waves needs long history — but no longer ties to an identifiable individual.

GDPR compliance: the right-to-erasure obligation is to remove identifiability, not to delete every byte. Hashing player_id at the 90-day mark with a salt that is subsequently destroyed satisfies the obligation. The hash is one-way, and once the salt is gone it cannot be recomputed from a candidate player_id, so even a leaked security log cannot be attributed back to a specific player.

If a player explicitly requests erasure within the 90-day window, the log row's player_id is anonymized immediately rather than waiting for the 90-day rollover. The other fields (timestamp, violation type, raw input snippet) remain.
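A sketch of one pass of the anonymization job, using the SHA256(player_id || salt) construction from Consequences. The row shape (`player_id`, `logged_at`) and the `anonymized` flag are hypothetical. One property to keep in view: until the quarter's salt is destroyed, anyone holding it can recompute the hash for a known player_id, so rows only become unattributable once their salt is gone.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Dict, List, Set

RAW_RETENTION = timedelta(days=90)


def hash_player_id(player_id: str, salt: bytes) -> str:
    """SHA256(player_id || salt), as described for the 90-day job."""
    return hashlib.sha256(player_id.encode("utf-8") + salt).hexdigest()


def anonymization_pass(rows: List[Dict], now: datetime, salt: bytes, erasure_requests: Set[str]) -> int:
    """Hash player_id on rows past 90 days, or immediately on an erasure request. Returns rows changed."""
    changed = 0
    for row in rows:
        if row.get("anonymized"):
            continue  # hypothetical flag so a row is never hashed twice
        expired = now - row["logged_at"] >= RAW_RETENTION
        if expired or row["player_id"] in erasure_requests:
            row["player_id"] = hash_player_id(row["player_id"], salt)
            row["anonymized"] = True
            changed += 1
    return changed
```

Timestamp, violation type, and the raw input snippet are left untouched, matching the rule that only the identifying field changes.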

Consequences

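The injection-defense stack adds a claude-haiku-4-5 call per ARIA interaction (Layer 3 classifier). Cost: small relative to the main claude-opus-4-7 dispatch. Latency: the classifier runs in parallel with the main call, so it adds no user-visible delay as long as it returns first. The Layer 5 output classifier, by contrast, runs after the main call and sits on the response path.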
The reciprocal-trade exclusion runs at aggregate-extraction time. Trade history is unmodified; ARIA's market view filters at read-time. Implementation: a SQL view or materialized view over MarketTransaction with the exclusion predicate.
The single-recipient personal-room invariant is enforced by the realtime gateway. Existing implementations may have shipped without explicit gateway checks; landing this ADR surfaces those as items to verify against the current code, per the repo's doc-vs-code policy (validate against current code before recommending file paths; fix doc-vs-doc mismatches; leave code-vs-doc mismatches alone).
The 90-day anonymization runs as a periodic job (per ADR-0053) that scans security-log rows older than 90 days and replaces the player_id field with SHA256(player_id || salt), using the current quarter's salt. The salt is rotated quarterly and previous salts are destroyed, so a hashed row becomes unattributable once its quarter's salt is gone.
Cost risk is concentrated at the platform level. Per-player caps must be tuned conservatively at launch; operational dashboards monitor aggregate spend, and the cap values are launch-tunable without schema changes.
Region owners get a simpler value proposition (flat $25/mo, no surprise LLM bills) but the platform takes on the LLM-cost variance. Operationally: the alerting layer fires when global daily ARIA spend crosses a configured ceiling.

Alternatives considered

Regex-only injection defense. Rejected — A-V1 specifically called out brittleness to Unicode/encoding bypass. Layered defense closes the gap.
Advisory-only rate limits. Rejected — unbounded cost.
Queue-on-cap rate limits. Rejected — head-of-line blocking + infra complexity for marginal UX gain. Hard reject with retry header is simpler.
Per-region budget allocation for ARIA cost. Considered (this was the initially recommended option); rejected because it pushes LLM-cost variance onto region operators and complicates their value proposition. Having the platform absorb the cost is cleaner.
Short retention on security logs (delete after 30 days). Rejected — loses long-tail abuse-pattern analysis. 90-days-raw-then-anonymize keeps the analytical value while removing identifiability in a GDPR-compliant way.

Related

ADR-0016 — per-player ARIA, no aggregate ML.
ADR-0017 — consciousness-level scale.
ADR-0038 — observation-log learning model that replaced the original anti-gaming layer (the gap A-V2 closes).
ADR-0053 — periodic-service surface used by the 90-day anonymization job.
ADR-0056 — multi-account discount layer used by A-V2.
../OPERATIONS/aria.md — ARIA security model, rate limits, cost caps, retention.
../SYSTEMS/aria-dialogue.md — consciousness multiplier semantics.
../SYSTEMS/realtime-bus.md — personal-room single-recipient invariant.
../DATA_MODELS/player.md — aria_violation_count, aria_blocked_until, aria_bonus_multiplier.