Reference Architecture: Sovereign AI Stack

From the boardroom to the rack: an on-prem architecture designed for low latency, auditability, and predictable TCO. Interactive, printable, ready for your use cases.

p95 ≤ 300 ms: internal chat
Data: zero egress, full sovereignty
Unit economics: transparent, predictable TCO
Architecture Preview

How to read this blueprint

Follow the flow, document the decisions

  1. Select a scenario + filter.

    Medium = default enterprise. Small for PoC, Large for 24/7 workloads, Air-gapped for regulated racks. Filters highlight data, security, observability, or RAG paths.

  2. Click each node in order of the data flow.

    Every card opens its KPIs, risks, and a 5-step runbook. Use the badges to see where metrics are contractually enforced or where anti-patterns are most common.

  3. Capture deltas in your assessment doc.

    Use the side panel notes to compare with your current stack, then export the layer screenshots or link this page in the strategic assessment.

KPI Focus

Nodes carrying SLO/SLA accountability. Track them weekly.

Anti-Pattern Watch

Frequent audit findings or technical debt hotspots. Mitigate in design reviews.

Quick scenarios

Start with one of these enterprise profiles

01
Banking knowledge assistant

Medium scenario + Security filter

  • Prioritize IAM, Guardrails, Vector DB ACLs.
  • Pin Observability filter for audit trails.
  • Pair with TCO Calculator → On-Premise dual H200.
02
Industrial co-pilot

Large scenario + Data Flow filter

  • Emphasize Router → Serving → KV cache path.
  • Enable Observability to surface tracing + metrics stack.
  • Use GPU auto-sync in tools to compare cloud vs racks.
03
Air-gapped research pod

Air-Gapped scenario + RAG filter

  • Focus on Ingestion, Feature Store, and Secrets layers.
  • Document runbooks for manual patching + offline updates.
  • Export BOM to stakeholders via the TCO Calculator report.
Open the cost calculator to validate BOM →

Explore the Architecture by Layer

Four macro-layers organize the 27 components. Click a layer to expand it and see the details. Use the guided tour to follow a request end to end.

Reference SLOs and capacity

SLOs are our technical contract: they set expectations for latency, throughput, and availability, and they drive choices about models, hardware, and caching.

Internal chat (knowledge assistant)

p95 ≤ 300 ms
p50 ≤ 150 ms
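
As a minimal sketch of how the chat targets could be checked against measured latencies: the function names below are hypothetical, and the nearest-rank percentile method is an assumption, since the blueprint does not say how p50 and p95 are computed.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # 1-based rank of the smallest sample that covers pct% of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_chat_slo(samples_ms, p50_max_ms=150.0, p95_max_ms=300.0):
    """True when both chat targets (p50 <= 150 ms, p95 <= 300 ms) hold."""
    return (percentile(samples_ms, 50) <= p50_max_ms
            and percentile(samples_ms, 95) <= p95_max_ms)
```

In practice the samples would come from the observability stack over a fixed window (for example the weekly review mentioned under KPI Focus); the window length is left to the reader.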

Content generation

p95 ≤ 1.5 s

Semantic search

Vector DB query ≤ 30 ms
Top-k 8-16

Throughput (Medium)

Sustained 100 RPS
Burst 200 RPS
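
For rough sizing, Little's law (in-flight requests = arrival rate × time in system) turns these throughput figures into a concurrency estimate. The sketch below is illustrative only. The function name is hypothetical, and pairing the Medium throughput with the 300 ms chat p95 is an assumption: Little's law is stated for mean latency, so using p95 gives a deliberately pessimistic figure.

```python
def in_flight_requests(rps, latency_ms):
    """Little's law: average concurrent requests = arrival rate x time in system."""
    return rps * latency_ms / 1000.0


# Assumption: use the 300 ms chat p95 as a pessimistic stand-in for mean latency.
sustained = in_flight_requests(100, 300)  # about 30 concurrent requests
burst = in_flight_requests(200, 300)      # about 60 concurrent requests
```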

Availability

Single-DC 99.9%
Active-Active 99.99%
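
To make the availability targets concrete, the sketch below converts a percentage into a yearly downtime budget. The function name is hypothetical, and the 365-day measurement period is an assumption; the blueprint does not state the window over which availability is measured.

```python
def downtime_budget_minutes(availability_pct, period_hours=8760.0):
    """Allowed downtime in minutes for a given availability over the period.

    period_hours defaults to 365 days x 24 h, which is an assumption.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1.0 - availability_pct / 100.0) * period_hours * 60.0


single_dc = downtime_budget_minutes(99.9)       # roughly 525.6 minutes per year
active_active = downtime_budget_minutes(99.99)  # roughly 52.56 minutes per year
```

Under these assumptions, moving from Single-DC to Active-Active shrinks the yearly budget from under nine hours to under one hour.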

Unit Economics

Cost per 1k tokens = ((Capex/h + Opex/h) / (tokens/h)) × 1000
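
The unit-economics formula can be sketched in a few lines. The function names and the straight-line amortization helper are assumptions added for illustration; the blueprint only gives the formula itself, and all figures in the example are made up.

```python
def hourly_capex(purchase_price, amortization_years, hours_per_year=8760):
    """Spread a purchase price evenly over its amortization period.

    Straight-line amortization is an assumption, not part of the blueprint.
    """
    return purchase_price / (amortization_years * hours_per_year)


def cost_per_1k_tokens(capex_per_hour, opex_per_hour, tokens_per_hour):
    """Cost per 1k tokens = ((Capex/h + Opex/h) / (tokens/h)) x 1000."""
    if tokens_per_hour <= 0:
        raise ValueError("tokens_per_hour must be positive")
    return (capex_per_hour + opex_per_hour) / tokens_per_hour * 1000


# Hypothetical figures: 12 per hour of amortized hardware, 6 per hour of
# power and operations, 1,000 tokens/s sustained (3.6M tokens per hour).
example = cost_per_1k_tokens(12.0, 6.0, 3_600_000)  # about 0.005 per 1k tokens
```

Because tokens per hour sits in the denominator, utilization drives the result: the same rack at half the sustained throughput costs twice as much per token.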

Ready to take your Sovereign AI to production?

Download the blueprint as a PDF or start the Strategic Assessment right away to adapt the architecture to your domain.