Reference Architecture Sovereign AI Stack

From the boardroom to the rack: an on-prem architecture designed for low latency, auditability, and predictable TCO. Interactive, printable, ready for your use cases.

p95 ≤ 300 ms internal chat

Data Zero egress full sovereignty

Unit economics Transparent, predictable TCO

Download the Case Studies Compendium (PDF) Book the Strategic Assessment

How to read this blueprint

Follow the flow, document the decisions

01
Select a scenario + filter.
Medium = default enterprise. Small for PoC, Large for 24/7 workloads, Air-gapped for regulated racks. Filters highlight data, security, observability, or RAG paths.
02
Click each node in order of the data flow.
Every card opens KPI, risks, and a 5-step runbook. Use the badges to see where metrics are contractually enforced or where anti-patterns are most common.
03
Capture deltas in your assessment doc.
Use the side panel notes to compare with your current stack, then export the layer screenshots or link this page in the strategic assessment.

KPI Focus

Nodes carrying SLO/SLA accountability. Track them weekly.

Anti-Pattern Watch

Frequent audit findings or technical debt hotspots. Mitigate in design reviews.

Quick scenarios

Start with one of these enterprise profiles

Banking knowledge assistant

Medium scenario + Security filter

Prioritize IAM, Guardrails, Vector DB ACLs.
Pin Observability filter for audit trails.
Pair with TCO Calculator → On-Premise dual H200.

Industrial co-pilot

Large scenario + Data Flow filter

Emphasize Router → Serving → KV cache path.
Enable Observability to surface tracing + metrics stack.
Use GPU auto-sync in tools to compare cloud vs racks.

Air-gapped research pod

Air-Gapped scenario + RAG filter

Focus on Ingestion, Feature Store, and Secrets layers.
Document runbooks for manual patching + offline updates.
Export BOM to stakeholders via the TCO Calculator report.

Open the cost calculator to validate BOM →

Scenario

Filters

Explore the Architecture by Layer

Four macro-layers organize the 27 components. Click each layer to expand it and see the details. Use the guided tour to follow a request end to end.

Layer 1: Client & Ingress

Authentication, security, and API gateway

Why it matters: Protects the perimeter, authenticates users, applies rate limiting, and routes requests securely.

Internal User/App

IdP/SSO

WAF

API Gateway

Rate Limiter

Layer 2: Orchestration

Request routing, prompt management, and policies

Why it matters: Handles intelligent routing, orchestrates prompts, applies guardrails, and optimizes with caching.

Request Router

Prompt Orchestrator

Guardrails/Policy

Prompt Cache

Layer 3: AI Core

Vector search, model serving, and inference

Why it matters: The heart of the system: semantic retrieval, embedding generation, model serving, KV caching, and LoRA adapters.

Ingestion Pipeline

Embedding Service

Vector DB

Document Store

Feature Store

Model Serving

KV/Token Cache

Adapters (LoRA)

Quantization

Layer 4: Support Systems

Security, observability, and data management

Why it matters: Ensures secrets security, full observability (tracing, metrics, logs), evaluation, and the feedback loop.

KMS/HSM

Secrets Manager

RDBMS

Tracing

Metrics

Logs

Eval Pipeline

Feedback Loop

Red Team
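
The four macro-layers and their components can also be kept as plain data next to an assessment document, which makes it easy to check coverage against your current stack. The sketch below is illustrative only: the layer and component names are copied from the lists above, while the dict layout and the helper names `component_count` and `layer_of` are assumptions, not part of the blueprint.

```python
# Layer and component names are taken from the blueprint lists above.
# The data layout and helper names are hypothetical, for illustration only.
LAYERS = {
    "Client & Ingress": [
        "Internal User/App", "IdP/SSO", "WAF", "API Gateway", "Rate Limiter",
    ],
    "Orchestration": [
        "Request Router", "Prompt Orchestrator", "Guardrails/Policy",
        "Prompt Cache",
    ],
    "AI Core": [
        "Ingestion Pipeline", "Embedding Service", "Vector DB",
        "Document Store", "Feature Store", "Model Serving", "KV/Token Cache",
        "Adapters (LoRA)", "Quantization",
    ],
    "Support Systems": [
        "KMS/HSM", "Secrets Manager", "RDBMS", "Tracing", "Metrics", "Logs",
        "Eval Pipeline", "Feedback Loop", "Red Team",
    ],
}


def component_count(layers):
    """Total number of components across all layers."""
    return sum(len(components) for components in layers.values())


def layer_of(component, layers=LAYERS):
    """Return the name of the layer that contains `component`, or None."""
    for name, components in layers.items():
        if component in components:
            return name
    return None
```

Calling `component_count(LAYERS)` is a quick way to confirm that a local copy of the map still lists all 27 components.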

Reference SLOs and capacity

SLOs are our technical contract: they define the expectations for latency, throughput, and availability, and they guide the choice of models, hardware, and caching.

Internal chat (knowledge assistant)

p95 ≤ 300 ms

p50 ≤ 150 ms
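
As a rough illustration of how the chat targets (p50 at most 150 ms, p95 at most 300 ms) could be checked against measured latencies, the sketch below uses the nearest-rank percentile definition. The function names, the `CHAT_SLO_MS` table, and the choice of nearest-rank are assumptions; a production setup would more likely read percentiles from its metrics stack.

```python
# Hypothetical helper names; thresholds mirror the chat SLO listed above.
CHAT_SLO_MS = {50: 150, 95: 300}


def percentile(samples_ms, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # Ceiling division without floats: rank = ceil(q * n / 100).
    rank = max(1, -(-q * len(ordered) // 100))
    return ordered[rank - 1]


def check_slo(samples_ms, targets=CHAT_SLO_MS):
    """Map each percentile to True when it is within its target."""
    return {q: percentile(samples_ms, q) <= limit for q, limit in targets.items()}
```

For example, `check_slo(latencies)` returns a dict keyed by percentile, so a weekly review can flag exactly which target was missed.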

Content generation

p95 ≤ 1.5 s

Semantic search

Vector DB query ≤ 30 ms

Top-k 8-16

Throughput (Medium)

Sustained 100 RPS

Burst 200 RPS
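
One common way to enforce a sustained rate while tolerating short bursts is a token bucket. The sketch below is a minimal version under stated assumptions: the blueprint does not say how the Rate Limiter is implemented, and mapping "Burst 200 RPS" to a bucket capacity of 100 (so that a full bucket plus one second of refill admits about 200 requests) is only one possible reading. The class name and parameters are hypothetical.

```python
class TokenBucket:
    """Minimal token bucket; timestamps are supplied by the caller in seconds."""

    def __init__(self, rate_per_s=100.0, capacity=100.0):
        self.rate = rate_per_s      # sustained refill rate (assumed 100 RPS)
        self.capacity = capacity    # burst headroom (assumption, see lead-in)
        self.tokens = capacity      # start with a full bucket
        self.last = 0.0             # timestamp of the previous call

    def allow(self, now):
        """Return True and consume one token if the request may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a running service the caller would typically pass `time.monotonic()` as `now`; taking the timestamp as an argument keeps the sketch deterministic and easy to test.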

Availability

Single-DC 99.9%

Active-Active 99.99%
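
Availability targets are easier to reason about as a downtime budget. Over a 365-day year, 99.9% allows roughly 8.76 hours of downtime and 99.99% roughly 52.6 minutes. The helper below is a small sketch of that arithmetic; the function name and the default period are assumptions.

```python
def downtime_budget_minutes(availability_pct, period_hours=365 * 24):
    """Allowed downtime in minutes for a given availability over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1 - availability_pct / 100) * period_hours * 60


single_dc = downtime_budget_minutes(99.9)       # about 525.6 minutes per year
active_active = downtime_budget_minutes(99.99)  # about 52.56 minutes per year
```

Passing a shorter `period_hours` (for example `30 * 24`) gives the monthly budget instead.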

Unit Economics

Cost per 1k tokens = ((Capex/h + Opex/h) / tokens/h) × 1000
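
The formula above translates directly into code. In the sketch below, `cost_per_1k_tokens` follows the formula as written, while `hourly_capex` is an added assumption: it uses straight-line amortization over a 365-day year, which the blueprint does not specify. All numeric values in the example are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def hourly_capex(purchase_price, amortization_years):
    """Straight-line hourly capex (an assumption, not stated in the blueprint)."""
    if amortization_years <= 0:
        raise ValueError("amortization_years must be positive")
    return purchase_price / (amortization_years * HOURS_PER_YEAR)


def cost_per_1k_tokens(capex_per_hour, opex_per_hour, tokens_per_hour):
    """Cost per 1k tokens = ((Capex/h + Opex/h) / tokens/h) * 1000."""
    if tokens_per_hour <= 0:
        raise ValueError("tokens_per_hour must be positive")
    return (capex_per_hour + opex_per_hour) / tokens_per_hour * 1000


# Hypothetical inputs: 6.0 and 4.0 currency units per hour, 2M tokens per hour.
example_cost = cost_per_1k_tokens(6.0, 4.0, 2_000_000)
```

Because throughput sits in the denominator, the result is only as reliable as the measured tokens per hour, so it is worth recomputing with sustained rather than peak figures.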

Ready to put your Sovereign AI into production?

Download the blueprint as a PDF or start the Strategic Assessment right away to adapt the architecture to your domain.

Download the Case Studies Compendium (PDF) Request an Assessment

Reference Architecture  Sovereign AI Stack