TCO Calculator Expert

Compare Cloud API, Cloud GPU Rental, and On-Premise deployments with real-time hardware validation and cost breakdown. Built for senior architects making production decisions.

Pricing dataset

Verified October 2025 pricing

How to use

Run the comparison in three passes

  1. Lock the workload first.

    Set queries/month + token mix on the Cloud API tab. The values auto-sync to Cloud GPU and On-Premise so every scenario uses the same demand curve.

  2. Pick the model pair.

    Choosing a provider automatically selects the equivalent open-source model and GPU requirements. Reverse sync buttons keep API ⇄ Cloud GPU ⇄ On-Prem aligned.

  3. Validate TCO + VRAM.

    Use the sticky bar for 3-year totals, then open the VRAM + power widgets to ensure the GPU count/quantization is production-safe before exporting the report.

Quick profiles

Jump-start with a persona checklist

A. Regulated enterprise

LLM-based knowledge assistant, 500K queries/month.

  • Stay on Claude 3.7 Sonnet ↔ Llama 3.3 70B FP8.
  • Toggle Security filter on the architecture page for context.
  • Compare Cloud GPU H100 commit vs dual H200 racks.

B. Industrial co-pilot

Streaming telemetry, 1.2M queries/month.

  • Increase queries + concurrency, pick GPT-4o ↔ DeepSeek 67B.
  • Use GPU auto-suggestion to size 4× H100 or MI300X.
  • Record throughput widget values in the session notes.

C. Air-gapped research pod

80K queries/month, strict sovereignty.

  • Select Air-Gapped scenario on the architecture page first.
  • Switch API tab to Gemini 1.5 Pro ↔ Qwen 32B.
  • Keep On-Prem GPU discount at 0 and bump opex multipliers.

1 USD = 0.92 EUR

Cloud API: €0 (3-year TCO)
Cloud GPU: €0 (3-year TCO)
On-Premise: €0 (3-year TCO)

📊 Default Scenario: Enterprise with Existing Datacenter (500K queries/month)

Profile: Mid-sized enterprise, 500K queries/month, Claude 3.7 Sonnet equivalent (Llama 3.3 70B FP8), 40% GPU discount, industrial power (€0.12/kWh), automated DevOps (0.05 FTE), existing datacenter.

Real pricing (Oct 2025): Cloud API €227K/3yr ($3/$15 per M input/output tokens). Lambda H100 @ $1.85/hr = €91K/3yr (2× H100 80GB). On-premise 2× H200: €44K capex + €39K opex = €83K total. Breakeven at ~18 months.

🎯 Key Insight: The larger/smarter the LLM, the more on-premise wins! Small models (8B): Cloud API best (€10K). Medium models (70B): On-premise wins (€83K vs €91K Cloud GPU). Large models (671B): On-premise dominates with 62-82% savings. At 500K queries/month, self-hosting starts making financial sense. 5yr+ horizon: on-premise wins dramatically.
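
The scenario numbers above can be sanity-checked with a few lines of arithmetic. The sketch below is a simplified reconstruction, not the calculator's actual code: it reuses the token mix (1200 in / 600 out), the $3/$15 per-million-token prices, the $1.85/hr H100 rate and the 0.92 USD→EUR rate from this page, and it ignores egress, storage and discounts, so its totals land slightly under the headline figures.

```python
# Rough reproduction of the default-scenario arithmetic (not the calculator's code).
USD_TO_EUR = 0.92
MONTHS = 36

queries_per_month = 500_000
tokens_in, tokens_out = 1200, 600          # per query, from the Workload panel

# Cloud API: pay per token ($3 input / $15 output per million tokens).
api_monthly_eur = (queries_per_month
                   * (tokens_in * 3 + tokens_out * 15) / 1_000_000
                   * USD_TO_EUR)
api_tco = api_monthly_eur * MONTHS

# Cloud GPU: 2x H100 rented 24/7 at $1.85/hr each.
gpu_tco = 1.85 * 2 * 24 * 365 * 3 * USD_TO_EUR

# On-premise: capex up front plus flat 3-year opex (scenario figures).
onprem_capex, onprem_opex_3yr = 44_000, 39_000
onprem_tco = onprem_capex + onprem_opex_3yr

def breakeven_month(capex, opex_per_month, alternative_per_month):
    """First month where cumulative on-prem spend drops below the alternative."""
    for month in range(1, MONTHS + 1):
        if capex + opex_per_month * month <= alternative_per_month * month:
            return month
    return None

print(f"Cloud API  ~EUR {api_tco:,.0f} / 3yr")
print(f"Cloud GPU  ~EUR {gpu_tco:,.0f} / 3yr")
print(f"On-prem    ~EUR {onprem_tco:,.0f} / 3yr")
# The full calculator adds egress, storage and discount structure, so its
# ~18-month breakeven figure differs from this stripped-down estimate.
print("Breakeven vs API (months):",
      breakeven_month(onprem_capex, onprem_opex_3yr / MONTHS, api_monthly_eur))
```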

Cloud API Configuration

📊 Workload

🔄 Auto-synced with Cloud GPU scenario
Typical RAG: context + question
⚠️ Output tokens cost 2-5× more than input!

💰 Pricing

Typical for RAG with document retrieval

💵 Cost Summary

Input tokens cost: €0
Output tokens cost: €0
Egress bandwidth: €0
Monthly Total: €0
Annual Total: €0
3-Year TCO: €0
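
The summary lines above reduce to a per-token formula. A minimal sketch, assuming prices are quoted per million tokens in USD and using an illustrative $0.09/GB egress rate (the function name and defaults are this example's, not the calculator's internals):

```python
def cloud_api_monthly_eur(queries, tokens_in, tokens_out,
                          price_in_per_m, price_out_per_m,
                          egress_gb=0.0, egress_usd_per_gb=0.09,
                          usd_to_eur=0.92):
    """Monthly Cloud API cost: token charges plus egress, converted to EUR."""
    input_cost = queries * tokens_in / 1_000_000 * price_in_per_m
    output_cost = queries * tokens_out / 1_000_000 * price_out_per_m
    egress_cost = egress_gb * egress_usd_per_gb
    return (input_cost + output_cost + egress_cost) * usd_to_eur

# Default scenario: 500K queries, 1200 in / 600 out, $3/$15 per M tokens.
monthly = cloud_api_monthly_eur(500_000, 1200, 600, 3, 15)
print(f"Monthly ~EUR {monthly:,.0f}, 3-year TCO ~EUR {monthly * 36:,.0f}")
```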

Cloud GPU Configuration

🤖 Model Selection

📊 Total VRAM Required: 106 GB
  • Model weights: 70 GB
  • KV cache: 18 GB
  • Safety margin (20%): 18 GB
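
The breakdown follows a simple rule: model weights at the chosen precision, plus KV cache, plus the safety margin, then enough GPUs to cover the total. A rough sketch, assuming the 20% margin is applied to weights + KV cache (an assumption, not necessarily the calculator's exact rule):

```python
import math

def vram_plan(params_b, bytes_per_param, kv_cache_gb, gpu_vram_gb, margin=0.20):
    """Estimate required VRAM and GPU count: weights + KV cache + safety margin."""
    weights_gb = params_b * bytes_per_param     # 70B at FP8 (1 byte/param) ~= 70 GB
    total_gb = (weights_gb + kv_cache_gb) * (1 + margin)
    return total_gb, math.ceil(total_gb / gpu_vram_gb)

# Llama 3.3 70B in FP8, ~18 GB KV cache, H100 80GB cards.
total, gpus = vram_plan(70, 1.0, 18, 80)
print(f"~{total:.0f} GB total -> {gpus}x H100 80GB")
```

With the default H100 80GB cards this confirms the 2-GPU sizing used in the Hardware section.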

🖥️ Hardware

Auto-filled from provider, or enter custom rate

💾 Storage & Networking

$0.10/GB/month typical

📊 Workload

🔄 Auto-synced with API scenario
1200 input + 600 output = 1800 total
🚀 Performance:
  • Throughput: 480 tokens/sec
  • Max queries/hour: 1,570
  • GPU utilization: 87%
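
These widget values can be cross-checked against the synced workload. The sketch below counts every input and output token against aggregate throughput, which is pessimistic (prefill runs much faster than decode, and the widget's own model weights things differently), so treat it as a lower-bound sanity check rather than a reproduction of the 1,570 figure:

```python
def capacity_check(throughput_tok_s, tokens_per_query, queries_per_month,
                   utilization=0.87):
    """Sustainable queries/hour vs the workload's average hourly demand."""
    sustainable = throughput_tok_s * 3600 * utilization / tokens_per_query
    avg_demand = queries_per_month / 730        # ~730 hours per month
    return sustainable, avg_demand

sustainable, demand = capacity_check(480, 1800, 500_000)
print(f"~{sustainable:,.0f} queries/hour sustainable vs ~{demand:,.0f} average demand")
```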

💵 Cost Summary

GPU rental (24/7): €0
Storage: €0
Egress bandwidth: €0
Monthly Total: €0
Annual Total: €0
3-Year TCO: €0
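
The rental math is a flat 24/7 hourly rate plus storage and egress. A minimal sketch, assuming 500 GB of storage and the $0.10/GB/month rate noted above (both illustrative defaults):

```python
def cloud_gpu_monthly_eur(gpu_count, usd_per_gpu_hr, storage_gb,
                          storage_usd_per_gb=0.10, egress_eur=0.0,
                          hours=730, usd_to_eur=0.92):
    """Monthly cost of renting GPUs 24/7, plus storage and egress."""
    rental = gpu_count * usd_per_gpu_hr * hours
    storage = storage_gb * storage_usd_per_gb
    return (rental + storage) * usd_to_eur + egress_eur

monthly = cloud_gpu_monthly_eur(2, 1.85, 500)   # 2x H100 @ $1.85/hr, 500 GB storage
print(f"Monthly ~EUR {monthly:,.0f}, 3-year ~EUR {monthly * 36:,.0f}")
```

With 2× H100 at $1.85/hr this lands right around the €91K three-year figure quoted in the default scenario.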

On-Premise Configuration

🤖 Model Selection

📊 Total VRAM Required: 106 GB
  • Model weights: 70 GB
  • KV cache: 18 GB
  • Safety margin (20%): 18 GB

🖥️ Hardware Capex

2× H200 = sufficient for Llama 3.3 70B FP8
30-40% typical for multi-GPU enterprise orders
AMD EPYC 9554 or similar
Total Capex: $0
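
Capex is list price minus the negotiated multi-GPU discount, plus the host server. A sketch with illustrative prices (roughly $30K per H200 and $12K for the EPYC host, chosen so the example lands near the scenario's €44K; real quotes will differ):

```python
def onprem_capex_usd(gpu_list_usd, gpu_count, discount, server_usd):
    """Hardware capex: discounted GPUs plus the host server."""
    return gpu_list_usd * gpu_count * (1 - discount) + server_usd

capex_usd = onprem_capex_usd(30_000, 2, 0.40, 12_000)   # 2x H200, 40% discount
print(f"Capex ~${capex_usd:,.0f} (~EUR {capex_usd * 0.92:,.0f})")
```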

⚡ Power & Cooling

1.2 = excellent datacenter, 1.6 = office
Industrial rate (€0.12/kWh) vs residential (€0.18-0.30/kWh)
Monthly Power: €0
Total TDP: 0W × PUE × hours
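
Monthly power is total TDP scaled by PUE, hours and the electricity rate. A minimal sketch, assuming ~700 W per H200 plus ~400 W for the host system (approximate TDP figures, not measured draw):

```python
def monthly_power_eur(tdp_watts, pue, eur_per_kwh, hours=730):
    """Electricity cost: IT load x PUE x hours, converted from W to kWh."""
    return tdp_watts * pue * hours / 1000 * eur_per_kwh

# 2x H200 (~700 W each) + host system (~400 W), PUE 1.2, EUR 0.12/kWh.
print(f"~EUR {monthly_power_eur(2 * 700 + 400, 1.2, 0.12):,.0f} / month")
```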

🔧 Operational Costs (Annual)

0.05 FTE = ~2hr/week (fully automated k8s/vLLM)
Shared DevOps/MLOps team cost allocation
Incremental cost (existing datacenter connectivity)

💵 Cost Summary

Capex (amortized 36 months): €0
Power & cooling: €0
Maintenance: €0
IT staff: €0
ISP: €0
Monthly Total: €0
Annual Total: €0
3-Year TCO: €0
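
The monthly total amortizes capex over 36 months and adds the recurring lines. A sketch reusing the illustrative numbers from the capex and power examples, with assumed maintenance, staffing and ISP figures, so it lands somewhat below the scenario's €83K three-year total (the calculator tracks more opex lines):

```python
def onprem_monthly_eur(capex_eur, power_eur, maintenance_yr,
                       fte_fraction, fte_cost_yr, isp_month, months=36):
    """Monthly on-prem cost: amortized capex + power + maintenance + staff + ISP."""
    return (capex_eur / months + power_eur
            + maintenance_yr / 12
            + fte_fraction * fte_cost_yr / 12
            + isp_month)

# EUR 44K capex, ~EUR 190 power, EUR 2K/yr maintenance, 0.05 FTE @ EUR 90K, EUR 100 ISP.
monthly = onprem_monthly_eur(44_000, 190, 2_000, 0.05, 90_000, 100)
print(f"Monthly ~EUR {monthly:,.0f}, 3-year ~EUR {monthly * 36:,.0f}")
```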

📈 3-Year TCO Comparison

Metric              Cloud API    Cloud GPU    On-Premise
Monthly Cost        €0           €0           €0
3-Year TCO          €0           €0           €0
Breakeven vs API    -            -            -
ROI over 3 years    -            -            -
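
The Breakeven and ROI rows compare each self-managed option against the Cloud API baseline. One common definition is sketched below (savings divided by the option's own TCO); the calculator's exact formula may differ:

```python
def roi_over_period(option_tco, api_tco):
    """Savings vs the Cloud API baseline, relative to the option's own spend."""
    return (api_tco - option_tco) / option_tco

# Default-scenario figures: on-premise EUR 83K vs Cloud API EUR 227K over 3 years.
print(f"On-premise ROI vs API: {roi_over_period(83_000, 227_000):.0%}")
```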

💡 Recommendations

📥 Export Results