Every GPU that DeltaStore TSC removes from a serving fleet eliminates 300–700 watts of continuous power draw, 24/7/365. At scale, the energy and carbon savings dwarf the cost savings. This paper quantifies the environmental dividend of serving-level KV cache compression.
The AI industry's environmental conversation focuses on training. A single GPT-4-scale training run consumes ~50 GWh — enough to power 4,600 US homes for a year. That number makes headlines.
But inference is the iceberg below the waterline. Once a model is deployed, it serves requests 24/7 for months or years. The cumulative energy of inference surpasses that of training within the first few months of production deployment. And the KV cache — the per-user memory that grows with context length and conversation history — is the primary driver of GPU count in serving fleets.
More users = more KV caches = more GPUs = more power. The relationship is linear without compression. Every GPU in a serving fleet draws 300–700 watts continuously, whether it's processing a request or holding KV cache memory idle between turns.
Without compression, serving 10,000 users takes 34 A100 80GB GPUs at 300W each — a 10.2 kW continuous draw. That's 89,352 kWh/yr of raw GPU power, or 116,158 kWh/yr including data center cooling and overhead (PUE 1.3) — equivalent to powering about 10.7 US homes.
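The baseline arithmetic can be checked in a few lines; a quick sketch using the paper's constants (34-GPU fleet, 300W TDP, PUE 1.3, EIA home figure):

```python
# Baseline serving fleet: 34 A100 80GB GPUs for 10K users (no compression).
GPUS, TDP_W, PUE = 34, 300, 1.3
HOURS_PER_YEAR = 8760
HOME_KWH_PER_YEAR = 10_900   # average US home (EIA 2023)

raw_kw = GPUS * TDP_W / 1000               # continuous draw: 10.2 kW
raw_kwh = raw_kw * HOURS_PER_YEAR          # 89,352 kWh/yr of raw GPU power
facility_kwh = raw_kwh * PUE               # 116,158 kWh/yr with cooling/overhead
homes = facility_kwh / HOME_KWH_PER_YEAR   # ≈10.7 US homes

print(f"{raw_kw:.1f} kW raw, {facility_kwh:,.0f} kWh/yr, ≈{homes:.1f} homes")
```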
DeltaStore TSC compresses KV caches at the serving level — not just per-user quantization, but cross-user prefix deduplication and temporal delta coding. The system prompt KV cache (typically 60%+ of memory) is stored once, regardless of how many users are connected. The result: sub-linear GPU scaling.
| Fleet Size | Without TSC | With TSC | GPUs Saved | % Reduction |
|---|---|---|---|---|
| 2,000 users | 7 GPUs | 3 GPUs | 4 | 57% |
| 5,000 users | 17 GPUs | 3 GPUs | 14 | 82% |
| 10,000 users | 34 GPUs | 4 GPUs | 30 | 88% |
The reduction percentage increases with fleet size because shared prefix deduplication amortizes more aggressively. At 2K users, 57% of GPUs are eliminated. At 10K users, 88%. This is the sub-linear scaling property in action.
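The reduction percentages follow directly from the GPU counts in the table; a quick check:

```python
# GPU counts per fleet size, taken from the scaling table above: users -> (without TSC, with TSC).
fleets = {2_000: (7, 3), 5_000: (17, 3), 10_000: (34, 4)}

for users, (without_tsc, with_tsc) in fleets.items():
    reduction = (without_tsc - with_tsc) / without_tsc
    print(f"{users:>6} users: {without_tsc - with_tsc} GPUs saved ({reduction:.0%})")
```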
An NVIDIA A100 SXM draws 300W at TDP. Data centers add cooling, networking, and facility overhead — captured by the Power Usage Effectiveness (PUE) multiplier. The industry average PUE is 1.58; hyperscalers achieve 1.1–1.3. We use PUE 1.3 (conservative hyperscaler estimate).
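Per-GPU facility energy is just TDP × hours × PUE; a minimal helper (the function name is mine, for illustration) using the paper's TDP figures:

```python
def annual_facility_kwh(tdp_w: float, pue: float = 1.3, hours: int = 8760) -> float:
    """Facility-level annual energy for one GPU at sustained TDP."""
    return tdp_w / 1000 * hours * pue

print(annual_facility_kwh(300))  # A100 SXM: ≈3,416.4 kWh/yr
print(annual_facility_kwh(700))  # H100 SXM: ≈7,971.6 kWh/yr
```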
| Metric | 2K Users | 5K Users | 10K Users |
|---|---|---|---|
| GPUs eliminated | 4 | 14 | 30 |
| Raw power saved (kW) | 1.2 | 4.2 | 9.0 |
| Facility power saved (kW, PUE 1.3) | 1.56 | 5.46 | 11.7 |
| Annual energy saved (MWh) | 13.7 | 47.8 | 102.5 |
| CO₂ avoided (tonnes) | 5.3 | 18.7 | 40.0 |
| Equivalent homes powered | 1.3 | 4.4 | 9.4 |
| Equivalent cars removed | 1.2 | 4.1 | 8.7 |
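The table above can be reproduced from the constants in the assumptions section; a sketch of the arithmetic (the function and its name are illustrative, not part of DeltaStore TSC):

```python
# Savings model behind the table: GPUs saved × TDP × PUE × hours, then
# converted with the paper's carbon, home, and car factors.
PUE, CO2_KG_PER_KWH = 1.3, 0.39       # PUE 1.3; EPA eGRID US average
HOME_KWH, CAR_CO2_T = 10_900, 4.6     # EIA 2023; EPA 2023

def fleet_savings(gpus_saved: int, tdp_w: float) -> dict:
    raw_kw = gpus_saved * tdp_w / 1000
    facility_kw = raw_kw * PUE
    mwh_yr = facility_kw * 8760 / 1000
    co2_t = mwh_yr * CO2_KG_PER_KWH           # tonnes: MWh × kg/kWh
    return {
        "raw_kw": round(raw_kw, 2),
        "facility_kw": round(facility_kw, 2),
        "mwh_yr": round(mwh_yr, 1),
        "co2_t": round(co2_t, 1),
        "homes": round(mwh_yr * 1000 / HOME_KWH, 1),
        "cars": round(co2_t / CAR_CO2_T, 1),
    }

print(fleet_savings(30, 300))   # A100, 10K users: 102.5 MWh, 40.0 t CO2
print(fleet_savings(30, 700))   # H100, 10K users: 239.1 MWh, 93.3 t CO2
```

Swapping the TDP argument from 300 to 700 reproduces the H100 table as well — the compression ratios are unchanged, only the watts-per-GPU differ.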
The NVIDIA H100 SXM draws 700W at TDP — more than double the A100. As the industry migrates to H100 (and eventually B100/B200), the energy savings from GPU reduction grow proportionally. The same compression ratios apply; only the watts-per-GPU change.
| Metric | 2K Users | 5K Users | 10K Users |
|---|---|---|---|
| GPUs eliminated | 4 | 14 | 30 |
| Raw power saved (kW) | 2.8 | 9.8 | 21.0 |
| Facility power saved (kW, PUE 1.3) | 3.64 | 12.74 | 27.3 |
| Annual energy saved (MWh) | 31.9 | 111.6 | 239.1 |
| CO₂ avoided (tonnes) | 12.4 | 43.5 | 93.3 |
| Equivalent homes powered | 2.9 | 10.2 | 21.9 |
| Equivalent cars removed | 2.7 | 9.5 | 20.3 |
That's 20 cars off the road, or the annual carbon sequestration of roughly 1,555 mature trees. From a single deployment. And the savings grow super-linearly as the fleet scales — the next 10K users adds even fewer GPUs.
A single 10K-user deployment is meaningful. But the AI inference market is growing at 30%+ annually, and the largest providers operate hundreds of thousands of GPUs. What happens when DeltaStore TSC is applied across a fleet of deployments?
| Scale | GPUs Saved | Energy (GWh/yr) | CO₂ (tonnes) | Cars Equivalent |
|---|---|---|---|---|
| 1 deployment (10K users, A100) | 30 | 0.10 | 40 | 8.7 |
| 10 deployments (100K users) | 300 | 1.02 | 400 | 87 |
| 100 deployments (1M users) | 3,000 | 10.2 | 4,000 | 870 |
| 1,000 deployments (10M users, H100) | 30,000 | 239 | 93,300 | 20,283 |
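The industry table is a straight linear extrapolation of per-deployment savings (the paper's stated assumption of independent deployments); a sketch using the A100 per-deployment figures:

```python
# Per-deployment A100 savings at 10K users: 30 GPUs, 0.1025 GWh/yr, 40 t CO2.
per_deployment = {"gpus": 30, "gwh_yr": 0.1025, "co2_t": 40.0}

for n in (1, 10, 100):
    print(n, {k: round(v * n, 2) for k, v in per_deployment.items()})
```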
At the 10M-user scale on H100 hardware — well within reach for a top-10 AI provider — DeltaStore TSC eliminates 30,000 GPUs from the global fleet. That's 239 GWh/yr of energy savings and 93,300 tonnes of CO₂ avoided annually — equivalent to removing over 20,000 cars from the road or the carbon sequestration of 1.5 million trees.
GPUs need cooling. Data centers consume approximately 1.8 liters of water per kWh of energy used. This water is evaporated through cooling towers and is not recoverable. As AI clusters grow, water consumption has become a serious concern — Microsoft's water consumption increased 34% in 2022, largely driven by AI workloads.
A100 fleet (10K users): 184,500 liters of water saved per year. That's 48,700 gallons — enough to fill a residential swimming pool 2.4 times.
H100 fleet (10K users): 430,500 liters of water saved per year. That's 113,700 gallons — enough to supply a household's water needs for 3.1 years.
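The water figures follow from energy saved × WUE; a sketch (the function name is mine), assuming the 1.8 L/kWh figure and 3.785 L per US gallon:

```python
WUE_L_PER_KWH = 1.8       # data center water usage effectiveness (assumed)
LITERS_PER_GALLON = 3.785

def water_saved(kwh_saved: float) -> tuple[float, float]:
    """Return (liters, gallons) of cooling water avoided per year."""
    liters = kwh_saved * WUE_L_PER_KWH
    return liters, liters / LITERS_PER_GALLON

print(water_saved(102_492))   # A100 10K-user fleet: ≈184,486 L, ≈48,741 gal
print(water_saved(239_148))   # H100 10K-user fleet: ≈430,466 L, ≈113,730 gal
```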
The financial savings from DeltaStore TSC are significant — $1.35M/year at 10K users. But the environmental savings compound on top. Every GPU eliminated delivers savings across four dimensions simultaneously.
| Dimension | Per GPU Saved (A100) | 30 GPUs (10K users) |
|---|---|---|
| Annual cost | $17,520 | $525,600 |
| Annual energy | 3,416 kWh | 102,492 kWh |
| Annual CO₂ | 1.33 tonnes | 40.0 tonnes |
| Annual water | 6,150 L | 184,500 L |
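The per-GPU column can be derived from the assumptions section in a few lines; a sketch (A100, $2/hr spot price, PUE 1.3 — all from the stated assumptions):

```python
TDP_KW, PUE, HOURS = 0.300, 1.3, 8760
PRICE_PER_HR, CO2_KG_PER_KWH, WUE_L_PER_KWH = 2.00, 0.39, 1.8

kwh = TDP_KW * HOURS * PUE            # ≈3,416 kWh/yr facility energy
cost = PRICE_PER_HR * HOURS           # $17,520/yr at spot pricing
co2_t = kwh * CO2_KG_PER_KWH / 1000   # ≈1.33 t CO2/yr
water_l = kwh * WUE_L_PER_KWH         # ≈6,150 L/yr

for gpus in (1, 30):                  # one GPU, then the 10K-user fleet
    print(gpus, round(cost * gpus), round(kwh * gpus),
          round(co2_t * gpus, 2), round(water_l * gpus))
```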
The cost savings pay for the infrastructure. The environmental savings are the dividend. Companies adopting DeltaStore TSC don't need to choose between profitability and sustainability — they get both, from the same technical improvement, with zero additional effort.
GPU power draw: A100 SXM = 300W TDP, H100 SXM = 700W TDP (NVIDIA specifications)
PUE: 1.3 (conservative hyperscaler estimate; industry average is 1.58 per Uptime Institute 2023)
Carbon intensity: 0.39 kg CO₂/kWh (US national average, EPA eGRID 2023). Varies by region: 0.05 in Quebec/Norway, 0.90 in coal-heavy grids. Savings scale proportionally.
Water usage: 1.8 L/kWh (WUE estimate, based on Google/Microsoft sustainability reports 2023)
GPU pricing: A100 80GB at $2/hr spot pricing (representative cloud pricing, Q1 2026)
Average US home: 10,900 kWh/year (EIA 2023)
Average passenger car: 4.6 tonnes CO₂/year (EPA 2023)
Tree sequestration: 0.06 tonnes CO₂/year per mature tree (USDA Forest Service)
Compression model: LLaMA-3-7B, 2048-token system prompt, 4096-token per-user context, A100 80GB. GPU counts from DeltaStore TSC Part IV scaling model. See Part IV for full methodology.
All numbers are per-deployment. Industry projections assume independent deployments with similar parameters. Actual savings vary by model size, context length, and hardware generation.