Every GPU that DeltaStore TSC removes from a serving fleet eliminates 300–700 watts of continuous power draw, 24/7/365. At scale, the energy and carbon savings dwarf the cost savings. This paper quantifies the environmental dividend of serving-level KV cache compression.
The AI industry's environmental conversation focuses on training. A single GPT-4-scale training run consumes ~50 GWh — enough to power 4,600 US homes for a year. That number makes headlines.
But inference is the iceberg below the waterline. Once a model is deployed, it serves requests 24/7 for months or years. The cumulative energy of inference surpasses that of training within the first few months of production deployment. And the KV cache — the per-user memory that grows with context length and conversation history — is the primary driver of GPU count in serving fleets.
More users = more KV caches = more GPUs = more power. The relationship is linear without compression. Every GPU in a serving fleet draws 300–700 watts continuously, whether it's processing a request or holding KV cache memory idle between turns.
Without compression, serving 10,000 users takes 34 A100 80GB GPUs at 300W each — a 10.2 kW continuous draw. That's 89,352 kWh/yr of raw GPU power, or 116,158 kWh/yr including data center cooling and overhead (PUE 1.3) — equivalent to powering about 10.7 US homes.
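The baseline arithmetic can be checked in a few lines; a quick sketch using the paper's constants (34-GPU fleet, 300W TDP, PUE 1.3, EIA home figure):

```python
# Baseline serving fleet: 34 A100 80GB GPUs for 10K users (no compression).
GPUS, TDP_W, PUE = 34, 300, 1.3
HOURS_PER_YEAR = 8760
HOME_KWH_PER_YEAR = 10_900   # average US home (EIA 2023)

raw_kw = GPUS * TDP_W / 1000               # continuous draw: 10.2 kW
raw_kwh = raw_kw * HOURS_PER_YEAR          # 89,352 kWh/yr of raw GPU power
facility_kwh = raw_kwh * PUE               # 116,158 kWh/yr with cooling/overhead
homes = facility_kwh / HOME_KWH_PER_YEAR   # ≈10.7 US homes

print(f"{raw_kw:.1f} kW raw, {facility_kwh:,.0f} kWh/yr, ≈{homes:.1f} homes")
```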
DeltaStore TSC compresses KV caches at the serving level — not just per-user quantization, but cross-user prefix deduplication and temporal delta coding. The system prompt KV cache (typically 60%+ of memory) is stored once, regardless of how many users are connected. The result: sub-linear GPU scaling.
| Fleet Size | Without TSC | With TSC | GPUs Saved | % Reduction |
|---|---|---|---|---|
| 2,000 users | 7 GPUs | 3 GPUs | 4 | 57% |
| 5,000 users | 17 GPUs | 3 GPUs | 14 | 82% |
| 10,000 users | 34 GPUs | 4 GPUs | 30 | 88% |
The reduction percentage increases with fleet size because shared prefix deduplication amortizes more aggressively. At 2K users, 57% of GPUs are eliminated. At 10K users, 88%. This is the sub-linear scaling property in action.
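The reduction percentages follow directly from the GPU counts in the table; a quick check:

```python
# GPU counts per fleet size, taken from the scaling table above: users -> (without TSC, with TSC).
fleets = {2_000: (7, 3), 5_000: (17, 3), 10_000: (34, 4)}

for users, (without_tsc, with_tsc) in fleets.items():
    reduction = (without_tsc - with_tsc) / without_tsc
    print(f"{users:>6} users: {without_tsc - with_tsc} GPUs saved ({reduction:.0%})")
```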
An NVIDIA A100 SXM draws 300W at TDP. Data centers add cooling, networking, and facility overhead — captured by the Power Usage Effectiveness (PUE) multiplier. The industry average PUE is 1.58; hyperscalers achieve 1.1–1.3. We use PUE 1.3 (conservative hyperscaler estimate).
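Per-GPU facility energy is just TDP × hours × PUE; a minimal helper (the function name is mine, for illustration) using the paper's TDP figures:

```python
def annual_facility_kwh(tdp_w: float, pue: float = 1.3, hours: int = 8760) -> float:
    """Facility-level annual energy for one GPU at sustained TDP."""
    return tdp_w / 1000 * hours * pue

print(annual_facility_kwh(300))  # A100 SXM: ≈3,416.4 kWh/yr
print(annual_facility_kwh(700))  # H100 SXM: ≈7,971.6 kWh/yr
```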
| Metric | 2K Users | 5K Users | 10K Users |
|---|---|---|---|
| GPUs eliminated | 4 | 14 | 30 |
| Raw power saved (kW) | 1.2 | 4.2 | 9.0 |
| Facility power saved (kW, PUE 1.3) | 1.56 | 5.46 | 11.7 |
| Annual energy saved (MWh) | 13.7 | 47.8 | 102.5 |
| CO₂ avoided (tonnes) | 5.3 | 18.7 | 40.0 |
| Equivalent homes powered | 1.3 | 4.4 | 9.4 |
| Equivalent cars removed | 1.2 | 4.1 | 8.7 |
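The table above can be reproduced from the constants in the assumptions section; a sketch of the arithmetic (the function and its name are illustrative, not part of DeltaStore TSC):

```python
# Savings model behind the table: GPUs saved × TDP × PUE × hours, then
# converted with the paper's carbon, home, and car factors.
PUE, CO2_KG_PER_KWH = 1.3, 0.39       # PUE 1.3; EPA eGRID US average
HOME_KWH, CAR_CO2_T = 10_900, 4.6     # EIA 2023; EPA 2023

def fleet_savings(gpus_saved: int, tdp_w: float) -> dict:
    raw_kw = gpus_saved * tdp_w / 1000
    facility_kw = raw_kw * PUE
    mwh_yr = facility_kw * 8760 / 1000
    co2_t = mwh_yr * CO2_KG_PER_KWH           # tonnes: MWh × kg/kWh
    return {
        "raw_kw": round(raw_kw, 2),
        "facility_kw": round(facility_kw, 2),
        "mwh_yr": round(mwh_yr, 1),
        "co2_t": round(co2_t, 1),
        "homes": round(mwh_yr * 1000 / HOME_KWH, 1),
        "cars": round(co2_t / CAR_CO2_T, 1),
    }

print(fleet_savings(30, 300))   # A100, 10K users: 102.5 MWh, 40.0 t CO2
print(fleet_savings(30, 700))   # H100, 10K users: 239.1 MWh, 93.3 t CO2
```

Swapping the TDP argument from 300 to 700 reproduces the H100 table as well — the compression ratios are unchanged, only the watts-per-GPU differ.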
The NVIDIA H100 SXM draws 700W at TDP — more than double the A100. As the industry migrates to H100 (and eventually B100/B200), the energy savings from GPU reduction grow proportionally. The same compression ratios apply; only the watts-per-GPU change.
| Metric | 2K Users | 5K Users | 10K Users |
|---|---|---|---|
| GPUs eliminated | 4 | 14 | 30 |
| Raw power saved (kW) | 2.8 | 9.8 | 21.0 |
| Facility power saved (kW, PUE 1.3) | 3.64 | 12.74 | 27.3 |
| Annual energy saved (MWh) | 31.9 | 111.6 | 239.1 |
| CO₂ avoided (tonnes) | 12.4 | 43.5 | 93.3 |
| Equivalent homes powered | 2.9 | 10.2 | 21.9 |
| Equivalent cars removed | 2.7 | 9.5 | 20.3 |
That's 20 cars off the road, or the annual carbon sequestration of roughly 1,555 mature trees. From a single deployment. And the savings grow super-linearly as the fleet scales — the next 10K users adds even fewer GPUs.
A single 10K-user deployment is meaningful. But the AI inference market is growing at 30%+ annually, and the largest providers operate hundreds of thousands of GPUs. What happens when DeltaStore TSC is applied across a fleet of deployments?
| Scale | GPUs Saved | Energy (GWh/yr) | CO₂ (tonnes) | Cars Equivalent |
|---|---|---|---|---|
| 1 deployment (10K users, A100) | 30 | 0.10 | 40 | 8.7 |
| 10 deployments (100K users) | 300 | 1.02 | 400 | 87 |
| 100 deployments (1M users) | 3,000 | 10.2 | 4,000 | 870 |
| 1,000 deployments (10M users, H100) | 30,000 | 239 | 93,300 | 20,283 |
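The industry table is a straight linear extrapolation of per-deployment savings (the paper's stated assumption of independent deployments); a sketch using the A100 per-deployment figures:

```python
# Per-deployment A100 savings at 10K users: 30 GPUs, 0.1025 GWh/yr, 40 t CO2.
per_deployment = {"gpus": 30, "gwh_yr": 0.1025, "co2_t": 40.0}

for n in (1, 10, 100):
    print(n, {k: round(v * n, 2) for k, v in per_deployment.items()})
```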
At the 10M-user scale on H100 hardware — well within reach for a top-10 AI provider — DeltaStore TSC eliminates 30,000 GPUs from the global fleet. That's 239 GWh/yr of energy savings and 93,300 tonnes of CO₂ avoided annually — equivalent to removing over 20,000 cars from the road or the carbon sequestration of 1.5 million trees.
GPUs need cooling. Data centers consume approximately 1.8 liters of water per kWh of energy used. This water is evaporated through cooling towers and is not recoverable. As AI clusters grow, water consumption has become a serious concern — Microsoft's water consumption increased 34% in 2022, largely driven by AI workloads.
A100 fleet (10K users): 184,500 liters of water saved per year. That's 48,700 gallons — enough to fill a residential swimming pool 2.4 times.
H100 fleet (10K users): 430,500 liters of water saved per year. That's 113,700 gallons — enough to supply a household's water needs for 3.1 years.
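The water figures follow from energy saved × WUE; a sketch (the function name is mine), assuming the 1.8 L/kWh figure and 3.785 L per US gallon:

```python
WUE_L_PER_KWH = 1.8       # data center water usage effectiveness (assumed)
LITERS_PER_GALLON = 3.785

def water_saved(kwh_saved: float) -> tuple[float, float]:
    """Return (liters, gallons) of cooling water avoided per year."""
    liters = kwh_saved * WUE_L_PER_KWH
    return liters, liters / LITERS_PER_GALLON

print(water_saved(102_492))   # A100 10K-user fleet: ≈184,486 L, ≈48,741 gal
print(water_saved(239_148))   # H100 10K-user fleet: ≈430,466 L, ≈113,730 gal
```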
The financial savings from DeltaStore TSC are significant — $1.35M/year at 10K users. But the environmental savings compound on top. Every GPU eliminated delivers savings across four dimensions simultaneously.
| Dimension | Per GPU Saved (A100) | 30 GPUs (10K users) |
|---|---|---|
| Annual cost | $17,520 | $525,600 |
| Annual energy | 3,416 kWh | 102,492 kWh |
| Annual CO₂ | 1.33 tonnes | 40.0 tonnes |
| Annual water | 6,150 L | 184,500 L |
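The per-GPU column can be derived from the assumptions section in a few lines; a sketch (A100, $2/hr spot price, PUE 1.3 — all from the stated assumptions):

```python
TDP_KW, PUE, HOURS = 0.300, 1.3, 8760
PRICE_PER_HR, CO2_KG_PER_KWH, WUE_L_PER_KWH = 2.00, 0.39, 1.8

kwh = TDP_KW * HOURS * PUE            # ≈3,416 kWh/yr facility energy
cost = PRICE_PER_HR * HOURS           # $17,520/yr at spot pricing
co2_t = kwh * CO2_KG_PER_KWH / 1000   # ≈1.33 t CO2/yr
water_l = kwh * WUE_L_PER_KWH         # ≈6,150 L/yr

for gpus in (1, 30):                  # one GPU, then the 10K-user fleet
    print(gpus, round(cost * gpus), round(kwh * gpus),
          round(co2_t * gpus, 2), round(water_l * gpus))
```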
The cost savings pay for the infrastructure. The environmental savings are the dividend. Companies adopting DeltaStore TSC don't need to choose between profitability and sustainability — they get both, from the same technical improvement, with zero additional effort.
GPU power draw: A100 SXM = 300W TDP, H100 SXM = 700W TDP (NVIDIA specifications)
PUE: 1.3 (conservative hyperscaler estimate; industry average is 1.58 per Uptime Institute 2023)
Carbon intensity: 0.39 kg CO₂/kWh (US national average, EPA eGRID 2023). Varies by region: 0.05 in Quebec/Norway, 0.90 in coal-heavy grids. Savings scale proportionally.
Water usage: 1.8 L/kWh (WUE estimate, based on Google/Microsoft sustainability reports 2023)
GPU pricing: A100 80GB at $2/hr spot pricing (representative cloud pricing, Q1 2026)
Average US home: 10,900 kWh/year (EIA 2023)
Average passenger car: 4.6 tonnes CO₂/year (EPA 2023)
Tree sequestration: 0.06 tonnes CO₂/year per mature tree (USDA Forest Service)
Compression model: LLaMA-3-7B, 2048-token system prompt, 4096-token per-user context, A100 80GB. GPU counts from DeltaStore TSC Part IV scaling model. See Part IV for full methodology.
All numbers are per-deployment. Industry projections assume independent deployments with similar parameters. Actual savings vary by model size, context length, and hardware generation.