Economics

The companion to the technical architecture. That one says "here's the programming model." This one says "here's why it changes the cost model."

Exact accounting: bills emerge from the lease lifecycle itself. No estimation, no sampling, no reconciliation.

Sealed audit trail: every billable event seals into a SHA-256 chain. The auditor and the meter read the same artifact.

Priority tiers: Guaranteed is never preempted, Standard yields to Guaranteed, and Scavenger yields to anyone.
The problem
Infrastructure operators buy capacity in instance-hours. They pay for a machine whether it's working or idle. When a GPU sits at 35% utilization — which is typical — the operator pays for 100% of the silicon.
The waste isn't laziness. It's a rational response to unsafe sharing. Operators overprovision because the software stack can't safely revoke access to a GPU, a block of RDMA-registered memory, or an NVMe namespace. If you can't take something back, you have to buy more of it.
fabricBIOS changes the unit of accounting. Every resource access is a lease with a TTL and mandatory teardown. The actual unit of work — capacity multiplied by time — becomes measurable, attributable, and billable.
The native unit
A GPU-hour tells you nothing about how much of the GPU was actually used, by whom, or for what. In grafOS, the native unit is the capacity-second: the amount of a specific resource held for one second.
Every lease event — allocation, renewal, release, expiry, revocation — is recorded with a timestamp, tenant ID, resource type, and capacity.
```rust
pub struct LeaseEvent {
    pub event_id: u128,
    pub timestamp: u64,
    pub tenant_id: TenantId,
    pub accounting_tag: Option<AccountingTag>,
    pub lease_id: u128,
    pub resource_type: ResourceKind, // Mem, Block, Net, Gpu, Cpu
    pub capacity: u64,
    pub node_id: NodeId,
    pub event_type: LeaseEventType,
    pub tier_kind: Option<TierKind>, // memory tier, for tier-aware metering
}
```
```rust
pub enum LeaseEventType {
    Allocated { duration_secs: u64 },
    Renewed { new_expires_at: u64 },
    Released,
    Expired,
    Revoked { reason: RevocationReason },
    Fenced,
}
```

Usage summaries are computed by reconstructing lease intervals from events and calculating pro-rated usage for partial overlaps. Peak concurrent capacity is computed with a sweep-line algorithm. This is not sampling; it is exact accounting from the lease lifecycle itself.
The optional `tier_kind` field tracks capacity-seconds per `(resource_kind, tier_kind)` pair, so tiered memory (DRAM vs. CXL vs. RDMA-attached) bills at the rate that actually matches the silicon a workload held.
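The interval reconstruction and sweep-line peak computation can be sketched as follows. This is a minimal illustration, not the grafOS API: the `Interval` shape and the function names are assumptions, standing in for intervals reconstructed from `LeaseEvent` records.

```rust
// A sketch of exact usage accounting over reconstructed lease intervals.
struct Interval {
    start: u64,    // seconds
    end: u64,      // exclusive end, seconds
    capacity: u64, // units held for the whole interval
}

/// Capacity-seconds consumed inside a billing window, pro-rated for
/// intervals that only partially overlap the window.
fn capacity_seconds(intervals: &[Interval], win_start: u64, win_end: u64) -> u64 {
    intervals
        .iter()
        .map(|iv| {
            let s = iv.start.max(win_start);
            let e = iv.end.min(win_end);
            if e > s { (e - s) * iv.capacity } else { 0 }
        })
        .sum()
}

/// Peak concurrent capacity via a sweep line over interval endpoints.
/// Tuples sort by (time, delta), so at equal timestamps releases
/// (negative deltas) apply first and touching intervals don't double-count.
fn peak_capacity(intervals: &[Interval]) -> u64 {
    let mut events: Vec<(u64, i64)> = Vec::new();
    for iv in intervals {
        events.push((iv.start, iv.capacity as i64));
        events.push((iv.end, -(iv.capacity as i64)));
    }
    events.sort();
    let (mut current, mut peak) = (0i64, 0i64);
    for (_, delta) in events {
        current += delta;
        peak = peak.max(current);
    }
    peak as u64
}
```

Because the inputs are exact lease intervals rather than samples, both numbers are deterministic for a given event log.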
Cost attribution
Every tenant carries an accounting tag. Every lease event inherits it. Cost attribution flows directly from the event log.
| Resource | Default rate (per capacity-second) |
|---|---|
| GPU | 0.01 |
| CPU | 0.002 |
| Memory | 0.001 |
| Block | 0.0005 |
| Network | 0.0003 |
Cost for a lease = capacity × duration_seconds × rate. No amortization, no averaging. The cost is deterministic because the lease lifecycle is deterministic.
For external billing systems, a webhook pusher sends lease event JSON on every lifecycle transition with exponential backoff retry. The billing system doesn't need to poll or estimate.
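A sketch of that cost formula against the default rate card, assuming a fixed-point representation in micro-units to keep the arithmetic exact (the enum and function names are illustrative):

```rust
// Deterministic cost attribution: rate card lookup x capacity-seconds.
// Rates mirror the default rate card table, scaled to micro-units.
#[derive(Clone, Copy)]
enum ResourceKind { Gpu, Cpu, Mem, Block, Net }

/// Rate in micro-units per capacity-second (0.01 => 10_000 micro-units).
fn rate_micro(kind: ResourceKind) -> u64 {
    match kind {
        ResourceKind::Gpu => 10_000,  // 0.01
        ResourceKind::Cpu => 2_000,   // 0.002
        ResourceKind::Mem => 1_000,   // 0.001
        ResourceKind::Block => 500,   // 0.0005
        ResourceKind::Net => 300,     // 0.0003
    }
}

/// Cost of a lease in micro-units: capacity x duration_seconds x rate.
fn lease_cost_micro(kind: ResourceKind, capacity: u64, duration_secs: u64) -> u64 {
    capacity * duration_secs * rate_micro(kind)
}
```

Two GPUs held for an hour, for example, cost 2 × 3600 × 0.01 = 72.0 in rate-card units; no averaging step ever enters the calculation.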
The trust foundation
An exact event log only matters if the events can be trusted. fabricBIOS and grafOS seal each lease lifecycle, preemption, and admission event into a SHA-256-linked audit chain — assembled at the emit point, anchored in a 32-byte head pointer on the daemon's narrow durable surface, and validated end-to-end by an upstream collector.
```rust
pub struct AuditRecord {
    pub kind: AuditEventKind, // typed, no free-form strings
    pub identity: WorkloadIdentity,
    pub reason: Option<Reason>,
    pub timestamp: u64,
    pub sequence: u64,
    pub prev_event_hash: Hash,    // 32-byte SHA-256 of prior record
    pub current_event_hash: Hash, // SHA-256(canonical_bytes)
    pub signature: Option<Signature>,
    pub affected_hcl_entry: Option<HclEntryId>,
    pub dra_claim_id: Option<DraClaimId>,
}
```
Each record's `current_event_hash` is computed over canonical bytes that include the previous record's hash. Tampering with any record, or losing one, re-hashes the chain and breaks at ingest. The `FileAnchorStore` persists the head pointer across restarts, so the chain survives a daemon crash without a "first event after restart" gap.
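The chaining-and-validation logic can be sketched in a few lines. This is purely illustrative: std's `DefaultHasher` stands in for SHA-256 to keep the example dependency-free, and the `Record` shape is a simplified stand-in for `AuditRecord`. The structure, where each record's hash covers the previous record's hash, is the point.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified audit record; DefaultHasher stands in for SHA-256 here.
struct Record {
    sequence: u64,
    payload: String,
    prev_hash: u64,
    current_hash: u64,
}

/// Seal a record: hash its canonical content plus the previous hash,
/// linking it into the chain.
fn seal(sequence: u64, payload: &str, prev_hash: u64) -> Record {
    let mut h = DefaultHasher::new();
    sequence.hash(&mut h);
    payload.hash(&mut h);
    prev_hash.hash(&mut h); // the link: prior hash is part of this hash
    let current_hash = h.finish();
    Record { sequence, payload: payload.to_string(), prev_hash, current_hash }
}

/// Recompute every link from the anchored head pointer. A tampered,
/// reordered, or missing record makes validation fail.
fn validate(chain: &[Record], anchor: u64) -> bool {
    let mut prev = anchor;
    for r in chain {
        if r.prev_hash != prev {
            return false;
        }
        if seal(r.sequence, &r.payload, prev).current_hash != r.current_hash {
            return false;
        }
        prev = r.current_hash;
    }
    true
}
```

An upstream collector doing this walk is what turns the 32-byte anchored head pointer into an end-to-end integrity guarantee.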
Three properties fall out of one design: billing you can audit, preemption you can replay, compliance you can prove. The auditor and the metering pipeline read the same artifact.
Quota enforcement
Hard: immediate rejection when exceeded. No ambiguity, no grace period.

Soft: allows overage but tracks it. For monitoring and alerting, not enforcement.

Burst: allows temporary over-quota with a TTL. The burst allocation auto-expires, backed by the same lease TTL mechanism.
```rust
pub enum LimitType {
    Hard,
    Soft,
    Burst { burst_limit: u64, burst_ttl_secs: u64 },
}
```

Quotas are enforced per-tenant, per-resource, per-node. Lease count limits prevent a single tenant from holding thousands of small leases.
Quota enforcement is exact because resource accounting is exact. There's no gap between "what the tenant thinks they're using" and "what they're actually using." The lease is the source of truth.
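A sketch of the admission decision for the three limit types, under the semantics above (the `Decision` enum and function name are assumptions, not the grafOS API):

```rust
// Per-tenant quota check: Hard rejects, Soft admits but flags,
// Burst admits up to a TTL-bound ceiling.
enum LimitType {
    Hard,
    Soft,
    Burst { burst_limit: u64, burst_ttl_secs: u64 },
}

#[derive(Debug, PartialEq)]
enum Decision {
    Admit,
    AdmitOverQuota,                        // tracked for alerting, not enforced
    AdmitBurst { expires_in_secs: u64 },   // over-quota portion auto-expires
    Reject,
}

fn check_quota(limit: &LimitType, quota: u64, in_use: u64, requested: u64) -> Decision {
    let would_use = in_use + requested;
    if would_use <= quota {
        return Decision::Admit;
    }
    match limit {
        LimitType::Hard => Decision::Reject,
        LimitType::Soft => Decision::AdmitOverQuota,
        LimitType::Burst { burst_limit, burst_ttl_secs } => {
            if would_use <= *burst_limit {
                // Rides on the same lease TTL mechanism as any other lease.
                Decision::AdmitBurst { expires_in_secs: *burst_ttl_secs }
            } else {
                Decision::Reject
            }
        }
    }
}
```

Because `in_use` comes from live leases rather than estimates, the check never disagrees with what the tenant actually holds.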
Priority and preemption
| Priority | Behavior |
|---|---|
| Guaranteed | Never preempted. Highest cost. |
| Standard | Preempted by Guaranteed requests when capacity is scarce. |
| Scavenger | Preempted by anyone. Cheapest tier. Uses otherwise-idle capacity. |
When a Guaranteed request arrives and capacity is insufficient, the preemption manager identifies victims — leases at strictly lower priority. Within the same tier, leases with the shortest remaining TTL are preempted first.
```rust
pub struct VictimLease {
    pub lease_id: u64,
    pub holder_id: u64,
    pub priority: Priority,
    pub capacity: u64,
    pub remaining_ttl_secs: u64,
    pub tenant_id: TenantId,
}
```

Preemption flows from the lease primitive: revoking a lease triggers mandatory teardown, the resource is deterministically reclaimed, and the higher-priority request proceeds. Every preemption generates an audit event.
This is the mechanism that makes scavenger pricing possible. Idle GPUs earn revenue at scavenger rates instead of earning nothing. The operator's floor utilization rises because scavenger tenants are safe to admit — they can always be evicted.
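The victim-selection rule stated above can be sketched directly: only strictly lower-priority leases qualify, ordered by priority and then shortest remaining TTL, accumulating until the capacity deficit is covered. The `Candidate` struct and function name are illustrative simplifications of `VictimLease`.

```rust
// Victim selection for preemption. Derived Ord on Priority follows
// variant order: Scavenger < Standard < Guaranteed.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Priority { Scavenger, Standard, Guaranteed }

#[derive(Clone, Debug)]
struct Candidate {
    lease_id: u64,
    priority: Priority,
    capacity: u64,
    remaining_ttl_secs: u64,
}

/// Returns the lease IDs to revoke, or None if even preempting every
/// eligible lease cannot cover the deficit.
fn select_victims(mut pool: Vec<Candidate>, requester: Priority, deficit: u64) -> Option<Vec<u64>> {
    // Only strictly lower-priority leases are eligible victims.
    pool.retain(|c| c.priority < requester);
    // Lowest priority first; within a tier, shortest remaining TTL first.
    pool.sort_by_key(|c| (c.priority, c.remaining_ttl_secs));
    let mut freed = 0;
    let mut victims = Vec::new();
    for c in pool {
        if freed >= deficit {
            break;
        }
        freed += c.capacity;
        victims.push(c.lease_id);
    }
    if freed >= deficit { Some(victims) } else { None }
}
```

Preferring short-TTL victims within a tier minimizes lost work: those leases were about to expire anyway.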
Placement scoring
Where you place a workload matters as much as whether you admit it. Placement scoring optimizes across multiple dimensions simultaneously.
One profile maximizes fault tolerance: distribute load so no single failure is catastrophic. Weights: Pressure 0.5, Fit 0.2, Locality 0.2, Fragmentation 0.1.

The other minimizes cost: pack nodes tight, leave others completely empty, power down or sell idle capacity. Weights: Fragmentation 0.5, Fit 0.2, Locality 0.2, Pressure 0.1.
Heat-map steering adds a dynamic component: nodes running hot get scored down. The system naturally routes work away from trouble without explicit health checks or circuit breakers.
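A minimal sketch of scoring a node under one of these weight profiles, with a multiplicative heat penalty; the struct shapes and the exact form of the heat term are assumptions, not the grafOS scoring function:

```rust
// Weighted multi-dimensional placement score. Each signal is assumed
// normalized to [0, 1], higher meaning more attractive for placement.
struct NodeSignals {
    pressure: f64,
    fit: f64,
    locality: f64,
    fragmentation: f64,
    heat: f64, // 0.0 = cool, 1.0 = running hot
}

struct Weights {
    pressure: f64,
    fit: f64,
    locality: f64,
    fragmentation: f64,
}

fn score(n: &NodeSignals, w: &Weights) -> f64 {
    let base = w.pressure * n.pressure
        + w.fit * n.fit
        + w.locality * n.locality
        + w.fragmentation * n.fragmentation;
    // Heat-map steering: hot nodes score down, cool nodes unaffected.
    base * (1.0 - n.heat)
}
```

Swapping the weight profile changes placement behavior without touching any mechanism, which is why one scorer can serve both the fault-tolerance and cost strategies.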
Waste detection
The grafOS profiler classifies every lease into a waste category — computed from the lease lifecycle itself, not from host-level monitoring.
| Classification | Trigger | What it means |
|---|---|---|
| Overprovisioned | utilization < 25% | Lease holds more capacity than it uses |
| Idle | active < 10% | Lease exists but does almost nothing |
| Fragmented | many small leases | Overhead from lease management exceeds value |
| Premium waste | high-value, low-util | Expensive hardware sitting idle |
| Healthy | none of the above | Lease is well-sized and active |
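The table's thresholds suggest a simple classifier. This sketch is an assumption about check ordering and the unstated cutoffs: it checks idleness first, treats "premium" as a flag on the hardware class, and picks an illustrative lease-count cutoff for fragmentation.

```rust
// Waste classification from lease-lifecycle metrics.
// `utilization` and `active` are fractions in [0, 1].
#[derive(Debug, PartialEq)]
enum Waste { Overprovisioned, Idle, Fragmented, PremiumWaste, Healthy }

fn classify(utilization: f64, active: f64, lease_count: u64, premium_hardware: bool) -> Waste {
    if active < 0.10 {
        Waste::Idle // lease exists but does almost nothing
    } else if premium_hardware && utilization < 0.25 {
        Waste::PremiumWaste // expensive hardware sitting mostly idle
    } else if utilization < 0.25 {
        Waste::Overprovisioned // holds more capacity than it uses
    } else if lease_count > 1_000 {
        Waste::Fragmented // illustrative cutoff for "many small leases"
    } else {
        Waste::Healthy
    }
}
```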
Sensitivity-aware placement
A training job is GPU-bound. A serving workload is memory-bound. A data pipeline is storage-bound. The grafOS profiler infers workload sensitivity from observed behavior.
| Level | Trigger |
|---|---|
| Critical | Resource accounts for > 50% of cost, or > 30% of time spent waiting |
| Moderate | Resource accounts for > 20% of cost |
| Low | Resource accounts for > 5% of cost |
| Indifferent | Resource used but not a significant cost driver |
| Unused | Resource not used at all |
A workload that's Critical on GPU and Indifferent on Network should be placed where GPU is abundant, even if that means paying more for network. The system learns which resource is the binding constraint for each workload, and places accordingly.
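The sensitivity thresholds in the table map directly onto a cascading check; a minimal sketch, assuming `cost_share` and `wait_share` are fractions of total cost and of time spent waiting (the names are illustrative):

```rust
// Per-resource sensitivity inference from observed behavior.
#[derive(Debug, PartialEq)]
enum Sensitivity { Critical, Moderate, Low, Indifferent, Unused }

fn sensitivity(cost_share: f64, wait_share: f64, used: bool) -> Sensitivity {
    if !used {
        Sensitivity::Unused
    } else if cost_share > 0.50 || wait_share > 0.30 {
        Sensitivity::Critical // binding constraint: cost- or wait-dominated
    } else if cost_share > 0.20 {
        Sensitivity::Moderate
    } else if cost_share > 0.05 {
        Sensitivity::Low
    } else {
        Sensitivity::Indifferent
    }
}
```

The wait-share clause matters: a resource can be cheap yet still be the binding constraint if the workload spends most of its time blocked on it.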
The business case
GPU clusters at 30–50% utilization represent billions in stranded capital annually. If lease-based authority moves effective utilization from 40% to 60%, each dollar of hardware delivers half again as much useful work, without buying a single additional GPU. The savings compound: fewer machines means less power, less cooling, less rack space, less networking.
How it connects
Everything above emerges from the lease primitive. Every layer derives from the one above it. No side channels, no estimation, no reconciliation jobs.
```
Lease lifecycle (create/renew/expire/revoke)
→ Event log (append-only, timestamped)
→ Audit chain (sealed, hash-linked, signed)
→ Capacity-seconds accounting (exact, per-tenant)
→ Quota enforcement (hard/soft/burst)
→ Cost attribution (rate card × capacity-seconds)
→ Waste detection (utilization vs. capacity)
→ Preemption decisions (priority × remaining TTL)
→ Placement scoring (constraint arbitrage)
→ External billing (webhook)
```

The lease is the single source of truth for what was used, by whom, for how long, and at what cost. This is what "lease-native" means at the economic level: the unit of accounting and the unit of enforcement are the same thing.