Economics

The companion to the technical architecture. That one says "here's the programming model." This one says "here's why it changes the cost model."

Exact accounting: bills emerge from the lease lifecycle itself. No estimation, no sampling, no reconciliation.

Sealed audit trail: every billable event seals into a SHA-256 chain. The auditor and the meter read the same artifact.

Priority tiers: Guaranteed is never preempted, Standard yields to Guaranteed, and Scavenger yields to anyone.
The problem
Infrastructure operators buy capacity in instance-hours. They pay for a machine whether it's working or idle. When a GPU sits at 35% utilization — which is typical — the operator pays for 100% of the silicon.
The waste isn't laziness. It's a rational response to unsafe sharing. Operators overprovision because the software stack can't safely revoke access to a GPU, a block of RDMA-registered memory, or an NVMe namespace. If you can't take something back, you have to buy more of it.
fabricBIOS changes the unit of accounting. Every resource access is a lease with a TTL and mandatory teardown. The actual unit of work — capacity multiplied by time — becomes measurable, attributable, and billable.
The native unit
A GPU-hour tells you nothing about how much of the GPU was actually used, by whom, or for what. In grafOS, the native unit is the capacity-second: the amount of a specific resource held for one second.
Every lease event — allocation, renewal, release, expiry, revocation — is recorded with a timestamp, tenant ID, resource type, and capacity.
```rust
pub struct LeaseEvent {
    pub event_id: u128,
    pub timestamp: u64,
    pub tenant_id: TenantId,
    pub accounting_tag: Option<AccountingTag>,
    pub lease_id: u128,
    pub resource_type: ResourceKind, // Mem, Block, Net, Gpu, Cpu
    pub capacity: u64,
    pub node_id: NodeId,
    pub event_type: LeaseEventType,
    pub tier_kind: Option<TierKind>, // memory tier, for tier-aware metering
}
```
```rust
pub enum LeaseEventType {
    Allocated { duration_secs: u64 },
    Renewed { new_expires_at: u64 },
    Released,
    Expired,
    Revoked { reason: RevocationReason },
    Fenced,
}
```

Usage summaries are computed by reconstructing lease intervals from events and calculating pro-rated usage for partial overlaps. Peak concurrent capacity is computed with a sweep-line algorithm. This is not sampling; it is exact accounting from the lease lifecycle itself.
The optional `tier_kind` field tracks capacity-seconds per `(resource_kind, tier_kind)` pair, so tiered memory (DRAM vs. CXL vs. RDMA-attached) bills at the rate that actually matches the silicon a workload held.
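The interval reconstruction and sweep-line peak computation can be sketched as follows. This is a minimal illustration, not the grafOS API: the `Interval` shape and the function names are assumptions, standing in for intervals reconstructed from `LeaseEvent` records.

```rust
// A sketch of exact usage accounting over reconstructed lease intervals.
struct Interval {
    start: u64,    // seconds
    end: u64,      // exclusive end, seconds
    capacity: u64, // units held for the whole interval
}

/// Capacity-seconds consumed inside a billing window, pro-rated for
/// intervals that only partially overlap the window.
fn capacity_seconds(intervals: &[Interval], win_start: u64, win_end: u64) -> u64 {
    intervals
        .iter()
        .map(|iv| {
            let s = iv.start.max(win_start);
            let e = iv.end.min(win_end);
            if e > s { (e - s) * iv.capacity } else { 0 }
        })
        .sum()
}

/// Peak concurrent capacity via a sweep line over interval endpoints.
/// Tuples sort by (time, delta), so at equal timestamps releases
/// (negative deltas) apply first and touching intervals don't double-count.
fn peak_capacity(intervals: &[Interval]) -> u64 {
    let mut events: Vec<(u64, i64)> = Vec::new();
    for iv in intervals {
        events.push((iv.start, iv.capacity as i64));
        events.push((iv.end, -(iv.capacity as i64)));
    }
    events.sort();
    let (mut current, mut peak) = (0i64, 0i64);
    for (_, delta) in events {
        current += delta;
        peak = peak.max(current);
    }
    peak as u64
}
```

Because the inputs are exact lease intervals rather than samples, both numbers are deterministic for a given event log.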
Cost attribution
Every tenant carries an accounting tag. Every lease event inherits it. Cost attribution flows directly from the event log.
| Resource | Default rate (per capacity-second) |
|---|---|
| GPU | 0.01 |
| CPU | 0.002 |
| Memory | 0.001 |
| Block | 0.0005 |
| Network | 0.0003 |
Cost for a lease = capacity × duration_seconds × rate. No amortization, no averaging. The cost is deterministic because the lease lifecycle is deterministic.
For external billing systems, a webhook pusher sends lease event JSON on every lifecycle transition with exponential backoff retry. The billing system doesn't need to poll or estimate.
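A sketch of that cost formula against the default rate card, assuming a fixed-point representation in micro-units to keep the arithmetic exact (the enum and function names are illustrative):

```rust
// Deterministic cost attribution: rate card lookup x capacity-seconds.
// Rates mirror the default rate card table, scaled to micro-units.
#[derive(Clone, Copy)]
enum ResourceKind { Gpu, Cpu, Mem, Block, Net }

/// Rate in micro-units per capacity-second (0.01 => 10_000 micro-units).
fn rate_micro(kind: ResourceKind) -> u64 {
    match kind {
        ResourceKind::Gpu => 10_000,  // 0.01
        ResourceKind::Cpu => 2_000,   // 0.002
        ResourceKind::Mem => 1_000,   // 0.001
        ResourceKind::Block => 500,   // 0.0005
        ResourceKind::Net => 300,     // 0.0003
    }
}

/// Cost of a lease in micro-units: capacity x duration_seconds x rate.
fn lease_cost_micro(kind: ResourceKind, capacity: u64, duration_secs: u64) -> u64 {
    capacity * duration_secs * rate_micro(kind)
}
```

Two GPUs held for an hour, for example, cost 2 × 3600 × 0.01 = 72.0 in rate-card units; no averaging step ever enters the calculation.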
The trust foundation
An exact event log only matters if the events can be trusted. fabricBIOS and grafOS seal each lease lifecycle, preemption, and admission event into a SHA-256-linked audit chain — assembled at the emit point, anchored in a 32-byte head pointer on the daemon's narrow durable surface, and validated end-to-end by an upstream collector.
```rust
pub struct AuditRecord {
    pub kind: AuditEventKind, // typed, no free-form strings
    pub identity: WorkloadIdentity,
    pub reason: Option<Reason>,
    pub timestamp: u64,
    pub sequence: u64,
    pub prev_event_hash: Hash,    // 32-byte SHA-256 of prior record
    pub current_event_hash: Hash, // SHA-256(canonical_bytes)
    pub signature: Option<Signature>,
    pub affected_hcl_entry: Option<HclEntryId>,
    pub dra_claim_id: Option<DraClaimId>,
}
```
Each record's `current_event_hash` is computed over canonical bytes that include the previous record's hash. Tampering with any record, or losing one, re-hashes the chain and breaks at ingest. The `FileAnchorStore` persists the head pointer across restarts, so the chain survives a daemon crash without a "first event after restart" gap.
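The chaining-and-validation logic can be sketched in a few lines. This is purely illustrative: std's `DefaultHasher` stands in for SHA-256 to keep the example dependency-free, and the `Record` shape is a simplified stand-in for `AuditRecord`. The structure, where each record's hash covers the previous record's hash, is the point.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified audit record; DefaultHasher stands in for SHA-256 here.
struct Record {
    sequence: u64,
    payload: String,
    prev_hash: u64,
    current_hash: u64,
}

/// Seal a record: hash its canonical content plus the previous hash,
/// linking it into the chain.
fn seal(sequence: u64, payload: &str, prev_hash: u64) -> Record {
    let mut h = DefaultHasher::new();
    sequence.hash(&mut h);
    payload.hash(&mut h);
    prev_hash.hash(&mut h); // the link: prior hash is part of this hash
    let current_hash = h.finish();
    Record { sequence, payload: payload.to_string(), prev_hash, current_hash }
}

/// Recompute every link from the anchored head pointer. A tampered,
/// reordered, or missing record makes validation fail.
fn validate(chain: &[Record], anchor: u64) -> bool {
    let mut prev = anchor;
    for r in chain {
        if r.prev_hash != prev {
            return false;
        }
        if seal(r.sequence, &r.payload, prev).current_hash != r.current_hash {
            return false;
        }
        prev = r.current_hash;
    }
    true
}
```

An upstream collector doing this walk is what turns the 32-byte anchored head pointer into an end-to-end integrity guarantee.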
Three properties fall out of one design: billing you can audit, preemption you can replay, compliance you can prove. The auditor and the metering pipeline read the same artifact.
Quota enforcement
Hard: immediate rejection when exceeded. No ambiguity, no grace period.

Soft: allows overage but tracks it. For monitoring and alerting, not enforcement.

Burst: allows temporary over-quota with a TTL. The burst allocation auto-expires, backed by the same lease TTL mechanism.
```rust
pub enum LimitType {
    Hard,
    Soft,
    Burst { burst_limit: u64, burst_ttl_secs: u64 },
}
```

Quotas are enforced per-tenant, per-resource, per-node. Lease count limits prevent a single tenant from holding thousands of small leases.
Quota enforcement is exact because resource accounting is exact. There's no gap between "what the tenant thinks they're using" and "what they're actually using." The lease is the source of truth.
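A sketch of the admission decision for the three limit types, under the semantics above (the `Decision` enum and function name are assumptions, not the grafOS API):

```rust
// Per-tenant quota check: Hard rejects, Soft admits but flags,
// Burst admits up to a TTL-bound ceiling.
enum LimitType {
    Hard,
    Soft,
    Burst { burst_limit: u64, burst_ttl_secs: u64 },
}

#[derive(Debug, PartialEq)]
enum Decision {
    Admit,
    AdmitOverQuota,                        // tracked for alerting, not enforced
    AdmitBurst { expires_in_secs: u64 },   // over-quota portion auto-expires
    Reject,
}

fn check_quota(limit: &LimitType, quota: u64, in_use: u64, requested: u64) -> Decision {
    let would_use = in_use + requested;
    if would_use <= quota {
        return Decision::Admit;
    }
    match limit {
        LimitType::Hard => Decision::Reject,
        LimitType::Soft => Decision::AdmitOverQuota,
        LimitType::Burst { burst_limit, burst_ttl_secs } => {
            if would_use <= *burst_limit {
                // Rides on the same lease TTL mechanism as any other lease.
                Decision::AdmitBurst { expires_in_secs: *burst_ttl_secs }
            } else {
                Decision::Reject
            }
        }
    }
}
```

Because `in_use` comes from live leases rather than estimates, the check never disagrees with what the tenant actually holds.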
Priority and preemption
| Priority | Behavior |
|---|---|
| Guaranteed | Never preempted. Highest cost. |
| Standard | Preempted by Guaranteed requests when capacity is scarce. |
| Scavenger | Preempted by anyone. Cheapest tier. Uses otherwise-idle capacity. |
When a Guaranteed request arrives and capacity is insufficient, the preemption manager identifies victims — leases at strictly lower priority. Within the same tier, leases with the shortest remaining TTL are preempted first.
```rust
pub struct VictimLease {
    pub lease_id: u64,
    pub holder_id: u64,
    pub priority: Priority,
    pub capacity: u64,
    pub remaining_ttl_secs: u64,
    pub tenant_id: TenantId,
}
```

Preemption flows from the lease primitive: revoking a lease triggers mandatory teardown, the resource is deterministically reclaimed, and the higher-priority request proceeds. Every preemption generates an audit event.
This is the mechanism that makes scavenger pricing possible. Idle GPUs earn revenue at scavenger rates instead of earning nothing. The operator's floor utilization rises because scavenger tenants are safe to admit — they can always be evicted.
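The victim-selection rule stated above can be sketched directly: only strictly lower-priority leases qualify, ordered by priority and then shortest remaining TTL, accumulating until the capacity deficit is covered. The `Candidate` struct and function name are illustrative simplifications of `VictimLease`.

```rust
// Victim selection for preemption. Derived Ord on Priority follows
// variant order: Scavenger < Standard < Guaranteed.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Priority { Scavenger, Standard, Guaranteed }

#[derive(Clone, Debug)]
struct Candidate {
    lease_id: u64,
    priority: Priority,
    capacity: u64,
    remaining_ttl_secs: u64,
}

/// Returns the lease IDs to revoke, or None if even preempting every
/// eligible lease cannot cover the deficit.
fn select_victims(mut pool: Vec<Candidate>, requester: Priority, deficit: u64) -> Option<Vec<u64>> {
    // Only strictly lower-priority leases are eligible victims.
    pool.retain(|c| c.priority < requester);
    // Lowest priority first; within a tier, shortest remaining TTL first.
    pool.sort_by_key(|c| (c.priority, c.remaining_ttl_secs));
    let mut freed = 0;
    let mut victims = Vec::new();
    for c in pool {
        if freed >= deficit {
            break;
        }
        freed += c.capacity;
        victims.push(c.lease_id);
    }
    if freed >= deficit { Some(victims) } else { None }
}
```

Preferring short-TTL victims within a tier minimizes lost work: those leases were about to expire anyway.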
Placement scoring
Where you place a workload matters as much as whether you admit it. Placement scoring optimizes across multiple dimensions simultaneously.
One profile maximizes fault tolerance: distribute load so no single failure is catastrophic. Weights: Pressure 0.5, Fit 0.2, Locality 0.2, Fragmentation 0.1.

The other minimizes cost: pack nodes tight, leave others completely empty, power down or sell idle capacity. Weights: Fragmentation 0.5, Fit 0.2, Locality 0.2, Pressure 0.1.
Heat-map steering adds a dynamic component: nodes running hot get scored down. The system naturally routes work away from trouble without explicit health checks or circuit breakers.
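A minimal sketch of scoring a node under one of these weight profiles, with a multiplicative heat penalty; the struct shapes and the exact form of the heat term are assumptions, not the grafOS scoring function:

```rust
// Weighted multi-dimensional placement score. Each signal is assumed
// normalized to [0, 1], higher meaning more attractive for placement.
struct NodeSignals {
    pressure: f64,
    fit: f64,
    locality: f64,
    fragmentation: f64,
    heat: f64, // 0.0 = cool, 1.0 = running hot
}

struct Weights {
    pressure: f64,
    fit: f64,
    locality: f64,
    fragmentation: f64,
}

fn score(n: &NodeSignals, w: &Weights) -> f64 {
    let base = w.pressure * n.pressure
        + w.fit * n.fit
        + w.locality * n.locality
        + w.fragmentation * n.fragmentation;
    // Heat-map steering: hot nodes score down, cool nodes unaffected.
    base * (1.0 - n.heat)
}
```

Swapping the weight profile changes placement behavior without touching any mechanism, which is why one scorer can serve both the fault-tolerance and cost strategies.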
Waste detection
The grafOS profiler classifies every lease into a waste category — computed from the lease lifecycle itself, not from host-level monitoring.
| Classification | Trigger | What it means |
|---|---|---|
| Overprovisioned | utilization < 25% | Lease holds more capacity than it uses |
| Idle | active < 10% | Lease exists but does almost nothing |
| Fragmented | many small leases | Overhead from lease management exceeds value |
| Premium waste | high-value, low-util | Expensive hardware sitting idle |
| Healthy | none of the above | Lease is well-sized and active |
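The table's thresholds suggest a simple classifier. This sketch is an assumption about check ordering and the unstated cutoffs: it checks idleness first, treats "premium" as a flag on the hardware class, and picks an illustrative lease-count cutoff for fragmentation.

```rust
// Waste classification from lease-lifecycle metrics.
// `utilization` and `active` are fractions in [0, 1].
#[derive(Debug, PartialEq)]
enum Waste { Overprovisioned, Idle, Fragmented, PremiumWaste, Healthy }

fn classify(utilization: f64, active: f64, lease_count: u64, premium_hardware: bool) -> Waste {
    if active < 0.10 {
        Waste::Idle // lease exists but does almost nothing
    } else if premium_hardware && utilization < 0.25 {
        Waste::PremiumWaste // expensive hardware sitting mostly idle
    } else if utilization < 0.25 {
        Waste::Overprovisioned // holds more capacity than it uses
    } else if lease_count > 1_000 {
        Waste::Fragmented // illustrative cutoff for "many small leases"
    } else {
        Waste::Healthy
    }
}
```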
Sensitivity-aware placement
A training job is GPU-bound. A serving workload is memory-bound. A data pipeline is storage-bound. The grafOS profiler infers workload sensitivity from observed behavior.
| Level | Trigger |
|---|---|
| Critical | Resource accounts for > 50% of cost, or > 30% of time spent waiting |
| Moderate | Resource accounts for > 20% of cost |
| Low | Resource accounts for > 5% of cost |
| Indifferent | Resource used but not a significant cost driver |
| Unused | Resource not used at all |
A workload that's Critical on GPU and Indifferent on Network should be placed where GPU is abundant, even if that means paying more for network. The system learns which resource is the binding constraint for each workload, and places accordingly.
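The sensitivity thresholds in the table map directly onto a cascading check; a minimal sketch, assuming `cost_share` and `wait_share` are fractions of total cost and of time spent waiting (the names are illustrative):

```rust
// Per-resource sensitivity inference from observed behavior.
#[derive(Debug, PartialEq)]
enum Sensitivity { Critical, Moderate, Low, Indifferent, Unused }

fn sensitivity(cost_share: f64, wait_share: f64, used: bool) -> Sensitivity {
    if !used {
        Sensitivity::Unused
    } else if cost_share > 0.50 || wait_share > 0.30 {
        Sensitivity::Critical // binding constraint: cost- or wait-dominated
    } else if cost_share > 0.20 {
        Sensitivity::Moderate
    } else if cost_share > 0.05 {
        Sensitivity::Low
    } else {
        Sensitivity::Indifferent
    }
}
```

The wait-share clause matters: a resource can be cheap yet still be the binding constraint if the workload spends most of its time blocked on it.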
The business case
GPU clusters at 30–50% utilization represent billions in stranded capital annually. If lease-based authority moves effective utilization from 40% to 60%, each dollar of hardware delivers half again as much useful work, without buying a single additional GPU. The savings compound: fewer machines means less power, less cooling, less rack space, less networking.
How it connects
Everything above emerges from the lease primitive. Every layer derives from the one above it. No side channels, no estimation, no reconciliation jobs.
```
Lease lifecycle (create/renew/expire/revoke)
→ Event log (append-only, timestamped)
→ Audit chain (sealed, hash-linked, signed)
→ Capacity-seconds accounting (exact, per-tenant)
→ Quota enforcement (hard/soft/burst)
→ Cost attribution (rate card × capacity-seconds)
→ Waste detection (utilization vs. capacity)
→ Preemption decisions (priority × remaining TTL)
→ Placement scoring (constraint arbitrage)
→ External billing (webhook)
```

The lease is the single source of truth for what was used, by whom, for how long, and at what cost. This is what "lease-native" means at the economic level: the unit of accounting and the unit of enforcement are the same thing.