Use cases
Built for shared infrastructure.
Tenura is a fit anywhere multiple workloads need access to expensive hardware without stepping on each other, where "stepping on each other" includes leaked allocations, stale state, and the cost of buying extra capacity just to keep a safety margin. Three common shapes:
01
Multi-tenant inference.
Several tenants serve models from the same pool of GPUs. Today most teams pin a fraction of each GPU per tenant for the day, end up at 30–40% utilization across the cluster, and pay full price for capacity that's idle most of the time. The unsafe alternative is time-sharing without enforcement — and one tenant's hung kernel can starve everyone else.
What Tenura does:
- Each inference request acquires a short lease (seconds to minutes) for the GPU slice it actually needs.
- If a request's program crashes or hangs, the lease's TTL elapses and the resource returns to the pool — automatically and deterministically.
- Capacity-seconds metering bills each tenant for what they used, not for a static reservation (a worked example follows the command below).
grafos deploy run llama-serve.wasm \
--tenant acme --gpu 1 --mem 80G \
--ttl 30s --priority guaranteed

Idle cycles between requests go back to the scheduler, which can admit Standard- or Scavenger-tier work to fill the gaps.
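To put numbers on the billing difference, here is a minimal back-of-the-envelope sketch in Python. The capacity_seconds helper and the traffic figures are assumptions invented for this example; they are not part of the Tenura API.

# Illustrative arithmetic only; capacity_seconds and the traffic numbers
# are assumptions for this sketch, not part of the Tenura API.

GPU_SECONDS_PER_DAY = 24 * 60 * 60  # what a static daily reservation bills

def capacity_seconds(leases):
    """Total GPU-seconds actually held: gpus * seconds, summed per lease."""
    return sum(gpus * seconds for gpus, seconds in leases)

# Suppose a tenant served 2,000 requests, each holding 1 GPU on a 30 s lease:
used = capacity_seconds([(1, 30)] * 2000)
print(used)                        # 60000 GPU-seconds
print(used / GPU_SECONDS_PER_DAY)  # ~0.69 GPU-days, vs. 1.0 for a pinned GPU

The gap between those two figures is exactly the idle capacity the scheduler can resell to lower tiers.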
02
Burst training.
A research team needs eight GPUs for a six-hour training run, then nothing for three days. Static reservation ties up the hardware. Manual provisioning forgets to release. Spot interruption mid-run is too disruptive.
What Tenura does:
- The job acquires a lease for the cells and resources it needs, with a TTL that covers the run plus a buffer.
- Lease renewal is poll-driven: if the job dies, renewal stops and the lease expires on schedule (a renewal-loop sketch follows the command below).
- Guaranteed-priority leases are not preempted; Standard and Scavenger leases on the same hardware are.
grafos deploy run train.wasm \
--tenant lab --gpu 8 --mem 1T \
--ttl 8h --priority guaranteed \
--renewable

When the run finishes, successfully or not, teardown runs and the GPUs return to the pool. Other tenants can immediately admit work onto them. No leftover NCCL state, no stuck QPs.
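To make "poll-driven" concrete, here is a self-contained sketch of the contract in Python. The Lease class is a stand-in written for this example, not the Tenura client; the point is that renewal is an action only live code can take, so a dead job stops renewing by construction.

import time

# Stand-in lease for illustration; not the Tenura client API.
class Lease:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.deadline = time.monotonic() + ttl

    def renew(self) -> None:
        # Pushes the deadline forward; only a live job reaches this call.
        self.deadline = time.monotonic() + self.ttl

    def expired(self) -> bool:
        return time.monotonic() >= self.deadline

lease = Lease(ttl=0.5)
for _ in range(3):
    time.sleep(0.2)  # the job doing real work between renewals
    lease.renew()    # poll-driven: renewal happens inside the job's own loop

# Simulate the job dying: no further renew() calls are made.
time.sleep(0.6)
assert lease.expired()  # the lease lapses on schedule; the pool reclaims it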
03
Disaggregated memory and storage.
Modern fabrics let memory and storage live anywhere on the wire — CXL pools, RDMA-attached DRAM, NVMe-oF namespaces. The hard part isn't the protocol; it's the cleanup. Allocations that survive crashes turn into orphaned mappings, fenced QPs, dangling block exports.
What Tenura does:
- Memory, queue pairs, and block namespaces are leased with the same primitive as GPUs. Same TTL, same teardown contract.
- Failed teardown puts the resource in FENCED: no new lease lands on dirty hardware until an operator clears it.
- Cross-resource leases are atomic: a workload that needs both a GPU and 80 GiB of CXL gets both or none (sketched after the command below).
grafos deploy run pipeline.wasm \
--tenant acme --gpu 1 --cxl-mem 256G \
--rdma-qp 4 --block 4T --ttl 1h
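The all-or-nothing contract is the load-bearing detail here, so here is a self-contained Python sketch of that behavior. Pool and acquire_all are names invented for this example, not the Tenura scheduler's API, but they show the invariant: on any shortfall, partial grants are rolled back and nothing leaks.

# Illustrative pool; Pool and acquire_all are assumptions for this sketch,
# not the Tenura scheduler's API.
class Pool:
    def __init__(self, capacity: dict):
        self.free = dict(capacity)

    def acquire_all(self, wants: dict):
        """Grant every requested resource, or none of them."""
        granted = {}
        for name, amount in wants.items():
            if self.free.get(name, 0) < amount:
                # Shortfall: roll back anything already granted so no
                # partial allocation leaks out of a failed request.
                for g_name, g_amount in granted.items():
                    self.free[g_name] += g_amount
                return None
            self.free[name] -= amount
            granted[name] = amount
        return granted

pool = Pool({"gpu": 1, "cxl_gib": 64})
# 80 GiB of CXL is not available, so the GPU must not be granted either:
assert pool.acquire_all({"gpu": 1, "cxl_gib": 80}) is None
assert pool.free == {"gpu": 1, "cxl_gib": 64}  # nothing leaked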
Not sure which one fits?
Tell us your shape.
Most pilots start with one of these and grow into another. If you have an unusual workload — high-frequency trading on shared FPGAs, multi-region inference replication, or something we haven't thought of — we still want to hear about it.