Use cases

Built for shared infrastructure.

Tenura is a fit anywhere multiple workloads need access to expensive hardware without stepping on each other — and where "stepping on each other" includes leaked allocations, stale state, and the cost of buying around the safety margin. Four common shapes:

01

Multi-tenant inference.

Several tenants serve models from the same pool of GPUs. Today most teams pin a fraction of each GPU per tenant for the day, end up at 30–40% utilization across the cluster, and pay full price for capacity that's idle most of the time. The unsafe alternative is time-sharing without enforcement — and one tenant's hung kernel can starve everyone else.

What Tenura does:

grafos deploy run llama-serve.wasm \
  --tenant acme --gpu 1 --mem 80G \
  --ttl 30s --priority guaranteed

Idle cycles between requests go back to the scheduler, which can admit Standard- or Scavenger-tier work to fill the gaps.
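A sketch of what that gap-filling admission might look like. The flags are the same ones shown in the example above; the `scavenger` value for `--priority` is an assumption inferred from the tier names in this paragraph (only `guaranteed` appears in the examples on this page):

```shell
# Hypothetical: admit preemptible batch work at the Scavenger tier
# to soak up idle cycles between inference requests.
# The "scavenger" priority value is assumed from the tier name above.
grafos deploy run batch-embed.wasm \
  --tenant acme --gpu 1 --mem 40G \
  --ttl 10m --priority scavenger
```

Scavenger-tier work would be the first thing evicted when a guaranteed tenant's traffic returns, which is what makes filling the gaps safe.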

02

Burst training.

A research team needs eight GPUs for a six-hour training run, then nothing for three days. Static reservation ties up the hardware. Manual provisioning forgets to release. Spot interruption mid-run is too disruptive.

What Tenura does:

grafos deploy run train.wasm \
  --tenant lab --gpu 8 --mem 1T \
  --ttl 8h --priority guaranteed \
  --renewable

When the run finishes — successfully or not — teardown runs and the GPUs return to the pool. Other tenants can immediately admit work onto them. No leftover NCCL state, no stuck QPs.
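If the run outlives its TTL, a `--renewable` lease would be extended rather than torn down mid-run. A sketch of that extension, assuming a hypothetical `grafos lease renew` subcommand and lease ID format — neither is confirmed by this page:

```shell
# Hypothetical subcommand: extend an active renewable lease by 4 hours.
# "lease renew" and the "lab/train-7f3a" lease ID are illustrative
# assumptions, not documented CLI surface.
grafos lease renew lab/train-7f3a --ttl 4h
```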

03

Shared research clusters.

A research lab gives every grad student "their" GPU — except half are idle most of the time and the other half are blocked behind whoever's running ablations. The lab buys more hardware to make the queue feel less full. The hardware spends most of its life idle.

What Tenura does:

# Default lease: after 30 minutes idle, the GPU releases back to the pool.
grafos deploy run notebook.wasm \
  --tenant research/student-id --gpu 1 \
  --idle-timeout 30m
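For a longer interactive session, the flags from the other examples should compose. A sketch, assuming `--ttl` and `--idle-timeout` can be combined on one lease (each flag appears separately elsewhere on this page; the combination is an assumption):

```shell
# Hypothetical combination: session ends at whichever comes first —
# a 4-hour hard cap, or 30 minutes of idle time.
grafos deploy run notebook.wasm \
  --tenant research/student-id --gpu 1 \
  --ttl 4h --idle-timeout 30m
```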

04

Disaggregated memory and storage.

Modern fabrics let memory and storage live anywhere on the wire — CXL pools, RDMA-attached DRAM, NVMe-oF namespaces. The hard part isn't the protocol; it's the cleanup. Allocations that survive crashes turn into orphaned mappings, fenced QPs, dangling block exports.

What Tenura does:

grafos deploy run pipeline.wasm \
  --tenant acme --gpu 1 --cxl-mem 256G \
  --rdma-qp 4 --block 4T --ttl 1h
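The same request, annotated with what lease expiry has to unwind. The mapping from flag to cleanup step is our reading of the paragraph above (orphaned mappings, fenced QPs, dangling block exports), not CLI output:

```shell
# Annotated sketch of the request above; comments map each flag to
# the state that teardown must release (our interpretation).
grafos deploy run pipeline.wasm \
  --tenant acme \
  --gpu 1 \          # GPU context and device allocations
  --cxl-mem 256G \   # CXL mappings that would otherwise orphan
  --rdma-qp 4 \      # RDMA queue pairs that would otherwise stay fenced
  --block 4T \       # NVMe-oF block exports that would otherwise dangle
  --ttl 1h           # hard bound: everything above releases at expiry
```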

Not sure which one fits?

Tell us your shape.

Most pilots start with one of these and grow into another. If you have an unusual workload — high-frequency trading on shared FPGAs, multi-region inference replication, or something we haven't thought of — we still want to hear about it.