FAQ

Answers to the questions we hear most.

Don't see your question? Submit it below — we review every entry and publish answers here.

  1. What is Tenura Systems?

    We're building a lease-bound operating layer for premium hardware: GPUs, memory, RDMA, NVMe storage, and future fabric resources like CXL. Every resource access is bound by a cryptographic lease with a TTL and mandatory teardown — when the lease expires, the resource is deterministically reclaimed.

    The stack has two layers. fabricBIOS is the firmware-side authority that grants and revokes access at the metal boundary. grafOS is the resource graph runtime on top, exposing tasklets, leases, and typed bindings to fabric resources through a Rust SDK.

  2. What's the difference between fabricBIOS and grafOS?

    fabricBIOS is the lower layer — a firmware-style daemon (or bare-metal binary) that owns the resources and enforces lease lifecycle, capability tokens, and revocation. It runs on standard servers as a Linux daemon, or on DPUs / SmartNICs / edge hardware as bare-metal firmware.

    grafOS is the runtime above it. Workloads and resources form an explicit graph; changes happen through safe, rollback-capable graph rewrites. Programs link against grafos-sdk and use higher-level primitives (FabricVec, FabricHashMap, ReplicatedMap, RPC, streams, etc.) that are all backed by underlying fabricBIOS leases.
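
    A minimal sketch of what that layering looks like from a program's point of view (the API names here are illustrative assumptions, not the exact SDK surface):

    use grafos_sdk::{Fabric, FabricVec};

    fn main() -> Result<(), grafos_sdk::Error> {
        // Connect to the local cell. grafOS resolves the request through
        // the resource graph; fabricBIOS grants the backing lease.
        let fabric = Fabric::connect()?;

        // A typed binding to fabric memory. The FabricVec is backed by a
        // fabricBIOS memory lease; if the lease expires or is revoked,
        // the region is torn down and scrubbed underneath us.
        let mut v: FabricVec<f32> = fabric.alloc_vec(1 << 20)?;
        v.push(1.0)?;
        Ok(())
    }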

  3. How is this different from Kubernetes?

    Kubernetes schedules processes onto nodes. We govern access to specific premium hardware that sits underneath or alongside Kubernetes — GPUs, RDMA queue pairs, NVMe-oF namespaces, memory regions on remote hosts. fabricBIOS doesn't replace Linux or Kubernetes; it grants and revokes access to the resources they can't manage safely.

    Think of it as the layer that makes a GPU or RDMA NIC behave like a leased object instead of a stranded singleton. You can run grafOS programs from inside a Kubernetes pod and have them lease GPUs across the cluster.

  4. Can I run this alongside Kubernetes, Slurm, or Ray?

    Yes — that's the common shape. Those systems schedule processes onto nodes; we govern access to specific premium hardware on those nodes. A Kubernetes pod runs your program, your program calls grafos-sdk to lease a GPU through fabricBIOS, the lease enforces deterministic teardown when the pod exits.

    The same pattern works with Slurm, Nomad, Ray, and self-hosted CI runners. fabricBIOS doesn't care who launched the process — it cares about who's holding the lease on the resource.

    A common deployment shape: customer's Kubernetes cluster on AWS, with a DaemonSet that runs fabricbiosd on each GPU node. Workloads scheduled by Kubernetes lease GPU access through the local fabricbiosd; revocation is sub-second within a node.

  5. What is a lease, exactly?

    A cryptographic binding between a workload and a resource with a time-to-live. The lease specifies which resource (a GPU, a chunk of memory, a block range), which holder (a tenant + program identity), and how long the binding lasts.

    The key property: when the lease expires, teardown is mandatory. The data plane is forcibly disconnected — RDMA queue pairs are destroyed, NVMe namespaces unmapped, GPU contexts torn down. There is no "best-effort cleanup" path. If teardown itself fails, the resource enters a fenced state and refuses new leases until an operator clears it.
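
    In code terms, a lease is roughly this shape (an illustrative sketch with placeholder types, not the literal wire format, which is defined by the fabricBIOS protocol):

    // Illustrative field sketch only; these type aliases are placeholders.
    type ResourceId = [u8; 16];   // e.g. a GPU, a memory region, a block range
    type HolderIdentity = String; // tenant + program identity

    struct Lease {
        resource: ResourceId,
        holder: HolderIdentity,
        ttl_secs: u32, // time-to-live; teardown is mandatory when this elapses
    }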

  6. How do I renew a lease before it expires?

    The grafos-leasekit crate handles renewal automatically. You configure it with a budget (how often to renew, and how close to expiry to refresh) and it polls on your behalf. Renewals are cheap, signed control-plane round trips.

    Default max TTL is 300 seconds. Programs can ask for shorter TTLs (e.g. 30s) if they want fast revocation visibility, or longer ones up to the cap if they want to minimize control-plane chatter.
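
    A sketch of what the renewal loop looks like from a program, assuming a hypothetical grafos-leasekit surface (the real type and field names may differ):

    use grafos_leasekit::{RenewalBudget, Renewer};
    use grafos_sdk::Lease;
    use std::time::Duration;

    // Hypothetical API: renew every 10s, and always refresh once we are
    // within 15s of expiry. Each renewal is a signed control-plane call.
    fn keep_alive(lease: Lease) -> Result<Renewer, grafos_leasekit::Error> {
        let budget = RenewalBudget {
            interval: Duration::from_secs(10),
            refresh_before_expiry: Duration::from_secs(15),
        };
        // Dropping the returned Renewer stops polling; the lease then
        // lapses at its TTL and the fabric tears the resource down.
        Renewer::spawn(lease, budget)
    }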

  7. What does it mean for a resource to be "fenced"?

    Fenced means the resource entered a fail-closed state because something happened that we can't safely paper over — a teardown that didn't complete cleanly, a key rotation gap, a stale write detected via epoch fence, or a tenant force-delete with leases in flight.

    A fenced resource refuses every new lease request. An operator clears the fence (after investigating) through the grafos CLI's admin surface. We chose fail-closed because the alternative — silently re-leasing a resource we're not sure is clean — is the failure mode that creates outages and security incidents.

  8. Can a single lease span multiple cells or regions?

    A lease is bound to a specific cell. For workloads that span cells — a stream pipeline with stages in different regions, a replicated key-value store, a multi-region inference fleet — you compose across leases using the replicated primitives in grafos-collections and grafos-sync: ReplicatedMap, ReplicatedLog, replicated queues. Each backing lease lives in a single cell; the replication primitive coordinates across them with explicit quorum policy.

    Cross-failure-domain primitives are an active area for us. The shape is: you declare a logical resource (a queue, a map, an object store) with a placement policy (which cells, regions, or providers it can live in) and a replica policy (how many replicas, what's quorum). The fabric handles the lease-per-cell book-keeping underneath.
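
    A sketch of that declaration (the policy types and cell names here are illustrative assumptions):

    use grafos_collections::ReplicatedMap;
    use grafos_sdk::Fabric;
    use grafos_sync::{PlacementPolicy, ReplicaPolicy};

    // Hypothetical API: one logical map with three replicas and majority
    // quorum; each replica's backing lease lives in a single cell.
    fn build_map(fabric: &Fabric) -> Result<ReplicatedMap<String, u64>, grafos_sdk::Error> {
        fabric.replicated_map(
            PlacementPolicy::cells(&["cell-us-east", "cell-us-west", "cell-eu"]),
            ReplicaPolicy { replicas: 3, quorum: 2 },
        )
    }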

  9. What happens to my data when a lease expires?

    Mandatory teardown is the rule, and data scrubbing is part of it. Memory leases zero the region before the next lease can claim it. NVMe ranges are deallocated through namespace unmap, which on the controllers we support invokes the device's secure-erase path — we don't trust the unmap until it acknowledges. RDMA buffers go through the HCA's deregister path before the memory region is freed. GPU contexts are torn down completely; no leftover VRAM allocations bleed into the next lease.

    If teardown fails for any reason, the resource fences instead of being re-leased. Fail-closed: better an operator alert than a silent data leak.

  10. Do I need special hardware to run this?

    No. The reference path is fabricbiosd, a Linux daemon that runs on stock x86 or ARM servers. AWS Graviton, GCP T2A, and Azure Ampere all work. The same daemon brokers GPU access on commodity NVIDIA cards, RDMA on ConnectX, and NVMe on standard PCIe SSDs.

    There's also a bare-metal target for DPUs, SmartNICs, and edge boxes (we use Raspberry Pi 5 as a reference target). That path is for operators who want lease enforcement at the firmware boundary instead of in a Linux process. Both paths speak the same wire protocol.

  11. Which GPUs and accelerators are supported today?

    NVIDIA is the validated path. In the Linux-host model — fabricbiosd running as a daemon and brokering GPU access at the device-file boundary — any card the standard NVIDIA driver supports is in scope: A100, H100, L4, L40S, and T4 are the ones we test against regularly.

    Direct silicon access (where fabricBIOS owns the GPU without a host driver, for the bare-metal firmware path) is in active development. L4 is our reference target; we're expanding through Ada, Hopper, and Blackwell as we work through GSP firmware boot for each generation.

    AMD MI300 and Intel Gaudi are on the roadmap but not yet validated. CXL memory and pooled HBM enclosures are in design. CUDA compatibility is whatever the host driver supports — we don't impose a version constraint from our side.

  12. What about multi-node training over RDMA (NCCL, MPI)?

    RDMA is a first-class fabric resource type. ConnectX-class HCAs are validated end-to-end: lease, queue-pair allocation, flow steering, revocation. A program leases RDMA queue pairs on multiple nodes through the SDK and gets a typed handle that exposes the verbs context NCCL or MPI need.

    What works today: in-cell multi-node training with leased QPs, where revocation is bounded and teardown of the underlying memory regions is deterministic.

    What's still being polished: a packaged NCCL plugin so you don't have to wire the QP context yourself, and cross-cell collectives where QPs span cells in different regions. The plugin is in progress; cross-region collectives are tied to the cross-failure-domain work and aren't ready for production training jobs yet.
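
    For orientation, the in-cell path today looks roughly like this (the handle and method names are assumptions, not the exact SDK surface):

    use grafos_sdk::Fabric;

    // Hypothetical sketch: lease queue pairs on two nodes and wire the
    // verbs context into a collective library yourself. The packaged
    // NCCL plugin mentioned above is what removes this wiring step.
    fn lease_qps() -> Result<(), grafos_sdk::Error> {
        let fabric = Fabric::connect()?;
        let qp_a = fabric.lease_rdma_qp("node-a")?;
        let qp_b = fabric.lease_rdma_qp("node-b")?;
        // Something like qp_a.verbs_context() is what NCCL or MPI plugs
        // into. On revocation, new connections are blocked immediately
        // and in-flight traffic drains within a bounded window.
        let _ = (qp_a, qp_b);
        Ok(())
    }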

  13. What clouds do you support?

    AWS, GCP, Azure, plus on-prem. Each has a corresponding grafos cloud provision <provider> workflow that uses Terraform under the hood; the cell then registers itself with our scheduler so the same lease and program-deploy surface works everywhere.

    For on-prem, you run fabricbiosd directly on your hardware and point grafOS at it. Mixed fleets (some cells on AWS, some on-prem, some on GCP) run as a single logical fabric.

  14. Can I run this in my own cloud account?

    Yes. Pass --mode customer-owned to grafos cloud provision. You provide an IAM role ARN and an external-id; we mint short-lived STS sessions per provision call and never hold long-lived credentials. Teardown is symmetric: same role, same external-id check.

    This is the path most production users start on — your account, your billing, your security boundary. The tenura-managed mode is mostly for evaluation and demos.
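
    Put together, a customer-owned provision call looks something like this (--mode customer-owned is the real switch; the --role-arn and --external-id flag names are our illustration of how the role ARN and external-id get passed):

    grafos cloud provision aws --mode customer-owned \
      --role-arn arn:aws:iam::123456789012:role/TenuraProvision \
      --external-id <your-external-id>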

  15. How do I get started?

    Install the CLI:

    curl -fsSL https://get.tenura.systems/install.sh | sh

    Then request a beta invite at https://tenura.systems/request-invite. Once approved, run grafos login to set up credentials and grafos new my-program to scaffold a starter program.

    From there, grafos dev up brings up a local fabric for development, and grafos deploy ships the program to a real cell when you're ready.

  16. I already have a GPU workload — how do I migrate?

    It's mostly additive, not a rewrite. Provision a Tenura cell on your cloud account (grafos cloud provision aws --mode customer-owned). The cell is a normal Linux instance — you can scp your code onto it and run it like you would on any host.

    From there, the only code change is wrapping the GPU/RDMA acquisition with an SDK lease call. You get a typed handle from the fabric; existing PyTorch, JAX, or TensorFlow code keeps working underneath because the wrapping happens at the device-acquisition boundary, not inside the framework. Release the lease when you're done, or let the lease expire and the fabric tears it down for you.

    What you get from the wrap: deterministic teardown if your job crashes, capability tokens scoping access to only the resource you asked for, and the same lease semantics whether the cell is on AWS today or on-prem tomorrow.
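
    The wrap itself is small. A sketch with assumed SDK names (lease_gpu, device_ordinal, and run_training are placeholders for illustration, not a published API):

    use grafos_sdk::Fabric;
    use std::time::Duration;

    fn main() -> Result<(), grafos_sdk::Error> {
        let fabric = Fabric::connect()?;
        // Lease at the device-acquisition boundary; everything below
        // this point is your existing training code, unchanged.
        let gpu = fabric.lease_gpu("a100", Duration::from_secs(300))?;
        run_training(gpu.device_ordinal())?; // your existing entry point

        // Drop the lease (or let the TTL lapse) and the fabric tears the
        // GPU context down deterministically, even if the job crashed.
        drop(gpu);
        Ok(())
    }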

  17. What are tasklets?

    Tasklets are lightweight WASM programs that run inside the fabric, scoped by capability tokens. Submit a tasklet by referencing its WASM module hash; the scheduler places it on a cell with capacity, it runs against leased resources, and it returns a result.

    Tasklets are the unit of work above grafOS leases. Where a lease grants access to a resource, a tasklet is the bit of code that uses that access. Capability tokens limit what each tasklet can touch — including which leases it can present.
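
    A sketch of a submission from the SDK side (the type and field names are assumptions; the wire message the answer above refers to is TASKLET_SUBMIT):

    use grafos_sdk::{CapabilityToken, Fabric, TaskletSpec};

    // Hypothetical surface: the module is referenced by hash, and the
    // capability token bounds which leases the tasklet may present.
    fn run_once(token: CapabilityToken) -> Result<Vec<u8>, grafos_sdk::Error> {
        let fabric = Fabric::connect()?;
        fabric.submit_tasklet(TaskletSpec {
            module_hash: "sha256:<module-hash>".into(), // previously uploaded module
            input: b"payload".to_vec(),
            capabilities: vec![token],
        })
    }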

  18. Do I have to write tasklets, or can I run a normal program?

    Both paths exist. The primary path is a Rust program that links grafos-sdk and runs on a cell — you compile to WASM and deploy it with grafos deploy run program.wasm. The program claims leases through the SDK, talks to fabric resources through typed handles, and runs as long as you'd normally run a service.

    Tasklets are a different primitive: short-running WASM units submitted via TASKLET_SUBMIT and run by the cell's tasklet executor under capability tokens. Useful for user-supplied compute snippets, untrusted code, or fan-out work. You'd write a tasklet when you want per-invocation isolation; you'd write a program when you want a long-lived service.

    Most users start with programs.
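
    Concretely, the program path is a build plus a deploy, assuming the standard Rust WASI target (wasm32-wasip1 on current toolchains, wasm32-wasi on older ones; my_program stands in for your crate name):

    cargo build --target wasm32-wasip1 --release
    grafos deploy run target/wasm32-wasip1/release/my_program.wasm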

  19. What language do I write programs in?

    Rust is the first-class SDK. The grafos-sdk crate exposes typed access to fabric memory, block, GPU, and CPU resources, plus the higher-level data structures (FabricVec, FabricHashMap, FabricQueue, ReplicatedMap, etc.) and frameworks (RPC, streams, batch jobs, message queues).

    For tasklets specifically, anything that compiles to WASI works. We've tested Rust and TinyGo; in principle Zig, Swift, AssemblyScript, and others should compile fine.
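
    To make "anything that compiles to WASI" concrete: a tasklet can be as small as a plain WASI binary. This sketch assumes a stdin/stdout interface, which is our simplification rather than the documented tasklet ABI:

    // Build with: cargo build --target wasm32-wasip1 --release
    fn main() {
        // Read the tasklet input (assumed here to arrive on stdin) and
        // write the result to stdout for the executor to collect.
        let input = std::io::read_to_string(std::io::stdin()).unwrap_or_default();
        println!("processed {} bytes", input.len());
    }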

  20. How fast is revocation?

    REVOKE_BROADCAST is a signed control-plane message; holders see it within the control-plane round-trip time (typically under 50ms within a cell). For the data plane the lower bound depends on the underlying hardware:

    • GPU contexts and CPU lease handles: tens of milliseconds.
    • NVMe-oF namespaces: tens to low-hundreds of milliseconds.
    • RDMA queue pairs on ConnectX-class HCAs: ~1-2 seconds for in-flight traffic to fully drain due to HCA-internal cache propagation; new connections are blocked immediately.

    The contract is deterministic, not instantaneous: by the time a revoked lease's teardown completes, no new I/O can issue against the resource. If teardown can't complete cleanly, the resource fences instead of silently leaking.

  21. What's the trust model?

    Every control-plane message is signed (Ed25519) and verified before it's parsed deeply or decompressed. Tokens have short TTLs (default max 300 seconds) with audience binding so a token minted for one cell can't be replayed against another.

    Trust bootstrap is TOFU + a pinned controller key — the operator pins a public key once, and the fabric establishes mutual TLS from there. Anti-replay uses nonce + timestamp with a replay cache. Anything we can't verify, we drop fail-closed.
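
    The verify-before-parse ordering is the important part. Sketched here with the ed25519-dalek crate (our illustration, not necessarily the implementation the fabric uses):

    use ed25519_dalek::{Signature, Verifier, VerifyingKey};

    // Nothing gets decompressed or parsed deeply until the signature
    // over the raw bytes checks out against the pinned controller key.
    fn accept(raw: &[u8], sig: &Signature, pinned: &VerifyingKey) -> bool {
        pinned.verify(raw, sig).is_ok()
    }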

  22. How do you isolate tenants who share the same node or GPU?

    Multiple layers, enforced at the metal. Each lease binds a specific resource scope — a GPU VRAM range, an NVMe namespace, a memory region, an RDMA queue pair — and the data plane enforces the boundary at the hardware level. RDMA QPs are isolated by HCA flow steering, GPU contexts use NVIDIA MIG slices or VRAM-partitioned contexts, and NVMe-oF namespaces are namespace-scoped at the controller.

    Capability tokens scope what code can do: a token specifies which leases the bearer can present, signed by the cell scheduler with a short TTL. The scheduler refuses placements that would put incompatible tenants on the same physical resource if the policy says so.

    Co-tenancy is configurable. The default placement spreads tenants across nodes; you can pin to dedicated nodes with an exclusive-node placement policy if you need physical isolation. We never assume two tenants are friendly to each other.

  23. How is this priced?

    Per-resource-second metering, similar to spot capacity. You pay for what you actually leased, by the second, with the price varying by resource class (GPU type, memory tier, network class).

    New accounts get $20 in starting credit. There's no minimum commitment and no per-seat or per-project licensing. For customer-owned cells, your cloud bill is yours; we charge a thin layer on top of the resource-seconds we brokered.

  24. Is there a free tier?

    Yes — the $20 starting credit on every new account covers evaluation and small workloads. It's enough to run several thousand small lease-seconds across CPU, memory, and storage classes, or hours of low-tier GPU time, depending on what you're testing.

    When the credit runs out, the account doesn't disappear — we just stop issuing new leases until you top up. No surprise overage charges.

  25. Is this open source?

    The fabricBIOS spec is open and the wire protocol is published. The reference Rust implementation (the daemon, the bare-metal firmware, and the grafOS runtime crates) is source-available today and on a path to a permissive license once the API surface stabilizes.

    The scheduler service and managed-control-plane code stay closed for now — that's the multi-tenant control plane we operate so customers don't have to run it themselves.

Ask a question

Have something we haven't answered?

We review every submission and add answers to this page. Anonymous is fine — email is optional and only used so we can follow up if a question needs clarification.