Dashboard
Overview of your local AI environment
Loaded Models
| Model | Size | Quantization | Status | Actions |
|---|---|---|---|---|
| Meta Llama 3.1 70B Instruct GGUF | 48.7 GB | Q4_K_M | Running | ⋯ |
Hue Labs detects your GPU, runs a model that fits it, benchmarks it, and then tunes the settings for more tokens per second.
Windows 10/11 · NVIDIA RTX GPU · Free core, no account.
Models and chats stay on your machine.
Settings matched to your exact GPU and VRAM.
Tokens per second, measured before and after.
Detect, download, run. No terminal.
Ollama, LM Studio, and Jan run a model. They do not check whether it fits your GPU or tell you when it is slow.
The model, quantization, and offload split are on you. Get one wrong and it still runs, just slower. Three common causes:
Reads your GPU, VRAM, RAM, and CPU.
Picks the model, quantization, and context size that fit.
Downloads and launches it. No config files, no flags.
Measures tokens per second. This is the baseline.
Applies settings your machine supports but was not using.
Runs it again and shows the difference.
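
The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not Hue Labs' actual implementation: the `Candidate` type, the `pick_candidate`, `tokens_per_second`, and `speedup_percent` helpers, the 2 GB headroom, and every model name and size are hypothetical. It also assumes that file size is a fair proxy for memory use, which ignores the extra memory a larger context needs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical model build: a name, a quantization label, and its file size."""
    name: str
    quantization: str
    file_size_gb: float


def pick_candidate(
    candidates: Sequence[Candidate],
    vram_gb: float,
    headroom_gb: float = 2.0,  # assumed safety margin, not a measured value
) -> Optional[Candidate]:
    """Return the largest candidate that fits in VRAM minus headroom, or None."""
    budget = vram_gb - headroom_gb
    fitting = [c for c in candidates if c.file_size_gb <= budget]
    return max(fitting, key=lambda c: c.file_size_gb, default=None)


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Throughput of one generation run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def speedup_percent(baseline_tps: float, tuned_tps: float) -> float:
    """Relative change from the baseline run to the tuned run, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (tuned_tps - baseline_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Hypothetical builds and sizes, chosen only to illustrate the selection.
    builds = [
        Candidate("example-small", "Q4_K_M", 4.5),
        Candidate("example-medium", "Q4_K_M", 8.5),
        Candidate("example-large", "Q4_K_M", 13.0),
    ]
    choice = pick_candidate(builds, vram_gb=12.0)
    print("picked:", choice.name if choice else "nothing fits")

    # Hypothetical timings for a baseline run and a tuned run.
    baseline = tokens_per_second(256, 8.0)
    tuned = tokens_per_second(256, 6.4)
    print(f"baseline {baseline:.1f} tok/s, tuned {tuned:.1f} tok/s, "
          f"change {speedup_percent(baseline, tuned):+.1f}%")
```

Comparing the same prompt and token count before and after tuning keeps the two measurements comparable, so the percentage reflects the settings change and not a different workload.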

Models, chats, and benchmarks stay on your disk. No account, no server, no data leaving your machine.
Nothing is charged now. Reserving holds your place and tells us which plan fits. You confirm at launch.
Windows and NVIDIA. Add your GPU and we'll have your recommended setup ready at launch.
With your permission, each run adds to a public leaderboard of tokens per second by hardware.
