Windows & NVIDIA compatible

Run AI better on your hardware.

Hue Labs detects your GPU, runs a model that fits it, benchmarks it, and then tunes the settings for more tokens per second.

Windows 10/11 · NVIDIA RTX GPU · Free core, no account.


Runs locally

Models and chats stay on your machine.

Auto-tuned

Settings matched to your exact GPU and VRAM.

Benchmarked

Tokens per second, measured before and after.

One click

Detect, download, run. No terminal.

Dashboard

Overview of your local AI environment

GPU · NVIDIA RTX 5080 · VRAM 15.8 / 15.9 GB · 99%
RAM · 96.2 GB · 42.1 / 96.2 GB · 44%
CPU · AMD Ryzen 9 9950X · 18%
Storage · 3.1 TB · 1.2 / 3.1 TB · 38%

Loaded Models

Model · Size · Context · Status · Actions
Meta Llama 3.1 70B Instruct GGUF · 48.7 GB · Q4_K_M · Running

Recent Chats

Research Assistant · 2m ago
Explain the transformer architecture...
Code Helper · 15m ago
Optimize this Python function...
Data Analyst · 1h ago
Help me analyze this dataset...

The problem

Local AI runs slower than your GPU allows.

30–50%
Share of a GPU's available speed that most local setups leave unused.

Ollama, LM Studio, and Jan run a model. They do not check whether it fits your GPU or tell you when it is slow.

The model, quantization, and offload split are on you. Get one wrong and it still runs, just slower. Three common causes:

01 Model too large for the GPU's memory
02 Context window larger than the workload needs
03 Offload split set wrong between GPU and CPU
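
The three causes above can be checked with simple arithmetic. The sketch below is an illustration only, not how Hue Labs does it: `Setup`, `gpu_layers_that_fit`, `diagnose`, the 1 GB reserve, and the per-1,000-token KV-cache cost are all assumed names and numbers.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    model_gb: float      # size of the quantized weights
    n_layers: int        # transformer layers in the model
    kv_gb_per_1k: float  # assumed KV-cache cost per 1,000 context tokens
    context: int         # configured context window, in tokens
    vram_gb: float       # total GPU memory


def gpu_layers_that_fit(s: Setup, reserve_gb: float = 1.0) -> int:
    """Rough estimate of how many layers fit in VRAM after the KV cache and a reserve."""
    kv_gb = s.kv_gb_per_1k * s.context / 1000
    budget = s.vram_gb - reserve_gb - kv_gb
    if budget <= 0:
        return 0
    per_layer = s.model_gb / s.n_layers  # assumes layers are roughly equal in size
    return min(s.n_layers, int(budget // per_layer))


def diagnose(s: Setup, gpu_layers: int, needed_context: int) -> list[str]:
    """Flag the three common causes for a given configuration."""
    issues = []
    best = gpu_layers_that_fit(s)
    if best < s.n_layers:
        issues.append("model too large for the GPU's memory")
    if s.context > needed_context:
        issues.append("context window larger than the workload needs")
    if gpu_layers != best:
        issues.append("offload split differs from the estimate")
    return issues
```

A 40 GB, 80-layer model on a 16 GB GPU with an 8,000-token context triggers all three flags if only 10 layers are offloaded. Real memory use depends on the runtime, so treat the estimate as a starting point.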

How it works

Six steps, one click each.

01

Detect

Reads your GPU, VRAM, RAM, and CPU.

02

Recommend

Picks the model, quantization, and context size that fit.

03

Run

Downloads and launches it. No config files, no flags.

04

Benchmark

Measures tokens per second. This is the baseline.

05

Optimize

Applies settings your machine supports but was not using.

06

Re-benchmark

Runs it again and shows the difference.
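
The benchmark and re-benchmark steps come down to timing a generation run twice and comparing. Here is a minimal sketch of that idea. The `generate` callable is hypothetical: it stands in for whatever runs the model and is assumed to return the number of tokens it produced. This is not the Hue Labs benchmark itself.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[str], int], prompt: str, runs: int = 3) -> float:
    """Average tokens per second over several runs of the same prompt."""
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate(prompt)  # hypothetical: returns tokens produced
        total_seconds += time.perf_counter() - start
    if total_seconds == 0:
        raise ValueError("runs finished too quickly to measure")
    return total_tokens / total_seconds


def percent_faster(before: float, after: float) -> float:
    """Relative change from the baseline, in percent."""
    return (after - before) / before * 100
```

Measure once before tuning and once after, using the same prompt and run count, then pass both results to `percent_faster`.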

Benchmarks

Before and after, same machine.

Meta Llama 3.1 70B · RTX 5080 · tokens per second before and after tuning
Before: 47 tokens / sec
After: 68 tokens / sec
Faster: +44%, no hardware change
Detect → recommend → run → benchmark → optimize → re-benchmark
Export and share any result

Local only

Everything runs on your machine.

Models, chats, and benchmarks stay on your disk. No account, no server, no data leaving your machine.

Pricing

Reserve a plan.

Nothing is charged now. Reserving holds your place and tells us which plan fits. You confirm at launch.

Free
$0
  • Detect hardware
  • Recommend model & settings
  • One-click run
  • Full first optimization
  • Export & share results

Pro
$49 per year
  • Everything in Free
  • Auto re-tune on new models & drivers
  • Saved use-case profiles
  • Multi-model comparison
  • Performance history

First 500 buyers
Founder
$99 lifetime
  • All Pro features, forever
  • No annual renewal
  • Founder pricing locked
  • Capped at 500 seats

Commercial
$149 per year
  • Everything in Pro
  • Licensed for work use
  • Per-seat for teams

Early access

Get the first build.

Windows and NVIDIA. Add your GPU and we'll have your recommended setup ready at launch.

One email at launch. No spam.

Coming later

A public benchmark for local AI.

With your permission, each run adds to a public leaderboard of tokens per second by hardware.
