Windows & NVIDIA compatible

Run AI better
on your hardware.

Hue Labs detects your GPU, runs the right model, and benchmarks it, then tunes the settings for more tokens per second.

Get early access · See the benchmark

Windows 10/11 · NVIDIA RTX GPU · Free core, no account.

Hue Labs mascot running a model on a rack of hardware

Runs locally

Models and chats stay on your machine.

Auto-tuned

Settings matched to your exact GPU and VRAM.

Benchmarked

Tokens per second, measured before and after.

One click

Detect, download, run. No terminal.

Dashboard

Overview of your local AI environment

GPU

NVIDIA RTX 5080

VRAM 15.8 / 15.9 GB

99%

RAM

96.2 GB

42.1 / 96.2 GB

44%

CPU

AMD Ryzen 9 9950X

18%

Storage

3.1 TB

1.2 / 3.1 TB

38%

Loaded Models

Model	Size	Quantization	Status	Actions
Meta Llama 3.1 70B Instruct GGUF	48.7 GB	Q4_K_M	Running	⋯

Recent Chats

Research Assistant · 2m ago

Explain the transformer architecture...

Code Helper · 15m ago

Optimize this Python function...

Data Analyst · 1h ago

Help me analyze this dataset...

The problem

Local AI runs slower than your GPU allows.

30–50%

Speed that most local setups leave unused, even though the GPU supports more.

Ollama, LM Studio, and Jan run a model. They do not check whether it fits your GPU or tell you when it is slow.

The model, quantization, and offload split are on you. Get one wrong and it still runs, just slower. Three common causes:

01 Model too large for the GPU's memory

02 Context window larger than the workload needs

03 Offload split set wrong between GPU and CPU
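
The three causes above can be put into rough numbers. The sketch below is an illustration only, not Hue Labs' actual method: the function names are hypothetical, it assumes every layer takes an equal share of the model file, and it treats the key/value cache as the only extra VRAM cost. The cache formula is the standard one for transformer models (two tensors per layer, one each for keys and values).

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate key/value cache size in GiB for a given context length.

    bytes_per_value=2 assumes 16-bit cache entries (an assumption; some
    runtimes quantize the cache to fewer bits).
    """
    # 2 tensors (keys and values) per layer, one vector per KV head per token.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1024**3


def gpu_layers_that_fit(model_gib, n_layers, vram_gib, reserve_gib):
    """Estimate how many layers fit on the GPU; the rest run on the CPU.

    Assumes equal-sized layers, which is only roughly true in practice.
    reserve_gib is VRAM held back for the cache and other overhead.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


# Llama 3.1 70B shape: 80 layers, 8 KV heads, head size 128.
cache = kv_cache_gib(80, 8, 128, context_tokens=8192)
print(f"KV cache at 8,192 tokens: {cache:.1f} GiB")
print("Layers on GPU:", gpu_layers_that_fit(48.7, 80, 15.9, reserve_gib=cache))
```

Because the cache grows linearly with context length, a context window larger than the workload needs takes VRAM that could otherwise hold model layers, which pushes more of the model onto the CPU.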

How it works

Six steps, one click each.

Detect

Reads your GPU, VRAM, RAM, and CPU.

Recommend

Picks the model, quantization, and context size that fit.

Run

Downloads and launches it. No config files, no flags.

Benchmark

Measures tokens per second. This is the baseline.

Optimize

Applies settings your machine supports but was not using.

Re-benchmark

Runs it again and shows the difference.
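
For readers who want to see what the detect, benchmark, and re-benchmark steps involve, here is a minimal sketch. It is not the Hue Labs implementation. It assumes `nvidia-smi` is on the PATH, and `generate` is a hypothetical callable that runs one generation with whatever runtime you use and returns the number of tokens produced. All function names are made up for illustration.

```python
import subprocess
import time


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output with one 'name, total MiB' row per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total_mib = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "vram_mib": int(total_mib)})
    return gpus


def detect_gpus():
    """Ask nvidia-smi for each GPU's name and total memory (reported in MiB)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


def tokens_per_second(generate, clock=time.perf_counter):
    """Time one call to generate() and return tokens produced per second."""
    start = clock()
    n_tokens = generate()
    elapsed = clock() - start
    return n_tokens / elapsed


def percent_change(before_tps, after_tps):
    """Relative speed change between the baseline run and the tuned run."""
    return (after_tps - before_tps) / before_tps * 100.0
```

The clock is injectable so the timing logic can be checked without a real model. A real benchmark would also average several runs and discard a warm-up run, which this sketch leaves out.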

Benchmarks

Before and after, same machine.

Meta Llama 3.1 70B · RTX 5080 · tokens per second before and after tuning

Before

tokens / sec

After

tokens / sec

Faster

+44%

no hardware change

Detect → recommend → run → benchmark → optimize → re-benchmark

Export and share any result

Local only

Everything runs on your machine.

Models, chats, and benchmarks stay on your disk. No account, no server, no data leaving your machine.

Pricing

Reserve a plan.

Nothing is charged now. Reserving holds your place and tells us which plan fits. You confirm at launch.

Free

Detect hardware
Recommend model & settings
One-click run
Full first optimization
Export & share results

Pro

$49 per year

Everything in Free
Auto re-tune on new models & drivers
Saved use-case profiles
Multi-model comparison
Performance history

First 500 buyers

Founder

$99 lifetime

All Pro features, forever
No annual renewal
Founder pricing locked
Capped at 500 seats

Commercial

$149 per year

Everything in Pro
Licensed for work use
Per-seat for teams

Early access

Get the first build.

Windows and NVIDIA. Add your GPU and we'll have your recommended setup ready at launch.

One email at launch. No spam.

Coming later

A public benchmark for local AI.

With your permission, each run adds to a public leaderboard of tokens per second by hardware.
