GeniusPro Lab

Research, train, and ship models

Our lab turns benchmarks and fine-tunes into the Simon lineup you call in production — on state-of-the-art GPUs and servers we operate.

Research · Train · Eval · Ship

From benchmark to simon-says-* slug · often via auto-update

Our mission

Models that work for real business workloads

Frontier models are impressive in demos. Our lab exists to turn them into slugs your product can depend on — with domain discipline, predictable latency, and upgrade paths that do not require migration projects.

We benchmark upstream providers, fine-tune experts for professional roles, and run eval harnesses before anything ships to Simon. When a model clears the bar, it lands behind a stable name — simon-says-chat, simon-says-finance, gp-auto-* — so your integration stays the same while quality improves.

The same pipeline powers voice, vision, and agent products. Research is not a separate track from production; it is how we keep the platform current without breaking customer apps.

Research focus

What we investigate

Lab work runs in focused tracks — each with benchmarks, training runs, and shipping criteria tied to a product surface.

Domain experts

Fine-tuned Simon Says slugs for finance, legal, architecture, and other professional workloads — tuned for vocabulary, reasoning style, and tool use.

  • Profession pairs: a regular slug plus a -pro variant on a stronger upstream model
  • Shared specialist prompts across slug tiers
  • Benchmarked against frontier baselines
Browse experts

Agents & orchestration

Multi-step agents, CAT workflows, Tiger Mode, and Cursor-class coding agents — routing, memory, and tool loops validated before they ship.

  • Orchestrators that pick upstream per request
  • Long-running Fleet workers on private nodes
  • Fast agent paths for IDE integrations
See Tiger Mode

Voice & vision

Realtime speech, transcription, image generation, and vision understanding — benchmarked on lab GPUs and wired into Simon API routes.

  • Whisper STT and Gemini Live voice sessions
  • Image model fallback chains for reliability
  • Latency targets for conversational UX
Explore Voice
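
For readers who want a concrete picture, a fallback chain can be sketched roughly as below. This is an illustration under stated assumptions, not the lab's implementation: the model names and the two generator functions are hypothetical stand-ins, and a real chain would call actual image backends.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def generate_with_fallback(
    prompt: str,
    chain: Sequence[tuple[str, Callable[[str], bytes]]],
) -> tuple[str, bytes]:
    """Try each (name, generator) pair in order and return the first success."""
    errors = []
    for name, generate in chain:
        try:
            return name, generate(prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Hypothetical stand-ins for real image backends.
def flaky_primary(prompt: str) -> bytes:
    raise TimeoutError("upstream timed out")


def steady_backup(prompt: str) -> bytes:
    return f"image for {prompt}".encode()


used, image = generate_with_fallback(
    "a red bicycle",
    [("primary-image-model", flaky_primary), ("backup-image-model", steady_backup)],
)
```

Returning the name of the model that answered, alongside the result, is one way to keep fallbacks visible in telemetry rather than hiding them from the caller.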

Eval & reliability

Quality, latency, safety, and regression harnesses run on every candidate model — nothing ships without passing gates and audit trails.

  • Slug-level version telemetry in production
  • Auto-update with rollback when needed
  • Hallucination and jailbreak spot checks
Auto-update slugs
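
As a rough illustration of what a shipping gate can look like, the sketch below checks a candidate's metrics against fixed thresholds. Everything in it is an assumption made for the example: the metric names, the threshold values, and the Gate type are hypothetical and do not describe the lab's actual harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate_candidate(metrics: dict[str, float], gates: list[Gate]) -> list[str]:
    """Return the names of failed gates; a missing metric counts as a failure."""
    failures = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None or not gate.passes(value):
            failures.append(gate.metric)
    return failures


# Hypothetical gates and candidate scores, for illustration only.
GATES = [
    Gate("quality_score", 0.85),
    Gate("p95_latency_ms", 1200.0, higher_is_better=False),
    Gate("jailbreak_pass_rate", 0.99),
]
candidate = {"quality_score": 0.91, "p95_latency_ms": 1500.0, "jailbreak_pass_rate": 0.995}

failed = evaluate_candidate(candidate, GATES)
can_ship = not failed  # ship only when every gate passes
```

In this example the candidate clears the quality and safety gates but misses the latency gate, so it would not ship.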

Lab pipeline

1. Research

Benchmark frontier and domain models to find what works for business workloads.

2. Train

Fine-tune experts, orchestrators, and base models on lab hardware.

3. Eval

Run latency, quality, and safety harnesses before anything ships.

4. Ship

Release to Simon, often through auto-update slugs, with no migration projects.

Infrastructure

Lab hardware we operate

Training and eval run on GPUs we control: Einstein-class nodes for sensitive workloads, shared lab capacity for benchmarks, and hybrid routes into cloud providers when burst capacity is needed.

Einstein lab nodes

Dedicated GPU servers for voice experiments, fine-tunes, and private inference previews — not multi-tenant shared pools.

Hybrid with Simon API

Route heavy jobs to your hardware or ours through the same Simon API endpoints — one integration, flexible placement.

Fleet control plane

Register workers, dispatch long-running agent jobs, and monitor runs from a single dashboard tied to production API keys.

How we ship

Principles behind every release

Every slug that reaches production follows the same rules — stable integrations, measurable quality, and clear version history.

Stable names, moving quality

Customers integrate once. Auto-update slugs absorb upstream improvements while request bodies and telemetry stay consistent.

Business-first benchmarks

We score models on professional tasks such as citations, tool use, and latency under load, not just on academic leaderboard rankings.

No silent regressions

Every slug upgrade is versioned. Teams can audit which model served a request and roll back if a release misses the bar.
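
One way to picture versioned slugs with audit and rollback is the minimal sketch below. It is an assumption-laden illustration, not the production registry: the SlugRegistry class, the version labels finance-v1 and finance-v2, and the request IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SlugRegistry:
    # slug -> ordered release history; the last entry is the live version
    history: dict[str, list[str]] = field(default_factory=dict)
    # (request_id, slug, model_version) records kept for auditing
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def release(self, slug: str, model_version: str) -> None:
        """Publish a new version behind an existing stable slug."""
        self.history.setdefault(slug, []).append(model_version)

    def serve(self, request_id: str, slug: str) -> str:
        """Resolve the slug to its live version and record which one served."""
        version = self.history[slug][-1]
        self.audit_log.append((request_id, slug, version))
        return version

    def rollback(self, slug: str) -> str:
        """Retire the live version and return the one that is live afterwards."""
        versions = self.history[slug]
        if len(versions) < 2:
            raise ValueError(f"no earlier version of {slug} to roll back to")
        versions.pop()
        return versions[-1]


registry = SlugRegistry()
registry.release("simon-says-finance", "finance-v1")  # hypothetical version labels
registry.release("simon-says-finance", "finance-v2")
first = registry.serve("req-1", "simon-says-finance")
registry.rollback("simon-says-finance")
second = registry.serve("req-2", "simon-says-finance")
```

The caller only ever names the slug. The audit log still shows that req-1 was served by finance-v2 and req-2 by finance-v1 after the rollback.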

US hosting by default

Production inference and customer data stay on infrastructure we operate in the United States unless a contract says otherwise.

Build with us

Use what the lab ships

Every slug on this page is callable today through Simon API. Start with the model catalog, wire a single endpoint, and let auto-update keep you current.

Open developer docs