
Hcompany adds Holo3.1 with quantized checkpoints for fast, local inference: what developers gain

News
Thalia Mercer

6/2/2026, 2:30:41 PM

Hcompany announced Holo3.1 on June 2, 2026 as the next production release in the Holo3 line (Holo3 itself was released last March), positioning the update to enable faster, locally runnable computer‑use agents across web, desktop and mobile. The release matters because it targets real deployment gaps, namely robustness across environments and agent frameworks, so that teams can run agents where their workflows live, with improved speed, better privacy and lower operating cost.

Holo3.1 introduces several practical technical changes for local and mixed deployments. The release ships quantized checkpoints optimized for local inference (FP8, Q4 GGUF and NVFP4) and is built on the Qwen family. Hcompany also added native support for function‑calling protocols alongside the structured JSON outputs used by Holo3, improving compatibility with a broader range of agent harnesses and integration patterns.
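To make the difference between the two output styles concrete, here is a minimal Python sketch of how a harness might accept either one and normalize both into the same internal action. Everything in it is an assumption for illustration: the `click` tool, the `action`/`args` field names and the tool-call shape follow the widely used JSON Schema "function" convention, not a schema published for Holo3.1.

```python
import json
from dataclasses import dataclass


@dataclass
class Action:
    """Harness-internal representation of one agent step (hypothetical)."""
    name: str
    args: dict


# Hypothetical tool definition in the common JSON Schema "function" style.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a screen coordinate.",
        "parameters": {
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
            "required": ["x", "y"],
        },
    },
}


def from_tool_call(tool_call: dict) -> Action:
    """Function-calling path: arguments arrive as a JSON-encoded string."""
    fn = tool_call["function"]
    return Action(fn["name"], json.loads(fn["arguments"]))


def from_structured_json(text: str) -> Action:
    """Structured-output path: the model emits one JSON object as text."""
    payload = json.loads(text)
    return Action(payload["action"], payload["args"])


def validate(action: Action, tool: dict) -> None:
    """Check the action name and required arguments against the tool spec."""
    spec = tool["function"]
    if action.name != spec["name"]:
        raise ValueError(f"unexpected action: {action.name}")
    missing = [k for k in spec["parameters"]["required"] if k not in action.args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")


if __name__ == "__main__":
    call = {"function": {"name": "click", "arguments": '{"x": 10, "y": 20}'}}
    text = '{"action": "click", "args": {"x": 10, "y": 20}}'
    a, b = from_tool_call(call), from_structured_json(text)
    validate(a, CLICK_TOOL)
    print(a == b)
```

Keeping a single internal `Action` type means the rest of the harness does not need to know which protocol the model used, which is one plausible way to benefit from a model that supports both.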

To give teams more cost‑performance options for private or on‑device inference, Hcompany published smaller sizes in addition to the 35B A3B model: 0.8B, 4B and 9B variants. Early adopter testing shows meaningful gains in mobile automation: on AndroidWorld the 35B A3B variant improved from 67% to 79.3%, while the 4B and 9B variants rose from 58% to 72%. These results address cross‑environment weaknesses identified during Holo3 production use.
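The AndroidWorld figures above can be read either as percentage-point gains or as relative improvements, and the two differ noticeably. The short sketch below works out both from the reported before and after scores; the helper name `gains` is illustrative only.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in %) rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


# Scores reported in the article for AndroidWorld.
print(gains(67.0, 79.3))  # 35B A3B variant -> (12.3, 18.4)
print(gains(58.0, 72.0))  # 4B and 9B figure -> (14.0, 24.1)
```

So the 35B A3B result is a gain of 12.3 points, or roughly 18% relative to the earlier score, while the smaller-model figure is a gain of 14 points, or roughly 24% relative.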

Holo3.1 also delivers cross‑harness performance improvements. Evaluations on OSWorld and on Hcompany’s internal benchmark suite, which covers e‑commerce, business software and collaboration workflows, indicate that function‑calling execution now performs nearly on par with native execution. Inside Hcompany’s Holotab harness, Holo3.1 shows more than a 25% improvement over Holo3, suggesting easier integration into third‑party agent stacks and reduced engineering overhead for deployers.

This is Hcompany’s first release that ships quantized weights intended for production. For NVFP4 the team used NVIDIA’s Model Optimizer in a W4A16 configuration (4‑bit weights, 16‑bit activations). The FP8 and NVFP4 checkpoints achieve the same OSWorld scores and sit roughly two points below the full‑precision BF16 checkpoint. In throughput measurements on DGX Spark, NVFP4 W4A16 delivers 1.41× the token throughput of FP8 and 1.74× that of BF16, giving builders explicit speed‑versus‑precision tradeoffs when choosing checkpoints.
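Two pieces of arithmetic follow from these numbers. The sketch below derives the implied FP8-over-BF16 speedup from the two reported ratios and gives a rough estimate of weight storage at each precision. The storage figures are back-of-the-envelope only: they assume "35B" is the total parameter count, that every weight is stored at the nominal bit width, and they ignore quantization scale factors, any layers kept at higher precision, activations and KV cache.

```python
# Throughput ratios reported for DGX Spark.
NVFP4_OVER_FP8 = 1.41
NVFP4_OVER_BF16 = 1.74

# If NVFP4 = 1.41 x FP8 and NVFP4 = 1.74 x BF16, then FP8 = (1.74 / 1.41) x BF16.
fp8_over_bf16 = NVFP4_OVER_BF16 / NVFP4_OVER_FP8
print(f"implied FP8 vs BF16 speedup: {fp8_over_bf16:.2f}x")


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Nominal weight storage in GB (10^9 bytes), ignoring all overheads."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 35e9  # assumption: total parameters of the 35B A3B model
for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit weights", 4)]:
    print(f"{name:>14}: ~{weight_gb(PARAMS, bits):.1f} GB")
```

Under those assumptions the implied FP8 speedup over BF16 is about 1.23×, and nominal weight storage falls from about 70 GB at BF16 to about 35 GB at FP8 and about 17.5 GB with 4-bit weights. Real checkpoint sizes will be somewhat larger than these nominal values.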

Hcompany frames Holo3.1 as a step toward universal computer‑use agents that can operate across web, desktop and mobile while running where workflows live, emphasizing deployment flexibility from cloud inference to fully local execution. The update promotes on‑device privacy and lower operating cost for teams that adopt the new quantized checkpoints. The release announcement was published June 2, 2026 and credits Maxime Langevin, Hamza Benchekroun, Axel Moyal and Emrick Sinitambirivoutin.

Sources

  1. Hugging Face Blog · 6/2/2026
