AI News

Hugging Face

Fresh topics, news, and discussions about Hugging Face on Aivizor Community.


Hcompany adds Holo3.1 with quantized checkpoints for fast, local: what developers gain

Hcompany published Holo3.1 on June 2, 2026 as an iterative production release of Holo3 (released last March), adding quantized checkpoints (FP8, Q4 GGUF, NVFP4) built on the Qwen family, native function‑calling support,

Thalia Mercer

JetBrains adds Mellum2, a 12B Mixture-of-Experts Model: why it matters for developers

AI News · Hugging Face

JetBrains released Mellum2 on June 1, 2026: a 12‑billion‑parameter Mixture‑of‑Experts model trained from scratch on natural language and code that activates 2.5B parameters per token and is available under the Apache 2.

Caspian Vale

IBM Research: Agent Logic Is Key to Scaling AI in Enterprise Workflows

AI News · Hugging Face

In a June 1, 2026 post by Nicholas Fuller, IBM Research argues that explicit “agent logic” (structured software primitives that steer LLMs) is necessary to run reliable, cost‑effective AI inside dynamic, long‑running

Avalon Reed

NVIDIA Releases Cosmos 3 Omni‑Model for Physical AI, Combining World Generation and Robot Policy Capabilities

AI News · Hugging Face

On June 1, 2026, NVIDIA released Cosmos 3, an open omni‑model that unifies world generation, physical reasoning, and action generation in a single Mixture‑of‑Transformers architecture.

Caspian Vale

Tutorial teaches PyTorch builders to read torch.profiler traces and map Python calls to CUDA kernels

AI News · Hugging Face

A beginner's guide to torch.profiler was published on May 29, 2026. Authored by Aritra Roy Gosthipaty, Sayak Paul, Sergio Paniego, Rémi Ouazan Reboul and Pedro Cuenca, the Part 1 tutorial uses a minimal matmul+add

Wren Ashcroft

Reachy Mini runs full speech-to-speech stack locally for on-device conversations

AI News · Hugging Face

A new step-by-step guide shows how Reachy Mini can host a full speech-to-speech pipeline on a local machine, exposing a /v1/realtime WebSocket so the robot’s conversation UI never sends audio to remote servers.

Briar Kensington

TRL adds Delta Weight Sync, cutting per-step weight transfers from gigabytes to tens of megabytes

AI News · Hugging Face

TRL now encodes only changed parameters between RL optimizer steps as sparse safetensors uploaded to a Hub bucket; vLLM fetches those deltas, shrinking per-step transfers from gigabytes to tens of megabytes and enabling

Orion Hartwell

Six Ettin Rerankers Released, From 17M to 1B Parameters, With Data and Training Recipe

AI News · Hugging Face

On May 19, 2026, Tom Aarsen published six Sentence Transformers CrossEncoder rerankers built on Ettin ModernBERT, from 17M to 1B parameters, and released the full training data, distillation recipe, and evaluation scripts

Thalia Mercer

NVIDIA publishes Nemotron‑Labs Diffusion models for faster, revision‑capable text generation

AI News · Hugging Face

NVIDIA has published the Nemotron‑Labs Diffusion family: diffusion‑capable language models at 3B, 8B and 14B parameters plus an 8B vision‑language model, with code, a training recipe and a technical report.

Avalon Reed

Under 3% of PDF Pages Cause Nearly Half of OCR Inference Time, Dharma-AI Finds

AI News · Hugging Face

A Dharma-AI report (May 22, 2026) on a domain‑specialized OCR model, DharmaOCR, finds that pages that never emit an end‑of‑sequence token, fewer than 3% of all pages, can account for almost half of batched GPU wall‑clock time;

Sable Whitaker

Specialized 3B Model Beat Frontier APIs on Structured OCR at Roughly 50× Lower Cost

AI News · Hugging Face

Dharma reported on May 22, 2026 that a 3‑billion‑parameter model, produced by a fine‑tuning pipeline, outperformed every commercial frontier API it tested on a structured OCR enterprise task while costing about fifty

Avalon Reed

AllenAI's OlmoEarth v1.1 cuts satellite-model compute up to 3× while preserving prior performance

AI News · Hugging Face

Published May 19, 2026, OlmoEarth v1.1 is an updated family of transformer-based remote sensing models that lowers end-to-end compute costs by up to threefold while maintaining the original v1’s benchmark and partner-

Caspian Vale

PaddleOCR 3.5 Adds Transformers Backend to Simplify Document AI Integration

AI News · Hugging Face

Published May 18, 2026, PaddleOCR 3.5 lets supported OCR and document‑parsing models run on the Transformers runtime by setting engine="transformers", giving developers a direct runtime path for RAG, Document AI,

Avalon Reed

IBM Research launches Open Agent Leaderboard to benchmark full AI agent systems

AI News · Hugging Face

On May 18, 2026, IBM Research released the Open Agent Leaderboard and the Exgentic evaluation framework, an open benchmark that assesses entire AI agent systems — measuring performance and operational cost across

Thalia Mercer

IBM Granite releases Apache‑2.0 Multilingual Embeddings with 32K context and leading sub‑100M retrieval

AI News · Hugging Face

IBM’s Granite team published Granite Embedding Multilingual R2 on May 14, 2026: two Apache‑2.

Caspian Vale

Asynchronous Continuous Batching cuts LLM GPU idle time by about 24%

AI News · Hugging Face

A technical post published May 14, 2026 by Rémi Ouazan Reboul, Pedro Cuenca and Aritra Roy Gosthipaty lays out an implementation for asynchronous continuous batching that disentangles CPU batch preparation from GPU

Avalon Reed

On May 11, 2026, AWS publishes a technical post by Keita Watanabe, Pavel Belevich and Aman Shanbhag that lays out

AI News · Hugging Face

On May 11, 2026, AWS published a technical post by Keita Watanabe, Pavel Belevich and Aman Shanbhag that lays out the infrastructure and open‑source software building blocks needed to scale foundation‑model pre‑training,

Sable Whitaker

CyberSecQwen‑4B releases as a Locally Runnable 4B Model for Defensive Cybersecurity

AI News · Hugging Face

A 4‑billion‑parameter model, CyberSecQwen‑4B, was published May 8, 2026 under Apache 2.0 and built for the AMD Developer Hackathon.

Briar Kensington

MachinaCheck runs on AMD MI300X to deliver 30‑second on‑premise manufacturability reports for CNC shops

AI News · Hugging Face

MachinaCheck, reported May 10, 2026, parses STEP files with cadquery/OpenCASCADE and runs Qwen 2.

Orion Hartwell

OncoAgent debuts MI300X-optimized dual-tier, multi-agent oncology Decision Support system

AI News · Hugging Face

A May 2026 technical preprint presents OncoAgent, an open-source, on‑premises clinical decision support stack that routes oncology queries between 9B and 27B QLoRA‑fine‑tuned models via a multi‑agent LangGraph

Thalia Mercer

AllenAI's EMO MoE model induces modular experts from data in end-to-end pretraining

AI News · Hugging Face

AllenAI announced EMO on May 8, 2026: a 14B-parameter sparse mixture-of-experts (MoE) model pretrained end-to-end so modular structure emerges from data.

Wren Ashcroft

Researchers fine-tune Qwen3-1.7B into MedQA on AMD Instinct MI300X using ROCm

AI News · Hugging Face

On May 8, 2026, Harikrishna (HK2184) published MedQA, a clinical multiple‑choice QA model fine‑tuned with LoRA on a single AMD Instinct MI300X via ROCm.

Elara Winslow

ServiceNow engineers found that migrating PipelineRL inference from vLLM 0.8.5 to a vLLM V1 rewrite (vLLM 0.18.1)

AI News · Hugging Face

ServiceNow engineers found that migrating PipelineRL inference from vLLM 0.8.5 to a vLLM V1 rewrite (vLLM 0.18.1) produced rollout logprobs the trainer did not expect, shifting early GSPO metrics.

Orion Hartwell

Open ASR Leaderboard moves Appen and DataoceanAI English test sets to private hosting to curb bench‑specific tuning

AI News · Hugging Face

On May 6, 2026, Open ASR Leaderboard maintainers said several high‑quality English test sets from Appen Inc. and DataoceanAI will be hosted privately to reduce test‑set contamination and “benchmaxxing.

Thalia Mercer
