AI News
Hugging Face
Fresh topics, news, and discussions about Hugging Face on the Aivizor Community.
Hcompany adds Holo3.1 with quantized checkpoints for fast local deployment: what developers gain
Hcompany published Holo3.1 on June 2, 2026 as an iterative production release of Holo3 (released last March), adding quantized checkpoints (FP8, Q4 GGUF, NVFP4) built on the Qwen family, native function‑calling support,
Thalia Mercer
JetBrains adds Mellum2, a 12B Mixture-of-Experts Model: why it matters for developers
JetBrains released Mellum2 on June 1, 2026: a 12‑billion‑parameter Mixture‑of‑Experts model trained from scratch on natural language and code that activates 2.5B parameters per token and is available under the Apache 2.0 license.
Caspian Vale
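A back-of-the-envelope reading of the Mellum2 figures: only about a fifth of the weights take part in each token. The sketch below assumes the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, which ignores attention-score and routing costs; the dense 12B comparison model is hypothetical.

```python
# Figures from the listing: 12B total parameters, 2.5B active per token.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule-of-thumb assumption: ~2 FLOPs per active parameter per token
# (ignores attention-score FLOPs and router overhead).
flops_per_token_moe = 2 * ACTIVE_PARAMS
flops_per_token_dense = 2 * TOTAL_PARAMS  # hypothetical dense 12B model

print(f"active fraction: {active_fraction:.1%}")
print(
    f"approx. forward GFLOPs/token: MoE {flops_per_token_moe / 1e9:.0f} "
    f"vs dense {flops_per_token_dense / 1e9:.0f}"
)
```

The saving is in per-token compute, not memory: all 12B parameters still have to be loaded to serve the model.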
IBM Research: Agent Logic Is Key to Scaling AI in Enterprise Workflows
In a June 1, 2026 post by Nicholas Fuller, IBM Research argues that explicit “agent logic” (structured software primitives that steer LLMs) is necessary to run reliable, cost‑effective AI inside dynamic, long‑running
Avalon Reed
NVIDIA Releases Cosmos 3 Omni‑Model for Physical AI, Combining World Generation and Robot Policy Capabilities
On June 1, 2026 NVIDIA released Cosmos 3, an open omni‑model that unifies world generation, physical reasoning and action generation in a single Mixture‑of‑Transformers architecture.
Caspian Vale
Tutorial teaches PyTorch builders to read torch.profiler traces and map Python calls to CUDA kernels
A beginner's guide to torch.profiler was published on May 29, 2026. Authored by Aritra Roy Gosthipaty, Sayak Paul, Sergio Paniego, Rémi Ouazan Reboul and Pedro Cuenca, the Part 1 tutorial uses a minimal matmul+add
Wren Ashcroft
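For readers who want to try this before opening the tutorial, here is a minimal sketch of profiling a matmul+add with torch.profiler. The tensor sizes, the `matmul_add` label, and the trace file name are illustrative choices, not taken from the guide; on a machine without CUDA it records CPU ops only.

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, record_function


def matmul_add(a, b, c):
    # One matrix multiply followed by one elementwise add.
    return a @ b + c


device = "cuda" if torch.cuda.is_available() else "cpu"
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)  # also record GPU kernels

a = torch.randn(512, 512, device=device)
b = torch.randn(512, 512, device=device)
c = torch.randn(512, 512, device=device)

# Warm-up run so one-time setup costs do not dominate the trace.
matmul_add(a, b, c)

with profile(activities=activities, record_shapes=True) as prof:
    with record_function("matmul_add"):  # named range shown in the trace
        out = matmul_add(a, b, c)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure queued kernels finish inside the profile

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

trace_path = os.path.join(tempfile.gettempdir(), "matmul_add_trace.json")
prof.export_chrome_trace(trace_path)
```

The exported JSON can be opened in a trace viewer such as the Perfetto UI or chrome://tracing; on a CUDA run, that is where the CPU-side `aten::` ops can be lined up against the GPU kernels they launch.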
Reachy Mini runs full speech-to-speech stack locally for on-device conversations
A new step-by-step guide shows how Reachy Mini can host a full speech-to-speech pipeline on a local machine, exposing a /v1/realtime WebSocket so the robot’s conversation UI never sends audio to remote servers.
Briar Kensington
TRL adds Delta Weight Sync, cutting per-step weight transfers from gigabytes to tens of megabytes
TRL now encodes only changed parameters between RL optimizer steps as sparse safetensors uploaded to a Hub bucket; vLLM fetches those deltas, shrinking per-step transfers from gigabytes to tens of megabytes and enabling
Orion Hartwell
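A toy version of the idea, assuming NumPy arrays stand in for model tensors and a plain dict of (flat indices, new values) stands in for the sparse safetensors payload that TRL uploads. Change detection here is exact element-wise inequality, which is an assumption; this is not TRL's implementation, and the 1% update density is made up.

```python
import numpy as np


def encode_delta(old, new):
    """Return {name: (flat_indices, new_values)} for tensors whose entries changed."""
    delta = {}
    for name, new_t in new.items():
        old_flat = old[name].ravel()
        new_flat = new_t.ravel()
        idx = np.flatnonzero(old_flat != new_flat)  # positions that changed
        if idx.size:
            delta[name] = (idx.astype(np.int64), new_flat[idx].copy())
    return delta


def apply_delta(weights, delta):
    """Patch the receiver's weights in place using flat (C-order) indices."""
    for name, (idx, vals) in delta.items():
        np.put(weights[name], idx, vals)


def nbytes(delta):
    return sum(idx.nbytes + vals.nbytes for idx, vals in delta.values())


rng = np.random.default_rng(0)
trainer = {"layer.weight": rng.standard_normal((1000, 1000), dtype=np.float32)}
inference = {k: v.copy() for k, v in trainer.items()}  # receiver's stale copy

# Hypothetical optimizer step that touches about 1% of the entries.
w = trainer["layer.weight"]
mask = rng.random(w.shape) < 0.01
w[mask] += np.float32(1e-3)

delta = encode_delta(inference, trainer)
apply_delta(inference, delta)

dense_bytes = sum(t.nbytes for t in trainer.values())
print(f"dense: {dense_bytes / 1e6:.1f} MB, delta: {nbytes(delta) / 1e6:.2f} MB")
```

With roughly 1% of a 1M-entry float32 tensor changing, the dense copy is 4 MB while the delta is on the order of 0.1 MB (an 8-byte index plus a 4-byte value per changed entry). Real savings depend on how sparse each optimizer step actually is.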
Six Ettin Rerankers Released, From 17M to 1B Parameters, With Data and Training Recipe
On May 19, 2026 Tom Aarsen published six Sentence Transformers CrossEncoder rerankers built on Ettin ModernBERT, from 17M to 1B parameters, and released the full training data, distillation recipe, and evaluation scripts
Thalia Mercer
NVIDIA publishes Nemotron‑Labs Diffusion models for faster, revision‑capable text generation
NVIDIA has published the Nemotron‑Labs Diffusion family: diffusion‑capable language models at 3B, 8B and 14B parameters plus an 8B vision‑language model, with code, training recipe and a technical report.
Avalon Reed
Under 3% of PDF Pages Cause Nearly Half of OCR Inference Time, Dharma-AI Finds
A Dharma-AI report (May 22, 2026) on a domain‑specialized OCR model, DharmaOCR, finds that fewer than 3% of pages, the ones that never emit an end‑of‑sequence token, can account for almost half of batched GPU wall‑clock time;
Sable Whitaker
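Why a handful of pages can dominate: a page that never emits an end-of-sequence token keeps decoding until it hits the generation cap. The arithmetic below uses decoded tokens as a rough proxy for GPU time, and every number in it (page count, typical output length, the 16,384-token cap) is a hypothetical assumption rather than data from the DharmaOCR report.

```python
# All numbers are illustrative assumptions, not figures from the report.
N_PAGES = 1000
RUNAWAY_FRACTION = 0.03   # pages that never emit an end-of-sequence token
NORMAL_TOKENS = 500       # typical output length of a page that stops normally
MAX_NEW_TOKENS = 16_384   # generation cap that a runaway page always hits

n_runaway = round(N_PAGES * RUNAWAY_FRACTION)
n_normal = N_PAGES - n_runaway

normal_tokens = n_normal * NORMAL_TOKENS
runaway_tokens = n_runaway * MAX_NEW_TOKENS
runaway_share = runaway_tokens / (normal_tokens + runaway_tokens)

print(f"{n_runaway} runaway pages produce {runaway_share:.1%} of decoded tokens")
```

Under these assumptions 3% of pages produce about half of all decoded tokens, which matches the shape of the imbalance the report describes.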
Specialized 3B Model Beat Frontier APIs on Structured OCR at Roughly 50× Lower Cost
Dharma reported on May 22, 2026 that a 3‑billion‑parameter model, produced by a fine‑tuning pipeline, outperformed every commercial frontier API it tested on a structured OCR enterprise task while costing about fifty
Avalon Reed
AllenAI's OlmoEarth v1.1 cuts satellite-model compute up to 3× while preserving prior performance
Published May 19, 2026, OlmoEarth v1.1 is an updated family of transformer-based remote sensing models that lowers end-to-end compute costs by up to threefold while maintaining the original v1’s benchmark and partner-
Caspian Vale
PaddleOCR 3.5 Adds Transformers Backend to Simplify Document AI Integration
Published May 18, 2026, PaddleOCR 3.5 lets supported OCR and document‑parsing models run on the Transformers runtime by setting engine="transformers", giving developers a direct runtime path for RAG, Document AI,
Avalon Reed
IBM Research launches Open Agent Leaderboard to benchmark full AI agent systems
On May 18, 2026, IBM Research released the Open Agent Leaderboard and the Exgentic evaluation framework, an open benchmark that assesses entire AI agent systems — measuring performance and operational cost across
Thalia Mercer
IBM Granite releases Apache‑2.0 Multilingual Embeddings with 32K context and leading sub‑100M retrieval
IBM’s Granite team published Granite Embedding Multilingual R2 on May 14, 2026: two Apache‑2.0
Caspian Vale
Asynchronous Continuous Batching cuts LLM GPU idle time by about 24%
A technical post published May 14, 2026 by Rémi Ouazan Reboul, Pedro Cuenca and Aritra Roy Gosthipaty lays out an implementation for asynchronous continuous batching that disentangles CPU batch preparation from GPU
Avalon Reed
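A simplified timing model of why overlapping helps. It assumes fixed, made-up per-step CPU preparation and GPU compute times and a one-batch lookahead, and it ignores the real difficulty that in continuous batching the next batch depends on which requests the current step finishes; it is not the implementation from the post.

```python
from typing import Sequence


def sync_makespan(prep: Sequence[float], compute: Sequence[float]) -> float:
    """CPU prepares batch i, then the GPU runs it; the two never overlap."""
    return sum(p + c for p, c in zip(prep, compute))


def async_makespan(prep: Sequence[float], compute: Sequence[float]) -> float:
    """One-batch lookahead: the CPU prepares batch i+1 while the GPU runs batch i."""
    cpu_free = 0.0  # time the CPU can start preparing the next batch
    gpu_free = 0.0  # time the GPU finishes its current batch
    for p, c in zip(prep, compute):
        ready = cpu_free + p          # batch i is prepared
        start = max(ready, gpu_free)  # GPU idles only if the batch is not ready yet
        gpu_free = start + c
        cpu_free = start              # lookahead slot frees once the GPU takes batch i
    return gpu_free


def gpu_idle_fraction(makespan: float, compute: Sequence[float]) -> float:
    return 1.0 - sum(compute) / makespan


prep = [2.0] * 4     # hypothetical CPU time per step (ms)
compute = [6.0] * 4  # hypothetical GPU time per step (ms)

for name, fn in [("sync", sync_makespan), ("async", async_makespan)]:
    total = fn(prep, compute)
    print(f"{name}: {total:.0f} ms, GPU idle {gpu_idle_fraction(total, compute):.0%}")
```

With 2 ms of CPU work and 6 ms of GPU work per step, the serial schedule leaves the GPU idle 25% of the time over four steps and the overlapped one about 8%; these numbers are illustrative only and unrelated to the roughly 24% figure reported in the post.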
AWS lays out infrastructure and open‑source building blocks for scaling foundation‑model pre‑training
On May 11, 2026, AWS published a technical post by Keita Watanabe, Pavel Belevich and Aman Shanbhag that lays out the infrastructure and open‑source software building blocks needed to scale foundation‑model pre‑training,
Sable Whitaker
CyberSecQwen‑4B releases as a Locally Runnable 4B Model for Defensive Cybersecurity
A 4‑billion‑parameter model, CyberSecQwen‑4B, was published May 8, 2026 under Apache 2.0 and built for the AMD Developer Hackathon.
Briar Kensington
MachinaCheck runs on AMD MI300X to deliver 30‑second on‑premise manufacturability reports for CNC shops
MachinaCheck, reported May 10, 2026, parses STEP files with cadquery/OpenCASCADE and runs Qwen 2.
Orion Hartwell
OncoAgent debuts MI300X-optimized dual-tier, multi-agent oncology Decision Support system
A May 2026 technical preprint presents OncoAgent, an open-source, on‑premises clinical decision support stack that routes oncology queries between 9B and 27B QLoRA‑fine‑tuned models via a multi‑agent LangGraph
Thalia Mercer
AllenAI's EMO MoE model induces modular experts from data in end-to-end pretraining
AllenAI announced EMO on May 8, 2026: a 14B-parameter sparse mixture-of-experts (MoE) model pretrained end-to-end so that modular structure emerges from data.
Wren Ashcroft
Researchers fine-tune Qwen3-1.7B into MedQA on AMD Instinct MI300X using ROCm
On May 8, 2026 Harikrishna (HK2184) published MedQA, a clinical multiple‑choice QA model fine‑tuned with LoRA on a single AMD Instinct MI300X via ROCm.
Elara Winslow
ServiceNow finds vLLM V1 migration shifted PipelineRL rollout logprobs and early GSPO metrics
ServiceNow engineers found that migrating PipelineRL inference from vLLM 0.8.5 to a vLLM V1 rewrite (vLLM 0.18.1) produced rollout logprobs the trainer did not expect, shifting early GSPO metrics.
Orion Hartwell
Open ASR Leaderboard moves Appen and DataoceanAI English test sets to private hosting to curb bench‑specific tuning
On May 6, 2026, Open ASR Leaderboard maintainers said several high‑quality English test sets from Appen Inc. and DataoceanAI will be hosted privately to reduce test‑set contamination and “benchmaxxing.”
Thalia Mercer