Learned Image Codec Cuts Bitrates 2.3–3× Versus Engineered Codecs While Meeting Mobile Latency Targets

5/8/2026, 5:26:16 AM

A major advantage of learned codecs over their hand-engineered traditional counterparts is that they can be optimized directly for the human visual system.

A May 2026 paper presents a learned image compression system that achieves large perceptual bitrate reductions while meeting strict on-device speed targets, offering a practical path to deployment on mobile hardware. The authors report 2.3–3× bitrate savings against a set of leading engineered codecs (AV1, AV2, VVC, ECM, and JPEG‑AI) when judged by subjective perceptual assessments, and 20–40% reductions versus the best prior learned codecs while targeting perceptual quality. Those gains matter because they promise lower storage and bandwidth costs without degrading what humans perceive, and they come with end-to-end runtimes that the authors argue are compatible with real-world mobile workflows.

To close the gap between perceptually optimized research codecs and the engineering constraints of real deployment, the team framed codec design as a tradeoff between perceptual fidelity and practical latency. They evaluated a wide set of modeling choices: varying encoder and decoder backbones, balancing rate‑distortion and perceptual loss objectives, and introducing novel ablation techniques intended to preserve perceptual quality while trimming compute. Central to the approach was a performance‑aware neural architecture search (NAS) that explored millions of backbone configurations. The NAS objective explicitly balanced on‑device latency against perceptual compression metrics, yielding candidate architectures that the authors combined with additional optimizations into a single end‑to‑end codec.
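To make the idea of a latency-aware search objective concrete, here is a minimal Python sketch. It is not the authors' method. It assumes a toy two-parameter search space, stand-in estimators for latency and perceptual cost, a hypothetical 200 ms budget, and plain random search in place of whatever NAS procedure the paper uses. Every name and constant in it is illustrative.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical backbone configuration (not from the paper)."""
    depth: int
    width: int


def estimate_latency_ms(c: Candidate) -> float:
    # Stand-in latency model: cost grows with depth * width.
    # A real search would measure or predict latency on the target device.
    return 0.1 * c.depth * c.width


def estimate_perceptual_cost(c: Candidate) -> float:
    # Stand-in quality proxy in (0, 1]; lower is better, and larger
    # configurations receive a lower cost.
    return 1.0 / (1.0 + 0.001 * c.depth * c.width)


def score(c: Candidate, budget_ms: float, penalty: float = 10.0) -> float:
    # Scalarized objective: perceptual cost plus a penalty proportional to
    # the fraction by which the latency budget is exceeded.
    overshoot = max(0.0, estimate_latency_ms(c) / budget_ms - 1.0)
    return estimate_perceptual_cost(c) + penalty * overshoot


def random_search(n: int, budget_ms: float, seed: int = 0) -> Candidate:
    # Sample n candidates and keep the one with the lowest score.
    rng = random.Random(seed)
    samples = [
        Candidate(depth=rng.randint(2, 16), width=rng.choice([32, 64, 128, 256]))
        for _ in range(n)
    ]
    return min(samples, key=lambda c: score(c, budget_ms))


if __name__ == "__main__":
    best = random_search(n=200, budget_ms=200.0)
    print(best, round(estimate_latency_ms(best), 1), "ms")
```

With these toy estimators, the penalty term tends to steer the search toward the largest configuration that still fits the budget. A real system would replace both estimators with measured on-device latency and a perceptual quality metric.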

Evaluation combined objective metrics with rigorous subjective user studies. According to the reported perceptual assessments, the new codec delivered 2.3–3× bitrate savings compared with the engineered codec set and reduced bitrates by 20–40% compared with prior learned codecs when operating at comparable perceived quality. The paper emphasizes that these figures derive from subjective testing rather than purely from traditional rate‑distortion curves, reflecting the system’s aim to optimize for the human visual system rather than for numeric proxies alone.
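The two kinds of figures quoted above (an N× saving and a percentage reduction) can be put on a common scale with simple arithmetic. The sketch below assumes that "N× bitrate savings" means the baseline codec needs N times as many bits at matched perceived quality. That is a reading of the article's wording, not a definition taken from the paper, and the function names are illustrative.

```python
def ratio_to_percent_reduction(ratio: float) -> float:
    """Percent fewer bits, given that the baseline uses `ratio` times as many bits."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1.0 - 1.0 / ratio) * 100.0


def percent_reduction_to_ratio(percent: float) -> float:
    """Inverse conversion: how many times more bits the baseline uses."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    for ratio in (2.3, 3.0):
        print(f"{ratio}x fewer bits ~ {ratio_to_percent_reduction(ratio):.1f}% reduction")
    for percent in (20.0, 40.0):
        print(f"{percent:.0f}% reduction ~ {percent_reduction_to_ratio(percent):.2f}x fewer bits")
```

Under that reading, 2.3–3× corresponds to roughly 57–67% fewer bits than the engineered codecs, and a 20–40% reduction corresponds to roughly 1.25–1.67× fewer bits than prior learned codecs.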

The implementation targets mobile deployment and includes device‑specific benchmarks. On an iPhone 17 Pro Max, the system encodes 12 MP images in about 230 ms and decodes them in about 150 ms. The authors note that those on‑device times are faster than most of the top machine‑learning‑based codecs they benchmarked on an NVIDIA V100 GPU, a comparison the paper uses to underscore the design focus on practical speed rather than peak throughput on large datacenter GPUs. Achieving this level of latency required tight co‑design between architecture choice, perceptual loss functions, and implementation optimizations tailored to the target hardware.
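For a sense of scale, the reported timings can be converted to throughput. This is back-of-the-envelope arithmetic on the approximate figures above, assuming 12 MP means 12 million pixels. The helper name is illustrative.

```python
def throughput_mp_per_s(megapixels: float, latency_ms: float) -> float:
    """Megapixels processed per second for one image of the given size."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return megapixels / (latency_ms / 1000.0)


if __name__ == "__main__":
    IMAGE_MP = 12.0  # image size reported in the article
    # Approximate encode and decode times reported in the article.
    for label, latency_ms in (("encode", 230.0), ("decode", 150.0)):
        rate = throughput_mp_per_s(IMAGE_MP, latency_ms)
        print(f"{label}: ~{rate:.0f} MP/s")
```

That works out to roughly 52 MP/s for encoding and 80 MP/s for decoding on the phone.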

The paper lists Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, and Oren Rippel as authors, and it is linked from the project’s publication page and code repository for inspection and reproduction. The authors position their codec as a step toward compression that is both perceptual and practical: if adopted, the combination of perceptual optimization and runtime‑aware design could shrink image bitrates while keeping mobile latency low. The authors provide source code on GitHub to help the community reproduce results and test portability across devices.

Sources

Apple Machine Learning Research · 5/7/2026
