
Apple will present multiple papers, posters and invited talks at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026, held June 3–7 at the Colorado Convention Center in Denver. The company is a conference sponsor and will staff exhibition booth #231 during public exhibit hours: Friday, June 5 and Saturday, June 6 from 10:00 AM to 6:00 PM MDT, and Sunday, June 7 from 10:00 AM to 3:00 PM MDT. Researchers and developers can use the published schedule and session identifiers to follow specific contributions in person or later online.
On June 3, Apple researchers are scheduled to appear at workshops, invited talks, keynotes and affinity events. Staff will take part in the LatinX in Computer Vision mentoring hour and give invited talks at the Efficient Deep Learning for Computer Vision (ECV) workshop, where Oncel Tuzel is listed among the invited speakers. Colin Lea will deliver a keynote at the Generative AI for Sign Language (GenSign) workshop, while Tuzel and Lu Jiang are slated to give invited talks at the Efficient and On‑Device Generation (EDGE) workshop.
On June 4, Afshin Dehghan is scheduled to give an invited talk at the Video Large Language Models (VidLLMs) workshop. Apple’s accepted technical contributions span video generation, multimodal benchmarks, compression, 4D geometry and accessibility. A spotlight poster, STARFlow‑V: End‑to‑End Video Generative Modeling with Normalizing Flows, by authors including Jiatao Gu, is scheduled for Friday, June 5, 4:00–6:00 PM (Poster Session 2, #178).
Other Friday posters in the same 4:00–6:00 PM slot include a multimodal benchmark titled “From Where Things Are to What They’re For” (poster #453) and “What Matters in Practical Learned Image Compression” (poster #457). Together with the workshop talks, these contributions cover both algorithmic and evaluation work in video and image processing.
Saturday’s program features posters and demos on sign‑language annotation, 4D representation learning and audio‑visual benchmarks. “Bootstrapping Sign Language Annotations with Sign Language Models” appears in Findings Posters (7:30–9:00 AM, #035), and “Velox: Learning Representations of 4D Geometry and Appearance” will be shown in Poster Session 4 (11:45 AM–1:45 PM, #527). Another Saturday poster, AMusE, presents an audio‑visual benchmark for multi‑speaker agentic understanding (4:45–6:45 PM, #146). “AToken: A Unified Tokenizer for Vision” is scheduled for Oral Session 5B (9:00–10:15 AM).
Taken together, the schedule emphasizes efficiency and on‑device generation, multimodal benchmarks and accessibility. STARFlow‑V and AToken indicate advances relevant to video generation pipelines and unified vision tokenization, while EDGE and ECV talks underscore optimization for deployment on constrained devices. Sign‑language and multimodal alignment work could inform accessibility features and data‑efficient annotation strategies.