GPT-5.5 set a new state of the art on OfficeQA Pro, cutting errors 46% versus GPT-5.4.
Databricks will roll out GPT-5.5 into its enterprise agent workflows after the model achieved a new state-of-the-art result on OfficeQA Pro, the company's benchmark for complex document handling. The milestone matters because OfficeQA Pro is designed to mirror the document parsing and grounded reasoning problems that commonly break production agent systems.
OfficeQA Pro evaluates parsing, retrieval and grounded reasoning across challenging document types (scanned PDFs, legacy files and long-context documents) rather than simple short-text QA. That mix tests an agent's ability to extract structured data from noisy scans, locate relevant passages across lengthy records, and produce answers tied to source material in the formats that enterprises actually store and receive.
In Databricks’ agent harness setting, GPT-5.5 cut errors by 46% compared with GPT-5.4 and became the first model to exceed 50% accuracy on OfficeQA Pro. Those two metrics, a large relative drop in error rate and a new absolute accuracy threshold, mark a measurable improvement in how an agent backed by the model handles the benchmark's hardest scenarios.
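The two figures measure different things: the 46% is a relative reduction in error rate, while the 50% mark is an absolute accuracy level. A minimal Python sketch of how they relate follows. The function names are illustrative, and the 55% accuracy in the usage section is a hypothetical placeholder, not a reported score; the article states only that accuracy exceeded 50%.

```python
def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the older model's error rate that the newer model eliminated."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    if old_error <= 0.0:
        raise ValueError("the older model has no errors to reduce")
    return (old_error - new_error) / old_error


def implied_old_accuracy(new_accuracy: float, reduction: float) -> float:
    """Accuracy the older model would have had, given the newer model's
    accuracy and the relative error reduction between the two."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    new_error = 1.0 - new_accuracy
    old_error = new_error / (1.0 - reduction)
    return 1.0 - old_error


if __name__ == "__main__":
    # Hypothetical value for illustration only; not a published result.
    hypothetical_new_accuracy = 0.55
    reported_reduction = 0.46  # the 46% error cut cited in the article

    old = implied_old_accuracy(hypothetical_new_accuracy, reported_reduction)
    print(f"Implied older-model accuracy: {old:.1%}")
    print(f"Round-trip reduction: "
          f"{relative_error_reduction(old, hypothetical_new_accuracy):.0%}")
```

Under that hypothetical 55% figure, a 46% error cut would imply the older model scored roughly 16.7%, which shows why a relative error reduction and an absolute accuracy number are worth reporting side by side.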