
Databricks has announced a partnership with OpenAI to integrate OpenAI's latest flagship language model, GPT-5.5, into its platform. The integration, managed through Unity AI Gateway, is designed to give enterprise customers secure, scalable access to advanced AI capabilities. As part of the agreement, an updated version of Codex, OpenAI's specialized agent for working with software code, will also become available on the Databricks platform, now powered by GPT-5.5.
A key feature of GPT-5.5 is its ability to carry out complex, multi-step tasks autonomously in an enterprise environment. Unlike previous generations, which required step-by-step supervision, the new system can plan its own actions, use a variety of software tools, work through ambiguous situations, and check its own results. According to the developers, the model handles the full cycle of knowledge work: from searching for and analyzing information to writing code, creating documents, working with spreadsheets, and operating software.
To evaluate how these claimed improvements play out in real-world business scenarios, a Databricks research team of Hanlin Tang, Ahmed Bilal, Arnav Singhvi, Ivan Zhou, and Harish Gaur tested the model extensively. Their main tool was OfficeQA, an internal benchmark built specifically to measure how well language models handle complex, multi-stage analytical tasks. The test suite is drawn from a corpus of 89,000 pages of official U.S. Treasury Department bulletins. This corpus makes it possible to test information retrieval from lengthy documents, interpretation of complex tables, and precise calculation on real-world data.
The first stage of testing aimed to measure the model's peak capability when document retrieval was already taken care of. In the OfficeQA Pro LLM test, which combined oracle PDFs (the relevant source documents supplied to the model in advance) with web search, GPT-5.5 completed 64.66 percent of tasks successfully. Under the same conditions, the previous version, GPT-5.4, scored 57.14 percent. According to the Databricks researchers, this is a relative improvement of about 13 percent and a new state-of-the-art result on this benchmark of complex reasoning over documents.
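The 13 percent figure is a relative gain, not a difference in percentage points; the absolute gap is about 7.5 points. A minimal Python check, assuming both scores are completion rates measured on the same task set; the helper `relative_change` is a hypothetical name introduced here for illustration:

```python
def relative_change(old: float, new: float) -> float:
    """Hypothetical helper: change from old to new, as a percentage of old."""
    return (new - old) / old * 100

# Assumption: both figures are completion rates (%) on the same
# OfficeQA Pro LLM task set, as reported by Databricks.
gpt_5_4 = 57.14
gpt_5_5 = 64.66

gain = relative_change(gpt_5_4, gpt_5_5)

print(f"Absolute gain: {gpt_5_5 - gpt_5_4:.2f} percentage points")
print(f"Relative gain: {gain:.1f}%")
```

Running it prints an absolute gain of 7.52 percentage points and a relative gain of 13.2%, consistent with the "approximately 13 percent" quoted by the researchers.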
The most telling results came from the second stage, which simulated a fully autonomous workflow with no prior data preparation. In OfficeQA Pro Agent Harness mode, the model had to locate the relevant documents on its own, analyze their structure, and perform calculations using the Codex agent's toolkit. In this harder scenario, GPT-5.5 scored 52.63 percent, a substantial jump from the 36.10 percent achieved by GPT-5.4. That is a relative improvement of roughly 46 percent; put another way, the share of end-to-end tasks the model failed fell from 63.90 to 47.37 percent, a relative reduction of about 26 percent. The researchers see this as evidence of the updated model's practical applicability.
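For the agent-harness results, the relative gain in completion rate and the relative drop in the failure rate are different quantities, and only the first comes out near 46 percent. The sketch below computes both, again assuming the scores are completion rates on one shared task set and that every task not completed counts as a failure; `relative_change` is a hypothetical helper, not part of any Databricks tooling:

```python
def relative_change(old: float, new: float) -> float:
    """Hypothetical helper: change from old to new, as a percentage of old."""
    return (new - old) / old * 100

# Assumption: completion rates (%) on the same OfficeQA Pro Agent Harness tasks.
old_success = 36.10  # GPT-5.4
new_success = 52.63  # GPT-5.5

# Assumption: failure rate = share of tasks not completed successfully.
old_failure = 100 - old_success  # 63.90
new_failure = 100 - new_success  # 47.37

success_gain = relative_change(old_success, new_success)
# Negate so that a falling failure rate is reported as a positive drop.
failure_drop = -relative_change(old_failure, new_failure)

print(f"Relative gain in completion rate: {success_gain:.1f}%")
print(f"Relative drop in failure rate: {failure_drop:.1f}%")
```

Running it prints a relative gain of about 45.8 percent in completion rate and a relative drop of about 25.9 percent in failure rate.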
Deploying GPT-5.5 through Databricks infrastructure and the secure Unity AI Gateway addresses a key concern for businesses adopting generative AI: keeping control over their data and ensuring its security. The original publication does not give exact public release dates, stating only that availability is coming "in the near future," but the reported metrics point to significant progress in agent systems. The model's ability to take on unstructured tasks from start to finish marks a new stage in automating routine analytical and engineering work in the enterprise.