
PaddleOCR 3.5, released May 18, 2026, adds a Transformers inference backend so supported PaddleOCR models can run inside the Transformers runtime by setting engine="transformers". This provides a direct runtime alternative to existing Paddle backends and simplifies integrating OCR into Transformers‑centered toolchains for downstream LLM workflows. Developers building RAG, Document AI, retrieval agents or document search pipelines are the primary beneficiaries, since the change shortens the path from ingestion to LLM consumption.
The update introduces a flexible engine interface: PaddleOCR continues to manage the overall OCR pipeline while an engine parameter selects the inference backend and engine_config accepts backend‑specific options. In practice, PaddleOCR now handles the orchestration of internal OCR components and exposes configuration knobs for dtype, device placement and attention implementation through engine_config, so teams do not need to implement component orchestration themselves.
Functionally, this change affects only the inference backend layer. PaddleOCR still supplies its OCR and document parsing model series — for example PP — OCRv5 and PaddleOCR — VL‑1.5 — while Transformers becomes an additional supported runtime. Existing Paddle backends, including Paddle static graph and Paddle dynamic graph, remain available; the release adds a Transformers‑native execution path rather than replacing prior runtimes.
For builders, the main benefit is reduced integration friction before the LLM stage. Reliable ingestion of PDFs, scans, screenshots, tables, formulas and complex layouts is a common bottleneck; by making PaddleOCR models runnable inside Transformers‑centered environments, teams can more easily pipe structured outputs into downstream LLM workflows without reimplementing inference plumbing or translating between runtimes.
PaddleOCR 3.5 includes concrete quick‑start guidance. The release notes advise installing a PyTorch build that matches your hardware (the blog’s CUDA 12.6 example shows using the CUDA 12.6 wheel index) and then running: python -m pip install "paddleocr==3.5.0" "paddlex==3.5.2" "transformers>=5.4.0". A command‑line run example is paddleocr ocr -i --device gpu:0 --engine transformers, and a Python usage example demonstrates pipeline creation: pipeline = PaddleOCR(device="gpu:0", engine="transformers", engine_config={"dtype": "float32"}).
Additional implementation notes: the demo Hugging Face Space uses float32 for broad compatibility, but engine_config lets you tune dtype, device mapping and attention implementation for your hardware (CPU, ROCm, or CUDA). The transformers backend demo Space is available at https://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5 — transformers-demo, and the larger task of Document AI pipeline orchestration remains the responsibility of application builders.
Sources
Replies (0)
No replies in this topic yet.