
NVIDIA has introduced Nemotron OCR v2, a multilingual text recognition model that achieves high accuracy and processing speed by training on synthetic data.
NVIDIA has announced Nemotron OCR v2, a new multilingual text recognition model that combines high accuracy with fast document processing. Its key advantage is that it was trained on 12 million synthetic images, which significantly improves recognition accuracy for languages other than English.
Nemotron OCR v2 was designed to address the shortcomings of its predecessor. Nemotron OCR v1 showed low accuracy for languages such as Japanese, Korean, and Russian because of its limited character set and a lack of training data. Synthetic data provides both the clean labels and the scale needed to train multilingual models.
Demand for high-quality text recognition across languages is growing, and Nemotron OCR v2 enters a competitive market in which other companies are also developing OCR technologies. Synthetic data helps overcome a limitation of traditional methods, which depend on large volumes of manually annotated images that are costly to produce.
The launch of Nemotron OCR v2 could have a significant impact on the text recognition industry. The model is available to researchers and developers, which may expand document processing capabilities across languages. On a single A100 GPU it processes 34.7 pages per second, which could make processing large volumes of documents significantly more efficient.
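To put the throughput figure in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 34.7 pages per second on a single A100 comes from the announcement. The function names are illustrative, and the linear scaling across several GPUs is an assumption, not a reported result.

```python
# Throughput quoted in the article for a single A100 GPU.
PAGES_PER_SECOND = 34.7


def pages_per_hour(num_gpus: int = 1) -> float:
    """Pages processed per hour, assuming (hypothetically) linear scaling across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return PAGES_PER_SECOND * 3600 * num_gpus


def processing_time_seconds(num_pages: int, num_gpus: int = 1) -> float:
    """Estimated seconds to process num_pages at the quoted rate."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return num_pages / (PAGES_PER_SECOND * num_gpus)


if __name__ == "__main__":
    hours = processing_time_seconds(1_000_000) / 3600
    print(f"Pages per hour on one GPU: {pages_per_hour():,.0f}")
    print(f"Hours for one million pages on one GPU: {hours:.1f}")
```

At the quoted rate, one GPU handles roughly 125,000 pages per hour, so a million-page archive would take about eight hours. Real-world throughput will depend on page complexity, batching, and I/O.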