ChartNet: 1M+ synthetic charts train smaller vision–language models to outperform larger systems

6/3/2026, 4:42:20 AM

ChartNet is a synthetic dataset of more than one million chart images, each annotated with visual, linguistic and numerical encodings, built to teach vision–language models to interpret charts and improve performance on extraction and summarization tasks.

Researchers from MIT and the MIT-IBM Computing Research Lab have released ChartNet, a synthetic training set of more than one million chart images, together with the code used to generate it, and have used it to train open-source vision–language models (VLMs). Models trained on ChartNet substantially outperformed commercial models that are orders of magnitude larger on tasks such as data extraction and chart summarization, a result that could lower compute and licensing barriers for organizations that rely on automated chart interpretation.

ChartNet was produced with a novel synthetic data pipeline that generates high-quality, varied chart images and attaches detailed annotations. Every image in the dataset carries explicit encodings of visual elements, linguistic labels and the underlying numerical data, and the team is releasing the generative code so practitioners can reproduce or extend specific chart types and styles.
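To make the idea of a chart that carries its own annotations concrete, here is a minimal sketch of what one synthetic record could look like. It is not ChartNet's actual schema or pipeline: the function name `make_chart_record`, the field names and the tiny set of chart types are all hypothetical, and the sketch stops short of rendering an image.

```python
import json
import random

# Hypothetical subset of chart types, for illustration only.
CHART_TYPES = ("bar", "line")


def make_chart_record(seed: int) -> dict:
    """Build one synthetic chart record with numerical, visual and text fields."""
    rng = random.Random(seed)  # seeded so the same record can be regenerated
    categories = ["Q1", "Q2", "Q3", "Q4"]
    values = [round(rng.uniform(10, 100), 1) for _ in categories]
    peak = categories[values.index(max(values))]
    return {
        "id": f"chart-{seed:07d}",
        # Numerical encoding: the table the chart would be drawn from.
        "data": {"x": categories, "y": values},
        # Visual encoding: what a renderer would need to draw the image.
        "visual": {
            "type": rng.choice(CHART_TYPES),
            "x_label": "Quarter",
            "y_label": "Revenue",
        },
        # Linguistic encoding: a title and a summary grounded in the data.
        "text": {
            "title": "Revenue by quarter",
            "summary": f"Revenue peaks in {peak} at {max(values)}.",
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_chart_record(seed=1), indent=2))
```

Because the summary and the visual spec are derived from the same generated table, every annotation is consistent with the data by construction, which is the main appeal of a synthetic approach over labeling scraped charts by hand.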

The project targets a known training-data bottleneck: publicly available chart datasets have been limited in scale and metadata, making it difficult for models to learn how to integrate visual, numeric and textual signals. The authors note that, unlike humans, current VLMs often require thousands of examples to reliably recognize chart types and patterns, which motivated a synthetic approach to achieve much broader coverage across formats and layouts.

In benchmarking experiments reported by the team, smaller open-source models trained on ChartNet achieved accuracy on chart-focused tasks comparable to, or better than, that of much larger commercial systems. The improvements were most pronounced on structured tasks such as extracting numerical values from plots and generating concise chart summaries, indicating that targeted synthetic training can yield practical gains without the compute footprint of massive models.
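The article does not say how extraction accuracy was scored. One common convention in chart question answering is "relaxed accuracy", which counts a predicted number as correct if it falls within a small relative tolerance of the true value. The sketch below assumes a 5% tolerance; the function names are illustrative, and this is not necessarily the metric the team used.

```python
from typing import Sequence


def relaxed_match(predicted: float, target: float, tolerance: float = 0.05) -> bool:
    """Return True if predicted is within a relative tolerance of target."""
    if target == 0:
        # Relative error is undefined at zero, so require an exact match.
        return predicted == 0
    return abs(predicted - target) <= tolerance * abs(target)


def relaxed_accuracy(
    predictions: Sequence[float],
    targets: Sequence[float],
    tolerance: float = 0.05,
) -> float:
    """Fraction of predictions that match their targets within the tolerance."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        relaxed_match(p, t, tolerance) for p, t in zip(predictions, targets)
    )
    return hits / len(targets)


if __name__ == "__main__":
    # Values a model read off a chart, compared with the underlying data.
    print(relaxed_accuracy([10.2, 50.0, 0.0], [10.0, 60.0, 0.0]))
```

In the usage example, the first and third predictions count as correct and the second misses by more than 5%, so two of the three values match.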

Lead author Jovana Kondic, an EECS graduate student, described ChartNet as intended to be “a one-stop shop for chart understanding.” Co-authors include Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Aude Oliva and Rogerio Feris from MIT, the MIT-IBM Computing Research Lab and IBM Research. The work will be presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

By releasing both the dataset and the generation code, the team aims to let researchers iterate on model architectures and training recipes without excessive compute and to accelerate development of VLM features that integrate visual, numeric and textual reasoning. Immediate use cases the researchers highlight include business trend analysis, financial reporting and automated interpretation of scientific figures.

Sources

MIT News AI · 6/3/2026
