Datalab Releases Lift: A 9B Parameter Model for PDF JSON Extraction

Datalab has introduced Lift, a 9B open-weights vision model designed to convert PDFs into structured JSON using schema-constrained decoding.

Ali Zayed · Founder & Editor · June 25, 2026 · 1 min read · ✓ Independently fact-checked
The quick version
  • Lift is a 9B parameter open-weights vision model specifically tuned for document processing and structured data extraction.
  • The model uses schema-constrained decoding so that its output always conforms to a predefined JSON schema.
  • It features a trained abstention mechanism that returns null for missing data, reducing hallucinations compared to standard models.
  • According to MarkTechPost, the model achieved a 90.2% field accuracy score on a benchmark of 225 documents.

Datalab has released Lift, a 9B parameter open-weights vision model built to extract structured data from PDFs and images. Unlike general-purpose vision models, which often struggle with document layout and hallucinate missing fields, Lift is built to map document content directly into predefined JSON schemas.

According to reporting from MarkTechPost, the model uses schema-constrained decoding to force the output into a valid structure. This guarantees that the output parses against the schema, although it does not by itself guarantee that the extracted values are correct. For enterprise reliability, the more important feature may be trained abstention: if the model cannot find a piece of information in the source document, it is trained to return a null value rather than invent data to fill the schema.
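To make the idea concrete, here is a minimal sketch of what consuming schema-shaped output with nullable fields could look like downstream. The invoice schema, the field names, and the sample output are all hypothetical and written for illustration; this does not call Lift or reproduce its actual schema format, which the source does not describe.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
# Every field may also be null, which is how abstention would show up.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "total": (int, float),
    "due_date": str,
}


def check_against_schema(raw_json, schema):
    """Parse model output and return (record, abstained_fields).

    Raises ValueError if the keys differ from the schema and
    TypeError if a non-null value has the wrong type.
    """
    record = json.loads(raw_json)
    if set(record) != set(schema):
        raise ValueError(f"key mismatch: {sorted(set(record) ^ set(schema))}")

    abstained = []
    for field, expected_type in schema.items():
        value = record[field]
        if value is None:
            # The model declined to answer for this field.
            abstained.append(field)
        elif not isinstance(value, expected_type):
            raise TypeError(f"{field!r} has unexpected type {type(value).__name__}")
    return record, abstained


# Made-up output, assuming the due date is absent from the document.
sample_output = '{"invoice_number": "INV-1042", "total": 318.5, "due_date": null}'
record, abstained = check_against_schema(sample_output, INVOICE_SCHEMA)
print(abstained)  # ['due_date']
```

A check like this is useful even when decoding is constrained, because it lets a pipeline route abstained fields to a human reviewer instead of silently storing nulls.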

Why it matters

Document digitization remains a bottleneck for many businesses, as traditional OCR and general LLMs frequently produce inconsistent, unparsable results. By focusing on a smaller, 9B parameter architecture, Datalab is prioritizing efficiency and precision over raw general knowledge. This makes the tool easier to host locally or on private cloud infrastructure compared to massive, opaque proprietary models. For those managing complex data workflows, finding the right infrastructure is as critical as the model itself; you can see how we compare various utilities in our best AI coding tools guide.

What it means for you

If your current workflow relies on cleaning up messy text output from generic vision models, Lift offers a more disciplined alternative. The 90.2% field accuracy reported on the 225-document benchmark makes the model worth testing on your own documents, but it also means roughly one field in ten was wrong on that benchmark, so pipelines where data integrity is non-negotiable still need validation or human review. Because the weights are open, developers can integrate the model into their own pipelines without relying on external API stability or paying per-token costs for standard extraction tasks.
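If you want a comparable number for your own documents, a field-level accuracy score is simple to compute. The sketch below assumes exact-match scoring per field and counts a correct null as a match; the source does not say how the reported benchmark scores fields, so treat this as one reasonable definition rather than the benchmark's method. The records are made up for illustration.

```python
def field_accuracy(predictions, references):
    """Fraction of reference fields whose predicted value matches exactly.

    A field that is null in both prediction and reference counts as correct,
    so a justified abstention is rewarded rather than penalized.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    correct = 0
    total = 0
    for predicted, expected in zip(predictions, references):
        for field, expected_value in expected.items():
            total += 1
            if field in predicted and predicted[field] == expected_value:
                correct += 1
    return correct / total if total else 0.0


# Two hypothetical documents with three fields each; one total is wrong.
predictions = [
    {"invoice_number": "INV-1042", "total": 318.5, "due_date": None},
    {"invoice_number": "INV-1043", "total": 90.0, "due_date": "2026-07-01"},
]
references = [
    {"invoice_number": "INV-1042", "total": 318.5, "due_date": None},
    {"invoice_number": "INV-1043", "total": 99.0, "due_date": "2026-07-01"},
]

score = field_accuracy(predictions, references)
print(f"{score:.3f}")  # 0.833
```

Exact matching is strict: a date written in a different format or a total with different rounding counts as an error, so you may want to normalize values before comparing.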

90.2% field accuracy on a 225-document benchmark

Frequently asked questions

What is Datalab Lift?

Lift is a 9B parameter open-weights vision model designed to extract data from PDFs and images into structured JSON format.

How does Lift handle missing data?

The model uses trained abstention to return a null value when information is missing, which reduces the risk of hallucinated data.

Is Datalab Lift open source?

Datalab has released Lift as an open-weights model, allowing users to deploy it in their own environments.

Source: MarkTechPost. Published June 25, 2026.

Ali Zayed
Founder & Editor · AI Tools Worth

Ali has hands-on tested 50+ AI tools and tracks model releases daily. Every verdict here comes from real, paid usage — never vendor demos or sponsored placements.

AI Tools Worth is independent and unsponsored. Some linked guides contain affiliate links — they never change our verdicts.