Vertesia releases semantic document preparation service

Vertesia, provider of a unified, low-code platform for developing, deploying, and operating custom generative AI applications, is introducing its Semantic DocPrep service, a cloud-based API service that the company says is designed to eliminate hallucinations and speed the development of generative AI (GenAI) applications.

“The two concerns we hear most from enterprise leaders are consistent: 95% accuracy isn’t good enough, and data preparation is a costly, time-consuming challenge,” said Chris McLaughlin, chief revenue officer at Vertesia. “Our Semantic DocPrep service was built to solve both—giving developers a set of APIs to automate document preparation and significantly improve the accuracy and relevancy of LLM outputs. It removes two major hurdles to building reliable, enterprise-grade GenAI applications.”

With five patents pending, Vertesia’s new Semantic DocPrep service works by converting even the most complex documents, such as invoices, annual reports, and regulatory filings, into richly structured, semantically tagged XML—without rewriting or altering the source, the company said.

By preserving the original structure, relationships, and context, the service is intended to let large language models (LLMs) interpret documents accurately without fabricating or misrepresenting information, according to Vertesia.

Unlike conventional tools that flatten or rewrite inputs, Vertesia’s approach deconstructs documents at the page level and automatically selects the most appropriate AI model for each page’s content, whether it’s dense text, tabular data, images, or a mix, according to the company.

Some pages are best handled by LLMs, others by OCR or vision models. The hybrid method also bars models from rewriting content, so the original text is preserved as is, without corrections.
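The page-level routing described above can be illustrated with a minimal Python sketch. Vertesia has not published how it selects a model, so everything here is an assumption made for illustration: the `PageStats` fields, the thresholds, and the route names `llm`, `ocr`, and `vision` are hypothetical and do not reflect the company's actual logic.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page measurements; not part of any Vertesia API."""
    text_chars: int     # characters of extractable text on the page
    table_cells: int    # number of table cells detected on the page
    image_area: float   # fraction of the page covered by images, 0.0 to 1.0


def choose_handler(page: PageStats) -> str:
    """Pick a processing route for one page using made-up thresholds."""
    # Mostly an image with little extractable text: treat as a scan.
    if page.image_area >= 0.5 and page.text_chars < 200:
        return "ocr"
    # Heavy tables, or text mixed with sizable images: use a vision model.
    if page.table_cells >= 20 or (page.image_area >= 0.2 and page.text_chars >= 200):
        return "vision"
    # Otherwise the page is dense text.
    return "llm"


pages = [
    PageStats(text_chars=3200, table_cells=0, image_area=0.0),   # dense text
    PageStats(text_chars=150, table_cells=0, image_area=0.9),    # scanned page
    PageStats(text_chars=900, table_cells=48, image_area=0.1),   # tabular data
]
routes = [choose_handler(p) for p in pages]
print(routes)
```

The point of the sketch is only that the decision is made per page rather than per document, so a single filing can pass through several different models.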

Designed for developers building custom GenAI apps and retrieval-augmented generation (RAG) systems, Semantic DocPrep fits seamlessly into modern AI pipelines, Vertesia said.

Developers send documents—PDFs generated from Word, PowerPoint, or other formats—via an API, and receive structured XML output that’s ready for chunking, indexing, and model ingestion. No setup or model training is required.
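As a rough illustration of the chunking step mentioned above, the following Python sketch splits semantically tagged XML into one chunk per section using only the standard library. The article does not describe the service's output schema or endpoints, so the element names (`document`, `page`, `section`, `paragraph`, `table`), the attributes, and the sample content are all hypothetical, and no API call is shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of structured output; the real schema is not public here.
SAMPLE_XML = """<document>
  <page number="1">
    <section title="Invoice summary">
      <paragraph>Invoice 1001 is due on receipt.</paragraph>
      <table><row><cell>Widget</cell><cell>2</cell></row></table>
    </section>
  </page>
  <page number="2">
    <section title="Terms">
      <paragraph>Payment is accepted by bank transfer.</paragraph>
    </section>
  </page>
</document>"""


def chunk_by_section(xml_text: str) -> list[dict]:
    """Return one chunk per <section>, keeping page number and title as metadata."""
    root = ET.fromstring(xml_text)
    chunks = []
    for page in root.iter("page"):
        for section in page.iter("section"):
            # Collect all nested text and collapse whitespace; the wording itself
            # is left untouched, in keeping with the no-rewrite approach.
            text = " ".join(" ".join(section.itertext()).split())
            chunks.append({
                "page": int(page.get("number")),
                "title": section.get("title"),
                "text": text,
            })
    return chunks


chunks = chunk_by_section(SAMPLE_XML)
for chunk in chunks:
    print(chunk["page"], chunk["title"], "->", chunk["text"])
```

Chunking on tagged section boundaries, rather than on fixed character counts, keeps each chunk aligned with a unit of meaning and lets the page number and title travel with it into the index.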

Semantic DocPrep is part of Vertesia’s broader platform, which provides the end-to-end infrastructure organizations need to build, deploy, and manage custom GenAI applications and agents at scale.

From intelligent content pre-processing to agentic RAG, hybrid search, and observability, Vertesia offers a unified foundation to accelerate GenAI development while maintaining control, accuracy, and performance, the vendor said.

For more information about this news, visit https://vertesiahq.com.