

LF AI & Data Foundation creates DocLang Specification Working Group to formulate an open standard for AI-native documents

LF AI & Data Foundation, the premier organization supporting open source innovation in AI and data under the Linux Foundation, announced the formation of the DocLang Specification Working Group, a collaborative standards initiative to develop DocLang, an open, universal, AI-native document format designed to improve how enterprises prepare, exchange, and govern document data for AI systems.

Founded by LF AI & Data premier members IBM, NVIDIA, and Red Hat, as well as contributors ABBYY and HumanSignal, the DocLang Working Group will operate under the Joint Development Foundation’s vendor-neutral, open governance model to develop and maintain a specification that supports more reliable, interoperable document processing across AI and agentic workflows, according to the group.

“Documents remain one of the most important sources of enterprise knowledge, but most were never designed for AI-driven workflows,” said Mark Collier, general manager of AI and infrastructure at the Linux Foundation and executive director of LF AI & Data. “With the launch of the DocLang Working Group, we are bringing the open source community together to develop a vendor-neutral, interoperable standard that helps organizations prepare document data for AI more reliably, transparently, and at scale. Combined with projects like Docling, this effort can help create a more open foundation for document understanding across the AI ecosystem.”

Enterprises today work across a fragmented landscape of document formats, including PDFs, JPEGs, and other file types built primarily for human consumption rather than AI interpretation. As organizations increasingly rely on generative AI and agentic systems, this disconnect can introduce complexity, raise costs, and reduce reliability when extracting meaning from business documents.

“DocLang is designed to solve one of the foundational problems in enterprise AI: documents were built for humans, not machines,” said Maxime Vermeir, vice president, AI strategy at ABBYY. “By introducing a minimal, standardized, and AI-native representation of document structure, layout, meaning and governance, DocLang creates a far more deterministic foundation for modern AI systems. The result is an AI-native context layer at scale.”

DocLang is designed to support:

  • Preservation of both semantic meaning and geometric layout in a single AI-native format
  • Representation of structural elements such as headings, paragraphs, and tables alongside their position on the page
  • Embedded governance controls to help downstream systems enforce policies related to privacy, extraction scope, and model training permissions
  • Optimization for modern AI tokenization and modeling approaches to support more efficient and reliable document understanding
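To make the first three goals concrete, here is a minimal Python sketch of a document element that carries its semantic role, its position on the page, and governance flags in one record. This is not the DocLang format: the article does not describe the specification's syntax, so every name below (`Element`, `Governance`, `bbox`, `allow_training`, `usable_for_training`) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical types for illustration only; NOT taken from the DocLang specification.


@dataclass
class Governance:
    """Policy flags a downstream system could check before using an element."""
    allow_extraction: bool = True
    allow_training: bool = False
    contains_personal_data: bool = False


@dataclass
class Element:
    """One structural unit with both its meaning and its place on the page."""
    kind: str                                 # e.g. "heading", "paragraph", "table"
    text: str
    page: int
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1) in page units
    governance: Governance = field(default_factory=Governance)


def usable_for_training(elements: List[Element]) -> List[Element]:
    """Keep only elements whose flags permit model training and carry no personal data."""
    return [
        e for e in elements
        if e.governance.allow_training and not e.governance.contains_personal_data
    ]


doc = [
    Element("heading", "Quarterly Report", 1, (72.0, 72.0, 540.0, 100.0),
            Governance(allow_training=True)),
    Element("paragraph", "Contact: jane@example.com", 1, (72.0, 110.0, 540.0, 130.0),
            Governance(allow_training=True, contains_personal_data=True)),
    Element("table", "Revenue | 2024 | 2025", 2, (72.0, 72.0, 540.0, 300.0)),
]

training_set = usable_for_training(doc)
print([e.kind for e in training_set])
```

In this sketch, keeping the policy flags on the element itself lets a consumer filter content without consulting a separate policy store. Whether DocLang takes that approach is for the working group's specification to define.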

The new working group builds on the momentum of Docling, the open source document processing toolkit hosted by LF AI & Data. Originally developed by the AI for Knowledge team at IBM Research Zurich, and released as open source in 2024, Docling has become a widely adopted project for converting documents into structured, AI-ready representations, the group said.

DocLang complements that foundation by defining an open, interoperable standard for expressing and exchanging that structured output across systems.

Together, Docling and DocLang create a more complete open source document AI stack under LF AI & Data, spanning document ingestion, parsing, standardized representation, and downstream consumption by language models and agentic AI systems, said the group.

Organizations and individual contributors interested in helping shape the future of AI document processing are invited to participate in the DocLang Working Group. Membership is open to organizations committed to building open, interoperable AI infrastructure.

To learn more, adopt the standard, or contribute to the DocLang specification, visit https://doclang.ai/.
