John Snow Labs improves natural language processing solution

John Snow Labs, developer of the Spark NLP library, is releasing Spark NLP version 2.2, improving accuracy and enabling new use cases as prioritized by customers and the community.

Major new features include OCR-based coordinate highlighting, BERT embeddings refactoring and tuning, new tools for accuracy evaluation in Python, and more.

The release includes:

Named Entity Recognition with deep learning now has an `includeConfidence` parameter that returns confidence scores in the prediction metadata.
Named Entity Recognition with deep learning now also has an `enableOutputLog` parameter that writes training metric logs to a file, making it easier to track and optimize long model training runs.
OCRHelper now returns a coordinate positions matrix for text converted from PDF documents.
A new annotator called PositionFinder consumes OCRHelper positions to return rectangle coordinates for CHUNK annotator types. This enables visualizing where each chunk originally came from in a PDF.
The evaluation module has now also been ported to Python. It provides accuracy metrics for each epoch in a machine learning or deep learning training run for new NLP models.
WordEmbeddings now includes coverage metadata. Two new static functions, `withCoverageColumn` and `overallCoverage`, offer metric analysis.
A new BERT parameter, `poolingLayer`, allows for pooling layer selection. This has been shown to improve accuracy for some domain-specific NLP use cases.
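
As an illustration of what an embeddings coverage metric measures, the following is a minimal, self-contained Python sketch. It is not the Spark NLP API: the function names `coverage` and `overall_coverage`, the returned dictionary keys, and the lowercase matching are all assumptions made for this example. It only shows the general idea of counting how many tokens have an entry in an embeddings vocabulary.

```python
from typing import Dict, Iterable, List, Set


def coverage(tokens: Iterable[str], vocabulary: Set[str]) -> Dict[str, float]:
    """Hypothetical helper: share of tokens found in an embeddings vocabulary."""
    token_list = list(tokens)
    # Assumption: lookups are case-insensitive and the vocabulary is lowercase.
    covered = sum(1 for token in token_list if token.lower() in vocabulary)
    total = len(token_list)
    return {
        "covered": covered,
        "total": total,
        "percentage": covered / total if total else 0.0,
    }


def overall_coverage(documents: List[List[str]], vocabulary: Set[str]) -> Dict[str, float]:
    """Hypothetical helper: coverage pooled over every token in every document."""
    all_tokens = [token for document in documents for token in document]
    return coverage(all_tokens, vocabulary)


if __name__ == "__main__":
    vocab = {"the", "patient", "has"}
    sentence = ["The", "patient", "has", "pneumothorax"]
    # Three of the four tokens are in the vocabulary.
    print(coverage(sentence, vocab)["percentage"])  # 0.75
```

A low percentage in a sketch like this would suggest that many words in a corpus fall outside the embeddings vocabulary, which is the kind of signal a coverage metric is meant to surface.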

For more information about this news, visit www.johnsnowlabs.com.
