John Snow Labs improves natural language processing solution

John Snow Labs, developer of the Spark NLP library, is releasing Spark NLP version 2.2, improving accuracy and enabling new use cases as prioritized by customers and the community.

Major new features include OCR-based coordinate highlighting, BERT embeddings refactoring and tuning, and new tools for accuracy evaluation in Python.

The release includes:

Named Entity Recognition with deep learning now has an `includeConfidence` parameter that returns confidence scores in the prediction metadata.
Named Entity Recognition with deep learning now also has an `enableOutputLog` parameter that writes training metric logs to a file, making it easier to track and optimize long model training runs.
OCRHelper now returns a coordinate positions matrix for text converted from PDF documents.
A new annotator called PositionFinder consumes OCRHelper positions to return rectangle coordinates for CHUNK annotator types. This enables visualizing where each chunk originally came from in a PDF.
The evaluation module has now also been ported to Python. It provides accuracy metrics for each epoch in a machine learning or deep learning training run for new NLP models.
WordEmbeddings now includes coverage metadata. Two new static functions, `withCoverageColumn` and `overallCoverage`, report coverage metrics.
A new BERT parameter, `poolingLayer`, allows for pooling layer selection. This has been shown to improve accuracy for some domain-specific NLP use cases.
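
To illustrate the kind of metric that embeddings coverage refers to, here is a minimal sketch in plain Python. It does not use the Spark NLP API. It assumes that coverage means the share of tokens that have an entry in the embeddings vocabulary; the announcement does not define the term. The function name `embedding_coverage` and the sample data are hypothetical.

```python
from typing import Iterable, Set


def embedding_coverage(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Return the fraction of tokens found in the embeddings vocabulary.

    Assumption: coverage = covered tokens / total tokens. This is an
    illustrative definition, not the Spark NLP implementation.
    """
    tokens = list(tokens)
    if not tokens:
        # No tokens to check, so report zero coverage rather than divide by zero.
        return 0.0
    covered = sum(1 for token in tokens if token in vocabulary)
    return covered / len(tokens)


# Hypothetical vocabulary and sentence: one domain-specific term is missing.
vocabulary = {"the", "patient", "was", "given"}
tokens = ["the", "patient", "was", "given", "metformin"]
coverage = embedding_coverage(tokens, vocabulary)
```

A low value on a domain corpus would suggest that many terms fall outside the pretrained vocabulary, which is one reason to inspect coverage before training.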

For more information about this news, visit www.johnsnowlabs.com.
