Appen includes new speech and text datasets for AI training

Appen Limited, a provider of high-quality training data for organizations that build effective AI systems at scale, is offering new off-the-shelf (OTS) datasets, designed to make it easier and faster for businesses to acquire the high-quality training data needed to accelerate their artificial intelligence (AI) and machine learning (ML) projects.

The new OTS datasets include human body movement and innovative baby crying sounds, as well as scripted speech and images containing text suitable for optical character recognition (OCR) in high-demand but hard-to-acquire languages, such as Arabic, Croatian, Greek, Hungarian, Thai, and more.

With the expanded datasets, Appen’s total OTS offering includes over 250 datasets, comprising more than 11,000 hours of audio, over 25,000 images and over 8.7 million words across 80 languages and multiple dialects.

Appen’s OTS datasets are a fast, cost-effective tool to jumpstart an AI or ML project with consistent high-quality training data. Teams expanding their AI capabilities can also leverage OTS datasets to effectively improve accuracy, develop new model skills and incorporate other improvements into their AI models.

All Appen datasets are developed using a fully transparent, opt-in methodology, so AI specialists can be assured their data is clean and compliant, eliminating the potential risk of backlash and reputation damage.

“AI teams around the world working on projects with tight deadlines and flexible data requirements can benefit from using off-the-shelf datasets,” said Wilson Pang, CTO of Appen. “OTS datasets shorten time to value and provide access to high-quality data at a lower total cost than using traditional methods. We at Appen take the necessary steps to ensure that all our datasets are ethically sourced and demographically balanced, enabling companies to maintain responsible AI practices by minimizing bias in their models and ensuring fair treatment of data annotators.”

The most experienced AI experts combine OTS datasets with on-demand data collection and annotation projects to meet their complex AI model training data needs.

Appen is the leader in offering continued support through a range of specific data collection services, such as ongoing data annotation and smart labeling, through AI-powered tools and automated workflows to maximize efficiency.

For more information about this news, visit https://appen.com.
