AI and the Building Blocks of Intelligent Content
The data, information, and analytics economy runs on well-curated, structured data. Whatever your industry, well-curated data and content are critical, and they become more important as the volume of data and content grows. The intelligent tools that sift through content are more robust than ever and, at the same time, more “needy”: modern technology platforms, systems, and even content consumers require well-structured data and content to perform well. As many artificial intelligence (AI) practitioners put it, “nothing starts without good data.”
This article explores the use of AI, machine learning (ML), and natural language processing (NLP) in the construction of intelligent data and content.
These terms have exploded in the business vernacular recently, but the concepts trace back to the 1920s, when the Austrian engineer Gustav Tauschek obtained the first patent for his “Reading Machine.” Optical character recognition (OCR) is taken for granted today, and the process is commoditized. However, early OCR systems needed to be trained with images of each character and worked on one font at a time. Sounds like machine learning, right? Likewise, the field of computer vision, a precursor to AI image recognition, traces its origins back to the 1960s. While there were developments in all these areas throughout the 20th century, and earlier, the technology has come of age in the past 15 to 20 years thanks to the confluence of fast computers, inexpensive large-scale storage, the Internet, and data collections. AI, and ML specifically, needs large amounts of training data and fast computers to process it all. Today we have both.
How Are AI, ML, and NLP Relevant to Data Structure?
How do businesses go beyond the hype of these terms to ensure they implement the right technology and truly get valid ROI?
Many organizations have extensive and valuable data and content buried in paper, PDF, and Word files. The content is not structured and not necessarily even digitized. Unraveling this, especially if the content includes complex tables, charts, figures, foreign characters, chemical formulae, etc., was impractical. Until now. Advances in technology and in AI make intelligent data and automated content structure possible where it just wasn’t feasible before.
Technology adds multiple dimensions to what was once considered “flat content.” You can’t do much with flat content: no search, no filters, no interactivity, no related content. But if you properly structure data and content, you apply a layer of semantic intelligence that benefits the downstream delivery and consumption of that information. In other words, you can deliver or receive information with precision.
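To make the contrast between flat and structured content concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the ContentItem record, its content_type and topics fields, and the sample items are assumptions, not a schema from any particular platform. It shows how filters and related-content lookups could work once each piece of content carries explicit metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A hypothetical structured content record (illustrative schema only)."""
    title: str
    body: str
    content_type: str                       # e.g. "table", "article"
    topics: set[str] = field(default_factory=set)


def filter_items(items, *, content_type=None, topic=None):
    """Return the items that match every filter that was supplied."""
    results = []
    for item in items:
        if content_type is not None and item.content_type != content_type:
            continue
        if topic is not None and topic not in item.topics:
            continue
        results.append(item)
    return results


def related_items(item, items):
    """Return the other items that share at least one topic tag with `item`."""
    return [other for other in items
            if other is not item and item.topics & other.topics]


# Flat content: one undifferentiated string with nothing to filter or link on.
flat_text = "Acetone safety data sheet. Solvent boiling points. Quarterly newsletter."

# The same material as structured records with explicit (made-up) metadata.
items = [
    ContentItem("Acetone safety data sheet", "Handling and storage notes.",
                "safety-sheet", {"acetone", "solvents"}),
    ContentItem("Solvent boiling points", "Boiling points by solvent.",
                "table", {"solvents"}),
    ContentItem("Quarterly newsletter", "Company updates.",
                "article", {"company-news"}),
]

tables = filter_items(items, content_type="table")
related = related_items(items[0], items)

print([item.title for item in tables])
print([item.title for item in related])
```

In this sketch, the type and topic fields are what make filtering and related-content lookups possible. The flat string holds the same words but gives a program nothing to filter or link on.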