The effect of ChatGPT on KM
Not the only game in town
First, let’s consider that ChatGPT is not the only game in town; it’s just the one getting all the headlines. Funnily enough, it’s not even the biggest; that dubious prize goes to Google with DeepMind. Many products and services already leverage LLMs, and many more are due to enter the market within the next few years. Although most are much smaller than ChatGPT, they have the potential to be more accurate.
Imagine an LLM trained solely on U.S. legal case law, or one that understands only European healthcare systems. In other words, imagine LLMs that have only had access to carefully curated and peer-reviewed data rather than being trained on masses of random data. They will, by default, be narrower in focus but also more accurate and valuable. If we take Mullen’s formula of “Weight = Truth,” we might tweak it to “Quality = Truth.” To put it another way, lower volumes of higher-quality data make for better business.
As with any AI system, the quality of the output is determined by the quality of the data used to train it. ChatGPT and its ilk do not, and cannot, understand or weigh source bias. Therefore, neither the data used nor the output generated can be bias-free, nor can these systems ever promise or deliver accuracy. What they deliver are best guesses. That’s not a problem if you are using an LLM to run a spell- or grammar-check. The problems arise when you use these tools for “discovery” questions, because the answers may come from sources that do not apply to you. Think of a legal question run against one of these systems: Would it know, without being explicitly asked, that “this” law has now replaced “that” law?
Generative AI affects search
Enterprise search vendors have been working with language models for a while now, most commonly BERT and, more recently, models distributed through Hugging Face. Microsoft has announced that it will build similar technology into its Syntex search functionality this year. The goal is to move away from simply returning a series of links in response to a search query and instead provide summaries and contextualized answers. That is good stuff and something we should applaud. Data cleanliness and governance challenges remain, of course, but on a much smaller and more manageable scale.

Similarly, we are seeing the emerging use of language models to summarize complex documentation sets, an area that will grow substantially. Language generation tools can automate, to some degree, the production of marketing materials and general business documentation. Many valid use cases will soon see these technologies take hold. It’s quite possible that something similar to ChatGPT could write 80%–90% of a legal contract, leaving a human lawyer only the remaining effort to complete and check it. That’s a solid business case.
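To make the summarization use case concrete, here is a deliberately simple, non-LLM sketch: a frequency-based extractive summarizer that keeps the sentences whose words occur most often in the document. This is a toy illustration of the input/output shape of the task, not how a production system (which would use a transformer model) actually works; the function name and scoring scheme are my own.

```python
import re
from collections import Counter

def summarize(text: str, max_sentences: int = 2) -> str:
    """Toy extractive summary: keep the highest-scoring sentences."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Count how often each word appears across the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> int:
        # A sentence scores the sum of its words' document frequencies.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

Even this crude scheme drops off-topic sentences, which hints at why model-based summarization of large documentation sets is such an attractive knowledge-management use case.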