

Big data: expediting and validating analyses


The data lake

“Data lake” is a relatively new term for a heterogeneous store of structured, unstructured and semi-structured data kept in its native format. Loading data into a data lake is quick because it does not require up-front processing or transformation. Another advantage is that all of the enterprise's data sits in one place, rather than being scattered across departments, as is more typical.
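To make "native format, no processing" concrete, here is a minimal sketch of ingest into a single store. It assumes a hypothetical in-memory `DataLake` class that keeps each object's raw bytes plus a format label; real data lakes usually sit on object storage or a distributed file system, and nothing here reflects a particular product.

```python
from dataclasses import dataclass, field


@dataclass
class RawObject:
    key: str        # hypothetical path-like identifier
    fmt: str        # a label such as "csv" or "json"; nothing is parsed
    payload: bytes  # stored exactly as received


@dataclass
class DataLake:
    """Hypothetical single store for all enterprise data."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, fmt: str, payload: bytes) -> None:
        # Ingest is a plain write: no schema, no validation, no transformation.
        self.objects[key] = RawObject(key, fmt, payload)

    def get(self, key: str) -> RawObject:
        return self.objects[key]


lake = DataLake()
# Structured, semi-structured and unstructured data from different departments.
lake.put("sales/2024-q1.csv", "csv", b"region,amount\nEast,120\nWest,95\n")
lake.put("crm/ticket-881.json", "json", b'{"customer": "Acme", "status": "open"}')
lake.put("hr/memo.txt", "txt", b"Quarterly all-hands moved to Friday.")
print(len(lake.objects))  # 3
```

Because `put` does no work beyond storing bytes, loading stays fast. The cost is deferred: any interpretation of the data has to happen later, when someone queries it.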

“A data lake is as much an approach as a technology,” says Kamran Khan, CEO of Search Technologies. “Putting data in a single store makes it easier to enrich or analyze. In addition, re-indexing the data is much faster.” A frequent problem, especially in large organizations, is that people do not want to tag their documents. “Using big data and statistical analyses, the information can be made useful even when this structure is not applied,” Khan adds.
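Khan does not say which statistical techniques he means. One common way to make untagged documents useful is to derive candidate tags from the text itself. The sketch below uses TF-IDF (term frequency times inverse document frequency) as an illustrative stand-in. The `suggest_tags` function, its stop-word list and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list; production systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "for", "is", "on", "of", "to", "and", "after"}


def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def suggest_tags(docs, top_n=2):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tags = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tags.append(ranked[:top_n])
    return tags


docs = [
    "Invoice overdue: payment for invoice 1042 is overdue",
    "Server outage: the server restarted after the patch",
    "Customer requested a refund on the invoice",
]
print(suggest_tags(docs))
# [['overdue', 'payment'], ['server', 'outage'], ['customer', 'refund']]
```

Terms that are frequent in one document but rare across the collection score highest, so no one has to tag the documents by hand.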

Because no schema is applied when data is loaded into the repository, the range of possible analyses is less constrained. At the same time, the lack of structure leads to a lack of governance; for example, retention schedules that depend on certain metadata will not be applied to data that lacks that metadata. The data may be more accessible, but it is not managed, so data quality problems can arise.
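Both sides of this trade-off can be shown in a few lines. The sketch below assumes hypothetical records with a `meta` dictionary and a hypothetical `RETENTION_YEARS` schedule. A schema is imposed only at query time, which is often called schema-on-read, so CSV and JSON payloads can be combined in one analysis. The retention rule, however, cannot be evaluated for a record that was loaded without the metadata it depends on.

```python
import csv
import io
import json
from datetime import date

# Raw records as they might sit in a lake: native formats, no schema applied.
raw_records = [
    {"fmt": "csv",
     "meta": {"retention_class": "finance-7y", "created": "2015-03-01"},
     "payload": "region,amount\nEast,120\nWest,95\n"},
    {"fmt": "json",
     "meta": {},  # untagged: no retention metadata was ever supplied
     "payload": '{"customer": "Acme", "amount": 40}'},
]

RETENTION_YEARS = {"finance-7y": 7}  # hypothetical retention schedule


def read_amounts(record):
    """Schema-on-read: interpret the payload only when a query needs it."""
    if record["fmt"] == "csv":
        rows = csv.DictReader(io.StringIO(record["payload"]))
        return [int(r["amount"]) for r in rows]
    if record["fmt"] == "json":
        return [int(json.loads(record["payload"])["amount"])]
    return []


def expired(record, today):
    """Apply the retention schedule; returns None when metadata is missing."""
    cls = record["meta"].get("retention_class")
    if cls is None:
        return None  # governance gap: the rule cannot be evaluated
    created = date.fromisoformat(record["meta"]["created"])
    return (today - created).days > RETENTION_YEARS[cls] * 365


total = sum(amount for r in raw_records for amount in read_amounts(r))
print(total)  # 255
print([expired(r, date(2024, 6, 1)) for r in raw_records])  # [True, None]
```

The analysis spans both formats without any up-front modeling, but the untagged JSON record silently escapes the retention policy. That is the "accessible but not managed" problem in miniature.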

“In the past, a KM solution would focus on finding information,” Khan says, “but now users want to analyze it. The future of KM will be to combine search and big data because what you can do with the two techs together is fantastic.” For many companies, data lakes will prove to be a useful component of a knowledge management strategy.
