Bitext engages in the semantic arena
Semantics and related information retrieval disciplines benefit from innovations outside the United States. For many years, I assumed that cracking tough problems in knowledge management required expertise honed at Carnegie Mellon University, Massachusetts Institute of Technology or Stanford University. Important work is being done in the institutions and companies founded by professionals who graduated from those universities. But there are strong signals that semantic innovation is not uniquely American.
Several years ago, I learned about Bitext, based in Madrid. I met with Antonio Valderrábanos, the founder and CEO, and several of his senior engineers in the firm's Madrid offices. At that time, I learned that Bitext was founded to make it possible for computers to handle text intelligently. In the last three years, the company has expanded its client base and its technology solutions. Bitext has responded to the need of organizations to extract business value from big data repositories: social media (Twitter, Facebook), blogs and forums, news, internal documentation and other types of new content.
In March 2011, Bitext received the European Seal of e-Excellence Award at CeBIT 2011, the world's largest IT and communications trade fair. It was the first award ever granted in the language technologies category. In January, I contacted the company to learn more about its 2012 plans. When we spoke, Valderrábanos was thinking about what enterprise customers require in 2012.
The Bitext founder said, "Users will want information as it is produced and for their context (for example, for deciding on a vendor for a component or a restaurant right on the spot). This means that in 2012, latency and context are critical. Software performance benefits from today's new processors and infrastructure. But for context, we think the value for our enterprise customers comes from easy integration with third-party applications. As a provider of semantic engines for third parties, how to integrate effectively is a key issue."
But for many enterprise licensees, integration is less important than deploying systems that work. Valderrábanos said, "I use the word 'reliability' to describe the systems users actually use. If we are going to say that a product is being criticized for its support service or making business decisions from data displayed on a mobile device, high reliability (high precision and coverage) is key to gaining user confidence and, in the end, usability."
Bitext positions its technology as adding semantic functionality to any existing system. Many vendors advocate a "rip and replace" approach. Bitext's method adds natural language functionality or support for structured and unstructured information to any existing search solution. Its architecture allows the firm's technology to integrate with any enterprise application. That agility makes it possible to add semantic functions to a wide range of enterprise knowledge management applications.
In contrast to some of the companies offering technology that can handle large volumes of social content, Bitext has adapted its methods, based on computational linguistics and natural language processing, to the challenge of big data. "We were born for this," said Valderrábanos.
The company's research has made possible some high-value methods that are a departure from the more traditional methods used by Attensity, Lexalytics and other vendors with semantic big data methods.
The core of the Bitext method is to perform a morphological, syntactic and semantic analysis of a content stream: for example, Twitter messages, Facebook posts, Tumblr content and Web logs. What sets the Bitext approach apart is that the company uses the same semantic engine for all of its solutions. For natural language search, Bitext uses NaturalFinder. For social media analytics, Bitext has developed NaturalOpinions, and for text analytics, the company's product is NaturalExtractor.
The power and flexibility of the Bitext engine enable easy development of support for new languages. Bitext now offers support for English, Spanish, French, Italian, Portuguese, Brazilian, German and Dutch. In development are languages such as Arabic, traditional Chinese, simplified Chinese and Japanese. What struck me as important is Bitext's ability to handle dialect variants; for example, in Spanish, the nuances of Mexican, Colombian and Argentinian usage are supported. Some of the most widely used social content processing systems are unable to discern dialectal nuances, which can lead to flawed outputs.
In addition, Bitext executes a real-time semantic analysis of a content stream such as Twitter messages, Facebook posts, Tumblr content or Web logs. The Bitext approach extracts reputation information, marketing insights, social insights and even stock market performance predictions.
Valderrábanos said, "We support semantic search. Our system enriches traditional search methods with linguistic knowledge, both in the query and the indexing process. And the Bitext system performs automated content classification or tagging of high volumes of data. The system can handle e-mail, data used in customer service environments and social content."
Social media analytics
He continued, "We combine new methods and integrate them with a classical approach, indexing search text and metadata, and taking this approach to new business uses. For example, when analyzing social media, we tag texts with sentiment (positive, negative ... ), category (price, image, product ... ) and other client-requested tags. Then, we index both the raw text and its tags so users can search for positive comments within the category of 'customer support.' This is an effective way to integrate search into social media analytics, which may seem obvious but is, in our experience, difficult to deliver to the average business professional."
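To make the tag-then-index idea concrete, here is a minimal Python sketch. It is not Bitext's implementation: the keyword lists are a toy stand-in for real linguistic analysis, and the names `tag_text` and `TaggedIndex` are hypothetical, invented for illustration. It shows only the general pattern Valderrábanos describes, in which raw text and its tags are stored together so a user can filter on both sentiment and category.

```python
from collections import defaultdict

# Hypothetical keyword lists standing in for a real linguistic analyzer.
SENTIMENT_WORDS = {
    "positive": {"great", "helpful", "love", "fast"},
    "negative": {"slow", "rude", "broken", "terrible"},
}
CATEGORY_WORDS = {
    "customer support": {"support", "agent", "helpdesk"},
    "price": {"price", "expensive", "cheap"},
}


def tag_text(text):
    """Return the set of (field, value) tags whose keywords appear in text."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    tags = set()
    for label, keywords in SENTIMENT_WORDS.items():
        if words & keywords:
            tags.add(("sentiment", label))
    for label, keywords in CATEGORY_WORDS.items():
        if words & keywords:
            tags.add(("category", label))
    return tags


class TaggedIndex:
    """Stores raw texts and maps each tag to the ids of texts carrying it."""

    def __init__(self):
        self.docs = []
        self.by_tag = defaultdict(set)

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for tag in tag_text(text):
            self.by_tag[tag].add(doc_id)
        return doc_id

    def search(self, **filters):
        """Return raw texts that carry every requested tag."""
        if not filters:
            return list(self.docs)
        id_sets = [self.by_tag.get(item, set()) for item in filters.items()]
        ids = set.intersection(*id_sets)
        return [self.docs[i] for i in sorted(ids)]


index = TaggedIndex()
index.add("The support agent was great and very helpful.")
index.add("Support was slow and the agent was rude.")
index.add("Great price for what you get.")

# Positive comments within the "customer support" category.
positive_support = index.search(sentiment="positive", category="customer support")
```

In this sketch a query is simply an intersection of tag sets. A production system would presumably combine such tag filters with full-text search over the raw messages.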
When asked about Bitext and big data, Valderrábanos responded, "In one client engagement, Bitext performs bottom-up analysis of large amounts of text. For example, now we analyze 3 million tweets every morning to extract the names of companies or people, and topics or themes. Each day we discover and present the connections between and among these entities; for example, which companies are related to which topics. This functionality powers intelligence by discovering and connecting information items."
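One simple way to picture "which companies are related to which topics" is co-occurrence counting: every time a company and a topic appear in the same message, the link between them gets stronger. The Python sketch below illustrates that idea only. The lookup sets, the sample messages and the function names `extract` and `connect` are all hypothetical, and a real system would find entities through linguistic analysis rather than fixed word lists.

```python
from collections import Counter
from itertools import product

# Hypothetical lookup lists; a real system would use linguistic analysis.
COMPANIES = {"acme", "globex"}
TOPICS = {"layoffs", "earnings", "merger"}


def extract(text):
    """Return (companies, topics) mentioned in one message."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return words & COMPANIES, words & TOPICS


def connect(messages):
    """Count how often each company appears alongside each topic."""
    links = Counter()
    for message in messages:
        companies, topics = extract(message)
        for pair in product(sorted(companies), sorted(topics)):
            links[pair] += 1
    return links


messages = [
    "Acme earnings beat expectations",
    "Rumors of a Globex merger with Acme",
    "#Acme earnings call today",
]
links = connect(messages)

# Strongest company-topic connection in the sample.
strongest = links.most_common(1)
```

Ranking the counts, as `most_common` does here, gives the kind of daily "what is connected to what" overview described above, although at 3 million tweets a day the counting would need to be distributed.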
The Bitext system delivers actionable information for the knowledge worker. Valderrábanos explained, "We analyze relevant text bottom-up, extracting entities and concepts or topics. With this general overview of the full set of texts, we help clients define the details of the specs of what the user wants to extract. With this, we give clients a view of what's actually in their data; what's showing up and what's connected to whom, what, when, where, etc."
The Bitext approach makes use of what Valderrábanos calls the "linguistic touch." He said, "We see information retrieval and text analytics as text problems, not as software problems. Our main focus in any project lies on the semantic analysis of valuable text, on the way we approach content (text collections, lexicons, taxonomies, ontologies), rather than on the type of software tools we use."
Big data requires outputs that are more than long laundry lists or tables of data. Bitext's approach supports third-party tools, such as FusionCharts or Litebi, to convert text analysis into dashboards where users can choose either a top-level view of main issues or a detailed drill-down to the verbatim text. The visualization component integrates with the Bitext alert function so users know when significant new data are available.
One of Bitext's most interesting capabilities is its NaturalOpinions component. Valderrábanos said, "The cornerstone of our offering for volume and latency problems is the fact that NaturalOpinions analyzes thousands of text units (sentences, tweets ... ) per second on a low-level server. Also, NaturalOpinions can be integrated in distributed architectures to maximize throughput. That provides a high-performance refresh rate and is critical in the new world of social media analytics."