In-Q-Tel, the venture capital arm of the CIA, has invested significantly in knowledge management technology. What drives In-Q-Tel and just what does it expect from its investments? KMWorld's Editor in Chief Hugh McKellar interviews In-Q-Tel's Greg Pepus, senior director of federal and intelligence community strategy. Pepus describes his technical role as helping identify promising technology companies and products, helping provide the intelligence community with a vision of how it might deploy those technologies to face business challenges, and then helping in the investment and project management activities related to those investments. We also asked him to discuss some of the companies in which the organization has invested. More information about In-Q-Tel's history and investments can be found at its Web site.
Q. KMWorld: First, In-Q-Tel's name. I get the "In" and the "Tel" parts, but what's with the "Q"?
A. Pepus: The Q is just a play on the character Q, who provides all of the cool gadgets in the James Bond movies. [The Wall St. Journal did describe In-Q-Tel "as outside the box as government gets."]
Q. KMWorld: Tell us a little about In-Q-Tel's strategy. Your investments are not necessarily purely software licensing but also research and development support. Correct?
A. Pepus: Typically, when we identify a company, we enter into a venture-type relationship with them, which means there is an investment component. [We want to contribute to a] commercially viable activity that's beneficial to the company and adds some things that In-Q-Tel would like to see in the product. And it's a way for us to work more closely with the company to help guide its whole product strategy. We [generally] deal with companies that are at either Version 1.0 or their technology is so compelling at the alpha and beta stage that we take the risk to invest in them because it is so promising. If [a company] survives that early stage, it can hold great, great promise.
Typically, the deal may involve a limited number of licenses, or it may comprise more than that. It really depends on the situation. The company may be in the very early stage of the product, and we can't make use of it yet. Or it may be that we are going with a company that has a relatively mature product, and we can take it and run with it right away. I want to make clear that we are not really doing R & D. There's none of that going on or [the kind of work] done by university or national labs. We are really much, much later on [in the developmental stage].
These companies have to be locked and loaded, ready to go to that commercial space and start making sales or very soon thereafter. And you know when you have a product, even if it's not ready for release for three months, well six months before that you better be working very hard spending a lot of shoe leather to get out there and start getting your customers lined up.
We invest so we get a seat at the table with those companies early. We have a board observer position, and we are able to influence where it goes with its product. There is the aspect that at the end of the day, we want the technology, obviously. That's sort of our, for a lack of a better term, our special sauce, if you will, of getting in there with that venture relationship. It really sets us apart from being just a customer. The analogy that I use is: Because when you are [only] a customer, you are often the last to know anything is going wrong with the company.
Q. KMWorld: We understand you can't discuss specifically how or why the intelligence community is investing in or deploying software from vendors, but could you give us some background on In-Q-Tel investments?
A. Pepus: We are out there combing the woods, working with entrepreneurs all over the United States, and for that matter, even worldwide. We get business plans in, and we continually evaluate new technologies in a variety of different areas, of which knowledge management tools are one, and other technology areas tangential to knowledge management.
Our overall approach is that the government people generally aren't so different from those in commerce and industry. You have line-of-business activities people who deal with large volumes of data. At least in the IT space, we need to focus on tools that help deal with volume problems, help us find the right things to read and basically deal with the world of documents that we live and breathe every day.
Q. KMWorld: Exactly how do you define KM?
A. Pepus: I am not so much the chief knowledge officer type, which [I believe] is more of organizational development, the soft side of psychological business activity in terms of how you structure your business, how your teams work and how you deal with information within your organization. I am more on the hard side of how do we take the information that we receive, how do we organize it to the best effect and how do we make sure that we know what we don't know. That's where I really think knowledge management falls.
Tangential to all this are better tools for the discovery of information--intelligent agents, different types of analytical processing to do search. So we are not just talking about keyword search but touching on things such as directed search or directed discovery of information, as well as all the automation applied therein--and then some truly analytical visualization tools. I'm referring to [software] that once you've found the target of information you're interested in, enables you to narrow down the scope of what you need to see. Then you want to be able to organize that and gain [relevancy] at a glance so that you can truly improve productivity and get some meaningful information, some meat, if you will, off the bone for what you are trying to do.
Q. KMWorld: Just out of curiosity, are you doing any sort of work that the old Total Information Awareness office under DARPA [the Defense Advanced Research Projects Agency] did?
A. Pepus: I am not aware of any activity that we have done that has anything to do with or touches on that activity.
Q. KMWorld: I'd like to hear your thoughts on some of the companies in which you've invested.
A. Pepus: Let's start with Inxight since I was one of the first people to bring them in here; I am very familiar with it. Inxight has been collecting capabilities in the unstructured text mining and analytical processing area for a long time. And we were intrigued by a number of capabilities it had. One was just the very consistent and widespread use of the entity extractor. It supports multiple languages and is a relatively mature technology. It's used in a lot of other [third-party] applications, and it is a very reliable way to extract noun entities out of documents. We found that a very useful technology.
Inxight has other capabilities it has woven together and linked to its Smart Discovery suite--things like document summarization, categorization and visualization--making a powerful [set] of tools. I look at a lot of other companies that have individual pieces of this, but Inxight is one of the few companies that has managed to put it all together in one package. [Inxight] basically summarizes information, categorizes it, extracts entities and visualizes that information. It's all Web-capable, and it has Web services turned on. So it's very compelling and useful in many different ways.
Q. KMWorld: What about Attensity?
A. Pepus: You might look at Inxight and say, "Well, why invest in a company like Attensity that appears to be a competitor?" My answer is that all of our companies have unique capabilities. While it's true when you build a portfolio of companies you may end up with some competitive elements, Attensity has something unique going for it in its approach to natural language processing. For want of a better term, it understands eighth-grade grammar. And it does a very good job at it. Attensity's technology is one of the first that could reliably do entity relationship extraction. That is, it finds relationships between entities. For example, you can dump a whole bunch of articles from AP and say, "Find me all the lawsuit relationship events." And it would go through and find out that the Justice Department sued Microsoft, or HP sued IBM or Cisco Systems was sued by so and so--and do it reliably, depending on the domain of information you were in. It is able to extract those event relationships and then spit them into a database.
That's very hard to do, because in entity relationship extraction you get into all sorts of [variables]. How do I make sure that things that happened earlier in a document are referenced later on--things that we cognitively can understand as human beings but are often not easily recognized by the machine? So you start getting in very complex semantic problems of understanding language and understanding context. And while Attensity has not solved all of those problems, it certainly chipped away at a lot of semantic understanding that goes into natural language processing, and it is a pure natural language processing company. It has a very robust set of heuristics describing the English language, and that's how it goes about parsing up a sentence and telling you what's a noun, what's a verb, what's an adverb, etc., subsequently providing the context to show the relationship between entities.
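To make the idea of entity relationship extraction concrete, here is a minimal sketch in Python. It is not Attensity's method: it swaps grammatical parsing for two surface patterns, handles only the verb "sued," and treats any run of capitalized words as an entity. The function names, the table layout and the company name "Acme Corp" are assumptions made for illustration. It shows only the shape of the task: find who sued whom, in active or passive voice, and write the result to a database.

```python
import re
import sqlite3

# One or more capitalized words, e.g. "Cisco Systems". A deliberately crude
# stand-in for a real entity extractor (assumption for this sketch).
ENTITY = r"([A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*)"
PASSIVE = re.compile(ENTITY + r"\s+was\s+sued\s+by\s+" + ENTITY)
ACTIVE = re.compile(ENTITY + r"\s+sued\s+" + ENTITY)


def clean(name):
    """Drop a leading 'The ' so 'The Justice Department' becomes 'Justice Department'."""
    return name[4:] if name.startswith("The ") else name


def extract_lawsuits(sentences):
    """Return (plaintiff, defendant) pairs found in the given sentences."""
    events = []
    for sentence in sentences:
        passive = PASSIVE.search(sentence)
        if passive:
            # "X was sued by Y": Y is the plaintiff, X the defendant.
            events.append((clean(passive.group(2)), clean(passive.group(1))))
            continue
        active = ACTIVE.search(sentence)
        if active:
            # "X sued Y": X is the plaintiff, Y the defendant.
            events.append((clean(active.group(1)), clean(active.group(2))))
    return events


def store_events(events):
    """Write the extracted events into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lawsuits (plaintiff TEXT, defendant TEXT)")
    conn.executemany("INSERT INTO lawsuits VALUES (?, ?)", events)
    conn.commit()
    return conn
```

Patterns like these break down quickly on pronouns, references back to earlier sentences and unusual word order, which is exactly the context problem described above.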
Q. KMWorld: You've invested in Stratify. Can you tell us a bit about the rationale behind it?
A. Pepus: I wasn't here when this investment was made, but I followed the company in KMWorld when it was called Purple Yogi and then Stratify. It has a very good approach for dynamic generation of taxonomies, which can then be managed by a few people rather than having armies of librarians and science people to deal with taxonomy development and ongoing morphing.
You can use Stratify for individuals, departments and/or for an enterprise. People can manage these multiple taxonomies in different contexts. Stratify has all the tools and development capabilities that allow you to create multiple hierarchical taxonomy trees. It has some visualization capabilities to assist in managing your taxonomies as well, so one can make correct use of them and prune or add areas of a taxonomy that aren't well defined or used. There are a lot of very interesting capabilities that we thought were useful.
Q. KMWorld: What about Convera?
A. Pepus: We have an investment in Convera, which is a search engine company with a lot of KM features, particularly now that it has Dr. Claude Vogel, who used to head Semio. As a result, it is starting to get a lot more into the hierarchical taxonomy management. Convera is doing these domain cartridges, which are a kind of canned taxonomy. It's a very good idea. [Let's say] I work in banking, so give me the taxonomy that pertains to banking and then I can start from there. Convera is sort of an interesting investment for In-Q-Tel because it is a larger company and could benefit from having relationships with other companies in our portfolio.
Q. KMWorld: Speaking of search, could you talk a little about Endeca?
A. Pepus: I think people need other approaches than just keyword search. Particularly [when you] look at the world through different perspectives or layers. I may be, say, a mid-level person doing a very specific job in business vs. a top executive. The executive needs to have a bird's-eye view of the world and be knowledgeable a little bit about everything, while a mid-level manager might need to be involved much more in depth about what's going on in a specific situation. What's nice about Endeca is that with the directed search engine and all of its underlying analytics, it can really deliver a lot of very useful information in ways that a normal search engine could never do.
Even today, if you search for a name, for example, on Google, you'll get a lot of hits that don't have anything to do with the name. And so that leads me to believe there's still a lot of work to do in search. Endeca represents a leap ahead in that area, and I think its attraction with the customer base is really proving it out.
Q. KMWorld: I noticed, too, that you have invested in Kofax because of the capabilities it offers from the acquisition of Mohomine.
A. Pepus: I think [the acquisition of Mohomine] was a good [move] for Kofax, being a big document content scanning and management company. I believe the big focus for [Kofax] is to use Mohomine within its Ascent engine to provide multilingual categorization capabilities. Since Kofax already has a strong multilingual approach, Mohomine does a good job adding the new capabilities in categorization to the Ascent product. Unlike Stratify, Mohomine takes a pre-existing taxonomy and doesn't need a lot of documents to train it. In fact, with a given taxonomy you can get pretty good results with three or four fairly rich example documents. I think that's one of the things In-Q-Tel found very [attractive].
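The idea of categorizing against a pre-existing taxonomy from only a few example documents can be sketched as follows. This is not Mohomine's algorithm, which the interview does not describe. It is a generic bag-of-words approach assumed for illustration: each category gets a word-count profile built from its example documents, and a new document goes to the category whose profile it most resembles by cosine similarity. All function names and category names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one profile per taxonomy category from a handful of example documents."""
    profiles = {}
    for category, docs in examples.items():
        profile = Counter()
        for doc in docs:
            profile.update(vectorize(doc))
        profiles[category] = profile
    return profiles


def classify(profiles, text):
    """Return the category whose profile is most similar to the text."""
    vector = vectorize(text)
    return max(profiles, key=lambda category: cosine(vector, profiles[category]))


# Three short example documents per category, standing in for the
# "three or four fairly rich example documents" mentioned above.
EXAMPLES = {
    "banking": [
        "The bank approved the loan at a fixed interest rate.",
        "Deposits and credit accounts are held by the bank.",
        "Mortgage lenders set interest on each loan.",
    ],
    "shipping": [
        "The vessel left the port with a full cargo hold.",
        "Cargo containers are unloaded at the port.",
        "Freight carriers schedule each vessel by route.",
    ],
}
```

Calling `classify(train(EXAMPLES), some_text)` returns one of the taxonomy's category names. A production categorizer would at least weight terms and handle multiple languages, which this sketch does not attempt.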
Q. KMWorld: In-Q-Tel has a relationship with Tacit, as well, right?
A. Pepus: Yes, [Tacit's] ActiveNet is in Version 3, and I continue to be very impressed. First of all, doing what [Tacit] does is harder than what a lot of other software companies do because it requires a significant effort to implement it. It's not like you can buy Tacit for small groups of people. You really need an enterprise. You need a meaningful number of people who really don't know each other well in order to get the return on investment for Tacit. But it is so compelling in the world of business. Look at a situation where you've got mergers going on all the time. How are two companies going to find out whom they need to talk to within each [company]? Look at what's going on in the government with the Department of Homeland Security. Look at what is going on with a lot of the financial mergers on Wall Street--just all over the place. It's not like people and companies share their information. What really makes business work, though, is if you know the right person to call to find out what you don't know.
Tacit makes it easy because you can literally weave it into an existing application. It will tell you right away whom you need to find, if the person is online with instant messaging, do they have e-mail, do they have a cell phone number? And it does all this in a very secure manner. Tacit automatically creates and manages a profile of a person's skills and capabilities. And I can tell you right now having lived in a world where we want people to use things like profiles, Tacit has really got to be one of the only ways to make a 100% bullet-proof process. It builds its profiles automatically by monitoring what you write in your e-mail and other places, automatically understanding--continuously and securely--what you know and what you don't know. Then, it can help you and other people fill in gaps of knowledge. It's just brilliant. It's such a great idea. There are a lot of other things going on today with social network analysis and whatnot. But, really, Tacit is the only company that I know that works in such an interesting and effective knowledge management-oriented way.
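The general idea of building expertise profiles automatically from what people write, then using them to find the right person to call, can be sketched in a few lines. This is not how ActiveNet works internally, which the interview does not detail. It is a simple term-counting approach assumed for illustration, with hypothetical function names and a small hand-picked stopword list, and it ignores the security and privacy controls a real deployment would need.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "for", "with", "i", "can", "before", "needs"}


def build_profiles(messages):
    """Build a term-count profile per author from (author, text) pairs."""
    profiles = defaultdict(Counter)
    for author, text in messages:
        words = re.findall(r"[a-z]+", text.lower())
        profiles[author].update(w for w in words if w not in STOPWORDS)
    return profiles


def find_experts(profiles, topic, limit=3):
    """Rank authors by how often they have written about the topic's terms."""
    terms = re.findall(r"[a-z]+", topic.lower())
    scores = {
        author: sum(counts[term] for term in terms)
        for author, counts in profiles.items()
    }
    # Keep only authors with some evidence; break ties alphabetically.
    ranked = sorted(
        (author for author, score in scores.items() if score > 0),
        key=lambda author: (-scores[author], author),
    )
    return ranked[:limit]
```

Because the profiles are rebuilt from ongoing messages rather than filled in by hand, nobody has to maintain them, which is the point made above about manually kept profiles.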
Q. KMWorld: You've invested in Spotfire. Why?
A. Pepus: One of the areas we're looking at this year is visualization. What I like about Spotfire is that its user-focused visualization tools give you the ability to carefully and finely control and analyze what you visualize. You can granularly dial up and down each item of what information you are displaying on the screen. People need better tools for their specific information selection criteria and analysis needs. That's exactly what Spotfire does. It presents a variety of beautiful and meaningful visualization tools--some of the standard ones that you would see like out of an Excel spreadsheet--and others more creative and clever concepts. You can then feed your data sets in and, at a glance, get a lot of visual meaning out of a very complex set of data. You can do a lot of fine, granular control on what you are visualizing right on the screen.
Q. KMWorld: Since you have discussed visualization tools, is there another company In-Q-Tel has invested in that shows some promise?
A. Pepus: One company of interest is PixLogic. You know when you search on Google, you're using a variety of search algorithms that are focused on text, word stemming and relationships of statistics and natural language to give you the result you need. But when you search Google Images, it only works because somebody, somewhere has put metadata in about an image.
What businesses really need is a tool that can search images just like Google searches text. That is, a tool that doesn't need textual metadata associated with an image to find a specific image in a collection of documents. Say, for example, that I have a collection of company logos. I'd like to be able to search through them by providing a graphical example without [manually] providing metadata about those images. PixLogic does this by providing a pixel-level search for shapes and angles.
It's an early-stage company, and it has a way to go in its technology, but it shows some great promise. From a knowledge management standpoint, you could run the PixLogic search engine over a corpus of images, generating a bunch of metadata automatically and dynamically. You don't even have to manually touch the stuff to add metadata. And then when you search that corpus, people can use PixLogic's search interface. Provide an image of a red arrow, and PixLogic will find all the red arrows in this corpus. You could scope it in so that it finds all arrows vs. something color- or attribute-specific. It's very flexible in that regard.