What is the Semantic Web?
Background of the Semantic Web
Development work on exposing Web-based information to more sophisticated methods of searching and organizing has reached the point that practical applications can be created. But some of the behind-the-scenes technical work is still ongoing, and completing a project requires knowledge of a new array of specifications and languages. Organizations are thus left with the choice of remaining on the sidelines, and potentially falling behind competitors, or entering new technological territory where there are a limited number of models of successful applications.
Today's Web is composed primarily of unstructured data, such as HTML pages. HTML pages can be searched via keyword queries, but this technology is limited. These searches cannot identify the type of information on a page. For example, they cannot determine that a string of text is a person's name or that it is the price of a product. Therefore, unlike information in a structured database, information on a Web page cannot be automatically related to information on another page, such as to extract different pieces of data about the same person and combine that information into a single personal profile. The vision of the semantic Web is to establish such capabilities.
Semantic Web Technology
The idea of using semantic technology on the Web is almost as old as the Web itself, but the concept has only in the last several years begun gaining traction. This has occurred in large part because of the creation of several key standards discussed below.
As a result, the semantic Web has progressed significantly in recent years, opening the doors for developers to solve problems that could not be otherwise addressed and to create applications that are not possible with conventional Web technology. But for non-developers, there isn't an easy way, such as a turnkey commercial product, to get started on semantic work.
All content looks the same to conventional Web search technology. A search engine does not distinguish between names, products, prices, and so on. Semantic technology, on the other hand, provides a search infrastructure that resembles database queries, in which different types of information (e.g., addresses, names) are entered in different, corresponding fields. On semantic Web sites, unlike conventional ones, information is more than undifferentiated text. The name of a product could be tagged to indicate that it is a certain type of product, and a number displayed next to the product name could be identified by a search utility as a price. Computers could then use this semantically meaningful content to automatically manipulate, combine, compare, and sort information.
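The contrast between undifferentiated text and tagged content can be illustrated with a minimal Python sketch. The product names, prices, and field names here are invented for illustration and do not come from any real site or standard.

```python
# Untagged page text: a keyword search can only test for matching strings.
page_text = "Acme Kettle 29 Acme Toaster 45"


def keyword_search(text, term):
    """Return True if the term appears anywhere in the text."""
    return term in text


# The same content with hypothetical semantic tags: each value has a type.
items = [
    {"type": "Product", "name": "Acme Kettle", "price": 29.0},
    {"type": "Product", "name": "Acme Toaster", "price": 45.0},
]


def products_under(tagged_items, limit):
    """Return products priced below the limit, cheapest first."""
    matches = [
        item for item in tagged_items
        if item["type"] == "Product" and item["price"] < limit
    ]
    return sorted(matches, key=lambda item: item["price"])


# The keyword search finds "29" but cannot tell that it is a price.
found = keyword_search(page_text, "29")
# The tagged data supports a database-style question about prices.
cheap = [item["name"] for item in products_under(items, 40)]
```

The keyword search can report only that the string "29" occurs on the page, whereas the tagged version can answer a question such as "which products cost less than 40?"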
The concept of a semantic Web has been around for well over a decade. In recent years, however, the idea has begun to attract more interest and is now being implemented on a small scale. These developments have been enabled by the establishment of the following key standards:
- Resource Description Framework (RDF). RDF is a World Wide Web Consortium (W3C) standard for how to label and describe online content so that computers can treat it as knowledge. A central part of RDF is defining various Web resources so that any system can determine what that resource is. For instance, a resource may be a particular product or it may be a price that applies to a product. In order to apply these definitions, RDF uses uniform resource identifiers (URIs). The term URI is new to many people, but the concept is widely known because of the term uniform resource locator (URL), which is synonymous with a Web page's address. A URL is one type of URI, intended specifically for finding files on the Web. The Internet Society's URI specification (RFC 3986) defines a URI as a compact sequence of characters that identifies an abstract or physical resource. That specification also defines the generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
- SPARQL. Whereas databases are commonly searched using the query language SQL, there is now a query language for the semantic Web: SPARQL. It is a W3C standard that enables queries to be made across RDF data sources. It also defines the XML format in which query results will be returned.
- Web Ontology Language (OWL). Like RDF, OWL is a system for categorizing and defining information, but it is more granular and flexible. Less mature than RDF, it is envisioned for use in sub-communities of the semantic Web, for cases in which RDF alone does not provide enough flexibility.
- Simple Knowledge Organization System (SKOS). SKOS is a group of specifications, created using RDF, that describe semantic Web "knowledge organization systems," which include any type of classification system or vocabulary on the semantic Web.
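To make the roles of RDF and SPARQL more concrete, the following Python sketch models RDF-style statements as (subject, predicate, object) triples identified by URIs and runs a simple pattern match over them. It is a toy illustration, not an RDF or SPARQL implementation: the `http://example.org/` namespace, the `type`, `name`, and `price` terms, and the helper functions are all assumptions made for this example.

```python
# Hypothetical namespace; real vocabularies publish their own URIs.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "kettle", EX + "type", EX + "Product"),
    (EX + "kettle", EX + "name", "Acme Kettle"),
    (EX + "kettle", EX + "price", 29.0),
    (EX + "toaster", EX + "type", EX + "Product"),
    (EX + "toaster", EX + "price", 45.0),
}


def match(data, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def product_prices(data):
    """Map each resource typed as a Product to its price.

    Roughly the idea behind a SPARQL query such as:
      SELECT ?item ?price
      WHERE { ?item ex:type ex:Product . ?item ex:price ?price }
    """
    products = [s for s, _, _ in match(data, p=EX + "type", o=EX + "Product")]
    return {s: match(data, s=s, p=EX + "price")[0][2] for s in products}


prices = product_prices(triples)
```

Because every resource is named by a URI, statements gathered from different sources about `http://example.org/kettle` would all refer to the same thing, which is what allows queries to span multiple RDF data sources.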
Applications of the Semantic Web
Until several years ago, the semantic Web was primarily in a research phase. The few implementations of it were mainly to demonstrate the potential of the idea. While much of the activity related to semantic technology still takes place within the academic community, there are now real-world examples of the technology to use as a model.
An early example is the Norwegian National Broadcasters' efforts to use the semantic Web to store extensive metadata about its collection of recordings, with the goal of making the archives highly searchable.
One of the more familiar of today's semantic Web initiatives is the Friend of a Friend Project (FOAF). FOAF is designed to make information on people's personal homepages machine readable, and thus to enable such data to be automatically interrelated. As with many aspects of the semantic Web, FOAF builds on Web 2.0 concepts such as personalization. FOAF, which uses RDF, provides a set of terms that can be used to describe people in a standard way. For instance, it defines classes with a syntax such as "foaf:Person." While FOAF is typically talked about and used as a way to describe people, it has the potential to describe other entities, such as companies.
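As a rough illustration of how FOAF data from separate pages could be interrelated, the Python sketch below merges statements about one person into a single profile. The FOAF namespace and the `Person`, `name`, `homepage`, and `knows` terms are part of the published vocabulary; the people, URIs under `example.org`, and the `build_profiles` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifier for one person, shared by both sources.
ALICE = "http://example.org/people/alice#me"

# Statements as they might be extracted from two different pages.
page_a = [
    (ALICE, RDF_TYPE, FOAF + "Person"),
    (ALICE, FOAF + "name", "Alice Example"),
]
page_b = [
    (ALICE, FOAF + "homepage", "http://example.org/~alice/"),
    (ALICE, FOAF + "knows", "http://example.org/people/bob#me"),
]


def build_profiles(*sources):
    """Group statements from all sources by the subject they describe."""
    profiles = defaultdict(dict)
    for source in sources:
        for subject, predicate, obj in source:
            profiles[subject].setdefault(predicate, []).append(obj)
    return dict(profiles)


profiles = build_profiles(page_a, page_b)
alice = profiles[ALICE]
```

Because both pages use the same identifier for the person, the merged profile combines the name from one page with the homepage and acquaintance from the other, without any text matching on the name itself.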
The efforts described above show the potential of the semantic Web, but they also make it clear that only a limited amount of its potential has yet been realized. Some of today's semantic Web projects have minimal "curb appeal," limited functionality, and are not user-friendly enough to expand much beyond the crowd of highly technically literate people now using them.
Also, many of today's semantic Web projects resemble Web 2.0 services that are already live and being heavily used. For instance, the semantic Web concept of tagging is being used extensively by sites such as del.icio.us, and the idea of extracting and re-combining data in new ways is being put to use by application mash-ups. The semantic Web could eventually go beyond Web 2.0 by creating a broader, standardized system of interrelated information, but such a development is still some way off.
The Future of the Semantic Web
Although semantic Web applications have been ready to be built for a while, there is still some technical work being done to add capabilities or make improvements. For now, many developers may approach semantic applications with caution or stay on the sidelines altogether. One area that continues to be worked on is SPARQL, with some developers pushing for new features.
As touch-up work continues on semantic Web standards, development will proceed on real-world projects that make use of existing semantic technology. Today, the question is no longer whether the semantic Web will come to fruition in the foreseeable future. Instead, the key question about the semantic Web is how wide its scope will be. In one vision of its future, it will grow but remain in essence a specialty technology, with the overwhelming majority of the Web remaining unstructured data. In another vision, the Web itself will become broadly and seamlessly semantic, with everything from social networking sites to retailers using the technology.
New ideas in semantic Web technology, techniques, and applications can be seen in the annual Semantic Web Challenge. This contest asks participants to develop one of two types of application: one that provides a useful service to end users on the Web (the Open Track of the contest) or one that uses a data set comparable to those that might be on the actual Web (the Billion Triples Track).
Most of the developments going on in the field of the semantic Web are small and technical, and software engineers can follow these changes through industry resources such as the W3C's Semantic Web blog or the Topix feed on Semantic Web news.
Perhaps the best bellwether of the semantic Web's prospects over the next few years is the amount of work being done on commercial offerings. By this measure, a significant expansion in real-world applications does not yet appear to be on the horizon. Oracle has been one of the biggest supporters, releasing Oracle Database 11g Semantic Technologies, which it bills as a platform to build and manage semantic applications. Other industry trendsetters such as IBM and Microsoft have the technology on their radars, but they have largely confined their interest to their research departments. Until these companies, or competitors with similar clout, start to announce applications with potentially broad appeal, the pace of the semantic Web's growth will likely remain measured.
About the Author
Geoff Keston is the author of more than 250 articles that help organizations find opportunities in business trends and technology. He also works directly with clients to develop communications strategies that improve processes and customer relationships. Mr. Keston has worked as a project manager for a major technology consulting and services company and is a Microsoft Certified Systems Engineer and a Certified Novell Administrator.
This article is based on a report published by Faulkner Information Services, a division of Information Today, Inc., that provides a wide range of reports in the IT, telecommunications, and security fields. For more information, visit www.faulkner.com.