
Structure is coming back

This article appears in the January 2011 issue [Volume 20, Issue 1]



I was paging through a copy of one of the Mac magazines, which now carry more information about the iPhone that I don’t have than about the Macintosh that I do, when I saw a review of Office 11. I still use Office when I have to (I’ve come to prefer iWork Pages simply because of the aesthetics), so I read it with interest. Not too much there that I care about ... except for one little item that caught my eye. In the grand cycle of structure vs. style, Word has taken a small step back toward structure.

The two are distinct but related. A headline is a structural element of a newspaper, but you identify the headlines because of their style: They are bigger and bolder than subheadings and paragraphs. As if to confuse matters on purpose, the structure of documents created with Microsoft Word is expressed in what Word calls “styles.” If you’ll just step into the WayBack Machine for a paragraph, it will all become clear.

Back when WordPerfect ruled personal computers and Microsoft Word was just an upstart, the two of them stood for different approaches to word processing. WordPerfect thought of a document as a long string of characters with formatting codes stuck into the stream. Word thought of a document as a set of named elements, each with its own style. This meant that a Word user could change all 15 elements named “Subtitle” from 10 point bold to 12 point italics simply by making a change to a style dialog sheet, whereas WordPerfect would require you to select each subtitle and make the change by hand.
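To make the difference concrete, here is a minimal sketch of the named-element model. It is not Word's actual file format or API. The `Style` class, the `styles` table and the sample document are all hypothetical, invented only to show how one change to a style definition reaches every element that carries that name.

```python
from dataclasses import dataclass


@dataclass
class Style:
    """Formatting attached to a named element type (hypothetical, for illustration)."""
    size_pt: int
    weight: str  # e.g. "bold", "italic", "regular"


# Hypothetical style table: one entry per named element type.
styles = {
    "Subtitle": Style(size_pt=10, weight="bold"),
    "Body": Style(size_pt=11, weight="regular"),
}

# The document is a sequence of (element name, text) pairs,
# not one long stream of characters with formatting codes mixed in.
document = [
    ("Subtitle", "Way back when"),
    ("Body", "Word used to show element names in the margin."),
    ("Subtitle", "Why it matters"),
    ("Body", "Structure carries information."),
]


def render(document, styles):
    """Resolve each element's formatting by looking up its named style."""
    return [
        (text, styles[name].size_pt, styles[name].weight)
        for name, text in document
    ]


# One change to the style definition restyles every Subtitle at once.
styles["Subtitle"] = Style(size_pt=12, weight="italic")
rendered = render(document, styles)
```

In the character-stream model there is no `styles` table to edit, so the same change would mean visiting each subtitle in turn.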

Way back when

Word even used to make it obvious to users that this was how its documents worked. Way back when, Word would show you the names of the elements in the left margin of the window. Over time and many releases, Word made the style bar optional and then hid it by default. Eventually, it was removed even as an option. Still, users could always go to the Format pulldown, select Styles and make adjustments to named elements just as in the old days. But the de-emphasis on the structure of Word documents undoubtedly led many people to prefer to reformat their subheads and list items by selecting them and applying changes locally.

And there’s nothing wrong with doing so. Except that it’s morally wrong. Well, at least inelegant.

You can easily distinguish those who understand Word documents as structured and those who do not—those who use Word as Word and those who use it as if it were WordPerfect and the year were 1985—by looking at the bottom of a normal paragraph. Are there two carriage returns? If so, the user thinks Word is WordPerfect. If instead the user is controlling the inter-paragraph spacing by adjusting the margin in the paragraph’s style sheet, then that person understands how Word was designed to work: Each type of structural element has its own properties, including a bottom margin property.

Why it matters

Now, in Office 11, Word has once again made the structure visible. In the left margin, you can see colored numbers that refer to the list of structural elements. It’s not as visible as in the old days when you could see the element’s name directly, but it may at least help people understand a bit more about the structure behind what they’re writing.

Of course you’re free to use Word any way you want, and it often is easier just to hit the return key twice rather than navigate Word’s complex style menus and dialog boxes. St. Peter isn’t really going to keep you out when he realizes that you’re a double-spacer (although in my religion, he would gently sigh). Nevertheless, it has only become more important since the WordPerfect days to understand documents as structured rather than as long strings of text.

First, when we write and when we read, the structure of documents matters. We style a document’s elements differently so that we can tell them apart. We want to tell them apart because their differences carry information: The caption is different from the text, the text is different from the subhead, the subhead is different from the title, the title is different from the footnote. Each of those elements plays a particular role in how we understand the document. Word’s structured model is closer to our own mental model. Of course, it seems odd to say that we have a mental model that most of us don’t recognize, but, well, I don’t make the rules.

Second, it’s how Web documents work. HTML expresses the structure of documents but (generally) not their styling. For example, a Web document’s title may well be marked as a structural element called “H1,” but the formatting of an H1 element is not itself defined by HTML. The format is expressed in a style sheet that applies the same format to all the H1 elements in the page. Working with Microsoft Word as an editor of documents structured in the same way as Web pages means we don’t have to switch mental models whether we’re writing for the Web or for paper.

Structure is useful

Third, as more and more of our written world goes online, we want to be able to treat documents programmatically ... which is made vastly easier if the documents are structured. If, for example, you want a Web page that will validate the amounts users type into various form fields, the program you write will have to find all those fields. To do that, beneath the surface, those fields are going to have to be named and treated as structural elements. Likewise, if you want to rifle through 10,000 pages of an online repair manual to find the ones that talk about problems with the heating system, your program is going to look for a structural element of the page that has a name that identifies which subsystem it concerns. Structuring documents makes them far more useful to programs that make life easier for us and that can mine information that makes us smarter.
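The repair-manual case can be sketched in a few lines, assuming the pages are well-formed XML and that each section carries an attribute naming its subsystem. The `section` element, the `subsystem` attribute, the page IDs and the sample content are all assumptions made up for this illustration, not any real manual's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual pages; a real manual would have thousands of these.
pages = {
    "page-001": (
        '<page><section subsystem="heating">'
        "<title>Furnace will not ignite</title></section></page>"
    ),
    "page-002": (
        '<page><section subsystem="electrical">'
        "<title>Breaker trips</title></section></page>"
    ),
    "page-003": (
        '<page><section subsystem="plumbing">'
        "<title>Leaky valve</title></section>"
        '<section subsystem="heating">'
        "<title>Radiator stays cold</title></section></page>"
    ),
}


def pages_about(pages, subsystem):
    """Return the IDs of pages that contain a section tagged with the subsystem."""
    matches = []
    for page_id, markup in pages.items():
        root = ET.fromstring(markup)
        # Look for a named structural element instead of scanning raw text.
        if root.find(f".//section[@subsystem='{subsystem}']") is not None:
            matches.append(page_id)
    return matches


heating_pages = pages_about(pages, "heating")
```

The program never has to guess from wording whether a page is "about" heating. It asks the structure, which is only possible because someone named the element when the page was written.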

Internet Age’s WordPerfect

This, by the way, makes some choices of Google Docs hard to understand. The programming extension language Google Docs has introduced can’t make use of document structure because Google Docs is resolutely unstructured. Hit the return key twice and it doesn’t even make a new HTML paragraph element but instead creates two line spaces. Google Docs is the WordPerfect of the Internet Age. That’s just bizarre given that the power of programming structured docs was proved even before Word was invented, with the old technical documentation word processor created by Interleaf (where I worked for eight years).

Still, the wheel turns. Structure is coming back. May it stay ascendant for a good long time. 

