Content Declension: Adaptive content for the Hierarchy of Information Needs

Last week, I wrote a piece called, “A Hierarchy of Information Needs.” It described one way to think about how an information seeker’s question, problem, or interest at any moment can, in effect, blind him or her to other content, no matter how it’s formatted, nor how much the content’s creator wants it to be seen. Usually, the need for the “ephemera” of life becomes the dire matter to be resolved first.

That got me thinking about how else one might use the idea of a hierarchy, which led me to this pondering:

Consider how often you, as a content owner, would like your content to be the most important thing in front of an information seeker at a given moment. Let’s say, further, that the content is a full, well-created, powerful “story,” which will bring lasting value to the information seeker, if only you could get it to shift down the hierarchy, from “story” to “reference” to “ephemera.” If only your content could be fleeting, you reason, and if it could wink out of existence as soon as it’s served its purpose, then it would be seen, explored, and valued in all its fullness and glory.

Content “Declension,” or manifesting your content at each level of the “Hierarchy of Information Needs”

I’m going to call this process—of setting off a content cascade through the hierarchy—“Content Declension,” which I will further call just one process of “Content Grammar.”

In many languages (other than English), nouns “decline” to suit the context in which they’re used. They take different prefixes and suffixes, and sometimes they take on entirely different forms, in order to communicate their role in a sentence, roles that are called “cases.” You’ll recognize the vestiges of this process in English pronouns:

He is the subject, and the subject is about him, and his story is fascinating. 
(Nominative case he declines to dative case him, and then to genitive case his…)

OK, it’s a thin example, but Content Declension is the process of establishing patterns and formats for the different cases (or “contexts”) in which your content appears.

When you are creating content, it is vital to consider how it will be able to satisfy your information seekers’ most immediate needs, while providing paths deeper into the whole content. In one sense this is about creating useful, meaningful abstracts of your content, but it’s also about establishing consistent formats for each level, so that no matter what the underlying content, it will be clear how it all fits together, and where you are at each level of the content’s inherent “hierarchy.”

Let me use a blog post as an obvious example. This is easy-peasy on a printed page, since the article appears in a fixed position and format. In digital publication, however, the content declensions are complex.

When we think of the full article, standing alone on an HTML page, the answer is easy: We have the full “Story” form, with all its parts, in all their glory: All the text, the byline, the images and videos, as well as the comments, contact links for the author, and perhaps legal information, too.

At the side, however, is another box, called “Related Stories,” which is a “Reference” content component. With a glance, you can see other content you may want to read, but you don’t have to go there if you don’t want to. Inside that container are the stories’ “Ephemera” declensions. They probably include the headline, a thumbnail, a lead-in blurb, and maybe the byline. It just depends on how the designers chose the elements.

So all together, in this example, we have to plan three declensions of the same content: The full Story, the Reference, and the Ephemera— the same content, in three case forms.

It is vital to consider all the contexts in which your content will appear to the information seeker: In sidebar lists, in search results, in printed documents, in content links, and even in URLs. The more you can plan for the contexts in which your content appears, the better you can present it in a form (and format) that will suit the seeker’s present need.

Why is this important? It’s another step in making your content “adaptive” in preparation for “responsive design.”

But there are contexts, and there are contexts.

For the content of our time, there are infinite possibilities for what content is going to show up where, on what platform, in what physical context, and on and on, as we content strategists are painfully aware. We have also been introduced recently to “responsive design” as a method of resolving some of that uncertainty, and to “adaptive content” as a way to teach the content about itself, so that it can communicate its topics and other meta-properties to the design, and the design can shift accordingly.

But I would say that there is an additional property that we have not yet systematized, which is “content context.”

  • What happens when this content is called as a “link?” What do you, the content designer, want to present as the properties in the link?
  • What if the “link” is in a “related links” container? Should it be the same “link” as when it appears in the “Search Results” list? How can the metadata communicate which content ephemera should appear when it appears in one context or another?
  • How can we ensure that when this story is called from a blog post, it declines in one way, and when it’s called from a Twitter feed, it declines differently?
  • What if you want to provide hooks for other contexts, so that related content is served up in some contexts, but not others, when someone else is specifying the display?

A Call for the Next Evolution of Standards: Content Grammar

Content declension, as a standard, would need to address two issues. First, it would require that content experience designers imagine the functions and contexts in which a full version of content might appear, so that a responsive design could address differences in display for different contexts.

But it would also require that we establish a standard system to name these contexts, like any other evolution of markup. We would need to say that a link.related-links would be different from a link.search-results, to be followed by the fields, attributes, or properties that should appear in those cases. Something like that.
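To make that a little more concrete, here is a minimal sketch of what a named set of declensions might look like. Everything in it is my own assumption for illustration: the context names (borrowed from the examples above), the field names, and the `decline` function are hypothetical, not part of any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    """The full form of a piece of content (field names are hypothetical)."""
    headline: str
    byline: str
    blurb: str
    thumbnail: str
    body: str


# Hypothetical context names, each mapped to the fields that should
# appear when the content is "declined" into that context.
DECLENSIONS = {
    "story.full": ("headline", "byline", "thumbnail", "body"),
    "link.related-links": ("headline", "thumbnail"),
    "link.search-results": ("headline", "blurb", "byline"),
}


def decline(story: Story, context: str) -> dict:
    """Return only the fields that the named context calls for."""
    fields = DECLENSIONS[context]  # raises KeyError for an unnamed context
    return {name: getattr(story, name) for name in fields}


post = Story(
    headline="Content Declension",
    byline="Stephen",
    blurb="How one story takes different forms in different contexts.",
    thumbnail="declension-thumb.png",
    body="(the full text of the post)",
)

related = decline(post, "link.related-links")
results = decline(post, "link.search-results")
```

The same story yields a headline and thumbnail for the “related links” box, and a headline, blurb, and byline for the search results, without anyone rewriting the content for either place.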

As Content Strategy is evolving, we are uncovering new questions and puzzles related to the “substance” of the digital universe, and I think this is an important next phase. By analogy with the “semantic web,” we might call it the “grammatical web.” I expect that if we sit here some more and think at it long enough, we’ll come up with more “Content Inflections,” like “Conjugations.”

Let me know what you think.

 

The PDF Tar Pits: Where content is trapped, struggles, sinks, and dies…

I’m working on a small government web project at the moment, and I was asked to assess the content to propose some content types. As I looked across the landscape, I found very few content types, really. But then, as I continued my survey, I noticed these suspicious, dark black patches. I couldn’t see beneath their surface, so I started probing, moving closer…THEN, I was caught in sticky, black goo that started to pull me under, as panic rose in my throat…

Caught in the PDF Tar Pits

This is web content that was authored in MS Word, converted to portable document format (PDF) files, and then uploaded to the website, rather than loading it into a content management system (CMS) as text and images. PDF document libraries sprawl insidiously across the internet landscape, trapping living, breathing content in their depths, ossifying into solid rock—unusable, un-reusable—until some content strategist chips away the asphalt to discover the bones of content that is probably extinct, or at least years out of date.

I understand how these PDF libraries are created and why. Really, I do. It’s easier to have content owners whack up content however they want, then just toss the PDFs online, rather than spend the time to consider the content carefully, giving it the time, attention, and respect it deserves.

Let me offer, though, some reasons for helping to pull the poor, thrashing, doomed content back out of the tar.

  1. “Oh, PDFs are searchable, so it’s OK to dump…er…upload them.”

    It’s true. Adobe over the years has made provision for lots and lots of embedded metadata, so it’s easier to find them. But while search engines can index PDFs so that they can be found, the real human beings who are searching for that content cannot scan them to see whether it’s the content they’re seeking without opening them. Don’t make your visitors become content excavators. Don’t make them open that PDF to skim it.

  2. “But this is how I want it to look.”

    It’s true. There are many times when our designers spend hours creating beautiful, high-end, printed publications. That’s good. That’s their art and craft. But creating print-ready publications does not release us from the responsibility of making all that content directly accessible as HTML text and images. You can certainly make BOTH available, as indeed you should.

  3. “We’ll just make a content type for PDFs.”

    It’s true. It is indeed important to have content types to represent files in libraries, ready for download. They need to make their metadata available to the CMS, so that appropriate related files can be offered up alongside other primary content. But when text and image content are caught in the PDF tar, their own content type is masked. Unless the content is pulled from the PDF, your CMS cannot manage the true content types correctly. Files of the “PDF” type will be indistinguishable, one from another, all sticky and black as they are.

  4. “This is how we got it from the content owner, so that’s how we’re going to publish it.”

    It’s true. Content owners own their content. (I know, it sounds silly.) They spend a lot of time, laboring in MS Word to format it just so. When they hand over their œuvre to you for posting, you’re stuck between appreciation of their efforts, compassion that they spent so much time on it, and horror that it’s going to require stripping it of all its format before it can be reformatted for the CMS. If you have content owners who are open to the liberation of “just give me the text,” then you can make their lives (and yours) easier, and the content escapes oblivion. If not, then although it means a longer road, reformatting the content will take you safely around the tar.

  5. PDFs of unstructured documents can never be reused as structured content.

    Finally, the most important reason for eschewing the PDF is that when content owners create MS Word documents, they almost never—like, ever!—understand the difference between “format” and “structure.” So they skip blithely through their document, clicking bold here, italic there, and changing fonts and colors according to how they think it will communicate their intentions, without capturing the meaning of those formatting changes in the structure of the content. If unstructured documents are then converted to PDFs and put online, they will be unusable as structured content, and meaningless to semantic search.

Time to Drain the Tar Pits

The simplest guidance you can give your clients, content owners, and stakeholders is to reserve PDF files for content that has been designed to be printed, and then only as a supplement to the live web content. You can probably get away with making PDFs available for content that no one will ever really need, like legal reports and other specific content types that will actually be easier to consume as printed documents rather than as web documents. Even in those circumstances, abstracts of that content should be posted, so that content consumers will be able to preview the documents before committing themselves to downloading them.

A Hierarchy of Information Needs

I have spent a lot of my professional life observing, and trying to support, communities in their “information sharing,” more commonly known as “communication.” On one side stands the “Organization,” also known as the “content owners,” wanting to pump a steady stream of content to the other side, variously designated the “Users,” “Employees,” “Customers,” etc. I’ll just call them the “Community.” Bridging this Great Information Divide, we have “Technology,” and by “technology,” I mean any of the channels we use to communicate, whether through “grapevines,” telephone calls, in-person visits, snail mail, e-mail, social media, or any other means.

Often, I have been charged by the “Organization” to “fix” communication with the “Community,” so that everyone will know what we want them to know, and by extension, to do what they’re supposed to do. (That goal, by the way, merits careful examination…)

Whether we’re talking about internal communications, marketing campaigns, support sites, or conversing with your kids, humanity is engaged in a continual struggle to reach others with information we consider important. And yet no matter what methods we try, no matter how many channels we use, no matter how sophisticated the technology we have available, the goal of “getting through” eludes us.

It’s no better from the community’s perspective. We have simple questions (we feel), and there doesn’t seem to be any easy way to answer them. The organization seems to bury the content we want (because we know it must be there somewhere), blocking our way instead with worthless stuff like “Frequently Asked Questions” that don’t include any of our questions. We refer to this as not being “user friendly.”

Though I am not a scientist and have conducted no formal research, I have nevertheless evolved some practical approaches over the years to explain and address the difficulties inherent in how people share information.

My first insight into these difficulties was that there is a Terrible Truth about People and Information: “People will not know a thing until they are ready to know it, and when the terrible moment arrives in which they are ready to know it, they will ask you why you didn’t tell them.”

I want to tell you about another insight, as well. Although Clay Shirky says that “information overload” is bunk, insisting instead that the problem lies in our own poor “filtering” skills, I still think the concept points toward another “terrible truth” about our relationship with information.

Next question, please: Prioritizing the information job queue

At any moment, we human beings maintain a “job queue” of questions and information needs of greater or lesser importance, and we can only satisfy them one at a time. When we set out to find information, if a single question is clamoring for all of our attention, we can absorb no other information—it becomes completely invisible—until we have answered that single, burning question. If on the other hand, we aren’t actively seeking information, whenever we come across some information, we pay it more or less attention depending on where we place the need for that information in the job queue. If we are aware of no need of or interest in that content, it likewise becomes completely invisible.

How we establish our information priorities is complex, but I have observed that different “classes” of information seem to arrange themselves into an order, a pyramid much like Maslow’s “hierarchy of needs” (1943), that underlies and partitions our priorities. Information that deals with our basic human needs and fears, for example, tends to be of higher priority than more complex content that addresses higher-order questions. Further, certain information classes are more readily bumped to the front of the queue, as it were, and when the need for that information is immediate and powerful, it can eclipse the other classes completely.

A Hierarchy of Information Needs

Ephemera: The plankton of the information food chain.

Ephemera are information bits, whose useful life is brief, but which surround us like a fog. They include event times and places, alerts and reminders, breaking news, and upcoming deadlines. When their moment has passed, they become worthless, but while we need them, we can be absolutely single-minded in our search.

Reference: I just need to look something up. Hang on…

Next in the hierarchy (and probably a very close second place) come the “references.” These include all the standard listings, calendars, directories, maps, and any other information displays that people want to consult as quickly as possible.

Ironically, people’s need for references to the ephemera of life must be satisfied before they can pay attention to anything else. Look at your web traffic and search results, and you will invariably see that people are looking for something like today’s lunch menu, the weather forecast, the current job openings, or bus schedules. Information consumers need more of the ephemera, more often, than any other content class, and they often find them through references.

Procedure: How do you work this thing??

Procedural information provides instructions and explanations of ordinary, essential processes. While procedures are the stuff of help files, demonstration videos, product manuals, and other collateral, they also include shopping carts, job applications, tax forms, and any other system that accomplishes tasks.

When people have all their current facts and searchable references safely within reach, they next need some resource that gives them access to the processes and procedures to accomplish the task at hand. Not only does this class include salient, coherent, and simple explanations of a process, it also includes all the forms and other triggers that help people to set the wheels in motion to reach their goals.

Only now do we reach the “higher” orders of content. Again ironically, content owners often want the community to prize these content types above all others, yet unless the community can satisfy their needs for the three previous classes, they are unlikely to benefit from the next two.

Story: Tell me about it.

Stories are powerful. When well done, they carry more meaning and information than almost any other method of communication. Stories, in this sense, include not only articles or tales, but also infographics, videos, and any presentation of information that will require more than a passing glance to comprehend. Stories also include marketing messages, by the way, as well as customer product reviews, blog posts, and forum discussions.

Organizations, especially those that hope to engage their audiences and build lasting relationships with them, generally want to “jump right to the stories” of their content and ignore the other stuff. The great investment they make, however, in creating powerful, meaningful story content may be completely lost if their audiences aren’t able to satisfy their more primary ephemeral, reference, and procedural information needs. For example, if your customers can’t find the instructions on your site for the doo-dad they just bought from you, they’re not likely to see your offers for add-on doo-dads or services.

Foundation: It’s all in the small print.

Foundation content rises to the pinnacle of the pyramid because it almost never changes, requires the greatest amount of attention to grasp, and is needed by the fewest number of people, and then only rarely. It can include your terms of service or privacy policy, the archives of stories that have been retired and have only historical value, or text published to fulfill legal requirements. (As a side note, sometimes foundational content, like credit card fee schedules and terms, should be transformed into “stories,” so that people will actually understand them. Just sayin’…)

If foundational content is cluttering up the path, if it stands in the way of achieving goals, then our attention and patience grow very short, indeed. Foundation content should be well-organized, well-structured, and easily accessible, but it also needs to be relegated to the “conventional” places, such as in footer navigation or investor sub-sites, so that those who need it will be able to find it.

Resist the temptation to put your priorities ahead of the community’s

Organizations often want to banish information like today’s lunch menu from the home page of their intranet, driven by a concern that “trivial” content cheapens the whole content collection. The most prominent positions should be reserved, they insist, for more “substantive” content. That’s a dangerous long-term strategy.

By acknowledging and embracing the community’s information priorities, you can help them satisfy their needs, accomplish their goals, and lead them to the content you most want them to see. In other words, give them what they need to satisfy their most basic needs, and they will have attention to pay to other things.

Sorting your content into these classes can guide both your content and information architectures. Not only does it help model the types correctly for your system, but it also suggests the navigation and interface that will give your community easy access to the information they need at the moment they need it most.
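As a rough illustration of that sorting, here is a small sketch that tags a content inventory with the five classes and orders it by how urgently the community tends to need each one. The class numbering follows the hierarchy above; the inventory items and the `order_by_need` function are made-up examples of mine, not a tested method.

```python
from enum import IntEnum


class InfoClass(IntEnum):
    """The five classes, numbered from most to least immediate need."""
    EPHEMERA = 1
    REFERENCE = 2
    PROCEDURE = 3
    STORY = 4
    FOUNDATION = 5


def order_by_need(inventory):
    """Sort (title, InfoClass) pairs so the most immediate needs come first.

    sorted() is stable, so items of the same class keep their original order.
    """
    return sorted(inventory, key=lambda item: item[1])


# A hypothetical intranet inventory, in no particular order.
inventory = [
    ("Privacy policy", InfoClass.FOUNDATION),
    ("Today's lunch menu", InfoClass.EPHEMERA),
    ("Customer success story", InfoClass.STORY),
    ("Staff directory", InfoClass.REFERENCE),
    ("Expense claim form", InfoClass.PROCEDURE),
]

ordered_titles = [title for title, _ in order_by_need(inventory)]
```

An ordering like this is only a starting point for navigation decisions, but it makes it hard to forget that the lunch menu outranks the privacy policy in the community’s eyes.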

Adaptive Content: Our primary platform is burning; Time to jump.

We were honored at our last enterprise web developers’ conference to welcome Karen McGrane (@karenmcgrane) as our first keynote presenter. I have known Karen since we were both attendees of the “Content Strategy Consortium” at the 2009 Information Architecture Summit, and every encounter, every opportunity to listen to her speak, has been an inspiration to me.

Currently, Karen is giving a talk called “Adapting Ourselves to Adaptive Content,” and many of you may have heard her give it as the closing keynote at the Content Strategy Confab 2012 in Minneapolis. For any who haven’t had the pleasure yet, I’d like to review my principal revelations from that marvelous talk.

As our conference theme was vaguely articulated as “mobile,” she addressed herself to the issues of how to ensure that our content plays well, when we have no idea on what sort of device or in what context people may be encountering and consuming our content. But more important than the “how-to” aspects, my main revelation from the talk was how hard it can be for us as content designers and producers to let go of control—to confront and release the idea that our content has a “primary platform,” from which are derived all the formats for the devices and contexts we can imagine and plan for.

Abandoning the “primary platform”

I think the greatest insight I gained from Karen’s adaptive content talk is the idea that historically, all content has been designed and created for a “primary platform,” whose format is well understood. After its initial publication, it must then be reformatted to meet the design realities of any other contexts in which it is to appear.

For example, a slick sales brochure is created as a print document. In this case, the paper page is its “primary platform.” The designer kerns and justifies, styles and tweaks, until a beautiful product has emerged, ready to be handed out at tradeshows or mailed out to prospective donors.

Then someone says, “Hey, we need to get this ‘up on the web,’” and it is (implicitly or explicitly) understood that it should look as much like the printed piece as possible. The brochure is then exported as a PDF, and on some webpage, there is a link to download it.

But then, someone notices that the brochure PDF doesn’t look right on a phone…or a tablet. The display is either too small to read, or it doesn’t rotate well from portrait to landscape. It is handed back to the designers to be “fixed.”

The design team then becomes trapped in an inescapable cycle of creating multiple formats for every content piece, first for print, then for web, then for mobile devices. The need to rework the design for different contexts multiplies the time and cost of creating the content.

Some designers, feeling the pain of the rework process, recommend “designing for mobile first.” But then “mobile” becomes the “primary platform,” and the need for redesigning and reformatting content for other contexts remains.

Responsive Design: Teaching your design to adapt to its surroundings

Ethan Marcotte has sounded the call for “Responsive Web Design,” which from the visual designer’s perspective, offers a solid approach to putting intelligence into the CSS code, so that a design “knows” what device is calling it, and it can respond with the appropriate styling and format to match. By incorporating media queries and relative measures, web designers can teach their designs to accommodate a wide range of devices and formats. This brilliant work is revolutionizing the way we make design decisions and write code.

But if “responsive design” is about teaching the design to know the device, “adaptive content,” according to Karen, is about teaching the content to know itself.

γνῶθι σαυτόν: Teaching your content to “know itself”

“Designers are control freaks,” admits Jared Ponchot at Lullabot in a blog post on responsive design. News Flash: So are writers, editors, and other content producers. “Hello. I’m Stephen, and I’m a content control freak.” I can only say that self-knowledge is the first step toward wisdom.

But it’s time to admit that we’re powerless over technology and its users. We can never know enough about our users, their needs, or their devices—let alone how devices will have changed by next year—to teach our content how to adapt to them. Instead, we must build into the content solid information about its structure and meaning, so that we can allow others to make decisions about how it should look and behave.

(It’s probably more like parenting than we care to admit: Parents do their best to rear their children and help them to know themselves, but eventually they must let go and let them be their own adults. They have to stop following them around to make decisions for them. I can hear my mother saying, “But you’ll always be my content…!”)

Karen points to National Public Radio’s “content API,” which streams no design information, but only content and its structure. Because the API doesn’t know anything about devices, devices can present the content according to their native styling instructions. The NPR website has templates to style the content for the main platforms, but application developers can also write native applications to style the content for their particular target devices and contexts. As technology changes, so will the styling, but the content remains well-structured and ready for anything.
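To picture the idea, here is a minimal sketch of design-free content being styled by two different consumers. I should stress that the payload shape, the field names, and both rendering functions are my own assumptions for illustration; this is not NPR’s actual API format.

```python
import json
from html import escape

# A hypothetical payload: structure and meaning only, no styling.
payload = json.dumps({
    "title": "Adaptive Content",
    "byline": "Stephen",
    "paragraphs": [
        "Content should know itself.",
        "Design decisions belong to whoever displays it.",
    ],
})


def render_html(record: dict) -> str:
    """One consumer: a website template that wraps the fields in markup."""
    parts = [
        f"<h1>{escape(record['title'])}</h1>",
        f'<p class="byline">{escape(record["byline"])}</p>',
    ]
    parts += [f"<p>{escape(text)}</p>" for text in record["paragraphs"]]
    return "\n".join(parts)


def render_plain_text(record: dict) -> str:
    """Another consumer: a text-only display, such as a message or a reader."""
    lines = [record["title"].upper(), f"by {record['byline']}", ""]
    lines += record["paragraphs"]
    return "\n".join(lines)


record = json.loads(payload)
web_version = render_html(record)
text_version = render_plain_text(record)
```

Neither function changes the content, and the payload never mentions fonts, colors, or screen sizes. Each consumer makes its own presentation decisions from the same structure.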

Design can only be “responsive” when content is “adaptive.”

On reflection, I think the primary message of Karen’s talk is that we’ll get the most out of “responsive” design when we learn to make our content “adaptive.” We’ve long said that structure and presentation—content and design—should be independent of one another. Well, folks, it looks like this time we have to mean it. It will require both disciplines—and facing down our control needs—to provide rich content that plays well across the dizzying array of platforms.

Time for a deep breath. Time to jump…

The Trouble with Semantic Markup: Response to schema.org

First thing this morning, checking in on the Twitter streams, I saw Jeff Evans (@joffaboy) announce the article, “Google, Bing & Yahoo’s New Schema.org Creates New Standards for Web Content Markup.”

Initial tweet

My heart began pounding as soon as I read the title. The arch-rivals of search, the biggest dogs in the yard, the great institutions of the web were collaborating to propose a solution to the problem of markup that has plagued me from the beginning: Markup doesn’t really address the substance of the web, just its most basic structure. My hopes were further raised by the mention of a “recipe” content type, which if you follow my writings, you’ll recognize as a regular example.

I retweeted in a flash: This is what I’ve been looking for!

My first retweet

Then, I visited schema.org, and all my hopes came crashing to Earth again. The Search Giant monsters have created a new monster.

My second retweet

Quick Overview

As I understand it, schema.org is proposing additions to HTML that the “Big Three” search engines are going to interpret, in order to improve the accuracy of search results. By augmenting the markup in web content, they are together settling on a standard vocabulary, so that they will all be recognizing the same language. Presumably, once they’ve built this standard language into their sorting algorithms, any content that has these augmentations will rise to the top of search results, above content that doesn’t.

In principle, that sounds good, doesn’t it?

I’d like to offer some reflections on a few practical implications of this effort.

Corporations try to head off the “free” Semantic Web

For twenty years, for-profit companies have been watching in dismay the rise of the “free” World Wide Web. Content is free. Software is free. Social Networking is free. And more and more of the web is being driven by “free” efforts, like the World Wide Web Consortium. Volunteerism is a huge threat to capitalism, and they know it.

Among the greatest of these free efforts is the quest for the Semantic Web, which in its simplest terms, seeks a set of standards for describing the meaning of content. Human language is always problematic—as are those who use it—because words are never just words. The meaning of words is rich, contextual, ambiguous, and worst of all, ever changing. There are a lot of really, really smart people, all over the world, almost exclusively volunteer (with some corporate support), working hard to figure this out. If you want to get a sense of the complexity of it all, talk to Rachel Lovinger (@rlovinger) at Razorfish. She’s one of the true semantic geeks, and I’ll just have to take her word on most of what she says. She’s fab.

But instead of supporting this “free” effort, the Search Giants have imposed a de facto standard for the Semantic Web, and they’re pushing it with the strength of their size and popularity. Like the Zen question of the tree in the forest:

If a search engine doesn’t support your semantic standard, will anyone find your content?

I am suspicious of their motives. I read it as an effort to bypass all the work that’s already gone into the Semantic Web.

Markup is more than basic structure and presentation

It has been a great struggle since the beginning of the web to strike the appropriate balance between the structure of content and its presentation. In other words, what content is should be distinct from how content looks. But HTML—even up to HTML5—still only addresses the most basic aspects of content, and even now, offers only tags that address the pieces of the “webpage”—like the “header” and “navigation.” There isn’t markup to describe the content’s substance.

CSS as semantic markers

Cascading Style Sheets (CSS), in a roundabout way, are one approach to the problem, although they were originally meant to control the presentation of the content. Let me give an example.

Lists are a primary content structure. We create lists for everything (ingredients, footnotes, archives, contacts, links, Q&A, references, etcetera ad nauseam), but HTML offers us only two general-purpose choices: “Ordered lists” (numbered) and “Unordered lists” (bulleted). (There is also the definition list, but it is built for term-and-description pairs.)

If your website has a list of links in a sidebar and a list of staff names on a contact page, you would use the same basic markup for both:

<ul>
    <li><a href="http://url.for.link/1" title="This is the first list item">Link Text 1</a></li>
    <li><a href="http://url.for.link/2" title="This is the second list item">Link Text 2</a></li>
</ul>

…and then…

<ul>
    <li>Contact Name 1</li>
    <li>Contact Name 2</li>
</ul>

Here’s the problem: The web browser has a default way of rendering these lists, and they will look exactly the same, except that the links will be underlined. If you want to distinguish them from each other, you can add CSS classes, which give you a way to style them differently.

Now, CSS gurus (the best of whom are really content strategists underneath it all) will tell you that you should NEVER use class names that describe how something looks, like class="blue_text". The class names should describe what the things are, which is, in fact, a semantic indication:

<ul class="links">
    […]
</ul>

…versus…

<ul class="contacts">
    […]
</ul>

Using these identifiers, the designer can define precisely how each component of a website should look. In a better world, however, the same class names could also tell software what each component is. Defining standard CSS classes and identifiers as part of XHTML would be one approach to encoding the meaning into markup.
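To make that idea a little more concrete, here is a minimal sketch in Python of a program that reads the class names as labels rather than as styling hooks. It uses only the standard library's html.parser. The class name ClassifiedListReader and the "unclassified" fallback are my own inventions for illustration, and the sketch assumes flat (non-nested) lists like the two above.

```python
from html.parser import HTMLParser


class ClassifiedListReader(HTMLParser):
    """Group the text of <li> items by the class of their enclosing <ul>.

    Hypothetical helper for illustration; it does not handle nested lists.
    """

    def __init__(self):
        super().__init__()
        self.items = {}              # class name -> list of item texts
        self._current_class = None   # class of the <ul> we are inside, if any
        self._in_item = False        # True while inside an <li>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            # Fall back to a placeholder label when no class is given.
            self._current_class = dict(attrs).get("class") or "unclassified"
            self.items.setdefault(self._current_class, [])
        elif tag == "li" and self._current_class is not None:
            self._in_item = True
            self.items[self._current_class].append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False
        elif tag == "ul":
            self._current_class = None

    def handle_data(self, data):
        # Text inside an <li> (including text inside its links) is collected.
        if self._in_item:
            self.items[self._current_class][-1] += data


page = """
<ul class="links">
    <li><a href="http://url.for.link/1">Link Text 1</a></li>
    <li><a href="http://url.for.link/2">Link Text 2</a></li>
</ul>
<ul class="contacts">
    <li>Contact Name 1</li>
    <li>Contact Name 2</li>
</ul>
"""

reader = ClassifiedListReader()
reader.feed(page)
print(reader.items)
```

The point of the sketch is only that the two lists, identical in structure, come out sorted under "links" and "contacts" because of their class names. Nothing in today's standards promises that a class name means the same thing from one site to the next, which is exactly the gap.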

But not Google, Bing, and Yahoo—Noooooooo.

The Search Giants, though, instead of building on CSS or any other existing approach, have introduced another “standard,” which superimposes another layer of markup on top of the feeble XHTML we already have. Here is the example from schema.org:

<div>
    <h1>Avatar</h1>
    <span>Director: James Cameron (born August 16, 1954)</span>
    <span>Science fiction</span>
    <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>

Before I go any further, I have to say that this code doesn’t look like any real XHTML I’ve ever seen, and that’s a worry right from the start. Nevertheless…

Once they’ve applied their markup augmentations, again right from schema.org, it becomes:

<div itemscope itemtype="http://schema.org/Movie">
    <h1 itemprop="name">Avatar</h1>
    <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span> (born <span itemprop="birthDate">August 16, 1954</span>)
    </div>

    <span itemprop="genre">Science fiction</span>
    <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>

There are many, many, many things wrong with this picture.

All the complexity of XML without any of its simplicity

XML is the mother of all markup. In fact, XHTML is just one markup language based on the XML standard. Using XML as the basis of your web code is an elegant—but very complex—solution to defining your content. When it’s all worked out, however, it lets you replace that gobbledygook above with something more like this:

<movie>
    <title>Avatar</title>
    <director>
        <name>James Cameron</name>
        <birthdate>August 16, 1954</birthdate>
    </director>
    <genre>Science fiction</genre>
    <trailer url="../movies/avatar-theatrical-trailer.html" />
</movie>

Putting it simply, by augmenting XHTML with another layer of markup, the Search Giants have complicated the code immensely, making it just as complex as if they had done it in XML, but without any of the benefits of XML’s simple elegance.
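As a rough illustration of what that elegance buys whoever has to read the content back out, here is a small Python sketch using the standard library's xml.etree.ElementTree. The element names come from my made-up movie example above, not from any published schema, so treat them as assumptions.

```python
import xml.etree.ElementTree as ET

# The hypothetical <movie> document from the example above.
movie_xml = """
<movie>
    <title>Avatar</title>
    <director>
        <name>James Cameron</name>
        <birthdate>August 16, 1954</birthdate>
    </director>
    <genre>Science fiction</genre>
    <trailer url="../movies/avatar-theatrical-trailer.html" />
</movie>
"""

movie = ET.fromstring(movie_xml)

# Each fact is addressed by the name of the element that holds it.
title = movie.findtext("title")
director = movie.findtext("director/name")
born = movie.findtext("director/birthdate")
genre = movie.findtext("genre")
trailer = movie.find("trailer").get("url")

print(f"{title} ({genre}), directed by {director}, born {born}")
print(trailer)
```

Every lookup reads like the thing it is looking for. The catch, of course, is that somebody still has to agree on what "movie" and "director" mean, and then build everything that turns this into a page a browser can show.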

Content is rarely this simple

The examples above deceive us, in any case: Yes, we can add fields to CMS templates for isolated metadata like “title” and “director,” but what about the main content itself? What about the meaning embedded in the article? Let’s say we’re writing an article about motion picture history, and we include the following sentence:

<p>James Cameron, best known for directing the sci-fi thriller,
“Avatar,” was born on August 16, 1954.</p>

All of the information in the schema.org example is present in that sentence, yet none of it is labeled. If we were searching for content about James Cameron, we would have to rely on full-text searching to find it.
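A toy sketch in Python shows what full-text searching can and cannot do with that sentence. The function full_text_search is a hypothetical stand-in of my own, a plain case-insensitive substring match, and nothing like what a real search engine does.

```python
sentence = (
    "James Cameron, best known for directing the sci-fi thriller, "
    "“Avatar,” was born on August 16, 1954."
)


def full_text_search(query, documents):
    """Return every document that contains the query, ignoring case."""
    needle = query.casefold()
    return [doc for doc in documents if needle in doc.casefold()]


# Searching for the words that literally appear finds the sentence...
by_name = full_text_search("james cameron", [sentence])

# ...but searching for the labels that describe those words does not,
# because "director" and "birth date" never appear in the text itself.
by_role = full_text_search("director", [sentence])
by_field = full_text_search("birth date", [sentence])

print(len(by_name), len(by_role), len(by_field))
```

The name matches because it is spelled out. The role and the field do not match, even though any human reader can see that the sentence states both.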

If we were to use the schema.org augmentation, in order to make it all accessible to the search engines, it would get very messy, something like:

<p>
    <span itemscope itemtype="http://schema.org/Movie">
        <span itemprop="director" itemscope itemtype="http://schema.org/Person">
            <span itemprop="name">James Cameron</span>
        </span>
    </span>,
best known for directing the
    <span itemscope itemtype="http://schema.org/Movie">
        <span itemprop="genre">sci-fi thriller</span>,
        “<span itemprop="name">Avatar</span>,”
    </span>
was born on
    <span itemscope itemtype="http://schema.org/Movie">
        <span itemprop="director" itemscope itemtype="http://schema.org/Person">
            <span itemprop="birthDate">August 16, 1954</span>
        </span>
    </span>.
</p>

Not for mere mortal content authors

Now we come to the main practicality of content: Content authors.

I have marked up a lot of content in my career, and I am an obsessive, precise, exacting author. I have also implemented CMS templates and tried to configure the best WYSIWYG editors to apply the right CSS classes within content. And I’ve worked with a lot of content owners to teach them the importance of good markup.

Here’s the hard reality: No matter how powerful the technology, no matter how carefully designed and coded the CMS templates, no matter how sophisticated the WYSIWYG editor, and no matter how much training we offer, any markup will ultimately succeed or fail on the content authors’ ability to use it.

And that brings me to my main issue with the Semantic Web.

The Semantic Web cannot rely on encoding alone

If the main difficulty of searching the web is in understanding the meaning of the content (given all the languages, people, markup skill, and so many more factors), then we can really only solve it the hard way: Intelligent reading. We cannot rely on the human beings who create content to make it speak for itself, by making sure that everything is tagged correctly. They just can’t do it.

We cannot rely on markup because XHTML is insufficient, XML is too complicated for more than data structures, and the schema.org effort is unrealistic. In the end, each method may play a limited role in addressing the findability of content, but ultimately, it will require some other kind of intelligence—intelligence in the interpreting of meaning, rather than its encoding.

I don’t know what will happen with the schema.org markup augmentations. Personally, I hope that the whole effort just sags under its own weight and disappears into the marshes whence it came. And I heartily encourage all the folks who are working on this problem to keep at it: There’s no path to success here but the long one. Eventually, perhaps new kinds of computers will be able to understand us weird, wonderful human beings, but for now, we remain inscrutable to the mechanical, algorithmic mind.