Semantic Markup & Microformats
Beginning with Standards
During the past year I’ve become interested in using web standards and semantic markup. The idea is to separate content from presentation. For example, I have followed a variety of holy wars over the use of the <i> (italics) and <em> (emphasis) elements. Italics are presentation; emphasis describes the content. The <i> element is widely discouraged in favor of <em> (though HTML 4.01 does not formally deprecate it), but that leaves a few things hanging. For instance, how should I code a book or movie title?
I follow the convention that book titles should be italicized, e.g., The Stars My Destination. But if <i> is off the table, that leaves <em>, which isn’t the correct choice: I’m not emphasizing the title, I just want to note that it is a book title.
So far I’ve just been talking about a visual distinction. But, as I mentioned at the beginning, I should be separating content from presentation — as well as adding meaning to that content. So, how is a machine to distinguish a book title from the rest of the content? Presumably the <cite> element, which is vaguely defined as "[c]ontains a citation or a reference to other sources." While I’m not convinced that a passing reference to a book constitutes a citation, it is better than using <em> or hacking up a meaningless <span>.
The next problem is that, by default, <cite> renders its content in italics. I prefer following MLA style, which calls for books, movies, plays, etc. to be italicized (or underlined), while articles, short stories, poems, etc. are quoted: The Stars My Destination (a novel), Fences (a play), Close Encounters of the Third Kind (a movie), "The Tyger" (a poem), "Bastille Day" (a song), etc. Clearly a basic <cite> element won’t work for all of these variations.
Additionally, at work I have to use AP style for marking up works. AP has a number of quirks, beginning with no italics. Books, movies, plays, poems, etc. are capitalized and quoted; reference works are neither quoted nor given any other distinguishing marks.
So, with different types of works and different style guidelines, the <cite> element as-is simply won’t suffice. We’ll need to add some classes in order to distinguish types of work. This will give us hooks for styling: <cite class="book"> can be italicized or underlined, while <cite class="poem"> would be rendered in normal type and could even have quotes automatically added around it (at least in CSS 2 compliant browsers). Of course, if you need to follow AP or another style, changing just a couple of properties in the style sheet will take care of that. In the end, we’ve managed to style elements visually as we wish and added a bit of meaning.
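A minimal sketch of what such a style sheet might look like, assuming MLA-style rendering. Only the book and poem classes come from the text above; the play, movie and song class names are my own assumptions for illustration.

```html
<style>
  /* Long works (MLA): italicized. "play" and "movie" are assumed class names. */
  cite.book, cite.play, cite.movie { font-style: italic; }

  /* Short works (MLA): normal type, with quotes generated by CSS 2.
     "song" is an assumed class name. */
  cite.poem, cite.song {
    font-style: normal;
    quotes: "\201C" "\201D"; /* left and right double quotation marks */
  }
  cite.poem:before, cite.song:before { content: open-quote; }
  cite.poem:after,  cite.song:after  { content: close-quote; }
</style>

<p>Last week I reread <cite class="book">The Stars My Destination</cite>
and <cite class="poem">The Tyger</cite>.</p>
```

Because the quotes are generated by the style sheet, the markup itself contains no literal quotation marks. For AP style, the same markup could stay as-is: setting font-style to normal on the long-work classes and adding them to the quote rules should be enough.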
Enter Microformats
With the advent of microformats, I now see a way to add more meaning to the classes I would have used simply for styling purposes. We can add meaningful values to the <cite> (or other) element that indexers and others can make use of, all without adding, changing or otherwise hacking existing (X)HTML.
Dougal Campbell brought up the idea of a microformat for music tracks and other media on the microformats mailing list. I gather that there are others interested in this as well.
His idea goes far beyond what I was thinking (he mentions track name, running time, etc., à la ID3 tags) and that’s a good thing. Mention has been made of coming up with a format that would capture ISBN, author, editor, pages, etc. for books, magazines and the like. I’d like to see this microformat get created, as long as it is done in a modular way.
In other words, Amazon.com could use the full format for marking up books, but in a blog entry, I could use just a minimum of markup to indicate that I’m referring to a book, e.g. <cite class="scific novel">The Stars My Destination</cite>.
I’d also like to keep it element independent. In other words, I may implement this using the <cite> element, but if you feel strongly about using <em> or definition lists, you can do that. In fact, a definition list is probably a good way for Amazon.com to mark up titles (a block of title-related information) versus a passing reference (an inline mention).
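As a rough sketch of that element independence, here is the same class vocabulary applied to three different elements. No such microformat has been defined, so every class name below (book, title, author) is hypothetical, not part of any published specification.

```html
<!-- Passing reference, inline: the minimal form -->
<p>I just finished <cite class="book">The Stars My Destination</cite>.</p>

<!-- Same class on a different element, for those who prefer <em> -->
<p>I just finished <em class="book">The Stars My Destination</em>.</p>

<!-- Fuller record as a definition list, the kind of block a retailer might use.
     Further fields (ISBN, editor, pages) would follow the same pattern. -->
<dl class="book">
  <dt class="title">The Stars My Destination</dt>
  <dd class="author">Alfred Bester</dd>
</dl>
```

The point of the sketch is that an indexer would key on the class values rather than on the element name, so the inline and block forms could be recognized as the same kind of thing.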
I think that this is an exciting proposition and am interested to hear from others. What are your thoughts?