Let’s Talk about Semantics

It’s time we had “the talk”. I could get you a book or recommend some sites from Dr Mike’s special bookmarks folder, but the best way to make sure you get the right idea is to do it myself. I’m talking about HTML semantics. Understanding the thinking behind the naming of elements will help your markup shine.

Semantics and the Web

Semantics is the implied meaning of a subject, like a word or sentence. It aids how humans (and these days, machines) interpret subject matter. On the web, HTML serves both humans and machines, suggesting the purpose of the content enclosed within an HTML tag. Since the dawn of HTML, elements have been revised and adapted based on actual usage on the web, ideally so that authors can navigate markup with ease and create carefully structured documents, and so that machines can infer the context of the wonderful collection of data we humans can read.

Until — and perhaps even after — machines can understand language and all its nuances at the same level as a human, we need HTML to help machines understand what we mean. A computer doesn’t care if you had pizza for dinner. It likely just wants to know what on earth it should do with that information.

HTML semantics are a nuanced subject, widely debated and easily open to interpretation. Not everyone agrees on the same thing right away, and this is where problems arise.

What’s the point?

Discussions over the importance of semantics are happening all the time, and every so often there’s an uproar over specific articles on the subject. Divya Manian caused a stir with her Smashing Magazine article Our Pointless Pursuit Of Semantic Value, in which she argued we have become too caught up in trying to use HTML5’s semantics, and that the benefits aren’t worth it:

Allow me to paint a picture:

  1. You are busy creating a website.
  2. You have a thought, “Oh, now I have to add an element.”
  3. Then another thought, “I feel so guilty adding a div. Div-itis is terrible, I hear.”
  4. Then, “I should use something else. The aside element might be appropriate.”
  5. Three searches and five articles later, you’re fairly confident that aside is not semantically correct.
  6. You decide on article, because at least it’s not a div.
  7. You’ve wasted 40 minutes, with no tangible benefit to show for it.

This generated a storm of responses, both positive and negative. In Pursuing Semantic Value Jeremy Keith argued that being semantically correct is not fruitless, and he even gave an example of how <section> can be used to adjust a document’s outline. He concludes:

But if you can get past the blustery tone and get to the kernel of the article, it’s a fairly straightforward message: don’t get too hung up on semantics to the detriment of other important facets of web development.

I’ll admit I’ve been in a situation where I’ve dug myself a mind hole, trying to decide which element is “correct” and then depressing myself with thoughts of how irrelevant it all seems. What gives me strength is the thought that I’m not marking this up for me, but for everyone who can benefit from the enhanced meaning. Whether it’s the browser, a search engine spider, an accessibility tool, the person you pass the project on to, or even future you returning to the project in six months’ time, the markup indicates how the content should be interpreted. As long as your markup makes sense and isn’t over the top (e.g., if you just replace all your HTML 4 <div>s with <section>s, you may be misunderstanding the element or even your content), then don’t worry so much about it all.
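
As a rough illustration of that last point, here is a minimal sketch of when a <div> is still the better fit and when a <section> might earn its place. The class name and the content are made up for the example, so treat it as one reasonable reading, not a rule.

```html
<!-- A wrapper that exists only as a styling hook: <div> is still a fine choice -->
<div class="page-wrapper">

  <!-- A thematic grouping of content with its own heading: a reasonable <section> -->
  <section>
    <h2>Opening hours</h2>
    <p>We are open every weekday from nine to five.</p>
  </section>

</div>
```

If a block has no natural heading and only exists so CSS or JavaScript can reach it, that is usually a hint that <div> is the honest answer.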

To help you choose the most appropriate element, we released a flowchart of HTML5 sectioning elements that you can print off and follow whenever you get stuck. If all else fails, don’t forget about your old buddy <div>. Then again, maybe you’ve stumbled across a need for something new…

[Image: flowchart of HTML5 sectioning elements]

Naming Things

Of all the possible new element names in HTML5, the spec is pretty set on things like <nav> and <footer>. If you’ve used either of those as a class or id in your own markup, it’s no coincidence. Studies of the web from the likes of Google and Opera (amongst others) looked at which names people were using to hint at the purpose of a part of their HTML documents. The authors of the HTML5 spec recognised that developers needed more semantic elements and looked at what classes and IDs were already being used to convey such meaning.
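
To make the idea concrete, here is a sketch of the kind of markup those studies kept finding, next to an HTML5 equivalent. The id values and link targets are illustrative and are not taken from the studies themselves.

```html
<!-- Before: the purpose of each block is hinted at only by its id -->
<div id="nav">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/archive">Archive</a></li>
  </ul>
</div>
<div id="footer">
  <p>Contact details and small print</p>
</div>

<!-- After: the element itself states the purpose -->
<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/archive">Archive</a></li>
  </ul>
</nav>
<footer>
  <p>Contact details and small print</p>
</footer>
```

The content is identical in both versions. What changes is that a user agent no longer has to guess what an author meant by a particular id or class name.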

Of course, it isn’t possible to use all of the names researched, and of the millions of words in the English language that could have been used, it’s better to focus on a small subset that meets the demands of the web. Yet some people feel that the spec isn’t yet doing so.

What about adding more elements?

The character Dr Zoidberg from the TV animation Futurama, with the caption: <article>? Why not <zoidberg>?

When I first met fellow HTML5 Doctor Bruce Lawson, I asked him this question: If we have elements like <article>, why don’t we have one for products of a shop? I understand more about HTML now than I did then, but at the time it seemed like a very logical element to add. Why restrict ourselves to documents with <article>s when the web has evolved beyond that with shops, applications, and games? I’m sure many of you have felt the same way. Some have even put cases to the WHATWG suggesting more elements like the often requested <comment>.

Recently, editor of the HTML5 spec Ian Hickson wrote responses to some of these requests for new elements for comments, explaining why <article> within <article> suffices for marking up comments:

<article> isn’t just for articles. That’s the point.

Note that its name is irrelevant here. It could be called <pineapple> — what matters is what it is defined to mean, not what its name is. And its definition is one that covers both articles and comments. They are both self-contained compositions.

As much as I am all for the much needed addition of <pineapple>, Ian makes a good case for why we don’t need <comment> (or similar). Dr Bruce often uses the following analogy when trying to explain <article>:

Don’t think of <article> as a magazine article. Think of it as an article of clothing, an independent entity that can be arranged in conjunction with other articles of clothing, but is a complete thing in itself.
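
A minimal sketch of what this looks like in practice, assuming a blog post with two reader comments. The headings, names, and text are placeholders. Each comment is a self-contained composition, so each one gets its own <article> nested inside the post it relates to.

```html
<article>
  <h1>Title of the blog post</h1>
  <p>Body of the post.</p>

  <section>
    <h2>Comments</h2>

    <!-- Each nested <article> is a comment related to the outer article -->
    <article>
      <footer><p>Posted by: Example Reader</p></footer>
      <p>I never thought about it that way before.</p>
    </article>

    <article>
      <footer><p>Posted by: Another Reader</p></footer>
      <p>I am still not convinced, but thanks for writing this up.</p>
    </article>
  </section>
</article>
```

The nesting is what carries the relationship: an inner <article> is understood to relate to the outer one, which is why a separate comment element adds so little.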

I hear you ask, That’s all well and good, Mike, but what does understanding <article> have to do with adding new elements? To add something to the spec, you need to:

  1. document actual use cases,
  2. show how developers are working around the lack of this feature now, and
  3. make a compelling case for why HTML5 needs it.

Step one is to make sure what you’re asking for can’t be achieved with what already exists. This is exactly the problem with the proposals for <comment>. Any difference from <article> is so minimal that it isn’t worth adding.

If every proposal for a minor deviation of an existing element was accepted, the spec would be bloated with far more elements than necessary. User agents would have to keep up with the vast array of markup possibilities for a given bit of content — as would you, the author. And think about this: at the end of the day, what are you actually going to use that new element for? Really?

That’s not to say you should be discouraged from putting your ideas forward. Take it as advice on what to look for when you want to contribute something to the spec. It may even help you to think differently about your work. Some problems just need a fresh perspective to help solve them.

HTML5 Doctor, or: How I Learned to Stop Worrying and Love HTML

Once you learn to look past element names and think of their essential meaning, it gets a bit easier to write markup. Think of your chunks of content in terms of how they relate to each other and in which contexts they can be used. In our article archive, we’ve covered a lot of elements with examples of their use. If you’re ever in doubt, I highly recommend our flowchart of HTML5 sectioning elements to help you along.

Try to keep things simple. Overthinking your markup will only cause more problems than it’s worth. And let’s face it: it’s not like you can’t change it later!

You may also be interested in adopting Microformats, something Dr Oli Studholme has written about here on HTML5 Doctor. Extending HTML5 — Microformats is recommended reading for those who want to take their markup a bit further. Dr Oli has also looked at Extending HTML5 with Microdata, which gives us a whole new way to add extra semantic information.
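
For a taste of what that extra information can look like, here is a small microdata sketch for a product in a shop. It assumes the schema.org Product and Offer vocabularies, which are a choice made for this example and are not prescribed by HTML5 itself. The product and its price are invented.

```html
<!-- itemscope starts a new item; itemtype names the vocabulary it uses -->
<div itemscope itemtype="https://schema.org/Product">
  <h2 itemprop="name">Example Teapot</h2>
  <p itemprop="description">A small ceramic teapot.</p>

  <!-- A nested item describing the offer for this product -->
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <meta itemprop="priceCurrency" content="GBP">
    <data itemprop="price" value="12.50">£12.50</data>
  </div>
</div>
```

The elements here stay generic; the itemprop attributes do the describing. That is one way to mark up things like products without waiting for a dedicated element.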

Finally, try utilising HTML5 markup in ways that gain the benefit of these new elements. Create tools, use your own documents to test how well-formed and structured they are, and test how portable your content really is. Who knows, you may even find yourself writing a proposal for the next addition to the spec.

Do you know something the web needs that’s missing from HTML5? Let us know in the comments.

24 Responses on the article “Let’s Talk about Semantics”

Andy Walpole says

I think generally the new tags are a step in the right direction for HTML.

I wouldn’t advise anybody to get too caught up in finding an objective yes / no definitive usage of these tags as they are not designed as rules to follow but as coding assistants.

If you spend any time looking at source code then you’ll see as many different implementations of HTML5 tags as you do sites that use them.

I’ve used HTML5 tags on various projects but whether you use divs or semantic tags really doesn’t make any difference to your coding skills – and it certainly won’t matter one fig to your client.

Infamously, an HTML5 shiv is needed to provide support for IE versions 8 and below. As HTML5 tags provide no practical value, should we be adding a layer of JavaScript just so that we can be theoretically correct? It seems a bit daft to do so, really.

Morten Jonassen says

Agreed. It’s all too easy to get caught up in a long process of choosing the “right” tags.

Nevertheless, I still believe it is important to be thorough, in order to apply the best possible semantic meaning to the document.

Niels Matthijs says

Great article (I wish more people would worry about html).

From where I stand, the semantic idea of html is slowly changing. For a long time we relied on classes and ids to provide semantic meaning; I think those days are almost gone. Tags + microdata are all you need to enrich your documents with semantic data that other services/parsers can use.

Classes and ids are still important, not as true semantic indicators, but as hooks for those who follow after the html phase (the people css’ing, javascripting or parsing your html in whatever way possible). Semantic value is still a great help in providing useful class names and ids to elements, much more helpful than going purely on visual information (cf. OOCSS), as they describe what components are rather than how they look.

Once again I would like to give people the advice to try to write html based on wireframes alone. Don’t bother about the design, don’t bother about js functionality. If you do it right, your html should be robust and flexible enough to match 95% of the requirements of any design or any functional requirement.

Neil says

You really can spend way too much time trying to generate the perfect document outline (if there really is one on most web pages).

I was recently working on a tab layout where each tab item was an h1. I thought, perfect, so each tab content is an article. But wait, I have a no-js version which needs the menu on each page, so now I have several h1s. Hmmm, so really they are sections, but I want them to be articles. Frustrating.

My philosophy now is not to get too bogged down in the semantics, go with what feels right and go back if necessary and time permitting, let’s face it by the time the spec is ratified we will probably be on hover boards…

Ken Snyder says

You gloss over the fact that semantic elements are critical for machines.

Think about how much time Google spends trying to figure out what part is header, footer, and navigation and what part is actual content. The word “Doctor Network” may appear on every page… but what if you want to find the page about the Doctor Network. Yes, metatags do help, but tags like <header>, <footer>, <nav>, and <article> are critical for machines.

In my current job, we have done a good deal of page scraping. Let me tell you, markup out there is appalling! The accuracy of content indexing (e.g. search engines) would dramatically improve if everyone used more semantic markup.

Thanks for your flowchart! It is a great place to start the discussion on semantic HTML.

Christian Krammer says

Great article. I sometimes also cudgel my brains about which HTML element to use to mark up a particular part of the document. Unfortunately I don’t build too many HTML5 sites these days, since we are still stuck on HTML4 here (don’t ask me why). Therefore it’s often even harder to decide what to use. But your article will help me very much with this decision in the future, thanks.

Prashant says

Superb Article!
Nice explanation of why there is no <comment> element and why <article> can be used in its place.
I found DIVya’s opinion partly right and partly wrong. First of all, if you have read the specs carefully and know the correct use of each tag, you wouldn’t spend 40 minutes thinking about which tag to use (OK, amateurs who are just beginning to code PSD to HTML5 might initially take about 40 minutes to decide). At times, it’s quite tricky to decide whether the content belongs in a <section>, an <article>, or a <div>. It’s usually good practice to search online to see whether someone has a good reason for using a particular tag for a purpose. This way we can write better code, not only for the current project but also for future projects.
But at times it’s good and safe to use a div instead. This usually happens on a project with limited time and budget :(

Dan M says

Don’t think of <article> as a magazine article. Think of it as an article of clothing, an independent entity that can be arranged in conjunction with other articles of clothing, but is a complete thing in itself.

That is a fantastic comparison. I don’t think I’ve ever fully understood the purpose behind <article> until just now.

The purpose of <nav>, on the other hand, is much easier to grasp. To me, this says that <nav> is a better naming choice than <article>.

I think Ian is wrong when he says the name is irrelevant. The name of an element is our first interaction with it, and our first attempt to grok its meaning. <nav> is easier to understand than <article>, which is easier to understand than <pineapple>.

How many times am I going to have to quote Dr Bruce’s comparison to explain why comments can be marked up with <article>? Probably many times, when the ideal answer is “none”.

Qwertj says

I think that we must distinguish between the semantic value of a tag and its markup value.
Ten years ago a tag was purely for marking up a page: we had em, b, i, u, many tags without any real semantic value but with a central role in styling.
Now HTML5 is semantic, so section and article aren’t styled much differently (I mean by user agents), but their meanings are very different!
It would be a good thing to see even a comment tag, not for weird styling purposes, but as a big help to spiders and other software that surf the net and analyze information.
So I can say to Google “hey, this is a comment, so don’t treat it as the article above”

This is my point of view, please tell me if it’s too simplistic

P.S. sorry for possibly bad English

Marc Diethelm says

Ha, choosing the right HTML5 tags is simple, but adding appropriate ARIA attributes and values to the markup that’s a bloody challenge!

Neil Haskins says

What about my tag?

Dipin Kumar Krishnan says

Hello Mike,

What do you suggest when we form a page layout using HTML5? I used to use so can we use now?

Dipin Kumar Krishnan says

My original comment was missing some content, so I am re-posting: What do you suggest when we form a page layout using HTML5? I used to use ‘div’, so can we use ‘sections’ now?

Shae Kuronen says

Great article!

Seems to me that it would have been 100 times more semantic to add an <item> tag instead of <article>

<aside> has always seemed cumbersome as well, though I’m not sure what would be better

and why no <main> or perhaps <content>

<header>
<content>
<footer>

It seems to me, the pursuit of semantic meaning should clarify, not confuse.

Mike Robinson says

Nice article. FYI, people keep sending email to me that is probably for you.

Rich says

I can’t help but feel that most HTML 5 tags are pointless. While I agree completely about correct semantics you try explaining that to a potential customer. You won’t.

All they are interested in is cost and end results, and shouldn’t we all be? Semantics aren’t going to pay the bills.

Any front end dev worth his or her salt will already be well versed in semantics with HTML 4, and it seems like all HTML 5 delivers is new tags which have been named after the most common class names we use on divs anyway.

And what is the benefit of all this?

You have to fix for older browsers.

What’s the point in that? Another step in completing the site, another potential thing that can go wrong, and another thing that you will have to bill your client for that they won’t understand or see the benefit of. A div will do.

Mike Robinson says

@Mike Robinson: Sorry about that! I’ll have a word with the postman.

@Rich: The new elements enable user agents to easily identify the purpose of content on a page, rather than trying to guess based on more arbitrarily named ids and classes. While it is true to say many developers have become accustomed to using appropriate names for their classes, not everyone does and the chosen names most certainly don’t always match.

The fix for older browsers is pretty stable now: keep the shiv in your base template and you don’t really have to think about it. Using the correct elements in your mark-up will become second nature, taking the same or less time than a div+id/class combo. Your clients might not appreciate it, but rarely is the cleanliness of HTML for their benefit. The real beneficiaries are the user agents, which can use these semantics to parse the pages in more sophisticated ways, such as to power accessibility tools.

These benefits may not be readily available to the developer, but that is not a reason to avoid the new elements. Many of the tools that utilise this either do so quietly or are yet to be created. In fact, any one of us here reading these comments could create tools to take advantage of the new semantics. Why not lay the groundwork for a better future?

Edward says

If <comment> isn’t necessary because of <article>, why does <nav> exist when we have <section>? Further, the description of article as independently distributable components doesn’t make sense or still has way too many interpretations in many contexts. Does a comment make sense on its own or only with the parent article? (I could see different views with even that very common example). What about parts of a court opinion, an appendix to a book, or any number of domain-specific documents or content? If there are numerous ways for this to be interpreted, then it really has no value as a standard.

Mike Robinson says

@Edward: <nav> differs from <section> as there is a significant difference between the two types of information they contain. <nav> contains navigational items which are used to assist the user, whereas <section> is for sectioning content.

The same logic has to be applied to <article> vs <comment> – A comment is just another piece of content that happens to have a direct relationship to the associated article. Articles themselves can sometimes be a commentary on another article, should they too be considered as appropriate for <comment>? The degree of difference between the main body of content and the related comments is so minor that it is safer to not add a new element.

fritz stelluto says

Interesting piece, I like the “article of clothing” example (was the tag planned like that or are you post-rationalizing? never mind…. :-)

but I don’t think you have made a solid case for the current, arbitrary set of tags, you merely state that the procedure to get more tags added is cumbersome. A whiff of Kafka there.

<article> within <article> can be interpreted as comments, but they could also be excerpts in a blog index for example. There is no way to tell the two apart, and yet the difference is quite important. One is content published by the site owners, the other is (often) unvetted user generated content.

I find the distinction between an article as a good you can buy and an article as a collection of words you can read significant. It helps if you want to build a price comparison scraper, for example.

I agree with the <article> within <article> above who said tag names are important to developers, and you should have chosen them more wisely. As someone else suggested in an <article> within <article> above, item would have been a better choice. Anyway, we are stuck with it.

Personally, of the semantic elements I use header, footer, aside, and nav as they make sense, and boycott article and section as they are so confusing to developers I doubt spider bots take any notice of them.

Basically, the emperor is wearing no articles and I totally agree with Divya.

Lika says

Hi,

As for me, this is very interesting because I didn’t really understand the difference between, for example, article and section or section and nav tags.

Before I started to use html 5, I always wrote class="content" for my main div.

And I have a question about new suggestions for html5 and css3.

Can i write it here?

I don’t know if anybody has already spoken about this.
I searched a little but couldn’t find any information; maybe I used the wrong keywords.

So, I think that it would be very interesting to add displacement (or offset) for background images. You know, something like this:

Please notify me if this feature is already here :)

Lika says

My sample image disappeared in the previous comment.

Jason Burnett says

Although I will likely get a great deal of flak for this, I can’t help but feel that we are headed in the wrong direction with regards to the Semantic Web.

I understand the importance of being able to find semantic information in our content and I realize that a semantic structure makes a huge difference in search engine accuracy and when scraping, scavenging. I don’t think that we should avoid semantic declarations, but I also don’t think it should be HTML. Or at least it should be an extension to HTML called HTSL or something.

I wish that HTML had been purified to HT followed by the intended use of said Hyper Text: Markup Language, or Semantic Language. Then in the future, when we have device-specific contexts or audience relative contexts, we don’t have to rewrite the entire spec, we add a flavor of HyperText and call it Contextual Language.

Then, a simple engine would give priority to HT content that was semantically enabled or contextually enabled, etc.

But to extend the HT used to lay out a document by adding new arbitrary elements named conspicuously like a media format that is quickly disappearing seems dirty and short-sighted to me. In other words, I would much prefer to have where the markup language provides the structure, the various other aspects are added to the markup.

Then generalize the markup language to include things like: or or or or . A semantic element (or context element or style element) seems so much more reasonable.

And when complaining about bulky markup by ADDing these elements, I say that device specific browsers could provide an option to recompile the markup to avoid the extra semantics and contexts and whatever they come up with a year from now. If your reader device doesn’t need the extra contexts or can’t really use the semantics, the device can easily get rid of it.

Until then, I am going to struggle with trying to stylize my semantic elements and just put up with being yelled at for my poor element choice. Jeesh.

Thanks for the great article. It clears up a few things and helped clarify my belief in the way things should be.

Jase

Bartdude says

As I read this almost three-year-old article and its comments, I realize the situation hasn’t changed much… Semantics is still hard to sell because tools that make real use of it don’t exist yet, or are dedicated to a tiny minority of users and therefore offer no ROI. I guess we’ll have to wait until semantics have an impact on pagerank to see the clients, project managers and account managers beg for top notch semantic sites…

My personal semantic nitpicking discussion is about the use of ol and ul tags… Most of the time we use ul, even when there is actually an order. The best example would be navigation: do you randomly build your navigation tree? Certainly not… You wouldn’t put your “home” in the middle of the nav, and most probably not even at the end. So although there are probably several “natural” orders, there are also many “non-natural” orders, justifying the use of ol in this case and many others, at least from where I see it.

This is just to add, if necessary, a proof to your point that discussions about semantics are endless… But so interesting :-)
