Extending HTML5 — Microdata

by Oli Studholme

For those who like (to argue about) semantics, HTML5 is fantastic. Old presentational elements now have new semantic meanings, there’s a slew of new semantic elements for us to argue about, and we've even in<cite>d a riot or two. But that's not all! Also in HTML5 is microdata, a new lightweight semantic meta-syntax. Using attributes, we can define nestable groups of name-value pairs of data, called microdata, which are generally based on the page’s content. It gives us a whole new way to add extra semantic information and extend HTML5.

Sometimes, it is desirable to annotate content with specific machine-readable labels, e.g. to allow generic scripts to provide services that are customised to the page, or to enable content from a variety of cooperating authors to be processed by a single script in a consistent manner.

For this purpose, authors can use the microdata features described in this section. Microdata allows nested groups of name-value pairs to be added to documents, in parallel with the existing content.

Instead of elements, these name-value pairs are defined via attributes:

  • itemscope — defines a group of name-value pair(s), called an item
  • itemprop="property-name" — adds a property to a microdata item. The name of the property can be a word or URL, and the value is the ‘content’ of the element with this attribute:
    • For most elements, the value is the element’s text content (not including any HTML tags)
    • For elements with a URL attribute, the value is the URL (<img src="">, <a href="">, <object data="">, etc.)
    • For the <time> element, the value is the datetime="" attribute
    • For <meta itemprop="" content="">, the value is the content="" attribute
  • itemref="" — allows a microdata item to include non-descendant properties by referring to the ids of element(s) containing them
  • itemtype="" — defines the item’s type when used on the same element as itemscope. The itemtype value is a URL that acts as an identifying vocabulary name.
  • itemid="" — allows a vocabulary to define a global identifier for a microdata item, for example an ISBN number on a book. Use itemid on the same element as the item’s itemscope and itemtype attributes.

Let’s go through these new attributes and see how to use them in practice with everyone’s favourite example band, Salter Cane.

Microdata syntax

itemscope and itemprop

<p itemscope>I’m going to the <span itemprop="name">Salter Cane</span> gig next week.
Excited!</p>

The presence of itemscope on the <p> element makes it into a microdata item. The attribute itemprop on a descendant element defines a property of this item (in this case, name) and associates it with the value Salter Cane (the <span>’s content). An item must have at least one itemprop to be valid.

itemprop names can be words or URL strings. Using URLs makes the name globally unique. If you use words, it’s best to use a vocabulary and the names defined in the vocabulary, which also makes the names unique. We cover this in the section Typed items and globally unique names.

itemprop value from an attribute

For some elements, an itemprop’s value comes from an attribute of the element, not the element’s text. This applies to values from attributes containing URLs, the datetime attribute, and the content attribute.

<p itemscope>I’m going to the <a itemprop="url"
  href="http://www.saltercane.com/">Salter Cane</a> gig 
  <time itemprop="date" datetime="2010-07-18">next week</time>. Excited!</p>

This defines an item with the properties url and date containing the values http://www.saltercane.com/ and 2010-07-18, respectively.

Note that the link’s itemprop="url" value is http://www.saltercane.com/ and not the element’s “Salter Cane” text content. In microdata, the following elements contribute their URLs as values:

  • <a href="">
  • <area href="">
  • <audio src="">
  • <embed src="">
  • <iframe src="">
  • <img src="">
  • <link href="">
  • <object data="">
  • <source src="">
  • <video src="">

Similarly, the <time> element’s value is 2010-07-18 and not its text content (i.e., “next week”).

Conversely, the URL-containing attributes of these HTML5 elements are not used as property values:

  • <base href="">
  • <script src="">
  • <input src="">

It's still possible to use the text of one of these elements as its value — e.g., <a href="">desired value</a>. We just need to add an additional itemprop:

<p itemscope>I’m going to the <a itemprop="url" href="http://www.saltercane.com/"><span itemprop="name">Salter Cane</span></a> gig <time itemprop="date" datetime="2010-07-18">next week</time>. They’re gonna rawk!</p>

This defines an item with three properties: the url is http://www.saltercane.com/, the name is Salter Cane, and the date is 2010-07-18.

We’ll return to the <meta> element’s content attribute in just a moment.
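
The value rules above fit in a few lines of Python. This is only a sketch under stated assumptions: property_value is a hypothetical helper, not part of any microdata library, and it expects the tag name, attributes, and text content to have been parsed already. It does not resolve relative URLs, it ignores nested items, and falling back to the text content for a <time> element without datetime is an assumption of the sketch.

```python
# Elements whose itemprop value comes from a URL attribute (list taken from the article).
URL_ATTRIBUTE = {
    "a": "href", "area": "href", "audio": "src", "embed": "src",
    "iframe": "src", "img": "src", "link": "href", "object": "data",
    "source": "src", "video": "src",
}


def property_value(tag, attrs, text):
    """Return the microdata value for one element carrying itemprop.

    Hypothetical helper: `attrs` is a dict of the element's attributes and
    `text` is its text content with any tags already stripped.
    """
    if tag == "meta":
        return attrs.get("content", "")
    if tag in URL_ATTRIBUTE:
        return attrs.get(URL_ATTRIBUTE[tag], "")
    if tag == "time" and "datetime" in attrs:
        return attrs["datetime"]
    # Everything else, including <base>, <script> and <input>, uses its text.
    return text


print(property_value("a", {"href": "http://www.saltercane.com/"}, "Salter Cane"))
print(property_value("time", {"datetime": "2010-07-18"}, "next week"))
print(property_value("span", {}, "Salter Cane"))
```

The first call prints the URL rather than the link text, the second prints 2010-07-18 rather than “next week”, and the third prints the element’s text.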

Nested items

We can make a property into a nested item by adding itemscope to an element with itemprop.

<p itemscope>The <span itemprop="name">Salter 
  Cane</span> drummer is <span itemprop="members"
  itemscope><span itemprop="name">Jamie
  Freeman</span>.</span></p>

This defines an item with two properties, name and members. The name is Salter Cane, and the members is a nested item, containing the property name with the value Jamie Freeman. Note that members doesn’t have a text value.

Items that aren’t part of other items (i.e., elements with itemscope but without itemprop) are called top-level microdata items. The microdata API returns top-level microdata items and their properties, which includes nested items.

Multiple properties

Items can have multiple properties with the same name and different values:

<span itemprop="members" itemscope>The band members are
  <span itemprop="name">Chris Askew</span>,
  <span itemprop="name">Jeremy Keith</span>,
  <span itemprop="name">Jessica Spengler</span> and
  <span itemprop="name">Jamie Freeman</span>.</span>

This defines the property name with four values, Chris Askew, Jeremy Keith, Jessica Spengler and Jamie Freeman.

One element can also have multiple properties (multiple itemprop="" names separated by spaces) with the same value:

<p itemscope><span itemprop="guitar vocals">Chris Askew</span>
  is so dreamy.</p>

This defines the properties guitar and vocals, both of which have the value Chris Askew.
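
To see how a tool might read these two cases, here is a rough Python sketch built on the standard library’s html.parser. FlatProperties is a hypothetical class written for this illustration. It assumes a single item with no nested items, and no void elements such as <meta> or <img> inside the property elements. Each name maps to a list, so repeated names keep every value, and a space-separated itemprop stores the same value under each name.

```python
from html.parser import HTMLParser


class FlatProperties(HTMLParser):
    """Collect itemprop text values from a snippet holding one non-nested item."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # name -> list of values, in document order
        self._names = None    # names of the itemprop element currently open
        self._text = []       # text collected inside that element
        self._depth = 0       # open elements nested inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._names is not None:
            self._depth += 1
        elif "itemprop" in attrs:
            self._names = (attrs["itemprop"] or "").split()
            self._text, self._depth = [], 0

    def handle_data(self, data):
        if self._names is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._names is None:
            return
        if self._depth:
            self._depth -= 1
            return
        value = "".join(self._text)
        for name in self._names:
            self.properties.setdefault(name, []).append(value)
        self._names = None


html = ('<p itemscope>The band members are '
        '<span itemprop="name">Chris Askew</span>, '
        '<span itemprop="name">Jeremy Keith</span>. '
        '<span itemprop="guitar vocals">Chris Askew</span> is so dreamy.</p>')
parser = FlatProperties()
parser.feed(html)
parser.close()
print(parser.properties)
```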

In-page references via itemref

Items can use non-descendant properties (name-value pairs that aren’t children of the itemscope element) via the attribute itemref="". This attribute is a list of the IDs of properties or nested items elsewhere on the page.

<p itemscope itemref="band-members">I’m going to the
  <a itemprop="url" href="http://www.saltercane.com/">
  <span itemprop="name">Salter Cane</span></a> gig
  <time itemprop="date" datetime="2010-07-18">next 
  week</time>. Excited!</p>
  …
  <p>Salter Cane are <span id="band-members"
  itemprop="members" itemscope>
  <span itemprop="name">Chris Askew</span>, 
  <span itemprop="name">Jeremy Keith</span>, 
  <span itemprop="name">Jessica Spengler</span>
  and <span itemprop="name">Jamie Freeman</span>.</span></p>

This defines the properties url, name, and date. Additionally, it references the ID band-members, which contains the item members with four name properties, each of which has a different value.
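
The following Python sketch shows one way a parser could follow itemref. It is illustrative only: Node, TreeBuilder, and item_properties are hypothetical names, the tree builder assumes well-nested markup with no void elements, the value rule is reduced to href, datetime, or text, and there is no protection against itemref cycles. The idea is that the elements named in itemref are crawled as extra starting points alongside the item’s own descendants.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Tiny tree builder; assumes well-nested markup with no void elements."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])
        self.by_id = {}

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if "id" in node.attrs:
            self.by_id[node.attrs["id"]] = node
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


def text_of(node):
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


def add_property(node, props, by_id):
    """Record `node` (which carries itemprop) under each of its names."""
    if "itemscope" in node.attrs:
        value = item_properties(node, by_id)  # nested item
    else:
        # Simplified value rule for this sketch only.
        value = node.attrs.get("href") or node.attrs.get("datetime") or text_of(node)
    for name in node.attrs["itemprop"].split():
        props.setdefault(name, []).append(value)


def crawl(node, props, by_id):
    """Collect properties below `node` without descending into nested items."""
    for child in node.children:
        if isinstance(child, str):
            continue
        if "itemprop" in child.attrs:
            add_property(child, props, by_id)
        if "itemscope" not in child.attrs:
            crawl(child, props, by_id)


def item_properties(item, by_id):
    props = {}
    crawl(item, props, by_id)
    # Elements listed in itemref are treated as additional crawl roots.
    for ref in (item.attrs.get("itemref") or "").split():
        target = by_id.get(ref)
        if target is None:
            continue
        if "itemprop" in target.attrs:
            add_property(target, props, by_id)
        if "itemscope" not in target.attrs:
            crawl(target, props, by_id)
    return props


html = ('<p itemscope itemref="band-members">I am going to the '
        '<a itemprop="url" href="http://www.saltercane.com/">'
        '<span itemprop="name">Salter Cane</span></a> gig.</p>'
        '<p>Salter Cane are <span id="band-members" itemprop="members" itemscope>'
        '<span itemprop="name">Chris Askew</span> and '
        '<span itemprop="name">Jamie Freeman</span>.</span></p>')
builder = TreeBuilder()
builder.feed(html)
builder.close()
top = builder.root.children[0]  # the first <p>, which is the top-level item
print(item_properties(top, builder.by_id))
```

Although the members element sits in a different paragraph, it ends up as a property of the first item because the item’s itemref names its id.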

Using <meta> to add content via attributes

If the text you want to add isn’t already part of the page’s content, you can use the content attribute on the <meta> element (<meta itemprop="" content="">) to add it.

<p itemscope><span itemprop="name" itemscope>Jessica Spengler
  <meta itemprop="likes" content="Mameshiba"></span>’s fans are always
  really raucous.</p>

Unfortunately, some current browsers move <meta> elements into the document's <head>. The elegant workaround is to use an in-page reference via itemref, so it’ll be included by tools that understand microdata even if the browser has moved it.

<p itemscope><span itemprop="name" itemscope itemref="meta-likes">
  Jessica Spengler<meta id="meta-likes" itemprop="likes" content="Mameshiba">
  </span>’s fans are always really raucous.</p>

Both of these code snippets define the property name with the value Jessica Spengler and the nested property likes with the value Mameshiba.

While microdata is best suited for annotating your existing content, by using <meta>-based or hidden values, microdata doesn’t have to be tied to a page’s content. In general, adding hidden content to a page is a bad idea. It’s easy to forget about and not keep up-to-date. If the information would be useful to some users, add it to the page’s content. If it’s inconvenient to add the content inside an item, consider putting it in a <footer> and including via an in-page reference.

Typed items (itemtype) and globally unique names

We can tie an item to a microdata vocabulary by giving it a type, specified via the attribute itemtype="" on an element with itemscope. The itemtype="" value is a URL representing the microdata vocabulary. Note that this URL is only a text string that acts as a unique vocabulary identifier; it doesn’t need to link to an actual webpage (although it’s nice when it does). After doing this, we can use names in the vocabulary as itemprop names to apply vocabulary-defined semantics.

<p itemscope itemtype="http://schema.org/MusicGroup"> I went to
  hear <a itemprop="url" href="http://saltercane.com/"><span itemprop="name">
  Salter Cane</span></a> last night. They were great!</p>

This example defines the property url with the value http://saltercane.com/ and the property name with the value Salter Cane according to the http://schema.org/MusicGroup vocabulary (MusicGroup is a specialised kind of Organization vocabulary on schema.org).

Alternatively, you can use URLs for itemprop names. In this case, there’s no need to use itemtype as the vocabulary information is already contained in the name. These are referred to as globally unique names. While vocabulary-based names must be used inside a typed item to have the vocabulary-defined meaning, you can use a URL itemprop name anywhere.

Let’s rewrite the above example using URL-based names:

<p itemscope>I went to hear <a
  itemprop="http://schema.org/MusicGroup/url"
  href="http://saltercane.com/"><span
  itemprop="http://schema.org/MusicGroup/name">Salter Cane</span>
  </a> last night. They were great!</p>

This allows you to use multiple vocabularies in the same code snippet, even if they use the same property names.
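
One way to picture the difference between the two kinds of name is the small Python sketch below. The qualify function is hypothetical, and treating any string with a URL scheme as a globally unique name is a simplification of the rule in the specification. It reports where a property name gets its meaning: from the name itself, from the enclosing item’s itemtype, or from nowhere.

```python
from urllib.parse import urlsplit


def is_global_name(name):
    """Rough test for a URL-based (globally unique) property name."""
    return bool(urlsplit(name).scheme)


def qualify(name, itemtype=None):
    """Return a (vocabulary, name) pair saying where `name` gets its meaning."""
    if is_global_name(name):
        return ("global", name)   # the URL carries its own vocabulary
    if itemtype:
        return (itemtype, name)   # defined by the item's type
    return (None, name)           # untyped item: no shared meaning


print(qualify("name", "http://schema.org/MusicGroup"))
print(qualify("http://schema.org/MusicGroup/name"))
print(qualify("name"))
```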

Global identifiers with itemid

Sometimes an item may be identified by a unique identifier, such as a book by its ISBN number. This can be done in microdata using a global identifier via the attribute itemid="", if specified by the vocabulary. itemid can only appear on an element with both itemscope and itemtype="", and must be a valid URL.

<p itemscope itemtype="http://vocab.example.com/book"
  itemid="urn:isbn:0321687299">
  <!-- book info… -->
</p>

This defines an item containing information about a book identified by the ISBN number 0321687299, as long as the http://vocab.example.com/book vocabulary defines global identifiers like this.

This example uses a theoretical example from the spec, as the schema.org Book vocabulary currently only defines ISBN as an itemprop, although they’ve mentioned plans to add itemid global identifiers to their vocabularies in the future. Global identifiers are defined for the WHATWG vocabularies for vCard and vEvent, with values like UID:19950401-080045-40000F192713-0052 and UID:19970901T130000Z-123401@host.com respectively.
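
The placement rules for itemtype and itemid described in this article are easy to check mechanically. The Python sketch below is a hypothetical lint helper, not a real validator: it only covers the two rules stated here and takes an element’s attributes as a plain dict. Whether the vocabulary actually supports global identifiers is outside its scope.

```python
def check_item_attributes(attrs):
    """Return a list of problems with where itemtype and itemid are placed."""
    problems = []
    has_scope = "itemscope" in attrs
    has_type = "itemtype" in attrs
    if has_type and not has_scope:
        problems.append("itemtype requires itemscope on the same element")
    if "itemid" in attrs and not (has_scope and has_type):
        problems.append("itemid requires both itemscope and itemtype on the same element")
    return problems


book = {
    "itemscope": None,  # boolean attribute, so it has no value
    "itemtype": "http://vocab.example.com/book",
    "itemid": "urn:isbn:0321687299",
}
print(check_item_attributes(book))
print(check_item_attributes({"itemid": "urn:isbn:0321687299"}))
```

The book element from the example passes, while an element with only itemid is reported.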

Microdata in action

So now that we know how, why would we want to use microdata?

One use is adding extra semantics or data that we can manipulate via JavaScript in a similar way to custom data attributes (data-*). But if we use a vocabulary via itemtype or URL-based itemprop names, microdata becomes considerably more powerful.

While microdata is machine-readable without needing to know the vocabulary, using a vocabulary means others can know what our properties mean. This allows the data to take on a life of its own. Say what? Well, in effect, using a vocabulary makes microdata a lightweight API for your content.

If you visited someone’s homepage, wouldn’t it be great if you could add their contact information to your address book automatically? The same is true for adding an event you’re attending to your calendar. As the syntax examples were a bit example-y, let’s see how to do that using a real-world example — an upcoming event I’m organising (well, it was upcoming!):

<section>
  <h3><a href="http://atnd.org/events/5181" title="WDE-ex Vol11『iPad
  のウェブデザイン:私たちがみつけたこと 』 : ATND">WDE-ex Vol.11 — Designing
  for iPad: Our experience so far</a></h3>
  <p>On <time datetime="2010-07-21T19:00:00+09:00">July 21st 19:00
  </time>-<time datetime="2010-07-21T20:00:00+09:00">20:00</time>
  at <a href="http://www.apple.com/jp/retail/ginza/map/">Apple Ginza</a>,
  <a href="http://informationarchitects.jp/" title="iA">Oliver Reichenstein,
  CEO of iA</a>, will share the lessons they’ve learned while creating
  three iPad apps and one iPad website.</p>
</section>

WDE-ex Vol.11 — Designing for iPad: Our experience so far

On July 21st 19:00-20:00 at Apple Ginza, Oliver Reichenstein, CEO of iA, will share the lessons they’ve learned while creating three iPad apps and one iPad website.

A Web Directions East event — in code and displayed

Now we could start making up our own itemprop names on an ad-hoc basis, but this effectively prevents anyone else from using our data. By using a vocabulary and following its rules, others can also use our data. It’s a good idea to use a vocabulary, so where do we find one?

Using schema.org vocabularies

The schema.org vocabularies have now superseded the old Google vocabularies, so I’ve updated the article to reflect this.

Bing, Google and Yahoo have collaborated on a set of microdata vocabularies under the name schema.org. By using these vocabularies we can convey semantic information in our content in a way these search engines can understand. While adding semantics using these vocabularies won’t affect your search ranking, the included data may be shown in search results. The main vocabularies schema.org offers are:

  • Creative works: CreativeWork, Book, Movie, MusicRecording, Recipe, TVSeries…
  • Embedded non-text objects: AudioObject, ImageObject, VideoObject
  • Event
  • Organization
  • Person
  • Place, LocalBusiness, Restaurant…
  • Product, Offer, AggregateOffer
  • Review, AggregateRating

They’re the cross-search engine successors to Google’s earlier Rich Snippets vocabularies. Unlike Rich Snippets, which came in microformats and RDFa versions, schema.org vocabularies controversially only support microdata at the moment.

Taking our fairly standard HTML5 code for Oliver’s iPad event above, let’s add some microdata pixie dust using the schema.org vocabularies. First, the speaker (using the http://schema.org/Person vocabulary) and the venue (using the http://schema.org/Organization vocabulary):

<section>
  <h3><a href="http://atnd.org/events/5181" title="WDE-ex Vol11『iPad
  のウェブデザイン:私たちがみつけたこと 』 : ATND">WDE-ex Vol.11 — Designing
  for iPad: Our experience so far</a></h3>
  <p>On <time datetime="2010-07-21T19:00:00+09:00">July 21st 19:00
  </time>-<time datetime="2010-07-21T20:00:00+09:00">20:00</time> at
  <span itemscope itemtype="http://schema.org/Organization">
  <a itemprop="url" href="http://www.apple.com/jp/retail/ginza/map/">
  <span itemprop="name">Apple Ginza</span></a></span>,
  <span itemscope itemtype="http://schema.org/Person">
  <a itemprop="url" href="http://informationarchitects.jp/" title="iA">
  <span itemprop="name">Oliver Reichenstein</span>, CEO of iA</a>
  </span>, will share the lessons they’ve learned while creating three
  iPad apps and one iPad website.</p>
</section>

Adding schema.org microdata attributes for the speaker and location (ref: standalone file). Because they’re attributes, there is no change in how the HTML is displayed.

While this content will still look the same, adding these microdata items means the following information is now machine readable:

  • A company’s name (name)
  • That company’s URL (url on <a>)
  • A person’s name (name)
  • A URL associated with that person (url on <a>)

In JSON (ref: Live Microdata) this would be:

{
  "items": [
    {
      "type": "http://schema.org/Organization",
      "properties": {
        "url": [
          "http://www.apple.com/jp/retail/ginza/map/"
        ],
        "name": [
          "Apple Ginza"
        ]
      }
    },
    {
      "type": "http://schema.org/Person",
      "properties": {
        "url": [
          "http://informationarchitects.jp/"
        ],
        "name": [
          "Oliver Reichenstein"
        ]
      }
    }
  ]
}

So we now have a semantic association between a company and a URL and between a person and a URL. With the right tool, we could add this information to an address book automatically.
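
To make the JSON above less abstract, here is a rough Python sketch of how a tool could produce that structure from the markup. It is not how Live Microdata works internally. Node, TreeBuilder, and the helper functions are hypothetical, the parser assumes well-nested markup, and itemref and relative URLs are ignored. The input is a shortened version of the example markup.

```python
import json
from html.parser import HTMLParser

URL_ATTRIBUTE = {"a": "href", "area": "href", "audio": "src", "embed": "src",
                 "iframe": "src", "img": "src", "link": "href", "object": "data",
                 "source": "src", "video": "src"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # child Nodes and text strings


class TreeBuilder(HTMLParser):
    """Minimal tree builder; void elements never become the current node."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


def text_of(node):
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


def value_of(node):
    if "itemscope" in node.attrs:
        return item_of(node)  # nested item
    if node.tag == "meta":
        return node.attrs.get("content", "")
    if node.tag in URL_ATTRIBUTE:
        return node.attrs.get(URL_ATTRIBUTE[node.tag], "")
    if node.tag == "time" and "datetime" in node.attrs:
        return node.attrs["datetime"]
    return text_of(node)


def collect(node, props):
    for child in node.children:
        if isinstance(child, str):
            continue
        if "itemprop" in child.attrs:
            value = value_of(child)
            for name in (child.attrs["itemprop"] or "").split():
                props.setdefault(name, []).append(value)
        if "itemscope" not in child.attrs:
            collect(child, props)  # nested items keep their own properties


def item_of(node):
    item = {}
    if node.attrs.get("itemtype"):
        item["type"] = node.attrs["itemtype"]
    item["properties"] = {}
    collect(node, item["properties"])
    return item


def top_level_items(node):
    found = []
    for child in node.children:
        if isinstance(child, str):
            continue
        if "itemscope" in child.attrs and "itemprop" not in child.attrs:
            found.append(item_of(child))
        found.extend(top_level_items(child))
    return found


html = ('<p>At <span itemscope itemtype="http://schema.org/Organization">'
        '<a itemprop="url" href="http://www.apple.com/jp/retail/ginza/map/">'
        '<span itemprop="name">Apple Ginza</span></a></span>, '
        '<span itemscope itemtype="http://schema.org/Person">'
        '<a itemprop="url" href="http://informationarchitects.jp/">'
        '<span itemprop="name">Oliver Reichenstein</span>, CEO of iA</a></span> '
        'will speak.</p>')
builder = TreeBuilder()
builder.feed(html)
builder.close()
result = {"items": top_level_items(builder.root)}
print(json.dumps(result, indent=2))
```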

Next, let’s add microdata attributes to the event data, using the http://schema.org/Event vocabulary:

<section itemscope itemtype="http://schema.org/Event">
  <h3><a itemprop="url" href="http://atnd.org/events/5181" title="WDE-ex
  Vol11『iPad のウェブデザイン:私たちがみつけたこと 』 : ATND"><span
  itemprop="summary">WDE-ex Vol.11 — Designing for iPad: Our experience so
  far</span></a></h3>
  <p itemprop="description">On <time itemprop="startDate"
  datetime="2010-07-21T19:00:00+09:00">July 21st 19:00</time>-
  <time itemprop="endDate" datetime="2010-07-21T20:00:00+09:00">20:00
  </time> at <span itemprop="location" itemscope
  itemtype="http://schema.org/Organization"><a itemprop="url"
  href="http://www.apple.com/jp/retail/ginza/map/"><span itemprop="name">
  Apple Ginza</span></a></span>, <span itemscope
  itemtype="http://schema.org/Person"><a itemprop="url"
  href="http://informationarchitects.jp/" title="iA"><span itemprop="name">
  Oliver Reichenstein</span>, CEO of iA</a></span>, will share
  the lessons they’ve learned while creating three iPad apps and
  one iPad website.</p>
</section>

On July 21st 19:00-20:00 at Apple Ginza, Oliver Reichenstein, CEO of iA, will share the lessons they’ve learned while creating three iPad apps and one iPad website.

A code sample with microdata describing an event (ref: standalone file). Adding the microdata attributes doesn’t change how the HTML is displayed.

Now things are a lot more interesting! We can extract the following data:

  • Event details
    • Event name (summary)
    • Event start and end time (startDate and endDate)
    • Event description (description)
    • Event URL (url on <a>)
    • Event location (location), which is represented by:
      • Company name (name)
      • Company URL (url on <a>)
  • Speaker details (well, someone connected with the event, although this connection isn’t explicit)
    • A person’s name (name)
    • That person’s associated URL (url)

Google provides a Rich Snippet testing tool, which shows what data it can extract from the schema.org microdata:

Data extracted from schema.org microdata by the Rich Snippet testing tool

Google could use this additional data in search results as follows:

Google Rich Snippet testing tool preview, showing data we marked up using microdata

The summary (linked to the event URL) and the date and venue are included. Nice! Just by using a vocabulary, Google (or a script supporting microdata) can discover lots of useful data about our event without needing one of those pesky natural language interpreters (otherwise known as humans).

Anyone who has used microformats before will also notice these vocabularies look very similar to hCard and hCalendar, although there are a couple of name changes — e.g., hCalendar’s class="dtstart" becomes itemprop="startDate".

While the schema.org vocabularies are all the search engines are promising to support, you can extend these vocabularies yourself. The safest ways to do this would be via:

Using Google’s Rich Snippets vocabularies

Google also has some basic vocabularies (precursors of schema.org vocabularies) for the following kinds of data, under the moniker of Rich Snippets:

  • people
  • businesses and organizations
  • events
  • products
  • reviews
  • recipes

These vocabularies support microformats and RDFa, two other ways to add extra semantics to our content, in addition to microdata. Apart from this difference, they’re basically identical to the matching schema.org vocabularies, except they use www.data-vocabulary.org instead of schema.org in the itemtype. While Google still supports them, the newer schema.org offers more vocabularies that are also supported by Bing and Yahoo, so choose schema.org vocabularies as long as you’re happy with microdata. You might still want to check out the Rich Snippets documentation, as it includes code samples and is generally better than schema.org’s at the time of writing.

Using WHATWG/microformats.org vocabularies

If you’re familiar with microformats or want more properties than Google’s vocabularies, the WHATWG HTML5 specification actually contains microdata vocabularies for both the vCard and vEvent specifications that hCard and hCalendar are based on, plus a licensing vocabulary.

Let’s take our earlier example and rewrite it using these vocabularies instead:

<section itemscope
  itemtype="http://microformats.org/profile/hcalendar#vevent">
  <h3><a itemprop="url" href="http://atnd.org/events/5181" title="WDE-ex
  Vol11『iPad のウェブデザイン:私たちがみつけたこと 』 : ATND"><span
  itemprop="summary">WDE-ex Vol.11 — Designing for iPad: Our experience so
  far</span></a></h3>
  <p itemprop="description">On <time itemprop="dtstart"
  datetime="2010-07-21T19:00:00+09:00">July 21st 19:00</time>-
  <time itemprop="dtend" datetime="2010-07-21T20:00:00+09:00">20:00</time>
  at <span itemprop="location" itemscope
  itemtype="http://microformats.org/profile/hcard"><a itemprop="url"
  href="http://www.apple.com/jp/retail/ginza/map/"><span itemprop="fn org">
  Apple Ginza</span></a></span>, <span itemscope
  itemtype="http://microformats.org/profile/hcard"><a itemprop="url"
  href="http://informationarchitects.jp/" title="iA"><span itemprop="fn">
  Oliver Reichenstein</span>, CEO of iA</a></span>, will share
  the lessons they’ve learned while creating three iPad apps and one
  iPad website.</p>
</section>

An HTML code sample with microdata describing an event, using vCard- and vEvent-based vocabularies (ref: standalone file).

Currently, search engines don’t map these vocabularies to schema.org ones. It’s possible they will at some stage, so decide which vocabularies to use based on what information you want to mark up, as the data is accessible regardless.

Criticism on microformats.org

Despite these vocabularies being based on vCard and vEvent and using microformats.org as their URLs, the microformats wiki actually warns against using the vCard and vEvent microdata vocabularies, stating:

For common semantics on the web … microformats are still simpler and easier than microdata, and are already well implemented across numerous services and tools.

Personally, I think the difference is marginal. If you use the recommended microformat profile links, I’d say it’s a wash. (But of course no one does ;). Microdata is actually simpler to use for date/time data than the microformat equivalents (although it is less permissive for fuzzy or ancient antiquity times), and it's more explicit, for example, avoiding the internationalisation issues of the “implied fn optimisation”. Tool support is a valid concern, but again I expect this to change over time — microdata is relatively new after all.

Browser support

The microdata specification describes the microdata DOM API, which allows us to access top-level items and their properties. Unfortunately, no browser supports this yet. Opera has experimental support in their latest snapshot, with support expected in Opera 12. It’s also being implemented by Mozilla.

Browser support for microdata

Browser              Support
Chrome               No
Safari               No
Firefox              Work in progress
Opera                In snapshot 12.00-1033
Internet Explorer    No

But that’s okay because this data is still useful for search engine robots and third party tools. For example, Bing, Google, and Yahoo are using microdata with the schema.org vocabularies in search results.

Tools for making and using microdata

With the right tools, we could use this data, complete with its explicit semantics, to add a microdata-annotated event directly into a calendar, for example, which is very handy if you were planning to attend. With the release of schema.org, tools for microdata are starting to appear, but are still somewhat thin on the ground. (No Firefox plugin?! Inconceivable!) However, we have libraries for three languages now:

  • Live Microdata converts microdata into JSON. If you’re using WHATWG/microformats.org vocabularies for vCard or vEvent, it also produces vCard and iCal output.
  • The PHP Microdata library allows you to parse microdata in an HTML file, returning JSON or a PHP array.
  • Mida allows you to extract microdata as JSON, and search for or inspect items. It supports defining vocabularies, and includes schema.org vocabularies. You can even use it from the command line.

Until there is native API support in browsers, we can use these libraries to access microdata. For example, you could put microdatajs on your own server to provide vCard and iCal file downloads, to allow adding the data to an address book or calendar app.
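
As a rough illustration of the calendar idea, the Python sketch below turns already-extracted vEvent properties into iCalendar text. It is not how microdatajs does it. The vevent_to_ical function and the PRODID value are made up for this example, and the output is deliberately minimal: it omits the UID and DTSTAMP properties that RFC 5545 requires and does no line folding, so strict calendar apps may reject it. Datetime values are assumed to carry a UTC offset.

```python
from datetime import datetime, timezone


def ical_datetime(value):
    """Convert an ISO 8601 datetime with a UTC offset to iCalendar's UTC form."""
    moment = datetime.fromisoformat(value).astimezone(timezone.utc)
    return moment.strftime("%Y%m%dT%H%M%SZ")


def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def vevent_to_ical(properties):
    """Build a minimal VCALENDAR from a dict of vEvent property lists."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//microdata-sketch//EN",  # placeholder identifier
             "BEGIN:VEVENT"]
    converters = (("summary", escape_text), ("url", str),
                  ("dtstart", ical_datetime), ("dtend", ical_datetime))
    for name, convert in converters:
        for value in properties.get(name, []):
            lines.append(f"{name.upper()}:{convert(value)}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"


event = {
    "summary": ["Designing for iPad: Our experience so far"],
    "url": ["http://atnd.org/events/5181"],
    "dtstart": ["2010-07-21T19:00:00+09:00"],
    "dtend": ["2010-07-21T20:00:00+09:00"],
}
ical = vevent_to_ical(event)
print(ical)
```

The 19:00 start in Tokyo (+09:00) comes out as 10:00 UTC on the same day.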

Validator.nu will validate your use of microdata, but not whether it conforms to a vocabulary. Google’s Rich Snippets testing tool also validates microdata. In addition, if you’re using the schema.org or Rich Snippets vocabularies it should display how that data could be incorporated into search results, as we saw in the Using schema.org vocabularies examples above. At the time of writing it isn’t, possibly due to this tool being updated for schema.org vocabularies.

If you have valid microdata in these vocabularies search engines will understand that data, but currently you have to “register your interest” in having your rich snippets actually be displayed (more information).

If you’d like a simple way to create microdata, with schema.org there are now several web-based form options:

  • Schema Creator supports the schema.org vocabularies for Person, Product, Event, Organization, Movie, Book, and Review
  • Schemafied supports the schema.org vocabularies for aggregate rating, article, contact point, creative work, event, local business, media object, nutrition information, offer, organization, person, place, postal address, product, rating, recipe, and thing, and also includes relevant items from other schemas
  • Microdata Generator supports the schema.org vocabularies for Attorney, Auto Dealer, Dentist, HVAC, Local Business Schema (NAP), Locksmith, Physician, Plumber, Real Estate, and Restaurant
  • HTML5 Microdata Templates supports the WHATWG and Rich Snippets vocabularies for events, organisations, people, reviews, and content

For CMSs, there’s a Schema for WordPress plugin, and initial work on adding microdata to Drupal too.

Other vocabularies

Here's a short list of microdata vocabularies and their itemtypes:

Person
Organisation or business
Calendar
Review
License
Products and services
Atom feed
  • hAtom: http://microformats.org/wiki/hatom
Recipes

It’s also possible to use RDFa vocabularies by both specifying the itemtype and using URLs for itemprop names. Refer to schema.rdfs.org for Linked Data versions of schema.org vocabularies, and the RDF vocabulary clearing-house (“namespace lookup”) http://prefix.cc.

Making your own vocabulary

If you don’t see a suitable vocabulary, you could make your own. The microdata vocabularies in the HTML5 spec are included as examples of how to do it. Basically:

  1. Work out your vocabulary’s rules. This is a little like setting up a database — work out names for each type of data, then think what kind of data each name’s value should/must contain (URL, datetime, free text, text with restrictions…), and whether something needs to be the child of something else.
  2. Make up a URL on a domain you control, and ideally put your vocabulary specification there.
  3. Use the URL in itemtype="" to reference your vocabulary.
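
The first step above can be sketched in Python by writing the rules down as data and checking extracted items against them. Everything here is hypothetical: the http://vocab.example.com/gig URL, the three property rules, and the validate function are invented for illustration, and items are assumed to be in the same JSON-like shape shown earlier in this article.

```python
from datetime import datetime
from urllib.parse import urlsplit


def is_text(value):
    return isinstance(value, str) and value.strip() != ""


def is_url(value):
    return bool(urlsplit(value).scheme)


def is_datetime(value):
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


# A made-up vocabulary: property name -> (value check, required?)
GIG_VOCABULARY = {
    "type": "http://vocab.example.com/gig",
    "properties": {
        "name": (is_text, True),
        "url": (is_url, False),
        "date": (is_datetime, True),
    },
}


def validate(item, vocabulary):
    """Return a list of ways `item` breaks the vocabulary's rules."""
    problems = []
    if item.get("type") != vocabulary["type"]:
        problems.append("item is not typed with this vocabulary")
    props = item.get("properties", {})
    for name, (check, required) in vocabulary["properties"].items():
        values = props.get(name, [])
        if required and not values:
            problems.append(f"missing required property {name!r}")
        problems += [f"bad value for {name!r}: {v!r}" for v in values if not check(v)]
    problems += [f"unknown property {name!r}" for name in props
                 if name not in vocabulary["properties"]]
    return problems


good = {"type": "http://vocab.example.com/gig",
        "properties": {"name": ["Salter Cane"], "date": ["2010-07-18"]}}
bad = {"type": "http://vocab.example.com/gig",
       "properties": {"name": ["Salter Cane"], "likes": ["Mameshiba"]}}
print(validate(good, GIG_VOCABULARY))
print(validate(bad, GIG_VOCABULARY))
```

Even a toy like this shows why the rules need thought: deciding which properties are required and what counts as a valid value is most of the work.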

There are, however, very good reasons not to make your own vocabulary. They can be quite hard to create, as evidenced by the work that goes into microformats vocabularies. For truly site-specific data, you’re fine with HTML5 custom data-* attributes, or using microdata the same way. But to really get the quasi-API benefits of microdata, you need to use a vocabulary that’s on more than just your site. To make a vocabulary like that, you need to cover not just your own needs, but 80% of the needs of everyone else in the same subject area.

First, check out microformats.org to see if there’s anything in roughly the same area you can just microdata-ify. After that, try RDFa vocabularies. If you still have no luck, try collaborating on a vocabulary with other people in your subject domain. If you’re going to write your own microdata vocabulary from scratch, I’d recommend trying to write a microformat first, as you’ll get a lot of good feedback and they have good info on how to write one. It’s easy to then convert the resulting microformat vocabulary into a microdata vocabulary.

Conclusion

We’ve gone through the building blocks of microdata: a simple five-attribute combo of itemscope, itemprop="", itemref="", itemtype="", and itemid="" on most any element, plus using content on <meta>. We’ve looked at how to combine these attributes to add complex semantic annotations and relationships to your content. We’ve also looked at using common vocabularies, or even creating a vocabulary, to allow the annotated data to be reused widely. This makes creating a meta-API for your website easy enough that even a designer could do it! ;-)

But microdata is not the only way to extend the semantics of HTML5 and add extra meaning. We’ve already looked at Microformats, and RDFa is up next.

Further reading and related links

My thanks to Salter Cane for agreeing to be microdata-ified. Much obliged! Their new album Sorrow is worth checking out… ;-)

Changes

  1. Added PHP Microdata Parser to the tools section
  2. Updating article for schema.org vocabularies which are also supported by Bing and Yahoo, and supersede the Google-only Rich Snippets ones. Also updating browser support (Opera) and tools (Mida) sections
  3. Finished schema.org updates, added validator.nu to the tools section, and rewrote itemid section for clarity, since Dr Bruce was having a spot of bother with it

47 Responses on the article “Extending HTML5 — Microdata”

Brian LePore says

Hmm, this seems very interesting. I think I am going to look in to converting Local Load to use Microdata instead of data-* as that seems to be a more appropriate

Eric says

Interesting. Are there any tools out there similar to the ones for microformats that you can use to help auto-generate the markup for you for microdata? I’ve often used tools like this http://microformats.org/code/hcard/creator as a starting point to do most of the heavy lifting and then I just go in and tweak the markup afterward.

Demetris says

What’s a monkier? :-)

Thank you for this article, and for your previous one on microformats and HTML5. My confusion about microdata persists, but I think I am making some progress.

Oli Studholme says

@Demetris — a letter-transposition monkey? :) Well spotted! If you have any questions leave a comment…

Molot says

Interesting article, thanks the author

jkulak says

Thanks for sharing. A bit confusing and complex in the beggining, but gives the global overview on the subject.

I’m sticking to microformats for the time being.

Oli Studholme says

@jkulak — if you have a chance, please let me know what bits you found confusing and complex. I’d love to improve the article with your feedback. Microformats are good too ;)

vinay says

This has been very informational. Oli you have done a great job in explaining the basics as well as the complex bits of Microdata. :)

Karl says

I must admit I started to lose interest when the examples were kind of hand-crafted HTML – basic HTML is still a challenge for content authors with other jobs. Using microdata for events and vcards etc where you can control the information capture and exposure is more practical for me though. I implemented microformats back in 2006 on our county council’s Intranet for the address book and events calendar so I like the look of your microdata tool for helping model future HTML5 work. Cheers! :)

Oli Studholme says

I recently got an email enquiry about whether you can have several events at the same location, but only mark up the location microdata once (a canonical location). You can do this via the itemref attribute. For example, in the example with microdata using Google vocabularies, you can declare the location item in the event, and use itemref to link to the canonical location info elsewhere on the same page:

…<span itemprop="location" itemscope
  itemtype="http://www.data-vocabulary.org/Organization/" itemref="location"><a
  href="#location">Apple Ginza</a></span>…

The location microdata at id="location" will then be used in the event.

<p id="location">
  <a itemprop="url" href="http://www.apple.com/jp/retail/ginza/map/">
  <span itemprop="name">Apple Ginza</span></a>
  …
</p>

Remember to add an in-page link to the element that itemref points to, so that humans can get to the canonical info too!
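
A fuller sketch of this pattern, with two events pointing at one canonical location, might look like the following. The event names and dates are made up for illustration, and the data-vocabulary.org Event type with its summary and startDate properties is assumed to match the Google vocabulary used in the article.

```html
<!-- Two hypothetical events that share one canonical location block. -->
<div itemscope itemtype="http://data-vocabulary.org/Event">
  <span itemprop="summary">Tokyo web meetup</span>,
  <time itemprop="startDate" datetime="2011-07-01T19:00+09:00">1 July, 7pm</time>, at
  <!-- itemref takes the bare id of the target element, without a leading # -->
  <span itemprop="location" itemscope
    itemtype="http://data-vocabulary.org/Organization" itemref="location"><a
    href="#location">Apple Ginza</a></span>
</div>

<div itemscope itemtype="http://data-vocabulary.org/Event">
  <span itemprop="summary">HTML5 workshop</span>,
  <time itemprop="startDate" datetime="2011-07-15T19:00+09:00">15 July, 7pm</time>, at
  <span itemprop="location" itemscope
    itemtype="http://data-vocabulary.org/Organization" itemref="location"><a
    href="#location">Apple Ginza</a></span>
</div>

<!-- Canonical location: kept outside any itemscope so its properties
     are only picked up by the items that reference it. -->
<p id="location">
  <a itemprop="url" href="http://www.apple.com/jp/retail/ginza/map/">
  <span itemprop="name">Apple Ginza</span></a>
</p>
```

Each location item should then pick up the url and name properties from the shared paragraph, so the address details only need maintaining in one place.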

HTH

Alohci says

@Oli

I think it is itemref="location" not itemref="#location"

Oli Studholme says

@Alohci — d’oh. thanks, fixed.

Klemen Slavič says

Given the toolset you’ve outlined in the article, I haven’t seen any that would allow you to browse and validate against vocabularies, so I made my own tool:

Microdata tool

It’s still a work-in-progress, but it already contains validation for most of Google’s data-vocabulary.org types.

Michael Byers says

I have occasionally used Microformats on websites but I think you’ve shown me that Microdata would be a better approach especially when using HTML5.

Lukas Najduk says

A very good and well-explained article about Microdata.

The part explaining when values are taken from href="", content="" and other attributes like that was especially helpful for me. I looked at about 10 other websites and couldn’t find this information there. I wonder why they don’t cover it, as from my understanding it’s an important part of Microdata.

Keep up the good work.

kind regards,
Lukas

Animesh says

I need microdata for a poetry website — we have common entries like poets, their books, and references to some other books/stories (prose).

Is it already available?

Animesh says

@Oli Studholme:

I’ve a poetry website. The poem is in archaic language and is distributed as a GIF file in a custom script (the Unicode option doesn’t work very well for us). Most of the poems are excerpts from classic texts or books. We also translate these poems.

Within the above constraints, I wish to provide the following semantic information to the search engines (or any parser):

- First few lines of the poem in Unicode.
- Specify first few lines of the translations.
- The poet of the poem.
- The book from which the poem has been taken.
- Any information on current publishers of the poem or the book.
- Classify the poem into genres (this might be useless for the search engines).

Probably I am making more sense to you now, or not?

Best regards,
Animesh

Oli Studholme says

@Animesh — To start with (and stating the obvious), any content you want search engines to read needs to be included as text. Just like a visually impaired user, search engines can’t read GIF images. I’m sad to hear that Unicode doesn’t support all the glyphs you need, as that would make this much easier. However, if you’re able to give the first few lines in Unicode, you can also include all of the poem’s text, even if it’s not perfect. I assume that there are some rules for writing your script in Unicode that deal with missing glyphs, so just follow those and do the best you can. Also ask about planned support for your script on a Unicode mailing list to find out about possible future support. If the issue is font support, organise with others to create a free font that you can then use via @font-face and offer for download.

As far as citation information in microdata, you could use the schema.org Creative Work vocabulary (for the poem), together with the Book vocabulary (for the book):

  • Creative Work
    • name — poem’s name
    • author — poem author’s name
    • genre — poem’s genre
    • inLanguage — poem’s language (must be from IETF BCP 47)
  • Book
    • name — book’s name
    • author — book author’s name
    • publisher — book’s publisher
    • datePublished — book’s date of publication
    • isbn — book’s ISBN

There’s nothing for the first few lines of the original or translation, but as I said above you really should put all of the original and all of the translation in as text. You can use HTML5’s standard lang="" attribute to indicate the language of the content. I cover this in passing in Quoting and citing with blockquote, q, cite, and the cite attribute.
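
As a rough sketch of how these two vocabularies might sit on a poem page, consider the markup below. Every name, date, the language code, and the ISBN are placeholders, and the layout is only one of several reasonable options. Note that the schema.org property name for the publication date is datePublished.

```html
<!-- Hypothetical poem page; all values are placeholders. -->
<article itemscope itemtype="http://schema.org/CreativeWork">
  <h1 itemprop="name">Poem title</h1>
  <p>by <span itemprop="author">Poet name</span>
    (<span itemprop="genre">devotional</span>)</p>
  <!-- inLanguage takes an IETF BCP 47 code; "sa" is only an example -->
  <meta itemprop="inLanguage" content="sa">
  <div lang="sa">Full text of the original poem…</div>
  <div lang="en">Full text of the English translation…</div>
</article>

<p itemscope itemtype="http://schema.org/Book">
  From <cite itemprop="name">Book title</cite>
  by <span itemprop="author">Book author</span>,
  published by <span itemprop="publisher">Publisher name</span>
  on <time itemprop="datePublished" datetime="2005-03-01">1 March 2005</time>
  (ISBN <span itemprop="isbn">000-0-00-000000-0</span>).
</p>
```

The lang="" attributes on the two text blocks tell browsers and assistive technology which language each block is in, independently of the microdata.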

HTH

Bartek Szkurlatg says

Great guide to Microdata! Love it!

Animesh Kumar says

Thanks a bunch Dr. Oli Studholme

Oli Studholme says

I’ve updated “Extending HTML5 — Microdata” with info on schema.org, browser and tool support, and a clearer description of itemid. Thanks for the prompt Philip!

Niklas Auteranu says

Nice guide. It saved me a lot of time.

Mathew says

Say that I have a resort website, which has some content that microdata can easily be applied to but does not contain other things like a contact number, location, etc. (probably a bad example, what resort site would not have this info :P )

Are there any known negative effects or penalties that can be applied if one were to use the <meta> tag to define other properties?

To me this holds potential for negative effects because essentially you’re putting hidden information into the webpage. Yet on the other hand, you’re simply providing extra information to the search engines to help them provide better information to the user.

Opinions?

Oli Studholme says

@Mathew — I’m a little unsure of your question. Re: your first paragraph, there’s generally no problem with only including some of a vocabulary’s properties. For example, checking with Google’s validator, the requirements for schema.org vocabularies are surprisingly minimal (I haven’t checked with the schema.org list about this yet).

Re: your second paragraph, it’s possible to extend a vocabulary, and schema.org has information on their “extension mechanism”.

A couple of first principles: extensions to a schema.org vocabulary, or using a custom vocabulary, will not provide extra information to search engines, as despite being machine-readable, microdata is opaque unless you know the vocabulary. The only thing it may do is give them an indication of popular data/vocabularies they should add support for. Also, while there’s no requirement for microdata to be connected with content on the page, microdata that isn’t will tend to metacrap. Any microdata value in a public vocabulary (schema.org etc.) should be content on the page.

Mathew says

@ Dr. Oli Studholme – Thank you for your response.

Apologies. I’ve worded my first paragraph badly; additionally, I put brackets around the “meta” tag, which was then completely scrubbed from the post. [fixed (I hope) —oli]

What I was hoping to get was an opinion on whether it is good or bad practice to use the meta tag to define an itemprop.

For example:
<meta itemprop="openingHours" content="Mo-Su 7:00-11:00">

I know that Google will not use content hidden via CSS display:none, but does this apply to the meta tag?

On a side note: I liked the comment about Metacrap. I watched a video on how it is very easy to provide fake data such as ratings through the use of microdata.

Alice Wonder says

I lost all interest as soon as I saw it using meta elements outside the document head, where they belong.

Alice Wonder says

OK – I haven’t lost *all* interest, but it seems to me that if your content is properly presented so that your users can figure out what the smurf you are talking about, then a search engine should be able to as well.

If search engines are written sloppily enough that they can’t and rely on Microdata to get it right, then this will be abused the same way the keywords meta tag was abused. SEO tricks that are dishonest and exist just to get users to a site hoping they click the annoying hover ads.

I’ll have to look into it some more, keywords meta tag has legitimate use even though search engines ignore it now (I sometimes use them to benefit site specific search), maybe this could be of benefit in some respects to internal web app stuff, though data-* is largely what I use for that.

Oli Studholme says

@Mathew — I’ve hopefully fixed up your missing <meta>. Unfortunately we haven’t updated the commenting system yet, so you’ll need to escape < in code samples using &lt;.

There’s no difference between adding an itemprop to <meta> or to another element (see sidenote), but you still generally shouldn’t do it as you’re making the data inaccessible to anyone but search engines. Opening hours are a terrible thing to put in <meta>.

Google doesn’t treat <meta> as spam, but the info is only used to make your listing more informative, and will not affect rankings.
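
As a rough sketch of the alternative, opening hours can stay visible on the page and still carry an itemprop. The resort name here is hypothetical. The <time> pattern with a schema.org-style datetime value follows the example in schema.org’s own documentation for openingHours; since that value is not a standard HTML date or time string, an HTML validator may complain about it.

```html
<!-- Hypothetical resort; the hours are readable by people and by parsers. -->
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <h2 itemprop="name">Example Resort</h2>
  <p>Open
    <time itemprop="openingHours" datetime="Mo-Su 07:00-11:00">daily, 7am to 11am</time>.
  </p>
</div>
```

Because the value of a <time> element is taken from its datetime="" attribute, the human-readable text can be phrased however suits the page.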

Shawn says

Thanks! I was unsure for a while how to use anything that had multiple values (i.e. itemprop="keywords").

Mathew says

@Issac Lisitano: Cool Story Bro…

Irvo says

Great article! Now I’m using HTML5 Microdata on my site.

Inoe says

Thanks for this comprehensive overview.

mazi fko says

How do you actually put or integrate the microdata into a webpage’s HTML? We sell products online, and I don’t know how I can use this microdata stuff on my website. Can you help?

Oli Studholme says

@mazi fko — umm use a text editor? :S For example, from this:

<a href="http://informationarchitects.jp/">Oliver Reichenstein</a>

to this:

<span itemscope itemtype="http://schema.org/Person"><a itemprop="url" href="http://informationarchitects.jp/"><span itemprop="name">Oliver Reichenstein</span></a></span>

If that still doesn’t make much sense, maybe you should try the Schema Creator for product microdata
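
For a shop, a minimal sketch using the schema.org Product and Offer vocabularies could look like this. The product, image path, and price are placeholders, not taken from any real site.

```html
<!-- Hypothetical product listing; all values are placeholders. -->
<div itemscope itemtype="http://schema.org/Product">
  <h2 itemprop="name">Example Widget</h2>
  <img itemprop="image" src="widget.jpg" alt="Example Widget">
  <p itemprop="description">A short description of the widget.</p>
  <!-- The offer is a nested item: itemprop and itemscope on one element -->
  <p itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="priceCurrency">USD</span>
    <span itemprop="price">19.99</span>,
    <link itemprop="availability" href="http://schema.org/InStock">in stock
  </p>
</div>
```

The same wrapping approach as in the Person example applies: existing content gains itemprop attributes, and extra elements are only added where there was nothing to hang an attribute on.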

loupiote says

can itemref="…" be used in a <meta …> element?

if yes, doesn’t this conflict with the fact that <meta …> elements are supposed to have a content="" attribute?

loupiote says

in other words, is this legal:

<meta itemscope itemref="band-members">

or

<meta itemprop="author" itemscope itemref="author-id">

John B says

is this legal:

Well, no – but not because it’s a meta tag. itemref should surround a block of HTML which contains information about an item (i.e. your band members), so unless you are using a </meta> to close the tag, better stick to spans, divs, ps, etc.

As to whether it will validate – that’s another question!

Vipul S. Chawathe says

The mapping among microdata, RDFa and microformats is documented by the W3C at https://dvcs.w3.org/hg/htmldata/raw-file/default/ED/html-data-guide/index.html
IMHO, a microformat is a generalized, parseable RDFa snippet that, when nested into XHTML5 (note the “X”), makes XSL transformation applicable, so serving hNews as an Atom feed may be left to the server, and the consumer of the content has better flexibility with regard to consumption technique.
Search providers just needed short-hand vocabulary for simpler parsing by their crawling bots as the transformations besides updating their indexing database benefit the consumer in the content providers’ interest rather than build the search providers reputations. So they used their monopoly over consumers thereby saving on parsing effort, by putting the onus on content provider for overloading the markup with their short-hand towards SEO, viz. the overly restrictive microdata that beyond schema.org’s vocabulary serves only academic fancy.
I dare say with regards to being restrictive, microdata is anti-OWL of sorts.

Vipul S. Chawathe says

Vocabularies recognized to serve initial context are seen at http://www.w3.org/2011/rdfa-context/rdfa-1.1.html The prefixes are to the best of my understanding imported by using rel=”profile” to enable RDFa as mentioned at 1.4.1.3 in http://dev.w3.org/html5/rdfa/rdfa-module.html I experimented with Bing Webmaster Diagnostic Tool Markup Validation as well as Google Structured Data Tester & the result clearly shows lack of sportsman spirit towards non-schema.org vocabularies leading them towards becoming orphan step-children of semantic web. Dialects like gr & foaf & even individual microformats have precisely specific scopes whereas schema.org certainly aims to cause loss of context to the semantic content composer. The predecessors of schema.org that made it to W3Cs RDFa initial context & the prefix notation from RDFa are the minimum that major players should accommodate rather than harshly monopolise their implementation to interpret itemtype attribute, considering validator.nu as validator finds the markup to be valid.

Eoin Oliver says

I’m sorry I couldn’t help myself.

You open up with the sentence:

For those who like (to argue about) semantics, HTML5 is fantastic.

It should read:

For those who like [to argue about] semantics, HTML5 is fantastic.

TJ Greene says

Hey Oli,

Thanks for this great reference.

I agree with you, HTML5 is fantastic. We just converted our site from XHTML to HTML5 and are really enjoying the added benefits.

Actually, we are using the Genesis Theme Framework and their latest version fully supports HTML5. Converting it was super easy and took less than 15 minutes. I wrote a tutorial to help other Genesis users make the switch and included a link to your site for anyone needing more information.

Thanks again.
