HTML 5 + XML = XHTML 5

by .

I like the XHTML syntax. It’s how I learned. I’m used to lowercase code, quoted attributes and trailing slashes on elements like br and img. They make me feel nice and comfy, like a cup of Ovaltine and The Evil Dead on the telly.

But you might not. You might want SHOUTY UPPERCASE tags, no trailing slashes and attribute minimisation. And, in HTML 5, you can choose.

Thanks to the “pave the cowpaths” principle, it’s up to you. As you like it. What you will. Whatever you want, whatever you like.

But let no-one tell you that HTML 5 kills XML—meet XHTML 5.

XHTML 5 is the XML serialisation of HTML 5 and, as you’d imagine, it has all the stricter parsing rules you’d expect (and that you’re used to if, like me, you grew up with XHTML DOCTYPEs). It must be served with an XML MIME type, such as application/xml or application/xhtml+xml (so no rendering in Internet Explorer for the moment), and it will throw a wobbly at the slightest well-formedness violation. (See Serving XHTML with the Right MIME Type for more information.)
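To get a feel for how unforgiving that is, here is a minimal sketch in Python. It uses the standard library’s xml.etree.ElementTree parser as a stand-in for a browser’s XML parser (an assumption: browsers ship their own parsers, though the XML well-formedness rules they enforce are the same), and both sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed XHTML 5 (every element closed,
# attributes quoted, the void <br/> written with a trailing slash).
WELL_FORMED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hello</title></head>
  <body><p>Line one<br/>line two</p></body>
</html>"""

# The same page with the </p> end tag left out. The HTML syntax allows
# that here, but in XML it is a fatal well-formedness error.
MALFORMED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hello</title></head>
  <body><p>Line one<br/>line two</body>
</html>"""


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False on the first error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # No error recovery: parsing stops at the first violation.
        print(f"XML parse error: {err}")
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(MALFORMED))
```

The omitted end tag that an HTML parser would quietly fix up is a hard error here, which is the behaviour behind a browser showing an XML error page instead of your content.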

Usual XML rules apply: document.write() is not allowed, no DOCTYPE is required, there are some syntax and scripting differences to trip up the unwary, and you can use namespaces.
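As a rough illustration of the last two points, this sketch (again assuming Python’s standard-library parser as a stand-in for a browser, with a hypothetical sample page) parses a document that has no DOCTYPE at all and embeds SVG in its own namespace, then reports which namespace each element ended up in.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical page: no DOCTYPE, and the svg element declares
# its own default namespace.
PAGE = f"""<html xmlns="{XHTML_NS}" lang="en">
  <head><title>Inline SVG</title></head>
  <body>
    <svg xmlns="{SVG_NS}" width="10" height="10">
      <circle cx="5" cy="5" r="4"/>
    </svg>
  </body>
</html>"""


def element_namespaces(markup: str) -> dict[str, str]:
    """Map each local element name to the namespace it was parsed into."""
    root = ET.fromstring(markup)
    result = {}
    for el in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        ns, _, local = el.tag[1:].partition("}")
        result[local] = ns
    return result


if __name__ == "__main__":
    for name, ns in element_namespaces(PAGE).items():
        print(name, ns)
```

The svg element and its circle child land in the SVG namespace while everything else stays in the XHTML one. With the HTML syntax, the parser assigns the SVG namespace for you instead of reading it from an xmlns attribute.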

The main differences are summarised on the official WHATWG wiki page Differences Between HTML and XHTML. It’s also possible to write polyglot documents that browsers can process as either HTML or XHTML, depending on the MIME type used to serve them.
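One common way to use a polyglot document is content negotiation: serve it as application/xhtml+xml to browsers that say they accept it, and as text/html to everyone else (including older versions of Internet Explorer, which don’t list the XHTML type). The sketch below is a simplified, hypothetical helper, not part of any framework: it only looks at explicitly listed types and their q-values and ignores wildcards such as */*.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media type: q-value} (explicit types only)."""
    prefs = {}
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_mime_type(accept_header: str) -> str:
    """Serve a polyglot page as XHTML only to clients that explicitly accept it."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Prefer XHTML on a tie; fall back to text/html otherwise.
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_mime_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
    print(choose_mime_type("image/gif, image/jpeg, */*"))
```

If you do this, also send a Vary: Accept response header so that caches don’t hand the XHTML version to a browser that asked for HTML.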

Magne emailed the Doctor to ask “Is it OK to use HTML5 tags in a page with the XHTML 1.1 doctype? Which one should I use, as in, which one is the recommendation now?”

If you want to use the new features, you need to use the HTML 5 DOCTYPE or XHTML 5. Given that Internet Explorer cannot render pages served with an XML MIME type, the Doctor advises HTML 5 for pragmatic reasons.

83 Responses on the article “HTML 5 + XML = XHTML 5”

  • Kroc Camen says:

    application/xhtml+xml? Yes, you can serve as pure XML, but xhtml+xml may be marginally more relevant.

    Secondly, you’ll need XHTML5 in order to support Firefox 2 / Camino, so I would recommend including a link to the document on this site that covers that.

  • kl says:

    You can use lowercase and quoted attributes in HTML too.

  • Bruce Lawson says:

    Kroc – well, sending html5 (in its html5 form) as XHTML just for a Gecko parsing bug is not the same thing as “real” XHTML, so I’m loath to confuse those two issues.

    Re application/xml vs application/xhtml+xml – you’re right. It was a brain fart on a hot day and I’ve corrected the article; thanks.

  • […] W3C announced a few days ago that the XHTML2 working group will likely be closed down by the end of the year, and focus will be shifted to the development and polishing of the HTML5 standard (which also offers an XML flavour). […]

  • Ben Ward says:

    Nice summary for those who love the XML. One additional detail I’d point out is the reverse: the HTML serialisation of HTML5 also makes XML-like self-closing syntax valid for elements that are expected to be self-closing, e.g. you may choose to write <link />. As such, XHTML syntax as we know it is now HTML, too.

    For those who want to continue writing XHTML in the HTML5 world, this is a bridge: they can write XHTML and serve it as HTML just as they do now, but it will be valid.

    At some point, if IE ever plays ball, those people can swap over to the correct mime type and start using namespaces and so on.

  • […] interested in HTML 5, I draw your attention to an article that I presciently wrote yesterday on XHTML 5, for those who worry unnecessarily that XML has been […]

  • Bruce Lawson says:

    @Ben – excellent points; thanks for making them.

  • […] The good thing about XHTML was that it enforced well-formed markup, with strict provisions for lowercase code, quoted attributes, and trailing slashes for empty elements. Thankfully HTML5 supports this coding convention too, and can be served as a serialized XML document dubbed XHTML5. […]

  • Dari says:

    Wait a second. How do browsers actually know whether a document is XHTML 1.0 Strict or XHTML 5? According to the docs at http://www.whatwg.org/specs/web-apps/current-work, “XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification” (plus the note below). Because both share the same namespace and the DOCTYPE can simply be omitted, all XHTML 1.x documents are probably XHTML 5 right now.

  • andy says:

    I thought that the WHATWG criticised the W3C for XHTML (XML usage in HTML).
    Apart from the XML MIME type not being supported by IE, there are more disadvantages: the XML parser will stop when a well-formedness error occurs, and the browser must wait until the code for a page is fully loaded.

    So, the only advantage of XHTML5 is cleaner, stricter code?

    Thanks

  • Kroc Camen says:

    @andy – Firefox 3.0 and above don’t have to wait for the full XML to render the page.

  • Shouldn’t the name be XHTML2?

  • Bruce Lawson says:

    Daniel, no – XHTML 2 was a competing specification that was terminated by the W3C last week: http://www.brucelawson.co.uk/2009/goodbye-xhtml-2/

    XHTML 5 is the XML version of HTML 5, so it shares the version number to show that link and to avoid confusion with the dead spec.

  • […] XHTML5 – “But let no-one tell you that HTML 5 kills XML—meet XHTML 5.” – Good, good coding standards will still have to be used. […]

  • Torrance says:

    (Damn. Let me escape all my tags. Feel free to delete the prior post.)

    Perhaps this should be in another article, but coming from writing XHTML 1.1, I’m having trouble understanding which tags no longer need to be closed, and which seem to be able to be omitted altogether.

    I understand that self-closing tags no longer need the forward slash at the end, but I’ve seen someone suggest that <li> doesn’t need its respective closing tag, and nor even does <p>. Surely this is not true? And why is <body> no longer strictly necessary? What else has become optional?

    I think for my own sanity, I’ll be sticking to XHTML syntax even in HTML 5. Makes me feel a lot cleaner knowing everything is neatly closed.

  • oli says:

    @Dari – XHTML5 and HTML5 are differentiated solely by MIME type. That means unless you specifically use the application/xhtml+xml MIME type it’s HTML5, and as mentioned in the article you’re probably not going to do that because IE would barf (what a surprise).

    @andy – the main advantage of using XHTML (i.e. based on XML) would probably be the easy inclusion of other XML-based things like SVG, XForms and MathML. Another benefit of using XHTML1 has traditionally been stricter validation, which is a big help to authors. Hopefully this will migrate to HTML5 browsers in the form of a strict mode or something. As the article says though, if you have *any* IE users then XHTML5 is probably more trouble than it’s worth.

  • andy says:

    @oli – yep, but if you have to use text/html these days, you can’t put another XML application into the code, so the only way to do that is to embed it via the object element. Besides SVG, I don’t know of any other XML apps that are usable, because of their poor support: MathML, CML, …

    I use XHTML just because I like pure well-formed code, but I can’t name any other objective reason.

    And as I mentioned, I can’t put up with browsers’ inaccessible XML parsers.

  • Dari says:

    @oli – I was asking about the difference between XHTML 1 and XHTML 5, not HTML vs. XHTML, because I don’t see any. Technically a document with an XHTML 1.0 DOCTYPE should be XHTML 1.0, but how does a user agent know that?

  • […] very kindly gave me permission to publish a translation of his article “HTML5 + XML = XHTML 5”, published on 2 July 2009 on HTML5 Doctor. So, many thanks […]

  • Is there currently a validator that will validate HTML5 as XHTML (any version) Strict? Clean code was the only advantage of XHTML I could make use of, and I still want to be able to do so.

  • rovas says:

    @Magne: You could try this one:
    http://validator.nu/
    There are a lot of options.
    Cheers!

  • […] right; HTML 5 allows you to use whichever syntax you are most comfortable with. Doctor Bruce has the diagnosis: I like the XHTML syntax. It’s how I learned. I’m used to lowercase code, quoted attributes and […]

  • […] HTML 5 + XML = XHTML 5 – A general outlook on how HTML 5 will correlate with XML. […]

  • […] HTML 5 + XML = XHTML 5 – An overview of how HTML 5 will interact with XML. […]

  • […] HTML 5 + XML = XHTML 5 – An overview of how HTML5 interacts with XML. […]

  • […] arguments and that XHTML was dead. The truth of course was slightly more complicated as HTML 5 can be reasonably presented as XHTML. Either way we now seem to have one standard to unite behind which brings us closer to […]

  • […] Bruce Lawson nicely summarizes the whole deal on using XML in HTML: writing your markup in an XML format and serving it as HTML is still valid. This also […]

  • Thanks for the article. Anyway, what exactly is the DOCTYPE for XHTML 5, please? I didn’t find a real answer on the web. Can I use this one, for example?

    Thanks :)

  • Arg I meant:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

  • Alohci says:

    The point about the character sequence <!DOCTYPE html> is that it is the shortest set of characters that will cause browsers to use standards mode when the page is served as text/html. It’s not so much a doctype as an incantation. A page is only XHTML5 if it is served with an XML MIME type (e.g. application/xhtml+xml), and in such cases browsers use standards mode anyway. So while you can include <!DOCTYPE html> if you wish, it is entirely optional.

  • bruce says:

    “It’s not so much a doctype as an incantation”

    Perfectly put, Alohci!

  • e-sushi says:

    Jumping in a bit late… people have to understand that HTML5 is made to “accept older doctypes and handle them”. In other words, the HTML5 doctype allows you to use almost anything you want, as long as it conforms to previous standards. Translated, this means that you can take your XHTML 1.1 document, reduce the doctype line to the minimal one and push it up to the server. It will be read as HTML5.

    About the “header” for the MIME type, I cannot see why you would serve the document with anything but “text/html”, because in fact, you are serving just that! Using HTML5 you can write XML or non-XML syntax, as long as the html, head, body structure is kept. Geez, I know it is hard to accept that you don’t get a restricting doctype next, but hey… that’s what it’s all about: making HTML5 the doctype “to rule them all” by being open to old and even some new stuff. See, HTML5 is not about canvas and co.; HTML5 is about wrapping up ALL previous doctypes and adding goodies like canvas to them.

    My 2 cents… get rich using them! ;)

  • […] Everything you know about XHTML is wrong – Goodbye XHTML 2 – HTML 5 + XML = XHTML 5 – 2022, or when will HTML 5 be ready? XHTML 1.0 is not (much) more than HTML with XML-compatible […]

  • […] I’ll just take the liberty of throwing a few links at you: HTML 5 + XML = XHTML 5 | HTML5 Doctor, HTML vs. XHTML – WHATWG Wiki, HTML 5 Reference. The last one is perhaps even the most interesting. […]

  • rzlatic says:

    I like the way XHTML forced people to write syntax correctly. HTML5 (without the X) will again leave room for dirty code, and I don’t like that way of developing a new standard.

  • Bruce Lawson says:

    rzlatic – why do you say it’s “dirty” code? The specification allows it. Therefore, by definition, it’s not dirty.

  • e-sushi says:

    Anyone saying “dirty code” does not seem to know the specifications. Oh well, I guess that’s what happens if you don’t know what you’re talking about.

    For your information, check this: http://www.w3.org/TR/html401/index/elements.html

    So leave out the closing P and LI tags, keep your class names and IDs short, ignore stuff like APPLET tags, keep your code dense and before you know it: *blam*, you’ve saved a truckload of bandwidth.

    I know this will break some online tools, but that’ll just happen because they were coded by noobs, not nerds. Geez, HTML was never meant to be XML. The different names already prove that. They’re just two markup languages with completely different origins and different goals. Dropping XML from HTML “can” be done… but if you’d rather stick to ‘XHTML’, that’s up to you. As a matter of fact, I embrace the idea – it’ll make my site rank better than yours.

    After all, the way I described how you should code your sites also happens to be the way Google recommends doing it and the way Google rates sites (see Google’s “speed optimizing tool”). Among other reasons, this should ring the loudest bell.

    My 2 cents: onward to the past… gimme back them websites that work everywhere and do everything. Combining oldschool HTML with a newschool DOCTYPE.

    Now if everyone upgrades to IE9 soon, we’ll all be happy together! ;)

  • Ken says:

    If someone wants to (re)learn HTML in the near future, should the person learn HTML5 or XHTML5 ?

    I’m kinda getting confused with these new HTML technologies. In the past, around 2002-ish, I recall learning HTML 4 / CSS and applying strict XHTML 1 principles to make the code uniform and validate properly (it also helped me learn coding by following rules). I didn’t learn about XML since that was not applicable for my needs.

    I stepped away from web designing for a while, and I come back to see a new organization has been formed along with HTML5.

    I’m glad I found this blog thread because I was (actually still am) confused about HTML5 validation.

    So if my understanding is correct, HTML5 is loose liberal code and XHTML5 is strict code that must follow guidelines like XHTML in the past, except XHTML5 will not work with Internet Explorer? But XHTML functioned fine with IE in the past as well as other browsers.

    I looked up sample codes and they are fundamentally the same (like in the past), so I’m not comprehending why this is an issue.

    Can someone please clarify this for me? Much appreciated!

  • Alohci says:

    @Ken. In a way, you’re right, nothing has fundamentally changed. Just as before, HTML and XHTML share a common vocabulary and content model, and to that degree you need not choose – learn one and you learn both. The differences simply lie in the syntax. You can choose the orthogonality of XHTML syntax or the error-resilience and wider browser compatibility of the HTML syntax.

    As for XHTML working in IE, no it’s never done that¹ and nothing’s really changed. What has changed in HTML5 is the way that the relationship between HTML and XHTML is described, such that it more clearly describes the reality of using these syntaxes with browsers. What HTML5 describes has always been true, but was widely misunderstood.

    Bruce’s article above explains the situation pretty clearly, and in particular you need to grasp how browsers react to MIME types. A widely cited article on this matter that all web authors should read and understand is Sending XHTML as text/html Considered Harmful, written by HTML5 spec editor Ian Hickson, although it’s not specifically about HTML5. Note, however, that the article is a mixture of fact and opinion, and while the facts are immaculate, the opinion is just that.

    Describing HTML5 syntax as “loose, liberal” versus XHTML5 being “strict” is not quite right. The rules for the HTML syntax are every bit as precise and rigorous as those for XHTML, but they’re different and the HTML syntax provides more valid alternatives. Perhaps more importantly, what HTML permits that XHTML doesn’t, is error recovery when those syntax rules are violated in (X)HTML documents.

    ¹ The IE9 Platform Preview does properly support XHTML.

  • Mathias says:

    Great article! Sadly, it doesn’t provide a working example of an XHTML5 document and fails to explain why you can’t just use the lowercase <!doctype html> like you can in regular HTML5. I published a post explaining this the other day: How to use the XHTML serialization of HTML5, aka ‘XHTML5’, and why you shouldn’t. Hope it gives you some more insight!

  • […] IS an XHTML version of HTML 5 that requires you to still close all tags and I have read articles that encourage you, […]

  • […] (Also available in Spanish, Traducción de “HTML 5 + XML = XHTML 5”, in Portuguese and, of course, in its original English version.) […]

  • OnLine says:

    Hi,
    Is there a specific DOCTYPE for XHTML with HTML5?
    Or can XHTML syntax be used and ?

  • […] HTML 5 + XML = XHTML 5 | HTML5 Doctor Me too! "I like the xhtml syntax. It’s how I learned. I’m used to lowercase code, quoted attributes and trailing slashes on elements like br and img. They make me feel nice and comfy, like a cup of Ovaltine and The Evil Dead on the telly." (tags: xhtml html5 html web standards) […]

  • ofelquis says:

    Hi, I liked your post.
    Congratulations… and
    sorry, my English is not good :(

  • XHTML5 is desirable because, being XML, it is readily mashable with XSL, so if one defines a custom microformat using RDFa, it’s customizable for different clients, from feed and screen-reader parsers to crawling bots!

  • Hi Bruce Lawson.
    Could we use this article as a reference in the certification course on our platform SimplexPortal?
    Our course is free for designers and developers.
    This article is under a Creative Commons licence, so I suppose there is no problem referencing this content.

    Thanks

  • Bruce Lawson says:

    Ángel Cervera Claudio – yes, but please link back to this URL.

  • Gregory says:

    What is the !DOCTYPE declaration for XHTML5? Is it the same as HTML5?

  • Alohci says:

    @Gregory – Yes, exactly the same. Or you can omit the DOCTYPE altogether as pages served as application/xhtml+xml are always handled in standards mode by browsers.

  • Cesc says:

    To close or not to close tags… that’s the dilemma.

    I think the answer should be easy:

    Closing tags should be provided by editors to clarify code and to enable some validation (there are still some rules, and we live in the era of semantic editors). Also, the automatic writers used by code generators will usually have cleaner specifications through the use of closing tags. Remember that closing tags can be omitted, but the elements don’t “remain open”; they are implicitly closed.
    Tools should be provided to compact such “XHTML” and drop insignificant closing tags, which are useless overhead.

    But in practice the answer is not easy, because there aren’t any tools like those and it is actually not the current trend, so I must disagree with a specification that introduces more mess into the mess.

    Is such overhead significant nowadays? Were they thinking of mobile applications (five years ago)?

  • […] much gratitude to whoever shows me some valid XHTML 5 code :p. Proof that XHTML5 exists: http://html5doctor.com/html-5-xml-xhtml-5/ (but the W3C in its little […]

  • Teri Murphy says:

    So, if I have 10,000 pages with XHTML 1.0 doctypes, do I REALLY have to change them all to the HTML5 doctype before I can use new HTML 5 elements like video and form tags?
