HTML5 – Check it Before you Wreck it with Mike[tm] Smith


The W3C’s Mike[tm] Smith (AKA @sideshowbarker) is the man with his head in the source code of the W3C markup validation and checking tools; he makes the magic happen. Questions were asked for the HTML5 Doctor readers’ delight and edification.

Russian translation: “If you haven’t checked your HTML5 code, don’t go into the water”, with Mike™ Smith

First off, tell us a bit about what you do and what you work on.


Mike[tm] Smith – Deputy Director @W3C – permissive work mode edition


I don’t work. I’m an old-world boulevardier.

I drink tea with my pinky extended and I only expend effort on anything if it somehow amuses me to do so. For the last few years it’s amused me to spend time working on software for helping people check whether or not their documents meet certain requirements in the HTML spec.


What’s the difference between DTD and schema based checking?


DTDs are chiseled into stone tablets. And so for processing they require stone-tablet-aware toolchains. Sadly, however, the Web was not built on stone-tablet processing, so we’ve had to look around for other solutions. In the case of document-conformance checking, we’ve turned to using things like RelaxNG schemas, which, while lacking the quaintness of DTDs, are a far more powerful means for expressing certain kinds of document-conformance requirements. So it’s a tradeoff.


What’s the difference between conformance checking and validation?


Validation is an oldthink word. Use it for when you want to make people think you’re a sort of fossil or relic of some earlier time. Kind of like the word groovy or XHTML.

Lots of people don’t know this, but the etymology of that word validation is from the days when our ancestors were mostly pig thieves and they were given actual badges for spelling their own names correctly, and usually a pat on the back too. Good job!

Document-conformance checking is the current party-approved goodthink way of talking about looking for problems in HTML documents. And do note that we call it document conformance and not authoring conformance, and we talk about conformance requirements for documents, not conformance requirements for authors. That’s because you as an author are a human being; a technical specification can’t place requirements on you, it can only place requirements on documents you create. And related tools don’t evaluate you as a human being for conformance to particular technologies; instead they just evaluate the documents you create.

Anyway, document-conformance checking has the nasty, ugly part, conformance, which is a hurtful word really, but you gotta look past that part and only pay attention to the word checking, which is mostly a happy, helpful type of word.

So I call the tool at validator.w3.org/nu the Nu Markup Checker instead of the blah-blah-Validator because I want to spread the happiness of the word checker in the sense of doing something actually useful for people instead of just giving them a pat on the back. It’s an automated thing which checks stuff for you that’d otherwise be really tedious for you to check manually. So it helps you. Maybe it should be called the Nu Help-You Checker.

As far as what it checks, it looks for unintentional mistakes you might have made: misspelled element names or attribute values where some stray character snuck in. That kind of stuff. And it alerts you about those sorts of things so you can fix them.

It also looks at other kinds of requirements defined in the HTML spec that are designed to help you avoid making broken HTML documents and web applications that won’t work the way they should or that might otherwise result in a degraded user experience. Some of those requirements are gray-area judgment calls, but it’s helpful to have a common, baseline-ish set of those kinds of requirements actually defined in a spec.

Other words for what this tool does that aren’t yet party-approved goodthink are words like linter and static-analysis tool. But the difference with this thing I work on is, the linting rules are actually defined in a spec, instead of something, say, Doug Crockford (to pick a name at random) woke up one morning and just pulled out of his hat.

What’s the difference between errors and warnings?


An error is for something that’s clearly a mistake, like a misspelled element name or an attribute value that has some crazy garbage characters or whatever that showed up somehow and shouldn’t be there.

But an error is also for some cases of stuff that the HTML spec, for other reasons, just says must be an error. The spec explains the reasons for some of those other things being defined as errors; basically, it’s just that they can create certain kinds of problems that are not always easy to anticipate.

There’s a long list of those kinds of problems that are defined as errors but some examples include markup cases that are bad for accessibility, usability, interoperability, security, or maintainability—or that can result in poor performance, or that might cause your scripts to fail in ways that are hard to troubleshoot.

Along with those some cases are defined as errors because they can cause you to run into quirks in HTML parsing and error-handling behavior—so that, say, you’d end up with some unintuitive, unexpected result in the DOM.

Finally there are some other errors defined for markup cases that just don’t make any sense and would most likely only be used by mistake, or cases that clash with default styling behavior.

Warnings, on the other hand, are for things that the spec doesn’t define as outright errors but that still might be problems. Sometimes warnings get added to the checker experimentally, as a way to test out whether they’re useful to you or not. (That’s part of the reason the checker continues to be labelled as experimental.)
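To make the error/warning split concrete, here is a minimal Python sketch that sorts messages from the checker’s JSON output into buckets. It assumes the message shape the Nu checker uses for JSON results: each message has a `type` of `error`, `info` or `non-document-error`, and a warning is an `info` message whose `subType` is `warning`. The function name and the sample messages are made up for illustration.

```python
def classify(messages):
    """Sort checker messages into errors, warnings, plain info, and other.

    Assumes the Nu checker's JSON message shape: "type" is "error", "info",
    or "non-document-error", and warnings are "info" messages whose
    "subType" is "warning".
    """
    buckets = {"error": [], "warning": [], "info": [], "other": []}
    for msg in messages:
        kind = msg.get("type")
        if kind == "info" and msg.get("subType") == "warning":
            kind = "warning"
        # Anything unexpected (e.g. "non-document-error" I/O problems) goes here.
        buckets[kind if kind in buckets else "other"].append(msg)
    return buckets


# Hypothetical sample messages, shaped like the checker's JSON output.
sample = [
    {"type": "error",
     "message": "Element “dvi” not allowed as child of element “body” in this context."},
    {"type": "info", "subType": "warning",
     "message": "Consider adding a “lang” attribute to the “html” start tag."},
    {"type": "non-document-error", "subType": "io",
     "message": "HTTP resource not retrievable."},
]

for kind, msgs in classify(sample).items():
    print(kind, len(msgs))
```

Keeping warnings in their own bucket mirrors the point above: errors are things the spec says are wrong, while warnings are hints you can weigh for yourself.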

Is there a use in using HTML4/XHTML doctypes?


There’s absolutely no reason whatsoever for using an HTML4 doctype. Just put the <!DOCTYPE HTML> doctype on your HTML documents and make sure they’re served as text/html and be done with it. Move on with your life. But if for some reason you really want to serve your documents as application/xhtml+xml, you don’t have to put an XHTML doctype on them; you can still just use <!DOCTYPE HTML> like the rest of us. (But you probably don’t want to be using application/xhtml+xml and XHTML anyway. Again, lose the haircut; there’s a whole world out there waiting for you.)

What are the pitfalls for users of HTML checking/validation tools?


I guess the same pitfalls as you’d run into asking some really helpful and really thoughtful person for help with anything: They’ll actually make an effort to help you instead of just shining you on or giving you a this-pig-thief-can-spell-his/her-own-name badge. The help they give you may not always be what you want to hear, or it may be some advice that you already know yourself you can safely ignore. Such is life.

What are the upsides?


The upsides are that you catch mistakes you might have otherwise missed.

There are differences between W3C HTML and WHATWG HTML conformance rules, how so?


Some things defined as errors are judgment calls. Specs are written by human people, not machines. Different people can make different judgment calls—“reasonable people can disagree” or whatever other less trite way there is for expressing that sentiment. If you walk around this world expecting complete consistency from mankind everywhere you’re going to stumble onto a few serious disappointments now and then.

What if I find an error in the W3C HTML validator/checker?


Report it at w3.org/Bugs/Public/enter_bug.cgi?product=Nu%20Markup%20Checker or at bugzilla.validator.nu or github.com/validator/validator/issues.

Can I run a local copy of the W3C HTML conformance checker?

Yeah. The best way to do that is to download a release from github.com/validator/validator/releases and, for using that, to follow the instructions at validator.github.io/validator and at validator.github.io/validator/#web-based-checking.

And if you use grunt, check out github.com/jzaefferer/grunt-html which is a grunt plugin for HTML checking that uses code from github.com/validator/validator as its backend.
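If you would rather script a local copy than use grunt, a small wrapper can help. The sketch below assumes you have downloaded `vnu.jar` from a release and have Java on your PATH; it uses the jar’s `--format json`, `--errors-only` and `--filterpattern` options. The helper names `vnu_command` and `run_vnu` are hypothetical, so check the options against the documentation for the release you actually download.

```python
import json
import subprocess


def vnu_command(files, jar="vnu.jar", errors_only=False, filter_pattern=None):
    """Build a command line for a locally downloaded vnu.jar release."""
    cmd = ["java", "-jar", str(jar), "--format", "json"]
    if errors_only:
        cmd.append("--errors-only")  # report errors, suppress warnings/info
    if filter_pattern:
        cmd += ["--filterpattern", filter_pattern]  # drop matching messages
    cmd += [str(f) for f in files]
    return cmd


def run_vnu(files, **options):
    """Run the checker and return its list of messages.

    vnu.jar writes its report to stderr unless --stdout is given, and exits
    non-zero when it finds errors, so we don't pass check=True here.
    """
    result = subprocess.run(vnu_command(files, **options),
                            capture_output=True, text=True)
    return json.loads(result.stderr)["messages"]
```

For example, `run_vnu(["index.html"], errors_only=True)` would return just the error messages for that file, assuming `vnu.jar` sits in the working directory.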

Any tips/advice for sane use of HTML conformance checking tools?

Is this some kind of trick question? I guess the only advice I’d give is that you should remember that tools are machines, and you are not a machine. (Assuming this question wasn’t asked by a machine.) So when evaluating error and warning messages that you get from any HTML checker, use your own human judgment. And if your judgment is that a particular checker message isn’t really helping you, then just ignore it. This isn’t a popularity contest, you won’t be hurting anybody’s feelings.

Or better yet if you care to take the time, use the “Message filtering” feature at validator.w3.org/nu which lets you persistently ignore any checker messages you find unhelpful or annoying or just don’t want to see any more.

Currently the W3C HTML checking tools don’t check/throw errors for SVG1.1 and some web component attributes, any plans to add support?


Yeah. That stuff is on my TODO list. I’ll get to it eventually.

What’s the deal with unknown attribute errors? Many JS libraries use such attributes; what should developers do?

The problem is that the checker is a machine and it’s not smart enough to tell the difference between some attribute with an unknown name that you’re using on purpose and some attribute whose name you misspelled by mistake. If we just told the checker to let through all unknown attribute names without checking, then we wouldn’t be able to help you catch the case where you misspelled something by mistake.

The workaround is that if you’re using some unknown attribute name on purpose, then exploit the “Message filtering” option at validator.w3.org/nu to tell the checker you don’t want to see messages about that particular attribute any more. And they’ll go away.
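If you post-process the checker’s JSON output in your own scripts instead of using the web UI, the same idea can be sketched like this: drop only the “not allowed” messages for attribute names you chose on purpose, so a typo like `clas` still gets reported. The message wording matched here (`Attribute “NAME” not allowed on element`) is an assumption based on how the checker phrases these messages, and `ng-click` and `x-data` are just example names; adjust the pattern to the messages you actually see.

```python
import re


def drop_known_attribute_messages(messages, intentional_attributes):
    """Filter out 'not allowed' messages for attributes used on purpose.

    Assumes the checker words these messages as:
        Attribute “NAME” not allowed on element “ELEMENT” ...
    Messages about any other attribute name are kept, so typos still surface.
    """
    if not intentional_attributes:
        return list(messages)
    names = "|".join(re.escape(name) for name in intentional_attributes)
    pattern = re.compile(rf"^Attribute “(?:{names})” not allowed on element")
    return [m for m in messages if not pattern.search(m.get("message", ""))]


# Hypothetical messages for illustration.
messages = [
    {"type": "error",
     "message": "Attribute “ng-click” not allowed on element “button” at this point."},
    {"type": "error",
     "message": "Attribute “clas” not allowed on element “div” at this point."},
]
for m in drop_known_attribute_messages(messages, ["ng-click", "x-data"]):
    print(m["message"])
```

Listing exact names, rather than silencing every unknown-attribute message, keeps the checker useful for the misspelling case described above.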

Does the validator check for use of ARIA? If so, what is it checking?

Yes it checks for errors in the use of ARIA markup in HTML documents, including now some limited checking for errors in use of ARIA with SVG elements in HTML documents and also in standalone SVG documents.

For HTML elements, it checks against requirements in the HTML spec itself that are now also specified in a separate standalone document, [ARIA in HTML] – specs.webplatform.org/html-aria/webspecs/master, with the plan that, for ARIA, the HTML spec can soon be updated to just reference the requirements in that document.

For SVG elements, my plan’s to soonishly update the checker to follow a similar standalone document at [Web developer rules for use of ARIA attributes on SVG1.1 elements] – specs.webplatform.org/SVG1.1-ARIA/webspecs/master

ARIA checking in < HTML5 what’s the deal? Will/should/can it be supported?


Nobody should be using anything but “HTML5”, and we shouldn’t be trying to help them do otherwise. HTML5 is just HTML. We outgrew the whole version thing a long time ago now. <!DOCTYPE HTML> will be 10 years old soon. Common sense won. Here in the 21st century we can’t really help anybody who’s putting an HTML4 or whatever ancient doctype on a new document. That’s a lost cause. Certainly we’d not be helping by giving them some way to put ARIA markup into such documents and then telling them that’s OK. That’s called enabling behavior, in clinical terms.

When using the W3C HTML validator to check my HTML5, I see the following: “The validator checked your document with an experimental feature: HTML5 Conformance Checker. …” Does this mean there is a more stable validation tool I should be using?

The idea of stable doesn’t really apply here. But yeah there is another tool you should be using. You should use validator.w3.org/nu directly. It has more features and is better in every possible way.

That tool is an experimental tool, but in a good sense. And the plan is for it to always remain that way. The validator.w3.org/nu/about.html page tries to help set the right expectations about what the goals are and what experimental means:

The Nu Markup Checker is an experimental tool and its behavior remains subject to change. In particular, because new types of error checks continue to be actively added to the checker, there is no guarantee provided that if the checker reports zero errors for a particular document at one point in time, it will report zero errors for that same document at some later point in time.

The Nu Markup Checker should not be used as a means to attempt to unilaterally enforce pass/fail conformance of documents to any particular specifications; it is intended solely as a checker, not as a pass/fail certification mechanism.

Web components checking?


If you mean checking custom elements, my answer is that custom elements aren’t yet widely supported in multiple browser engines, so I don’t think it’s useful for me or anybody else to put too much time and energy yet into figuring out how to deal with checker behavior for documents that contain custom elements.

If/when custom elements do ever become widely supported across more browser engines, then we should figure out how to deal with checker behavior for them. That’s actually going to be complicated and messy to do—but that’s the case for a lot of stuff in the Web platform and I’m sure we’ll figure out something together that we can all live with, just as we all have together for lots of other complicated Web-platform stuff.

What is the difference between the w3c validator and the nu markup checker?

The legacy W3C validator is at validator.w3.org and its core is built on old stuff like Perl and DTDs and SGML and old specs from the 20th century like HTML4 and nobody is actively maintaining its code at this point. The only good news about it is that for checking any document with a modern <!DOCTYPE html> doctype, it actually uses the backend from the Nu Markup Checker to check the document, and then just passes back all the messages from that.

The Nu Markup Checker is at validator.w3.org/nu and it’s built on slightly less old stuff like Java and RelaxNG and on specs from the current century like “HTML5” and has the big advantage of actually being actively maintained. And it has more features, like the “Message filtering” feature that lets you filter out messages you don’t want to see.

Checking the source code versus the HTML DOM output, one better? issues?

I guess there are good use cases for both. A limitation with checking the DOM is that at validator.w3.org/nu itself we can’t really provide a way to have it go grab the DOM of some arbitrary HTML document on the Web and then check that. There needs to be a browser engine somewhere in between to actually parse the document into a DOM representation in memory, execute your JavaScript on that, and then serialize the resulting DOM back out to a text representation you can feed to a checker. But if you have an HTML document you want to check and you actually open it in your browser, you can then use something like the bookmarklet at codepen.io/stevef/full/LasCJ to send the serialized DOM from that document to validator.w3.org/nu for checking.
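Once you have the serialized DOM as a string, sending it to the checker is a plain HTTP POST. The sketch below assumes the checker’s web-service interface at validator.w3.org/nu: the document goes in the request body with a `text/html; charset=utf-8` content type, and `out=json` asks for JSON results. Capturing the DOM itself still needs a browser; `document.documentElement.outerHTML` does not include the doctype, so the sketch prepends one when it’s missing. The function names and the `User-Agent` string are made-up placeholders.

```python
import json
import urllib.request

CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_check_request(serialized_dom, url=CHECKER_URL):
    """Build a POST request that submits an HTML string for checking."""
    html = serialized_dom
    # outerHTML of the root element omits the doctype, so add one if missing.
    if not html.lstrip().lower().startswith("<!doctype"):
        html = "<!DOCTYPE html>\n" + html
    return urllib.request.Request(
        url,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "example-dom-checker/0.1",  # hypothetical name
        },
        method="POST",
    )


def check_serialized_dom(serialized_dom, timeout=30):
    """Send the document to the checker and return its list of messages."""
    request = build_check_request(serialized_dom)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["messages"]
```

Calling `check_serialized_dom(dom_string)` with the markup copied from your browser makes a network request to the public service, so be considerate about how often you run it.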

Bonus Questions

Should pre-HTML5 doctypes be flagged with a warning, in the W3C Validator, now HTML5 is a REC?


I dunno, maybe. On the one hand, there are gazillions of existing documents out there with older doctypes that are working just fine the way they are now, so there’s no reason to screw with them. On the other hand, if somebody’s actually taking time to run one of those documents through an HTML checker, then they may be doing that for some good reason, and maybe we would be helping by alerting them to the obsolete doctype in there so they can go in and update it.


Nothing changed in browsers. Browsers have always supported that and it doesn’t cause any problems and we’d not be helping anybody by making it an error. So we made it a non-error.

On WCAG 2.0 Parsing Criteria

Our client’s accessibility consultant is telling them that they must have valid HTML in order to be WCAG 2.0 compliant. Is that true? – Shoptalk Show


I have no idea. I’m not a WCAG expert and I’ve never even read the WCAG 2.0 spec. And the HTML checker is not a WCAG checker. Or at least it doesn’t claim to be.

Steve Faulkner
WCAG 2.0 has a success criterion that requires markup documents have no parsing errors. The nu markup checker flags parsing errors along with other machine checkable HTML conformance criteria. We have created a WCAG 2.0 Parsing error bookmarklet that filters the results from the nu markup checker to only display parsing errors/warnings.

Note: this bookmarklet is experimental and not the law, and even when filtered, some of the errors/warnings displayed may not have any practical negative effect on the accessibility of the document. It is provided only as an aid to filter out some of the issues irrelevant to WCAG. Mike and I have talked about providing the filter as a built-in feature of the Nu Markup Checker, so we hope to make that happen.
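As a rough illustration of what such a filter does, here is a Python sketch that keeps only messages related to what WCAG 2.0 SC 4.1.1 (Parsing) covers: complete start and end tags, correct nesting, no duplicate attributes, and unique IDs. The pattern list is an assumption, an approximation based on how the Nu checker words common parsing messages, and is not the rule set the bookmarklet actually uses.

```python
import re

# Approximate, hypothetical pattern list; not the bookmarklet's actual rules.
PARSING_PATTERNS = [
    r"^Duplicate ID",                                    # IDs must be unique
    r"^Duplicate attribute",                             # no duplicate attributes
    r"^Stray end tag",                                   # end tag without a start tag
    r"^Unclosed element",                                # start tag never closed
    r"^End tag .* seen, but there were open elements",   # bad nesting
]
_PARSING_RE = re.compile("|".join(f"(?:{p})" for p in PARSING_PATTERNS))


def parsing_messages(messages):
    """Keep only the checker messages that look like parsing problems."""
    return [m for m in messages if _PARSING_RE.search(m.get("message", ""))]


# Hypothetical messages for illustration.
sample = [
    {"type": "error", "message": "Duplicate ID “main”."},
    {"type": "error", "message": "Stray end tag “div”."},
    {"type": "error",
     "message": "The “align” attribute on the “div” element is obsolete. Use CSS instead."},
]
for m in parsing_messages(sample):
    print(m["message"])
```

An obsolete-attribute error like the third sample is a real conformance problem, but it isn’t a parsing problem in the SC 4.1.1 sense, so the filter drops it.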

Will there be a valid HTML5 icon?

No, there won’t be a Valid HTML icon any time soon and likely not ever.

The reason is basically that “This is valid” icons/badges promote the idea that there’s significant value in making public claims of pass/fail document-conformance requirements in standards.

But the HTML5 checker is by design not intended to encourage anybody to use it as a means to make public assertions of simple pass/fail conformance of any documents to any particular specifications; it’s intended solely as a checker — for people to use to catch unintended mistakes in documents and fix them — not as a pass/fail certification mechanism.

There won’t be any proper Valid HTML5 icon forthcoming, so if you’d like to use one in your content, you’ll probably need to create one on your own.

Thanks Mike!

Pro tip – always check your HTML with Rock’n’Roll playing… LOUD!

More questions, people?

7 Responses on the article “HTML5 – Check it Before you Wreck it with Mike[tm] Smith”

  • Daniel Davis says:

    Good interview — entertaining and informative.

    If you’re trying to de-emphasise the validation/rubber-stamping role of the tool, I wonder whether you should gradually migrate the URL to checker.w3.org as well. It’s a losing battle trying to get people to refer to validator.w3.org as not a validator.

    On the other hand, there will always be those who want some kind of seal of approval to show their boss/clients/mum so even though you may not like it, maybe the validation aspect (and dare I say accompanying “Valid HTML” icon) still has its place.

  • Oscar says:

    Great article Mike. It’s really nice to have a face to put against great work.

    Just a couple of things from me.

    If you guys want to change the term from ‘validation/valid’ etc. to ‘conformance’, you should do this across the board to remove the confusion of switching between the two terms.

    A better statement to make around the tool & accessibility would be; to meet the parsing requirement (SC4.1.1) – ALL markup that is defined in the specification appropriate to the document (set via the DOCType) needs to conform (i.e. they’re using XHTML DOCType, they need to close off all their tags, etc. And for peeps using HTML5/HTML4 DOCType, this is not an issue.) I’ve been flagging people for using HTML5 tags, etc under a XHTML 1.0 Strict DOCType. I know browsers are ‘kind’, but we cannot let this crazy practice to continue right?
    Finally, for elements not defined in the specification (i.e. web components), these should be noted as warnings maybe? With the warning being: ‘these are out of scope for coverage by the tool’. Then cases like the infamous metadata tags (particularly the IE mode tag) will no longer scare new people into thinking their code is ‘invalid’.

    Cheers,
    O.

  • A better statement to make around the tool & accessibility would be; to meet the parsing requirement (SC4.1.1) – ALL markup that is defined in the specification appropriate to the document (set via the DOCType) needs to conform (i.e. they’re using XHTML DOCType, they need to close off all their tags, etc. And for peeps using HTML5/HTML4 DOCType, this is not an issue.) I’ve been flagging people for using HTML5 tags, etc under a XHTML 1.0 Strict DOCType. I know browsers are ‘kind’, but we cannot let this crazy practice to continue right?

    The parsing requirement is defined in the WCAG 2.0 success criteria. I would suggest that flagging unclosed tags as per the XHTML doctype requirements is just make work for developers, that has no effect upon displayed content or accessibility. Browsers don’t care about the XHTML doctype if the content is being served as text/html.

  • If you guys want to change the term from ‘validation/valid’ etc. to ‘conformance’. You should do this across the board to remove confusion from switching between the two terms.

    The Nu Markup Checker is an HTML5 checker. It’s HTML5 that has changed the terms used:
    1.10 Conformance requirements for authors and 2.2.1 Conformance classes.

    Finally, for elements not defined in the specification (i.e. web components), these should be noted as warnings maybe? With the warning being: ‘these are out of scope for coverage by the tool’. Then cases like the infamous metadata tags (particularly the IE mode tag) will no longer scare new people into thinking their code is ‘invalid’.

    Given the current limitations of the checking tool software it is not a simple thing to do. Mike and I have discussed this and it needs further investigation. Mike has indicated that he is working on other more pressing and more do-able updates to the code, at this time.

  • Nice interview. Learned a few things.

  • Oscar says:

    The parsing requirement is defined in the WCAG 2.0 success criteria. I would suggest that flagging unclosed tags as per the XHTML doctype requirements is just make work for developers, that has no effect upon displayed content or accessibility. Browsers don’t care about the XHTML doctype if the content is being served as text/html.

    Yes. This was a bad example to use. I pull devs up on this for code quality/correctness. It doesn’t affect accessibility. If an application/website is architecturally designed correctly, fixing it should just mean adding a forward slash to a few lines in the masterpage/template. This is a much better solution than using the script you guys at TPG created to hide the ‘non-important warnings/errors’ from the W3C conformance checker.

  • Oscar says:

    Thanks for the other feedback. Have incorporated those into the advice I give to devs, etc. :)
