Absent Elements and Validation


We received the following question from Guy Carberry. He was wondering what effect switching the doctype on your HTML or XHTML pages to the HTML5 doctype will have on elements that are deprecated in the current draft.

The Question

This question raises several issues, so let's start by looking at Guy's question in full.

Jeremy Keith says that we can change the Doctype declaration on XHTML 1.0 syntax pages to <!DOCTYPE html> and it will immediately become HTML 5.

I'm a bit confused about what that means for the Absent and changed elements and attributes info detailed here: http://www.w3.org/TR/html5-diff/#absent-elements

"Some attributes from HTML 4 are no longer allowed in HTML5. If they need to have any impact on user agents for compatibility reasons it is defined how they should work in those scenarios."

So if this is the case, how can you switch the doctype declarations and all be fine?

Thanks for your help,

Guy.

The Doctor's Response

To answer it, we need to break the question down into the following parts:

  1. Which elements or attributes are deprecated?
  2. Do I use any of those elements or attributes on my site?
  3. How much do I need to worry about validation?
  4. What effect will this have for backwards compatibility?
  5. Can my content be accessed in all my target browsers and user agents?

Let's consider each of these in more detail below.

1. Which elements or attributes are deprecated?

The HTML 4.01 elements that are absent from the HTML5 specification are: basefont, big, center, font, strike, tt, frame, frameset, noframes, acronym, applet, isindex and dir.

You can find a list of these elements and attributes, along with detailed reasons why they've been removed, in the W3C document HTML5 differences from HTML4.

2. Do I use any of those elements or attributes on my site?

If your site uses any of the elements or attributes named above, you may want to replace them with the more semantic elements introduced in HTML5. Alternatively, in the case of presentational elements such as <font> or <center>, you may decide to remove them and use CSS for presentation instead.
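On a large site you would probably want to answer this question automatically rather than by reading every page. The following is a minimal sketch, not a tool the article itself uses: it relies on Python's standard-library html.parser to count how often each element from the list above appears in a page's markup. The names ABSENT_ELEMENTS and find_absent_elements are hypothetical, and the sketch checks elements only, not absent attributes.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements listed above as absent from HTML5 (hypothetical constant name).
ABSENT_ELEMENTS = {
    "basefont", "big", "center", "font", "strike", "tt",
    "frame", "frameset", "noframes", "acronym", "applet",
    "isindex", "dir",
}


class AbsentElementFinder(HTMLParser):
    """Counts start tags whose names appear in ABSENT_ELEMENTS."""

    def __init__(self):
        super().__init__()
        self.found = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <FONT> and <font> both match.
        # XHTML-style <basefont /> tags are routed here too, because the
        # default handle_startendtag calls handle_starttag.
        if tag in ABSENT_ELEMENTS:
            self.found[tag] += 1


def find_absent_elements(markup):
    """Return a Counter of absent elements used in the given markup."""
    finder = AbsentElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = '<p><acronym title="World Wide Web">WWW</acronym> <center>Hi</center></p>'
    for element, count in sorted(find_absent_elements(page).items()):
        print(element, count)
```

Running it over each page (or each template) on a site gives a rough measure of how much work each of the options discussed next would involve.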

Guy told us that he works for a large university, and that its site uses a large number of <acronym> elements, which have been removed from HTML5 as detailed above. If he wishes to make the site HTML5, I feel he has three options:

  1. Change the doctype to HTML5 and leave the <acronym> elements in (bear in mind that the HTML won't validate).
  2. Change the doctype to HTML5 and edit the <acronym> elements to become <abbr> elements, as advised by the specification.
  3. Do nothing and leave the site as HTML 4.01 or XHTML.
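Option 2 is largely mechanical, so it could be scripted. Here is a minimal Python sketch, assuming each page's markup is available as a string; the function name upgrade_to_html5 is hypothetical. It swaps the first doctype declaration for <!DOCTYPE html> and renames opening and closing acronym tags to abbr, keeping their attributes. Being regex-based, it does not understand comments or scripts, and it leaves a page that has no doctype at all unchanged in that respect.

```python
import re

# Matches the page's existing doctype, e.g. an XHTML 1.0 Strict or
# HTML 4.01 Transitional declaration (neither contains a ">" inside).
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)

# Matches "<acronym" or "</acronym"; group 1 keeps the "/" of a closing tag,
# and \b stops the pattern matching a longer tag name that starts with "acronym".
ACRONYM_RE = re.compile(r"<(/?)acronym\b", re.IGNORECASE)


def upgrade_to_html5(markup):
    """Hypothetical helper: HTML5 doctype plus <acronym> renamed to <abbr>."""
    # Replace only the first doctype declaration in the page.
    markup = DOCTYPE_RE.sub("<!DOCTYPE html>", markup, count=1)
    # Rename the tags; attributes such as title="..." are left untouched.
    return ACRONYM_RE.sub(r"<\1abbr", markup)


if __name__ == "__main__":
    old = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<p><acronym title="World Wide Web">WWW</acronym></p>'
    )
    print(upgrade_to_html5(old))
```

Because the title attribute carries over unchanged, the expansions Guy's pages already provide should survive the conversion; it would still be worth reviewing the output before deploying it.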

It's difficult to advise Guy on which option to choose without knowing how much effort each solution would take to implement, or what the site's goals and the project team's views on validation are. Speaking of validation, the whole area is a minefield, and it's what we'll attempt to cover next.

3. How much do I need to worry about validation?

While validation is undoubtedly important for your markup and your CSS, in my opinion it isn't crucial to a site. Allow me to explain. We recently received a couple of emails pointing out that this site doesn't validate. While some of those errors have now been corrected, a primary reason the site fails is our use of ARIA roles in the markup. These attributes currently aren't allowed by the draft specification, although work is underway to change that.

To illustrate this point, let's look at Google. If you view the source of Google's search pages, you'll see that they use the HTML5 doctype:

<!DOCTYPE html>

However, those pages don't validate, because they use the <font> and <center> elements, amongst other things that we already know have been removed from the specification. Does this mean that users stop visiting Google? No.

Remember, too, that the specification is yet to be finalised and may still change (thus breaking your perfectly valid documents). In addition, changes to the specification may not be implemented in the validators immediately. We also need to bear in mind that HTML5 takes a "pave the cowpaths" approach to development, meaning that Hixie (Ian Hickson, the specification's editor) et al. look at what authors already do and improve upon it.

4. What effect will this have for backwards compatibility?

Following on from the Google example, we know that Google's site not validating doesn't matter, but is it still backwards compatible? Yes. As far as we know, all browsers support the <font> element; it's the validators that don't. If you want to use an antiquated element, you can: the browser will still handle it, it just won't validate.

In terms of backwards compatibility, there is no reason why you couldn't start using new HTML5 elements in your HTML 4.01 or XHTML 1.x documents. It could even be described as progressive enhancement. For example, why not include the <video> element? It won't validate, but maybe that doesn't matter? Browsers that support <video> will still be able to render the video, hence "paving the cowpaths". Having said that, if you're going to the effort of including new elements, why not go the whole hog and change the doctype too?

5. Can my content be accessed in all my target browsers and user agents?

To return to Google, its pages still render in older browsers and devices, yet to all intents and purposes they're HTML5. Another principle behind the specification is that new features should either work in older browsers or, at the very least, not break them.

Remy pointed out in a response to Guy:

Remember that the browser is still going to handle this, regardless of whether HTML5 supports this or not. Just like an XHTML document that doesn't validate with frames: it still renders fine and still supports frames.

From that we can conclude that browsers will continue to render deprecated elements and support deprecated practices.

Summary

To wrap up, I think we can answer Guy's question by saying yes: changing the doctype will make your site HTML5, but whether or not it validates is essentially a personal or business decision. Whether you are then making use of all the benefits of HTML5's new elements, JavaScript APIs or offline storage is a whole different subject.

5 Responses on the article “Absent Elements and Validation”

Rich says

Just a couple of errors:

“What affect will this have for backwards compatibility?” -> effect, not affect

“but to all intense and purpose” -> intents, not intense

Good article – the point is that you may as well use HTML5 where possible, as the doctype doesn’t really affect how browsers work (other than quirks mode).

Most of the deprecations are pretty trivial anyway, all have obvious ways to replace them using either more appropriate elements, or CSS.

Barney says

There are some good points helpfully elucidated in this article but the general point and gist of things is a bit unclear, especially as regards your references to Google.

The point that Google are confident enough to apply the generic HTML doctype, and that it clearly doesn’t break their pages in any tangible ways, should clear up a lot of FUD that people get when it comes to the ultimate value of tick-all-the-boxes validation and the practicalities of user-agent parsing — however the advantages (in fact, any incentive — theoretical or otherwise) are completely missing from the analysis. Telling the browser you’re giving it an HTML5 page and then serving loads of HTML4 is confusing, misleading, and ultimately pointless (right?).

I disagree that having <!DOCTYPE html> makes a page de facto HTML5 (and I think you’re misinterpreting Jeremy Keith’s article on the subject). What this is saying, literally, is “this is an HTML page of some kind or other, but I won’t commit to any further specifics (ie variant, version, strictness)”. Google’s page says it is HTML, no more, no less, but the actual code could legitimately be described as HTML4.01. It doesn’t actually contain any HTML5.

In theory (not sure about practice), this vague html doctype might be considered good practice in a template for a huge site with both legacy and bleeding-edge elements that might contain deprecated HTML4 on some pages, and as-yet-unsupported HTML5 on others. But when I say ‘good practice’, it’s only inasmuch that it’s better practice than no doctype at all — at least the browser knows it’s parsing HTML and not, for instance, an MP3, a PDF or simple TXT — how much better it is than having a single more specific, but often or always incorrect HTML-variant doctype (say, XHTML strict), is a whole other debate.

The end impression I get from the article is that since browsers will always interpret HTML and XML as best they can in their own individual ways anyway — and seeing as by-the-book validation ultimately doesn’t affect the end user experience — doctype is simply irrelevant. I’m not saying this is horribly misguided or short-sighted, and this isn’t a rhetorical criticism.

I think there is an important statement that could sum up this article:

In practice, doctype validation is subjective.

But I also think that’s meaningless without elaboration into a few sub-points:

1) doctypes have no intrinsic value — it’s their interpretation by the user agent(s) that counts.

2) Accurate specificity in the doctype means less guess-work on the user agent end, which means that if used correctly up to the browser’s understanding, it will be able to start rendering immediately without first processing the content and working out how it should process it. FF3 will render my element whether I tell it upfront I am serving HTML5 or not, just as IE8 won’t render it with just as much attention paid to the doctype — whatever it may be.

3) A document that fails w3 validation is not necessarily a bad thing, but a document that browsers have difficulty rendering, or render differently, is. Just as we currently wrap s inside s for Flash that will end up with at least some of the code being unrenderable and hence ignored by the browser, so should users using elements put alternative content within the tags if they want the vast majority of users to get any content in that part of the document (and if you want IE6 users to see your page as you intend, don’t rely on CSS rules applied to HTML5-spec elements).

Sorry to bang on a bit, but I think the topic’s well worth the debate.

Rich says

@Rich thanks for the spots, should be corrected now :)

@Barney, you make some great points there. I think you’re right that giving the browser an HTML5 page and then serving loads of HTML4, as Google does, is confusing, misleading and ultimately pointless, but the point was to illustrate that “implementing” HTML5 is that easy.

I also think you’ve summed up your point about doctype validation being subjective very well, albeit that a few bits seem to have been stripped from the markup?

I’m not sure I follow the part about IE6, though. If you’ve applied the JavaScript shiv (as required by all versions of IE), you can style the elements normally, though most will require display:block. Or are we at cross purposes?

Anyway thanks for your input.

Rich

Alohci says

Nice article. One quibble however.

In my opinion it is wrong to characterize Google’s use of <!doctype html> as meaning that they are using HTML5. All they are doing is making use of a single piece of information that came out of the research for HTML5, which is that <!doctype html> is the shortest doctype that will cause all browsers to use full standards mode. For pages that are famous for not wasting characters, this makes perfect sense, regardless of any version of HTML being targeted.

Jukka K. Korpela says

The article does not seem to discuss the practical problem of validating a mixed-markup page, i.e. a page that contains legacy markup that is not allowed in HTML5 but also new HTML5 markup that is not valid under any document type defined in HTML specifications.

If you use <!DOCTYPE html>, you get error messages about legacy markup, and if you use e.g. the HTML 4.01 Transitional doctype, you get error messages about any new HTML5 feature. This is a problem because, in either case, there might be so many error messages that it becomes difficult to find the _real_ errors, such as typos in tag or attribute names, a missing obligatory end tag or wrong nesting.

Is anyone working on a HTML 4 + HTML5 document type definition, i.e. a DTD that allows both HTML 4.01 Transitional and new HTML5 markup? I know that HTML5 is not SGML-based, but can it be treated as sufficiently SGML-like to make the idea feasible?
