26th October, 2007
The Furious Sleep
Friday, 10:58 am in CodeGirl
Don’t all groan at once, but today I was thinking about HTML validation. Those (two) of you who are long-time readers will know that I have a habit of banging on about this topic every once in a while, but it occurred to me today while I was looking for books to take to the doctor’s this afternoon[1] that I don’t think I’ve ever done it particularly coherently. So this is my attempt to do just that.
So here’s the deal; once upon a time, I used to be a hard-core XHTML sceptic. I didn’t get the point of ‘validation’ or why I should sacrifice the aesthetics I’d come to know and love – the Sliced Table Layout – in the name of it. What did I care if one random person using Lynx couldn’t get to my site? It looked fine in my browser, and the browsers of all my friends; screw everyone else.
To tell you the truth, I still think like this. It’s the reason I don’t really care that a lot of my sites – especially the ones designed in the past six months or so – look shonky on IE6 and, to a lesser extent, IE7. It’s the reason I don’t lose any sleep over the fact that v-s.net side-scrolls in 1024×768, even though that’s the resolution most of my visitors apparently use.[2] But still, I’m a believer.
I’m not a believer in XHTML; I think the whole thing is ill-conceived, and despite what Zeldman and pretty much everyone else might try and tell us… I don’t think it’s the ‘future’. It could be and it should be, but I don’t think it will be, at least not until all the supporting technologies catch up and by that time there will invariably be a new Next Big Thing to contend with. I don’t write in XHTML – v-s.net is the exception, and as soon as I get a large block of time to do a re-write, I will – and with only minor provocation will encourage others to avoid it too. But I still believe in standards; my vehicle of choice nowadays is HTML 4.01 Strict. The main reason is that I’ve always felt the arguments around using XHTML are a bit disingenuous. HTML is not going to become unsupported any time soon, and in fact writing ‘XHTML’ documents won’t make your sites somehow futureproof against the vapourware “next gen” browsers that only support XHTML.
Why? I think it’s time for a sub-heading…
HTML vs. XHTML Revisited
I’ve pointed this out previously but not particularly articulately, so here’s my second attempt at it (I’m rehashing a bit here, so bear with me):
Your XHTML is not valid.
It’s not valid because you aren’t actually serving XHTML at all; you’re serving HTML. Well, strictly speaking text/html; this is the Content-Type that your server is sending to your browser. Now, the Content-Type of a file affects what the application that receives it does with it, and as those of you who are clever (or have seen this argument before) may have surmised, HTML is treated differently than ‘true’ XHTML, which should be served as application/xhtml+xml. The parser for XHTML is actually completely different than the parser for HTML; this becomes evident in older browsers such as IE6, which don’t have one at all and will consequently ask users to download XHTML files. Even the vaunted Firefox’s XHTML support is limited prior to version 3.
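To make the Content-Type point concrete, here is a toy sketch of the decision a browser makes before it reads a single tag. The function name `handling_for` and the `has_xhtml_support` flag are made up for illustration; real browsers are far more involved, and this only models the behaviour described above.

```python
from email.message import Message


def handling_for(content_type_header: str, has_xhtml_support: bool = True) -> str:
    """Toy model: pick a parser from the Content-Type header alone.

    The markup (and its doctype) is never consulted; only the header is.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() lowercases the value and drops parameters such as charset.
    media_type = msg.get_content_type()

    if media_type == "text/html":
        return "html parser"  # forgiving; this is where XHTML-over-HTML ends up
    if media_type == "application/xhtml+xml":
        # Assumption for illustration: a browser with no XHTML parser
        # (the IE6 case above) offers the file as a download instead.
        return "xml parser" if has_xhtml_support else "download prompt"
    return "other"


print(handling_for("text/html; charset=utf-8"))
print(handling_for("application/xhtml+xml", has_xhtml_support=False))
```

The XHTML doctype at the top of the file plays no part in that decision, which is why it doesn't change what you are actually serving.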
So what are you actually serving up when you stick that XHTML doctype in? Well, sorry but… it’s tag soup. Invalid HTML markup, in other words. Now, it works, but it only works because the syntactic flexibility of HTML allows it to work. Unlike the HTML parser, which attempts to render even the most malformed code as best it can, the XHTML parser will die horribly on error. Kaput; no more. And this is where we run into problems with most current implementations of XHTML-over-HTML, because they do contain dodgy code, usually through no fault of the author and rather simply because they are running last-gen CMSes that aren’t designed to serve up 100% valid code 100% of the time. The Mozilla site above says it best:
There is a fad of serving text/html to IE but serving the same markup with no added value as application/xhtml+xml to Mozilla. This is usually done without a mechanism that would ensure the well-formedness of the served documents. Mechanisms that ensure well-formed output include serializing from a document tree object model (eg. DOM) and XSLT transformations that do not disable output escaping. When XHTML output has been retrofitted to a content management system that was not designed for XML from the ground up, the system usually ends up discriminating Mozilla users by serving tag soup labeled as XML to Mozilla (leading to a parse error) and serving the same soup labeled as tag soup to IE (not leading to a parse error).
Quoted From: Mozilla Web Author FAQ (emphasis mine)
In short, it’s not a good idea to switch to ‘true’ XHTML if you’re doing it by hand.
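The difference between the two parsing models can be sketched with Python's standard library. This is only an analogy: `html.parser` is a forgiving tokenizer and not a browser's HTML parser, and `xml.etree.ElementTree` stands in for a browser's XML parser. The `TAG_SOUP` string and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Unclosed <p> and <br>: typical of hand-written or CMS-generated markup.
TAG_SOUP = "<div><p>First paragraph<br><p>Second paragraph</div>"


class TextCollector(HTMLParser):
    """Collects text content and never complains about missing end tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


def parse_as_xml(markup):
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # A well-formedness error is fatal: there is no partial result.
        return f"fatal: {err}"
    return "ok"


print(parse_as_html(TAG_SOUP))
print(parse_as_xml(TAG_SOUP))
```

The lenient parser still recovers both paragraphs; the XML parser gives up at the first mismatched tag, which is the behaviour a visitor would see as a parse error page.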
There are other downsides to serving XHTML, the main one being that the different DOM will break certain parts of JavaScript, including the elements that most tracking systems use to run.
Of course, what this all means is that your XHTML-over-HTML isn’t helping to make your site “future-compatible”. Since what you’re actually serving is (invalid) HTML, you are still reliant on browsers that have HTML parsers; which is to say, all of them. This is why I think Zeldman’s argument is misleading in a lot of ways, not to mention teaching some bad habits. And it gets picked up as part of the whole Validation Fad that floats around, buoyed by endless WPR sites who bang on and on and on about it without really knowing what the fuck it all means.
So, what to do?
The Case for HTML 4.01 Strict
It gets to the point where you just have to ask yourself what the hell you are actually trying to achieve. Because I’m not against validation or cross-browser compatibility or accessibility or the separation of content and presentation or semantic code. In fact, I love all these things. And you know what’s even better? All of this stuff is completely achievable using the Internet’s forgotten doctype: HTML 4.01 Strict.
HTML Strict is pretty much exactly what you’ve been trying to do with your XHTML validation trend, except untroubled by the aforementioned errors that make using XHTML technically unsound. In fact, if you go through your XHTML site and just change every /> to > you’ve pretty much got yourself an HTML Strict document right there. It’s that easy. HTML Strict preserves all your cross-browser compatibility; in fact, it arguably enhances it, since browser support for HTML is much, much more mature than support for XHTML. And since browsers are most likely rendering your ‘XHTML’ code as HTML anyway, what have you got to lose? HTML Strict preserves all the same rules and best practices with regards to semantic coding and you can still give yourself that validation link at the bottom of your page.
For what it’s worth.
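As a rough sketch of that "change every /> to >" conversion, assuming the page starts with a standard XHTML doctype: the function name `xhtml_to_html_strict` is made up, and the approach is deliberately naive. It would also rewrite a literal `/>` inside a script or attribute value, and it leaves XHTML-only leftovers such as the `xmlns` attribute for you to clean up by hand.

```python
import re

# The HTML 4.01 Strict doctype, as published by the W3C.
HTML_STRICT_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)

# Matches any XHTML doctype declaration (Strict, Transitional, and so on).
XHTML_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML[^>]*>', re.IGNORECASE
)


def xhtml_to_html_strict(markup: str) -> str:
    """Naive conversion: swap the doctype, then drop self-closing slashes."""
    markup = XHTML_DOCTYPE.sub(HTML_STRICT_DOCTYPE, markup, count=1)
    # "<br />" and "<br/>" both become "<br>".
    return re.sub(r"\s*/>", ">", markup)


sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<p>Line one<br />Line two<img src="a.png" alt="" /></p>'
)
print(xhtml_to_html_strict(sample))
```

Run the result through a validator afterwards; the point is only that the distance between the two doctypes is small.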
Validation vs. Semantics
Here’s the deal, kids; everyone and their dog knows that to be part of the ‘in crowd’ nowadays you need that little tag down the bottom of the page that boasts your X/HTML compatibility to the world. Your page needs to be valid, and the whole thing reminds me a little of a famous quote about grammar:
Colorless green ideas sleep furiously.
Quoted From: Noam Chomsky
Before you all go, “Buh?” at once, this is a critique by linguist Noam Chomsky[3] about certain methods of understanding grammar, and while that’s somewhat irrelevant what it is pointing out is that while a sentence in English can be completely grammatically valid, it can also be completely nonsensical. And the thing is, I see furious sleeping in supposedly ‘XHTML’ code all the time. Take a look at the following example:
<div class="post"><div class="post-title">My Post Title</div>
<div class="post-body">This is a very common way to see weblog posts displayed.<br/><br/>
But you have to ask yourself... why?</div></div>
This is an example of colourless green in XHTML and I see it all the time. Sure, the above example is valid (unless I’ve fat-fingered it, of course) but it tells us absolutely nothing about the content. And this is where the message of the Strict doctypes has gotten a bit lost in the rush to validate. Because, really, it’s not about having code that is 100% valid, 100% of the time. Valid code is nice, but more than anything it should be a side effect of code that is semantic. Because the whole point of the HTML/CSS model of code is to separate out presentation from content, which is a kind of simplified adaptation of the multi-tier architecture model that’s all the rage in apps dev right now. Most people ‘get’ that the CSS is the presentation layer; it describes how the code looks. What people generally seem slightly less sure of is that this means that the X/HTML is then not simply an ‘anchor’ into the CSS, but rather a separate entity on its own that is supposed to be used to describe what the code is. This is what it means for code to be ‘semantic’. It’s not just about throwing nested <div>s everywhere.
If you go back to the example above, then, a more semantic way of marking up the same code would be:
<div class="post"><h3>My Post Title</h3>
<p>This is a slightly less common way to see weblog posts displayed</p>
<p>Though it's gaining more popularity as people understand more about semantic rather than simply 'valid' code.</p></div>
Semantic code and the separation of presentation and content are I think the most important aspects of the transition to Strict doctypes; XHTML or HTML. Much more so than validation. Sure, validation is important, but mostly because it helps us to write semantic code in the first place. It’s not essential since (most of us) aren’t serving up true XHTML and therefore our websites won’t totally implode if we have the odd tag out of place.
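One crude way to see the difference between the two snippets above is to count which elements each one uses. The sketch below does that with Python's `html.parser`; the names `tag_counts`, `DIV_SOUP` and `SEMANTIC` are invented for the example, and a tag count is obviously no substitute for actually reading your markup.

```python
from collections import Counter
from html.parser import HTMLParser

DIV_SOUP = (
    '<div class="post"><div class="post-title">My Post Title</div>\n'
    '<div class="post-body">This is a very common way to see weblog posts '
    "displayed.<br/><br/>\n"
    "But you have to ask yourself... why?</div></div>"
)

SEMANTIC = (
    '<div class="post"><h3>My Post Title</h3>\n'
    "<p>This is a slightly less common way to see weblog posts displayed</p>\n"
    "<p>Though it's gaining more popularity.</p></div>"
)


class TagCounter(HTMLParser):
    """Counts every start tag (self-closing tags like <br/> included)."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_counts(markup):
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.tags


print(tag_counts(DIV_SOUP))
print(tag_counts(SEMANTIC))
```

The first snippet is nothing but `div` and `br`; the second says "this is a heading" and "these are paragraphs" before any CSS gets involved.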
1. I’m getting my first Gardasil shot this arvie. All you Australian ladies out there between age 13 and 26 who haven’t yet gone to do the same; shame on you! The shots are being subsidised by the Commonwealth for a limited time and they prevent the incredibly gross threat of genital warts (also cancer, which is less gross but more fatal). If you miss the subsidised period the vaccinations cost something like $460; how often can you say you got something worth $460 for free?
2. Yes, I know; I was shocked too. Just goes to show you…
3. Whom you may know for his political activism, but the man’s day job is linguistics.
- Comments By » ~Belinda, ~Arwym Starlight
Comments
-
Screw validation!
I jest, what you wrote about it seems quite relevant. But eh, I don’t care for it too much to be honest.
I’m Australian and I’m getting my second Gardasil shot next week.

-
TBH I’m not sure why any non-programmers are even remotely interested in the debates around semantic code and validation.
I mean, realistically it’s not that important, but it’s a habit that you get into; I do all my sites this way now, and I have to admit that while it was a struggle at first now it’s actually easier. Plus it makes hella-neat HTML and, like, I really dig that because I am a giant nerd. 
-
I was thinking, you have some very good points. I don’t completely agree with XHTML markup validation and all that stuff, to be honest. There are some things I would have preferred to keep. But it seems that my code, when done with Strict XHTML Markup in mind, looks much more consistent, less bloated, easier to read. For whoever wants to look at my pages’ source codes, I guess. I am one who looks at almost every website’s source code whenever I can.

Even though I am not completely fond of it, I continue ‘validating’ my code whenever I can. Though I sometimes just say, “to hell with it!” Like you said, it is not THAT important, at least not at this moment.
I still have to learn a lot. My code is still not that clean. It doesn’t always make sense. But I’ll keep this article and others in mind in order to improve.
-
It’s not the – how can I put this? – it’s not the notion of the Strict doctype in general that bothers me (I like the separation of presentation and content; A++ idea, would separate again) but rather the use of XHTML.
XHTML got pushed memetically as the next “in thing” by sites like A List Apart (that’s Zeldman) and if you go read the arguments as to why they’re (almost) all… completely spurious. The reality of it is that modern browsers don’t parse XHTML correctly, so what they’re serving your ‘XHTML validated’ page as is junk HTML. And in that case why bother to use XHTML at all? Why not use HTML Strict? Unless you’re performing back-end XSLTs or something, the functional differences between HTML Strict and XHTML Strict are pretty much negligible. There just doesn’t seem to be a decent argument for using XHTML, so I’m really, really baffled as to why it’s so popular.

The other thing is, validation really doesn’t matter. Validation only matters if you’re using XHTML and you’re serving it as application/xhtml+xml because if that’s the case, even the tiniest validation error will cause your entire page to die. But since most people serve XHTML-as-HTML, errors just get neatly absorbed by the browser (because HTML isn’t strictly parsed). What people really need to be focusing on is making sure their code is semantic; because that’s the whole point of the Strict doctypes. It’s only once that’s done that they should start iterating in the validation.
I dunno. It’s just… it’s not that I care if people are using XHTML or HTML or validating or not. What bothers me is people adopting a technical standard without fully appreciating the issues behind it, and then bleating about how their way is the only way. I mean, you can serve as much valid, non-semantic faux-XHTML as you like, and I’m totally fine with that… so long as you can give me a reason as to why you chose that over something else (and “because ALA said so” is really not good enough).
-
Yes, I know of people who see A List Apart’s articles as if they were rules and standards (lol a Bible?) that everyone should follow. I guess that the author has a lot of influence over this entire community. That is, perhaps, one of the reasons why XHTML has become so popular.
I hadn’t looked at it from that angle, because I’ve only read what’s on Wikipedia and W3Schools. And I do it mostly because the people who know about XHTML expect you to use it, and to validate according to it. So if you enter the field of web design as a professional, whoever knows about this will try to convince you that this is the only right thing, and that you MUST use XHTML, because HTML is a thing from before the Web 2.0 era, etc. So it’s like with almost every other trend, and the popular culture. Even if you know that it doesn’t make much sense, you are almost forced to be in it.
It’s like, for example… I am still wondering why it’s so bad to force a ‘target’ on your links. Seriously. I don’t think it’s that big of a deal. And I don’t understand the reasons why I can’t validate in Strict XHTML with that.
There’s a lot to comment about this matter, but I really should get back to studying. >.> Heh, I get distracted too easily.
-
So if you enter the field of web design as a professional, whoever knows about this will try to convince you that this is the only right thing, and that you MUST use XHTML, because HTML is a thing from before the Web 2.0 era, etc.
Nah, not necessarily true; I do some freelance webdesign when I absolutely can’t avoid it and it depends on the client. Because 99% of your clients (and, hell, most webdesigners) don’t understand the real whys behind XHTML either, so if you can present a neat case to them about why it’s really not the “way of the future” most will bow to your technical expertise; that is, after all, why they’re paying you. It’s a skilled profession; a webdesigner who just does things because they’re ‘trendy’ doesn’t deserve the title.
Of course, sometimes there are legitimate technical reasons why you use XHTML. Like, with the work I do for the OTW, we use XHTML because our back-end framework is Ruby on Rails, and RoR’s code libraries output XHTML. In this case it’s just easier to take the path of least resistance rather than trying to re-write the whole Rails framework (not that, yanno, it doesn’t need one, but that’s a different issue).
I am still wondering why it’s so bad to force a ‘target’ on your links.
Four reasons:
- How a window opens is behavioural, not semantic/structural, therefore it doesn’t belong in the markup.
- It makes an environmental assumption (that you’re using windows). The markup isn’t supposed to be making environmental assumptions.
- It was originally designed for use with frames, which are also deprecated (because they’re presentation, not structure).
- There are also a bunch of accessibility concerns with things like screen readers and so forth.
I really should get back to studying.
Coincidentally, I really should get back to working.
This is more fun.
