The Montoya Herald, a weblog about Blueprint, jQuery, design, music and life, publishing on the web since September 2005. Written by Christian Montoya: developer, designer and entrepreneur.

The Montoya Herald — ChristianMontoya.com

Serve your weblog as HTML 4.01

Posted on February 13, 2006.

You've probably heard the whining across the blogosphere from developers and standardistas alike: "my weblog has an XHTML doctype but I can't ensure valid code, so I'm giving up on validation." It was probably to be expected, given the trend of serving XHTML with the "text/html" MIME type and all the wannabe standards advocates who thought it would be a good idea to write weblog software with XHTML doctypes. There's no need to give up, though; a little PHP can help. Here's how I've managed to serve my site as HTML 4.01 Strict, and you can do it too.

The first step is to put an HTML 4.01 doctype at the top of the document, and remove any XML attributes (such as xmlns) from the "html" tag. So we change:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

to:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>

Then we need to remove every self-closing tag from the markup. Besides making the markup invalid, these tags present problems for user agents that are not as forgiving as browsers, such as Java-based parsers. A small PHP function takes care of it: the whole page is collected in PHP's output buffer, and the function does a simple string replace on it before anything is sent to the browser. This is what it looks like:

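function fix_code($buffer) {
    // Turn XHTML-style " />" tag endings into plain HTML ">".
    return str_replace(" />", ">", $buffer);
}
// Buffer all output and pass it through fix_code() before it is sent.
ob_start("fix_code");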

That's all you need. The full header looks like this:

<?php
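function fix_code($buffer) {
    // Turn XHTML-style " />" tag endings into plain HTML ">".
    return str_replace(" />", ">", $buffer);
}
// Buffer all output and pass it through fix_code() before it is sent.
ob_start("fix_code");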
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
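To see what the filter actually changes, here is a small sketch that runs the same string replacement over a made-up snippet. The $sample markup and the fix_code_loose() variant are assumptions for illustration only and are not part of the header above; the variant assumes you also want to catch "/>" written without a leading space, which the plain version leaves alone.

```php
<?php
// Same callback as in the header: rewrite " />" tag endings as ">".
function fix_code($buffer) {
    return str_replace(" />", ">", $buffer);
}

// Hypothetical variant: also rewrite "/>" written without the space.
// The search strings are applied in order, so " />" is handled first.
function fix_code_loose($buffer) {
    return str_replace(array(" />", "/>"), ">", $buffer);
}

// Made-up markup mixing both spellings of a self-closing tag.
$sample = '<p>Line one<br />Line two<br/><img src="a.png" alt="" /></p>';

echo fix_code($sample), "\n";
// <p>Line one<br>Line two<br/><img src="a.png" alt=""></p>

echo fix_code_loose($sample), "\n";
// <p>Line one<br>Line two<br><img src="a.png" alt=""></p>
```

Keep in mind that a blind string replace also touches " />" inside script blocks, preformatted code samples, or plain text, so a page that displays XHTML source may need a narrower approach.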

Why this is a good idea

Problems

This seems to work fine, but it could make your pages load slower… this is something I'm still looking into. A better method might be to just hack your weblog software to always output valid HTML, but this is hard, and it gets worse when you want to upgrade. Another possibility would be to just replace the markup (using PHP) in areas where you know that it's needed, instead of doing it for the whole page. In the end you might decide not to bother changing anything at all, but I kind of like the warm feeling I get knowing that I'm not misusing XHTML in every one of my blogs.
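The option of fixing only the areas that need it could look roughly like the sketch below. The render_fixed() helper and the sidebar markup are hypothetical names made up for this example; the idea is simply to buffer one chunk of output, run it through the same replacement, and echo the result, leaving the rest of the page untouched.

```php
<?php
// Same replacement as the full-page version.
function fix_code($buffer) {
    return str_replace(" />", ">", $buffer);
}

// Hypothetical helper: run a callback that echoes markup, capture what it
// printed, and return the captured markup with self-closing endings fixed.
function render_fixed($render) {
    ob_start();                 // start a buffer for this chunk only
    call_user_func($render);    // the callback echoes its markup as usual
    return fix_code(ob_get_clean());
}

echo "<div id=\"sidebar\">\n";
echo render_fixed(function () {
    // Stand-in for a plugin or template tag that emits XHTML-style markup.
    echo '<img src="badge.png" alt="" />';
});
echo "\n</div>\n";
```

This keeps the replacement away from markup you already control, at the cost of having to wrap each spot that prints XHTML-style tags.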

Update: I've tested this on my own site and according to the timer that displays in the footer, this technique does not make pages load slower. Or, the difference is so small that it's insignificant. Either way, I'm convinced it's a reliable solution. Go ahead and give it a try.
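If your weblog has no footer timer, a rough way to check the cost yourself is to time the callback directly. This is only a sketch under assumptions: the synthetic $page string below stands in for a real page, and the numbers will vary with the server and the size of your markup.

```php
<?php
// Same replacement as in the header.
function fix_code($buffer) {
    return str_replace(" />", ">", $buffer);
}

// Build a made-up page body by repeating one line of XHTML-style markup.
$line = '<p>Some text<br /><img src="a.png" alt="" /></p>' . "\n";
$page = str_repeat($line, 5000);

// Time a single pass of the filter over the whole string.
$start   = microtime(true);
$fixed   = fix_code($page);
$elapsed = microtime(true) - $start;

printf("Filtered %d bytes in %.3f ms\n", strlen($page), $elapsed * 1000);
```

Comparing that figure with the total time your page takes to generate gives a sense of whether the filter matters for your site.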

Credits

This code was adapted from the article Serving XHTML with the correct mime type using PHP.

8 Comments

  1. Marco on February 23, 2006

    Hmmm… I like the idea but I get the 'why bother?' feeling. I mean: name me one example of where I'm shooting myself in the foot with my 'Almost Valid XHTML' website? To me it's all textbook material with very little real-world consequences. Feel free to prove me wrong though!

  2. C Montoya on February 23, 2006

    The only way you are shooting yourself in the foot is that the whole idea of 'Almost Valid XHTML' sets a bad example for those who visit your website and learn from you. It gives people the idea that XHTML doesn't have to be valid and while this may be true when XHTML is served as mime-type TEXT/HTML, it is not true when served correctly as APPLICATION/XHTML+XML. The real problem with XHTML on the web today comes around when people do try serving XHTML correctly, and discover that their pages don't work at all.

    XHTML is being abused a lot of late and I've been telling other designers, developers, and people learning this stuff to use HTML 4.01 for everything that does not benefit from XHTML.

    Here's a link for further reading that you might have already read, but if not, it should explain what I'm talking about: http://hixie.ch/advocacy/xhtml

  3. James AkaXakA on March 3, 2006

    Damn right. As my header states:

    It's also worth noting that you can leave self-closed tags in your html 4 without too much problem, it should still validate. What's more important however is that leaving them in has no adverse effect in any browser, so a dynamic function to get them out is fairly unnecessary.

  4. C Montoya on March 3, 2006

    James: Actually, I think the self-closed tags could have an adverse effect in any browser that does not have the tag-soup bug that allows us to close tags that way, which is why it is a good idea to remove them. The function does not slow down the loading at all, so it's "zero-weight." But you are right, the self-closed tags won't have an adverse effect in all the browsers that people serve XHTML 1.0 as text/html to. Validation in HTML 4 is a lot less of an issue than in XHTML, which is why everyone who constantly whines about validation issues should just stop using XHTML :)

    And I'm assuming something is missing from your post?

  5. Josh Peters on March 8, 2006

    Good post!

    There is practically no benefit to generating XHTML and serving it as text/html. I wish more people in the design world would understand this. It's one of these little server tidbits that really make or break the XML deal. If your server does indeed serve XHTML as an application of XML (as opposed to a text type) then the user agents benefit from it by being able to use their (hopefully faster) XML processors versus their tag soup processors.

    Validation means precious little when the content isn't served correctly. It's as if I produce a sculpture and tell people it is a painting. It's still art, but it isn't fully realized until the creation and audience are on the same page.

  6. Paul Collins on May 25, 2007

    Just wondering, would this apply to the visual editor in your weblog software, or would it still try to create XHTML?

    I am using WordPress and my biggest pain is having to disable the visual editor every time I want to add YouTube content (because of the EMBED tag).

    Thanks for the tip BTW

  7. Christian Montoya on May 25, 2007

    Paul, this applies to the visual editor because it happens right before the template code is displayed, after all the processing on the part of WordPress. It won't, however, make any of the markup more semantic; just valid HTML.

  8. Paul Collins on May 26, 2007

    Thanks Christian, much appreciated.
