e-academy – IT training excellence in Cardiff, Newport, Bristol and South Wales

Creating valid HTML

We look at the process of validation - testing that the HTML in your Web pages is properly written - and what its advantages are.

13 October 2009

At some point while learning how to create website pages, you're going to come across the term 'validation'. Validation is a process which seems to divide website designers - some are passionately for it, some just as passionately against it. So, what is it?

Whether you code Web pages by hand or use a WYSIWYG tool such as Dreamweaver, the underlying structure is a fairly simple mark-up language: HTML (or XHTML). HTML works by enclosing content in tags: for example, the <strong> tag (used in conjunction with </strong> to close the tag) will typically render the content within it as bold.

Simple enough. But these tags need to be written and nested in the correct order. For example, <strong><em>word</em></strong> is correct, whereas <strong><em>word</strong></em> is not: the closing tags are in the wrong order. Mark-up containing errors like this is known as 'invalid HTML'.
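To make the nesting rule concrete, here is a minimal sketch in Python of a checker that tracks which tags are open and complains when one is closed out of order. It uses the standard library's html.parser module; the names NestingChecker and find_nesting_errors are made up for this illustration. It is far from a full validator: it only looks at nesting, and it assumes XHTML-style strictness, where every element except the 'void' ones (such as <br> and <img>) must be explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never tracked as "open".
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records tags that are closed out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.errors = []     # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in self.open_tags:
            self.errors.append(f"</{tag}> has no matching opening tag")
            return
        # Index of the most recently opened tag with this name.
        index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
        if index != len(self.open_tags) - 1:
            innermost = self.open_tags[-1]
            self.errors.append(
                f"</{tag}> found while <{innermost}> is still open"
            )
        # Treat the tag as closed either way, so one mistake is reported once.
        del self.open_tags[index]


def find_nesting_errors(html):
    """Return a list of nesting problems found in the given mark-up."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    errors.extend(f"<{tag}> is never closed" for tag in checker.open_tags)
    return errors


print(find_nesting_errors("<strong><em>word</em></strong>"))
print(find_nesting_errors("<strong><em>word</strong></em>"))
```

The first call should report nothing, while the second should flag the </strong> that arrives before <em> has been closed.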

Some browsers may well render this correctly at the time you test it, so the error goes unnoticed - but the problem may surface in a later version of a browser that is less forgiving of invalid HTML.

On a fairly substantial Web page, these errors can quickly mount up - well, everyone's human. Even Dreamweaver, which isn't human, can create invalid HTML when left to its own devices (and, indeed, often creates lots of HTML that isn't really needed).

Is this a problem?

It can be. One of the most time-consuming parts of creating a Web page isn't the actual coding; it's testing the page in various browsers and debugging the issues - usually where the page is displaying slightly (or wildly) differently. A website designer should test in all current major browsers, which include Internet Explorer 6, 7 and 8 (the Trident rendering engine), Firefox (the Gecko rendering engine), Opera (the Presto rendering engine) and Safari and Google Chrome (the WebKit rendering engine).

These all behave differently, so even a well-coded Web page can misbehave a little in one or more of these browsers. When there are problems with the validity of the code, the issues can become far more pronounced. So, when debugging, an essential first step is to rule out the possibility that your code is in error. This is done by validating the code: in effect, checking that its structure is OK.

The easiest way to do this is with the World Wide Web Consortium's free HTML/XHTML validation tool. It only takes a second or two, and any problems within the HTML are clearly listed out, ready for you to fix. Let's see how this page checks out. (Phew! It's valid!)
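If you'd rather check pages from a script than paste them into a form, the following is a rough sketch in Python of one way it might be done. It rests on assumptions that are not part of this article: that the W3C's newer Nu Html Checker is reachable at https://validator.w3.org/nu/, that it accepts a page sent by POST, and that adding out=json returns a list of messages. The function names and the User-Agent string are invented for the example, so check the service's own documentation and usage policy before relying on it.

```python
import json
import urllib.request

# Assumption: the W3C Nu Html Checker endpoint, asked for JSON output.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_check_request(html):
    """Build (but do not send) a POST request carrying the page to check."""
    return urllib.request.Request(
        CHECKER_URL,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            # Hypothetical identifier; use something that describes your tool.
            "User-Agent": "validation-sketch/0.1",
        },
        method="POST",
    )


def summarise(report):
    """Turn a parsed JSON report into 'line N: message' strings for errors."""
    return [
        f"line {message.get('lastLine', '?')}: {message.get('message', '')}"
        for message in report.get("messages", [])
        if message.get("type") == "error"
    ]


def check_html(html):
    """Send the page to the checker and return its error summaries."""
    request = build_check_request(html)
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarise(json.load(response))


if __name__ == "__main__":
    page = "<!DOCTYPE html><title>Test</title><p><strong><em>word</strong></em>"
    for line in check_html(page):
        print(line)
```

Building the request separately from sending it keeps the network call in one small function, which makes the rest easy to try out offline.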

There are more advanced tools. A great example is CSE HTML Validator, which has significant advantages over the W3C's tool. For example, it also provides spell checking and accessibility checking - although for Mac and Linux people, it's sadly Windows only.

Trying to resolve browser incompatibility issues without first validating the code is a fool's game - it can take far more time to track down an issue and can even result in a developer putting further bad code in place, in order to resolve the initial issue.

But there are other reasons to validate. Valid code is also more likely to behave well in browsers that aren't even released yet - the biggest overall push in browser development is towards standards-compliance, so non-compliant websites could find themselves rendering poorly once standards are more tightly enforced. (Today's browsers are often forgiving of sloppy code. In fact, many leading websites don't validate - but that's no reason for us to ignore the need.)

Code that doesn't validate doesn't meet accessibility standards. One of the most basic accessibility requirements is that code 'conforms to published grammars' - i.e., that it validates.

And it can provide a search engine advantage. If search engine spiders don't know where tags open and close, what's content and what's code can become ambiguous, leading to important content not being indexed properly.

A clean page can also load faster than a badly written one, because the browser doesn't have to take a stab at how to display things.

But it also provides evidence and assurance of a job well done, that you care about your craft and that you have worked to create measurably error-free code. It means you are coding to recognised standards.

Although the initial experience of validating a page can be a bit crushing ("how many errors?!"), once you get into the swing of it, it's pretty simple and saves you a vast amount of time and effort in the long run.