A couple of weeks ago we accidentally broke a particular report on our production server because of broken HTML. An unclosed tag caused everything after it to render as plain white. Our comprehensive test suite failed to notice because it didn't look at details like that, and when we tested manually we simply missed the conditional situation that triggered it. Neither is a good excuse. So it got me thinking about how we could incorporate HTML (HTML5 in particular) validation into our test suite.
So I wrote a little gist and used it a bit on a couple of projects and was quite pleased with the results. But I thought this might be something worthwhile to keep around for future projects or for other people who can't just copy-n-paste a gist.
With that in mind I put together a little package with a README and a setup.py, and now you can use it too.
There are, however, some caveats, especially if you intend to run it as part of your test suite.
Caveat number 1
You can't flood htmlvalidator.nu. Well, you can, I guess. It would be really evil of you and kittens will die. If you have a test suite that does things like response = self.client.get(reverse('myapp:myview')) and there are many tests, you might be causing an obscene amount of HTTP traffic to them. Which brings us on to...
Caveat number 2
The htmlvalidator.nu site is written in Java and it's open source. You can download their validator and point django-html-validator to it locally. The way it works is java -jar vnu.jar myfile.html. However, it's slow. Like really slow. It takes about 2 seconds to run just one modest HTML file. So, you need to be patient.
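One way to soften that cost in a test suite is to validate each distinct HTML document only once. What follows is a minimal sketch, not how django-html-validator itself works: the names VNU_JAR, run_vnu and validate_html are made up for illustration, and the only thing taken from above is the java -jar vnu.jar myfile.html command line.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path

# Assumption: path to the validator jar you downloaded; adjust to your setup.
VNU_JAR = "vnu.jar"

# Maps a hash of the HTML to the validator's output, so identical
# pages rendered by many tests are only validated once per test run.
_seen = {}


def run_vnu(path):
    """Run `java -jar vnu.jar <path>` and return whatever it printed."""
    result = subprocess.run(
        ["java", "-jar", VNU_JAR, str(path)],
        capture_output=True,
        text=True,
    )
    return (result.stdout + result.stderr).strip()


def validate_html(html, runner=run_vnu):
    """Return the validator output for `html`, reusing cached results."""
    key = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if key not in _seen:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "page.html"
            path.write_text(html, encoding="utf-8")
            _seen[key] = runner(path)
    return _seen[key]
```

In a test you could call validate_html(response.content.decode("utf-8")) and fail if the returned text is not empty. The cache only helps when many tests render byte-for-byte identical pages, so it reduces the waiting rather than removing it.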
Comments
I love these methods which can be used automatically to test things.
If your data is separated from your template you can automatically check that the data appears in the output. This is why, when the choice is mine, I keep the data in a JSON file separate from the HTML/PDF/email/etc. template.
For example, if your data is {"name": "Peter Pan"} then you can check that the string "Peter Pan" is in the visible output (e.g. by using something like jQuery.fn.text(), OCR, or some other method). The same goes for rows of tabular data in reports.
Along with validation, it's a fairly nice automated way to test your data is actually being output. No more "Hello {name}}," and such.
Of course, if your template is doing a lot of filtering and formatting (date, number or text formatting), this is not as effective, because the template transforms the data and you can't use your input data as your expected data.
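For HTML output, the check described above could be sketched in Python roughly like this. It is only an illustration under some assumptions: the data is a flat JSON object, the values appear unformatted in the page, and the names VisibleText and missing_values are invented for the example.

```python
import json
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def missing_values(data_json, html):
    """Return the top-level JSON values that do not appear in the page text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Normalise whitespace so values split across tags or lines still match.
    text = " ".join(" ".join(parser.chunks).split())
    values = json.loads(data_json).values()
    return [v for v in values if str(v) not in text]
```

A test would then assert that missing_values(data, rendered_html) is an empty list. A template with a broken placeholder such as "Hello {name}}," leaves the real name out of the output, so the value shows up in the returned list.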
Spell checking is another example of a test you can apply without too much extra work.
A long time ago I wrote django-output-validator (https://pypi.python.org/pypi/django-output-validator/1.5) to do this kind of thing, but it ran as a middleware and notified you immediately (in development) if a generated page was invalid HTML.
However, with HTML5, HTML validation got a whole lot harder, and ridiculously complex and slow, as you discovered. Since HTML5 is also a moving target, and what matters is whether it works in browsers, I decided it wasn't worth maintaining that package.
In your case, a simple check for well-formedness would have found it, and that can be coded up with a few lines of Python. It might be a more practical solution than attempting full HTML5 validation.
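A rough sketch of such a check, using only the standard library's html.parser, might look like the following. The names TagBalanceChecker and check_balance are invented for the example. The check is deliberately stricter than HTML5, which allows some closing tags (such as </p> and </li>) to be omitted, so it assumes your templates always close them.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML5.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Track open tags on a stack and record closing tags that don't match."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line number) for every tag still open
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            self.errors.append(
                "unexpected </%s> on line %d" % (tag, self.getpos()[0])
            )


def check_balance(html):
    """Return a list of problems; an empty list means every tag was closed."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    errors.extend(
        "<%s> opened on line %d is never closed" % item
        for item in checker.stack
    )
    return errors
```

In a test you would assert that check_balance(response.content.decode("utf-8")) returns an empty list. It runs in-process with no Java and no HTTP traffic, at the price of catching far fewer problems than a real validator.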