django-html-validator

Monday, Oct 20, 2014
2 comments Python, Web development, Django

URL: https://github.com/peterbe/django-html-validator

A couple of weeks ago we accidentally broke our production server (for a particular report) because of broken HTML. It was an unclosed tag that caused everything after it to render as plain white. Our comprehensive test suite failed to notice it because it didn't look at details like that. And when it was tested manually, we simply missed the conditional situation that caused it. Neither is a good excuse. So it got me thinking: how can we incorporate HTML (HTML5 in particular) validation into our test suite?

So I wrote a little gist, used it on a couple of projects and was quite pleased with the results. But I thought this might be worth keeping around for future projects, or for other people who can't just copy-n-paste a gist.

With that in mind I put together a little package with a README and a setup.py and now you can use it too.

There are, however, some caveats, especially if you intend to run it as part of your test suite.

Caveat number 1

You can't flood htmlvalidator.nu. Well, you can, I guess, but it would be really evil of you and kittens will die. If your test suite does things like response = self.client.get(reverse('myapp:myview')) and there are many such tests, you might be causing an obscene amount of HTTP traffic to them. Which brings us on to...
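One way to go easier on the remote service is to never send the same document twice and to space out the requests that do go out. This is a minimal sketch, not django-html-validator's API: it assumes you already have some function that sends HTML to the validator and returns a list of error messages (passed in as the hypothetical validate argument), and CachedThrottledValidator and min_interval are illustrative names.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional


class CachedThrottledValidator:
    """Wrap a (hypothetical) remote validation function so that identical
    documents are only sent once and consecutive remote calls are spaced out."""

    def __init__(
        self,
        validate: Callable[[str], List[str]],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self._validate = validate          # assumed: HTML in, list of errors out
        self._min_interval = min_interval  # minimum seconds between remote calls
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, List[str]] = {}
        self._last_call: Optional[float] = None
        self.remote_calls = 0

    def __call__(self, html: str) -> List[str]:
        # Identical pages (very common across tests) hit the cache instead of the network.
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        # Wait until at least min_interval has passed since the previous remote call.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        errors = self._validate(html)
        self._last_call = self._clock()
        self.remote_calls += 1
        self._cache[key] = errors
        return errors
```

In a test suite you would create one instance at module level so the cache is shared by every test that renders the same page.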

Caveat number 2

The htmlvalidator.nu site is written in Java and it's open source. You can download their validator and point django-html-validator at it locally. The way it works is basically java -jar vnu.jar myfile.html. However, it's slow. Like, really slow. It takes about 2 seconds to validate just one modest HTML file, so you need to be patient.
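Assuming a good part of those 2 seconds is JVM start-up rather than the validation itself, one plausible mitigation is to write all the pages you want to check to disk and hand them to a single java -jar vnu.jar run. This sketch assumes vnu.jar accepts several files in one invocation and exits with a non-zero status when it finds errors; check both against the version you download. validate_pages and VNU_COMMAND are hypothetical names, and page names are assumed to be safe to use as file names.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Sequence, Tuple

# Assumed command line; adjust the path to wherever vnu.jar was downloaded.
VNU_COMMAND: Tuple[str, ...] = ("java", "-jar", "vnu.jar")


def validate_pages(
    pages: Dict[str, str], command: Sequence[str] = VNU_COMMAND
) -> Tuple[bool, str]:
    """Write every page to a temporary directory and validate them in one run.

    Returns (ok, report): ok is True when the process exits with status 0,
    report is whatever the validator printed on stderr and stdout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, html in pages.items():
            path = Path(tmp) / f"{name}.html"
            path.write_text(html, encoding="utf-8")
            paths.append(str(path))
        # One process for all files, so the start-up cost is paid only once.
        result = subprocess.run([*command, *paths], capture_output=True, text=True)
    return result.returncode == 0, result.stderr + result.stdout
```

The command is a parameter so it can point at a different vnu.jar location, or be swapped for a stub in the tests of the helper itself.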

Comments

René Dudfield October 20, 2014

I love these methods which can be used automatically to test things.

If your data is separated from your template, you can automatically check whether the data is in the output. This is why I enforce a separate JSON file alongside each html/pdf/email/etc template whenever I get to make that decision.

For example, if your data is {"name": "Peter Pan"}, then you can check that the string "Peter Pan" is in the visible output (e.g. by using something like jQuery.fn.text(), OCR, or some other method). The same goes for rows of tabular data in reports.

Along with validation, it's a fairly nice automated way to test that your data is actually being output. No more "Hello {name}}," and such.
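René's idea can be approximated without a browser or OCR by pulling the text nodes out of the rendered HTML with Python's html.parser and checking that each value from the data dictionary appears among them. This is a hedged sketch: visible_text and missing_values are hypothetical helper names, "visible" here only means "in a text node outside <script> and <style>" (CSS-hidden elements still count, much like jQuery.fn.text()), and values are compared via str(), so any formatting the template applies will cause false alarms.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def missing_values(data: Dict[str, object], html: str) -> List[str]:
    """Return the keys whose stringified values do not appear in the page text."""
    text = visible_text(html)
    return [key for key, value in data.items() if str(value) not in text]
```

Joining text nodes with spaces keeps "<td>Peter</td><td>Pan</td>" from merging into "PeterPan", at the cost of not matching a word split by an inline tag such as "Pe<b>ter</b>".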

Of course, if your template does a lot of filtering and formatting (date, number or text formatting), this is less effective: because the template transforms the data, you can't use your input data as your expected output.

Spell checking is another example of a test you can apply without too much extra work.

Luke Plant October 24, 2014

A long time ago I wrote django-output-validator to do this kind of thing (https://pypi.python.org/pypi/django-output-validator/1.5), but it ran as a middleware and notified you immediately (in development) if a generated page was invalid HTML.

However, with HTML5, HTML validation got a whole lot harder, and ridiculously complex and slow, as you discovered. Since HTML5 is also a moving target, and what matters is whether it works in browsers, I decided it wasn't worth maintaining that package.

In your case, a simple check for well-formedness would have found it, and that can be coded up in a few lines of Python. It might be a more practical solution than attempting full HTML5 validation.
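Luke's suggestion can be sketched with nothing but the standard library's html.parser: keep a stack of open tags and complain about end tags that don't match and tags that are never closed. This is an illustration, not part of django-html-validator; TagBalanceChecker and check_well_formed are hypothetical names, and the check is deliberately strict, so pages that rely on HTML's optional end tags (an <li> or <p> without its closing tag) will be reported too.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})


class TagBalanceChecker(HTMLParser):
    """Track open tags and record unbalanced or unclosed ones."""

    def __init__(self):
        super().__init__()
        self.stack: List[Tuple[str, int]] = []  # (tag, line where it was opened)
        self.problems: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: nothing to balance.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line = self.getpos()[0]
        if tag not in (name for name, _ in self.stack):
            self.problems.append(f"line {line}: stray </{tag}>")
            return
        # Anything opened after the matching tag was never closed.
        while self.stack[-1][0] != tag:
            name, open_line = self.stack.pop()
            self.problems.append(f"line {open_line}: <{name}> is never closed")
        self.stack.pop()

    def close(self):
        super().close()
        # Whatever is still open at the end of the document is unclosed.
        for name, open_line in reversed(self.stack):
            self.problems.append(f"line {open_line}: <{name}> is never closed")
        self.stack.clear()


def check_well_formed(html: str) -> List[str]:
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

In a Django test this could be used as self.assertEqual(check_well_formed(response.content.decode()), []), which runs in milliseconds and sends nothing over the network.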
