domstripper - An lxml.html test project

November 20, 2008
1 comment Python

I'm just playing with the impressive lxml.html package. It makes it possible to easily work with HTML trees and manipulate them.

I had this crazy idea of a "DOM stripper" that removes all but the specified elements from an HTML file. For example, you might want to keep the contents of the <head> tag intact but keep only the <div id="content">...</div> tag, thus omitting <div id="banner">...</div> and <div id="nav">...</div>. domstripper now does that. It can be used, for example, as a naive proxy that transforms a bloated HTML page into a smaller, stripped-down version suitable for, say, mobile web browsers. It's more a proof of concept than anything else.
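To make the idea concrete, here is a rough standard-library sketch of the stripping step. It is not how domstripper is implemented: it assumes well-formed markup, uses xml.etree.ElementTree instead of lxml.html, and matches elements by id rather than by CSS selector. The function name strip_body is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def strip_body(markup, keep_ids):
    """Keep <head> as it is and reduce <body> to the elements whose id is in keep_ids.

    Hypothetical helper: assumes well-formed markup and that the kept
    elements are not nested inside each other.
    """
    root = ET.fromstring(markup)
    body = root.find("body")
    # Collect the elements to keep before the body is emptied.
    kept = [el for el in body.iter() if el.get("id") in keep_ids]
    for child in list(body):
        body.remove(child)
    body.text = None
    for el in kept:
        el.tail = None  # drop text that followed the element in its old position
        body.append(el)
    return ET.tostring(root, encoding="unicode")


page = (
    "<html><head><title>T</title></head><body>"
    '<div id="banner">ads</div><div id="nav">links</div>'
    '<div id="content"><p>Hello</p></div>'
    "</body></html>"
)
stripped = strip_body(page, {"content"})
```

Here stripped still contains the untouched <head> and the content div, while the banner and nav divs are gone.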

To test it you just need a virtual Python environment and the system libraries needed to install lxml. This worked for me:


$ sudo apt-get install cython libxslt1-dev zlib1g-dev libxml2-dev
$ cd /tmp
$ virtualenv --no-site-packages testenv
$ cd testenv
$ source bin/activate
$ easy_install domstripper

Now you can use it like this:


>>> from domstripper import domstripper
>>> help(domstripper)
...
>>> domstripper('bloat.html', ['#content', 'h1.header'])
<!DOCTYPE...
...

Best to just play with it and see if it makes sense. I'm not saying this is an amazing package, but it goes to show what can be done with lxml.html and its extremely user-friendly CSS selectors.

How to unit test the innards of a Django view function

November 15, 2008
3 comments Django

Seconds ago I got this running and haven't yet fully understood what I've just done but the results are exactly what I need and this is going to be great.

Basically, in Django you have views like this:


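from django.contrib.auth.decorators import login_required
from django.shortcuts import render_to_response
from django.template import RequestContext

def _render(template, data, request):
    # Render with a RequestContext so context processors are applied
    return render_to_response(template, data,
                              context_instance=RequestContext(request))

@login_required
def club_page(request, year):
    variable1 = year / 4
    variable2 = variable1 * 2
    # locals() hands request, year, variable1 and variable2 to the template
    return _render("foo.html", locals(), request)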

Now in my unit tests I don't want to have to call the view function and then have to dissect the resulting HTML just to figure out if the view function prepared the correct variables. So, here's my solution to this problem:

Truncated! Read the rest by clicking the link below.

Comic-Con 2008 photos

October 29, 2008
0 comments Misc. links

I'm not a comic book kinda guy at all and I don't know who half of these costumes are supposed to be, but it's nevertheless very impressive. Looking through the thumbnails you spot a couple of people who seem to take it very seriously, but a large majority seem to just enjoy themselves. Just goes to show that people always like to dress up.

Django vs. Java

October 25, 2008
1 comment Django

From the django-users mailing list, which I'm becoming more and more helpful on:


> Could you share approximately how big your project is? I know it's
> hard to find a real measure for this, but how about number of database
> tables?

A project I worked on over the summer used a Database that was 130
tables, and getting 1gb updates every 2 minutes. I was witting a new
web app to do calculations on the data and the company wanted to use
Java since thats what they knew best and had spend huge amounts of
money (1 mil +) to support with Sun Servers, and such. But I knew
python and django would be a better fit for this particular app, but
the boss wouldnt listen. So we had 10 Developers working on the Java
version (Including me) and over 3 months we got it about 85% done,
though it had no unit tests. During the same three months, I worked on
my own time after work and basically had no life for the whole time, I
was able to get the web app 100% complete with unit tests. That
convinced my boss that Django was a good fit.

The site is an internal app that I cannot give access to (And I
actually had to get permission to give what info I have), but I can
say that Django is a suitable framework for what you are looking for. 

Christ! 10 developers and no unit tests!? Someone should remind them that you don't write unit tests for your boss's pleasure but for your own sanity and productivity.

I know that this quote is totally unscientific since Dj, as he says, can't back it up, but it's an interesting enough story.

Flash advert hell

October 24, 2008
1 comment Web development

I actually don't mind a few adverts on websites but sometimes it just gets too much. On this page I couldn't even read the text because the huge ad covering it scrolls along with the page as I scroll down.

It feels greedy, as if they've just thrown in more and more ads without thinking about the original design.

World Plone Day here in London, England

October 21, 2008
0 comments Plone

On the 7th of November my company, Fry-IT, is hosting the London arm of the World Plone Day in our office. It's basically an event for Plone developers, Plone companies and other people interested in Plone to meet up and promote, learn or share something about Plone. To quote the "FAQ":

"What is the goal of the World Plone Day?
The World Plone Day (WPD) is a worldwide event. Our goal is to promote and educate the worldwide public about of the benefits of using Plone in education, government, ngos, and in business."

The "invite" is here. I'm really looking forward to meeting everybody and putting faces to the people whose blogs and mailing list posts I read. My colleague Lukasz has a nice map to our office and I recommend that you contact him to say you're coming.

Why bother with MySQL...

October 9, 2008
2 comments Linux

...over PostgreSQL? I've just read through this document: MySQL vs PostgreSQL, and it's obvious, paragraph after paragraph, that PostgreSQL is the better database. Performance, features and community are all in PostgreSQL's favor. There is almost nothing in MySQL's favor apart from obscure things like a faster count(*) (without conditionals) and built-in replication support. In the last two weeks I've also had the great fortune of playing with full-text indexing in both MySQL and PostgreSQL and again, MySQL sucks ass and PostgreSQL (8.3) is really impressive and fast. (I've used both databases quite extensively over the past 8 years as a web developer.)

I once heard that Google uses MySQL for its user database with a custom built transaction machine. And I read that Google engineers had donated some great code to the MySQL project. But why do they bother? What do they know that other engineers don't? And why is MySQL so popular with cheap stack-em-high LAMP hosting sites?

I do understand that PostgreSQL got off to a bad start 5 years ago(ish) when it didn't support Windows, which meant that newbies had to use MySQL, and that stigma is still lingering. But that was a very long time ago.

I guess it takes a lot of convincing to switch from one technology to another once you've set your mind on something. That's why we're human. Proof of this can be found at the bottom of that page, where there's a simple little survey: despite sitting under a long article with objective, convincing arguments that PostgreSQL is better, MySQL is doing quite well. Why?

When '_properties' gets stuck as a persistent attribute

October 1, 2008
1 comment Zope

Doing some on-site consulting on an old Zope CMS that has been developed by many different developers over many years. It's pretty good and has lots of powerful features but over the years certain things have been allowed to slip. One problem was that you couldn't click the "Properties" tab. The reason was that it was trying to fetch properties that didn't exist anymore. What had happened was that the class attribute _properties (which is used by the "Properties" tab in the ZMI) had been stored as a persistent attribute. Here's how to solve that:


def manage_fixPropertiesProblem(self):
    """ fix so _properties becomes a class attribute instead """
    # Only the copy stored on the instance is removed; the class attribute stays
    if '_properties' in self.__dict__:
        del self._properties

    return "Awesome!"
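The fix relies on ordinary Python attribute lookup: an entry in the instance's __dict__ shadows the class attribute of the same name, and deleting that entry makes the class attribute visible again. A minimal sketch outside Zope, where the Document class and its property definitions are made up for illustration:

```python
class Document(object):
    # Class-level definition, shared by all instances.
    _properties = ({"id": "title", "type": "string"},)


doc = Document()

# Assigning through the instance stores a copy in doc.__dict__,
# which from then on hides the class attribute.
doc._properties = Document._properties + ({"id": "obsolete", "type": "string"},)
shadowed = "_properties" in doc.__dict__

# Deleting the instance entry falls back to the class attribute.
del doc._properties
restored = doc._properties is Document._properties
```

In a persistent object the instance entry is saved along with the rest of the object's state, which is how a stale _properties can outlive the code that created it.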