When using mutables in Python you have to be careful:


>>> a = {'value': 1}
>>> b = a
>>> a['value'] = 2
>>> b
{'value': 2}

So, you use the copy module from the standard library:


>>> import copy
>>> a = {'value': 1}
>>> b = copy.copy(a)
>>> a['value'] = 2
>>> b
{'value': 1}

That's nice but it's limited. It doesn't deal with the nested mutables as you can see here:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.copy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'else'}}

That's when you need the copy.deepcopy function:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.deepcopy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'Something'}}
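
The difference between the three cases above can also be checked with identity tests. This is just a small sketch (written for Python 3; the names alias, shallow and deep are made up for the illustration):

```python
import copy

a = {'value': {'name': 'Something'}}

alias = a                  # plain assignment: the very same outer dict
shallow = copy.copy(a)     # new outer dict, but the inner dict is shared
deep = copy.deepcopy(a)    # new outer dict and a new inner dict

print(alias is a)                       # True
print(shallow is a)                     # False
print(shallow['value'] is a['value'])   # True
print(deep['value'] is a['value'])      # False
```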

Now, suppose we have a custom class that subclasses the dict type. That's a very common thing to do. Let's demonstrate:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(name='Value')
>>> b = copy.copy(a)
>>> a['name'] = 'Other'
>>> b
{'name': 'Value'}

And again, if you have a nested mutable object you need copy.deepcopy:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

But oftentimes you'll want to make your dict subclass behave like a regular class so you can access data with dot notation. Like this:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'

Now here's a problem. If you do that, you lose the ability to use copy.deepcopy since the class has now been slightly "abused".


>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/copy.py", line 172, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
  File "<stdin>", line 3, in __getattr__
KeyError: '__deepcopy__'
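
The traceback shows the line in copy.py that trips: getattr(x, "__deepcopy__", None). A default passed to getattr() is only used when the lookup raises AttributeError, so the KeyError coming out of __getattr__ goes straight through. Here is a small sketch of just that step, written for Python 3 (the variable name escaped is made up for the illustration):

```python
class ORM(dict):
    def __getattr__(self, key):
        return self[key]    # raises KeyError when the key is missing

a = ORM(data={'name': 'Something'})

# getattr() with a default only swallows AttributeError,
# so the KeyError from __getattr__ escapes.
try:
    getattr(a, '__deepcopy__', None)
except KeyError as exc:
    escaped = exc
else:
    escaped = None

print(type(escaped).__name__)   # KeyError
```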

Hmm... now you're in trouble and to get yourself out of it you have to define a __deepcopy__ method as well. Let's just do it:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
...     def __deepcopy__(self, memo):
...         return ORM(copy.deepcopy(dict(self)))
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
>>> a.data['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

Yeah!!! Now we get what we want. Messing around with the __getattr__ like this is, as far as I know, the only time you have to go in and write your own __deepcopy__ method.
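
One thing the __deepcopy__ above does not do is use the memo argument, and it always returns a plain ORM. A slightly more careful variant might look like the sketch below. It is written for Python 3, it is only one possible way to do it, and the sample data (shared, first, second, me) is invented for the illustration:

```python
import copy

class ORM(dict):
    def __getattr__(self, key):
        return self[key]

    def __deepcopy__(self, memo):
        # type(self) rather than ORM, so subclasses copy to their own type
        result = type(self)()
        # register the copy before recursing, so shared and circular
        # references resolve to the same new object
        memo[id(self)] = result
        for key, value in self.items():
            result[copy.deepcopy(key, memo)] = copy.deepcopy(value, memo)
        return result

shared = {'name': 'Something'}
a = ORM(first=shared, second=shared)
a['me'] = a                     # a reference back to itself
b = copy.deepcopy(a)

print(b.first is b.second)      # True
print(b.first is shared)        # False
print(b.me is b)                # True
```

Passing memo along keeps the two references to the same inner dict pointing at one copy, and stops the self-reference from recursing forever.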

I'm sure hardcore Python language experts can point out lots of intricacies about __deepcopy__ but since I only learned about this today, having it here might help someone else too.

Comments

Marius Gedminas

There are many `__special__` names that cause problems of this kind when you accidentally provide them by overriding `__getattr__`. I've found it best to always do

    def __getattr__(self, name):
        if name.startswith('__'):
            raise AttributeError(name)
        ...

Actually, your custom getattr raises KeyError for missing attributes, which is a strange thing to get from an expression that looks like `obj.attrname`. I would suggest catching KeyError and raising AttributeError. This would actually have avoided your original `__deepcopy__` error too.

(Note: I've replaced initial spaces with non-breaking spaces so this blog won't mangle the indentation. If you actually copy & paste this code, expect interesting SyntaxErrors ;)
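
The second suggestion in this comment, catching KeyError and raising AttributeError, could look roughly like the following. It is a sketch written for Python 3, not necessarily what the commenter had in mind:

```python
import copy

class ORM(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # attribute access should fail with AttributeError
            raise AttributeError(key)

a = ORM(data={'name': 'Something'})
b = copy.deepcopy(a)            # works without a custom __deepcopy__
a.data['name'] = 'else'

print(b)                        # {'data': {'name': 'Something'}}
print(type(b).__name__)         # ORM
print(hasattr(a, 'missing'))    # False
```

Because hasattr() and getattr() with a default both rely on AttributeError, they start behaving normally again too.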

Peter Bengtsson

Excellent! Thanks!

Anonymous

The reason you need to add __deepcopy__ is that your __getattr__ is buggy. Try changing it to this instead:

def __getattr__(self, key):
    if key in self:
        return self[key]
    raise AttributeError(key)

Chris Arndt

You could have just changed the __getattr__ method to look up names starting with two underscores as attributes instead of as dict keys. Shadowing the names of all the special methods which start and end with two underscores is probably a bad idea.

I would have posted example code but I can't figure out how to prevent the comment form submission from stripping all the whitespace at the start of the line. :(

Patryk Zawadzki

How about not breaking all the magic methods in the first place?

class Foo(dict):
    def __getattr__(self, attr):
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self[attr]

Peter Bengtsson

Thanks! I like that.
This is why I keep blogging and expose my weaknesses because people like you come in and nudge me in the right direction.

K Lars Lohn

The class that originally inspired this posting was the configman/socorro DotDict class. It is a class meant to be a derivative of dict. It was originally written like this:

    class DotDict(dict):
        __getattr__ = dict.__getitem__
        __setattr__ = dict.__setitem__
        __delattr__ = dict.__delitem__

This gave the result of a mapping that happened to have a convenient dot notation for accessing the values. Essentially, it hijacks the attribute notation for use in accessing items. This is fraught with peril as the deep copy conundrum demonstrates. Overriding the __getattr__ and __setattr__ functions always seems to cause trouble and confusion.

While the original implementation was expedient, I suggest that we may want to look at the problem from the other direction. Let's override the __getitem__ and __setitem__ methods instead. That seems to be less perilous, if a bit more verbose:

    import collections  # Python 2; on Python 3 the ABCs live in collections.abc

    class DotDict(collections.MutableMapping):
        def __init__(self, initializer=None):
            if isinstance(initializer, collections.Mapping):
                self.__dict__.update(initializer)
            elif initializer is not None:
                raise TypeError('can only initialize with a Mapping')
        def __getitem__(self, key):
            return self.__dict__[key]
        def __setitem__(self, key, value):
            self.__dict__[key] = value
        def __delitem__(self, key):
            del self.__dict__[key]
        def __iter__(self):
            # a Mapping's __iter__ must yield keys, not (key, value) pairs
            return (k for k in self.__dict__
                    if not k.startswith('_'))
        def __len__(self):
            return len(self.__dict__)

With this implementation, there is no interference with other magic methods. Deep copy works fine without having to override it with a local implementation.
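
For anyone trying this on Python 3, here is a rough adaptation of the class above, with a check that deep copy really works without a custom __deepcopy__. The adaptation is an assumption on my part: it imports the ABCs from collections.abc, and it leaves out the underscore filter in __iter__ (which a later comment withdraws anyway) so that iteration and len() agree:

```python
import copy
from collections.abc import Mapping, MutableMapping

class DotDict(MutableMapping):
    def __init__(self, initializer=None):
        if isinstance(initializer, Mapping):
            self.__dict__.update(initializer)
        elif initializer is not None:
            raise TypeError('can only initialize with a Mapping')

    def __getitem__(self, key):
        return self.__dict__[key]

    def __setitem__(self, key, value):
        self.__dict__[key] = value

    def __delitem__(self, key):
        del self.__dict__[key]

    def __iter__(self):
        return iter(self.__dict__)    # yields the keys

    def __len__(self):
        return len(self.__dict__)

a = DotDict({'data': {'name': 'Something'}})
b = copy.deepcopy(a)
a.data['name'] = 'else'

print(b.data)       # {'name': 'Something'}
print(b['data'])    # {'name': 'Something'}
print(dict(a))      # {'data': {'name': 'else'}}
```

One caveat with storing items in __dict__: a key such as 'items' or 'keys' would shadow the mapping method of the same name on that instance.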

This implementation will raise an AttributeError rather than a KeyError if a key is not found. This is somewhat of an irreconcilable difference. For use in configman/socorro, we're more interested in following the collections.Mapping interface, so we ought to have a KeyError rather than an AttributeError. Adding this method ought to fix that up:

        def __getattr__(self, key):
            try:
                return super(DotDict, self).__getattr__(key)
            except AttributeError:
                if not key.startswith('_'):
                    raise KeyError(key)
                raise

K Lars Lohn

pastebin of my code from the previous posting:

http://lars.pastebin.mozilla.org/1526920

K Lars Lohn

ok, I withdraw that __getattr__ method. It superficially gave me the result that I wanted, but for the wrong reason. super(DotDict) doesn't have a __getattr__, so an AttributeError is raised.

__getattr__ is only called when the normal mechanism for finding an attribute has failed. So this is already the degenerate case. We ought to just be raising the KeyError in there without trying to first call some theoretical super implementation of __getattr__.

As an aside, the whole idea at this point that things beginning with '_' are to be treated as special is suspect. By the time we get to a call of __getattr__ magic methods have already been resolved. So in your mind, delete my special handling of '_' in a key name.
