Wednesday 28 May 2008

Migrating models

There has been some discussion on the App Engine list recently about the best way to handle data migration when your model changes. I thought I'd put together a short summary of some of the options open to you when you add a field, or rename, remove or change the type of an existing one.

For the examples here, I'm going to start with a simple model and gradually change the class, so that at each stage we can see how the datastore handles the change.


>>> from google.appengine.ext import db
>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = db.IntegerProperty()
...     height = db.IntegerProperty()


We also want some test data. I think one name should be enough here.
We'll add it to the data store and then read it back to check it's
there:



>>> e = Entity(first_name='Abraham', last_name='Lincoln', city='Washington', height=76, birth_year=1865).put()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.city
Abraham Washington



Adding a field


Adding a new field is easy enough: when an existing entity is loaded, the new field is set to its default value (or None if there is no default):



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, getattr(e, 'version', 'not set')
Abraham 2
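The defaulting rule can be sketched in plain Python. This is not the App Engine API: a stored entity is modelled as a dict, and `SCHEMA` and `load()` are hypothetical names used only for illustration. Any property the model declares but the stored record lacks is filled in from its default, or None.

```python
# Hypothetical schema: property name -> default value (None if no default).
SCHEMA = {
    "version": 2,
    "first_name": None,
    "last_name": None,
    "city": None,
    "birth_year": None,
    "height": None,
}

def load(stored):
    """Build an in-memory entity from a stored record (a plain dict).

    Properties missing from the stored record get the schema default.
    """
    return {name: stored.get(name, default) for name, default in SCHEMA.items()}

# A record saved before 'version' was added to the model.
old_record = {"first_name": "Abraham", "last_name": "Lincoln",
              "city": "Washington", "birth_year": 1865, "height": 76}
entity = load(old_record)
print(entity["first_name"], entity["version"])  # Abraham 2
```

Note that the stored record itself is untouched: the default only exists in memory until the entity is written back.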


Watch out, though: until you put() an entity back into the datastore, the new field isn't stored for it, so queries on that field won't find it:



>>> print [e.first_name for e in Entity.all().filter('version =', 2)]
[]
>>> for e in Entity.all(): key = e.put()
>>> print [e.first_name for e in Entity.all().filter('version =', 2)]
[u'Abraham']
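Here is a toy model of why the first query comes back empty. It is a pure-Python sketch, assuming a "datastore" that is just a list of dicts and a query that matches only values actually stored; `store`, `DEFAULTS`, `load()` and `query()` are all hypothetical.

```python
# Toy datastore: stored records are plain dicts (hypothetical stand-in).
store = [{"first_name": "Abraham", "last_name": "Lincoln", "height": 76}]

# Hypothetical: the property just added to the model, with its default.
DEFAULTS = {"version": 2}

def load(stored):
    """Return the entity as the model sees it: defaults fill any gaps."""
    entity = dict(DEFAULTS)
    entity.update(stored)
    return entity

def query(name, value):
    """Match on stored values only, the way an index lookup would."""
    return [r["first_name"] for r in store if name in r and r[name] == value]

print(load(store[0])["version"])  # 2: the model sees the default...
print(query("version", 2))        # []: ...but nothing is stored yet

# "Put" every entity back, which writes the default out to storage.
store[:] = [load(r) for r in store]
print(query("version", 2))        # ['Abraham']
```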




Renaming a field


How about renaming a field? If you just change the name in the class, there is of course no relationship between the original and the new field, so the new one comes back empty:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.yob
Abraham None


To rename a property (in this example birth_year becomes yob), create
a property with the new name and turn the original into a
DummyProperty: one that always reads as None and never writes anything
back. Then a bit of fixup code in the class __init__ can handle the
migration:



>>> class DummyProperty(db.StringProperty):
...     """A property whose only value is None and nothing
...     gets saved to the data store"""
...     def validate(self, value):
...         return None
...     def get_value_for_datastore(self, model_instance):
...         return []


The actual renaming is done by accepting both the old and the new names
as keyword arguments to the initialiser and then doing any required
conversion:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = DummyProperty()  # Old field
...     yob = db.IntegerProperty()  # New field
...     height = db.IntegerProperty()
...     def __init__(self, parent=None, key=None, _app=None,
...                  birth_year=None, yob=None, **kwds):
...         # Entities saved before the rename arrive with birth_year set:
...         # pass that value on as yob, otherwise keep whatever yob was given.
...         super(Entity, self).__init__(parent, key, _app,
...             yob=birth_year if birth_year is not None else yob,
...             **kwds)
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.yob, e.birth_year
Abraham 1865 None
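Stripped of the datastore, the initialiser trick is just keyword remapping. The sketch below is plain Python with a hypothetical `Entity` class (not a `db.Model`) and a hypothetical `to_stored()` method standing in for what a put() would write; it shows the old name being accepted, exposed under the new name, and dropped on save.

```python
class Entity(object):
    """Hypothetical plain-Python stand-in for the model above.

    Accepts the old keyword 'birth_year' and maps it onto the new 'yob'.
    """

    def __init__(self, first_name=None, yob=None, birth_year=None):
        self.first_name = first_name
        # Prefer the old name if it was supplied (data saved before the rename).
        self.yob = birth_year if birth_year is not None else yob
        # Mirror the DummyProperty: the old name always reads as None.
        self.birth_year = None

    def to_stored(self):
        """What would be written back: only the new name is kept."""
        return {"first_name": self.first_name, "yob": self.yob}

old_record = {"first_name": "Abraham", "birth_year": 1865}
e = Entity(**old_record)
print(e.first_name, e.yob, e.birth_year)  # Abraham 1865 None
```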


To update the datastore, just grab a batch of entities which still have
the old field (ordering a query by a property works as an existence
test, because entities without that property don't appear in the
results) and put each one, repeating until nothing is left. Once that's
done, ordering by the old field finds nothing, while the values are now
stored, and queryable, under the new name:



>>> len([ e.put() for e in Entity.all().order('birth_year').fetch(10)])
1
>>> len([ e.put() for e in Entity.all().order('birth_year').fetch(10)])
0
>>> [ e.first_name for e in Entity.all().order('yob').fetch(10)]
[u'Abraham']
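The fetch-and-put loop can be written out as a small driver. This is a pure-Python sketch, assuming a list of dicts in place of the datastore; `fetch_with()` is a hypothetical stand-in for `Entity.all().order(field).fetch(limit)` and `migrate()` for what a put() through the renamed model does. The sample records are made up.

```python
# Toy datastore: stored records are plain dicts (hypothetical data).
store = [
    {"first_name": "Abraham", "birth_year": 1865},
    {"first_name": "Mary", "yob": 1818},  # already uses the new name
]

def fetch_with(field, limit):
    """Up to `limit` records that still store `field`.

    Stands in for ordering by the field: records without the
    property don't show up in a query ordered by it.
    """
    return [r for r in store if field in r][:limit]

def migrate(record):
    """Move birth_year to yob, as a put() through the renamed model would."""
    old = record.pop("birth_year")
    if old is not None:
        record["yob"] = old

def run_migration(batch_size=10):
    """Migrate batch by batch until a fetch comes back empty."""
    total = 0
    while True:
        batch = fetch_with("birth_year", batch_size)
        if not batch:
            return total
        for record in batch:
            migrate(record)
        total += len(batch)

print(run_migration(batch_size=1))  # 1
```

The loop terminates only because each migrated record stops matching the fetch, which is exactly why the "existence test" query matters.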


Once the migration has been completed we can remove the DummyProperty
from the model.


If the old field isn't a property which can be sorted (TextProperty and
BlobProperty values, for example, aren't indexed) then you'll need
another way to find the entities to migrate: filtering on the version
field would work well here, but only if you know the version number is
actually set on every entity.
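The catch with the version filter follows from the earlier "adding a field" example: an entity with no stored value for a property doesn't match a query filtered on it. The pure-Python sketch below models that rule with a list of dicts; `needs_migration()` is a hypothetical stand-in for `filter('version <', n)` and the records are made up.

```python
store = [
    {"first_name": "Abraham", "version": 2},
    {"first_name": "Mary", "version": 3},
    {"first_name": "Edwin"},  # saved before 'version' existed: nothing stored
]

def needs_migration(target_version):
    """Mimic a filter('version <', target_version) over stored values.

    Records with no stored 'version' never match the filter.
    """
    return [r["first_name"] for r in store
            if "version" in r and r["version"] < target_version]

print(needs_migration(3))  # ['Abraham']: 'Edwin' is silently skipped
```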




Removing a field


Removing a field is not a problem: the extra data in the datastore is
simply ignored when entities are loaded:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, getattr(e, 'city', 'not set')
Abraham not set
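"Ignored" is not the same as "deleted". In this pure-Python sketch (a dict stands in for the stored entity; `FIELDS` and `load()` are hypothetical), the model simply never looks at `city`, but the value stays in storage until the entity is written back.

```python
# Hypothetical current schema: 'city' has been removed from the model.
FIELDS = ("version", "first_name", "last_name", "yob", "height")

def load(stored):
    """Keep only properties the model still declares; anything else is ignored."""
    return {name: stored.get(name) for name in FIELDS}

stored = {"version": 2, "first_name": "Abraham", "last_name": "Lincoln",
          "city": "Washington", "yob": 1865, "height": 76}
entity = load(stored)
print("city" in entity)  # False: the model never sees it
print("city" in stored)  # True: the data is still in storage
```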


Alternatively we can make it into a dummy property and use the same
trick as before to make sure the content is actually removed:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = DummyProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> len([ e.put() for e in Entity.all().order('city').fetch(10)])
1
>>> len([ e.put() for e in Entity.all().order('city').fetch(10)])
0
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.city
Abraham None




Changing field type


If we change the type of a property and don't handle the existing data,
loading an old entity raises an exception:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.FloatProperty()
...
>>> [(e.first_name, e.height) for e in Entity.all()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BadValueError: Property height must be a float


Fortunately that can be fixed using __init__ in the same way as a
rename. This time (for variety) I decided to use the version number
to force the migration:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=3)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.FloatProperty()
...     def __init__(self, parent=None, key=None, _app=None,
...                  height=None, version=None, **kwds):
...         # Old entities stored height as an int: convert it to a float.
...         if height is not None:
...             height = float(height)
...         # Whatever version was stored, this entity is now version 3.
...         super(Entity, self).__init__(parent, key, _app,
...             height=height, version=3,
...             **kwds)
>>> len([ e.put() for e in Entity.all().filter('version <', 3).fetch(10)])
1
>>> len([ e.put() for e in Entity.all().filter('version <', 3).fetch(10)])
0
>>> [(e.first_name, e.height) for e in Entity.all()]
[(u'Abraham', 76.0)]
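The same conversion, minus the datastore, looks like this. It is a pure-Python sketch assuming records are dicts; `CURRENT_VERSION` and `migrate()` are hypothetical names, and the list comprehension plays the part of the `filter('version <', 3)` fetch-and-put loop.

```python
CURRENT_VERSION = 3

def migrate(stored):
    """Return the record as the version-3 model would write it back.

    Mirrors the __init__ above: an integer height becomes a float and
    the version is forced to the current one.
    """
    record = dict(stored)
    if record.get("height") is not None:
        record["height"] = float(record["height"])
    record["version"] = CURRENT_VERSION
    return record

store = [{"first_name": "Abraham", "height": 76, "version": 2}]
store = [migrate(r) if r["version"] < CURRENT_VERSION else r for r in store]
print(store[0]["height"], store[0]["version"])  # 76.0 3
```

Bumping the version inside the conversion is what makes the second query come back empty: a migrated record no longer matches `version < 3`.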


You can also use this to fix up bad content: if something gets into the
datastore which won't validate then you can't read it back, but if you
can recognise the bad data you can check for it in __init__ and
correct it.
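As an illustration, suppose (hypothetically) some heights had been saved as strings like "76", or as zero or negative numbers. A plain-Python sketch of the check, with a made-up `fix_height()` helper and `Person` class rather than a real `db.Model`, might coerce what it can and clear what it can't:

```python
def fix_height(value):
    """Coerce a stored height to a float, or None if it can't be recovered.

    Hypothetical bad data: heights saved as strings ("76") or as
    zero/negative numbers.
    """
    if value is None:
        return None
    try:
        value = float(value)  # float() also accepts strings like " 76 "
    except (TypeError, ValueError):
        return None  # e.g. "tall": nothing sensible to recover
    return value if value > 0 else None

class Person(object):
    """Hypothetical stand-in for a model whose 'height' must be a float."""

    def __init__(self, first_name=None, height=None):
        self.first_name = first_name
        # Clean the value before anything validates it.
        self.height = fix_height(height)

print(Person("Abraham", " 76 ").height)  # 76.0
```

Whether to coerce, clear, or refuse a bad value is a judgement call for your data; clearing to None just keeps the entity readable.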

