There has been some discussion on the App Engine list recently about the best way to handle data migration when your model changes. I thought I'd put together a short summary of the options open to you when you need to add, rename, remove or change the type of an existing field.

For the examples here, I'm going to start with a simple model and gradually change the class. At each stage we can see how the data store handles the change.

>>> from google.appengine.ext import db
>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = db.IntegerProperty()
...     height = db.IntegerProperty()

We also want some test data. I think one name should be enough here. We'll add it to the data store and then read it back to check it's there:

>>> e = Entity(first_name='Abraham', last_name='Lincoln', city='Washington', height=76, birth_year=1865).put()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.city
Abraham Washington

Adding a field

Adding a new field is easy enough: existing entities get the default value when they are read (or None if there is no default):

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, getattr(e, 'version', 'not set')
Abraham 2

Watch out though: until you put an entity again, queries on the new field won't find it:

>>> print [e.first_name for e in Entity.all().filter('version =', 2)]
[]
>>> for e in Entity.all():
...     key = e.put()
...
>>> print [e.first_name for e in Entity.all().filter('version =', 2)]
[u'Abraham']

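To see why the re-put matters, here is a toy sketch in plain Python. It is not the App Engine API: `ToyStore`, `filter_eq` and `load` are hypothetical names, and the only assumption is that a query can match just those properties that were physically written with the entity, while defaults are filled in when the entity is read.

```python
# Toy model of a schemaless store. NOT the App Engine API, just an illustration.
class ToyStore:
    def __init__(self):
        self.rows = {}  # key -> dict of property values actually stored

    def put(self, key, props):
        self.rows[key] = dict(props)

    def filter_eq(self, name, value):
        # Only properties present on the stored row can match, which mimics
        # an index that has no entry for a property that was never written.
        return [k for k, row in self.rows.items()
                if name in row and row[name] == value]


def load(row, defaults):
    # Model-level defaults are applied on read; they are not yet stored.
    merged = dict(defaults)
    merged.update(row)
    return merged


store = ToyStore()
store.put("lincoln", {"first_name": "Abraham", "last_name": "Lincoln"})
defaults = {"version": 2}

before = store.filter_eq("version", 2)          # nothing written yet
entity = load(store.rows["lincoln"], defaults)  # default appears on read
store.put("lincoln", entity)                    # re-put writes the default
after = store.filter_eq("version", 2)           # now the query matches
```

The entity looks complete as soon as it is loaded, but the query only starts matching once the loaded copy has been written back.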
Renaming a field

How about renaming a field? If you simply change the name, there is of course no relationship between the original and the new field value:

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.yob
Abraham None

To rename a property (in this example birth_year becomes yob), create a property with the new name and make the original into a DummyProperty. Then a bit of fix-up code in the class __init__ can handle the migration:

>>> class DummyProperty(db.StringProperty):
...     """A property whose only value is None and nothing
...     gets saved to the data store"""
...     def validate(self, value):
...         return None
...     def get_value_for_datastore(self, model_instance):
...         return []

The actual renaming is done by picking up the old and new names as keyword arguments to the initialiser and then doing any required conversion:

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = DummyProperty()  # Old field
...     yob = db.IntegerProperty()    # New field
...     height = db.IntegerProperty()
...     def __init__(self, parent=None, key=None, _app=None,
...                  birth_year=None, yob=None, **kwds):
...         super(Entity, self).__init__(parent, key, _app,
...             yob=birth_year if birth_year is not None else yob,
...             **kwds)
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.yob, e.birth_year
Abraham 1865 None

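The keyword remapping can be tried out without App Engine at all. The sketch below is plain Python with a hypothetical `Base` class standing in for the model base class; it assumes, as the example above implies, that stored values reach the initialiser as keyword arguments.

```python
class Base:
    """Hypothetical stand-in for a model base class that receives
    stored values as keyword arguments."""
    def __init__(self, **kwds):
        for name, value in kwds.items():
            setattr(self, name, value)


class Person(Base):
    def __init__(self, birth_year=None, yob=None, **kwds):
        # Prefer the old value when it is present, so entities that have
        # not been migrated yet still show their data under the new name.
        if birth_year is not None:
            yob = birth_year
        # The old name is always blanked, mirroring the DummyProperty idea.
        super().__init__(yob=yob, birth_year=None, **kwds)


old_style = Person(first_name="Abraham", birth_year=1865)  # unmigrated data
new_style = Person(first_name="Abraham", yob=1865)         # migrated data
```

Both objects end up with `yob` set to 1865 and `birth_year` set to None, so the rest of the code only ever has to deal with the new name.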
To update the datastore just grab a bunch of objects which have the old field (sorting the query works as an existence test) and put each entity. Once we've done that the old field drops out of the index, so later queries on it return nothing, and the value becomes queryable under the new attribute name:

>>> len([ e.put() for e in Entity.all().order('birth_year').fetch(10)])
1
>>> len([ e.put() for e in Entity.all().order('birth_year').fetch(10)])
0
>>> [ e.first_name for e in Entity.all().order('yob').fetch(10)]
[u'Abraham']

Once the migration has been completed we can remove the DummyProperty from the model. If the old field isn't a property which supports sorting then you'll need to find another way to do the migration: using the version field would be good here, but only as long as you know the version number is actually set in all the data.
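With more than a handful of entities, the fetch-and-put step has to be repeated until the query comes back empty. A minimal sketch of that loop in plain Python follows; `migrate_in_batches`, `fetch_old` and `save_renamed` are hypothetical helpers working on an in-memory list, not datastore calls.

```python
def migrate_in_batches(fetch_batch, save, batch_size=10):
    """Re-save entities batch by batch until the query returns nothing.

    Returns the number of entities processed.
    """
    total = 0
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            return total
        for entity in batch:
            save(entity)
        total += len(batch)


# Toy in-memory data standing in for stored entities.
rows = [{"name": "person%d" % i, "birth_year": 1800 + i} for i in range(25)]


def fetch_old(limit):
    # Stand-in for a query ordered by the old field: only rows that still
    # carry it are returned.
    return [r for r in rows if "birth_year" in r][:limit]


def save_renamed(row):
    # Stand-in for put(): the old field is dropped and the new one written.
    row["yob"] = row.pop("birth_year")


migrated = migrate_in_batches(fetch_old, save_renamed)
```

The loop only terminates because saving an entity removes it from the result set of the next fetch. If the save step left the old field in place, the same batch would come back forever.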
Removing a field

Removing a field is not a problem: the extra data in the datastore is simply ignored:

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, getattr(e, 'city', 'not set')
Abraham not set

Alternatively we can make it into a dummy property and use the same trick as before to make sure the content is actually removed:

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = DummyProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> len([ e.put() for e in Entity.all().order('city').fetch(10)])
1
>>> len([ e.put() for e in Entity.all().order('city').fetch(10)])
0
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.city
Abraham None

Changing field type

If we change the type of a property and don't try to handle it, then we'll get an exception when the old data is read:

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.FloatProperty()
...
>>> [(e.first_name, e.height) for e in Entity.all()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BadValueError: Property height must be a float

Fortunately that can be fixed using __init__ in the same way as a rename. This time (for variety) I decided to use the version number to force the migration:

>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=3)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.FloatProperty()
...     def __init__(self, parent=None, key=None, _app=None,
...                  height=None, version=None, **kwds):
...         if height is not None: height = float(height)
...         super(Entity, self).__init__(parent, key, _app,
...             height=height, version=3,
...             **kwds)
>>> len([ e.put() for e in Entity.all().filter('version <', 3).fetch(10)])
1
>>> len([ e.put() for e in Entity.all().filter('version <', 3).fetch(10)])
0
>>> [(e.first_name, e.height) for e in Entity.all()]
[(u'Abraham', 76.0)]

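The conversion and the version bump can be separated from the model and tested on their own. This is a plain-Python sketch using dictionaries in place of entities; `upgrade` and `needs_upgrade` are hypothetical helpers, and `needs_upgrade` assumes that a record with no stored version is not matched by a "version < 3" test.

```python
CURRENT_VERSION = 3


def upgrade(stored):
    """Return a copy of a stored record converted to the current schema."""
    record = dict(stored)
    height = record.get("height")
    if height is not None:
        record["height"] = float(height)  # integer heights become floats
    record["version"] = CURRENT_VERSION   # stamp the record as migrated
    return record


def needs_upgrade(stored):
    # A record without any version is NOT selected here, which is why the
    # version approach only works if the version is set on all the data.
    return "version" in stored and stored["version"] < CURRENT_VERSION


old = {"first_name": "Abraham", "height": 76, "version": 2}
new = upgrade(old)
```

Because `upgrade` stamps the current version on everything it touches, running the selection again after the migration finds nothing left to do.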
You can also use this to fix up bad content: if something gets into the datastore which won't validate then you can't read it, but if you can recognise the bad data you can check for it in __init__ and correct it.
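As a sketch of what such a check might look like, suppose some years had been stored as strings in a field that should hold integers. That scenario and the `clean_year` helper are assumptions made for illustration; the function could be called on the incoming keyword value in __init__ before it is handed to the base class.

```python
def clean_year(value):
    """Coerce a recognisably bad year value to an int.

    Values that cannot be repaired become None rather than raising.
    """
    if value is None or isinstance(value, int):
        return value                  # already fine
    if isinstance(value, str) and value.strip().isdigit():
        return int(value.strip())     # e.g. "1865" stored as text
    return None                       # unrecognised: drop it


cleaned = [clean_year(v) for v in (1865, "1865", " 1865 ", "n/a", None)]
```

Whether an unrepairable value should become None or be reported somewhere depends on the data; dropping it silently is just the simplest choice for the sketch.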