Wednesday 28 May 2008

Migrating models

There has been some discussion on the App Engine list recently about the best way to handle data migration when your model changes. I thought I'd put together a short summary of the options for adding a field, or for renaming, removing or changing the type of an existing one.

For the examples here, I'm going to start with a simple model example and gradually change the class. At each stage we can see how the data store handles the change.


>>> from google.appengine.ext import db
>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = db.IntegerProperty()
...     height = db.IntegerProperty()


We also want some test data. I think one name should be enough here.
We'll add it to the data store and then read it back to check it's
there:



>>> e = Entity(first_name='Abraham', last_name='Lincoln', city='Washington', height=76, birth_year=1865).put()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.city
Abraham Washington



Adding a field


Adding a new field is easy enough: existing entities get it set to the
default value (or None if there is no default):



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, getattr(e, 'version', 'not set')
Abraham 2


Watch out though: until you put an entity again, the new field isn't
stored with it, so queries that filter on the new field won't find it:



>>> print [e.first_name for e in Entity.all().filter('version =', 2)]
[]
>>> for e in Entity.all(): key = e.put()
>>> print [e.first_name for e in Entity.all().filter('version =', 2)]
[u'Abraham']
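
To picture why the query comes back empty at first, here is a toy model in plain Python 3 (not the Python 2 App Engine runtime used in this post). TinyStore and its methods are invented for illustration and are not how the datastore is implemented; the only point is that index rows get written when an entity is put, so an entity saved before the field existed has no row for it:

```python
class TinyStore(object):
    """Toy store: index rows are written only when an entity is put."""

    def __init__(self):
        self.entities = {}
        self.index = {}  # (property name, value) -> set of entity keys

    def put(self, key, props):
        # Drop index rows left over from the previous write of this key
        for keys in self.index.values():
            keys.discard(key)
        self.entities[key] = dict(props)
        for name, value in props.items():
            self.index.setdefault((name, value), set()).add(key)

    def query(self, name, value):
        # Only entities with an index row for (name, value) can match
        return sorted(self.index.get((name, value), set()))


store = TinyStore()
store.put('lincoln', {'first_name': 'Abraham'})  # saved before 'version' existed
before = store.query('version', 2)

# Re-put with the new default applied, as the put() loop above does
store.put('lincoln', dict(store.entities['lincoln'], version=2))
after = store.query('version', 2)
print(before, after)
```

The first lookup finds nothing and the second finds the entity, which matches the behaviour in the session above.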




Renaming a field


How about renaming a field? There is of course no relationship between
the original and the new field value:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.yob
Abraham None


To rename a property (in this example birth_year becomes yob), create
a property with the new name and make the original into a
DummyProperty. Then a bit of fixup code in the class __init__ can
handle the migration:



>>> class DummyProperty(db.StringProperty):
...     """A property whose only value is None and nothing
...     gets saved to the data store"""
...     def validate(self, value):
...         return None
...     def get_value_for_datastore(self, model_instance):
...         return []


The actual renaming is done by picking up the old and new names as
keyword arguments to the initialiser and then doing any required
conversion:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = db.StringProperty()
...     birth_year = DummyProperty() # Old field
...     yob = db.IntegerProperty() # New field
...     height = db.IntegerProperty()
...     def __init__(self, parent=None, key=None, _app=None,
...                  birth_year=None, yob=None, **kwds):
...         super(Entity,self).__init__(parent, key, _app,
...             yob = birth_year if birth_year is not None else yob,
...             **kwds)
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.yob, e.birth_year
Abraham 1865 None
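
Stripped of the datastore, the keyword handling in Entity.__init__ above comes down to the following. This is a minimal sketch in plain Python 3 rather than the Python 2 runtime used here, and Record is a made-up stand-in class, not anything from App Engine:

```python
class Record(object):
    """Plain-Python stand-in for a model; not an App Engine class."""

    def __init__(self, birth_year=None, yob=None, **fields):
        # Data stored under the old name wins, so nothing already saved is
        # lost; otherwise fall back to whatever arrived under the new name.
        self.yob = birth_year if birth_year is not None else yob
        self.birth_year = None  # the old name survives but is always empty
        self.fields = fields


old_style = Record(first_name='Abraham', birth_year=1865)  # as stored before
new_style = Record(first_name='Abraham', yob=1865)         # as stored after
print(old_style.yob, old_style.birth_year)
print(new_style.yob, new_style.birth_year)
```

Either way the value ends up under the new name, so the rest of the code only ever needs to look at yob.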


To update the datastore just grab a bunch of objects which have the
old field (sorting the query works as an existence test) and put each
entity. Once we've done that the field disappears from the index in future
queries, but it does become accessible using the new attribute name:



>>> len([ e.put() for e in Entity.all().order('birth_year').fetch(10)])
1
>>> len([ e.put() for e in Entity.all().order('birth_year').fetch(10)])
0
>>> [ e.first_name for e in Entity.all().order('yob').fetch(10)]
[u'Abraham']


Once the migration has been completed we can remove the DummyProperty
from the model.


If the old field isn't a property which supports sorting then you'll
need to find another way to do the migration: using the version field
would be good here, but only as long as you know the version number is
actually set in all the data.
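
A version-driven migration might look roughly like this. It is a sketch in plain Python 3 with an in-memory list standing in for the datastore; the records other than Abraham are made up, and needs_migration and migrate_batch are hypothetical helper names. Note that the sketch treats a missing version as 0 in Python, whereas in the real datastore an entity with no stored version would not match a filter on version at all, which is exactly the caveat above:

```python
CURRENT_VERSION = 2

# Hypothetical stand-in for stored entities
store = [
    {'first_name': 'Abraham', 'birth_year': 1865, 'version': 1},
    {'first_name': 'Sample', 'birth_year': 1900},   # version never written
    {'first_name': 'Other', 'yob': 1950, 'version': 2},
]


def needs_migration(entity):
    # A missing version counts as "older than everything"
    return entity.get('version', 0) < CURRENT_VERSION


def migrate_batch(entities, batch_size=10):
    """Upgrade up to batch_size stale entities; return how many changed."""
    batch = [e for e in entities if needs_migration(e)][:batch_size]
    for entity in batch:
        if 'birth_year' in entity:
            entity['yob'] = entity.pop('birth_year')
        entity['version'] = CURRENT_VERSION
    return len(batch)


first_pass = migrate_batch(store)
second_pass = migrate_batch(store)
print(first_pass, second_pass)
```

As with the sorted query, you keep running batches until one comes back empty.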




Removing a field


Removing a field is not a problem, the extra data in the database is
simply ignored:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, getattr(e, 'city', 'not set')
Abraham not set


Alternatively we can make it into a dummy property and use the same
trick as before to make sure the content is actually removed:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     city = DummyProperty()
...     yob = db.IntegerProperty()
...     height = db.IntegerProperty()
>>> len([ e.put() for e in Entity.all().order('city').fetch(10)])
1
>>> len([ e.put() for e in Entity.all().order('city').fetch(10)])
0
>>> e = Entity.all().filter('last_name =', 'Lincoln')[0]
>>> print e.first_name, e.city
Abraham None




Changing field type


If we need to change the type of a property and don't try to handle
the existing data, then we'll get an exception thrown when we read it:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=2)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.FloatProperty()
...
>>> [(e.first_name, e.height) for e in Entity.all()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BadValueError: Property height must be a float


Fortunately that can be fixed using __init__ in the same way as a
rename. This time (for variety) I decided to use the version number
to force the migration:



>>> class Entity(db.Model):
...     """Dummy class for testing"""
...     version = db.IntegerProperty(default=3)
...     first_name = db.StringProperty()
...     last_name = db.StringProperty()
...     yob = db.IntegerProperty()
...     height = db.FloatProperty()
...     def __init__(self, parent=None, key=None, _app=None,
...                  height=None, version=None, **kwds):
...         if height is not None: height = float(height)
...         super(Entity,self).__init__(parent, key, _app,
...             height = height, version=3,
...             **kwds)
>>> len([ e.put() for e in Entity.all().filter('version <', 3).fetch(10)])
1
>>> len([ e.put() for e in Entity.all().filter('version <', 3).fetch(10)])
0
>>> [(e.first_name, e.height) for e in Entity.all()]
[(u'Abraham', 76.0)]


You can also use this to fix up bad content: if something gets into the
datastore which won't validate then you can't read it, but if you can
recognise the bad data you can check for it in __init__ and
correct it.
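
As a rough illustration of that kind of fix-up, again in plain Python 3 with no datastore involved: clean_height and Record are hypothetical names, and the particular repairs (stripping whitespace and a trailing inch mark) are only examples of "bad data you can recognise", not anything observed in real data:

```python
def clean_height(raw):
    """Return raw as a float, or None when it cannot be repaired."""
    if raw is None:
        return None
    if isinstance(raw, str):
        # Recognisable damage: stray whitespace or an inch mark
        raw = raw.strip().replace('"', '')
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None  # unrecognised content: drop it rather than fail


class Record(object):
    """Stand-in for a model whose __init__ repairs what it is given."""

    def __init__(self, height=None, **fields):
        self.height = clean_height(height)
        self.fields = fields


print(Record(height=76).height)
print(Record(height=' 76" ').height)
print(Record(height='tall').height)
```

Whether an unrepairable value should become None or raise is a judgment call; None keeps the entity readable so it can be corrected later.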


Friday 16 May 2008

Blogging with docutils

I've always liked using reStructuredText to write documents: it seems to fit with my way of thinking, and it is really quick to produce quite a pretty but clean-looking document.

So far for blogging I've been using Windows Live Writer to compose my posts, and to get the formatting and layout I want I usually have to drop down to HTML view and hack it around a bit. For this blog, though, I'm using reST along with a slightly hacked front end so that it supports syntax highlighting (no way am I going to do multi-coloured Python code manually).

I grabbed the latest docutils from subversion. There seem to be several alternative versions of syntax highlighting in the docutils sandbox (and elsewhere on the web) but not yet a standard one. Eventually I settled on rst2html-highlight.py in the sandbox. Sadly it didn't do exactly what I wanted, so I had to modify it a bit. With the original I can write something like:

Now try printing some queries:

.. code-block:: pycon

   >>> q = Person.all()
   >>> print showQuery(q)
   Person.all()
   >>> print showQuery(q.filter("last_name =", "Smith"))
   Person.all().filter('last_name =', 'Smith')

If we have more than one filter, showQuery will always output them in


Which gives me:




Now try printing some queries:



>>> q = Person.all()
>>> print showQuery(q)
Person.all()
>>> print showQuery(q.filter("last_name =", "Smith"))
Person.all().filter('last_name =', 'Smith')


If we have more than one filter, showQuery will always output them in




but I added in the ability to format an included code block:



In the code samples which follow the test class is:

.. code-block:: python
   :include: ../Person.py
   :start-after: from google.appengine.api import users

First some setup code for the tests:


which results in:




In the code samples which follow the test class is:



class Person(db.Model):
    """Dummy class for testing"""
    first_name = db.StringProperty()
    last_name = db.StringProperty()
    city = db.StringProperty()
    birth_year = db.IntegerProperty()
    height = db.IntegerProperty()


First some setup code for the tests:




Ok, I like that: I just wrote 4 slightly different includes to get those examples and absolutely no copy/paste.



The highlighting is done using pygments which has a long list of supported languages. It also claims to come with a directive for docutils, but so far as I can see that isn't included in the egg so for now I'll stick with the one in docutils sandbox (of course if I installed docutils from an egg then I wouldn't have that one either).



Once I've formatted the document I cut and paste it into Windows Live Writer and upload it to the blog. I updated my blog skin to include the appropriate CSS, so any changes to formatting will apply consistently throughout the site. Eventually I might try to shortcut the cut/paste/upload step, but for now it gives me an easy way to preview the post in the final skin.



You can find the code for all of this at http://code.google.com/p/kupuguy/source/browse/trunk/appengine-doctests

Wednesday 14 May 2008

Decoding a Query back to a string

On the App Engine Google group, Thomas Kuczek asked:

I have a query object representing a query. How can I print the resulting Gql to log it with the logger framework?

I thought this sounded like an interesting and possibly useful thing to do, so I wrote a small module which can convert either a db.Query or a db.GqlQuery object to a meaningful string.

Warning

This code depends on details of the query implementation (a lot of the required fields are private), so I can be fairly safe in saying that not only may it break in the next release of the App Engine environment, but it almost certainly will break.

In the code samples which follow the test class is:

class Person(db.Model):
    """Dummy class for testing"""
    first_name = db.StringProperty()
    last_name = db.StringProperty()
    city = db.StringProperty()
    birth_year = db.IntegerProperty()
    height = db.IntegerProperty()


First some setup code for the tests:

>>> from google.appengine.ext import db
>>> from Person import Person
>>> from showquery import showQuery, showGqlQuery


Now try printing some queries:

>>> q = Person.all()
>>> print showQuery(q)
Person.all()
>>> print showQuery(q.filter("last_name =", "Smith"))
Person.all().filter('last_name =', 'Smith')


If we have more than one filter, showQuery will always output them in sorted order rather than the order in which they were input. This is simply to make doctests easier:

>>> print showQuery(q.filter('height <', 72))
Person.all().filter('height <', 72).filter('last_name =', 'Smith')
>>> print showQuery(q.order("-height"))
Person.all().filter('height <', 72).filter('last_name =', 'Smith').order('-height')
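
The sorting trick is independent of App Engine, so it can be shown in a few lines of plain Python 3. render_filters is a hypothetical helper that takes the filters as an ordinary dict; it is not part of showquery.py:

```python
def render_filters(kind, filters):
    """Render filters in sorted key order so the text is reproducible."""
    parts = ["%s.all()" % kind]
    for key in sorted(filters):
        parts.append("filter(%r, %r)" % (key, filters[key]))
    return ".".join(parts)


# The same two filters, supplied in opposite orders
a = render_filters("Person", {"last_name =": "Smith", "height <": 72})
b = render_filters("Person", {"height <": 72, "last_name =": "Smith"})
print(a)
print(a == b)
```

Both calls produce the same string, which is what makes the output safe to paste into a doctest.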


We can even handle an ancestor on the query although the key may be a bit of a mouthful:

>>> p = Person(first_name='Duncan', last_name='Booth', city='Oxford', height=183)
>>> key = p.put()
>>> print showQuery(q.ancestor(p))
Person.all().ancestor(datastore_types.Key.from_path('Person', 1, _app=u'test_app')).filter('height <', 72).filter('last_name =', 'Smith').order('-height')


There is also a showGqlQuery function to convert GQL back to the equivalent query:

>>> q = db.GqlQuery("SELECT * FROM Person WHERE last_name = :1 AND height < :2", "Smith", 72)
>>> print showGqlQuery(q)
SELECT * FROM Person WHERE last_name = :1 AND height < :2


Notice that once again the output may have the clauses in a different order than they were originally input.

>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE last_name = :name AND height < :height"))
SELECT * FROM Person WHERE height < :height AND last_name = :name


We can also handle literal values in queries:

>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE last_name = 'Smith'"))
SELECT * FROM Person WHERE last_name = 'Smith'


Sorting is also handled. The ORDER BY clause does preserve the original order:

>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE height < :1 ORDER BY last_name ASC"))
SELECT * FROM Person WHERE height < :1 ORDER BY last_name ASC
>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE height<:1 ORDER BY last_name DESC, height ASC"))
SELECT * FROM Person WHERE height < :1 ORDER BY last_name DESC, height ASC


Ancestor, limit, and offset clauses also all work. If you specify limit and offset separately then they are output together:

>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE ANCESTOR IS :1 AND height < 72"))
SELECT * FROM Person WHERE ANCESTOR IS :1 AND height < 72
>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE ANCESTOR IS :1 LIMIT 10,5"))
SELECT * FROM Person WHERE ANCESTOR IS :1 LIMIT 10,5
>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE ANCESTOR IS :1 OFFSET 3"))
SELECT * FROM Person WHERE ANCESTOR IS :1 OFFSET 3
>>> print showGqlQuery(db.GqlQuery("SELECT * FROM Person WHERE ANCESTOR IS 'xxx' LIMIT 3 OFFSET 5"))
SELECT * FROM Person WHERE ANCESTOR IS 'xxx' LIMIT 5,3


The source code (showquery.py) looks like this:

from google.appengine.ext import db
from google.appengine.api import datastore

def showQuery(query):
    """Represent a query as a string"""
    kind = query._model_class.kind()
    ancestor = query._Query__ancestor
    filters = query._Query__query_set
    orderings = query._Query__orderings
    hint = None
    limit = None
    offset = None

    res = ["%s.all()" % kind]
    if ancestor is not None:
        res.append("ancestor(%r)" % ancestor)
    for k in sorted(filters):
        res.append("filter(%r, %r)" % (k, filters[k]))
    for p, o in orderings:
        if o==datastore.Query.DESCENDING:
            p = '-'+p
        res.append("order(%r)" % p)

    return '.'.join(res)

def showGqlQuery(query):
    """Represent a GQL query as a string"""
    proto = query._proto_query
    kind = query._model_class.kind()
    filters = proto.filters()
    boundfilters = proto._GQL__bound_filters
    orderings = proto.orderings()
    hint = proto.hint()
    limit = proto.limit()
    offset = proto._GQL__offset

    select = "SELECT * FROM %s" % kind
    where = []
    order = []

    for k in sorted(filters):
        for clause in filters[k]:
            name, op = clause
            if name==-1: name = 'ANCESTOR'
            where.append("%s %s :%s" % (name, op.upper(), k))

    for k in sorted(boundfilters):
        where.append("%s %r" % (k, boundfilters[k]))

    for p, o in orderings:
        order.append("%s %s" % (p, 'DESC' if o==datastore.Query.DESCENDING else 'ASC'))

    gql = select
    if where:
        gql += ' WHERE '+' AND '.join(where)
    if order:
        gql += ' ORDER BY ' + ', '.join(order)
    if limit != -1:
        if offset != -1:
            gql += ' LIMIT %s,%s' % (offset,limit)
        else:
            gql += ' LIMIT %s' % limit
    elif offset != -1:
        gql += ' OFFSET %s' % offset
    return gql

Monday 5 May 2008

Unit testing with Google App Engine

The App Engine isn't too friendly for testing: you can't use interactive mode, and there's some setup needed to get a test harness.

I've found that doctest works quite nicely with App Engine, provided you get that initial setup out of the way. If your application interacts with the datastore, you probably want to set up some objects before querying them. To my mind that sits better with the doctest model (a long, chatty document describing a pseudo-interactive session) than with the xUnit style of separate tests, each starting from a clean system, doing something, and making a single assertion about the resulting state. For a particular project you can update the test framework to perform common setup for the tests.

The structure I've come up with sets up the environment to run tests under the App Engine framework. It looks for tests in two places: *.py files in the project directory, and tests/*.tests. For the Python files, doctest runs the tests from every docstring found in the file, and each docstring forms a separate test. (Every Python source file must have at least one docstring, otherwise you'll get an error, but the docstring doesn't have to contain any tests.) The other files are plain text (or reST), and each file is run as a single test, so the state is preserved until the end of the file.
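
The text-file half of that arrangement can be sketched with the standard library alone. This is not the runtests.py from the example project: it is a minimal Python 3 sketch that leaves out the App Engine environment setup and the *.py docstring tests, and build_suite and run_tests are hypothetical names:

```python
import doctest
import glob
import os
import tempfile
import unittest


def build_suite(project_dir):
    """Collect one doctest per tests/*.tests file under project_dir."""
    suite = unittest.TestSuite()
    pattern = os.path.join(project_dir, "tests", "*.tests")
    for path in sorted(glob.glob(pattern)):
        # module_relative=False lets us pass an ordinary filesystem path
        suite.addTest(doctest.DocFileSuite(path, module_relative=False,
                                           optionflags=doctest.ELLIPSIS))
    return suite


def run_tests(project_dir):
    """Run the suite and report whether every test file passed."""
    runner = unittest.TextTestRunner(verbosity=0)
    return runner.run(build_suite(project_dir)).wasSuccessful()


# Throwaway project with a single text-file test, just to exercise the runner
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "tests"))
with open(os.path.join(demo_dir, "tests", "sample.tests"), "w") as handle:
    handle.write("State carries across examples:\n\n"
                 ">>> x = 20\n"
                 ">>> x + 1\n"
                 "21\n")

test_count = build_suite(demo_dir).countTestCases()
all_passed = run_tests(demo_dir)
```

Because the whole file is one test, the x assigned in the first example is still there for the second.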

Here's a sample doctest file:

DateTime testing

The appengine DateTimeProperty property type just looks like an
ordinary datetime object when you use it.

>>> from datetime import datetime, timedelta
>>> from google.appengine.ext import db
>>> class Test(db.Model):
...     date = db.DateTimeProperty(auto_now_add=True)
...     date2 = db.DateTimeProperty()

>>> obj = Test(date2=datetime(2008,1,1))

The date attribute is set automatically to the current time, but we
can use ellipsis to skip over the part that varies and check that it
is very nearly the current time:

>>> obj.date #doctest: +ELLIPSIS
datetime.datetime(...)
>>> abs(datetime.now() - obj.date) < timedelta(minutes=1)
True

For the other attribute we can test the exact value:

>>> obj.date2 == datetime(2008,1,1)
True
>>> obj.date2.strftime("%a, %d %b %Y %H:%M:%S +0000")
'Tue, 01 Jan 2008 00:00:00 +0000'
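
The ellipsis matching can be tried without App Engine at all. The following Python 3 sketch feeds a small session, held in a string called SAMPLE (a name made up for this example), through the standard doctest parser and runner:

```python
import doctest

# A miniature doctest session; the first result varies in detail, so it
# is matched with an ellipsis, while the second is compared exactly.
SAMPLE = '''
>>> from datetime import datetime
>>> datetime(2008, 1, 1)  #doctest: +ELLIPSIS
datetime.datetime(...)
>>> datetime(2008, 1, 1).strftime("%a, %d %b %Y")
'Tue, 01 Jan 2008'
'''

parser = doctest.DocTestParser()
test = parser.get_doctest(SAMPLE, {}, "sample", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, results.attempted)
```

The directive applies only to the example it is attached to, so the strftime line is still checked character for character.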

 


Download an example appengine project with this test and execute tests/runtests.py to run it.

Introduction

I created this blog some time ago to write occasional technical wurblings or brain-dumps. However I also started a food blog at the same time and this one just became a test ground for playing with blog skins and seeing how things worked.

Now I've finally decided to start using this blog as well. Don't expect regular or frequent posts, or any kind of consistency of topics.