google app engine: from sql to excel

As many other people, i also got my google app account (even crateated a stupid test application. it’s fun to try out such a radically different hosting-environment.

but there is an issue with it seems many do not realize:

the “database” backend of google-app-engine (i will call it BigTable in the following text) is not a relational (read “SQL”) store, and it will never be. for example, it does not support SQL JOINs. but it’s worse than that. because of it’s architecture, JOINS will never be fast there. BigTable is essentially a collection of spreadsheet-tables, where you can do some basic searches, that’s all. oh, and transactions.

for this reason, there probably never will be a BigTable django-ORM wrapper. of course technically it’s possible to implement in python all the missing features, but it’s performance characteristics will not be the same as of a relational-database. so you will not be able to simply take your mysql/postgresql-optimized application, and deploy on it, and all is fine. you will have to restructure your application.

and if you have to restructure your app anyway, why do you need the django-ORM? you can as well write google-app-engine-specific code.

(on the other hand, maybe there could be a more stupid django-orm, that does not assume a relational-db-backend, and it could work with the various non-relational databases like BigTable or hBase or other column-oriented databases…)

the basic idea when writing BigTable code is that read-operations will happen much more often than write-operations. so do more at write-time, and less at read-time. denormalize tables.

for example, take a simple forum-application. it stores discussions. a discussion has comments. now let’s see how we could implement 2 basic features: “add comment” and “list discussion-names with comment-count”.

SQL:

  • “add comment”: store a new comment-entry, which contains a link (a foreign-key) to it’s discussion
  • “list discussion-names with comment-count”: do an SQL query like: SELECT discussion.name,count(1) from discussion LEFT OUTER JOIN comment GROUP BY comment.discussion_id; (let’s not discuss right now if it’s inner or outer join etc. it’s quite late at night here, so maybe it’s not 100% correct. but it should be enough to demonstrate the situation)

BigTable (one possible solution):

  • “add comment”: store a new comment entry, which contains a link to it’s discussion. also, count the number of comments for this discussion, and store this value in the discussion-table
  • “list discussion-names with comment-count”: select * from discussion

of course the whole denormalize-your-database-if-you-want-performance mantra is nothing new. if i remember correctly, Flickr also does this. but still, for most developers, (also for me), it’s just painful to give up our nice, clean, normalized db-tables.

p.s: please note, that all the info is not based on my own performance-benchmarks. it’s more a summary of what i’ve read in the the google-appengine documentation and the google-appengine mailing list.

Trackback URL for this post:

http://www.nekomancer.net/trackback/161