<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Gábor's blog</title>
  <link rel="alternate" type="text/html" href="http://www.nekomancer.net/blog/2008/04/13/google-appengine-sql-excel"/>
  <link rel="self" type="application/atom+xml" href="http://www.nekomancer.net/node/161/atom/feed"/>
  <id>http://www.nekomancer.net/node/161/atom/feed</id>
  <updated>2008-05-14T13:54:18-05:00</updated>
  <entry>
    <title>google app engine: from sql to excel</title>
    <link rel="alternate" type="text/html" href="http://www.nekomancer.net/blog/2008/04/13/google-appengine-sql-excel" />
    <id>http://www.nekomancer.net/blog/2008/04/13/google-appengine-sql-excel</id>
    <published>2008-04-13T15:52:12-05:00</published>
    <updated>2008-05-14T13:54:18-05:00</updated>
    <author>
      <name>gabor</name>
    </author>
    <category term="python" />
    <category term="programming" />
    <summary type="html"><![CDATA[<p>Like many other people, i also got my google app engine account (and even created a <a href="http://viewrequest.appspot.com">stupid test application</a>). it&#8217;s fun to try out such a radically different hosting environment.</p>

<p>but there is an issue with it that many do not seem to realize:</p>

<p>the &#8220;database&#8221; backend of google-app-engine (i will call it <a href="http://en.wikipedia.org/wiki/Bigtable">BigTable</a> in the following text) is not a relational (read &#8220;SQL&#8221;) store, and it will never be. for example, it does not support SQL JOINs. but it&#8217;s worse than that. because of its architecture, JOINs will never be fast there. BigTable is essentially a collection of spreadsheet-tables, where you can do some basic searches, that&#8217;s all. oh, and transactions.</p>

<p>for this reason, there probably never will be a BigTable django-ORM wrapper. of course it&#8217;s technically possible to implement all the missing features in python, but its performance characteristics will not be the same as those of a relational database.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>Like many other people, i also got my google app engine account (and even created a <a href="http://viewrequest.appspot.com">stupid test application</a>). it&#8217;s fun to try out such a radically different hosting environment.</p>

<p>but there is an issue with it that many do not seem to realize:</p>

<p>the &#8220;database&#8221; backend of google-app-engine (i will call it <a href="http://en.wikipedia.org/wiki/Bigtable">BigTable</a> in the following text) is not a relational (read &#8220;SQL&#8221;) store, and it will never be. for example, it does not support SQL JOINs. but it&#8217;s worse than that. because of its architecture, JOINs will never be fast there. BigTable is essentially a collection of spreadsheet-tables, where you can do some basic searches, that&#8217;s all. oh, and transactions.</p>

<p>for this reason, there probably never will be a BigTable django-ORM wrapper. of course it&#8217;s technically possible to implement all the missing features in python, but its performance characteristics will not be the same as those of a relational database. so you will not be able to simply take your mysql/postgresql-optimized application, deploy it there, and expect everything to be fine. you will have to restructure your application.</p>

<p>and if you have to restructure your app anyway, why do you need the django-ORM? you might as well write google-app-engine-specific code.</p>

<p>(on the other hand, maybe there could be a more stupid django-orm, that does not assume a relational-db-backend, and it could work with the various non-relational databases like <a href="http://en.wikipedia.org/wiki/Bigtable">BigTable</a> or <a href="http://hadoop.apache.org/hbase/">hBase</a> or other <a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS">column-oriented databases</a>&#8230;)</p>

<p>the basic idea when writing BigTable code is that read-operations will happen much more often than write-operations. so do more at write-time, and less at read-time. denormalize tables.</p>

<p>for example, take a simple forum-application. it stores discussions. a discussion has comments.
now let&#8217;s see how we could implement 2 basic features: &#8220;add comment&#8221; and &#8220;list discussion-names with comment-count&#8221;.</p>

<p>SQL:</p>

<ul>
<li>&#8220;add comment&#8221;: store a new comment-entry, which contains a link (a foreign-key) to its discussion</li>
<li>&#8220;list discussion-names with comment-count&#8221;: do an SQL query like: <span class="geshifilter"><code class="geshifilter-text">SELECT discussion.name, COUNT(comment.id) FROM discussion LEFT OUTER JOIN comment ON comment.discussion_id = discussion.id GROUP BY discussion.id, discussion.name;</code></span> (let&#8217;s not discuss right now if it&#8217;s inner or outer join etc. it&#8217;s quite late at night here, so maybe it&#8217;s not 100% correct. but it should be enough to demonstrate the situation)</li>
</ul>
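
<p>to make the SQL variant concrete, here is a minimal sketch using python&#8217;s built-in sqlite3 module with an in-memory database. the table and column names, and the helper functions <code>make_db</code>, <code>add_discussion</code>, <code>add_comment</code> and <code>list_discussions</code>, are all assumptions made up for this illustration. the point is only that the write is cheap and the read does the join and the counting:</p>

```python
import sqlite3


def make_db():
    # in-memory database with a normalized schema (all names are assumed)
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE discussion (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE comment (
            id INTEGER PRIMARY KEY,
            discussion_id INTEGER NOT NULL REFERENCES discussion(id),
            body TEXT NOT NULL
        );
    """)
    return db


def add_discussion(db, name):
    cur = db.execute("INSERT INTO discussion (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_comment(db, discussion_id, body):
    # "add comment": a single insert, nothing else to maintain
    db.execute(
        "INSERT INTO comment (discussion_id, body) VALUES (?, ?)",
        (discussion_id, body),
    )


def list_discussions(db):
    # "list discussion-names with comment-count":
    # the join and the counting both happen at read-time
    rows = db.execute("""
        SELECT discussion.name, COUNT(comment.id)
        FROM discussion
        LEFT OUTER JOIN comment ON comment.discussion_id = discussion.id
        GROUP BY discussion.id, discussion.name
        ORDER BY discussion.id
    """)
    return rows.fetchall()
```

<p>counting <code>comment.id</code> instead of a constant means a discussion without comments shows up with a count of 0 rather than 1.</p>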

<p>BigTable (one possible solution):</p>

<ul>
<li>&#8220;add comment&#8221;: store a new comment entry, which contains a link to its discussion. also, count the number of comments for this discussion, and store this value in the discussion-table</li>
<li>&#8220;list discussion-names with comment-count&#8221;: <span class="geshifilter"><code class="geshifilter-text">select * from discussion</code></span></li>
</ul>
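
<p>the denormalized variant could be sketched roughly like this. note that this is NOT the google-app-engine API: <code>FakeStore</code> is a hypothetical in-memory stand-in for a store without JOINs, and all field names are assumptions. it also increments the stored count instead of recounting the comments on every write, which is a small variation on the description above. on the real datastore, the two writes would presumably have to happen inside a transaction:</p>

```python
class FakeStore:
    """in-memory stand-in for a join-less datastore (hypothetical, not the App Engine API)."""

    def __init__(self):
        # discussion_id -> {"name": ..., "comment_count": ...}
        self.discussions = {}
        # list of {"discussion_id": ..., "body": ...}
        self.comments = []

    def add_discussion(self, discussion_id, name):
        self.discussions[discussion_id] = {"name": name, "comment_count": 0}

    def add_comment(self, discussion_id, body):
        # write-time: store the comment AND update the denormalized count
        self.comments.append({"discussion_id": discussion_id, "body": body})
        self.discussions[discussion_id]["comment_count"] += 1

    def list_discussions(self):
        # read-time: a plain scan of one "table", no join and no counting
        return [
            (d["name"], d["comment_count"])
            for d in self.discussions.values()
        ]
```

<p>the write now does more work, but the listing only touches the discussion entries, which fits the assumption that reads happen much more often than writes.</p>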

<p>of course the whole denormalize-your-database-if-you-want-performance mantra is nothing new. if i remember correctly, Flickr also does this. but still, for most developers (me included), it&#8217;s just painful to give up our nice, clean, normalized db-tables.</p>

<p>p.s.: please note that none of this info is based on my own performance benchmarks. it&#8217;s more a summary of what i&#8217;ve read in the google-appengine documentation and on the <a href="http://groups.google.com/group/google-appengine">google-appengine mailing list</a>.</p>
    ]]></content>
  </entry>
</feed>
