First Steps - Understanding Etherpad

I am curious to understand more about the design considerations behind Etherpad. It works really well, and the code is obviously the work of top-notch coders.

When I tried to do so, I couldn’t find much information on how to go about understanding its design or codebase.

Hence, I wanted to share my first experience of trying to understand Etherpad’s implementation, and hopefully save some time for someone just venturing in.

This assumes that you have Etherpad up and running locally. Otherwise, please look at the information on the Etherpad Google Groups and wiki.

Using phpMyAdmin, mysql-query-browser, or mysql-admin, you can see that Etherpad has 43 tables.

I then looked a bit at tools for visualizing a database schema, hoping to make the job of understanding easier!

From this discussion on StackOverflow, I decided to try out DbSchema.

This tool has a free and a paid version, with different features. I installed the free version on my Ubuntu 9.10 machine.

The first thing you see is that none of Etherpad’s tables are linked to each other: there are no foreign keys in any of the 43!

Etherpad avoids joins like the plague!!
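The absence of foreign keys can also be double-checked without a GUI tool. The following is a minimal sketch, assuming a MySQL backend and a DB-API cursor from a driver that uses `%s` placeholders (PyMySQL, for example). The function name `list_foreign_keys` and the schema name in the usage note are made up for illustration; only the `information_schema.KEY_COLUMN_USAGE` view is standard MySQL.

```python
# Query MySQL's information_schema for every column that references
# another table. An empty result means no foreign keys are declared.
FK_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = %s
  AND REFERENCED_TABLE_NAME IS NOT NULL
"""


def list_foreign_keys(cursor, schema):
    """Return one row per foreign-key column declared in `schema`.

    `cursor` is any DB-API cursor connected to the MySQL server
    (an assumption; the post itself only used GUI tools).
    """
    cursor.execute(FK_QUERY, (schema,))
    return list(cursor.fetchall())
```

With an open connection `conn`, calling `list_foreign_keys(conn.cursor(), "etherpad")` (where `"etherpad"` stands in for whatever your database is called) should come back empty if the schema really has no foreign keys.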

There does not seem to be much documentation of the table schema (what each table is meant for and why it is used), and I think some effort has to go towards this.

When I added content to a pad and checked the database, I noticed that these two tables were updated with content:

PAD_REV_META_TEXT  (has a field called ‘DATA’)

PAD_REVS_TEXT  (also has a field ‘DATA’)
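A quick way to spot which tables change is to take a snapshot of per-table row counts before and after editing a pad and compare the two. The sketch below only does the comparison. Collecting the counts (for example with `SELECT COUNT(*)` per table) is left out, and the counts in the example are invented purely for illustration.

```python
def changed_tables(before, after):
    """Return {table: (old_count, new_count)} for tables whose row count differs.

    `before` and `after` map table names to row counts. A table missing
    from one snapshot is treated as having 0 rows in it.
    """
    changed = {}
    for table in sorted(set(before) | set(after)):
        old = before.get(table, 0)
        new = after.get(table, 0)
        if old != new:
            changed[table] = (old, new)
    return changed


# Invented counts, for illustration only (not measured on a real install).
before = {"PAD_REVS_TEXT": 3, "PAD_REV_META_TEXT": 3, "pad_guests": 1}
after = {"PAD_REVS_TEXT": 4, "PAD_REV_META_TEXT": 4, "pad_guests": 1}
print(changed_tables(before, after))
```

Row counts only reveal inserts and deletes. A table whose existing rows are updated in place would go unnoticed, so comparing something like MySQL's `CHECKSUM TABLE` output is a more thorough variant of the same idea.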

On the UI, you can see that each version of the saved text has an ID, which is shown in the timeslider. We have to figure out what happens when text is saved: how the different versions are maintained, and where.

A bird’s-eye view of the schema would be something like this:

(a) a series of tables prefixed by PAD_ (18 tables)

(b) a series of tables prefixed by billing_ (5 tables)

(c) a series of tables prefixed by checkout_ (3 tables)

(d) a series of tables whose names do not follow a pattern (6 tables): cpds_testtable, db_migrations, eepnet_signups, just_a_test, persistent_vars, pne_tracking_data

(e) tables prefixed by pad_ (2 tables): pad_guests, pad_cookie_userids

(f) a series of tables prefixed by pro_ (7 tables) (Etherpad had a “professional” edition; these seem to be related to it)

(g) tables for statistics (2 tables): statistics, usage_stats
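The grouping above can be reproduced from a plain list of table names. Here is a minimal sketch: the prefixes come from the list above, the function name `group_by_prefix` is made up, and the sample contains only table names mentioned in this post (a full list would come from `SHOW TABLES`).

```python
from collections import defaultdict

# Prefixes from the overview above. Matching is case-sensitive on purpose,
# so PAD_ and pad_ end up in different groups.
PREFIXES = ["PAD_", "pad_", "billing_", "checkout_", "pro_"]


def group_by_prefix(table_names, prefixes=PREFIXES):
    """Group table names by the first matching prefix; the rest go to '(other)'."""
    groups = defaultdict(list)
    for name in table_names:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
        else:
            # No prefix matched this name.
            groups["(other)"].append(name)
    return dict(groups)


# Only names that appear in this post; not the full set of 43 tables.
sample = [
    "PAD_REVS_TEXT", "PAD_REV_META_TEXT",
    "pad_guests", "pad_cookie_userids",
    "db_migrations", "persistent_vars",
    "statistics", "usage_stats",
]
for prefix, names in group_by_prefix(sample).items():
    print(prefix, len(names), names)
```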

The tables where all the ‘interesting’ action takes place should be those prefixed by PAD_.

In the Etherpad source:

This file seems to be a good starting point:


and in particular, the variable pad.

Its functions include getID(), create(), destroy(), writeToDB(), getRevisionText(), getAuthorData(), setAuthorData(), and getCoarseChangeset().

The coarse changeset sizes 10, 100, and 1000 seem to be associated with the tables PAD_REVS10, PAD_REVS10_META, and so on.
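I have not confirmed how Etherpad actually uses these coarse changesets, so the following is only a guess at the general idea, not Etherpad’s code. If precomputed changesets exist for aligned blocks of 10, 100, and 1000 revisions, then moving between two distant revisions needs far fewer steps than applying every single revision. The function `cover_range` is hypothetical and just shows the arithmetic.

```python
def cover_range(start, end, sizes=(1000, 100, 10, 1)):
    """Split revisions [start, end) into aligned blocks, largest size first.

    Returns a list of (first_revision, block_size) pairs. A block of size s
    is only used when it starts on a multiple of s and fits before `end`.
    """
    if 1 not in sizes:
        raise ValueError("sizes must include 1 so every range can be covered")
    blocks = []
    pos = start
    while pos < end:
        for size in sizes:
            if pos % size == 0 and pos + size <= end:
                blocks.append((pos, size))
                pos += size
                break
    return blocks


# Going from revision 95 to revision 230 takes 9 blocks instead of 135 steps.
print(cover_range(95, 230))
```

In this example the range is covered by five single revisions (95 to 99), one block of 100, and three blocks of 10.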

The pad’s metadata attributes are specified in the variable pad, which is used for insertion into the PAD_REV#_META tables.

The function writeToDB() is where the writes to the different pad tables are done.

This was my first step in exploring Etherpad. Any insights into the design or implementation of Etherpad, or tips on how to do the same while saving time, would be greatly appreciated! 🙂


4 responses to “First Steps - Understanding Etherpad”

  1. Nice article. I’m also studying Etherpad’s code, although I started on the front-end engine. I know that Etherpad can use multiple Ajax methodologies for communicating with the server. Which one it uses depends on the browser. For instance, it can use Comet streaming for some versions of some browsers, and sometimes I see it using Comet long polling. With older versions of browsers, it falls back to short polling.

    When trying to understand Etherpad’s front end, I use an older version of Firefox with Firebug to force it not to use streaming (you can’t monitor requests and responses in Firebug when it’s streaming >.<).

    One of the key things to understand is etherpad's Operational Transformation. This is important since all deltas each user makes are transformed into OpCode strings that are sent to the server and passed on to other users on the same document. The encoding of the opcode is not easy to understand. When viewed in firebug, it looks like some sort of weird mathematical formula.

    The PRO tables, by the way, are for when you activate the PRO mode. Our installation at the company I work for uses that mode so we can create team sites and password-protect documents.

    • Hi Hyangenlo!

      Great that you liked the article! 🙂
      Thanks for sharing the information about Operational Transformation… I have not been hacking on Etherpad for a while. I shall begin the process again! 🙂

