I am curious to understand more about the design considerations behind Etherpad. It works really well, and it is obviously great code written by top-notch coders.
When I tried to dig in, I couldn't find much information on how to go about understanding its design or codebase.
Hence, I wanted to share my first experience of trying to understand Etherpad's implementation, and hopefully save some time for someone just starting out.
This assumes that you have Etherpad up and running locally. If not, please look at the information on the Etherpad Google Groups or wiki.
Using phpmyadmin, mysql-query-browser, or mysql-admin, you can see that Etherpad has 43 tables.
I then looked around for tools to visualize a database schema, hoping to make the job of understanding easier.
Based on this discussion on Stack Overflow, I decided to try out DbSchema.
This tool has a free and a paid version with different features. I installed the free version on my Ubuntu 9.10 machine.
The first thing you see is that none of Etherpad's tables are linked to each other: there are no foreign keys in any of the 43 tables!
Etherpad avoids joins like the plague!
There does not seem to be much documentation of the table schema (what each table is meant for and why it is used), and I think some effort has to go towards this.
a) When I added content to a pad and checked the database, I noticed that these two tables were updated with content:
PAD_REV_META_TEXT (has a field called ‘DATA’)
PAD_REVS_TEXT (also has a field ‘DATA’)
b) In the UI, you can see that each version of the saved text has an ID, which is shown in the timeslider. We have to figure out what happens when text is saved, how the different versions are maintained, and where.
A bird’s eye view of the schema would be something like this:
(a) a series of tables prefixed by PAD_ (18 tables)
(b) a series of tables prefixed by billing_ (5 tables)
(c) a series of tables prefixed by checkout_ (3 tables)
(d) a series of tables whose names do not follow a pattern (6 tables): cpds_testtable, db_migrations, eepnet_signups, just_a_test, persistent_vars, pne_tracking_data
(e) tables prefixed by pad_ (2 tables): pad_guests, pad_cookie_userids
(f) a series of tables prefixed by pro_ (7 tables) (Etherpad had a “professional” edition, and these seem to be related to it)
(g) Tables for statistics (2) : statistics, usage_stats
The tables where all the ‘interesting’ action should be taking place would be those prefixed by PAD_.
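The grouping above can be reproduced with a few lines of Python once you have the table names (for example, copied from the output of SHOW TABLES). This is only a sketch: the prefix list is taken from the overview above, the function name group_tables is made up for illustration, and the sample list is just a handful of the tables named in this post, not the full 43.

```python
from collections import defaultdict

# Prefix rules in match order. Matching is case-sensitive, so PAD_ and pad_
# end up in different groups, as in the overview above.
PREFIXES = ["PAD_", "pad_", "billing_", "checkout_", "pro_"]


def group_tables(table_names):
    """Group table names by their first matching prefix; the rest go to 'other'."""
    groups = defaultdict(list)
    for name in table_names:
        prefix = next((p for p in PREFIXES if name.startswith(p)), "other")
        groups[prefix].append(name)
    return dict(groups)


# Illustrative sample only: a few of the table names mentioned in this post.
sample = [
    "PAD_REVS_TEXT", "PAD_REV_META_TEXT", "PAD_REVS10", "PAD_REVS10_META",
    "pad_guests", "pad_cookie_userids", "db_migrations", "persistent_vars",
    "statistics", "usage_stats",
]

for prefix, names in sorted(group_tables(sample).items()):
    print(f"{prefix:10s} {len(names):2d}  {', '.join(names)}")
```

Running it on the real table list gives a quick count per group, which is handy for checking that nothing was missed.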
In the Etherpad source, this file seems to be a good starting point:
etherpad/trunk/etherpad/src/etherpad/pad/model.js
In particular, look at the variable pad.
Its functions include getID(), create(), destroy(), writeToDB(), getRevisionText(), getAuthorData(), setAuthorData() and getCoarseChangeset().
The coarse changeset sizes 10, 100 and 1000 seem to be associated with the tables PAD_REVS10, PAD_REVS10_META and so on.
The pad's metadata attributes are specified in the variable pad, which is used for insertion into the PAD_REV#_META tables.
The function writeToDB() is where the writes to the different pad tables are done.
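I have not yet verified how model.js actually uses the coarse changesets, so what follows is a guess written in Python, not Etherpad's code. My assumption is that a coarse changeset of size 10, 100 or 1000 combines that many consecutive revisions and starts on a multiple of its size, so that something like the timeslider can jump across many revisions in a few steps. The names BLOCK_SIZES and plan_steps are made up for this sketch.

```python
# Assumed block sizes; 1 stands for an ordinary single-revision changeset.
BLOCK_SIZES = (1000, 100, 10, 1)


def plan_steps(start, end):
    """Cover revisions [start, end) with the largest aligned blocks available.

    Returns a list of (block_start, block_size) pairs.
    """
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    steps = []
    rev = start
    while rev < end:
        for size in BLOCK_SIZES:
            # A coarse block is usable only if it starts on a multiple of its
            # size and does not run past the target revision.
            if rev % size == 0 and rev + size <= end:
                steps.append((rev, size))
                rev += size
                break
    return steps


if __name__ == "__main__":
    # Going from revision 0 to 120 needs three steps instead of 120.
    print(plan_steps(0, 120))
```

If the guess is right, this would explain why there is a separate pair of tables per block size.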
This was my first step in exploring Etherpad. Any insights into its design or implementation, or tips on how to do the same thing in less time, would be greatly appreciated!
I took the liberty of copying your excellent intro to the etherpad doc pad:
http://doc.etherpad.org/CunUJwhhOH
Thanks a lot Egil!
Nice article. I'm also studying etherpad's code, although I started with the front-end engine. I know that etherpad can use multiple Ajax techniques for communicating with the server. Which one it uses depends on the browser. For instance, it can use Comet streaming for some versions of some browsers, and sometimes I see it using Comet long polling. With older browser versions, it falls back to short polling.
When trying to understand etherpad's front end, I use an older version of Firefox with Firebug to force it not to use streaming (you can't monitor requests and responses with Firebug while it's streaming >.<).
One of the key things to understand is etherpad's Operational Transformation. This is important because every delta a user makes is turned into an OpCode string that is sent to the server and passed on to the other users of the same document. The encoding of the opcode is not easy to understand. When viewed in Firebug, it looks like some sort of weird mathematical formula.
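To give a feel for the general idea, here is a minimal textbook-style sketch in Python of transforming two concurrent inserts so that both users end up with the same text. It is not etherpad's opcode encoding or its actual algorithm; the Insert class, the author tie-break rule and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int     # character offset where the text goes
    text: str
    author: int  # used only to break ties when two inserts hit the same offset


def apply_op(doc, op):
    """Return the document with the insert applied."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    lands_before = against.pos < op.pos or (
        against.pos == op.pos and against.author < op.author
    )
    if lands_before:
        # The other insert pushed our target offset to the right.
        return Insert(op.pos + len(against.text), op.text, op.author)
    return op


base = "hello world"
a = Insert(0, "Oh, ", author=1)  # user 1 types at the start
b = Insert(5, ",", author=2)     # user 2 types after "hello" at the same time

# Each side applies its own edit first, then the transformed remote edit.
site_a = apply_op(apply_op(base, a), transform(b, a))
site_b = apply_op(apply_op(base, b), transform(a, b))
print(site_a)
print(site_b)
```

Both sites apply the edits in a different order but converge on the same document, which is the property the real thing has to guarantee for inserts, deletes and attribute changes alike.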
The PRO tables, by the way, are for when you activate PRO mode. Our installation at the company I work for uses that mode so we can create team sites and password-protect documents.
Hi Hyangenlo!
Great that you liked the article!
Thanks for sharing the information about operational transformation. I have not been hacking on etherpad for a while, but I shall begin the process again!
Regards,
Arvind