I am curious to understand more about the design considerations behind Etherpad. It works awesomely, and it is obviously great code by top-notch coders.
When I tried to do so, I couldn't find much information on how to go about understanding their design and codebase.
Hence, I wanted to share my first experience trying to understand Etherpad’s implementation, and hopefully save some time for someone just venturing in.
This assumes that you have Etherpad up and running locally. Otherwise, please do look at the information from the Etherpad Google Groups or wiki.
Using phpMyAdmin, mysql-query-browser, or mysql-admin, you can see that Etherpad has 43 tables.
I then looked a bit at tools to visualize the database schema, hopefully to make the job of understanding easier!
From this discussion on StackOverflow, I decided to try out DbSchema.
This tool has a free and a paid version, with different features. I installed the free version on my Ubuntu 9.10 machine.
The first thing you see is that none of Etherpad's tables are linked to each other: there are no foreign keys in any of the 43 tables!
Etherpad avoids joins like the plague!
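If you would rather confirm the missing foreign keys with a query than with a diagram, MySQL's information_schema can be asked directly. This is only a minimal sketch: it assumes a Python DB-API cursor that uses %s placeholders (as PyMySQL does), and the schema name "etherpad" in the usage note is a guess at what your local database is called.

```python
# MySQL lists foreign-key columns in information_schema.KEY_COLUMN_USAGE;
# rows with a non-NULL REFERENCED_TABLE_NAME are foreign keys.
FOREIGN_KEY_QUERY = """
    SELECT TABLE_NAME, COLUMN_NAME,
           REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
    FROM information_schema.KEY_COLUMN_USAGE
    WHERE TABLE_SCHEMA = %s
      AND REFERENCED_TABLE_NAME IS NOT NULL
"""


def list_foreign_keys(cursor, schema_name):
    """Return one row per foreign-key column in the given schema.

    `cursor` is assumed to be a DB-API cursor with %s-style parameters.
    """
    cursor.execute(FOREIGN_KEY_QUERY, (schema_name,))
    return list(cursor.fetchall())
```

Calling list_foreign_keys(cursor, "etherpad") should return an empty list if the observation above holds for your installation.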
There does not seem to be much documentation of their table schema (what each table is meant for and why it is used), and I think effort has to go towards this.
a) When I added content and checked the database, I noticed that these two tables got updated with content:
PAD_REV_META_TEXT (has a field called ‘DATA’)
PAD_REVS_TEXT (also has a field called ‘DATA’)
b) On the UI, you can see that each version of saved text has an ID, which is shown in the timeslider. We have to figure out what happens when text is saved, how the different versions are maintained, and where.
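One way to repeat the "which tables changed?" check without eyeballing 43 tables is to record the row count of every table before and after typing into a pad, then compare the two snapshots. The helper below is a hypothetical sketch of the comparison step only; how you collect the counts (for example, one SELECT COUNT(*) per table) is up to you, and the numbers in the usage note are made up.

```python
def changed_tables(before, after):
    """Compare two {table_name: row_count} snapshots.

    Returns {table_name: (old_count, new_count)} for every table whose
    row count differs. A table missing from one snapshot counts as 0 rows.
    """
    changes = {}
    for table in sorted(set(before) | set(after)):
        old = before.get(table, 0)
        new = after.get(table, 0)
        if old != new:
            changes[table] = (old, new)
    return changes
```

For example, changed_tables({"PAD_REVS_TEXT": 3, "pad_guests": 1}, {"PAD_REVS_TEXT": 4, "pad_guests": 1}) returns {"PAD_REVS_TEXT": (3, 4)}. Keep in mind that row counts only reveal inserts and deletes; a table whose existing rows are rewritten in place will not show up this way.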
A bird’s eye view of the schema would be something like this:
(a) a series of tables prefixed by PAD_ (18 tables)
(b) a series of tables prefixed by billing_ (5 tables)
(c) a series of tables prefixed by checkout_ (3 tables)
(d) a series of tables whose names do not follow a pattern (6 tables): cpds_testtable, db_migrations, eepnet_signups, just_a_test, persistent_vars, pne_tracking_data
(e) tables prefixed by pad_ (2 tables): pad_guests, pad_cookie_userids
(f) a series of tables prefixed by pro_ (7 tables) (Etherpad had a “professional” edition, and these seem to be related to it)
(g) tables for statistics (2 tables): statistics, usage_stats
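The grouping above was done by hand, but it can be roughly reproduced from a plain list of table names. The function below is a small sketch that counts tables by whatever comes before the first underscore; the function name is my own, and the comparison is case-sensitive so that PAD_ and pad_ stay separate groups, as they do in the list.

```python
from collections import Counter


def count_by_prefix(table_names):
    """Count tables by the text up to and including their first underscore."""
    counts = Counter()
    for name in table_names:
        prefix, separator, _rest = name.partition("_")
        # A name without an underscore is grouped under its full name.
        counts[prefix + separator] += 1
    return counts


# A few of the table names mentioned in this post.
sample = ["PAD_REVS_TEXT", "PAD_REV_META_TEXT",
          "pad_guests", "pad_cookie_userids",
          "statistics", "usage_stats"]
print(count_by_prefix(sample))
```

With this sample, PAD_ and pad_ each get a count of 2, while statistics and usage_ get 1 each. The tables in group (d) would each land in their own one-element group, which is a quick way to spot the names that follow no pattern.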
The tables where all the ‘interesting’ action should be taking place would be those prefixed by PAD_.
In the Etherpad source:
This file seems to be a good starting point:
and in particular, the variable pad.
Its functions include getID(), create(), destroy(), writeToDB(), getRevisionText(), getAuthorData(), setAuthorData(), and getCoarseChangeset().
The coarse changeset sizes 10, 100, and 1000 seem to be associated with the tables PAD_REVS10, PAD_REVS10_META, and so on.
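To make the idea of coarse changesets a bit more concrete, here is a sketch of how a revision number could map onto blocks of 10, 100, or 1000 revisions. This is purely my assumption about the scheme (blocks aligned at multiples of the block size); I have not confirmed that Etherpad indexes them this way, and coarse_block is a made-up name.

```python
# Block sizes mentioned in the Etherpad source.
COARSE_SIZES = (10, 100, 1000)


def coarse_block(revision, size):
    """Return (block_index, first_revision, last_revision) for the
    block of `size` consecutive revisions that contains `revision`.

    Assumes blocks start at revision 0 and are aligned to multiples of size.
    """
    if size not in COARSE_SIZES:
        raise ValueError("size must be one of %r" % (COARSE_SIZES,))
    if revision < 0:
        raise ValueError("revision must be non-negative")
    index = revision // size
    first = index * size
    return index, first, first + size - 1
```

Under that assumption, coarse_block(1234, 100) returns (12, 1200, 1299): revision 1234 would fall in the thirteenth block of 100 revisions. The appeal of such a scheme for the timeslider would be that jumping far back in history needs only a handful of coarse changesets instead of thousands of single ones.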
The pad's metadata attributes are specified in the variable pad, which is used for insertion into the PAD_REV#_META tables.
The function writeToDB() is where the writes to the different pad tables are done.
This was my first step in exploring Etherpad. Any insights into the design or implementation of Etherpad, or into how to do the same while saving time, would be greatly appreciated! :)