Books I read in 2013

After many, many years - perhaps almost half a decade - I accidentally rebooted my interest in reading in 2013, more precisely in the last few months of 2013. Hopefully I get to read more in the coming years!

These are the books I read:

(a) A Tale for the Time Being – This was shortlisted for the Man Booker in 2013, and seemed to be the popular choice tipped to be the winner, but ended up losing. It is set against the backdrop of the tsunami which hit Japan a few years back. There is a certain elegance and simplicity in the narration that keeps you curious about what is to come. The ending was a slight dampener for me – I can't say I am a big fan of quantum physics interpreted by a practicing Buddhist priestess. However, the book is a really good take on the ephemeral moments of life. I suspect this has a lot to do with the author being a Buddhist priestess - she is in her zone while articulating the things she probably understands best. A worthwhile read.

(b) One Hundred Years of Solitude – Overall, I liked this book. I say this almost grudgingly. Not for the faint of heart. Highly recommended for bald/balding people, who will not be able to pull their hair out while cursing the author. The author, in my opinion, did not take his book seriously - and did not care a bit for the mental health of his readers.

I like that honesty. No pretensions about catering to anyone's semi-intellectual, half-grown brain and giving an illusion of an increase in net intelligence. I suspect the author woke up each morning and started his day by asking himself, "How can I make my book weirder today?", and mostly succeeded. I liked the book because it was weird. The genre is officially called "Magical Realism". To give an example: if I were to write a book about the emotional catharsis of homosexual satanic demons in Papua New Guinea wearing pink suits, hopping upside down on one hand while holding a mobile phone to their ear – that would be magical realism.

The story is about events which occur across almost endless generations of a kooky, morally-challenged, intellectually-challenged(?) family in la-la land. Across generations, people share the same names - Arcadio/Aureliano. There is a ghost too, for good measure, and random weird things keep happening all the time. And there is a non-linear narrative, so you are never sure which character is alive or dead. After a while you stop bothering about mundane things like life and death. Lots of fun.

(c) Kafka on the Shore by Haruki Murakami: This is an English translation of a Japanese work. I was blown away by this book. The genre it falls under is "Surrealism". The story woven has layers upon layers upon layers. It is profound, hard-hitting, philosophical, and very weird, a.k.a. surreal. Since this is an English translation of a Japanese work, there are certain minor language idiosyncrasies in the translation - which can mostly be ignored.

(d) India After Gandhi by Ramachandra Guha: A nice book on the contemporary history of India (after independence). Very, very interesting and informative. Just take a look at the number of references the author has used to write this book! That's some major hard work. Additionally, the author is honest enough to admit this: on very recent events (late 90's/2000's), he holds very strong views on what transpired. Hence, as a historian, his view would be clouded/biased - and should be viewed as such. For major events between 1947 and the 1980's, it's an amazing read - for instance, he juxtaposes events happening in the world with what was happening in India, and explains why it was significant. For example, in the early 1950's, when India held its first general election, there were coups, wars, and failed governments world-wide - and everyone was speculating about when India would disintegrate or have its government overthrown.

(e) Memoirs of a Geisha: An enjoyable, time-pass book. Plot-wise, not very substantive - I think the central theme is heavily inspired by "Cinderella and her evil step-sisters". The strength of the book is that it sheds light on Japan during the world-war era, and on Japanese culture at that time. The author is not Japanese, so I am no judge of the accuracy or relevance of what he has written. So, if a Japanese dude were to ask me about my pet cobra or my pet tiger, I wouldn't hold it against him if he face-palms when I ask him about the interesting Geishas he has run into.

(f) The Dalai Lama's Cat (strictly speaking, I read this in 2014): A feel-good, short and sweet book. Uplifting when you are feeling upset or emotionally down. A nice read.

Revisited this book:

(g) Flow: The Psychology of Optimal Experience: All about Flow. Worth reading, just to get to know the general idea. It's sometimes verbose - I liberally skipped several parts.

(h) Tesla: Man Out of Time: About one of the greatest minds ever to visit Earth in a human body. A nice read. On the flip side, I think there was a time when Tesla was not much recognized for his contributions. Now, I guess if someone actually invented a time-travelling machine, the "Edison-sucks-Tesla-rocks" mob would come with knives and hand-grenades shouting "But Tesla already did that - what's new?" That is, I think he is now deified, much like Thomas Alva Edison probably was in his heyday.

On Content based Publish Subscribe Systems

Some time back, I answered a question on Stack Overflow about Publish/Subscribe systems. While the direction of my answer was not exactly what the asker was looking for (he was not interested in performance aspects/design trade-offs), I think anyone who stumbles here might find the problem interesting. Here is a gist of my answer on Stack Overflow.

The general problem is called Content-based Publish/Subscribe, and if you search for papers in this area, you will get a lot of results - for instance, this paper.

Here are a few things the system would need:

1) A data-store for the subscriptions, which needs to: a) store the list of subscribers, b) store the list of subscriptions.
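As a toy illustration of point (1), a minimal in-memory store might look like the sketch below. The class and method names are my own assumptions for illustration; a real system would back this with a durable data-store.

```python
from collections import defaultdict

class SubscriptionStore:
    """Toy in-memory store: subscribers and their subscription filters."""

    def __init__(self):
        # subscriber_id -> set of subscription filters (here, plain strings)
        self._subs = defaultdict(set)

    def add_subscriber(self, subscriber_id):
        # register a subscriber, possibly with no subscriptions yet
        self._subs.setdefault(subscriber_id, set())

    def add_subscription(self, subscriber_id, filter_expr):
        self._subs[subscriber_id].add(filter_expr)

    def subscribers(self):
        return list(self._subs.keys())

    def subscriptions(self, subscriber_id):
        # return a copy so callers cannot mutate internal state
        return set(self._subs.get(subscriber_id, set()))

store = SubscriptionStore()
store.add_subscriber("node-1")
store.add_subscription("node-1", "symbol == 'ACME'")
print(store.subscribers())            # ['node-1']
print(store.subscriptions("node-1"))  # {"symbol == 'ACME'"}
```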

2) A means of authenticating the subscription requests and the nodes themselves. a) Server and subscribers communicate over SSL. If the server handles thousands of SSL connections, this is a CPU-intensive task, especially if lots of connections are set up in bursts.
b) If all the subscriber nodes are in the same trusted network, SSL is not needed.

3) Whether we want a push-based or pull-based model:

a) The server can maintain the latest timestamp seen per node, per filter matched. When an event matches a filter, the server sends a notification to the subscriber; the client then sends a request, and the server initiates sending the matching events.

b) The server matches and sends the events to clients in one shot.

The difference between (a) and (b) is that in (a) more state is maintained on the client side, which makes it easier to add subscriber-specific logic later on. In (b) the client is dumb: it has no way to say that it does not want to receive events for whatever reason (say, a network clog).
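The pull-after-notify flow of (a) can be sketched in a few lines: the server keeps the last-seen timestamp per (node, filter) pair and hands out only newer matches on each client request. All names here are illustrative assumptions, not any real product's API.

```python
class PubSubServer:
    def __init__(self):
        self.events = []      # append-only list of (timestamp, payload)
        self.last_seen = {}   # (node_id, filter) -> newest timestamp pulled

    def publish(self, ts, payload):
        self.events.append((ts, payload))

    def pull(self, node_id, filt):
        """Client-initiated request: return matching events newer than the
        cursor for this (node, filter) pair, then advance the cursor."""
        since = self.last_seen.get((node_id, filt), 0)
        matched = [(ts, p) for ts, p in self.events if ts > since and filt(p)]
        if matched:
            self.last_seen[(node_id, filt)] = max(ts for ts, _ in matched)
        return matched

def is_acme(payload):
    return payload.get("symbol") == "ACME"

srv = PubSubServer()
srv.publish(1, {"symbol": "ACME", "price": 10})
srv.publish(2, {"symbol": "OTHER", "price": 5})
print(srv.pull("node-1", is_acme))  # [(1, {'symbol': 'ACME', 'price': 10})]
print(srv.pull("node-1", is_acme))  # [] - the cursor has advanced
```

Note how the per-(node, filter) cursor is exactly the "state" being traded off: here it lives on the server, but in model (a) proper it would live on the client.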

4) How are the events maintained in memory on the server side?

a) The logical model here is a table with columns of strings (C1..CN), where each new row added is a new event.
b) We could have a hash-table per column, storing tuples of (timestamp, pointer to event structure), with each event given a unique id. With different data-structures, we can come up with different schemes.
c) Events here are considered an infinite stream. If we have a 32-bit event id, there is a chance of integer overflow.
d) If we have a timer function on the server matching and dispatching events, what is the actual resolution of the system timer? Does that have any implications?
e) Memory allocation is a very expensive operation. If your filter-matching logic is going to do frequent allocations/frees, it will adversely affect performance. How can we manage a memory-pool for this particular operation? Would we want different size-buckets of page-aligned memory?
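To make (a)-(c) concrete, here is a toy sketch of the per-column hash-table scheme. The class and method names are my own; a real implementation would also have to face the id-wraparound and allocation concerns raised above.

```python
class EventIndex:
    def __init__(self, columns):
        self.columns = columns
        self.by_column = {c: {} for c in columns}  # column -> {value: set(event_id)}
        self.events = {}                           # event_id -> (timestamp, row)
        self.next_id = 0  # point (c): a real system must plan for id wraparound

    def insert(self, ts, row):
        """Add one event (a row of column->string values); return its id."""
        eid = self.next_id
        self.next_id += 1
        self.events[eid] = (ts, row)
        for c in self.columns:
            self.by_column[c].setdefault(row[c], set()).add(eid)
        return eid

    def match(self, column, value):
        """Equality predicate on a single column, answered by one dict lookup."""
        return sorted(self.by_column[column].get(value, set()))

idx = EventIndex(["C1", "C2"])
idx.insert(100, {"C1": "quote", "C2": "ACME"})
idx.insert(101, {"C1": "trade", "C2": "ACME"})
print(idx.match("C2", "ACME"))   # [0, 1]
print(idx.match("C1", "trade"))  # [1]
```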

5) What should happen if the subscriber node loses connectivity or goes down? (a) Is it acceptable for the client to lose events during that period, or should the server buffer everything? (b) If the subscriber goes down, until what historical point in the past can it request matching events?

6) More details of the messaging layer between the server and subscribers: (a) Is the communication between the server and subscribers synchronous or asynchronous?
(b) Do we need a binary protocol or a text-based protocol between the client and server? (There are trade-offs in both.)

7) Do we need any rate-limiting logic on the server side? What should we do if we starve some of the clients while serving data to a few others?

8) How would changes to subscriptions be managed? If a client wishes to change its subscription, should it be updated in memory first before updating the permanent data-store, or vice-versa? What would happen if the server goes down before the data-store is written to? How would we ensure the consistency of the data-store (the subscriptions/server list)?

9) This was all assuming a single server. What if we need a cluster of servers that the subscribers can connect to? (There is a whole bunch of issues here.) a) How can network partitioning be handled? (For example: of, say, 5 nodes, 3 nodes are reachable from each other, and the other 2 nodes can only reach each other.) b) How are events/workload distributed among the members of the cluster?

10) Is absolute correctness of the information sent to the subscriber a requirement, i.e., can the client receive more information than its subscription rules indicate? This can determine the choice of data-structure - for example, using a probabilistic data structure like a Bloom filter on the server side while doing the filtering.
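To illustrate the trade-off in (10), here is a toy Bloom filter: membership tests may return false positives (a subscriber can occasionally receive extra events) but never false negatives (no matching event is silently dropped). The class, the tiny sizes, and the sha256-based hashing scheme are all my own illustrative choices.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: possible false positives, no false negatives.

    Sizes are deliberately small to keep the sketch short; real parameters
    are derived from the acceptable false-positive rate.
    """

    def __init__(self, m_bits=64, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # an integer used as a bit array

    def _positions(self, item):
        # derive k bit positions by salting the item with the hash index
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent"
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("subscriber-42")
print(bf.might_contain("subscriber-42"))  # True: added items are never missed
```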

11) How is the time-ordering of events maintained on the server side? (A time-order-sorted linked list? Timestamps?)
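For (11), one simple scheme is a min-heap keyed on timestamp, so that dispatch always sees the oldest pending event first even if publishers deliver out of order. The structure and names below are illustrative assumptions, not a reference design.

```python
import heapq

class TimeOrderedBuffer:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal timestamps keep insertion order

    def push(self, ts, event):
        heapq.heappush(self._heap, (ts, self._seq, event))
        self._seq += 1

    def pop_oldest(self):
        ts, _, event = heapq.heappop(self._heap)
        return ts, event

buf = TimeOrderedBuffer()
buf.push(5, "late")
buf.push(1, "early")
print(buf.pop_oldest())  # (1, 'early')
```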

12) Will the predicate-logic parser for the subscriptions need Unicode support?

In conclusion, content-based pub/sub is a pretty vast area. It is a distributed system involving the interaction of databases, networking, algorithms, and node behavior (systems go down, disks go bad, a system runs out of memory because of a memory leak, etc.), and we have to look at all these aspects. Most importantly, we have to look at the time available for the actual implementation, and then determine how we want to go about solving the problem.

 

Rooting HCL ME Y2 and installing Google playstore

One of my friends has an HCL ME Y2 tablet, and asked me if I could root it. It was a good learning experience browsing the relevant forums and learning more about how rooting works internally.

The HCL ME Y2 is a relatively recently launched tablet in India running ICS 4.0.3 (the launch date seems to be mid-August 2012 over here), and as such does not find much love on the XDA Developers or Android Central forums for queries about rooting or installing the Play Store, compared to other popular brands.

For 15,000 INR, the HCL ME Y2 is a pretty solid piece of hardware. I would have said it was worth the price, but after the launch of the Nexus tablet series by Google, for the equivalent amount of money the Nexus 7 beats it.

If you google HCL ME Y2, you will find it has earned the wrath of several customers for the app-store lock-in by HCL.

SuperOneClick does not work. The equivalent methods doing the rounds for rooting the HCL ME U1 tablet, including some YouTube videos, didn't work either.

For rooting the device, the tool at Unlockroot.com did the trick.

In a nutshell, this is what I did: I copied the Play Store apk to /system/app/, and launched it from the installer.

 

If you do not have GoogleServicesFramework.apk, the Play Store will install but exit immediately after it is started - and damn, I spent so much time trying to figure out why! (You can refer to this thread where someone installed the app store on a Kindle: http://forum.xda-developers.com/showthread.php?t=1893410)

 

Some Fun with Hash Table performance analysis

I was trying to solve a puzzle some days back, and it essentially could be mapped to the following question:

Given a hash table with M buckets, keep inserting elements into the hash table until every bucket has at least one value associated with it. What is the expected number of insertions needed to achieve this?

The answer I derived corresponds with a quick implementation I did using a random number generator. While this is not the best way to infer correctness, a large difference in magnitude over many trial runs would have been a warning sign.

So,here goes:

Let the size of the hash table be M, and let the hash function distribute keys uniformly across the buckets.

Refreshing the definition of the expected value of a random variable:

E(X) = X1*P(X1) + X2*P(X2) + … + Xn*P(Xn)

Let E(k) be the expected number of insertions needed to occupy one more bucket, given that k buckets are already occupied.

E(1): the expected number of insertions needed to go from one occupied bucket to two.

E(2): the expected number of insertions needed to go from two occupied buckets to three, and so on.

The total required number of insertions = 1 + E(1) + E(2) + … + E(M-1), since the very first insertion always lands in an empty bucket.

So, let the first element be inserted into the hash table.

The second time, the probability of the new element going into the already-occupied bucket = 1/M.

The third time, the probability that both the second and the third insertions landed in that occupied bucket = 1/M * 1/M.

Continuing, and using the identity E[X] = sum over j >= 0 of P(X > j):

E(1) = 1 + 1/M + 1/(M^2) + 1/(M^3) + 1/(M^4) + … to infinity.

This is an infinite geometric progression, whose sum converges to 1/(1 - 1/M) = M/(M-1).

This is for the first element.

Next, we proceed from the state where we have 2 elements in two buckets.

The expected number of insertions for the next element to land in a third, previously-unoccupied bucket is:

E(2) = 1 + 2/M + (2/M)^2 + (2/M)^3 + … to infinity.

The sum of this series = 1/(1 - 2/M) = M/(M-2).

Now, if we continue the same reasoning and sum all of the infinite series, we get

M * ( 1/(M-1) + 1/(M-2) + 1/(M-3) + … + 1/1 )

The term within the brackets is of the form (1 + 1/2 + 1/3 + 1/4 + … + 1/K) with K = M-1, which is greater than ln(K+1).

So, the expected number of insertions is approximately M*log(M) (log to the base 'e', where M is the size of the hash table).
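The derivation can be checked with the kind of quick random-number simulation mentioned above; this is my reconstruction of such a check. (The exact expectation is M times the M-th harmonic number, which is M*ln(M) plus roughly 0.577*M, so the sample mean should sit slightly above M*ln(M).)

```python
import math
import random

def insertions_until_full(m, rng):
    """Insert uniformly-hashed keys until every one of the m buckets is hit;
    return the number of insertions. This simulates the puzzle directly."""
    seen = set()
    count = 0
    while len(seen) < m:
        seen.add(rng.randrange(m))  # stand-in for a uniform hash function
        count += 1
    return count

rng = random.Random(42)
m = 1000
trials = 200
avg = sum(insertions_until_full(m, rng) for _ in range(trials)) / trials
# the sample mean tracks M*ln(M), up to the harmonic-number correction
print(avg, m * math.log(m))
```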

Jogging in Sankarmat,Bangalore

For the last month or so, I have been going for a jog in the morning at a park in Sankarmat, and also doing yoga - very modestly put, a supreme achievement for me :-)

The park is actually decently maintained - it has a ground, a lawn, a walking/running track, and some good greenery inside.

As a consequence of jogging at about the same time, in the same park, I see many dogs and many people - the same dogs, and almost the same people. The charm grows on you when you observe little things which keep repeating over time. Like -

A playful German Shepherd running unchained, and at some distance, trying to catch up, a boy barely a foot taller. Really aggressive, competitive volleyball played by people with receding hairlines (I am guessing around 40-50), mouthing Kannada expletives I don't understand yet, and still managing to pet the German Shepherd when it enters their court and disrupts their game.

Retired old guys with walking sticks on benches with the day's newspaper, quite obviously disgusted with what they are reading. The group laughter therapy of folks younger than those with walking sticks (with time, the synchronous roaring laughter and clapping stops seeming funny and becomes part of the ambient surroundings). The well-intentioned middle-aged homemakers on brisk walks, in groups of not more than four. The solitary guy who walks in the opposite direction on the track. Some serious Surya-facing Suryanamaskara - folded hands, closed eyes, minus any yoga or butt-moving, plus heavy-duty prayer petitions.

Street dogs which also take a walk along the track, in any direction (and thankfully don't yet object to people jogging!)

Very, very occasionally, younger guys/girls with the mandatory gear: ear-phones/tracks/shoes.

Cricket with city-imposed constraints: one-side runs, out-of-the-ground is out, current is out, no run-up, whoever-hits-the-ball-into-the-house-of-the-crazy-lady-buys-a-new-ball-and-the-match-is-abandoned, resourceful handling of the ball in the gutter.

Pretty aggressive, Kannada-expletive-laced badminton alongside the volleyball - again middle-aged folks. School kids in karate classes nearby, dutifully yelling in sync. I could go on.

In short, the place in the morning has an almost timeless charm. To see it, you have to look through a colored lens which ignores the badly maintained public toilet outside the park, the pollution and dust on the main road, and the non-existent traffic-rule enforcement in the early mornings - but the charm is as real as it gets!

Logfile Inspector for Linux kernel logs - to track the online resources you referred to

Why such a Log File Inspector?

I have had this experience on more than one occasion:

I would have a sudden burst of enthusiasm to know about some relatively obscure topic in low-level systems. I Google for it, and am taken to some forum, or a thread on a mailing list, where the specific piece of information might be the 15th reply on a particular thread. I then have an 'Aha!' moment, and a sense of satisfaction at learning something new.

Unfortunately, a substantial part of what I learn in this fashion might not be thoroughly understood. This could be attributed to many things, like:

(a) Not working on a project (coding, design) on the same topic - a lack of constant exposure and familiarity.

(b) Not having a clear understanding of a closely related  pre-requisite topic.

One related problem is that we don't really keep track of how we got to an article - the keywords typed into a search engine. We could go through our search history, or organize bookmarks, but after a substantial period of time, the EXACT CONTEXT of how you reached there gets blurred.

What I am doing:

Hence I thought: would it be worthwhile if I maintained some meta-data (abbreviation expansion, a brief summary, links visited) for some keywords in each message written to the kernel log file?

For instance, a message like:

CPU0: Thermal monitoring enabled (TM2)

I would be saving:

Title: Thermal Monitoring

Message: TM2 (as Intel claims) can control the frequency (to be more exact, FID – FSB frequency multiplier) and CPU voltage (VID), while TM1 modulates CPU clock. Due to reduced voltage, TM2 allows to retain better processor performance in case of overheating at the same reduction of power consumption level.

Links Visited (can specify multiple links):

http://ixbtlabs.com/articles2/intel-thermal-features/

The objective of maintaining this meta-data is purely to improve our level of systems understanding, and most importantly, to possibly provide a basis for exploring information about diverse aspects of systems, while also keeping track of the online resources you have read.

I certainly got to read about a lot of interesting topics this way. With some effort going into actually typing/filling out this meta-data, I would be eliminating the pain of re-Googling and re-reading forums/mailing lists. This script could be improved with a database back-end, but if it is only for personal use, that might not be necessary.

Here are the files:

logInspect.pl

#!/usr/bin/perl
use Message;
use Switch;
use Getopt::Long;
use Pod::Usage;
Getopt::Long::Configure("no_ignore_case");

$LOGFILE = "/var/log/kern.log";
#Message.pm encapsulates each field of kern.log. I could have directly used dmesg - but I wanted to keep it generic, so that I could write a wrapper around
#other log files in /var/log :)
#The purpose of this script is to organize links from online resources, brief information, about individual messages in linux log files(in this case kern.log)
#In this file there are two functions- one for information about cpu, and other about hashtables
# It could be useful to store the links, messages in a Database.

sub printMessage
{
    # printHint is similar to the file-permission numbering scheme: 4=Title, 2=Message, 1=Links
    my $numArgs = scalar(@_);
    my $printHint = shift;
    my $title = shift;
    my $BriefMessage = shift;

    if ($printHint >= 4 && $printHint <= 7)
    {
        print("Title:" . $title . "\n");
    }
    if ($printHint == 2 || $printHint == 3 || $printHint == 6 || $printHint == 7)
    {
        print("Message:" . $BriefMessage . "\n");
    }
    if ($printHint == 1 || $printHint == 3 || $printHint == 5 || $printHint == 7)
    {
        print("Links:\n");
        # the first three arguments were printHint, title and message, so the links start at position 4
        for (my $i = 4; $i <= $numArgs; $i++)
        {
            my $link = shift;
            print($link . "\n");
        }
    }
}

sub cpu
{
$printHint=shift;
$param=shift;
$lineNum=shift;
$object=new Message($param);
$mess=$object->getInfo();
print("Line:".$lineNum." Info:".$object->getInfo()."\n");
if($mess=~m/Initializing cgroup subsys cpuset/i)
   {
$title="CPU Chipset";
$mess="Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to a set of tasks.Cpusets constrain the CPU and Memory placement of tasks to only the resources within a tasks current cpuset.  They form a nested hierarchy visible in a virtual file system.";
$l1="http://www.mjmwired.net/kernel/Documentation/cpusets.txt";
&printMessage($printHint,$title,$mess,$l1);
}

if($mess=~m/Initializing cgroup subsys cpu/i)
{
$title="Control Groups";
$mess="A *cgroup* associates a set of tasks with a set of parameters for one or more subsystems.A *subsystem* is a module that makes use of the task grouping facilities provided by cgroups to treat groups of tasks in particular ways. A subsystem is typically a \"resource controller\" that schedules a resource or applies per-cgroup limits, but it may be anything that wants to act on a group of processes, e.g. a virtualization subsystem";
$l1="http://www.mjmwired.net/kernel/Documentation/cgroups.txt";
printMessage($printHint,$title,$mess,$l1);

}

if($mess=~m/x86 PAT enabled/i)
{
$title="Page Attribute Table";
$mess="x86 Page Attribute Table (PAT) allows for setting the memory attribute at the page level granularity. PAT is complementary to the MTRR settings which allows for setting of memory types over physical address ranges. However, PAT is more flexible than MTRR due to its capability to set attributes at page level and also due to the fact that there are no hardware limitations on number of such attribute settings allowed";
$l1="http://www.mjmwired.net/kernel/Documentation/x86/pat.txt";
printMessage($printHint,$title,$mess,$l1);
}

if($mess=~m/sched-domain/i)
{
$title="CPU Scheduling domain";
$mess="A scheduling domain is a set of CPUs whose workloads should be kept balanced by the kernel. Generally speaking, scheduling domains are hierarchically organized: the top-most scheduling domain, which usually spans all CPUs in the system, includes children scheduling domains, each of which include a subset of the CPUs. Thanks to the hierarchy of scheduling domains, workload balancing can be done in a rather efficient way.Every scheduling domain is partitioned, in turn, in one or more groups, each of which represents a subset of the CPUs of the scheduling domain. Workload balancing is always done between groups of a scheduling domain. In other words, a process is moved from one CPU to another only if the total workload of some group in some scheduling domain is significantly lower than the workload of another group in the same scheduling domain";
$l1="http://www.mjmwired.net/kernel/Documentation/scheduler/sched-domains.txt";
$l2="http://book.opensourceproject.org.cn/kernel/kernel3rd/opensource/0596005652/understandlk-chp-7-sect-5.html";
printMessage($printHint,$title,$mess,$l1,$l2);
}

if($mess=~m/Thermal monitoring enabled/i)
{
$title="Thermal Monitoring";
$mess="Two automatic thermal monitoring mechanisms (TM1 and TM2) can force the
processor to reduce its power consumption. TM1, introduced with Intel,controls the processor core temperature by modulating the duty cycle of processor clock.
Thermal Monitor 2 (TM2) is an advanced mechanism of CPU overheating protection, implemented in Pentium M processors and, according to Intel, in new Pentium 4 models (nevertheless, you can currently see it only in the server modification – Xeon Nocona). A considerable difference of the new mechanism is that TM2 (as the manufacturer claims) can control the frequency (to be more exact, FID – FSB frequency multiplier) and CPU voltage (VID), while TM1 modulates CPU clock. Due to reduced voltage, TM2 allows to retain better processor performance in case of overheating at the same reduction of power consumption level. TM2 can be detected using the CPUID instruction, and it can be controlled by MSR";
$l1="http://ixbtlabs.com/articles2/intel-thermal-features/";
$l2="http://www.google.com/url?sa=t&source=web&cd=1&ved=0CBIQFjAA&url=http%3A%2F%2Fedc.intel.com%2FDownload.aspx%3Fid%3D2612&ei=6RphTKqgBIecsQPxnNihCA&usg=AFQjCNEQaS5JP08eg9ljU-YL6aZWiaZBhg&sig2=nHpfVbRZFLMxiVu0z8x-Pw";
printMessage($printHint,$title,$mess,$l1,$l2);
}

if($mess=~m/MCE banks/i)
{
$title="Machine Check Exceptions";
$mess="A machine check exception is an error detected by your system's processor. There are 2 major types of MCE errors: a notice or warning error, and a fatal exception. The warning will be logged by a \"Machine Check Event logged\" notice in your system logs, and can be later viewed via some Linux utilities. A fatal MCE will cause the machine to stop responding, and the details of the MCE will be printed out to the system's console.";
$l1="http://www.advancedclustering.com/faq/im-getting-mce-machine-check-exception-errors-what-does-this-mean.html";
$l2="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005184";
printMessage($printHint,$title,$mess,$l1,$l2);
}
if( $mess=~m/ACPI: SSDT/i)
{
$title="ACPI and SSDT";
$mess="In computing, the Advanced Configuration and Power Interface (ACPI) specification provides an open standard for unified operating system-centric device configuration and power management. ACPI, first released in December 1996, defines platform-independent interfaces for hardware discovery, configuration, power management and monitoring."."\n"."SSDT
In the ACPI context, SSDT stands for Secondary System Description Table: a continuation of the DSDT. The firmware can supply multiple SSDTs, each carrying additional ACPI definition blocks (devices, thermal zones, power resources) that the kernel loads at boot. (The similarly-abbreviated System Service Descriptor Table, used by the Windows kernel when dispatching system calls, is an unrelated structure.)";
$l1="http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface";
$l2="http://uninformed.org/index.cgi?v=8&a=2&p=10";
printMessage($printHint,$title,$mess,$l1,$l2);
}

if($mess=~m/HPET/i)
{
$title="High Precision Event Timer";
$mess="The High Precision Event Timer is a hardware timer used in personal computers,jointly developed by Intel and Microsoft.An HPET chip consists of a 10 MHz 64-bit up-counter and 3 independent 64-bit comparators as in the Programmable Interval Timer and Intel 8253. The HPET chip also includes 29 additional 32-bit comparators/timers for random interrupt generation for user software. However some operating systems fail to configure these interrupts on user software interrupt routines even on systems with multicore CPUs.The HPET 32/64-bit extended parts can only be reached through memory mapped I/O that is set up by the BIOS via ACPI. There can be at most 8 HPET chips on a motherboard. PCs with FSB and DDR-RAM need 2 HPET chips. That means modern PCs have 6 PC 8253-Timer compatible counters/comparators and 58 of the 32-bit HPETs/comparators, for 64 interrupt and IRQ capable timers in all ";
$l1="http://www.mjmwired.net/kernel/Documentation/hpet.txt";
$l2="http://en.wikipedia.org/wiki/High_Precision_Event_Timer";
$l3="http://fpmurphy.blogspot.com/2009/07/linux-hpet-support.html";
printMessage($printHint,$title,$mess,$l1,$l2,$l3);
}

if($mess=~m/TSC synchronization/i)
{
$title="Time Stamp Counter Synchronisation";
$mess="Linux has a number of high resolution timesources to choose from, the fastest TSC (Time Stamp Counter) unfortunately is not always reliable. Linux chooses TSC by default, and during boot checks for inconsistencies, if found it switches to a slower safe timesource. The slower time sources can be 10 to 30 times more expensive to query then the TSC timesource, and may have a measurable impact on Coherence performance. Note that Coherence and the underlying JVM are not aware of the timesource which the OS is utilizing";
$l1="http://wiki.tangosol.com/display/COH32UG/Performance+Tuning";
$l2="http://book.opensourceproject.org.cn/kernel/kernel3rd/opensource/0596005652/understandlk-chp-6-sect-2.html";
printMessage($printHint,$title,$mess,$l1,$l2);
}

if($mess=~m/power states:/ )
{
$title="CPU Power States";
$mess="The CPU power states C0-C3 are defined as follows:
C0 is the operating state.
C1 (often known as Halt) is a state where the processor is not executing instructions, but can return to an executing state essentially instantaneously. All ACPI-conformant processors must support this power state. Some processors, such as the Pentium 4, also support an Enhanced C1 state (C1E or Enhanced Halt State) for lower power consumption,[7].
C2 (often known as Stop-Clock) is a state where the processor maintains all software-visible state, but may take longer to wake up. This processor state is optional.C3 (often known as Sleep) is a state where the processor does not need to keep its cache coherent, but maintains other state. Some processors have variations on the C3 state (Deep Sleep, Deeper Sleep, etc.) that differ in how long it takes to wake the processor. This processor state is optional.";
$l1="http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface";
$l2="http://www.hardwaresecrets.com/article/611";
printMessage($printHint,$title,$mess,$l1,$l2);
}

if($mess=~m/cpuidle/i)
{
$title="CPUIDLE -a new CPU power management infrastructure to manage idle CPU";
$mess="cpuidle separates out the drivers that can provide support for multiple types
of idle states and policy governors that decide on what idle state to use
at run time.A cpuidle driver can support multiple idle states based on parameters like
varying power consumption, wakeup latency, etc (ACPI C-states for example).
A cpuidle governor can be usage model specific (laptop, server,
laptop on battery etc).Main advantage of the infrastructure being, it allows independent development
of drivers and governors and allows for better CPU power management";
$l1="http://lwn.net/Articles/221791/";
printMessage($printHint,$title,$mess,$l1);
}
}

sub hashTables
{
$printHint=shift;
$param=shift;
$lineNum=shift;
$object=new Message($param);
$mess=$object->getInfo();
print("Line:".$lineNum." Info:".$object->getInfo()."\n");

if($mess=~m/PID hash table/i)
{
$title="PID Hash Table";
$message="There are Four hashtables to speed up communication between processes-PID,Thread Group Leader,Group leader,Session Leader. Dynamically allocated.";
$l1="http://books.google.com/books?id=cbbMrRNiC4cC&pg=PT110&lpg=PT110&dq=PID+hash+table&source=bl&ots=Tv_9aXr7_h&sig=d4vtFpDsi7p2VQvyRWA6rwpxFQ0&hl=en&ei=dqphTMrLI4-msQPWsYybCQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBUQ6AEwAA";
$l2="http://lxr.linux.no/linux+v2.6.35/include/linux/pid.h#L104";
printMessage($printHint,$title,$message,$l1,$l2);
}

if($mess=~m/Dentry cache hash table/i)
{
$title="Dentry Cache";
$message="When a file is opened, the dentry cache is populated with entries representing the directory levels representing the path. An inode for the object is also created representing the file. The dentry cache is built using a hash table and is hashed by the name of the object. Entries for the dentry cache are allocated from the dentry_cache slab allocator and use a least-recently-used (LRU) algorithm to prune entries when memory pressure exists. You can find the functions associated with the dentry cache in ./linux/fs/dcache.c (and ./linux/include/linux/dcache.h)";
$l1="http://www.ibm.com/developerworks/linux/library/l-virtual-filesystem-switch/index.html";
$l2="www.halobates.de/memory.pdf";
printMessage($printHint,$title,$message,$l1,$l2);
}

if($mess=~m/Inode-cache hash table/i)
{
$title="Inode Cache";
$message="The Virtual File System maintains an inode cache to speed up accesses to all of the mounted file systems. Every time a VFS inode is read from the inode cache, the system saves an access to a physical device. The VFS inode cache is implemented as a hash table whose entries are pointers to lists of VFS inodes which have the same hash value. The hash value of an inode is calculated from its inode number and from the device identifier of the underlying physical device containing the file system. Whenever the Virtual File System needs to access an inode, it first looks in the VFS inode cache. To find an inode in the cache, the system first calculates its hash value and then uses it as an index into the inode hash table. This gives it a pointer to a list of inodes with the same hash value. It then reads each inode in turn until it finds one with both the same inode number and the same device identifier as the one it is searching for.";
$l1="http://www.ibm.com/developerworks/linux/library/l-virtual-filesystem-switch/index.html";
$l2="http://www.science.unitn.it/~fiorella/guidelinux/tlk/node110.html";
printMessage($printHint,$title,$message,$l1,$l2);
}
if($mess=~m/Mount-cache hash table/i)
{
$title="Mount Cache Hash Table";
$message="Could not find clear information about this table.";
$l1="";
printMessage($printHint,$title,$message,$l1);

}

if($mess=~m/IP route cache hash table/i)
{
$title="Route Cache";
$message="When determining the route by which to send a packet, the kernel always consults the routing cache first. The routing cache is a hash table used for quick access to recently used routes. If the kernel finds an entry in the routing cache, the corresponding entry will be used. If there is no entry in the routing cache, the kernel begins the process of route selection.";
$l1="http://linux-ip.net/html/routing-selection.html";
printMessage($printHint,$title,$message,$l1);
}

}

sub openKernelLog{

# Assumes kern.log is in the current directory.
open(LOGFILE, '<', 'kern.log') or die("Could not open log file: $!");
$linecount=0;
$option=$_[0];
$printHint=$_[1];

foreach $line (<LOGFILE>) {
    chomp($line);              # remove the newline from $line.
    $linecount=$linecount+1;
    switch($option)
     {
     case("c")
      {
    if($line=~m/cpu/i )
      {
      &cpu($printHint,$line,$linecount);
      }
      }

      case("h")
      {
      if($line=~m/hash/i)
      {
      &hashTables($printHint,$line,$linecount);
      }

      }
      }
    # do line-by-line processing.

}

}

sub Options
{

print("usage: -op <option> -d <displayLevel>. Default display level is 7, the sum of Title(4), Message(2) and Links(1)."."\n"."Appropriate sums give other combinations"."\n");
print("cpu                                   -c"."\n");
print("hashtable                             -h"."\n");
print("displaylevel                          -d"."\n");
print("exit                                  -x"."\n");
print("0th param...".$_[0]."\n");
if($_[0] ne "-op")
{
die("First option should be preceded by -op");
}
$option=$_[1];
if($_[2] ne "-d")
{
die("Display level should be preceded by -d");
}
$displayLevel=$_[3];
if($option eq "-x")
{
exit(1);
}
openKernelLog($option,$displayLevel);

}

my ($help,$option,$displayLevel);

# getting options
GetOptions(
    'op=s'    => \$option,
    'd=i'     => \$displayLevel,
) or die("Incorrect Usage") ;

#-- prints usage if no command line parameters are passed or there is an unknown
#   parameter or help option is passed

print("Usage: --op <option> --d <displayLevel>. Default display level is 7, the sum of Title(4), Message(2) and Links(1)."."\n"."Appropriate sums give other combinations"."\n");
print("Example:: perl logInspect.pl --op c --d 7"."\n");
print("cpu                                   -c"."\n");
print("hashtable                             -h"."\n");
print("displaylevel                          -d"."\n");
print("exit                                  -x"."\n");
openKernelLog($option,$displayLevel);

The second file:
Message.pm

#!/usr/bin/perl

package Message;

sub new
{

my $class=shift;
$unparsedMess=shift;
$char=' ';
$offset=0;
my @spaceIndex;

   $result = index($unparsedMess, $char, $offset);
   push(@spaceIndex,$result);
   # print("Index in Mess=".$result."\n");
  while ($result != -1) {
    $offset = $result + 1;
    $result = index($unparsedMess, $char, $offset);
    push(@spaceIndex,$result);
  #print("Index in Mess=".$result."\n");
  }

my @bracketIndex;
$char='[';
$result=index($unparsedMess,$char,0);
push(@bracketIndex,$result);
$offset=$result;
$result=index($unparsedMess,']',$offset);
push(@bracketIndex,$result);

 my $self = {
        _Date =>substr($unparsedMess,0,$spaceIndex[2]) ,
        _Time =>substr($unparsedMess,$spaceIndex[2],$spaceIndex[3]-$spaceIndex[2]),
        _Kernel=>substr($unparsedMess,$spaceIndex[3],$spaceIndex[5]-$spaceIndex[3]),
        _TimeElapsed=>substr($unparsedMess,$bracketIndex[0]+1,$bracketIndex[1]-$bracketIndex[0]-1),
        _Info=>substr($unparsedMess,$spaceIndex[10]),
    };
    # Print all the values just for clarification.
    #print "Date is $self->{_Date}\n";
    #print "Time is $self->{_Time}\n";
    #print "Kernel is $self->{_Kernel}\n";
    #print "TimeElapsed is $self->{_TimeElapsed}\n";
    #print "Information is $self->{_Info}\n";
    bless $self, $class;
    return $self;
}
sub getDate {
    my( $self ) = @_;
    return $self->{_Date};
}
sub getTime {
    my( $self ) = @_;
    return $self->{_Time};
}
sub getKernel {
    my( $self ) = @_;
    return $self->{_Kernel};
}
sub getTimeElapsed {
    my( $self ) = @_;
    return $self->{_TimeElapsed};
}
sub getInfo {
    my( $self ) = @_;
    return $self->{_Info};
}
1;

Essentially, in the file logInspect.pl, you need to add meta-data for a message, similar to the example mentioned before. Matching is done using regular expressions.

Message.pm serves as a wrapper around messages in kern.log. (It is elegant/convenient to have a wrapper around log file messages, which provides for better separation of logic.)

I would love to learn more about how you improved your level of systems understanding. Did something unconventional work better?

This is the first time I am trying out something like this, and I would love to hear criticism/feedback about it!

P.S :

I had also posted a question about the best way to learn from Linux log files on Stack Overflow over here . As the community is highly active, you might discover a better way to go about this problem.

Bloom Filter : Variants & Applications

If that does not ring a bell, here’s the link to the wiki article on Bloom filters :-)

In this post, I discuss Bloom filter variants, applications of Bloom filters, and my project on improving the performance of the standard Bloom filter.

40 years have passed since this probabilistic data structure, the “Bloom filter”, was conceived, and even now papers on variations of Bloom filters are accepted at top-notch conferences like SIGCOMM and SIGMETRICS. An off-the-cuff comment by one of my professors stuck in my mind: “There still are new data structures to be discovered”, which is really, really intriguing :-)

A Google search for “Bloom filter survey papers” and similar queries brings up heavily-cited survey papers from 2003, like “Network Applications of Bloom Filters: A Survey”, and nice presentations like this one, but a LOT of time has passed since then!

Bloom Filter -Variants

The Standard Bloom filter has these characteristics:

  • No deletion operation
  • Uses the same group of ‘k’ hash functions for insertion and lookup of all elements
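A minimal filter with exactly these two characteristics could be sketched in Python (the salted-SHA-256 hashing scheme is my own choice, purely for illustration):

```python
import hashlib

class BloomFilter:
    """Minimal standard Bloom filter: a fixed-size bit vector, the same
    group of k hash functions for every insert/lookup, and no deletion."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = [0] * m_bits

    def _positions(self, item):
        # k positions derived by salting one base hash -- an assumption
        # for brevity; real filters often use k independent hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m_bits=1024, k_hashes=4)
bf.add("hello")
print(bf.might_contain("hello"))   # True
print(bf.might_contain("absent"))  # False, barring a (rare) false positive
```

Every variant below can be read as relaxing or changing one of the design decisions baked into this sketch.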

There are numerous variations of the standard Bloom filter, and there are plenty of well-cited papers on them.

Let me try to list the points of variation that can affect the design and implementation of a Bloom filter. Please do comment if you have a better way to organize the points of variation of Bloom filters and the use-cases of those variants.

Parameters of a Bloom filter which could vary:

a) Static vs. dynamic size of the bit vector and number of hash functions

b) Single vs. multiple, parallel Bloom filters

c) Single vs. multiple groups of hash functions (partitioned hashing)

d) Standard vs. counting Bloom filters (each bit vector entry is a counter)

e) Single-threaded vs. multi-threaded Bloom filter (relevant for the counting Bloom filter, with its deletion operation)

f) Cache-aware vs. cache-oblivious (memory hierarchy optimized)

g) Variations in the number and type of hash functions used

h) Set membership vs. multiset membership (Spectral Bloom filter)

i) Bloom filter on a single core vs. multi-core (parallelize the hash functions?)

j) Bloom filter memory access: distributed/shared/not shared

k) Bloom filter with a TTL (time to live) for inserted data; storing and retrieving based on time cluster

l) Bloom filter variants with dynamic bounds on the number of elements stored

m) Storage of the data associated with the Bloom filter: single copy vs. multiple copies

n) Bloom filter variants with bounds on the false-positive rate
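As an example of the counting Bloom filter variant listed above, deletion becomes possible once each bit is widened to a small counter. A hedged sketch (the salted-MD5 hashing is mine, for illustration only):

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter sketch: each slot is a counter instead of
    a bit, which is what makes a delete operation possible."""

    def __init__(self, m_slots, k_hashes):
        self.m, self.k = m_slots, k_hashes
        self.counters = [0] * m_slots

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}|{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):
        # Only safe if the item was actually inserted; removing an
        # absent item can underflow counters and cause false negatives.
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def might_contain(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))

cbf = CountingBloomFilter(m_slots=4096, k_hashes=3)
cbf.add("eth0")
cbf.remove("eth0")
print(cbf.might_contain("eth0"))  # False: deletion now works
```

The price is the extra memory per slot (typically 4 bits per counter in the literature), which is exactly the kind of trade-off the performance factors below capture.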

From a performance standpoint, these are the main factors:

a) False positive rate

b) Memory access time

c) Hash function time/space complexity
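For the false positive rate specifically, the standard textbook approximation can be computed directly; a small sketch:

```python
import math

def false_positive_rate(m, n, k):
    """Classic approximation for a standard Bloom filter:
    p ~ (1 - e^(-k*n/m))^k, for m bits, n inserted elements, k hashes."""
    return (1 - math.exp(-k * n / m)) ** k

def optimal_k(m, n):
    """The k that minimizes the false positive rate: k = (m/n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))

m, n = 10_000, 1_000                    # 10 bits per element
k = optimal_k(m, n)
print(k, false_positive_rate(m, n, k))  # 7, ~0.008
```

With ~10 bits per element and the optimal number of hash functions, the false positive rate lands under 1%, which is why that configuration shows up so often in practice.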

My Project:

In my semester-long project, the main reference works were these two papers:

a) Fast hash table lookup using extended bloom filter: an aid to network processing (2005 SIGCOMM paper)

b) Building high accuracy bloom filters using partitioned hashing (2007 SIGMETRICS paper)

I worked on improving the false positive rate and the memory access time of the standard Bloom filter.

For the false positive rate, this was done using partitioned hashing and by reducing entropy. For memory access time, it involved implementing schemes where single or multiple copies of the data were stored in secondary storage. For further details, the report is over here
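One common partitioned layout (each hash function owns its own slice of the bit vector) can be sketched as follows. This is my own minimal rendering for illustration; the SIGMETRICS paper's scheme additionally partitions the *keys* across groups of hash functions, which this sketch omits:

```python
import hashlib

class PartitionedBloomFilter:
    """Sketch of a partitioned layout: the m-bit vector is split into k
    equal partitions, and hash function i sets exactly one bit in
    partition i, so the hash functions never collide with each other."""

    def __init__(self, m_bits, k_hashes):
        assert m_bits % k_hashes == 0, "m must split evenly into k partitions"
        self.k = k_hashes
        self.part = m_bits // k_hashes   # bits per partition
        self.bits = [0] * m_bits

    def _positions(self, item):
        for i in range(self.k):
            h = int(hashlib.sha1(f"{i}/{item}".encode()).hexdigest(), 16)
            yield i * self.part + (h % self.part)  # stays inside partition i

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))
```

Compared with the unpartitioned layout, each hash function addresses only m/k bits, which simplifies parallel and hardware implementations at a small cost in false positive rate.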

On Applications of Bloom filters: where do you use them?

I find it both intellectually stimulating and highly irritating when I end up spending time reading literature on clever variations of Bloom filters (like a cache-efficient Bloom filter that would perform better than a standard Bloom filter) when what I want is a solid understanding. I really want to KNOW, at a design level, when a Bloom filter would be a good choice, what its variants are, when to use them, when not to use them, how to implement them, and what you can do with them! Many online sources lack depth, and many research papers put too much emphasis on the math, so the actual usage is less clear.

The crux of the Bloom filter is that it does NOT give false negatives, so it is used to quickly check whether a key might be present; if so, the actual retrieval is done from slower/costlier secondary storage. This could be anything: a hash table, a database, etc.

At a low level, there are two prominent use cases, which I have elaborated below.

Packet classification

Check out this paper:  Fast packet classification using bloom filter

Packet classification at line speed is an engineering design problem. A really great book to learn more about this is Network Algorithmics by George Varghese, Professor at UCSD. For those new to packet classification: if you have 200 million packets arriving per second at a router, and a rule table with entries of the form (Source IP prefix, Destination IP prefix, Source TCP port, Destination TCP port) that specifies what to do with each packet (drop, forward, etc.), how do you achieve this with the least latency, the best memory requirement, and the least cost? Hardware parallelism? Bit tries? Hash tables? It turns out a Bloom filter can be used to improve the naive cross-product algorithm.

File System

Check out this paper: Avoiding the disk bottleneck in the data domain deduplication file system, published in 2008.

The basic idea is this: if you can break files into segments, and keep track of segments which are identical at the binary level, then you can compress the file storage. This is referred to as “identical segment deduplication”. We know that the Linux file system maintains an inode cache in memory to offset the slow reading of inodes from disk. Data Domain’s implementation is difficult to summarize in depth, but they use a Bloom filter to record segments already seen, so they can potentially avoid costly disk look-ups.

Google’s BigTable used Bloom filters, and so does HBase, its open-source version. Here’s a link to the Bloom filter documentation: HBase docs

I spent some time researching the use of Bloom filters. Here’s a bunch of interesting links:

  • Stack Overflow: this thread, and there are a bunch of others too, as the community is very active!
  • This post from Rapleaf was an interesting read.

To summarize what Rapleaf did: they had three components, a hash table with 40 million keys, a database holding those keys, and 500 million candidate keys which needed to be checked against the 40 million. So, they first inserted the hash table entries into a Bloom filter. Then, for each of the 500 million entries, they checked whether it was present in the Bloom filter. Since Bloom filters give false positives, a confirming check had to be made against the database.

  • This is about how Chrome uses Bloom filters.
  • This is an assembly-level Bloom filter!

I haven’t touched upon the math behind the performance analysis (average/best case), else the scope of this blog would spiral out of proportion!

From my experience, reading the performance analysis section of papers on Bloom filter variants is definitely the way to go, even if it might make you tear your hair out :P

I am very curious to learn more about applications of Bloom filters and their variants. Please do comment and let me know if you have something interesting to share!