Forget OLAP cubes and pivot tables! As is the case most of the time, 90% of the insight can be gleaned from very simple data plots. To be sure, the remaining 10% does shed valuable new light, but it is also orders of magnitude more tedious to distill.

One of my favorite tools for performing quick sanity checks on data, and even for inferring high-level trends, is a histogram. It simply partitions your data points into a fixed number of buckets, with each bucket holding the points that fall within a given range. The resulting bucket sizes are then easy to eyeball, often plotted as bars whose lengths are proportional to the number of elements in the corresponding buckets.
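To make the bucketing concrete, here is a minimal Python sketch of the idea. It is not the script discussed in this post; the function names `histogram` and `render` are made up for illustration, and the choice to round star counts up (so that any non-empty bucket shows at least one star) is my own assumption.

```python
import math
import random


def histogram(values, interval):
    """Count values into fixed-width buckets, starting at min(values).

    Hypothetical helper: returns the lower edge of the first bucket and
    a list of per-bucket counts. `interval` must be positive.
    """
    lo = min(values)
    n_buckets = int((max(values) - lo) / interval) + 1
    counts = [0] * n_buckets
    for v in values:
        # The bucket index is the number of whole intervals above the minimum.
        counts[int((v - lo) / interval)] += 1
    return lo, counts


def render(lo, counts, interval, scale):
    """Draw one text line per bucket, one star per `scale` points (rounded up)."""
    lines = []
    for i, count in enumerate(counts):
        start = lo + i * interval
        stars = "*" * math.ceil(count / scale)
        lines.append(f"{start:8.4f} - {start + interval:8.4f} [{count:6d}]: {stars}")
    return "\n".join(lines)


# Usage: 1,000 uniform random numbers in buckets of width 0.1.
rng = random.Random(42)
data = [rng.random() for _ in range(1000)]
lo, counts = histogram(data, 0.1)
print(render(lo, counts, 0.1, scale=10))
```

Starting the first bucket at the smallest value keeps the sketch short; a real tool would likely let you pin the bucket edges to round numbers instead.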

When your data is generated by Unix/Linux scripts, as is typically the case with most LAMP-based systems, migrating it into CSV tables to histogram in Excel, or even firing up your free copy of OpenOffice, is cumbersome overhead. A simple script will often suffice to generate compelling histograms.

Here is one such script that I wrote a long time ago. I have used it so many times that I feel someone else, somewhere else, is bound to benefit from it too.

As an example of its usage, here is the command line and output for generating a histogram over a set of 10K Gaussian random numbers. (As a by-product, observe the cool trick that exploits the Central Limit Theorem to generate an approximately normally distributed random number with an awk one-liner: the sum of a few uniform random numbers is already close to bell-shaped. Normalized Gaussian randoms are useful for deliberately adding controlled noise to a process, for example when selecting the top 3, with variety, out of a ranked list of 10 ads to show on a publication.)

    $ gawk 'BEGIN {for(i=0;i<1e4;i++)print rand()+rand()+rand()}' \
        | histo -stars -scale 50 -interval 0.15
    # NumSamples = 10000; Max = 2.94838; Min = 0.0360424
    # Mean = 1.5030500454; Variance = 0.253799739614845; SD = 0.503785410283828
    # Each * represents a count of 50
    0.0360 - 0.1860 [   13]: *
    0.1860 - 0.3360 [   58]: **
    0.3360 - 0.4860 [  125]: ***
    0.4860 - 0.6360 [  250]: *****
    0.6360 - 0.7860 [  374]: ********
    0.7860 - 0.9360 [  552]: ************
    0.9360 - 1.0860 [  758]: ****************
    1.0860 - 1.2360 [  942]: *******************
    1.2360 - 1.3860 [ 1026]: *********************
    1.3860 - 1.5360 [ 1130]: ***********************
    1.5360 - 1.6860 [ 1145]: ***********************
    1.6860 - 1.8360 [ 1006]: *********************
    1.8360 - 1.9860 [  854]: ******************
    1.9860 - 2.1360 [  652]: **************
    2.1360 - 2.2860 [  501]: ***********
    2.2860 - 2.4360 [  306]: *******
    2.4360 - 2.5860 [  174]: ****
    2.5860 - 2.7360 [  108]: ***
    2.7360 - 2.8860 [   25]: *
    2.8860 - 3.0360 [    1]: *
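The Central Limit Theorem trick in that command can be sketched in Python as well. A sum of three uniform randoms on [0, 1) has mean 3/2 and variance 3/12 = 0.25, which agrees with the mean and variance reported in the output above, so subtracting 1.5 and dividing by 0.5 gives a roughly standard normal value. The helper names `approx_standard_normal` and `noisy_top_k`, and the idea of adding the noise directly to ad scores, are my own illustrative assumptions rather than anything taken from a real ad system.

```python
import random


def approx_standard_normal(rng, n=3):
    """Approximate a standard normal draw by summing n uniforms.

    The sum of n uniforms on [0, 1) has mean n/2 and variance n/12,
    so the result is shifted and scaled to zero mean and unit variance.
    It is only an approximation: with n=3 it never leaves [-3, 3].
    """
    total = sum(rng.random() for _ in range(n))
    return (total - n / 2) / (n / 12) ** 0.5


def noisy_top_k(scores, k, noise, rng):
    """Return the indices of the k best scores after adding scaled noise.

    Hypothetical helper: `noise` sets how strongly the ranking is
    shuffled; noise=0 reproduces the plain top-k by score.
    """
    jittered = [
        (score + noise * approx_standard_normal(rng), index)
        for index, score in enumerate(scores)
    ]
    jittered.sort(reverse=True)
    return [index for _, index in jittered[:k]]


# Usage: pick 3 of 10 ranked ads, with some variety between calls.
rng = random.Random(7)
ad_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
for _ in range(3):
    print(noisy_top_k(ad_scores, k=3, noise=1.5, rng=rng))
```

Raising `noise` trades ranking fidelity for variety, so in practice you would tune it against the spread of your scores.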

Enjoy!
