This program shows the data density of a two-dimensional distribution. This can't be shown easily on a scatter plot, because the points rapidly become too dense to tell apart. There is no definitive way of defining data density at any point, so the program sums, over all the data points, the reciprocal of the squared distance from the pixel to each point. This definition leads to the problem that data density goes to infinity at a pixel that lies exactly on a point. To avoid this, a small fudge factor is added to each squared distance.
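A minimal sketch of that definition follows. It assumes the fudge factor is added to every squared distance before taking the reciprocal. The function name density_at and its signature are illustrative, not taken from the program's source.

```c
#include <assert.h>
#include <stddef.h>

/* Data density at pixel (px, py): the sum over all n points of
   1 / (squared distance + fudge). The fudge factor keeps the value
   finite when the pixel lies exactly on a point. */
double density_at(double px, double py,
                  const double *x, const double *y, size_t n,
                  double fudge)
{
    double total = 0.0;
    size_t i;

    assert(fudge > 0.0);
    for (i = 0; i < n; i++) {
        double dx = px - x[i];
        double dy = py - y[i];
        total += 1.0 / (dx * dx + dy * dy + fudge);
    }
    return total;
}
```

For points at (0, 0) and (3, 4) with a fudge factor of 1, the density at the origin is 1/(0 + 1) + 1/(25 + 1), roughly 1.038.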
The program is implemented in straightforward ANSI C and should compile on any platform with a C compiler.
Here's an example. The data density plot gives a much better visual fix on the data density than the scatterplot. The data file used to generate the plots is fourgaaussians.csv. A second file, gaussplusuni.csv, has four columns and is used to show how the x and y columns can be selected. Columns "A" and "B" hold the four Gaussians, while columns "C" and "D" hold the Gaussians with a uniform scatter added.
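The exact options for choosing columns aren't described here. As a sketch, assuming the first line of the CSV file holds plain comma-separated column names such as A,B,C,D, a column can be located by name like this. The helper column_index is hypothetical and is not part of the program's interface.

```c
#include <assert.h>
#include <string.h>

/* Return the zero-based index of the column called name in a
   comma-separated header line such as "A,B,C,D", or -1 if absent.
   Assumes unquoted names with no embedded commas. */
int column_index(const char *header, const char *name)
{
    size_t len;
    const char *p;
    int index = 0;

    assert(header != NULL && name != NULL);
    len = strlen(name);
    p = header;
    for (;;) {
        const char *end = strchr(p, ',');
        /* The last field runs to the end of the line. */
        size_t field = end ? (size_t)(end - p) : strcspn(p, "\r\n");
        if (field == len && strncmp(p, name, len) == 0)
            return index;
        if (end == NULL)
            return -1;
        p = end + 1;
        index++;
    }
}
```

With the header of gaussplusuni.csv, selecting "C" and "D" as x and y would give column indices 2 and 3.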
datadensity infile.csv densitymap.jpeg
This will create a data density map with default settings.
Firstly, we might have a data set where x is much bigger or smaller than y. For instance, x might be height in inches, whilst y could be salary in pounds. Just feeding in the raw data won't give sensible results, because distances along the axis with the larger values swamp those along the other. However, we can multiply x by the xscale factor to bring the data into more or less the same range.
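The text doesn't say how to pick the xscale factor. One simple heuristic, offered here as an assumption rather than the program's own behaviour, is to choose the factor that makes the x range equal to the y range. The function suggest_xscale is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Suggest an x scale factor that stretches or shrinks the x range to
   match the y range, so both axes contribute comparably to the squared
   distances used in the density calculation. Returns 1.0 if all x
   values are equal. */
double suggest_xscale(const double *x, const double *y, size_t n)
{
    double xmin, xmax, ymin, ymax;
    size_t i;

    assert(n > 0);
    xmin = xmax = x[0];
    ymin = ymax = y[0];
    for (i = 1; i < n; i++) {
        if (x[i] < xmin) xmin = x[i];
        if (x[i] > xmax) xmax = x[i];
        if (y[i] < ymin) ymin = y[i];
        if (y[i] > ymax) ymax = y[i];
    }
    if (xmax == xmin)
        return 1.0;
    return (ymax - ymin) / (xmax - xmin);
}
```

For heights spanning 60 to 72 inches and salaries spanning 20000 to 50000 pounds, this suggests an xscale of 2500.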
Then we need to play about with the fudge factor. The higher the fudge factor, the more the data is smeared. Make the fudge factor large, and the plot congeals together, giving only a rough idea of the density. Make it too small, and the plot reverts to what is effectively a scatterplot. There's no "right" value for the fudge factor; it depends on what question you're asking of the data.
Plots of the same data with fudge factors of 0.01, 0.1, 0.5 and 3.
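The smearing effect can be seen with just two points, assuming the density is the sum of 1 / (squared distance + fudge) over the points. This toy sketch places points at (-1, 0) and (1, 0) and compares the density on a point with the density midway between them. The function names are illustrative only.

```c
#include <assert.h>

/* Density at (px, 0) from two points at (-1, 0) and (1, 0). */
static double two_point_density(double px, double fudge)
{
    double d1 = px + 1.0;
    double d2 = px - 1.0;

    assert(fudge > 0.0);
    return 1.0 / (d1 * d1 + fudge) + 1.0 / (d2 * d2 + fudge);
}

/* Ratio of the density on a point to the density midway between the
   points. Large values mean two sharp spots; values below 1 mean the
   points have merged into a single blob. */
double point_to_midpoint_ratio(double fudge)
{
    return two_point_density(1.0, fudge) / two_point_density(0.0, fudge);
}
```

For fudge factors of 0.01, 0.1, 0.5 and 3 the ratio is roughly 50.6, 5.6, 1.7 and 0.95, so at a fudge factor of 3 the two points have congealed into one peak. With real data the crossover depends on how far apart the points are in the (scaled) units.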
If you just want to look at the data density plot casually, to get an idea of what the data looks like, the default output settings will be good enough. But if you want a plot for publication, you'll want to change the output settings. The width and height of the plot in pixels can be set. By default, the range of the plot runs from the minimum to the maximum of the points actually in the data, but this can be overridden. JPEG, GIF or BMP files can be output.
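Mapping data coordinates onto a fixed number of pixels is a linear rescaling of the plot range. Here is a sketch, assuming the range defaults to the data minimum and maximum and that out-of-range values are clamped to the edge. The function to_pixel is hypothetical.

```c
#include <assert.h>

/* Map a data coordinate v in [lo, hi] to a pixel index in
   [0, npix - 1], rounding to the nearest pixel. lo and hi would
   normally be the data minimum and maximum; passing other values
   overrides the plotted range. Values outside the range are clamped. */
int to_pixel(double v, double lo, double hi, int npix)
{
    double t;

    assert(hi > lo && npix > 0);
    t = (v - lo) / (hi - lo);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return (int)(t * (npix - 1) + 0.5);
}
```

Image rows usually count downwards from the top, so a y value would typically be placed at row height - 1 - to_pixel(y, ymin, ymax, height).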
The colourscheme is the fun part. By default the plot uses the jet palette, in which cold colours represent areas of low density and hot colours areas of high density. But you can change this. Palettes incorporated into the program include
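The jet palette runs from dark blue through cyan, green and yellow to dark red. The sketch below uses a widely used piecewise-linear approximation of jet; it is not necessarily the exact table built into the program. It assumes the density has already been normalised to t in [0, 1].

```c
#include <assert.h>

typedef struct { unsigned char r, g, b; } RGB;

static double absd(double v) { return v < 0.0 ? -v : v; }
static double clamp01(double v) { return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v); }

/* Approximate jet colour for a normalised density t: t = 0 (lowest
   density) is dark blue, t = 0.5 is light green, t = 1 (highest
   density) is dark red. */
RGB jet(double t)
{
    RGB c;
    double r, g, b;

    t = clamp01(t);
    r = clamp01(1.5 - absd(4.0 * t - 3.0));
    g = clamp01(1.5 - absd(4.0 * t - 2.0));
    b = clamp01(1.5 - absd(4.0 * t - 1.0));
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);
    c.r = (unsigned char)(r * 255.0 + 0.5);
    c.g = (unsigned char)(g * 255.0 + 0.5);
    c.b = (unsigned char)(b * 255.0 + 0.5);
    return c;
}
```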
If you want to use your own colourscheme, it's a bit tricky but possible. Set up a GIF or BMP file with a palette corresponding to the colourscheme you want, then pass the name of that file as the argument to -colourscheme. The program then uses that file's palette.
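To check which colours a GIF file actually carries, its global colour table can be read directly: in the GIF87a/GIF89a format, byte 10 is a packed field whose top bit flags a global colour table and whose low three bits give its size, and the table of RGB triples starts at byte 13. This sketch reads it from a file already loaded into memory. How the program itself orders or interprets palette entries is not stated, so treat the helper gif_global_palette as hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read the global colour table from a GIF file held in memory.
   Copies the RGB triples (3 bytes each, at most 256 entries) into pal
   and returns the number of entries, or -1 if the data is not a GIF
   with a global colour table. */
int gif_global_palette(const unsigned char *buf, size_t len,
                       unsigned char pal[256 * 3])
{
    unsigned char packed;
    int entries;

    assert(buf != NULL && pal != NULL);
    if (len < 13 || (memcmp(buf, "GIF87a", 6) != 0 &&
                     memcmp(buf, "GIF89a", 6) != 0))
        return -1;
    packed = buf[10];
    if (!(packed & 0x80))            /* no global colour table */
        return -1;
    entries = 1 << ((packed & 0x07) + 1);
    if (len < 13 + (size_t)entries * 3)
        return -1;
    memcpy(pal, buf + 13, (size_t)entries * 3);
    return entries;
}
```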
Here's the same image rendered with the zebra palette.
There are too many lines. This is easily fixed by setting -levels to 12.
Now the image is suitable for black-and-white reproduction.
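The internals of the zebra palette and -levels aren't described here. Assuming -levels sets how many bands the normalised density is divided into, and that zebra alternates black and white bands, fewer levels means fewer, wider stripes. The functions band and zebra_grey are hypothetical.

```c
#include <assert.h>

/* Quantise a normalised density t in [0, 1] into one of `levels` bands. */
int band(double t, int levels)
{
    int b;

    assert(levels > 0);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    b = (int)(t * levels);
    return b < levels ? b : levels - 1;   /* t == 1 goes in the top band */
}

/* Zebra-style shading: alternate bands are black (0) and white (255). */
unsigned char zebra_grey(double t, int levels)
{
    return (band(t, levels) % 2 == 0) ? 0 : 255;
}
```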
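Note that if you have a lot of points, the calculation may take some time. With the definition above, every pixel's value is a sum over every point, so the work grows with the number of pixels multiplied by the number of points.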
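A straightforward grid calculation makes the cost visible. This sketch assumes square pixels of side step starting at (x0, y0) and the same 1 / (squared distance + fudge) sum; it counts the inner-loop evaluations to show they equal width x height x n. The function density_grid is illustrative, not the program's code.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a width x height grid with data densities, where pixel (i, j)
   sits at data coordinates (x0 + i * step, y0 + j * step). Every pixel
   visits every point; the return value counts those evaluations. */
unsigned long long density_grid(double *grid, int width, int height,
                                double x0, double y0, double step,
                                const double *x, const double *y, size_t n,
                                double fudge)
{
    unsigned long long evaluations = 0;
    int i, j;
    size_t k;

    assert(width > 0 && height > 0 && fudge > 0.0);
    for (j = 0; j < height; j++) {
        for (i = 0; i < width; i++) {
            double px = x0 + i * step;
            double py = y0 + j * step;
            double total = 0.0;
            for (k = 0; k < n; k++) {
                double dx = px - x[k];
                double dy = py - y[k];
                total += 1.0 / (dx * dx + dy * dy + fudge);
                evaluations++;
            }
            grid[(size_t)j * width + i] = total;
        }
    }
    return evaluations;
}
```

For example, an 800 by 600 image of 100,000 points needs 4.8 × 10^10 evaluations of the inner sum.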