Gretl Manual: Gnu Regression, Econometrics and Time-series Library
Chapter 8. Graphs and plots
Boxplots are not generated using gnuplot, but rather by gretl itself.
These plots (after Tukey and Chambers) display the distribution of a variable. The central box encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The "whiskers" extend to the minimum and maximum values. A line is drawn across the box at the median.
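As a rough illustration of what the box and whiskers summarize, the Python sketch below computes the five values a boxplot is built from. It also computes the 1.5 times interquartile-range fences that the optional outliers setting, described below, uses to mark points. It assumes the "inclusive" quartile method of Python's statistics.quantiles. Gretl may use a slightly different quartile convention, so small samples can give slightly different quartiles. The function names are illustrative and are not part of gretl.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers.

    Assumption: quartiles come from statistics.quantiles(method="inclusive");
    gretl's own quartile convention may differ slightly.
    """
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, med, q3, xs[-1]

def outliers(data):
    """Return points lying more than 1.5 * IQR beyond the central box."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]
```

For example, five_number_summary(range(1, 10)) gives (1, 3.0, 5.0, 7.0, 9), and outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]) flags only 100.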
In the case of notched boxes, the notch shows the limits of an approximate 90 percent confidence interval. This is obtained by the bootstrap method, which can take a while if the data series is very long.
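The percentile bootstrap below is one way to compute such an interval. It resamples the data with replacement, takes the median of each resample, and reads off the 5th and 95th percentiles of those medians. The number of resamples (1000) and the plain percentile method are assumptions chosen for illustration. The manual does not say exactly how gretl builds its interval. The sketch also shows why long series are slow: every replication computes the median of a full-length resample.

```python
import random
import statistics

def bootstrap_median_ci(data, level=0.90, reps=1000, seed=None):
    """Approximate confidence interval for the median (percentile bootstrap).

    Assumptions: 1000 resamples and the percentile method; gretl's
    replication count and interval construction may differ.
    """
    rng = random.Random(seed)
    n = len(data)
    # Median of each resample drawn with replacement, sorted ascending.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(reps)
    )
    # Cut off (1 - level) / 2 of the replications in each tail.
    k = round((1.0 - level) / 2.0 * reps)
    return medians[k], medians[reps - 1 - k]
```

Passing a fixed seed makes the result reproducible. Without a seed, repeated runs on the same data give slightly different limits, as any bootstrap interval does.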
Clicking the mouse in the boxplots window brings up a menu that lets you save the plots as Encapsulated PostScript (EPS) or as a full-page PostScript file. Under the X Window System you can also save the window as an XPM file; under MS Windows you can copy it to the clipboard as a bitmap. The menu also lets you open a summary window that displays five-number summaries (minimum, first quartile, median, third quartile, maximum), plus a confidence interval for the median if the "notched" option was chosen.
Some details of gretl's boxplots can be controlled via a plain-text file named .boxplotrc. Gretl looks for this file, in turn, in the current working directory, the user's home directory (given by the environment variable HOME) and the gretl user directory (which is displayed and may be changed under the "File, Preferences, General" menu). The options that can be set in this way are: the font used for PostScript output (font; must be a valid generic PostScript font name; default Helvetica); the font size in points, also for PostScript output (fontsize; default 12); the minimum and maximum of the y-axis range (min, max); the width and height of the plot in pixels (width, height; default 560 x 448); whether numerical values should be printed for the quartiles and median (numbers; default, don't print them); and whether outliers, that is, points lying more than 1.5 times the interquartile range beyond the central box, should be marked separately (outliers; default, no). Here is an example:
font = Times-Roman
fontsize = 16
max = 4.0
min = 0
width = 400
height = 448
numbers = %3.2f
outliers = true
On the second-to-last line, the value associated with numbers is a "printf" format string, as in the C programming language. If a numbers entry is given, it controls the printing of the median and quartiles next to the boxplot; if it is omitted, these values are not printed. In the example, each value is printed with 2 digits after the decimal point in a field at least 3 characters wide. The 3 is a minimum field width (counting the decimal point), not a number of digits.
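Python's % string operator follows C printf rules for these conversions, so it offers a quick way to preview how a numbers format string renders a value. This assumes gretl hands the string to a C printf-style routine, as the text above indicates. The sample values are arbitrary.

```python
# Render a few values with printf-style format strings.
# repr() makes leading padding spaces visible.
for fmt, value in [("%3.2f", 0.5), ("%3.2f", 12.5), ("%6.2f", 0.5)]:
    print(fmt, "->", repr(fmt % value))
```

This prints '0.50', '12.50' and '  0.50'. The first two results are already wider than 3 characters, so %3.2f adds no padding. Only the wider %6.2f field pads on the left.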
Not all of the options need be specified, and the order doesn't matter. Lines not matching the pattern "key = value" are ignored, as are lines that begin with the hash mark, #.
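The following is a minimal Python sketch of the lookup and parsing rules described above. It assumes that the first .boxplotrc found is the one used (search order: current directory, then HOME, then the gretl user directory), and it leaves all values as strings. Gretl's actual parser may convert types and handle edge cases differently. The gretl_user_dir parameter is hypothetical and stands in for the directory set under "File, Preferences, General".

```python
import os
import re

# Matches "key = value", with optional spaces around "=" and trailing spaces trimmed.
_KEY_VALUE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")

def parse_boxplotrc(text):
    """Parse .boxplotrc-style text into a dict of string values.

    Lines beginning with '#' and lines not matching "key = value" are ignored.
    """
    opts = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        match = _KEY_VALUE.match(line)
        if match:
            opts[match.group(1)] = match.group(2)
    return opts

def find_boxplotrc(gretl_user_dir):
    """Return the path of the first .boxplotrc found, or None.

    Search order: current directory, $HOME, then gretl_user_dir
    (a hypothetical parameter standing in for the gretl user directory).
    """
    for directory in (os.getcwd(), os.environ.get("HOME"), gretl_user_dir):
        if directory:
            path = os.path.join(directory, ".boxplotrc")
            if os.path.isfile(path):
                return path
    return None
```

With the example file shown earlier, parse_boxplotrc would return {"font": "Times-Roman", "fontsize": "16", ..., "outliers": "true"}. A real reader would then convert fontsize, min, max, width and height to numbers.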
After each variable specified in the boxplot command, you may add a parenthesized boolean expression to restrict the sample used for that variable. A space must separate the variable name or number from the expression. Suppose you have salary figures for men and women, and a dummy variable GENDER with value 1 for men and 0 for women. You could then draw comparative boxplots with the following line in the boxplots dialog:
salary (GENDER=1) salary (GENDER=0)
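To make the effect of the parenthesized condition concrete, the Python sketch below performs the same per-variable sample restriction by hand. It keeps only the salary observations for which the GENDER condition holds. The data are made up purely for illustration, and the restrict helper is hypothetical. This is not how gretl implements the feature internally.

```python
import statistics

# Hypothetical illustrative data, not from any real dataset.
salary = [52.0, 48.5, 61.0, 45.0, 58.5, 50.0, 47.5, 63.0]
gender = [1, 0, 1, 0, 1, 0, 0, 1]  # GENDER dummy: 1 = men, 0 = women

def restrict(values, condition):
    """Keep values whose parallel boolean condition is True,
    mimicking the effect of e.g. `salary (GENDER=1)`."""
    return [v for v, keep in zip(values, condition) if keep]

men = restrict(salary, [g == 1 for g in gender])
women = restrict(salary, [g == 0 for g in gender])

for label, group in (("GENDER=1", men), ("GENDER=0", women)):
    print(label, len(group), statistics.median(group))
```

Each restricted series then gets its own box. Here the two medians are 59.75 for GENDER=1 and 48.0 for GENDER=0.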