Chapter 7 Base graphics

7.1 Basic constructions and scatter plots

We have already seen how to add a line to a scatter plot. To complete the graph, we can specify a title, change the axis labels, and add a legend:
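A minimal sketch of these steps, assuming the built-in cars data set as a stand-in; the title, labels, and colors are illustrative choices rather than the chapter's own:

```r
# Assumption: the built-in 'cars' data set (speed, dist) stands in for the data.
fit <- lm(dist ~ speed, data = cars)

plot(cars$speed, cars$dist,
     main = "Stopping distance versus speed",  # title
     xlab = "Speed (mph)",                     # x-axis label
     ylab = "Stopping distance (ft)")          # y-axis label
abline(fit, col = "red")                       # regression line
legend("topleft", legend = "Regression line", col = "red", lty = 1)
```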

We can also add a third dimension by coloring the points according to a condition:
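One possible sketch, again assuming the cars data set; the 15 mph threshold is an arbitrary condition chosen only for illustration:

```r
# Assumption: 'cars' data; the threshold of 15 mph is arbitrary.
fast <- cars$speed > 15
point_col <- ifelse(fast, "red", "blue")  # one color per observation

plot(cars$speed, cars$dist, col = point_col, pch = 19,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
legend("topleft", legend = c("speed > 15", "speed <= 15"),
       col = c("red", "blue"), pch = 19)
```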

In general, abline(a, b) adds a line with intercept a and slope b to an existing plot. For example, an alternative way to add the regression line is as follows:
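A sketch of this alternative, assuming a simple linear model fitted to the cars data set; the intercept and slope are extracted from the fit and passed to abline() explicitly:

```r
# Assumption: 'cars' data and a simple linear regression of dist on speed.
fit <- lm(dist ~ speed, data = cars)
a <- coef(fit)[["(Intercept)"]]  # intercept
b <- coef(fit)[["speed"]]        # slope

plot(cars$speed, cars$dist)
abline(a = a, b = b, col = "red")
```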

To add any line (not necessarily a straight line), we use the lines() function. For example, to add the lowess (locally weighted polynomial regression) smoothing curve:
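A minimal sketch, assuming the cars data set. lowess() returns a list with components x and y, which lines() can draw directly:

```r
# Assumption: 'cars' data; default smoothing span of lowess().
plot(cars$speed, cars$dist)
smooth <- lowess(cars$speed, cars$dist)
lines(smooth, col = "blue", lwd = 2)
```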

We can add new points using points():
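For instance, the point of means could be marked on the scatter plot. This is a sketch assuming the cars data set; the symbol and color are arbitrary:

```r
# Assumption: 'cars' data; the added point is the pair of sample means.
plot(cars$speed, cars$dist)
mean_point <- c(mean(cars$speed), mean(cars$dist))
points(mean_point[1], mean_point[2], col = "red", pch = 17, cex = 2)
```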

We can control the range of the axes using the xlim and ylim arguments:
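A short sketch, assuming the cars data set; the ranges below are illustrative and simply chosen wide enough to contain all the points:

```r
# Assumption: 'cars' data; the axis ranges are arbitrary illustrative values.
x_range <- c(0, 30)
y_range <- c(0, 150)
plot(cars$speed, cars$dist, xlim = x_range, ylim = y_range)
```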

7.2 Line graphs

The lines() function adds a line to an existing graph and cannot be used to create a new graph. To draw a line connecting the successive elements of a vector, we instead use plot(x = vector, type = "l"):
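A minimal sketch, assuming a simulated random walk as the vector to plot (the data are not from the text):

```r
# Assumption: a simulated random walk stands in for the vector of interest.
set.seed(1)
walk <- cumsum(rnorm(100))
plot(x = walk, type = "l", xlab = "Index", ylab = "Value")
```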

The type parameter allows you to connect (or not) the points in different ways:
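The sketch below draws the same small, made-up vector with several values of type, one panel per value, so the differences can be compared side by side:

```r
# Assumption: a small made-up vector; only a subset of the possible types.
y <- c(2, 5, 3, 8, 6)
types <- c("p", "l", "b", "o", "s")  # points, lines, both, overplotted, steps

old_par <- par(mfrow = c(2, 3))      # 2 x 3 grid of panels
for (tp in types) {
  plot(y, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)                         # restore the previous layout
```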

With type = "h" we get vertical lines, which gives a graph resembling a bar plot:
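A minimal sketch with made-up counts; a larger lwd is used here only so that the vertical lines look like bars:

```r
# Assumption: made-up counts; lwd is increased purely for visual effect.
counts <- c(3, 7, 4, 9, 5)
plot(counts, type = "h", lwd = 8, ylim = c(0, max(counts)),
     xlab = "Category", ylab = "Count")
```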

7.3 Graphical parameters

We have already seen that it is possible to control certain graphical elements using parameters:

Element                      Parameter
point symbol                 pch
type of line graph           type
color                        col
line type and width          lty, lwd
axis labels                  xlab, ylab
axis ranges                  xlim, ylim
text and symbol size         cex
orientation of axis labels   las

See ?par for a description of the values that these parameters can take.

To impose parameters on all the graphics produced during a session, we use the par() function. par() is often used to view two or more plots in the same window with the parameter mfrow = c(nr, nc). In this case the graphs are displayed in a grid with nr rows and nc columns. Try the following:
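A sketch of a one-row, two-column layout, assuming the cars data set. Saving the value returned by par() and restoring it afterwards is an optional habit, not something the text requires:

```r
# Assumption: 'cars' data; two plots side by side.
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns
plot(cars$speed, cars$dist, main = "Scatter plot")
hist(cars$dist, main = "Histogram")
par(old_par)                      # optional: restore the previous settings
```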

Changes remain in effect until the session is closed, or until the graphics device is reset by dev.off() or by clicking on the Clear all plots brush in RStudio.

par() is also used to change the size of the margins. This is sometimes useful when the labels on the axes do not fit in the window, as in this example:
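One way to reproduce the problem, sketched with made-up values and deliberately long labels: with the default margins, the horizontal labels of this bar plot are cut off on the left.

```r
# Assumption: made-up values with deliberately long category labels.
values <- c(12, 7, 15)
names(values) <- c("A rather long category label",
                   "Another lengthy label",
                   "Short")
barplot(values, horiz = TRUE, las = 1)  # las = 1: labels drawn horizontally
```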

We modify the parameter which sets the left margin:
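A sketch of the fix, using the same made-up values. The mar parameter gives the margins in lines of text, in the order bottom, left, top, right; the value 14 for the left margin is a guess that suits these particular labels.

```r
# Assumption: made-up values; a left margin of 14 lines is an illustrative choice.
values <- c(12, 7, 15)
names(values) <- c("A rather long category label",
                   "Another lengthy label",
                   "Short")

old_par <- par(mar = c(5, 14, 4, 2) + 0.1)  # bottom, left, top, right
barplot(values, horiz = TRUE, las = 1)
par(old_par)                                # restore the previous margins
```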

7.4 Histograms

As already noted in subsection 6.2.1, hist() by default displays the histogram of counts. To display the histogram on the density scale, we use the option freq = FALSE. In this case, the area of each rectangle is equal to the proportion of observations in the corresponding class (so that the total area of all the rectangles is one).
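The claim about the total area can be checked numerically. This sketch assumes the cars data set and uses the density and breaks components of the object returned by hist():

```r
# Assumption: 'cars' data; default choice of classes.
h <- hist(cars$dist, freq = FALSE, main = "Histogram on the density scale")

# Area of each rectangle = height (density) x width of the class.
total_area <- sum(h$density * diff(h$breaks))
```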

To superimpose the curve of a given density:
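A minimal sketch, assuming a simulated standard normal sample so that the true density is known; curve() with add = TRUE draws on top of the existing histogram:

```r
# Assumption: simulated N(0, 1) sample; the overlaid curve is the true density.
set.seed(42)
z <- rnorm(500)
hist(z, freq = FALSE, main = "Histogram with normal density")
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)
```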

Rather than viewing the histogram of the data, we can show the estimated density (using kernel density estimation methods):
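A sketch using density() with its default kernel and bandwidth, assuming the cars data set:

```r
# Assumption: 'cars' data; default kernel and bandwidth of density().
d <- density(cars$dist)
plot(d, main = "Kernel density estimate of stopping distance")
```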

7.5 Exercises

  1. We consider the midwest dataset from the ggplot2 package (an advanced graphics package).
    • Install and load the package.
    • Consult the help for the description of the variables in midwest.
    • Reproduce the following graphics using base graphics functions:

  2. We consider the adult dataset available on the site archive.ics.uci.edu/ml/datasets/Adult. The dataset consists of 48,842 rows and 14 columns.
    • Import the data into R from the file adult.data. Look in the adult.names file for the names of the variables.
    • Describe each variable appropriately according to its type.
    • Describe the relationship between the variables age and class.