Chapter 4 Vectors, matrices and functions

4.1 Vectors

The basic object in R is the vector (a scalar is simply a vector of length one). The most commonly used function to create a vector is the concatenation function c():

price <- c(150, 162, 155, 157); price

## [1] 150 162 155 157

Indexing is done through brackets:

price[1] # unlike in Python, indexing starts at 1

## [1] 150

price[c(1,3)] # to extract the 1st and 3rd elements

## [1] 150 155

price[-(1:2)] # to extract all elements except the 1st and 2nd

## [1] 155 157

One can also index with a boolean vector: the extracted elements are those corresponding to the TRUE values. For example, to extract prices greater than 156:

price > 156 # the boolean vector

## [1] FALSE  TRUE FALSE  TRUE

price[price > 156]

## [1] 162 157

An alternative is the which() function, which returns the indices of the elements that satisfy a logical condition:

which(price > 155)

## [1] 2 4

price[which(price > 156)]

## [1] 162 157
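The two approaches are not quite interchangeable when the vector contains missing values. The sketch below uses a hypothetical vector price.na (it is not part of the examples above) to show the difference: logical indexing returns an NA for every NA in the condition, whereas which() only returns the positions where the condition is TRUE.

```r
# price.na is a hypothetical vector introduced only for this illustration
price.na <- c(150, NA, 155, 157)

# Logical indexing: the NA in the condition produces an NA in the result
by.logical <- price.na[price.na > 156]

# which() keeps only the positions where the condition is TRUE
by.which <- price.na[which(price.na > 156)]

by.logical  # NA 157
by.which    # 157
```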

You can use the indexing to change an element:

price[1] <- 0; price

## [1]   0 162 155 157

It is possible to give labels to the elements of a vector and extract elements based on them:

names(price)

## NULL

# NULL is a special object of mode "NULL" that represents an empty object (here: no names)
names(price) <- c('model.1', 'model.2', 'model.3', 'model.4')
price

## model.1 model.2 model.3 model.4 
##       0     162     155     157

price['model.3']

## model.3 
##     155
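Labels can also be given when the vector is created, and several labels can be used at once. A short sketch, with a hypothetical vector stock that is not part of the examples above:

```r
# stock is a hypothetical named vector, built in a single step
stock <- c(apples = 3, pears = 0, plums = 7)

stock.names <- names(stock)               # "apples" "pears" "plums"
stock.sub <- stock[c('plums', 'apples')]  # several labels, in the requested order
stock.bare <- unname(stock)               # same values without labels: 3 0 7

stock.sub
```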

In a vector, all the elements must have the same mode; when modes are mixed, R converts every element to the most general one:

x <- c(1,2, 'a', 'b'); x

## [1] "1" "2" "a" "b"

mode(x)

## [1] "character"
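The conversion follows a fixed order: logical values are promoted to numbers, and numbers to character strings. The lines below are a small illustration of that rule; the variable names are chosen only for this example.

```r
# Hypothetical vectors mixing modes
num.vec <- c(TRUE, 2)     # the logical is promoted to numeric: 1 2
chr.vec <- c(TRUE, 'a')   # everything becomes character: "TRUE" "a"
mix.vec <- c(1.5, 'a')    # "1.5" "a"

mode(num.vec)  # "numeric"
mode(chr.vec)  # "character"
```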

To generate the vector of the first \(n\) integers we use the syntax 1:n:

1:10

##  [1]  1  2  3  4  5  6  7  8  9 10

2:6

## [1] 2 3 4 5 6

To generate more general sequences we use the seq() function:

seq(from = 2, to = 20, by = 2) # or more simply seq(2,20,2)

##  [1]  2  4  6  8 10 12 14 16 18 20
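Two variations are often useful: asking seq() for a given number of points rather than a step, and using seq_len() instead of 1:n when n may be zero. The sketch below only illustrates these points; the variable names are hypothetical.

```r
# A fixed number of equally spaced points instead of a fixed step
grid <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# A decreasing sequence needs a negative step
down <- seq(10, 1, by = -3)         # 10 7 4 1

# With n = 0, 1:n counts down from 1 to 0, whereas seq_len(n) is empty
n <- 0
colon.seq <- 1:n         # 1 0
safe.seq <- seq_len(n)   # integer(0)
```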

We can create a vector of repeated elements with rep():

rep(1, length.out = 3) # same thing as rep(1, 3)

## [1] 1 1 1

rep(NA, 4)

## [1] NA NA NA NA
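rep() also accepts a vector as its first argument, in which case the times and each arguments control how the repetition is done. A brief illustration, with hypothetical variable names:

```r
# Repeat the whole vector three times
whole <- rep(c(1, 2), times = 3)             # 1 2 1 2 1 2

# Repeat each element three times before moving to the next
each.elt <- rep(c(1, 2), each = 3)           # 1 1 1 2 2 2

# A vector of counts: one count per element
counts <- rep(c('a', 'b'), times = c(2, 1))  # "a" "a" "b"
```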

4.2 Matrices

A matrix is a vector with a dim attribute of length two. All the elements of a matrix therefore have the same mode. To create a matrix:

M <- matrix(2:7, nrow = 2, ncol = 3); M

##      [,1] [,2] [,3]
## [1,]    2    4    6
## [2,]    3    5    7

matrix(2:7, nrow = 2, ncol = 3, byrow = TRUE)

##      [,1] [,2] [,3]
## [1,]    2    3    4
## [2,]    5    6    7
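The statement that a matrix is a vector with a dim attribute can be checked directly. In the sketch below, x.vec and M.demo are hypothetical names used only for this illustration: setting the dim attribute of a plain vector is enough to turn it into a matrix.

```r
# Start from a plain vector and give it a dim attribute of length two
x.vec <- 2:7
dim(x.vec) <- c(2, 3)

# The same matrix built with matrix()
M.demo <- matrix(2:7, nrow = 2, ncol = 3)

is.matrix(x.vec)          # TRUE
names(attributes(x.vec))  # "dim"
all(x.vec == M.demo)      # TRUE: same values in the same positions
length(M.demo)            # 6: the underlying vector still has six elements
```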

By default, matrix() fills the new matrix column by column; set byrow = TRUE to fill it row by row. Indexing is done through brackets:

M[2,] # 2nd row

## [1] 3 5 7

M[, 3] # 3rd column

## [1] 6 7

M[2,3] # element in the 2nd row and 3rd column

## [1] 7

M[3]

## [1] 4

M[, -2] # to extract all columns except the 2nd

##      [,1] [,2]
## [1,]    2    6
## [2,]    3    7
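Extracting a single row or column returns a plain vector, which loses the matrix structure. The sketch below, using a hypothetical copy M.demo of the matrix above, shows the drop = FALSE argument that keeps the result as a matrix, as well as logical indexing on a matrix.

```r
# M.demo is a hypothetical copy of the 2 x 3 matrix used above
M.demo <- matrix(2:7, nrow = 2, ncol = 3)

row2.vec <- M.demo[2, ]                # plain vector: 3 5 7
row2.mat <- M.demo[2, , drop = FALSE]  # still a matrix, with 1 row and 3 columns

dim(row2.vec)  # NULL
dim(row2.mat)  # 1 3

# Logical indexing treats the matrix as a vector stored column by column
large <- M.demo[M.demo > 4]            # 5 6 7
```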

To vertically (resp. horizontally) merge two matrices we use rbind() (resp. cbind()):

cbind(M, -M)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    4    6   -2   -4   -6
## [2,]    3    5    7   -3   -5   -7

rbind(M, 2 * M)

##      [,1] [,2] [,3]
## [1,]    2    4    6
## [2,]    3    5    7
## [3,]    4    8   12
## [4,]    6   10   14

4.3 Operations on numerical vectors and matrices

Element-wise operations:

v <- c(3,4,1,6)
v + 2

## [1] 5 6 3 8

v * 2

## [1]  6  8  2 12

v * v

## [1]  9 16  1 36

v / 2

## [1] 1.5 2.0 0.5 3.0

v / v

## [1] 1 1 1 1

v + v^2

## [1] 12 20  2 42

sqrt(M)

##          [,1]     [,2]     [,3]
## [1,] 1.414214 2.000000 2.449490
## [2,] 1.732051 2.236068 2.645751

M * M

##      [,1] [,2] [,3]
## [1,]    4   16   36
## [2,]    9   25   49

# Try the following command:
# M + v
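Operations between a matrix and a shorter vector rely on recycling: the vector is repeated along the columns of the matrix until every element has a partner. The following sketch uses a hypothetical copy M.demo of the matrix above and a vector whose length divides the number of elements of the matrix, so the recycling is clean; it is worth comparing with what R reports when the lengths do not fit.

```r
# M.demo is a hypothetical copy of the 2 x 3 matrix used above
M.demo <- matrix(2:7, nrow = 2, ncol = 3)

# c(10, 20) is recycled down each column:
# 10 is added to the first row and 20 to the second row
shifted <- M.demo + c(10, 20)
shifted
##      [,1] [,2] [,3]
## [1,]   12   14   16
## [2,]   23   25   27
```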

Transpose, multiplication, inverse:

t(M)

##      [,1] [,2]
## [1,]    2    3
## [2,]    4    5
## [3,]    6    7

N <- M[, -3]
N %*% diag(1,2) # matrix product (rows by columns)

##      [,1] [,2]
## [1,]    2    4
## [2,]    3    5

# diag(1,2) builds the 2x2 diagonal matrix whose diagonal
# elements are all equal to 1, i.e. the 2x2 identity matrix
solve(N)

##      [,1] [,2]
## [1,] -2.5    2
## [2,]  1.5   -1

solve(N) %*% N # checking if solve(N) is the inverse of N

##      [,1]         [,2]
## [1,]    1 1.776357e-15
## [2,]    0 1.000000e+00
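The entry 1.776357e-15 is a rounding error: the product is the identity matrix only up to floating-point precision, so an exact comparison with == can fail. One way to handle this is sketched below with all.equal(), which compares up to a small tolerance. The sketch also shows that solve() can solve a linear system directly; N.demo and the right-hand side b are hypothetical names and values chosen for the illustration.

```r
# N.demo holds the same values as N above
N.demo <- matrix(c(2, 3, 4, 5), nrow = 2)
N.inv <- solve(N.demo)

# Comparison up to numerical tolerance rather than exact equality
is.identity <- isTRUE(all.equal(N.inv %*% N.demo, diag(2)))
is.identity   # TRUE

# Solving N.demo %*% x = b without forming the inverse explicitly
b <- c(6, 8)               # hypothetical right-hand side
x.sol <- solve(N.demo, b)  # 1 1, up to rounding
```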

The transpose of a vector is a row matrix:

V <- t(v)
dim(V)

## [1] 1 4

t(V)

##      [,1]
## [1,]    3
## [2,]    4
## [3,]    1
## [4,]    6

Pay attention to the following examples:

v %*% t(v) # v is considered a column vector!

##      [,1] [,2] [,3] [,4]
## [1,]    9   12    3   18
## [2,]   12   16    4   24
## [3,]    3    4    1    6
## [4,]   18   24    6   36

t(v) %*% v # ditto

##      [,1]
## [1,]   62

diag(1,4) %*% v # ditto

##      [,1]
## [1,]    3
## [2,]    4
## [3,]    1
## [4,]    6

v %*% v # v is treated as a row vector on the left and as a column vector on the right

##      [,1]
## [1,]   62
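Since %*% always returns a matrix, even a 1x1 one, it can be convenient to use functions that state the intent more explicitly. The sketch below shows some alternatives with a hypothetical copy v.demo of the vector above; it is an illustration, not the only way to write these products.

```r
# v.demo is a hypothetical copy of the vector v used above
v.demo <- c(3, 4, 1, 6)

# Inner product as a plain number
inner.num <- sum(v.demo * v.demo)         # 62
inner.drop <- drop(v.demo %*% v.demo)     # 62, with the 1x1 matrix structure removed

# crossprod(v) computes t(v) %*% v and returns a 1x1 matrix
inner.mat <- crossprod(v.demo)

# Outer product, equal to v %*% t(v)
outer.mat <- outer(v.demo, v.demo)
```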

4.4 Factors

A factor is a vector used to represent a qualitative variable, i.e. a variable that takes a finite set of discrete values. Its values, or categories, are called levels in R.

city <- c('paris', 'lyon', 'lyon', 'paris', 'nantes')
fact.city <- as.factor(city); fact.city

## [1] paris  lyon   lyon   paris  nantes
## Levels: lyon nantes paris

class(fact.city)

## [1] "factor"

levels(fact.city)

## [1] "lyon"   "nantes" "paris"

A factor has numeric mode. The reason for this counter-intuitive fact is that the elements of a factor are stored as integer codes, each giving the position of the element's level in the sorted (lexicographic) list of levels:

mode(fact.city)

## [1] "numeric"

as.numeric(fact.city)

## [1] 3 1 1 3 2
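This integer coding is a common source of mistakes when the levels look like numbers: as.numeric() returns the codes, not the values. A minimal sketch with a hypothetical factor size:

```r
# size is a hypothetical factor whose levels look like numbers
size <- factor(c('10', '20', '10', '5'))

levels(size)                    # "10" "20" "5": sorted as strings, not as numbers

codes <- as.numeric(size)                    # 1 2 1 3, the integer codes
values <- as.numeric(as.character(size))     # 10 20 10 5, the actual values

# table() counts the occurrences of each level
counts <- table(size)
```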

4.5 User-defined functions

Example:

my.function <- function(x, y = 10) { # the default value of y is 10
  z <- x - y
  return(z)
}
my.function(2)

## [1] -8

my.function(2,4)

## [1] -2

my.function(y = 1, x = 4)

## [1] 3

Any variable defined inside a function is local and does not appear in the workspace.
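A minimal sketch of this behaviour, using a hypothetical function scope.demo and a hypothetical local variable inner.value (both names are chosen only for this illustration, and the check assumes no object called inner.value already exists in the session):

```r
# scope.demo is a hypothetical function with one local variable
scope.demo <- function(x) {
  inner.value <- 2 * x   # exists only while the function runs
  inner.value + 1
}

out <- scope.demo(3)
out                                  # 7

# The local variable was not created in the workspace
is.visible <- exists("inner.value")
is.visible                           # FALSE
```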

4.6 Exercises

Let \(x\) be a vector with the elements of a sample:

##  [1] 45 63 17 32 54 57 41 29 34 37 18 39 46 43

Write code that returns:
- the third element of the sample;
- the first four elements of the sample;
- the elements strictly greater than 35;
- all elements except those in positions 3, 9 and 12.
Then replace the first element by a missing value and give the positions of all elements less than 30.

Write a function weighted_average that takes as inputs two vectors \(x=(x_1,\ldots,x_n)\) and \(w=(w_1,\ldots,w_n)\) and computes the weighted mean \[ \frac{1}{\sum_{i=1}^n w_i}\sum_{i=1}^nw_ix_i \]