4  Working with data.frame

We will refer to an R data.frame or tibble object simply as data. After all, data are the objects we will spend most of our time on.

4.1 Data Frame

Data frames are created using the data.frame() function by supplying a list of columns. A data.frame, as it is typically called, is a list with one important distinction. A list can have elements of unequal length. In a data.frame, all elements (the columns) must have the same length, which makes the data.frame a true rectangular table.

my_list = list(
  serial = 1:5,
  age = c(10, 11, 20, 30, 32), 
  sex = c('M', 'F', 'F', 'M', 'M')
)
df = data.frame(my_list)

df
  serial age sex
1      1  10   M
2      2  11   F
3      3  20   F
4      4  30   M
5      5  32   M
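To see the equal-length rule in action, the short sketch below builds a list whose elements have different lengths and then tries to turn it into a data frame. The object names `unequal` and `msg` are illustrative only. The list is accepted, but data.frame() stops with an error, which tryCatch() captures as a message instead of halting the script.

```r
# A list can hold elements of different lengths
unequal = list(x = 1:3, y = c("a", "b"))
lengths(unequal)

# data.frame() cannot build a rectangle from columns of length 3 and 2,
# so it signals an error; tryCatch() returns the error message instead
msg = tryCatch(data.frame(unequal),
               error = function(e) conditionMessage(e))
msg
```

One caveat: data.frame() recycles a shorter column when its length divides the longest one evenly, so data.frame(a = 1:4, b = 1:2) succeeds, with b repeated as 1, 2, 1, 2.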

If you look at the data type of df using typeof(df), you will see that it is a list.

typeof(df)
[1] "list"
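Because a data frame is a list underneath, the usual list tools also work on it. The sketch below recreates df so that it runs on its own, then treats it as a list: length() counts the list elements (the columns), [[ ]] pulls out one column as a vector, and sapply() applies a function to every column.

```r
# Recreate df from the example above so this snippet runs on its own
df = data.frame(serial = 1:5,
                age = c(10, 11, 20, 30, 32),
                sex = c('M', 'F', 'F', 'M', 'M'))

is.list(df)        # TRUE: a data frame is a list
length(df)         # number of list elements, i.e. number of columns
df[["age"]]        # extract one column (a vector) as a list element
sapply(df, class)  # apply class() to every column
```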

To view the structure of the df object, use str():

str(df)
'data.frame':   5 obs. of  3 variables:
 $ serial: int  1 2 3 4 5
 $ age   : num  10 11 20 30 32
 $ sex   : chr  "M" "F" "F" "M" ...

4.2 Attributes of Data Frame

As mentioned earlier, matrices and data frames are built on vectors, but they carry additional characteristics called ‘attributes’. An R data frame is a named list of vectors with the following attributes:

  • column names (names)
  • row names (row.names)
  • class (class)

Let's look at the attributes of the df data frame object.

attributes(df)
$names
[1] "serial" "age"    "sex"   

$class
[1] "data.frame"

$row.names
[1] 1 2 3 4 5

Each of these attributes has an accessor function of the same name that extracts it from the object. Thus, to get the column names, simply use the names() function as follows.

names(df)
[1] "serial" "age"    "sex"   
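The same accessor also works on the left-hand side of an assignment, which is a common way to rename columns. A minimal sketch, working on a copy so the original is untouched; the names `df2` and `years` are made up for illustration.

```r
# Recreate df from the example above so this snippet runs on its own
df = data.frame(serial = 1:5,
                age = c(10, 11, 20, 30, 32),
                sex = c('M', 'F', 'F', 'M', 'M'))

df2 = df                  # work on a copy so df keeps its names
names(df2)[2] = "years"   # rename only the second column
names(df2)
names(df)                 # the original data frame is unchanged
```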

Likewise, use row.names(df) to get the row names and class(df) to get the class of the object.

row.names(df)
[1] "1" "2" "3" "4" "5"
class(df)
[1] "data.frame"
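Notice that attributes(df) printed the row names as the numbers 1 to 5, while row.names(df) returned the strings "1" to "5": the automatic row names are stored as integers, but row.names() always returns a character vector. Row names can also be replaced. In the sketch below, the labels "a" to "e" are arbitrary examples.

```r
# Recreate df from the example above so this snippet runs on its own
df = data.frame(serial = 1:5,
                age = c(10, 11, 20, 30, 32),
                sex = c('M', 'F', 'F', 'M', 'M'))

attributes(df)$row.names   # automatic row names, stored as integers
row.names(df)              # row.names() returns character strings

# Assign custom row names (must be unique)
row.names(df) = c("a", "b", "c", "d", "e")
df["c", ]                  # rows can now be selected by name
```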

Attributes are not the same as the elements of a list

You might think that you can extract the names using the $ operator. However, attributes are not elements of the list, so they cannot be extracted this way: df$names looks for a column called names and, finding none, returns NULL.

df$names
NULL
df$class
NULL
df$row.names
NULL
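If you do want to read an attribute by its name, base R's attr() function does that. The sketch below retrieves each of the three attributes with attr() and confirms that [[ ]], like $, looks for a column rather than an attribute.

```r
# Recreate df from the example above so this snippet runs on its own
df = data.frame(serial = 1:5,
                age = c(10, 11, 20, 30, 32),
                sex = c('M', 'F', 'F', 'M', 'M'))

attr(df, "names")      # same result as names(df)
attr(df, "class")      # same result as class(df)
attr(df, "row.names")  # integer row names, as shown by attributes(df)
df[["names"]]          # NULL: [[ ]] looks for a column, not an attribute
```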

4.2.1 Exercise

  1. Create a matrix object and explore its attributes. How do they differ from the attributes of a data frame?
x = matrix(1:10, ncol=2)
x
attributes(x)

  2. Create a list object and explore its attributes.

  3. Create a data frame object and explore its attributes.
