Handling tables in R (2024)

Language Technology and Data Analysis Laboratory
  • HOME
  • ABOUT LADAL
    • People | Collabs
    • News
  • EVENTS
    • Workshops
    • Computational Thinking in HASS
    • LADAL Webinar Series 2022
    • LADAL Webinar Series 2021
    • ATAP Events
  • TUTORIALS
  • RESOURCES
    • Links
    • Tutorial stylesheet
  • CONTACT

Martin Schweinberger

Handling tables in R (2)

This tutorial shows how to work with tables and how to tabulate datain R. A more advanced and detailed but also truly excellent and highlyrecommendable resource on processing data in R is Wickham and Grolemund (2016; see also Wickham et al.2019). Alternative but also very useful resources are Stander and Dalla Valle (2017)and Estrellado et al. (2020).

Handling tables in R (3)

This tutorial is aimed at beginners and intermediate users of R withthe aim of showcasing how to work with and process tabulated data(tables) using R. The aim is not to provide a fully-fledged analysis butrather to show and exemplify selected useful functions and methodsassociated with tabular data.

To be able to follow this tutorial, we suggest you check out andfamiliarize yourself with the content of the following RBasics tutorials:

  • Getting started with R
  • Loading, saving, and generatingdata in R
  • String Processing in R
  • Regular Expressions in R

Click here1 todownload the entire R Notebook for thistutorial.

Handling tables in R (4)
Clickhereto open an interactive Jupyter notebook that allows you to execute,change, and edit the code as well as to upload your own data.

Preparation and session set up

This tutorial is based on R. If you have not installed R or are newto it, you will find an introduction to and more information how to useR here. For thistutorials, we need to install certain packages from an Rlibrary so that the scripts shown below are executed withouterrors. Before turning to the code below, please install the packages byrunning the code below this paragraph. If you have already installed thepackages mentioned below, then you can skip ahead ignore this section.To install the necessary packages, simply run the following code - itmay take some time (between 1 and 5 minutes to install all of thepackages so you do not need to worry if it takes some time).

# install packagesinstall.packages("xlsx")install.packages("tidyverse")install.packages("flextable")install.packages("openxlsx")install.packages("here")# install klippy for copy-to-clipboard button in code chunksinstall.packages("remotes")remotes::install_github("rlesur/klippy")

Now that we have installed the packages, we can activate them asshown below.

# load packageslibrary(dplyr)library(tidyr)library(flextable)library(xlsx)library(openxlsx)library(here)# activate klippy for copy-to-clipboard buttonklippy::klippy()

Once you have installed RStudio and initiated the session byexecuting the code shown above, you are good to go.

Tables are one of the most common and important ways to process,handle, and describe data. This tutorial introduces different types oftables, how to load and save different types of tables, as well as howto modify and process tables and tabulated data.

When working with R, there are different kinds or types of tablesthat have different characteristics. The most common types of tables inR are:

  • matrices
  • data frames
  • tibbles

Matrices can only contain one type of data and all data points willbe converted to the type of scale with the lowest information value. Forinstance, if at least one variables in a table represents characters (orstrings), then all other variables are also converted into characters(although they may be numbers).

Data frames can contain different types of scales and variables arenot converted automatically. In this sense, data frames are moreflexible and are preferable over matrices.

Tibbles are the tidyverse equivalent of data frames which offer newfunctions and possibilities of handling and inspecting the data. .

There are several different functions that allow us to read tabulateddata into R. In our case, we use the readRDS function whichloads Rdata sets.

# load data with read.delimmytable <- base::readRDS(url("https://slcladal.github.io/data/mld.rda", "rb"))
First 10 rows of mytable.

status

attraction

money

Relationship

NotInterested

86.33

Relationship

NotInterested

45.58

Relationship

NotInterested

68.43

Relationship

NotInterested

52.93

Relationship

NotInterested

61.86

Relationship

NotInterested

48.47

Relationship

NotInterested

32.79

Relationship

NotInterested

35.91

Relationship

NotInterested

30.98

Relationship

NotInterested

44.82

If the data is stored as a txt-file, there are various functions toread in the data. The most common functions are read.delimand read.table.

# load data with read.delimtab1 <- read.delim("https://slcladal.github.io/data/mlrdata.txt", sep = "\t", header = TRUE)tab2 <- read.table("https://slcladal.github.io/data/mlrdata.txt", header = TRUE)

To load excel data from the web, you can use theread.xlsx function from the openxlsxpackage.

# load dataexceldata <- openxlsx::read.xlsx("https://slcladal.github.io/data/testdata1.xlsx", sheet = 1)
First 10 rows of the exceldata.

Variable1

Variable2

6

67

65

16

12

56

56

34

45

54

84

42

38

36

46

47

64

54

24

29

To load excel data from your own computer, you can use theread_excel function from the readxl package(which is part of the tidyverse and thus does not need to be installedseparately.

NOTE

You would need to adapt the path to the data (whichcurrently points to the data folder on my computer!). Thus, the pathbelow is specified to my computer.

`

`

# load dataexcelcomp <- readxl::read_excel(here::here("data", "testdata1.xlsx"), sheet = 1)
First 10 rows of the excelcomp data.

Variable1

Variable2

6

67

65

16

12

56

56

34

45

54

84

42

38

36

46

47

64

54

24

29

The most common functions that are used to inspect tabular data arethe head() and the str() functions. Thehead() function shows the first 6 lines (by default) but wecan also specify the number of rows. The str() functionprovides a summary of the structure of the data.

Inspecting the first 6 rows of a table.

head(mytable)
## status attraction money## 1 Relationship NotInterested 86.33## 2 Relationship NotInterested 45.58## 3 Relationship NotInterested 68.43## 4 Relationship NotInterested 52.93## 5 Relationship NotInterested 61.86## 6 Relationship NotInterested 48.47

Inspecting the first 10 rows of a table.

head(mytable, 10)
## status attraction money## 1 Relationship NotInterested 86.33## 2 Relationship NotInterested 45.58## 3 Relationship NotInterested 68.43## 4 Relationship NotInterested 52.93## 5 Relationship NotInterested 61.86## 6 Relationship NotInterested 48.47## 7 Relationship NotInterested 32.79## 8 Relationship NotInterested 35.91## 9 Relationship NotInterested 30.98## 10 Relationship NotInterested 44.82

Checking the structure of tabulated data.

str(mytable)
## 'data.frame': 100 obs. of 3 variables:## $ status : chr "Relationship" "Relationship" "Relationship" "Relationship" ...## $ attraction: chr "NotInterested" "NotInterested" "NotInterested" "NotInterested" ...## $ money : num 86.3 45.6 68.4 52.9 61.9 ...

The following section shows how to access and manipulate tables.

The tidyverse is a specific way of writing R code that builds on afamily of libraries designed for efficient data science work flows whichwere developed initially by Hadley Wickham. This new way of writing Rcode builds on a shared and underlying design philosophy and grammar.Due to its popularity and ease of use, the tidyverse way to write R codeis becoming increasingly popular and we will use it in the following tohandle and manipulate tabulated data.

If you have already loaded data into R and now want to process thedata, you typically have to modify the data in some form or another toget the information or format you need. The tidyverse offers veryuser-friendly, intuitive, and handy functions for processing the data tomatch the needs of your analysis. To have access to the tidyversefunctions for data processing, we load the tidyversepackage and load and inspect another set of data using theread.delim function. The new data is stored as a txt fileand has 100 observations (rows) and 3 variables (status,attraction, and money). The data set represents howmuch money people have spend in someone they were interested in or not(attraction: Interested versus NotInterested) andtheir own relationship status (status: Single versusRelationship).

# load new datanewdata <- base::readRDS(url("https://slcladal.github.io/data/mld.rda", "rb"))
First 10 rows of the excelcomp data.

status

attraction

money

Relationship

NotInterested

86.33

Relationship

NotInterested

45.58

Relationship

NotInterested

68.43

Relationship

NotInterested

52.93

Relationship

NotInterested

61.86

Relationship

NotInterested

48.47

Relationship

NotInterested

32.79

Relationship

NotInterested

35.91

Relationship

NotInterested

30.98

Relationship

NotInterested

44.82

The table represents 3 variables (status, attraction, and money) andeach row contains information on the relationship status of 100 peopleand how much money these people would spend on a gift to someone of theopposite sex who they are or are not interested in. We will now checkout different ways and functions to process this data.

Piping

Piping, done with the sequence %>%, is a very easy,intuitive, quick, and handy way to process data. Essentially pipingmeans that we take an element that is to the left of the piping symboland then do something to it; that way, the piping symbol can betranslated as and then.

We could, for example, load data and then capitalize the column namesand then group the data by status and attraction and then get the meanof money spend on deleting all observations except for the first one. Amore formal way to write this would be:

load %>% capitalize %>% group %>% summarize.

In R this command would look like this:

# example of a data processing pipelinepipeddata <- base::readRDS(url("https://slcladal.github.io/data/mld.rda", "rb")) %>% dplyr::rename(Status = status, Attraction = attraction, Money = money) %>% dplyr::group_by(Status, Attraction) %>% dplyr::summarise(Mean = mean(Money))# inspect summarized datapipeddata
## # A tibble: 4 × 3## # Groups: Status [2]## Status Attraction Mean## <chr> <chr> <dbl>## 1 Relationship Interested 99.2## 2 Relationship NotInterested 51.5## 3 Single Interested 157. ## 4 Single NotInterested 46.0

The pipe has worked and we get the resulting summary which shows themean of the money spend based on Attraction and Status.

Selecting and filtering

Among the most frequent procedures in data processing is selectingcertain columns or extracting rows based on variable levels. In thetidyverse, this is done by using the select andfilter functions. While select allows toextract columns, filter is used to extract rows, e.g.toget only observations that have a certain feature. Have a look at theexample below.

# select and filterreduceddata <- newdata %>% # select the columns attraction and money dplyr::select(attraction, money) %>% # extract rows which represent cases where the person was interested in someone dplyr::filter(attraction == "Interested")# inspect new tablenrow(reduceddata); table(reduceddata$attraction)
## [1] 50
## ## Interested ## 50

We have now reduced the data by excluding status (we have onlyselected attraction and money) and we have removed those 50 data rows ofpeople who were not interested. The select function (likemost other tidyverse functions) can also be used together with a minussign which causes a column to be removed, thusdplyr::select(-money) would remove the money column (seebelow).

# select and filterdatawithoutmoney <- newdata %>% # remove money dplyr::select(-money) # inspect datahead(datawithoutmoney)
## status attraction## 1 Relationship NotInterested## 2 Relationship NotInterested## 3 Relationship NotInterested## 4 Relationship NotInterested## 5 Relationship NotInterested## 6 Relationship NotInterested

Selecting and filtering are extremely powerful functions that canalso be combined with other functions. But before we discuss morecomplex issues, we will have a look at how we can change columns.

Changing data and adding columns

Changing and adding data is done with the mutatefunction. The mutate functions requires that we specify acolumn name - if we use the same name as the column we are changing,then we change the column but if we specify another column name, then anew column is created.

We will now create a new column (Spendalot) in which we encode if theperson has spend a lot of money (100 AUD or more) on the present or not(less than 100 AUD).

# creating a new columnnewdata <- newdata %>% dplyr::mutate(Spendalot = ifelse(money >= 100, "Alot", "Alittle")) # inspect datahead(newdata)
## status attraction money Spendalot## 1 Relationship NotInterested 86.33 Alittle## 2 Relationship NotInterested 45.58 Alittle## 3 Relationship NotInterested 68.43 Alittle## 4 Relationship NotInterested 52.93 Alittle## 5 Relationship NotInterested 61.86 Alittle## 6 Relationship NotInterested 48.47 Alittle

The table now has a new column (Spendalot) because we have specifieda column name that did not exist yet - had we writtendplyr::mutate(money = ifelse(money >= 100, "Alot", "Alittle"))then we would have changed the money column and replaced the moneyvalues with the labels Alot and Alittle.

Renaming columns

Oftentimes, column names are not really meaningful or incoherentwhich makes it easier to wrap your head around what the values in acolumn refer to. The easiest way around this is rename columns which is,fortunately very simple in the tidyverse. While the column names of ourexample table are meaningful, I want to capitalize the first letter ofeach column name. This can be done as follows.

# renaming columnsnewdata <- newdata %>% dplyr::rename(Status = status, Attraction = attraction, Money = money)# inspect datahead(newdata)
## Status Attraction Money Spendalot## 1 Relationship NotInterested 86.33 Alittle## 2 Relationship NotInterested 45.58 Alittle## 3 Relationship NotInterested 68.43 Alittle## 4 Relationship NotInterested 52.93 Alittle## 5 Relationship NotInterested 61.86 Alittle## 6 Relationship NotInterested 48.47 Alittle

The renaming was successful as all column names now begin with acapital letter.

Grouping and summarising

In contrast to mutate, which retains the number of rows, summarizingcreates new columns but collapses rows and only provides the summaryvalue (or values if more than one summary is specified). Also, columnsthat are not grouping variables are removed.

Summarizing is particularly useful when we want to get summaries ofgroups. We will modify the example from above and extract the mean andthe standard deviation of the money spend on presents by relationshipstatus and whether the giver was attracted to the giv-ee.

#grouping and summarizing data datasummary <- newdata %>% dplyr::group_by(Status, Attraction) %>% dplyr::summarise(Mean = round(mean(Money), 2), SD = round(sd(Money), 1))# inspect summarized datadatasummary
## # A tibble: 4 × 4## # Groups: Status [2]## Status Attraction Mean SD## <chr> <chr> <dbl> <dbl>## 1 Relationship Interested 99.2 14.7## 2 Relationship NotInterested 51.5 17 ## 3 Single Interested 157. 23.2## 4 Single NotInterested 46.0 19.9

Gathering and Spreading

One very common problem is that data - or at least parts of it - haveto be transformed from long to wide format or vice versa. In thetidyverse, this is done using the gather andspread function. We will convert the summary table shownabove into a wide format (we also remove the SD column as it is nolonger needed)

# converting data to wide format widedata <- datasummary %>% # remove SD column dplyr::select(-SD) %>% # convert into wide format tidyr::spread(Attraction, Mean)# inspect wide datawidedata
## # A tibble: 2 × 3## # Groups: Status [2]## Status Interested NotInterested## <chr> <dbl> <dbl>## 1 Relationship 99.2 51.5## 2 Single 157. 46.0

We can re-convert the wide into a long format using thegather function.

# converting data to long format longdata <- widedata %>% # convert into long format tidyr::gather(Attraction, Money, Interested:NotInterested)# inspect wide datalongdata
## # A tibble: 4 × 3## # Groups: Status [2]## Status Attraction Money## <chr> <chr> <dbl>## 1 Relationship Interested 99.2## 2 Single Interested 157. ## 3 Relationship NotInterested 51.5## 4 Single NotInterested 46.0

There are many more useful functions for processing, handling, andsummarizing tables but this should suffice to get you started.

Schweinberger, Martin. 2022. Handling tables in R. Brisbane:The University of Queensland. url: https://ladal.edu.au/table.html (Version2022.11.17).

@manual{schweinberger2022tabr, author = {Schweinberger, Martin}, title = {Handling tables in R}, note = {https://ladal.edu.au/table.html}, year = {2022}, organization = "The University of Queensland, School of Languages and Cultures}, address = {Brisbane}, edition = {2022.11.17}}
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)## Platform: x86_64-w64-mingw32/x64 (64-bit)## Running under: Windows 10 x64 (build 22621)## ## Matrix products: default## ## locale:## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8 ## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C ## [5] LC_TIME=English_Australia.utf8 ## ## attached base packages:## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages:## [1] here_1.0.1 openxlsx_4.2.5.2 xlsx_0.6.5 flextable_0.9.1 ## [5] tidyr_1.3.0 dplyr_1.1.2 ## ## loaded via a namespace (and not attached):## [1] Rcpp_1.0.10 xlsxjars_0.6.1 assertthat_0.2.1 ## [4] rprojroot_2.0.3 digest_0.6.31 utf8_1.2.3 ## [7] mime_0.12 cellranger_1.1.0 R6_2.5.1 ## [10] evaluate_0.21 highr_0.10 pillar_1.9.0 ## [13] gdtools_0.3.3 rlang_1.1.1 readxl_1.4.2 ## [16] curl_5.0.0 uuid_1.1-0 rstudioapi_0.14 ## [19] data.table_1.14.8 jquerylib_0.1.4 klippy_0.0.0.9500 ## [22] rmarkdown_2.21 textshaping_0.3.6 shiny_1.7.4 ## [25] compiler_4.2.2 httpuv_1.6.11 xfun_0.39 ## [28] pkgconfig_2.0.3 askpass_1.1 systemfonts_1.0.4 ## [31] gfonts_0.2.0 htmltools_0.5.5 openssl_2.0.6 ## [34] tidyselect_1.2.0 tibble_3.2.1 fontBitstreamVera_0.1.1## [37] httpcode_0.3.0 fansi_1.0.4 withr_2.5.0 ## [40] crayon_1.5.2 later_1.3.1 crul_1.4.0 ## [43] grid_4.2.2 jsonlite_1.8.4 xtable_1.8-4 ## [46] lifecycle_1.0.3 magrittr_2.0.3 zip_2.3.0 ## [49] cli_3.6.1 stringi_1.7.12 cachem_1.0.8 ## [52] promises_1.2.0.1 xml2_1.3.4 bslib_0.4.2 ## [55] ellipsis_0.3.2 ragg_1.2.5 generics_0.1.3 ## [58] vctrs_0.6.2 tools_4.2.2 glue_1.6.2 ## [61] officer_0.6.2 fontquiver_0.2.1 purrr_1.0.1 ## [64] fastmap_1.1.1 yaml_2.3.7 fontLiberation_0.1.0 ## [67] rJava_1.0-6 knitr_1.43 sass_0.4.6

Back to top

Back to LADAL

Estrellado, Ryan A, Emily A Freer, Jesse Mostipak, Joshua M Rosenberg,and Isabella C Velásquez. 2020. Data Science in Education Usingr. Routledge.

Stander, Julian, and Luciana Dalla Valle. 2017. “On EnthusingStudents about Big Data and Social Media Visualization and AnalysisUsing r, RStudio, and RMarkdown.” Journal of StatisticsEducation 25 (2): 60–67.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, LucyD’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019.“Welcome to the Tidyverse.” Journal of Open SourceSoftware 4 (43): 1686.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science:Import, Tidy, Transform, Visualize, and Model Data. " O’ReillyMedia, Inc.".

  1. If you want to render the R Notebook on your machine,i.e.knitting the document to html or a pdf, you need to make sure thatyou have R and RStudio installed and you also need to download the bibliographyfile and store it in the same folder where you store theRmd file.↩︎

Handling tables in R (2024)
Top Articles
Latest Posts
Article information

Author: Nathanael Baumbach

Last Updated:

Views: 6360

Rating: 4.4 / 5 (55 voted)

Reviews: 94% of readers found this page helpful

Author information

Name: Nathanael Baumbach

Birthday: 1998-12-02

Address: Apt. 829 751 Glover View, West Orlando, IN 22436

Phone: +901025288581

Job: Internal IT Coordinator

Hobby: Gunsmithing, Motor sports, Flying, Skiing, Hooping, Lego building, Ice skating

Introduction: My name is Nathanael Baumbach, I am a fantastic, nice, victorious, brave, healthy, cute, glorious person who loves writing and wants to share my knowledge and understanding with you.