Making Summary Tables in R
Background
Table output is one of R's richest and most satisfying features. The R Markdown ecosystem offers plenty of packages to create, format, and present tables beautifully. On the one hand this is extremely useful; on the other, it can be daunting to choose among the many packages when formatting a table. I have collected some suggestions and a list of options here to help with that dilemma.
Once in a while someone writes a blog post that addresses these issues, and that is true for this topic too. https://rfortherestofus.com/2019/11/how-to-make-beautiful-tables-in-r/ has a wonderfully curated list of several such options. My intent is to supplement the information included in that post.
General purpose tables
Here goes the list of packages:
- Table output (or, more generally, data frame printing) is a document-wide setting in R Markdown. It can be controlled with the df_print option in the YAML header.
    title: Some good amount of table
    output:
      html_document:
        df_print: paged
The df_print option can take other values such as default, kable and tibble. More on this at https://bookdown.org/yihui/rmarkdown/html-document.html#data-frame-printing.
- gt package
- kable + kableExtra.
The package vignette has a bunch of appealing examples that will surely entice you into using this combination of packages.
vignette(package = "kableExtra", topic = "awesome_table_in_pdf")
Sharla Gelfand also maintains a whole repository for sharing examples of kableExtra use. Check it out at: https://github.com/sharlagelfand/kableExtra-cookbook
Additionally, I have forked the repo and tried to contribute some of my own hacks (not exactly my own, but learnt elsewhere on the internet) to the bookdown project.
- formattable
- DT. More at: https://rstudio.github.io/DT/
- reactable. A demonstration of use at: https://projects.fivethirtyeight.com/2019-womens-world-cup-predictions/
- flextable: https://davidgohel.github.io/flextable/index.html
- huxtable. https://hughjonesd.github.io/huxtable/
- rhandsontable. This package is extremely helpful when you have dirty data: it lets you edit the data manually, as you would in spreadsheet software. More at: https://jrowen.github.io/rhandsontable/
- pixiedust. https://github.com/nutterb/pixiedust
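As a quick illustration of the kable + kableExtra combination from the list above, here is a minimal sketch. The choice of mtcars, the caption, and the styling options are my own assumptions for the example and are not taken from the linked cookbook.

```r
library(knitr)
library(kableExtra)

# Build a basic HTML table from a few rows of a built-in dataset
# (mtcars is used here purely for illustration).
tbl <- kable(
  head(mtcars[, 1:5]),
  format = "html",
  caption = "First rows of mtcars"
)

# Layer kableExtra styling on top of the kable output.
styled <- kable_styling(
  tbl,
  bootstrap_options = c("striped", "hover"),
  full_width = FALSE
)

styled
```

In an R Markdown document with HTML output, the chunk prints the styled table directly.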
Summary tables
rtables package
At the time of writing this post, the rtables package was available only as a GitHub project by G. Becker, and a specific branch was used to build the package. As he mentions, the project has a long history: it was used as proprietary software for some time before being released as open source.
We start by installing and loading essential libraries.
Throughout, the williams.trees dataset from the agridat package will be used. Apart from the existing factor and numeric variables, an additional factor is generated by a random process, because the pre-existing gen (genotype information) variable is nested. A nested variable is one whose summary is meaningful only within a specific grouping and not across the whole dataset.
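The original setup chunk is not reproduced here, so the following is only a sketch of how the data could be prepared. It assumes that williams.trees has columns named env, gen and height; the name rand_group, its levels, and the seed are hypothetical choices, so the numbers will not match the tables in this post.

```r
library(agridat)

# Assumption: williams.trees has the columns env, gen and height.
# Rows with a missing height are dropped so counts and means agree.
trees <- subset(agridat::williams.trees, !is.na(height))
trees$env <- factor(trees$env)

# Add a random three-level factor (hypothetical name and levels).
set.seed(2020)
trees$rand_group <- factor(
  sample(c("D", "C", "B"), size = nrow(trees), replace = TRUE),
  levels = c("D", "C", "B")
)

str(trees)
```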
Using rtables::split_cols_by, we split the analysis variable into multiple columns formed by a grouping variable.
##       Chanthaburi          HuaiBong             Ratchaburi           SaiThong             Sakaerat             SiSaKet
##       D      C      B      D      C      B      D      C      B      D      C      B      D      C      B      D      C      B
## ----------------------------------------------------------------------------------------------------------------------------------
## mean  227.7  306.6  247.9  201.5  179.5  227.8  445.5  464.5  474.2  655.9  699.1  576.7  285.5  318.6  397.3  421.1  439.5  537.4
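A layout along the following lines could produce a table shaped like the one above. This is a sketch, not the original chunk: the column names env and height and the random factor rand_group are assumptions, and because the grouping is random the values will differ.

```r
library(rtables)

# Data preparation (assumed column names; rand_group is hypothetical).
trees <- subset(agridat::williams.trees, !is.na(height))
trees$env <- factor(trees$env)
set.seed(2020)
trees$rand_group <- factor(
  sample(c("D", "C", "B"), nrow(trees), replace = TRUE),
  levels = c("D", "C", "B")
)

# Nest the random factor inside environment as column splits, then
# report the mean height per column. The native pipe needs R >= 4.1.
lyt <- basic_table() |>
  split_cols_by("env") |>
  split_cols_by("rand_group") |>
  analyze("height", afun = mean, format = "xx.x")

tbl <- build_table(lyt, trees)
tbl
```

The layout is declared first and only applied to data in build_table(), so the same layout can be reused on another data frame with the same columns.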
Row splitting can also be done as shown.
##          Chanthaburi HuaiBong    Ratchaburi  SaiThong    Sakaerat    SiSaKet
## --------------------------------------------------------------------------------
## D        15 (50%)    11 (40.7%)  11 (32.4%)  16 (48.5%)  11 (33.3%)  9 (25.7%)
##   mean   227.67      201.55      445.45      655.94      285.45      421.11
##   sd     107.19      91.81       141.97      277.59      142.17      177.37
##   range  315         322         455         957         512         561
##   max    391         397         673         1073        575         658
##   min    76          75          218         116         63          97
## C        5 (16.7%)   10 (37%)    11 (32.4%)  7 (21.2%)   10 (30.3%)  12 (34.3%)
##   mean   306.6       179.5       464.55      699.14      318.6       439.5
##   sd     195.3       95.81       142.08      320.74      136.36      177.74
##   range  489         249         508         820         458         515
##   max    569         345         764         1083        632         735
##   min    80          96          256         263         174         220
## B        10 (33.3%)  6 (22.2%)   12 (35.3%)  10 (30.3%)  12 (36.4%)  14 (40%)
##   mean   247.9       227.83      474.17      576.7       397.33      537.43
##   sd     142.2       89.87       154.73      214.48      190.91      209.85
##   range  471         225         555         580         568         681
##   max    555         327         764         899         658         839
##   min    84          102         209         319         90          158
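The row-split table above could be built roughly as follows. The custom summary function height_stats is a hypothetical name, and the data preparation rests on the same assumptions as before (columns env and height, an illustrative random factor), so treat this as a sketch only.

```r
library(rtables)

# Data preparation (assumed column names; rand_group is hypothetical).
trees <- subset(agridat::williams.trees, !is.na(height))
trees$env <- factor(trees$env)
set.seed(2020)
trees$rand_group <- factor(
  sample(c("D", "C", "B"), nrow(trees), replace = TRUE),
  levels = c("D", "C", "B")
)

# Custom analysis function: one row per statistic.
height_stats <- function(x) {
  in_rows(
    mean  = rcell(mean(x), format = "xx.xx"),
    sd    = rcell(sd(x), format = "xx.xx"),
    range = rcell(diff(range(x)), format = "xx"),
    max   = rcell(max(x), format = "xx"),
    min   = rcell(min(x), format = "xx")
  )
}

# Columns by environment, rows by the random factor; the group rows
# carry the count and percentage from summarize_row_groups().
lyt <- basic_table() |>
  split_cols_by("env") |>
  split_rows_by("rand_group") |>
  summarize_row_groups() |>
  analyze("height", afun = height_stats)

tbl <- build_table(lyt, trees)
tbl
```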
In the previous table we used a custom function for summarizing. However, we can also use existing R helper functions such as summary.
##            Chanthaburi HuaiBong    Ratchaburi  SaiThong    Sakaerat    SiSaKet
##            (N=30)      (N=27)      (N=34)      (N=33)      (N=33)      (N=35)
## ----------------------------------------------------------------------------------
## D          15 (50%)    11 (40.7%)  11 (32.4%)  16 (48.5%)  11 (33.3%)  9 (25.7%)
##   Min.     76          75          218         116         63          97
##   1st Qu.  154.5       145.5       358.5       464.75      207         353
##   Median   186         189         436         718.5       235         414
##   Mean     227.67      201.55      445.45      655.94      285.45      421.11
##   3rd Qu.  325.5       227         538         884         362         521
##   Max.     391         397         673         1073        575         658
## C          5 (16.7%)   10 (37%)    11 (32.4%)  7 (21.2%)   10 (30.3%)  12 (34.3%)
##   Min.     80          96          256         263         174         220
##   1st Qu.  163         114.5       377         476         256         316.25
##   Median   306         138         439         802         278         358
##   Mean     306.6       179.5       464.55      699.14      318.6       439.5
##   3rd Qu.  415         227.75      559.5       897         358.75      619
##   Max.     569         345         764         1083        632         735
## B          10 (33.3%)  6 (22.2%)   12 (35.3%)  10 (30.3%)  12 (36.4%)  14 (40%)
##   Min.     84          102         209         319         90          158
##   1st Qu.  141.5       161.25      422.75      420.75      205.75      404.25
##   Median   217.5       254.5       481.5       493         458         568
##   Mean     247.9       227.83      474.17      576.7       397.33      537.43
##   3rd Qu.  323.25      287.75      540         753         529.75      677.75
##   Max.     555         327         764         899         658         839
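One way to get a table like the one above is to wrap summary with the rtables helper list_wrap_x, which turns the result into one row per statistic. This is a sketch under the same data assumptions as before; I use basic_table(show_colcounts = TRUE) to show the (N=...) header row, and the original chunk may have done this differently.

```r
library(rtables)

# Data preparation (assumed column names; rand_group is hypothetical).
trees <- subset(agridat::williams.trees, !is.na(height))
trees$env <- factor(trees$env)
set.seed(2020)
trees$rand_group <- factor(
  sample(c("D", "C", "B"), nrow(trees), replace = TRUE),
  levels = c("D", "C", "B")
)

# list_wrap_x(summary) returns the six summary() values as a list,
# which rtables renders as six rows (Min., 1st Qu., ...).
lyt <- basic_table(show_colcounts = TRUE) |>
  split_cols_by("env") |>
  split_rows_by("rand_group") |>
  summarize_row_groups() |>
  analyze("height", afun = list_wrap_x(summary), format = "xx.xx")

tbl <- build_table(lyt, trees)
tbl
```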
In the earlier examples, the summary function received a single variable as its data argument. The entire dataset can be passed instead when the summary involves multiple variables.
##                               Chanthaburi  HuaiBong     Ratchaburi   SaiThong     Sakaerat     SiSaKet
##                               (N=30)       (N=27)       (N=34)       (N=33)       (N=33)       (N=35)
## ----------------------------------------------------------------------------------------------------------
## D                             15 (50%)     11 (40.7%)   11 (32.4%)   16 (48.5%)   11 (33.3%)   9 (25.7%)
##   Total genotypes             15           11           11           16           11           9
##   Unique genotypes            3            5            5            5            4            4
##   Genotypes with > 1 records  12 (40%)     6 (22.22%)   6 (17.65%)   11 (33.33%)  7 (21.21%)   5 (14.29%)
## C                             5 (16.7%)    10 (37%)     11 (32.4%)   7 (21.2%)    10 (30.3%)   12 (34.3%)
##   Total genotypes             5            10           11           7            10           12
##   Unique genotypes            3            4            4            4            5            5
##   Genotypes with > 1 records  2 (6.67%)    6 (22.22%)   7 (20.59%)   3 (9.09%)    5 (15.15%)   7 (20%)
## B                             10 (33.3%)   6 (22.2%)    12 (35.3%)   10 (30.3%)   12 (36.4%)   14 (40%)
##   Total genotypes             10           6            12           10           12           14
##   Unique genotypes            4            3            4            4            5            5
##   Genotypes with > 1 records  6 (20%)      3 (11.11%)   8 (23.53%)   6 (18.18%)   7 (21.21%)   9 (25.71%)
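In rtables, an analysis function whose first argument is named df receives the whole data subset instead of a single vector. The sketch below uses that to mimic the rows above; genotype_stats is a hypothetical name. In the output shown, the last row equals total minus unique (for example 15 - 3 = 12) as a share of the column N, so I compute it with duplicated(), which is my reading of the output rather than the original code. Whether any repeats show up depends on how gen is coded in your copy of the data.

```r
library(rtables)

# Data preparation (assumed column names; rand_group is hypothetical).
trees <- subset(agridat::williams.trees, !is.na(height))
trees$env <- factor(trees$env)
set.seed(2020)
trees$rand_group <- factor(
  sample(c("D", "C", "B"), nrow(trees), replace = TRUE),
  levels = c("D", "C", "B")
)

# df is the data subset for the current cell; .N_col is the column total.
genotype_stats <- function(df, .N_col) {
  n_repeat <- sum(duplicated(df$gen))
  in_rows(
    "Total genotypes" = rcell(nrow(df), format = "xx"),
    "Unique genotypes" = rcell(length(unique(df$gen)), format = "xx"),
    "Genotypes with > 1 records" = rcell(
      c(n_repeat, n_repeat / .N_col),
      format = "xx (xx.xx%)"
    )
  )
}

lyt <- basic_table(show_colcounts = TRUE) |>
  split_cols_by("env") |>
  split_rows_by("rand_group") |>
  summarize_row_groups() |>
  analyze("gen", afun = genotype_stats)

tbl <- build_table(lyt, trees)
tbl
```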
Also, instead of relying on automatic counting from the given analysis variable, we can supply the column totals manually by computing the column counts beforehand. This can be done using the tapply or map functions.
##                               Chanthaburi  HuaiBong     Ratchaburi   SaiThong     Sakaerat     SiSaKet
##                               (N=30)       (N=27)       (N=34)       (N=33)       (N=33)       (N=35)
## ----------------------------------------------------------------------------------------------------------
## D                             15 (50%)     11 (40.7%)   11 (32.4%)   16 (48.5%)   11 (33.3%)   9 (25.7%)
##   Total genotypes             15           11           11           16           11           9
##   Unique genotypes            3            5            5            5            4            4
##   Genotypes with > 1 records  12 (40%)     6 (22.22%)   6 (17.65%)   11 (33.33%)  7 (21.21%)   5 (14.29%)
## C                             5 (16.7%)    10 (37%)     11 (32.4%)   7 (21.2%)    10 (30.3%)   12 (34.3%)
##   Total genotypes             5            10           11           7            10           12
##   Unique genotypes            3            4            4            4            5            5
##   Genotypes with > 1 records  2 (6.67%)    6 (22.22%)   7 (20.59%)   3 (9.09%)    5 (15.15%)   7 (20%)
## B                             10 (33.3%)   6 (22.2%)    12 (35.3%)   10 (30.3%)   12 (36.4%)   14 (40%)
##   Total genotypes             10           6            12           10           12           14
##   Unique genotypes            4            3            4            4            5            5
##   Genotypes with > 1 records  6 (20%)      3 (11.11%)   8 (23.53%)   6 (18.18%)   7 (21.21%)   9 (25.71%)
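For the manual column counts, a sketch with tapply could look like this. It assumes your installed rtables still accepts the col_counts argument of build_table(), which newer releases may flag as deprecated, and the same data assumptions as before apply.

```r
library(rtables)

# Data preparation (assumed column names env and height).
trees <- subset(agridat::williams.trees, !is.na(height))
trees$env <- factor(trees$env)

# Count records per environment up front.
n_by_env <- tapply(trees$height, trees$env, length)

lyt <- basic_table(show_colcounts = TRUE) |>
  split_cols_by("env") |>
  analyze("height", afun = mean, format = "xx.x")

# Pass one count per leaf column, in column order, so the header
# uses these values instead of the automatically computed ones.
tbl <- build_table(lyt, trees, col_counts = as.vector(n_by_env))
tbl
```

Supplying the counts yourself is mainly useful when the header N should come from a different denominator than the rows being summarized.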
Here are some of the handy utility functions that can be used on the go.
qwraps2 package
For constructing simple whole-sample or subsample summary tables, the qwraps2 package has a simple interface. Its vignette is richly illustrated using the mtcars dataset.
It requires the markup language to be set early on, in a code chunk, for the output to render in the proper format.
| | mtcars2 (N = 32) | cyl_factor: 6 cylinders (N = 7) | cyl_factor: 4 cylinders (N = 11) | cyl_factor: 8 cylinders (N = 14) | P-value |
|---|---|---|---|---|---|
| Miles Per Gallon | | | | | |
| min | 10.4 | 17.8 | 21.4 | 10.4 | |
| max | 33.9 | 21.4 | 33.9 | 19.2 | |
| mean (sd) | 20.09 ± 6.03 | 19.74 ± 1.45 | 26.66 ± 4.51 | 15.10 ± 2.56 | P < 0.0001 |
| Displacement | | | | | |
| min | 71.1 | 145.0 | 71.1 | 275.8 | |
| median | 196.3 | 167.6 | 108.0 | 350.5 | |
| max | 472 | 258.0 | 146.7 | 472.0 | |
| mean (sd) | 230.72 ± 123.94 | 183.31 ± 41.56 | 105.14 ± 26.87 | 353.10 ± 67.77 | P < 0.0001 |
| Weight (1000 lbs) | | | | | |
| min | 1.513 | 2.620 | 1.513 | 3.170 | |
| max | 5.424 | 3.460 | 3.190 | 5.424 | |
| mean (sd) | 3.22 ± 0.98 | 3.12 ± 0.36 | 2.29 ± 0.57 | 4.00 ± 0.76 | P < 0.0001 |
| Forward Gears | P < 0.0001 | | | | |
| Three | 15 (47) | 2 (29) | 1 (9) | 12 (86) | |
| Four | 12 (38) | 4 (57) | 8 (73) | 0 (0) | |
| Five | 5 (16) | 1 (14) | 2 (18) | 2 (14) | |
Alternatively, the row group name can be used to report the p-value. This is exemplified in the package vignette.
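The core of a table like the one in this section could be sketched as follows. It assumes qwraps2 version 0.5.0 or later, where summary_table() takes a by argument and formulas can name columns directly. The summary list our_summary is trimmed to two row groups, and the p-value column is left out, so this is not a full reproduction.

```r
library(qwraps2)

# Set the markup early; the qwraps2 default is LaTeX.
options(qwraps2_markup = "markdown")

# Copy of mtcars with a labelled cylinder factor, as in the table header.
mtcars2 <- mtcars
mtcars2$cyl_factor <- factor(
  mtcars2$cyl,
  levels = c(6, 4, 8),
  labels = paste(c(6, 4, 8), "cylinders")
)

# Each top-level entry is a row group; each formula is one row.
our_summary <- list(
  "Miles Per Gallon" = list(
    "min" = ~ min(mpg),
    "max" = ~ max(mpg),
    "mean (sd)" = ~ qwraps2::mean_sd(mpg)
  ),
  "Weight (1000 lbs)" = list(
    "min" = ~ min(wt),
    "max" = ~ max(wt),
    "mean (sd)" = ~ qwraps2::mean_sd(wt)
  )
)

whole  <- summary_table(mtcars2, summaries = our_summary)
by_cyl <- summary_table(mtcars2, summaries = our_summary, by = "cyl_factor")

# Whole-sample and per-group columns side by side.
both <- cbind(whole, by_cyl)
both
```

In an R Markdown document the chunk needs results = "asis" for the markdown table to render.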
gtsummary package
This package is a recent development but already has many ready-to-preview examples in its vignettes. It is feature-rich and easily extensible because it builds on the gt package. Check out the vignettes on the package's CRAN page: https://cran.r-project.org/web/packages/gtsummary/index.html.
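To give a flavour of the interface, here is a minimal sketch with tbl_summary(). The variables and labels are my own choices, picked to echo the mtcars table in the qwraps2 section, and are not taken from the gtsummary vignettes.

```r
library(gtsummary)

# Summarize a few mtcars variables by number of cylinders.
# The labels are illustrative assumptions.
tbl <- tbl_summary(
  mtcars,
  by = cyl,
  include = c(mpg, disp, wt, gear),
  label = list(
    mpg ~ "Miles Per Gallon",
    disp ~ "Displacement",
    wt ~ "Weight (1000 lbs)",
    gear ~ "Forward Gears"
  )
)

# Print `tbl` in an R Markdown chunk or the console to render it.
```

Continuous variables are summarized with the median and interquartile range by default, while variables with few distinct values, such as gear, are tabulated as categories.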