Using R and Python
This libguide covers resources for learning and using R and Python.
Where to Find R Packages?
- The Comprehensive R Archive Network (CRAN): This is the official repository for R packages. Packages submitted here are screened before publication to ensure that they follow CRAN policies.
- Bioconductor: A discipline-specific repository for bioinformatics R packages. Packages submitted here need to follow Bioconductor guidelines.
- rOpenSci: rOpenSci (R for Open Science) is an organization that promotes open science and reproducible research by sharing open source software. Packages submitted here are peer-reviewed by rOpenSci volunteers.
- GitHub: A software development platform that is not designed specifically for R packages. Anyone can upload and share software here.
- RStudio compiles a list of useful R packages here.
Install, Update and Create R Packages
Installing packages
- Installing packages from CRAN
- Installing packages from GitHub
- Installing packages from Bioconductor
- Troubleshooting tips
Update packages using the instructions here.
Create packages: If you are an intermediate or advanced R user, you may want to create your own R packages. Here are some learning resources to teach you how to do it:
- Making Your First R Package by Chun Chan
- R packages: Organize, test and share your code by Hadley Wickham and Jenny Bryan
- Chapter 32 Write your own R package in STAT545 by Jenny Bryan
R Packages for Importing Data
- readr: Import rectangular data from delimited files, such as CSV or TSV. You can learn more about this package in Chapter 11 Data Import in R for Data Science.
- googledrive and googlesheets4: Use these two packages to interface with your Google Drive and work with your Google Sheets data.
- readxl: Import Excel files into R.
- haven: Read and write data formats used by other statistical software, such as SAS, SPSS, and Stata.
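For readers working on the Python side of this guide, the idea of importing rectangular data from a delimited file can be sketched with Python's standard `csv` module. This is only a minimal illustration and not readr itself; the sample data and the `read_delimited` helper name are made up for the example.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of row dictionaries keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical sample data standing in for a CSV file on disk.
sample_csv = "site,year,count\nA,2020,12\nB,2021,7\n"
rows = read_delimited(sample_csv)

# Values are read as strings; convert columns explicitly where needed.
for row in rows:
    row["year"] = int(row["year"])
    row["count"] = int(row["count"])

print(rows)
```

For tab-separated data, the same hypothetical helper can be called with `delimiter="\t"`.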
R Packages for Data Cleaning and Wrangling
- What is tidy data?
- Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10 and a summary of the paper.
- Tidy Data & Tidy Tools talk by Hadley Wickham
- tidyverse: Tidyverse is a collection of open source R packages focusing on data modelling, transformation, and visualization. On the Tidyverse website, you can find overviews of these packages and their download links.
- tidyr: Clean, transform and reshape your data into a tidy format for further analysis. This is included in the tidyverse package.
- dplyr: An R package within the tidyverse, used for data cleaning and manipulation. You can refer to Chapter 5: Data Transformation in the R for Data Science book to learn more about data transformation.
- stringr: An R package specialized in working with string (character, text) data. This is part of the tidyverse too. You can read Chapter 14: Strings in the R for Data Science book to learn more about string data.
- dbplyr: Use dplyr syntax to work with a database: dbplyr is an R package specializing in data manipulation for large external databases. Introduction to dbplyr shows how to connect an external database to R and how to query the connected database.
- dplython: The Python version of dplyr (not an R package): Conducting data analyses in Python? Dplython brings dplyr-style functionality to Python. Find out about Dplython's featured functions and installation instructions here.
- DataEditR: If you are used to working with data in Excel or another spreadsheet program, you may want to check this package out. It lets you edit, transform, clean and manipulate data in the "Excel" way, and you can export code that reproduces everything you have done, which makes your research data more reproducible.
- lubridate: This package is good for dealing with date and time data. It contains a series of consistent and memorable functions to work with date-times and time-spans in R. Find out about lubridate’s featured functions and installation instructions here.
- janitor: Another data cleaning R package: janitor helps you clean data to improve its consistency and accuracy. Find out about janitor's featured functions and installation instructions here.
- pivottabler: If you use pivot tables in Excel, you may want to check out this package. Here are a short version and a long version (http://www.pivottabler.org.uk/articles/) of the introduction to this package.
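The tidy data idea covered by the resources in this section (each variable in its own column, each observation in its own row) can be illustrated with a small reshaping sketch. It uses plain Python rather than tidyr, and the `to_long` helper, the column names, and the sample values are all hypothetical.

```python
def to_long(rows, id_column, names_to, values_to):
    """Reshape wide rows into long (tidy) rows, one observation per row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_column: row[id_column], names_to: column, values_to: value}
            )
    return long_rows


# Hypothetical wide table: one column per year.
wide = [
    {"country": "X", "2019": 10, "2020": 15},
    {"country": "Y", "2019": 3, "2020": 4},
]

tidy = to_long(wide, id_column="country", names_to="year", values_to="cases")
for row in tidy:
    print(row)
```

This is roughly the kind of reshaping that tidyr's `pivot_longer()` performs in R, where the year columns become values of a single `year` variable.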
R Packages for Data Visualization
- Data visualization with base R: You can use base R functions to visualize your data. Below are some resources for base R graphing:
- Chapter 5: Data Visualization in Base in the R Software Handbook by Evaluation, Statistics, and Methodology - University of Tennessee, Knoxville
- Comparing ggplot2 and R Base Graphics by Nathan Yau
- lattice: Another data visualization R package. It lets users create graphics with more customization. For a list of featured functions in lattice, click here. You can use this tutorial to learn more.
- ggplot2: Elegant Graphics for Data Analysis is an exhaustive reference work for ggplot2, a data visualization package in R. The author, Hadley Wickham, starts from the basics of ggplot2 and moves on to more advanced topics, such as the underlying grammar of this package.
- esquisse: Create ggplot2 charts with a drag-and-drop GUI: With esquisse, users can create ggplot2 graphics by dragging and dropping variables.
- leaflet for R: Leaflet can be used to create interactive maps, and this R package lets you create and control Leaflet maps from R.
- Shiny: This R package is for building interactive web applications and graphics. Below are some resources to learn Shiny:
- Learn Shiny by RStudio
- Mastering Shiny: Building interactive Apps, reports & dashboards powered by R by Hadley Wickham
- A list of R Shiny resources
- Examples of R Shiny
R Packages for Conducting Reproducible Research
- RMarkdown: Write reports with R code: RMarkdown creates dynamic reports by embedding code in a text document that can be rendered to output formats such as Word or PDF. With RMarkdown, you can generate high-quality reports, presentations, and dashboards that are easy to update.
- Quarto: The next generation of RMarkdown from RStudio. If you are already an RMarkdown user and want to find out why you should use Quarto, you can read the FAQs for RMarkdown users.
- workflowr: Organize and share your research project and data analysis on a website. You can view an example using this package here and watch a tutorial.