Using R and Python

This libguide covers resources for learning and using R and Python.

Data Services Profile

We are here to help you find, use, manage, visualize and share your data. Contact us to schedule a consultation. View and register for upcoming workshops. Visit our website to learn more about our services.

Where to Find R Packages?

  • The Comprehensive R Archive Network (CRAN): This is the official repository for R packages. Packages submitted here are screened before they are published to ensure that they follow CRAN policies.
  • Bioconductor: A discipline-specific repository for bioinformatics R packages. Packages submitted here need to follow Bioconductor guidelines.
  • rOpenSci: rOpenSci (R for Open Science) is an organization that promotes open science and reproducible research by sharing open-source software. Packages uploaded here are peer-reviewed by rOpenSci volunteers.
  • GitHub: A software development platform that is not designed specifically for R packages. Anyone can upload and share software here.
  • RStudio compiles a list of useful R packages here.

Install, Update and Create R Packages

Installing packages

Update packages using the instructions here.

Create packages: If you are an intermediate or advanced R user, you may want to create your own R packages. Here are some learning resources to teach you how to do it:

R Packages for Importing Data

  • readr: Import rectangular data from delimited files, such as CSV or TSV. You can learn more about this package in Chapter 11: Data Import in the R for Data Science book.
  • googledrive and googlesheets4: Use these two packages to interface with your Google Drive and work with your Google Sheets data.
  • readxl: Import Excel files into R.
  • haven: Read and write data formats used by other statistical software, such as SAS, SPSS, and Stata.

R Packages for Data Cleaning and Wrangling

  • What is tidy data?
  • tidyverse: Tidyverse is a collection of open source R packages focusing on data modelling, transformation, and visualization. On the Tidyverse website, you can find overviews of these packages and their download links. 
    • tidyr: Clean, transform and reshape your data into a tidy format for further analysis. This is included in the tidyverse package.
    • dplyr: A package within the tidyverse used for data cleaning and manipulation. You can refer to Chapter 5: Data Transformation in the R for Data Science book to learn more about data transformation.
    • stringr: An R package specialized in working with string (character, text) data. This is part of the tidyverse package too. You can read Chapter 14: Strings in the R for Data Science book to learn more about string data.
  • dbplyr: Use dplyr syntax to work with a database: dbplyr is an R package that lets you apply dplyr-style data manipulation to data stored in large external databases. Introduction to dbplyr teaches readers how to connect external databases to R and how to query the connected database.
  • dplython: The Python version of dplyr (not an R package): Conducting data analyses in Python? Dplython brings the functionality of dplyr to Python. Find out about Dplython’s featured functions and installation instructions here.
  • DataEditR: If you are used to working with data in Excel or another spreadsheet program, you may want to check this package out. It allows you to edit, transform, clean and manipulate data in the "Excel" way, and you can export code that reproduces everything you have done with your data, which increases the reproducibility of your research.
  • lubridate: This package is good for dealing with date and time data. It contains a series of consistent and memorable functions to work with date-times and time-spans in R. Find out about lubridate’s featured functions and installation instructions here.    
  • janitor: Another data cleaning R package: As a data cleaning package, janitor modifies data to ensure their consistency and accuracy. Find out about janitor’s featured functions and installation instructions here.
  • pivottabler: If you use pivot tables in Excel, you may want to check out this package. Here are a short version and a long version (http://www.pivottabler.org.uk/articles/) of the introduction to this package.
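
For readers working in Python, the idea of reshaping data into a tidy format (each variable in its own column, each observation in its own row) can be sketched with the standard library alone. This is only an illustration of the concept, not how tidyr or dplython is implemented. The table contents are made up, and the helper pivot_longer is a hypothetical function named after tidyr's function of the same name.

```python
# Hypothetical "wide" table: one row per country, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def pivot_longer(rows, id_column, names_to, values_to):
    """Reshape wide rows into long (tidy) rows: one observation per row.

    Hypothetical helper for illustration; it only mimics the idea behind
    tidyr's pivot_longer() and is not part of any library.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every long row
            long_rows.append(
                {id_column: row[id_column], names_to: column, values_to: value}
            )
    return long_rows


# Each year column becomes a (year, cases) pair on its own row.
tidy_rows = pivot_longer(wide_rows, "country", "year", "cases")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy result, "year" and "cases" are ordinary columns, so filtering or grouping by year no longer requires knowing the column names in advance.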

R Packages for Data Visualization

R Packages for Conducting Reproducible Research

  • RMarkdown: Write reports with R code: RMarkdown creates dynamic reports by embedding code in a plain-text document that can be rendered to formats such as Word or PDF. With RMarkdown, you can generate high-quality reports, presentations, and dashboards that can be easily updated.
  • Quarto: The next generation of RMarkdown from RStudio. If you are already an RMarkdown user and want to find out why you should use Quarto, you can read the FAQs for RMarkdown users.
  • workflowr: Organize and share your research project and data analysis on a website. You can view an example using this package here and watch a tutorial.