Introduction to R and RStudio

What is R and RStudio?

The term R refers to both the programming language and the free software environment for data analysis and graphics. R is widely used for data science, statistical and mathematical computing, data visualization, algorithm and application development, and report writing. RStudio is a popular integrated development environment (IDE) in which you can write your scripts and interact with the R software.

Why R?

R does not involve lots of pointing and clicking and is great for reproducibility

The learning curve for R is steeper than for other software, e.g., Stata and SPSS. However, R does not rely on remembering a succession of pointing and clicking; you write a series of commands instead. So, if you collect more data, you don’t have to click through every step again; you just run your R script again. Working with scripts makes your analysis clear and traceable and makes collaboration easier. Moreover, working with scripts forces you to understand what is going on, which helps you learn and understand the methods.
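As a minimal sketch of what such a script might look like, the example below summarizes a small set of measurements. The values and the names heights_cm and summary_stats are made up for illustration; in a real project the data would usually be read from a file.

```r
# Hypothetical measurements (illustrative values, not real data).
heights_cm <- c(162.5, 170.1, 158.3, 181.0, 175.4)

# Each analysis step is a written command, so the script itself
# documents what was done and in which order.
summary_stats <- c(
  n    = length(heights_cm),
  mean = mean(heights_cm),
  sd   = sd(heights_cm)
)

# Print the summary rounded to two decimal places.
print(round(summary_stats, 2))
```

If more measurements are collected later, only the first line needs to change; running the script again repeats every step exactly.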

R is well developed and interdisciplinary

CRAN (the Comprehensive R Archive Network) hosts one of the most comprehensive collections of statistical analysis packages (over 12,000) for simple and advanced data analysis. R is closely tied to statisticians and academics, and many new developments in statistics appear first as R packages. R is also used across disciplines: for example, it has packages for image analysis, GIS, time series, population genetics, and a lot more.

R works on data of all shapes and sizes

The skills you learn with R scale with the size of the dataset. R can handle both secondary datasets with hundreds of millions of records and small samples of experimental data. Moreover, R is designed for data analysis, and it comes with special data structures and data types that make handling missing data and statistical factors convenient.
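To make the point about missing data and factors concrete, here is a small sketch using only base R. The vectors scores and group are invented for illustration.

```r
# A numeric vector with one missing value, written as NA.
scores <- c(4.2, NA, 3.8, 5.1)

# By default, a missing value makes the result missing as well.
mean(scores)

# na.rm = TRUE tells mean() to drop missing values before averaging.
mean(scores, na.rm = TRUE)

# Count how many values are missing.
sum(is.na(scores))

# A factor stores a categorical variable together with its levels.
group <- factor(c("control", "treatment", "treatment", "control"))
levels(group)

# Count observations per level, then average the scores within each level.
table(group)
tapply(scores, group, mean, na.rm = TRUE)
```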

R is cross-platform compatible

R runs on Windows, macOS, and Linux. It can also import data from Microsoft Excel, Microsoft Access, MySQL, SQLite, Oracle, and other programs. Many commercial companies have also developed APIs (Application Programming Interfaces) that interact with R and provide free data sources.
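As a small illustration of importing data, the sketch below writes a tiny CSV file to a temporary location and reads it back with base R. The file and its contents are invented so that the example is self-contained. Reading Excel files or databases typically relies on add-on packages (for example readxl or DBI), which are not shown here.

```r
# Create a small example file in a temporary location (illustrative data).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, value = c(10.5, 20.1, 30.7)),
  csv_path,
  row.names = FALSE
)

# Import the file into a data frame.
imported <- read.csv(csv_path)

# Inspect the structure: column names, types, and the first values.
str(imported)
```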

R has a vibrant and global community

There is a large and robust group of R users, and many of them are willing to help you through mailing lists and websites such as Stack Overflow and the RStudio Community.

R can open a door to data science for non-computer scientists

Many people interested in learning data science are not computer scientists. They include engineers outside of software (e.g., mechanical, chemical, biological) and people moving from technical roles into business. Many academic researchers don’t need to develop applications and would rather have a tool for quantitative analysis. R fits this niche exactly.

Basics of R

R is a versatile, open-source programming/scripting language and software environment released under the GPL (GNU General Public License). R began as a statistical computing language and has since developed into a general-purpose programming language, with thousands of user-contributed packages widely used in academia and industry. For people who have programming experience, R is both an object-oriented and a functional language. You can use R on its own, but combining it with the RStudio interface helps you stay organized and offers more features and options.
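For readers with programming experience, the following sketch hints at both styles using base R only. The function square, the class name sample_reading, and the object sample_obj are hypothetical names chosen for this example, and S3 is just one of several object systems available in R.

```r
# Functional style: functions are ordinary values that can be passed around.
square <- function(x) x^2
squares <- sapply(1:5, square)
squares

# Object-oriented style (S3): attach a class to a list and define
# a print method that R dispatches to automatically.
sample_obj <- structure(
  list(name = "A1", reading = 3.7),
  class = "sample_reading"
)

print.sample_reading <- function(x, ...) {
  cat("Sample", x$name, "reading:", x$reading, "\n")
  invisible(x)
}

# print() looks at the class of its argument and calls print.sample_reading().
print(sample_obj)
```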

Basics of RStudio

First, you will need to install RStudio on your local computer by following these instructions (covering Windows, macOS, and Linux) created by the Digital Fellows. Then, let’s start RStudio and learn about our tool. As shown in the picture, there are four windows in the RStudio interface:

  1. Top left: script/editor window. You can write, edit, and save your R commands in the R script window. This gives you a complete record of what you did, which you can easily share with others.
  2. Top right: environment/workspace/history window. Here you can see all the objects/variables you created in the working environment and view and edit values by clicking on them. You can also search your command history and manage R connections to external resources, e.g., cloud databases.
  3. Bottom left: console window. This window is where you tell R what to do and where R shows the results of a command. You can type commands directly after the > prompt. Commands written in this window will be lost after you close the RStudio session.
  4. Bottom right: files/plots/packages/help window. The Files tab shows the files and folders in the current working directory. The Plots tab shows the graphs and figures your code produces. The Packages tab lists all the packages that are installed; if a package is loaded, a checkmark appears in the box to its left. The Help tab shows the documentation for R functions, datasets, and packages, which is helpful when you come across code that you aren’t sure about.
Figure 1. RStudio Interface
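Much of what these windows display can also be requested with commands. The sketch below uses base R functions only; the object names participant_count and study_name are made up for illustration.

```r
# Create two objects; they appear in the environment window.
participant_count <- 25
study_name <- "pilot"

# List the objects in the current environment.
ls()

# Check what kind of object something is.
class(participant_count)

# Remove an object; it disappears from the environment window.
rm(study_name)
exists("study_name")

# Open the documentation for a function in the help window
# (left as a comment because it is meant for interactive use):
# help(mean)
```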

Let’s Get Started

Let’s first create an R project and an R script file for writing R code. Here is a recommended workflow for starting a new project with everything well organized:

  1. Under the File menu, click on New Project, choose New Directory, then New (or Empty) Project.
  2. Enter a name for this new folder and choose a convenient location for it. This will be your working directory for this specific project (e.g., ~/intro-r).
  3. Confirm that the folder named in the "Create project as subdirectory of" box is where you want the working directory created. Use the Browse button to navigate folders if changes are needed. Finally, click on Create Project.
  4. In your console (the window on the bottom left), check that your current working directory is ~/intro-r. If it is not, navigate there. Then create a new folder named data in your newly created working directory (e.g., ~/intro-r/data).
  5. Create a new R script (File > New File > R Script) and save it in your working directory (e.g., intro-r-script.R).

Your working directory should look like the picture below. This is where you can begin to write your R code and build your R project.

Figure 2. RStudio File Directory
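Checking the working directory and creating the data folder can also be done from the console. This is a minimal sketch that assumes the session was started from the project, so the working directory is already the project folder.

```r
# Show the current working directory.
getwd()

# Create the data folder only if it does not exist yet.
if (!dir.exists("data")) {
  dir.create("data")
}

# List the contents of the working directory to confirm.
list.files()
```

If getwd() reports a different folder, setwd() can change it, although opening the project file is usually the more reliable way to land in the right place.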

Interacting with R

You can interact with R using the console or script files (plain text files containing your code). When R is ready to accept commands, the R console shows a > prompt. When it receives a command, R tries to execute it, shows the results when it is done, and returns a new > prompt to wait for further commands.
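A few simple commands to try at the > prompt are sketched below; the name result is arbitrary. The comments note what the console prints in response.

```r
# Simple arithmetic; the console prints [1] 5
2 + 3

# Call a built-in function; the console prints [1] 4
sqrt(16)

# Store a value in an object. Assignment prints nothing.
result <- 10 / 4

# Type the object's name to display its value; the console prints [1] 2.5
result
```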

If R is still waiting for you to complete the command, the console will show a + prompt. A common reason is that you have not closed a parenthesis or quotation mark. When this happens, either finish the command or click inside the console window and press Esc (or Ctrl-C) to cancel it.

Figure 3. Unclosed Parentheses
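The + prompt is not always a sign of a mistake: it also appears whenever a command is deliberately spread over several lines. In the sketch below (the name total is arbitrary), typing the first line into the console leaves a parenthesis open, so the console shows + until the closing parenthesis is entered.

```r
# The opening parenthesis is not closed on the first line,
# so the console shows + on the following lines until ")" is typed.
total <- sum(
  1, 2, 3
)

# The command is now complete and the > prompt returns.
total
```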

It is worth mentioning that the text/script editor is one of the highlights and benefits of using RStudio. It contains many handy features that make coding easier.

  1. To open a new R script, click the paper icon with a green plus sign in the top left corner of your RStudio window. A new blank script should appear in the top left pane, and it is saved with a .R extension.
  2. To run a command you have written in the script, place your cursor on the line you want to execute and press Ctrl-Enter (Cmd-Enter on a Mac). You should now see the output in your R console. If you want to run more than one line of code, highlight the code you want to run and press the same shortcut.
  3. To save the script, go to File -> Save or File -> Save As. RStudio should save it to your working directory by default, that is, wherever the .Rproj file is located.
  4. The next time you want to open your R project (or R script), there are two options:
    1. Double-click your .Rproj file (or R script). This will open your R project (or R script) in a new window, and everything will be there.
    2. Open RStudio. RStudio may automatically open your most recent project (or script). In that case, you don’t have to do anything. Otherwise, go to File -> Open Project and navigate to your .Rproj file (or R script). Open this and you’ll be good to go!

RUG (R User Group)

If you have any questions regarding R and RStudio or simply want to get started with them, do not hesitate to join us at the RUG (R User Group) meetings or to schedule a consultation with the GC Digital Fellows.

Note: the content in this post is adapted from the Intro to R & RStudio Workshop, created by the GCDI digital fellows.