The R language is a very versatile tool, which helps explain why it is used by so many academics (and why its community keeps growing). Almost any statistical model can be fitted in R, and packages for a wide range of tools have been developed in recent years. It is easy to get lost when you have one simple goal but face more tools than you could possibly learn during your PhD. This is why I think short posts like this one can help guide the search for the right tool.
Spatial analyses are important in our globalized world and, although R was not originally built for that purpose, a growing number of R packages are devoted to GIS. Other tools, like QGIS, will remain useful for initial data exploration and visualization (and much more, for those who dive into them). However, doing GIS in R lets you streamline your analyses: spatial data can be both statistically modeled and plotted on a map within the same environment, with no need to export and import data in different formats.
Anyone who has attempted spatial analyses in R has probably noticed that they can be very memory-demanding. This is why one of the biggest discoveries in my journey through GIS in R was the package sf. In this post, I will briefly introduce what this package is and how to use it to plot simple maps.
Using simple features with the package sf
In GIS, a feature is any representation of a real-world entity. Features have a geometry, i.e., how they are represented in space, and attributes, which are any other properties and information about the feature. For instance, if the feature is a building in a city, its geometry might be a line outlining the building's footprint (a polygon geometry), or it might be the coordinates where the building is located (a point geometry). Its attributes are any additional information we want to keep about the building (for instance, its name, its owner, how many people live there, how old it is, etc.).
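To make the distinction between geometry and attributes concrete, here is a minimal sketch that builds the building example by hand. The coordinates and attribute values (name, year_built, residents) are made up for illustration, and the same hypothetical building is stored twice: once as a point and once as a polygon footprint.

```r
library(sf)

# Hypothetical building, represented as a point (its location)...
building_point <- st_point(c(2.5, 1.5))

# ...or as a polygon (its footprint). A polygon ring must be closed:
# the first and last coordinates are identical.
footprint <- rbind(c(0, 0), c(5, 0), c(5, 3), c(0, 3), c(0, 0))
building_polygon <- st_polygon(list(footprint))

# Attributes are ordinary columns next to the geometry column.
# Row 1 stores the building as a point, row 2 as a polygon.
buildings <- st_sf(
  name       = c("Building A", "Building A"),
  year_built = c(1985, 1985),
  residents  = c(12, 12),
  geometry   = st_sfc(building_point, building_polygon)
)

print(buildings)
st_geometry_type(buildings)  # POINT for row 1, POLYGON for row 2
st_area(buildings)           # 0 for the point, 15 for the 5 x 3 footprint
```

Because no coordinate reference system is set here, the area is in plain planar units rather than square meters.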
A single file of spatial data can include many features (for instance, a shapefile of the US states may have 50 features, one for each state, each with a different polygon geometry representing the state boundaries and perhaps some attributes with information about each state). “Simple features” is a formal standard (ISO 19125, from the Open Geospatial Consortium) that defines a simple way of describing such spatial features. The package sf handles spatial data in this simple feature format, and its advantage comes from representing spatial data with native R data structures. Instead of relying on an entirely new kind of object, it builds on the ways in which R already stores data (vectors, matrices, lists and data frames).
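A quick way to check the “native R objects” claim is to look under the hood of a few geometries. In this sketch (coordinates made up), a point turns out to be a plain numeric vector, a linestring a numeric matrix, and a polygon a list of matrices; sf only adds a class attribute on top.

```r
library(sf)

pt   <- st_point(c(1, 2))
line <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
poly <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))))

# unclass() strips the sf class attribute and reveals the base R object
str(unclass(pt))    # a numeric vector of length 2
str(unclass(line))  # a 3 x 2 numeric matrix
str(unclass(poly))  # a list holding one 4 x 2 numeric matrix (the ring)

typeof(pt)          # "double"
is.matrix(line)     # TRUE
is.list(poly)       # TRUE
class(pt)           # "XY" "POINT" "sfg"
```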
An sf object is basically a data frame (a table), where each row is a feature and the columns hold its attributes. One special column stores the geometry of each feature (i.e., how that feature is represented in space). Geometries can be of many types, but the most common are points, lines and polygons, or derivations of those (like multipoints and multipolygons; Fig. 1).
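The table-like structure is easy to see with the North Carolina counties shapefile that ships with sf. It is used here only as a convenient, offline example; it is not part of this post's data.

```r
library(sf)

# Example shapefile bundled with the sf package: 100 North Carolina counties
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

class(nc)                     # "sf" "data.frame": a data frame with extra behaviour
nrow(nc)                      # 100 rows, one per feature (county)
attr(nc, "sf_column")         # name of the geometry list-column: "geometry"
unique(st_geometry_type(nc))  # MULTIPOLYGON: a county may consist of several polygons

# Attribute columns behave like any other data frame column
# (the geometry column is "sticky" and is kept when subsetting)
head(nc[, c("NAME", "AREA")])
```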
In comparison with older spatial packages in R (like rgdal, which reads data into sp objects), the package sf stores data in a more compact format, which can save up to 30% of memory (based on a recent test I did with a simple shapefile). This small difference can add up, and noticeably reduce processing time, when you are dealing with large amounts of data.
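If you want to repeat this kind of memory comparison on your own data, a sketch like the following compares an sf object with its sp equivalent using base R's object.size. It assumes the sp package is installed (it skips the comparison otherwise) and uses sf's bundled North Carolina shapefile as a stand-in. The numbers depend on the data and on package versions, so treat them as indicative only.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Memory footprint of the sf representation, in bytes
size_sf <- as.numeric(object.size(nc))

if (requireNamespace("sp", quietly = TRUE)) {
  # Convert to the older sp representation (a SpatialPolygonsDataFrame)
  nc_sp   <- as_Spatial(nc)
  size_sp <- as.numeric(object.size(nc_sp))
  cat(sprintf("sf: %.0f bytes, sp: %.0f bytes (sf/sp = %.2f)\n",
              size_sf, size_sp, size_sf / size_sp))
} else {
  message("Install the 'sp' package to run the comparison.")
}
```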
Installing and plotting
As with any other R package, sf can be installed with the function install.packages and loaded with the function library:
```r
install.packages('sf')
library('sf')
```
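Because sf depends on the external GEOS, GDAL and PROJ libraries (which, on Linux, usually have to be installed at the system level before installing sf from source), it can be worth confirming after installation that everything is linked correctly. A small sketch:

```r
# Install sf only if it is not already available, then load it
if (!requireNamespace("sf", quietly = TRUE)) {
  install.packages("sf")
}
library(sf)

# Versions of the external libraries this sf installation is linked against
sf_extSoftVersion()
packageVersion("sf")
```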
The package comes with many functions for working with spatial data (here is a quick cheat sheet). All of them are prefixed by st_ (short for “spatial type”, following the convention used by PostGIS), which makes them easy to find by command-line completion (i.e., in an editor such as RStudio you can just type st_ and a list of all functions will appear; browse through it with the arrow keys and press Tab to complete the command with the chosen function).
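Outside an editor with autocompletion, the same list can be obtained programmatically. This sketch lists every exported object in sf whose name starts with st_:

```r
library(sf)

# All exported sf objects whose names start with "st_"
st_funs <- ls("package:sf", pattern = "^st_")
length(st_funs)
head(st_funs, 10)

# Each of them has its own help page, e.g. ?st_read
```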
As an example, we can load two shapefiles from the Natural Earth database (airports and urban areas, both at the 1:10m scale) to show how close airports are to urban centers around the world:
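```r
# Both shapefiles must first be downloaded from Natural Earth and unzipped
# into the working directory
airports = st_read('ne_10m_airports.shp')
cities = st_read('ne_10m_urban_areas.shp')
```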
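If you just want to try st_read without downloading anything, sf ships a small example shapefile. The sketch below reads it and inspects the result, which is a good habit before plotting any layer (checking its coordinate reference system, extent and geometry types).

```r
library(sf)

# Path to the example shapefile bundled with sf (North Carolina counties)
path <- system.file("shape/nc.shp", package = "sf")

# quiet = TRUE suppresses the driver/layer summary st_read prints by default
counties <- st_read(path, quiet = TRUE)

st_crs(counties)                   # coordinate reference system
st_bbox(counties)                  # xmin, ymin, xmax, ymax of all features
table(st_geometry_type(counties))  # how many features of each geometry type
```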
Since an sf object is basically a data frame, we can plot it with ggplot2, a very popular R package for making elegant plots. With this package, we can overlay different layers (i.e., different spatial files, like the airports and cities files loaded above). We can also customize the options of our maps using the function geom_sf, a ggplot2 function made specifically to handle sf objects. The code below shows how to plot our cities and airports with a focus on North America. We load the package rnaturalearth to retrieve an sf object containing the boundaries of all countries in the world as polygons. We save this sf object with the name “world” and use it as the main data to be plotted by ggplot (ggplot(data = world)). Then, we use geom_sf to set the fill color of these polygons and to add new data layers (airports and cities, setting a size for the points and a color for the outlines of the city polygons). Finally, we use the function coord_sf to delimit the boundaries of our map (setting the minimum and maximum longitude and latitude).
```r
library("sf")
library("ggplot2")
library("rnaturalearth")  # scale = "medium" also requires the rnaturalearthdata package

# Retrieve the boundaries of all countries (medium scale) as an sf object, saved as "world"
world = ne_countries(scale = "medium", returnclass = "sf")

theme_set(theme_bw())  # Use a black-and-white theme for all plots

# Since ggplot2 3.4.0, line and outline widths are set with linewidth instead of size
ggplot(data = world) +                                    # Main data: country boundaries
  geom_sf(fill = "antiquewhite", linewidth = 0.1) +       # Fill and outline width of countries
  geom_sf(data = airports, size = 2, color = "purple") +  # Airports as points of size 2
  geom_sf(data = cities, color = "black", fill = NA, linewidth = 0.1) +  # Urban area outlines
  # coord_sf sets the minimum and maximum longitude and latitude of the map
  # (here focusing on North America)
  coord_sf(xlim = c(-127.908761, -60.320873), ylim = c(25.288551, 49.270773), expand = FALSE) +
  # A few cosmetics: dashed grey grid lines and a light blue background
  theme(panel.grid.major = element_line(color = gray(.9), linetype = "dashed", linewidth = 0),
        panel.background = element_rect(fill = "aliceblue"))
```
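Overlaying layers only works well when they share a coordinate reference system (CRS). coord_sf reprojects layers on the fly for plotting, but spatial operations between two layers (intersections, distances, etc.) require matching CRSs. A minimal sketch of the check, again using the bundled North Carolina data; EPSG 32119 (NAD83 / North Carolina State Plane) is chosen here only as an example of a projected CRS.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# A second layer in a different CRS: county centroids computed in
# NAD83 / North Carolina (EPSG:32119)
centroids <- st_centroid(st_geometry(st_transform(nc, 32119)))

st_crs(nc) == st_crs(centroids)   # FALSE: the layers do not match

# Bring the second layer into the CRS of the first before combining them
centroids <- st_transform(centroids, st_crs(nc))
st_crs(nc) == st_crs(centroids)   # TRUE

# Layer-to-layer operations are now safe, e.g. which county contains each centroid
inside <- st_within(centroids, nc)
```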
From the figure above, you can go on to do all sorts of spatial analyses (still within the R environment), like computing the average distance from airports to the nearest urban area, or building a model that spatially predicts the number of airports in an area from the number of urban areas within a given radius. The possibilities are endless!
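As a starting point for these analyses, here is a sketch of a helper, mean_nearest_distance (a hypothetical name, not an sf function), that finds the nearest polygon to each point with st_nearest_feature and measures the distance with st_distance. It also counts, with st_is_within_distance, how many polygons lie within a given radius of each point, the kind of predictor the model mentioned above could use. To keep the example checkable it uses toy geometries without a CRS, so distances are in plain planar units; with longitude/latitude layers such as the Natural Earth files, sf returns distances in meters.

```r
library(sf)

# Mean distance from each point to its nearest polygon
mean_nearest_distance <- function(points, polygons) {
  nearest <- st_nearest_feature(points, polygons)  # index of the nearest polygon
  d <- st_distance(points, polygons[nearest, ], by_element = TRUE)
  mean(as.numeric(d))
}

# Toy data: two unit squares ("urban areas") and three points ("airports")
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy_urban <- st_sf(id = 1:2, geometry = st_sfc(square(0), square(10)))
toy_airports <- st_sf(
  code = c("A", "B", "C"),
  geometry = st_sfc(st_point(c(2, 0.5)),     # 1 unit from square 1
                    st_point(c(10, 3)),      # 2 units from square 2
                    st_point(c(0.5, 0.5)))   # inside square 1
)

mean_nearest_distance(toy_airports, toy_urban)  # (1 + 2 + 0) / 3 = 1

# Number of urban areas within 9 units of each airport
lengths(st_is_within_distance(toy_airports, toy_urban, dist = 9))  # 2 1 1
```

With the data loaded earlier, the same helper could be called as mean_nearest_distance(airports, cities).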
If you are new to GIS in R and want to explore the package sf further, you can check out this page to get started. It contains several tutorials on how to use open-source data to build maps with both sf and ggplot2.