Chapter 4 Spatial Analysis

Now let’s do some analysis with the data we’ve acquired already:

  • Sites point data sites_sf
  • Utah interstates interstate_sf_proj

We also have some data that I’ve included in the exercises portion of the “worksheet”: a different elevation + snow raster stack (this one is in the NW corner of Utah),

elev_snow_stk
## class       : SpatRaster 
## dimensions  : 240, 240, 3  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -112, -110, 40, 42  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elev_snow_nw_stack.tif 
## names       : elevation,  swe, snow_depth 
## min values  :  1280.021,    0,          0 
## max values  :  4087.231, 1108,       3196
plot(elev_snow_stk)

a set of plots as a point feature,

head(plots_sf)
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -111.6469 ymin: 40.40336 xmax: -110.4603 ymax: 41.81642
## Geodetic CRS:  WGS 84
##   Plots                   geometry
## 1     A POINT (-111.5881 40.53102)
## 2     B POINT (-111.6469 40.74425)
## 3     C  POINT (-110.626 41.14571)
## 4     D POINT (-111.2318 41.81642)
## 5     E POINT (-110.4603 40.40336)
## 6     F POINT (-111.0046 41.79678)

and a polygon feature of land management boundaries in Utah

ownership_sf <- st_read("Data/Exercises/UT_land_ownership", quiet = T) %>%
  st_transform(crs = 26912)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 388066 ymin: 4410117 xmax: 403492.2 ymax: 4414856
## Projected CRS: NAD83 / UTM zone 12N
##   OBJECTID   OWNER AGENCY ADMIN          DESIG                       geometry
## 1  2974453 Federal    BLM   BLM Bankhead Jones MULTIPOLYGON (((388854 4411...
## 2  2974454 Federal    BLM   BLM Bankhead Jones MULTIPOLYGON (((394528.1 44...
## 3  2974455 Federal    BLM   BLM Bankhead Jones MULTIPOLYGON (((388473.6 44...
## 4  2974456 Federal    BLM   BLM Bankhead Jones MULTIPOLYGON (((399793.3 44...
## 5  2974457 Federal    BLM   BLM Bankhead Jones MULTIPOLYGON (((389300.1 44...
## 6  2974458 Federal    BLM   BLM Bankhead Jones MULTIPOLYGON (((403492.2 44...

Let’s plot one of the rasters with our sites point vector and Utah highways line vector. To plot just one raster layer in a stack we can either index it with double brackets or with the name:

# these are different ways to get the same raster layer
elev_snow_stk[[1]]
## class       : SpatRaster 
## dimensions  : 240, 240, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -112, -110, 40, 42  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elev_snow_nw_stack.tif 
## name        : elevation 
## min value   :  1280.021 
## max value   :  4087.231
elev_snow_stk$elevation
## class       : SpatRaster 
## dimensions  : 240, 240, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -112, -110, 40, 42  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elev_snow_nw_stack.tif 
## name        : elevation 
## min value   :  1280.021 
## max value   :  4087.231
## Reading layer `utah_interstate' from data source 
##   `C:\Users\RonanHart\Documents\Projects\R_Spatial_Visualization_Workshop\Data\Examples\utah_interstate' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1848 features and 9 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -114.0437 ymin: 37.00002 xmax: -109.0513 ymax: 42.00117
## Geodetic CRS:  WGS 84
plot(elev_snow_stk$elevation)
plot(st_geometry(interstate_sf), lwd = 2, add = TRUE) # add = TRUE will add other elements to the plot without erasing previous elements and creating a new plot 
plot(st_geometry(sites_sf), pch = 16, add = TRUE) 
plot(st_geometry(plots_sf), pch = 3, add = TRUE)

(This can also be done with ggplot using as.data.frame)

elev_snow_stk$elevation %>%
  as.data.frame(xy = T) %>%
  ggplot() +
  geom_raster(aes(x = x, y = y, fill = elevation)) +
  scale_fill_gradientn(colors = rev(terrain.colors(10))) +
  geom_sf(data = interstate_sf) +
  geom_sf(data = sites_sf) +
  geom_sf(data = plots_sf, shape = 3) +
  coord_sf(datum = 4326) +
  xlim(ext(elev_snow_stk)[c(1,2)]) +
  ylim(ext(elev_snow_stk)[c(3,4)])

Let’s start on some analysis and computations that we can run on these data.

4.1 Selecting Attributes

Perhaps you have vector data and you want to select only certain attributes or attributes that reach a focal threshold. To do so we need to set up a logical statement, and we can do this in base R or in tidyverse.

Let’s say we want to select boundaries that are operated by BLM. In the shapefile of management boundaries, this information is located in the column “AGENCY”

unique(ownership_sf$AGENCY)
##  [1] "BLM"     "BR"      "DOD"     "DOE"     "NPS"     "Private" "USFS"   
##  [8] "USFWS"   "DNR"     "OS"      "SITLA"   "UDOT"    "Tribal"

In base R we can use the function which and in tidyverse we can use the function filter

# base R
blm_boundary <- ownership_sf[ownership_sf$AGENCY == "BLM", ]
# alternatively you can use the base R function subset()
blm_boundary <- subset(ownership_sf, ownership_sf$AGENCY == "BLM")

# tidyverse
blm_boundary <- ownership_sf %>% 
  filter(AGENCY == "BLM")

ggplot() +
  geom_sf(data = ownership_sf, col = "grey", size = 0.1) +
  geom_sf(data = blm_boundary, fill = "red", col = "grey30",
          alpha = 0.8, size = 0.1)

Using these functions, you can set up any logical statement using ==, %in%, >, >=, <, <=, or ! and select for the specific attributes you need.

4.2 Select features by location

Let’s make select the management boundaries based on if they are intersected by a major highway. For sf we’ll use the function st_intersect

ownership_roads <- st_intersects(interstate_sf_proj, ownership_sf) # the first argument is the target shape and the second argument the shape we're selecting from
class(ownership_roads)
## [1] "sgbp" "list"

The output is an sgbp object, or “Sparse Geometry Binary Predicate”. Basically it returns a list of vectors of integers, which refer to the indices of each polygon that intersects.

dim(ownership_roads)
## [1]  1848 15197
nrow(interstate_sf_proj)
## [1] 1848
nrow(ownership_sf)
## [1] 15197

So the dimensions of this list are the the number of rows in the target shape (the highways) and the number of rows in the intersecting shape (the management boundaries).

Lets look at the first five elements of this list:

ownership_roads[1:5]
## [[1]]
## [1] 4687
## 
## [[2]]
## [1] 6408
## 
## [[3]]
## [1] 12997
## 
## [[4]]
## [1] 6406
## 
## [[5]]
## [1] 4687

This means that the 1st row of the road polyline intersects with the 6459th row of the management polygon, the 2nd row of the road polyline intersects with the 6455th row of the management polygon, and etc.

If we wanted to know the specifc index of a specific road that intersected with a management boundary, it would be useful to keep all of these indices seperate. Since we just want to know which boundaries intersect a road, we can collapse this whole list together.

ownership_roads_index <- unique(unlist(ownership_roads)) # just pull the unique indices
ownership_roads_intersect <- ownership_sf[ownership_roads_index, ] 

ggplot() +
  geom_sf(data = ownership_sf, col = "grey", size = 0.1) +
  geom_sf(data = ownership_roads_intersect, fill = "red", col = "grey30",
          alpha = 0.8, size = 0.1) +
  geom_sf(data = interstate_sf_proj, col = "black", size = 1)

If you look at the help file for ?st_intersects, you’ll see there are a lot of different functions that select features based on another feature.

4.3 Joining Attributes

Let’s load in a table of some data collected at each plot

##   Plots      Species       Date AboveGroundBiomass MeanHeight PercentCover
## 1     A R. maritimus 2021-05-01          20.597457   1.930570     82.09463
## 2     A    B. cernua 2021-05-01          49.769924   2.410401     78.93562
## 3     A    S. acutus 2021-05-01          93.470523   3.342334     47.76196
## 4     B R. maritimus 2021-05-02           9.946616   1.695365     20.26923
## 5     B    B. cernua 2021-05-02          91.287592   4.460992     23.96294
## 6     B    S. acutus 2021-05-02          25.801678   2.173297     79.73088

Let’s join this table to the Plots feature so we could do some spatial analysis and mapping of the collected data. To join two tables together, we need to input the two tables and the name of the column that exists in both tables (so the join function knows how to match attributes together). In this case, that would be the Plots column.

head(plots_sf$Plots)
## [1] "A" "B" "C" "D" "E" "F"
head(plot_data$Plots)
## [1] "A" "A" "A" "B" "B" "B"

We can use the tidyverse’s join functions. (If you don’t know how joins work, I would recommend looking at the help file by typing ?left_join in the console)

plots_join <- left_join(plots_sf, plot_data, by = "Plots")
head(plots_join)
## Simple feature collection with 6 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -111.5881 ymin: 40.53102 xmax: -111.5881 ymax: 40.53102
## Geodetic CRS:  WGS 84
##   Plots      Species       Date AboveGroundBiomass MeanHeight PercentCover
## 1     A R. maritimus 2021-05-01           20.59746 1.93057046     82.09463
## 2     A    B. cernua 2021-05-01           49.76992 2.41040058     78.93562
## 3     A    S. acutus 2021-05-01           93.47052 3.34233369     47.76196
## 4     A R. maritimus 2021-05-08           17.65568 0.06695167     64.70602
## 5     A    B. cernua 2021-05-08           71.76185 2.99782913      2.33312
## 6     A    S. acutus 2021-05-08           21.21425 3.97119930     86.12095
##                     geometry
## 1 POINT (-111.5881 40.53102)
## 2 POINT (-111.5881 40.53102)
## 3 POINT (-111.5881 40.53102)
## 4 POINT (-111.5881 40.53102)
## 5 POINT (-111.5881 40.53102)
## 6 POINT (-111.5881 40.53102)

Great! At this point you could then do some spatial analysis based on location, or make a map based on average biomass, for example. However, that’s outside the scope of this workshop.

Joining two tables together is a valuable tool to know, not just for GIS but for any data management.

4.4 Extract Raster Values

What if we need to get data from our rasters at our specific site locations? We can use the function extract().

Let’s load a landcover raster so we can classify the habitat types of our sites

landcover <- rast("Data/Examples/landcover.tif")
landcover
## class       : SpatRaster 
## dimensions  : 18675, 14838, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 229319.6, 674459.6, 4094414, 4654664  (xmin, xmax, ymin, ymax)
## coord. ref. : NAD83 / UTM zone 12N (EPSG:26912) 
## source      : landcover.tif 
## name        : landcover 
## min value   :       137 
## max value   :       584
plot(landcover)
plot(st_geometry(sites_sf_proj), pch = 16, add = T)

extract returns a vector whose indices match the indices of the spatial object. We could leave it as a vector, or we could automatically attach it to the dataframe using $

sites_sf_proj$land_value <- terra::extract(landcover, sites_sf_proj)$landcover

(Note that I put terra:: in front of extract(), that’s because there are multiple packages that have a function called extract(), so we want to specify to R which pacakage we want)

sites_sf_proj
## Simple feature collection with 15 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 257254.9 ymin: 4106995 xmax: 634770.7 ymax: 4549014
## Projected CRS: NAD83 / UTM zone 12N
## First 10 features:
##    Site                 geometry land_value
## 1     1   POINT (506648 4311112)        489
## 2     2 POINT (364889.8 4474182)        148
## 3     3 POINT (573146.4 4492646)        155
## 4     4 POINT (303202.3 4546515)        458
## 5     5 POINT (512358.1 4514704)        549
## 6     6 POINT (409973.3 4547926)        557
## 7     7 POINT (388759.4 4549014)        579
## 8     8 POINT (634770.7 4350035)        316
## 9     9 POINT (621249.6 4185280)        158
## 10   10 POINT (524421.3 4431743)        156

Ok, but what do these numbers mean? Our landcover raster is a categorical raster, so these numbers aren’t actually real numbers but represent a habitat type. Fortunately we have a dataframe indicating what these numbers mean.

land_info <- read.csv("Data/Examples/landcover_info.csv")
head(land_info)[,1:5]
##   Value ClassCode         ClassName SubClassCode               SubClassName
## 1     1         1 Forest & Woodland          1.A Tropical Forest & Woodland
## 2     2         1 Forest & Woodland          1.A Tropical Forest & Woodland
## 3     3         1 Forest & Woodland          1.A Tropical Forest & Woodland
## 4     4         1 Forest & Woodland          1.A Tropical Forest & Woodland
## 5     5         1 Forest & Woodland          1.A Tropical Forest & Woodland
## 6     6         1 Forest & Woodland          1.A Tropical Forest & Woodland

The column “Value” corresponds to the cell value we extracted from the raster. We can use what we learned earlier how to join two tables together. In our previous example, the columns we used to join the tables together were named the same. In this case, they’re not: one is called “land_value” and one is called “Value”. We could rename them so that they match. But left_join() has a way of handling and matching columns, even if they’re not named the same. If we check ?left_join, in the “Arguments” section for by, it says “to join by different variables on x and y, use a named vector. For example, by = c("a" = "b") will match x$a to y$b.” So we just need to set by = c("land_value" = "Value")

sites_sf_land <- sites_sf_proj %>%
  left_join(land_info, c("land_value" = "Value")) 
head(sites_sf_land)[,1:6]
## Simple feature collection with 6 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 303202.3 ymin: 4311112 xmax: 573146.4 ymax: 4547926
## Projected CRS: NAD83 / UTM zone 12N
##   Site land_value ClassCode                                       ClassName
## 1    1        489         3                            Desert & Semi-Desert
## 2    2        148         1                               Forest & Woodland
## 3    3        155         1                               Forest & Woodland
## 4    4        458         2                         Shrub & Herb Vegetation
## 5    5        549         4 Polar & High Montane Scrub, Grassland & Barrens
## 6    6        557         7             Agricultural & Developed Vegetation
##   SubClassCode                         SubClassName                 geometry
## 1          3.B   Cool Semi-Desert Scrub & Grassland   POINT (506648 4311112)
## 2          1.B Temperate & Boreal Forest & Woodland POINT (364889.8 4474182)
## 3          1.B Temperate & Boreal Forest & Woodland POINT (573146.4 4492646)
## 4          2.C                 Shrub & Herb Wetland POINT (303202.3 4546515)
## 5          4.B     Temperate Alpine to Polar Tundra POINT (512358.1 4514704)
## 6          7.B   Herbaceous Agricultural Vegetation POINT (409973.3 4547926)

Awesome! Now we know what habitat each of our sites reside in.

4.5 Distance

Let’s say we needed to know how far from a major road each of our sites are. We’ll use the function st_distance for our sf objects. We simply need to input the focal feature (the sites) and the feature

dist <- st_distance(sites_sf_proj, interstate_sf_proj)
dim(dist)
## [1]   15 1848

What did this do? Why are there so many columns? Remember that our Utah highways feature is a polyline, meaning it’s a line of lines. If we look at the dimensions of the highways feature:

nrow(interstate_sf_proj)
## [1] 1848

There are 1849 lines (i.e. roads) that make up this whole feature. So st_distance found the distance for each site (the number of rows) for every road (the number of columns). This could be useful information, but presumably we want just the distance of the closest road. Let’s first use st_nearest_feature to find the closest road to each of our sites.

near_road <- st_nearest_feature(sites_sf_proj, interstate_sf_proj)
near_road
##  [1] 1101 1177 1384  799 1473  630  232  702  557  521  529   99 1430   15  557
interstate_sf_proj[near_road,]
## Simple feature collection with 15 features and 9 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 282642.9 ymin: 4137639 xmax: 661451 ymax: 4555607
## Projected CRS: NAD83 / UTM zone 12N
## First 10 features:
##      OBJECTID    FULLNAME    NAME POSTTYPE SPEED_LMT DOT_HWYNAM DOT_SRFTYP
## 1101   234986 I-70 WB FWY I-70 WB      FWY        75       I-70          P
## 1177   248283 I-80 EB FWY I-80 EB      FWY        75       I-80          P
## 1384   300050 I-80 EB FWY I-80 EB      FWY        75       I-80          P
## 799    163766 I-80 WB FWY I-80 WB      FWY        75       I-80          P
## 1473   320716 I-80 EB FWY I-80 EB      FWY        65       I-80          P
## 630    130269 I-15 SB FWY I-15 SB      FWY        70       I-15          P
## 232     45243 I-15 SB FWY I-15 SB      FWY        70       I-15          P
## 702    143420 I-70 WB FWY I-70 WB      FWY        75       I-70          P
## 557    117401 I-70 EB FWY I-70 EB      FWY        75       I-70          P
## 521    108444 I-15 NB FWY I-15 NB      FWY        65       I-15          P
##      DOT_AADT                  UNIQUE_ID                       geometry
## 1101       NA  12SWJ07420079_I-70_WB_FWY LINESTRING (506862.7 430072...
## 1177    12000  12TUL77660482_I-80 EB_FWY LINESTRING (377489.8 450489...
## 1384    16000  12TVL86655323_I-80 EB_FWY LINESTRING (484274.7 455222...
## 799        NA  12TTL95291161_I-80 WB_FWY LINESTRING (282642.9 451236...
## 1473    15000  12TVL67382316_I-80 EB_FWY LINESTRING (465971.3 451778...
## 630        NA 12TVL499657376_I-15 SB_FWY LINESTRING (415374.9 455118...
## 232        NA  12TVL13925335_I-15 SB_FWY LINESTRING (414453.3 455253...
## 702        NA  12SXJ59523424_I-70 WB_FWY LINESTRING (657315.1 433125...
## 557     10000  12SXJ29161112_I-70 EB_FWY LINESTRING (620558.6 431124...
## 521    104000  12TVK44904321_I-15_NB_FWY LINESTRING (444841.6 444243...

Now we have the indices of the roads that are closest to our sites, so we can find the distance of just these roads.

dist_near_road <- st_distance(sites_sf_proj, interstate_sf_proj[near_road,])
dist_near_road[, 1:5]
## Units: [m]
##            [,1]      [,2]      [,3]      [,4]      [,5]
##  [1,]  10315.51 232582.47 242147.42 281846.56 210641.52
##  [2,] 224150.99  33194.84 142629.52  67810.07 110086.22
##  [3,] 202785.29 195679.41 105586.36 265837.33 109558.50
##  [4,] 319202.58  85153.11 181162.34  35259.29 164053.11
##  [5,] 213951.35 134881.34  46277.52 204448.52  45475.54
##  [6,] 265510.76  53802.73  74425.61 108507.81  59486.92
##  [7,] 274946.67  45537.59  95569.24  89311.68  80131.79
##  [8,] 136172.75 299925.04 251423.12 364314.12 237979.89
##  [9,] 161631.44 401648.70 391502.51 451943.09 366979.12
## [10,] 132044.13 163762.97 126873.05 230529.64 104020.44
## [11,]  34586.77 269676.04 268322.22 321111.19 239842.61
## [12,] 250038.01 249684.87 349826.95 227716.46 311863.50
## [13,] 244382.54 398276.49 462818.09 407078.61 424773.11
## [14,] 193621.33 184870.40 278450.62 180463.52 240048.54
## [15,] 138475.56 365526.65 342070.17 421508.03 321034.17
dim(dist_near_road)
## [1] 15 15

This is still giving us the distance of each road to each site but we just want the distance between each site and it’s nearest road. There’s an argument in st_distance() called by_element that tells st_distance() to only find the distance between the first elements of the two objects.

dist_near_road <- st_distance(sites_sf_proj, interstate_sf_proj[near_road,],
                              by_element = TRUE)
dist_near_road
## Units: [m]
##  [1]  10315.506  33194.842 105586.359  35259.292  45475.541   6297.237
##  [7]  25526.749  29201.908 125931.670  80267.682  16154.905 102959.472
## [13]  65466.728  62646.964  65802.131

There we go! Now we have the distance between each site and it’s nearest road. This output is a vector, so we can add it to our sites data frame with $ (or mutate() in tidyverse)

sites_sf_proj$dist_near_road <- dist_near_road
head(sites_sf_proj)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 303202.3 ymin: 4311112 xmax: 573146.4 ymax: 4547926
## Projected CRS: NAD83 / UTM zone 12N
##   Site                 geometry land_value dist_near_road
## 1    1   POINT (506648 4311112)        489  10315.506 [m]
## 2    2 POINT (364889.8 4474182)        148  33194.842 [m]
## 3    3 POINT (573146.4 4492646)        155 105586.359 [m]
## 4    4 POINT (303202.3 4546515)        458  35259.292 [m]
## 5    5 POINT (512358.1 4514704)        549  45475.541 [m]
## 6    6 POINT (409973.3 4547926)        557   6297.237 [m]
sites_sf_proj <- sites_sf_proj %>%
  mutate(dist_near_road = dist_near_road)

(Note that if you look at the help file i.e. ?st_distance, there are other functions to calculate geometric measurements for sf objects: st_area and st_length)

4.6 Calculate Terrain Characteristics

From a DEM (digital elevation model) we can obtain a lot of other rasters that are likely useful in GIS research. The elevation raster we’ve been working with is a DEM. From a DEM we can derive other terrain characteristics :

  • Slope: Measurement of “steepness”
  • Aspect: Measurements of “Northness” and “Eastness”
  • Flow direction of water: the direction of the greatest drop in elevation
  • Terrain Ruggedness Index (TRI): the mean of the absolute differences between the value of a cell and the value of its 8 surrounding cells
  • Topographic Position Index (TPI): the difference between the value of a cell and the mean value of its 8 surrounding cells
  • Roughness: the difference between the maximum and the minimum value of a cell and its 8 surrounding cells

These definitions came from the help file for the function we can use to derive these characteristics: terrain().

slope <- terrain(elev_snow_stk$elevation, v = "slope", unit = "radians")
aspect <- terrain(elev_snow_stk$elevation, v = "aspect", unit = "radians")
roughness <- terrain(elev_snow_stk$elevation, v = "roughness")
terrain_stk <- c(elev_snow_stk$elevation, slope, aspect, roughness)
terrain_stk
## class       : SpatRaster 
## dimensions  : 240, 240, 4  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -112, -110, 40, 42  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## sources     : elev_snow_nw_stack.tif  
##               memory  
##               memory  
##               memory  
## names       : elevation,     slope,       aspect, roughness 
## min values  :  1280.021, 0.0000000, 3.786645e-05,     0.000 
## max values  :  4087.231, 0.5922148, 6.283185e+00,  1305.816

To compute the Northness or Eastness of a cell, we actually have to do one more step to the aspect raster. Aspect is a circular measurement (which is why its units are in degrees or radians), so (if you remember how trigonometry works) to calculate northness and eastness we need to use cosine and sine respectively. Because our units are in radians, we can simply apply the cos() and sin() functions directly to the aspect raster.

aspect_cos <- cos(terrain_stk$aspect)
aspect_sin <- sin(terrain_stk$aspect)
aspect_stk <- c(aspect_cos, aspect_sin)
names(aspect_stk) <- c("cosine_northness", "sine_eastness")
aspect_stk
## class       : SpatRaster 
## dimensions  : 240, 240, 2  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -112, -110, 40, 42  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## names       : cosine_northness, sine_eastness 
## min values  :               -1,            -1 
## max values  :                1,             1
plot(aspect_stk)

4.7 Re-Classify Rasters

Sometimes you may need to re-classify a raster. This means that you are assigning or changing raster cell values to new values. You may need to do this to

  • replace values with new information
  • group or bin specific values together
  • set values to NA or set NA cells to a value

To re-classify a raster in R you use the aptly named function reclassify(). The arguments for reclassify are x (the Raster* object) and rcl (a matrix for reclassification)

Let’s practice. Let’s say we want to simplify our landcover raster and classify the cells into more general habitat classifications. Fortunately, the landcover data frame already has this information

unique(land_info[, c("ClassCode", "ClassName")])
##     ClassCode                                       ClassName
## 1           1                               Forest & Woodland
## 116         2                         Shrub & Herb Vegetation
## 299         3                            Desert & Semi-Desert
## 489         4 Polar & High Montane Scrub, Grassland & Barrens
## 498         5                              Aquatic Vegetation
## 503         6                            Open Rock Vegetation
## 541         8   Nonvascular & Sparse Vascular Rock Vegetation
## 544         7             Agricultural & Developed Vegetation
## 547         9            Introduced & Semi Natural Vegetation
## 553        10                  Recently Disturbed or Modified
## 565        11                                      Open Water
## 568         8                     Developed & Other Human Use
summary(land_info[, c("Value", "ClassCode")])
##      Value         ClassCode     
##  Min.   :  1.0   Min.   : 1.000  
##  1st Qu.:144.8   1st Qu.: 1.000  
##  Median :293.5   Median : 2.000  
##  Mean   :292.3   Mean   : 2.257  
##  3rd Qu.:440.2   3rd Qu.: 2.000  
##  Max.   :584.0   Max.   :11.000

The “Value” column (the data representing the cell values of the landcover raster) has values from 1 to 584 to represent specific habitats within “Forest & Woodland” or “Desert & Semi-Desert” for example, while the “ClassCode” column only has values from 1 to 11 to represent these general habitat types. Much more simple.

So let’s take these two columns and make a matrix for the rcl argument in reclassify.

rclLand <- land_info[, c("Value", "ClassCode")] 
rclLand <- as.matrix(rclLand)
head(rclLand)
##      Value ClassCode
## [1,]     1         1
## [2,]     2         1
## [3,]     3         1
## [4,]     4         1
## [5,]     5         1
## [6,]     6         1

Now we just need to input this matrix and the landcover raster into the function.

landcover_rcl <- classify(landcover, rclLand)
names(landcover_rcl) <- "landcover_rcl"
landcover_rcl
## class       : SpatRaster 
## dimensions  : 18675, 14838, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 229319.6, 674459.6, 4094414, 4654664  (xmin, xmax, ymin, ymax)
## coord. ref. : NAD83 / UTM zone 12N (EPSG:26912) 
## source      : landcover_rcl.tif 
## name        : landcover_rcl 
## min value   :             1 
## max value   :            11
plot(c(landcover, landcover_rcl)) # plot them together to compare

Great!

4.8 Raster Cell Stats

In my research I often have to perform cell algebra or focal statistics. Maybe you need to know the average elevation or the total herbaceous biomass within a certain area. The way to get these values are with the function cellStats. We simply need to input the raster and the stat function: sum, mean, min, max, sd, or a homemade function. Let’s say we need to calculate some stats of the elevation, SWE, and snow depth within a 5-km area around our sites.

plot(elev_snow_stk$elevation)
plot(st_geometry(sites_sf), pch = 16, add = T)

We need to know which of our sites are within the extent of the raster. There are probably many ways of doing this. The most straightforward way I can think of getting just the sites that fall within the raster are 1) extracting raster data at each site and 2) filtering to just the sites that have raster data attached.

sites_rast <- terra::extract(elev_snow_stk, sites_sf)
sites_rast # notice how only 3 rows (sites) have data
##    ID elevation swe snow_depth
## 1   1        NA  NA         NA
## 2   2        NA  NA         NA
## 3   3  2585.951 127        542
## 4   4        NA  NA         NA
## 5   5  3218.554 502       1640
## 6   6        NA  NA         NA
## 7   7        NA  NA         NA
## 8   8        NA  NA         NA
## 9   9        NA  NA         NA
## 10 10  2452.774 109        521
## 11 11        NA  NA         NA
## 12 12        NA  NA         NA
## 13 13        NA  NA         NA
## 14 14        NA  NA         NA
## 15 15        NA  NA         NA
sites_sf <- left_join(sites_sf, as.data.frame(sites_rast), by = join_by("Site" == "ID")) 
sites_sf
## Simple feature collection with 15 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -113.7909 ymin: 37.09849 xmax: -109.4373 ymax: 41.08479
## Geodetic CRS:  WGS 84
## First 10 features:
##    Site elevation swe snow_depth                   geometry
## 1     1        NA  NA         NA POINT (-110.9233 38.94893)
## 2     2        NA  NA         NA  POINT (-112.5923 40.4073)
## 3     3  2585.951 127        542 POINT (-110.1357 40.58137)
## 4     4        NA  NA         NA POINT (-113.3415 41.04609)
## 5     5  3218.554 502       1640 POINT (-110.8535 40.78323)
## 6     6        NA  NA         NA POINT (-112.0717 41.07762)
## 7     7        NA  NA         NA POINT (-112.3244 41.08479)
## 8     8        NA  NA         NA POINT (-109.4373 39.28921)
## 9     9        NA  NA         NA POINT (-109.6226 37.80686)
## 10   10  2452.774 109        521 POINT (-110.7138 40.03556)
sites_sf_rast <- sites_sf %>%
  filter(!is.na(elevation) & !is.na(swe) & !is.na(snow_depth))
sites_sf_rast
## Simple feature collection with 3 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -110.8535 ymin: 40.03556 xmax: -110.1357 ymax: 40.78323
## Geodetic CRS:  WGS 84
##   Site elevation swe snow_depth                   geometry
## 1    3  2585.951 127        542 POINT (-110.1357 40.58137)
## 2    5  3218.554 502       1640 POINT (-110.8535 40.78323)
## 3   10  2452.774 109        521 POINT (-110.7138 40.03556)

Let’s just do this for one of our sites for now and then we can try it with the others.

site_sf_rast_01 <- sites_sf_rast[1,]

First, we need to know which parts of this raster are within 5-km of the site, so we will need to crop the raster. We can do that by first making a buffer with st_buffer()

site_buff_5km <- st_buffer(site_sf_rast_01, 5000) # units are in meters, 1000m = 1km

plot(elev_snow_stk[[1]]) # let's just plot one layer for demonstration
plot(st_geometry(site_buff_5km), lwd = 1, add = T)

Now let’s crop the raster to this buffer’s extent. We’ll use the function crop(), which takes in the raster object we want to crop and an “extent object, or any object from which an Extent object can be extracted” (quoted from the crop help file). Because we can make an extent object from our buffer

ext(site_buff_5km)
## SpatExtent : -110.195356322096, -110.076232859429, 40.5357574438391, 40.626942424945 (xmin, xmax, ymin, ymax)

We can input our buffer object directly into crop()

stack_crop <- crop(elev_snow_stk, site_buff_5km)
plot(stack_crop[[1]])
plot(st_geometry(site_buff_5km), add = T)
plot(st_geometry(site_sf_rast_01), pch = 16, add = T)

Great! Now let’s calculate the mean, median, min/max, and standard deviation of the elevation, SWE, and snow depth within this area.

stack_stats <- data.frame(
  mean_5km = global(stack_crop, fun = "mean"),
  min_5km = global(stack_crop, fun = "min"),
  max_5km = global(stack_crop, fun = "max"),
  sd_5km = global(stack_crop, fun = "sd"))

# let's add a column for the environment type from the raster and remove the row names
stack_stats$environment <- row.names(stack_stats)
row.names(stack_stats) <- NULL
stack_stats
##        mean      min      max        sd environment
## 1 2522.8263 2138.969 3138.245 234.83332   elevation
## 2  140.4935   55.000  282.000  51.01483         swe
## 3  614.5195  284.000 1079.000 175.49777  snow_depth

We have our stats, but we want to join them with our sites data frame to make them more meaningful. We can’t use the function cbind() because the number of rows don’t match up. Instead, we can use dplyr’s function bind_cols(), which works just like cbind() or do.call(cbind, x) except that it’s a bit smarter and can add NAs if the number of rows or columns doesn’t match.

bind_cols(site_sf_rast_01, stack_stats)
## Simple feature collection with 3 features and 9 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -110.1357 ymin: 40.58137 xmax: -110.1357 ymax: 40.58137
## Geodetic CRS:  WGS 84
##   Site elevation swe snow_depth      mean      min      max        sd
## 1    3  2585.951 127        542 2522.8263 2138.969 3138.245 234.83332
## 2    3  2585.951 127        542  140.4935   55.000  282.000  51.01483
## 3    3  2585.951 127        542  614.5195  284.000 1079.000 175.49777
##   environment                   geometry
## 1   elevation POINT (-110.1357 40.58137)
## 2         swe POINT (-110.1357 40.58137)
## 3  snow_depth POINT (-110.1357 40.58137)

Great! But we only did this process for one site when presumabaly we would want these data for all available sites. We could copy and paste for each site, but that can get tedious, not to mention out of control if we had more than 3 sites. So the best way to do this would be with loops.

4.9 A Note About Loops

Learning all these functions is all well and good, but what if you have to perform them all on multiple features or rasters? Copying and pasting code over and over again can soon become confusing and messy and cause your code to be inefficient. The better way to address this is (in my opinion) with loops! for loops and lapply are lifesavers and I use them in all of my code. A future workshop will go into more depth on how to use loops (be on the lookout for that!), so I won’t go over them in too much detail. But I do want to show some ways you can use them for GIS applications.

(These code chunks are for demonstration only, these data and directories don’t actually exist)

# Example 1: Load a set of shapefiles and find the area for each
filenames <- list.files(dir) # get a list of shapefile names in a directory

area_list <- c() # create an empty vector for the areas to live in 

for(i in 1:length(filenames)){
  # load the shapefile
  shp <- st_read(filenames[i])
  
  # calculate the area
  area <- st_area(shp)
  
  # put the area into the vector
  area_list <- c(area_list, area)
}

# -----------------------------------------------------------------------------X

# Example 2: Load a set of shapefiles, generate a buffer for each, and calculate the  #            mean value of a raster within that buffer and the focal feature

filenames <- list.files(dir) # get a list of shapefile names in a directory
r <- raster(raster_filename) # load a raster

lapply(filenames, function(fn){
  # load a shapefile
  shp <- st_read(fn)
  
  # generate a 10kmX10km buffer 
  buffer <- st_buffer(shp, dist = 10000)
  
  # crop the raster to the shape and the buffer
  r_shp <- crop(r, extent(shp))
  r_buffer <- crop(r, extent(buffer))
  
  # calculate the mean value of the raster within the buffer and the feature
  r_shp_mean <- cellStats(r_shp, stat = "mean", na.rm = TRUE)
  r_buff_mean <- cellStats(r_shp, stat = "mean", na.rm = TRUE)
  
  # return both means in a list
  return(list(r_shp_mean, r_buff_mean))
})

# -----------------------------------------------------------------------------X

# Example 3: Generate a raster of the sum from a set of RasterStacks
#            and then save the output raster
filenames <- list.files(dir) # get a list of raster files in a directory
out_names <- paste0(filenames, "_sum")

lapply(1:length(filenames), function(i){
  # load RasterStak
  stk <- stack(filenames[i])
  
  # create a raster that is the sum of all layers in the stack
  sum <- calc(stk, fun = sum)
  sum <- sum(stk) # these two operations are equivalent
  
  writeRaster(sum, out_names[i], format = "GTiff")
})

# -----------------------------------------------------------------------------X

# Example 4: Pull the number of zeros in a set of rasters
filenames <- list.files(dir) # get a list of raster files in a directory
lapply(filenames, function(fn){
  # load raster
  rast <- raster(fn)
  
  # get the number of zeros in the raster
  n_0 <- getValues(rast) == 0 %>%
      which() %>%
      length()
  return(n_0)
})

Even more efficient would be to run these in parallel, but that is way beyond the scope of this workshop


I hope these functions helped you! The next chapter goes over some ways of obtaining the data we worked on today.