Chapter 2 GIS basics
2.1 Datums, Projections, and Coordinate Systems
Datums
The Earth is approximately a spheroid (also called an ellipsoid). Because of variations in elevation across the world, the Earth’s actual surface is irregular.
A datum (also called a geographic coordinate system) is a reference surface that best fits the mean surface of an area of interest. There is a global datum that represents the general surface of the Earth as a whole: the World Geodetic System of 1984 (WGS84).
However, because the Earth’s surface is irregular and the global datum might not reflect specific areas and variations in elevation, there are also local datums. A common local datum (for North America at least) is the North American Datum of 1983 (NAD83).
The datum you choose to work with is up to you and depends on where your study takes place. It is very important to know what datum you’re working with and to remain consistent, because the coordinates of a location in one datum are likely to differ from the coordinates of the same location in another datum. For example, if we look at the coordinates for Bellingham, Washington:
| Datum | Longitude | Latitude |
|---|---|---|
| NAD 1983 | -122.46818353793 | 48.7438798543649 |
| WGS 1984 | -122.46818353793 | 48.7438798534299 |
While the differences between NAD83 and WGS84 are not huge, these differences could impact any spatial analysis you perform. Also note that you would need to choose a different local datum if you’re working outside of North America.
Projections
While a datum references the position of an object in geographic space on a 3D surface, a projection (also called a projected coordinate system) maps that 3D surface onto a 2D plane.
This is important to know when plotting a map for a figure, as your chosen projection will change the visualization and shape of your map’s features. More importantly for spatial analysis, you need a projection whenever you need values such as length, area, or distance. No map projection is 100% accurate, so every 2D map will show some distortion. Different projections preserve different properties of the world, such as the relative shape of features, area, distance, or angle. For that reason, it’s important to pick the projection that provides the highest accuracy for your region and the analysis you’re running.
A common projection to use is the Universal Transverse Mercator (UTM), which divides the Earth into 60 zones, each 6° of longitude wide.
If your study region is in Utah, for example, you would use UTM Zone 12 N (or UTM 12N).
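If you are not sure which UTM zone a location falls in, the zone number can be worked out from its longitude. The base R sketch below is only an illustration: `utm_zone()` is a hypothetical helper (not part of any package), and it ignores the special-case zones around Norway and Svalbard.

```r
# Hypothetical helper: standard UTM zone number from a longitude in
# decimal degrees (-180 to 180). Zones are 6 degrees wide and are
# numbered eastward starting at 180 degrees West.
# Assumption: the exceptions around Norway and Svalbard are ignored.
utm_zone <- function(lon) {
  floor((lon + 180) / 6) %% 60 + 1
}

utm_zone(-111.84)    # a longitude in northern Utah: 12
utm_zone(-122.468)   # Bellingham, Washington: 10
```

The hemisphere (N or S) comes from the sign of the latitude, so a site in Utah ends up in UTM 12N.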
Note that while you will always have a datum, you do not always need to use a projection. As with anything, it depends on your analysis and your system.
Coordinate Reference System
A coordinate reference system or CRS is simply the combination of the datum (geographic coordinate system) and the projection (projected coordinate system). For example, if you are working with the 1984 World Geodetic System projected to UTM Zone 12N, your CRS would be WGS84 UTM12N. If you are working with the 1983 North American Datum projected to UTM Zone 14N, your CRS would be NAD83 UTM14N. And so on.
Each of these CRS combinations has its own EPSG code. (These codes were originally created by the European Petroleum Survey Group, which is where the acronym comes from.)
For example, the EPSG code for WGS84 latitude/longitude (i.e. no projection) is 4326, the EPSG code for NAD83 UTM12N is 26912, and so on. These codes can easily be found on the Spatial Reference Website (or google if you forget what the website is).
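For UTM zones in the northern hemisphere, the codes follow a simple pattern that can save you a lookup. The sketch below assumes the usual numbering (32600 + zone for WGS84, 26900 + zone for NAD83); `utm_epsg_north()` is a hypothetical helper written for this illustration, and NAD83 codes only exist for zones that cover North America, so treat the result as a guess to confirm on the Spatial Reference Website.

```r
# Hypothetical helper: EPSG code for a northern-hemisphere UTM zone.
# Assumption: WGS84 codes are 32600 + zone and NAD83 codes are 26900 + zone.
utm_epsg_north <- function(datum, zone) {
  base <- c(WGS84 = 32600, NAD83 = 26900)
  stopifnot(datum %in% names(base), zone >= 1, zone <= 60)
  base[[datum]] + zone
}

utm_epsg_north("NAD83", 12)   # 26912, the code quoted above
utm_epsg_north("WGS84", 12)   # 32612
```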
2.2 Spatial Data
Vectors
Vector data are shapes with a geometry that can represent real-world features. These geometries are made up of one or more vertices and the paths between them. A vertex describes a position in space with x and y coordinates. A feature with one vertex is a point, a feature with two or more vertices where the first and last vertices don’t connect is a polyline, and a feature with at least three vertices where the first and last vertices connect (an enclosed area) is a polygon. Here are some examples of vector data that you might encounter in ecology:
- Points
  - animal positional locations
  - study site coordinates
  - tree locations
- Lines
  - roads
  - fences
  - boundaries
  - rivers
- Areas (or polygons)
  - bodies of water
  - parks
  - USFS land
  - study plots
  - area burned by a fire
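The vertex rules above can be sketched in base R by storing each feature as a two-column matrix of x and y coordinates. The coordinates are made up, `geometry_type()` is a hypothetical helper, and repeating the first vertex at the end of a polygon is a convention assumed here (it is how closed rings are commonly stored).

```r
# Vertices stored as two-column matrices of x and y coordinates (made-up values)
point   <- matrix(c(1, 1), ncol = 2, byrow = TRUE)
line    <- matrix(c(0, 0,  1, 2,  3, 3), ncol = 2, byrow = TRUE)
# Three distinct vertices, with the first one repeated to close the shape
polygon <- matrix(c(0, 0,  4, 0,  4, 3,  0, 0), ncol = 2, byrow = TRUE)

# Hypothetical helper: guess the geometry type from the vertex rules above
geometry_type <- function(xy) {
  n <- nrow(xy)
  closed <- n >= 4 && all(xy[1, ] == xy[n, ])
  if (n == 1) "point" else if (closed) "polygon" else "polyline"
}

sapply(list(point, line, polygon), geometry_type)
```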
Example with random vertices:
Or an example with Utah features:
Vector features have attributes, which can be text or numerical information that describe the features. These attributes are stored in a data frame.
##                    name state    pop   lat    long capital
##  1        Bountiful UT    UT  41622 40.88 -111.87       0
##  2           Layton UT    UT  63096 41.08 -111.95       0
##  3            Logan UT    UT  45262 41.74 -111.84       0
##  4           Murray UT    UT  56848 40.65 -111.89       0
##  5            Ogden UT    UT  78572 41.23 -111.97       0
##  6             Orem UT    UT  94758 40.30 -111.70       0
##  7            Provo UT    UT 105832 40.25 -111.64       0
##  8     Saint George UT    UT  63952 37.08 -113.58       0
##  9   Salt Lake City UT    UT 177318 40.78 -111.93       1
## 10            Sandy UT    UT  89698 40.57 -111.85       0
## 11     Taylorsville UT    UT  58200 40.66 -111.94       0
## 12      West Jordan UT    UT 105629 40.60 -111.99       0
## 13 West Valley City UT    UT 113989 40.69 -112.01       0
Some important pieces of information you need from your vector data are:
- the geometry type (if it’s a point, line, or polygon)
- the coordinate reference system
- the bounding box (the min/max points in x and y geographical space)
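To make the bounding box concrete, here is a small base R sketch that rebuilds four rows of the attribute table above as a plain data frame and takes the min/max of the coordinates. The object names (`cities`, `bbox`) are just illustrative, and a real spatial object would normally report its bounding box for you.

```r
# Four of the Utah cities from the attribute table above
cities <- data.frame(
  name = c("Logan UT", "Provo UT", "Saint George UT", "Salt Lake City UT"),
  pop  = c(45262, 105832, 63952, 177318),
  lat  = c(41.74, 40.25, 37.08, 40.78),
  long = c(-111.84, -111.64, -113.58, -111.93)
)

# Bounding box: the min/max of x (longitude) and y (latitude)
bbox <- c(xmin = min(cities$long), ymin = min(cities$lat),
          xmax = max(cities$long), ymax = max(cities$lat))
print(bbox)
```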
Rasters
Rasters are data represented by pixels (or cells) where each pixel has its own value. These cell values can be continuous (e.g. elevation, temperature, snow depth) or discrete (e.g. land cover, habitat type, presence/absence).
Raster data can have more than one band (where each band is a single raster). These raster layers can be stacked together to create a raster stack. For example, RGB satellite imagery is a stack of 3 rasters, each containing continuous values indicating levels of Red, Green, and Blue.
## class : SpatRaster
## dimensions : 1679, 2312, 3 (nrow, ncol, nlyr)
## resolution : 0.0008983153, 0.0008983153 (x, y)
## extent : -113.2237, -111.1468, 40.27956, 41.78783 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : landsat_demo.tif
## names : Red, Green, Blue
But these 3 bands come together to make a true-color image.
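As a rough illustration of how three bands combine into one image, the base R sketch below builds a tiny 2 x 2 "image" from made-up band values (on an assumed 0 to 255 scale) and turns each cell's three values into a single display colour. It uses a plain array rather than a real raster object, so it only mimics the idea of a stack.

```r
# A tiny 2 x 2 "image" with three bands (made-up values on a 0-255 scale)
red   <- matrix(c(255,   0,   0, 255), nrow = 2, byrow = TRUE)
green <- matrix(c(  0, 255,   0, 255), nrow = 2, byrow = TRUE)
blue  <- matrix(c(  0,   0, 255, 255), nrow = 2, byrow = TRUE)

# Stack the bands into one 3D array: rows x columns x layers
img <- array(c(red, green, blue), dim = c(2, 2, 3),
             dimnames = list(NULL, NULL, c("Red", "Green", "Blue")))
dim(img)

# Combine the three band values of each cell into one hex colour
colours <- matrix(
  rgb(as.vector(img[, , "Red"]), as.vector(img[, , "Green"]),
      as.vector(img[, , "Blue"]), maxColorValue = 255),
  nrow = 2
)
print(colours)
```

The top-left cell has only a Red value, so it displays as pure red (`#FF0000`), while the bottom-right cell has all three bands at their maximum and displays as white (`#FFFFFF`).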
The important pieces of information you need from your raster data are:
- the coordinate reference system
- the extent (the min/max points in x and y geographical space)
- the cell resolution (the width and height of each cell)
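These pieces of information are tied together: the extent divided by the cell resolution gives the number of columns and rows. A quick base R check using the values printed in the SpatRaster summary above (the variable names are illustrative, and rounding is needed because the printed values are themselves rounded):

```r
# Values copied from the SpatRaster summary above
xmin <- -113.2237; xmax <- -111.1468
ymin <-  40.27956; ymax <-  41.78783
res_x <- 0.0008983153
res_y <- 0.0008983153

# Number of columns and rows implied by the extent and resolution
ncol_est <- round((xmax - xmin) / res_x)
nrow_est <- round((ymax - ymin) / res_y)

c(nrow = nrow_est, ncol = ncol_est)   # 1679 2312, matching the dimensions
nrow_est * ncol_est                   # number of cells in each layer
```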
The cell resolution basically means how “pixel-y” the raster is. A finer resolution (meaning the cell size is smaller) will have more detail than a coarser resolution (meaning the cell size is larger). For example, compare a raster with a fairly fine resolution (in this case 30 m × 30 m, meaning that each cell is 30 m wide and 30 m high):
Compared to the same raster but with a coarser resolution (in this case 300 m × 300 m):
Wouldn’t we always want to work with finer resolutions? If rasters with finer resolutions have more detail (and thus are more accurate to what’s actually on the landscape) than one with a coarser resolution, why would we ever work with a raster with coarse resolution? I can think of 2 reasons why:
- Sometimes you simply can’t obtain that data in a finer resolution. For example, MODIS offers NDVI rasters every 16 days, but the finest resolution is 250-m.
- The finer the resolution, the more cells there are, and so the time, computation power, and disk space needed for any computation or analysis on these cells increase.
As with everything, it depends on your analysis and your system.
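To get a feel for the second reason, here is a back-of-the-envelope calculation in base R. The 30 km × 30 km study area is hypothetical, and the 8 bytes per cell is an assumption (the size of one R numeric value); real files are often compressed or use smaller data types.

```r
# Hypothetical study area: 30 km x 30 km
side_m <- 30000
cell_sizes <- c(fine = 30, coarse = 300)   # cell width/height in metres

# Number of cells needed to cover the area at each resolution
cells <- (side_m / cell_sizes)^2

# Assumption: 8 bytes per cell, as for an R numeric value
size_mb <- cells * 8 / 1e6

data.frame(cell_size_m = cell_sizes, cells = cells, approx_mb = size_mb)
cells[["fine"]] / cells[["coarse"]]        # 100
```

Making the cells 10 times smaller on each side means 100 times as many cells for the same area, and that is for a single layer.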
2.3 Vectors vs Rasters: pros & cons
Advantages of Vectors
- Because vectors are just vertices and paths (rather than upwards of millions of grid cells), it usually takes less time to load, save, or perform any computation or analysis on a vector compared to a raster. (They also take up less disk space on your hard drive.)
- For the same reason, they can often be more geographically accurate. A vector’s vertex is located at a single lat/long coordinate, whereas a raster pixel at the same location covers a whole area (for example, 250 m × 250 m).
Disadvantages of Vectors
- It is difficult to store and display continuous data in vectors. (It can be done, but the data would typically need to be binned.)
- Vectors are best used to represent features of the landscape, rather than the landscape itself.
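As a small illustration of what binning means here, the base R sketch below groups made-up elevation values into classes with `cut()`. Each class could then be stored as its own polygon with the class label as an attribute; the break points are arbitrary choices for this example.

```r
# Made-up elevation values in metres
elev <- c(1320, 1475, 1810, 2240, 2950)

# Bin the continuous values into discrete elevation classes
elev_class <- cut(
  elev,
  breaks = c(1000, 1500, 2000, 2500, 3000),
  labels = c("1000-1500", "1500-2000", "2000-2500", "2500-3000")
)

data.frame(elev, elev_class)
```

Note the trade-off: after binning, 1320 m and 1475 m are indistinguishable, which is exactly the detail a raster would have kept.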
Advantages of Rasters
- Rasters are best for satellite and other remotely sensed data. As the point above mentioned, they are great for representing the landscape itself.
- It is relatively easy and intuitive to perform any quantitative analysis with rasters. When raster cells are stacked (see figure below), it is pretty straightforward to perform any focal statistics or cell algebra.
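Here is a minimal base R sketch of both ideas, using plain matrices to stand in for two aligned raster layers. The reflectance values are made up, and NDVI (computed with its standard formula, (NIR - Red) / (NIR + Red)) is used only as a familiar example of cell algebra.

```r
# Two aligned 3 x 3 bands with made-up reflectance values
red <- matrix(c(0.10, 0.12, 0.30,
                0.08, 0.10, 0.28,
                0.09, 0.11, 0.25), nrow = 3, byrow = TRUE)
nir <- matrix(c(0.50, 0.48, 0.32,
                0.55, 0.50, 0.30,
                0.52, 0.49, 0.27), nrow = 3, byrow = TRUE)

# Cell algebra: the arithmetic is applied cell by cell across the layers
ndvi <- (nir - red) / (nir + red)

# Focal statistic: mean of the 3 x 3 window centred on the middle cell
focal_mean_centre <- mean(ndvi[1:3, 1:3])

round(ndvi, 2)
focal_mean_centre
```

Because the cells line up, there is no need to work out which features overlap which, as you would with vectors.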
Disadvantages of Rasters
- Depending on the resolution, they can look pixelated and not visually appealing. For analysis, the resolution also affects computation time and disk space.
- Raster cells can only contain one value (compared to vectors, which can have an entire attribute table). If you want cells to contain more than one value, you would need a stack of rasters, which takes up disk space and computation power.
Now that we know about coordinate reference systems, vectors, and rasters, let’s learn how to deal with all of these in R!