This document intended to get you up to speed with using
rsgeo and comfortable branching out from using sf more
generally.
Understanding sf
Most R users that do geospatial analysis are familiar with sf and
understand spatial workflows in that context only. Many users of
sf view it as this magical data frame that lets you do
spatial analysis—and that is exactly how it feels! This means that sf
has done a great job in making spatial analysis feel a lot easier for
the vast majority of R users.
But sf needs to be understood in a more fundamental way.
sf is named after the Simple Feature Access Standard.
Simple features are an agreed upon way to represent “geometric
primitives”—things like points, linestrings, polygons, and their multi-
types. The sf package builds a hierarchy off of these. We
have sfg, sfc, and sf
objects.
sfg objects
At the core is the sfg class which are representations
of simple feature geometries. sfg class objects are
representations of a simple feature. They are a single geometry at a
time. We can think of them as a scalar value (even though R does not
have the concept of a scalar).
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
# create a point
pnt <- st_point(c(0, 10))
# create a line
ln <- st_linestring(matrix(c(0, 1, 0, 0), ncol = 2))
class(pnt)
#> [1] "XY" "POINT" "sfg"
class(ln)
#> [1] "XY" "LINESTRING" "sfg"Each scalar value can be used independently which is useful in itself.
st_length(ln)
#> [1] 1sfg objects are very simple objects constructed of
numeric vectors, matrices, and lists of matrices. They also have no
sense of a coordinate reference system (CRS).
But more often we want to have a vector of geometries. A vector of
geometries is stored in an sfc object.
sfc objects
sfc is short for simple feature column. Under the hood,
this is just a list of sfg objects. You might think that
you could create a vector of geometries by combining them using
c() but that would be wrong.
c(pnt, pnt)
#> MULTIPOINT ((0 10), (0 10))
c(pnt, ln)
#> GEOMETRYCOLLECTION (POINT (0 10), LINESTRING (0 0, 1 0))sfg objects behave like scalars so combining them
creates either a multi- type of the sfg or a geometry
collection (another type of simple feature).
To create a vector of geometries you must use st_sfc().
st_sfc() is the construct for creating a “simple feature
geometry list column.” It lets you also set a number of attributes that
are associated with the vector.
st_sfc(pnt, pnt)
#> Geometry set for 2 features
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 10 xmax: 0 ymax: 10
#> CRS: NA
#> POINT (0 10)
#> POINT (0 10)sfc objects have attributes such as a CRS, bounding box,
and precision. sfc objects behave more like the vectors
that you are familiar with .
c(
st_sfc(pnt, pnt),
st_sfc(pnt),
st_sfc(ln)
)
#> Geometry set for 4 features
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 10
#> CRS: NA
#> POINT (0 10)
#> POINT (0 10)
#> POINT (0 10)
#> LINESTRING (0 0, 1 0)Since sfc objects are vectors, they can be included as a
column in a data frame.
df <- data.frame(
geo = st_sfc(pnt, pnt)
)
df
#> geometry
#> 1 POINT (0 10)
#> 2 POINT (0 10)
class(df)
#> [1] "data.frame"Having geometry included in a data frame is a huge win for the R community because this means they can included attributes along with the geometries.
For example we can create a vector of points from the x
and y columns from the diamonds dataset in
ggplot2.
data(diamonds, package = "ggplot2")
pnts <- purrr::map2(
diamonds$x,
diamonds$y,
function(.x, .y) st_point(c(.x, .y))
) |>
st_sfc()We can included this in a data frame
library(dplyr, warn.conflicts = FALSE)
dmnd <- diamonds |>
select(price, clarity) |>
bind_cols(geometry = pnts)
head(dmnd)
#> # A tibble: 6 × 3
#> price clarity geometry
#> <int> <ord> <POINT>
#> 1 326 SI2 (3.95 3.98)
#> 2 326 SI1 (3.89 3.84)
#> 3 327 VS1 (4.05 4.07)
#> 4 334 VS2 (4.2 4.23)
#> 5 335 SI2 (4.34 4.35)
#> 6 336 VVS2 (3.94 3.96)Now we have a tibble with price, clarity, and geometry! Huge win! What if we wanted to calculate the average price by clarity and keep the geometries?
dmnd |>
group_by(clarity) |>
summarise(avg_price = mean(price))
#> # A tibble: 8 × 2
#> clarity avg_price
#> <ord> <dbl>
#> 1 I1 3924.
#> 2 SI2 5063.
#> 3 SI1 3996.
#> 4 VS2 3925.
#> 5 VS1 3839.
#> 6 VVS2 3284.
#> 7 VVS1 2523.
#> 8 IF 2865.Well, we lose the geometry just like we would for any other column
that wasn’t included in the summarise() call. To keep it,
we would need to perform an operation on the geometry itself.
dmnd |>
group_by(clarity) |>
summarise(
avg_price = mean(price),
geometry = st_union(geometry)
)
#> # A tibble: 8 × 3
#> clarity avg_price geometry
#> <ord> <dbl> <MULTIPOINT>
#> 1 I1 3924. ((4.33 4.29), (4.33 4.36), (4.36 4.33), (4.38 4.42), (4.39 …
#> 2 SI2 5063. ((0 0), (0 6.62), (3.79 3.75), (3.84 3.82), (3.87 3.85), (3…
#> 3 SI1 3996. ((3.88 3.84), (3.89 3.84), (3.9 3.85), (3.93 3.96), (3.95 3…
#> 4 VS2 3925. ((0 0), (3.73 3.68), (3.73 3.71), (3.74 3.71), (3.76 3.73),…
#> 5 VS1 3839. ((0 0), (3.83 3.85), (3.84 3.87), (3.86 3.89), (3.88 3.9), …
#> 6 VVS2 3284. ((3.83 3.86), (3.85 3.89), (3.85 3.9), (3.85 3.91), (3.86 3…
#> 7 VVS1 2523. ((0 0), (3.83 3.85), (3.87 3.9), (3.88 3.95), (3.88 3.99), …
#> 8 IF 2865. ((3.86 3.88), (3.89 3.9), (3.91 3.95), (3.92 3.94), (3.93 3…Well, wouldn’t it be nice if the geometry knew to do that? Well, that
is exactly what sf objects are.
sf objects
sf objects are simply just data frames with a geometry
column that is sticky and smart. We can create an sf object
if an sfc column is present in a data frame by using
st_as_sf().
dmnd_sf <- st_as_sf(dmnd)Doing this creates an object of class sf. The two things
that make an sf object so special are the class
sf and the the attribute sf_column.
attr(dmnd_sf, "sf_column")
#> [1] "geometry"This attribute tells us which column is the geometry. Because we have
this sf can implement its own methods for common functions
like select(), mutate(),
aggregate(), group_by(), etc which always keep
the attr(x, "sf_column") attached to the data frame.
Having the class and attribute allow methods like
summarise() to be written for the class itself and handle
the things like unioning geometry for us.
dmnd_sf |>
group_by(clarity) |>
summarise(avg_price = mean(price))
#> Simple feature collection with 8 features and 2 fields
#> Geometry type: MULTIPOINT
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 0 xmax: 10.74 ymax: 58.9
#> CRS: NA
#> # A tibble: 8 × 3
#> clarity avg_price geometry
#> <ord> <dbl> <MULTIPOINT>
#> 1 I1 3924. ((4.33 4.29), (4.33 4.36), (4.36 4.33), (4.38 4.42), (4.39 …
#> 2 SI2 5063. ((0 0), (0 6.62), (3.79 3.75), (3.84 3.82), (3.87 3.85), (3…
#> 3 SI1 3996. ((3.88 3.84), (3.89 3.84), (3.9 3.85), (3.93 3.96), (3.95 3…
#> 4 VS2 3925. ((0 0), (3.73 3.68), (3.73 3.71), (3.74 3.71), (3.76 3.73),…
#> 5 VS1 3839. ((0 0), (3.83 3.85), (3.84 3.87), (3.86 3.89), (3.88 3.9), …
#> 6 VVS2 3284. ((3.83 3.86), (3.85 3.89), (3.85 3.9), (3.85 3.91), (3.86 3…
#> 7 VVS1 2523. ((0 0), (3.83 3.85), (3.87 3.9), (3.88 3.95), (3.88 3.99), …
#> 8 IF 2865. ((3.86 3.88), (3.89 3.9), (3.91 3.95), (3.92 3.94), (3.93 3…Note that this is just like what we wrote earlier except that we
didn’t have to handle the geometry column manually. If we compare these
two approaches and ignore attribute difference like
sf_column and class we can see that they are
identical.
Freeing yourself
Knowing how sf works can be helpful for understanding
how we can ease our reliance on it for all geospatial operations.
Instead of thinking of the entire sf data frame as the thing that
handles all geometry operations we now know that it is the
sfc geometry column.
In R there are different ways of representing geometry besides
sfc vectors. The packages s2,
geos, wk, and rsgeo all provide
different vectors of geometries that can be used.
These libraries are very handy for doing geometric operations. Each
of these packages tend to be better at one thing than another and each
have their place. You will often find big speed improvements if you use
these libraries instead of sf for certain things.
Take for example calculating the length of linestrings on a geodesic.
We can use the roxel dataset from
sfnetworks.
data(roxel, package = "sfnetworks")To illustrate we can extract the geometry column and cast it as an
rsgeo class object.
We can then use the functions st_length() and
length_haversine() respectively to compute the length of
linestrings.
bench::mark(
st_length(geo),
length_haversine(rs),
check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 st_length(geo) 2.28ms 2.37ms 409. 2.89MB 17.1
#> 2 length_haversine(rs) 322.55µs 756.37µs 1301. 8.8KB 0This is markedly faster when using rsgeo. We also do not have to extract the vector and work with it as its own object. Any vector can be included in a data frame, remember?
roxel |>
as_tibble() |>
mutate(
geometry = as_rsgeo(geometry),
length = length_haversine(geometry)
)
#> # A tibble: 851 × 4
#> name type
#> <chr> <fct>
#> 1 Havixbecker Strasse residential
#> 2 Pienersallee secondary
#> 3 Schulte-Bernd-Strasse residential
#> 4 NA path
#> 5 Welsingheide residential
#> 6 NA footway
#> 7 NA footway
#> 8 NA path
#> 9 NA track
#> 10 NA track
#> # ℹ 841 more rows
#> # ℹ 2 more variables: geometry <r_LINEST>, length <dbl>