Languageen

Using rstac and CQL2 to query STAC APIs

This tutorial builds upon the Download data from a STAC API using R, rstac, and GDAL tutorial, developing more complicated STAC queries using rstac to find, download, and process Landsat data using STAC metadata. That tutorial walks through building queries with rstac, using typical R functions to compose queries and download data. This tutorial walks through using rstac for more complex queries, based on CQL2 and the STAC API Filter Extension, and using the metadata provided by STAC APIs to filter through items and process assets.

To run this tutorial, you’ll need the rstac and sf packages. If necessary, you can install both packages via install.packages():

install.packages("sf")
install.packages("rstac")

As in the last tutorial, we’re going to start off by querying Microsoft’s Planetary Computer STAC API to get data for Ashe County, North Carolina. Let’s go ahead and load the geometry for the county:

ashe <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))[1, ]

Let’s try and get Landsat imagery for this area from January 2021. As we saw last time, we’re able to find all the STAC Items that match this description using rstac::stac_search(), providing our bounding box, time range, and desired data collection as regular function arguments. To get all the Landsat images for this spatiotemporal area of interest, we might write our query like this:

ashe_bbox <- ashe |>
sf::st_transform(4326) |>
sf::st_bbox()

stac_query <- rstac::stac(
"https://planetarycomputer.microsoft.com/api/stac/v1"
) |>
rstac::stac_search(
collections = "landsat-c2-l2",
bbox = ashe_bbox,
datetime = "2021-01-01/2021-01-31"
) |>
rstac::get_request()

stac_query
###STACItemCollection
- features (12 item(s)):
- LE07_L2SP_017035_20210127_02_T1
- LC08_L2SP_018035_20210126_02_T1
- LC08_L2SP_018034_20210126_02_T1
- LC08_L2SP_017035_20210119_02_T1
- LE07_L2SP_018035_20210118_02_T1
- LE07_L2SP_018034_20210118_02_T2
- LE07_L2SP_017035_20210111_02_T2
- LC08_L2SP_018035_20210110_02_T1
- LC08_L2SP_018034_20210110_02_T1
- LC08_L2SP_017035_20210103_02_T1
- LE07_L2SP_018035_20210102_02_T1
- LE07_L2SP_018034_20210102_02_T2
- assets:
ang, atmos_opacity, atran, blue, cdist, cloud_qa, coastal, drad, emis, emsd, green, lwir, lwir11, mtl.json, mtl.txt, mtl.xml, nir08, qa, qa_aerosol, qa_pixel, qa_radsat, red, rendered_preview, swir16, swir22, tilejson, trad, urad
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

As we can see, this returns 12 separate items. We can also see that those items seem to have different prefixes in their names; some start with LE07, while others start with LC08. We might be able to guess what this means (spoiler alert, LE07 corresponds to Landsat-7 imagery, while LC08 is Landsat-8), but we might also not know which of these items are actually relevant to our search.

Luckily enough, STAC items include useful metadata about what their associated assets actually represent. This metadata gets converted by rstac into a list, which is then stored in the properties element of each item in our item collection. We can look at the names of these item properties to get a sense of what metadata is available for each of our items:

lapply(stac_query$features, \(x) names(x$properties)) |> 
unlist() |>
unique()
[1] "gsd"                         "created"                    
[3] "sci:doi" "datetime"
[5] "platform" "proj:epsg"
[7] "proj:shape" "description"
[9] "instruments" "eo:cloud_cover"
[11] "proj:transform" "view:off_nadir"
[13] "landsat:wrs_row" "landsat:scene_id"
[15] "landsat:wrs_path" "landsat:wrs_type"
[17] "view:sun_azimuth" "landsat:correction"
[19] "view:sun_elevation" "landsat:cloud_cover_land"
[21] "landsat:collection_number" "landsat:collection_category"

Many of these fields are defined by the STAC specification as common metadata, which defines fields that should mean the same thing across multiple data providers. For instance, the platform field should detail the “unique name of the specific platform to which the instrument is attached”, which means that we should be able to use it to confirm that the item naming conventions do in fact correspond to whether an image comes from Landsat-7 or Landsat-8:

lapply(
stac_query$features,
\(x) data.frame(id = x$id, platform = x$properties$platform)
) |>
do.call(what = rbind)
                                id  platform
1 LE07_L2SP_017035_20210127_02_T1 landsat-7
2 LC08_L2SP_018035_20210126_02_T1 landsat-8
3 LC08_L2SP_018034_20210126_02_T1 landsat-8
4 LC08_L2SP_017035_20210119_02_T1 landsat-8
5 LE07_L2SP_018035_20210118_02_T1 landsat-7
6 LE07_L2SP_018034_20210118_02_T2 landsat-7
7 LE07_L2SP_017035_20210111_02_T2 landsat-7
8 LC08_L2SP_018035_20210110_02_T1 landsat-8
9 LC08_L2SP_018034_20210110_02_T1 landsat-8
10 LC08_L2SP_017035_20210103_02_T1 landsat-8
11 LE07_L2SP_018035_20210102_02_T1 landsat-7
12 LE07_L2SP_018034_20210102_02_T2 landsat-7

This metadata can be really useful to let us decide which items we want to download from, without needing to download the whole data object! To query using these fields, however, we’re going to need to build our queries in a different way. Namely, rather than using rstac::stac_search(), we’re going to have to write our queries in Common Query Language, or CQL2. CQL2 is a draft OGC standard setting out “a generic filter grammar […] used in query operations to identify the subset of resources, such as features, that should be included in a response”. STAC APIs which implement the filter extension can accept CQL2 queries, which can help you filter down the set of items returned by the API.

CQL2 has a number of component pieces which define logical operators, spatial and temporal filters, and other filtering functiions. We’re going to focus primarily on how to use the most basic components to find items that intersect our spatiotemporal area of interest and have the properties we desire.

Luckily, rstac supports writing CQL2 queries through the rstac::ext_filter() function, turning R’s logcal operators and objects into valid CQL2 queries. This function helps to translate R expressions into CQL2 that can be sent as a query to a STAC API. A handful of helper functions, prefixed with cql2_, also help translate R objects into valid CQL2 representations.

For instance, to turn our stac_search() query into an ext_filter() query, we’ll need to convert both our bounding box and datetime arguments. We can convert our bounding box into a representation that ext_filter() can use via rstac::cql2_bbox_as_geojson():

ashe_bbox_geojson <- rstac::cql2_bbox_as_geojson(ashe_bbox)
ashe_bbox_geojson
$type
[1] "Polygon"

$coordinates
$coordinates[[1]]
[,1] [,2]
[1,] -81.74091 36.23444
[2,] -81.23971 36.23444
[3,] -81.23971 36.58973
[4,] -81.74091 36.58973
[5,] -81.74091 36.23444

And we can convert our datetime into a valid interval using rstac::cql2_interval():

time_range <- rstac::cql2_interval("2021-01-01", "2021-01-31")
time_range
interval("2021-01-01", "2021-01-31")

With these objects converted, we’re then able to build a query that uses CQL2 using rstac::ext_filter(). Rather than providing our filters as function arguments, like we did with stac_search(), we’re going to instead provide ext_filter() with a single query expression that combines all of the filters we care about. For instance, to request only items belonging to the Landsat collection, we’ll use == to filter to only items whose collection is landsat-c2-l2:

rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
rstac::ext_filter(
collection == "landsat-c2-l2"
)
###RSTACQuery
- url: https://planetarycomputer.microsoft.com/api/stac/v1
- params:
- filter: collection = 'landsat-c2-l2'
- field(s): version, base_url, endpoint, params, verb, encode

In addition to using logical operators, we’ll also use spatial and temporal operators to limit our results to only our area of interest. For instance, we’ll use the t_intersects CQL2 function and our time_range variable to limit our results to just January 2021. We’ll need to wrap our variable in { { } } to tell rstac to replace the variable name with its contents:

rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
rstac::ext_filter(
collection == "landsat-c2-l2" &&
t_intersects(datetime, )
)
###RSTACQuery
- url: https://planetarycomputer.microsoft.com/api/stac/v1
- params:
- filter: collection = 'landsat-c2-l2' AND T_INTERSECTS(datetime,INTERVAL('2021-01-01','2021-01-31'))
- field(s): version, base_url, endpoint, params, verb, encode

Notice how we used && to combine these two filters, restricting our results to only items that satisfy both conditions. Also notice how the filter parameter in our rstac query has changed, including a call to T_INTERSECTS()!

Similarly, we’ll need to use the s_intersects() CQL2 function to restrict our results to our spatial area of interest using our ashe_bbox_geojson variable:

rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
rstac::ext_filter(
collection == "landsat-c2-l2" &&
t_intersects(datetime, ) &&
s_intersects(geometry, )
)
###RSTACQuery
- url: https://planetarycomputer.microsoft.com/api/stac/v1
- params:
- filter: collection = 'landsat-c2-l2' AND T_INTERSECTS(datetime,INTERVAL('2021-01-01','2021-01-31')) AND S_INTERSECTS(geometry,POLYGON((-81.740905671483 36.234442085901,-81.2397076336137 36.234442085901,-81.2397076336137 36.589729047258,-81.740905671483 36.589729047258,-81.740905671483 36.234442085901)))
- field(s): version, base_url, endpoint, params, verb, encode

This query is equivalent to the one we constructed via stac_search(): we’re filtering our results based on collection and spatiotemporal range. To execute it against Planetary Computer, we’re going to need to use post_request(), rather than get_request(), to send this query as an HTTP POST rather than GET request:

rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
rstac::ext_filter(
collection == "landsat-c2-l2" &&
t_intersects(datetime, ) &&
s_intersects(geometry, )
) |>
rstac::post_request()
###STACItemCollection
- features (12 item(s)):
- LE07_L2SP_017035_20210127_02_T1
- LC08_L2SP_018035_20210126_02_T1
- LC08_L2SP_018034_20210126_02_T1
- LC08_L2SP_017035_20210119_02_T1
- LE07_L2SP_018035_20210118_02_T1
- LE07_L2SP_018034_20210118_02_T2
- LE07_L2SP_017035_20210111_02_T2
- LC08_L2SP_018035_20210110_02_T1
- LC08_L2SP_018034_20210110_02_T1
- LC08_L2SP_017035_20210103_02_T1
- LE07_L2SP_018035_20210102_02_T1
- LE07_L2SP_018034_20210102_02_T2
- assets:
ang, atmos_opacity, atran, blue, cdist, cloud_qa, coastal, drad, emis, emsd, green, lwir, lwir11, mtl.json, mtl.txt, mtl.xml, nir08, qa, qa_aerosol, qa_pixel, qa_radsat, red, rendered_preview, swir16, swir22, tilejson, trad, urad
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

As you can see, the results from this query are exactly equivalent to those from stac_search(). For straightforward queries like this, stac_search() provides an easier and friendlier interface for constructing requests. However, using CQL2 via ext_filter() allows us to take full advantage of the metadata provided by the STAC API.

For instance, we could also filter our results to only include data from Landsat-8, using the platform property that we examined earlier. To do so, we’ll add another filter using == to our query:

rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
rstac::ext_filter(
collection == "landsat-c2-l2" &&
t_intersects(datetime, ) &&
s_intersects(geometry, ) &&
platform == "landsat-8"
) |>
rstac::post_request()
###STACItemCollection
- features (6 item(s)):
- LC08_L2SP_018035_20210126_02_T1
- LC08_L2SP_018034_20210126_02_T1
- LC08_L2SP_017035_20210119_02_T1
- LC08_L2SP_018035_20210110_02_T1
- LC08_L2SP_018034_20210110_02_T1
- LC08_L2SP_017035_20210103_02_T1
- assets:
ang, atran, blue, cdist, coastal, drad, emis, emsd, green, lwir11, mtl.json, mtl.txt, mtl.xml, nir08, qa, qa_aerosol, qa_pixel, qa_radsat, red, rendered_preview, swir16, swir22, tilejson, trad, urad
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

We could also use other logical operators to filter these results down further. For instance, the eo:cloud_cover property, part of the electro-optical STAC extension, provides an estimate of how much of each image is covered by clouds. We could add a filter to restrict our results to only include images with less than 10% cloud cover using this property and <:

rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
rstac::ext_filter(
collection == "landsat-c2-l2" &&
t_intersects(datetime, ) &&
s_intersects(geometry, ) &&
platform == "landsat-8" &&
`eo:cloud_cover` < 10
) |>
rstac::post_request()
###STACItemCollection
- features (1 item(s)):
- LC08_L2SP_018034_20210110_02_T1
- assets:
ang, atran, blue, cdist, coastal, drad, emis, emsd, green, lwir11, mtl.json, mtl.txt, mtl.xml, nir08, qa, qa_aerosol, qa_pixel, qa_radsat, red, rendered_preview, swir16, swir22, tilejson, trad, urad
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type

rstac is able to translate several other R expressions into CQL2 representations. For a list of supported R expressions and other examples, check out the rstac documentation.