Reading Data from the STAC API¶
The Planetary Computer catalogs the datasets we host using the STAC (SpatioTemporal Asset Catalog) specification. We provide a STAC API endpoint that can be used to search our datasets by space, time, and more. This quickstart will show you how to search for data using our STAC API and open-source Python libraries. For more on how to use our STAC API from R, see Reading data from the STAC API with R.
To get started, you'll need the pystac-client library installed. You can install it via pip:
> pip install pystac-client
First we'll use pystac-client to open up our STAC API:
from pystac_client import Client
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
Searching¶
We can use the STAC API to search for assets meeting some criteria. This might include the date and time the asset covers, its spatial extent, or any other property captured in the STAC item's metadata.
In this example we'll search for imagery from the Landsat Collection 2 Level-2 dataset covering the area around Microsoft's main campus in December of 2020.
time_range = "2020-12-01/2020-12-31"
bbox = [-122.2751, 47.5469, -121.9613, 47.7458]
search = catalog.search(collections=["landsat-8-c2-l2"], bbox=bbox, datetime=time_range)
items = search.get_all_items()
len(items)
In that example our spatial query used a bounding box with bbox. Alternatively, you can pass a GeoJSON object as intersects:
area_of_interest = {
"type": "Polygon",
"coordinates": [
[
[-122.2751, 47.5469],
[-121.9613, 47.5469],
[-121.9613, 47.9613],
[-122.2751, 47.9613],
[-122.2751, 47.5469],
]
],
}
time_range = "2020-12-01/2020-12-31"
search = catalog.search(
collections=["landsat-8-c2-l2"], intersects=area_of_interest, datetime=time_range
)
items is a pystac.ItemCollection. We can see that 4 items matched our search criteria.
len(items)
Each pystac.Item in this ItemCollection includes all the metadata for that scene. STAC Items are GeoJSON features, and so can be loaded by libraries like geopandas.
import geopandas
df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
df
We can use the eo extension to sort the items by cloudiness. We'll grab an item with low cloudiness:
selected_item = min(items, key=lambda item: item.properties["eo:cloud_cover"])
selected_item
Each STAC item has one or more Assets, which include links to the actual files.
import rich.table
table = rich.table.Table("Asset Key", "Description")
for asset_key, asset in selected_item.assets.items():
# print(f"{asset_key:<25} - {asset.title}")
table.add_row(asset_key, asset.title)
table
Here, we'll inspect the rendered_preview asset.
selected_item.assets["rendered_preview"].to_dict()
from IPython.display import Image
Image(url=selected_item.assets["rendered_preview"].href, width=500)
That rendered_preview asset is generated dynamically from the raw data using the Planetary Computer's data API. We can access the raw data, stored as Cloud Optimized GeoTIFFs in Azure Blob Storage, using one of the other assets. That said, we do need to do one more thing before accessing the data. If we simply made a request to the file in Blob Storage, we'd get a 404:
import requests
url = selected_item.assets["SR_B2"].href
print("Accessing", url)
response = requests.get(url)
response
That's because the Planetary Computer uses Azure Blob Storage SAS Tokens to enable access to our data, which allows us to provide the data for free to anyone, anywhere, while maintaining some control over the amount of egress for datasets.
To get a token for access, you can use the Planetary Computer's Data Authentication API. You can access that anonymously, or you can provide an API Key for higher rate limits and longer-lived tokens.
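The signing itself is simple string concatenation: the token returned by the Data Authentication API is a set of URL query parameters that get appended to the asset's HREF. A minimal sketch of that mechanic (the token value and blob path here are made-up placeholders, not real credentials):

```python
def sign_url(url: str, token: str) -> str:
    """Append a SAS token (a string of URL query parameters) to an asset HREF."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{token}"

# Placeholder token; real tokens come from the Data Authentication API
token = "st=2020-12-01&se=2020-12-02&sig=EXAMPLE"
href = "https://landsateuwest.blob.core.windows.net/landsat-c2/example/SR_B2.TIF"
print(sign_url(href, token))
```

In practice you won't write this yourself; the planetary-computer package below handles fetching tokens and signing HREFs for you.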
You can also use the planetary-computer package to generate tokens and sign asset HREFs for access. You can install via pip with
> pip install planetary-computer
import planetary_computer
# Optionally, set the PC_SDK_SUBSCRIPTION_KEY environment variable to use an API key
signed_href = planetary_computer.sign(selected_item).assets["SR_B2"].href
import rioxarray
ds = rioxarray.open_rasterio(signed_href, overview_level=4).squeeze()
img = ds.plot(cmap="Blues", add_colorbar=False)
img.axes.set_axis_off();
Libraries like stackstac can combine all of the signed items into a single xarray DataArray:
import stackstac
ds = stackstac.stack(planetary_computer.sign(items))
ds
Searching on additional properties¶
Previously, we searched for items by space and time. Because the Planetary Computer's STAC API supports the query parameter, you can search on additional properties on the STAC item.
For example, collections like sentinel-2-l2a and landsat-8-c2-l2 both implement the eo STAC extension and include an eo:cloud_cover property. Use query={"eo:cloud_cover": {"lt": 20}} to return only items that are less than 20% cloudy.
time_range = "2020-12-01/2020-12-31"
bbox = [-122.2751, 47.5469, -121.9613, 47.7458]
search = catalog.search(
collections=["sentinel-2-l2a"],
bbox=bbox,
datetime=time_range,
query={"eo:cloud_cover": {"lt": 20}},
)
items = search.get_all_items()
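The comparison operators (lt, lte, eq, gte, gt) compare the named property against the given value, item by item. As a client-side sketch of the same semantics, here is a small filter applied to hypothetical in-memory items rather than the live API:

```python
# Map a subset of STAC query operators to Python comparisons (for illustration)
OPS = {
    "lt": lambda a, b: a < b,
    "lte": lambda a, b: a <= b,
    "eq": lambda a, b: a == b,
    "gte": lambda a, b: a >= b,
    "gt": lambda a, b: a > b,
}

def matches(properties: dict, query: dict) -> bool:
    """Return True if the item's properties satisfy every query clause."""
    return all(
        OPS[op](properties[field], value)
        for field, clauses in query.items()
        for op, value in clauses.items()
    )

fake_items = [{"eo:cloud_cover": 5.2}, {"eo:cloud_cover": 87.0}]
clear = [i for i in fake_items if matches(i, {"eo:cloud_cover": {"lt": 20}})]
print(clear)  # only the 5.2% cloudy item remains
```

The real filtering happens server-side, of course; the STAC API evaluates the query against its index before returning items.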
Another common use of the query parameter is to filter a collection down to items of a specific type. For example, the GOES-CMI collection includes images taken when the satellite is in various modes, which produce images of either the Full Disk of the Earth, the continental United States, or a mesoscale region. You can use goes:image-type to filter down to just the ones you want.
search = catalog.search(
collections=["goes-cmi"],
bbox=[-67.2729, 25.6000, -61.7999, 27.5423],
datetime=["2018-09-11T13:00:00Z", "2018-09-11T15:40:00Z"],
query={"goes:image-type": {"eq": "MESOSCALE"}},
)
Analyzing STAC Metadata¶
STAC items are proper GeoJSON Features, and so the metadata itself can be treated as data in its own right.
search = catalog.search(
collections=["sentinel-2-l2a"],
bbox=[-124.2751, 45.5469, -110.9613, 47.7458],
datetime="2020-12-26/2020-12-31",
)
items = search.get_all_items()
df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
df[["geometry", "datetime", "s2:mgrs_tile", "eo:cloud_cover"]].explore(
column="eo:cloud_cover", style_kwds={"fillOpacity": 0.1}
)
Or we can plot cloudiness of a region over time.
import pandas as pd
search = catalog.search(
collections=["sentinel-2-l2a"],
bbox=[-124.2751, 45.5469, -123.9613, 45.7458],
datetime="2020-01-01/2020-12-31",
)
items = search.get_all_items()
df = geopandas.GeoDataFrame.from_features(items.to_dict())
df["datetime"] = pd.to_datetime(df["datetime"])
ts = df.set_index("datetime").sort_index()["eo:cloud_cover"].rolling(7).mean()
ts.plot(title="eo:cloud_cover (7-scene rolling average)");
Working with STAC Catalogs and Collections¶
Our catalog is a STAC Catalog that we can crawl or search. The Catalog contains STAC Collections for each dataset we have indexed (which is not yet the entirety of the data hosted by the Planetary Computer).
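Crawling works by following links: a STAC Catalog document lists its Collections as links with rel set to "child". A toy illustration using a hand-made catalog document (no live request; the hrefs are shown only for shape):

```python
# A minimal, hand-made STAC Catalog document for illustration
catalog_doc = {
    "type": "Catalog",
    "id": "example-catalog",
    "links": [
        {"rel": "self", "href": "https://planetarycomputer.microsoft.com/api/stac/v1"},
        {"rel": "child", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-8-c2-l2"},
        {"rel": "child", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-na"},
    ],
}

# Each "child" link points at a Collection document a crawler would fetch next
children = [link["href"] for link in catalog_doc["links"] if link["rel"] == "child"]
print(len(children))  # 2
```

pystac-client does this traversal for you via methods like catalog.get_collection, as shown below.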
Collections have information about the STAC Items they contain. For instance, here we look at the Bands available for Landsat 8 Collection 2 Level 2 data:
import pandas as pd
landsat = catalog.get_collection("landsat-8-c2-l2")
pd.DataFrame(landsat.extra_fields["summaries"]["eo:bands"])
We can see what Assets are available on our item with:
pd.DataFrame.from_dict(landsat.extra_fields["item_assets"], orient="index")[
["title", "description", "gsd"]
]
Some collections, like Daymet, include collection-level assets. You can use the .assets property to access those assets.
collection = catalog.get_collection("daymet-daily-na")
collection
Just like assets on items, these assets include links to data in Azure Blob Storage.
asset = collection.assets["zarr-https"]
asset
import fsspec
import xarray as xr
store = fsspec.get_mapper(asset.href)
ds = xr.open_zarr(store, **asset.extra_fields["xarray:open_kwargs"])
ds