Read a STAC Catalog Using PySTAC

In this tutorial, you will gain an understanding of how to explore a STAC Catalog using PySTAC. You will be able to use these skills to explore any of the many existing STAC Catalogs.

This tutorial, along with the three tutorials that follow it, was adapted from the PySTAC site tutorials.

Throughout this tutorial, a variety of PySTAC classes, methods, and instances are used to read the STAC Catalog. We encourage you to look at the PySTAC API Reference while going through this tutorial.


If you need to install pystac, uncomment the line below and run this cell.

In [1]:
# ! pip install pystac

STAC Catalogs

A STAC Catalog is used to group other STAC objects like items, collections, and/or even other catalogs.
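
To make the grouping concrete, here is a plain-Python sketch of the JSON shape behind a catalog. It assumes the standard STAC `links` array, where `rel` values of `child` and `item` point at nested catalogs/collections and items. The IDs, hrefs, and the `hrefs_by_rel` helper are invented for illustration; this is not how PySTAC represents catalogs internally.

```python
# Hypothetical catalog JSON, reduced to the fields that express grouping.
catalog_json = {
    "type": "Catalog",
    "id": "example-catalog",
    "description": "An illustrative catalog",
    "links": [
        {"rel": "self", "href": "./catalog.json"},
        {"rel": "child", "href": "./collection-a/collection.json"},
        {"rel": "item", "href": "./item-1/item-1.json"},
        {"rel": "item", "href": "./item-2/item-2.json"},
    ],
}


def hrefs_by_rel(stac_object, rel):
    """Return the hrefs of all links with the given relation type."""
    return [
        link["href"]
        for link in stac_object.get("links", [])
        if link["rel"] == rel
    ]


child_hrefs = hrefs_by_rel(catalog_json, "child")
item_hrefs = hrefs_by_rel(catalog_json, "item")
print(f"{len(child_hrefs)} child link(s), {len(item_hrefs)} item link(s)")
```

PySTAC resolves these links for you, which is why the rest of this tutorial can work with Python objects rather than raw hrefs.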

We will be using a small example catalog adapted from the example Landsat Collection in the GeoTrellis repository. All STAC Items and Collections can be found in the docs/example-catalog directory of the PySTAC documentation repository; all assets are hosted in the Landsat S3 bucket.

Import Packages and Store Data

To begin, import the packages and PySTAC classes that you need to access data and work with STAC Catalogs in Python.

In [2]:
import json

from pystac import Catalog, get_stac_version
from pystac.extensions.eo import EOExtension
from pystac.extensions.label import LabelExtension
In [3]:
# Read the example catalog
root_catalog = Catalog.from_file('public/example-catalog/catalog.json')

Explore the High-Level Catalog Information

To give us an idea of how the catalog we are working with is organized, let's take a look at all elements of the STAC using the describe method.

NOTE: Be careful using the `describe` method on large catalogs, as it will walk and print the entire tree of the STAC.
In [4]:
root_catalog.describe()
* <Catalog id=landsat-stac-collection-catalog>
    * <Collection id=landsat-8-l1>
      * <Item id=LC80140332018166LGN00>
      * <Item id=LC80150322018141LGN00>
      * <Item id=LC80150332018189LGN00>
      * <Item id=LC80300332018166LGN00>

From this output, we can see that the catalog has 1 collection and that this collection has 4 items.
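
The reason a full description gets expensive on large catalogs is that every catalog, collection, and item has to be visited once. The sketch below illustrates that depth-first walk on a hypothetical in-memory tree; the `kind`, `children`, and `items` keys and the `describe_lines` helper are assumptions made for this example, not PySTAC's data model or implementation.

```python
# Hypothetical in-memory tree; real STAC objects link to separate JSON files.
tree = {
    "kind": "Catalog",
    "id": "root",
    "children": [
        {
            "kind": "Collection",
            "id": "collection-a",
            "children": [],
            "items": ["item-1", "item-2"],
        }
    ],
    "items": [],
}


def describe_lines(node, depth=0):
    """Return one indented line per catalog, collection, and item, depth first."""
    indent = "  " * depth
    lines = [f"{indent}* <{node['kind']} id={node['id']}>"]
    for child in node["children"]:
        lines.extend(describe_lines(child, depth + 1))
    for item_id in node["items"]:
        lines.append(f"{indent}  * <Item id={item_id}>")
    return lines


print("\n".join(describe_lines(tree)))
```

The number of lines produced equals the number of objects in the tree, so the cost grows with the size of the whole catalog.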

Now, let's look at the root catalog more in depth.

In [5]:
# Print some basic metadata from the Catalog
print(f"ID: {root_catalog.id}")
print(f"Title: {root_catalog.title or 'N/A'}")
print(f"Description: {root_catalog.description or 'N/A'}")
ID: landsat-stac-collection-catalog
Title: STAC for Landsat data
Description: STAC for Landsat data
Note: we do not print the "stac_version" here. PySTAC automatically updates any catalogs to the most recent supported STAC version and will automatically write this to the JSON object during serialization.

Let's confirm the latest STAC Specification version supported by PySTAC.

In [6]:
print(get_stac_version())

With this information, we have an understanding of the layout and general information about the STAC Catalog at hand. Now, let's dive deeper into this catalog.

Crawl STAC Child Catalogs and/or Collections

STAC Collections are used to group related items and provide aggregate or summary metadata for those items.

STAC Catalogs may have many nested layers of catalogs or collections within the top-level catalog. Our example catalog has only one collection within the main catalog, at landsat-8-l1/collection.json. We can list the collections in a given catalog using the Catalog.get_collections method. This method returns an iterable of PySTAC Collection instances, which we will turn into a list.

In [7]:
collections = list(root_catalog.get_collections())

print(f"Number of collections: {len(collections)}")
print("Collections IDs:")
for collection in collections:
    print(f"- {collection.id}")
Number of collections: 1
Collections IDs:
- landsat-8-l1

Let's grab that collection as a PySTAC Collection instance using the Catalog.get_child method so we can look at it in more detail. This method gets a child catalog or collection by ID, so we'll use the collection ID that we printed above. Since this method returns None if no child exists with the given ID, let's check to make sure we actually got the Collection.

In [8]:
collection = root_catalog.get_child("landsat-8-l1")
if collection is None:
    print("Collection is Empty. Check your downloads and try again.")
else:
    print("Collection has a root child. You may proceed to the following steps.")
Collection has a root child. You may proceed to the following steps.

Crawl the STAC Items

STAC Items are the fundamental building blocks of a STAC Catalog. Each Item represents a single spatiotemporal resource (e.g. a satellite scene).

Both catalogs and collections may have items associated with them. Let's crawl our catalog, starting at the root, to see what Items we have. The Catalog.get_all_items method provides a convenient way of recursively listing all Items associated with a catalog and all of its sub-catalogs.

In [9]:
items = list(root_catalog.get_all_items())

print(f"Number of items: {len(items)}")
for item in items:
    print(f"- {item.id}")
Number of items: 4
- LC80140332018166LGN00
- LC80150322018141LGN00
- LC80150332018189LGN00
- LC80300332018166LGN00

These IDs are not very descriptive; in the next section, we will see how we can access the rich metadata associated with each item.
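
The recursive crawl described above can be pictured as a generator that yields a node's own items and then descends into each child. This is a minimal sketch on a hypothetical nested dictionary; the `items` and `children` keys and the `iter_items` helper are assumptions for illustration and do not reflect how PySTAC resolves links.

```python
def iter_items(node):
    """Yield item IDs from this node, then from every nested child, recursively."""
    yield from node.get("items", [])
    for child in node.get("children", []):
        yield from iter_items(child)


# Hypothetical nested structure: a root catalog, one collection, one sub-catalog.
root = {
    "id": "root",
    "items": ["root-item"],
    "children": [
        {"id": "collection-a", "items": ["a-1", "a-2"], "children": []},
        {
            "id": "sub-catalog",
            "items": [],
            "children": [
                {"id": "collection-b", "items": ["b-1"], "children": []}
            ],
        },
    ],
}

all_item_ids = list(iter_items(root))
print(all_item_ids)
```

Because the function is a generator, callers can stop early without visiting the rest of the tree, which matters once a catalog holds many items.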

Explore STAC Item Metadata

Item Metadata

Items can have a lot of metadata. This can be a bit overwhelming at first, but let's break the metadata fields down into a few categories:

  1. Core Item Metadata
  2. Common Metadata
  3. STAC Extensions

We will walk through each of these metadata categories in the following sections.

First, let's grab one of the Items using the Catalog.get_item method. We will use recursive=True to recursively crawl all child catalogs and/or collections to find the item.

In [10]:
item = root_catalog.get_item("LC80140332018166LGN00", recursive=True)

1. Core Item Metadata

The core item metadata fields include spatiotemporal information and the ID of the collection to which the item belongs. These fields are all at the top level of the item JSON and we can access them through attributes on the PySTAC Item instance.

In [11]:
item.geometry
{'type': 'Polygon',
 'coordinates': [[[-76.12180471942207, 39.95810181489563],
   [-73.94910518227414, 39.55117185146004],
   [-74.49564725552679, 37.826064511480496],
   [-76.66550404911956, 38.240699151776084],
   [-76.12180471942207, 39.95810181489563]]]}
In [12]:
item.bbox
[-76.66703, 37.82561, -73.94861, 39.95958]
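
The geometry and the bounding box describe the same footprint at different levels of detail. As a quick sanity check, the sketch below computes the bounds of the polygon's exterior ring and confirms they fall inside the bounding box. The values are copied from the outputs above; the `polygon_bounds` helper is a hypothetical name introduced for this example.

```python
# Geometry and bbox values copied from the outputs above.
geometry = {
    "type": "Polygon",
    "coordinates": [[
        [-76.12180471942207, 39.95810181489563],
        [-73.94910518227414, 39.55117185146004],
        [-74.49564725552679, 37.826064511480496],
        [-76.66550404911956, 38.240699151776084],
        [-76.12180471942207, 39.95810181489563],
    ]],
}
bbox = [-76.66703, 37.82561, -73.94861, 39.95958]


def polygon_bounds(polygon):
    """Return [min_lon, min_lat, max_lon, max_lat] of a Polygon's exterior ring."""
    ring = polygon["coordinates"][0]
    lons = [lon for lon, lat in ring]
    lats = [lat for lon, lat in ring]
    return [min(lons), min(lats), max(lons), max(lats)]


bounds = polygon_bounds(geometry)
# The footprint lies inside the bbox when its minimums are not below the
# bbox minimums and its maximums are not above the bbox maximums.
footprint_inside_bbox = (
    bbox[0] <= bounds[0]
    and bbox[1] <= bounds[1]
    and bounds[2] <= bbox[2]
    and bounds[3] <= bbox[3]
)
print(bounds)
print(footprint_inside_bbox)
```

For this item the bounding box is slightly larger than the tight bounds of the polygon, so the two should not be assumed to match exactly.
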
In [13]:
item.datetime
datetime.datetime(2018, 6, 15, 15, 39, 9, tzinfo=tzutc())
In [14]:
item.collection_id

We can also access the same information as the cell above by running the Item.get_collection method.

In [15]:
item.get_collection()

2. Common Metadata

Certain fields that are commonly used in Items, but may also be found in other objects (e.g. Assets), are defined in the Common Metadata section of the spec. These include licensing and instrument information, descriptions of datetime ranges, and some other common fields. These properties can be found as attributes of the Item.common_metadata property, which is an instance of the CommonMetadata class.

In [16]:
In [17]:
In [18]:
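
In the underlying JSON, common metadata fields sit in the item's `properties`, and the same fields may also appear on an individual asset, where they describe that asset only. The sketch below shows one way to read a field with that precedence in plain Python. The `resolve_field` helper and all values are illustrative assumptions, not taken from the example catalog and not PySTAC's implementation.

```python
def resolve_field(item, asset_key, field):
    """Look up a field on an asset, falling back to the item's properties."""
    asset = item["assets"][asset_key]
    if field in asset:
        return asset[field]
    return item["properties"].get(field)


# Illustrative values, not taken from the example catalog.
item_json = {
    "properties": {
        "platform": "landsat-8",
        "instruments": ["oli", "tirs"],
        "gsd": 30,
    },
    "assets": {
        "B4": {"href": "B4.TIF"},
        "B8": {"href": "B8.TIF", "gsd": 15},
    },
}

print(resolve_field(item_json, "B4", "gsd"))       # item-level value
print(resolve_field(item_json, "B8", "gsd"))       # asset-level override
print(resolve_field(item_json, "B8", "platform"))  # falls back to the item
```

With PySTAC, the corresponding values are read as attributes of `item.common_metadata` instead of by key.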

3. STAC Extensions

STAC Extensions are a mechanism for providing additional metadata not covered by the core STAC Spec. We can see which STAC Extensions are implemented by this particular Item by examining the list of extension URIs in the stac_extensions field.

In [19]:
item.stac_extensions

This Item implements the Electro-Optical, View Geometry, and Projection Extensions.
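
The `stac_extensions` field is a plain list of schema URIs, so the extension names can be read straight from it. The sketch below assumes URIs in the form published under stac-extensions.github.io; the version numbers and the `extension_names` helper are assumptions for illustration. Matching on the path segment is a simplification and not how PySTAC checks for an extension.

```python
# Schema URIs in the form published under stac-extensions.github.io;
# the version numbers here are assumptions for illustration.
stac_extensions = [
    "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
    "https://stac-extensions.github.io/view/v1.0.0/schema.json",
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
]


def extension_names(uris):
    """Extract the extension name (the path segment before the version)."""
    return [uri.rstrip("/").split("/")[-3] for uri in uris]


names = extension_names(stac_extensions)
print(names)
print("label" in names)
```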

We can also check if a specific extension is implemented using the has_extension method for that extension class.

In [20]:
EOExtension.has_extension(item)
In [21]:
LabelExtension.has_extension(item)

We can access fields associated with the extension as attributes on the extension instance. For instance, the "eo:cloud_cover" field defined in the Electro-Optical Extension can be accessed using the EOExtension.cloud_cover attribute.

In [22]:
eo_item_ext = EOExtension.ext(item)
eo_item_ext.cloud_cover

We can also access the cloud cover field directly in the Item properties.

In [23]:
item.properties['eo:cloud_cover']

Access STAC Item's Assets

To access the item's assets, we can use the assets attribute, which is a dictionary:

In [24]:
for asset_key in item.assets:
    asset = item.assets[asset_key]
    print('{}: {} ({})'.format(asset_key, asset.href, asset.media_type))
index: (text/html)
thumbnail: (image/jpeg)
B1: (image/tiff)
B2: (image/tiff)
B3: (image/tiff)
B4: (image/tiff)
B5: (image/tiff)
B6: (image/tiff)
B7: (image/tiff)
B8: (image/tiff)
B9: (image/tiff)
B10: (image/tiff)
B11: (image/tiff)
ANG: (text/plain)
MTL: (text/plain)
BQA: (image/tiff)
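
Since the assets form a dictionary keyed by asset name, ordinary dictionary techniques apply. As a small illustration, the sketch below groups asset keys by media type so that, for example, all GeoTIFF bands can be picked out at once. It uses a subset of the keys and media types printed above; the `group_by_media_type` helper is a hypothetical name, and hrefs are omitted.

```python
from collections import defaultdict

# A subset of the asset keys and media types printed above; hrefs are omitted.
asset_media_types = {
    "index": "text/html",
    "thumbnail": "image/jpeg",
    "B1": "image/tiff",
    "B2": "image/tiff",
    "ANG": "text/plain",
    "MTL": "text/plain",
    "BQA": "image/tiff",
}


def group_by_media_type(media_types):
    """Map each media type to the list of asset keys that use it."""
    groups = defaultdict(list)
    for key, media_type in media_types.items():
        groups[media_type].append(key)
    return dict(groups)


groups = group_by_media_type(asset_media_types)
print(groups["image/tiff"])
```

With a PySTAC Item, the same grouping could be built by iterating over `item.assets` and reading each asset's `media_type`.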

We can use the to_dict() method to convert an asset, or any PySTAC object, into a dictionary:

In [25]:
asset = item.assets['B3']
asset.to_dict()
{'href': '',
 'type': 'image/tiff',
 'title': 'Band 3 (green)',
 'eo:bands': [{'name': 'B3',
   'full_width_half_max': 0.06,
   'center_wavelength': 0.56,
   'common_name': 'green'}],
 'roles': []}

Here, we use the eo Extension to get the band information for the asset:

In [26]:
eo_asset_ext = EOExtension.ext(asset)
bands = eo_asset_ext.bands
bands
[<Band name=B3>]
In [27]:
bands[0].to_dict()
{'name': 'B3',
 'full_width_half_max': 0.06,
 'center_wavelength': 0.56,
 'common_name': 'green'}
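
Band metadata is useful for locating an asset by what it measures rather than by its key. The sketch below searches asset dictionaries, in the shape shown above, for a band with a given common name. The B4 entry is an assumed example following the same pattern, the hrefs are left empty, and `find_asset_keys` is a hypothetical helper.

```python
# Asset dictionaries in the shape shown above; the B4 entry is an assumed
# example following the same pattern, and hrefs are left empty.
assets = {
    "B3": {
        "href": "",
        "type": "image/tiff",
        "eo:bands": [{"name": "B3", "common_name": "green"}],
    },
    "B4": {
        "href": "",
        "type": "image/tiff",
        "eo:bands": [{"name": "B4", "common_name": "red"}],
    },
    "MTL": {"href": "", "type": "text/plain"},
}


def find_asset_keys(assets, common_name):
    """Return keys of assets that contain a band with the given common name."""
    return [
        key
        for key, asset in assets.items()
        if any(
            band.get("common_name") == common_name
            for band in asset.get("eo:bands", [])
        )
    ]


print(find_asset_keys(assets, "green"))
```

Assets without an `eo:bands` entry, such as the metadata file, are simply skipped.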

You have now successfully explored the common components of an existing STAC Catalog. To learn how to write your own STAC Catalog, see the following tutorial.

Join the conversation

If you have any questions, you’re welcome to ask our community on Gitter.