Languageen
4-create-stac-collection.ipynb

Create a STAC Catalog with a Collection Using PySTAC

This tutorial builds off of the knowledge from the previous tutorials (where you learned how to create a STAC Catalog and create a STAC Item that utilizes extensions). Now that you know the basics of creating a STAC Catalog, we want to add more functionality to it. This tutorial shows you how to add a STAC Collection to a Catalog to better organize the catalog's items.

Dependencies

If you need to install pystac, rasterio, or pystac, uncomment the lines below and run the cell.

In [1]:
# ! pip install pystac
# ! pip install rasterio
# ! pip install shapely

STAC Collections

Collections are a subtype of a catalog that have some additional properties to make them more searchable. They also can define common properties so that items in the collection don't have to duplicate common data for each item. Let's create a collection to hold common properties between two images from the SpaceNet 5 Challenge.

We will use the image we have been working with along with another image.

Import Packages and Store Data

To begin, import the packages that you need to access data and work with STAC in Python.

In [2]:
import os
import rasterio
import urllib.request
import pystac

from shapely.geometry import Polygon, mapping
from datetime import datetime, timezone
from pystac.extensions.eo import Band, EOExtension
from tempfile import TemporaryDirectory

Let's set up our temporary directory and store two images from the Spacenet 5 Challenge.

In [3]:
# Set temporary directory to store source data
tmp_dir = TemporaryDirectory()
img_path1 = os.path.join(tmp_dir.name, 'image1.tif')

# Fetch and store data
url1 = ('https://spacenet-dataset.s3.amazonaws.com/'
       'spacenet/SN5_roads/train/AOI_7_Moscow/MS/'
       'SN5_roads_train_AOI_7_Moscow_MS_chip996.tif')
urllib.request.urlretrieve(url1, img_path1)

url2 = ('https://spacenet-dataset.s3.amazonaws.com/'
       'spacenet/SN5_roads/train/AOI_7_Moscow/MS/'
       'SN5_roads_train_AOI_7_Moscow_MS_chip997.tif')
img_path2 = os.path.join(tmp_dir.name, 'image2.tif')
urllib.request.urlretrieve(url2, img_path2)

print("img_path1: " , img_path1, "\n", "img_path2: ", img_path2)
img_path1:  /var/folders/73/z5lbqv_s6l7fcfx464_pn8p80000gn/T/tmp72oetiz2/image1.tif 
 img_path2:  /var/folders/73/z5lbqv_s6l7fcfx464_pn8p80000gn/T/tmp72oetiz2/image2.tif

Collect the Items' geometry and bbox

To get the bounding box and footprint of the image, we will utilize the get_bbox_and_footprint function we first used in the Create a STAC Catalog Tutorial.

We will do this process for both the images in which we are using.

In [4]:
def get_bbox_and_footprint(raster):
    with rasterio.open(raster) as r:
        bounds = r.bounds
        bbox = [bounds.left, bounds.bottom, bounds.right, bounds.top]
        footprint = Polygon([
            [bounds.left, bounds.bottom],
            [bounds.left, bounds.top],
            [bounds.right, bounds.top],
            [bounds.right, bounds.bottom]
        ])
        
        return (bbox, mapping(footprint))
In [5]:
# Run the function and print out the results for image 1
bbox, footprint = get_bbox_and_footprint(img_path1)
print("bbox: ", bbox, "\n")
print("footprint: ", footprint)
bbox:  [37.6616853489879, 55.73478197572927, 37.66573047610874, 55.73882710285011] 

footprint:  {'type': 'Polygon', 'coordinates': (((37.6616853489879, 55.73478197572927), (37.6616853489879, 55.73882710285011), (37.66573047610874, 55.73882710285011), (37.66573047610874, 55.73478197572927), (37.6616853489879, 55.73478197572927)),)}
In [6]:
# Run the function and print out the results for image 2
bbox2, footprint2 = get_bbox_and_footprint(img_path2)
print("bbox: ", bbox2, "\n")
print("footprint: ", footprint2)
bbox:  [37.67786535472783, 55.726691972859314, 37.68191048184866, 55.730737099980146] 

footprint:  {'type': 'Polygon', 'coordinates': (((37.67786535472783, 55.726691972859314), (37.67786535472783, 55.730737099980146), (37.68191048184866, 55.730737099980146), (37.68191048184866, 55.726691972859314), (37.67786535472783, 55.726691972859314)),)}

Define the Bands of WorldView-3

In this tutorial, we will need the band information again. We have collected this band information from the WorldView-3 Data Sheet.

In [7]:
wv3_bands = [Band.create(name='Coastal', description='Coastal: 400 - 450 nm', common_name='coastal'),
             Band.create(name='Blue', description='Blue: 450 - 510 nm', common_name='blue'),
             Band.create(name='Green', description='Green: 510 - 580 nm', common_name='green'),
             Band.create(name='Yellow', description='Yellow: 585 - 625 nm', common_name='yellow'),
             Band.create(name='Red', description='Red: 630 - 690 nm', common_name='red'),
             Band.create(name='Red Edge', description='Red Edge: 705 - 745 nm', common_name='rededge'),
             Band.create(name='Near-IR1', description='Near-IR1: 770 - 895 nm', common_name='nir08'),
             Band.create(name='Near-IR2', description='Near-IR2: 860 - 1040 nm', common_name='nir09')]

Create the Collection

Take a look at the PySTAC API Documentation for Collection to see what information we need to supply in order to satisfy the specification.

Beyond what a Catalog requires, a Collection requires a license of the data in the collection and an extent that describes the range of space and time that the items it holds occupy.

An extent is comprised of a SpatialExtent and a TemporalExtent. These extents hold one or more bounding boxes and time intervals, respectively, that completely cover the items contained in the collections.

Let's start with creating two new items - these will be core items. We can set these items to implement the EO extension by specifying them in the stac_extensions.

In [8]:
collection_item = pystac.Item(id='local-image-col-1',
                               geometry=footprint,
                               bbox=bbox,
                               datetime=datetime.utcnow(),
                               properties={})

collection_item.common_metadata.gsd = 0.3
collection_item.common_metadata.platform = 'Maxar'
collection_item.common_metadata.instruments = ['WorldView3']

asset = pystac.Asset(href=img_path1, 
                      media_type=pystac.MediaType.GEOTIFF)
collection_item.add_asset("image", asset)
eo = EOExtension.ext(collection_item.assets["image"], add_if_missing=True)
eo.apply(wv3_bands)

collection_item2 = pystac.Item(id='local-image-col-2',
                               geometry=footprint2,
                               bbox=bbox2,
                               datetime=datetime.utcnow(),
                               properties={})

collection_item2.common_metadata.gsd = 0.3
collection_item2.common_metadata.platform = 'Maxar'
collection_item2.common_metadata.instruments = ['WorldView3']

asset2 = pystac.Asset(href=img_path2,
                     media_type=pystac.MediaType.GEOTIFF)
collection_item2.add_asset("image", asset2)
eo = EOExtension.ext(collection_item2.assets["image"], add_if_missing=True)
eo.apply([
    band for band in wv3_bands if band.name in ["Red", "Green", "Blue"]
])

We can use our two items' metadata to find out what the proper bounds are:

In [9]:
from shapely.geometry import shape

unioned_footprint = shape(footprint).union(shape(footprint2))
collection_bbox = list(unioned_footprint.bounds)
spatial_extent = pystac.SpatialExtent(bboxes=[collection_bbox])
In [10]:
collection_interval = sorted([collection_item.datetime, collection_item2.datetime])
temporal_extent = pystac.TemporalExtent(intervals=[collection_interval])
In [11]:
collection_extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)
In [12]:
collection = pystac.Collection(id='wv3-images',
                               description='Spacenet 5 images over Moscow',
                               extent=collection_extent,
                               license='CC-BY-SA-4.0')

Now if we add our items to our collection, and our collection to a catalog, we get the following STAC that can be saved:

In [13]:
collection.add_items([collection_item, collection_item2])
catalog = pystac.Catalog(id='catalog-with-collection', 
                         description='This Catalog is a basic demonstration of how to include a Collection in a STAC Catalog.')
catalog.add_child(collection)
Out[13]:
In [14]:
catalog.describe()
* <Catalog id=catalog-with-collection>
    * <Collection id=wv3-images>
      * <Item id=local-image-col-1>
      * <Item id=local-image-col-2>
In [15]:
catalog.normalize_and_save(root_href=os.path.join(tmp_dir.name, 'stac-collection'), 
                           catalog_type=pystac.CatalogType.SELF_CONTAINED)

Cleanup

Don't forget to clean up the temporary directory.

In [16]:
tmp_dir.cleanup()

There you have it. A STAC Catalog with a STAC Collection, STAC Items, and use of a STAC Extension. Now you are ready to build your own STAC Catalog for a dataset of your own.

Join the conversation

If you have any questions, you’re welcome to ask our community on Gitter.