Create a Basic STAC Catalog Using PySTAC¶
In the previous tutorial, you learned how to use PySTAC to read an existing STAC Catalog. In this tutorial, you will learn how to create your own STAC Catalog (also using PySTAC). By the end, you will have a basic STAC Catalog created.
In the following tutorials, you will learn to add additional STAC components and functionality to the basic STAC Catalog you create here. The experience you gain from this tutorial along with the extensive PySTAC documentation will allow you to create your own STAC Catalog on a different dataset.
Dependencies¶
If you need to install pystac, rasterio, or pystac, uncomment the lines below and run the cell.
# ! pip install pystac
# ! pip install rasterio
# ! pip install shapely
Import Packages and Store Data¶
To begin, import the packages and classes that you need to access data and work with STAC Catalogs in Python.
import os
import json
import rasterio
import urllib.request
import pystac
from datetime import datetime, timezone
from shapely.geometry import Polygon, mapping
from tempfile import TemporaryDirectory
To give us some material to work with, let's download a single image from the SpaceNet 5 Challenge. We will use a temporary directory to save our single-Item STAC.
# Set temporary directory to store source data
tmp_dir = TemporaryDirectory()
img_path = os.path.join(tmp_dir.name, 'image.tif')
# Fetch and store data
url = ('https://spacenet-dataset.s3.amazonaws.com/'
'spacenet/SN5_roads/train/AOI_7_Moscow/MS/'
'SN5_roads_train_AOI_7_Moscow_MS_chip996.tif')
urllib.request.urlretrieve(url, img_path)
We want to create a STAC Catalog. Take a look at the Catalog documentation to see what information you need to create our PySTAC Catalog
instance.
Create the STAC Catalog¶
Start by first creating the catalog and only populating the required arguments: the ID and description. The remaining arguments will be added to the catalog further along in the tutorial.
catalog = pystac.Catalog(id='tutorial-catalog', description='This catalog is a basic demonstration catalog utilizing a scene from SpaceNet 5.')
The catalog now exists. Take a look inside.
In the Introduction to STAC lesson on this site, you learned about the three main components of the STAC Specification and the possible relations between them all. Based on what you learned and what we have done so far, do you think your catalog has any children or items? Let's take a look:
print(list(catalog.get_children()))
print(list(catalog.get_items()))
Since we have not added them, there are no children or items in the catalog yet. We need to add these components.
JSON Progress Check¶
Throughout this tutorial, we will be checking in on the STAC components we are creating using to_dict()
to see how the STAC JSON is shaping up. Let's take a look at the catalog we just created.
print(json.dumps(catalog.to_dict(), indent=4))
Create a STAC Item¶
Now that the catalog exists, we can populate it. Let's create a STAC Item to represent the image we saved in the temporary directory. Again, take a look at the PySTAC Documentation for an Item to see what you need to supply.
For creating this item, you will populate all the attributes at once.
Let's collect the information we need for each attribute.
def get_bbox_and_footprint(raster):
with rasterio.open(raster) as r:
bounds = r.bounds
bbox = [bounds.left, bounds.bottom, bounds.right, bounds.top]
footprint = Polygon([
[bounds.left, bounds.bottom],
[bounds.left, bounds.top],
[bounds.right, bounds.top],
[bounds.right, bounds.bottom]
])
return (bbox, mapping(footprint))
# Run the function and print out the results
bbox, footprint = get_bbox_and_footprint(img_path)
print("bbox: ", bbox, "\n")
print("footprint: ", footprint)
Collect the Item datetime
¶
To obtain the datetime property for our Item from the image, we will use datetime.now().
datetime_utc = datetime.now(tz=timezone.utc)
Populate pystac.Item
¶
item = pystac.Item(id='local-image',
geometry=footprint,
bbox=bbox,
datetime=datetime_utc,
properties={})
Now we have our first Item.
However, the item has not been added to our catalog yet. Therefore, when you run the following cell, you can see that the Item does not have a parent yet:
print(item.get_parent() is None)
Add the Item to the Catalog¶
catalog.add_item(item)
Visualize the Catalog Relationship¶
Now that we have added the item to the catalog, we can see it link to it’s parent (which is the catalog).
item.get_parent()
You can also visualize the architecture of the STAC Catalog by using the describe()
method. As a reminder, be careful when using it on large catalogs, as it will walk the entire tree of the STAC.
catalog.describe()
Add STAC Assets¶
We’ve created an item, but there aren’t any assets associated with it. Let’s create one. As always, take a look at the PySTAC API Documentation to see what components are needed to create an Asset.
# Add Asset and all its information to Item
item.add_asset(
key='image',
asset=pystac.Asset(
href=img_path,
media_type=pystac.MediaType.GEOTIFF
)
)
JSON Progress Check¶
Run to_dict()
on the STAC Item we created. Notice the asset is now set:
print(json.dumps(item.to_dict(), indent=4))
Note that the link href properties are null. The empty href links are OK for now, as we’re working with the STAC in memory. Next, we will write the Catalog out and set those HREFs.
Save the Catalog¶
As the JSON above indicates, there are no HREFs set on these in-memory items. PySTAC uses the self
link on STAC objects to track where the file lives. Because we haven’t set them, they evaluate to None
:
print(catalog.get_self_href() is None)
print(item.get_self_href() is None)
Set the Catalog's HREFs¶
In order to set the HREFs, we can use normalize_hrefs
. This method will create a normalized set of HREFs for each STAC object in the catalog, according to the recommendations from the best practices document on how to lay out a catalog.
catalog.normalize_hrefs(os.path.join(tmp_dir.name, "stac"))
Now that we’ve normalized to a root directory (the temporary directory), we see that the self
links are set:
print("Catalog HREF: ", catalog.get_self_href())
print("Item HREF: ", item.get_self_href())
We can now call save
on the catalog, which will recursively save all the STAC objects to their respective self HREFs.
Save requires a CatalogType
to be set. You can review the PySTAC API Documentation to learn about each CatalogType
.
Here, we will be creating a ‘self-contained catalog.' This type is one that is designed for portability. Users may want to download an online catalog from and be able to use it on their local computer, so all links need to be relative.
Save the Catalog: Self Contained¶
catalog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)
Take a look at the temporary directory to see the catalog and item.
!ls {tmp_dir.name}/stac/*
Now that our Catalog has been written out to file, we can open it up and read it directly. We should see the previously null hrefs populated with paths to the respective STAC component.
with open(catalog.self_href) as f:
print(f.read())
with open(item.self_href) as f:
print(f.read())
Save the Catalog: Absolute Published¶
As you can see, all links are saved with relative paths. That’s because we used catalog_type=CatalogType.SELF_CONTAINED
. If we save an Absolute Published Catalog, we’ll see absolute paths.
An Absolute Published Catalog is a catalog that uses absolute links for everything, both in the links objects and in the asset hrefs.
Let's try saving the same catalog with the CatalogType
as ABSOLUTE_PUBLISHED
.
catalog.save(catalog_type=pystac.CatalogType.ABSOLUTE_PUBLISHED)
Now the links included in the STAC Item are all absolute:
with open(item.get_self_href()) as f:
print(f.read())
Notice that the asset href
is absolute in both cases. We can make the asset href
relative to the STAC Item by using .make_all_asset_hrefs_relative()
:
catalog.make_all_asset_hrefs_relative()
catalog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)
with open(item.get_self_href()) as f:
print(f.read())
There you have it: your very first STAC Catalog. In the following tutorial, you will learn how to add an Item with EO Extensions and a STAC Collection to this STAC Catalog.
Cleanup¶
Don't forget to clean up the temporary directory.
tmp_dir.cleanup()