Using regionmask with intake

Regions based on shapefiles read with geopandas can be pre-defined in a YAML catalog file, which is easy to share. This relies on intake_geopandas, whose catalog entries accept regionmask_kwargs that are passed on to regionmask.from_geopandas.

You need to install intake_geopandas, which combines geopandas and intake; for intake itself, see https://intake.readthedocs.io/en/latest/.

Let’s explore the Marine Ecoregions Of the World (MEOW) data set, which is a biogeographic classification of the world’s coasts and shelves.

In [1]: import intake

In [2]: import intake_geopandas

# open a pre-defined remote or local catalog yaml file, containing the MEOW regions
In [3]: path = "../data/regions_remote_catalog.yaml"

In [4]: cat = intake.open_catalog(path)

# access data from remote source
In [5]: meow_regions = cat.MEOW.read()

In [6]: print(meow_regions)
<regionmask.Regions 'MEOW'>
Source:   https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e...
overlap:  None

Regions:
  1.0          NorGre                          North Greenland
  2.0    NorandEasIce                   North and East Iceland
  3.0       EasGreShe                     East Greenland Shelf
  4.0       WesGreShe                     West Greenland Shelf
  5.0 NorGraBanSouLab Northern Grand Banks - Southern Labrador
  ...             ...                                      ...
228.0       AmuBelSea              Amundsen/Bellingshausen Sea
229.0          RosSea                                 Ross Sea
230.0    BouandAntIsl             Bounty and Antipodes Islands
231.0          CamIsl                          Campbell Island
232.0          AucIsl                          Auckland Island

[232 regions]

In [7]: meow_regions.plot(add_label=False)
Out[7]: <GeoAxes: >
[Figure: plot of the MEOW regions (plotting_MEOW.png)]
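
The abbreviations in the listing above (NorGre, NorandEasIce, AmuBelSea, ...) look like the first three letters of each word of the region name, which fits the abbrevs: _from_name setting in the catalog. The following is a minimal sketch of that rule using only the standard library. abbrev_from_name is a hypothetical helper written to reproduce the printed abbreviations; it is not taken from regionmask and may differ from it for names containing other punctuation.

```python
import re


def abbrev_from_name(name):
    """Join the first three letters of each word (hypothetical helper)."""
    # Assumption: whitespace, "/" and "-" all act as word separators,
    # as suggested by the MEOW listing.
    words = re.split(r"[\s/-]+", name)
    return "".join(word[:3] for word in words)


for name in [
    "North Greenland",
    "North and East Iceland",
    "Northern Grand Banks - Southern Labrador",
    "Amundsen/Bellingshausen Sea",
]:
    print(f"{abbrev_from_name(name):>16}  {name}")
```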

Remote catalogs can also be used:

url = 'https://raw.githubusercontent.com/regionmask/regionmask/main/data/regions_remote_catalog.yaml'
cat = intake.open_catalog(url)

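Because the catalog sets use_fsspec=True and prefixes the url with simplecache::, the shapefile is cached locally. With same_names: true and cache_storage: cache/MEOW-TNC/, the cached file keeps the last component of its url (data) as its name: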

In [8]: import os

In [9]: import zipfile

In [10]: file = "cache/MEOW-TNC/data"

In [11]: assert os.path.exists(file)

In [12]: assert zipfile.is_zipfile(file)
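
The asserts above only confirm that the cached file exists and is a zip archive. To see what was actually downloaded, the archive members can be listed with the standard library. This is a sketch: list_cached_shapefile is a hypothetical helper, and the path is the one used in the session above.

```python
import os
import zipfile


def list_cached_shapefile(path):
    """Return the member names of a cached zip archive, or None if absent."""
    if not os.path.exists(path):
        return None  # nothing cached yet, e.g. the source was never read
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} exists but is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Path from the session above; adjust it if cache_storage differs.
members = list_cached_shapefile("cache/MEOW-TNC/data")
print(members if members is not None else "no cached file found")
```

Returning None for a missing file, rather than raising, makes it easy to tell "not downloaded yet" apart from "downloaded but corrupt".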

Find more such pre-defined regions in remote_climate_data.

Build your own catalog

To create a catalog, use the YAML syntax described in the intake documentation. Below we show the catalog used above, which contains two example datasets (the second is the MEOW regions):

plugins:
  source:
    - module: intake_geopandas

sources:
  Countries:
    description: Natural Earth Data Admin 0 Countries
    metadata:
      url: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/
    driver: intake_geopandas.regionmask.RegionmaskSource
    args:
      urlpath: simplecache::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
      use_fsspec: true
      storage_options:
        simplecache:
          same_names: true
      regionmask_kwargs:
        names: NAME_EN
        abbrevs: _from_name
        source: https://www.naturalearthdata.com

  MEOW:
    description: >-
      The Marine Ecoregions Of the World (MEOW) data set is a biogeographic
      classification of the world's coasts and shelves. The ecoregions nest within the
      broader biogeographic tiers of Realms and Provinces.
    metadata:
      url:
        https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e6/about
    driver: intake_geopandas.regionmask.RegionmaskSource
    args:
      urlpath: simplecache::https://www.arcgis.com/sharing/rest/content/items/903c3ae05b264c00a3b5e58a4561b7e6/data
      use_fsspec: true
      storage_options:
        simplecache:
          same_names: true
          cache_storage: cache/MEOW-TNC/
      regionmask_kwargs:
        name: MEOW
        names: ECOREGION
        numbers: ECO_CODE_X
        abbrevs: _from_name
        source: https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e6/about
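
Both entries above share the same skeleton: a driver, a simplecache:: urlpath, and a regionmask_kwargs mapping. As a rough sketch of how a further entry could be generated, the snippet below renders that skeleton as YAML text with plain string formatting. render_source, the MyRegions entry, the example.org URL and the NAME column are all placeholders for illustration and do not refer to a real dataset; values are written unquoted, so they must be plain YAML scalars.

```python
def render_source(name, description, urlpath, regionmask_kwargs):
    """Render one catalog source as YAML text (hypothetical helper)."""
    lines = [
        f"  {name}:",
        f"    description: {description}",
        "    driver: intake_geopandas.regionmask.RegionmaskSource",
        "    args:",
        f"      urlpath: simplecache::{urlpath}",
        "      use_fsspec: true",
        "      storage_options:",
        "        simplecache:",
        "          same_names: true",
        "      regionmask_kwargs:",
    ]
    # One line per keyword that should reach regionmask.from_geopandas.
    lines += [f"        {key}: {value}" for key, value in regionmask_kwargs.items()]
    return "\n".join(lines)


header = "plugins:\n  source:\n    - module: intake_geopandas\n\nsources:\n"
entry = render_source(
    name="MyRegions",
    description="Placeholder regions",
    urlpath="https://example.org/my_regions.zip",  # placeholder URL
    regionmask_kwargs={"name": "MyRegions", "names": "NAME", "abbrevs": "_from_name"},
)
catalog_text = header + entry + "\n"
print(catalog_text)
```

Writing catalog_text to a file gives a catalog that can be opened with intake.open_catalog, in the same way as the catalog in the session above.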