Using regionmask with intake
Regions from geopandas shapefiles can be pre-defined in a yaml file, which can be
easily shared. This relies on intake_geopandas and accepts regionmask_kwargs,
which are passed to regionmask.from_geopandas.
You need to install intake_geopandas, which combines geopandas and intake; for intake itself, see https://intake.readthedocs.io/en/latest/.
Let’s explore the Marine Ecoregions Of the World (MEOW) data set, which is a biogeographic classification of the world’s coasts and shelves.
In [1]: import importlib.resources
In [2]: import intake
In [3]: import intake_geopandas
# open a pre-defined remote or local catalog yaml file, containing the MEOW regions
In [4]: path = importlib.resources.files("regionmask").parent / "data"
In [5]: filename = path / "regions_remote_catalog.yaml"
In [6]: cat = intake.open_catalog(filename)
# access data from remote source
In [7]: meow_regions = cat.MEOW.read()
In [8]: print(meow_regions)
<regionmask.Regions 'MEOW'>
Source: https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e...
overlap: None
Regions:
1.0 NorGre North Greenland
2.0 NorandEasIce North and East Iceland
3.0 EasGreShe East Greenland Shelf
4.0 WesGreShe West Greenland Shelf
5.0 NorGraBanSouLab Northern Grand Banks - Southern Labrador
... ... ...
228.0 AmuBelSea Amundsen/Bellingshausen Sea
229.0 RosSea Ross Sea
230.0 BouandAntIsl Bounty and Antipodes Islands
231.0 CamIsl Campbell Island
232.0 AucIsl Auckland Island
[232 regions]
In [9]: meow_regions.plot(add_label=False)
Out[9]: <GeoAxes: >
Remote catalogs can also be used:
url = 'https://raw.githubusercontent.com/regionmask/regionmask/main/data/regions_remote_catalog.yaml'
cat = intake.open_catalog(url)
Because the catalog sets use_fsspec=True and uses simplecache:: in the url, the shapefile is cached locally:
In [10]: import os
In [11]: import zipfile
In [12]: file = ".cache/MEOW-TNC/data"
In [13]: assert os.path.exists(file)
In [14]: assert zipfile.is_zipfile(file)
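The check above can be extended to look inside the cached archive. The following is a minimal sketch using only the standard library; list_cached_archive is a hypothetical helper, not part of regionmask or intake. It assumes the cached file sits at the path used above, which appears to be the cache_storage directory from the catalog plus the last segment of the download URL.

```python
import zipfile
from pathlib import Path


def list_cached_archive(path):
    """Return the member names of a cached zip archive, or None if it is absent."""
    path = Path(path)
    # is_zipfile returns False for missing files and for directories.
    if not path.exists() or not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Same path as in the session above; the result is None if nothing is cached yet.
members = list_cached_archive(".cache/MEOW-TNC/data")
print(members)
```

Listing the members is a quick way to confirm that the download is a complete shapefile bundle rather than an error page saved under the same name.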
Find more such pre-defined regions in remote_climate_data.
Build your own catalog
To create a catalog we use the syntax described in intake. Below we show the catalog used above, which contains two example datasets (the second is the MEOW regions):
plugins:
  source:
    - module: intake_geopandas
sources:
  Countries:
    description: Natural Earth Data Admin 0 Countries
    metadata:
      url: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/
    driver: intake_geopandas.regionmask.RegionmaskSource
    args:
      urlpath: simplecache::https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
      use_fsspec: true
      storage_options:
        simplecache:
          same_names: true
      regionmask_kwargs:
        names: NAME_EN
        abbrevs: _from_name
        source: https://www.naturalearthdata.com
  MEOW:
    description: >-
      The Marine Ecoregions Of the World (MEOW) data set is a biogeographic
      classification of the world's coasts and shelves. The ecoregions nest within the
      broader biogeographic tiers of Realms and Provinces.
    metadata:
      url:
        https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e6/about
    driver: intake_geopandas.regionmask.RegionmaskSource
    args:
      urlpath: simplecache::https://www.arcgis.com/sharing/rest/content/items/903c3ae05b264c00a3b5e58a4561b7e6/data
      use_fsspec: true
      storage_options:
        simplecache:
          same_names: true
          cache_storage: .cache/MEOW-TNC/
      regionmask_kwargs:
        name: MEOW
        names: ECOREGION
        numbers: ECO_CODE_X
        abbrevs: _from_name
        source: https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e6/about
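When a catalog has many entries, it can help to generate them rather than write each one by hand. The sketch below builds the MEOW entry as a plain Python dictionary with the same nesting as the catalog above. The helper regionmask_source and its parameter names are hypothetical and exist only for this illustration; the keys and values are taken from the catalog shown here.

```python
import json


def regionmask_source(description, info_url, data_url, cache_dir=None, **regionmask_kwargs):
    """Build one catalog entry as a nested dict (hypothetical helper)."""
    simplecache = {"same_names": True}
    if cache_dir is not None:
        simplecache["cache_storage"] = cache_dir
    return {
        "description": description,
        "metadata": {"url": info_url},
        "driver": "intake_geopandas.regionmask.RegionmaskSource",
        "args": {
            # The simplecache:: prefix asks fsspec to keep a local copy.
            "urlpath": f"simplecache::{data_url}",
            "use_fsspec": True,
            "storage_options": {"simplecache": simplecache},
            "regionmask_kwargs": regionmask_kwargs,
        },
    }


about = "https://geospatial.tnc.org/datasets/903c3ae05b264c00a3b5e58a4561b7e6/about"
catalog = {
    "plugins": {"source": [{"module": "intake_geopandas"}]},
    "sources": {
        "MEOW": regionmask_source(
            description="Marine Ecoregions Of the World (MEOW)",
            info_url=about,
            data_url="https://www.arcgis.com/sharing/rest/content/items/903c3ae05b264c00a3b5e58a4561b7e6/data",
            cache_dir=".cache/MEOW-TNC/",
            name="MEOW",
            names="ECOREGION",
            numbers="ECO_CODE_X",
            abbrevs="_from_name",
            source=about,
        )
    },
}

# Printed as JSON only to inspect the nesting.
print(json.dumps(catalog, indent=2))
```

If PyYAML is available, passing this dictionary to yaml.safe_dump(catalog, sort_keys=False) should give a YAML file with the same layout, which could then be opened with intake.open_catalog.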