Files
claude-scientific-skills/scientific-skills/geomaster/references/data-sources.md
urabbani 4787f98d98 Add GeoMaster: Comprehensive Geospatial Science Skill
- Added SKILL.md with installation, quick start, core concepts, workflows
- Added 12 reference documentation files covering 70+ topics
- Includes 500+ code examples across 7 programming languages
- Covers remote sensing, GIS, ML/AI, 30+ scientific domains
- MIT License

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Dr. Umair Rabbani <umairrs@gmail.com>
2026-03-01 13:42:41 +05:00

9.1 KiB

Geospatial Data Sources

Comprehensive catalog of satellite imagery, vector data, and APIs for geospatial analysis.

Satellite Data Sources

Sentinel Missions (ESA)

Platform Resolution Coverage Access
Sentinel-2 10-60m Global https://scihub.copernicus.eu/
Sentinel-1 5-40m (SAR) Global https://scihub.copernicus.eu/
Sentinel-3 300m-1km Global https://scihub.copernicus.eu/
Sentinel-5P Various Global https://scihub.copernicus.eu/
# Access via Sentinelsat
from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt

api = SentinelAPI('user', 'password', 'https://scihub.copernicus.eu/dhus')

# Search
products = api.query(geojson_to_wkt(aoi_geojson),
                     date=('20230101', '20231231'),
                     platformname='Sentinel-2',
                     cloudcoverpercentage=(0, 20))

# Download
api.download_all(products)

Landsat (USGS/NASA)

Platform Resolution Coverage Access
Landsat 9 30m Global https://earthexplorer.usgs.gov/
Landsat 8 30m Global https://earthexplorer.usgs.gov/
Landsat 7 15-60m Global https://earthexplorer.usgs.gov/
Landsat 5-7 30-60m Global https://earthexplorer.usgs.gov/

Commercial Satellite Data

Provider Platform Resolution API
Planet PlanetScope, SkySat 0.5-3m planet.com
Maxar WorldView, GeoEye 0.3-1.2m maxar.com
Airbus Pleiades, SPOT 0.5-2m airbus.com
Capella Capella-2 (SAR) 0.5-1m capellaspace.com

Elevation Data

Dataset Resolution Coverage Source
AW3D30 30m Global https://www.eorc.jaxa.jp/ALOS/en/aw3d30/
SRTM 30m 56°S-60°N https://www.usgs.gov/
ASTER GDEM 30m 83°S-83°N https://asterweb.jpl.nasa.gov/
Copernicus DEM 30m Global https://copernicus.eu/
ArcticDEM 2-10m Arctic https://www.pgc.umn.edu/
# Download SRTM via API
import elevation

# Download SRTM 1 arc-second (30m)
elevation.clip(bounds=(-122.5, 37.7, -122.3, 37.9), output='srtm.tif')

# Clean and fill gaps
elevation.clean('srtm.tif', 'srtm_filled.tif')

Land Cover Data

Dataset Resolution Classes Source
ESA WorldCover 10m 11 classes https://worldcover2021.esa.int/
ESRI Land Cover 10m 10 classes https://www.esri.com/
Copernicus Global 100m 23 classes https://land.copernicus.eu/
MODIS MCD12Q1 500m 17 classes https://lpdaac.usgs.gov/
NLCD (US) 30m 20 classes https://www.mrlc.gov/

Climate & Weather Data

Reanalysis Data

Dataset Resolution Temporal Access
ERA5 31km Hourly (1979+) https://cds.climate.copernicus.eu/
MERRA-2 50km Hourly (1980+) https://gmao.gsfc.nasa.gov/
JRA-55 55km 3-hourly (1958+) https://jra.kishou.go.jp/
# Download ERA5 via CDS API
import cdsapi

c = cdsapi.Client()

c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'variable': '2m_temperature',
        'year': '2023',
        'month': '01',
        'day': '01',
        'time': '12:00',
        'area': [37.9, -122.5, 37.7, -122.3],
        'format': 'netcdf'
    },
    'era5_temp.nc'
)

OpenStreetMap Data

Access Methods

# Via OSMnx
import osmnx as ox

# Download place boundary
gdf = ox.geocode_to_gdf('San Francisco, CA')

# Download street network
G = ox.graph_from_place('San Francisco, CA', network_type='drive')

# Download building footprints
buildings = ox.geometries_from_place('San Francisco, CA', tags={'building': True})

# Via Overpass API
import requests

overpass_url = "http://overpass-api.de/api/interpreter"
query = """
    [out:json];
    way["highway"](37.7,-122.5,37.9,-122.3);
    out geom;
"""

response = requests.get(overpass_url, params={'data': query})
data = response.json()

Vector Data Sources

Natural Earth

import geopandas as gpd

# Admin boundaries (scale: 10m, 50m, 110m)
countries = gpd.read_file('https://naturalearth.s3.amazonaws.com/10m_cultural/ne_10m_admin_0_countries.zip')
urban_areas = gpd.read_file('https://naturalearth.s3.amazonaws.com/10m_cultural/ne_10m_urban_areas.zip')
ports = gpd.read_file('https://naturalearth.s3.amazonaws.com/10m_cultural/ne_10m_ports.zip')

Other Sources

Dataset Type Access
GADM Admin boundaries https://gadm.org/
HydroSHEDS Rivers, basins https://www.hydrosheds.org/
Global Power Plant Power plants https://datasets.wri.org/
WorldPop Population https://www.worldpop.org/
GPW Population https://sedac.ciesin.columbia.edu/
HDX Humanitarian data https://data.humdata.org/

APIs

Google Maps Platform

import requests

# Geocoding
url = "https://maps.googleapis.com/maps/api/geocode/json"
params = {
    'address': 'Golden Gate Bridge',
    'key': YOUR_API_KEY
}

response = requests.get(url, params=params)
data = response.json()
location = data['results'][0]['geometry']['location']

Mapbox

# Geocoding
import requests

url = "https://api.mapbox.com/geocoding/v5/mapbox.places/Golden%20Gate%20Bridge.json"
params = {'access_token': YOUR_ACCESS_TOKEN}

response = requests.get(url, params=params)
data = response.json()

OpenWeatherMap

# Current weather
url = "https://api.openweathermap.org/data/2.5/weather"
params = {
    'lat': 37.7,
    'lon': -122.4,
    'appid': YOUR_API_KEY
}

response = requests.get(url, params=params)
weather = response.json()

Data APIs in Python

STAC (SpatioTemporal Asset Catalog)

import pystac_client

# Connect to STAC catalog
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Search
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-122.5, 37.7, -122.3, 37.9],
    datetime="2023-01-01/2023-12-31",
    query={"eo:cloud_cover": {"lt": 20}}
)

items = search.get_all_items()

Planetary Computer

import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace
)

# Search and sign items
items = catalog.search(...)
signed_items = [planetary_computer.sign(item) for item in items]

Download Scripts

Automated Download Script

from sentinelsat import SentinelAPI
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
import os

def download_and_process_sentinel2(aoi, date_range, output_dir):
    """
    Download and process Sentinel-2 imagery.
    """
    # Initialize API
    api = SentinelAPI('user', 'password', 'https://scihub.copernicus.eu/dhus')

    # Search
    products = api.query(
        aoi,
        date=date_range,
        platformname='Sentinel-2',
        processinglevel='Level-2A',
        cloudcoverpercentage=(0, 20)
    )

    # Download
    api.download_all(products, directory_path=output_dir)

    # Process each product
    for product in products:
        product_path = f"{output_dir}/{product['identifier']}.SAFE"
        processed = process_sentinel2_product(product_path)
        save_rgb_composite(processed, f"{output_dir}/{product['identifier']}_rgb.tif")

def process_sentinel2_product(product_path):
    """Process Sentinel-2 L2A product."""
    # Find 10m bands (B02, B03, B04, B08)
    bands = {}
    for band_id in ['B02', 'B03', 'B04', 'B08']:
        band_path = find_band_file(product_path, band_id, resolution='10m')
        with rasterio.open(band_path) as src:
            bands[band_id] = src.read(1)
            profile = src.profile

    # Stack bands
    stacked = np.stack([bands['B04'], bands['B03'], bands['B02']])  # RGB

    return stacked, profile

Data Quality Assessment

def assess_data_quality(raster_path):
    """
    Assess quality of geospatial raster data.
    """
    import rasterio
    import numpy as np

    with rasterio.open(raster_path) as src:
        data = src.read()
        profile = src.profile

    quality_report = {
        'nodata_percentage': np.sum(data == src.nodata) / data.size * 100,
        'data_range': (data.min(), data.max()),
        'mean': np.mean(data),
        'std': np.std(data),
        'has_gaps': np.any(data == src.nodata),
        'projection': profile['crs'],
        'resolution': (profile['transform'][0], abs(profile['transform'][4]))
    }

    return quality_report

For data access code examples, see code-examples.md.