Files
claude-scientific-skills/scientific-skills/geomaster/SKILL.md
urabbani 85591617fc Improve GeoMaster skill with Rust support, troubleshooting, and modern workflows
Major improvements to the GeoMaster geospatial science skill:

### New Features
- Added Rust geospatial support (GeoRust crates: geo, proj, shapefile, rstar)
- Added comprehensive coordinate systems reference documentation
- Added troubleshooting guide with common error fixes
- Added cloud-native workflows (STAC, Planetary Computer, COG)
- Added automatic skill activation configuration

### Reference Documentation
- NEW: references/coordinate-systems.md - CRS fundamentals, UTM zones, EPSG codes
- NEW: references/troubleshooting.md - Installation fixes, runtime errors, performance tips
- UPDATED: references/programming-languages.md - Now covers 8 languages (added Rust)

### Main Skill File
- Streamlined SKILL.md from 690 to 362 lines (500-line rule compliance)
- Enhanced installation instructions with uv and conda
- Added modern cloud-native workflow examples
- Added performance optimization tips

### Documentation
- NEW: GEOMASTER_IMPROVEMENTS.md - Complete changelog and testing guide
- UPDATED: README.md - Highlight new capabilities

### Skill Activation
- Created skill-rules.json with 150+ keywords and 50+ intent patterns
- Supports file-based and content-based automatic activation

The skill now covers 8 programming languages (Python, R, Julia, JavaScript,
C++, Java, Go, Rust) with 500+ code examples across 70+ geospatial topics.
2026-03-05 14:26:09 +05:00

366 lines
12 KiB
Markdown

---
name: geomaster
description: Comprehensive geospatial science skill covering remote sensing, GIS, spatial analysis, machine learning for earth observation, and 30+ scientific domains. Supports satellite imagery processing (Sentinel, Landsat, MODIS, SAR, hyperspectral), vector and raster data operations, spatial statistics, point cloud processing, network analysis, cloud-native workflows (STAC, COG, Planetary Computer), and 8 programming languages (Python, R, Julia, JavaScript, C++, Java, Go, Rust) with 500+ code examples. Use for remote sensing workflows, GIS analysis, spatial ML, Earth observation data processing, terrain analysis, hydrological modeling, marine spatial analysis, atmospheric science, and any geospatial computation task.
license: MIT License
metadata:
skill-author: K-Dense Inc.
---
# GeoMaster
Comprehensive geospatial science skill covering GIS, remote sensing, spatial analysis, and ML for Earth observation across 70+ topics with 500+ code examples in 8 programming languages.
## Installation
```bash
# Core Python stack (conda recommended)
conda install -c conda-forge gdal rasterio fiona shapely pyproj geopandas
# Remote sensing & ML
uv pip install rsgislib torchgeo earthengine-api
uv pip install scikit-learn xgboost torch-geometric
# Network & visualization
uv pip install osmnx networkx folium keplergl
uv pip install cartopy contextily mapclassify
# Big data & cloud
uv pip install xarray rioxarray dask-geopandas
uv pip install pystac-client planetary-computer
# Point clouds
uv pip install laspy pylas open3d pdal
# Databases
conda install -c conda-forge postgis spatialite
```
## Quick Start
### NDVI from Sentinel-2
```python
import rasterio
import numpy as np
with rasterio.open('sentinel2.tif') as src:
red = src.read(4).astype(float) # B04
nir = src.read(8).astype(float) # B08
ndvi = (nir - red) / (nir + red + 1e-8)
ndvi = np.nan_to_num(ndvi, nan=0)
profile = src.profile
profile.update(count=1, dtype=rasterio.float32)
with rasterio.open('ndvi.tif', 'w', **profile) as dst:
dst.write(ndvi.astype(rasterio.float32), 1)
```
### Spatial Analysis with GeoPandas
```python
import geopandas as gpd
# Load and ensure same CRS
zones = gpd.read_file('zones.geojson')
points = gpd.read_file('points.geojson')
if zones.crs != points.crs:
points = points.to_crs(zones.crs)
# Spatial join and statistics
joined = gpd.sjoin(points, zones, how='inner', predicate='within')
stats = joined.groupby('zone_id').agg({
'value': ['count', 'mean', 'std', 'min', 'max']
}).round(2)
```
### Google Earth Engine Time Series
```python
import ee
import pandas as pd
ee.Initialize(project='your-project')
roi = ee.Geometry.Point([-122.4, 37.7]).buffer(10000)
s2 = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
.filterBounds(roi)
.filterDate('2020-01-01', '2023-12-31')
.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))
def add_ndvi(img):
return img.addBands(img.normalizedDifference(['B8', 'B4']).rename('NDVI'))
s2_ndvi = s2.map(add_ndvi)
def extract_series(image):
stats = image.reduceRegion(ee.Reducer.mean(), roi.centroid(), scale=10, maxPixels=1e9)
return ee.Feature(None, {'date': image.date().format('YYYY-MM-dd'), 'ndvi': stats.get('NDVI')})
series = s2_ndvi.map(extract_series).getInfo()
df = pd.DataFrame([f['properties'] for f in series['features']])
df['date'] = pd.to_datetime(df['date'])
```
## Core Concepts
### Data Types
| Type | Examples | Libraries |
|------|----------|-----------|
| Vector | Shapefile, GeoJSON, GeoPackage | GeoPandas, Fiona, GDAL |
| Raster | GeoTIFF, NetCDF, COG | Rasterio, Xarray, GDAL |
| Point Cloud | LAS, LAZ | Laspy, PDAL, Open3D |
### Coordinate Systems
- **EPSG:4326** (WGS 84) - Geographic, lat/lon, use for storage
- **EPSG:3857** (Web Mercator) - Web maps only (don't use for area/distance!)
- **EPSG:326xx/327xx** (UTM) - Metric calculations, <1% distortion per zone
- Use `gdf.estimate_utm_crs()` for automatic UTM detection
```python
# Always check CRS before operations
assert gdf1.crs == gdf2.crs, "CRS mismatch!"
# For area/distance calculations, use projected CRS
gdf_metric = gdf.to_crs(gdf.estimate_utm_crs())
area_sqm = gdf_metric.geometry.area
```
### OGC Standards
- **WMS**: Web Map Service - raster maps
- **WFS**: Web Feature Service - vector data
- **WCS**: Web Coverage Service - raster coverage
- **STAC**: Spatiotemporal Asset Catalog - modern metadata
## Common Operations
### Spectral Indices
```python
def calculate_indices(image_path):
"""NDVI, EVI, SAVI, NDWI from Sentinel-2."""
with rasterio.open(image_path) as src:
B02, B03, B04, B08, B11 = [src.read(i).astype(float) for i in [1,2,3,4,5]]
ndvi = (B08 - B04) / (B08 + B04 + 1e-8)
evi = 2.5 * (B08 - B04) / (B08 + 6*B04 - 7.5*B02 + 1)
savi = ((B08 - B04) / (B08 + B04 + 0.5)) * 1.5
ndwi = (B03 - B08) / (B03 + B08 + 1e-8)
return {'NDVI': ndvi, 'EVI': evi, 'SAVI': savi, 'NDWI': ndwi}
```
### Vector Operations
```python
# Buffer (use projected CRS!)
gdf_proj = gdf.to_crs(gdf.estimate_utm_crs())
gdf['buffer_1km'] = gdf_proj.geometry.buffer(1000)
# Spatial relationships
intersects = gdf[gdf.geometry.intersects(other_geometry)]
contains = gdf[gdf.geometry.contains(point_geometry)]
# Geometric operations
gdf['centroid'] = gdf.geometry.centroid
gdf['simplified'] = gdf.geometry.simplify(tolerance=0.001)
# Overlay operations
intersection = gpd.overlay(gdf1, gdf2, how='intersection')
union = gpd.overlay(gdf1, gdf2, how='union')
```
### Terrain Analysis
```python
def terrain_metrics(dem_path):
"""Calculate slope, aspect, hillshade from DEM."""
with rasterio.open(dem_path) as src:
dem = src.read(1)
dy, dx = np.gradient(dem)
slope = np.arctan(np.sqrt(dx**2 + dy**2)) * 180 / np.pi
aspect = (90 - np.arctan2(-dy, dx) * 180 / np.pi) % 360
# Hillshade
az_rad, alt_rad = np.radians(315), np.radians(45)
hillshade = (np.sin(alt_rad) * np.sin(np.radians(slope)) +
np.cos(alt_rad) * np.cos(np.radians(slope)) *
np.cos(np.radians(aspect) - az_rad))
return slope, aspect, hillshade
```
### Network Analysis
```python
import osmnx as ox
import networkx as nx
# Download and analyze street network
G = ox.graph_from_place('San Francisco, CA', network_type='drive')
G = ox.add_edge_speeds(G).add_edge_travel_times(G)
# Shortest path
orig = ox.distance.nearest_nodes(G, -122.4, 37.7)
dest = ox.distance.nearest_nodes(G, -122.3, 37.8)
route = nx.shortest_path(G, orig, dest, weight='travel_time')
```
## Image Classification
```python
from sklearn.ensemble import RandomForestClassifier
import rasterio
from rasterio.features import rasterize
def classify_imagery(raster_path, training_gdf, output_path):
"""Train RF and classify imagery."""
with rasterio.open(raster_path) as src:
image = src.read()
profile = src.profile
transform = src.transform
# Extract training data
X_train, y_train = [], []
for _, row in training_gdf.iterrows():
mask = rasterize([(row.geometry, 1)],
out_shape=(profile['height'], profile['width']),
transform=transform, fill=0, dtype=np.uint8)
pixels = image[:, mask > 0].T
X_train.extend(pixels)
y_train.extend([row['class_id']] * len(pixels))
# Train and predict
rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
rf.fit(X_train, y_train)
prediction = rf.predict(image.reshape(image.shape[0], -1).T)
prediction = prediction.reshape(profile['height'], profile['width'])
profile.update(dtype=rasterio.uint8, count=1)
with rasterio.open(output_path, 'w', **profile) as dst:
dst.write(prediction.astype(rasterio.uint8), 1)
return rf
```
## Modern Cloud-Native Workflows
### STAC + Planetary Computer
```python
import pystac_client
import planetary_computer
import odc.stac
# Search Sentinel-2 via STAC
catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
collections=["sentinel-2-l2a"],
bbox=[-122.5, 37.7, -122.3, 37.9],
datetime="2023-01-01/2023-12-31",
query={"eo:cloud_cover": {"lt": 20}},
)
# Load as xarray (cloud-native!)
data = odc.stac.load(
list(search.get_items())[:5],
bands=["B02", "B03", "B04", "B08"],
crs="EPSG:32610",
resolution=10,
)
# Calculate NDVI on xarray
ndvi = (data.B08 - data.B04) / (data.B08 + data.B04)
```
### Cloud-Optimized GeoTIFF (COG)
```python
import rasterio
from rasterio.session import AWSSession
# Read COG directly from cloud (partial reads)
session = AWSSession(aws_access_key_id=..., aws_secret_access_key=...)
with rasterio.open('s3://bucket/path.tif', session=session) as src:
# Read only window of interest
window = ((1000, 2000), (1000, 2000))
subset = src.read(1, window=window)
# Write COG
with rasterio.open('output.tif', 'w', **profile,
tiled=True, blockxsize=256, blockysize=256,
compress='DEFLATE', predictor=2) as dst:
dst.write(data)
# Validate COG
from rio_cogeo.cogeo import cog_validate
cog_validate('output.tif')
```
## Performance Tips
```python
# 1. Spatial indexing (10-100x faster queries)
gdf.sindex # Auto-created by GeoPandas
# 2. Chunk large rasters
with rasterio.open('large.tif') as src:
for i, window in src.block_windows(1):
block = src.read(1, window=window)
# 3. Dask for big data
import dask.array as da
dask_array = da.from_rasterio('large.tif', chunks=(1, 1024, 1024))
# 4. Use Arrow for I/O
gdf.to_file('output.gpkg', use_arrow=True)
# 5. GDAL caching
from osgeo import gdal
gdal.SetCacheMax(2**30) # 1GB cache
# 6. Parallel processing
rf = RandomForestClassifier(n_jobs=-1) # All cores
```
## Best Practices
1. **Always check CRS** before spatial operations
2. **Use projected CRS** for area/distance calculations
3. **Validate geometries**: `gdf = gdf[gdf.is_valid]`
4. **Handle missing data**: `gdf['geometry'] = gdf['geometry'].fillna(None)`
5. **Use efficient formats**: GeoPackage > Shapefile, Parquet for large data
6. **Apply cloud masking** to optical imagery
7. **Preserve lineage** for reproducible research
8. **Use appropriate resolution** for your analysis scale
## Detailed Documentation
- **[Coordinate Systems](references/coordinate-systems.md)** - CRS fundamentals, UTM, transformations
- **[Core Libraries](references/core-libraries.md)** - GDAL, Rasterio, GeoPandas, Shapely
- **[Remote Sensing](references/remote-sensing.md)** - Satellite missions, spectral indices, SAR
- **[Machine Learning](references/machine-learning.md)** - Deep learning, CNNs, GNNs for RS
- **[GIS Software](references/gis-software.md)** - QGIS, ArcGIS, GRASS integration
- **[Scientific Domains](references/scientific-domains.md)** - Marine, hydrology, agriculture, forestry
- **[Advanced GIS](references/advanced-gis.md)** - 3D GIS, spatiotemporal, topology
- **[Big Data](references/big-data.md)** - Distributed processing, GPU acceleration
- **[Industry Applications](references/industry-applications.md)** - Urban planning, disaster management
- **[Programming Languages](references/programming-languages.md)** - Python, R, Julia, JS, C++, Java, Go, Rust
- **[Data Sources](references/data-sources.md)** - Satellite catalogs, APIs
- **[Troubleshooting](references/troubleshooting.md)** - Common issues, debugging, error reference
- **[Code Examples](references/code-examples.md)** - 500+ examples
---
**GeoMaster covers everything from basic GIS operations to advanced remote sensing and machine learning.**