Counting Roof Faces: A Dataset of 890K Buildings
Part 1 of 2 in the Roof Face Count series. Next: From Point Regression to Roof Slope Detection.
Why Roof Structure Matters
Detecting buildings in satellite or drone images is essentially a solved problem. Modern foundation models like SAM can reliably segment buildings at resolutions below 50 cm. But segmenting the parts of a roof—the individual slopes, or “faces”—remains unsolved, even at the very high resolutions (1–3 cm) captured by drones.
Why does this matter? Roof slope information is essential for:
- Roof measurement reports: Insurance companies and roofing contractors need accurate measurements of each roof face for estimating materials and costs.
- Solar panel placement: Optimal solar array design requires knowing the orientation and area of each roof slope.
- Structural assessment: The number and arrangement of roof faces reveal the complexity of the underlying structure.
The challenge lies in the enormous visual variability of roofs: different shapes (gable, hip, mansard, flat), different materials and textures, occlusions from trees and multi-story structures, and varying viewing angles between nadir (straight down) and oblique perspectives.
The Task: One Point Per Slope
Rather than attempting full segmentation of roof faces (which would require polygon-level annotation at enormous scale), we framed the problem as point regression: place one point on each visible roof slope. This formulation is:
- Efficient to annotate: A single click per slope, enabling annotation at scale.
- Sufficient for counting: The number of predicted points equals the number of roof faces.
- A stepping stone to segmentation: Point locations can seed more detailed segmentation models.
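To illustrate how the point formulation supports counting at inference time, here is a minimal sketch of decoding a point-regression model's per-pixel confidence heatmap into one point per roof face via greedy non-maximum suppression. The function name, threshold, and suppression radius are illustrative assumptions, not details from the actual pipeline:

```python
import numpy as np

def count_roof_faces(heatmap, threshold=0.5, min_distance=5):
    """Greedily extract one point per roof face from a confidence
    heatmap: take the highest peak, suppress its neighborhood,
    and repeat until no remaining peak clears the threshold."""
    h = heatmap.copy()
    points = []
    while True:
        y, x = np.unravel_index(np.argmax(h), h.shape)
        if h[y, x] <= threshold:
            break
        points.append((x, y))
        # Zero out a (2*min_distance+1)-pixel square around the peak
        # so nearby responses are not double-counted.
        h[max(0, y - min_distance):y + min_distance + 1,
          max(0, x - min_distance):x + min_distance + 1] = 0.0
    return len(points), points

# Toy heatmap with two confident peaks -> a 2-face roof.
hm = np.zeros((32, 32), dtype=np.float32)
hm[8, 8], hm[20, 24] = 0.9, 0.8
n_faces, pts = count_roof_faces(hm)  # n_faces == 2
```

The greedy loop doubles as a simple counter: the length of the point list is the predicted face count, which is exactly what the annotation format supervises.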
Data Sources
We combined two major sources of imagery to build a diverse dataset:
AIRS (Aerial Imagery for Roof Segmentation)
The AIRS dataset offers high-resolution, consistently processed, nadir-view aerial imagery originally designed for roof segmentation research. We exported approximately 210K building images from it.
OpenAerialMap Drone Imagery
To complement the standardized aerial imagery with real-world drone captures, we sourced approximately 627K drone images from OpenAerialMap. These images introduce the variability that production systems must handle:
- Variable ground sample distances (1–8 cm typically)
- Different camera sensors and color profiles
- Off-nadir viewing angles
- Varying weather and lighting conditions
The Processing Pipeline
Raw imagery—especially from diverse drone sources—requires significant processing before it can be used for training. Our pipeline handles three major challenges:
1. Building Isolation with SAM
Many images contain multiple buildings, trees, cars, and other objects. We used the Segment Anything Model (SAM) to isolate individual buildings. This is precisely what SAM excels at—segmenting prominent objects—though importantly, SAM cannot identify individual roof faces without human guidance, which is why our annotation task exists.
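Once SAM has produced a binary mask for a building (for example via the `segment_anything` library's `SamPredictor`, prompted with a point on the building), isolating that building reduces to a crop-and-blank step. A minimal sketch, assuming the mask is a boolean array aligned with the image; the function name and padding value are illustrative:

```python
import numpy as np

def isolate_building(image, mask, pad=8):
    """Crop an RGB image to the bounding box of a binary building
    mask (e.g. output of SAM), with a small padding margin, and
    zero out non-building pixels so only the structure remains."""
    ys, xs = np.nonzero(mask)
    y0 = max(0, ys.min() - pad)
    y1 = min(image.shape[0], ys.max() + pad + 1)
    x0 = max(0, xs.min() - pad)
    x1 = min(image.shape[1], xs.max() + pad + 1)
    crop = image[y0:y1, x0:x1].copy()
    crop[~mask[y0:y1, x0:x1].astype(bool)] = 0  # blank background
    return crop

# Synthetic example: a 20x30 "building" inside a 100x100 image.
img = np.full((100, 100, 3), 255, dtype=np.uint8)
bmask = np.zeros((100, 100), dtype=np.uint8)
bmask[30:50, 40:70] = 1
chip = isolate_building(img, bmask)
```

Blanking the background (rather than only cropping) keeps neighboring structures from leaking into the training chip, which matters when parcels sit close together.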
2. Parcel-Based Cropping
For drone images covering wide areas, we used property parcel boundaries to crop regions around individual buildings. This ensures each training sample focuses on a single structure with appropriate context. We applied an expanded bounding box (150% margin) around the parcel to include surrounding context that helps models reason about building boundaries.
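The expanded crop can be sketched as follows. We assume here that the "150% margin" means growing the parcel's bounding box to 1.5x its width and height about its center, clamped to the image bounds; the post does not pin down the exact convention, and the function name is illustrative:

```python
def expand_bbox(x0, y0, x1, y1, img_w, img_h, factor=1.5):
    """Scale a parcel bounding box about its center by `factor`
    (1.5 assumed for the pipeline's 150% expansion), clamping the
    result to the image so crops never index out of bounds."""
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * factor / 2.0
    half_h = (y1 - y0) * factor / 2.0
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(img_w, int(cx + half_w)), min(img_h, int(cy + half_h)))

# A 20x20 parcel box centered at (50, 50) in a 100x100 image
# grows to a 30x30 crop window around the same center.
crop_box = expand_bbox(40, 40, 60, 60, 100, 100)
```

The clamping step is what makes this safe for buildings near the edge of a drone capture, where the expanded window would otherwise extend past the image.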
3. Color Correction
Drone images from different sensors and lighting conditions exhibit dramatic color variation. Left uncorrected, this variation forces models to waste capacity on illumination invariance rather than learning roof structure.
We applied Gaussian-based color normalization, testing several parameter combinations. Through a Wilcoxon statistical test on human preference ratings, we found that a target distribution with mean=128 and standard deviation=64 produced the most natural-looking results while improving edge visibility—critical for detecting roof face boundaries.
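A per-channel version of this normalization, shifting and scaling each channel's pixel distribution toward the preferred mean=128, std=64 target, might look like the sketch below. The function name and the final clipping step are our assumptions; the actual pipeline may differ in detail:

```python
import numpy as np

def normalize_colors(image, target_mean=128.0, target_std=64.0):
    """Map each channel's pixel distribution to roughly Gaussian
    statistics (mean=128, std=64 -- the setting favored in the
    preference test), then clip back to the valid uint8 range."""
    img = image.astype(np.float32)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        ch = img[..., c]
        std = ch.std()
        scale = target_std / std if std > 1e-6 else 1.0
        out[..., c] = (ch - ch.mean()) * scale + target_mean
    return np.clip(out, 0.0, 255.0).astype(np.uint8)

# Deterministic example: each channel contains the values 0..255,
# so the input has mean 127.5 and std ~73.9 per channel.
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16, 1).repeat(3, axis=2)
corrected = normalize_colors(ramp)
```

Because the transform is a pure shift-and-scale, it standardizes brightness and contrast across sensors without altering the relative intensity gradients at roof edges, which is the structural signal the downstream model needs.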
Dataset Statistics
The final dataset (Export 17) contains 835,539 images:
- ~210K from AIRS (aerial/satellite)
- ~625K from drone imagery (after SAM processing and filtering)
- Buildings annotated with 1 to 15+ roof face points
- Coverage across diverse US residential areas
The dataset captures buildings ranging from simple 2-face gable roofs to complex multi-hip structures with 10+ faces. The distribution is heavily skewed toward simpler roofs (2–4 faces), reflecting the actual distribution of residential roof types, but includes enough complex examples to train robust models.
The total number of images in the database has reached 883,894, with ongoing annotation expanding coverage toward 1 million buildings.
Key Design Decisions
Several decisions shaped the dataset’s utility:
- Point annotation over polygon annotation: Enabled 100x faster annotation, making the 890K scale feasible with a modest annotation team.
- Mixed resolution sources: Combining standardized aerial imagery with noisy drone data ensures models trained on this dataset transfer to real-world deployment.
- SAM-based isolation: Automated building isolation removed the need for manual cropping, which would have been prohibitive at this scale.
- Color normalization: Reduced spurious variation while preserving the structural edge information that matters for roof face detection.