Property Intelligence from Satellite and Drone Imagery
The Goal: Automated Property Reports
Property inspection—for insurance underwriting, real estate appraisal, or damage assessment—traditionally requires a human to visit the site, photograph the property, and manually catalog its features. This is slow, expensive, and scales poorly across portfolios of thousands of properties.
The promise of automated property intelligence is to extract the same information from overhead imagery: identify the buildings on a parcel, delineate trees and vegetation, and then analyze specific property features like roof type, condition, and surrounding hazards. The two primary imagery sources are satellite (typically 30 cm ground sample distance) and drone (typically 10–23 cm GSD), each with distinct tradeoffs in coverage, cost, and resolution.
The Two-Stage Pipeline
Our approach breaks the problem into two stages:
Stage 1: Identify Key Objects
The first task is semantic segmentation—labeling every pixel in the image as belonging to a house, tree, driveway, pool, or background. This provides the spatial layout of the property and isolates the regions of interest for downstream analysis.
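As a concrete (toy) illustration, a segmentation mask is just an integer array with one class id per pixel; the class ids below are hypothetical rather than from a published label set:

```python
import numpy as np

# Hypothetical class ids for the property classes named above.
CLASSES = {0: "background", 1: "house", 2: "tree", 3: "driveway", 4: "pool"}

# A toy 4x4 "image": a 2x2 house patch with trees along the top edge.
mask = np.array([
    [0, 2, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

# Isolating a region of interest for downstream analysis is a boolean test.
house_pixels = mask == 1
print(f"house covers {house_pixels.sum()} of {mask.size} pixels")
```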
Stage 2: Analyze Property Features
Once buildings and trees are segmented, we extract property-level attributes: roof area, vegetation encroachment, proximity of trees to structures, and the presence of specific features like pools or detached garages. These attributes feed directly into property reports used by insurers and appraisers.
Zero-Shot Baseline: SAM-GEO
Before investing in custom model training, we established a baseline using SAM-GEO, a geospatial adaptation of Meta’s Segment Anything Model (SAM). SAM-GEO applies SAM’s zero-shot segmentation capability to overhead imagery, requiring no task-specific training data.
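Running this baseline is only a few lines with the open-source segment-geospatial package. A minimal sketch, with placeholder file paths; argument names may differ across samgeo versions:

```python
# pip install segment-geospatial
# A rough sketch of the zero-shot baseline; file paths are placeholders and
# argument names may differ across samgeo versions.
from samgeo import SamGeo

sam = SamGeo(
    model_type="vit_h",  # largest SAM backbone
    automatic=True,      # automatic mask generation, no manual prompts
)

# Segment everything in an orthorectified scene and write the masks out as a
# georeferenced raster for downstream filtering into houses and trees.
sam.generate(source="parcel_ortho.tif", output="parcel_masks.tif")
```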
The key question was: how well does a foundation model designed for general segmentation handle the specific task of identifying houses and trees in overhead imagery?
SAM-GEO Performance by Resolution
The results revealed a stark resolution dependency:
- Drone resolution (10–23 cm GSD): SAM-GEO detected 93–95% of houses and trees. At this resolution, buildings have clear boundaries, roofs show texture and color distinct from the surrounding ground, and trees cast recognizable shadow patterns. The zero-shot approach works remarkably well.
- Satellite resolution (30 cm GSD): Performance dropped to 55–60% detection. At 30 cm, small structures begin to blur into their surroundings, tree canopies merge, and the reduced detail makes it harder for a general-purpose model to distinguish buildings from other rectangular features.
This gap of roughly 35 percentage points between drone and satellite resolution motivated our decision to fine-tune dedicated models, particularly for satellite imagery, where zero-shot performance is insufficient for production use.
Fine-Tuned Models: Pushing Accuracy Higher
Starting from the SAM-GEO baseline annotations as seed labels (corrected and refined by human annotators), we trained supervised segmentation models to close the accuracy gap.
Architecture Selection
We evaluated several semantic segmentation architectures, ultimately focusing on SegFormer—a transformer-based architecture that combines the global context modeling of self-attention with the computational efficiency needed for high-resolution geospatial imagery.
SegFormer uses a hierarchical encoder (the Mix Transformer, or MiT) that processes images at multiple scales, naturally capturing both fine-grained boundary details and broader spatial context. This multi-scale representation is particularly well-suited to overhead imagery, where the same scene contains both large structures (building footprints) and fine details (roof edges, individual trees).
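A sketch of the fine-tuning setup, assuming the Hugging Face transformers implementation of SegFormer and the five illustrative property classes used earlier; "nvidia/mit-b0" is the publicly released ImageNet-pretrained encoder, and the full training loop is elided:

```python
# Fine-tuning skeleton for SegFormer with the MiT-B0 encoder. The decode
# head is newly initialized on top of the pretrained encoder weights.
import torch
from transformers import SegformerForSemanticSegmentation

id2label = {0: "background", 1: "house", 2: "tree", 3: "driveway", 4: "pool"}

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    num_labels=len(id2label),
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)

# One step on a dummy 512x512 tile; a real run would iterate over annotated
# drone/satellite tiles with an optimizer and learning-rate schedule.
pixel_values = torch.randn(1, 3, 512, 512)
labels = torch.randint(0, len(id2label), (1, 512, 512))
outputs = model(pixel_values=pixel_values, labels=labels)
outputs.loss.backward()
```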
Results on Drone Imagery
Using the SegFormer architecture with the MiT-B0 encoder (the smallest variant), we achieved strong results on drone imagery at 10–15 cm GSD:
| Metric | Score |
|---|---|
| Mean IoU (mIoU) | 95.85 |
| Pixel Accuracy | 97.88% |
These numbers represent near-human performance on the house detection task at drone resolution. An mIoU of 95.85 means that, averaged over classes, the predicted and ground-truth masks agree on over 95% of their combined area, more than sufficient for generating accurate property reports.
Even the lightweight MiT-B0 encoder achieved these results, suggesting that the task at drone resolution is well within the capacity of modern segmentation architectures. Larger encoder variants (B2, B3, B5) provided marginal improvements while significantly increasing inference cost.
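For reference, the mIoU reported above averages per-class intersection-over-union. A minimal NumPy implementation of the metric:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))
```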
Resolution: The Critical Variable
The single most important factor in property segmentation accuracy is ground sample distance. Our experiments across different resolution bands tell a clear story:
Drone Imagery (10–15 cm GSD)
At this resolution, individual shingles are sometimes visible, roof edges are sharp, and the distinction between buildings and ground is unambiguous. Both zero-shot (SAM-GEO) and fine-tuned models perform well. The main challenges are occlusion from trees and shadow-induced boundary ambiguity.
Drone Imagery (15–23 cm GSD)
Performance remains strong but begins to degrade on smaller structures (sheds, detached garages) and in areas with dense vegetation. Fine-tuned models maintain an advantage over zero-shot approaches.
Satellite Imagery (30 cm GSD)
This is the resolution threshold at which zero-shot models struggle significantly. Fine-tuned models recover much of the lost accuracy, but small structures and properties under heavy tree cover remain challenging. At 30 cm, a typical residential building occupies only 50–100 pixels across its longest dimension, leaving little margin for boundary errors.
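The pixel budget follows directly from the GSD: the footprint in pixels is the building dimension divided by the pixel size. A quick check, assuming a 15 m building edge for illustration:

```python
# Pixels spanned by a 15 m building edge at each resolution band discussed
# above. The 15 m figure is an assumption for illustration.
building_m = 15.0
for gsd_cm in (10, 15, 23, 30):
    pixels = building_m / (gsd_cm / 100.0)
    print(f"{gsd_cm:2d} cm GSD -> {pixels:5.0f} px")
# 10 cm GSD -> 150 px; 30 cm GSD -> 50 px: a 3x loss of linear detail.
```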
From Segmentation to Intelligence
Raw segmentation masks—pixel-level labels of “house” and “tree”—are just the starting point. Converting these into actionable property intelligence requires several additional steps:
- Instance separation: When multiple buildings are adjacent or overlapping in the segmentation mask, they must be separated into individual structures. Connected component analysis with morphological operations handles most cases, but tightly packed townhouses and L-shaped buildings require more sophisticated approaches (a sketch combining this step with the next two follows the list).
- Area computation: Converting pixel counts to real-world square footage requires accurate knowledge of the image’s ground sample distance and projection. For drone imagery with known flight parameters, this is straightforward; for satellite imagery, it depends on the sensor’s orthorectification accuracy.
- Tree-to-structure proximity: Insurance companies care about overhanging branches and trees that could fall on a structure. Computing this requires not just detecting trees and buildings, but estimating the height and canopy spread of each tree—information that is partially available from shadow analysis at a single time step, or more accurately from stereo or LiDAR data.
- Temporal change detection: Comparing segmentation results across time reveals new construction, demolished structures, and vegetation changes—all relevant for keeping property records current.
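The sketch below strings the first three steps together with SciPy, under two illustrative assumptions: the hypothetical class ids from earlier and a 10 cm GSD. It is deliberately simplified (2-D proximity only, no tree-height estimation):

```python
# A sketch of extracting property attributes from a class mask with SciPy.
# The class ids and the 0.10 m GSD are illustrative assumptions.
import numpy as np
from scipy import ndimage

HOUSE, TREE = 1, 2
GSD_M = 0.10  # ground sample distance, meters per pixel (drone-range value)

def property_attributes(mask: np.ndarray) -> dict:
    houses = mask == HOUSE
    trees = mask == TREE

    # Instance separation: morphological opening breaks thin bridges between
    # adjacent buildings, then connected-component labeling splits instances.
    opened = ndimage.binary_opening(houses, structure=np.ones((3, 3)))
    labeled, n_buildings = ndimage.label(opened)

    # Area computation: pixel count per instance times the ground area of
    # one pixel (GSD squared), assuming square, orthorectified pixels.
    counts = np.bincount(labeled.ravel())[1:]  # drop the background bin
    areas_m2 = counts * GSD_M**2

    # Tree-to-structure proximity: Euclidean distance from every pixel to
    # the nearest tree pixel, sampled at building pixels. 2-D only; real
    # overhang analysis also needs tree height (stereo, LiDAR, or shadows).
    min_gap_m = None
    if trees.any() and houses.any():
        dist_to_tree = ndimage.distance_transform_edt(~trees) * GSD_M
        min_gap_m = float(dist_to_tree[houses].min())

    return {
        "n_buildings": int(n_buildings),
        "building_areas_m2": areas_m2.tolist(),
        "min_tree_gap_m": min_gap_m,
    }
```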
Lessons Learned
- Zero-shot models are a powerful starting point. SAM-GEO’s 93–95% detection rate on drone imagery means that for high-resolution use cases, a production system can be bootstrapped without any labeled training data.
- Resolution dictates methodology. At drone resolution, the problem is largely solved by existing architectures. At satellite resolution, domain-specific fine-tuning is essential, and even then, small structures remain difficult.
- Small models suffice at high resolution. SegFormer MiT-B0 achieved 95.85 mIoU on drone imagery, demonstrating that inference cost can be kept low for drone-based property analysis pipelines.
- Segmentation is necessary but not sufficient. The real value lies in the property-level attributes extracted from segmentation masks—area, proximity, change over time—not in the masks themselves.