Federated Learning for Scalable and Privacy-Aware Soil Modeling

Soil data are often sensitive, fragmented, and difficult to share due to privacy concerns, ownership rights, and institutional or commercial restrictions. These barriers limit collaboration and slow progress in large-scale soil modeling. This project develops decentralized machine learning frameworks for soil property mapping and soil spectroscopy that enable collaborative model development without requiring the exchange of raw data. By leveraging federated learning, farms, laboratories, and research institutions can jointly train high-performing soil models while maintaining full control over their local datasets.

Technical approach

The framework applies federated learning (FL) to both satellite-based soil mapping and laboratory soil spectroscopy. Deep learning models, primarily convolutional neural networks, are trained locally at participating sites on heterogeneous datasets that vary across regions, sensors, and sampling strategies. Instead of transferring raw soil or spectral data, sites share only encrypted model updates with a coordinating server, which aggregates them into a global model. The system handles data that are not independent and identically distributed (non-IID) and unequal data volumes across contributors through aggregation strategies such as federated averaging (FedAvg), which weights each site's update by its number of training samples, and other weighted aggregation schemes. These strategies improve robustness, generalization, and local adaptability across diverse soil environments.
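To make the aggregation step concrete, here is a minimal sketch of federated averaging in plain Python. It assumes a linear model as a stand-in for the convolutional networks and three hypothetical sites with synthetic, covariate-shifted (non-IID) data of unequal size. The names `local_train`, `fedavg`, and `make_site` are illustrative, not part of the project's code, and encryption of the updates is omitted. In each round, every site trains from the current global parameters on its own data, and the server averages the returned parameters weighted by sample count.

```python
import random

def predict(params, x):
    """Linear stand-in for a soil model: y = w . x + b."""
    w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def local_train(params, data, lr=0.01, epochs=20):
    """Plain SGD on one site's data, starting from the current global params.
    Only the returned parameters leave the site, never `data` itself."""
    w, b = list(params[0]), params[1]
    for _ in range(epochs):
        for x, y in data:
            err = predict((w, b), x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return (w, b)

def fedavg(site_params, site_sizes):
    """Federated averaging: weight each site's parameters by its sample count."""
    total = sum(site_sizes)
    dim = len(site_params[0][0])
    w = [sum(p[0][j] * n for p, n in zip(site_params, site_sizes)) / total
         for j in range(dim)]
    b = sum(p[1] * n for p, n in zip(site_params, site_sizes)) / total
    return (w, b)

def make_site(n, shift, rng):
    """Hypothetical site whose feature distribution is shifted (non-IID) while
    sharing one underlying relationship y = 2*x0 - x1 + 0.5 + noise."""
    data = []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)]
        y = 2.0 * x[0] - 1.0 * x[1] + 0.5 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

rng = random.Random(0)
sites = [make_site(200, 0.0, rng), make_site(50, 1.5, rng), make_site(20, -1.0, rng)]

global_params = ([0.0, 0.0], 0.0)
for _ in range(10):  # communication rounds
    updates = [local_train(global_params, data) for data in sites]
    global_params = fedavg(updates, [len(data) for data in sites])

print(global_params)
```

Weighting by sample count keeps the 20-sample site from pulling the global model as strongly as the 200-sample site; an unweighted mean is the special case in which all sites hold the same amount of data.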

Expected outcomes

The project delivers privacy-preserving, decentralized soil models capable of predicting key soil properties, including soil organic carbon, texture, pH, and cation exchange capacity, with accuracy comparable to, or exceeding, that of traditional centralized approaches. Each participating farm or institution receives a fine-tuned local model optimized for its own conditions, while simultaneously contributing to the continuous improvement of a shared global model. This enables effective collaboration across both data-scarce and data-rich settings, without ever requiring contributors to relinquish ownership of or disclose their raw soil data.
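Local fine-tuning can be sketched as a few additional epochs of training that start from the shared global parameters and use only the farm's own data. The sketch below is a minimal illustration under stated assumptions: the same linear stand-in model, a hypothetical global model, and a synthetic farm whose soils follow a slightly different slope and offset. `fine_tune` and `sample_farm` are hypothetical names. The farm's data are split so that the comparison uses held-out samples rather than the training data.

```python
import random

def predict(params, x):
    """Linear stand-in for a soil model: y = w . x + b."""
    w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mse(params, data):
    """Mean squared error of a model on a list of (features, target) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(global_params, local_data, lr=0.01, epochs=30):
    """Copy the global model and adapt it with SGD on local data only;
    the global parameters themselves are left unchanged."""
    w, b = list(global_params[0]), global_params[1]
    for _ in range(epochs):
        for x, y in local_data:
            err = predict((w, b), x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return (w, b)

def sample_farm(n, rng):
    """Hypothetical farm: the soil property follows a slightly different
    slope and offset than the global model assumes."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = 2.3 * x[0] - 1.0 * x[1] + 1.2 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

# Hypothetical global model, e.g. the result of federated averaging.
global_params = ([2.0, -1.0], 0.5)
rng = random.Random(1)
farm_train = sample_farm(40, rng)
farm_test = sample_farm(20, rng)

local_params = fine_tune(global_params, farm_train)
print(f"global model, held-out MSE on farm: {mse(global_params, farm_test):.3f}")
print(f"fine-tuned model, held-out MSE on farm: {mse(local_params, farm_test):.3f}")
```

Because fine-tuning operates on a copy, the farm keeps its adapted model locally while the unchanged global parameters remain available for the next federated round.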

Impact

The project enables collaborative soil modeling while preserving data ownership and privacy. It reduces barriers to data sharing, improves model transferability across regions, and provides a scalable pathway for distributed soil monitoring and decision support for soil management at scale.

Outreach materials

Figure 1: Federated learning architecture for decentralized soil and spectral modeling, where local models are trained on heterogeneous datasets and synchronized through secure aggregation of model parameters rather than data exchange.
Figure 2: Satellite imagery is processed to generate bare-soil composites using cloud-based workflows and served through a geospatial data infrastructure. A global soil model is initially trained on reference datasets and subsequently refined through federated learning across distributed farms, where local fine-tuning occurs without sharing raw data. The resulting models are used locally to produce farm-scale maps of soil properties, such as soil organic carbon and clay content, while preserving data privacy and ownership.
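Figure 1 refers to secure aggregation of model parameters. One common way to realize this, used here in place of the encryption mentioned in the technical approach, is pairwise additive masking. Each pair of sites shares a random mask that one adds and the other subtracts, so the server can recover only the sum of the updates, not any individual one. The sketch below is a simplified illustration, not the project's protocol. It assumes pre-shared pairwise seeds, no site dropouts, sample counts known to the server, and fixed-point encoding, and all names are hypothetical.

```python
import random

MOD = 2 ** 64      # masked values live in the integers modulo MOD
SCALE = 10 ** 6    # fixed-point scale used to turn float parameters into integers

def encode(values):
    """Quantize floats to fixed-point integers modulo MOD."""
    return [round(v * SCALE) % MOD for v in values]

def decode(values):
    """Map integers modulo MOD back to signed floats."""
    return [(v - MOD if v >= MOD // 2 else v) / SCALE for v in values]

def pair_mask(seed, dim):
    """Pseudorandom mask that both members of a site pair can regenerate."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(dim)]

def mask_update(i, update, pair_seeds, n_clients):
    """Site i adds the mask shared with each higher-indexed peer and subtracts
    the mask shared with each lower-indexed peer, so all masks cancel in the sum."""
    masked = encode(update)
    for j in range(n_clients):
        if j == i:
            continue
        mask = pair_mask(pair_seeds[frozenset((i, j))], len(update))
        sign = 1 if i < j else -1
        masked = [(m + sign * k) % MOD for m, k in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """Server side: sum the masked vectors coordinate-wise modulo MOD."""
    return [sum(col) % MOD for col in zip(*masked_updates)]

# Hypothetical local parameters and sample counts for three sites.
params = [[2.1, -0.9], [1.8, -1.2], [2.4, -0.7]]
sizes = [200, 50, 20]
n = len(params)

# Assumption: pairwise seeds were agreed beforehand (real protocols derive them
# with a key-agreement step and also handle sites that drop out).
seed_rng = random.Random(42)
pair_seeds = {frozenset((i, j)): seed_rng.randrange(2 ** 63)
              for i in range(n) for j in range(i + 1, n)}

# Each site scales its parameters by its sample count before masking, so the
# unmasked sum is the numerator of the FedAvg weighted average.
masked = [mask_update(i, [sizes[i] * p for p in params[i]], pair_seeds, n)
          for i in range(n)]
weighted_sum = decode(aggregate(masked))
global_params = [s / sum(sizes) for s in weighted_sum]
print(global_params)
```

The server only ever sees masked vectors, yet their sum equals the sum of the count-weighted parameters, which is exactly what weighted federated averaging needs.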