Paul-Henry Cournéde

Publications

1 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Particle Swarm Optimization, Sensitivity Analysis, Crop Model, random forest, crop yield prediction, paramater estimation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 443

Abstracts

2 Bayesian Estimation of Hierarchical Models for Genotypic Differentiation of Arabidopsis thaliana

Authors: Paul-Henry Cournéde, Gautier Viaud

Abstract:

Plant growth models have been used extensively for the prediction of the phenotypic performance of plants. However, they remain most often calibrated for a given genotype and therefore do not take into account genotype by environment interactions. One way of achieving such an objective is to consider Bayesian hierarchical models. Three levels can be identified in such models: The first level describes how a given growth model describes the phenotype of the plant as a function of individual parameters, the second level describes how these individual parameters are distributed within a plant population, the third level corresponds to the attribution of priors on population parameters. Thanks to the Bayesian framework, choosing appropriate priors for the population parameters permits to derive analytical expressions for the full conditional distributions of these population parameters. As plant growth models are of a nonlinear nature, individual parameters cannot be sampled explicitly, and a Metropolis step must be performed. This allows for the use of a hybrid Gibbs--Metropolis sampler. A generic approach was devised for the implementation of both general state space models and estimation algorithms within a programming platform. It was designed using the Julia language, which combines an elegant syntax, metaprogramming capabilities and exhibits high efficiency. Results were obtained for Arabidopsis thaliana on both simulated and real data. An organ-scale Greenlab model for the latter is thus presented, where the surface areas of each individual leaf can be simulated. It is assumed that the error made on the measurement of leaf areas is proportional to the leaf area itself; multiplicative normal noises for the observations are therefore used. Real data were obtained via image analysis of zenithal images of Arabidopsis thaliana over a period of 21 days using a two-step segmentation and tracking algorithm which notably takes advantage of the Arabidopsis thaliana phyllotaxy. Since the model formulation is rather flexible, there is no need that the data for a single individual be available at all times, nor that the times at which data is available be the same for all the different individuals. This allows to discard data from image analysis when it is not considered reliable enough, thereby providing low-biased data in large quantity for leaf areas. The proposed model precisely reproduces the dynamics of Arabidopsis thaliana’s growth while accounting for the variability between genotypes. In addition to the estimation of the population parameters, the level of variability is an interesting indicator of the genotypic stability of model parameters. A promising perspective is to test whether some of the latter should be considered as fixed effects.

Keywords: Bayesian, Hierarchical models, genotypic differentiation, plant growth models

Procedia PDF Downloads 200
1 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Particle Swarm Optimization, Sensitivity Analysis, Crop Model, random forest, crop yield prediction, paramater estimation

Procedia PDF Downloads 110