Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournède


Crop yield prediction is a paramount issue in agriculture. The main aim of this paper is to find an efficient way to predict corn yield from meteorological records. The prediction models used in this paper fall into two groups according to their modeling methodology: model-driven approaches and data-driven approaches. Model-driven approaches are based on mechanistic crop modeling: they describe crop growth in interaction with its environment as a dynamical system. Calibrating such a dynamical system is difficult, however, because it amounts to a multidimensional non-convex optimization problem. An original contribution of this paper is a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for parametrizing potentially complex mechanistic models from a new type of dataset (climatic data and final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. Data-driven approaches to yield prediction, on the other hand, do not require modeling the complex biophysical processes, but they place strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA), together with the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the yield prediction capacity.
The results show that, among the data-driven approaches, Random Forest is the most robust and generally achieves the lowest prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method for calibrating the mechanistic model from easily accessible datasets opens several side perspectives. The mechanistic model can potentially help to highlight the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine the two types of approaches.
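
The evaluation protocol summarized above (5-fold cross-validation scored with RMSEP and MAEP) can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, the feature names are hypothetical, and k-Nearest Neighbor stands in as the regressor only because it is one of the listed methods and needs no external library. The abstract reports errors in percent but does not give the formulas, so the normalizations below (RMSE divided by the mean observed yield, and mean absolute relative error) are assumptions.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmsep(y_true, y_pred):
    """RMSE of prediction, in percent of the mean observed value (assumed)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * math.sqrt(mse) / (sum(y_true) / len(y_true))


def maep(y_true, y_pred):
    """Mean absolute relative error of prediction, in percent (assumed)."""
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def cross_validate(xs, ys, n_folds=5):
    """Predict every record from a model fitted on the other folds."""
    observed, predicted = [], []
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in fold:
            observed.append(ys[i])
            predicted.append(knn_predict(train_x, train_y, xs[i]))
    return rmsep(observed, predicted), maep(observed, predicted)


def make_synthetic_records(n=100, seed=1):
    """Hypothetical records: (temperature index, rainfall index) -> yield."""
    rng = random.Random(seed)
    xs = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n)]
    ys = [50 + 2.0 * a - 1.0 * b + rng.gauss(0, 1) for a, b in xs]
    return xs, ys


xs, ys = make_synthetic_records()
cv_rmsep, cv_maep = cross_validate(xs, ys)
print(f"RMSEP = {cv_rmsep:.2f}%, MAEP = {cv_maep:.2f}%")
```

Pooling the held-out predictions from all folds before computing the metrics is one common convention. Averaging per-fold scores is another, and the abstract does not say which one was used.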

Keywords: Crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest.


