A Parallel Approach for 3D-Variational Data Assimilation on GPUs in Ocean Circulation Models

Authors: Rossella Arcucci, Luisa D’Amore, Simone Celestino, Giuseppe Scotti, Giuliano Laccetti


This work is the first step in a broad research activity, carried out in collaboration with the Euro Mediterranean Center for Climate Changes, aimed at introducing scalable approaches in Ocean Circulation Models. We discuss the design and implementation of a parallel algorithm for solving the Variational Data Assimilation (DA) problem on Graphics Processing Units (GPUs). The algorithm is based on the fully scalable 3DVar DA model, previously proposed by the authors, which uses a Domain Decomposition approach (we refer to this model as the DD-DA model). We proceed with an incremental porting process consisting of three distinct stages: requirements and source code analysis, incremental development of CUDA kernels, and testing and optimization. Experiments confirm the theoretical performance analysis based on the so-called scale-up factor, demonstrating that the DD-DA model can be suitably mapped onto GPU architectures.

Keywords: Data Assimilation, ocean models, parallel algorithm, GPU architectures



