Search results for: microarray datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 286

16 Reducing the Imbalance Penalty through Artificial Intelligence Methods Geothermal Production Forecasting: A Case Study for Turkey

Authors: H. Anıl, G. Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries with promising potential in geothermal energy production, owing to its high installed power, low cost, and sustainability. Increasing imbalance penalties become an economic burden for organizations, since geothermal generation plants cannot maintain the balance of supply and demand due to the inadequacy of the production forecasts submitted to the day-ahead market. A better production forecast reduces the imbalance penalties of market participants and provides a better balance in the day-ahead market. In this study, using machine learning, deep learning and time series methods, the total generation of the power plants belonging to Zorlu Doğal Electricity Generation, which has a high installed geothermal capacity, was predicted for the first week and the first two weeks of March; the imbalance penalties were then calculated with these estimates and compared with the real values. These modeling operations were carried out on two datasets: the basic dataset and a dataset created by extracting new features from it through feature engineering. According to the results, Support Vector Regression, a traditional machine learning model, outperformed the other models. In addition, the estimation results on the feature engineering dataset showed lower error rates than those on the basic dataset. It was concluded that the estimated imbalance penalty calculated for the selected organization is lower than the actual imbalance penalty, and is therefore the more favorable and profitable outcome.

Keywords: Machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting.

Downloads: 204
15 Data Privacy and Safety with Large Language Models

Authors: Ashly Joseph, Jithu Paulose

Abstract:

Large language models (LLMs) have revolutionized natural language processing capabilities, enabling applications such as chatbots, dialogue agents, and image and video generators. Nevertheless, their training on extensive datasets comprising personal information poses notable privacy and safety hazards. This study examines methods for addressing these challenges, specifically focusing on approaches to enhance the security of LLM outputs, safeguard user privacy, and adhere to data protection rules. We explore several methods including post-processing detection algorithms, content filtering, reinforcement learning from human and AI inputs, and the difficulties in maintaining a balance between model safety and performance. The study also emphasizes the dangers of unintentional data leakage, privacy issues related to user prompts, and the possibility of data breaches. We highlight the significance of corporate data governance rules and optimal methods for engaging with chatbots. In addition, we analyze the development of data protection frameworks, evaluate the adherence of LLMs to the General Data Protection Regulation (GDPR), and examine privacy legislation in academic and business policies. Using case studies and real-life instances, we demonstrate the difficulties and remedies involved in preserving data privacy and security in the age of sophisticated artificial intelligence. This article seeks to educate stakeholders on practical strategies for improving the security and privacy of LLMs, while also assuring their responsible and ethical implementation.

Keywords: Data privacy, large language models, artificial intelligence, machine learning, cybersecurity, general data protection regulation, data safety.

Downloads: 106
14 Non-Invasive Data Extraction from Machine Display Units Using Video Analytics

Authors: Ravneet Kaur, Joydeep Acharya, Sudhanshu Gaur

Abstract:

Artificial Intelligence (AI) has the potential to transform manufacturing by improving shop floor processes such as production, maintenance and quality. However, industrial datasets are notoriously difficult to extract in a real-time, streaming fashion, thus negating potential AI benefits. A prime example is specialized industrial controllers operated by custom software, which complicates the process of connecting them to an Information Technology (IT) based data acquisition network. Security concerns may also limit direct physical access to these controllers for data acquisition. To connect the Operational Technology (OT) data stored in these controllers to an AI application in a secure, reliable and available way, we propose a novel Industrial IoT (IIoT) solution in this paper. In this solution, we demonstrate how video cameras can be installed on a factory shop floor to continuously obtain images of the controller HMIs. We propose image pre-processing to segment the HMI into regions of streaming data and regions of fixed meta-data. We then evaluate the performance of multiple Optical Character Recognition (OCR) technologies, such as Tesseract and Google Vision, in recognizing the streaming data, and test them on typical factory HMIs under realistic lighting conditions. Finally, we use the meta-data to match the OCR output with the temporal, domain-dependent context of the data to improve the accuracy of the output. Our IIoT solution enables reliable and efficient data extraction, which will improve the performance of subsequent AI applications.

Keywords: Human machine interface, industrial internet of things, internet of things, optical character recognition, video analytic.

Downloads: 739
13 Sea Level Characteristics Referenced to Specific Geodetic Datum in Alexandria, Egypt

Authors: Ahmed M. Khedr, Saad M. Abdelrahman, Kareem M. Tonbol

Abstract:

Two geo-referenced sea level datasets (September 2008 – November 2010 and April 2012 – January 2014) were recorded at Alexandria Western Harbour (AWH). An accurate re-definition of the tidal datum, referred to the latest International Terrestrial Reference Frame (ITRF-2014), was discussed and updated to improve our understanding of the old predefined tidal datum at Alexandria. Tidal and non-tidal components of sea level were separated using the Delft-3D hydrodynamic model tide suite (Delft-3D, 2015). Tidal characteristics at AWH were investigated, and harmonic analysis showed the 34 most significant constituents with their amplitudes and phases. The tide was identified as semi-diurnal, as indicated by a “Form Factor” of 0.24 and 0.25 for the two datasets, respectively. Principal tidal datums related to major tidal phenomena were recalculated with reference to a meaningful geodetic height datum. The portion of residual energy (surge) out of the total sea level energy was computed for each dataset and found to be 77% and 72%, respectively. Power spectral density (PSD) showed accurate resolvability in the high band (1–6 cycles/day) for the nominated independent constituents, except for some neighbouring constituents that are too close in frequency. Wind and atmospheric pressure data recorded during the sea level observation periods were analysed and cross-correlated with the surge signals. A moderate association between surge and the wind and atmospheric pressure data was obtained. In addition, the long-term sea level rise trend at AWH was computed and showed good agreement with earlier estimated rates.
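The form factor quoted above is conventionally computed from the amplitudes of four harmonic constituents as F = (K1 + O1) / (M2 + S2). The sketch below illustrates that calculation and a commonly used classification. The amplitudes in the usage lines are hypothetical values chosen only to give F near 0.24; they are not taken from the paper. The boundary at 0.25 is treated here as inclusive so that both reported values fall in the semi-diurnal class, which is an assumption about the convention the authors applied.

```python
def tidal_form_factor(k1, o1, m2, s2):
    """Return F = (K1 + O1) / (M2 + S2) from constituent amplitudes (same units)."""
    return (k1 + o1) / (m2 + s2)


def classify_tide(form_factor):
    """Classify the tidal regime from the form factor.

    Thresholds follow the widely used 0.25 / 1.5 / 3.0 scheme; treating
    0.25 as inclusive is an assumption made for this sketch.
    """
    if form_factor <= 0.25:
        return "semi-diurnal"
    if form_factor <= 1.5:
        return "mixed, mainly semi-diurnal"
    if form_factor <= 3.0:
        return "mixed, mainly diurnal"
    return "diurnal"


# Hypothetical amplitudes in metres (illustration only, not the AWH results).
f = tidal_form_factor(k1=0.016, o1=0.011, m2=0.070, s2=0.042)
print(round(f, 2), classify_tide(f))
```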

Keywords: Alexandria, Delft-3D, Egypt, geodetic reference, harmonic analysis, sea level.

Downloads: 1360
12 A Grid-based Neural Network Framework for Multimodal Biometrics

Authors: Sitalakshmi Venkataraman

Abstract:

Recent scientific investigations indicate that multimodal biometrics overcome the technical limitations of unimodal biometrics, making them ideally suited for everyday applications that require a reliable authentication system. However, for successful adoption of multimodal biometrics, such systems would require large heterogeneous datasets with complex multimodal fusion and privacy schemes spanning various distributed environments. From experimental investigations of current multimodal systems, this paper reports the various issues related to speed, error-recovery and privacy that impede the diffusion of such systems in real life. This calls for a robust mechanism that caters to the desired real-time performance, robust fusion schemes, interoperability and adaptable privacy policies. The main objective of this paper is to present a framework that addresses the abovementioned issues by leveraging the heterogeneous resource sharing capacities of Grid services and the efficient machine learning capabilities of artificial neural networks (ANN). Hence, this paper proposes a Grid-based neural network framework for adopting multimodal biometrics with a view to overcoming the barriers of performance, privacy and risk that are associated with shared heterogeneous multimodal data centres. The framework combines the concept of Grid services for reliable brokering and privacy policy management of shared biometric resources with a momentum back propagation ANN (MBPANN) model of machine learning for efficient multimodal fusion and authentication schemes. Real-life applications would be able to adopt the proposed framework to cater to varying business requirements and user privacy needs for a successful diffusion of multimodal biometrics in various day-to-day transactions.

Keywords: Back Propagation, Grid Services, Multimodal Biometrics, Neural Networks.

Downloads: 1917
11 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. The decision tree approach is most useful in classification problems. With this technique, a tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that applies rulesets, cross-validation and boosting to the original C5.0 in order to reduce the error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of a medical data set, hypothyroid. It is shown that the performance of a classifier on the training cases from which it was constructed gives a poor estimate of its accuracy on new cases. Accuracy can instead be estimated by sampling or by using a separate test file; either way, the classifier is evaluated on cases that were not used to build it, and the estimate is reliable only when the numbers of cases used to build and to evaluate the classifier are both large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772-case training set and a 1000-case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of See5 is its ability to generate classifiers called rulesets. The ruleset has an error rate of 0.5% on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive accuracy is by f-fold cross-validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.
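The f-fold cross-validation estimate described above (total errors on the hold-out cases divided by the total number of cases) can be sketched as follows. This is a minimal illustration, not See5 or C5.0: the `train_majority` function is a hypothetical stand-in classifier that always predicts the most frequent training class, and the toy labels are invented for the example.

```python
import random
from collections import Counter


def train_majority(train_cases, train_labels):
    """Hypothetical stand-in for a real learner: always predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda case: majority


def cross_validation_error(cases, labels, folds, train, seed=0):
    """Estimate the error rate by f-fold cross-validation.

    Each case is held out exactly once; the estimate is the total number of
    hold-out errors divided by the total number of cases.
    """
    indices = list(range(len(cases)))
    random.Random(seed).shuffle(indices)
    blocks = [indices[i::folds] for i in range(folds)]  # near-equal blocks
    total_errors = 0
    for held_out in blocks:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        classifier = train([cases[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        total_errors += sum(classifier(cases[i]) != labels[i] for i in held_out)
    return total_errors / len(cases)


# Toy data: 90 "negative" and 10 "hypothyroid" cases (invented for illustration).
toy_cases = list(range(100))
toy_labels = ["negative"] * 90 + ["hypothyroid"] * 10
error_rate = cross_validation_error(toy_cases, toy_labels, folds=10, train=train_majority)
print(error_rate)
```

Because every case is held out exactly once, the estimate uses all available cases for evaluation while never testing a classifier on a case it was trained on.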

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Downloads: 2489
10 SAF: A Substitution and Alignment Free Similarity Measure for Protein Sequences

Authors: Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski

Abstract:

The literature reports a large number of approaches for measuring the similarity between protein sequences. Most of these approaches estimate this similarity using alignment-based techniques that do not necessarily yield biologically plausible results, for two reasons. First, for the case of non-alignable (i.e., not yet definitively aligned and biologically approved) sequences such as multi-domain, circular permutation and tandem repeat protein sequences, alignment-based approaches do not succeed in producing biologically plausible results. This is due to the nature of the alignment, which is based on the matching of subsequences in equivalent positions, while non-alignable proteins often have similar and conserved domains in non-equivalent positions. Second, the alignment-based approaches lead to similarity measures that depend heavily on the parameters set by the user for the alignment (e.g., gap penalties and substitution matrices). For easily alignable protein sequences, it's possible to supply a suitable combination of input parameters that allows such an approach to yield biologically plausible results. However, for difficult-to-align protein sequences, supplying different combinations of input parameters yields different results. Such variable results create ambiguities and complicate the similarity measurement task. To overcome these drawbacks, this paper describes a novel and effective approach for measuring the similarity between protein sequences, called SAF for Substitution and Alignment Free. Without resorting either to the alignment of protein sequences or to substitution relations between amino acids, SAF is able to efficiently detect the significant subsequences that best represent the intrinsic properties of protein sequences, those underlying the chronological dependencies of structural features and biochemical activities of protein sequences. 
Moreover, by using a new efficient subsequence matching scheme, SAF more efficiently handles protein sequences that contain similar structural features with significant meaning in chronologically non-equivalent positions. To show the effectiveness of SAF, extensive experiments were performed on protein datasets from different databases, and the results were compared with those obtained by several mainstream algorithms.

Keywords: Protein, Similarity, Substitution, Alignment.

Downloads: 1410
9 Modeling Stress-Induced Regulatory Cascades with Artificial Neural Networks

Authors: Maria E. Manioudaki, Panayiota Poirazi

Abstract:

Yeast cells live in a constantly changing environment that requires the continuous adaptation of their genomic program in order to sustain their homeostasis, survive and proliferate. Due to the advancement of high-throughput technologies, there is currently a large amount of data, such as gene expression, gene deletion and protein-protein interactions, for S. cerevisiae under various environmental conditions. Mining these datasets requires efficient computational methods capable of integrating different types of data, identifying inter-relations between different components and inferring functional groups or 'modules' that shape intracellular processes. This study uses computational methods to delineate some of the mechanisms used by yeast cells to respond to environmental changes. The GRAM algorithm is first used to integrate gene expression data and ChIP-chip data in order to find modules of co-expressed and co-regulated genes as well as the transcription factors (TFs) that regulate these modules. Since transcription factors are themselves transcriptionally regulated, a three-layer regulatory cascade consisting of the TF-regulators, the TFs and the regulated modules is subsequently considered. This three-layer cascade is then modeled quantitatively using artificial neural networks (ANNs), where the input layer corresponds to the expression of the up-stream transcription factors (TF-regulators) and the output layer corresponds to the expression of genes within each module. This work (a) shows that the expression of at least 33 genes over time and for different stress conditions is well predicted by the expression of the top-layer transcription factors, including cases in which the effect of up-stream regulators is shifted in time, and (b) identifies at least 6 novel regulatory interactions that were not previously associated with stress-induced changes in gene expression. 
These findings suggest that the combination of gene expression and protein-DNA interaction data with artificial neural networks can successfully model biological pathways and capture quantitative dependencies between distant regulators and downstream genes.

Keywords: gene modules, artificial neural networks, yeast, stress

Downloads: 1465
8 Comparison of Data Reduction Algorithms for Image-Based Point Cloud Derived Digital Terrain Models

Authors: M. Uysal, M. Yilmaz, I. Tiryakioğlu

Abstract:

A Digital Terrain Model (DTM) is a digital numerical representation of the Earth's surface. DTMs have been applied to a diverse range of tasks, such as urban planning, military applications, glacier mapping and disaster management. Expressing the Earth's surface as a mathematical model would require an infinite number of point measurements. Because this is impossible, points at regular intervals are measured to characterize the Earth's surface, and a DTM of the Earth is generated. Hitherto, classical measurement techniques and photogrammetry have been widely used in the construction of DTMs. At present, RADAR, LiDAR, and stereo satellite images are also used for the construction of DTMs. In recent years, especially because of its advantages, Airborne Light Detection and Ranging (LiDAR) has seen increased use in DTM applications. A 3D point cloud is created with LiDAR technology by obtaining numerous point data. Recently, however, with the development of image mapping methods, the use of unmanned aerial vehicles (UAVs) for photogrammetric data acquisition has increased DTM generation from image-based point clouds. The accuracy of a DTM depends on various factors such as the data collection method, the distribution of elevation points, the point density, the properties of the surface and the interpolation method. In this study, the random data reduction method is evaluated for DTMs generated from image-based point cloud data. The original image-based point cloud data set (100%) is reduced to a series of subsets by using a random algorithm, representing 75, 50, 25 and 5% of the original image-based point cloud data set. With the ANS campus of Afyon Kocatepe University as the test area, the DTM constructed from the original image-based point cloud data set is compared with DTMs interpolated from the reduced data sets by the Kriging interpolation method. 
The results show that the random data reduction method can be used to reduce the image based point cloud datasets to 50% density level while still maintaining the quality of DTM.
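The random reduction step described above can be sketched as drawing a fixed fraction of the points without replacement. This is a minimal illustration under stated assumptions: the synthetic `cloud` below is invented for the example, the function name `random_reduce` is hypothetical, and the paper's actual sampling tool and the subsequent Kriging interpolation are not reproduced.

```python
import random


def random_reduce(points, keep_fraction, seed=0):
    """Return a random subset holding `keep_fraction` of the points (no replacement)."""
    keep = round(len(points) * keep_fraction)
    return random.Random(seed).sample(points, keep)


# Synthetic (x, y, z) cloud of 1000 points, for illustration only.
cloud = [(float(x), float(y), 100.0 + 0.1 * x) for x in range(40) for y in range(25)]

# Density levels taken from the abstract: 75, 50, 25 and 5% of the original.
subsets = {pct: random_reduce(cloud, pct / 100) for pct in (75, 50, 25, 5)}
for pct, subset in subsets.items():
    print(pct, len(subset))
```

Each subset would then be interpolated and compared against the DTM built from the full cloud; fixing the seed keeps the comparison repeatable.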

Keywords: DTM, unmanned aerial vehicle, UAV, random, Kriging.

Downloads: 810
7 Evaluation of Video Quality Metrics and Performance Comparison on Contents Taken from Most Commonly Used Devices

Authors: Pratik Dhabal Deo, Manoj P.

Abstract:

With the increasing number of social media users, the amount of video content available has also significantly increased. Currently, the number of smartphone users is at its peak, and many are increasingly using their smartphones as their main photography and recording devices. There have been many developments in the field of video quality assessment in recent years, and more research on various other aspects of video and image is being done. Datasets that contain a huge number of videos from different high-end devices make it difficult to analyze the performance of the metrics on content from the most used devices, even if they contain content taken in poor lighting conditions using lower-end devices. These devices face a lot of distortions due to various factors, since the spectrum of content recorded on them is huge. In this paper, we present an analysis of objective Video Quality Analysis (VQA) metrics on content taken only from the most used devices and their performance on it, focusing on full-reference metrics. To carry out this research, we created a custom dataset containing a total of 90 videos taken from the three most commonly used devices: an Android smartphone, an iOS smartphone and a Digital Single-Lens Reflex (DSLR) camera. On the videos taken on each of these devices, the six most common types of distortions that users face have been applied, in addition to the already existing H.264 compression, based on four reference videos. These six applied distortions have three levels of degradation each. The five most popular VQA metrics have been evaluated on this dataset, and the highest and lowest values of each metric on the distortions have been recorded. Finally, it is found that blur is the artifact on which most of the metrics did not perform well. 
Thus, in order to understand the results better, the amount of blur in the data set has been calculated and an additional evaluation of the metrics was done using the High Efficiency Video Coding (HEVC) codec, the successor to H.264, on the camera that proved to be the sharpest among the devices. The results show that as the resolution increases, the metrics tend to become more accurate. The best performing metric is VQM, with very few inconsistencies and inaccurate results, when the compression applied is H.264; when the compression applied is HEVC, the Structural Similarity (SSIM) metric and Video Multimethod Assessment Fusion (VMAF) perform significantly better.

Keywords: Distortion, metrics, recording, frame rate, video quality assessment.

Downloads: 366
6 The Use of Artificial Intelligence in Digital Forensics and Incident Response in a Constrained Environment

Authors: Dipo Dunsin, Mohamed C. Ghanem, Karim Ouazzane

Abstract:

Digital investigators often have a hard time spotting evidence in digital information. It has become hard to determine which source of proof relates to a specific investigation. A growing concern is that the various processes, technology, and specific procedures used in digital investigation are not keeping up with criminal developments. Therefore, criminals are taking advantage of these weaknesses to commit further crimes. In digital forensics investigations, artificial intelligence (AI) is invaluable in identifying crime. Providing objective data and conducting an assessment is the goal of digital forensics and digital investigation, which will assist in developing a plausible theory that can be presented as evidence in court. This research paper aims at developing a multiagent framework for digital investigations using specific intelligent software agents (ISAs). The agents communicate to address particular tasks jointly and keep the same objectives in mind during each task. The rules and knowledge contained within each agent are dependent on the investigation type. A criminal investigation is classified quickly and efficiently using the case-based reasoning (CBR) technique. The proposed framework is implemented using the Java Agent Development Framework, Eclipse, a Postgres repository, and a rule engine for agent reasoning. The proposed framework was tested using the Lone Wolf image files and datasets. Experiments were conducted using various sets of ISAs and VMs. There was a significant reduction in the time taken for the Hash Set Agent to execute. As a result of loading the agents, 5% of the time was lost, as the File Path Agent prescribed deleting 1,510, while the Timeline Agent found multiple executable files. In comparison, the integrity check carried out on the Lone Wolf image file using a digital forensic tool kit took approximately 48 minutes (2,880 s), whereas the MADIK framework accomplished this in 16 minutes (960 s). 
The framework is integrated with Python, allowing for further integration of other digital forensic tools, such as AccessData Forensic Toolkit (FTK), Wireshark, Volatility, and Scapy.

Keywords: Artificial intelligence, computer science, criminal investigation, digital forensics.

Downloads: 1292
5 Determination of Potential Agricultural Lands Using Landsat 8 OLI Images and GIS: Case Study of Gokceada (Imroz) Turkey

Authors: Rahmi Kafadar, Levent Genc

Abstract:

The present study aimed to determine potential agricultural lands (PALs) on Gokceada (Imroz) Island in Canakkale province, Turkey. Seven-band Landsat 8 OLI images acquired on July 12 and August 13, 2013, and their 14-band combination image were used to identify current Land Use Land Cover (LULC) status. Principal Component Analysis (PCA) was applied to the three Landsat datasets in order to reduce the correlation between the bands. A total of six original and PCA images were classified using a supervised classification method to obtain LULC maps with 6 main classes (“Forest”, “Agriculture”, “Water Surface”, “Residential Area-Bare Soil”, “Reforestation” and “Other”). Accuracy assessment was performed by checking 120 randomized points for each LULC map. The best overall accuracy and Kappa statistic values (90.83% and 0.8791, respectively) were found for the PCA image generated from the 14-band combined image, called 3- B/JA. A Digital Elevation Model (DEM) with 15 m spatial resolution (ASTER) was used to consider topographical characteristics. Soil properties were obtained by digitizing 1:25,000-scale soil maps of the Rural Services Directorate General. Potential Agricultural Lands (PALs) were determined using Geographic Information Systems (GIS). The procedure was applied considering that the “Other” class of the LULC map may be used for agricultural purposes in the future. Overlay analysis was conducted using Slope (S), Land Use Capability Class (LUCC), Other Soil Properties (OSP) and Land Use Capability Sub-Class (SUBC) properties. A total of 901.62 ha within the “Other” class (15798.2 ha) of the LULC map was determined as PALs. These lands were ranked as “Very Suitable”, “Suitable”, “Moderate Suitable” and “Low Suitable”. It was determined that 8.03 ha were classified as “Very Suitable”, 18.59 ha as “Suitable” and 11.44 ha as “Moderate Suitable” for PALs. In addition, 756.56 ha were found to be “Low Suitable”. 
The results obtained from this preliminary study can serve as basis for further studies.
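Overall accuracy and the Kappa statistic used in the accuracy assessment above are both derived from a confusion matrix of reference versus classified points. The sketch below shows the standard Cohen's kappa calculation; Kappa is a unitless coefficient rather than a percentage. The 2-class matrix in the usage lines is hypothetical and far smaller than the study's 6-class, 120-point assessment.

```python
def overall_accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] counts points whose reference class is i and mapped class is j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of points on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n_classes)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 2-class confusion matrix (illustration only).
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(round(accuracy, 4), round(kappa, 4))
```

Kappa discounts the agreement that would be expected by chance, which is why it is lower than the overall accuracy for the same map.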

Keywords: Digital Elevation Model (DEM), Geographic Information Systems (GIS), LANDSAT 8 OLI-TIRS, Land Use Land Cover (LULC).

Downloads: 2647
4 Development of an Automatic Calibration Framework for Hydrologic Modelling Using Approximate Bayesian Computation

Authors: A. Chowdhury, P. Egodawatta, J. M. McGree, A. Goonetilleke

Abstract:

Hydrologic models are increasingly used as tools to predict stormwater quantity and quality from urban catchments. However, due to a range of practical issues, most models produce gross errors in simulating complex hydraulic and hydrologic systems. Difficulty in finding a robust approach for model calibration is one of the main issues. Though automatic calibration techniques are available, they are rarely used in common commercial hydraulic and hydrologic modelling software such as MIKE URBAN. This is partly due to the need for a large number of parameters and large datasets in the calibration process. To overcome this practical issue, a framework for automatic calibration of a hydrologic model was developed on the R platform and is presented in this paper. The model was developed based on the time-area conceptualization. Four calibration parameters (initial loss, reduction factor, time of concentration and time-lag) were considered as the primary set of parameters. Using these parameters, automatic calibration was performed using Approximate Bayesian Computation (ABC). ABC is a simulation-based technique for performing Bayesian inference when the likelihood is intractable or computationally expensive to compute. To test its performance and usefulness, the technique was used to simulate three small catchments in Gold Coast. For comparison, simulation outcomes for the same three catchments from the commercial modelling software MIKE URBAN were used. The graphical comparison shows strong agreement, with the MIKE URBAN results lying within the upper and lower 95% credible intervals of the posterior predictions obtained via ABC. Statistical validation of the posterior predictions of runoff using the coefficient of determination (CD), root mean square error (RMSE) and maximum error (ME) was found to be reasonable for the three study catchments. 
The main benefit of using ABC over MIKE URBAN is that ABC provides a posterior distribution for runoff flow prediction, and therefore the associated uncertainty in predictions can be obtained. In contrast, MIKE URBAN provides only a point estimate. Based on the results of the analysis, it appears that the developed ABC framework performs well for automatic calibration.
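The idea behind ABC, accepting parameter draws whose simulated output is close enough to the observations, can be illustrated with a basic rejection sampler. This is only a sketch under stated assumptions: it is written in Python rather than R, it uses a toy loss-and-scale runoff model instead of the paper's time-area model, it calibrates two of the four parameters (initial loss and reduction factor), and the prior ranges, rainfall series and tolerance are all hypothetical.

```python
import random


def simulate_runoff(rainfall, initial_loss, reduction_factor):
    """Toy runoff model (assumption): subtract an initial loss, then scale."""
    return [max(r - initial_loss, 0.0) * reduction_factor for r in rainfall]


def rmse(simulated, observed):
    """Root mean square error between two equally long series."""
    return (sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)) ** 0.5


def abc_rejection(rainfall, observed, n_draws, tolerance, seed=0):
    """Basic ABC rejection sampling: keep prior draws whose simulation is within tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        initial_loss = rng.uniform(0.0, 5.0)      # hypothetical uniform prior
        reduction_factor = rng.uniform(0.0, 1.0)  # hypothetical uniform prior
        simulated = simulate_runoff(rainfall, initial_loss, reduction_factor)
        if rmse(simulated, observed) <= tolerance:
            accepted.append((initial_loss, reduction_factor))
    return accepted


# Synthetic "observations" generated from known parameters, for illustration only.
rainfall = [0.0, 4.0, 12.0, 20.0, 8.0, 2.0]
observed = simulate_runoff(rainfall, initial_loss=2.0, reduction_factor=0.8)
posterior = abc_rejection(rainfall, observed, n_draws=20000, tolerance=0.5)
print(len(posterior), "accepted draws")
```

The accepted draws approximate the posterior distribution, so credible intervals for runoff can be read from the spread of their simulated outputs rather than from a single point estimate.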

Keywords: Automatic calibration framework, approximate Bayesian computation, hydrologic and hydraulic modelling, MIKE URBAN software, R platform.

Downloads: 1740
3 Machine Learning Techniques for Short-Term Rain Forecasting System in the Northeastern Part of Thailand

Authors: Lily Ingsrisawang, Supawadee Ingsriswang, Saisuda Somchit, Prasert Aungsuratana, Warawut Khantiyanan

Abstract:

This paper presents a machine learning methodology for a short-term rain forecasting system. Decision Tree, Artificial Neural Network (ANN), and Support Vector Machine (SVM) techniques were applied to develop classification and prediction models for rainfall forecasts. The goals of this presentation are to demonstrate (1) how feature selection can be used to identify the relationships between rainfall occurrences and other weather conditions and (2) what models can be developed and deployed to predict accurate rainfall estimates in support of decisions to launch cloud seeding operations in the northeastern part of Thailand. Datasets were collected during 2004-2006 from the Chalermprakiat Royal Rain Making Research Center at Hua Hin, Prachuap Khiri Khan, the Chalermprakiat Royal Rain Making Research Center at Pimai, Nakhon Ratchasima, and the Thai Meteorological Department (TMD). A total of 179 records with 57 features were merged and matched by unique date. There are three main parts in this work. Firstly, a decision tree induction algorithm (C4.5) was used to classify the rain status as either rain or no-rain. The overall accuracy of the classification tree reached 94.41% with five-fold cross-validation. The C4.5 algorithm was also used to classify the rain amount into three classes, no-rain (0-0.1 mm), few-rain (0.1-10 mm), and moderate-rain (>10 mm), for which the overall accuracy of the classification tree was 62.57%. Secondly, an ANN was applied to predict the rainfall amount, and the root mean square error (RMSE) was used to measure the training and testing errors of the ANN. The ANN yielded a lower RMSE, 0.171, for daily rainfall estimates than for next-day and next-2-day estimation. Thirdly, the ANN and SVM techniques were also used to classify the rain amount into the same three classes (no-rain, few-rain, and moderate-rain). The ANN and SVM models achieved overall accuracies of 68.15% and 69.10%, respectively, for same-day prediction. The results illustrate a comparison of the predictive power of different methods for rainfall estimation.
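
As a rough illustration of the two evaluation measures named in this abstract, the sketch below bins rainfall amounts into the three classes and computes overall accuracy and RMSE. It is not the authors' code: the function names and the sample values are hypothetical, and because the stated ranges overlap at 0.1 mm, the sketch assumes that exactly 0.1 mm counts as no-rain.

```python
import math


def rain_class(amount_mm):
    """Map a daily rainfall amount (mm) to one of the three classes.

    Assumption: the abstract's ranges overlap at 0.1 mm, so this sketch
    treats exactly 0.1 mm as no-rain.
    """
    if amount_mm <= 0.1:
        return "no-rain"
    if amount_mm <= 10:
        return "few-rain"
    return "moderate-rain"


def overall_accuracy(actual, predicted):
    """Fraction of records whose predicted class equals the actual class."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for illustration only; they are not from the paper.
observed_mm = [0.0, 0.05, 2.5, 12.0, 7.0]
predicted_mm = [0.0, 0.5, 3.0, 9.0, 6.0]

observed_classes = [rain_class(x) for x in observed_mm]
predicted_classes = [rain_class(x) for x in predicted_mm]

accuracy = overall_accuracy(observed_classes, predicted_classes)
error = rmse(observed_mm, predicted_mm)
print(f"overall accuracy = {accuracy:.2f}, RMSE = {error:.3f} mm")
```

Accuracy is computed on the class labels, while RMSE is computed on the raw amounts, which mirrors the split between the classification and regression parts of the study.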

Keywords: Machine learning, decision tree, artificial neural network, support vector machine, root mean square error.

Downloads 3230
2 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict corn yield based on meteorological records. The prediction models used in this paper can be classified into model-driven and data-driven approaches, according to their modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as dynamical systems. However, calibrating the dynamical system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process, but it places strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from easily accessible datasets offers several side perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
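
The 5-fold cross-validation procedure mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a one-variable least-squares line stands in for the regressors they compare, the data are synthetic, all function names are hypothetical, and the errors are reported in absolute units because the abstract does not say how RMSEP and MAEP are normalized into percentages.

```python
import math


def k_fold_indices(n_records, k=5):
    """Yield (train_idx, test_idx) pairs so each record is held out exactly once.

    Folds are contiguous; real data would normally be shuffled first.
    """
    indices = list(range(n_records))
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validate(xs, ys, k=5):
    """Return (RMSE, MAE) pooled over all held-out predictions."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        intercept, slope = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        errors.extend(ys[i] - (intercept + slope * xs[i]) for i in test_idx)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Synthetic, exactly linear data for illustration only.
xs = [float(i) for i in range(10)]
ys = [2.0 + 3.0 * x for x in xs]
cv_rmse, cv_mae = cross_validate(xs, ys, k=5)
print(f"5-fold RMSE = {cv_rmse:.6f}, MAE = {cv_mae:.6f}")
```

Pooling the held-out errors before computing the metrics is one common convention; averaging per-fold metrics is another, and the abstract does not state which was used.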

Keywords: Crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest.

Downloads 1175
1 Influence of a High-Resolution Land Cover Classification on Air Quality Modelling

Authors: C. Silveira, A. Ascenso, J. Ferreira, A. I. Miranda, P. Tuccella, G. Curci

Abstract:

Poor air quality is one of the main environmental causes of premature deaths worldwide, particularly in cities, where the majority of the population lives. It is a consequence of successive land cover (LC) and land use changes resulting from the intensification of human activities. Knowing these landscape modifications in a comprehensive spatiotemporal dimension is, therefore, essential for understanding variations in air pollutant concentrations. In this sense, air quality models are very useful for simulating the physical and chemical processes that affect the dispersion and reaction of chemical species in the atmosphere. However, the modelling performance should always be evaluated, since the resolution of the input datasets largely dictates the reliability of the air quality outcomes. Among these data, updated LC is an important parameter to be considered in atmospheric models, since it takes into account the Earth's surface changes due to natural and anthropic actions, and regulates the exchanges of fluxes (emissions, heat, moisture, etc.) between the soil and the air. This work aims to evaluate the performance of the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) when different LC classifications are used as input. The influence of two LC classifications was tested: i) the 24-class USGS (United States Geological Survey) LC database included by default in the model, and ii) the CLC (Corine Land Cover) and specific high-resolution LC data for Portugal, reclassified according to the new USGS nomenclature (33 classes). Two distinct WRF-Chem simulations were carried out to assess the influence of the LC on air quality over Europe and Portugal, as a case study, for the year 2015, using the nesting technique over three simulation domains (25 km², 5 km² and 1 km² horizontal resolution). Within the 33-class LC approach, particular emphasis was placed on Portugal, given that its LC data have greater detail and a higher spatial resolution (100 m x 100 m) than the CLC data (5000 m x 5000 m). As regards air quality, only the LC impacts on tropospheric ozone concentrations were evaluated, because ozone pollution episodes typically occur in Portugal, in particular during the spring and summer, and there are few research works relating this pollutant to LC changes. The WRF-Chem results were validated by season and station typology using background measurements from the Portuguese air quality monitoring network. As expected, better model performance was achieved at rural stations, where higher average ozone concentrations were also estimated: moderate correlation (0.4-0.7), bias (10-21 µg·m⁻³) and RMSE (20-30 µg·m⁻³). Comparing both simulations, small differences grounded on the Leaf Area Index and air temperature values were found, although the high-resolution LC approach shows a slight enhancement in the model evaluation. This highlights the role of the LC in the exchange of atmospheric fluxes, and stresses the need to consider a high-resolution LC characterization combined with other detailed model inputs, such as the emission inventory, to improve air quality assessment.
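
The three validation statistics quoted in this abstract (correlation, bias and RMSE) can be computed from paired modelled and observed concentrations as sketched below. This is an illustrative sketch only: the function name and the sample concentrations are hypothetical, and it assumes that bias means the mean of modelled minus observed values, a sign convention the abstract does not spell out.

```python
import math


def evaluation_stats(modelled, observed):
    """Return Pearson correlation, mean bias and RMSE for paired series.

    Assumption: bias = mean(modelled) - mean(observed), so a positive
    value means the model overestimates on average.
    """
    n = len(observed)
    mean_m = sum(modelled) / n
    mean_o = sum(observed) / n
    bias = mean_m - mean_o
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modelled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    correlation = cov / math.sqrt(var_m * var_o)
    return {"r": correlation, "bias": bias, "rmse": rmse}


# Made-up ozone concentrations in µg/m³; they are not from the study.
observed = [60.0, 80.0, 100.0, 120.0]
modelled = [70.0, 90.0, 110.0, 130.0]

stats = evaluation_stats(modelled, observed)
print(stats)
```

In this toy case the model tracks the observations perfectly in shape but sits 10 µg/m³ too high, which shows why correlation alone does not reveal a systematic offset and why bias and RMSE are reported alongside it.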

Keywords: Land cover, tropospheric ozone, WRF-Chem, air quality assessment.

Downloads 796