Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 452

Search results for: Trajectory Datasets

32 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonate phishing pages to steal sensitive data or redirect victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect the proxy service due to their dynamic update and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent web proxy users behavior clusters. The web proxy URL may vary from time to time. Still, the inherent interest would not. So, based on the intuition, by dint of our private tools implemented by WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experiment results based on real datasets show that the behavior clusters not only reduce the number of URLs analysis but also provide an effective way to detect the web proxies, especially for the unknown web proxies.

Keywords: Bipartite graph, clustering, one-mode projection, web proxy detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 697

31 Image Ranking to Assist Object Labeling for Training Detection Models

Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman

Abstract:

Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation where a person would proceed sequentially through a list of images labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with a better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.

Keywords: Computer vision, deep learning, object detection, semiconductor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 749

30 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, Lower approximation and Upper approximation. In this paper, the rough clustering algorithms are improved by adopting the Similarity, Dissimilarity–Similarity and Entropy based initial centroids selection method on three different clustering algorithms namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM) were developed and executed by yeast dataset. The rough clustering algorithms are validated by cluster validity indexes namely Rand and Adjusted Rand indexes. An experimental result shows that the ERKM clustering algorithm perform effectively and delivers better results than other clustering methods. Outlier detection is an important task in data mining and very much different from the rest of the objects in the clusters. Entropy based Rough Outlier Factor (EROF) method is seemly to detect outlier effectively for yeast dataset. In rough K-Means method, by tuning the epsilon (ᶓ) value from 0.8 to 1.08 can detect outliers on boundary region and the RKM algorithm delivers better results, when choosing the value of epsilon (ᶓ) in the specified range. An experimental result shows that the EROF method on clustering algorithm performed very well and suitable for detecting outlier effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1365

29 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1497

28 Infrastructure Change Monitoring Using Multitemporal Multispectral Satellite Images

Authors: U. Datta

Abstract:

The main objective of this study is to find a suitable approach to monitor the land infrastructure growth over a period of time using multispectral satellite images. Bi-temporal change detection method is unable to indicate the continuous change occurring over a long period of time. To achieve this objective, the approach used here estimates a statistical model from series of multispectral image data over a long period of time, assuming there is no considerable change during that time period and then compare it with the multispectral image data obtained at a later time. The change is estimated pixel-wise. Statistical composite hypothesis technique is used for estimating pixel based change detection in a defined region. The generalized likelihood ratio test (GLRT) is used to detect the changed pixel from probabilistic estimated model of the corresponding pixel. The changed pixel is detected assuming that the images have been co-registered prior to estimation. To minimize error due to co-registration, 8-neighborhood pixels around the pixel under test are also considered. The multispectral images from Sentinel-2 and Landsat-8 from 2015 to 2018 are used for this purpose. There are different challenges in this method. First and foremost challenge is to get quite a large number of datasets for multivariate distribution modelling. A large number of images are always discarded due to cloud coverage. Due to imperfect modelling there will be high probability of false alarm. Overall conclusion that can be drawn from this work is that the probabilistic method described in this paper has given some promising results, which need to be pursued further.

Keywords: Co-registration, GLRT, infrastructure growth, multispectral, multitemporal, pixel-based change detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 658

27 Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala

Abstract:

Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.

Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 728

26 Mapping of Alteration Zones in Mineral Rich Belt of South-East Rajasthan Using Remote Sensing Techniques

Authors: Mrinmoy Dhara, Vivek K. Sengar, Shovan L. Chattoraj, Soumiya Bhattacharjee

Abstract:

Remote sensing techniques have emerged as an asset for various geological studies. Satellite images obtained by different sensors contain plenty of information related to the terrain. Digital image processing further helps in customized ways for the prospecting of minerals. In this study, an attempt has been made to map the hydrothermally altered zones using multispectral and hyperspectral datasets of South East Rajasthan. Advanced Space-borne Thermal Emission and Reflection Radiometer (ASTER) and Hyperion (Level1R) dataset have been processed to generate different Band Ratio Composites (BRCs). For this study, ASTER derived BRCs were generated to delineate the alteration zones, gossans, abundant clays and host rocks. ASTER and Hyperion images were further processed to extract mineral end members and classified mineral maps have been produced using Spectral Angle Mapper (SAM) method. Results were validated with the geological map of the area which shows positive agreement with the image processing outputs. Thus, this study concludes that the band ratios and image processing in combination play significant role in demarcation of alteration zones which may provide pathfinders for mineral prospecting studies.

Keywords: Advanced space-borne thermal emission and reflection radiometer, ASTER, Hyperion, Band ratios, Alteration zones, spectral angle mapper.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1433

25 Bond Graph Modeling of Mechanical Dynamics of an Excavator for Hydraulic System Analysis and Design

Authors: Mutuku Muvengei, John Kihiu

Abstract:

This paper focuses on the development of bond graph dynamic model of the mechanical dynamics of an excavating mechanism previously designed to be used with small tractors, which are fabricated in the Engineering Workshops of Jomo Kenyatta University of Agriculture and Technology. To develop a mechanical dynamics model of the manipulator, forward recursive equations similar to those applied in iterative Newton-Euler method were used to obtain kinematic relationships between the time rates of joint variables and the generalized cartesian velocities for the centroids of the links. Representing the obtained kinematic relationships in bondgraphic form, while considering the link weights and momenta as the elements led to a detailed bond graph model of the manipulator. The bond graph method was found to reduce significantly the number of recursive computations performed on a 3 DOF manipulator for a mechanical dynamic model to result, hence indicating that bond graph method is more computationally efficient than the Newton-Euler method in developing dynamic models of 3 DOF planar manipulators. The model was verified by comparing the joint torque expressions of a two link planar manipulator to those obtained using Newton- Euler and Lagrangian methods as analyzed in robotic textbooks. The expressions were found to agree indicating that the model captures the aspects of rigid body dynamics of the manipulator. Based on the model developed, actuator sizing and valve sizing methodologies were developed and used to obtain the optimal sizes of the pistons and spool valve ports respectively. It was found that using the pump with the sized flow rate capacity, the engine of the tractor is able to power the excavating mechanism in digging a sandy-loom soil.

Keywords: Actuators, bond graphs, inverse dynamics, recursive equations, quintic polynomial trajectory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2845

24 An Autonomous Collaborative Forecasting System Implementation – The First Step towards Successful CPFR System

Authors: Chi-Fang Huang, Yun-Shiow Chen, Yun-Kung Chung

Abstract:

In the past decade, artificial neural networks (ANNs) have been regarded as an instrument for problem-solving and decision-making; indeed, they have already done with a substantial efficiency and effectiveness improvement in industries and businesses. In this paper, the Back-Propagation neural Networks (BPNs) will be modulated to demonstrate the performance of the collaborative forecasting (CF) function of a Collaborative Planning, Forecasting and Replenishment (CPFR®) system. CPFR functions the balance between the sufficient product supply and the necessary customer demand in a Supply and Demand Chain (SDC). Several classical standard BPN will be grouped, collaborated and exploited for the easy implementation of the proposed modular ANN framework based on the topology of a SDC. Each individual BPN is applied as a modular tool to perform the task of forecasting SKUs (Stock-Keeping Units) levels that are managed and supervised at a POS (point of sale), a wholesaler, and a manufacturer in an SDC. The proposed modular BPN-based CF system will be exemplified and experimentally verified using lots of datasets of the simulated SDC. The experimental results showed that a complex CF problem can be divided into a group of simpler sub-problems based on the single independent trading partners distributed over SDC, and its SKU forecasting accuracy was satisfied when the system forecasted values compared to the original simulated SDC data. The primary task of implementing an autonomous CF involves the study of supervised ANN learning methodology which aims at making “knowledgeable" decision for the best SKU sales plan and stocks management.

Keywords: CPFR, artificial neural networks, global logistics, supply and demand chain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1945

23 Acceleration-Based Motion Model for Visual SLAM

Authors: Daohong Yang, Xiang Zhang, Wanting Zhou, Lei Li

Abstract:

Visual Simultaneous Localization and Mapping (VSLAM) is a technology that gathers information about the surrounding environment to ascertain its own position and create a map. It is widely used in computer vision, robotics, and various other fields. Many visual SLAM systems, such as OBSLAM3, utilize a constant velocity motion model. The utilization of this model facilitates the determination of the initial pose of the current frame, thereby enhancing the efficiency and precision of feature matching. However, it is often difficult to satisfy the constant velocity motion model in actual situations. This can result in a significant deviation between the obtained initial pose and the true value, leading to errors in nonlinear optimization results. Therefore, this paper proposes a motion model based on acceleration that can be applied to most SLAM systems. To provide a more accurate description of the camera pose acceleration, we separate the pose transformation matrix into its rotation matrix and translation vector components. The rotation matrix is now represented by a rotation vector. We assume that, over a short period, the changes in rotating angular velocity and translation vector remain constant. Based on this assumption, the initial pose of the current frame is estimated. In addition, the error of the constant velocity model is analyzed theoretically. Finally, we apply our proposed approach to the ORBSLAM3 system and evaluate two sets of sequences from the TUM datasets. The results show that our proposed method has a more accurate initial pose estimation, resulting in an improvement of 6.61% and 6.46% in the accuracy of the ORBSLAM3 system on the two test sequences, respectively.

Keywords: Error estimation, constant acceleration motion model, pose estimation, visual SLAM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 147

22 Reducing the Imbalance Penalty through Artificial Intelligence Methods Geothermal Production Forecasting: A Case Study for Turkey

Authors: H. Anıl, G. Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries that promise potential in geothermal energy production with its high installed power, cheapness, and sustainability. Increasing imbalance penalties become an economic burden for organizations, since the geothermal generation plants cannot maintain the balance of supply and demand due to the inadequacy of the production forecasts given in the day-ahead market. A better production forecast reduces the imbalance penalties of market participants and provides a better imbalance in the day ahead market. In this study, using machine learning, deep learning and time series methods, the total generation of the power plants belonging to Zorlu Doğal Electricity Generation, which has a high installed capacity in terms of geothermal, was predicted for the first one-week and first two-weeks of March, then the imbalance penalties were calculated with these estimates and compared with the real values. These modeling operations were carried out on two datasets, the basic dataset and the dataset created by extracting new features from this dataset with the feature engineering method. According to the results, Support Vector Regression from traditional machine learning models outperformed other models and exhibited the best performance. In addition, the estimation results in the feature engineering dataset showed lower error rates than the basic dataset. It has been concluded that the estimated imbalance penalty calculated for the selected organization is lower than the actual imbalance penalty, optimum and profitable accounts.

Keywords: Machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 143

21 Non-Invasive Data Extraction from Machine Display Units Using Video Analytics

Authors: Ravneet Kaur, Joydeep Acharya, Sudhanshu Gaur

Abstract:

Artificial Intelligence (AI) has the potential to transform manufacturing by improving shop floor processes such as production, maintenance and quality. However, industrial datasets are notoriously difficult to extract in a real-time, streaming fashion thus, negating potential AI benefits. The main example is some specialized industrial controllers that are operated by custom software which complicates the process of connecting them to an Information Technology (IT) based data acquisition network. Security concerns may also limit direct physical access to these controllers for data acquisition. To connect the Operational Technology (OT) data stored in these controllers to an AI application in a secure, reliable and available way, we propose a novel Industrial IoT (IIoT) solution in this paper. In this solution, we demonstrate how video cameras can be installed in a factory shop floor to continuously obtain images of the controller HMIs. We propose image pre-processing to segment the HMI into regions of streaming data and regions of fixed meta-data. We then evaluate the performance of multiple Optical Character Recognition (OCR) technologies such as Tesseract and Google vision to recognize the streaming data and test it for typical factory HMIs and realistic lighting conditions. Finally, we use the meta-data to match the OCR output with the temporal, domain-dependent context of the data to improve the accuracy of the output. Our IIoT solution enables reliable and efficient data extraction which will improve the performance of subsequent AI applications.

Keywords: Human machine interface, industrial internet of things, internet of things, optical character recognition, video analytic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 684

20 Sea Level Characteristics Referenced to Specific Geodetic Datum in Alexandria, Egypt

Authors: Ahmed M. Khedr, Saad M. Abdelrahman, Kareem M. Tonbol

Abstract:

Two geo-referenced sea level datasets (September 2008 – November 2010) and (April 2012 – January 2014) were recorded at Alexandria Western Harbour (AWH). Accurate re-definition of tidal datum, referred to the latest International Terrestrial Reference Frame (ITRF-2014), was discussed and updated to improve our understanding of the old predefined tidal datum at Alexandria. Tidal and non-tidal components of sea level were separated with the use of Delft-3D hydrodynamic model-tide suit (Delft-3D, 2015). Tidal characteristics at AWH were investigated and harmonic analysis showed the most significant 34 constituents with their amplitudes and phases. Tide was identified as semi-diurnal pattern as indicated by a “Form Factor” of 0.24 and 0.25, respectively. Principle tidal datums related to major tidal phenomena were recalculated referred to a meaningful geodetic height datum. The portion of residual energy (surge) out of the total sea level energy was computed for each dataset and found 77% and 72%, respectively. Power spectral density (PSD) showed accurate resolvability in high band (1–6) cycle/days for the nominated independent constituents, except some neighbouring constituents, which are too close in frequency. Wind and atmospheric pressure data, during the recorded sea level time, were analysed and cross-correlated with the surge signals. Moderate association between surge and wind and atmospheric pressure data were obtained. In addition, long-term sea level rise trend at AWH was computed and showed good agreement with earlier estimated rates.

Keywords: Alexandria, Delft-3D, Egypt, geodetic reference, harmonic analysis, sea level.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1309

19 Study of Unsteady Behaviour of Dynamic Shock Systems in Supersonic Engine Intakes

Authors: Siddharth Ahuja, T. M. Muruganandam

Abstract:

An analytical investigation is performed to study the unsteady response of a one-dimensional, non-linear dynamic shock system to external downstream pressure perturbations in a supersonic flow in a varying area duct. For a given pressure ratio across a wind tunnel, the normal shock's location can be computed as per one-dimensional steady gas dynamics. Similarly, for some other pressure ratio, the location of the normal shock will change accordingly, again computed using one-dimensional gas dynamics. This investigation focuses on the small-time interval between the first steady shock location and the new steady shock location (corresponding to different pressure ratios). In essence, this study aims to shed light on the motion of the shock from one steady location to another steady location. Further, this study aims to create the foundation of the Unsteady Gas Dynamics field enabling further insight in future research work. According to the new pressure ratio, a pressure pulse, generated at the exit of the tunnel which travels and perturbs the shock from its original position, setting it into motion. During such activity, other numerous physical phenomena also happen at the same time. However, three broad phenomena have been focused on, in this study - Traversal of a Wave, Fluid Element Interactions and Wave Interactions. The above mentioned three phenomena create, alter and kill numerous waves for different conditions. The waves which are created by the above-mentioned phenomena eventually interact with the shock and set it into motion. Numerous such interactions with the shock will slowly make it settle into its final position owing to the new pressure ratio across the duct, as estimated by one-dimensional gas dynamics. This analysis will be extremely helpful in the prediction of inlet 'unstart' of the flow in a supersonic engine intake and its prominence with the incoming flow Mach number, incoming flow pressure and the external perturbation pressure is also studied to help design more efficient supersonic intakes for engines like ramjets and scramjets.

Keywords: Analytical investigation, compression and expansion waves, fluid element interactions, shock trajectory, supersonic flow, unsteady gas dynamics, varying area duct, wave interactions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 839

18 A Grid-based Neural Network Framework for Multimodal Biometrics

Authors: Sitalakshmi Venkataraman

Abstract:

Recent scientific investigations indicate that multimodal biometrics overcome the technical limitations of unimodal biometrics, making them ideally suited for everyday life applications that require a reliable authentication system. However, for a successful adoption of multimodal biometrics, such systems would require large heterogeneous datasets with complex multimodal fusion and privacy schemes spanning various distributed environments. From experimental investigations of current multimodal systems, this paper reports the various issues related to speed, error-recovery and privacy that impede the diffusion of such systems in real-life. This calls for a robust mechanism that caters to the desired real-time performance, robust fusion schemes, interoperability and adaptable privacy policies. The main objective of this paper is to present a framework that addresses the abovementioned issues by leveraging on the heterogeneous resource sharing capacities of Grid services and the efficient machine learning capabilities of artificial neural networks (ANN). Hence, this paper proposes a Grid-based neural network framework for adopting multimodal biometrics with the view of overcoming the barriers of performance, privacy and risk issues that are associated with shared heterogeneous multimodal data centres. The framework combines the concept of Grid services for reliable brokering and privacy policy management of shared biometric resources along with a momentum back propagation ANN (MBPANN) model of machine learning for efficient multimodal fusion and authentication schemes. Real-life applications would be able to adopt the proposed framework to cater to the varying business requirements and user privacies for a successful diffusion of multimodal biometrics in various day-to-day transactions.

Keywords: Back Propagation, Grid Services, MultimodalBiometrics, Neural Networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881

17 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2436

16 SAF: A Substitution and Alignment Free Similarity Measure for Protein Sequences

Authors: Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski

Abstract:

The literature reports a large number of approaches for measuring the similarity between protein sequences. Most of these approaches estimate this similarity using alignment-based techniques that do not necessarily yield biologically plausible results, for two reasons. First, for the case of non-alignable (i.e., not yet definitively aligned and biologically approved) sequences such as multi-domain, circular permutation and tandem repeat protein sequences, alignment-based approaches do not succeed in producing biologically plausible results. This is due to the nature of the alignment, which is based on the matching of subsequences in equivalent positions, while non-alignable proteins often have similar and conserved domains in non-equivalent positions. Second, the alignment-based approaches lead to similarity measures that depend heavily on the parameters set by the user for the alignment (e.g., gap penalties and substitution matrices). For easily alignable protein sequences, it's possible to supply a suitable combination of input parameters that allows such an approach to yield biologically plausible results. However, for difficult-to-align protein sequences, supplying different combinations of input parameters yields different results. Such variable results create ambiguities and complicate the similarity measurement task. To overcome these drawbacks, this paper describes a novel and effective approach for measuring the similarity between protein sequences, called SAF for Substitution and Alignment Free. Without resorting either to the alignment of protein sequences or to substitution relations between amino acids, SAF is able to efficiently detect the significant subsequences that best represent the intrinsic properties of protein sequences, those underlying the chronological dependencies of structural features and biochemical activities of protein sequences. Moreover, by using a new efficient subsequence matching scheme, SAF more efficiently handles protein sequences that contain similar structural features with significant meaning in chronologically non-equivalent positions. To show the effectiveness of SAF, extensive experiments were performed on protein datasets from different databases, and the results were compared with those obtained by several mainstream algorithms.

Keywords: Protein, Similarity, Substitution, Alignment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1364

15 Motor Coordination and Body Mass Index in Primary School Children

Authors: Ingrid Ruzbarska, Martin Zvonar, Piotr Oleśniewicz, Julita Markiewicz-Patkowska, Krzysztof Widawski, Daniel Puciato

Abstract:

Obese children will probably become obese adults, consequently exposed to an increased risk of comorbidity and premature mortality. Body weight may be indirectly determined by continuous development of coordination and motor skills. The level of motor skills and abilities is an important factor that promotes physical activity since early childhood. The aim of the study is to thoroughly understand the internal relations between motor coordination abilities and the somatic development of prepubertal children and to determine the effect of excess body weight on motor coordination by comparing the motor ability levels of children with different body mass index (BMI) values. The data were collected from 436 children aged 7–10 years, without health limitations, fully participating in school physical education classes. Body height was measured with portable stadiometers (Harpenden, Holtain Ltd.), and body mass—with a digital scale (HN-286, Omron). Motor coordination was evaluated with the Kiphard-Schilling body coordination test, Körperkoordinationstest für Kinder. The normality test by Shapiro-Wilk was used to verify the data distribution. The correlation analysis revealed a statistically significant negative association between the dynamic balance and BMI, as well as between the motor quotient and BMI (p<0.01) for both boys and girls. The results showed no effect of gender on the difference in the observed trends. The analysis of variance proved statistically significant differences between normal weight children and their overweight or obese counterparts. Coordination abilities probably play an important role in preventing or moderating the negative trajectory leading to childhood overweight and obesity. At this age, the development of coordination abilities should become a key strategy, targeted at long-term prevention of obesity and the promotion of an active lifestyle in adulthood. Motor performance is essential for implementing a healthy lifestyle in childhood already. Physical inactivity apparently results in motor deficits and a sedentary lifestyle in children, which may be accompanied by excess energy intake and overweight.

Keywords: Childhood, KTK test, Physical education, Psychomotor competence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1316

14 Modeling Stress-Induced Regulatory Cascades with Artificial Neural Networks

Authors: Maria E. Manioudaki, Panayiota Poirazi

Abstract:

Yeast cells live in a constantly changing environment that requires the continuous adaptation of their genomic program in order to sustain their homeostasis, survive and proliferate. Due to the advancement of high throughput technologies, there is currently a large amount of data such as gene expression, gene deletion and protein-protein interactions for S. Cerevisiae under various environmental conditions. Mining these datasets requires efficient computational methods capable of integrating different types of data, identifying inter-relations between different components and inferring functional groups or 'modules' that shape intracellular processes. This study uses computational methods to delineate some of the mechanisms used by yeast cells to respond to environmental changes. The GRAM algorithm is first used to integrate gene expression data and ChIP-chip data in order to find modules of coexpressed and co-regulated genes as well as the transcription factors (TFs) that regulate these modules. Since transcription factors are themselves transcriptionally regulated, a three-layer regulatory cascade consisting of the TF-regulators, the TFs and the regulated modules is subsequently considered. This three-layer cascade is then modeled quantitatively using artificial neural networks (ANNs) where the input layer corresponds to the expression of the up-stream transcription factors (TF-regulators) and the output layer corresponds to the expression of genes within each module. This work shows that (a) the expression of at least 33 genes over time and for different stress conditions is well predicted by the expression of the top layer transcription factors, including cases in which the effect of up-stream regulators is shifted in time and (b) identifies at least 6 novel regulatory interactions that were not previously associated with stress-induced changes in gene expression. These findings suggest that the combination of gene expression and protein-DNA interaction data with artificial neural networks can successfully model biological pathways and capture quantitative dependencies between distant regulators and downstream genes.

Keywords: gene modules, artificial neural networks, yeast, stress

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1415

13 Comparison of Data Reduction Algorithms for Image-Based Point Cloud Derived Digital Terrain Models

Authors: M. Uysal, M. Yilmaz, I. Tiryakioğlu

Abstract:

Digital Terrain Model (DTM) is a digital numerical representation of the Earth's surface. DTMs have been applied to a diverse field of tasks, such as urban planning, military, glacier mapping, disaster management. In the expression of the Earth' surface as a mathematical model, an infinite number of point measurements are needed. Because of the impossibility of this case, the points at regular intervals are measured to characterize the Earth's surface and DTM of the Earth is generated. Hitherto, the classical measurement techniques and photogrammetry method have widespread use in the construction of DTM. At present, RADAR, LiDAR, and stereo satellite images are also used for the construction of DTM. In recent years, especially because of its superiorities, Airborne Light Detection and Ranging (LiDAR) has an increased use in DTM applications. A 3D point cloud is created with LiDAR technology by obtaining numerous point data. However recently, by the development in image mapping methods, the use of unmanned aerial vehicles (UAV) for photogrammetric data acquisition has increased DTM generation from image-based point cloud. The accuracy of the DTM depends on various factors such as data collection method, the distribution of elevation points, the point density, properties of the surface and interpolation methods. In this study, the random data reduction method is compared for DTMs generated from image based point cloud data. The original image based point cloud data set (100%) is reduced to a series of subsets by using random algorithm, representing the 75, 50, 25 and 5% of the original image based point cloud data set. Over the ANS campus of Afyon Kocatepe University as the test area, DTM constructed from the original image based point cloud data set is compared with DTMs interpolated from reduced data sets by Kriging interpolation method. The results show that the random data reduction method can be used to reduce the image based point cloud datasets to 50% density level while still maintaining the quality of DTM.

Keywords: DTM, unmanned aerial vehicle, UAV, random, Kriging.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 747

12 Bidirectional Pendulum Vibration Absorbers with Homogeneous Variable Tangential Friction: Modelling and Design

Authors: Emiliano Matta

Abstract:

Passive resonant vibration absorbers are among the most widely used dynamic control systems in civil engineering. They typically consist in a single-degree-of-freedom mechanical appendage of the main structure, tuned to one structural target mode through frequency and damping optimization. One classical scheme is the pendulum absorber, whose mass is constrained to move along a curved trajectory and is damped by viscous dashpots. Even though the principle is well known, the search for improved arrangements is still under way. In recent years this investigation inspired a type of bidirectional pendulum absorber (BPA), consisting of a mass constrained to move along an optimal three-dimensional (3D) concave surface. For such a BPA, the surface principal curvatures are designed to ensure a bidirectional tuning of the absorber to both principal modes of the main structure, while damping is produced either by horizontal viscous dashpots or by vertical friction dashpots, connecting the BPA to the main structure. In this paper, a variant of BPA is proposed, where damping originates from the variable tangential friction force which develops between the pendulum mass and the 3D surface as a result of a spatially-varying friction coefficient pattern. Namely, a friction coefficient is proposed that varies along the pendulum surface in proportion to the modulus of the 3D surface gradient. With such an assumption, the dissipative model of the absorber can be proven to be nonlinear homogeneous in the small displacement domain. The resulting homogeneous BPA (HBPA) has a fundamental advantage over conventional friction-type absorbers, because its equivalent damping ratio results independent on the amplitude of oscillations, and therefore its optimal performance does not depend on the excitation level. On the other hand, the HBPA is more compact than viscously damped BPAs because it does not need the installation of dampers. This paper presents the analytical model of the HBPA and an optimal methodology for its design. Numerical simulations of single- and multi-story building structures under wind and earthquake loads are presented to compare the HBPA with classical viscously damped BPAs. It is shown that the HBPA is a promising alternative to existing BPA types and that homogeneous tangential friction is an effective means to realize systems provided with amplitude-independent damping.

Keywords: Amplitude-independent damping, Homogeneous friction, Pendulum nonlinear dynamics, Structural control, Vibration resonant absorbers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 681

11 Evaluation of Video Quality Metrics and Performance Comparison on Contents Taken from Most Commonly Used Devices

Authors: Pratik Dhabal Deo, Manoj P.

Abstract:

With the increasing number of social media users, the amount of video content available has also significantly increased. Currently, the number of smartphone users is at its peak, and many are increasingly using their smartphones as their main photography and recording devices. There have been a lot of developments in the field of video quality assessment in since the past years and more research on various other aspects of video and image are being done. Datasets that contain a huge number of videos from different high-end devices make it difficult to analyze the performance of the metrics on the content from most used devices even if they contain contents taken in poor lighting conditions using lower-end devices. These devices face a lot of distortions due to various factors since the spectrum of contents recorded on these devices is huge. In this paper, we have presented an analysis of the objective Video Quality Analysis (VQA) metrics on contents taken only from most used devices and their performance on them, focusing on full-reference metrics. To carry out this research, we created a custom dataset containing a total of 90 videos that have been taken from three most commonly used devices, and Android smartphone, an iOS smartphone and a Digital Single-Lens Reflex (DSLR) camera. On the videos taken on each of these devices, the six most common types of distortions that users face have been applied in addition to already existing H.264 compression based on four reference videos. These six applied distortions have three levels of degradation each. A total of the five most popular VQA metrics have been evaluated on this dataset and the highest values and the lowest values of each of the metrics on the distortions have been recorded. Finally, it is found that blur is the artifact on which most of the metrics did not perform well. Thus, in order to understand the results better the amount of blur in the data set has been calculated and an additional evaluation of the metrics was done using High Efficiency Video Coding (HEVC) codec, which is the next version of H.264 compression, on the camera that proved to be the sharpest among the devices. The results have shown that as the resolution increases, the performance of the metrics tends to become more accurate and the best performing metric among them is VQM with very few inconsistencies and inaccurate results when the compression applied is H.264, but when the compression is applied is HEVC, Structural Similarity (SSIM) metric and Video Multimethod Assessment Fusion (VMAF) have performed significantly better.

Keywords: Distortion, metrics, recording, frame rate, video quality assessment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 297

10 The Use of Artificial Intelligence in Digital Forensics and Incident Response in a Constrained Environment

Authors: Dipo Dunsin, Mohamed C. Ghanem, Karim Ouazzane

Abstract:

Digital investigators often have a hard time spotting evidence in digital information. It has become hard to determine which source of proof relates to a specific investigation. A growing concern is that the various processes, technology, and specific procedures used in the digital investigation are not keeping up with criminal developments. Therefore, criminals are taking advantage of these weaknesses to commit further crimes. In digital forensics investigations, artificial intelligence (AI) is invaluable in identifying crime. Providing objective data and conducting an assessment is the goal of digital forensics and digital investigation, which will assist in developing a plausible theory that can be presented as evidence in court. This research paper aims at developing a multiagent framework for digital investigations using specific intelligent software agents (ISAs). The agents communicate to address particular tasks jointly and keep the same objectives in mind during each task. The rules and knowledge contained within each agent are dependent on the investigation type. A criminal investigation is classified quickly and efficiently using the case-based reasoning (CBR) technique. The proposed framework development is implemented using the Java Agent Development Framework, Eclipse, Postgres repository, and a rule engine for agent reasoning. The proposed framework was tested using the Lone Wolf image files and datasets. Experiments were conducted using various sets of ISAs and VMs. There was a significant reduction in the time taken for the Hash Set Agent to execute. As a result of loading the agents, 5% of the time was lost, as the File Path Agent prescribed deleting 1,510, while the Timeline Agent found multiple executable files. In comparison, the integrity check carried out on the Lone Wolf image file using a digital forensic tool kit took approximately 48 minutes (2,880 ms), whereas the MADIK framework accomplished this in 16 minutes (960 ms). The framework is integrated with Python, allowing for further integration of other digital forensic tools, such as AccessData Forensic Toolkit (FTK), Wireshark, Volatility, and Scapy.

Keywords: Artificial intelligence, computer science, criminal investigation, digital forensics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1107

9 Determination of Potential Agricultural Lands Using Landsat 8 OLI Images and GIS: Case Study of Gokceada (Imroz) Turkey

Authors: Rahmi Kafadar, Levent Genc

Abstract:

In present study, it was aimed to determine potential agricultural lands (PALs) in Gokceada (Imroz) Island of Canakkale province, Turkey. Seven-band Landsat 8 OLI images acquired on July 12 and August 13, 2013, and their 14-band combination image were used to identify current Land Use Land Cover (LULC) status. Principal Component Analysis (PCA) was applied to three Landsat datasets in order to reduce the correlation between the bands. A total of six Original and PCA images were classified using supervised classification method to obtain the LULC maps including 6 main classes (“Forest”, “Agriculture”, “Water Surface”, “Residential Area- Bare Soil”, “Reforestation” and “Other”). Accuracy assessment was performed by checking the accuracy of 120 randomized points for each LULC maps. The best overall accuracy and Kappa statistic values (90.83%, 0.8791% respectively) were found for PCA images which were generated from 14-bands combined images called 3- B/JA. Digital Elevation Model (DEM) with 15 m spatial resolution (ASTER) was used to consider topographical characteristics. Soil properties were obtained by digitizing 1:25000 scaled soil maps of Rural Services Directorate General. Potential Agricultural Lands (PALs) were determined using Geographic information Systems (GIS). Procedure was applied considering that “Other” class of LULC map may be used for agricultural purposes in the future properties. Overlaying analysis was conducted using Slope (S), Land Use Capability Class (LUCC), Other Soil Properties (OSP) and Land Use Capability Sub-Class (SUBC) properties. A total of 901.62 ha areas within “Other” class (15798.2 ha) of LULC map were determined as PALs. These lands were ranked as “Very Suitable”, “Suitable”, “Moderate Suitable” and “Low Suitable”. It was determined that the 8.03 ha were classified as “Very Suitable” while 18.59 ha as suitable and 11.44 ha as “Moderate Suitable” for PALs. In addition, 756.56 ha were found to be “Low Suitable”. The results obtained from this preliminary study can serve as basis for further studies.

Keywords: Digital Elevation Model (DEM), Geographic Information Systems (GIS), LANDSAT 8 OLI-TIRS, Land Use Land Cover (LULC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2593

8 The Ballistics Case Study of the Enrica Lexie Incident

Authors: Diego Abbo

Abstract:

On February 15, 2012 off the Indian coast of Kerala, in position 091702N-0760180E by the oil tanker Enrica Lexie, flying the Italian flag, bursts of 5.56 x45 caliber shots were fired from assault rifles AR/70 Italian-made Beretta towards the Indian fisher boat St. Anthony. The shots that hit the St. Anthony fishing boat were six, of which two killed the Indian fishermen Ajesh Pink and Valentine Jelestine. From the analysis concerning the kinematic engagement of the two ships and from the autopsy and ballistic results of the Indian judicial authorities it is possible to reconstruct the trajectories of the six aforementioned shots. This essay reconstructs the trajectories of the six shots that cannot be of direct shooting but have undergone a rebound on the water. The investigation carried out scientifically demonstrates the rebound of the blows on the water, the gyrostatic deviation due to the rebound and the tumbling effect always due to the rebound as regards intermediate ballistics. In consideration of the four shots that directly impacted the fishing vessel, the current examination proves, with scientific value, that the trajectories could not be downwards but upwards. Also, the trajectory of two shots that hit to death the two fishermen could not be downwards but only upwards. In fact, this paper demonstrates, with scientific value: The loss of speed of the projectiles due to the rebound on the water; The tumbling effect in the ballistic medium within the two victims; The permanent cavities subject to the injury ballistics and the related ballistic trauma that prevented homeostasis causing bleeding in one case; The thermo-hardening deformation of the bullet found in Valentine Jelestine's skull; The upward and non-downward trajectories. The paper constitutes a tool in forensic ballistics in that it manages to reconstruct, from the final spot of the projectiles fired, all phases of ballistics like the internal one of the weapons that fired, the intermediate one, the terminal one and the penetrative structural one. In general terms the ballistics reconstruction is based on measurable parameters whose entity is contained with certainty within a lower and upper limit. Therefore, quantities that refer to angles, speed, impact energy and firing position of the shooter can be identified within the aforementioned limits. Finally, the investigation into the internal bullet track, obtained from any autopsy examination, offers a significant “lesson learned” but overall a starting point to contain or mitigate bleeding as a rescue from future gunshot wounds.

Keywords: Impact physics, intermediate ballistics, terminal ballistics, tumbling effect.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 749

7 Development of an Automatic Calibration Framework for Hydrologic Modelling Using Approximate Bayesian Computation

Authors: A. Chowdhury, P. Egodawatta, J. M. McGree, A. Goonetilleke

Abstract:

Hydrologic models are increasingly used as tools to predict stormwater quantity and quality from urban catchments. However, due to a range of practical issues, most models produce gross errors in simulating complex hydraulic and hydrologic systems. Difficulty in finding a robust approach for model calibration is one of the main issues. Though automatic calibration techniques are available, they are rarely used in common commercial hydraulic and hydrologic modelling software e.g. MIKE URBAN. This is partly due to the need for a large number of parameters and large datasets in the calibration process. To overcome this practical issue, a framework for automatic calibration of a hydrologic model was developed in R platform and presented in this paper. The model was developed based on the time-area conceptualization. Four calibration parameters, including initial loss, reduction factor, time of concentration and time-lag were considered as the primary set of parameters. Using these parameters, automatic calibration was performed using Approximate Bayesian Computation (ABC). ABC is a simulation-based technique for performing Bayesian inference when the likelihood is intractable or computationally expensive to compute. To test the performance and usefulness, the technique was used to simulate three small catchments in Gold Coast. For comparison, simulation outcomes from the same three catchments using commercial modelling software, MIKE URBAN were used. The graphical comparison shows strong agreement of MIKE URBAN result within the upper and lower 95% credible intervals of posterior predictions as obtained via ABC. Statistical validation for posterior predictions of runoff result using coefficient of determination (CD), root mean square error (RMSE) and maximum error (ME) was found reasonable for three study catchments. The main benefit of using ABC over MIKE URBAN is that ABC provides a posterior distribution for runoff flow prediction, and therefore associated uncertainty in predictions can be obtained. In contrast, MIKE URBAN just provides a point estimate. Based on the results of the analysis, it appears as though ABC the developed framework performs well for automatic calibration.

Keywords: Automatic calibration framework, approximate Bayesian computation, hydrologic and hydraulic modelling, MIKE URBAN software, R platform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672

6 Identifying Areas on the Pavement Where Rain Water Runoff Affects Motorcycle Behavior

Authors: Panagiotis Lemonakis, Theodoros Αlimonakis, George Kaliabetsos, Nikos Eliou

Abstract:

It is very well known that certain vertical and longitudinal slopes have to be assured in order to achieve adequate rainwater runoff from the pavement. The selection of longitudinal slopes, between the turning points of the vertical curves that meet the afore-mentioned requirement does not ensure adequate drainage because the same condition must also be applied at the transition curves. In this way none of the pavement edges’ slopes (as well as any other spot that lie on the pavement) will be opposite to the longitudinal slope of the rotation axis. Horizontal and vertical alignment must be properly combined in order to form a road which resultant slope does not take small values and hence, checks must be performed in every cross section and every chainage of the road. The present research investigates the rain water runoff from the road surface in order to identify the conditions under which, areas of inadequate drainage are being created, to analyze the rainwater behavior in such areas, to provide design examples of good and bad drainage zones and to track down certain motorcycle types which might encounter hazardous situations due to the presence of water film between the pavement and both of their tires resulting loss of traction. Moreover, it investigates the combination of longitudinal and cross slope values in critical pavement areas. It should be pointed out that the drainage gradient is analytically calculated for the whole road width and not just for an oblique slope per chainage (combination of longitudinal grade and cross slope). Lastly, various combinations of horizontal and vertical design are presented, indicating the crucial zones of bad pavement drainage. The key conclusion of the study is that any type of motorcycle will travel for some time inside the area of improper runoff for a certain time frame which depends on the speed and the trajectory that the rider chooses along the transition curve. Taking into account that on this section the rider will have to lean his motorcycle and hence reduce the contact area of his tire with the pavement it is apparent that any variations on the friction value due to the presence of a water film may lead to serious problems regarding his safety. The water runoff from the road pavement is improved when between reverse longitudinal slopes, crest instead of sag curve is chosen and particularly when its edges coincide with the edges of the horizontal curve. Lastly, the results of the investigation have shown that the variation of the longitudinal slope involves the vertical shift of the center of the poor water runoff area. The magnitude of this area increases as the length of the transition curve increases.

Keywords: Drainage, motorcycle safety, superelevation, transition curves, vertical grade.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 475

5 Machine Learning Techniques for Short-Term Rain Forecasting System in the Northeastern Part of Thailand

Authors: Lily Ingsrisawang, Supawadee Ingsriswang, Saisuda Somchit, Prasert Aungsuratana, Warawut Khantiyanan

Abstract:

This paper presents the methodology from machine learning approaches for short-term rain forecasting system. Decision Tree, Artificial Neural Network (ANN), and Support Vector Machine (SVM) were applied to develop classification and prediction models for rainfall forecasts. The goals of this presentation are to demonstrate (1) how feature selection can be used to identify the relationships between rainfall occurrences and other weather conditions and (2) what models can be developed and deployed for predicting the accurate rainfall estimates to support the decisions to launch the cloud seeding operations in the northeastern part of Thailand. Datasets collected during 2004-2006 from the Chalermprakiat Royal Rain Making Research Center at Hua Hin, Prachuap Khiri khan, the Chalermprakiat Royal Rain Making Research Center at Pimai, Nakhon Ratchasima and Thai Meteorological Department (TMD). A total of 179 records with 57 features was merged and matched by unique date. There are three main parts in this work. Firstly, a decision tree induction algorithm (C4.5) was used to classify the rain status into either rain or no-rain. The overall accuracy of classification tree achieves 94.41% with the five-fold cross validation. The C4.5 algorithm was also used to classify the rain amount into three classes as no-rain (0-0.1 mm.), few-rain (0.1- 10 mm.), and moderate-rain (>10 mm.) and the overall accuracy of classification tree achieves 62.57%. Secondly, an ANN was applied to predict the rainfall amount and the root mean square error (RMSE) were used to measure the training and testing errors of the ANN. It is found that the ANN yields a lower RMSE at 0.171 for daily rainfall estimates, when compared to next-day and next-2-day estimation. Thirdly, the ANN and SVM techniques were also used to classify the rain amount into three classes as no-rain, few-rain, and moderate-rain as above. The results achieved in 68.15% and 69.10% of overall accuracy of same-day prediction for the ANN and SVM models, respectively. The obtained results illustrated the comparison of the predictive power of different methods for rainfall estimation.

Keywords: Machine learning, decision tree, artificial neural network, support vector machine, root mean square error.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3178

4 Libretto Thematology in Rossini's Operas and Its Formation by the Composer

Authors: Areti Tziboula, Anna-Maria Rentzeperi-Tsonou

Abstract:

The present study examines the way Gioachino Rossini’s librettos are selected and formed demonstrating the evolutionary trajectory of the composer during his operatic career. Rossini, a dominant figure in the early 19th century Italian opera, is demanding in his choice of librettos and has a preference for subjects inspired by European literature, of his time or earlier. He begins his operatic career with farsae and operas buffae, but he mainly continues with operas seriae, to end it with a grand opera that conforms to the spirit of romanticism as manifested in Paris of his time. His farsae, operas buffae and comic operas in general are representative of the trends of the time: in some the irrational and the exaggeration prevail, in others the upheavals, others are semi-serious and emotional with a happy ending and others are comedies with more realistic characters, but usually the styles are mixed and complement each other. The stories that refer to his modern era unfold mocking human characters, beliefs attitudes and their expressions in every day habits, satirizing current affairs, presenting innovative elements in dramatic intervention and dealing with a variety of social and national issues. Count Ory, his final comic work, consists of a complex witty urban comic opera entwined with romantic sensitivity. The themes he chooses for his operas seriae are characterized by tragic passion, take place in the era of the Trojan War, the Roman Empire, the Middle Ages, and the Age of the Crusades and are set in Italy, England, Poland, Greece, Switzerland, Israel and Egypt. In his early works he sketches the characters remotely, objectively and with static, reflexive emotional expression and a happy ending. Then he continues with operas for the San Carlo Theater, which are characterized by experimentation and innovation to end up his Italian operatic career with the ostensibly backward but in fact tragic Semiramis followed in Paris by William Tell, his ultimate dramatic achievement. There are indirect references to burning issues of his era but the censorship of the time does not allow direct reference to topics that would upset the status quo. In addition, Rossini lives in a temporal period of peace after the Napoleonic Wars and by temperament he resists openly engaging in political strife. Furthermore, the need for survival necessitates the search for the more profitable contracts. In conclusion, Rossini, as a liberal personality, shapes his librettos without interruptions or setbacks, with ideas that come out after a lot of thought and a strong sense of purpose. He moves from the moral and aesthetic clarity of the classic tradition of his early works to a more elaborate and morally ambiguous romantic style in a moderate and hesitant way.

Keywords: Gioachino Rossini, libretto, nineteenth century music, opera.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 316

3 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1131