Abstracts | Computer and Systems Engineering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 411

World Academy of Science, Engineering and Technology

[Computer and Systems Engineering]

Online ISSN : 1307-6892

81 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 216
80 Programming without Code: An Approach and Environment to Conditions-On-Data Programming

Authors: Philippe Larvet

Abstract:

This paper presents the concept of an object-based programming language where tests (if... then... else) and control structures (while, repeat, for...) disappear and are replaced by conditions on data. According to the object paradigm, by using this concept, data are still embedded inside objects, as variable-value couples, but object methods are expressed into the form of logical propositions (‘conditions on data’ or COD).For instance : variable1 = value1 AND variable2 > value2 => variable3 = value3. Implementing this approach, a central inference engine turns and examines objects one after another, collecting all CODs of each object. CODs are considered as rules in a rule-based system: the left part of each proposition (left side of the ‘=>‘ sign) is the premise and the right part is the conclusion. So, premises are evaluated and conclusions are fired. Conclusions modify the variable-value couples of the object and the engine goes to examine the next object. The paper develops the principles of writing CODs instead of complex algorithms. Through samples, the paper also presents several hints for implementing a simple mechanism able to process this ‘COD language’. The proposed approach can be used within the context of simulation, process control, industrial systems validation, etc. By writing simple and rigorous conditions on data, instead of using classical and long-to-learn languages, engineers and specialists can easily simulate and validate the functioning of complex systems.

Keywords: conditions on data, logical proposition, programming without code, object-oriented programming, system simulation, system validation

Procedia PDF Downloads 195
79 Generic Early Warning Signals for Program Student Withdrawals: A Complexity Perspective Based on Critical Transitions and Fractals

Authors: Sami Houry

Abstract:

Complex systems exhibit universal characteristics as they near a tipping point. Among them are common generic early warning signals which precede critical transitions. These signals include: critical slowing down in which the rate of recovery from perturbations decreases over time; an increase in the variance of the state variable; an increase in the skewness of the state variable; an increase in the autocorrelations of the state variable; flickering between different states; and an increase in spatial correlations over time. The presence of the signals has management implications, as the identification of the signals near the tipping point could allow management to identify intervention points. Despite the applications of the generic early warning signals in various scientific fields, such as fisheries, ecology and finance, a review of literature did not identify any applications that address the program student withdrawal problem at the undergraduate distance universities. This area could benefit from the application of generic early warning signals as the program withdrawal rate amongst distance students is higher than the program withdrawal rate at face-to-face conventional universities. This research specifically assessed the generic early warning signals through an intensive case study of undergraduate program student withdrawal at a Canadian distance university. The university is non-cohort based due to its system of continuous course enrollment where students can enroll in a course at the beginning of every month. The assessment of the signals was achieved through the comparison of the incidences of generic early warning signals among students who withdrew or simply became inactive in their undergraduate program of study, the true positives, to the incidences of the generic early warning signals among graduates, the false positives. This was achieved through significance testing. Research findings showed support for the signal pertaining to the rise in flickering which is represented in the increase in the student’s non-pass rates prior to withdrawing from a program; moderate support for the signals of critical slowing down as reflected in the increase in the time a student spends in a course; and moderate support for the signals on increase in autocorrelation and increase in variance in the grade variable. The findings did not support the signal on the increase in skewness of the grade variable. The research also proposes a new signal based on the fractal-like characteristic of student behavior. The research also sought to extend knowledge by investigating whether the emergence of a program withdrawal status is self-similar or fractal-like at multiple levels of observation, specifically the program level and the course level. In other words, whether the act of withdrawal at the program level is also present at the course level. The findings moderately supported self-similarity as a potential signal. Overall, the assessment of the signals suggests that the signals, with the exception with the increase of skewness, could be utilized as a predictive management tool and potentially add one more tool, the fractal-like characteristic of withdrawal, as an additional signal in addressing the student program withdrawal problem.

Keywords: critical transitions, fractals, generic early warning signals, program student withdrawal

Procedia PDF Downloads 159
78 Foslip Loaded and CEA-Affimer Functionalised Silica Nanoparticles for Fluorescent Imaging of Colorectal Cancer Cells

Authors: Yazan S. Khaled, Shazana Shamsuddin, Jim Tiernan, Mike McPherson, Thomas Hughes, Paul Millner, David G. Jayne

Abstract:

Introduction: There is a need for real-time imaging of colorectal cancer (CRC) to allow tailored surgery to the disease stage. Fluorescence guided laparoscopic imaging of primary colorectal cancer and the draining lymphatics would potentially bring stratified surgery into clinical practice and realign future CRC management to the needs of patients. Fluorescent nanoparticles can offer many advantages in terms of intra-operative imaging and therapy (theranostic) in comparison with traditional soluble reagents. Nanoparticles can be functionalised with diverse reagents and then targeted to the correct tissue using an antibody or Affimer (artificial binding protein). We aimed to develop and test fluorescent silica nanoparticles and targeted against CRC using an anti-carcinoembryonic antigen (CEA) Affimer (Aff). Methods: Anti-CEA and control Myoglobin Affimer binders were subcloned into the expressing vector pET11 followed by transformation into BL21 Star™ (DE3) E.coli. The expression of Affimer binders was induced using 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were harvested, lysed and purified using nickle chelating affinity chromatography. The photosensitiser Foslip (soluble analogue of 5,10,15,20-Tetra(m-hydroxyphenyl) chlorin) was incorporated into the core of silica nanoparticles using water-in-oil microemulsion technique. Anti-CEA or control Affs were conjugated to silica nanoparticles surface using sulfosuccinimidyl-4-(N-maleimidomethyl) cyclohexane-1-carboxylate (sulfo SMCC) chemical linker. Binding of CEA-Aff or control nanoparticles to colorectal cancer cells (LoVo, LS174T and HC116) was quantified in vitro using confocal microscopy. Results: The molecular weights of the obtained band of Affimers were ~12.5KDa while the diameter of functionalised silica nanoparticles was ~80nm. CEA-Affimer targeted nanoparticles demonstrated 9.4, 5.8 and 2.5 fold greater fluorescence than control in, LoVo, LS174T and HCT116 cells respectively (p < 0.002) for the single slice analysis. A similar pattern of successful CEA-targeted fluorescence was observed in the maximum image projection analysis, with CEA-targeted nanoparticles demonstrating 4.1, 2.9 and 2.4 fold greater fluorescence than control particles in LoVo, LS174T, and HCT116 cells respectively (p < 0.0002). There was no significant difference in fluorescence for CEA-Affimer vs. CEA-Antibody targeted nanoparticles. Conclusion: We are the first to demonstrate that Foslip-doped silica nanoparticles conjugated to anti-CEA Affimers via SMCC allowed tumour cell-specific fluorescent targeting in vitro, and had shown sufficient promise to justify testing in an animal model of colorectal cancer. CEA-Affimer appears to be a suitable targeting molecule to replace CEA-Antibody. Targeted silica nanoparticles loaded with Foslip photosensitiser is now being optimised to drive photodynamic killing, via reactive oxygen generation.

Keywords: colorectal cancer, silica nanoparticles, Affimers, antibodies, imaging

Procedia PDF Downloads 214
77 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 232
76 Assembly Training: An Augmented Reality Approach Using Design Science Research

Authors: Stefan Werrlich, Phuc-Anh Nguyen, Kai Nitsche, Gunther Notni

Abstract:

Augmented Reality (AR) is a strong growing research topic. This innovative technology is interesting for several training domains like education, medicine, military, sports and industrial use cases like assembly and maintenance tasks. AR can help to improve the efficiency, quality and transfer of training tasks. Due to these reasons, AR becomes more interesting for big companies and researchers because the industrial domain is still an unexplored field. This paper presents the research proposal of a PhD thesis which is done in cooperation with the BMW Group, aiming to explore head-mounted display (HMD) based training in industrial environments. We give a short introduction, describing the motivation, the underlying problems as well as the five formulated research questions we want to clarify along this thesis. We give a brief overview of the current assembly training in industrial environments and present some AR-based training approaches, including their research deficits. We use the Design Science Research (DSR) framework for this thesis and describe how we want to realize the seven guidelines, mandatory from the DSR. Furthermore, we describe each methodology which we use within that framework and present our approach in a comprehensive figure, representing the entire thesis.

Keywords: assembly, augmented reality, research proposal, training

Procedia PDF Downloads 222
75 A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

The increasing amount of collected data has limited the performance of the current analyzing algorithms. Thus, developing new cost-effective algorithms in terms of complexity, scalability, and accuracy raised significant interests. In this paper, a modified effective k-means based algorithm is developed and experimented. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee the convergence. Conducted experiments on a real data set show its high performance when compared with the original k-means version.

Keywords: pattern recognition, global terrorism database, Manhattan distance, k-means clustering, terrorism data analysis

Procedia PDF Downloads 350
74 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in the data pre-processing. Missing values creates a noisy environment in all applications and it is an unavoidable problem in big data management and analysis. Numerous techniques likes discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques and hot deck imputation have been introduced by researchers for handling missing data. Among these, imputation techniques plays a positive role in filling missing values when it is necessary to use all records in the data and not to discard records with missing values. In this paper we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2(ART2) for imputation of missing values in mixed attribute data sets. The process of ART2 can recognize learned models fast and be adapted to new objects rapidly. It carries out model-based clustering by using competitive learning and self-steady mechanism in dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling the outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 248
73 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from the road accident data in United Kingdom (UK), using data analytics method, for avoiding possible accidents in rural and urban areas. This analysis make use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over a period of time, this work primarily proposes a new framework model which can be trained and adapt itself to new data and make accurate predictions. This work also throws some light on use of SVM’s methodology for text classifiers from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVMs methodology appropriate for this kind of research work.

Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)

Procedia PDF Downloads 246
72 Usability and Biometric Authentication of Electronic Voting System

Authors: Nighat Ayub, Masood Ahmad

Abstract:

In this paper, a new voting system is developed and its usability is evaluated. The main feature of this system is the biometric verification of the voter and then a few easy steps to cast a vote. As compared to existing systems available, e.g dual vote, the new system requires no training in advance. The security is achieved via multiple key concept (another part of this project). More than 100 student voters were participated in the election from University of Malakanad, Chakdara, PK. To achieve the reliability, the voters cast their votes in two ways, i.e. paper based and electronic based voting using our new system. The results of paper based and electronic voting system are compared and it is concluded that the voters cast their votes for the intended candidates on the electronic voting system. The voters were requested to fill a questionnaire and the results of the questionnaire are carefully analyzed. The results show that the new system proposed in this paper is more secure and usable than other systems.

Keywords: e-voting, security, usability, authentication

Procedia PDF Downloads 357
71 Use of Personal Rhythm to Authenticate Encrypted Messages

Authors: Carlos Gonzalez

Abstract:

When communicating using private and secure keys, there is always the doubt as to the identity of the message creator. We introduce an algorithm that uses the personal typing rhythm (keystroke dynamics) of the message originator to increase the trust of the authenticity of the message originator by the message recipient. The methodology proposes the use of a Rhythm Certificate Authority (RCA) to validate rhythm information. An illustrative example of the communication between Bob and Alice and the RCA is included. An algorithm of how to communicate with the RCA is presented. This RCA can be an independent authority or an enhanced Certificate Authority like the one used in public key infrastructure (PKI).

Keywords: authentication, digital signature, keystroke dynamics, personal rhythm, public-key encryption

Procedia PDF Downloads 266
70 Installing Cloud Computing Model for E-Businesses in Small Organizations

Authors: Khader Titi

Abstract:

Information technology developments have changed the way how businesses are working. Organizations are required to become visible online and stay connected to take advantages of costs reduction and improved operation of existing resources. The approval and the application areas of the cloud computing has significantly increased since it was presented by Google in 2007. Internet Cloud computing has attracted the IT enterprise attention especially the e-business enterprise. At this time, there is a great issue of environmental costs during the enterprises apply the e- business, but with the coming of cloud computing, most of the problem will be solved. Organizations around the world are facing with the continued budget challenges and increasing in the size of their computational data so, they need to find a way to deliver their services to clients as economically as possible without negotiating the achievement of anticipated outcomes. E- business companies need to provide better services to satisfy their clients. In this research, the researcher proposed a paradigm that use and deploy cloud computing technology environment to be used for e-business in small enterprises. Cloud computing might be a suitable model for implementing e-business and e-commerce architecture to improve efficiency and user satisfaction.

Keywords: E-commerce, cloud computing, B2C, SaaS

Procedia PDF Downloads 289
69 Learning Dynamic Representations of Nodes in Temporally Variant Graphs

Authors: Sandra Mitrovic, Gaurav Singh

Abstract:

In many industries, including telecommunications, churn prediction has been a topic of active research. A lot of attention has been drawn on devising the most informative features, and this area of research has gained even more focus with spread of (social) network analytics. The call detail records (CDRs) have been used to construct customer networks and extract potentially useful features. However, to the best of our knowledge, no studies including network features have yet proposed a generic way of representing network information. Instead, ad-hoc and dataset dependent solutions have been suggested. In this work, we build upon a recently presented method (node2vec) to obtain representations for nodes in observed network. The proposed approach is generic and applicable to any network and domain. Unlike node2vec, which assumes a static network, we consider a dynamic and time-evolving network. To account for this, we propose an approach that constructs the feature representation of each node by generating its node2vec representations at different timestamps, concatenating them and finally compressing using an auto-encoder-like method in order to retain reasonably long and informative feature vectors. We test the proposed method on churn prediction task in telco domain. To predict churners at timestamp ts+1, we construct training and testing datasets consisting of feature vectors from time intervals [t1, ts-1] and [t2, ts] respectively, and use traditional supervised classification models like SVM and Logistic Regression. Observed results show the effectiveness of proposed approach as compared to ad-hoc feature selection based approaches and static node2vec.

Keywords: churn prediction, dynamic networks, node2vec, auto-encoders

Procedia PDF Downloads 288
68 Reconfigurable Device for 3D Visualization of Three Dimensional Surfaces

Authors: Robson da C. Santos, Carlos Henrique de A. S. P. Coutinho, Lucas Moreira Dias, Gerson Gomes Cunha

Abstract:

The article refers to the development of an augmented reality 3D display, through the control of servo motors and projection of image with aid of video projector on the model. Augmented Reality is a branch that explores multiple approaches to increase real-world view by viewing additional information along with the real scene. The article presents the broad use of electrical, electronic, mechanical and industrial automation for geospatial visualizations, applications in mathematical models with the visualization of functions and 3D surface graphics and volumetric rendering that are currently seen in 2D layers. Application as a 3D display for representation and visualization of Digital Terrain Model (DTM) and Digital Surface Models (DSM), where it can be applied in the identification of canyons in the marine area of the Campos Basin, Rio de Janeiro, Brazil. The same can execute visualization of regions subject to landslides, as in Serra do Mar - Agra dos Reis and Serranas cities both in the State of Rio de Janeiro. From the foregoing, loss of human life and leakage of oil from pipelines buried in these regions may be anticipated in advance. The physical design consists of a table consisting of a 9 x 16 matrix of servo motors, totalizing 144 servos, a mesh is used on the servo motors for visualization of the models projected by a retro projector. Each model for by an image pre-processing, is sent to a server to be converted and viewed from a software developed in C # Programming Language.

Keywords: visualization, 3D models, servo motors, C# programming language

Procedia PDF Downloads 306
67 Taxonomic Classification for Living Organisms Using Convolutional Neural Networks

Authors: Saed Khawaldeh, Mohamed Elsharnouby, Alaa Eddin Alchalabi, Usama Pervaiz, Tajwar Aleef, Vu Hoang Minh

Abstract:

Taxonomic classification has a wide-range of applications such as finding out more about the evolutionary history of organisms that can be done by making a comparison between species living now and species that lived in the past. This comparison can be made using different kinds of extracted species’ data which include DNA sequences. Compared to the estimated number of the organisms that nature harbours, humanity does not have a thorough comprehension of which specific species they all belong to, in spite of the significant development of science and scientific knowledge over many years. One of the methods that can be applied to extract information out of the study of organisms in this regard is to use the DNA sequence of a living organism as a marker, thus making it available to classify it into a taxonomy. The classification of living organisms can be done in many machine learning techniques including Neural Networks (NNs). In this study, DNA sequences classification is performed using Convolutional Neural Networks (CNNs) which is a special type of NNs.

Keywords: deep networks, convolutional neural networks, taxonomic classification, DNA sequences classification

Procedia PDF Downloads 404
66 Modern Detection and Description Methods for Natural Plants Recognition

Authors: Masoud Fathi Kazerouni, Jens Schlemper, Klaus-Dieter Kuhnert

Abstract:

Green planet is one of the Earth’s names which is known as a terrestrial planet and also can be named the fifth largest planet of the solar system as another scientific interpretation. Plants do not have a constant and steady distribution all around the world, and even plant species’ variations are not the same in one specific region. Presence of plants is not only limited to one field like botany; they exist in different fields such as literature and mythology and they hold useful and inestimable historical records. No one can imagine the world without oxygen which is produced mostly by plants. Their influences become more manifest since no other live species can exist on earth without plants as they form the basic food staples too. Regulation of water cycle and oxygen production are the other roles of plants. The roles affect environment and climate. Plants are the main components of agricultural activities. Many countries benefit from these activities. Therefore, plants have impacts on political and economic situations and future of countries. Due to importance of plants and their roles, study of plants is essential in various fields. Consideration of their different applications leads to focus on details of them too. Automatic recognition of plants is a novel field to contribute other researches and future of studies. Moreover, plants can survive their life in different places and regions by means of adaptations. Therefore, adaptations are their special factors to help them in hard life situations. Weather condition is one of the parameters which affect plants life and their existence in one area. Recognition of plants in different weather conditions is a new window of research in the field. Only natural images are usable to consider weather conditions as new factors. Thus, it will be a generalized and useful system. In order to have a general system, distance from the camera to plants is considered as another factor. The other considered factor is change of light intensity in environment as it changes during the day. Adding these factors leads to a huge challenge to invent an accurate and secure system. Development of an efficient plant recognition system is essential and effective. One important component of plant is leaf which can be used to implement automatic systems for plant recognition without any human interface and interaction. Due to the nature of used images, characteristic investigation of plants is done. Leaves of plants are the first characteristics to select as trusty parts. Four different plant species are specified for the goal to classify them with an accurate system. The current paper is devoted to principal directions of the proposed methods and implemented system, image dataset, and results. The procedure of algorithm and classification is explained in details. First steps, feature detection and description of visual information, are outperformed by using Scale invariant feature transform (SIFT), HARRIS-SIFT, and FAST-SIFT methods. The accuracy of the implemented methods is computed. In addition to comparison, robustness and efficiency of results in different conditions are investigated and explained.

Keywords: SIFT combination, feature extraction, feature detection, natural images, natural plant recognition, HARRIS-SIFT, FAST-SIFT

Procedia PDF Downloads 243
65 Perception-Oriented Model Driven Development for Designing Data Acquisition Process in Wireless Sensor Networks

Authors: K. Indra Gandhi

Abstract:

Wireless Sensor Networks (WSNs) have always been characterized for application-specific sensing, relaying and collection of information for further analysis. However, software development was not considered as a separate entity in this process of data collection which has posed severe limitations on the software development for WSN. Software development for WSN is a complex process since the components involved are data-driven, network-driven and application-driven in nature. This implies that there is a tremendous need for the separation of concern from the software development perspective. A layered approach for developing data acquisition design based on Model Driven Development (MDD) has been proposed as the sensed data collection process itself varies depending upon the application taken into consideration. This work focuses on the layered view of the data acquisition process so as to ease the software point of development. A metamodel has been proposed that enables reusability and realization of the software development as an adaptable component for WSN systems. Further, observing users perception indicates that proposed model helps in improving the programmer's productivity by realizing the collaborative system involved.

Keywords: data acquisition, model-driven development, separation of concern, wireless sensor networks

Procedia PDF Downloads 406
64 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis

Procedia PDF Downloads 678
63 Localization of Geospatial Events and Hoax Prediction in the UFO Database

Authors: Harish Krishnamurthy, Anna Lafontant, Ren Yi

Abstract:

Unidentified Flying Objects (UFOs) have been an interesting topic for most enthusiasts and hence people all over the United States report such findings online at the National UFO Report Center (NUFORC). Some of these reports are a hoax and among those that seem legitimate, our task is not to establish that these events confirm that they indeed are events related to flying objects from aliens in outer space. Rather, we intend to identify if the report was a hoax as was identified by the UFO database team with their existing curation criterion. However, the database provides a wealth of information that can be exploited to provide various analyses and insights such as social reporting, identifying real-time spatial events and much more. We perform analysis to localize these time-series geospatial events and correlate with known real-time events. This paper does not confirm any legitimacy of alien activity, but rather attempts to gather information from likely legitimate reports of UFOs by studying the online reports. These events happen in geospatial clusters and also are time-based. We look at cluster density and data visualization to search the space of various cluster realizations to decide best probable clusters that provide us information about the proximity of such activity. A random forest classifier is also presented that is used to identify true events and hoax events, using the best possible features available such as region, week, time-period and duration. Lastly, we show the performance of the scheme on various days and correlate with real-time events where one of the UFO reports strongly correlates to a missile test conducted in the United States.

Keywords: time-series clustering, feature extraction, hoax prediction, geospatial events

Procedia PDF Downloads 345
62 Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features

Authors: Tharini N. de Silva, Xiao Zhibo, Zhao Rui, Mao Kezhi

Abstract:

Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.

Keywords: causal realtion extraction, relation extracton, convolutional neural network, text representation

Procedia PDF Downloads 685
61 Evolving Knowledge Extraction from Online Resources

Authors: Zhibo Xiao, Tharini Nayanika de Silva, Kezhi Mao

Abstract:

In this paper, we present an evolving knowledge extraction system named AKEOS (Automatic Knowledge Extraction from Online Sources). AKEOS consists of two modules, including a one-time learning module and an evolving learning module. The one-time learning module takes in user input query, and automatically harvests knowledge from online unstructured resources in an unsupervised way. The output of the one-time learning is a structured vector representing the harvested knowledge. The evolving learning module automatically schedules and performs repeated one-time learning to extract the newest information and track the development of an event. In addition, the evolving learning module summarizes the knowledge learned at different time points to produce a final knowledge vector about the event. With the evolving learning, we are able to visualize the key information of the event, discover the trends, and track the development of an event.

Keywords: evolving learning, knowledge extraction, knowledge graph, text mining

Procedia PDF Downloads 434
60 Metric Suite for Schema Evolution of a Relational Database

Authors: S. Ravichandra, D. V. L. N. Somayajulu

Abstract:

Requirement of stakeholders for adding more details to the database is the main cause of the schema evolution in the relational database. Further, this schema evolution causes the instability to the database. Hence, it is aimed to define a metric suite for schema evolution of a relational database. The metric suite will calculate the metrics based on the features of the database, analyse the queries on the database and measures the coupling, cohesion and component dependencies of the schema for existing and evolved versions of the database. This metric suite will also provide an indicator for the problems related to the stability and usability of the evolved database. The degree of change in the schema of a database is presented in the forms of graphs that acts as an indicator and also provides the relations between various parameters (metrics) related to the database architecture. The acquired information is used to defend and improve the stability of database architecture. The challenges arise in incorporating these metrics with varying parameters for formulating a suitable metric suite are discussed. To validate the proposed metric suite, an experimentation has been performed on publicly available datasets.

Keywords: cohesion, coupling, entropy, metric suite, schema evolution

Procedia PDF Downloads 421
59 MBES-CARIS Data Validation for the Bathymetric Mapping of Shallow Water in the Kingdom of Bahrain on the Arabian Gulf

Authors: Abderrazak Bannari, Ghadeer Kadhem

Abstract:

The objectives of this paper are the validation and the evaluation of MBES-CARIS BASE surface data performance for bathymetric mapping of shallow water in the Kingdom of Bahrain. The latter is an archipelago with a total land area of about 765.30 km², approximately 126 km of coastline and 8,000 km² of marine area, located in the Arabian Gulf, east of Saudi Arabia and west of Qatar (26° 00’ N, 50° 33’ E). To achieve our objectives, bathymetric attributed grid files (X, Y, and depth) generated from the coverage of ship-track MBSE data with 300 x 300 m cells, processed with CARIS-HIPS, were downloaded from the General Bathymetric Chart of the Oceans (GEBCO). Then, brought into ArcGIS and converted into a raster format following five steps: Exportation of GEBCO BASE surface data to the ASCII file; conversion of ASCII file to a points shape file; extraction of the area points covering the water boundary of the Kingdom of Bahrain and multiplying the depth values by -1 to get the negative values. Then, the simple Kriging method was used in ArcMap environment to generate a new raster bathymetric grid surface of 30×30 m cells, which was the basis of the subsequent analysis. Finally, for validation purposes, 2200 bathymetric points were extracted from a medium scale nautical map (1:100 000) considering different depths over the Bahrain national water boundary. The nautical map was scanned, georeferenced and overlaid on the MBES-CARIS generated raster bathymetric grid surface (step 5 above), and then homologous depth points were selected. Statistical analysis, expressed as a linear error at the 95% confidence level, showed a strong correlation coefficient (R² = 0.96) and a low RMSE (± 0.57 m) between the nautical map and derived MBSE-CARIS depths if we consider only the shallow areas with depths of less than 10 m (about 800 validation points). When we consider only deeper areas (> 10 m) the correlation coefficient is equal to 0.73 and the RMSE is equal to ± 2.43 m while if we consider the totality of 2200 validation points including all depths, the correlation coefficient is still significant (R² = 0.81) with satisfactory RMSE (± 1.57 m). Certainly, this significant variation can be caused by the MBSE that did not completely cover the bottom in several of the deeper pockmarks because of the rapid change in depth. In addition, steep slopes and the rough seafloor probably affect the acquired MBSE raw data. In addition, the interpolation of missed area values between MBSE acquisition swaths-lines (ship-tracked sounding data) may not reflect the true depths of these missed areas. However, globally the results of the MBES-CARIS data are very appropriate for bathymetric mapping of shallow water areas.

Keywords: bathymetry mapping, multibeam echosounder systems, CARIS-HIPS, shallow water

Procedia PDF Downloads 357
58 Potential of Hyperion (EO-1) Hyperspectral Remote Sensing for Detection and Mapping Mine-Iron Oxide Pollution

Authors: Abderrazak Bannari

Abstract:

Acid Mine Drainage (AMD) from mine wastes and contaminations of soils and water with metals are considered as a major environmental problem in mining areas. It is produced by interactions of water, air, and sulphidic mine wastes. This environment problem results from a series of chemical and biochemical oxidation reactions of sulfide minerals e.g. pyrite and pyrrhotite. These reactions lead to acidity as well as the dissolution of toxic and heavy metals (Fe, Mn, Cu, etc.) from tailings waste rock piles, and open pits. Soil and aquatic ecosystems could be contaminated and, consequently, human health and wildlife will be affected. Furthermore, secondary minerals, typically formed during weathering of mine waste storage areas when the concentration of soluble constituents exceeds the corresponding solubility product, are also important. The most common secondary mineral compositions are hydrous iron oxide (goethite, etc.) and hydrated iron sulfate (jarosite, etc.). The objectives of this study focus on the detection and mapping of MIOP in the soil using Hyperion EO-1 (Earth Observing - 1) hyperspectral data and constrained linear spectral mixture analysis (CLSMA) algorithm. The abandoned Kettara mine, located approximately 35 km northwest of Marrakech city (Morocco) was chosen as study area. During 44 years (from 1938 to 1981) this mine was exploited for iron oxide and iron sulphide minerals. Previous studies have shown that Kettara surrounding soils are contaminated by heavy metals (Fe, Cu, etc.) as well as by secondary minerals. To achieve our objectives, several soil samples representing different MIOP classes have been resampled and located using accurate GPS ( ≤ ± 30 cm). Then, endmembers spectra were acquired over each sample using an Analytical Spectral Device (ASD) covering the spectral domain from 350 to 2500 nm. Considering each soil sample separately, the average of forty spectra was resampled and convolved using Gaussian response profiles to match the bandwidths and the band centers of the Hyperion sensor. Moreover, the MIOP content in each sample was estimated by geochemical analyses in the laboratory, and a ground truth map was generated using simple Kriging in GIS environment for validation purposes. The acquired and used Hyperion data were corrected for a spatial shift between the VNIR and SWIR detectors, striping, dead column, noise, and gain and offset errors. Then, atmospherically corrected using the MODTRAN 4.2 radiative transfer code, and transformed to surface reflectance, corrected for sensor smile (1-3 nm shift in VNIR and SWIR), and post-processed to remove residual errors. Finally, geometric distortions and relief displacement effects were corrected using a digital elevation model. The MIOP fraction map was extracted using CLSMA considering the entire spectral range (427-2355 nm), and validated by reference to the ground truth map generated by Kriging. The obtained results show the promising potential of the proposed methodology for the detection and mapping of mine iron oxide pollution in the soil.

Keywords: hyperion eo-1, hyperspectral, mine iron oxide pollution, environmental impact, unmixing

Procedia PDF Downloads 203
57 The Enhancement of Target Localization Using Ship-Borne Electro-Optical Stabilized Platform

Authors: Jaehoon Ha, Byungmo Kang, Kilho Hong, Jungsoo Park

Abstract:

Electro-optical (EO) stabilized platforms have been widely used for surveillance and reconnaissance on various types of vehicles, from surface ships to unmanned air vehicles (UAVs). EO stabilized platforms usually consist of an assembly of structure, bearings, and motors called gimbals in which a gyroscope is installed. EO elements such as a CCD camera and IR camera, are mounted to a gimbal, which has a range of motion in elevation and azimuth and can designate and track a target. In addition, a laser range finder (LRF) can be added to the gimbal in order to acquire the precise slant range from the platform to the target. Recently, a versatile functionality of target localization is needed in order to cooperate with the weapon systems that are mounted on the same platform. The target information, such as its location or velocity, needed to be more accurate. The accuracy of the target information depends on diverse component errors and alignment errors of each component. Specially, the type of moving platform can affect the accuracy of the target information. In the case of flying platforms, or UAVs, the target location error can be increased with altitude so it is important to measure altitude as precisely as possible. In the case of surface ships, target location error can be increased with obliqueness of the elevation angle of the gimbal since the altitude of the EO stabilized platform is supposed to be relatively low. The farther the slant ranges from the surface ship to the target, the more extreme the obliqueness of the elevation angle. This can hamper the precise acquisition of the target information. So far, there have been many studies on EO stabilized platforms of flying vehicles. However, few researchers have focused on ship-borne EO stabilized platforms of the surface ship. In this paper, we deal with a target localization method when an EO stabilized platform is located on the mast of a surface ship. Especially, we need to overcome the limitation caused by the obliqueness of the elevation angle of the gimbal. We introduce a well-known approach for target localization using Unscented Kalman Filter (UKF) and present the problem definition showing the above-mentioned limitation. Finally, we want to show the effectiveness of the approach that will be demonstrated through computer simulations.

Keywords: target localization, ship-borne electro-optical stabilized platform, unscented kalman filter

Procedia PDF Downloads 491
56 Importance of Remote Sensing and Information Communication Technology to Improve Climate Resilience in Low Land of Ethiopia

Authors: Hasen Keder Edris, Ryuji Matsunaga, Toshi Yamanaka

Abstract:

The issue of climate change and its impact is a major contemporary global concern. Ethiopia is one of the countries experiencing adverse climate change impact including frequent extreme weather events that are exacerbating drought and water scarcity. Due to this reason, the government of Ethiopia develops a strategic document which focuses on the climate resilience green economy. One of the major components of the strategic framework is designed to improve community adaptation capacity and mitigation of drought. For effective implementation of the strategy, identification of regions relative vulnerability to drought is vital. There is a growing tendency of applying Geographic Information System (GIS) and Remote Sensing technologies for collecting information on duration and severity of drought by direct measure of the topography as well as an indirect measure of land cover. This study aims to show an application of remote sensing technology and GIS for developing drought vulnerability index by taking lowland of Ethiopia as a case study. In addition, it assesses integrated Information Communication Technology (ICT) potential of Ethiopia lowland and proposes integrated solution. Satellite data is used to detect the beginning of the drought. The severity of drought risk prone areas of livestock keeping pastoral is analyzed through normalized difference vegetation index (NDVI) and ten years rainfall data. The change from the existing and average SPOT NDVI and vegetation condition index is used to identify the onset of drought and potential risks. Secondary data is used to analyze geographical coverage of mobile and internet usage in the region. For decades, the government of Ethiopia introduced some technologies and approach to overcoming climate change related problems. However, lack of access to information and inadequate technical support for the pastoral area remains a major challenge. In conventional business as usual approach, the lowland pastorals continue facing a number of challenges. The result indicated that 80% of the region face frequent drought occurrence and out of this 60% of pastoral area faces high drought risk. On the other hand, the target area mobile phone and internet coverage is rapidly growing. One of identified ICT solution enabler technology is telecom center which covers 98% of the region. It was possible to identify the frequently affected area and potential drought risk using the NDVI remote-sensing data analyses. We also found that ICT can play an important role in mitigating climate change challenge. Hence, there is a need to strengthen implementation efforts of climate change adaptation through integrated Remote Sensing and web based information dissemination and mobile alert of extreme events.

Keywords: climate changes, ICT, pastoral, remote sensing

Procedia PDF Downloads 279
55 Attributes That Influence Respondents When Choosing a Mate in Internet Dating Sites: An Innovative Matching Algorithm

Authors: Moti Zwilling, Srečko Natek

Abstract:

This paper aims to present an innovative predictive analytics analysis in order to find the best combination between two consumers who strive to find their partner or in internet sites. The methodology shown in this paper is based on analysis of consumer preferences and involves data mining and machine learning search techniques. The study is composed of two parts: The first part examines by means of descriptive statistics the correlations between a set of parameters that are taken between man and women where they intent to meet each other through the social media, usually the internet. In this part several hypotheses were examined and statistical analysis were taken place. Results show that there is a strong correlation between the affiliated attributes of man and woman as long as concerned to how they present themselves in a social media such as "Facebook". One interesting issue is the strong desire to develop a serious relationship between most of the respondents. In the second part, the authors used common data mining algorithms to search and classify the most important and effective attributes that affect the response rate of the other side. Results exhibit that personal presentation and education background are found as most affective to achieve a positive attitude to one's profile from the other mate.

Keywords: dating sites, social networks, machine learning, decision trees, data mining

Procedia PDF Downloads 275
54 A Preliminary Study for Building an Arabic Corpus of Pair Questions-Texts from the Web: Aqa-Webcorp

Authors: Wided Bakari, Patrce Bellot, Mahmoud Neji

Abstract:

With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain applications of natural language processing, including machine translation, information retrieval, question answer, become more and more pressing. In this manuscript, we seek to create and develop our own corpus of pair’s questions-texts. This constitution then will provide a better base for our experimentation step. Thus, we try to model this constitution by a method for Arabic insofar as it recovers texts from the web that could prove to be answers to our factual questions. To do this, we had to develop a java script that can extract from a given query a list of html pages. Then clean these pages to the extent of having a database of texts and a corpus of pair’s question-texts. In addition, we give preliminary results of our proposal method. Some investigations for the construction of Arabic corpus are also presented in this document.

Keywords: Arabic, web, corpus, search engine, URL, question, corpus building, script, Google, html, txt

Procedia PDF Downloads 294
53 Satellite Imagery Classification Based on Deep Convolution Network

Authors: Zhong Ma, Zhuping Wang, Congxin Liu, Xiangzeng Liu

Abstract:

Satellite imagery classification is a challenging problem with many practical applications. In this paper, we designed a deep convolution neural network (DCNN) to classify the satellite imagery. The contributions of this paper are twofold — First, to cope with the large-scale variance in the satellite image, we introduced the inception module, which has multiple filters with different size at the same level, as the building block to build our DCNN model. Second, we proposed a genetic algorithm based method to efficiently search the best hyper-parameters of the DCNN in a large search space. The proposed method is evaluated on the benchmark database. The results of the proposed hyper-parameters search method show it will guide the search towards better regions of the parameter space. Based on the found hyper-parameters, we built our DCNN models, and evaluated its performance on satellite imagery classification, the results show the classification accuracy of proposed models outperform the state of the art method.

Keywords: satellite imagery classification, deep convolution network, genetic algorithm, hyper-parameter optimization

Procedia PDF Downloads 270
52 Neighbor Caring Environment System (NCE) Using Parallel Replication Mechanism

Authors: Ahmad Shukri Mohd Noor, Emma Ahmad Sirajudin, Rabiei Mamat

Abstract:

Pertaining to a particular Marine interest, the process of data sampling could take years before a study can be concluded. Therefore, the need for a robust backup system for the data is invariably implicit. In recent advancement of Marine applications, more functionalities and tools are integrated to assist the work of the researchers. It is anticipated that this modality will continue as research scope widens and intensifies and at the same to follow suit with current technologies and lifestyles. The convenience to collect and share information these days also applies to the work in Marine research. Therefore, Marine system designers should be aware that high availability is a necessary attribute in Marine repository applications as well as a robust backup system for the data. In this paper, the approach to high availability is related both to hardware and software but the focus is more on software. We consider a NABTIC repository system that is primitively built on a single server and does not have replicated components. First, the system is decomposed into separate modules. The modules are placed on multiple servers to create a distributed system. Redundancy is added by placing the copies of the modules on different servers using Neighbor Caring Environment System(NCES) technique. NCER is utilizing parallel replication components mechanism. A background monitoring is established to check servers’ heartbeats to confirm their aliveness. At the same time, a critical adaptive threshold is maintained to make sure a failure is timely detected using Adaptive Fault Detection (AFD). A confirmed failure will set the recovery mode where a selection process will be done before a fail-over server is instructed. In effect, the Marine repository service is continued as the fail-over masks a recent failure. The performance of the new prototype is tested and is confirmed to be more highly available. Furthermore, the downtime is not noticeable as service is immediately restored automatically. The Marine repository system is said to have achieved fault tolerance.

Keywords: availability, fault detection, replication, fault tolerance, marine application

Procedia PDF Downloads 287