Search results for: text mining.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1058

Search results for: text mining.

158 Optimizing Spatial Trend Detection By Artificial Immune Systems

Authors: M. Derakhshanfar, B. Minaei-Bidgoli

Abstract:

Spatial trends are one of the valuable patterns in geo databases. They play an important role in data analysis and knowledge discovery from spatial data. A spatial trend is a regular change of one or more non spatial attributes when spatially moving away from a start object. Spatial trend detection is a graph search problem therefore heuristic methods can be good solution. Artificial immune system (AIS) is a special method for searching and optimizing. AIS is a novel evolutionary paradigm inspired by the biological immune system. The models based on immune system principles, such as the clonal selection theory, the immune network model or the negative selection algorithm, have been finding increasing applications in fields of science and engineering. In this paper, we develop a novel immunological algorithm based on clonal selection algorithm (CSA) for spatial trend detection. We are created neighborhood graph and neighborhood path, then select spatial trends that their affinity is high for antibody. In an evolutionary process with artificial immune algorithm, affinity of low trends is increased with mutation until stop condition is satisfied.

Keywords: Spatial Data Mining, Spatial Trend Detection, Heuristic Methods, Artificial Immune System, Clonal Selection Algorithm (CSA)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2048
157 Speaker Identification using Neural Networks

Authors: R.V Pawar, P.P.Kajave, S.N.Mali

Abstract:

The speech signal conveys information about the identity of the speaker. The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. As speech interaction with computers becomes more pervasive in activities such as the telephone, financial transactions and information retrieval from speech databases, the utility of automatically identifying a speaker is based solely on vocal characteristic. This paper emphasizes on text dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide speech utterance. System identifies the user by comparing the codebook of speech utterance with those of the stored in the database and lists, which contain the most likely speakers, could have given that speech utterance. The speech signal is recorded for N speakers further the features are extracted. Feature extraction is done by means of LPC coefficients, calculating AMDF, and DFT. The neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features for the speaker who has to be identified are extracted and compared with the stored templates using Back Propogation Algorithm. Here, the trained network corresponds to the output; the input is the extracted features of the speaker to be identified. The network does the weight adjustment and the best match is found to identify the speaker. The number of epochs required to get the target decides the network performance.

Keywords: Average Mean Distance function, Backpropogation, Linear Predictive Coding, MultilayeredPerceptron,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1894
156 Application of Universal Distribution Factors for Real-Time Complex Power Flow Calculation

Authors: Abdullah M. Alodhaiani, Yasir A. Alturki, Mohamed A. Elkady

Abstract:

Complex power flow distribution factors, which relate line complex power flows to the bus injected complex powers, have been widely used in various power system planning and analysis studies. In particular, AC distribution factors have been used extensively in the recent power and energy pricing studies in free electricity market field. As was demonstrated in the existing literature, many of the electricity market related costing studies rely on the use of the distribution factors. These known distribution factors, whether the injection shift factors (ISF’s) or power transfer distribution factors (PTDF’s), are linear approximations of the first order sensitivities of the active power flows with respect to various variables. This paper presents a novel model for evaluating the universal distribution factors (UDF’s), which are appropriate for an extensive range of power systems analysis and free electricity market studies. These distribution factors are used for the calculations of lines complex power flows and its independent of bus power injections, they are compact matrix-form expressions with total flexibility in determining the position on the line at which line flows are measured. The proposed approach was tested on IEEE 9-Bus system. Numerical results demonstrate that the proposed approach is very accurate compared with exact method.

Keywords: Distribution Factors, Power System, Sensitivity Factors, Electricity Market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2627
155 A Weighted-Profiling Using an Ontology Basefor Semantic-Based Search

Authors: Hikmat A. M. Abd-El-Jaber, Tengku M. T. Sembok

Abstract:

The information on the Web increases tremendously. A number of search engines have been developed for searching Web information and retrieving relevant documents that satisfy the inquirers needs. Search engines provide inquirers irrelevant documents among search results, since the search is text-based rather than semantic-based. Information retrieval research area has presented a number of approaches and methodologies such as profiling, feedback, query modification, human-computer interaction, etc for improving search results. Moreover, information retrieval has employed artificial intelligence techniques and strategies such as machine learning heuristics, tuning mechanisms, user and system vocabularies, logical theory, etc for capturing user's preferences and using them for guiding the search based on the semantic analysis rather than syntactic analysis. Although a valuable improvement has been recorded on search results, the survey has shown that still search engines users are not really satisfied with their search results. Using ontologies for semantic-based searching is likely the key solution. Adopting profiling approach and using ontology base characteristics, this work proposes a strategy for finding the exact meaning of the query terms in order to retrieve relevant information according to user needs. The evaluation of conducted experiments has shown the effectiveness of the suggested methodology and conclusion is presented.

Keywords: information retrieval, user profiles, semantic Web, ontology, search engine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3219
154 Parkinsons Disease Classification using Neural Network and Feature Selection

Authors: Anchana Khemphila, Veera Boonjing

Abstract:

In this study, the Multi-Layer Perceptron (MLP)with Back-Propagation learning algorithm are used to classify to effective diagnosis Parkinsons disease(PD).It-s a challenging problem for medical community.Typically characterized by tremor, PD occurs due to the loss of dopamine in the brains thalamic region that results in involuntary or oscillatory movement in the body. A feature selection algorithm along with biomedical test values to diagnose Parkinson disease.Clinical diagnosis is done mostly by doctor-s expertise and experience.But still cases are reported of wrong diagnosis and treatment. Patients are asked to take number of tests for diagnosis.In many cases,not all the tests contribute towards effective diagnosis of a disease.Our work is to classify the presence of Parkinson disease with reduced number of attributes.Original,22 attributes are involved in classify.We use Information Gain to determine the attributes which reduced the number of attributes which is need to be taken from patients.The Artificial neural networks is used to classify the diagnosis of patients.Twenty-Two attributes are reduced to sixteen attributes.The accuracy is in training data set is 82.051% and in the validation data set is 83.333%.

Keywords: Data mining, classification, Parkinson disease, artificial neural networks, feature selection, information gain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3779
153 A New DIDS Design Based on a Combination Feature Selection Approach

Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman

Abstract:

Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.

Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1940
152 Predictive Functional Control with Disturbance Observer for Tendon-Driven Balloon Actuator

Authors: Jun-ya Nagase, Toshiyuki Satoh, Norihiko Saga, Koichi Suzumori

Abstract:

In recent years, Japanese society has been aging, engendering a labor shortage of young workers. Robots are therefore expected to perform tasks such as rehabilitation, nursing elderly people, and day-to-day work support for elderly people. The pneumatic balloon actuator is a rubber artificial muscle developed for use in a robot hand in such environments. This actuator has a long stroke and a high power-to-weight ratio compared with the present pneumatic artificial muscle. Moreover, the dynamic characteristics of this actuator resemble those of human muscle. This study evaluated characteristics of force control of balloon actuator using a predictive functional control (PFC) system with disturbance observer. The predictive functional control is a model-based predictive control (MPC) scheme that predicts the future outputs of the actual plants over the prediction horizon and computes the control effort over the control horizon at every sampling instance. For this study, a 1-link finger system using a pneumatic balloon actuator is developed. Then experiments of PFC control with disturbance observer are performed. These experiments demonstrate the feasibility of its control of a pneumatic balloon actuator for a robot hand.

Keywords: Disturbance observer, Pneumatic balloon, Predictive functional control, Rubber artificial muscle.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2421
151 Confucius about the Ideals of Man and the Moral Dignity

Authors: N. Kudaibergenova, S. Edilbay, S. Rysbekova, Zh. Amirkulova, G. Zhumatayev

Abstract:

Confucius was a fifth-century BCE Chinese thinker whose influence upon East Asian intellectual and social history is immeasurable. Better known is in China as “Master Kong”. As a culturally symbolic figure, he has been alternately idealized, deified, dismissed, vilified, and rehabilitated over the millennia by both Asian and non-Asian thinkers and regimes. Given his extraordinary impact on Chinese, Korean, Japanese, and Vietnamese thought, it is ironic that so little can be known about Confucius. The tradition that bears his name – “Confucianizm” (Chinese: Rujia) – ultimately traces itself to the sayings and biographical fragments recorded in the text known as the Analects (Chinese: Lunyu). In the Analects, two types of persons are opposed to one another – not in terms of basic potential, but in terms of developed potential. These are the junzi (literally, “lord’s son” or “gentleman”) and the xiaoren (“small person”). The junzi is the person who always manifests the quality of ren in his person and the displays the quality of lee in his actions. In this article examines the category of the ideal man and the spiritual and moral values of the philosophy of Confucius. According to Confucius high-morality Jun-zi is characterized by two things: a sense of humanity and duty. This article provides an analysis of the ethical category for the ideal man. 

Keywords: Confucius, Humanity, Men Zi, Lun Yui, Ideal man, Zhun Yun.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2762
150 Improving Spatiotemporal Change Detection: A High Level Fusion Approach for Discovering Uncertain Knowledge from Satellite Image Database

Authors: Wadii Boulila, Imed Riadh Farah, Karim Saheb Ettabaa, Basel Solaiman, Henda Ben Ghezala

Abstract:

This paper investigates the problem of tracking spa¬tiotemporal changes of a satellite image through the use of Knowledge Discovery in Database (KDD). The purpose of this study is to help a given user effectively discover interesting knowledge and then build prediction and decision models. Unfortunately, the KDD process for spatiotemporal data is always marked by several types of imperfections. In our paper, we take these imperfections into consideration in order to provide more accurate decisions. To achieve this objective, different KDD methods are used to discover knowledge in satellite image databases. Each method presents a different point of view of spatiotemporal evolution of a query model (which represents an extracted object from a satellite image). In order to combine these methods, we use the evidence fusion theory which considerably improves the spatiotemporal knowledge discovery process and increases our belief in the spatiotemporal model change. Experimental results of satellite images representing the region of Auckland in New Zealand depict the improvement in the overall change detection as compared to using classical methods.

Keywords: Knowledge discovery in satellite databases, knowledge fusion, data imperfection, data mining, spatiotemporal change detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1547
149 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings

Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank

Abstract:

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm-s computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RTRICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].

Keywords: data mining, protein secondary structure prediction, parallelization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1596
148 Identification of Most Frequently Occurring Lexis in Body-enhancement Medicinal Unsolicited Bulk e-mails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

e-mail has become an important means of electronic communication but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE consists of many types like pornographic, virus infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to usage of e-mails. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of more than 2700 body enhancement medicinal UBE. Technically, this is an application of Text Parsing and Tokenization for an un-structured textual document and we approach it using Bag Of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the UBE documents that advertise various products for body enhancement. The analysis of such top 100 lexis is also presented. We exhibit the relationship between occurrence of a word from the identified lexis-set in the given UBE and the probability that the given UBE will be the one advertising for fake medicinal product. To the best of our knowledge and survey of related literature, this is the first formal attempt for identification of most frequently occurring lexis in such UBE by its textual analysis. Finally, this is a sincere attempt to bring about alertness against and mitigate the threat of such luring but fake UBE.

Keywords: Body Enhancement, Lexis, Medicinal, Unsolicited Bulk e-mail (UBE), Vector Space Document Model, Viagra

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3509
147 Designing a Framework for Network Security Protection

Authors: Eric P. Jiang

Abstract:

As the Internet continues to grow at a rapid pace as the primary medium for communications and commerce and as telecommunication networks and systems continue to expand their global reach, digital information has become the most popular and important information resource and our dependence upon the underlying cyber infrastructure has been increasing significantly. Unfortunately, as our dependency has grown, so has the threat to the cyber infrastructure from spammers, attackers and criminal enterprises. In this paper, we propose a new machine learning based network intrusion detection framework for cyber security. The detection process of the framework consists of two stages: model construction and intrusion detection. In the model construction stage, a semi-supervised machine learning algorithm is applied to a collected set of network audit data to generate a profile of normal network behavior and in the intrusion detection stage, input network events are analyzed and compared with the patterns gathered in the profile, and some of them are then flagged as anomalies should these events are sufficiently far from the expected normal behavior. The proposed framework is particularly applicable to the situations where there is only a small amount of labeled network training data available, which is very typical in real world network environments.

Keywords: classification, data analysis and mining, network intrusion detection, semi-supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1795
146 Secure Protocol for Short Message Service

Authors: Shubat S. Ahmeda, Ashraf M. Ali Edwila

Abstract:

Short Message Service (SMS) has grown in popularity over the years and it has become a common way of communication, it is a service provided through General System for Mobile Communications (GSM) that allows users to send text messages to others. SMS is usually used to transport unclassified information, but with the rise of mobile commerce it has become a popular tool for transmitting sensitive information between the business and its clients. By default SMS does not guarantee confidentiality and integrity to the message content. In the mobile communication systems, security (encryption) offered by the network operator only applies on the wireless link. Data delivered through the mobile core network may not be protected. Existing end-to-end security mechanisms are provided at application level and typically based on public key cryptosystem. The main concern in a public-key setting is the authenticity of the public key; this issue can be resolved by identity-based (IDbased) cryptography where the public key of a user can be derived from public information that uniquely identifies the user. This paper presents an encryption mechanism based on the IDbased scheme using Elliptic curves to provide end-to-end security for SMS. This mechanism has been implemented over the standard SMS network architecture and the encryption overhead has been estimated and compared with RSA scheme. This study indicates that the ID-based mechanism has advantages over the RSA mechanism in key distribution and scalability of increasing security level for mobile service.

Keywords: Elliptic Curve Cryptography (ECC), End-to-end Security, Identity-based Cryptography, Public Key, RSA, SMS Protocol.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2224
145 Collaborative Online Learning for Lecturers

Authors: Lee Bih Ni, Emily Doreen Lee, Wee Hui Yean

Abstract:

This paper was prepared to see the perceptions of online lectures regarding collaborative learning, in terms of how lecturers view online collaborative learning in the higher learning institution. The purpose of this study was conducted to determine the perceptions of online lectures about collaborative learning, especially how lecturers see online collaborative learning in the university. Adult learning education enhance collaborative learning culture with the target of involving learners in the learning process to make teaching and learning more effective and open at the university. This will finally make students learning that will assist each other. It is also to cut down the pressure of loneliness and isolation might felt among adult learners. Their ways in collaborative online was also determined. In this paper, researchers collect data using questionnaires instruments. The collected data were analyzed and interpreted. By analyzing the data, researchers report the results according the proof taken from the respondents. Results from the study, it is not only dependent on the lecturer but also a student to shape a good collaborative learning practice. Rational concepts and pattern to achieve these targets be clear right from the beginning and may be good seen by a number of proposals submitted and include how the higher learning institution has trained with ongoing lectures online. Advantages of online collaborative learning show that lecturers should be trained effectively. Studies have seen that the lecturer aware of online collaborative learning. This positive attitude will encourage the higher learning institution to continue to give the knowledge and skills required.

Keywords: Collaborative Online Learning, Lecturers’ Training.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2453
144 Comparison of Authentication Methods in Internet of Things Technology

Authors: Hafizah Che Hasan, Fateen Nazwa Yusof, Maslina Daud

Abstract:

Internet of Things (IoT) is a powerful industry system, which end-devices are interconnected and automated, allowing the devices to analyze data and execute actions based on the analysis. The IoT technology leverages the technology of Radio-Frequency Identification (RFID) and Wireless Sensor Network (WSN), including mobile and sensor. These technologies contribute to the evolution of IoT. However, due to more devices are connected each other in the Internet, and data from various sources exchanged between things, confidentiality of the data becomes a major concern. This paper focuses on one of the major challenges in IoT; authentication, in order to preserve data integrity and confidentiality are in place. A few solutions are reviewed based on papers from the last few years. One of the proposed solutions is securing the communication between IoT devices and cloud servers with Elliptic Curve Cryptograhpy (ECC) based mutual authentication protocol. This solution focuses on Hyper Text Transfer Protocol (HTTP) cookies as security parameter.  Next proposed solution is using keyed-hash scheme protocol to enable IoT devices to authenticate each other without the presence of a central control server. Another proposed solution uses Physical Unclonable Function (PUF) based mutual authentication protocol. It emphasizes on tamper resistant and resource-efficient technology, which equals a 3-way handshake security protocol.

Keywords: Internet of Things, authentication, PUF ECC, keyed hash scheme protocol.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1799
143 Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms

Authors: Aqsa Ali, Aleem Mushtaq, Attaullah Memon, Monna

Abstract:

In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.

Keywords: American sign language, assistive hand gesture interpreter, human-machine interface, machine learning, sensing glove.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2731
142 Mixtures of Monotone Networks for Prediction

Authors: Marina Velikova, Hennie Daniels, Ad Feelders

Abstract:

In many data mining applications, it is a priori known that the target function should satisfy certain constraints imposed by, for example, economic theory or a human-decision maker. In this paper we consider partially monotone prediction problems, where the target variable depends monotonically on some of the input variables but not on all. We propose a novel method to construct prediction models, where monotone dependences with respect to some of the input variables are preserved by virtue of construction. Our method belongs to the class of mixture models. The basic idea is to convolute monotone neural networks with weight (kernel) functions to make predictions. By using simulation and real case studies, we demonstrate the application of our method. To obtain sound assessment for the performance of our approach, we use standard neural networks with weight decay and partially monotone linear models as benchmark methods for comparison. The results show that our approach outperforms partially monotone linear models in terms of accuracy. Furthermore, the incorporation of partial monotonicity constraints not only leads to models that are in accordance with the decision maker's expertise, but also reduces considerably the model variance in comparison to standard neural networks with weight decay.

Keywords: mixture models, monotone neural networks, partially monotone models, partially monotone problems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1247
141 M2LGP: Mining Multiple Level Gradual Patterns

Authors: Yogi Satrya Aryadinata, Anne Laurent, Michel Sala

Abstract:

Gradual patterns have been studied for many years as they contain precious information. They have been integrated in many expert systems and rule-based systems, for instance to reason on knowledge such as “the greater the number of turns, the greater the number of car crashes”. In many cases, this knowledge has been considered as a rule “the greater the number of turns → the greater the number of car crashes” Historically, works have thus been focused on the representation of such rules, studying how implication could be defined, especially fuzzy implication. These rules were defined by experts who were in charge to describe the systems they were working on in order to turn them to operate automatically. More recently, approaches have been proposed in order to mine databases for automatically discovering such knowledge. Several approaches have been studied, the main scientific topics being: how to determine what is an relevant gradual pattern, and how to discover them as efficiently as possible (in terms of both memory and CPU usage). However, in some cases, end-users are not interested in raw level knowledge, and are rather interested in trends. Moreover, it may be the case that no relevant pattern can be discovered at a low level of granularity (e.g. city), whereas some can be discovered at a higher level (e.g. county). In this paper, we thus extend gradual pattern approaches in order to consider multiple level gradual patterns. For this purpose, we consider two aggregation policies, namely horizontal and vertical.

Keywords: Gradual Pattern.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1500
140 The Robust Clustering with Reduction Dimension

Authors: Dyah E. Herwindiati

Abstract:

A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms a high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is the principal components analysis (PCA). The 2DPCA is often called a variant of principal component (PCA), the image matrices were directly treated as 2D matrices; they do not need to be transformed into a vector so that the covariance matrix of image can be constructed directly using the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and the PCA for clustering on an arbitrary data image when outliers are hiden in the data set. The simulation aspects of robustness and the illustration of clustering images are discussed in the end of paper

Keywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1698
139 Sliding Joints and Soil-Structure Interaction

Authors: Radim Cajka, Pavlina Mateckova, Martina Janulikova, Marie Stara

Abstract:

Use of a sliding joint is an effective method to decrease the stress in foundation structure where there is a horizontal deformation of subsoil (areas afflicted with underground mining) or horizontal deformation of a foundation structure (pre-stressed foundations, creep, shrinkage, temperature deformation). A convenient material for a sliding joint is a bitumen asphalt belt. Experiments for different types of bitumen belts were undertaken at the Faculty of Civil Engineering - VSB Technical University of Ostrava in 2008. This year an extension of the 2008 experiments is in progress and the shear resistance of a slide joint is being tested as a function of temperature in a temperature controlled room. In this paper experimental results of temperature dependant shear resistance are presented. The result of the experiments should be the sliding joint shear resistance as a function of deformation velocity and temperature. This relationship is used for numerical analysis of stress/strain relation between foundation structure and subsoil. Using a rheological slide joint could lead to a decrease of the reinforcement amount, and contribute to higher reliability of foundation structure and thus enable design of more durable and sustainable building structures.

Keywords: Pre-stressed foundations, sliding joint, soil-structure interaction, subsoil horizontal deformation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2016
138 Twitter Sentiment Analysis during the Lockdown on New Zealand

Authors: Smah Doeban Almotiri

Abstract:

One of the most common fields of natural language processing (NLP) is sentimental analysis. The inferred feeling in the text can be successfully mined for various events using sentiment analysis. Twitter is viewed as a reliable data point for sentimental analytics studies since people are using social media to receive and exchange different types of data on a broad scale during the COVID-19 epidemic. The processing of such data may aid in making critical decisions on how to keep the situation under control. The aim of this research is to look at how sentimental states differed in a single geographic region during the lockdown at two different times.1162 tweets were analyzed related to the COVID-19 pandemic lockdown using keywords hashtags (lockdown, COVID-19) for the first sample tweets were from March 23, 2020, until April 23, 2020, and the second sample for the following year was from March 1, 2021, until April 4, 2021. Natural language processing (NLP), which is a form of Artificial intelligent was used for this research to calculate the sentiment value of all of the tweets by using AFINN Lexicon sentiment analysis method. The findings revealed that the sentimental condition in both different times during the region's lockdown was positive in the samples of this study, which are unique to the specific geographical area of New Zealand. This research suggests applied machine learning sentimental method such as Crystal Feel and extended the size of the sample tweet by using multiple tweets over a longer period of time.

Keywords: sentiment analysis, Twitter analysis, lockdown, Covid-19, AFINN, NodeJS

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 586
137 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1272
136 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Authors: Ghada Badr, Arwa Alturki

Abstract:

The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.

Keywords: Alignment, RNA secondary structure, pairwise, component-based, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974
135 Environmental Sanitation Dilemma in the Tamale Metropolis, Ghana

Authors: Paul N. Napari, Patrick B. Cobbinah

Abstract:

The 21st century has been characterized by rapid urbanization with its associated environmental sanitation challenges especially in developing countries. However, studies have focused largely on institutional capacity and the resources needed to manage environmental sanitation challenges, with few insights on the attitudes of city residents. This paper analyzes the environmental sanitation situation in a rapidly urbanizing Tamale metropolis, examines how city residents’ attitudes have contributed to poor environmental sanitation and further reviews approaches that have been employed to manage environmental sanitation. Using secondary and empirical data sources, the paper reveals that only 7.5 tons of 150 tons of total daily solid wastes generated is effectively managed. The findings suggest that the poor sanitation in the city is influenced by two factors; poor attitudes of city residents and weak institutions. While poor attitudes towards environmental sanitation has resulted in indiscriminate disposal of waste, weak institutions have resulted in lack of capacity and pragmatic interventions to manage the environmental sanitation challenges in the city. The paper recommends public education on environmental sanitation, public private partnership, increased stakeholder engagement and preparation and implementation of environmental sanitation plan as mechanisms to ensure effective environmental sanitation management in the Tamale metropolis.

Keywords: Environmental sanitation, developing countries, waste management, developing countries, Tamale, urbanization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3980
134 Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac O. Asante, Yushi Jiang, Hailin Tao

Abstract:

Livestreaming marketing, the new electronic commerce element, has become an optional marketing channel following the COVID-19 pandemic, and many sellers are leveraging the features presented by livestreaming to increase sales. This study was conducted to measure real-time observable interactions between consumers and sellers. Based on the affordance theory, this study conceptualized constructs representing the interactive features and examined how they drive consumers’ purchase willingness during livestreaming sessions using 1238 datasets from Amazon Live, following the manual observation of transaction records. Using structural equation modeling, the ordinary least square regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. The Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study presents a way of measuring interactions in livestreaming commerce and proposes a way to manually gather data on consumer behaviors in livestreaming platforms when the application programming interface (API) of such platforms does not support data mining algorithms.

Keywords: Livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 156
133 Computer Simulation of Low Volume Roads Made from Recycled Materials

Authors: Aleš Florian, Lenka Ševelová

Abstract:

Low volume roads are widely used all over the world. To improve their quality the computer simulation of their behavior is proposed. The FEM model enables to determine stress and displacement conditions in the pavement and/or also in the particular material layers. Different variants of pavement layers, material used, humidity as well as loading conditions can be studied. Among others, the input information about material properties of individual layers made from recycled materials is crucial for obtaining results as exact as possible. For this purpose the cyclic-load triaxial test machine testing of cyclic-load performance of materials is a promising test method. The test is able to simulate the real traffic loading on particular materials taking into account the changes in the horizontal stress conditions produced in particular layers by crossings of vehicles. Also the test specimen can be prepared with different amount of water. Thus modulus of elasticity (Young modulus) of different materials including recycled ones can be measured under the different conditions of horizontal and vertical stresses as well as under the different humidity conditions. Using the proposed testing procedure the modulus of elasticity of recycled materials used in the newly built low volume road is obtained under different stress and humidity conditions set to standard, dry and fully saturated level. Obtained values of modulus of elasticity are used in FEA.

Keywords: FEA, FEM, geotechnical materials, low volume roads, pavement, triaxial test, Young modulus.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1629
132 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.

Keywords: Building energy prediction, data mining, demand response, electricity market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2205
131 Characteristics of the Storage Stability for Different Saccharomyces cerevisiae Strains

Authors: Gomaa N. Abdel-Rahman, Nadia R. A. Nassar, Yehia A. Heikal, Mahmoud A. M. Abou-Donia, Mohamed B. M. Ahmed, Mohamed Fadel

Abstract:

Storage stability is the important factor of baker's yeast quality. Effect of the storage period (fifteen days) on storage sugars and cell viability of baker's yeast, produced from three S. cerevisiae strains (FC-620, FH-620, and FAT-12) as comparison with baker's yeast produced by S. cerevisae F-707 (original strain of baker's yeast factory) were investigated. Studied trehalose and glycogen content ranged from 10.19 to 14.79 % and from 10.05 to 10.69 % (d.w.), respectively before storage. The trehalose and glycogen content of all strains was decreased by increasing the storage period with no significant differences between the reduction rates of trehalose. Meanwhile, reduction rates of glycogen had significant differences between different strains, where the FH-620 and FC-620 strains had lowest rates as 18.12 and 20.70 %, respectively. Also, total viable cells and gassing power of all strains were decreased by increasing the storage period. FH-620 and FC-620 strains had the lowest values of reduction rates as an indicator of storage resistant. Where the reduction rates in total viable cells of FH-620 and FC-620 strains were 22.05 and 24.70%, respectively, while the reduction rates of gassing power were 20.90 and 24.30%, in the same order. On other hand, FAT-12 strain was more sensitive to storage as compared to original strain, where the reduction rates were 35.60 and 35.75%, respectively for total viable cells and gassing power.

Keywords: Baker’s yeast, trehalose, glycogen, gassing power.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1472
130 Statistical Analysis for Overdispersed Medical Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models in modeling overdispersed medical count data with extra variations caused by extra zeros and unobserved heterogeneity. The studies indicate that ZIP and ZINB always provide better fit than using the normal Poisson and negative binomial models in modeling overdispersed medical count data. In this study, we proposed the use of Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and zero inflated strict arcsine models in modeling overdispered medical count data. These proposed models are not widely used by many researchers especially in the medical field. The results show that these three suggested models can serve as alternative models in modeling overdispersed medical count data. This is supported by the application of these suggested models to a real life medical data set. Inverse trinomial, Poisson inverse Gaussian and strict arcsine are discrete distributions with cubic variance function of mean. Therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tailed. They are recommended to be used in modeling overdispersed medical count data when ZIP and ZINB are inadequate.

Keywords: Zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson’s goodness of fit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3317
129 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Authors: Rodolfo Lorbieski, Silvia Modesto Nassar

Abstract:

Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.

Keywords: Stacking, multi-layers, ensemble, multi-class.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1095