Search results for: Frequent itemset mining.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 721

Search results for: Frequent itemset mining.

151 A Survey of 2nd Year Students’ Frequent English Writing Errors and the Effects of Participatory Error Correction Process

Authors: Chaiwat Tantarangsee

Abstract:

The purposes of this study are 1) to study the effects of participatory error correction process and 2) to find out the students’ satisfaction of such error correction process. This study is a Quasi Experimental Research with single group, in which data is collected 5 times preceding and following 4 experimental studies of participatory error correction process including providing coded indirect corrective feedback in the students’ texts with error treatment activities. Samples include 52 2nd year English Major students, Faculty of Humanities and Social Sciences, Suan Sunandha Rajabhat University. Tool for experimental study includes the lesson plan of the course; Reading and Writing English for Academic Purposes II, and tools for data collection include 5 writing tests of short texts and a questionnaire. Based on formative evaluation of the students’ writing ability prior to and after each of the 4 experiments, the research findings disclose the students’ higher scores with statistical difference at 0.00. Moreover, in terms of the effect size of such process, it is found that for mean of the students’ scores prior to and after the 4 experiments; d equals 0.6801, 0.5093, 0.5071, and 0.5296 respectively. It can be concluded that participatory error correction process enables all of the students to learn equally well and there is improvement in their ability to write short texts. Finally the students’ overall satisfaction of the participatory error correction process is in high level (Mean = 4.39, S.D. = 0.76).

Keywords: Coded indirect corrective feedback, participatory error correction process, error treatment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1745
150 Automatic Extraction of Features and Opinion-Oriented Sentences from Customer Reviews

Authors: Khairullah Khan, Baharum B. Baharudin, Aurangzeb Khan, Fazal_e_Malik

Abstract:

Opinion extraction about products from customer reviews is becoming an interesting area of research. Customer reviews about products are nowadays available from blogs and review sites. Also tools are being developed for extraction of opinion from these reviews to help the user as well merchants to track the most suitable choice of product. Therefore efficient method and techniques are needed to extract opinions from review and blogs. As reviews of products mostly contains discussion about the features, functions and services, therefore, efficient techniques are required to extract user comments about the desired features, functions and services. In this paper we have proposed a novel idea to find features of product from user review in an efficient way. Our focus in this paper is to get the features and opinion-oriented words about products from text through auxiliary verbs (AV) {is, was, are, were, has, have, had}. From the results of our experiments we found that 82% of features and 85% of opinion-oriented sentences include AVs. Thus these AVs are good indicators of features and opinion orientation in customer reviews.

Keywords: Classification, Customer Reviews, Helping Verbs, Opinion Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2041
149 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data

Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro

Abstract:

Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.

Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 382
148 An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching

Authors: Chinmay Soman, Hrishikesh Pathak, Vishal Shah, Aniket Padhye, Amey Inamdar

Abstract:

Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet Security in recent times. Most of the existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, thus leading to a large number of false positives. This fuzziness is attributed to the use of highly flexible and at the same time, highly ambiguous HTML language. We introduce a new perspective against phishing, that tries to systematically prove, whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, indicative of any form of malicious alteration. The system design represents an intelligent system, employing dynamic assessment which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base, in educating the internet users against phishing.

Keywords: World Wide Web, Phishing, Internet security, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1784
147 A Distance Function for Data with Missing Values and Its Application

Authors: Loai AbdAllah, Ilan Shimshoni

Abstract:

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our  experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.

Keywords: Missing values, Distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2695
146 Application of Artificial Neural Network to Classification Surface Water Quality

Authors: S. Wechmongkhonkon, N.Poomtong, S. Areerachakul

Abstract:

Water quality is a subject of ongoing concern. Deterioration of water quality has initiated serious management efforts in many countries. This study endeavors to automatically classify water quality. The water quality classes are evaluated using 6 factor indices. These factors are pH value (pH), Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Nitrate Nitrogen (NO3N), Ammonia Nitrogen (NH3N) and Total Coliform (TColiform). The methodology involves applying data mining techniques using multilayer perceptron (MLP) neural network models. The data consisted of 11 sites of canals in Dusit district in Bangkok, Thailand. The data is obtained from the Department of Drainage and Sewerage Bangkok Metropolitan Administration during 2007-2011. The results of multilayer perceptron neural network exhibit a high accuracy multilayer perception rate at 96.52% in classifying the water quality of Dusit district canal in Bangkok Subsequently, this encouraging result could be applied with plan and management source of water quality.

Keywords: artificial neural network, classification, surface water quality

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3158
145 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification

Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman

Abstract:

In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.

Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2649
144 Bayesian Networks for Earthquake Magnitude Classification in a Early Warning System

Authors: G. Zazzaro, F.M. Pisano, G. Romano

Abstract:

During last decades, worldwide researchers dedicated efforts to develop machine-based seismic Early Warning systems, aiming at reducing the huge human losses and economic damages. The elaboration time of seismic waveforms is to be reduced in order to increase the time interval available for the activation of safety measures. This paper suggests a Data Mining model able to correctly and quickly estimate dangerousness of the running seismic event. Several thousand seismic recordings of Japanese and Italian earthquakes were analyzed and a model was obtained by means of a Bayesian Network (BN), which was tested just over the first recordings of seismic events in order to reduce the decision time and the test results were very satisfactory. The model was integrated within an Early Warning System prototype able to collect and elaborate data from a seismic sensor network, estimate the dangerousness of the running earthquake and take the decision of activating the warning promptly.

Keywords: Bayesian Networks, Decision Support System, Magnitude Classification, Seismic Early Warning System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3554
143 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched direction to the digital world. The domain of politics, as one of the hottest topics of opinion mining research, merged together with the behavior analysis for affiliation determination in texts, which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 were constituted by Linguistic Inquiry and Word Count (LIWC) features were tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers when used with “SVM” and “IGR” feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “Function”, as an aggregate feature of the linguistic category, was found as the most differentiating feature among the 68 features with the accuracy of 81% in classifying articles either as Republican or Democrat.

Keywords: Politics, machine learning, feature selection, LIWC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2314
142 Lexical Database for Multiple Languages: Multilingual Word Semantic Network

Authors: K. K. Yong, R. Mahmud, C. S. Woo

Abstract:

Data mining and knowledge engineering have become a tough task due to the availability of large amount of data in the web nowadays. Validity and reliability of data also become a main debate in knowledge acquisition. Besides, acquiring knowledge from different languages has become another concern. There are many language translators and corpora developed but the function of these translators and corpora are usually limited to certain languages and domains. Furthermore, search results from engines with traditional 'keyword' approach are no longer satisfying. More intelligent knowledge engineering agents are needed. To address to these problems, a system known as Multilingual Word Semantic Network is proposed. This system adapted semantic network to organize words according to concepts and relations. The system also uses open source as the development philosophy to enable the native language speakers and experts to contribute their knowledge to the system. The contributed words are then defined and linked using lexical and semantic relations. Thus, related words and derivatives can be identified and linked. From the outcome of the system implementation, it contributes to the development of semantic web and knowledge engineering.

Keywords: Multilingual, semantic network, intelligent knowledge engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1913
141 Attacks Classification in Adaptive Intrusion Detection using Decision Tree

Authors: Dewan Md. Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Recently, information security has become a key issue in information technology as the number of computer security breaches are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed for protecting computers and networks from malicious network-based or host-based attacks by using traditional statistical methods to new data mining approaches in last decades. However, today's commercially available intrusion detection systems are signature-based that are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly based network intrusion detection system using decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved 98% detection rate (DR) in comparison with other existing methods.

Keywords: Detection rate, decision tree, intrusion detectionsystem, network security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3560
140 A Consumption-Based Hybrid Life Cycle Assessment of Carbon Footprints in California: High Footprints in Small Urban Households

Authors: Jukka Heinonen

Abstract:

Higher density reduces distances, private car dependency and thus reduces greenhouse gas emissions (GHGs). As a result, increased density has been given a central role among urban development targets. However, it is not just travel behavior that changes along with density. Rather, the consumption patterns, or overall lifestyles, change along with changing urban structure, particularly with changing housing types and consumption opportunities. Furthermore, elevated consumption of services, more frequent flying and less intra-household sharing have been shown to potentially outweigh the gains from reduced driving in more dense urban settlements. In this study, the geography of carbon footprints (CFs) in California is analyzed paying close attention to the household size differences and the resulting economies-of-scale advantages and disadvantages. A hybrid life cycle assessment (LCA) framework is employed together with consumer expenditure data to assess the CFs. According to the study, small urban households have the highest CFs in California. Their transport related emissions are significantly lower than those of the residents of less urbanized areas, but higher emissions from other consumption categories, together with the low degree of sharing of goods, overweigh the gains. Two functional units, per capita and per household, are used to analyze the CFs and to demonstrate the importance of household size. The lifestyle impacts visible through the consumption data are also discussed. The study suggests that there are still significant gaps in our understanding of the premises of low-carbon human settlements.

Keywords: Carbon footprint, life cycle assessment, consumption, lifestyle, household size, economies-of-scale.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1178
139 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch

Abstract:

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.

Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2476
138 The Development of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications

Authors: Mohamed R. Mhereeg

Abstract:

The paper investigates the feasibility of constructing a software multi-agent based monitoring and classification system and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. The agents function autonomously to provide continuous and periodic monitoring of excels spreadsheet workbooks. Resulting in, the development of the MultiAgent classification System (MACS) that is in compliance with the specifications of the Foundation for Intelligent Physical Agents (FIPA). However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies that are Windows Communication Foundation (WCF) services, Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW that is in order to satisfy the monitoring and classification of the multiple developer aspect. ODM was used to automate the classification phase of MACS.

Keywords: Autonomous, Classification, MACS, Multi-Agent, SOA, WCF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542
137 Information Overload, Information Literacy and Use of Technology by Students

Authors: Elena Krelja Kurelović, Jasminka Tomljanović, Vlatka Davidović

Abstract:

The development of web technologies and mobile devices makes creating, accessing, using and sharing information or communicating with each other simpler every day. However, while the amount of information constantly increasing it is becoming harder to effectively organize and find quality information despite the availability of web search engines, filtering and indexing tools. Although digital technologies have overall positive impact on students’ lives, frequent use of these technologies and digital media enriched with dynamic hypertext and hypermedia content, as well as multitasking, distractions caused by notifications, calls or messages; can decrease the attention span, make thinking, memorizing and learning more difficult, which can lead to stress and mental exhaustion. This is referred to as “information overload”, “information glut” or “information anxiety”. Objective of this study is to determine whether students show signs of information overload and to identify the possible predictors. Research was conducted using a questionnaire developed for the purpose of this study. The results show that students frequently use technology (computers, gadgets and digital media), while they show moderate level of information literacy. They have sometimes experienced symptoms of information overload. According to the statistical analysis, higher frequency of technology use and lower level of information literacy are correlated with larger information overload. The multiple regression analysis has confirmed that the combination of these two independent variables has statistically significant predictive capacity for information overload. Therefore, the information science teachers should pay attention to improving the level of students’ information literacy and educate them about the risks of excessive technology use.

Keywords: Information overload, technology use, digital media, information literacy, students.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2613
136 Using Data Mining Methodology to Build the Predictive Model of Gold Passbook Price

Authors: Chien-Hui Yang, Che-Yang Lin, Ya-Chen Hsu

Abstract:

Gold passbook is an investing tool that is especially suitable for investors to do small investment in the solid gold. The gold passbook has the lower risk than other ways investing in gold, but its price is still affected by gold price. However, there are many factors can cause influences on gold price. Therefore, building a model to predict the price of gold passbook can both reduce the risk of investment and increase the benefits. This study investigates the important factors that influence the gold passbook price, and utilize the Group Method of Data Handling (GMDH) to build the predictive model. This method can not only obtain the significant variables but also perform well in prediction. Finally, the significant variables of gold passbook price, which can be predicted by GMDH, are US dollar exchange rate, international petroleum price, unemployment rate, whole sale price index, rediscount rate, foreign exchange reserves, misery index, prosperity coincident index and industrial index.

Keywords: Gold price, Gold passbook price, Group Method ofData Handling (GMDH), Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2241
135 Accurate Control of a Pneumatic System using an Innovative Fuzzy Gain-Scheduling Pattern

Authors: M. G. Papoutsidakis, G. Chamilothoris, F. Dailami, N. Larsen, A Pipe

Abstract:

Due to their high power-to-weight ratio and low cost, pneumatic actuators are attractive for robotics and automation applications; however, achieving fast and accurate control of their position have been known as a complex control problem. A methodology for obtaining high position accuracy with a linear pneumatic actuator is presented. During experimentation with a number of PID classical control approaches over many operations of the pneumatic system, the need for frequent manual re-tuning of the controller could not be eliminated. The reason for this problem is thermal and energy losses inside the cylinder body due to the complex friction forces developed by the piston displacements. Although PD controllers performed very well over short periods, it was necessary in our research project to introduce some form of automatic gain-scheduling to achieve good long-term performance. We chose a fuzzy logic system to do this, which proved to be an easily designed and robust approach. Since the PD approach showed very good behaviour in terms of position accuracy and settling time, it was incorporated into a modified form of the 1st order Tagaki- Sugeno fuzzy method to build an overall controller. This fuzzy gainscheduler uses an input variable which automatically changes the PD gain values of the controller according to the frequency of repeated system operations. Performance of the new controller was significantly improved and the need for manual re-tuning was eliminated without a decrease in performance. The performance of the controller operating with the above method is going to be tested through a high-speed web network (GRID) for research purposes.

Keywords: Fuzzy logic, gain scheduling, leaky integrator, pneumatic actuator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1711
134 Protective Effect of Saponin Extract from the Root of Garcinia kola (Bitter kola) against Paracetamol- Induced Hepatotoxicity in Albino Rats

Authors: Yemisi Rufina Alli Smith, Isaac Gbadura Adanlawo

Abstract:

Liver disorders are one of the major problems of the world. Despite its frequent occurrence, high morbidity and high mortality, its medical management is currently inadequate. This study was designed to evaluate the hepatoprotective effect of saponin extract of the root of Garcinia kola on the integrity of the liver of paracetamol induced wistar albino rats. Twenty five (25) male adult wistar albino rats were divided into five (5) groups. Group I was the Control group that received distilled water only, group II was the negative control that received 2 g/kg of paracetamol on the 13th day, and group III, IV and V were pre-treated with 100, 200 and 400mg/kg of the saponin extract before inducing the liver damage on the 13th day with 2 g/kg of paracetamol. Twenty four (24) h after administration, the rats were sacrificed and blood samples were collected. The serum Alanine Transaminase (ALT), Aspartate Transaminase (AST), Alkaline Phosphatase (ALP) activities, Bilirubin and conjugated bilirubin, glucose and protein concentrations were evaluated. The liver was fixed immediately in Formalin and was processed and stained in Haematoxylin and Eosin (H&E). Administration of saponin extract from the root of Garcinia kola significantly decreased paracetamol induced elevated enzymes in the test group. Also histological observations showed that saponin extract of the root of Garcinia kola exhibited a significant liver protection against the toxicant as evident by the cells trying to return to normal. Saponin extract from the root of Garcinia kola indicated a protection of structural integrity of the hepatocytic cell membrane and regeneration of the damaged liver.

Keywords: Garcinia kola, Hepatoprotective, paracetamol, Saponin.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2894
133 Hybrid Methods for Optimisation of Weights in Spatial Multi-Criteria Evaluation Decision for Fire Risk and Hazard

Authors: I. Yakubu, D. Mireku-Gyimah, D. Asafo-Adjei

Abstract:

The challenge for everyone involved in preserving the ecosystem is to find creative ways to protect and restore the remaining ecosystems while accommodating and enhancing the country social and economic well-being. Frequent fires of anthropogenic origin have been affecting the ecosystems in many countries adversely. Hence adopting ways of decision making such as Multicriteria Decision Making (MCDM) is appropriate since it will enhance the evaluation and analysis of fire risk and hazard of the ecosystem. In this paper, fire risk and hazard data from the West Gonja area of Ghana were used in some of the methods (Analytical Hierarchy Process, Compromise Programming, and Grey Relational Analysis (GRA) for MCDM evaluation and analysis to determine the optimal weight method for fire risk and hazard. Ranking of the land cover types was carried out using; Fire Hazard, Fire Fighting Capacity and Response Risk Criteria. Pairwise comparison under Analytic Hierarchy Process (AHP) was used to determine the weight of the various criteria. Weights for sub-criteria were also obtained by the pairwise comparison method. The results were optimised using GRA and Compromise Programming (CP). The results from each method, hybrid GRA and CP, were compared and it was established that all methods were satisfactory in terms of optimisation of weight. The most optimal method for spatial multicriteria evaluation was the hybrid GRA method. Thus, a hybrid AHP and GRA method is more effective method for ranking alternatives in MCDM than the hybrid AHP and CP method.

Keywords: Compromise programming, grey relational analysis, spatial multi-criteria, weight optimisation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 605
132 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2412
131 The Intonation of Romanian Greetings: A Sociolinguistics Approach

Authors: Anca-Diana Bibiri, Mihaela Mocanu, Adrian Turculeț

Abstract:

In a language the inventory of greetings is dynamic with frequent input and output, although this is hardly noticed by the speakers. In this register, there are a number of constant, conservative elements that survive different language models (among them, the classic formulae: bună ziua! (good afternoon!), bună seara! (good evening!), noapte bună! (good night!), la revedere! (goodbye!) and a number of items that fail to pass the test of time, according to language use at a time (ciao!, pa!, bai!). The source of innovation depends both of internal factors (contraction, conversion, combination of classic formulae of greetings), and of external ones (borrowings and calques). Their use imposes their frequencies at once, namely the elimination of the use of others. This paper presents a sociolinguistic approach of contemporary Romanian greetings, based on prosodic surveys in two research projects: AMPRom, and SoRoEs. Romanian language presents a rich inventory of questions (especially partial interrogatives questions/WH-Q) which are used as greetings, alone or, more commonly accompanying a proper greeting. The representative of the typical formulae is Ce mai faci? (How are you?), which, unlike its English counterpart How do you do?, has not become a stereotype, but retains an obvious emotional impact, while serving as a mark of sociolinguistic group. The analyzed corpus consists of structures containing greetings recorded in the main Romanian cultural (urban) centers. From the methodological point of view, the acoustic analysis of the recorded data is performed using software tools (GoldWave, Praat), identifying intonation patterns related to three sociolinguistics variables: age, sex and level of education. The intonation patterns of the analyzed statements are at the interface between partial questions and typical greetings.

Keywords: acoustic analysis, greetings, Romanian language, sociolinguistics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1641
130 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well known single link clustering algorithm. We will refer to this algorithm as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset. So this algorithm provides outlier detection and data clustering simultaneously. This algorithm does not need to update the distance matrix, since the algorithm depends on merging the most k-nearest objects in one step and the cluster continues grow as long as possible under specified condition. So the algorithm consists of two phases; at the first phase, it removes the outliers from the input dataset. At the second phase, it performs the clustering process. This algorithm discovers clusters of different shapes, sizes, densities and requires only one input parameter; this parameter represents a threshold for outlier points. The value of the input parameter is ranging from 0 to 1. The algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets contain outlier and connecting clusters by chain of density points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1888
129 Improvement of Overall Equipment Effectiveness through Total Productive Maintenance

Authors: S. Fore, L. Zuze

Abstract:

Frequent machine breakdowns, low plant availability and increased overtime are a great threat to a manufacturing plant as they increase operating costs of an industry. The main aim of this study was to improve Overall Equipment Effectiveness (OEE) at a manufacturing company through the implementation of innovative maintenance strategies. A case study approach was used. The paper focuses on improving the maintenance in a manufacturing set up using an innovative maintenance regime mix to improve overall equipment effectiveness. Interviews, reviewing documentation and historical records, direct and participatory observation were used as data collection methods during the research. Usually production is based on the total kilowatt of motors produced per day. The target kilowatt at 91% availability is 75 Kilowatts a day. Reduced demand and lack of raw materials particularly imported items are adversely affecting the manufacturing operations. The company had to reset its targets from the usual figure of 250 Kilowatt per day to mere 75 per day due to lower availability of machines as result of breakdowns as well as lack of raw materials. The price reductions and uncertainties as well as general machine breakdowns further lowered production. Some recommendations were given. For instance, employee empowerment in the company will enhance responsibility and authority to improve and totally eliminate the six big losses. If the maintenance department is to realise its proper function in a progressive, innovative industrial society, then its personnel must be continuously trained to meet current needs as well as future requirements. To make the maintenance planning system effective, it is essential to keep track of all the corrective maintenance jobs and preventive maintenance inspections. For large processing plants these cannot be handled manually. It was therefore recommended that the company implement (Computerised Maintenance Management System) CMMS.

Keywords: Maintenance, Manufacturing, Overall Equipment Effectiveness

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3936
128 Dynamics of Mini Hydraulic Backhoe Excavator: A Lagrange-Euler (L-E) Approach

Authors: Bhaveshkumar P. Patel, J. M. Prajapati

Abstract:

Excavators are high power machines used in the mining, agricultural and construction industry whose principal functions are digging (material removing), ground leveling and material transport operations. During the digging task there are certain unknown forces exerted by the bucket on the soil and the digging operation is repetitive in nature. Automation of the digging task can be performed by an automatically controlled excavator system, which is not only control the forces but also follow the planned digging trajectories. To develop such a controller for automated excavation, it is required to develop a dynamic model to describe the behavior of the control system during digging operation and motion of excavator with time. The presented work described a dynamic model needed for controller design and which is derived by applying Lagrange-Euler approach. The developed dynamic model is intended for further development of an automated excavation control system for light duty construction work and can be applied for heavy duty or all types of backhoe excavators.

Keywords: Backhoe excavator, controller, digging, excavation, trajectory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4404
127 Food Security in the Middle East and North Africa

Authors: Sara D. Garduño-Diaz, Philippe Y. Garduño-Diaz

Abstract:

To date, one of the few comprehensive indicators for the measurement of food security is the Global Food Security Index (GFSI). This index is a dynamic quantitative and qualitative benchmarking model, constructed from 28 unique indicators, that measures drivers of food security across both developing and developed countries. Whereas the GFSI has been calculated across a set of 109 countries, in this paper we aim to present and compare, for the Middle East and North Africa (MENA), 1) the Food Security Index scores achieved and 2) the data available on affordability, availability, and quality of food. The data for this work was taken from the latest available report published by the creators of the GFSI, which in turn used information from national and international statistical sources. MENA countries rank from place 17/109 (Israel, although with resent political turmoil this is likely to have changed) to place 91/109 (Yemen) with household expenditure spent in food ranging from 15.5% (Israel) to 60% (Egypt). Lower spending on food as a share of household consumption in most countries and better food safety net programs in the MENA have contributed to a notable increase in food affordability. The region has also, however, experienced a decline in food availability, owing to more limited food supplies and higher volatility of agricultural production. In terms of food quality and safety the MENA has the top ranking country (Israel). The most frequent challenges faced by the countries of the MENA include public expenditure on agricultural research and development as well as volatility of agricultural production. Food security is a complex phenomenon that interacts with many other indicators of a country’s wellbeing; in the MENA it is slowly but markedly improving.

Keywords: Diet, food insecurity, global food security index, nutrition, sustainability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3943
126 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

Authors: Tatjana Eitrich, Bruno Lang

Abstract:

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.

Keywords: Support Vector Machine Training, Multi-ParameterKernels, Shared Memory Parallel Computing, Large Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1394
125 Efficacy of Biofeedback-Assisted Pelvic Floor Muscle Training on Postoperative Stress Urinary Incontinence

Authors: Asmaa M. El-Bandrawy, Afaf M. Botla, Ghada E. El-Refaye, Hassan O. Ghareeb

Abstract:

Background: Urinary incontinence is a common problem among adults. Its incidence increases with age and it is more frequent in women. Pelvic floor muscle training (PFMT) is the first-line therapy in the treatment of pelvic floor dysfunction (PFD) either alone or combined with biofeedback-assisted PFMT. The aim of the work: The purpose of this study is to evaluate the efficacy of biofeedback-assisted PFMT in postoperative stress urinary incontinence. Settings and Design: A single blind controlled trial design was. Methods and Material: This study was carried out in 30 volunteer patients diagnosed as severe degree of stress urinary incontinence and they were admitted to surgical treatment. They were divided randomly into two equal groups: (Group A) consisted of 15 patients who had been treated with post-operative biofeedback-assisted PFMT and home exercise program (Group B) consisted of 15 patients who had been treated with home exercise program only. Assessment of all patients in both groups (A) and (B) was carried out before and after the treatment program by measuring intra-vaginal pressure in addition to the visual analog scale. Results: At the end of the treatment program, there was a highly statistically significant difference between group (A) and group (B) in the intra-vaginal pressure and the visual analog scale favoring the group (A). Conclusion: biofeedback-assisted PFMT is an effective method for the symptomatic relief of post-operative female stress urinary incontinence.

Keywords: Stress urinary incontinence, pelvic floor muscles, pelvic floor exercises, biofeedback.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1357
124 Product Features Extraction from Opinions According to Time

Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou

Abstract:

Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.

Keywords: Opinion mining, product feature extraction, sentiment analysis, SentiWordNet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1265
123 A Preliminary Study of the Reconstruction of Urban Residential Public Space in the Context of the “Top-down” Construction Model in China: Based on Research of TianZiFang District in Shanghai and Residential Space in Hangzhou

Authors: Wang Qiaowei, Gao Yujiang

Abstract:

With the economic growth and rapid urbanization after the reform and openness, some of China's fast-growing cities have demolished former dwellings and built modern residential quarters. The blind, incomplete reference to western modern cities and the one-off construction lacking feedback mechanism have intensified such phenomenon, causing the citizen gradually expanded their living scale with the popularization of car traffic, and the peer-to-peer lifestyle gradually settled. The construction of large-scale commercial centers has caused obstacles to small business around the residential areas, leading to space for residents' interaction has been compressed. At the same time, the advocated Central Business District (CBD) model even leads to the unsatisfactory reconstruction of many historical blocks such as the Hangzhou Southern Song Dynasty Imperial Street. However, the popularity of historical spaces such as Wuzhen and Hongcun also indicates the collective memory and needs of the street space for Chinese residents. The evolution of Shanghai TianZiFang also proves the importance of the motivation of space participants in space construction in the context of the “top-down” construction model in China. In fact, there are frequent occurrences of “reconstruction”, which may redefine the space, in various residential areas. If these activities can be selectively controlled and encouraged, it will be beneficial to activate the public space as well as the residents’ intercourse, so that the traditional Chinese street space can be reconstructed in the context of modern cities.

Keywords: Rapid urbanization, traditional street space, space re-construction, bottom-up design.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 756
122 Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers

Authors: Emel Kızılkaya Aydogan, Mihrimah Ozmen, Yılmaz Delice

Abstract:

In recent days, there is a change and the ongoing development of the telecommunications sector in the global market. In this sector, churn analysis techniques are commonly used for analysing why some customers terminate their service subscriptions prematurely. In addition, customer churn is utmost significant in this sector since it causes to important business loss. Many companies make various researches in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, their usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed aiming to obtain the feature reducts to predict customer churn. The framework is a cost based optional pre-processing stage to remove redundant features for churn management. In addition, this cost-based feature selection algorithm is applied in a telecommunication company in Turkey and the results obtained with this algorithm.

Keywords: Churn prediction, data mining, decision-theoretic rough set, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1721