Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 728

Search results for: Frequent itemset mining.

98 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis

Authors: Ju-Young Kim, Tae-Heon Moon, Jung-Hun Cho

Abstract:

Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is evaluated as one of representative cases in terms of utilizing local cultural heritage sits in the urban regeneration project. However, recently, there has been a growing concern in this area, due to the ‘gentrification’, caused by the excessive commercialization and surging tourists. This trend was changing land and building use and resulted in the loss of identity of the region. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors’ awareness of the village. The study results demonstrate that rapid commercialization was underway, unlikely the initial intention, so that planners and officials in city government should reconsider the project direction and rebuild deliberate management strategies. This study is meaningful in that it analyzed the land use transformation and SNS big data to identify the current situation in urban regeneration area. Furthermore, it is expected that the study results will contribute to the vitalization of regeneration area.

Keywords: Land use, SNS, text mining, urban regeneration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1215

97 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1272

96 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Authors: Ghada Badr, Arwa Alturki

Abstract:

The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.

Keywords: Alignment, RNA secondary structure, pairwise, component-based, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974

95 Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac O. Asante, Yushi Jiang, Hailin Tao

Abstract:

Livestreaming marketing, the new electronic commerce element, has become an optional marketing channel following the COVID-19 pandemic, and many sellers are leveraging the features presented by livestreaming to increase sales. This study was conducted to measure real-time observable interactions between consumers and sellers. Based on the affordance theory, this study conceptualized constructs representing the interactive features and examined how they drive consumers’ purchase willingness during livestreaming sessions using 1238 datasets from Amazon Live, following the manual observation of transaction records. Using structural equation modeling, the ordinary least square regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. The Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study presents a way of measuring interactions in livestreaming commerce and proposes a way to manually gather data on consumer behaviors in livestreaming platforms when the application programming interface (API) of such platforms does not support data mining algorithms.

Keywords: Livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 159

94 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.

Keywords: Building energy prediction, data mining, demand response, electricity market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2205

93 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho

Abstract:

Prediction of bacterial virulent protein sequences can give assistance to identification and characterization of novel virulence-associated factors and discover drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation which describes functions of genes and gene products as a controlled vocabulary of terms has been shown effectively for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method Virulent-GO by mining informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologies with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features with an accuracy of 82.5% is slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. For the evaluation of independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating single kind of feature with SVM, the GO term feature performs much well, compared with each of the five kinds of features.

Keywords: Bacterial virulence factors, GO terms, prediction, protein sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2190

92 Development of a Technology Assessment Model by Patents and Customers' Review Data

Authors: Kisik Song, Sungjoo Lee

Abstract:

Recent years have seen an increasing number of patent disputes due to excessive competition in the global market and a reduced technology life-cycle; this has increased the risk of investment in technology development. While many global companies have started developing a methodology to identify promising technologies and assess for decisions, the existing methodology still has some limitations. Post hoc assessments of the new technology are not being performed, especially to determine whether the suggested technologies turned out to be promising. For example, in existing quantitative patent analysis, a patent’s citation information has served as an important metric for quality assessment, but this analysis cannot be applied to recently registered patents because such information accumulates over time. Therefore, we propose a new technology assessment model that can replace citation information and positively affect technological development based on post hoc analysis of the patents for promising technologies. Additionally, we collect customer reviews on a target technology to extract keywords that show the customers’ needs, and we determine how many keywords are covered in the new technology. Finally, we construct a portfolio (based on a technology assessment from patent information) and a customer-based marketability assessment (based on review data), and we use them to visualize the characteristics of the new technologies.

Keywords: Technology assessment, patents, citation information, opinion mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 993

91 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. Usually, these documents are categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs. Many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorize complex documents with multiple topics because they work on the assumption that individual documents can be categorized into single categories only. Therefore, to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies involves training using a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets using traditional multi-categorization algorithms are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categorizes, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1746

90 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: Product recommender system, Ensemble technique, Association rules, Decision tree, Artificial neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4223

89 Strategic Mine Planning: A SWOT Analysis Applied to KOV Open Pit Mine in the Democratic Republic of Congo

Authors: Patrick May Mukonki

Abstract:

KOV pit (Kamoto Oliveira Virgule) is located 10 km from Kolwezi town, one of the mineral rich town in the Lualaba province of the Democratic Republic of Congo. The KOV pit is currently operating under the Katanga Mining Limited (KML), a Glencore-Gecamines (a State Owned Company) join venture. Recently, the mine optimization process provided a life of mine of approximately 10 years withnice pushbacks using the Datamine NPV Scheduler software. In previous KOV pit studies, we recently outlined the impact of the accuracy of the geological information on a long-term mine plan for a big copper mine such as KOV pit. The approach taken, discussed three main scenarios and outlined some weaknesses on the geological information side, and now, in this paper that we are going to develop here, we are going to highlight, as an overview, those weaknesses, strengths and opportunities, in a global SWOT analysis. The approach we are taking here is essentially descriptive in terms of steps taken to optimize KOV pit and, at every step, we categorized the challenges we faced to have a better tradeoff between what we called strengths and what we called weaknesses. The same logic is applied in terms of the opportunities and threats. The SWOT analysis conducted in this paper demonstrates that, despite a general poor ore body definition, and very rude ground water conditions, there is room for improvement for such high grade ore body.

Keywords: Mine planning, mine optimization, mine scheduling, SWOT analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1591

88 SNC Based Network Layer Design for Underwater Wireless Communication Used in Coral Farms

Authors: T. T. Manikandan, Rajeev Sukumaran

Abstract:

For maintaining the biodiversity of many ecosystems the existence of coral reefs play a vital role. But due to many factors such as pollution and coral mining, coral reefs are dying day by day. One way to protect the coral reefs is to farm them in a carefully monitored underwater environment and restore it in place of dead corals. For successful farming of corals in coral farms, different parameters of the water in the farming area need to be monitored and maintained at optimal level. Sensing underwater parameters using wireless sensor nodes is an effective way for precise and continuous monitoring in a highly dynamic environment like oceans. Here the sensed information is of varying importance and it needs to be provided with desired Quality of Service(QoS) guarantees in delivering the information to offshore monitoring centers. The main interest of this research is Stochastic Network Calculus (SNC) based modeling of network layer design for underwater wireless sensor communication. The model proposed in this research enforces differentiation of service in underwater wireless sensor communication with the help of buffer sizing and link scheduling. The delay and backlog bounds for such differentiated services are analytically derived using stochastic network calculus.

Keywords: Underwater Coral Farms, SNC, differentiated service, delay bound, backlog bound.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 370

87 Malware Beaconing Detection by Mining Large-scale DNS Logs for Targeted Attack Identification

Authors: Andrii Shalaginov, Katrin Franke, Xiongwei Huang

Abstract:

One of the leading problems in Cyber Security today is the emergence of targeted attacks conducted by adversaries with access to sophisticated tools. These attacks usually steal senior level employee system privileges, in order to gain unauthorized access to confidential knowledge and valuable intellectual property. Malware used for initial compromise of the systems are sophisticated and may target zero-day vulnerabilities. In this work we utilize common behaviour of malware called ”beacon”, which implies that infected hosts communicate to Command and Control servers at regular intervals that have relatively small time variations. By analysing such beacon activity through passive network monitoring, it is possible to detect potential malware infections. So, we focus on time gaps as indicators of possible C2 activity in targeted enterprise networks. We represent DNS log files as a graph, whose vertices are destination domains and edges are timestamps. Then by using four periodicity detection algorithms for each pair of internal-external communications, we check timestamp sequences to identify the beacon activities. Finally, based on the graph structure, we infer the existence of other infected hosts and malicious domains enrolled in the attack activities.

Keywords: Malware detection, network security, targeted attack.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6113

86 Architectural Approaches to a Sustainable Community with Floating Housing Units Adapting to Climate Change and Sea Level Rise in Vietnam

Authors: Nguyen Thi Thu Trang

Abstract:

Climate change and sea level rise is one of the greatest challenges facing human beings in the 21^st century. Because of sea level rise, several low-lying coastal areas around the globe are at risk of being completely submerged, disappearing under water. Particularly in Viet Nam, the rise in sea level is predicted to result in more frequent and even permanently inundated coastal plains. As a result, land reserving fund of coastal cities is going to be narrowed in near future, while construction ground is becoming increasingly limited due to a rapid growth in population. Faced with this reality, the solutions are being discussed not only in tradition view such as accommodation is raised or moved to higher areas, or “living with the water”, but also forwards to “living on the water”. Therefore, the concept of a sustainable floating community with floating houses based on the precious value of long term historical tradition of water dwellings in Viet Nam would be a sustainable solution for adaptation of climate change and sea level rise in the coastal areas. The sustainable floating community is comprised of sustainability in four components: architecture, environment, socio-economic and living quality. This research paper is focused on sustainability in architectural component of floating community. Through detailed architectural analysis of current floating houses and floating communities in Viet Nam, this research not only accumulates precious values of traditional architecture that need to be preserved and developed in the proposed concept, but also illustrates its weaknesses that need to address for optimal design of the future sustainable floating communities. Based on these studies the research would provide guidelines with appropriate architectural solutions for the concept of sustainable floating community with floating housing units that are adapted to climate change and sea level rise in Viet Nam.

Keywords: Climate change, floating houses, floating community, Viet Nam.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3282

85 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects. Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1479

84 Ethnobotanical Study on the Usage of Toxic Plants in Traditional Medicine in the City Center of Tlemcen, Algeria

Authors: Nassima Elyebdri, Asma Boumediou, Soumia Addoun

Abstract:

Traditional medicine has been part of the Algerian culture for decades. In particular, the city of Tlemcen still retains practices based on phytotherapy to the present day, as this kind of medicine fulfills the needs of its followers among the local population. The toxic plants contain diverse natural substances which supplied a lot of medicine in the pharmaceutical industry. In order to explore new medicinal sources among toxic plants, an ethnobotanical study was carried out on the use of these plants by the population, at Emir Abdelkader Square of the city of Tlemcen, a rather busy place with a high number of traditional health practitioners and herbalists. This is a descriptive and transversal study aimed at estimating the frequency of using toxic plants among the studied population, for a period of 4 months. The information was collected, using self-anonymous questionnaires, and analyzed by the IBM SPSS Statistics software used for statistical analysis. A sample of 200 people, including 120 women and 80 men, were interviewed. The mean age was 41 ± 16 years. Among those questioned, 83.5% used plants; 8% of them used toxic plants and 35% used plants that can be toxic under certain conditions. Some improvements were observed in 88% of the cases where toxic plants were used. 80 medicinal plants, belonging to 36 botanical families, were listed, identified and classified. The most frequent indications for these plants were for respiratory diseases in 64.7% of cases, and for digestive disorders in 51.5% of cases. 11% of these plants are toxic, 26% could be toxic under certain conditions. Among toxics plants, the most common ones are Berberis vulgaris with 5.4%, indicated in the treatment of uterine fibroids and thyroid, Rhamnus alaternus with 4.8% for hepatic jaundice, Nerium oleander with 3% for hemorrhoids, Ruta chalepensis with 1.2%, indicated for digestive disorders and dysmenorrhea, and Viscum album with 1.2%, indicated for respiratory diseases. The most common plants that could be toxic are Mentha pulegium (15.6%), Eucalyptus globulus (11.4%), and Pimpinella anisum (10.2%). This study revealed interesting results on the use of toxic plants, which are likely to serve as a basis for further ethno-pharmacological investigations in order to get new drug sources.

Keywords: Ethnobotany, phytotherapy, Tlemcen, toxic plants.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1379

83 Image Classification and Accuracy Assessment Using the Confusion Matrix, Contingency Matrix, and Kappa Coefficient

Authors: F. F. Howard, C. B. Boye, I. Yakubu, J. S. Y. Kuma

Abstract:

One of the ways that could be used for the production of land use and land cover maps by a procedure known as image classification is the use of the remote sensing technique. Numerous elements ought to be taken into consideration, including the availability of highly satisfactory Landsat imagery, secondary data and a precise classification process. The goal of this study was to classify and map the land use and land cover of the study area using remote sensing and Geospatial Information System (GIS) analysis. The classification was done using Landsat 8 satellite images acquired in December 2020 covering the study area. The Landsat image was downloaded from the USGS. The Landsat image with 30 m resolution was geo-referenced to the WGS_84 datum and Universal Transverse Mercator (UTM) Zone 30N coordinate projection system. A radiometric correction was applied to the image to reduce the noise in the image. This study consists of two sections: the Land Use/Land Cover (LULC) and Accuracy Assessments using the confusion and contingency matrix and the Kappa coefficient. The LULC classifications were vegetation (agriculture) (67.87%), water bodies (0.01%), mining areas (5.24%), forest (26.02%), and settlement (0.88%). The overall accuracy of 97.87% and the kappa coefficient (K) of 97.3% were obtained for the confusion matrix. While an overall accuracy of 95.7% and a Kappa coefficient of 0.947 were obtained for the contingency matrix, the kappa coefficients were rated as substantial; hence, the classified image is fit for further research.

Keywords: Confusion Matrix, contingency matrix, kappa coefficient, land used/ land cover, accuracy assessment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 265

82 Fuzzy Wavelet Packet based Feature Extraction Method for Multifunction Myoelectric Control

Authors: Rami N. Khushaba, Adel Al-Jumaily

Abstract:

The myoelectric signal (MES) is one of the Biosignals utilized in helping humans to control equipments. Recent approaches in MES classification to control prosthetic devices employing pattern recognition techniques revealed two problems, first, the classification performance of the system starts degrading when the number of motion classes to be classified increases, second, in order to solve the first problem, additional complicated methods were utilized which increase the computational cost of a multifunction myoelectric control system. In an effort to solve these problems and to achieve a feasible design for real time implementation with high overall accuracy, this paper presents a new method for feature extraction in MES recognition systems. The method works by extracting features using Wavelet Packet Transform (WPT) applied on the MES from multiple channels, and then employs Fuzzy c-means (FCM) algorithm to generate a measure that judges on features suitability for classification. Finally, Principle Component Analysis (PCA) is utilized to reduce the size of the data before computing the classification accuracy with a multilayer perceptron neural network. The proposed system produces powerful classification results (99% accuracy) by using only a small portion of the original feature set.

Keywords: Biomedical Signal Processing, Data mining andInformation Extraction, Machine Learning, Rehabilitation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1738

81 Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

This paper introduces new algorithms (Fuzzy relative of the CLARANS algorithm FCLARANS and Fuzzy c Medoids based on randomized search FCMRANS) for fuzzy clustering of relational data. Unlike existing fuzzy c-medoids algorithm (FCMdd) in which the within cluster dissimilarity of each cluster is minimized in each iteration by recomputing new medoids given current memberships, FCLARANS minimizes the same objective function minimized by FCMdd by changing current medoids in such away that that the sum of the within cluster dissimilarities is minimized. Computing new medoids may be effected by noise because outliers may join the computation of medoids while the choice of medoids in FCLARANS is dictated by the location of a predominant fraction of points inside a cluster and, therefore, it is less sensitive to the presence of outliers. In FCMRANS the step of computing new medoids in FCMdd is modified to be based on randomized search. Furthermore, a new initialization procedure is developed that add randomness to the initialization procedure used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and linearized version of fuzzy c-medoids (RFCMdd). Experimental results with different samples of the Reuter-21578, Newsgroups (20NG) and generated datasets with noise show that FCLARANS is more robust than both RFCMdd and FCMRANS. Finally, both FCMRANS and FCLARANS are more efficient and their outputs are almost the same as that of RFCMdd in terms of classification rate.

Keywords: Data Mining, Fuzzy Clustering, Relational Clustering, Medoid-Based Clustering, Cluster Analysis, Unsupervised Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2404

80 Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules

Authors: Suraiya Jabin, Kamal K. Bharadwaj

Abstract:

This research presents a system for post processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using Learning Classifier System approach. Learning Classifier System (LCS) is basically a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. Crisp description for a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System initial population is constructed as a random collection of HPR–trees (related production rules) and crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system and based on Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation method. For implementing reinforcement a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.

Keywords: Hierarchical Production Rule, Data Mining, Learning Classifier System, Fuzzy Subsumption Relation, Subsumption matrix, Reinforcement Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1456

79 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues

Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid

Abstract:

New approaches to analyze and visualize data stream in real-time basis is important in making a prompt decision by the decision maker. Financial market trading and surveillance, large-scale emergency response and crowd control are some example scenarios that require real-time analytic and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required in order to process the streaming data. Today, ranges of tools which implement some of these functionalities are available. In this paper, we present chronological evolution evaluation of technologies for supporting of real-time analytic and visualization of the data stream. Based on the past research papers published from 2002 to 2014, we gathered the general information, main techniques, challenges and open issues. The techniques for streaming text visualization are identified based on Text Visualization Browser in chronological order. This paper aims to review the evolution of streaming text visualization techniques and tools, as well as to discuss the problems and challenges for each of identified tools.

Keywords: Information visualization, visual analytics, text mining, visual text analytics tools, big data visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1003

78 TNFRSF11B Gene Polymorphisms A163G and G11811C in Prediction of Osteoporosis Risk

Authors: Boroňová I., Bernasovská J., Kľoc J., Tomková Z., Petrejčíková E., Gabriková D., Mačeková S.

Abstract:

Osteoporosis is a complex health disease characterized by low bone mineral density, which is determined by an interaction of genetics with metabolic and environmental factors. Current research in genetics of osteoporosis is focused on identification of responsible genes and polymorphisms. TNFRSF11B gene plays a key role in bone remodeling. The aim of this study was to investigate the genotype and allele distribution of A163G (rs3102735) osteoprotegerin gene promoter and G1181C (rs2073618) osteoprotegerin first exon polymorphisms in the group of 180 unrelated postmenopausal women with diagnosed osteoporosis and 180 normal controls. Genomic DNA was isolated from peripheral blood leukocytes using standard methodology. Genotyping for presence of different polymorphisms was performed using the Custom Taqman®SNP Genotyping assays. Hardy-Weinberg equilibrium was tested for each SNP in the groups of participants using the chi-square (χ²) test. The distribution of investigated genotypes in the group of patients with osteoporosis were as follows: AA (66.7%), AG (32.2%), GG (1.1%) for A163G polymorphism; GG (19.4%), CG (44.4%), CC (36.1%) for G1181C polymorphism. The distribution of genotypes in normal controls were follows: AA (71.1%), AG (26.1%), GG (2.8%) for A163G polymorphism; GG (22.2%), CG (48.9%), CC (28.9%) for G1181C polymorphism. In A163G polymorphism the variant G allele was more common among patients with osteoporosis: 17.2% versus 15.8% in normal controls. Also, in G1181C polymorphism the phenomenon of more frequent occurrence of C allele in the group of patients with osteoporosis was observed (58.3% versus 53.3%). Genotype and allele distributions showed no significant differences (A163G: χ²=0.270, p=0.605; χ²=0.250, p=0.616; G1181C: χ²= 1.730, p=0.188; χ²=1.820, p=0.177). Our results represents an initial study, further studies of more numerous file and associations studies will be carried out. Knowing the distribution of genotypes is important for assessing the impact of these polymorphisms on various parameters associated with osteoporosis. Screening for identification of “at-risk” women likely to develop osteoporosis and initiating subsequent early intervention appears to be most effective strategy to substantially reduce the risks of osteoporosis.

Keywords: Osteoporosis, Real-time PCR method, SNP polymorphisms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2247

77 Auto-Selective Three Term Control of Position and Compliance of a Pneumatic Actuator

Authors: M. G. Papoutsidakis, G. Chamilothoris, A Pipe

Abstract:

Due to their high power-to-weight ratio and low cost, pneumatic actuators are attractive for robotics and automation applications; however, achieving fast and accurate control of their position have been known as a complex control problem. The paper presents a methodology for obtaining controllers that achieve high position accuracy and preserve the closed-loop characteristics over a broad operating range. Experimentation with a number of conventional (or "classical") three-term controllers shows that, as repeated operations accumulate, the characteristics of the pneumatic actuator change requiring frequent re-tuning of the controller parameters (PID gains). Furthermore, three-term controllers are found to perform poorly in recovering the closed-loop system after the application of load or other external disturbances. The key reason for these problems lies in the non-linear exchange of energy inside the cylinder relating, in particular, to the complex friction forces that develop on the piston-wall interface. In order to overcome this problem but still remain within the boundaries of classical control methods, we designed an auto selective classicaql controller so that the system performance would benefit from all three control gains (KP, Kd, Ki) according to system requirements and the characteristics of each type of controller. This challenging experimentation took place for consistent performance in the face of modelling imprecision and disturbances. In the work presented, a selective PID controller is presented for an experimental rig comprising an air cylinder driven by a variable-opening pneumatic valve and equipped with position and pressure sensors. The paper reports on tests carried out to investigate the capability of this specific controller to achieve consistent control performance under, repeated operations and other changes in operating conditions.

Keywords: Classical selective controller, long-termexperimentation, pneumatic actuator, position accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1941

76 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: Fuzzy C-means clustering, Fuzzy C-means clustering based attribute weighting, Pima Indians diabetes dataset, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1767

75 A Fuzzy-Rough Feature Selection Based on Binary Shuffled Frog Leaping Algorithm

Authors: Javad Rahimipour Anaraki, Saeed Samet, Mahdi Eftekhari, Chang Wook Ahn

Abstract:

Feature selection and attribute reduction are crucial problems, and widely used techniques in the field of machine learning, data mining and pattern recognition to overcome the well-known phenomenon of the Curse of Dimensionality. This paper presents a feature selection method that efficiently carries out attribute reduction, thereby selecting the most informative features of a dataset. It consists of two components: 1) a measure for feature subset evaluation, and 2) a search strategy. For the evaluation measure, we have employed the fuzzy-rough dependency degree (FRFDD) of the lower approximation-based fuzzy-rough feature selection (L-FRFS) due to its effectiveness in feature selection. As for the search strategy, a modified version of a binary shuffled frog leaping algorithm is proposed (B-SFLA). The proposed feature selection method is obtained by hybridizing the B-SFLA with the FRDD. Nine classifiers have been employed to compare the proposed approach with several existing methods over twenty two datasets, including nine high dimensional and large ones, from the UCI repository. The experimental results demonstrate that the B-SFLA approach significantly outperforms other metaheuristic methods in terms of the number of selected features and the classification accuracy.

Keywords: Binary shuffled frog leaping algorithm, feature selection, fuzzy-rough set, minimal reduct.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 731

74 Chemotherapy Safety Protocol for Oncology Nurses: It's Effect on Their Protective Measures Practices

Authors: Magda M. Mohsen, Manal E. Fareed

Abstract:

Background: Widespread use of chemotherapeutic drugs in the treatment of cancer has lead to higher health hazards among employee who handle and administer such drugs, so nurses should know how to protect themselves, their patients and their work environment against toxic effects of chemotherapy. Aim of this study was carried out to examine the effect of chemotherapy safety protocol for oncology nurses on their protective measure practices. Design: A quasi experimental research design was utilized. Setting: The study was carried out in oncology department of Menoufia university hospital and Tanta oncology treatment center. Sample: A convenience sample of forty five nurses in Tanta oncology treatment center and eighteen nurses in Menoufiya oncology department. Tools: 1. an interviewing questionnaire that covering sociodemographic data, assessment of unit and nurses' knowledge about chemotherapy. II: Obeservational check list to assess nurses' actual practices of handling and adminestration of chemotherapy. A base line data were assessed before implementing Chemotherapy Safety protocol, then Chemotherapy Safety protocol was implemented, and after 2 monthes they were assessed again. Results: reveled that 88.9% of study group I and 55.6% of study group II improved to good total knowledge scores after educating on the safety protocol, also 95.6% of study group I and 88.9% of study group II had good total practice score after educating on the safety protocol. Moreover less than half of group I (44.4%) reported that heavy workload is the most barriers for them, while the majority of group II (94.4%) had many barriers for adhering to the safety protocol such as they didn’t know the protocol, the heavy work load and inadequate equipment. Conclusions: Safety protocol for Oncology Nurses seemed to have positive effect on improving nurses' knowledge and practice. Recommendation: chemotherapy safety protocol should be instituted for all oncology nurses who are working in any oncology unit and/ or center to enhance compliance, and this protocol should be done at frequent intervals.

Keywords: Chemotherapy Safety protocol, Effect, protective measure practice.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7199

73 Wind Energy Resources Assessment and Micrositting on Different Areas of Libya: The Case Study in Darnah

Authors: F. Ahwide, Y. Bouker, K. Hatem

Abstract:

This paper presents long term wind data analysis in terms of annual and diurnal variations at different areas of Libya. The data of the wind speed and direction are taken each ten minutes for a period, at least two years, are used in the analysis. ‘WindPRO’ software and Excel workbook were used for the wind statistics and energy calculations. As for Darnah, average speeds are 10m, 20m and 40m and 6.57 m/s, 7.18 m/s, and 8.09 m/s, respectively. Highest wind speeds are observed at SSW, followed by S, WNW and NW sectors. Lowest wind speeds are observed between N and E sectors. Most frequent wind directions are NW and NNW. Hence, wind turbines can be installed against these directions. The most powerful sector is NW (31.3% of total expected wind energy), followed by 17.9% SSW, 11.5% NNW and 8.2% WNW

In Excel workbook, an estimation of annual energy yield at position of Derna, Al-Maqrun, Tarhuna and Al-Asaaba meteorological mast has been done, considering a generic wind turbine of 1.65 MW. (mtORRES, TWT 82-1.65MW) in position of meteorological mast. Three other turbines have been tested and a reduction of 18% over the net AEP. At 80m, the estimation of energy yield for Derna, Al- Maqrun, Tarhuna and Asaaba is 6.78 GWh or 3390 equivalent hours, 5.80 GWh or 2900 equivalent hours, 4.91 GWh or 2454 equivalent hours and 5.08 GWh or 2541 equivalent hours respectively. It seems a fair value in the context of a possible development of a wind energy project in the areas, considering a value of 2400 equivalent hours as an approximate limit to consider a wind warm economically profitable. Furthermore, an estimation of annual energy yield at positions of Misalatha, Azizyah and Goterria meteorological mast has been done, considering a generic wind turbine of 2 MW. We found that, at 80 m the estimation of energy yield is 3.12 GWh or 1557 equivalent hours, 4.47 GWh or 2235 equivalent hours and 4.07GWh or 2033 respectively.

It seems a very poor value in the context of possible development of a wind energy project in the areas, considering a value of 2400 equivalent hours as an approximate limit to consider a wind warm economically profitable. Anyway, more data and a detailed wind farm study would be necessary to draw conclusions.

Keywords: Wind turbines, wind data, energy yield, micrositting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2638

72 Relevance Feedback within CBIR Systems

Authors: Mawloud Mosbah, Bachir Boucheham

Abstract:

We present here the results for a comparative study of some techniques, available in the literature, related to the relevance feedback mechanism in the case of a short-term learning. Only one method among those considered here is belonging to the data mining field which is the K-nearest neighbors algorithm (KNN) while the rest of the methods is related purely to the information retrieval field and they fall under the purview of the following three major axes: Shifting query, Feature Weighting and the optimization of the parameters of similarity metric. As a contribution, and in addition to the comparative purpose, we propose a new version of the KNN algorithm referred to as an incremental KNN which is distinct from the original version in the sense that besides the influence of the seeds, the rate of the actual target image is influenced also by the images already rated. The results presented here have been obtained after experiments conducted on the Wang database for one iteration and utilizing color moments on the RGB space. This compact descriptor, Color Moments, is adequate for the efficiency purposes needed in the case of interactive systems. The results obtained allow us to claim that the proposed algorithm proves good results; it even outperforms a wide range of techniques available in the literature.

Keywords: CBIR, Category Search, Relevance Feedback (RFB), Query Point Movement, Standard Rocchio’s Formula, Adaptive Shifting Query, Feature Weighting, Optimization of the Parameters of Similarity Metric, Original KNN, Incremental KNN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2342

71 ParkedGuard: An Efficient and Accurate Parked Domain Detection System Using Graphical Locality Analysis and Coarse-To-Fine Strategy

Authors: Chia-Min Lai, Wan-Ching Lin, Hahn-Ming Lee, Ching-Hao Mao

Abstract:

As world wild internet has non-stop developments, making profit by lending registered domain names emerges as a new business in recent years. Unfortunately, the larger the market scale of domain lending service becomes, the riskier that there exist malicious behaviors or malwares hiding behind parked domains will be. Also, previous work for differentiating parked domain suffers two main defects: 1) too much data-collecting effort and CPU latency needed for features engineering and 2) ineffectiveness when detecting parked domains containing external links that are usually abused by hackers, e.g., drive-by download attack. Aiming for alleviating above defects without sacrificing practical usability, this paper proposes ParkedGuard as an efficient and accurate parked domain detector. Several scripting behavioral features were analyzed, while those with special statistical significance are adopted in ParkedGuard to make feature engineering much more cost-efficient. On the other hand, finding memberships between external links and parked domains was modeled as a graph mining problem, and a coarse-to-fine strategy was elaborately designed by leverage the graphical locality such that ParkedGuard outperforms the state-of-the-art in terms of both recall and precision rates.

Keywords: Coarse-to-fine strategy, domain parking service, graphical locality analysis, parked domain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1253

70 Efficiency in Urban Governance towards Sustainability and Competitiveness of City : A Case Study of Kuala Lumpur

Authors: Hamzah Jusoh, Azmizam Abdul Rashid

Abstract:

Malaysia has successfully applied economic planning to guide the development of the country from an economy of agriculture and mining to a largely industrialised one. Now, with its sights set on attaining the economic level of a fully developed nation by 2020, the planning system must be made even more efficient and focused. It must ensure that every investment made in the country, contribute towards creating the desirable objective of a strong, modern, internationally competitive, technologically advanced, post-industrial economy. Cities in Malaysia must also be fully aware of the enormous competition it faces in a region with rapidly expanding and modernising economies, all contending for the same pool of potential international investments. Efficiency of urban governance is also fundamental issue in development characterized by sustainability, subsidiarity, equity, transparency and accountability, civic engagement and citizenship, and security. As described above, city competitiveness is harnessed through 'city marketing and city management'. High technology and high skilled industries, together with finance, transportation, tourism, business, information and professional services shopping and other commercial activities, are the principal components of the nation-s economy, which must be developed to a level well beyond where it is now. In this respect, Kuala Lumpur being the premier city must play the leading role.

Keywords: Economic planning, sustainability, efficiency, urban governance and city competitiveness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2242

69 Spreading Japan's National Image through China during the Era of Mass Tourism: The Japan National Tourism Organization’s Use of Sina Weibo

Authors: Abigail Qian Zhou

Abstract:

Since China has entered an era of mass tourism, there has been a fundamental change in the way Chinese people approach and perceive the image of other countries. With the advent of the new media era, social networking sites such as Sina Weibo have become a tool for many foreign governmental organizations to spread and promote their national image. Among them, the Japan National Tourism Organization (JNTO) was one of the first foreign official tourism agencies to register with Sina Weibo and actively implement communication activities. Due to historical and political reasons, cognition of Japan's national image by the Chinese has always been complicated and contradictory. However, since 2015, China has become the largest source of tourists visiting Japan. This clearly indicates that the broadening of Japan's national image in China has been effective and has value worthy of reference in promoting a positive Chinese perception of Japan and encouraging Japanese tourism. Within this context and using the method of content analysis in media studies through content mining software, this study analyzed how JNTO’s Sina Weibo accounts have constructed and spread Japan's national image. This study also summarized the characteristics of its content and form, and finally revealed the strategy of JNTO in building its international image. The findings of this study not only add a tourism-based perspective to traditional national image communications research, but also provide some reference for the effective international dissemination of national image in the future.

Keywords: National image, tourism, international communication, Japan, China.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1001