Search results for: clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 610

310 Modified Active (MA) Algorithm to Generate Semantic Web Related Clustered Hierarchy for Keyword Search

Authors: G. Leena Giri, Archana Mathur, S. H. Manjula, K. R. Venugopal, L. M. Patnaik

Abstract:

Keyword search in XML documents is based on the notion of lowest common ancestors in the labelled trees model of XML documents and has recently gained a lot of research interest in the database community. In this paper, we propose the Modified Active (MA) algorithm which is an improvement over the active clustering algorithm by taking into consideration the entity aspect of the nodes to find the level of the node pertaining to a particular keyword input by the user. A portion of the bibliography database is used to experimentally evaluate the modified active algorithm and results show that it performs better than the active algorithm. Our modification improves the response time of the system and thereby increases the efficiency of the system.
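The lowest-common-ancestor notion mentioned above can be sketched as follows. This is a minimal illustration and not the MA algorithm itself: it assumes nodes are identified by Dewey-style labels (tuples of child positions from the root), so the lowest common ancestor of two nodes is the longest common prefix of their labels. The tree, the labels and the function names are all hypothetical.

```python
from itertools import product


def lca(a, b):
    """Longest common prefix of two Dewey labels, i.e. their lowest common ancestor."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def deepest_lca(matches_per_keyword):
    """Deepest LCA over all ways of picking one matching node per keyword."""
    best = ()
    for combo in product(*matches_per_keyword):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        if len(node) > len(best):
            best = node
    return best


# Hypothetical bibliography tree: (0,) is the root, (0, 1) its second child, and so on.
keyword_matches = [
    [(0, 0, 1), (0, 1, 0)],  # nodes containing the first keyword
    [(0, 1, 2), (0, 2, 0)],  # nodes containing the second keyword
]
result = deepest_lca(keyword_matches)
```

The exhaustive product over matches is only meant to make the definition concrete; it grows quickly with the number of matching nodes.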

Keywords: keyword matching patterns, MA algorithm, semantic search, knowledge management

Procedia PDF Downloads 414
309 On Musical Information Geometry with Applications to Sonified Image Analysis

Authors: Shannon Steinmetz, Ellen Gethner

Abstract:

In this paper, a theoretical foundation is developed for patterned segmentation of audio using the geometry of music and statistical manifold. We demonstrate image content clustering using conic space sonification. The algorithm takes a geodesic curve as a model estimator of the three-parameter Gamma distribution. The random variable is parameterized by musical centricity and centric velocity. Model parameters predict audio segmentation in the form of duration and frame count based on the likelihood of musical geometry transition. We provide an example using a database of randomly selected images, resulting in statistically significant clusters of similar image content.

Keywords: sonification, musical information geometry, image, content extraction, automated quantification, audio segmentation, pattern recognition

Procedia PDF Downloads 238
308 Improved Particle Swarm Optimization with Cellular Automata and Fuzzy Cellular Automata

Authors: Ramin Javadzadeh

Abstract:

Particle swarm optimization algorithms are metaheuristic optimization methods that are widely used in clustering and pattern recognition applications. In multimodal optimization problems, these algorithms are more efficient than genetic algorithms. A major drawback of these algorithms is their slow convergence to the global optimum, and their stability across different runs is weak. In this paper, an improved particle swarm optimization is introduced for the first time to overcome these problems. Fuzzy cellular automata are used to improve the efficiency of the algorithm. The credibility of the proposed approach is evaluated by simulations, and it is shown that the proposed approach achieves better results than the standard particle swarm optimization algorithms.

Keywords: cellular automata, cellular learning automata, local search, optimization, particle swarm optimization

Procedia PDF Downloads 607
307 Fuzzy Rules Based Improved BEENISH Protocol for Wireless Sensor Networks

Authors: Rishabh Sharma

Abstract:

The main design parameter of a WSN (wireless sensor network) is energy consumption. To address this constraint, hierarchical clustering is a technique that helps extend the network's lifetime by consuming energy efficiently. This paper focuses on WSNs and on the FIS (fuzzy inference system) deployed to enhance the BEENISH protocol. The node energy, mobility, pause time and density are considered for the selection of the CH (cluster head). The simulation outcomes show that the proposed system outperforms the traditional system with regard to energy utilization and the number of packets transmitted to the sink.

Keywords: wireless sensor network, sink, sensor node, routing protocol, fuzzy rule, fuzzy inference system

Procedia PDF Downloads 106
306 Real-Time Classification of Marbles with Decision-Tree Method

Authors: K. S. Parlak, E. Turan

Abstract:

The separation of marbles according to pattern quality is a process carried out according to expert decision. The classification phase is the most critical part in terms of economic value. In this study, a self-learning system is proposed which classifies marbles quickly and with a high success rate. This system takes ten marble images from the camera and extracts ten features. The marbles are classified by the decision tree method using the extracted features. The user forms the training set by training the system at the marble classification stage. The system improves itself with every marble image that is classified. The aim of the proposed system is to minimize the error caused by the person performing the classification and to perform the classification quickly.

Keywords: decision tree, feature extraction, k-means clustering, marble classification

Procedia PDF Downloads 382
305 A Prediction Model of Tornado and Its Impact on Architecture Design

Authors: Jialin Wu, Zhiwei Lian, Jieyu Tang, Jingyun Shen

Abstract:

Tornadoes are serious and unpredictable natural disasters that have an important impact on people's production and daily life. The probability of being hit by tornadoes in China was analyzed considering the principles of tornado formation. Suggestions on the layout and shapes of newly built buildings were then provided, based on the characteristics of tornado wind fields. Fuzzy clustering and inverse closeness methods were used to evaluate the probability levels of tornado risks in various provinces based on classification and ranking. GIS was adopted to display the results. Finally, the wind field of a single-vortex tornado was studied to discuss the optimized design of rural low-rise houses, taking Yancheng, Jiangsu as an example. This paper may provide data to support building and urban design in some specific regions.

Keywords: tornado probability, computational fluid dynamics, fuzzy mathematics, optimal design

Procedia PDF Downloads 137
304 Disclosure on Adherence of the King Code's Audit Committee Guidance: Cluster Analyses to Determine Strengths and Weaknesses

Authors: Philna Coetzee, Clara Msiza

Abstract:

In modern society, audit committees are seen as the custodians of accountability and the conscience of management and the board. But who holds the audit committee accountable for its actions or non-actions, and how do we know what it is supposed to be doing and what it is doing? The purpose of this article is to provide greater insight into the latter part of this problem, namely, to determine what the best practices for audit committees are and what the disclosed realities are. In countries where governance is well established, the roles and responsibilities of the audit committee are mostly clearly guided by legislation and/or guidance documents, with countries increasingly providing guidance on this topic. With the high cost involved in adhering to governance guidelines, the public (for public organisations) and shareholders (for private organisations) expect to see the value of their ‘investment’. For audit committees, the dividends on the investment should be reflected in fewer fraudulent activities, less corruption, higher efficiency and effectiveness, improved social and environmental impact, and increased profits, to name a few. If this is not the case (which is reflected in the number of fraudulent activities in both the private and the public sector), stakeholders have the right to ask: where was the audit committee? Therefore, the objective of this article is to contribute to the body of knowledge by comparing the adherence of audit committees to best practice guidelines as stipulated in the King Report across public listed companies, national and provincial government departments, state-owned enterprises and local municipalities. After constructs were formed, based on the literature, factor analyses were conducted to reduce the number of variables in each construct.
Thereafter, cluster analyses were conducted; cluster analysis is an explorative technique that classifies a set of objects in such a way that objects that are more similar are placed in the same group. The SPSS TwoStep Clustering Component was used, as it is capable of handling both continuous and categorical variables. In the first step, a pre-clustering procedure clusters the objects into small sub-clusters, after which it clusters these sub-clusters into the desired number of clusters. The cluster analyses were conducted for each construct, and the measure, namely the audit opinion as listed in the external audit report, was included. Analysing 228 organisations' information, the results indicate that there is a clear distinction between the four spheres of business included in the analyses, indicating certain strengths and certain weaknesses within each sphere. The results may provide the overseers of audit committees with insight into where a specific sector’s strengths and weaknesses lie. Audit committee chairs will be able to improve the areas where their audit committee is lagging behind. The strengthening of audit committees should result in an improvement of the accountability of boards, leading to less fraud and corruption.
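The two-step idea described above (pre-cluster into small sub-clusters, then cluster the sub-clusters) can be illustrated with a simplified sketch. This is not the SPSS TwoStep implementation: it assumes purely numeric data, a fixed distance threshold for the pre-clustering pass and centroid-distance merging for the second pass. The function names, the `radius` parameter and the sample points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def pre_cluster(points, radius):
    """Step 1: put each point in the first sub-cluster whose centroid is within radius."""
    subclusters = []
    for p in points:
        for members in subclusters:
            if math.dist(p, centroid(members)) <= radius:
                members.append(p)
                break
        else:
            subclusters.append([p])  # no sub-cluster was close enough
    return subclusters


def merge_subclusters(subclusters, k):
    """Step 2: merge the two sub-clusters with the closest centroids until k remain."""
    clusters = [list(c) for c in subclusters]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (5.0, 5.0), (5.2, 5.1)]
subclusters = pre_cluster(points, radius=0.5)
clusters = merge_subclusters(subclusters, k=2)
```

With these sample points the first pass yields three sub-clusters and the second pass merges the two nearest ones, leaving two clusters.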

Keywords: audit committee disclosure, cluster analyses, governance best practices, strengths and weaknesses

Procedia PDF Downloads 167
303 Molecular Clustering and Velocity Increase in Converging-Diverging Nozzle in Molecular Dynamics Simulation

Authors: Jeoungsu Na, Jaehawn Lee, Changil Hong, Suhee Kim

Abstract:

A molecular dynamics simulation in a converging-diverging nozzle was performed to study molecular collisions and their influence on average flow velocity at a variety of vacuum levels. The static pressures and the dynamic pressure exerted by molecular collisions on the selected walls were compared to determine the intensity variances of the directional flows. With the pressure difference between the entrance and the exit of the nozzle held constant, the numerical experiment was performed for molecular velocities and directional flows. The results show that the velocities at the nozzle exit increased as the vacuum level in that area became higher, because there were fewer molecular collisions.

Keywords: cavitation, molecular collision, nozzle, vacuum, velocity increase

Procedia PDF Downloads 434
302 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits the class imbalance problem when one class has very few examples compared to the other class; this is also referred to as between-class imbalance. Traditional classifiers fail to classify the minority class examples correctly due to their bias towards the majority class. Apart from between-class imbalance, imbalance within classes, where classes are composed of different numbers of sub-clusters and these sub-clusters contain different numbers of examples, also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling the imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithm-based methods, cost-based methods and ensembles of classifiers. Data preprocessing techniques have shown great potential as they attempt to improve the data distribution rather than the classifier. Data preprocessing techniques handle class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples leads to loss of information, and when the minority class has absolute rarity, removing majority class examples is generally not recommended. Existing methods for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between-class imbalance and within-class imbalance simultaneously for the binary classification problem. Removing between-class imbalance and within-class imbalance simultaneously eliminates the bias of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in the total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of the sub-clusters.
The method also takes into consideration the scatter of the data in the feature space and adaptively copes with unseen test data using the Lowner-John ellipsoid to increase the accuracy of the classifier. In this study, a neural network is used, as it is a classifier in which the total error is minimized, and removing the between-class imbalance and within-class imbalance simultaneously helps the classifier give equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the Euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to the other methods in terms of various accuracy measures. Thus, the proposed method can serve as a good alternative for handling various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 418
301 Human Behavior Modeling in Video Surveillance of Conference Halls

Authors: Nour Charara, Hussein Charara, Omar Abou Khaled, Hani Abdallah, Elena Mugellini

Abstract:

In this paper, we present a human behavior modeling approach for video scenes. This approach is used to model normal behaviors in conference halls. We exploited the Probabilistic Latent Semantic Analysis (PLSA) technique, using the 'Bag-of-Terms' paradigm, as a tool for exploring video data to learn the model by grouping similar activities. Our term vocabulary consists of 3D spatio-temporal patch groups assigned by the direction of motion. Our video representation preserves the spatial information, the object trajectory, and the motion. The main importance of this approach is that it can be adapted to detect abnormal behaviors in order to ensure and enhance human security.

Keywords: activity modeling, clustering, PLSA, video representation

Procedia PDF Downloads 394
300 Genetic Structuring of Four Tectona grandis L. F. Seed Production Areas in Southern India

Authors: P. M. Sreekanth

Abstract:

Teak (Tectona grandis L. f.) is a tree species indigenous to India and other Southeastern countries. It produces high-value timber and is easily established in plantations. Reforestation requires a constant supply of high quality seeds. Seed Production Areas (SPA) of teak are improved stands used for collection of open-pollinated quality seeds in large quantities. Information on the genetic diversity of major teak SPAs in India is scanty. The genetic structure of four important seed production areas of Kerala State in Southern India was analyzed employing amplified fragment length polymorphism markers using ten selective primer combinations on 80 samples (4 populations X 20 trees). The study revealed that the gene diversity of the SPAs varied from 0.169 (Konni SPA) to 0.203 (Wayanad SPA). The percentage of polymorphic loci ranged from 74.42 (Parambikulam SPA) to 84.06 (Konni SPA). The mean total gene diversity index (HT) of all the four SPAs was 0.2296 ±0.02. A high proportion of genetic diversity was observed within the populations (83%) while diversity between populations was lower (17%) (GST = 0.17). Principal coordinate analysis and STRUCTURE analysis of the genotypes indicated that the pattern of clustering was in accordance with the origin and geographic location of SPAs, indicating specific identity of each population. A UPGMA dendrogram was prepared and showed that all the twenty samples from each of Konni and Parambikulam SPAs clustered into two separate groups, respectively. However, five Nilambur genotypes and one Wayanad genotype intruded into the Konni cluster. The higher gene flow estimated (Nm = 2.4) reflected the inclusion of Konni origin planting stock in the Nilambur and Wayanad plantations. Evidence for population structure investigated using 3D Principal Coordinate Analysis of FAMD software 1.30 indicated that the pattern of clustering was in accordance with the origin of SPAs. 
The present study showed that assessment of genetic diversity in seed production plantations can be achieved using AFLP markers. AFLP fingerprinting was also capable of identifying the geographical origin of planting stock, thereby revealing the occurrence of errors in genotype labeling. Molecular marker-based selective culling of genetically similar trees from a stand, so as to increase the genetic base of seed production areas, could be a new proposition for improving the quality of seeds required for raising commercial plantations of teak. The technique can also be used to assess the genetic diversity status of plus trees within provenances during their selection for raising clonal seed orchards, to assure the quality of seeds available for raising future plantations.
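The relation between the reported GST and the gene flow estimate can be checked with a short calculation. The abstract does not state which estimator was used; the sketch below assumes the commonly used form Nm = 0.5(1 - GST)/GST, which happens to reproduce the reported value from GST = 0.17. The function name is hypothetical.

```python
def gene_flow(gst_value):
    """Gene flow estimate Nm = 0.5 * (1 - GST) / GST (assumed estimator, not stated in the abstract)."""
    return 0.5 * (1.0 - gst_value) / gst_value


gst_reported = 0.17                   # between-population share of diversity, from the abstract
within_share = 1.0 - gst_reported     # within-population share, 0.83 as reported
nm = gene_flow(gst_reported)          # roughly 2.44, in line with the reported Nm = 2.4
```

Under this estimator, values of Nm above 1 are usually read as enough migration to counteract divergence between populations, which fits the interpretation given in the abstract.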

Keywords: AFLP, genetic structure, spa, teak

Procedia PDF Downloads 308
299 Energy Efficient Firefly Algorithm in Wireless Sensor Network

Authors: Wafa’ Alsharafat, Khalid Batiha, Alaa Kassab

Abstract:

A wireless sensor network (WSN) comprises a huge number of small and cheap devices known as sensor nodes. Usually, these sensor nodes are deployed randomly and in large numbers, in an ad hoc manner, over hostile and harsh environments to sense, collect and transmit data to the required locations (i.e., the base station). One of the main advantages of a WSN is its ability to work in unattended and scattered environments regardless of the presence of humans, such as remote active volcano or earthquake environments. In WSNs, extending network lifetime is a major concern. Clustering is an important technique for maximizing network lifetime. Nature-inspired algorithms are developed and optimized to find optimized solutions for various optimization problems. We propose an Energy Efficient Firefly Algorithm to extend network lifetime as long as possible.

Keywords: wireless network, SN, Firefly, energy efficiency

Procedia PDF Downloads 389
298 Changes in Geospatial Structure of Households in the Czech Republic: Findings from Population and Housing Census

Authors: Jaroslav Kraus

Abstract:

Spatial information about demographic processes is a standard part of outputs in the Czech Republic. That was also the case for the Population and Housing Census, which was held in 2011. This is a starting point for a follow-up study devoted to two basic types of households: single-person households and households of one complete family. Single-person households and one-family households make up more than 80 percent of all households, but their share and spatial structure have been changing over the long term. The increase in single-person households is the result of a long-term decrease in fertility and an increase in divorce, but also of the possibility of living separately. There are regions in the Czech Republic with traditional demographic behavior, and regions such as the capital Prague and some others with a changing pattern. The population census is based, according to international standards, on the concept of the currently living population. Three types of geospatial approaches will be used for the analysis: (i) measures of geographic distribution, (ii) mapping clusters to identify the locations of statistically significant hot spots, cold spots, spatial outliers, and similar features, and (iii) analyzing patterns as a starting point for more in-depth analyses (geospatial regression) in the future. For the analysis of this type of data, the numbers of households by type should be distinct objects. All events in a meaningfully delimited study region (e.g. municipalities) will be included in the analysis. Commonly produced measures of central tendency and spread will include identification of the location of the center of the point set (by NUTS3 level) and identification of the median center; standard distance, weighted standard distance and standard deviational ellipses will also be used.
Identifying that clustering exists in census household datasets does not provide a detailed picture of the nature and pattern of that clustering, so it will be helpful to apply simple hot-spot (and cold-spot) identification techniques to such datasets. Once the spatial structure of households has been determined, any particular measure of autocorrelation can be constructed by defining a way of measuring the difference between location attribute values. The most widely used measure is Moran’s I, which will be applied to municipal units where a numerical ratio is calculated. Local statistics arise naturally out of any of the methods for measuring spatial autocorrelation and will be applied to develop localized variants of almost any standard summary statistic. Local Moran’s I will give an indication of household data homogeneity and diversity at the municipal level.
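As an illustration of the autocorrelation measure named above, the following sketch computes global Moran's I, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The four "municipalities", their values and the binary contiguity weights are hypothetical and are not taken from the census data.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij with w_ii = 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Four hypothetical municipalities along a line; neighbours share a border.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 5.0, 5.0], weights)    # similar values sit next to each other
alternating = morans_i([1.0, 5.0, 1.0, 5.0], weights)  # neighbours always differ
```

Positive values indicate that similar values cluster in space, while negative values indicate that neighbouring units tend to differ.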

Keywords: census, geo-demography, households, the Czech Republic

Procedia PDF Downloads 97
297 Cardiac Biosignal and Adaptation in Confined Nuclear Submarine Patrol

Authors: B. Lefranc, C. Aufauvre-Poupon, C. Martin-Krumm, M. Trousselard

Abstract:

Isolated and confined environments (ICE) present several challenges which may adversely affect human psychology and physiology. Submariners on Sub-Surface Ballistic Nuclear (SSBN) missions who are exposed to these environmental constraints must be able to perform complex tasks as part of their normal duties, as well as during crisis periods when emergency actions are required or imminent. The operational and environmental constraints they face challenge human adaptability. The impact of such a constrained environment has yet to be explored. Establishing a knowledge framework is a determining factor, particularly in view of upcoming long space journeys. Ensuring that the crews are maintained in optimal operational condition is a real challenge because the success of the mission depends on them. This study focused on evaluating the impact of stress on the mental health and sensory degradation of submariners during an SSBN mission using cardiac biosignal (heart rate variability, HRV) clustering. This is a pragmatic exploratory study of a prospective cohort that included 19 submariner volunteers. HRV was recorded at baseline to classify the submariners by clustering according to their stress level based on parasympathetic (Pa) activity. The impacts of a high Pa (HPa) versus low Pa (LPa) level at baseline were assessed on emotional state and sensory perception (interoception and exteroception) as a cardiac biosignal during the patrol and at a recovery time one month after. At no time point was a significant difference in mental health found between the groups. There were significant differences in interoceptive, exteroceptive and physiological functioning during the patrol and at recovery time. To sum up, compared to the LPa group, the HPa group maintains a higher level of psychosensory functioning during the patrol and at recovery but exhibits a decrease in Pa level.
The HPa group has less adaptable HRV characteristics, with less unpredictability and flexibility of cardiac biosignals, while the LPa group increases them during the patrol and at recovery time. This dissociation between psychosensory and physiological adaptation suggests two treatment modalities for ICE environments. To the best of our knowledge, our results are the first to highlight the impact of physiological differences in the HRV profile on the adaptability of submariners. Further studies are needed to evaluate the negative emotional and cognitive effects of ICEs based on the cardiac profile. Artificial intelligence offers a promising future for maintaining a high level of operational conditions. These future perspectives will not only allow submariners to be better prepared, but also make it possible to design feasible countermeasures that will help support analog environments that bring us closer to a trip to Mars.

Keywords: adaptation, exteroception, HRV, ICE, interoception, SSBN

Procedia PDF Downloads 182
296 Phylogenetic Inferences based on Morphoanatomical Characters in Plectranthus esculentus N. E. Br. (Lamiaceae) from Nigeria

Authors: Otuwose E. Agyeno, Adeniyi A. Jayeola, Bashir A. Ajala

Abstract:

P. esculentus is indigenous to Nigeria, yet no wild relative has been encountered or reported. This has made it difficult to establish proper lineages between the varieties and landraces under cultivation. The present work is the first to determine the apomorphy of 135 morphoanatomical characters in organs of 46 accessions drawn from 23 populations of this species based on dicta. The character states were coded in accession x character-state matrices; only 83 were informative and were utilised for neighbour joining clustering based on Euclidean values and for a heuristic search in parsimony analysis using PAST ver. 3.15 software. Compatibility and evolutionary trends between accessions were then explored from the values and diagrams produced. The low consistency indices (CI) recorded support monophyly and low homoplasy in this taxon. Agglomerative schedules based on character type and source data sets divided the accessions into mainly 3 clades, each consisting of complexes of accessions. Solenostemon rotundifolius (Poir) J.K Morton was the outgroup (OG) used, and it occurred within the largest clades except when the characters were combined in a data set. The OG showed better compatibility with accessions of populations of landrace Isci, and varieties Riyum and Long’at. Otherwise, its aerial parts are more consistent with those of accessions of variety Bebot. The highly polytomous clades produced from the anatomical data set may be an indication of how stable such characters are in this species. The strict consensus trees produced, with more than 60 nodes, showed that the basal nodes were strongly supported by 3 to 17 characters across the data sets, suggesting that populations of this species are more alike. The OG was clearly the first diverging lineage and closely related to accessions of landrace Gwe and variety Bebot morphologically, but different from them anatomically.
It was also distantly related to landrace Fina and variety Long’at in terms of root, stem and leaf structural attributes. There were at least 5 other clades, each comprising complexes of accessions from different localities and terrains within the study area. A spherical stem in cross section, the size of vascular bundles at the stem corners, and alternate and whorled phyllotaxy are attributes which may have facilitated each other’s evolution in all accessions of the landrace Gwe, and they may be innovative since such states are not characteristic of the larger Lamiaceae, and Plectranthus L’Her in particular. In conclusion, this study has provided valuable information about infraspecific diversity in this taxon. It supports recognition of the varietal statuses accorded to populations of P. esculentus, as well as the hypothesis that the wild gene might have been distributed on the Jos Plateau. However, molecular characterisation of accessions of populations of this species would resolve this problem better.

Keywords: clustering, lineage, morphoanatomical characters, Nigeria, phylogenetics, Plectranthus esculentus, population

Procedia PDF Downloads 136
295 Using Nonhomogeneous Poisson Process with Compound Distribution to Price Catastrophe Options

Authors: Rong-Tsorng Wang

Abstract:

In this paper, we derive a pricing formula for catastrophe equity put options (or CatEPut) with non-homogeneous loss and approximated compound distributions. We assume that the loss claims arrival process is a nonhomogeneous Poisson process (NHPP) representing the clustering occurrences of loss claims, the size of loss claims is a sequence of independent and identically distributed random variables, and the accumulated loss distribution forms a compound distribution and is approximated by a heavy-tailed distribution. A numerical example is given to calibrate parameters, and we discuss how the value of CatEPut is affected by the changes of parameters in the pricing model we provided.
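The loss model described above can be illustrated by simulation. The sketch below draws claim arrival times from a nonhomogeneous Poisson process by thinning and sums independent claim sizes into an accumulated loss. It is not the pricing formula from the paper: the sinusoidal intensity, the lognormal claim sizes and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def simulate_nhpp(intensity, lam_max, horizon, rng):
    """Thinning: propose arrivals at rate lam_max, keep each with probability intensity(t) / lam_max."""
    arrivals = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)  # next candidate arrival
        if t > horizon:
            return arrivals
        if rng.random() <= intensity(t) / lam_max:
            arrivals.append(t)


def accumulated_loss(arrivals, rng, mu=0.0, sigma=1.0):
    """Compound sum of i.i.d. claim sizes (lognormal here, purely as an assumption)."""
    return sum(rng.lognormvariate(mu, sigma) for _ in arrivals)


def intensity(t):
    """Hypothetical claim intensity that peaks once per unit of time (bounded by 9.0)."""
    return 5.0 + 4.0 * math.sin(2.0 * math.pi * t)


rng = random.Random(42)
arrivals = simulate_nhpp(intensity, lam_max=9.0, horizon=3.0, rng=rng)
total_loss = accumulated_loss(arrivals, rng)
```

Thinning requires `lam_max` to be an upper bound on the intensity over the whole horizon; the time-varying acceptance probability is what produces clustered arrivals around the intensity peaks.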

Keywords: catastrophe equity put options, compound distributions, nonhomogeneous Poisson process, pricing model

Procedia PDF Downloads 167
294 Optimized Cluster Head Selection Algorithm Based on LEACH Protocol for Wireless Sensor Networks

Authors: Wided Abidi, Tahar Ezzedine

Abstract:

Low-Energy Adaptive Clustering Hierarchy (LEACH) is considered one of the effective hierarchical routing algorithms that optimize energy use and prolong network lifetime. Since the selection of the Cluster Head (CH) in LEACH is carried out randomly, in this paper we propose an approach to electing the CH based on the LEACH protocol. In other words, we present a formula for calculating the threshold responsible for CH election. We adopt three principal criteria: the remaining energy of the node, the number of neighbors within cluster range and the distance between the node and the CH. Simulation results show that our proposed approach outperforms the LEACH protocol in prolonging network lifetime and saving residual energy.
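For context, the threshold used for random CH election in standard LEACH is T(n) = p / (1 - p (r mod 1/p)) for nodes that have not yet served as CH in the current epoch, where p is the desired fraction of cluster heads and r is the round number. The sketch below implements that standard threshold. The abstract does not give the proposed formula, so `weighted_threshold` is only a hypothetical illustration of how the three named criteria could scale the threshold; its equal weighting and normalised inputs are assumptions.

```python
def leach_threshold(p, r, eligible=True):
    """Standard LEACH threshold T(n) for round r and desired cluster-head fraction p."""
    if not eligible:  # node already served as CH in the current epoch
        return 0.0
    epoch = round(1.0 / p)  # number of rounds in one epoch
    return p / (1.0 - p * (r % epoch))


def weighted_threshold(p, r, energy_ratio, neighbor_ratio, distance_ratio):
    """Hypothetical variant: scale T(n) by the three criteria, each normalised to [0, 1]."""
    # Higher residual energy and more neighbours raise the score; larger distance lowers it.
    score = (energy_ratio + neighbor_ratio + (1.0 - distance_ratio)) / 3.0
    return leach_threshold(p, r) * score


first_round = leach_threshold(0.05, 0)   # equals p at the start of an epoch
last_round = leach_threshold(0.05, 19)   # approaches 1 in the last round of the epoch
```

A node draws a uniform random number in [0, 1) each round and becomes CH when the number falls below its threshold, so the rising threshold guarantees that every eligible node serves once per epoch.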

Keywords: wireless sensors networks, LEACH protocol, cluster head election, energy efficiency

Procedia PDF Downloads 330
293 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, a wide variety of data is being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider range of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, using the board to collect data involves many difficulties, especially when the user is not a computer programmer or is using it for the first time. Even if data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 378
292 Data-Driven Market Segmentation in Hospitality Using Unsupervised Machine Learning

Authors: Rik van Leeuwen, Ger Koole

Abstract:

Within hospitality, marketing departments use segmentation to create tailored strategies to ensure personalized marketing. This study provides a data-driven approach by segmenting guest profiles via hierarchical clustering based on an extensive set of features. The industry requires understandable outcomes that help marketing departments adapt, make data-driven decisions and ultimately drive profit. A marketing department specified a business question that guides the unsupervised machine learning algorithm. Features of guests change over time; therefore, there is a probability that guests transition from one segment to another. The purpose of the study is to provide steps in the process from raw data to actionable insights, which serve as a guideline for how hospitality companies can adopt an algorithmic approach.

Keywords: hierarchical cluster analysis, hospitality, market segmentation

Procedia PDF Downloads 108
291 Efficient Subgoal Discovery for Hierarchical Reinforcement Learning Using Local Computations

Authors: Adrian Millea

Abstract:

In hierarchical reinforcement learning, one of the main issues encountered is the discovery of subgoal states or options (which are policies reaching subgoal states) by partitioning the environment in a meaningful way. This partitioning usually requires an expensive global clustering operation or eigendecomposition of the Laplacian of the state graph. We propose a local solution to this issue, much more efficient than algorithms using global information, which successfully discovers subgoal states by computing a simple function, which we call heterogeneity, for each state as a function of its neighbors. Moreover, we construct a value function using the difference in heterogeneity from one step to the next as the reward, such that we are able to explore the state space much more efficiently than, say, epsilon-greedy. The same principle can then be applied to higher levels of the hierarchy, where the states are the subgoals discovered at the level below.

Keywords: exploration, hierarchical reinforcement learning, locality, options, value functions

Procedia PDF Downloads 171
290 A Product-Specific/Unobservable Approach to Segmentation for a Value Expressive Credit Card Service

Authors: Manfred F. Maute, Olga Naumenko, Raymond T. Kong

Abstract:

Using data from a nationally representative financial panel of Canadian households, this study develops a psychographic segmentation of the customers of a value-expressive credit card service and tests for effects on relational response differences. The variety of segments elicited by agglomerative and k-means clustering and the familiar profiles of individual clusters suggest that the face validity of the psychographic segmentation was quite high. Segmentation had a significant effect on customer satisfaction and relationship depth. However, when socio-demographic characteristics like household size and income were accounted for in the psychographic segmentation, the effect on relational response differences was magnified threefold. Implications for the segmentation of financial services markets are considered.

Keywords: customer satisfaction, financial services, psychographics, response differences, segmentation

Procedia PDF Downloads 334
289 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and finding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations from Item Response Theory. More precisely, we propose first reducing dimensionality with Principal Component Analysis for the linguistic data and then producing axes of groups from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and find the real latent structure present in the data. The methodology is applied to a set of real semantic data, with strong results for coherence, speed, and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 443
288 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization

Authors: K. Umbleja, M. Ichino, H. Yaguchi

Abstract:

In this paper, we propose a coloring method for multivariate data visualization using parallel coordinates, based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension of proximity-based coloring, which suffers from a few undesired side effects if the hierarchical tree structure is not a balanced tree. We describe the algorithm for assigning colors based on dissimilarity information, show the application of the proposed method to three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization, where many individual objects have already been aggregated into a single symbolic object.

Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data

Procedia PDF Downloads 170
287 Discovering the Effects of Meteorological Variables on the Air Quality of Bogota, Colombia, by Data Mining Techniques

Authors: Fabiana Franceschi, Martha Cobo, Manuel Figueredo

Abstract:

Bogotá, the capital of Colombia, is its largest city and one of the most polluted in Latin America due to the fast economic growth over the last ten years. Bogotá has been affected by high pollution events which led to high concentrations of PM10 and NO2, exceeding the local 24-hour legal limits (100 and 150 µg/m3, respectively). The most important pollutants in the city are PM10 and PM2.5 (which are associated with respiratory and cardiovascular problems), and it is known that their concentrations in the atmosphere depend on the local meteorological factors. Therefore, it is necessary to establish a relationship between the meteorological variables and the concentrations of atmospheric pollutants such as PM10, PM2.5, CO, SO2, NO2 and O3. This study aims to determine the interrelations between meteorological variables and air pollutants in Bogotá, using data mining techniques. Data from 13 monitoring stations were collected from the Bogotá Air Quality Monitoring Network within the period 2010-2015. The Principal Component Analysis (PCA) algorithm was applied to obtain primary relations between all the parameters, and afterwards, the K-means clustering technique was implemented to corroborate those relations and to find patterns in the data. PCA was also used on a per-shift basis (morning, afternoon, night and early morning) to check for possible variation in the previous trends, and on a per-year basis to verify that the identified trends remained throughout the study time. Results demonstrated that wind speed, wind direction, temperature, and NO2 are the most influential factors on PM10 concentrations. Furthermore, it was confirmed that high humidity episodes increased PM2.5 levels. It was also found that there are directly proportional relationships between O3 levels and wind speed and radiation, while there is an inverse relationship between O3 levels and humidity. Concentrations of SO2 increase with the presence of PM10 and decrease with wind speed and wind direction. The results also showed a decreasing trend in pollutant concentrations over the last five years. Also, in rainy periods (March-June and September-December), some precipitation-related trends were stronger. Results obtained with K-means demonstrated that it was possible to find patterns in the data, and they also showed similar conditions and data distribution among the Carvajal, Tunal and Puente Aranda stations, and also between Parque Simon Bolivar and Las Ferias. It was verified that the aforementioned trends prevailed during the study period by applying the same technique per year. It was concluded that the PCA algorithm is useful to establish preliminary relationships among variables, and K-means clustering to find patterns in the data and understand their distribution. The discovery of patterns in the data allows using these clusters as an input to an Artificial Neural Network prediction model.
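As an illustration of the K-means step mentioned in this abstract, the following is a minimal sketch of the standard Lloyd's algorithm using only the Python standard library. It is not the authors' implementation: the function name `kmeans`, the choice of the first k points as initial centroids, and the sample readings (hypothetical wind speed and PM10 pairs) are all assumptions made for the example. A real analysis would normally standardize the variables first, since K-means is sensitive to feature scale.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    # Assumption: deterministic start, using the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical (wind speed, PM10) readings, invented for illustration only.
readings = [
    (1.0, 80.0), (1.2, 82.0), (0.8, 78.0),
    (5.0, 30.0), (5.2, 28.0), (4.8, 32.0),
]
labels, centroids = kmeans(readings, k=2)
print(labels)
```

With these made-up readings, the low-wind, high-PM10 points and the high-wind, low-PM10 points end up in separate clusters, which is the kind of pattern the abstract describes looking for.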

Keywords: air pollution, air quality modelling, data mining, particulate matter

Procedia PDF Downloads 258
286 Design of Personal Job Recommendation Framework on Smartphone Platform

Authors: Chayaporn Kaensar

Abstract:

Recently, Job Recommender Systems have gained much attention in industry since they solve the problem of information overload on recruiting websites. Therefore, we propose an Extended Personalized Job System that is capable of providing appropriate jobs for job seekers and recommending suitable information to them using Data Mining Techniques and a Dynamic User Profile. On the other hand, companies can also interact with the system to publish and update job information. This system supports various platforms, such as a web application and an Android mobile application. In this paper, User Profiles, Implicit User Action, User Feedback, and Clustering Techniques in the WEKA libraries were implemented for this application. In addition, open source tools like the Yii Web Application Framework, the Bootstrap Front End Framework and Android Mobile Technology were also applied.

Keywords: recommendation, user profile, data mining, web and mobile technology

Procedia PDF Downloads 313
285 Extracting Actions with Improved Part of Speech Tagging for Social Networking Texts

Authors: Yassine Jamoussi, Ameni Youssfi, Henda Ben Ghezala

Abstract:

With the growing interest in social networking, the interaction of social actors has evolved into a source of knowledge in which it becomes possible to perform context-aware reasoning. Information extraction from social networking, especially Twitter and Facebook, is one of the problems in this area. To extract text from social networking, we need several lexical features and large-scale word clustering. We attempt to expand an existing tokenizer and to develop our own tagger in order to support the incorrect words currently in existence on Facebook and Twitter. Our goal in this work is to benefit from the lexical features developed for Twitter and online conversational text in previous works, and to develop an extraction model for constructing a large knowledge base built on actions.

Keywords: social networking, information extraction, part-of-speech tagging, natural language processing

Procedia PDF Downloads 305
284 Authentication Based on Hand Movement by Low Dimensional Space Representation

Authors: Reut Lanyado, David Mendlovic

Abstract:

Most biological methods for authentication require special equipment, and some of them are easy to fake. We propose a method for authentication based on hand movement while typing a sentence, recorded with a regular camera. This technique uses the full video of the hand, which is harder to fake. In the first phase, we tracked the hand joints in each frame. Next, we represented a single frame for each individual using our Pose Agnostic Rotation and Movement (PARM) dimensional space. Then, we represented a full video of hand movement in a fixed low dimensional space using a method we call Fixed Dimension Video by Interpolation Statistics (FDVIS). Finally, we identified each individual in the FDVIS representation using unsupervised clustering and supervised methods. Accuracy exceeds 96% for 80 individuals when using supervised KNN.

Keywords: authentication, feature extraction, hand recognition, security, signal processing

Procedia PDF Downloads 128
283 Capacitated Multiple Allocation P-Hub Median Problem on a Cluster Based Network under Congestion

Authors: Çağrı Özgün Kibiroğlu, Zeynep Turgut

Abstract:

This paper considers a hub location problem in which the network service area is partitioned into predetermined zones (represented by given node clusters) and the capacity levels of potential hub nodes are determined a priori as a hub selection criterion, in order to investigate the effect of congestion on the network. The objective is to design a hub network by determining all required hub locations in the node clusters and allocating non-hub nodes to hubs such that the total cost, including transportation cost, the opening cost of hubs, and the penalty cost for exceeding the capacity level at hubs, is minimized. A mixed integer linear programming model is developed by introducing additional constraints to the traditional model of the capacitated multiple allocation hub location problem, and it is empirically tested.

Keywords: hub location problem, p-hub median problem, clustering, congestion

Procedia PDF Downloads 492
282 Diagnose of the Future of Family Businesses Based on the Study of Spanish Family Businesses Founders

Authors: Fernando Doral

Abstract:

Family businesses are a key phenomenon within the business landscape. Nevertheless, the concept involves two terms (“family” and “business”) which are nowadays rapidly evolving. Consequently, it isn't easy to diagnose whether family business will be a growing or decreasing phenomenon, which is the objective of this study. For that purpose, a sample of 50 Spanish-established companies from various sectors was taken. Different factors related to the profile of the founders were identified for each enterprise, such as age, the number of sons and daughters, or the support received from the family when starting the business. That information was taken as an input for a clustering method to identify groups, which could help define the founders' profiles. That characterization was used as a basis to identify three factors whose evolution should be analyzed: family structures, business landscape and entrepreneurs' motivations. The analysis of the evolution of these three factors seems to indicate a negative tendency for family businesses. Therefore, the diagnosis of this study is that family businesses are a declining phenomenon.

Keywords: business diagnose, business trends, family business, family business founders

Procedia PDF Downloads 208
281 Data Mining Techniques for Anti-Money Laundering

Authors: M. Sai Veerendra

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism, not to mention personal gain. Most financial institutions internationally have been implementing anti-money laundering (AML) solutions to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered well-suited techniques for detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we not only present these approaches but also give an overview of the important factors in building data mining solutions for AML activities.

Keywords: data mining, clustering, money laundering, anti-money laundering solutions

Procedia PDF Downloads 539