Search results for: Clusters of Microcalcifications
177 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach
Authors: K. Thangavel, R. Rathipriya
Abstract:
Over the past decade, biclustering has become a popular data mining technique, not only in biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both the rows and the columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) on the task of mining coherent, large-volume biclusters from web usage datasets. The experiments were conducted on two web usage datasets from a public dataset repository, and the performance of FA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results demonstrate the usefulness of DFA for tackling the biclustering problem.
Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile, Web usage mining.
Downloads 2133
176 Real Time Approach for Data Placement in Wireless Sensor Networks
Authors: Sanjeev Gupta, Mayank Dave
Abstract:
Real-time and reliable report delivery is extremely important for making effective decisions in real-world, mission-critical applications based on Wireless Sensor Networks (WSNs). Sensor data behaves differently in many ways from the data in traditional databases. WSNs need a mechanism to register, process queries, and disseminate data. In this paper we propose an architectural framework for data placement and management. We propose a reliable, real-time approach for data placement and for achieving data integrity using self-organized sensor clusters. Instead of storing information in individual cluster heads, as suggested in some protocols, our architecture stores the information of all clusters within a cell in the corresponding base station. For data dissemination and action in the wireless sensor network, we propose to use Action and Relay Stations (ARS). To reduce the average energy dissipation of sensor nodes, the data is sent to the nearest ARS rather than to the base station. We have designed our architecture to achieve greater energy savings and enhanced availability and reliability.
Keywords: Cluster head, data reliability, real time communication, wireless sensor networks.
Downloads 1814
175 An Improved K-Means Algorithm for Gene Expression Data Clustering
Authors: Billel Kenidra, Mohamed Benmohammed
Abstract:
Data mining techniques for clustering are a subject of active research and assist in biological pattern recognition and the extraction of new knowledge from raw data. Clustering is the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters, leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers, and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initialize the K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results show that efficiency has been significantly improved.
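The sensitivity to initial centers described in this abstract can be illustrated with a small sketch. The abstract does not say which initialization strategy the authors use, so the farthest-point seeding below is only a stand-in chosen for illustration, and the four sample points and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Illustrative seeding (an assumption, not the paper's strategy):
    start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, centers, iters=100):
    """Plain K-Means (Lloyd iterations) from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers, clusters

def sse(centers, clusters):
    """Sum of squared distances of points to their cluster center."""
    return sum(sq_dist(p, c) for c, cl in zip(centers, clusters) for p in cl)

# Hypothetical data: two tight pairs that are far apart horizontally.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

naive_sse = sse(*kmeans(points, points[:2]))                        # first k points as centers
seeded_sse = sse(*kmeans(points, farthest_point_init(points, 2)))   # farthest-point seeding
print(naive_sse, seeded_sse)
```

On this toy data the naive start converges to a split across the two pairs, while the seeded start recovers them, which is the kind of gap between a local and a global optimum that the abstract refers to.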
Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.
Downloads 1284
174 Critical Psychosocial Risk Treatment for Engineers and Technicians
Authors: R. Berglund, T. Backström, M. Bellgran
Abstract:
This study explores how management addresses psychosocial risks in seven teams of engineers and technicians in the midst of the fourth industrial revolution. The sample is from an ongoing quasi-experiment on psychosocial risk management in a manufacturing company in Sweden. Each of the seven teams belongs to one of two clusters: a positive cluster or a negative cluster. The positive cluster reports a significantly positive change in psychosocial risk levels between two time-points, and the negative cluster reports a significantly negative change. The data are collected using semi-structured interviews. The results of the computer-aided thematic analysis show that there are more differences than similarities when comparing the risk treatment actions taken in the two clusters. Findings show that the managers in the positive cluster use more enabling actions that foster and support formal and informal relationship building. In contrast, managers who use fewer enabling actions hinder the development of positive group processes and contribute to negative changes in psychosocial risk levels. This exploratory study sheds some light on how management can influence significant positive and negative changes in psychosocial risk levels during a risk management process.
Keywords: Group process model, risk treatment, risk management, psychosocial.
Downloads 1025
173 Probe Selection for Pathway-Specific Microarray Probe Design Minimizing Melting Temperature Variance
Authors: Fabian Horn, Reinhard Guthke
Abstract:
In molecular biology, microarray technology is widely and successfully used to measure gene activity efficiently. When working with less studied organisms, methods to design custom-made microarray probes are available. One design criterion is to select probes with minimal melting temperature variance, thus ensuring similar hybridization properties. If the microarray application focuses on the investigation of metabolic pathways, it is not necessary to cover the whole genome. It is more efficient to cover each metabolic pathway with a limited number of genes. Firstly, an approach is presented which minimizes the overall melting temperature variance of the selected probes for all genes of interest. Secondly, the approach is extended to include the additional constraint of covering all pathways with a limited number of genes while minimizing the overall variance. The new optimization problem is solved by a bottom-up programming approach which reduces the complexity to make it computationally feasible. As an example, the new method is applied to the selection of microarray probes that cover all fungal secondary metabolite gene clusters of Aspergillus terreus.
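The first selection criterion in this abstract, choosing one probe per gene so that the melting temperatures of the chosen probes vary as little as possible, can be sketched as a brute-force search. This is not the bottom-up approach the authors describe; it is an exhaustive illustration that only scales to tiny inputs, and the gene names and melting temperatures below are hypothetical.

```python
from itertools import product
from statistics import pvariance

def select_probes(candidates):
    """candidates maps each gene to the melting temperatures (in °C) of its
    candidate probes. Returns (variance, choice), where choice holds one
    temperature per gene with minimal population variance.
    Exhaustive search: an illustrative assumption, not the paper's method."""
    genes = sorted(candidates)
    best = None
    for combo in product(*(candidates[g] for g in genes)):
        var = pvariance(combo)
        if best is None or var < best[0]:
            best = (var, dict(zip(genes, combo)))
    return best

# Hypothetical candidate probes for three genes.
candidates = {
    "geneA": [58.0, 61.0, 65.0],
    "geneB": [60.0, 66.0],
    "geneC": [55.0, 60.5, 70.0],
}

best_var, best_choice = select_probes(candidates)
print(best_var, best_choice)
```

The number of combinations grows as the product of the candidate counts per gene, which is why a search strategy that reduces complexity, such as the one the abstract mentions, is needed for real probe sets.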
Keywords: bottom-up approach, gene clusters, melting temperature, metabolic pathway, microarray probe design, probe selection
Downloads 1559
172 Malaria Prone Zones of West Bengal: A Spatio-Temporal Scenario
Authors: Meghna Maiti, Utpal Roy
Abstract:
In India, malaria is still considered one of the significant infectious diseases. In most cases, regional geographical factors are the principal elements that give places a unique identity. Infectious diseases are common, and their incidence and intensity affect different places differently across the nation. The present study aims to identify spatial clusters of hot spots and cold spots of malaria incidence and their seasonal variation during the three periods of 2012-2014, 2015-2017 and 2018-2020 in the state of West Bengal in India. As malaria is a vector-borne disease, the numbers of positive test results are reported by the laboratories to the Department of Health, West Bengal (through the National Vector Borne Disease Control Programme). Data on block-wise monthly malaria-positive cases are collected from the Health Management Information System (HMIS), Ministry of Health and Family Welfare, Government of India. Moran’s I statistic is used to assess the spatial autocorrelation of malaria incidence. Spatial statistical analyses, mainly the Local Indicators of Spatial Autocorrelation (LISA) cluster and the Local Geary Cluster, are applied to find the spatial clusters of hot spots and cold spots and the seasonal variability of malaria incidence over the three periods. The results indicate that the spatial distribution of malaria is clustered during each of the three periods of 2012-2014, 2015-2017 and 2018-2020. The analysis shows that, in all cases, high-high clusters are primarily concentrated in the western (Purulia, Paschim Medinipur districts), central (Maldah, Murshidabad districts) and northern parts (Jalpaiguri, Kochbihar districts), and low-low clusters are found mainly in the lower Gangetic plain (central-south) and in the northern parts of West Bengal during the stipulated period. Apart from this seasonal variability, inter-year variation is also visible.
The results from the different methods of this study indicate significant variation in the spatial distribution of malaria incidence in West Bengal. High-incidence clusters are persistently concentrated over the western part during 2012-2020, along with a strong seasonal pattern that peaks in the rainy season and autumn. By applying different techniques to identify zones with different degrees of malaria incidence across West Bengal, some specific pockets, or malaria hotspots, are identified where the incidence rates are quite consistent over the different periods. From this analysis, it is clear that malaria is not distributed uniformly across the state; some specific pockets are more prone to be affected in particular seasons of each year. Disease ecology and spatial patterns must be considered when explaining the higher incidence of malaria within the affected districts. Further study, mainly using an empirical approach, is needed to discern the relationship between communicable disease and other associated factors.
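For readers unfamiliar with the global Moran’s I statistic mentioned in this abstract, the standard formula is I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where wᵢⱼ are spatial weights and W is their sum. The sketch below computes it for four hypothetical blocks arranged in a row with binary adjacency weights; the incidence values are invented for illustration, and the local statistics (LISA, Local Geary) used in the study are not shown.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (cross / sum(d * d for d in dev))

# Hypothetical layout: four blocks in a row, each adjacent to its neighbours.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([10, 10, 1, 1], chain_weights)    # similar values sit side by side
alternating = morans_i([10, 1, 10, 1], chain_weights)  # neighbours always differ
print(clustered, alternating)
```

A positive value indicates that neighbouring blocks tend to have similar incidence (clustering), while a negative value indicates that neighbours tend to differ.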
Keywords: Malaria, infectious diseases, spatial statistics, spatial autocorrelation, LISA.
Downloads 534
171 Identifying Factors for Evaluating Livability Potential within a Metropolis: A Case of Kolkata
Authors: Arpan Paul, Joy Sen
Abstract:
Livability is a holistic concept whose factors include many complex characteristics and levels of interrelationships among them. It has been considered as people’s need for public amenities and is recognized as a major element in creating social welfare. The concept and principles of livability are essential for recognizing the significance of community well-being. The attributes and dimensions of livability are also important for measuring the overall quality of the environment. Livability potential is mainly considered as the capacity of an urban area to develop overall well-being in the future. The intent of the present study is to identify the prime factors for evaluating livability potential within a metropolis. For a ground-level case study, the paper has selected the Kolkata Metropolitan Area (KMA), as it has wide physical, social, and economic variations within it. The initial part of the study is a detailed literature review on livability and the significance of evaluating its potential within a metropolis. The next segment is dedicated to identifying the primary factors for evaluating livability potential within a metropolis. In pursuit of identifying primary factors that have a direct impact on urban livability, this study delineates the metropolitan area into various clusters, each with its distinct livability potential. As a final outcome of the study, variations in the livability potential of the selected clusters are highlighted to explain the complexity of metropolitan development.
Keywords: Livability potential, metropolis, Kolkata Metropolitan Area (KMA), well-being.
Downloads 1252
170 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data
Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon
Abstract:
Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changing. Until recently, business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding the cellular functions of DNA sequences. DNA chips are now being used to perform experiments, and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms, such as hierarchical clustering, self-organizing maps, and K-means clustering. In this paper, we propose a clustering algorithm that imitates the ecosystem, taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with test data of 100 to 1000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.
Keywords: Ant colony system, biological data, clustering, DNA chip.
Downloads 1974
169 Effects of Grape Seed Oil on Postharvest Life and Quality of Some Grape Cultivars
Authors: Zeki Kara, Kevser Yazar
Abstract:
Table grapes (Vitis vinifera L.) are an important crop worldwide. Postharvest problems like berry shattering, decay and stem dehydration are some of the important factors that limit the marketing of table grapes. Edible coatings are an alternative for increasing the shelf-life of fruits, protecting them from the effects of humidity and oxygen and thus retarding their deterioration. This study aimed to compare the effects of different grape seed oil applications (GSO; 0.5 g L⁻¹, 1 g L⁻¹, 2 g L⁻¹) and SO2-generating pads (SO2-1, SO2-2). Grapes treated with GSO and generating pads were packaged into polyethylene trays and stored at 0 ± 1°C and 85-95% moisture. The effects of the applications were investigated through quality and sensory evaluations at intervals of 15 days. SO2 applications were determined to be the most effective treatments for minimizing weight loss and changes in TA, pH, color and appearance value. Grape seed oil applications were determined to be a good alternative for grape preservation, improving weight loss, °Brix, TA, color values and sensory analysis. Commercially, ‘Alphonse Lavallée’ clusters were stored for 75 days and ‘Antep Karası’ clusters for 60 days. The data obtained from GSO indicated that it gave a quality result similar to SO2 for up to 40 days of storage.
Keywords: Postharvest, quality, sensory analyses, Vitis vinifera L.
Downloads 837
168 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques
Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar
Abstract:
Renewable energy (RE) is often referred to as "clean energy", and popular support for its use rests on providing electricity with zero carbon dioxide emissions. This study provides useful insight into RE in the European Union (EU), especially into electricity generation obtained from renewables and the associated targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The hierarchical cluster analysis is based on Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, the non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results, with data normalized by Z-score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.
Keywords: Share of electricity generation, CO2 emission, targets, multivariate methods, hierarchical clustering, K-means clustering, discriminant analysis, correlation, EU member countries.
Downloads 1247
167 Analysis of Entrepreneurship in Industrial Cluster
Authors: Wen-Hsiang Lai
Abstract:
Besides the internal aspects of entrepreneurship (i.e., motivation, opportunity perspective and alertness), there are external aspects that affect entrepreneurship (i.e., the industrial cluster). By comparing the machinery companies located inside and outside the industrial district, this study aims to explore the cluster effects on the entrepreneurship of companies in Taiwan machinery clusters (TMC). In this study, three factors affecting entrepreneurship in TMC are identified as “competition”, “embedded-ness” and “specialized knowledge”. The “competition” in the industrial cluster is defined as the competitive advantages that companies gain in the form of demand effects and diversified strategies; the “embedded-ness” refers to the quality of company relations (relational embedded-ness) and ranges (structural embedded-ness) with the industry components (universities, customers and complementary) that affect knowledge transfer and knowledge generation; the “specialized knowledge” shares the internal knowledge within industrial clusters. This study finds that, compared with the companies which are outside the cluster, the industrial cluster has a positive influence on entrepreneurship. Additionally, the factor of “relational embedded-ness” has a significant impact on entrepreneurship and affects the adaptation ability of companies in TMC. Finally, the factor of “competition” reveals a partial influence on entrepreneurship.
Keywords: Entrepreneurship, Industrial Cluster, Industrial District, Economies of Agglomerations, Taiwan Machinery Cluster (TMC).
Downloads 2263
166 An AI-Based Dynamical Resource Allocation Calculation Algorithm for Unmanned Aerial Vehicle
Authors: Zhou Luchen, Wu Yubing, Burra Venkata Durga Kumar
Abstract:
As the scale of the network becomes larger and more complex than before, the density of user devices is also increasing. Unmanned Aerial Vehicle (UAV) networks are able to collect and transform data efficiently by using software-defined network (SDN) technology. This paper proposes a three-layer distributed and dynamic cluster architecture that manages UAVs with an AI-based resource allocation calculation algorithm to address the overloading network problem. By separating the services of each UAV, the UAV hierarchical cluster system performs the main function of reducing the network load and transferring user requests, with three sub-tasks: data collection, communication channel organization, and data relaying. In this cluster, a head node and a vice head node UAV are selected considering the CPU, RAM, and ROM memory of the devices, battery charge, and capacity. The vice head node acts as a backup that stores all the data in the head node. The k-means clustering algorithm is used to detect high-load regions and form the UAV layered clusters. The whole process of detecting high-load areas, forming and selecting UAV clusters, and moving the selected UAV cluster to that area is proposed as an offloading traffic algorithm.
Keywords: k-means, resource allocation, SDN, UAV network, unmanned aerial vehicles.
Downloads 351
165 A Psychophysiological Evaluation of an Effective Recognition Technique Using Interactive Dynamic Virtual Environments
Authors: Mohammadhossein Moghimi, Robert Stone, Pia Rotshtein
Abstract:
Recording psychological and physiological correlates of human performance within virtual environments and interpreting their impacts on human engagement, ‘immersion’ and related emotional or ‘effective’ states is both academically and technologically challenging. By exposing participants to an effective, real-time (game-like) virtual environment, designed and evaluated in an earlier study, a psychophysiological database containing the EEG, GSR and Heart Rate of 30 male and female gamers, exposed to 10 games, was constructed. Some 174 features were subsequently identified and extracted from a number of windows, with 28 different timing lengths (e.g. 2, 3, 5, etc. seconds). After reducing the number of features to 30, using a feature selection technique, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods were subsequently employed for the classification process. The classifiers categorised the psychophysiological database into four effective clusters (defined based on a 3-dimensional space – valence, arousal and dominance) and eight emotion labels (relaxed, content, happy, excited, angry, afraid, sad, and bored). The KNN and SVM classifiers achieved average cross-validation accuracies of 97.01% (±1.3%) and 92.84% (±3.67%), respectively. However, no significant differences were found in the classification process based on effective clusters or emotion labels.
Keywords: Virtual Reality, effective computing, effective VR, emotion-based effective physiological database.
Downloads 994
164 Mapping of Adrenal Gland Diseases Research in Middle East Countries: A Scientometric Analysis, 2007-2013
Authors: Zahra Emami, Mohammad Ebrahim Khamseh, Nahid Hashemi Madani, Iman Kermani
Abstract:
The aim of the study was to map scientific research on adrenal gland diseases in the Middle East countries through the Web of Science database using scientometric analysis. Data were analyzed with Excel software, and HistCite was used for mapping the scientific texts. In this study, from a total of 268 retrieved records, 1125 authors from 328 institutions published their texts in 138 journals. Among 17 Middle East countries, Turkey ranked first with 164 documents (61.19%), Israel ranked second with 47 documents (15.53%) and Iran came in third place with 26 documents. Most of the publications (185 documents, 69.2%) were articles. Among the universities of the Middle East, Istanbul University had the highest science production rate (9.7%). The Journal of Clinical Endocrinology & Metabolism had the highest TGCS (243 citations). In the scientific mapping, 7 clusters were formed based on TLCS (Total Local Citation Score) and TGCS (Total Global Citation Score). Considering the study results, establishing scientific connections and collaboration with other countries and using publications on adrenal gland diseases from high-ranking universities can help in the development of this field and promote medical practice in this regard. Moreover, investigation of the formed clusters in relation to Congenital Hyperplasia and puberty-related disorders can be a research priority for investigators.
Keywords: Mapping, scientific research, adrenal gland diseases, scientometric.
Downloads 1371
163 Web Proxy Detection via Bipartite Graphs and One-Mode Projections
Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo
Abstract:
With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonating phishing pages to steal sensitive data or redirecting victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, detecting proxy services has become an increasingly challenging task due to their dynamic updates and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of the bipartite graphs to discover the social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent behavior clusters of web proxy users. The web proxy URL may vary from time to time, but the inherent interest does not. Based on this intuition, and using our private tools implemented with WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experimental results on real datasets show that the behavior clusters not only reduce the number of URLs to analyze but also provide an effective way to detect web proxies, especially unknown ones.
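The one-mode projection step described in this abstract can be sketched as follows. The bipartite graph links end-user hosts to the destinations they contact; the projection connects two hosts whenever they share at least one destination, weighted here by the number of shared destinations. The weighting scheme, host names and destinations are assumptions for illustration only, and the spectral clustering step is not shown.

```python
from itertools import combinations

def one_mode_projection(bipartite):
    """bipartite maps each host to the set of destinations it contacted.
    Returns {(host_a, host_b): number_of_shared_destinations} for every
    pair of hosts with at least one destination in common."""
    projection = {}
    for a, b in combinations(sorted(bipartite), 2):
        shared = len(bipartite[a] & bipartite[b])
        if shared:
            projection[(a, b)] = shared
    return projection

# Hypothetical traffic: which destinations each end-user host visited.
traffic = {
    "host1": {"proxy-a.example", "proxy-b.example", "news.example"},
    "host2": {"proxy-a.example", "proxy-b.example"},
    "host3": {"news.example", "mail.example"},
    "host4": {"video.example"},
}

projection = one_mode_projection(traffic)
print(projection)
```

The resulting edge weights form a host-by-host similarity matrix, which is the kind of input a spectral clustering algorithm would take to group hosts with similar behavior.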
Keywords: Bipartite graph, clustering, one-mode projection, web proxy detection.
Downloads 746
162 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform
Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu
Abstract:
Given a fixed fund, choosing between fewer hosts of higher capability and more hosts of lower capability is an unavoidable trade-off in practice when building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing consists of SQL queries with aggregate, join, and space-time condition selections executed upon massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of candidate host clusters for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem, HDFS+Hive+Spark. A metric was proposed to measure the performance of Hadoop clusters in HBDP; it was tested and compared with its predicted counterpart on executing three kinds of typical SQL query tasks. Tests were conducted with respect to the factors of CPU benchmark, memory size, virtual host division, and the number of physical hosts in the cluster. The research has been applied to practical cluster procurement for housing big data computing.
Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance empirical formula, typical SQL query tasks.
Downloads 837
161 Localization of Geospatial Events and Hoax Prediction in the UFO Database
Authors: Harish Krishnamurthy, Anna Lafontant, Ren Yi
Abstract:
Unidentified Flying Objects (UFOs) have been an interesting topic for most enthusiasts, and hence people all over the United States report such findings online at the National UFO Report Center (NUFORC). Some of these reports are hoaxes, and for those that seem legitimate, our task is not to establish that the events are indeed related to flying objects from aliens in outer space. Rather, we intend to identify whether a report was a hoax, as identified by the UFO database team with their existing curation criterion. The database also provides a wealth of information that can be exploited for various analyses and insights, such as social reporting, identifying real-time spatial events and much more. We perform analysis to localize these time-series geospatial events and correlate them with known real-time events. This paper does not confirm any legitimacy of alien activity, but rather attempts to gather information from likely legitimate reports of UFOs by studying the online reports. These events happen in geospatial clusters and are also time-based. We look at cluster density and data visualization to search the space of various cluster realizations and decide on the most probable clusters, which provide information about the proximity of such activity. A random forest classifier is also presented that is used to identify true events and hoax events, using the best available features, such as region, week, time-period and duration. Lastly, we show the performance of the scheme on various days and correlate it with real-time events, where one of the UFO reports strongly correlates with a missile test conducted in the United States.
Keywords: Time-series clustering, feature extraction, hoax prediction, geospatial events.
Downloads 851
160 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, a common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real-world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies imbalanced data, is the overlap of instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to group similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with the behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.
Keywords: Machine learning, Imbalanced data, Data mining, Big data.
Downloads 1137
159 Networks in the Tourism Sector in Brazil: Proposal of a Management Model Applied to Tourism Clusters
Authors: Gysele Lima Ricci, Jose Miguel Rodriguez Anton
Abstract:
Companies in the tourism sector need to achieve competitive advantages for their survival in the market. In this way, models based on association, cooperation, complementarity, distribution, exchange and mutual assistance arise as a possibility of organizational development, taking as reference the concept of networks. Many companies seek to partner in local networks such as clusters in order to act together and associate. The main objective of the present research is to identify the specificities of management and the practices of cooperation in the tourist destination of São Paulo - Brazil, and to propose a new management model for a possible tourism cluster. The empirical analysis was carried out in three phases. In the first phase, the companies, associations and tourism organizations existing in São Paulo were surveyed and the characteristics of their business were analyzed. In the second phase, the management specificities and cooperation practices used in the tourist destination were examined. In the third phase, the possible strengths and weaknesses that a potential tourist cluster could have were identified, and a management model for it was proposed, adapted to the needs of the companies, associations and organizations. As a main result, it has been identified that companies, associations and organizations could look for synergies with each other and collaborate through a Hiperred organizational structure, in which they share their knowledge, try to make the most of the collaboration and benefit from three concepts: flexibility, learning and collaboration. Finally, it is concluded that the proposed tourism cluster management model is viable for the development of tourism destinations because it makes it possible to strategically address the agents responsible for public policies, as well as public and private companies and organizations, in their strategies for competitiveness and cooperation.
Keywords: Cluster, management model, networks, tourism sector.
Downloads 1010
158 Maximization of Lifetime for Wireless Sensor Networks Based on Energy Efficient Clustering Algorithm
Authors: Frodouard Minani
Abstract:
Over the last decade, wireless sensor networks (WSNs) have been used in many areas such as health care, agriculture, defense, military, disaster-hit areas and so on. Wireless Sensor Networks consist of a Base Station (BS) and a large number of wireless sensors that monitor temperature, pressure, and motion under different environmental conditions. The key parameter in designing a protocol for Wireless Sensor Networks is energy efficiency, since energy is the scarcest resource of sensor nodes and determines their lifetime. Maximizing the sensor nodes' lifetime is an important issue in the design of applications and protocols for Wireless Sensor Networks. Clustering sensor nodes is an effective topology control approach for helping to achieve the goal of this research. In this paper, the researcher presents an energy-efficient protocol to prolong the network lifetime based on an energy-efficient clustering algorithm. The Low Energy Adaptive Clustering Hierarchy (LEACH) is a routing protocol for clusters which is used to lower the energy consumption and also to improve the lifetime of Wireless Sensor Networks. Minimizing energy dissipation and maximizing network lifetime are important matters in the design of applications and protocols for wireless sensor networks. The proposed system aims to maximize the lifetime of Wireless Sensor Networks by choosing the farthest cluster head (CH) instead of the closest CH and forming clusters by considering the following metrics: node density, residual energy, and distance between clusters (inter-cluster distance). In this paper, the proposed protocol is compared with other protocols in different scenarios, and the simulation results show that the proposed protocol performs well relative to the other protocols in various scenarios.
Keywords: Base station, clustering algorithm, energy efficient, wireless sensor networks.
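For orientation, the baseline LEACH protocol mentioned in this abstract elects cluster heads probabilistically: in round r, each node that has not yet served in the current cycle becomes a head with probability T = P / (1 - P (r mod 1/P)), where P is the desired fraction of heads. The sketch below illustrates only that standard LEACH threshold, not the protocol proposed in the paper, whose weighting of node density, residual energy and inter-cluster distance is not specified in the abstract. The function names and the `eligible` set are assumptions made for illustration.

```python
import random


def leach_threshold(p, round_no):
    """Standard LEACH threshold T for a node still eligible in this cycle."""
    period = round(1 / p)  # number of rounds in one election cycle
    return p / (1 - p * (round_no % period))


def elect_cluster_heads(node_ids, eligible, p, round_no, rng):
    """Each eligible node draws a uniform number and becomes head if it is below T."""
    t = leach_threshold(p, round_no)
    return [n for n in node_ids if n in eligible and rng.random() < t]


# With p = 0.1 the threshold grows over a 10-round cycle, so that every node
# that has not yet served is elected by the last round of the cycle.
heads_last_round = elect_cluster_heads([1, 2, 3], {1, 2, 3}, 0.1, 9, random.Random(0))
```

In a full simulation the caller would remove elected nodes from `eligible` after each round and reset the set at the start of every cycle.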
Downloads 844
157 Improving Fake News Detection Using K-means and Support Vector Machine Approaches
Authors: Kasra Majbouri Yazdi, Adel Majbouri Yazdi, Saeid Khodayi, Jingyu Hou, Wanlei Zhou, Saeed Saedy
Abstract:
Fake news and false information are big challenges for all types of media, especially social media. There is a lot of false information, fake likes, views and duplicated accounts, as big social networks such as Facebook and Twitter have admitted. Most information appearing on social media is doubtful and in some cases misleading. Such information needs to be detected as soon as possible to avoid a negative impact on society. The dimensions of fake news datasets are growing rapidly, so to obtain a better result in detecting false information with less computation time and complexity, the dimensions need to be reduced. One of the best techniques for reducing data size is feature selection. The aim of this technique is to choose a feature subset from the original set to improve the classification performance. In this paper, a feature selection method is proposed that integrates K-means clustering and Support Vector Machine (SVM) approaches and works in four steps. First, the similarities between all features are calculated. Then, features are divided into several clusters. Next, the final feature set is selected from all clusters, and finally, fake news is classified based on the final feature subset using the SVM method. The proposed method was evaluated by comparing its performance with other state-of-the-art methods on several specific benchmark datasets and the outcome showed a better classification of false information for our work. The detection performance was improved in two aspects. On the one hand, the detection runtime decreased, and on the other hand, the classification accuracy increased because of the elimination of redundant features and the reduction of dataset dimensions.
Keywords: Fake news detection, feature selection, support vector machine, K-means clustering, machine learning, social media.
Downloads 4524
156 A New Algorithm for Cluster Initialization
Authors: Moth'd Belal. Al-Daoud
Abstract:
Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.
Keywords: clustering, k-means, data mining.
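One plausible reading of "a set of medians extracted from a dimension with maximum variance" can be sketched as follows. The abstract gives no further detail, so the procedure here is an assumption: pick the dimension with the largest variance, sort the points along it, split them into k contiguous groups, and use the median point of each group as an initial center. The function name `median_init` is hypothetical.

```python
import statistics


def median_init(points, k):
    """Pick k initial centers as group medians along the max-variance dimension."""
    dims = len(points[0])
    # Population variance of each coordinate.
    variances = [statistics.pvariance([p[d] for p in points]) for d in range(dims)]
    d_max = max(range(dims), key=lambda d: variances[d])
    # Sort along the most spread-out dimension and cut into k contiguous groups.
    ordered = sorted(points, key=lambda p: p[d_max])
    n = len(ordered)
    centers = []
    for i in range(k):
        group = ordered[i * n // k:(i + 1) * n // k]
        # Middle element of the group (upper median for even-sized groups).
        centers.append(group[len(group) // 2])
    return centers


points = [(1, 5), (2, 5), (3, 5), (10, 6), (11, 6), (12, 6)]
initial_centers = median_init(points, k=2)
```

The sketch assumes k is no larger than the number of points. Unlike random initialization it is deterministic, so repeated k-means runs on the same data start from the same centers.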
Downloads 2103
155 Modeling Aggregation of Insoluble Phase in Reactors
Authors: A. Brener, B. Ismailov, G. Berdalieva
Abstract:
In this paper we present a modification of the kinetic Smoluchowski equation for binary aggregation, applied to systems with first- and second-order chemical reactions in which the main product is insoluble. The goal of this work is to create a theoretical foundation and engineering procedures for designing chemical apparatus in which chemical reactions proceed jointly with the aggregation of the insoluble dispersed phases formed in the working zones of the reactor.
Keywords: Binary aggregation, Clusters, Chemical reactions, Insoluble phases.
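For reference, the classical (unmodified) discrete Smoluchowski equation for binary aggregation is usually written as below. The symbols are the conventional ones and are not taken from the paper: $n_k(t)$ is the concentration of clusters made of $k$ monomers and $K_{ij}$ is the aggregation kernel. The abstract does not state the authors' modification; it presumably adds terms coupling the cluster concentrations to the reaction kinetics.

```latex
\frac{\mathrm{d} n_k}{\mathrm{d} t}
  = \frac{1}{2} \sum_{i+j=k} K_{ij}\, n_i\, n_j
  \;-\; n_k \sum_{j \ge 1} K_{kj}\, n_j
```

The first term counts the formation of $k$-clusters from pairs of smaller clusters (the factor $1/2$ avoids double counting), and the second term counts the loss of $k$-clusters through aggregation with any other cluster.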
Downloads 1481
154 Performance Evaluation and Plugging Characteristics of Controllable Self-Aggregating Colloidal Particle Profile Control Agent
Authors: Zhiguo Yang, Xiangan Yue, Minglu Shao, Yang Yue, Tianqi Yue
Abstract:
In low permeability reservoirs, the reservoir pore throat is small and the micro heterogeneity is prominent. Conventional microsphere profile control agents generally have good injectability but poor plugging effect; however, profile control agents with good plugging effect generally have poor injectability, which makes it difficult for the agent to achieve deep profile control of the reservoir. To solve this problem, styrene and acrylamide were used as monomers in the laboratory. Emulsion polymerization was used to prepare the Controllable Self-Aggregating Colloidal Particle (CSA), which was rich in amide groups. The CSA microsphere dispersion solution with a particle diameter smaller than the pore throat diameter was injected into the reservoir to ensure that the profile control agent had good injectability. After the CSA microspheres are dispersed to the deep part of the reservoir and left static for a certain period of time, they self-aggregate into large-sized particle clusters to achieve plugging of hypertonic channels. The CSA microsphere has the characteristic of low expansion and avoids shear fracture in the process of migration. It can be observed by transmission electron microscope that CSA microspheres still maintain a regular and uniform spherical and core-shell heterogeneous structure after aging at 100 ºC for 35 days, showing that CSA microspheres have good thermal stability. The results of the bottle test showed that with the increase of cation concentration, the aggregation time of CSA microspheres gradually shortened, and the influence of divalent cations was greater than that of monovalent ions. Physical simulation experiments show that CSA microspheres have good injectability, and the aggregated CSA particle clusters can produce effective plugging and migrate to the deep part of the reservoir for profile control.
Keywords: Heterogeneous reservoir, deep profile control, emulsion polymerization, colloidal particles, plugging characteristic.
Downloads 486
153 Gas Sensing Properties of SnO2 Thin Films Modified by Ag Nanoclusters Synthesized by SILD Method
Authors: G. Korotcenkov, B. K. Cho, L. B. Gulina, V. P. Tolstoy
Abstract:
The effect of SnO2 surface modification by Ag nanoclusters, synthesized by the SILD method, on the operating characteristics of thin film gas sensors was studied, and models for the promotional role of Ag additives were discussed. It was found that the above-mentioned approach can be used to improve both the sensitivity and the rate of response of SnO2-based gas sensors to CO and H2. At the same time, the presence of Ag clusters on the surface of SnO2 depressed the sensor response to ozone.
Keywords: Ag nanoparticles, deposition, characterization, gas sensors, optimization.
Downloads 2389
152 Contextual SenSe Model: Word Sense Disambiguation Using Sense and Sense Value of Context Surrounding the Target
Authors: Vishal Raj, Noorhan Abbas
Abstract:
Ambiguity in NLP (Natural Language Processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities such as lexical, syntactic, semantic, anaphoric and referential. This study is focused mainly on solving the issue of Lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. Lemma adds generality and POS adds properties of word into token. We have designed a method to create an affinity matrix to calculate the affinity between any pair of lemma_POS (a token where lemma and POS of word are joined by underscore) of given training set. Additionally, we have devised an algorithm to create the sense clusters of tokens using affinity matrix under hierarchy of POS of lemma. Furthermore, three different mechanisms to predict the sense of target word using the affinity/similarity value are devised. Each contextual token contributes to the sense of target word with some value and whichever sense gets higher value becomes the sense of target word. So, contextual tokens play a key role in creating sense clusters and predicting the sense of target word, hence, the model is named Contextual SenSe Model (CSM). CSM exhibits a noteworthy simplicity and explication lucidity in contrast to contemporary deep learning models characterized by intricacy, time-intensive processes, and challenging explication. CSM is trained on SemCor training data and evaluated on SemEval test dataset. The results indicate that despite the naivety of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.
Keywords: Word Sense Disambiguation, WSD, Contextual SenSe Model, Most Frequent Sense, part of speech, POS, Natural Language Processing, NLP, OOV, out of vocabulary, ELMo, Embeddings from Language Model, BERT, Bidirectional Encoder Representations from Transformers, Word2Vec, lemma_POS, Algorithm.
Downloads 384
151 The Role of Knowledge Management in Innovation: Spanish Evidence
Authors: María Jesús Luengo-Valderrey, Mónica Moso-Díez
Abstract:
In the knowledge-based economy, innovation is considered essential in order to achieve survival and growth in organizations. On the other hand, knowledge management is currently understood as one of the keys to the innovation process. Both factors are generally accepted as generators of competitive advantage in organizations. Specifically, R&D&I activities and those that generate internal knowledge have a positive influence on innovation results. This paper aims to examine and quantify this effect and whether it is similar or not. We focus on the impact that the proportion of knowledge workers, the R&D&I investment, and the amounts destined for ICTs and training for innovation have on the variation of tangible and intangible returns for the high and medium technology sector in Spain. To do this, we have performed an empirical analysis on the results of questionnaires about innovation in enterprises in Spain, collected by the National Statistics Institute. First, using cluster methodology, the behavior of these enterprises regarding knowledge management is identified. Then, using SEM methodology, we studied, for each cluster, the cause-effect relationships among constructs defined through variables, establishing their type and quantification. The cluster analysis results in four groups, in which clusters 1 and 3 present the best performance in innovation with differentiating nuances between them, while clusters 2 and 4 obtained divergent results for a similar innovative effort. However, the results of the SEM analysis for each cluster show that, in all cases, knowledge workers are those that affect innovation performance most, regardless of the level of investment, and that there is a strong correlation between knowledge workers and investment in knowledge generation. The main finding is that Spanish high and medium technology companies improve their innovation performance by investing in internal knowledge generation measures, especially in terms of R&D activities, and underinvest in external ones. This, and the strong correlation between knowledge workers and the set of activities that promote knowledge generation, should be taken into account by company managers when making decisions about their investments in innovation, since they are key to improving their opportunities in the global market.
Keywords: High and medium technology sector, innovation, knowledge management, Spanish companies.
Downloads 2196
150 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into clusters so that data objects from the same cluster are similar to each other. Clustering algorithms are one of the areas of data mining and can be classified into partition, hierarchical, density based and grid based. In this paper we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON and BIRCH. The resulting state of the art of these algorithms will help in eliminating the current problems as well as in deriving more robust and scalable algorithms for clustering.
Keywords: Clustering, method, algorithm, hierarchical, survey.
Downloads 3376
149 Growth of Droplet in Radiation-Induced Plasma of Own Steam
Authors: Pavlo Selyshchev
Abstract:
A theoretical approach is developed to describe the evolution of drops in an atmosphere of their own steam and buffer gas under irradiation. It is shown that irradiation influences the size of a stable droplet and the conditions under which the droplet exists. Under irradiation the evolution of a drop becomes more complex: non-monotonic and periodic changes of drop size become possible. All possible solutions are represented by means of phase portraits. All qualitatively different phase portraits are found as functions of the critical parameters: the rate of cluster generation and the substance density.
Keywords: Irradiation, steam, plasma, cluster formation, liquid droplets, evolution.
Downloads 2092
148 Achieving High Availability by Implementing Beowulf Cluster
Authors: A.F.A. Abidin, N.S.M. Usop
Abstract:
A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. This paper proposes a way to implement a Beowulf Cluster in order to achieve high performance as well as high availability.
Keywords: Beowulf Cluster, grid computing, GridMPI, MPICH.
Downloads 1677