Search results for: Spatial temporal data mining
7823 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining
Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrła Ronaldo G. Morato
Abstract:
Environmental changes and major natural disasters are most prevalent in the world due to the damage that humanity has caused to nature and these damages directly affect the lives of animals. Thus, the study of animal behavior and their interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine the patterns of animal behavior and with technological advances the ability of animals to be tracked and, consequently, behavioral studies have been expanded. There is a lot of research on animal movement and behavior, but we note that a proposal that combines resources and allows for exploratory analysis of animal movement and provide statistical measures on individual animal behavior and its interaction with the environment is missing. The contribution of this paper is to present the framework AniMoveMineR, a unified solution that aggregates trajectory analysis and data mining techniques to explore animal movement data and provide a first step in responding questions about the animal individual behavior and their interactions with other animals over time and space. We evaluated the framework through the use of monitored jaguar data in the city of Miranda Pantanal, Brazil, in order to verify if the use of AniMoveMineR allows to identify the interaction level between these jaguars. The results were positive and provided indications about the individual behavior of jaguars and about which jaguars have the highest or lowest correlation.Keywords: Data mining, data science, trajectory, animal behavior.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9197822 Spatial Time Series Models for Rice and Cassava Yields Based On Bayesian Linear Mixed Models
Authors: Panudet Saengseedam, Nanthachai Kantanantha
Abstract:
This paper proposes a linear mixed model (LMM) with spatial effects to forecast rice and cassava yields in Thailand at the same time. A multivariate conditional autoregressive (MCAR) model is assumed to present the spatial effects. A Bayesian method is used for parameter estimation via Gibbs sampling Markov Chain Monte Carlo (MCMC). The model is applied to the rice and cassava yields monthly data which have been extracted from the Office of Agricultural Economics, Ministry of Agriculture and Cooperatives of Thailand. The results show that the proposed model has better performance in most provinces in both fitting part and validation part compared to the simple exponential smoothing and conditional auto regressive models (CAR) from our previous study.
Keywords: Bayesian method, Linear mixed model, Multivariate conditional autoregressive model, Spatial time series.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22477821 Finding Fuzzy Association Rules Using FWFP-Growth with Linguistic Supports and Confidences
Authors: Chien-Hua Wang, Chin-Tzong Pang
Abstract:
In data mining, the association rules are used to search for the relations of items of the transactions database. Following the data is collected and stored, it can find rules of value through association rules, and assist manager to proceed marketing strategy and plan market framework. In this paper, we attempt fuzzy partition methods and decide membership function of quantitative values of each transaction item. Also, by managers we can reflect the importance of items as linguistic terms, which are transformed as fuzzy sets of weights. Next, fuzzy weighted frequent pattern growth (FWFP-Growth) is used to complete the process of data mining. The method above is expected to improve Apriori algorithm for its better efficiency of the whole association rules. An example is given to clearly illustrate the proposed approach.Keywords: Association Rule, Fuzzy Partition Methods, FWFP-Growth, Apiroir algorithm
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16527820 An Application for Web Mining Systems with Services Oriented Architecture
Authors: Thiago M. R. Dias, Gray F. Moita, Paulo E. M. Almeida
Abstract:
Although the World Wide Web is considered the largest source of information there exists nowadays, due to its inherent dynamic characteristics, the task of finding useful and qualified information can become a very frustrating experience. This study presents a research on the information mining systems in the Web; and proposes an implementation of these systems by means of components that can be built using the technology of Web services. This implies that they can encompass features offered by a services oriented architecture (SOA) and specific components may be used by other tools, independent of platforms or programming languages. Hence, the main objective of this work is to provide an architecture to Web mining systems, divided into stages, where each step is a component that will incorporate the characteristics of SOA. The separation of these steps was designed based upon the existing literature. Interesting results were obtained and are shown here.Keywords: Web Mining, Service Oriented Architecture, WebServices.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14727819 Online Forums Hotspot Detection and Analysis Using Aging Theory
Authors: K. Nirmala Devi, V. Murali Bhaskaran
Abstract:
The exponential growth of social media arouses much attention on public opinion information. The online forums, blogs, micro blogs are proving to be extremely valuable resources and are having bulk volume of information. However, most of the social media data is unstructured and semi structured form. So that it is more difficult to decipher automatically. Therefore, it is very much essential to understand and analyze those data for making a right decision. The online forums hotspot detection is a promising research field in the web mining and it guides to motivate the user to take right decision in right time. The proposed system consist of a novel approach to detect a hotspot forum for any given time period. It uses aging theory to find the hot terms and E-K-means for detecting the hotspot forum. Experimental results demonstrate that the proposed approach outperforms k-means for detecting the hotspot forums with the improved accuracy.
Keywords: Hotspot forums, Micro blog, Blog, Sentiment Analysis, Opinion Mining, Social media, Twitter, Web mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21857818 An Innovation of Travel Information Gathering Framework
Authors: Pairaya J., Buddhagarn R., Sukree S., Punthumadee K.
Abstract:
Application of Information Technology (IT) has revolutionized the functioning of business all over the world. Its impact has been felt mostly among the information of dependent industries. Tourism is one of such industry. The conceptual framework in this study represents an innovation of travel information searching system on mobile devices which is used as tools to deliver travel information (such as hotels, restaurants, tourist attractions and souvenir shops) for each user by travelers segmentation based on data mining technique to segment the tourists- behavior patterns then match them with tourism products and services. This system innovation is designed to be a knowledge incremental learning. It is a marketing strategy to support business to respond traveler-s demand effectively.Keywords: Tourism, Innovation, Information Searching, Data Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18707817 Human Behavior Modeling in Video Surveillance of Conference Halls
Authors: Nour Charara, Hussein Charara, Omar Abou Khaled, Hani Abdallah, Elena Mugellini
Abstract:
In this paper, we present a human behavior modeling approach in videos scenes. This approach is used to model the normal behaviors in the conference halls. We exploited the Probabilistic Latent Semantic Analysis technique (PLSA), using the 'Bag-of-Terms' paradigm, as a tool for exploring video data to learn the model by grouping similar activities. Our term vocabulary consists of 3D spatio-temporal patch groups assigned by the direction of motion. Our video representation ensures the spatial information, the object trajectory, and the motion. The main importance of this approach is that it can be adapted to detect abnormal behaviors in order to ensure and enhance human security.Keywords: Activity modeling, clustering, PLSA, video representation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8427816 A Novel Approach to Improve Users Search Goal in Web Usage Mining
Authors: R. Lokeshkumar, P. Sengottuvelan
Abstract:
Web mining is to discover and extract useful Information. Different users may have different search goals when they search by giving queries and submitting it to a search engine. The inference and analysis of user search goals can be very useful for providing an experience result for a user search query. In this project, we propose a novel approach to infer user search goals by analyzing search web logs. First, we propose a novel approach to infer user search goals by analyzing search engine query logs, the feedback sessions are constructed from user click-through logs and it efficiently reflect the information needed for users. Second we propose a preprocessing technique to clean the unnecessary data’s from web log file (feedback session). Third we propose a technique to generate pseudo-documents to representation of feedback sessions for clustering. Finally we implement k-medoids clustering algorithm to discover different user search goals and to provide a more optimal result for a search query based on feedback sessions for the user.Keywords: Data Preprocessing, Session Identification, Web log mining, Web Personalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20237815 The Relationship between Spatial Planning and Transportation Planning in Southern Africa and its Consequences for Human Settlement
Authors: David Dewar
Abstract:
The paper reviews the relationship between spatial and transportation planning in the Southern African Development Community (SADC) region of Sub-Saharan Africa. It argues that most urbanisation in the region has largely occurred subsequent to the 1950s and, accordingly, urban development has been profoundly and negatively affected by the (misguided) spatial and institutional tenets of modernism. It demonstrates how a considerable amount of the poor performance of these settlements can be directly attributed to this. Two factors in particular about the planning systems are emphasized: the way in which programmatic land-use planning lies at the heart of both spatial and transportation planning; and the way on which transportation and spatial planning have been separated into independent processes. In the final section, the paper identifies ways of improving the planning system. Firstly, it identifies the performance qualities which Southern African settlements should be seeking to achieve. Secondly, it focuses on two necessary arenas of change: the need to replace programmatic land-use planning practices with structuralspatial approaches; and it makes a case for making urban corridors a spatial focus of integrated planning, as a way of beginning the restructuring and intensification of settlements which are currently characterised by sprawl, fragmentation and separationKeywords: Corridors, modernism, programmatic planning, structural-spatial planning
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23627814 Object-Oriented Cognitive-Spatial Complexity Measures
Authors: Varun Gupta, Jitender Kumar Chhabra
Abstract:
Software maintenance and mainly software comprehension pose the largest costs in the software lifecycle. In order to assess the cost of software comprehension, various complexity measures have been proposed in the literature. This paper proposes new cognitive-spatial complexity measures, which combine the impact of spatial as well as architectural aspect of the software to compute the software complexity. The spatial aspect of the software complexity is taken into account using the lexical distances (in number of lines of code) between different program elements and the architectural aspect of the software complexity is taken into consideration using the cognitive weights of control structures present in control flow of the program. The proposed measures are evaluated using standard axiomatic frameworks and then, the proposed measures are compared with the corresponding existing cognitive complexity measures as well as the spatial complexity measures for object-oriented software. This study establishes that the proposed measures are better indicators of the cognitive effort required for software comprehension than the other existing complexity measures for object-oriented software.Keywords: cognitive complexity, software comprehension, software metrics, spatial complexity, Object-oriented software
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21457813 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application
Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil
Abstract:
In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.
Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principle component analysis, unstable angina (U.A.).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21147812 A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India
Authors: Nida Rizvi, Deeksha Katyal, Varun Joshi
Abstract:
River Hindon is an important river catering the demand of highly populated rural and industrial cluster of western Uttar Pradesh, India. Water quality of river Hindon is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed at identifying the pollution sources and quantifying the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters, like pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand, and total alkalinity were assessed. Water quality data obtained from eight study sites for one year has been subjected to the two multivariate techniques, namely, principal component analysis and cluster analysis. Principal component analysis was applied with the aim to find out spatial variability and to identify the sources responsible for the water quality of the river. Three Varifactors were obtained after varimax rotation of initial principal components using principal component analysis. Cluster analysis was carried out to classify sampling stations of certain similarity, which grouped eight different sites into two clusters. The study reveals that the anthropogenic influence (municipal, industrial, waste water and agricultural runoff) was the major source of river water pollution. Thus, this study illustrates the utility of multivariate statistical techniques for analysis and elucidation of multifaceted data sets, recognition of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.Keywords: Cluster analysis, multivariate statistical technique, river Hindon, water Quality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 38167811 Data Mining Determination of Sunlight Average Input for Solar Power Plant
Authors: Fl. Loury, P. Sablonière, C. Lamoureux, G. Magnier, Th. Gutierrez
Abstract:
A method is proposed to extract faithful representative patterns from data set of observations when they are suffering from non-negligible fluctuations. Supposing time interval between measurements to be extremely small compared to observation time, it consists in defining first a subset of intermediate time intervals characterizing coherent behavior. Data projection on these intervals gives a set of curves out of which an ideally “perfect” one is constructed by taking the sup limit of them. Then comparison with average real curve in corresponding interval gives an efficiency parameter expressing the degradation consecutive to fluctuation effect. The method is applied to sunlight data collected in a specific place, where ideal sunlight is the one resulting from direct exposure at location latitude over the year, and efficiency is resulting from action of meteorological parameters, mainly cloudiness, at different periods of the year. The extracted information already gives interesting element of decision, before being used for analysis of plant control.
Keywords: Base Input Reconstruction, Data Mining, Efficiency Factor, Information Pattern Operator.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15287810 Gender Based Variability Time Series Complexity Analysis
Authors: Ramesh K. Sunkaria, Puneeta Marwaha
Abstract:
Non linear methods of heart rate variability (HRV) analysis are becoming more popular. It has been observed that complexity measures quantify the regularity and uncertainty of cardiovascular RR-interval time series. In the present work, SampEn has been evaluated in healthy normal sinus rhythm (NSR) male and female subjects for different data lengths and tolerance level r. It is demonstrated that SampEn is small for higher values of tolerance r. Also SampEn value of healthy female group is higher than that of healthy male group for short data length and with increase in data length both groups overlap each other and it is difficult to distinguish them. The SampEn gives inaccurate results by assigning higher value to female group, because male subject have more complex HRV pattern than that of female subjects. Therefore, this traditional algorithm exhibits higher complexity for healthy female subjects than for healthy male subjects, which is misleading observation. This may be due to the fact that SampEn do not account for multiple time scales inherent in the physiologic time series and the hidden spatial and temporal fluctuations remains unexplored.
Keywords: Heart rate variability, normal sinus rhythm group, RR interval time series, sample entropy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17677809 Spatial Structure and Spatial Impacts of the Jakarta Metropolitan Area: A Southeast Asian EMR Perspective
Authors: Ikhwan Hakim, Bruno Parolin
Abstract:
This paper investigates the spatial structure of employment in the Jakarta Metropolitan Area (JMA), with reference to the concept of the Southeast Asian extended metropolitan region (EMR). A combination of factor analysis and local Getis-Ord (Gi*) hot-spot analysis is used to identify clusters of employment in the region, including those of the urban and agriculture sectors. Spatial statistical analysis is further used to probe the spatial association of identified employment clusters with their surroundings on several dimensions, including the spatial association between the central business district (CBD) in Jakarta city on employment density in the region, the spatial impacts of urban expansion on population growth and the degree of urban-rural interaction. The degree of spatial interaction for the whole JMA is measured by the patterns of commuting trips destined to the various employment clusters. Results reveal the strong role of the urban core of Jakarta, and the regional CBD, as the centre for mixed job sectors such as retail, wholesale, services and finance. Manufacturing and local government services, on the other hand, form corridors radiating out of the urban core, reaching out to the agriculture zones in the fringes. Strong associations between the urban expansion corridors and population growth, and urban-rural mix, are revealed particularly in the eastern and western parts of JMA. Metropolitan wide commuting patterns are focussed on the urban core of Jakarta and the CBD, while relatively local commuting patterns are shown to be prevalent for the employment corridors.
Keywords: Jakarta Metropolitan Area, Southeast Asian EMR, spatial association, spatial statistics, spatial structure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25967808 A Novel Method Based on Monte Carlo for Simulation of Variable Resolution X-ray CT Scanner: Measurement of System Presampling MTF
Authors: H. Arabi, A.R. Kamali Asl
Abstract:
The purpose of this work is measurement of the system presampling MTF of a variable resolution x-ray (VRX) CT scanner. In this paper, we used the parameters of an actual VRX CT scanner for simulation and study of effect of different focal spot sizes on system presampling MTF by Monte Carlo method (GATE simulation software). Focal spot size of 0.6 mm limited the spatial resolution of the system to 5.5 cy/mm at incident angles of below 17º for cell#1. By focal spot size of 0.3 mm the spatial resolution increased up to 11 cy/mm and the limiting effect of focal spot size appeared at incident angles of below 9º. The focal spot size of 0.3 mm could improve the spatial resolution to some extent but because of magnification non-uniformity, there is a 10 cy/mm difference between spatial resolution of cell#1 and cell#256. The focal spot size of 0.1 mm acted as an ideal point source for this system. The spatial resolution increased to more than 35 cy/mm and at all incident angles the spatial resolution was a function of incident angle. By the way focal spot size of 0.1 mm minimized the effect of magnification nonuniformity.Keywords: Focal spot, Spatial resolution, Monte Carlosimulation, Variable resolution x-ray (VRX) CT.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15297807 Impact of Landuse Change on Surface Temperature in Ibadan, Nigeria
Authors: Abegunde Linda, Adedeji Oluwatola
Abstract:
It has become an increasing evident that large development influences the climate. There are concerns that rising temperature over developed areas could have negative impact and increase living discomfort within city boundaries. Temperature trends in Ibadan city have received little attention, yet the area has experienced heavy urban expansion between 1972 and 2014. This research aims at examining the impact of landuse change on surface temperature knowing that the built-up environment absorb and store solar energy, resulting into the Urban Heat Island (UHI) effect. The Landsat imagery was used to examine the landuse change for a period of 42 years (1972-2014). Land Surface Temperature (LST) was obtained by converting the thermal band to a surface temperature map and zonal statistic analyses was used to examine the relationship between landuse and temperature emission. The results showed that the settlement area increased to a large extent while the area covered by vegetation reduced during the study period. The spatial and temporal trends of surface temperature are related to the gradual change in urban landuse/landcover and the settlement area has the highest emission. This research provides useful insight into the temporal behavior of the Ibadan city.
Keywords: Landuse, LST, Remote sensing, UHI.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27657806 Methodology of the Turkey’s National Geographic Information System Integration Project
Authors: Buse A. Ataç, Doğan K. Cenan, Arda Çetinkaya, Naz D. Şahin, Köksal Sanlı, Zeynep Koç, Akın Kısa
Abstract:
With its spatial data reliability, interpretation and questioning capabilities, Geographical Information Systems make significant contributions to scientists, planners and practitioners. Geographic information systems have received great attention in today's digital world, growing rapidly, and increasing the efficiency of use. Access to and use of current and accurate geographical data, which are the most important components of the Geographical Information System, has become a necessity rather than a need for sustainable and economic development. This project aims to enable sharing of data collected by public institutions and organizations on a web-based platform. Within the scope of the project, INSPIRE (Infrastructure for Spatial Information in the European Community) data specifications are considered as a road-map. In this context, Turkey's National Geographic Information System (TUCBS) Integration Project supports sharing spatial data within 61 pilot public institutions as complied with defined national standards. In this paper, which is prepared by the project team members in the TUCBS Integration Project, the technical process with a detailed methodology is explained. In this context, the main technical processes of the Project consist of Geographic Data Analysis, Geographic Data Harmonization (Standardization), Web Service Creation (WMS, WFS) and Metadata Creation-Publication. In this paper, the integration process carried out to provide the data produced by 61 institutions to be shared from the National Geographic Data Portal (GEOPORTAL), have been trying to be conveyed with a detailed methodology.
Keywords: Data specification, geoportal, GIS, INSPIRE, TUCBS, Turkey’s National Geographic Information System.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6967805 Using the Combined Model of PROMETHEE and Fuzzy Analytic Network Process for Determining Question Weights in Scientific Exams through Data Mining Approach
Authors: Hassan Haleh, Amin Ghaffari, Parisa Farahpour
Abstract:
Need for an appropriate system of evaluating students- educational developments is a key problem to achieve the predefined educational goals. Intensity of the related papers in the last years; that tries to proof or disproof the necessity and adequacy of the students assessment; is the corroborator of this matter. Some of these studies tried to increase the precision of determining question weights in scientific examinations. But in all of them there has been an attempt to adjust the initial question weights while the accuracy and precision of those initial question weights are still under question. Thus In order to increase the precision of the assessment process of students- educational development, the present study tries to propose a new method for determining the initial question weights by considering the factors of questions like: difficulty, importance and complexity; and implementing a combined method of PROMETHEE and fuzzy analytic network process using a data mining approach to improve the model-s inputs. The result of the implemented case study proves the development of performance and precision of the proposed model.Keywords: Assessing students, Analytic network process, Clustering, Data mining, Fuzzy sets, Multi-criteria decision making, and Preference function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15827804 Estimation of Forest Fire Emission in Thailand by Using Remote Sensing Information
Authors: A. Junpen, S. Garivait, S. Bonnet, A. Pongpullponsak
Abstract:
The forest fires in Thailand are annual occurrence which is the cause of air pollutions. This study intended to estimate the emission from forest fire during 2005-2009 using MODerateresolution Imaging Spectro-radiometer (MODIS) sensor aboard the Terra and Aqua satellites, experimental data, and statistical data. The forest fire emission is estimated using equation established by Seiler and Crutzen in 1982. The spatial and temporal variation of forest fire emission is analyzed and displayed in the form of grid density map. From the satellite data analysis suggested between 2005 and 2009, the number of fire hotspots occurred 86,877 fire hotspots with a significant highest (more than 80% of fire hotspots) in the deciduous forest. The peak period of the forest fire is in January to May. The estimation on the emissions from forest fires during 2005 to 2009 indicated that the amount of CO, CO2, CH4, and N2O was about 3,133,845 tons, 47,610.337 tons, 204,905 tons, and 6,027 tons, respectively, or about 6,171,264 tons of CO2eq. They also emitted 256,132 tons of PM10. The year 2007 was found to be the year when the emissions were the largest. Annually, March is the period that has the maximum amount of forest fire emissions. The areas with high density of forest fire emission were the forests situated in the northern, the western, and the upper northeastern parts of the country.
Keywords: Emissions, Forest fire, Remote sensing information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21947803 BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis
Authors: Mohamed A. Mahfouz, M. A. Ismail
Abstract:
Biclustering is a very useful data mining technique for identifying patterns where different genes are co-related based on a subset of conditions in gene expression analysis. Association rules mining is an efficient approach to achieve biclustering as in BIMODULE algorithm but it is sensitive to the value given to its input parameters and the discretization procedure used in the preprocessing step, also when noise is present, classical association rules miners discover multiple small fragments of the true bicluster, but miss the true bicluster itself. This paper formally presents a generalized noise tolerant bicluster model, termed as μBicluster. An iterative algorithm termed as BIDENS based on the proposed model is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions to preserve meaningful and significant biclusters. The proposed algorithm allows discovering biclusters that hard to be discovered by BIMODULE. Experimental study on yeast, human gene expression data and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19647802 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7797801 Grocery Customer Behavior Analysis using RFID-based Shopping Paths Data
Authors: In-Chul Jung, Young S. Kwon
Abstract:
Knowing about the customer behavior in a grocery has been a long-standing issue in the retailing industry. The advent of RFID has made it easier to collect moving data for an individual shopper's behavior. Most of the previous studies used the traditional statistical clustering technique to find the major characteristics of customer behavior, especially shopping path. However, in using the clustering technique, due to various spatial constraints in the store, standard clustering methods are not feasible because moving data such as the shopping path should be adjusted in advance of the analysis, which is time-consuming and causes data distortion. To alleviate this problem, we propose a new approach to spatial pattern clustering based on the longest common subsequence. Experimental results using real data obtained from a grocery confirm the good performance of the proposed method in finding the hot spot, dead spot and major path patterns of customer movements.Keywords: customer path, shopping behavior, exploratoryanalysis, LCS, RFID
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31497800 Data Hiding in Images in Discrete Wavelet Domain Using PMM
Authors: Souvik Bhattacharyya, Gautam Sanyal
Abstract:
Over last two decades, due to hostilities of environment over the internet the concerns about confidentiality of information have increased at phenomenal rate. Therefore to safeguard the information from attacks, number of data/information hiding methods have evolved mostly in spatial and transformation domain.In spatial domain data hiding techniques,the information is embedded directly on the image plane itself. In transform domain data hiding techniques the image is first changed from spatial domain to some other domain and then the secret information is embedded so that the secret information remains more secure from any attack. Information hiding algorithms in time domain or spatial domain have high capacity and relatively lower robustness. In contrast, the algorithms in transform domain, such as DCT, DWT have certain robustness against some multimedia processing.In this work the authors propose a novel steganographic method for hiding information in the transform domain of the gray scale image.The proposed approach works by converting the gray level image in transform domain using discrete integer wavelet technique through lifting scheme.This approach performs a 2-D lifting wavelet decomposition through Haar lifted wavelet of the cover image and computes the approximation coefficients matrix CA and detail coefficients matrices CH, CV, and CD.Next step is to apply the PMM technique in those coefficients to form the stego image. The aim of this paper is to propose a high-capacity image steganography technique that uses pixel mapping method in integer wavelet domain with acceptable levels of imperceptibility and distortion in the cover image and high level of overall security. This solution is independent of the nature of the data to be hidden and produces a stego image with minimum degradation.Keywords: Cover Image, Pixel Mapping Method (PMM), StegoImage, Integer Wavelet Tranform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28547799 Melodic and Temporal Structure of Indonesian Sentences of Sitcom "International Class" Actors: Prosodic Study with Experimental Phonetics Approach
Authors: Tri Sulistyaningtyas, Yani Suryani, Dana Waskita, Linda Handayani Sukaemi, Ferry Fauzi Hermawan
Abstract:
The enthusiasm of foreigners studying the Indonesian language by Foreign Speakers (BIPA) was documented in a sitcom "International Class". Tone and stress when they speak the Indonesian language is unique and different from Indonesian pronunciation. By using the Praat program, this research aims to describe prosodic Indonesian language which is spoken by ‘International Class” actors consisting of Abbas from Nigeria, Lee from Korea, and Kotaro from Japan. Data for the research are taken from the video sitcom "International Class" that aired on Indonesian television. The results of this study revealed that pitch movement that arises when pronouncing Indonesian sentences was up and down gradually, there is also a rise and fall sharply. In terms of stress, respondents tend to contain a lot of stress when pronouncing Indonesian sentences. Meanwhile, in terms of temporal structure, the duration pronouncing Indonesian sentences tends to be longer than that of Indonesian speakers.Keywords: Melodic structure, temporal structure, prosody, experimental phonetics, international class.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9597798 A Recommender System Fusing Collaborative Filtering and User’s Review Mining
Authors: Seulbi Choi, Hyunchul Ahn
Abstract:
Collaborative filtering (CF) algorithm has been popularly used for recommender systems in both academic and practical applications. It basically generates recommendation results using users’ numeric ratings. However, the additional use of the information other than user ratings may lead to better accuracy of CF. Considering that a lot of people are likely to share their honest opinion on the items they purchased recently due to the advent of the Web 2.0, user's review can be regarded as the new informative source for identifying user's preference with accuracy. Under this background, this study presents a hybrid recommender system that fuses CF and user's review mining. Our system adopts conventional memory-based CF, but it is designed to use both user’s numeric ratings and his/her text reviews on the items when calculating similarities between users.Keywords: Recommender system, collaborative filtering, text mining, review mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15877797 Segmentation of Breast Lesions in Ultrasound Images Using Spatial Fuzzy Clustering and Structure Tensors
Authors: Yan Xu, Toshihiro Nishimura
Abstract:
Segmentation in ultrasound images is challenging due to the interference from speckle noise and fuzziness of boundaries. In this paper, a segmentation scheme using fuzzy c-means (FCM) clustering incorporating both intensity and texture information of images is proposed to extract breast lesions in ultrasound images. Firstly, the nonlinear structure tensor, which can facilitate to refine the edges detected by intensity, is used to extract speckle texture. And then, a spatial FCM clustering is applied on the image feature space for segmentation. In the experiments with simulated and clinical ultrasound images, the spatial FCM clustering with both intensity and texture information gets more accurate results than the conventional FCM or spatial FCM without texture information.
Keywords: fuzzy c-means, spatial information, structure tensor, ultrasound image segmentation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18047796 A Research on the Coordinated Development of Chengdu-Chongqing Economic Circle Under the Background of New Urbanization
Authors: Deng Tingting
Abstract:
The coordinated and integrated development of regions is an inevitable requirement for China to move towards high-quality sustainable development. As one of the regions with the best economic foundation and the strongest economic strength in the western China, it is a typical area with national importance and strong network connection characteristics in terms of the comprehensive effect of linking the inland hinterland and connecting the western and national urban networks. The integrated development of the Chengdu-Chongqing economic circle is of great strategic significance for the rapid and high-quality development of the western region. In the context of new urbanization, this paper takes 16 urban units within the economic circle as the research object, based on the 5-year panel data of population, regional economy and spatial construction and development from 2016 to 2020, using the entropy method and Theil index to analyze the three target layers, and cause analysis. The research shows that there are temporal and spatial differences in the Chengdu-Chongqing economic circle, and there are significant differences between the core city and the surrounding cities. Therefore, by reforming and innovating the regional coordinated development mechanism, breaking administrative barriers, and strengthening the "polar nucleus" radiation function to release the driving force for economic development, especially in the gully areas of economic development belts, will not only promote the coordinated development of internal regions, but also promote the coordinated and sustainable development of the western region and toward a high-quality development path.
Keywords: Chengdu-Chongqing economic circle, new urbanization, coordinated regional development, Theil Index.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2037795 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering
Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel
Abstract:
Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15007794 Real-time Network Anomaly Detection Systems Based on Machine-Learning Algorithms
Authors: Zahra Ramezanpanah, Joachim Carvallo, Aurelien Rodriguez
Abstract:
This paper aims to detect anomalies in streaming data using machine learning algorithms. In this regard, we designed two separate pipelines and evaluated the effectiveness of each separately. The first pipeline, based on supervised machine learning methods, consists of two phases. In the first phase, we trained several supervised models using the UNSW-NB15 data set. We measured the efficiency of each using different performance metrics and selected the best model for the second phase. At the beginning of the second phase, we first, using Argus Server, sniffed a local area network. Several types of attacks were simulated and then sent the sniffed data to a running algorithm at short intervals. This algorithm can display the results of each packet of received data in real-time using the trained model. The second pipeline presented in this paper is based on unsupervised algorithms, in which a Temporal Graph Network (TGN) is used to monitor a local network. The TGN is trained to predict the probability of future states of the network based on its past behavior. Our contribution in this section is introducing an indicator to identify anomalies from these predicted probabilities.
Keywords: Cyber-security, Intrusion Detection Systems, Temporal Graph Network, Anomaly Detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 505