Search results for: data analyze
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26669

26639 Association of Social Data as a Tool to Support Government Decision Making

Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias

Abstract:

Based on data on child labor, this work raises questions about how to understand and locate the factors behind child labor rates, and which properties are important for analyzing these cases. Using data mining techniques to discover valid patterns in Brazilian social databases, we evaluated child labor data for the State of Tocantins (in northern Brazil, with a territory of 277,000 km² comprising 139 counties). This work aims to detect the factors that determine the practice of child labor and their relationships with financial, educational, regional and social indicators, generating information that is not explicit in the government database and thus enabling better monitoring and updating of policies for this purpose.
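
The association idea in the title can be illustrated with the two standard rule measures, support and confidence. This is a minimal Python sketch that assumes each county is recorded as a set of discretized indicator labels. The labels, thresholds and records below are hypothetical, not the authors' Tocantins data, and the brute-force pair search stands in for a full Apriori implementation.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of records (transactions) that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = supp(A and C together) / supp(A)."""
    base = support(antecedent, transactions)
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base if base else 0.0

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Brute-force search over single-item rules A -> C.
    Apriori-style candidate pruning would replace this loop on real-sized data."""
    items = sorted(set().union(*transactions))
    found = []
    for a, c in combinations(items, 2):
        for ante, cons in ((a, c), (c, a)):  # test both rule directions
            s = support({ante, cons}, transactions)
            conf = confidence({ante}, {cons}, transactions)
            if s >= min_support and conf >= min_confidence:
                found.append((ante, cons, s, conf))
    return found

# Hypothetical county records: each set holds discretized indicator labels.
counties = [
    {"low_income", "high_dropout", "high_child_labor"},
    {"low_income", "high_child_labor"},
    {"high_income", "low_dropout"},
    {"low_income", "high_dropout", "high_child_labor"},
    {"high_income", "high_dropout"},
]

for rule in pair_rules(counties):
    print(rule)
```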

Keywords: social data, government decision making, association of social data, data mining

26638 An ANN-Based Predictive Model for Diagnosis and Forecasting of Hypertension

Authors: Obe Olumide Olayinka, Victor Balanica, Eugen Neagoe

Abstract:

The effects of hypertension are often lethal, so its early detection and prevention are very important for everyone. In this paper, a neural network (NN) model was developed and trained on a dataset of hypertension causative parameters in order to forecast the likelihood of hypertension occurring in patients. Our research goal was to analyze the potential of the presented NN to predict, over a period of time, the risk of hypertension or of developing this disease for patients who are or are not currently hypertensive. The results of the analysis for a given patient can support doctors in taking proactive measures to avert hypertension, such as recommendations on the patient's behavior to lower his or her hypertension risk. Moreover, the paper presents three example scenarios: determining the age at which the patient becomes hypertensive (the threshold hypertensive age); analyzing what happens when the threshold hypertensive age is fixed and the patient's weight is varied; and setting the ideal weight for the patient and analyzing how the threshold hypertensive age changes.
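
The first scenario (finding the threshold hypertensive age) amounts to scanning ages with a trained predictor until the predicted risk crosses a decision cutoff. The sketch below assumes a 0.5 cutoff and replaces the trained network with `toy_risk`, a hypothetical logistic function with invented coefficients; any callable of the form `predict(age, weight)` could be plugged in instead.

```python
import math

def toy_risk(age, weight_kg):
    """Hypothetical stand-in for a trained network: returns P(hypertensive).
    The coefficients are invented for illustration, not fitted to any data."""
    z = -12.0 + 0.12 * age + 0.05 * weight_kg
    return 1.0 / (1.0 + math.exp(-z))

def threshold_age(predict, weight_kg, ages=range(18, 101), cutoff=0.5):
    """First age at which the predicted risk reaches `cutoff`, or None if it never does."""
    for age in ages:
        if predict(age, weight_kg) >= cutoff:
            return age
    return None

# Second/third scenarios: vary the weight and watch the threshold age move.
for weight in (70, 80, 90, 100):
    print(weight, threshold_age(toy_risk, weight))
```

With these toy coefficients a heavier patient reaches the cutoff at a younger age; the direction and size of that effect in the real model depend entirely on what the network learned.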

Keywords: neural network, hypertension, data set, training set, supervised learning

26637 Mining Multicity Urban Data for Sustainable Population Relocation

Authors: Xu Du, Aparna S. Varde

Abstract:

In this research, we propose to conduct diagnostic and predictive analysis of the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data together with other information such as population change and economic conditions. Multiple data mining methods are deployed on these data to analyze nonlinear relationships between parameters. The results identify the driving forces of population relocation with respect to urban sprawl and urban sustainability and their related parameters. Experiments so far reveal that data mining methods discover useful knowledge from the multicity urban data. This work sets the stage for developing a comprehensive urban simulation model that caters to specific questions from targeted users. It contributes towards achieving sustainability as a whole.

Keywords: data mining, environmental modeling, sustainability, urban planning

26636 Realization of a (GIS) for Drilling (DWS) through the Adrar Region

Authors: Djelloul Benatiallah, Ali Benatiallah, Abdelkader Harouz

Abstract:

Geographic Information Systems (GIS) include various methods and computer techniques to model, digitally capture, store, manage, view and analyze geographic data. GIS draw on many scientific and technical fields and many methods. In this article we present a complete, operational geographic information system that follows the theoretical principles of data management adapted to spatial data, in particular data for monitoring drinking water supply wells (DWS) in the Adrar region. On the one hand, the system is expected to offer standard consultation features and the updating and editing of beneficiary and geographical data; on the other hand, it provides specific functionality for contractors: data entry, parameterized calculations and statistics.

Keywords: GIS, DWS, drilling, Adrar

26635 Framework for Integrating Big Data and Thick Data: Understanding Customers Better

Authors: Nikita Valluri, Vatcharaporn Esichaikul

Abstract:

With data-driven decision making on the rise, this study provides an alternative outlook on the decision-making process. Combining quantitative and qualitative methods rooted in the social sciences, it presents an integrated framework that aims to deliver a more robust and efficient approach to data-driven decision-making, drawing not only on Big data but also on 'Thick data', a new form of qualitative data. To support this, an example from the retail sector illustrates the framework in action, yielding insights and leveraging business intelligence. An interpretive approach has been used to analyze findings from both quantitative and qualitative data and to glean insights. Using traditional point-of-sale data as well as an understanding of customer psychographics and preferences, data mining techniques are applied together with qualitative methods (such as grounded theory and ethnomethodology). The study's final goal is to establish the framework as a basis for a holistic solution encompassing both the Big and the Thick aspects of any business need. The proposed framework is an enhancement of the traditional data-driven decision-making approach, which depends mainly on quantitative data.

Keywords: big data, customer behavior, customer experience, data mining, qualitative methods, quantitative methods, thick data

26634 Using TRACE, PARCS, and SNAP Codes to Analyze the Load Rejection Transient of ABWR

Authors: J. R. Wang, H. C. Chang, A. L. Ho, J. H. Yang, S. W. Chen, C. Shih

Abstract:

The purpose of the study is to analyze the load rejection transient of the ABWR (Advanced Boiling Water Reactor) using the TRACE, PARCS, and SNAP codes. The study proceeds in four steps. First, a model of the ABWR is established using the TRACE, PARCS, and SNAP codes. Second, the key parameters are identified to refine the TRACE/PARCS/SNAP model further in the frame of a steady-state analysis. Third, the TRACE/PARCS/SNAP model is used to perform the load rejection transient analysis. Finally, the analysis results are compared with the FSAR data. The TRACE/PARCS results are consistent with the FSAR data for the important parameters, which indicates that the TRACE/PARCS/SNAP model of the ABWR is accurate for the load rejection transient.

Keywords: ABWR, TRACE, PARCS, SNAP

26633 Data Presentation of Lane-Changing Events Trajectories Using HighD Dataset

Authors: Basma Khelfa, Antoine Tordeux, Ibrahima Ba

Abstract:

We present a descriptive data analysis of lane-changing events on multi-lane roads. The data come from the Highway Drone Dataset (HighD), which provides microscopic vehicle trajectories on highways. This paper describes and analyzes the role of the different parameters and their significance. Using the HighD data, we aim to find the most frequent reasons that motivate drivers to change lanes. We used the R programming language to process these data. We analyze the involvement and relationships of the variables describing the ego vehicle and the four vehicles surrounding it, i.e., distance, speed difference, time gap, and acceleration. These were studied according to the vehicle class (car or truck) and the maneuver undertaken (overtaking or falling back).
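
The interaction variables listed above can be derived per frame from the positions, speeds and accelerations of the ego vehicle and each neighbour. The sketch assumes a simplified one-dimensional record (`VehicleState`, with `x` as the front-bumper position along the lane); these field names are hypothetical and do not correspond to the actual HighD column names. The time gap is taken here as the bumper-to-bumper gap divided by the speed of whichever vehicle is following.

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    # Hypothetical per-frame record; not the HighD column layout.
    x: float       # longitudinal position of the front bumper [m]
    v: float       # longitudinal speed [m/s]
    a: float       # longitudinal acceleration [m/s^2]
    length: float  # vehicle length [m]

def pair_features(ego: VehicleState, other: VehicleState) -> dict:
    """Interaction features between the ego vehicle and one surrounding vehicle."""
    if other.x >= ego.x:  # neighbour ahead: gap from ego front to neighbour rear
        gap = other.x - other.length - ego.x
        follower_speed = ego.v
    else:                 # neighbour behind: gap from neighbour front to ego rear
        gap = ego.x - ego.length - other.x
        follower_speed = other.v
    return {
        "distance": gap,                                   # bumper-to-bumper gap [m]
        "speed_difference": other.v - ego.v,               # [m/s]
        "time_gap": gap / follower_speed if follower_speed > 0 else float("inf"),  # [s]
        "acceleration_difference": other.a - ego.a,        # [m/s^2]
    }

ego = VehicleState(x=100.0, v=30.0, a=0.0, length=4.5)
leader = VehicleState(x=140.0, v=25.0, a=-0.5, length=4.5)
print(pair_features(ego, leader))
```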

Keywords: autonomous driving, physical traffic model, prediction model, statistical learning process

26632 Visual Simulation for the Relationship of Urban Fabric

Authors: Ting-Yu Lin, Han-Liang Lin

Abstract:

This article concerns the visualization of urban form with Cityengine. A city is composed of different domains, and each domain has its own fabric resulting from its arrangement. For example, a neighborhood unit contains fabrics such as schools, street networks, and residential and commercial spaces. Studying urban morphology can therefore help us understand urban form in the planning process. Streets, plots, and buildings can be regarded as urban fabrics, and they configure urban form. Traditionally, urban morphology has usually discussed a single parameter, building type, ignoring other parameters such as streets and plots. However, urban space is three-dimensional rather than two-dimensional, and people perceive it visually. Visualization can therefore bridge the gap between two and three dimensions, and the study of urban morphology will strengthen the understanding of the overall appearance of a city. Cityengine is software that can edit, analyze and monitor data and visualize the results for GIS, a common tool for analyzing data and displaying maps for urban planning and urban design. Cityengine can parameterize data on streets, plots and building types and visualize the result in three dimensions. The research will reproduce the real urban form through visualization, to determine whether urban form can be parameterized and whether the parameterized result matches the real urban form; the result is then visualized in three dimensions to analyze the rules of urban form. The research has three stages. First, a field survey of the East District of Tainan, Taiwan, will establish the relationships between the urban fabrics of street networks, plots and building types. Second, to visualize these relationships, they will be translated into code that Cityengine can read. Finally, Cityengine will automatically display the result as a visualization.

Keywords: Cityengine, urban fabric, urban morphology, visual simulation

26631 The Impact of Inflation Rate and Interest Rate on Islamic and Conventional Banking in Afghanistan

Authors: Tareq Nikzad

Abstract:

Since the first bank was established in 1933, Afghanistan's banking sector has gone through a number of changes but has not been able to grow to its full potential because of the civil war. This study investigates the effects of inflation and interest rates on the implementation of dual (Islamic and conventional) banking in Afghanistan, using World Bank Data (WBD) covering a period of nineteen years. For the banking sector, inflation, the general rise in the prices of goods and services over time, presents considerable difficulties. The objectives of this research are to analyze the effect of inflation and interest rates on conventional and Islamic banks in Afghanistan, identify potential differences between these two banking models, and provide insights for policymakers and practitioners. A mixed-methods approach is used: quantitative data are analyzed, and the unique difficulties that banks encounter in Afghanistan's economic environment are examined qualitatively. The findings contribute to understanding the relationship between the interest rate, the inflation rate, and the performance of both banking systems in Afghanistan. The paper concludes with recommendations for policymakers and banking institutions to enhance the stability and growth of the banking sector in Afghanistan. From an Islamic perspective, interest is described as "a prefixed rate for use or borrowing of money." This "prefixed rate," known in Islamic economics as "riba," has been described as "something undesirable." Using time series regression on annual data from 2003 to 2021, this research examines the effect of the CPI inflation rate and the interest rate on banking in Afghanistan.
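
The time series regression described above can be sketched as ordinary least squares with an intercept. This illustrative Python/NumPy version assumes a single performance indicator regressed on CPI inflation and the interest rate; the series are synthetic placeholders for the 19 annual World Bank observations, and `performance` is a hypothetical dependent variable, not one named in the study.

```python
import numpy as np

def fit_ols(y, *regressors):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

years = np.arange(2003, 2022)  # 2003..2021 inclusive: 19 annual observations
rng = np.random.default_rng(0)
inflation = rng.uniform(0.0, 15.0, size=years.size)   # synthetic CPI inflation, %
interest = rng.uniform(5.0, 20.0, size=years.size)    # synthetic interest rate, %
# Hypothetical, noise-free outcome built from known coefficients.
performance = 1.5 - 0.2 * inflation + 0.3 * interest

beta = fit_ols(performance, inflation, interest)
print(beta)  # intercept, inflation effect, interest-rate effect
```

On real annual data one would fit the Islamic and conventional banks separately and check the series for stationarity and the residuals for autocorrelation before interpreting the coefficients; the noise-free synthetic series here only confirm that the fit recovers known coefficients.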

Keywords: inflation, Islamic banking, conventional banking, interest, Afghanistan, impact

26630 Analyzing the Results of Buildings Energy Audit by Using Grey Set Theory

Authors: Tooraj Karimi, Mohammadreza Sadeghi Moghadam

Abstract:

Grey set theory has the advantage of using fewer data to analyze many factors, and it is therefore more appropriate for system study than traditional statistical regression, which requires massive data, normally distributed data and few varying factors. In this paper, grey clustering and the entropy of the coefficient vector of grey evaluations are used to analyze energy consumption in buildings of the Oil Ministry in Tehran. Specifically, this article analyzes the results of energy audit reports and identifies the most favorable characteristics of the system (the energy consumption of buildings) and the most favorable factors affecting these characteristics, in order to modify and improve them. According to the model results, 'the real Building Load Coefficient' was selected as the most important system characteristic, and 'uncontrolled area of the building' was identified as the most favorable factor, i.e., the one with the greatest effect on building energy consumption. Grey clustering is used in this study for two purposes. First, all building variables related to the energy audit are clustered into two main groups of indicators, reducing the number of variables. Second, grey clustering with variable weights is used to classify all buildings into three categories named 'no standard deviation', 'low standard deviation' and 'non-standard'. The entropy of the coefficient vector of grey evaluations is calculated to investigate the greyness of the results. Of the 38 buildings surveyed for energy consumption, 3 are in the standard group, 24 are in the 'low standard deviation' group and 11 are completely non-standard. In addition, the clustering greyness of 13 buildings is less than 0.5, and the average uncertainty of the clustering results is 66%.
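
The grey clustering and greyness steps can be sketched as follows. The abstract does not give the whitenization weight functions or the exact entropy definition, so this sketch assumes triangular whitenization functions, a weighted clustering coefficient per grey class, and greyness measured as the Shannon entropy of the normalized coefficient vector divided by its maximum, so that it lies between 0 (clear assignment) and 1 (fully ambiguous). The class parameters, indicator values and weights are hypothetical.

```python
import math

def triangular(x, left, center, right):
    """Triangular whitenization weight function: 1 at `center`, 0 outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

def clustering_coefficients(values, classes, weights):
    """sigma_k = sum_j eta_j * f_jk(x_j) for each grey class k.
    values: indicator values x_j; classes: for each class, one (left, center, right)
    tuple per indicator; weights: eta_j, assumed to sum to 1."""
    return [sum(w * triangular(x, *params) for x, params, w in zip(values, cls, weights))
            for cls in classes]

def greyness(sigmas):
    """Normalized Shannon entropy of the clustering coefficient vector, in [0, 1]."""
    total = sum(sigmas)
    if len(sigmas) < 2 or total <= 0:
        raise ValueError("need at least two classes and some class membership")
    ps = [s / total for s in sigmas if s > 0]
    return -sum(p * math.log(p) for p in ps) / math.log(len(sigmas))

# Hypothetical: two normalized indicators, three grey classes, equal weights.
classes = [
    [(-0.5, 0.0, 0.5), (-0.5, 0.0, 0.5)],  # class 1
    [(0.0, 0.5, 1.0), (0.0, 0.5, 1.0)],    # class 2
    [(0.5, 1.0, 1.5), (0.5, 1.0, 1.5)],    # class 3
]
sigmas = clustering_coefficients([0.4, 0.6], classes, [0.5, 0.5])
print(sigmas, greyness(sigmas))  # the building is assigned to the class with the largest sigma
```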

Keywords: energy audit, grey set theory, grey incidence matrices, grey clustering, Iran Oil Ministry

26629 Big Data Analysis with Rhipe

Authors: Byung Ho Jung, Ji Eun Shin, Dong Hoon Lim

Abstract:

Rhipe, which integrates the R and Hadoop environments, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe on actual data of various sizes. Experimental results comparing the performance of our Rhipe implementation with the stats and biglm packages available on bigmemory showed that Rhipe was faster than the other packages, owing to parallel processing with an increasing number of map tasks as the data size increases. We also compared the computing speeds of pseudo-distributed and fully-distributed modes for configuring the Hadoop cluster. The results showed that fully-distributed mode was faster than pseudo-distributed mode, and that the computing speed of fully-distributed mode increased with the number of data nodes.
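
Multiple regression parallelizes naturally on Hadoop because the normal equations only need the sums X^T X and X^T y, which each map task can compute on its own block of rows. The sketch below shows that map/reduce decomposition in plain Python/NumPy; it illustrates the general technique, not Rhipe's R interface, and `n_chunks` plays the role of the number of map tasks.

```python
import numpy as np

def map_chunk(X_chunk, y_chunk):
    """Map step: summarize one block of rows as its sufficient statistics."""
    return X_chunk.T @ X_chunk, X_chunk.T @ y_chunk

def reduce_stats(partials):
    """Reduce step: element-wise sum of the per-block statistics."""
    xtx = sum(p[0] for p in partials)
    xty = sum(p[1] for p in partials)
    return xtx, xty

def distributed_ols(X, y, n_chunks=4):
    """Multiple regression with intercept via map/reduce over row blocks."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    chunks = zip(np.array_split(X1, n_chunks), np.array_split(y, n_chunks))
    xtx, xty = reduce_stats([map_chunk(Xc, yc) for Xc, yc in chunks])
    return np.linalg.solve(xtx, xty)            # solve the normal equations

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]         # synthetic, noise-free response
beta = distributed_ols(X, y, n_chunks=8)
print(beta)
```

Solving the normal equations directly can be numerically fragile for ill-conditioned designs; QR-based updating is the safer choice when that is a concern.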

Keywords: big data, Hadoop, Parallel regression analysis, R, Rhipe

26628 Analyze Needs for Training on Academic Procrastination Behavior on Students in Indonesia

Authors: Iman Dwi Almunandar, Nellawaty A. Tewu, Anshari Al Ghaniyy

Abstract:

Academic procrastination among students in Indonesia, especially students of the Faculty of Psychology at YARSI University, has become an underestimated habit that often interferes with the effectiveness of the learning process. Lecturers at the Faculty of Psychology at YARSI University very often warn students to complete and submit assignments by the predetermined deadline; however, students still miss it. According to the researchers, this problem calls for appropriate training to minimize academic procrastination among students. In this study, the researchers conducted a needs analysis to decide whether training is needed. The sample consists of 30 respondents chosen by simple random sampling. Academic procrastination is measured using the theory of McCloskey (2011), which has six dimensions: Psychological Belief about Abilities, Distractions, Social Factor of Procrastination, Time Management, Personal Initiative, and Laziness. The needs-analysis methods are questionnaires, interviews, observations, focus group discussions (FGD), and intelligence tests. The results of the needs analysis show that the 2015 cohort of psychology students at the Faculty of Psychology, YARSI University, needs training in Time Management.

Keywords: procrastination, psychology, analyze needs, behavior

26627 Prediction of Marine Ecosystem Changes Based on the Integrated Analysis of Multivariate Data Sets

Authors: Prozorkevitch D., Mishurov A., Sokolov K., Karsakov L., Pestrikova L.

Abstract:

The current body of knowledge about the marine environment and the dynamics of marine ecosystems includes a huge amount of heterogeneous data collected over decades, generally covering a wide range of hydrological, biological and fishery data. Marine researchers collect these data and analyze how and why the ecosystem has changed from past to present. Based on these historical records and the linkages between processes, it is possible to predict future changes. Multivariate analysis of trends and their interconnections in the marine ecosystem may be used as an instrument for predicting further ecosystem evolution. A wide range of information about the components of the marine ecosystem spanning more than 50 years needs to be used to investigate how these data arrays can help predict the future.

Keywords: Barents Sea ecosystem, abiotic, biotic, data sets, trends, prediction

26626 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

The logistic regression and Cox regression (proportional hazards) models are currently employed in the analysis of prospective epidemiologic research into risk factors for chronic diseases, and the theoretical relationship between the two models has also been studied. By definition, the Cox regression model, also called the Cox proportional hazards model, is a procedure for modeling time-to-event data in which censored cases exist. The logistic regression model, by contrast, is mostly applicable when the independent variables consist of numerical as well as nominal values and the outcome variable is binary (dichotomous). Many researchers have given overviews of the Cox and logistic regression models and their applications in different areas. In this work, the analysis is done on secondary data from the SPSS exercise dataset on breast cancer, with a sample size of 1121 women; the main objective is to show the application difference between the Cox and logistic regression models based on factors that cause women to die of breast cancer. Some of the analysis, e.g. of lymph node status, was done manually, and SPSS software was used to analyze the data. This study found that the application difference is that the Cox regression model is used when the data include follow-up time, whereas the logistic regression model analyzes data without follow-up time. The two models also use different measures of association: the hazard ratio for Cox regression and the odds ratio for logistic regression. A similarity is that both can be used to predict the outcome of a categorical variable, i.e. a variable that can take only a restricted number of categories.
In conclusion, the Cox regression model differs from logistic regression in that it assesses a rate instead of a proportion. Both models can be applied in many other studies since they are suitable methods for analyzing data, but the Cox regression model is the more recommended.
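
The contrast between the two measures of association can be made concrete with summary counts. The sketch assumes hypothetical counts for node-positive versus node-negative patients (not the SPSS breast cancer data): the odds ratio uses only whether each woman died, whereas the rate ratio divides deaths by person-years of follow-up. Under a constant-hazard (exponential) assumption the rate ratio estimates the hazard ratio; a Cox model relaxes that assumption by leaving the baseline hazard unspecified.

```python
def odds_ratio(a, b, c, d):
    """2x2 table: a = exposed deaths, b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors. Ignores follow-up time."""
    return (a * d) / (b * c)

def rate_ratio(events_exposed, person_years_exposed,
               events_unexposed, person_years_unexposed):
    """Incidence rate ratio: accounts for follow-up time. Estimates the hazard
    ratio if hazards are constant over time (exponential survival)."""
    return ((events_exposed / person_years_exposed)
            / (events_unexposed / person_years_unexposed))

# Hypothetical counts: node-positive (exposed) vs node-negative (unexposed).
print(odds_ratio(60, 240, 30, 570))          # deaths vs survivors only
print(rate_ratio(60, 1200.0, 30, 3000.0))    # deaths per person-year of follow-up
```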

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

26625 A Comparative Assessment of Some Algorithms for Modeling and Forecasting Horizontal Displacement of Ialy Dam, Vietnam

Authors: Kien-Trinh Thi Bui, Cuong Manh Nguyen

Abstract:

To simulate and visually reproduce the operational characteristics of a dam, it is necessary to capture the displacement at different measurement points and promptly analyze the observed movement data to forecast dam safety. Forecast accuracy can be further improved by applying machine learning methods to the data analysis process. In this study, horizontal displacement monitoring data from the Ialy hydroelectric dam (Vietnam) were used with three machine learning algorithms, Gaussian processes (GP), multi-layer perceptron neural networks (MLPs), and the M5-Rules algorithm, to model and forecast the dam's horizontal displacement. The database used in this research was built from time series data collected from 2006 to 2021 and divided into two parts: a training dataset and a validation dataset. The final results show that all three algorithms perform well in both training and validation, with the MLP being the best model. Their usability was further investigated by comparison with a benchmark model built with multiple linear regression. The GP, MLP and M5-Rules models all performed much better than the benchmark; therefore, these three models are recommended for analyzing and predicting the horizontal displacement of the dam.
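
A model comparison of this kind usually rests on a time-ordered split and a few error metrics. The abstract does not state the split ratio or the metrics used, so this sketch assumes an 80/20 chronological split and reports RMSE, MAE and R², which would be computed the same way for the GP, MLP, M5-Rules and multiple linear regression predictions.

```python
import math

def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling: validation follows training in time."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination relative to the mean of the observations."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

# Hypothetical observed vs predicted displacements on a validation window.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```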

Keywords: Gaussian processes, horizontal displacement, hydropower dam, Ialy dam, M5-Rules, multi-layer perceptron neural networks

26624 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial processes. This work uses a data representation technique to model and analyze process data patterns for the purpose of diagnosis. The use of a triangular representation of process data is evaluated on a simulated process. Furthermore, the effects of different pre-treatment techniques, based on linear or nonlinear reduced spaces, were compared. The fault patterns were extracted in the reduced space rather than in the original data space. The results show that the diagnosis method based on the nonlinear technique produced more reliable results and outperformed the linear method.
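
The linear pre-treatment case can be sketched with principal component analysis: fit a reduced space on normal operating data, then flag samples whose squared prediction error (distance from that space) is large. This is an assumed stand-in, since the abstract does not name the linear or nonlinear techniques used; the nonlinear counterpart (for example kernel PCA) is not shown, and neither is the triangular representation itself.

```python
import numpy as np

def fit_pca(X, n_components):
    """Linear reduced space: mean-center the data and keep the leading right singular vectors."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Scores of each sample in the reduced space."""
    return (X - mean) @ components.T

def spe(X, mean, components):
    """Squared prediction error: squared distance from each sample to the reduced space."""
    scores = project(X, mean, components)
    residual = (X - mean) - scores @ components
    return np.sum(residual ** 2, axis=1)

# Synthetic 'normal operation' data lying on a line in 3-D variable space.
t = np.linspace(-1.0, 1.0, 20)[:, None]
X_normal = t * np.array([1.0, 2.0, 3.0]) + 5.0
mean, comps = fit_pca(X_normal, n_components=1)

# A hypothetical faulty sample displaced orthogonally to the normal direction.
fault = (mean + np.array([3.0, 0.0, -1.0]))[None, :]
print(spe(X_normal, mean, comps).max(), spe(fault, mean, comps))
```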

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

26623 Removal of Toxic Ni++ Ions from Wastewater by Nano-Bentonite

Authors: A. M. Ahmed, Mona A. Darwish

Abstract:

Removal of Ni++ ions from aqueous solution by sorption onto nano-bentonite was investigated. Experiments were carried out as a function of the amount of nano-bentonite, pH, metal concentration, contact time, agitation speed and temperature. The Langmuir and Freundlich adsorption isotherm models were applied to analyze the adsorption data, and both were found to be applicable to the adsorption process. The adsorption process fits pseudo-second-order kinetic models. Thermodynamic parameters, e.g. ΔG°, ΔS° and ΔH°, of the adsorption process were also calculated, and the sorption process was found to be endothermic. Finally, bentonite was found to be more effective for the removal of Ni(II) under some experimental conditions.
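
The isotherm, kinetic and thermodynamic analyses mentioned above are commonly done with linearized fits. The sketch below shows the standard linear forms with NumPy. It assumes the equilibrium constant passed to `van_t_hoff` is already dimensionless, and the example data are generated from known parameters rather than taken from the study.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol*K)

def fit_langmuir(ce, qe):
    """Linearized Langmuir: Ce/qe = 1/(qmax*KL) + Ce/qmax. Returns (qmax, KL)."""
    slope, intercept = np.polyfit(ce, ce / qe, 1)
    qmax = 1.0 / slope
    return qmax, 1.0 / (intercept * qmax)

def fit_freundlich(ce, qe):
    """Linearized Freundlich: ln qe = ln KF + (1/n) ln Ce. Returns (KF, n)."""
    slope, intercept = np.polyfit(np.log(ce), np.log(qe), 1)
    return np.exp(intercept), 1.0 / slope

def fit_pseudo_second_order(t, qt):
    """Linearized pseudo-second-order kinetics: t/qt = 1/(k2*qe^2) + t/qe. Returns (qe, k2)."""
    slope, intercept = np.polyfit(t, t / qt, 1)
    qe = 1.0 / slope
    return qe, 1.0 / (intercept * qe ** 2)

def van_t_hoff(temps_k, k_eq):
    """ln K = dS/R - dH/(R*T). Returns dH [J/mol], dS [J/(mol*K)] and dG = dH - T*dS per T."""
    slope, intercept = np.polyfit(1.0 / temps_k, np.log(k_eq), 1)
    dh, ds = -slope * R, intercept * R
    return dh, ds, dh - temps_k * ds

# Synthetic equilibrium data from qmax = 50 mg/g, KL = 0.2 L/mg.
ce = np.array([1.0, 5.0, 10.0, 20.0, 40.0])
qe = 50.0 * 0.2 * ce / (1 + 0.2 * ce)
print(fit_langmuir(ce, qe))

# Synthetic kinetics from qe = 30 mg/g, k2 = 0.01 g/(mg*min).
t = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
qt = 0.01 * 30.0 ** 2 * t / (1 + 0.01 * 30.0 * t)
print(fit_pseudo_second_order(t, qt))

# Synthetic equilibrium constants from dH = +20 kJ/mol (endothermic), dS = 100 J/(mol*K).
T = np.array([298.0, 308.0, 318.0])
K = np.exp(100.0 / R - 20000.0 / (R * T))
print(van_t_hoff(T, K))
```

A positive fitted ΔH° corresponds to the endothermic behavior reported in the abstract.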

Keywords: waste water, nickel, bentonite, adsorption

26622 The Neoliberal Social-Economic Development and Values in the Baltic States

Authors: Daiva Skuciene

Abstract:

The Baltic States turned to the free market and capitalism after independence, and a new socioeconomic system, democracy and priorities regarding the welfare of citizens took shape. Research shows that the Baltic States chose neoliberal development. This neoliberal path raises a few questions: how do people evaluate the results of such policy and socioeconomic development? What are their priorities? And what values of the Baltic societies support neoliberal policy? The purpose of this research is to analyze the socioeconomic context and the priorities and values of the Baltic societies in relation to the neoliberal regime. The main objectives are: first, to analyze the neoliberal socioeconomic features and results; second, to analyze people's opinions and priorities regarding the results of neoliberal development; and third, to analyze the values of the Baltic societies related to neoliberal policy. To achieve the purpose and objectives, comparative analyses among European countries are used. The neoliberal regime was defined through two indicators: taxes on capital income and expenditure on social protection. The socioeconomic outcomes of the neoliberal welfare regime are defined through the Gini coefficient of inequality and the at-risk-of-poverty rate, using Eurostat data for 2002-2013. Opinions about inequality, preferences about the society people want to live in, and preferences for distribution between capital and wages in enterprises were analyzed using Eurobarometer data for 2010-2014 and a representative survey conducted in the Baltic States in 2016. Justice was selected as the variable reflecting the evaluation of the socioeconomic context and was analyzed using Eurobarometer data for 2006-2015. The values selected for analysis were solidarity, equality, and individual responsibility; solidarity and equality were analyzed using Eurobarometer data for 2006-2015.
The value “individual responsibility” was examined through opinions about the reasons for inequality and poverty, using the 2016 population survey in the Baltic States and Eurobarometer data. The data are ranked in descending order to show where opinion in the Baltic States stands among European countries, and the dynamics of the indicators are also provided to examine the stability of values. The main findings are that people in the Baltics are dissatisfied with the results of neoliberal socioeconomic development and prioritize equality and justice, but they have internalized the main neoliberal narrative: individual responsibility. The socioeconomic context has a strong impact on values, changing otherwise quite stable opinions and values during the financial crisis.
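
The two outcome indicators can be computed from a list of incomes. This sketch uses the sorted-values formula for the Gini coefficient and a threshold of 60% of the median for the at-risk-of-poverty rate, following Eurostat's convention. It omits the equivalisation and survey weights that Eurostat's published figures apply, and the incomes are hypothetical.

```python
from statistics import median

def gini(incomes):
    """Gini coefficient via the sorted-values formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x ascending and i = 1..n."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n

def at_risk_of_poverty_rate(incomes, threshold_share=0.6):
    """Share of people below 60% of the median income. Eurostat applies this to
    equivalised disposable income; equivalisation is omitted here."""
    cutoff = threshold_share * median(incomes)
    return sum(x < cutoff for x in incomes) / len(incomes)

incomes = [10, 20, 30, 40, 50]  # hypothetical incomes
print(gini(incomes), at_risk_of_poverty_rate(incomes))
```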

Keywords: neoliberal, inequality and poverty, solidarity, individual responsibility

26621 Ethics Can Enable Open Source Data Research

Authors: Dragana Calic

Abstract:

The openness, availability and sheer volume of big data have provided what some regard as an invaluable and rich dataset. Researchers, businesses, advertising agencies and medical institutions, to name only a few, collect, share and analyze these data to enable their processes and decision making. However, there are important ethical considerations associated with the use of big data. The rapidly evolving nature of online technologies has overtaken many of the existing legislative, privacy and ethical frameworks and principles. For example, should we obtain consent to use people's online data, and under what circumstances can privacy considerations be overridden? Current guidance on how to handle big data appropriately and ethically is inconsistent. This paper therefore focuses on two distinct but related ethical considerations at the core of using big data for research purposes: empowering the producers of data and empowering researchers who want to study big data. The first consideration focuses on informed consent, which is central to empowering producers of data. We discuss some of the complexities associated with informed consent and consider studies of producers' perceptions to inform research ethics guidelines and practice. The second consideration focuses on the researcher; similarly, we explore studies of researchers' perceptions and experiences.

Keywords: big data, ethics, producers’ perceptions, researchers’ perceptions

26620 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has lately become an important business and research priority. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends from these data would aid understanding and decision-making. Multiple analysis techniques have been deployed for English content. However, Arabic, one of the languages that produce a large amount of data on social networks, is among the least analyzed. This paper surveys research efforts to analyze Arabic content on Twitter, focusing on the tools and methods used to extract sentiment from it.

Keywords: big data, social networks, sentiment analysis, twitter

26619 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are built from source code and process metrics from the same or a previous version of the code, together with the related fault data. Some companies do not store or keep track of all the artifacts required for software fault prediction; to construct fault prediction models for such companies, training data from other projects is one potential solution. The earlier a fault is predicted, the less it costs to correct. The training data consist of metrics data and related fault data at the function/module level. This paper investigates early-stage fault prediction using cross-project data, focusing on design metrics. An empirical analysis is carried out to validate design metrics for cross-project fault prediction, using Naïve Bayes as the machine learning technique. The design-phase metrics of other projects can serve as an initial guideline for projects where no previous fault data is available. We analyze seven datasets from the NASA Metrics Data Program, which offer design as well as code metrics. Overall, the cross-project results are comparable to those of within-company learning.
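
The evaluation setup, training on one project's design metrics and testing on another's, can be sketched with a minimal Gaussian Naive Bayes classifier written in NumPy. The two synthetic "projects" and their two metric columns are hypothetical; the seven NASA datasets and the exact Naive Bayes variant used in the study are not reproduced here.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: per-class feature means/variances plus class priors."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Small variance floor avoids division by zero for constant metrics.
        self.vars = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, X):
        # Gaussian log-likelihood per (sample, class), summed over features assumed independent.
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None, :, :]
                          + (X[:, None, :] - self.means[None, :, :]) ** 2
                          / self.vars[None, :, :]).sum(axis=2)
        return self.classes[np.argmax(log_lik + np.log(self.priors), axis=1)]

rng = np.random.default_rng(42)

def make_project(n):
    """Hypothetical project: two design metrics per module, label 1 = faulty."""
    clean = rng.normal([2.0, 3.0], 1.0, size=(n, 2))
    faulty = rng.normal([8.0, 9.0], 1.0, size=(n, 2))
    return np.vstack([clean, faulty]), np.array([0] * n + [1] * n)

X_a, y_a = make_project(50)   # source project (has fault data)
X_b, y_b = make_project(30)   # target project (fault data used only for scoring)
model = GaussianNaiveBayes().fit(X_a, y_a)
accuracy = np.mean(model.predict(X_b) == y_b)
print(accuracy)
```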

Keywords: software metrics, fault prediction, cross project, within project

26618 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I. Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better tumor subtypes. Identifying cancer subtypes helps improve the efficacy and reduce the toxicity of treatments by providing clues for finding targeted therapeutics. The gene expression data analysis process is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important because genomic data are high-dimensional, with a large number of features compared to samples. Hierarchical clustering and K-means are often used in the analysis of gene expression data. Several techniques are available for validating the resulting clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.
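
The clustering step can be sketched as plain Lloyd's k-means over samples, assuming each tumor sample is a vector of already preprocessed and feature-selected gene expression values. Initializing from the first k samples is an illustrative simplification; practical analyses usually use random restarts or k-means++ seeding. The expression values below are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; centroids start at the first k points (deterministic, for illustration)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical samples (rows) x genes (columns).
samples = [[1.0, 0.9, 0.1], [0.1, 0.2, 1.1], [1.1, 1.0, 0.0], [0.0, 0.1, 0.9]]
labels, centers = kmeans(samples, k=2)
print(labels)  # [0, 1, 0, 1]
```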

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 121
26617 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

As big data analysis becomes more important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and the analysis of failure causes are closely related. The purpose of this study is to analyze the patterns that affect final test results using die-map-based clustering. Much research has been conducted using die data from the semiconductor test process. However, such analysis is limited because the test data are only indirectly related to the final test results. Therefore, this study proposes an analysis framework based on clustering more detailed data than existing die data. The study consists of three phases. In the first phase, a die map is created from the fail bit data in each sub-area of the die. In the second phase, clustering is performed on the map data. In the third phase, the patterns that affect the final test result are identified. Finally, the proposed three-phase framework is applied to actual industrial data, and the experimental results show its potential for field application.

Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process

Procedia PDF Downloads 378
26616 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze the expression of thousands of genes simultaneously, which is a very important task for drug development and testing, functional annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithms have been used as an alternative approach to identifying structures in gene expression data. In this paper, we introduce a transform technique based on singular value decomposition (SVD) to obtain a normalized matrix of gene expression data, followed by a Mixed-Clustering algorithm and a Lift algorithm, inspired by the node-deletion and node-addition phases proposed by Cheng and Church and based on Agglomerative Hierarchical Clustering (AHC). An experimental study on standard datasets demonstrated the effectiveness of the algorithm on gene expression data.
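
For background on the Cheng and Church phases mentioned above, their node-deletion and node-addition steps are driven by the mean squared residue H(I, J) of a candidate bicluster (rows I, columns J): the average of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and overall means. The abstract does not state which score the Mixed-Clustering or Lift algorithms use, so this is only a sketch of the Cheng and Church score itself, on a hypothetical matrix. A perfectly additive bicluster scores 0.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng and Church mean squared residue H(I, J) of the submatrix rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[a][b] for a in range(n_rows)) / n_rows for b in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for a in range(n_rows):
        for b in range(n_cols):
            residue = sub[a][b] - row_means[a] - col_means[b] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: columns 0-2 follow an additive row+column pattern.
expr = [[1, 2, 3, 9], [2, 3, 4, 0], [5, 6, 7, 1]]
h_coherent = mean_squared_residue(expr, [0, 1, 2], [0, 1, 2])
h_all = mean_squared_residue(expr, [0, 1, 2], [0, 1, 2, 3])
print(round(h_coherent, 6), h_all > 1.0)  # 0.0 True
```

In the Cheng and Church scheme, rows or columns are deleted while H exceeds a user-chosen threshold δ, then added back as long as H stays below it.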

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 254
26615 Analysis and Rule Extraction of Coronary Artery Disease Data Using Data Mining

Authors: Rezaei Hachesu Peyman, Oliyaee Azadeh, Salahzadeh Zahra, Alizadeh Somayyeh, Safaei Naser

Abstract:

Coronary Artery Disease (CAD) is a major cause of disability in adults and one of the main causes of death in developed countries. In this study, data mining techniques including Decision Trees, Artificial Neural Networks (ANNs), and Support Vector Machines (SVM) are used to analyze CAD data. Data from 4,948 patients who had suffered from heart disease were included in the analysis. CAD is the target variable, and 24 input (predictor) variables are used for the classification. The performance of these techniques is compared in terms of sensitivity, specificity, and accuracy. The most significant factor influencing CAD is chest pain. Elderly males (age > 53) have a high probability of being diagnosed with CAD. The SVM algorithm is the most useful for evaluating and predicting CAD patients as compared to non-CAD ones. Applying data mining techniques to coronary artery disease data is a good method for investigating the existing relationships between variables.
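
The three comparison measures follow from the confusion matrix: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / N. A minimal sketch, assuming CAD is coded as 1 and non-CAD as 0; the labels below are hypothetical and not drawn from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy; NaN when a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of CAD patients detected
        "specificity": ratio(tn, tn + fp),  # share of non-CAD patients cleared
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical labels: 1 = CAD, 0 = non-CAD.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # {'sensitivity': 0.75, 'specificity': 1.0, 'accuracy': 0.875}
```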

Keywords: classification, coronary artery disease, data-mining, knowledge discovery, extract

Procedia PDF Downloads 633
26614 The Marketing Strategies of Five-Star Rated Herbal Businesses of One Tambon One Product (OTOP) Entrepreneurs in Songkhla Province, Thailand

Authors: S. Lungtae, C. Noknoi

Abstract:

The main purpose of this research is to analyze the marketing strategies of the various five-star rated herbal businesses of One Tambon One Product (OTOP) entrepreneurs in Songkhla province, Thailand. This includes their targeting, positioning, and marketing mix, with the aim of developing marketing strategies for OTOP entrepreneurs. The data were collected from the presidents of herbal-product enterprises in Songkhla province. The products of all these enterprises were selected as five-star herbal products for the OTOP project in 2012. In-depth interviews were conducted, and content analysis was used to analyze the data. The research found that the community enterprises should 1) increase the range of product sizes offered, 2) increase their distribution channels, 3) publicize more to inform consumers about their identities and products, 4) undertake promotional activities during festivals, and 5) choose salespeople who are knowledgeable about the features of their products.

Keywords: marketing mix, market positioning, marketing strategies, target market.

Procedia PDF Downloads 268
26613 Analysis of Creative City Indicators in Isfahan City, Iran

Authors: Reza Mokhtari Malek Abadi, Mohsen Saghaei, Fatemeh Iman

Abstract:

This paper investigates the indices of a creative city in Isfahan. Its main aim is to evaluate the quantitative status of the creative city indices in Isfahan and to analyze the dispersion and distribution of these indices across the city. To this end, the study analyzes the creative city indices in the fifteen areas of Isfahan using secondary data, a questionnaire, the TOPSIS model, Shannon entropy, and SPSS. On this basis, the fifteen areas of Isfahan have been ranked on 12 creative city factors. The results show that the fifteen areas of Isfahan do not benefit equally from the creative city indices and that there are large differences between them.
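
A common way to combine the two methods named above is to derive criterion weights from Shannon entropy (indicators that vary more across areas receive more weight) and then rank areas by TOPSIS closeness to the ideal solution. The sketch below assumes that combination, treats every indicator as benefit-type, and requires positive values and at least two areas; the paper's exact weighting may differ, and the indicator scores are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Shannon-entropy criterion weights; assumes >= 2 alternatives and positive values."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        divergences.append(1.0 - entropy)  # more spread across areas -> more weight
    s = sum(divergences)
    return [d / s for d in divergences] if s > 0 else [1.0 / n] * n


def topsis(matrix, weights):
    """Closeness coefficient per alternative, treating every criterion as benefit-type."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    ideal = [max(r[j] for r in weighted) for j in range(n)]
    anti_ideal = [min(r[j] for r in weighted) for j in range(n)]
    scores = []
    for r in weighted:
        d_plus, d_minus = math.dist(r, ideal), math.dist(r, anti_ideal)
        scores.append(d_minus / (d_plus + d_minus) if d_plus + d_minus > 0 else 0.5)
    return scores


# Hypothetical scores of three urban areas on two creative-city indicators.
areas = [[8, 9], [2, 3], [5, 6]]
weights = entropy_weights(areas)
closeness = topsis(areas, weights)
ranking = sorted(range(len(areas)), key=lambda i: closeness[i], reverse=True)
print(ranking)  # [0, 2, 1]
```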

Keywords: grading, creative city, creative city evaluation indicators, regional planning model

Procedia PDF Downloads 438
26612 Impact of Stack Caches: Locality Awareness and Cost Effectiveness

Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang

Abstract:

Treating data according to its location in memory has received much attention in recent years, because data in different memory regions have different properties that matter for cache utilization. Stack data and non-stack data may interfere with each other's locality in the data cache. One of the important properties of stack data is its high spatial and temporal locality. In this work, we simulate a non-unified cache design that splits the data cache into stack and non-stack caches in order to keep stack data and non-stack data separate. We observe that the overall hit rate of the non-unified cache design is sensitive to the size of the non-stack cache. We then investigate the appropriate size and associativity for the stack cache to achieve a high hit ratio, especially when over 99% of accesses are directed to the stack cache. The results show that, on average, a stack cache hit rate above 99% is achieved with 2 KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when a small, fixed-size stack cache is added at level 1 to a unified cache architecture. The results show that adding a 1 KB stack cache improves the overall hit rate of the unified cache design by approximately 3.9% on average for the Rijndael benchmark. The stack cache is simulated using the SimpleScalar toolset.
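
The split design can be illustrated with a toy trace-driven simulator in the spirit of the SimpleScalar experiments (this is not SimpleScalar). It assumes a hypothetical fixed address boundary separating stack from non-stack data, 32-byte blocks, and LRU replacement; with ways=1 each cache is direct-mapped, matching the 2 KB, 1-way stack cache configuration above. Because the caches are separate, the conflicting strided heap accesses cannot evict the hot stack frame.

```python
from collections import OrderedDict

STACK_BASE = 0x7FFF0000  # hypothetical boundary: addresses at or above it count as stack data


class Cache:
    """Set-associative cache with LRU replacement; ways=1 gives a direct-mapped cache."""

    def __init__(self, capacity_bytes, block_bytes=32, ways=1):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = capacity_bytes // (block_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, addr):
        block_addr = addr // self.block_bytes
        cache_set = self.sets[block_addr % self.num_sets]
        tag = block_addr // self.num_sets
        self.accesses += 1
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used block
        cache_set[tag] = True
        return False

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


def run_split(trace, stack_cache, data_cache):
    """Route each access to the stack cache or the non-stack cache by address."""
    for addr in trace:
        target = stack_cache if addr >= STACK_BASE else data_cache
        target.access(addr)


# Hypothetical trace: a hot 256-byte stack frame interleaved with strided heap accesses.
trace = []
for _ in range(10):
    for off in range(0, 256, 4):
        trace.append(STACK_BASE + off)
        trace.append(0x1000 + off * 1024)

stack_cache = Cache(2 * 1024, ways=1)  # 2 KB, direct-mapped
data_cache = Cache(8 * 1024, ways=1)   # 8 KB, direct-mapped
run_split(trace, stack_cache, data_cache)
print(f"stack hit rate: {stack_cache.hit_rate():.4f}")  # stack hit rate: 0.9875
```

Only the 8 cold misses on the stack frame's blocks are lost (632 hits out of 640 stack accesses), while every strided heap access misses in the non-stack cache.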

Keywords: hit rate, locality of program, stack cache, stack data

Procedia PDF Downloads 280
26611 A Study of Variables Affecting on a Quality Assessment of Mathematics Subject in Thailand by Using Value Added Analysis on TIMSS 2011

Authors: Ruangdech Sirikit

Abstract:

The purpose of this research was to study the variables affecting the quality assessment of the mathematics subject in Thailand using value-added analysis on TIMSS 2011. The data used in this research are secondary data from the 2011 Trends in International Mathematics and Science Study (TIMSS), collected from 6,124 students in 172 schools in Thailand, covering mathematics only. The data were based on 14 assessment tests of mathematics knowledge. Data analysis had three steps: 1) computing descriptive statistics; 2) estimating student competency from the assessment of their mathematics proficiency using the MULTILOG program; and 3) performing value-added analysis in the quality assessment model using a Value-Added Model with two-level Hierarchical Linear Modeling (HLM). The results were as follows. 1. Student-level variables with significant effects on student competency at the .01 level were parental care, resources at home, enjoyment of learning mathematics, and extrinsic motivation in learning mathematics. Variables with significant effects on student competency at the .05 level were parents' education and self-confidence in learning mathematics. 2. The school-level variable with a significant effect on student competency at the .01 level was being an extra-large school. The variable with a significant effect on student competency at the .05 level was being a medium-sized school.

Keywords: quality assessment, value-added model, TIMSS, mathematics, Thailand

Procedia PDF Downloads 260
26610 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis

Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales

Abstract:

This paper shows how electrical energy consumption and production data analysis were used to find opportunities to save energy at the Taboada wastewater treatment plant in Callao, Peru. To access the data, independent data networks were used for both electrical and process instruments. The data were then analyzed under an ISO 50001 energy audit, which considered Energy Performance Indexes for each process and followed a step-by-step guide presented in this text. Using this methodology, together with data mining techniques applied to information gathered by electronic multimeters (placed on substation switchboards connected to a cloud network), it was possible to characterize the performance of each process in detail and thus reveal previously hidden saving opportunities. The data analysis reduced both costs and energy consumption, allowing the plant to save significant resources and to be certified under ISO 50001.
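
ISO 50001 lets an organization define its own Energy Performance Indexes; one common form for a treatment plant is energy consumed per unit of water treated (kWh/m³) for each process, compared against a baseline period. The sketch below assumes that definition. The process names, meter readings, and baseline values are hypothetical and are not figures from the Taboada plant.

```python
def energy_performance_indexes(energy_kwh, treated_m3):
    """kWh per cubic metre treated, for each process with non-zero throughput."""
    return {
        process: energy_kwh[process] / treated_m3[process]
        for process in energy_kwh
        if treated_m3.get(process)
    }


def improvement_pct(baseline, current):
    """Percentage reduction of each EnPI relative to a baseline period."""
    return {
        process: 100.0 * (baseline[process] - current[process]) / baseline[process]
        for process in current
        if baseline.get(process)
    }


# Hypothetical monthly figures; process names and values are illustrative only.
energy = {"pumping": 12000.0, "screening": 1500.0}
volume = {"pumping": 40000.0, "screening": 40000.0}
current = energy_performance_indexes(energy, volume)
baseline = {"pumping": 0.35, "screening": 0.0375}
savings = improvement_pct(baseline, current)
print(current)  # {'pumping': 0.3, 'screening': 0.0375}
```

A falling EnPI at constant throughput indicates a genuine efficiency gain, which is why the index is normalized by production rather than tracking raw kWh.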

Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis

Procedia PDF Downloads 167