Search results for: transactional data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 40640

Search results for: transactional data analysis

40610 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most of Data Envelopment Analysis models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp Data Envelopment Analysis into Data Envelopment Analysis with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the Data Envelopment Analysis model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units' efficiency. Finally, the developed Data Envelopment Analysis model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 18
40609 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 822
40608 Enhancing Financial Security: Real-Time Anomaly Detection in Financial Transactions Using Machine Learning

Authors: Ali Kazemi

Abstract:

The digital evolution of financial services, while offering unprecedented convenience and accessibility, has also escalated the vulnerabilities to fraudulent activities. In this study, we introduce a distinct approach to real-time anomaly detection in financial transactions, aiming to fortify the defenses of banking and financial institutions against such threats. Utilizing unsupervised machine learning algorithms, specifically autoencoders and isolation forests, our research focuses on identifying irregular patterns indicative of fraud within transactional data, thus enabling immediate action to prevent financial loss. The data we used in this study included the monetary value of each transaction. This is a crucial feature as fraudulent transactions may have distributions of different amounts than legitimate ones, such as timestamps indicating when transactions occurred. Analyzing transactions' temporal patterns can reveal anomalies (e.g., unusual activity in the middle of the night). Also, the sector or category of the merchant where the transaction occurred, such as retail, groceries, online services, etc. Specific categories may be more prone to fraud. Moreover, the type of payment used (e.g., credit, debit, online payment systems). Different payment methods have varying risk levels associated with fraud. This dataset, anonymized to ensure privacy, reflects a wide array of transactions typical of a global banking institution, ranging from small-scale retail purchases to large wire transfers, embodying the diverse nature of potentially fraudulent activities. By engineering features that capture the essence of transactions, including normalized amounts and encoded categorical variables, we tailor our data to enhance model sensitivity to anomalies. The autoencoder model leverages its reconstruction error mechanism to flag transactions that deviate significantly from the learned normal pattern, while the isolation forest identifies anomalies based on their susceptibility to isolation from the dataset's majority. Our experimental results, validated through techniques such as k-fold cross-validation, are evaluated using precision, recall, and the F1 score alongside the area under the receiver operating characteristic (ROC) curve. Our models achieved an F1 score of 0.85 and a ROC AUC of 0.93, indicating high accuracy in detecting fraudulent transactions without excessive false positives. This study contributes to the academic discourse on financial fraud detection and provides a practical framework for banking institutions seeking to implement real-time anomaly detection systems. By demonstrating the effectiveness of unsupervised learning techniques in a real-world context, our research offers a pathway to significantly reduce the incidence of financial fraud, thereby enhancing the security and trustworthiness of digital financial services.

Keywords: anomaly detection, financial fraud, machine learning, autoencoders, isolation forest, transactional data analysis

Procedia PDF Downloads 24
40607 Delivering User Context-Sensitive Service in M-Commerce: An Empirical Assessment of the Impact of Urgency on Mobile Service Design for Transactional Apps

Authors: Daniela Stephanie Kuenstle

Abstract:

Complex industries such as banking or insurance experience slow growth in mobile sales. While today’s mobile applications are sophisticated and enable location based and personalized services, consumers prefer online or even face-to-face services to complete complex transactions. A possible reason for this reluctance is that the provided service within transactional mobile applications (apps) does not adequately correspond to users’ needs. Therefore, this paper examines the impact of the user context on mobile service (m-service) in m-commerce. Motivated by the potential which context-sensitive m-services hold for the future, the impact of temporal variations as a dimension of user context, on m-service design is examined. In particular, the research question asks: Does consumer urgency function as a determinant of m-service composition in transactional apps by moderating the relation between m-service type and m-service success? Thus, the aim is to explore the moderating influence of urgency on m-service types, which includes Technology Mediated Service and Technology Generated Service. While mobile applications generally comprise features of both service types, this thesis discusses whether unexpected urgency changes customer preferences for m-service types and how this consequently impacts the overall m-service success, represented by purchase intention, loyalty intention and service quality. An online experiment with a random sample of N=1311 participants was conducted. Participants were divided into four treatment groups varying in m-service types and urgency level. They were exposed to two different urgency scenarios (high/ low) and two different app versions conveying either technology mediated or technology generated service. Subsequently, participants completed a questionnaire to measure the effectiveness of the manipulation as well as the dependent variables. The research model was tested for direct and moderating effects of m-service type and urgency on m-service success. Three two-way analyses of variance confirmed the significance of main effects, but demonstrated no significant moderation of urgency on m-service types. The analysis of the gathered data did not confirm a moderating effect of urgency between m-service type and service success. Yet, the findings propose an additive effects model with the highest purchase and loyalty intention for Technology Generated Service and high urgency, while Technology Mediated Service and low urgency demonstrate the strongest effect for service quality. The results also indicate an antagonistic relation between service quality and purchase intention depending on the level of urgency. Although a confirmation of the significance of this finding is required, it suggests that only service convenience, as one dimension of mobile service quality, delivers conditional value under high urgency. This suggests a curvilinear pattern of service quality in e-commerce. Overall, the paper illustrates the complex interplay of technology, user variables, and service design. With this, it contributes to a finer-grained understanding of the relation between m-service design and situation dependency. Moreover, the importance of delivering situational value with apps depending on user context is emphasized. Finally, the present study raises the demand to continue researching the impact of situational variables on m-service design in order to develop more sophisticated m-services.

Keywords: mobile consumer behavior, mobile service design, mobile service success, self-service technology, situation dependency, user-context sensitivity

Procedia PDF Downloads 249
40606 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 353
40605 A Functional Analysis of a Political Leader in Terms of Marketing

Authors: Aşina Gülerarslan, M. Faik Özdengül

Abstract:

The new economic, social and political world order has led to the emergence of a wide range of persuasion strategies and practices based on an ever expanding marketing axis that involves organizations, ideas and persons as well as products and services. It is seen that since the 1990's, a wide variety of competitive marketing ideas have been offered systematically to target audiences in the field of politics as in other fields. When the components of marketing are taken into consideration, all kinds of communication efforts involving “political leaders”, who are conceptualized as products in terms of political marketing, serve a process of social persuasion, which cannot be restricted to election periods only, and a manageable “image”. In this context, image, which is concerned with how the political product is perceived, involves not only the political discourses shared with the public but also all kinds of biographical information about the leader, the leader’s specific way of living and routines and his/her attitudes and behaviors in their private lives, and all these are regarded as components of the “product image”. While on the one hand the leader’s verbal or supra-verbal references serve the way the “spirit of the product” is perceived –just as in brand positioning- they also show their self-esteem levels, in other words how they perceive themselves on the other hand. Indeed, their self-esteem levels are evaluated in three fundamental categories in the “Functional Analysis”, namely parent, child and adult, and it is revealed that the words, tone of voice and body language a person uses makes it easy to understand at what self-esteem level that person is. In this context, words, tone of voice and body language, which provide important clues as to the “self” of the person, are also an indication of how political leaders evaluate both “themselves” and “the mass/audience” in the communication they establish with their audiences. When the matter is taken from the perspective of Turkey, the levels of self-esteem in the relationships that the political leaders establish with the masses are also important in revealing how our society is seen from the perspective of a specific leader. Since the leader is a part of the marketing strategy of a political party as a product, this evaluation is significant in terms of the forms of relationships between political institutions in our country with the society. In this study, the self-esteem level in the documentary entitled “Master’s Story”, where Recep Tayyip Erdoğan’s life history is told, is analyzed in the context of words, tone of voice and body language. Within the scope of the study, at what level of self-esteem Recep Tayyip Erdoğan was in the “Master’s Story”, a documentary broadcast on Beyaz TV, was investigated using the content analysis method. First, based on the Functional Analysis Literature, a transactional approach scale was created regarding parent, adult and child self-esteem levels. On the basis of this scale, the prime minister’s self-esteem level was determined in three basic groups, namely “tone of voice”, “the words he used” and “body language”. Descriptive analyses were made to the data within the framework of these criteria and at what self-esteem level the prime minister spoke throughout the documentary was revealed.

Keywords: political marketing, leader image, level of self-esteem, transactional approach

Procedia PDF Downloads 312
40604 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 348
40603 Employee Commitment as a Means of Revitalising the Hospitality Industry post-Covid: Considering the Impact of Psychological Contract and Psychological Capital

Authors: Desere Kokt

Abstract:

Hospitality establishments worldwide are bearing the brunt of the effects of Covid-19. As the hospitality industry is looking to recover, emphasis is placed on rejuvenating the industry. This is especially pertinent for economic development in areas of high unemployment, such as the Free State province of South Africa. The province is not a main tourist area and thus depends on the influx of tourists. The province has great scenic beauty with many accommodation establishments that provide job opportunities to the local population. The two main economic hubs of the Free State province namely Bloemfontein and Clarens, were the focus of the investigation. The emphasis was on graded accommodation establishments as they must adhere to the quality principles of the Tourism Grading Council of South Africa (TGCSA) to obtain star grading. The hospitality industry is known for being labour intensive, and employees need to be available to cater for the needs of paying customers. This is referred to as ‘emotional labour’ and implies that employees need to manage their feelings and emotions as part of performing their jobs. The focus of this study was thus on psychological factors related to working in the hospitality industry – specifically psychological contract and psychological capital and its impact on the commitment of employees in graded accommodation establishments. Employee commitment can be explained as a psychological state that binds the individual to the organisation and involves a set of psychological relationships that include affective (emotions), normative (perceived obligation) and continuance (staying with the organisation) dimensions. Psychological contract refers to the reciprocal beliefs and expectations between the employer and the employee and consists of transactional and rational contracts. Transactional contracts are associated with the economic exchange, and contractional issues related to the employment contract and rational contracts relate to the social exchange between the employee and the organisation. Psychological capital refers to an individual’s positive psychology state of development that is characterised by self-efficiency (having confidence in doing one’s job), optimism (being positive and persevering towards achieving one’s goals), hope (expectations for goals to succeed) and resilience (bouncing back to attain success when beset by problems and adversity). The study employed a quantitative research approach, and a structured questionnaire was used to gather data from respondents. The study was conducted during the Covid-19 pandemic, which hampered the data gathering efforts of the researchers. Many accommodation establishments were either closed or temporarily closed, which meant that data gathering was an intensive and laborious process. The main researcher travelled to the various establishments to collect the data. Nine hospitality establishments participated in the study, and around 150 employees were targeted for data collection. Ninety-two (92) questionnaires were completed, which represents a response rate of 61%. Data were analysed using descriptive and inferential statistics, and partial least squares structural equation modelling (PLS-SEM) was applied to examine the relationship between the variables.

Keywords: employee commitment, hospitality industry, psychological contract, psychological capital

Procedia PDF Downloads 78
40602 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm

Procedia PDF Downloads 119
40601 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 411
40600 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.

Keywords: text mining, topic extraction, independent, incremental, independent component analysis

Procedia PDF Downloads 281
40599 A Review of Spatial Analysis as a Geographic Information Management Tool

Authors: Chidiebere C. Agoha, Armstong C. Awuzie, Chukwuebuka N. Onwubuariri, Joy O. Njoku

Abstract:

Spatial analysis is a field of study that utilizes geographic or spatial information to understand and analyze patterns, relationships, and trends in data. It is characterized by the use of geographic or spatial information, which allows for the analysis of data in the context of its location and surroundings. It is different from non-spatial or aspatial techniques, which do not consider the geographic context and may not provide as complete of an understanding of the data. Spatial analysis is applied in a variety of fields, which includes urban planning, environmental science, geosciences, epidemiology, marketing, to gain insights and make decisions about complex spatial problems. This review paper explores definitions of spatial analysis from various sources, including examples of its application and different analysis techniques such as Buffer analysis, interpolation, and Kernel density analysis (multi-distance spatial cluster analysis). It also contrasts spatial analysis with non-spatial analysis.

Keywords: aspatial technique, buffer analysis, epidemiology, interpolation

Procedia PDF Downloads 282
40598 Architectural Experience of the Everyday in Bangkok CBD

Authors: Thirayu Jumsai Na Ayudhya

Abstract:

The attempt to understand about what architecture means to people as they go about their everyday life revealed that knowledge such as environmental psychology, environmental perception, environmental aesthetics, inadequately address the contextualized and holistic theoretical framework. In my previous research, it was found that people’s making senses of their everyday architecture can be addressed in terms of four super‐ordinate themes; (1) building in urban (text), (2) building in (text), (3) building in human (text), (4) and building in time (text). In this research, Bangkok CBD was selected as the focal urban context that the integrated style of architecture is noticeable. It is expected that in a unique urban context like Bangkok CBD unprecedented super-ordinate themes will be unveiled through the reflection of people’s everyday experiences. In this research, people’s architectural experience conducted in Bangkok CBD, Thailand, will be presented succinctly. The research addresses the question of how do people make sense of their everyday architecture/buildings especially in a unique urban context, Bangkok CBD, and identifies ways in which people make sense of their everyday architecture. Two key methodologies are adopted. First, Participant-Produced-Photograph (PPP) allows people to express their experiences of the everyday urban context freely without any interference or forced-data generating by researchers. Second, Interpretative Phenomenological Analysis (IPA) are also applied as main methodologies. With IPA methodology, a small pool of participants is considered giving the detailed level of analysis and its potential to produce a meaningful outcome.

Keywords: architectural experience, building appreciation, design psychology, environmental psychology, sense-making, the everyday experience, transactional theory

Procedia PDF Downloads 301
40597 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates. On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.

Keywords: aggregate data, combined-level data, individual patient data, meta-analysis

Procedia PDF Downloads 351
40596 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 415
40595 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee

Abstract:

Data is growing everyday. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing a large increase in the growth of data. It is a rich source especially for sentiment detection or mining since people often express honest opinion through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges since these are unstructured data with abbreviations and without a strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 505
40594 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 118
40593 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 348
40592 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills

Authors: Kyle De Freitas, Margaret Bernard

Abstract:

Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.

Keywords: educational data mining, learning management system, learning analytics, EDM framework

Procedia PDF Downloads 297
40591 Quantile Coherence Analysis: Application to Precipitation Data

Authors: Yaeji Lim, Hee-Seok Oh

Abstract:

The coherence analysis measures the linear time-invariant relationship between two data sets and has been studied various fields such as signal processing, engineering, and medical science. However classical coherence analysis tends to be sensitive to outliers and focuses only on mean relationship. In this paper, we generalized cross periodogram to quantile cross periodogram and provide richer inter-relationship between two data sets. This is a general version of Laplace cross periodogram. We prove its asymptotic distribution under the long range process and compare them with ordinary coherence through numerical examples. We also present real data example to confirm the usefulness of quantile coherence analysis.

Keywords: coherence, cross periodogram, spectrum, quantile

Procedia PDF Downloads 364
40590 Modeling and Statistical Analysis of a Soap Production Mix in Bejoy Manufacturing Industry, Anambra State, Nigeria

Authors: Okolie Chukwulozie Paul, Iwenofu Chinwe Onyedika, Sinebe Jude Ebieladoh, M. C. Nwosu

Abstract:

The research work is based on the statistical analysis of the processing data. The essence is to analyze the data statistically and to generate a design model for the production mix of soap manufacturing products in Bejoy manufacturing company Nkpologwu, Aguata Local Government Area, Anambra state, Nigeria. The statistical analysis shows the statistical analysis and the correlation of the data. T test, Partial correlation and bi-variate correlation were used to understand what the data portrays. The design model developed was used to model the data production yield and the correlation of the variables show that the R2 is 98.7%. However, the results confirm that the data is fit for further analysis and modeling. This was proved by the correlation and the R-squared.

Keywords: General Linear Model, correlation, variables, pearson, significance, T-test, soap, production mix and statistic

Procedia PDF Downloads 413
40589 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis

Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales

Abstract:

This paper intends to show how electrical energy consumption and production data analysis were used to find opportunities to save energy at Taboada wastewater treatment plant in Callao, Peru. In order to access the data, it was used independent data networks for both electrical and process instruments, which were taken to analyze under an ISO 50001 energy audit, which considered, thus, Energy Performance Indexes for each process and a step-by-step guide presented in this text. Due to the use of aforementioned methodology and data mining techniques applied on information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus, evidence saving opportunities which were previously hidden before. The data analysis brought both costs and energy reduction, allowing the plant to save significant resources and to be certified under ISO 50001.

Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis

Procedia PDF Downloads 167
40588 A Method of Detecting the Difference in Two States of Brain Using Statistical Analysis of EEG Raw Data

Authors: Digvijaysingh S. Bana, Kiran R. Trivedi

Abstract:

This paper introduces various methods for the alpha wave to detect the difference between two states of brain. One healthy subject participated in the experiment. EEG was measured on the forehead above the eye (FP1 Position) with reference and ground electrode are on the ear clip. The data samples are obtained in the form of EEG raw data. The time duration of reading is of one minute. Various test are being performed on the alpha band EEG raw data.The readings are performed in different time duration of the entire day. The statistical analysis is being carried out on the EEG sample data in the form of various tests.

Keywords: electroencephalogram(EEG), biometrics, authentication, EEG raw data

Procedia PDF Downloads 440
40587 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 57
40586 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed as "big data." The technique of big data mining and big data analysis is extremely helpful for business movements such as making decisions, building organisational plans, researching the market efficiently, improving sales, etc., because typical management tools cannot handle such complicated datasets. Special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations, are brought on by big data. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining, its process, along with problems and difficulties, with a focus on the unique characteristics of big data. Organizations have several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is really analyzed. The idea of the mining and analysis of data and knowledge discovery techniques that have recently been created with practical application systems is presented in this study. This article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the management's main big data and data mining challenges.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 84
40585 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014

Authors: Alexiou Dimitra, Fragkaki Maria

Abstract:

The objective of the paper is the study of geographic, economic and educational variables and their contribution to determine the position of each member-state among the EU-28 countries based on the values of seven variables as given by Eurostat. The Data Analysis methods of Multiple Factorial Correspondence Analysis (MFCA) Principal Component Analysis and Factor Analysis have been used. The cross tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are manipulated using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in arithmetic and graphical form. For comparison reasons with the same data the Factor procedure of Statistical package IBM SPSS 20 has been used. The numerical and graphical results presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.

Keywords: Multiple Factorial Correspondence Analysis, Principal Component Analysis, Factor Analysis, E.U.-28 countries, Statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu Statistics

Procedia PDF Downloads 486
40584 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.

Keywords: big data, social networks, sentiment analysis, twitter

Procedia PDF Downloads 541
40583 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research

Authors: Carla Silva

Abstract:

Recent development of information and communication technology enables us to acquire, collect, analyse data in various fields of socioeconomic – technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to effective and efficient usage of massive amount of educational data, in order to support institutions to a strategic planning and investment decision-making. In this article, we will address data from several different perspectives and define the applied data to sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in educational and furthermore in economics. Finally, we highlight a number of challenges and opportunities for future research.

Keywords: data mining, research analysis, investment decision-making, educational research

Procedia PDF Downloads 331
40582 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 517
40581 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue

Authors: Rachel Y. Zhang, Christopher K. Anderson

Abstract:

A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically hotels have used simple time series models (regression and/or pick-up based models) owing to the complexities of trying to build casual models of demands. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand – focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support vector machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performances of above approaches to forecasting hotel demand are illustrated by using a proprietary sample of the market level (24 properties) transactional data for Las Vegas NV. Causal predictive models can be built and evaluated owing to the availability of market level (versus firm level) data. This research also compares and contrast model accuracy of firm-level models (i.e. predictive models for hotel A only using hotel A’s data) to models using market level data (prices, review scores, location, chain scale, etc… for all hotels within the market). The prospected models will be valuable for hotel revenue prediction given the basic characters of a hotel property or can be applied in performance evaluation for an existed hotel. The findings will unveil the features that play key roles in a hotel’s revenue performance, which would have considerable potential usefulness in both revenue prediction and evaluation.

Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine

Procedia PDF Downloads 108