Search results for: data mining analytics
25444 Generating Real-Time Visual Summaries from Located Sensor-Based Data with Chorems
Authors: Z. Bouattou, R. Laurini, H. Belbachir
Abstract:
This paper describes a new approach for automatically generating visual summaries that combines cartographic visualization methods with real-time modeling of sensor data. The concept of chorems is an interesting candidate for visualizing real-time summaries of geographic databases. Chorems were defined by Roger Brunet (1980) as schematized visual representations of territories. However, existing chorematic map approaches do not yet handle time information, an issue this paper addresses. Our approach is based on spatial analysis: the values recorded at the same time by the available sensors provide a set of distributed observations over the study area, and spatial interpolation methods are applied to these observations to estimate concentration fields. From these fields, spatial data mining procedures run on the fly can extract important patterns as geographic rules. These patterns are then visualized as chorems.
Keywords: geovisualization, spatial analytics, real-time, geographic data streams, sensors, chorems
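The abstract does not name the interpolation method used to build concentration fields from simultaneous sensor readings. A minimal sketch, assuming inverse distance weighting (IDW) as a stand-in (kriging or splines are common alternatives), shows how one timestamp of point readings becomes a grid; the function names are hypothetical.

```python
import math

def idw(sensors, x, y, power=2.0):
    """Inverse distance weighting estimate at (x, y).

    sensors: list of (sx, sy, value) readings taken at the same timestamp.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in sensors:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # query point coincides with a sensor location
        w = 1.0 / d ** power  # closer sensors get larger weights
        num += w * value
        den += w
    return num / den

def concentration_field(sensors, xs, ys, power=2.0):
    """Estimate a grid of values; rows are indexed by y, columns by x."""
    return [[idw(sensors, x, y, power) for x in xs] for y in ys]

readings = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0), (10, 10, 40.0)]
field = concentration_field(readings, [0, 5, 10], [0, 5, 10])
```

The grid cell at (5, 5) is equidistant from all four sensors, so it receives their plain average (25.0); pattern extraction (for example, flagging cells above a threshold) would then run on `field`.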
25443 A Hybrid Data Mining Algorithm Based System for Intelligent Defence Mission Readiness and Maintenance Scheduling
Authors: Shivam Dwivedi, Sumit Prakash Gupta, Durga Toshniwal
Abstract:
Keeping defence forces in the highest state of combat readiness under budgetary constraints is a challenging task today. A large amount of time and money is wasted on unnecessary and expensive traditional maintenance activities. To overcome this limitation, the Defence Intelligent Mission Readiness and Maintenance Scheduling System is proposed, which improves the maintenance system by diagnosing equipment condition and predicting maintenance requirements. Based on new data mining algorithms, the system optimises mission readiness for imminent operations and maintenance scheduling in repair echelons. Using modified data mining algorithms such as a Weighted Feature Ranking Genetic Algorithm and an SVM-Random Forest linear ensemble, it improves reliability, availability, and safety while reducing maintenance cost and Equipment Out of Action (EOA) time. The results indicate that the introduced algorithms have an edge over conventional data mining algorithms. The system's intelligent condition-based maintenance approach improves the operational and maintenance decision strategy of the defence force.
Keywords: condition based maintenance, data mining, defence maintenance, ensemble, genetic algorithms, maintenance scheduling, mission capability
25442 Data Mining Spatial: Unsupervised Classification of Geographic Data
Authors: Chahrazed Zouaoui
Abstract:
In recent years, the volume of geospatial information has increased due to the evolution of information and communication technologies; this information is often presented by geographic information systems (GIS) and stored in spatial databases (BDS). Classical data mining has shown weaknesses in extracting knowledge from these enormous amounts of data because of the particular nature of spatial entities, which are characterized by their interdependence (the first law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data that extracts knowledge and spatial relationships from geospatial data; among its methods we distinguish monothematic and thematic ones. Geo-clustering is one of the main tasks of spatial data mining and belongs to the monothematic methods. It groups similar geospatial entities into the same class and assigns dissimilar entities to different classes; in other words, it maximizes intra-class similarity and minimizes inter-class similarity while taking the particular nature of geospatial data into account. Two approaches to geo-clustering exist: dynamic processing, which applies algorithms designed to treat spatial data directly, and an approach based on pre-processing the spatial data, which applies classical clustering algorithms to pre-processed data (data into which spatial relationships have been integrated).
This pre-processing approach is quite complex in many cases, so the search for approximate solutions calls for approximation algorithms. We are interested in dedicated approaches (partitioning-based and density-based clustering methods) and in the bees approach (a biomimetic approach). Our study proposes a design for this problem that uses different algorithms to automatically detect geospatial neighborhoods in order to implement geo-clustering by pre-processing, and applies the bees algorithm to this problem in the geospatial field for the first time.
Keywords: mining, GIS, geo-clustering, neighborhood
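Density-based clustering is one of the dedicated approaches the abstract mentions. A minimal sketch of standard DBSCAN follows, assuming planar (x, y) coordinates and Euclidean distance; its `region_query` step plays the role of a spatial neighborhood. This is an illustration of the technique, not the authors' neighborhood-detection or bees algorithm, and the function names are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of points within distance eps of points[i] (its spatial neighborhood)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

With these parameters the two tight groups form clusters 0 and 1 and the isolated point is labelled noise (-1).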
25441 Extending Smart City Infrastructure to Cover Natural Disasters
Authors: Nina Dasari, Satvik Dasari
Abstract:
Smart city solutions are being developed across the globe to transform urban areas. However, the infrastructure for alerting on natural disasters such as floods and wildfires is lacking. This paper discusses an innovative device that could be used as part of a smart city initiative to detect floods at road crossings and wildfires and to raise alerts. An Internet of Things (IoT) smart city node was designed, tested, and deployed in collaboration with the City of Austin. The end-to-end solution includes a 3G-enabled IoT device, flood and fire sensors, the cloud, a mobile app, and IoT analytics. Real-time data was collected and analyzed with IoT analytics over the past year to refine the solution. The results demonstrate that the proposed solution is reliable and provides accurate results. This low-cost solution is viable and could replace the current solution, which costs tens of thousands of dollars.
Keywords: analytics, internet of things, natural disasters, smart city
25440 IoT and Advanced Analytics Integration in Biogas Modelling
Authors: Rakesh Choudhary, Ajay Kumar, Deepak Sharma
Abstract:
The main goal of this paper is to investigate the challenges and benefits of integrating IoT into biogas production. This overview explains how IoT can enhance biogas production efficiency: the data it collects can be explored with advanced analytics, including artificial intelligence (AI) and machine learning (ML) algorithms, to improve bio-energy processes. To boost biogas generation efficiency, this report examines the use of IoT devices for real-time collection of data on key parameters, e.g., pH, temperature, gas composition, and microbial growth. Real-time monitoring through big data has made it possible to detect diverse, complex trends in the biogas production process. Insights from advanced analytics can also help improve bio-energy production and optimize operating conditions. Moreover, IoT allows remote observation, control, and management, which reduces the manual intervention needed while increasing process effectiveness. This paradigm shift toward incorporating IoT technologies into biogas production systems helps achieve higher productivity and better biomethane quality through proactive decision-making based on real-time monitoring, thus driving continuous performance improvement.
Keywords: internet of things, biogas, renewable energy, sustainability, anaerobic digestion, real-time monitoring, optimization
25439 IoT Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework
Authors: Femi Elegbeleye, Omobayo Esan, Muienge Mbodila, Patrick Bowe
Abstract:
This paper focuses on a cost-effective storage architecture using fog and cloud data storage gateways, and presents the design of a framework for a data privacy model and a data analytics framework for real-time analysis using machine learning methods. The paper begins with the system analysis, the system architecture and its component design, and the overall system operations. The results obtained for the data privacy model show that combining two or more data privacy models yields stronger privacy for the data. The results also show that a fog storage gateway has several advantages over traditional cloud storage: fog storage had lower latency, lower bandwidth consumption, and lower energy usage than cloud storage, so it can help reduce excessive costs. The paper concentrates on the system descriptions; the researchers focused on the research design and on the framework design for the data privacy model, data storage, and real-time analytics. The paper also presents the major system components and their framework specifications. Lastly, the overall research system architecture is shown, together with its structure and interrelationships.
Keywords: IoT, fog, cloud, data analysis, data privacy
25438 Mining User-Generated Contents to Detect Service Failures with Topic Model
Authors: Kyung Bae Park, Sung Ho Ha
Abstract:
Online user-generated content (UGC) significantly changes the way customers behave (e.g., how they shop and travel), and handling the overwhelming volume and variety of UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective at leveraging textual information to detect the problems or issues that a business suffers from. In this paper, we apply Latent Dirichlet Allocation (LDA) text mining to a popular online review site dedicated to user complaints. We find that LDA efficiently detects customer complaints, and that further inspection with visualization techniques is effective for categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them in a timely manner given limited resources. The findings provide managerial insights into how analytics on social media can help firms maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights from applying machine learning techniques in the marketing research domain. On a broader technical note, this paper details how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since the procedure is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.
Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization
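The paper walks through LDA in R; as a language-neutral illustration of what the model estimates, here is a minimal collapsed Gibbs sampler written from the standard LDA full conditional. It is not the authors' R workflow; the hyperparameter values, function names, and toy vocabulary are assumptions for illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices after the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1  # remove the token from the counts
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant.
                weights = [(doc_topic[d][k] + alpha)
                           * (topic_word[k][w] + beta)
                           / (topic_total[k] + vocab_size * beta)
                           for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

def top_words(topic_word, vocab, n=3):
    """Most frequent words per topic, as a quick way to label complaint themes."""
    return [[vocab[i] for i in sorted(range(len(row)), key=lambda i: -row[i])[:n]]
            for row in topic_word]

vocab = ["guitar", "string", "amp", "refund", "delay", "shipping"]
docs = [[0, 1, 2, 0, 1], [0, 2, 1, 2], [3, 4, 5, 3], [4, 5, 3, 5]]
dt, tw = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab), iters=50)
```

Inspecting `top_words(tw, vocab)` is the programmatic counterpart of the visual inspection step the abstract describes.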
25437 An Optimized Association Rule Mining Algorithm
Authors: Archana Singh, Jyoti Agarwal, Ajay Rana
Abstract:
Data mining is an efficient technology for discovering patterns in large databases. Association rule mining techniques are used to find correlations between itemsets in a database, and these correlations are used in decision making and pattern analysis. In recent years, many researchers have proposed methods for finding association rules in large datasets. Various research papers on association rule mining (ARM) were first studied and analyzed to understand the existing algorithms. The Apriori algorithm is the basic ARM algorithm, but it requires many database scans. The DIC algorithm needs fewer database scans but uses a complex lattice data structure. The main focus of this paper is to propose a new optimized algorithm (the Friendly Algorithm) and compare its performance with the existing algorithms. A dataset is used to find frequent itemsets and association rules with the existing algorithms and the proposed Friendly Algorithm, and it has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules, with fewer database scans than the existing algorithms. The proposed algorithm uses an optimized data structure, namely a graph and an adjacency matrix.
Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph
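To make the "many database scans" cost concrete, here is a minimal sketch of classic Apriori that counts one full pass over the transactions per itemset level. It is the baseline, not the proposed Friendly Algorithm (whose graph/adjacency-matrix structure is not reproduced); the function name and the toy basket data are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return ({frozenset(itemset): support_count}, number_of_database_scans)."""
    transactions = [frozenset(t) for t in transactions]
    scans = 1  # first pass: count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        if not candidates:
            break
        scans += 1  # one more full pass over the database for this level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result, scans

baskets = [{"bread", "milk"}, {"bread", "diaper", "beer", "eggs"},
           {"milk", "diaper", "beer", "cola"}, {"bread", "milk", "diaper", "beer"},
           {"bread", "milk", "diaper", "cola"}]
itemsets, n_scans = apriori(baskets, min_support=3)
```

On this data, four single items and four pairs are frequent; the only surviving 3-item candidate ({bread, milk, diaper}) appears just twice, so the run stops after three scans.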
25436 Mining Educational Data to Support Students’ Major Selection
Authors: Kunyanuth Kularbphettong, Cholticha Tongsiri
Abstract:
This paper aims to create a model that helps computer science students at Suan Sunandha Rajabhat University choose an emphasized track within their major. The objective of this research is to develop a suggestion system that uses data mining techniques to analyze knowledge and derive decision rules. Such relationships can be used to demonstrate the reasonableness of a student's choice of track and to support his or her decision; the system is verified by experts in the field. The sample consists of computer science students, who used the system and answered a questionnaire measuring their satisfaction. Both the experts and the students found the system's results satisfactory.
Keywords: data mining technique, the decision support system, knowledge and decision rules, education
25435 Predicting Medical Check-Up Patient Re-Coming Using Sequential Pattern Mining and Association Rules
Authors: Rizka Aisha Rahmi Hariadi, Chao Ou-Yang, Han-Cheng Wang, Rajesri Govindaraju
Abstract:
As medical check-ups grow in popularity, a huge amount of medical check-up data is stored in databases without being put to use. These data could be very useful for future strategic planning if mined correctly. On the other hand, many patients return at unpredictable times, and limited facilities prevent hospitals from offering their medical check-up service at its best. To address this problem, this study used the medical check-up data to predict patient re-coming, i.e., return visits. Sequential pattern mining (SPM) and association rules were chosen because these methods are suitable for predicting patient re-coming from sequential data. First, the data were grouped into … groups based on patient personal information, and discriminant analysis was performed to check the significance of the grouping. Second, frequent patterns were generated for each group using SPM. Third, based on the frequent patterns of each group, pairs of variables were extracted using association rules to obtain the general pattern of returning patients. Last, the results are discussed and their implications drawn.
Keywords: patient re-coming, medical check-up, health examination, data mining, sequential pattern mining, association rules, discriminant analysis
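The second and third steps (frequent sequential patterns per group, then rules with a confidence) can be sketched in their simplest form: support of ordered pairs "a before b" across patient visit sequences, and the confidence of "after a, the patient later has b". The visit labels and function names below are hypothetical; the study's actual variables, grouping, and SPM algorithm are not given in the abstract.

```python
from collections import Counter
from itertools import combinations

def frequent_2_sequences(sequences, min_support):
    """Support of ordered pairs (a, b) where a occurs before b in a sequence.

    Support counts sequences, so each pattern is counted at most once per sequence.
    """
    support = Counter()
    for seq in sequences:
        patterns = {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}
        support.update(patterns)  # i < j preserves the temporal order
    return {p: s for p, s in support.items() if s >= min_support}

def rule_confidence(sequences, antecedent, consequent):
    """Share of sequences containing antecedent in which consequent occurs later."""
    with_a = [s for s in sequences if antecedent in s]
    if not with_a:
        return 0.0
    hits = sum(1 for s in with_a if consequent in s[s.index(antecedent) + 1:])
    return hits / len(with_a)

# Hypothetical visit histories for one patient group.
visits = [["basic", "basic", "full"], ["basic", "full"], ["full", "basic"], ["basic"]]
patterns = frequent_2_sequences(visits, min_support=2)
conf = rule_confidence(visits, "basic", "full")
```

Here only ("basic", "full") reaches a support of 2, and half of the patients with a basic check-up later came back for a full one.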
25434 Exploring the Intersection of Accounting, Business, and Economics: Bridging Theory and Practice for Sustainable Growth
Authors: Stephen Acheampong Amoafoh
Abstract:
In today's dynamic economic landscape, businesses face multifaceted challenges that demand strategic foresight and informed decision-making. This abstract explores the pivotal role of financial analytics in driving business performance amidst evolving market conditions. By integrating accounting principles with economic insights, organizations can harness the power of data-driven strategies to optimize resource allocation, mitigate risks, and capitalize on emerging opportunities. This presentation will delve into the practical applications of financial analytics across various sectors, highlighting case studies and empirical evidence to underscore its efficacy in enhancing operational efficiency and fostering sustainable growth. From predictive modeling to performance benchmarking, attendees will gain invaluable insights into leveraging advanced analytics tools to drive profitability, streamline processes, and adapt to changing market dynamics. Moreover, this abstract will address the ethical considerations inherent in financial analytics, emphasizing the importance of transparency, integrity, and accountability in data-driven decision-making. By fostering a culture of ethical conduct and responsible stewardship, organizations can build trust with stakeholders and safeguard their long-term viability in an increasingly interconnected global economy. Ultimately, this abstract aims to stimulate dialogue and collaboration among scholars, practitioners, and policymakers, fostering knowledge exchange and innovation in the realms of accounting, business, and economics. Through interdisciplinary insights and actionable recommendations, participants will be equipped to navigate the complexities of today's business environment and seize opportunities for sustainable success.
Keywords: financial analytics, business performance, data-driven strategies, sustainable growth
25433 Modelling of Powered Roof Supports Work
Authors: Marcin Michalak
Abstract:
Owing to increasing efforts to protect the natural environment, the structure of energy resources is changing, with a growing share of renewable energy sources. In many countries traditional underground coal mining is losing significance, but in some countries, such as Poland and Germany, coal-based technologies still have the largest share of total energy production. This makes it necessary to limit the costs and negative effects of underground coal mining. The longwall complex is an essential part of underground coal mining. The safety and effectiveness of the work depend strongly on the diagnostic state of the powered roof supports. Building a useful and reliable diagnostic system requires a lot of data. Because acquiring data for every possible operating condition is difficult, it is important to be able to generate the required artificial working characteristics. This paper presents a new approach to modelling the leg pressure in a single unit of a powered roof support. The model results from an analysis of typical working cycles.
Keywords: machine modelling, underground mining, coal mining, structure
25432 Exclusive Value Adding by iCenter Analytics on Transient Condition
Authors: Zhu Weimin, Allegorico Carmine, Ruggiero Gionata
Abstract:
Decades of Baker Hughes (BH) iCenter experience have demonstrated that, in addition to conventional insights into equipment under steady operating conditions, insights into transient conditions can add significant and exclusive value for anomaly detection, downtime saving, and predictive maintenance. Our work presents examples from the BH iCenter experience to introduce the advantages and features of transient condition analytics: (i) operation under critical engine conditions, e.g., high levels or high rates of change of temperature, pressure, flow, vibration, etc., that are not reached in normal operation; (ii) management of dedicated sub-systems or components, many of which are often bottlenecks for reliability and maintenance; (iii) indirect detection of anomalies in the absence of instrumentation; (iv) repetitive sequences: if the data are properly processed, the engineering features of transients provide not only anomaly detection but also problem characterization and prognostic indicators for predictive maintenance; (v) engine variables relevant to fatigue analysis. iCenter has been developing and deploying a series of analytics based on transient conditions, which add exclusive value in the following areas: (i) reliability improvement, (ii) startup reliability improvement, (iii) predictive maintenance, and (iv) repair/overhaul cost reduction. Illustrative examples for each of these areas are presented, focusing on the challenges and the techniques adopted, which range from purely statistical approaches to machine learning algorithms. The results demonstrate how transient condition analytics creates value in the BH iCenter experience.
Keywords: analytics, diagnostics, monitoring, turbomachinery
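The iCenter analytics themselves are not described in detail. As a generic illustration of isolating transients and computing simple "engineering features" from them, here is a minimal sketch assuming equally spaced samples and a fixed rate-of-change threshold; the threshold, feature set, and function names are hypothetical.

```python
def find_transients(samples, dt, max_rate):
    """Return (start, end) index ranges where |dx/dt| exceeds max_rate.

    samples: equally spaced sensor readings (e.g. a temperature); dt: spacing in seconds.
    """
    windows = []
    start = None
    for i in range(1, len(samples)):
        rate = abs(samples[i] - samples[i - 1]) / dt
        if rate > max_rate and start is None:
            start = i - 1  # transient begins at the previous sample
        elif rate <= max_rate and start is not None:
            windows.append((start, i - 1))
            start = None
    if start is not None:  # transient still running at the end of the record
        windows.append((start, len(samples) - 1))
    return windows

def transient_features(samples, dt, window):
    """Simple features of one transient: duration, peak signed rate, net change."""
    s, e = window
    rates = [(samples[i] - samples[i - 1]) / dt for i in range(s + 1, e + 1)]
    return {"duration_s": (e - s) * dt,
            "peak_rate": max(rates, key=abs),
            "net_change": samples[e] - samples[s]}

temps = [100, 100, 100, 150, 200, 250, 250, 250]
wins = find_transients(temps, dt=1, max_rate=10)
feats = transient_features(temps, 1, wins[0])
```

Tracking such features across repeated start-ups is one way a drift in behaviour could surface as a prognostic indicator.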
25431 Evaluating the Total Costs of a Ransomware-Resilient Architecture for Healthcare Systems
Authors: Sreejith Gopinath, Aspen Olmsted
Abstract:
This paper is based on our previous work that proposed a risk-transference-based architecture for healthcare systems to store sensitive data outside the system boundary, rendering the system unattractive to would-be bad actors. This architecture also allows a compromised system to be abandoned and a new system instance spun up in place to ensure business continuity without paying a ransom or engaging with a bad actor. This paper delves into the details of various attacks we simulated against the prototype system. In the paper, we discuss at length the time and computational costs associated with storing and retrieving data in the prototype system, abandoning a compromised system, and setting up a new instance with existing data. Lastly, we simulate some analytical workloads over the data stored in our specialized data storage system and discuss the time and computational costs associated with running analytics over data in a specialized storage system outside the system boundary. In summary, this paper discusses the total costs of data storage, access, and analytics incurred with the proposed architecture.
Keywords: cybersecurity, healthcare, ransomware, resilience, risk transference
25430 Distributed Perceptually Important Point Identification for Time Series Data Mining
Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung
Abstract:
In the field of time series data mining, the Perceptually Important Point (PIP) identification process was first introduced in 2001. The process was originally used for financial time series pattern matching and was later found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of a time series by identifying its salient points. With the rise of Big Data, time series make up a major proportion of data, especially data generated by sensors in Internet of Things (IoT) environments. Given the nature of PIP identification and its successful applications, it is worth further exploring the opportunity to apply PIP to time series 'Big Data'. However, the performance of PIP identification has always been considered a limitation when dealing with 'big' time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck of running the PIP identification process on a standalone computer, and the distributed versions improve speed.
Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining
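For reference, here is a minimal sketch of the sequential PIP identification process that the distributed versions accelerate. It assumes the vertical-distance measure (other distance measures appear in the PIP literature) and does not reproduce the SB-Tree or the distributed algorithms; the function names are hypothetical.

```python
def vertical_distance(p1, p2, p3):
    """Vertical distance from p3 to the straight line through p1 and p2."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    y_line = y1 + (y2 - y1) * (x3 - x1) / (x2 - x1)
    return abs(y_line - y3)

def identify_pips(series, n_pips):
    """Return the sorted indices of the first n_pips perceptually important points."""
    if n_pips < 2:
        raise ValueError("the two end points are always PIPs, so n_pips must be >= 2")
    n = len(series)
    if n_pips >= n:
        return list(range(n))
    pips = [0, n - 1]  # start from the first and last points
    while len(pips) < n_pips:
        best_idx, best_dist = None, -1.0
        # Scan every gap between adjacent PIPs for the point farthest from its chord.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance((left, series[left]), (right, series[right]),
                                      (i, series[i]))
                if d > best_dist:
                    best_idx, best_dist = i, d
        pips.append(best_idx)
        pips.sort()
    return pips

series = [0, 1, 5, 1, 0, -4, 0]
pips4 = identify_pips(series, 4)
```

The naive version rescans the whole series each time a point is added, so its cost grows roughly with series length times the number of PIPs; that rescanning is the kind of standalone bottleneck the abstract targets.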
25429 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website
Authors: Harpreet Singh
Abstract:
Finding relevant information on the World Wide Web is becoming more challenging day by day. Web usage mining extracts relevant and useful knowledge, such as user behaviour patterns, from web access log records. A web access log records every request for an individual file that users have made to the website. Web usage mining is important for Customer Relationship Management (CRM), as it can help ensure customer satisfaction in interactions between the customer and the organization. Web usage mining helps improve a website's structure or design according to users' requirements by analyzing the site's access log file with a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar-selling website by analyzing its access log with the Deep Log Analyzer tool. The results show that the largest number of users are from the United States and use the Opera 9.8 web browser and the Windows XP operating system.
Keywords: web usage mining, web mining, log file, data mining, deep log analyzer
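Deep Log Analyzer is a GUI tool, so the following is only a rough Python equivalent of the kind of counting it performs. It is a minimal sketch, assuming the server writes the Apache/NCSA "combined" log format; per-country counts would additionally need a GeoIP database and are not shown. The function names and sample lines are hypothetical.

```python
import re
from collections import Counter

# Apache/NCSA "combined" format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log(lines):
    """Yield one dict per well-formed log line; malformed lines are skipped."""
    for line in lines:
        m = COMBINED.match(line)
        if m:
            yield m.groupdict()

def summarize(lines):
    """Count requests per user-agent string and per requested path."""
    agents, paths = Counter(), Counter()
    for rec in parse_log(lines):
        agents[rec["agent"]] += 1
        parts = rec["request"].split()  # e.g. ["GET", "/path", "HTTP/1.0"]
        if len(parts) >= 2:
            paths[parts[1]] += 1
    return agents, paths

sample = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /guitars/index.html HTTP/1.0" '
    '200 2326 "-" "Opera/9.80 (Windows NT 5.1)"',
    '5.6.7.8 - - [10/Oct/2000:13:56:01 -0700] "GET /guitars/index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/5.0"',
    'not a log line',
]
agents, paths = summarize(sample)
```

Browser and operating-system shares, like those reported in the abstract, come from aggregating the user-agent strings counted in `agents`.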
25428 A Theoretical Model for Pattern Extraction in Large Datasets
Authors: Muhammad Usman
Abstract:
Pattern extraction has long been used to extract hidden and interesting patterns from large datasets. Recently, these techniques have advanced by providing multi-level mining, effective dimension reduction, and advanced evaluation and visualization support. This paper reviews current techniques in the literature on the basis of these parameters. The review suggests that most techniques that provide multi-level mining and dimension reduction do not handle mixed-type data during the process, and that patterns are not extracted with advanced algorithms for large datasets. Moreover, patterns are not evaluated with advanced measures suited to high-dimensional data, and techniques that provide visualization support cannot display a large number of rules in a small space. We present a theoretical model to address these issues; its implementation is beyond the scope of this paper.
Keywords: association rule mining, data mining, data warehouses, visualization of association rules
25427 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising
Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri
Abstract:
Although data today takes multiple forms, from text and images to audio and video, text is still the most widely used form at a public level. At an academic and research level, and unlike other forms, text can be considered the easiest form to process. Therefore, a branch of data mining research has always been devoted to it: text mining. Its concept mirrors that of data mining: finding valuable patterns in large collections and tremendous volumes of data, in this case text. Named entity recognition (NER) is one of the disciplines of text mining; it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations, and more in a given text. Our approach, "Octopub", does not aim to improve the NER process itself; rather, it seeks a new and smart way to use NER to extract the sentiments of millions of people, using social networks as a limitless information source and marketing for product promotion as the main domain of application.
Keywords: text mining, named entity recognition (NER), sentiment analysis, social media networks (SN, SMN), business intelligence (BI), marketing
25426 Emotion Mining and Attribute Selection for Actionable Recommendations to Improve Customer Satisfaction
Authors: Jaishree Ranganathan, Poonam Rajurkar, Angelina A. Tzacheva, Zbigniew W. Ras
Abstract:
In today’s world, business often depends on customer feedback and reviews. Sentiment analysis helps identify and extract information about the sentiment or emotion of a topic or document. Attribute selection is a challenging problem in actionable pattern mining algorithms, especially with large datasets. Action rule mining is one method for discovering actionable patterns in data. Action rules describe specific actions, in the form of conditions, that help achieve a desired outcome; they help move from an undesirable or negative state to a more desirable or positive state. In this paper, we present a lexicon-based weighted scheme to identify emotions in customer feedback data from the manufacturing business. We also use rough sets to explore attribute selection for large-scale datasets. We then apply actionable pattern mining to extract possible emotion-change recommendations. Such recommendations help business analysts improve customer service, which leads to customer satisfaction and increased sales revenue.
Keywords: actionable pattern discovery, attribute selection, business data, data mining, emotion
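The abstract does not specify the lexicon or the weights of its scheme. A minimal sketch of lexicon-based weighted emotion scoring follows, assuming a toy word-to-emotion-weight table and crude negation handling; the lexicon entries and function names are hypothetical, and the rough-set and action-rule steps are not covered.

```python
import re

# Hypothetical toy lexicon: word -> {emotion: weight}. A real system would use
# a published emotion lexicon; these entries are illustrative only.
LEXICON = {
    "late": {"anger": 0.6, "sadness": 0.3},
    "broken": {"anger": 0.8},
    "happy": {"joy": 0.9},
    "great": {"joy": 0.7},
    "worried": {"fear": 0.7},
}
NEGATIONS = {"not", "never", "no"}

def emotion_scores(text, lexicon=LEXICON):
    """Sum lexicon weights per emotion, ignoring words directly after a negation."""
    scores = {}
    tokens = re.findall(r"[a-z']+", text.lower())
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        if i > 0 and tokens[i - 1] in NEGATIONS:
            continue  # crude negation handling: drop the negated emotion word
        for emotion, weight in lexicon[tok].items():
            scores[emotion] = scores.get(emotion, 0.0) + weight
    return scores

def dominant_emotion(text, lexicon=LEXICON):
    """Highest-scoring emotion, or 'neutral' when no lexicon word matched."""
    scores = emotion_scores(text, lexicon)
    return max(scores, key=scores.get) if scores else "neutral"

review = "The delivery was late and the part arrived broken."
scores = emotion_scores(review)
```

The per-review emotion label is what an action-rule miner would then treat as the decision attribute to move from a negative to a positive state.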
25425 Human-Centred Data Analysis Method for Future Design of Residential Spaces: Coliving Case Study
Authors: Alicia Regodon Puyalto, Alfonso Garcia-Santos
Abstract:
This article presents a method for analyzing the use of indoor spaces based on analytics of data obtained from built-in digital devices. The study uses the data generated by in-place devices, such as smart locks, Wi-Fi routers, and electrical sensors, to gain additional insight into space occupancy, user behaviour, and comfort. These devices, originally installed to facilitate remote operations, report data over the internet, which the research uses to analyze how people use spaces in real time. Using an in-place Internet of Things (IoT) network enables a faster, more affordable, seamless, and scalable way to analyze building interiors without adding external data collection systems such as sensors. The methodology is applied to a real coliving case study: a residential building of 3000 m², 7 floors, and 80 users in the centre of Madrid. In the case study, the method is used to classify IoT devices and to assess, clean, and analyze the collected data according to the analysis framework. The information is collected remotely through the devices' different platforms. The first step is to curate the data and understand which insights each device can provide given the study's objectives; this yields an analysis framework that can be scaled up for future building assessments, even beyond the residential sector. The method adjusts the parameters to be analyzed to the dataset available from each building's IoT. The research demonstrates how human-centred data analytics can improve the future spatial design of indoor spaces.
Keywords: in-place devices, IoT, human-centred data-analytics, spatial design
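As an example of turning in-place device data into an occupancy indicator, here is a minimal sketch that counts distinct connected devices per room and hour from Wi-Fi association records. The record layout `(iso_timestamp, room, device_id)` and the function name are hypothetical, and connected devices are only a proxy for people.

```python
from collections import defaultdict
from datetime import datetime

def hourly_occupancy(events):
    """Distinct connected devices per (room, hour) as a rough occupancy proxy.

    events: iterable of (iso_timestamp, room, device_id) Wi-Fi association records.
    """
    seen = defaultdict(set)
    for ts, room, device in events:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        seen[(room, hour)].add(device)  # a set deduplicates repeat connections
    return {key: len(devices) for key, devices in seen.items()}

events = [
    ("2023-05-01T09:05:00", "kitchen", "aa"),
    ("2023-05-01T09:40:00", "kitchen", "aa"),
    ("2023-05-01T09:50:00", "kitchen", "bb"),
    ("2023-05-01T10:01:00", "kitchen", "aa"),
]
occ = hourly_occupancy(events)
```

Deduplicating per hour keeps a device that reconnects repeatedly from inflating the count, which is part of the data cleaning step the abstract mentions.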
25424 Estimation of Service Quality and Its Impact on Market Share Using Business Analytics
Authors: Haritha Saranga
Abstract:
Service quality has recently become an important driver of competition in manufacturing industries, as many products are sold in conjunction with service offerings. With increased computational power and data capture capabilities, it has become possible to analyze and estimate various aspects of service quality at a granular level and determine their impact on business performance. In this study, dealer-level, model-wise warranty data from one of the two leading two-wheeler manufacturers in India is used to estimate the service quality of individual dealers and its impact on warranty-related costs and sales performance. We collected primary data from the two-wheeler automaker on warranty costs, number of complaints, monthly sales, type of quality upgrades, etc. In addition, we gathered secondary data on the regions of India where the dealers are located, such as petrol and diesel prices and geographic and climatic conditions, to control for customer usage patterns. We analyze the primary and secondary data with a variety of analytics tools, such as Auto-Regressive Integrated Moving Average (ARIMA), seasonal ARIMA, and ARIMAX. After controlling for a variety of factors, such as the size, age, and region of the dealership and customer usage patterns, the results show that service quality significantly influences product sales. A more nuanced analysis reveals the dynamics between product quality and service quality, and how their interaction affects sales performance in the Indian two-wheeler industry. We also provide managerial insights using descriptive analytics and build a model that can project sales using a variety of forecasting techniques.
Keywords: service quality, product quality, automobile industry, business analytics, auto-regressive integrated moving average
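To show what the simplest member of the ARIMA family does, here is a minimal sketch of an ARIMA(1,1,0) model with drift fitted by conditional least squares on the first differences. The study's model orders, seasonal terms, and exogenous regressors (ARIMAX) are not given in the abstract; in practice a library such as statsmodels would be used. The function names and the synthetic sales series are hypothetical.

```python
def difference(series):
    """First differences: the 'I' (integrated, d = 1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_arima_110(series):
    """Fit dy_t = c + phi * dy_{t-1} + e_t by ordinary least squares.

    Requires at least three observations and non-constant differences.
    """
    dy = difference(series)
    x, y = dy[:-1], dy[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi

def forecast_arima_110(series, c, phi, steps):
    """Iterate the fitted AR(1) on differences, then integrate back to levels."""
    level, last_diff = series[-1], series[-1] - series[-2]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        out.append(level)
    return out

# Synthetic monthly sales whose differences follow dy_t = 2 + 0.5 * dy_{t-1} exactly.
sales = [100, 100, 102, 105, 108.5, 112.25, 116.125, 120.0625]
c, phi = fit_arima_110(sales)
forecast = forecast_arima_110(sales, c, phi, steps=2)
```

Because the synthetic series is noise-free, the fit recovers c = 2 and phi = 0.5, and the two-step forecast continues the same recursion (124.03125, then 128.015625).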
Procedia PDF Downloads 120
25423 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic
Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam
Abstract:
In recent years, research on knowledge sources, decision support systems, data mining, and the process of knowledge discovery in databases has become increasingly important, and each of these areas is considered to affect the others. In this article, we merge information sources and knowledge sources to propose a knowledge-based system, within the scope of management, that stores and retrieves knowledge in order to manage information and improve decision making and resource use. We use data mining, specifically the Apriori algorithm, in the knowledge discovery process. One problem with the Apriori algorithm is that the user must specify the minimum support threshold. Imagine a user who wants to apply the Apriori algorithm to a database with millions of transactions. Such a user cannot know all the transactions in that database and therefore cannot specify a suitable threshold. Our purpose in this article is to improve the Apriori algorithm. To this end, we use fuzzy logic to group the data into clusters before applying the Apriori algorithm to the data in the database, and we automatically suggest the most suitable threshold to the user.
Keywords: decision support system, data mining, knowledge discovery, data discovery, fuzzy logic
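To show where an automatically suggested threshold plugs into Apriori, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not the authors' method: the threshold comes from a simple hypothetical heuristic (the median support of single items) rather than the fuzzy clustering described in the abstract, and all names and data are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions) >= min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count the support of each candidate k-itemset with one pass over the data.
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step plus Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
    return frequent


def suggest_min_support(transactions: List[Set[str]]) -> float:
    """Hypothetical heuristic (not fuzzy logic): median support of single items."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    supports = sorted(c / n for c in counts.values())
    return supports[len(supports) // 2]


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]  # toy transactions
    threshold = suggest_min_support(data)
    print(threshold, apriori(data, threshold))
```

Any smarter suggestion rule, such as one derived from fuzzy clusters, would replace `suggest_min_support` without changing `apriori` itself.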
Procedia PDF Downloads 335
25422 A Web Service-Based Framework for Mining E-Learning Data
Authors: Felermino D. M. A. Ali, S. C. Ng
Abstract:
E-learning is an evolutionary form of distance learning that has improved over time as new technologies have emerged. Today, efforts are still being made to enhance E-learning systems with emerging technologies. Among these advancements, Educational Data Mining (EDM) is rapidly gaining popularity due to its wide application in improving the teaching-learning process in online practices. However, even though EDM promises many benefits to the education sector in general and to E-learning environments in particular, its principal drawback is the lack of easy-to-use tools. Current EDM tools usually require users to have additional technical expertise to perform EDM tasks effectively. In response to these limitations, this study designs and implements an EDM application framework that aims to automate and simplify the development of EDM in E-learning environments. The application framework introduces a Service-Oriented Architecture (SOA) that hides the complexity of technical details and enables users to perform EDM in an automated fashion. The framework was designed based on the principles of abstraction, extensibility, and interoperability. The framework implementation consists of three major modules. The first module provides an abstraction for data gathering, implemented by extending the source code of the Moodle LMS (Learning Management System). The second module provides data mining methods and techniques as services, implemented by converting the Weka API into a set of Web services. The third module acts as an intermediary between the first two; it contains a user-friendly interface that allows users to dynamically locate data provider services and run knowledge discovery tasks on data mining services. An experiment combining simulation and implementation was conducted to evaluate the overhead of the proposed framework.
The experiments showed that the overhead introduced by the SOA mechanism is relatively small; it was therefore concluded that a service-oriented architecture can be used effectively to facilitate educational data mining in E-learning environments.
Keywords: educational data mining, e-learning, distributed data mining, moodle, service-oriented architecture, Weka
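The three-module decomposition (data providers, mining services, and an intermediary that locates and combines them) can be sketched in-process as below. This is a hypothetical stand-in: in the described framework the modules are Web services built on Moodle and the Weka API, whereas here they are plain Python callables registered with a broker, and every class, function, and record field is invented for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class ServiceBroker:
    """Hypothetical in-process stand-in for the framework's intermediary module."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], List[Record]]] = {}
        self._miners: Dict[str, Callable[[List[Record]], object]] = {}

    def register_provider(self, name: str, fn: Callable[[], List[Record]]) -> None:
        self._providers[name] = fn

    def register_miner(self, name: str, fn: Callable[[List[Record]], object]) -> None:
        self._miners[name] = fn

    def locate_providers(self) -> List[str]:
        """Dynamic lookup of available data providers."""
        return sorted(self._providers)

    def run_task(self, provider: str, miner: str) -> object:
        # Fetch data from a provider service, then hand it to a mining service.
        data = self._providers[provider]()
        return self._miners[miner](data)


def fake_lms_log() -> List[Record]:
    """Toy data provider standing in for an LMS export (not real Moodle output)."""
    return [
        {"student": "s1", "action": "view"},
        {"student": "s1", "action": "quiz"},
        {"student": "s2", "action": "view"},
    ]


def activity_count(data: List[Record]) -> Dict[str, int]:
    """Toy mining service: number of logged actions per student."""
    counts: Dict[str, int] = {}
    for row in data:
        key = str(row["student"])
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    broker = ServiceBroker()
    broker.register_provider("lms", fake_lms_log)
    broker.register_miner("activity", activity_count)
    print(broker.run_task("lms", "activity"))  # {'s1': 2, 's2': 1}
```

Swapping the callables for HTTP or SOAP clients would turn the same broker shape into a service-oriented version, at the cost of the network overhead the experiments measured.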
Procedia PDF Downloads 236
25421 Personalize E-Learning System Based on Clustering and Sequence Pattern Mining Approach
Authors: H. S. Saini, K. Vijayalakshmi, Rishi Sayal
Abstract:
Network-based education has been growing rapidly in size and quality. Knowledge clustering is becoming more important in personalized information retrieval for web-based learning. A personalized learning service can be offered after the learners’ knowledge has been classified through clustering. Through automatic analysis of learners’ behaviors, groups of learners with similar levels and interests can be discovered, so that learners can be offered content that best matches their educational needs for collaborative learning. We present a specific mining tool and a recommender engine that we have integrated into the online learning system to help the teacher carry out the whole e-learning process. We propose to use sequential pattern mining algorithms to discover the paths most used by students and, from this information, to recommend links automatically to new students while they browse the course. We have also developed a specific authoring tool to help the teacher apply the whole data mining process. We report on several experiments with real data to show the quality of using clustering and sequential pattern mining algorithms together for discovering personalized e-learning systems.
Keywords: e-learning, cluster, personalization, sequence, pattern
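The link-recommendation step can be illustrated with a deliberately simplified sketch: count which page students most often visit directly after the current one, and suggest that page to a new student. This only mines contiguous length-2 patterns and skips the clustering step, so it is not a full sequential pattern mining algorithm such as GSP or PrefixSpan; the page names and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_counts(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each page is followed directly by another page."""
    nxt: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            nxt[current][following] += 1
    return nxt


def recommend(nxt: Dict[str, Counter], current_page: str, k: int = 1) -> List[str]:
    """Suggest the k most frequent next pages after current_page (empty if unseen)."""
    return [page for page, _ in nxt.get(current_page, Counter()).most_common(k)]


if __name__ == "__main__":
    # Toy browsing sessions, one list per student (hypothetical course pages).
    sessions = [
        ["intro", "quiz1", "forum"],
        ["intro", "quiz1", "video"],
        ["intro", "video"],
    ]
    model = transition_counts(sessions)
    print(recommend(model, "intro"))  # ['quiz1']
```

Running the same counts separately within each learner cluster would give per-group recommendations, closer to the combined clustering and sequence mining the abstract describes.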
Procedia PDF Downloads 427
25420 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)
Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim
Abstract:
This research was conducted at the Bolo Primary Health Care (PHC) in Bima Regency. The purpose of the research is to find the association patterns formed in the medical record database of Bolo PHC patients. The data are secondary data from the PHC medical records database. Sequential pattern mining is the method used for the analysis. Transaction data were generated from Patient_ID, Check_Date, and diagnosis. The Sequential Pattern Discovery Using Equivalent Classes (SPADE) algorithm is one of the sequential pattern mining algorithms; it finds frequent sequences in transaction data using a vertical database and a sequence join process. The SPADE algorithm produces frequent sequences, which are then used to form rules. This technique is used to find association patterns between combinations of items. Sequential association rule analysis with the SPADE algorithm, using a minimum support of 0.03 and a minimum confidence of 0.75, yielded 3 sequential association patterns based on the sequences of Patient_ID, Check_Date, and diagnosis data at Bolo PHC.
Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm
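The vertical database and sequence (temporal) join that the abstract mentions can be sketched for 2-sequences as below. This is a simplified illustration under stated assumptions, not the study's code: it only builds sequences of length two, ignores events that share a date, and omits SPADE's equivalence-class recursion for longer patterns. Records are (Patient_ID, Check_Date, diagnosis) tuples with dates as integers; the diagnoses "A", "B", "C" are toy labels. The abstract's thresholds would be passed as `spade_pairs(records, 0.03, 0.75)`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, int, str]   # (patient_id, check_date, diagnosis)
IdList = Set[Tuple[str, int]]   # {(sequence id, event id)}


def vertical_idlists(records: List[Record]) -> Dict[str, IdList]:
    """Vertical layout: each diagnosis maps to the (patient, date) pairs where it occurs."""
    idlists: Dict[str, IdList] = defaultdict(set)
    for sid, eid, item in records:
        idlists[item].add((sid, eid))
    return idlists


def sequences_containing(idlist: IdList) -> Set[str]:
    """Support is counted per sequence (patient), not per occurrence."""
    return {sid for sid, _ in idlist}


def temporal_join(first: IdList, then: IdList) -> IdList:
    """Id-list of the 2-sequence <first -> then>: occurrences of `then`
    preceded by `first` in the same patient's history."""
    return {(s2, e2) for s2, e2 in then
            if any(s1 == s2 and e1 < e2 for s1, e1 in first)}


def spade_pairs(records: List[Record], min_support: float, min_confidence: float):
    """Frequent 2-sequences <a -> b> with confidence sup(<a, b>) / sup(<a>)."""
    idlists = vertical_idlists(records)
    n = len({sid for sid, _, _ in records})
    frequent = {item: ids for item, ids in idlists.items()
                if len(sequences_containing(ids)) / n >= min_support}
    rules = []
    for a, ids_a in frequent.items():
        for b, ids_b in frequent.items():
            joined = sequences_containing(temporal_join(ids_a, ids_b))
            support = len(joined) / n
            if support >= min_support:
                confidence = len(joined) / len(sequences_containing(ids_a))
                if confidence >= min_confidence:
                    rules.append((a, b, support, confidence))
    return rules


if __name__ == "__main__":
    toy = [("p1", 1, "A"), ("p1", 2, "B"), ("p2", 1, "A"),
           ("p2", 3, "B"), ("p3", 1, "A"), ("p3", 2, "C")]
    print(spade_pairs(toy, 0.3, 0.6))  # one rule: A -> B
```

Because each id-list is joined only against other id-lists, the raw records are scanned once; that is the main advantage of the vertical layout over repeatedly scanning horizontal transactions.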
Procedia PDF Downloads 401
25419 The Digital Desert in Global Business: Digital Analytics as an Oasis of Hope for Sub-Saharan Africa
Authors: David Amoah Oduro
Abstract:
In the ever-evolving terrain of international business, a profound revolution is underway, guided by the swift integration and advancement of disruptive technologies like digital analytics. In today's international business landscape, where competition is fierce and decisions are data-driven, the essence of this paper lies in offering a tangible roadmap for practitioners. It is a guide that bridges the chasm between theory and actionable insights, helping businesses, investors, and entrepreneurs navigate the complexities of international expansion into sub-Saharan Africa. This practitioner paper distils essential insights, methodologies, and actionable recommendations for businesses seeking to leverage digital analytics in their pursuit of market entry and expansion across the African continent. What sets this paper apart is its unwavering focus on a region ripe with potential: sub-Saharan Africa. The adoption and adaptation of digital analytics are not mere luxuries but essential strategic tools for evaluating countries and entering markets within this dynamic region. With the spotlight firmly fixed on sub-Saharan Africa, the aim is to provide a compelling resource to guide practitioners in their quest to unearth the vast opportunities hidden within sub-Saharan Africa's digital desert. The paper illuminates the pivotal role of digital analytics in providing a data-driven foundation for market entry decisions. It highlights the ability of digital analytics to uncover market trends, consumer behavior, and competitive landscapes. Recognizing Africa's incredible diversity, the paper underscores the importance of tailoring market entry strategies to account for unique cultural, economic, and regulatory factors. For practitioners, this paper offers a set of actionable recommendations, including the creation of cross-functional teams, the integration of local expertise, and the cultivation of long-term partnerships to ensure sustainable market entry success. 
It advocates for a commitment to continuous learning and flexibility in adapting strategies as the African market evolves. This paper represents an invaluable resource for businesses, investors, and entrepreneurs who are keen on unlocking the potential of digital analytics for informed market entry in Africa. It serves as a guiding light, equipping practitioners with the essential tools and insights needed to thrive in this dynamic and diverse continent. With these key insights, methodologies, and recommendations, this paper is a roadmap to prosperous and sustainable market entry in Africa. It is vital for anyone looking to harness the transformational potential of digital analytics to create prosperous and sustainable ventures in a region brimming with promise. In the ever-advancing digital age, this practitioner paper becomes a lodestar, guiding businesses and visionaries toward success amidst the unique challenges and rewards of sub-Saharan Africa's international business landscape.
Keywords: global analytics, digital analytics, sub-Saharan Africa, data analytics
Procedia PDF Downloads 72
25418 Secure Multiparty Computations for Privacy Preserving Classifiers
Authors: M. Sumana, K. S. Hareesha
Abstract:
Secure computations are essential for privacy-preserving data mining. Distributed privacy-preserving data mining involves two or more sites that cannot pool their data at a third party because doing so would violate laws protecting individuals. Hence, to model the private data without compromising privacy or losing information, secure multiparty computations are used. Secure computation of the product, mean, variance, dot product, and sigmoid function using additive and multiplicative homomorphic properties is discussed. The computations are performed on vertically partitioned data, with a single site holding the class value.
Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data
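The abstract does not name a cryptosystem. As an assumption, the sketch below uses textbook Paillier encryption, a common additively homomorphic scheme, to show how a secure dot product works. Multiplying ciphertexts adds the underlying plaintexts, and raising a ciphertext to a plaintext power multiplies the underlying value. Site A encrypts its vector x; site B, which holds y, combines the ciphertexts without ever seeing x; A decrypts the sum of x_i * y_i. The tiny primes are for illustration only and offer no security. Inputs must be non-negative integers whose result stays below n, and a real protocol would also mask the result before returning it. Multiplicative homomorphism is not shown.

```python
import math
import random
from typing import List, Tuple


def keygen(p: int = 293, q: int = 433) -> Tuple[int, Tuple[int, int, int]]:
    """Toy Paillier keys with g = n + 1. These primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (n, lam, mu)


def encrypt(n: int, m: int) -> int:
    """E(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv: Tuple[int, int, int], c: int) -> int:
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """E(a) * E(b) mod n^2 decrypts to a + b (additive homomorphism)."""
    return (c1 * c2) % (n * n)


def secure_dot(n: int, enc_x: List[int], y: List[int]) -> int:
    """Site B computes prod E(x_i)^(y_i) = E(sum x_i * y_i) without seeing x."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, yi in zip(enc_x, y):
        acc = (acc * pow(c, yi, n2)) % n2
    return acc


if __name__ == "__main__":
    pub, priv = keygen()
    enc_x = [encrypt(pub, v) for v in [3, 1, 4]]        # site A's private column
    print(decrypt(priv, secure_dot(pub, enc_x, [2, 7, 1])))  # 17
```

A secure mean follows the same pattern: sum the encrypted values with `add_encrypted`, decrypt the total, and divide by the (public) record count.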
Procedia PDF Downloads 412
25417 Performance Evaluation of Production Schedules Based on Process Mining
Authors: Kwan Hee Han
Abstract:
The external environment of enterprises is changing rapidly, driven mainly by global competition, cost-reduction pressures, and new technology. In this situation, the production scheduling function plays a critical role in meeting customer requirements and achieving operational efficiency. It deals with short-term decision making in the production process of the whole supply chain. The major task of production scheduling is to balance customer orders against limited resources. In manufacturing companies, this task is difficult because resource capacity must be used efficiently while carefully considering many interacting constraints. At present, many enterprises use computerized scheduling software to generate realistic production schedules and overcome the complexity of schedule generation. However, most production scheduling systems provide little information about the validity of the generated schedule beyond limited statistics. Process mining has only recently emerged as a sub-discipline of both data mining and business process management. Process mining techniques enable a wide variety of useful process analyses, such as process discovery, conformance checking, and bottleneck analysis. In this study, the performance of a generated production schedule is evaluated by applying process mining techniques to the event log data of the production scheduling software, since every software system generates event logs for further uses such as security investigation, auditing, and debugging. We propose this process mining approach for validating the goodness of production schedules generated by scheduling software systems.
Using process mining techniques, major evaluation criteria such as workstation utilization, the existence of bottleneck workstations, critical process route patterns, and the workload balance of each machine over time are measured, and finally, the goodness of the production schedule is evaluated. By using the proposed process mining approach to evaluate the performance of generated production schedules, manufacturing enterprises can improve the quality of their production schedules.
Keywords: data mining, event log, process mining, production scheduling
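Two of the evaluation criteria listed above, workstation utilization (with the most-loaded station as a bottleneck candidate) and process route patterns, can be computed directly from an event log. The sketch below is a minimal illustration, not the study's tooling. It assumes a hypothetical log schema in which each event records a case (job), a station, and start and end times, and it assumes that operations on the same station never overlap.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List

Event = Dict[str, Any]  # hypothetical schema: case, station, start, end


def utilization(events: List[Event], horizon: float) -> Dict[str, float]:
    """Busy time / schedule horizon per workstation (assumes no overlapping operations)."""
    busy: Dict[str, float] = defaultdict(float)
    for ev in events:
        busy[ev["station"]] += ev["end"] - ev["start"]
    return {station: t / horizon for station, t in busy.items()}


def bottleneck(util: Dict[str, float]) -> str:
    """Simple bottleneck candidate: the most heavily utilized workstation."""
    return max(util, key=util.get)


def route_variants(events: List[Event]) -> Counter:
    """Process-route patterns: the ordered station sequence of each case, counted."""
    routes: Dict[str, List[str]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: (e["case"], e["start"])):
        routes[ev["case"]].append(ev["station"])
    return Counter(tuple(r) for r in routes.values())


if __name__ == "__main__":
    log = [  # toy schedule, not from the study
        {"case": "j1", "station": "M1", "start": 0, "end": 4},
        {"case": "j1", "station": "M2", "start": 4, "end": 6},
        {"case": "j2", "station": "M1", "start": 4, "end": 8},
        {"case": "j2", "station": "M2", "start": 8, "end": 9},
    ]
    util = utilization(log, horizon=10)
    print(util, bottleneck(util), route_variants(log))
```

Workload balance over time would need the same busy-time sum restricted to sliding time windows; dedicated process mining libraries also provide discovery and conformance checking, which this sketch does not attempt.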
Procedia PDF Downloads 279
25416 Decision Support System in Air Pollution Using Data Mining
Authors: E. Fathallahi Aghdam, V. Hosseini
Abstract:
Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as the destruction of natural resources, degradation of biological systems, global pollution, and climate change in the world, especially in developing countries. According to the World Health Organization, Tehran, the capital of Iran and a developing city, is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants, particulate matter smaller than 10 microns (PM10), nitrogen oxides, and sulfur dioxide, were evaluated in Tehran using data mining techniques and the CRISP approach. Data from 21 air pollution monitoring stations in different areas of Tehran were collected from 1999 to 2013. The commercial software Clementine was selected for this study. Using this software, Tehran was divided into distinct clusters according to these pollutants. Because clustering is usually used as a preliminary step for other analyses, the similarity of the clusters was then evaluated by analyzing local conditions, traffic behavior, and industrial activities. The results of this research can support decision-making systems, help managers improve performance and decision making, and assist in urban studies.
Keywords: data mining, clustering, air pollution, crisp approach
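The abstract does not say which clustering algorithm Clementine ran. Assuming k-means, a common default, the sketch below clusters monitoring stations by a vector of mean pollutant levels (PM10, NOx, SO2). The station values are invented toy numbers, and centroids are initialized from the first k points purely for reproducibility; in practice one would standardize features measured on different scales and use a better initialization such as k-means++.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100
           ) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; centroids start at the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy (PM10, NOx, SO2) means for four hypothetical stations.
    stations = [(40, 30, 10), (42, 31, 11), (120, 90, 40), (118, 88, 41)]
    labels, centroids = kmeans(stations, k=2)
    print(labels, centroids)
```

The resulting cluster labels are the starting point for the follow-up analysis the abstract describes, such as comparing clusters against traffic and industrial activity.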
Procedia PDF Downloads 427
25415 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for handling unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, a novel branch of text mining, has become increasingly important over the last decade in marketing analysis, customer risk prediction, and other fields. Scientists and researchers have done significant work creating and improving sentiment models. In this paper, we present an approach to selecting appropriate classifiers based on the features and qualities of the data sources, by comparing the performance of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers on these data sources and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and with Greenplum in-database analytic tools.
Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining
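The abstract does not name its five classifiers, and the original work was done in R and Greenplum. As an illustration only, here is one classic sentiment baseline, multinomial Naive Bayes with add-one smoothing, written in Python with an accuracy function. The same accuracy function can score any classifier on each data source. The tokenizer and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def tokenize(text: str) -> List[str]:
    """Deliberately naive tokenizer: lowercase plus whitespace split."""
    return text.lower().split()


def train_nb(docs: List[str], labels: List[str]) -> Model:
    class_counts = Counter(labels)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model: Model, text: str) -> str:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing, scored in log space."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model: Model, docs: List[str], labels: List[str]) -> float:
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


if __name__ == "__main__":
    docs = ["great movie loved it", "loved the acting great",
            "terrible plot hated it", "boring and terrible"]
    labels = ["pos", "pos", "neg", "neg"]
    model = train_nb(docs, labels)
    print(predict_nb(model, "great acting"), predict_nb(model, "terrible boring"))
```

Training and scoring this baseline separately on held-out splits of each source (tweets, product reviews, movie reviews) gives the kind of per-source comparison the abstract describes; the other classifiers would be plugged into the same loop.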
Procedia PDF Downloads 352