Search results for: data mining analytics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25009

24859 Hybrid Knowledge Approach for Determining Health Care Provider Specialty from Patient Diagnoses

Authors: Erin Lynne Plettenberg, Jeremy Vickery

Abstract:

In an access-control situation, the role of a user determines whether a data request is appropriate. This paper combines vetted web mining and logic modeling to build a lightweight system for determining the role of a health care provider based only on their prior authorized requests. The model identifies provider roles with 100% recall from very little data. This shows the value of vetted web mining in AI systems, and suggests the impact of the ICD classification on medical practice.

Keywords: electronic medical records, information extraction, logic modeling, ontology, vetted web mining

Procedia PDF Downloads 160
24858 Data Mining Spatial: Unsupervised Classification of Geographic Data

Authors: Chahrazed Zouaoui

Abstract:

In recent years, the volume of geospatial information has been increasing due to the evolution of information and communication technologies. This information is often presented through geographic information systems (GIS) and stored in spatial databases (BDS). Classical data mining has shown weaknesses in extracting knowledge from these enormous amounts of data because of the particular nature of spatial entities, which are characterized by their interdependence (the first law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data that allows the extraction of knowledge and spatial relationships from geospatial data. Among the methods of this process, we distinguish the monothematic and the thematic. Geo-clustering is one of the main tasks of spatial data mining and belongs to the monothematic methods. It groups similar geo-spatial entities into the same class and assigns dissimilar ones to different classes; in other words, it maximizes intra-class similarity and minimizes inter-class similarity, taking into account the particularity of geo-spatial data. Two approaches to geo-clustering exist: the dynamic processing of data, which applies algorithms designed for the direct treatment of spatial data, and the approach based on spatial data pre-processing, which applies classic clustering algorithms to pre-processed data (by integrating spatial relationships).
This pre-processing approach is quite complex in many cases, so the search for approximate solutions involves the use of approximation algorithms. Among these, we are interested in dedicated approaches (partitioning and density-based clustering methods) and the bees approach (a biomimetic approach). Our study proposes a design for this problem that uses different algorithms for automatically detecting geo-spatial neighborhoods in order to implement geo-clustering by pre-processing, and applies the bees algorithm to this problem for the first time in the geo-spatial field.

Keywords: mining, GIS, geo-clustering, neighborhood

Procedia PDF Downloads 366
24857 A Hybrid Data Mining Algorithm Based System for Intelligent Defence Mission Readiness and Maintenance Scheduling

Authors: Shivam Dwivedi, Sumit Prakash Gupta, Durga Toshniwal

Abstract:

Keeping defence forces in the highest state of combat readiness under budgetary constraints is a challenging task today. A huge amount of time and money is squandered on unnecessary and expensive traditional maintenance activities. To overcome this limitation, the Defence Intelligent Mission Readiness and Maintenance Scheduling System has been proposed, which improves the maintenance system by diagnosing equipment condition and predicting maintenance requirements. Based on new data mining algorithms, this system intelligently optimises mission readiness for imminent operations and maintenance scheduling in repair echelons. With modified data mining algorithms such as the Weighted Feature Ranking Genetic Algorithm and the SVM-Random Forest Linear ensemble, it improves reliability, availability and safety while reducing maintenance cost and Equipment Out of Action (EOA) time. The results indicate that the introduced algorithms have an edge over conventional data mining algorithms. The system, which uses an intelligent condition-based maintenance approach, improves the operational and maintenance decision strategy of the defence force.

Keywords: condition based maintenance, data mining, defence maintenance, ensemble, genetic algorithms, maintenance scheduling, mission capability

Procedia PDF Downloads 279
24856 Extending Smart City Infrastructure to Cover Natural Disasters

Authors: Nina Dasari, Satvik Dasari

Abstract:

Smart city solutions are being developed across the globe to transform urban areas. However, the infrastructure for issuing alerts about natural disasters such as floods and wildfires is deficient. This paper discusses an innovative device that could be used as part of the smart city initiative to detect floods at road crossings and wildfires and to provide alerts. An Internet of Things (IoT) smart city node was designed, tested, and deployed in collaboration with the City of Austin. The end-to-end solution includes a 3G-enabled IoT device, flood and fire sensors, a cloud component, a mobile app, and IoT analytics. Over the past year, real-time data was collected and analyzed using IoT analytics to refine the solution. The results demonstrate that the proposed solution is reliable and provides accurate results. This low-cost solution is viable, and it can replace the current solution, which costs tens of thousands of dollars.

Keywords: analytics, internet of things, natural disasters, smart city

Procedia PDF Downloads 213
24855 An Optimized Association Rule Mining Algorithm

Authors: Archana Singh, Jyoti Agarwal, Ajay Rana

Abstract:

Data mining is an efficient technology for discovering patterns in large databases. Association rule mining (ARM) techniques are used to find correlations between the various itemsets in a database, and these correlations are used in decision making and pattern analysis. In recent years, the problem of finding association rules in large datasets has been addressed by many researchers. Various research papers on ARM are first studied and analyzed to understand the existing algorithms. The Apriori algorithm is the basic ARM algorithm, but it requires many database scans. The DIC algorithm needs fewer database scans, but it uses a complex lattice data structure. The main focus of this paper is to propose a new optimized algorithm (the Friendly Algorithm) and compare its performance with the existing algorithms. A dataset is used to find frequent itemsets and association rules with both the existing algorithms and the proposed Friendly Algorithm. It has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules in the database, with fewer database scans than the existing algorithms. The proposed algorithm uses an optimized data structure, i.e., a graph and an adjacency matrix.
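For readers unfamiliar with the baseline, the following is a minimal sketch of the classic Apriori procedure that the abstract compares against. It is not the authors' Friendly Algorithm, whose details are not given here. The function names, the toy transactions, and the use of an absolute support count are assumptions made for illustration only. Note the one pass over the transactions per itemset size, which is the scan cost the abstract refers to.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan of the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join frequent (k-1)-itemsets and keep only candidates whose
        # (k-1)-subsets are all frequent (the Apriori pruning step).
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One further database scan per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets = apriori(transactions, min_support=2)
print(sorted(sorted(s) for s in itemsets))
print(confidence(itemsets, {"butter"}, {"bread"}))
```

With this toy data, the pair {milk, butter} occurs only once, so no three-item candidate survives pruning and the loop stops after the second level.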

Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph

Procedia PDF Downloads 407
24854 Exploring the Intersection of Accounting, Business, and Economics: Bridging Theory and Practice for Sustainable Growth

Authors: Stephen Acheampong Amoafoh

Abstract:

In today's dynamic economic landscape, businesses face multifaceted challenges that demand strategic foresight and informed decision-making. This abstract explores the pivotal role of financial analytics in driving business performance amidst evolving market conditions. By integrating accounting principles with economic insights, organizations can harness the power of data-driven strategies to optimize resource allocation, mitigate risks, and capitalize on emerging opportunities. This presentation will delve into the practical applications of financial analytics across various sectors, highlighting case studies and empirical evidence to underscore its efficacy in enhancing operational efficiency and fostering sustainable growth. From predictive modeling to performance benchmarking, attendees will gain invaluable insights into leveraging advanced analytics tools to drive profitability, streamline processes, and adapt to changing market dynamics. Moreover, this abstract will address the ethical considerations inherent in financial analytics, emphasizing the importance of transparency, integrity, and accountability in data-driven decision-making. By fostering a culture of ethical conduct and responsible stewardship, organizations can build trust with stakeholders and safeguard their long-term viability in an increasingly interconnected global economy. Ultimately, this abstract aims to stimulate dialogue and collaboration among scholars, practitioners, and policymakers, fostering knowledge exchange and innovation in the realms of accounting, business, and economics. Through interdisciplinary insights and actionable recommendations, participants will be equipped to navigate the complexities of today's business environment and seize opportunities for sustainable success.

Keywords: financial analytics, business performance, data-driven strategies, sustainable growth

Procedia PDF Downloads 34
24853 Evaluating the Total Costs of a Ransomware-Resilient Architecture for Healthcare Systems

Authors: Sreejith Gopinath, Aspen Olmsted

Abstract:

This paper is based on our previous work that proposed a risk-transference-based architecture for healthcare systems to store sensitive data outside the system boundary, rendering the system unattractive to would-be bad actors. This architecture also allows a compromised system to be abandoned and a new system instance spun up in place to ensure business continuity without paying a ransom or engaging with a bad actor. This paper delves into the details of various attacks we simulated against the prototype system. In the paper, we discuss at length the time and computational costs associated with storing and retrieving data in the prototype system, abandoning a compromised system, and setting up a new instance with existing data. Lastly, we simulate some analytical workloads over the data stored in our specialized data storage system and discuss the time and computational costs associated with running analytics over data in a specialized storage system outside the system boundary. In summary, this paper discusses the total costs of data storage, access, and analytics incurred with the proposed architecture.

Keywords: cybersecurity, healthcare, ransomware, resilience, risk transference

Procedia PDF Downloads 124
24852 Mining Educational Data to Support Students’ Major Selection

Authors: Kunyanuth Kularbphettong, Cholticha Tongsiri

Abstract:

This paper aims to create a model that helps students majoring in computer science at Suan Sunandha Rajabhat University choose an emphasized track. The objective of this research is to develop the proposed system using data mining techniques to analyze knowledge and derive decision rules. Such relationships can be used to demonstrate the reasonableness of a student's choice of track and to support his/her decision. The system is verified by experts in the field. The sample is drawn from computer science students, based on the system and a questionnaire that gauges satisfaction. Both the experts and the students found the system's results satisfactory.

Keywords: data mining technique, the decision support system, knowledge and decision rules, education

Procedia PDF Downloads 412
24851 Predicting Medical Check-Up Patient Re-Coming Using Sequential Pattern Mining and Association Rules

Authors: Rizka Aisha Rahmi Hariadi, Chao Ou-Yang, Han-Cheng Wang, Rajesri Govindaraju

Abstract:

With the increasing popularity of medical check-ups, a huge amount of medical check-up data is stored in databases and has not been put to use. These data can be very useful for future strategic planning if mined correctly. On the other hand, many patients arrive unpredictably, and the limited facilities available mean that the medical check-up service offered by hospitals is not optimal. To solve that problem, this study used medical check-up data to predict patient re-coming. Sequential pattern mining (SPM) and association rules were chosen because these methods are suitable for predicting patient re-coming from sequential data. First, the data was grouped into … groups based on patient personal information, and discriminant analysis was then performed to check the significance of the grouping. Second, frequent patterns were generated for each group using the SPM method. Third, based on the frequent patterns of each group, pairs of variables were extracted using association rules to obtain a general pattern of patient re-coming. Last, the results are discussed and conclusions are drawn about their implications.

Keywords: patient re-coming, medical check-up, health examination, data mining, sequential pattern mining, association rules, discriminant analysis

Procedia PDF Downloads 627
24850 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated content (UGC) significantly changes the way customers behave (e.g., shop, travel), and the pressing need to handle the overwhelming amount of varied UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective at leveraging textual information to detect the problems or issues that a given management suffers from. In this paper, we apply text mining with Latent Dirichlet Allocation (LDA) to a popular online review site dedicated to user complaints. We find that LDA efficiently detects customer complaints, and that further inspection with a visualization technique is effective for categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given limited resources. The findings provide managerial insights into how analytics on social media can help maintain and improve reputation management. Our interdisciplinary approach also highlights several insights from applying machine learning techniques in the marketing research domain. On a broader technical note, this paper details how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since such instruction is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.

Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization

Procedia PDF Downloads 177
24849 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process was first introduced in 2001. This process was originally developed for financial time series pattern matching and was later found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of the time series by identifying its salient points. With the rise of Big Data, time series data accounts for a major proportion, especially data generated by sensors in the Internet of Things (IoT) environment. Given the nature of PIP identification and its successful applications, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification is always considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck of running the PIP identification process on a standalone computer. The distributed versions achieve an improvement in speed.
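To make the idea of PIP identification concrete, here is a minimal sequential sketch. It is neither the SB-Tree variant nor either of the distributed versions proposed in the paper. It assumes evenly spaced samples and uses vertical distance to the line joining neighbouring PIPs as the importance measure, which is one common choice; the function names and the sample series are hypothetical.

```python
def vertical_distance(series, left, right, i):
    """Vertical distance from point i to the line joining points left and right."""
    y_left, y_right = series[left], series[right]
    y_on_line = y_left + (y_right - y_left) * (i - left) / (right - left)
    return abs(series[i] - y_on_line)


def find_pips(series, n_pips):
    """Return the sorted indices of up to n_pips perceptually important points."""
    if n_pips < 2 or len(series) < 2:
        raise ValueError("need at least two points and n_pips >= 2")
    # The first and last points are always the first two PIPs.
    pips = [0, len(series) - 1]
    while len(pips) < min(n_pips, len(series)):
        best_index, best_dist = None, -1.0
        # Scan every gap between adjacent PIPs for the most distant point.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, left, right, i)
                if d > best_dist:
                    best_index, best_dist = i, d
        if best_index is None:
            break
        pips.append(best_index)
        pips.sort()
    return pips


# Hypothetical series: one peak and one trough.
series = [0, 1, 5, 1, 0, -4, 0]
print(find_pips(series, 4))
```

Each new PIP requires a rescan of the remaining points, so this naive version slows down as the series grows, which is the kind of bottleneck that motivates faster and distributed variants.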

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 418
24848 Human-Centred Data Analysis Method for Future Design of Residential Spaces: Coliving Case Study

Authors: Alicia Regodon Puyalto, Alfonso Garcia-Santos

Abstract:

This article presents a method to analyze the use of indoor spaces based on analytics of data obtained from inbuilt digital devices. The study uses the data generated by in-place devices, such as smart locks, Wi-Fi routers, and electrical sensors, to gain additional insights into space occupancy, user behaviour, and comfort. These devices, originally installed to facilitate remote operations, report data over the internet, which the research uses to analyze real-time human use of spaces. Using an in-place Internet of Things (IoT) network enables a faster, more affordable, seamless, and scalable solution for analyzing building interior spaces without incorporating external data collection systems such as sensors. The methodology is applied to a real coliving case study: a residential building of 3000 m², 7 floors, and 80 users in the centre of Madrid. The case study applies the method to classify IoT devices and to assess, clean, and analyze the collected data based on the analysis framework. The information is collected remotely through the devices' different platforms. The first step is to curate the data and understand what insights each device can provide according to the objectives of the study; this generates an analysis framework that can be scaled to future building assessments, even beyond the residential sector. The method will adjust the parameters to be analyzed to the dataset available in the IoT network of each building. The research demonstrates how human-centered data analytics can improve the future spatial design of indoor spaces.

Keywords: in-place devices, IoT, human-centred data-analytics, spatial design

Procedia PDF Downloads 179
24847 Exclusive Value Adding by iCenter Analytics on Transient Condition

Authors: Zhu Weimin, Allegorico Carmine, Ruggiero Gionata

Abstract:

During decades of Baker Hughes (BH) iCenter experience, it is demonstrated that in addition to conventional insights on equipment steady operation conditions, insights on transient conditions can add significant and exclusive value for anomaly detection, downtime saving, and predictive maintenance. Our work shows examples from the BH iCenter experience to introduce the advantages and features of using transient condition analytics: (i) Operation under critical engine conditions: e.g., high level or high change rate of temperature, pressure, flow, vibration, etc., that would not be reachable in normal operation, (ii) Management of dedicated sub-systems or components, many of which are often bottlenecks for reliability and maintenance, (iii) Indirect detection of anomalies in the absence of instrumentation, (iv) Repetitive sequences: if data is properly processed, the engineering features of transients provide not only anomaly detection but also problem characterization and prognostic indicators for predictive maintenance, (v) Engine variables accounting for fatigue analysis. iCenter has been developing and deploying a series of analytics based on transient conditions. They are contributing to exclusive value adding in the following areas: (i) Reliability improvement, (ii) Startup reliability improvement, (iii) Predictive maintenance, (iv) Repair/overhaul cost down. Illustrative examples for each of the above areas are presented in our study, focusing on challenges and adopted techniques ranging from purely statistical approaches to the implementation of machine learning algorithms. The obtained results demonstrate how the value is obtained using transient condition analytics in the BH iCenter experience.

Keywords: analytics, diagnostics, monitoring, turbomachinery

Procedia PDF Downloads 61
24846 Modelling of Powered Roof Supports Work

Authors: Marcin Michalak

Abstract:

Due to increasing efforts to protect the natural environment, a change in the structure of energy resources can be observed: an increasing share of renewable energy sources. In many countries traditional underground coal mining is losing its significance, but there are still countries, like Poland or Germany, in which coal-based technologies have the greatest share in total energy production. This makes it necessary to limit the costs and negative effects of underground coal mining. The longwall complex is an essential part of underground coal mining. The safety and effectiveness of the work depend strongly on the diagnostic state of powered roof supports. Building a useful and reliable diagnostic system requires a lot of data. As acquiring data for every possible operating condition is difficult, it is important to be able to generate artificial working characteristics on demand. In this paper, a new approach to modelling leg pressure in a single unit of powered roof support is presented. The model is the result of an analysis of typical working cycles.

Keywords: machine modelling, underground mining, coal mining, structure

Procedia PDF Downloads 351
24845 Estimation of Service Quality and Its Impact on Market Share Using Business Analytics

Authors: Haritha Saranga

Abstract:

Service quality has become an important driver of competition in manufacturing industries of late, as many products are being sold in conjunction with service offerings. With increase in computational power and data capture capabilities, it has become possible to analyze and estimate various aspects of service quality at the granular level and determine their impact on business performance. In the current study context, dealer level, model-wise warranty data from one of the top two-wheeler manufacturers in India is used to estimate service quality of individual dealers and its impact on warranty related costs and sales performance. We collected primary data on warranty costs, number of complaints, monthly sales, type of quality upgrades, etc. from the two-wheeler automaker. In addition, we gathered secondary data on various regions in India, such as petrol and diesel prices, geographic and climatic conditions of various regions where the dealers are located, to control for customer usage patterns. We analyze this primary and secondary data with the help of a variety of analytics tools such as Auto-Regressive Integrated Moving Average (ARIMA), Seasonal ARIMA and ARIMAX. Study results, after controlling for a variety of factors, such as size, age, region of the dealership, and customer usage pattern, show that service quality does influence sales of the products in a significant manner. A more nuanced analysis reveals the dynamics between product quality and service quality, and how their interaction affects sales performance in the Indian two-wheeler industry context. We also provide various managerial insights using descriptive analytics and build a model that can provide sales projections using a variety of forecasting techniques.

Keywords: service quality, product quality, automobile industry, business analytics, auto-regressive integrated moving average

Procedia PDF Downloads 110
24844 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays takes multiple forms, from text to images and from audio to video, text is still the most widely used form at a public level. At an academic and research level, and unlike other forms, text can be considered the easiest form to process. Therefore, a branch of data mining research, called "Text Mining", has always been devoted to it. Its concept is the same as that of data mining: finding valuable patterns in large collections and tremendous volumes of data, in this case text. Named entity recognition (NER) is one of Text Mining’s disciplines; it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach, "Octopub", does not aim to find new ways to improve the named entity recognition process. Rather, it is about finding a new and smart way to use NER so that we can extract the sentiments of millions of people, using Social Networks as a limitless information source and Marketing for product promotion as the main domain of application.

Keywords: text mining, named entity recognition (NER), sentiment analysis, social media networks (SN, SMN), business intelligence (BI), marketing

Procedia PDF Downloads 572
24843 A Theoretical Model for Pattern Extraction in Large Datasets

Authors: Muhammad Usman

Abstract:

Pattern extraction has been done in the past to extract hidden and interesting patterns from large datasets. Recently, advancements are being made in these techniques by providing multi-level mining, effective dimension reduction, advanced evaluation and visualization support. This paper focuses on reviewing the current techniques in the literature on the basis of these parameters. The literature review suggests that most of the techniques which provide multi-level mining and dimension reduction do not handle mixed-type data during the process. Patterns are not extracted using advanced algorithms for large datasets. Moreover, the evaluation of patterns is not done using advanced measures which are suited for high-dimensional data. Techniques which provide visualization support are unable to handle a large number of rules in a small space. We present a theoretical model to handle these issues. The implementation of the model is beyond the scope of this paper.

Keywords: association rule mining, data mining, data warehouses, visualization of association rules

Procedia PDF Downloads 215
24842 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, we have seen increasing importance of research on knowledge sources, decision support systems, data mining and the process of knowledge discovery in databases, and each of these aspects is considered to affect the others. In this article, we merge the information source and the knowledge source to propose a knowledge-based system, within the limits of management, based on storing and retrieving knowledge in order to manage information and improve decision making and resources. We use data mining and the Apriori algorithm in the knowledge discovery process. One of the problems of the Apriori algorithm is that the user must specify the minimum support threshold. Imagine that a user wants to apply the Apriori algorithm to a database with millions of transactions. The user certainly does not have the necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve the Apriori algorithm. To achieve this goal, we use fuzzy logic to put data into different clusters before applying the Apriori algorithm to the data in the database, and we also try to suggest the most suitable threshold to the user automatically.

Keywords: decision support system, data mining, knowledge discovery, data discovery, fuzzy logic

Procedia PDF Downloads 321
24841 Emotion Mining and Attribute Selection for Actionable Recommendations to Improve Customer Satisfaction

Authors: Jaishree Ranganathan, Poonam Rajurkar, Angelina A. Tzacheva, Zbigniew W. Ras

Abstract:

In today’s world, business often depends on customer feedback and reviews. Sentiment analysis helps identify and extract information about the sentiment or emotion of a topic or document. Attribute selection is a challenging problem, especially with large datasets in actionable pattern mining algorithms. Action Rule Mining is one of the methods to discover actionable patterns from data. Action Rules are rules that describe specific actions to be taken, in the form of conditions that help achieve the desired outcome. The rules help to change from an undesirable or negative state to a more desirable or positive state. In this paper, we present a lexicon-based weighted scheme approach to identify emotions from customer feedback data in the area of manufacturing business. We also use rough sets and explore the attribute selection method for large-scale datasets. We then apply actionable pattern mining to extract possible emotion change recommendations. These recommendations help business analysts improve their customer service, which leads to customer satisfaction and increased sales revenue.

Keywords: actionable pattern discovery, attribute selection, business data, data mining, emotion

Procedia PDF Downloads 187
24840 A Web Service-Based Framework for Mining E-Learning Data

Authors: Felermino D. M. A. Ali, S. C. Ng

Abstract:

E-learning is an evolutionary form of distance learning and has become better over time as new technologies emerged. Today, efforts are still being made to embrace E-learning systems with emerging technologies in order to make them better. Among these advancements, Educational Data Mining (EDM) is one that is gaining a huge and increasing popularity due to its wide application for improving the teaching-learning process in online practices. However, even though EDM promises to bring many benefits to the educational industry in general and E-learning environments in particular, its principal drawback is the lack of easy-to-use tools. The current EDM tools usually require users to have some additional technical expertise to effectively perform EDM tasks. Thus, in response to these limitations, this study intends to design and implement an EDM application framework which aims at automating and simplifying the development of EDM in E-learning environments. The application framework introduces a Service-Oriented Architecture (SOA) that hides the complexity of technical details and enables users to perform EDM in an automated fashion. The framework was designed based on abstraction, extensibility, and interoperability principles. The framework implementation was made up of three major modules. The first module provides an abstraction for data gathering, which was done by extending the Moodle LMS (Learning Management System) source code. The second module provides data mining methods and techniques as services; this was done by converting the Weka API into a set of Web services. The third module acts as an intermediary between the first two modules; it contains a user-friendly interface that allows dynamically locating data provider services and running knowledge discovery tasks on data mining services. An experiment was conducted to evaluate the overhead of the proposed framework through a combination of simulation and implementation.
The experiments have shown that the overhead introduced by the SOA mechanism is relatively small; therefore, it has been concluded that a service-oriented architecture can be effectively used to facilitate educational data mining in E-learning environments.

Keywords: educational data mining, e-learning, distributed data mining, moodle, service-oriented architecture, Weka

Procedia PDF Downloads 227
24839 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website

Authors: Harpreet Singh

Abstract:

Finding relevant information on the World Wide Web is becoming more challenging day by day. Web usage mining is used to extract relevant and useful knowledge, such as user behaviour patterns, from web access log records. A web access log records all the requests for individual files that users have made to the website. Web usage mining is important for Customer Relationship Management (CRM), as it can help ensure customer satisfaction in the interaction between the customer and the organization. Web usage mining also helps improve website structure or design according to the user’s requirements by analyzing the access log file of a website with a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar-selling website by analyzing its access log with the Deep Log Analyzer tool. The results show that the largest number of users is from the United States and that they use the Opera 9.8 web browser and the Windows XP operating system.
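
The kind of counting a log analyzer performs on an access log can be sketched in a few lines of Python. This is only an illustration, not the Deep Log Analyzer tool used in the paper: the log lines are invented, the Combined Log Format layout is an assumption about how the site's server writes its log, and the names `LOG_PATTERN`, `sample_log`, and `parse` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: Combined Log Format (host, identity, user, time, request,
# status, size, referrer, user agent).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample lines; the addresses come from a documentation-only range.
sample_log = [
    '192.0.2.1 - - [10/Oct/2013:13:55:36 +0000] "GET /guitars/acoustic.html HTTP/1.1" 200 2326 "-" "Opera/9.80 (Windows NT 5.1)"',
    '192.0.2.2 - - [10/Oct/2013:13:56:01 +0000] "GET /guitars/electric.html HTTP/1.1" 200 1804 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.1 - - [10/Oct/2013:13:57:12 +0000] "GET /guitars/acoustic.html HTTP/1.1" 200 2326 "-" "Opera/9.80 (Windows NT 5.1)"',
    'malformed line',
]

def parse(lines):
    """Yield one dict per well-formed log line and skip anything else."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()

entries = list(parse(sample_log))
page_hits = Counter(entry["path"] for entry in entries)
# Keep only the product token before the first "/" of the user-agent string.
browsers = Counter(entry["agent"].split("/")[0] for entry in entries)

print(page_hits.most_common(1))
print(browsers.most_common(1))
```

Country-level statistics such as those reported in the abstract would additionally need an IP geolocation lookup, which is left out of this sketch.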

Keywords: web usage mining, web mining, log file, data mining, deep log analyzer

Procedia PDF Downloads 239
24838 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)

Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim

Abstract:

This research was conducted at Bolo Primary Health Care (PHC) in Bima Regency. Its purpose is to find the association patterns formed in the medical record database of Bolo PHC patients. The data used are secondary data from the PHC medical records database. Sequential pattern mining is the method used for the analysis. Transaction data are generated from Patient_ID, Check_Date, and diagnosis. Sequential Pattern Discovery using Equivalent Classes (SPADE) is one of the algorithms for sequential pattern mining; it finds frequent sequences in transaction data using a vertical database and a sequence join process. The frequent sequences produced by the SPADE algorithm are then used to form rules. This technique is used to find association patterns between item combinations. Sequential association rule analysis with the SPADE algorithm, at a minimum support of 0.03 and a minimum confidence of 0.75, yields 3 sequential association patterns based on the sequence of Patient_ID, Check_Date, and diagnosis data in the Bolo PHC.
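
The vertical database and sequence join mentioned above can be illustrated with a small Python sketch. It is a simplification rather than the full SPADE algorithm: it only joins pairs of single diagnoses into length-2 sequences, the visit records are invented, and confidence is assumed to be the support of the sequence divided by the support of its first item. The function names are hypothetical; only the thresholds 0.03 and 0.75 are taken from the abstract.

```python
from collections import defaultdict
from itertools import permutations

# Invented visit records: (patient_id, check_date, diagnosis).
records = [
    ("P1", "2015-01-05", "fever"), ("P1", "2015-01-20", "cough"),
    ("P2", "2015-02-01", "fever"), ("P2", "2015-02-10", "cough"),
    ("P3", "2015-03-03", "fever"), ("P3", "2015-03-15", "cough"),
    ("P4", "2015-03-09", "fever"),
    ("P5", "2015-04-02", "cough"),
]

def vertical_id_lists(rows):
    """Vertical layout: diagnosis -> {patient_id: sorted visit dates}."""
    id_lists = defaultdict(lambda: defaultdict(list))
    for patient, date, diagnosis in rows:
        id_lists[diagnosis][patient].append(date)
    for patients in id_lists.values():
        for dates in patients.values():
            dates.sort()
    return id_lists

def temporal_join(first, second):
    """Patients whose earliest `first` visit precedes their latest `second` visit."""
    return {
        patient
        for patient in first.keys() & second.keys()
        if first[patient][0] < second[patient][-1]
    }

def sequential_rules(rows, min_support, min_confidence):
    """Return (a, b, support, confidence) for every frequent sequence a -> b."""
    id_lists = vertical_id_lists(rows)
    n_patients = len({patient for patient, _, _ in rows})
    rules = []
    for a, b in permutations(sorted(id_lists), 2):
        matched = temporal_join(id_lists[a], id_lists[b])
        support = len(matched) / n_patients
        confidence = len(matched) / len(id_lists[a])
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

rules = sequential_rules(records, min_support=0.03, min_confidence=0.75)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

ISO-formatted date strings are used so that plain string comparison orders visits chronologically. Full SPADE would go on to join the surviving sequences into longer ones within equivalence classes.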

Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm

Procedia PDF Downloads 389
24837 The Digital Desert in Global Business: Digital Analytics as an Oasis of Hope for Sub-Saharan Africa

Authors: David Amoah Oduro

Abstract:

In the ever-evolving terrain of international business, a profound revolution is underway, guided by the swift integration and advancement of disruptive technologies like digital analytics. In today's international business landscape, where competition is fierce, and decisions are data-driven, the essence of this paper lies in offering a tangible roadmap for practitioners. It is a guide that bridges the chasm between theory and actionable insights, helping businesses, investors, and entrepreneurs navigate the complexities of international expansion into sub-Saharan Africa. This practitioner paper distils essential insights, methodologies, and actionable recommendations for businesses seeking to leverage digital analytics in their pursuit of market entry and expansion across the African continent. What sets this paper apart is its unwavering focus on a region ripe with potential: sub-Saharan Africa. The adoption and adaptation of digital analytics are not mere luxuries but essential strategic tools for evaluating countries and entering markets within this dynamic region. With the spotlight firmly fixed on sub-Saharan Africa, the aim is to provide a compelling resource to guide practitioners in their quest to unearth the vast opportunities hidden within sub-Saharan Africa's digital desert. The paper illuminates the pivotal role of digital analytics in providing a data-driven foundation for market entry decisions. It highlights the ability to uncover market trends, consumer behavior, and competitive landscapes. By understanding Africa's incredible diversity, the paper underscores the importance of tailoring market entry strategies to account for unique cultural, economic, and regulatory factors. For practitioners, this paper offers a set of actionable recommendations, including the creation of cross-functional teams, the integration of local expertise, and the cultivation of long-term partnerships to ensure sustainable market entry success. 
It advocates for a commitment to continuous learning and flexibility in adapting strategies as the African market evolves. This paper represents an invaluable resource for businesses, investors, and entrepreneurs who are keen on unlocking the potential of digital analytics for informed market entry in Africa. It serves as a guiding light, equipping practitioners with the essential tools and insights needed to thrive in this dynamic and diverse continent. With these key insights, methodologies, and recommendations, this paper is a roadmap to prosperous and sustainable market entry in Africa. It is vital for anyone looking to harness the transformational potential of digital analytics to create prosperous and sustainable ventures in a region brimming with promise. In the ever-advancing digital age, this practitioner paper becomes a lodestar, guiding businesses and visionaries toward success amidst the unique challenges and rewards of sub-Saharan Africa's international business landscape.

Keywords: global analytics, digital analytics, sub-Saharan Africa, data analytics

Procedia PDF Downloads 59
24836 Secure Multiparty Computations for Privacy Preserving Classifiers

Authors: M. Sumana, K. S. Hareesha

Abstract:

Secure computations are essential when performing privacy-preserving data mining. Distributed privacy-preserving data mining involves two or more sites that cannot pool their data with a third party, because doing so would violate laws protecting individuals. Hence, in order to model the private data without compromising privacy or losing information, secure multiparty computations are used. Secure computations of the product, mean, variance, dot product, and sigmoid function using the additive and multiplicative homomorphic properties are discussed. The computations are performed on vertically partitioned data, with a single site holding the class value.
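
To make the additive homomorphic property concrete, the sketch below uses a toy version of the Paillier cryptosystem to compute a secure sum and a secure dot product between two sites. The abstract does not name a cryptosystem, so Paillier is an assumption made for illustration; the primes are far too small to be secure, all function names are hypothetical, and the multiplicative property is not covered. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Build a toy key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(public_key, m):
    """Encrypt an integer 0 <= m < n using fresh randomness r coprime to n."""
    n, g = public_key
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(public_key, private_key, c):
    """Recover the plaintext modulo n."""
    n, _ = public_key
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

def secure_dot_product(public_key, encrypted_x, y):
    """Run by the site holding y: returns Enc(sum of x_i * y_i) without seeing x."""
    n, _ = public_key
    acc = encrypt(public_key, 0)
    for c, y_i in zip(encrypted_x, y):
        # Raising a ciphertext to the power y_i multiplies its plaintext by y_i.
        acc = (acc * pow(c, y_i, n * n)) % (n * n)
    return acc

public_key, private_key = keygen()

# Site A holds x and the private key; site B holds y.
x, y = [3, 1, 4], [2, 7, 1]
encrypted_x = [encrypt(public_key, value) for value in x]
encrypted_dot = secure_dot_product(public_key, encrypted_x, y)
dot = decrypt(public_key, private_key, encrypted_dot)

encrypted_sum = add_encrypted(public_key, encrypt(public_key, 5), encrypt(public_key, 7))
total = decrypt(public_key, private_key, encrypted_sum)
print(dot, total)
```

In this arrangement site B only ever sees ciphertexts of x, and site A only learns the final dot product, which matches the vertically partitioned setting in which one site holds the class value.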

Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data

Procedia PDF Downloads 402
24835 Personalize E-Learning System Based on Clustering and Sequence Pattern Mining Approach

Authors: H. S. Saini, K. Vijayalakshmi, Rishi Sayal

Abstract:

Network-based education has been growing rapidly in size and quality. Knowledge clustering is becoming more important in personalized information retrieval for web-based learning. A personalized learning service can be provided once the learners’ knowledge has been classified with clustering. Through automatic analysis of learners’ behaviors, groups of learners with similar knowledge levels and interests may be discovered, so as to provide learners with content that best matches their educational needs for collaborative learning. We present a specific mining tool and a recommender engine that we have integrated into the online learning environment in order to help the teacher carry out the whole e-learning process. We propose to use sequential pattern mining algorithms to discover the paths most used by students and, from this information, to recommend links to new students automatically while they browse the course. We have developed a specific authoring tool to help the teacher apply the whole data mining process. We report on several experiments with real data that indicate the quality of using clustering and sequential pattern mining algorithms together for building personalized e-learning systems.

Keywords: e-learning, cluster, personalization, sequence, pattern

Procedia PDF Downloads 412
24834 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams

Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem

Abstract:

In this 'Information Explosion Era', analyzing data, 'a critical commodity', and mining knowledge from vertically distributed data streams incur huge communication costs. However, an effort to decrease communication in the distributed environment has an adverse influence on classification accuracy; the research challenge therefore lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.

Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data

Procedia PDF Downloads 146
24833 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 337
24832 Decision Support System in Air Pollution Using Data Mining

Authors: E. Fathallahi Aghdam, V. Hosseini

Abstract:

Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as the destruction of natural resources, the degradation of biological systems, global pollution, and climate change, especially in developing countries. According to the World Health Organization, Tehran (the capital of Iran), as a developing city, is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants, namely particulate matter smaller than 10 microns, nitrogen oxides, and sulfur dioxide, were evaluated in Tehran using data mining techniques following the CRISP approach. Data from 21 air pollution measuring stations in different areas of Tehran were collected from 1999 to 2013. The commercial software Clementine was selected for this study. Using the software, Tehran was divided into distinct clusters in terms of the mentioned pollutants. As a data mining technique, clustering is usually used as a prologue to other analyses; therefore, the similarity of the clusters was evaluated in this study by analyzing local conditions, traffic behavior, and industrial activities. The results of this research can support decision-making systems, help managers improve performance and decision making, and assist in urban studies.

Keywords: data mining, clustering, air pollution, crisp approach

Procedia PDF Downloads 417
24831 Automated Detection of Targets and Retrieve the Corresponding Analytics Using Augmented Reality

Authors: Suvarna Kumar Gogula, Sandhya Devi Gogula, P. Chanakya

Abstract:

Augmented reality is defined as the collection of digital or computer-generated information, such as images, audio, video, and 3D models, overlaid on the real-time environment. Augmented reality can be thought of as a blend between the completely synthetic and the completely real. It has scope in a wide range of industries, such as manufacturing, retail, gaming, advertising, and tourism, and brings new dimensions to the modern digital world. By overlaying content on the real world, it helps users enhance their knowledge. In this application, we integrated augmented reality with data analytics and with the cloud, so that the virtual content is generated on the basis of the data present in the database. We used marker-based augmented reality, in which every marker is stored in the database with a corresponding unique ID. This application can be used in a wide range of industries for different business processes, but in this paper we mainly focus on the marketing industry, where it helps customers learn about products in the market, mainly their prices, customer feedback, quality, and other benefits. The application also provides better market strategy information for marketing managers, who obtain data about stocks, sales, customer response to the product, and so on. We also include reports of the feedback received from different people after the demonstration, and finally we present the future scope of augmented reality in different business processes through integration with new technologies such as cloud, big data, and artificial intelligence.

Keywords: augmented reality, data analytics, catch room, marketing and sales

Procedia PDF Downloads 221
24830 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation

Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves

Abstract:

Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.
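
The bi-gram step mentioned above can be sketched as follows. The forum posts are invented for illustration, the tokenizer is a deliberately simple assumption (lowercase alphabetic tokens, no stop-word removal or stemming), and the names `posts` and `bigrams` are hypothetical; the study's actual preprocessing is not described in the abstract.

```python
import re
from collections import Counter

# Invented forum posts standing in for scraped text.
posts = [
    "What layer feed do you give your hens?",
    "Switched my hens to layer feed last week.",
    "Best layer feed for young pigs?",
]

def bigrams(text):
    """Return adjacent word pairs from lowercased alphabetic tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))

# Count each bi-gram across all posts.
bigram_counts = Counter(pair for post in posts for pair in bigrams(post))

for pair, count in bigram_counts.most_common(3):
    print(" ".join(pair), count)
```

On real forum data, frequent bi-grams of this kind are a quick way to spot recurring themes before running a topic model, and human review of the output remains necessary, as the abstract stresses.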

Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP

Procedia PDF Downloads 79