Search results for: privacy preserving data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25082

Search results for: privacy preserving data mining

24902 Emotion Mining and Attribute Selection for Actionable Recommendations to Improve Customer Satisfaction

Authors: Jaishree Ranganathan, Poonam Rajurkar, Angelina A. Tzacheva, Zbigniew W. Ras

Abstract:

In today’s world, business often depends on the customer feedback and reviews. Sentiment analysis helps identify and extract information about the sentiment or emotion of the of the topic or document. Attribute selection is a challenging problem, especially with large datasets in actionable pattern mining algorithms. Action Rule Mining is one of the methods to discover actionable patterns from data. Action Rules are rules that help describe specific actions to be made in the form of conditions that help achieve the desired outcome. The rules help to change from any undesirable or negative state to a more desirable or positive state. In this paper, we present a Lexicon based weighted scheme approach to identify emotions from customer feedback data in the area of manufacturing business. Also, we use Rough sets and explore the attribute selection method for large scale datasets. Then we apply Actionable pattern mining to extract possible emotion change recommendations. This kind of recommendations help business analyst to improve their customer service which leads to customer satisfaction and increase sales revenue.

Keywords: actionable pattern discovery, attribute selection, business data, data mining, emotion

Procedia PDF Downloads 172
24901 Study on Security and Privacy Issues of Mobile Operating Systems Based on Malware Attacks

Authors: Huang Dennis, Aurelio Aziel, Burra Venkata Durga Kumar

Abstract:

Nowadays, smartphones and mobile operating systems have been popularly widespread in our daily lives. As people use smartphones, they tend to store more private and essential data on their devices, because of this it is very important to develop more secure mobile operating systems and cloud storage to secure the data. However, several factors can cause security risks in mobile operating systems such as malware, malicious app, phishing attacks, ransomware, and more, all of which can cause a big problem for users as they can access the user's private data. Those problems can cause data loss, financial loss, identity theft, and other serious consequences. Other than that, during the pandemic, people will use their mobile devices more and do all sorts of transactions online, which may lead to more victims of online scams and inexperienced users being the target. With the increase in attacks, researchers have been actively working to develop several countermeasures to enhance the security of operating systems. This study aims to provide an overview of the security and privacy issues in mobile operating systems, identifying the potential risk of operating systems, and the possible solutions. By examining these issues, we want to provide an easy understanding to users and researchers to improve knowledge and develop more secure mobile operating systems.

Keywords: mobile operating system, security, privacy, Malware

Procedia PDF Downloads 56
24900 Electronic Health Record System: A Perspective to Improve the Value of Services Rendered to Patients in Healthcare Organization in Rwanda, Case of CHUB and Hopital De Nemba

Authors: Mugabe Nzarama Gabriel

Abstract:

In Rwanda, many healthcare organizations are still using a paper based patients’ data record system although it still present weaknesses to share health patients’ information across different services when necessary. In developed countries, the EHR has been put in place to revolutionize the paper based record system but still the EHR has some challenges related to privacy, security, or interoperability. The purpose of this research was to assess the existing patients’ data record system in healthcare sector in Rwanda, see what an EHR can improve to the system in place and assess the acceptance of EHR as system which is interoperable, very secure and interoperable and see whether stakeholders are ready to adopt the system. The case based methodology was used and TAM theoretical framework to design the questionnaire for the survey. A judgmental sample across two cases, CHUB and Hopital de Nemba, has been selected and SPSS has been used for descriptive statistics. After a qualitative analysis, the findings showed that the paper based record is useful, gives complete information about the patient, protects the privacy of patients but it is still less secure and less interoperable. The respondents shown that they are ready to use the proposed EHR System and want it secure, capable of enforcing the privacy but still they are not all ready for the interoperability. A conclusion has been formulated; recommendations and further research have been proposed.

Keywords: EHR system, healthcare service, TAM, privacy, interoperability

Procedia PDF Downloads 242
24899 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)

Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim

Abstract:

This research was conducted at the Bolo primary health Care in Bima Regency. The purpose of the research is to find out the association pattern that is formed of medical record database from Bolo Primary health care’s patient. The data used is secondary data from medical records database PHC. Sequential pattern mining technique is the method that used to analysis. Transaction data generated from Patient_ID, Check_Date and diagnosis. Sequential Pattern Discovery Algorithms Using Equivalent Classes (SPADE) is one of the algorithm in sequential pattern mining, this algorithm find frequent sequences of data transaction, using vertical database and sequence join process. Results of the SPADE algorithm is frequent sequences that then used to form a rule. It technique is used to find the association pattern between items combination. Based on association rules sequential analysis with SPADE algorithm for minimum support 0,03 and minimum confidence 0,75 is gotten 3 association sequential pattern based on the sequence of patient_ID, check_Date and diagnosis data in the Bolo PHC.

Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm

Procedia PDF Downloads 374
24898 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website

Authors: Harpreet Singh

Abstract:

Finding relevant information on the World Wide Web is becoming highly challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. Web access log records all the requests for individual files that the users have requested from the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirement by analyzing the access log file of a website through a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar selling web site by analyzing their access log through Deep Log Analyzer tool. The results show that the maximum number of users is from the United States and that they use Opera 9.8 web browser and the Windows XP operating system.

Keywords: web usage mining, web mining, log file, data mining, deep log analyzer

Procedia PDF Downloads 224
24897 Personalize E-Learning System Based on Clustering and Sequence Pattern Mining Approach

Authors: H. S. Saini, K. Vijayalakshmi, Rishi Sayal

Abstract:

Network-based education has been growing rapidly in size and quality. Knowledge clustering becomes more important in personalized information retrieval for web-learning. A personalized-Learning service after the learners’ knowledge has been classified with clustering. Through automatic analysis of learners’ behaviors, their partition with similar data level and interests may be discovered so as to produce learners with contents that best match educational needs for collaborative learning. We present a specific mining tool and a recommender engine that we have integrated in the online learning in order to help the teacher to carry out the whole e-learning process. We propose to use sequential pattern mining algorithms to discover the most used path by the students and from this information can recommend links to the new students automatically meanwhile they browse in the course. We have Developed a specific author tool in order to help the teacher to apply all the data mining process. We tend to report on many experiments with real knowledge so as to indicate the quality of using both clustering and sequential pattern mining algorithms together for discovering personalized e-learning systems.

Keywords: e-learning, cluster, personalization, sequence, pattern

Procedia PDF Downloads 404
24896 To Ensure Maximum Voter Privacy in E-Voting Using Blockchain, Convolutional Neural Network, and Quantum Key Distribution

Authors: Bhaumik Tyagi, Mandeep Kaur, Kanika Singla

Abstract:

The advancement of blockchain has facilitated scholars to remodel e-voting systems for future generations. Server-side attacks like SQL injection attacks and DOS attacks are the most common attacks nowadays, where malicious codes are injected into the system through user input fields by illicit users, which leads to data leakage in the worst scenarios. Besides, quantum attacks are also there which manipulate the transactional data. In order to deal with all the above-mentioned attacks, integration of blockchain, convolutional neural network (CNN), and Quantum Key Distribution is done in this very research. The utilization of blockchain technology in e-voting applications is not a novel concept. But privacy and security issues are still there in a public and private blockchains. To solve this, the use of a hybrid blockchain is done in this research. This research proposed cryptographic signatures and blockchain algorithms to validate the origin and integrity of the votes. The convolutional neural network (CNN), a normalized version of the multilayer perceptron, is also applied in the system to analyze visual descriptions upon registration in a direction to enhance the privacy of voters and the e-voting system. Quantum Key Distribution is being implemented in order to secure a blockchain-based e-voting system from quantum attacks using quantum algorithms. Implementation of e-voting blockchain D-app and providing a proposed solution for the privacy of voters in e-voting using Blockchain, CNN, and Quantum Key Distribution is done.

Keywords: hybrid blockchain, secure e-voting system, convolutional neural networks, quantum key distribution, one-time pad

Procedia PDF Downloads 58
24895 Personal Data Protection: A Legal Framework for Health Law in Turkey

Authors: Veli Durmus, Mert Uydaci

Abstract:

Every patient who needs to get a medical treatment should share health-related personal data with healthcare providers. Therefore, personal health data plays an important role to make health decisions and identify health threats during every encounter between a patient and caregivers. In other words, health data can be defined as privacy and sensitive information which is protected by various health laws and regulations. In many cases, the data are an outcome of the confidential relationship between patients and their healthcare providers. Globally, almost all nations have own laws, regulations or rules in order to protect personal data. There is a variety of instruments that allow authorities to use the health data or to set the barriers data sharing across international borders. For instance, Directive 95/46/EC of the European Union (EU) (also known as EU Data Protection Directive) establishes harmonized rules in European borders. In addition, the General Data Protection Regulation (GDPR) will set further common principles in 2018. Because of close policy relationship with EU, this study provides not only information on regulations, directives but also how they play a role during the legislative process in Turkey. Even if the decision is controversial, the Board has recently stated that private or public healthcare institutions are responsible for the patient call system, for doctors to call people waiting outside a consultation room, to prevent unlawful processing of personal data and unlawful access to personal data during the treatment. In Turkey, vast majority private and public health organizations provide a service that ensures personal data (i.e. patient’s name and ID number) to call the patient. According to the Board’s decision, hospital or other healthcare institutions are obliged to take all necessary administrative precautions and provide technical support to protect patient privacy. However, this application does not effectively and efficiently performing in most health services. For this reason, it is important to draw a legal framework of personal health data by stating what is the main purpose of this regulation and how to deal with complicated issues on personal health data in Turkey. The research is descriptive on data protection law for health care setting in Turkey. Primary as well as secondary data has been used for the study. The primary data includes the information collected under current national and international regulations or law. Secondary data include publications, books, journals, empirical legal studies. Consequently, privacy and data protection regimes in health law show there are some obligations, principles and procedures which shall be binding upon natural or legal persons who process health-related personal data. A comparative approach presents there are significant differences in some EU member states due to different legal competencies, policies, and cultural factors. This selected study provides theoretical and practitioner implications by highlighting the need to illustrate the relationship between privacy and confidentiality in Personal Data Protection in Health Law. Furthermore, this paper would help to define the legal framework for the health law case studies on data protection and privacy.

Keywords: data protection, personal data, privacy, healthcare, health law

Procedia PDF Downloads 189
24894 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams

Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem

Abstract:

In this 'Information Explosion Era' analyzing data 'a critical commodity' and mining knowledge from vertically distributed data stream incurs huge communication cost. However, an effort to decrease the communication in the distributed environment has an adverse influence on the classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.

Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data

Procedia PDF Downloads 133
24893 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 328
24892 Factors of Social Network Platform Usage and Privacy Risk: A Unified Theory of Acceptance and Use of Technology2 Model

Authors: Wang Xue, Fan Liwei

Abstract:

The trust and use of social network platforms by users are instrumental factors that contribute to the platform’s sustainable development. Studying the influential factors of the use of social network platforms is beneficial for developing and maintaining a large user base. This study constructed an extended unified theory of acceptance and use of technology (UTAUT2) moderating model with perceived privacy risks to analyze the factors affecting the trust and use of social network platforms. 444 participants completed our 35 surveys, and we verified the survey results by structural equation model. Empirical results reveal the influencing factors that affect the trust and use of social network platforms, and the extended UTAUT2 model with perceived privacy risks increases the applicability of UTAUT2 in social network scenarios. Social networking platforms can increase their use rate by increasing the economics, functionality, entertainment, and privacy security of the platform.

Keywords: perceived privacy risk, social network, trust, use, UTAUT2 model

Procedia PDF Downloads 67
24891 Decision Support System in Air Pollution Using Data Mining

Authors: E. Fathallahi Aghdam, V. Hosseini

Abstract:

Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as destruction of natural resources, degradation of biological system, global pollution, and climate change in the world, especially in the developing countries. According to the World Health Organization, as a developing city, Tehran (capital of Iran) is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants including particulate matter less than 10 microns, nitrogen oxides, and sulfur dioxide were evaluated in Tehran using data mining techniques and through Crisp approach. The data from 21 air pollution measuring stations in different areas of Tehran were collected from 1999 to 2013. Commercial softwares Clementine was selected for this study. Tehran was divided into distinct clusters in terms of the mentioned pollutants using the software. As a data mining technique, clustering is usually used as a prologue for other analyses, therefore, the similarity of clusters was evaluated in this study through analyzing local conditions, traffic behavior, and industrial activities. In fact, the results of this research can support decision-making system, help managers improve the performance and decision making, and assist in urban studies.

Keywords: data mining, clustering, air pollution, crisp approach

Procedia PDF Downloads 405
24890 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation

Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves

Abstract:

Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.

Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP

Procedia PDF Downloads 71
24889 A Concept of Data Mining with XML Document

Authors: Akshay Agrawal, Anand K. Srivastava

Abstract:

The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.

Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering

Procedia PDF Downloads 351
24888 Association of Social Data as a Tool to Support Government Decision Making

Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias

Abstract:

Based on data on child labor, this work arises questions about how to understand and locate the factors that make up the child labor rates, and which properties are important to analyze these cases. Using data mining techniques to discover valid patterns on Brazilian social databases were evaluated data of child labor in the State of Tocantins (located north of Brazil with a territory of 277000 km2 and comprises 139 counties). This work aims to detect factors that are deterministic for the practice of child labor and their relationships with financial indicators, educational, regional and social, generating information that is not explicit in the government database, thus enabling better monitoring and updating policies for this purpose.

Keywords: social data, government decision making, association of social data, data mining

Procedia PDF Downloads 342
24887 Improving Security in Healthcare Applications Using Federated Learning System With Blockchain Technology

Authors: Aofan Liu, Qianqian Tan, Burra Venkata Durga Kumar

Abstract:

Data security is of the utmost importance in the healthcare area, as sensitive patient information is constantly sent around and analyzed by many different parties. The use of federated learning, which enables data to be evaluated locally on devices rather than being transferred to a central server, has emerged as a potential solution for protecting the privacy of user information. To protect against data breaches and unauthorized access, federated learning alone might not be adequate. In this context, the application of blockchain technology could provide the system extra protection. This study proposes a distributed federated learning system that is built on blockchain technology in order to enhance security in healthcare. This makes it possible for a wide variety of healthcare providers to work together on data analysis without raising concerns about the confidentiality of the data. The technical aspects of the system, including as the design and implementation of distributed learning algorithms, consensus mechanisms, and smart contracts, are also investigated as part of this process. The technique that was offered is a workable alternative that addresses concerns about the safety of healthcare while also fostering collaborative research and the interchange of data.

Keywords: data privacy, distributed system, federated learning, machine learning

Procedia PDF Downloads 87
24886 Evaluation of Classification Algorithms for Diagnosis of Asthma in Iranian Patients

Authors: Taha SamadSoltani, Peyman Rezaei Hachesu, Marjan GhaziSaeedi, Maryam Zolnoori

Abstract:

Introduction: Data mining defined as a process to find patterns and relationships along data in the database to build predictive models. Application of data mining extended in vast sectors such as the healthcare services. Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. This method applies various techniques and algorithms which have different accuracy and precision. The purpose of this study was to apply knowledge discovery and data mining techniques for the diagnosis of asthma based on patient symptoms and history. Method: Data mining includes several steps and decisions should be made by the user which starts by creation of an understanding of the scope and application of previous knowledge in this area and identifying KD process from the point of view of the stakeholders and finished by acting on discovered knowledge using knowledge conducting, integrating knowledge with other systems and knowledge documenting and reporting.in this study a stepwise methodology followed to achieve a logical outcome. Results: Sensitivity, Specifity and Accuracy of KNN, SVM, Naïve bayes, NN, Classification tree and CN2 algorithms and related similar studies was evaluated and ROC curves were plotted to show the performance of the system. Conclusion: The results show that we can accurately diagnose asthma, approximately ninety percent, based on the demographical and clinical data. The study also showed that the methods based on pattern discovery and data mining have a higher sensitivity compared to expert and knowledge-based systems. On the other hand, medical guidelines and evidence-based medicine should be base of diagnostics methods, therefore recommended to machine learning algorithms used in combination with knowledge-based algorithms.

Keywords: asthma, datamining, classification, machine learning

Procedia PDF Downloads 426
24885 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform

Authors: Reza Mohammadzadeh

Abstract:

The major challenges in geotechnical engineering in underground spaces arise from uncertainties and different probabilities. The collection, collation, and collaboration of existing data to incorporate them in analysis and design for given prospect evaluation would be a reliable, practical problem solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science which applies different techniques (e.g., Regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) on data to automatically learn and improve from them without being explicitly programmed and make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining based on a cloud system architecture has been designed. A new approach of risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) has been proposed in this model. Subsequently, the model workflow methodology stages have been described. In order to train data and LoK models deployment, an ML platform has been implemented. IBM Watson Studio, as a leading data science tool and data-driven cloud integration ML platform, is employed in this study. As a Use case, a data set of geotechnical hazards and risk assessment in underground coal mining were prepared to demonstrate the performance of the model, and accordingly, the results have been outlined.

Keywords: data model, geotechnical risks, machine learning, underground coal mining

Procedia PDF Downloads 247
24884 Performance Evaluation of Production Schedules Based on Process Mining

Authors: Kwan Hee Han

Abstract:

External environment of enterprise is rapidly changing majorly by global competition, cost reduction pressures, and new technology. In these situations, production scheduling function plays a critical role to meet customer requirements and to attain the goal of operational efficiency. It deals with short-term decision making in the production process of the whole supply chain. The major task of production scheduling is to seek a balance between customer orders and limited resources. In manufacturing companies, this task is so difficult because it should efficiently utilize resource capacity under the careful consideration of many interacting constraints. At present, many computerized software solutions have been utilized in many enterprises to generate a realistic production schedule to overcome the complexity of schedule generation. However, most production scheduling systems do not provide sufficient information about the validity of the generated schedule except limited statistics. Process mining only recently emerged as a sub-discipline of both data mining and business process management. Process mining techniques enable the useful analysis of a wide variety of processes such as process discovery, conformance checking, and bottleneck analysis. In this study, the performance of generated production schedule is evaluated by mining event log data of production scheduling software system by using the process mining techniques since every software system generates event logs for the further use such as security investigation, auditing and error bugging. An application of process mining approach is proposed for the validation of the goodness of production schedule generated by scheduling software systems in this study. By using process mining techniques, major evaluation criteria such as utilization of workstation, existence of bottleneck workstations, critical process route patterns, and work load balance of each machine over time are measured, and finally, the goodness of production schedule is evaluated. By using the proposed process mining approach for evaluating the performance of generated production schedule, the quality of production schedule of manufacturing enterprises can be improved.

Keywords: data mining, event log, process mining, production scheduling

Procedia PDF Downloads 250
24883 Data Mining Approach: Classification Model Evaluation

Authors: Lubabatu Sada Sodangi

Abstract:

The rapid growth in exchange and accessibility of information via the internet makes many organisations acquire data on their own operation. The aim of data mining is to analyse the different behaviour of a dataset using observation. Although, the subset of the dataset being analysed may not display all the behaviours and relationships of the entire data and, therefore, may not represent other parts that exist in the dataset. There is a range of techniques used in data mining to determine the hidden or unknown information in datasets. In this paper, the performance of two algorithms Chi-Square Automatic Interaction Detection (CHAID) and multilayer perceptron (MLP) would be matched using an Adult dataset to find out the percentage of an/the adults that earn > 50k and those that earn <= 50k per year. The two algorithms were studied and compared using IBM SPSS statistics software. The result for CHAID shows that the most important predictors are relationship and education. The algorithm shows that those are married (husband) and have qualification: Bachelor, Masters, Doctorate or Prof-school whose their age is > 41<57 earn > 50k. Also, multilayer perceptron displays marital status and capital gain as the most important predictors of the income. It also shows that individuals that their capital gain is less than 6,849 and are single, separated or widow, earn <= 50K, whereas individuals with their capital gain is > 6,849, work > 35 hrs/wk, and > 27yrs their income will be > 50k. By comparing the two algorithms, it is observed that both algorithms are reliable but there is strong reliability in CHAID which clearly shows that relation and education contribute to the prediction as displayed in the data visualisation.

Keywords: data mining, CHAID, multi-layer perceptron, SPSS, Adult dataset

Procedia PDF Downloads 359
24882 Regulating Issues concerning Data Protection in Cloud Computing: Developing a Saudi Approach

Authors: Jumana Majdi Qutub

Abstract:

Rationale: Cloud computing has rapidly developed the past few years. Because of the importance of providing protection for personal data used in cloud computing, the role of data protection in promoting trust and confidence in users’ data has become an important policy priority. This research examines key regulatory challenges rose by the growing use and importance of cloud computing with focusing on protection of individuals personal data. Methodology: Describing and analyzing governance challenges facing policymakers and industry in Saudi Arabia, with an account of anticipated governance responses. The aim of the research is to describe and define the regulatory challenges on cloud computing for policy making in Saudi Arabia and comparing it with potential complied issues rose in respect of transported data to EU member state. In addition, it discusses information privacy issues. Finally, the research proposes policy recommendation that would resolve concerns surrounds the privacy and effectiveness of clouds computing frameworks for data protection. Results: There are still no clear regulation in Saudi Arabia specialized in legalizing cloud computing and specialty regulations in transferring data internationally and locally. Decision makers need to review the applicable law in Saudi Arabia that protect information in cloud computing. This should be from an international and a local view in order to identify all requirements surrounding this area. It is important to educate cloud computing users about their information value and rights before putting it in the cloud to avoid further legal complications, such as making an educational program to prevent giving personal information to a bank employee. Therefore, with many kinds of cloud computing services, it is important to have it covered by the law in all aspects.

Keywords: cloud computing, cyber crime, data protection, privacy

Procedia PDF Downloads 234
24881 Exploring Legal Liabilities of Mining Companies for Human Rights Abuses: Case Study of Mongolian Mine

Authors: Azzaya Enkhjargal

Abstract:

Context: The mining industry has a long history of human rights abuses, including forced labor, environmental pollution, and displacement of communities. In recent years, there has been growing international pressure to hold mining companies accountable for these abuses. Research Aim: This study explores the legal liabilities of mining companies for human rights abuses. The study specifically examines the case of Erdenet Mining Corporation (EMC), a large mining company in Mongolia that has been accused of human rights abuses. Methodology: The study used a mixed-methods approach, which included a review of legal literature, interviews with community members and NGOs, and a case study of EMC. Findings: The study found that mining companies can be held liable for human rights abuses under a variety of regulatory frameworks, including soft law and self-regulatory instruments in the mining industry, international law, national law, and corporate law. The study also found that there are a number of challenges to holding mining companies accountable for human rights abuses, including the lack of effective enforcement mechanisms and the difficulty of proving causation. Theoretical Importance: The study contributes to the growing body of literature on the legal liabilities of mining companies for human rights abuses. The study also provides insights into the challenges of holding mining companies accountable for human rights abuses. Data Collection: The data for the study was collected through a variety of methods, including a review of legal literature, interviews with community members and NGOs, and a case study of EMC. Analysis Procedures: The data was analyzed using a variety of methods, including content analysis, thematic analysis, and case study analysis. Conclusion: The study concludes that mining companies can be held liable for human rights abuses under a variety of legal and regulatory frameworks. There are positive developments in ensuring greater accountability and protection of affected communities and the environment in countries with a strong economy. Regrettably, access to avenues of redress is reasonably low in less developed countries, where the governments have not implemented a robust mechanism to enforce liability requirements in the mining industry. The study recommends that governments and mining companies take more ambitious steps to enhance corporate accountability.

Keywords: human rights, human rights abuses, ESG, litigation, Erdenet Mining Corporation, corporate social responsibility, soft law, self-regulation, mining industry, parent company liability, sustainability, environment, UN

Procedia PDF Downloads 55
24880 Generating Insights from Data Using a Hybrid Approach

Authors: Allmin Susaiyah, Aki Härmä, Milan Petković

Abstract:

Automatic generation of insights from data using insight mining systems (IMS) is useful in many applications, such as personal health tracking, patient monitoring, and business process management. Existing IMS face challenges in controlling insight extraction, scaling to large databases, and generalising to unseen domains. In this work, we propose a hybrid approach consisting of rule-based and neural components for generating insights from data while overcoming the aforementioned challenges. Firstly, a rule-based data 2CNL component is used to extract statistically significant insights from data and represent them in a controlled natural language (CNL). Secondly, a BERTSum-based CNL2NL component is used to convert these CNLs into natural language texts. We improve the model using task-specific and domain-specific fine-tuning. Our approach has been evaluated using statistical techniques and standard evaluation metrics. We overcame the aforementioned challenges and observed significant improvement with domain-specific fine-tuning.

Keywords: data mining, insight mining, natural language generation, pre-trained language models

Procedia PDF Downloads 81
24879 Assessing Supply Chain Performance through Data Mining Techniques: A Case of Automotive Industry

Authors: Emin Gundogar, Burak Erkayman, Nusret Sazak

Abstract:

Providing effective management performance through the whole supply chain is critical issue and hard to applicate. The proper evaluation of integrated data may conclude with accurate information. Analysing the supply chain data through OLAP (On-Line Analytical Processing) technologies may provide multi-angle view of the work and consolidation. In this study, association rules and classification techniques are applied to measure the supply chain performance metrics of an automotive manufacturer in Turkey. Main criteria and important rules are determined. The comparison of the results of the algorithms is presented.

Keywords: supply chain performance, performance measurement, data mining, automotive

Procedia PDF Downloads 485
24878 Using Data Mining Technique for Scholarship Disbursement

Authors: J. K. Alhassan, S. A. Lawal

Abstract:

This work is on decision tree-based classification for the disbursement of scholarship. Tree-based data mining classification technique is used in other to determine the generic rule to be used to disburse the scholarship. The system based on the defined rules from the tree is able to determine the class (status) to which an applicant shall belong whether Granted or Not Granted. The applicants that fall to the class of granted denote a successful acquirement of scholarship while those in not granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from tree-based classification was also developed. The tree-based classification is adopted because of its efficiency, effectiveness, and easy to comprehend features. The system was tested with the data of National Information Technology Development Agency (NITDA) Abuja, a Parastatal of Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found working according to the specification. It is therefore recommended for all scholarship disbursement organizations.

Keywords: classification, data mining, decision tree, scholarship

Procedia PDF Downloads 347
24877 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey

Abstract:

Text mining techniques are generally applied for classifying the text, finding fuzzy relations and structures in data sets. This research provides plenty text mining capabilities. One common application is text classification and event extraction, which encompass deducing specific knowledge concerning incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism, which is based on a text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and branch and bound method to simplify the texts.

Keywords: extraction, max-prod, fuzzy relations, text mining, memberships, classification, memberships, classification

Procedia PDF Downloads 551
24876 Implications of Circular Economy on Users Data Privacy: A Case Study on Android Smartphones Second-Hand Market

Authors: Mariia Khramova, Sergio Martinez, Duc Nguyen

Abstract:

Modern electronic devices, particularly smartphones, are characterised by extremely high environmental footprint and short product lifecycle. Every year manufacturers release new models with even more superior performance, which pushes the customers towards new purchases. As a result, millions of devices are being accumulated in the urban mine. To tackle these challenges the concept of circular economy has been introduced to promote repair, reuse and recycle of electronics. In this case, electronic devices, that previously ended up in landfills or households, are getting the second life, therefore, reducing the demand for new raw materials. Smartphone reuse is gradually gaining wider adoption partly due to the price increase of flagship models, consequently, boosting circular economy implementation. However, along with reuse of communication device, circular economy approach needs to ensure the data of the previous user have not been 'reused' together with a device. This is especially important since modern smartphones are comparable with computers in terms of performance and amount of data stored. These data vary from pictures, videos, call logs to social security numbers, passport and credit card details, from personal information to corporate confidential data. To assess how well the data privacy requirements are followed on smartphones second-hand market, a sample of 100 Android smartphones has been purchased from IT Asset Disposition (ITAD) facilities responsible for data erasure and resell. Although devices should not have stored any user data by the time they leave ITAD, it has been possible to retrieve the data from 19% of the sample. Applied techniques varied from manual device inspection to sophisticated equipment and tools. These findings indicate significant barrier in implementation of circular economy and a limitation of smartphone reuse. Therefore, in order to motivate the users to donate or sell their old devices and make electronic use more sustainable, data privacy on second-hand smartphone market should be significantly improved. Presented research has been carried out in the framework of sustainablySMART project, which is part of Horizon 2020 EU Framework Programme for Research and Innovation.

Keywords: android, circular economy, data privacy, second-hand phones

Procedia PDF Downloads 109
24875 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang

Abstract:

Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.

Keywords: text mining, Twitter, topic model, sentiment analysis

Procedia PDF Downloads 154
24874 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which covers the variety of figure proportions in both height and girth. 3,000 data has been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from some states of India to produce the sizing system suitable for clothing manufacture and retailing. This data is used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from a large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: anthropometric data, data mining, decision tree, garments manufacturing, sizing systems, ready-made garments

Procedia PDF Downloads 115
24873 Recognizing Customer Preferences Using Review Documents: A Hybrid Text and Data Mining Approach

Authors: Oshin Anand, Atanu Rakshit

Abstract:

The vast increment in the e-commerce ventures makes this area a prominent research stream. Besides several quantified parameters, the textual content of reviews is a storehouse of many information that can educate companies and help them earn profit. This study is an attempt in this direction. The article attempts to categorize data based on a computed metric that quantifies the influencing capacity of reviews rendering two categories of high and low influential reviews. Further, each of these document is studied to conclude several product feature categories. Each of these categories along with the computed metric is converted to linguistic identifiers and are used in an association mining model. The article makes a novel attempt to combine feature attraction with quantified metric to categorize review text and finally provide frequent patterns that depict customer preferences. Frequent mentions in a highly influential score depict customer likes or preferred features in the product whereas prominent pattern in low influencing reviews highlights what is not important for customers. This is achieved using a hybrid approach of text mining for feature and term extraction, sentiment analysis, multicriteria decision-making technique and association mining model.

Keywords: association mining, customer preference, frequent pattern, online reviews, text mining

Procedia PDF Downloads 366