Search results for: data mining applications and discovery
30505 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering
Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel
Abstract:
Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Because classification applies to all kinds of data, it is used in nearly every field of modern life. Classification helps us group items according to the features judged interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, for detecting spam e-mail. This choice is motivated by the fact that the Naive Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The classifiers' parameters are set to maximize the true positive rate and minimize the false positive rate. The experimental results report classification accuracy and a cost analysis to guide the choice of an optimal classifier for spam detection. We also identify the number of attributes that gives a good trade-off between the number of attributes and classification accuracy.
Keywords: classification, data mining, spam filtering, naive bayes, decision tree
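To make the probability-calculus side of the comparison concrete, here is a minimal sketch of a multinomial Naive Bayes spam filter with Laplace (add-one) smoothing over whitespace-separated words. This is one common Naive Bayes variant, assumed here for illustration: the abstract does not say which variant, features, or parameter settings the authors used. The class name NaiveBayesSpam and the toy messages are hypothetical, and ADTree is not shown.

```python
import math
from collections import Counter


class NaiveBayesSpam:
    """Multinomial Naive Bayes over word counts with add-one smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def log_score(self, text, label):
        # Unnormalized log posterior: log P(label) + sum of log P(word | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product
            score += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


clf = NaiveBayesSpam().fit(
    ["win cash now", "cheap pills win", "meeting at noon", "lunch with the team"],
    ["spam", "spam", "ham", "ham"],
)
print(clf.predict("win cheap cash"))
print(clf.predict("team meeting at noon"))
```

Working in log space avoids numerical underflow when messages contain many words. Tuning for a high true positive rate and a low false positive rate, as the abstract describes, would typically mean thresholding the difference between the two scores rather than taking the plain maximum.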
30504 Automated Testing to Detect Instance Data Loss in Android Applications
Authors: Anusha Konduru, Zhiyong Shan, Preethi Santhanam, Vinod Namboodiri, Rajiv Bagai
Abstract:
The number of mobile applications is growing significantly, each addressing the requirements of many users. However, rapid development and frequent enhancements introduce many underlying defects. Android apps create and handle a large variety of 'instance' data that has to persist across runs, such as the current navigation route, workout results, antivirus settings, or game state. Due to the nature of Android, an app can be paused, sent into the background, or killed at any time. If the instance data is not saved and restored between runs, then in addition to data loss, partially saved or corrupted data can crash the app upon resume or restart. However, it is difficult for the programmer to test this issue manually for all activities. As a result, data entered by the user may not be saved when an interruption occurs. This degrades the user experience, because the user must re-enter the information after each interruption. Automated testing to detect such data loss is therefore important for improving the user experience. This research proposes a tool, DroidDL, a data loss detector for Android, which detects instance data loss in a given Android application. We tested 395 applications and found 12 with the data loss issue. The approach proved highly accurate and reliable in finding apps with this defect, and Android developers can use it to avoid such errors.
Keywords: Android, automated testing, activity, data loss
30503 Treatment of Cyanide Effluents with Platinum Impregnated on Mg-Al Layered Hydroxides
Authors: María R. Contreras, Diana Endara
Abstract:
Cyanide leaching is the most widely used technology in the gold mining industry, and it produces large amounts of effluent requiring treatment. In Ecuador, the gold mining industry has grown, causing significant environmental impacts due to heavy use of cyanide: an estimated 10 g of extracted gold generates 7,000 liters of water contaminated with 300 mg/L of free cyanide. The most common treatments today use peroxodisulfuric acid, ozonation, H₂O₂, and other reagents, which are expensive and have other disadvantages. Other methods developed to treat this contaminant include heterogeneous catalysis. Layered double hydroxides (LDHs) have received much attention due to their wide range of applications, for example as catalyst supports. In this study, therefore, an Mg-Al LDH was synthesized by coprecipitation, and platinum was then impregnated onto it to enhance its catalytic activity. Two impregnation methods were used: incipient wetness impregnation, and continuous agitation of the LDH in contact with a chloroplatinic acid solution for 24 h. The impregnated support was characterized by X-ray diffraction, FTIR, and SEM. Finally, cyanide ion oxidation was performed on synthetic sodium cyanide (NaCN) solutions with an initial concentration of 500 mg/L at pH 10.5 and an air flow of 180 NL/h. After 8 h of treatment, 80% of the cyanide ion was oxidized.
Keywords: catalysis, cyanide, LDHs, mining
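The figures quoted above imply a large cyanide load per gram of gold. The short calculation below works it out and also computes the residual concentration after the 8 h test. It assumes that the 500 mg/L starting value and the 80% oxidation figure refer to the same cyanide quantity; the abstract does not say whether 500 mg/L is measured as NaCN or as CN⁻. Variable names are illustrative.

```python
# Figures quoted in the abstract
gold_g = 10                # grams of gold extracted
water_l = 7000             # liters of contaminated water per 10 g of gold
free_cn_mg_per_l = 300     # free cyanide concentration, mg/L

# Total free cyanide carried by that water, converted mg -> kg
cn_load_kg = water_l * free_cn_mg_per_l / 1e6
cn_per_gram_gold_g = cn_load_kg * 1000 / gold_g

# Assumption: 80% oxidation applies to the 500 mg/L starting concentration
initial_mg_per_l = 500
oxidized_fraction = 0.80
residual_mg_per_l = initial_mg_per_l * (1 - oxidized_fraction)

print(f"{cn_load_kg:.1f} kg free cyanide per {gold_g} g gold "
      f"({cn_per_gram_gold_g:.0f} g cyanide per g gold)")
print(f"residual cyanide after 8 h: {residual_mg_per_l:.0f} mg/L")
```

So each 10 g of gold corresponds to roughly 2.1 kg of free cyanide in the effluent, and an 80% conversion still leaves about 100 mg/L in the test solution.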
30502 Optimization of Air Pollution Control Model for Mining
Authors: Zunaira Asif, Zhi Chen
Abstract:
Sustainable air quality management is recognized as one of the most serious environmental concerns in mining regions. Mining operations emit various types of pollutants that have significant impacts on the environment. This study presents a stochastic control strategy, developing an air pollution control model to reach a cost-effective solution. The optimization uses linear programming, with an objective function and multiple constraints, to predict treatment cost. The constraints focus mainly on two factors: metal production must not exceed the available resources, and air quality must meet the standard criteria for each pollutant. The applicability of the model is explored through a case study of an open-pit metal mine in Utah, USA. The method also uses meteorological data as a dispersion transfer function to reflect local conditions. Uncertainty in the meteorological conditions is handled probabilistically by Monte Carlo simulation. Reasonable results were obtained for selecting the optimal treatment technology for PM2.5, PM10, NOx, and SO2. A further comparison shows that a baghouse is the least-cost option for particulate matter compared with electrostatic precipitators and wet scrubbers, whereas non-selective catalytic reduction and dry flue-gas desulfurization are suitable for NOx and SO2 reduction, respectively. Thus, the model can help planners reduce these pollutants at marginal cost by suggesting pollution control devices, while accounting for dynamic meteorological conditions and mining activities.
Keywords: air pollution, linear programming, mining, optimization, treatment technologies
30501 Modelling the Education Supply Chain with Network Data Envelopment Analysis
Authors: Sourour Ramzi, Claudia Sarrico
Abstract:
Little work has applied network data envelopment analysis (DEA) to education, and no one has attempted to model the whole education supply chain with network DEA. The contribution of the present paper is therefore a model for measuring the efficiency of education supply chains using network DEA. First, we use a general survey of DEA to identify emergent research themes, and we focus on the theme of network DEA. Second, we draw on surveys of two-stage DEA models and network DEA to review the state of the art in network DEA, particularly as applied to supply chain management. Third, we use a survey of DEA applications to identify the most influential papers on DEA in education, establishing the state of the art for DEA applications in education in general, and for network DEA in education in particular. Finally, we propose a model for measuring the performance of the education supply chains of different education systems (for instance, countries or states within a country), and we apply this model to empirical data.
Keywords: supply chain, education, data envelopment analysis, network DEA
30500 High-Speed Electrical Drives and Applications: A Review
Authors: Vaishnavi Patil, K. M. Kurundkar
Abstract:
Electrical drives play a vital role in industrial development and applications. Drives are an indispensable part of many fields, including industrial, commercial, and domestic applications. Advances in materials technology, power electronic devices, and related applications have focused the attention of industry and researchers on high-speed electrical drives. Numerous articles have surveyed the use of electrical machines and various converters in high-speed applications; the choice depends on the application under study. This paper aims to highlight high-speed applications, the main challenges, and some applications of electrical drives in the field.
Keywords: high-speed, electrical machines, drives, applications
30499 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink
Authors: Sanjay Rathee, Arti Kashyap
Abstract:
Extracting useful information from large datasets is one of the most important research problems, and association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent patterns) is the most important part of association rule mining. Many algorithms exist for finding frequent patterns, but the Apriori algorithm remains a preferred choice due to its ease of implementation and its natural amenability to parallelization. Many single-machine Apriori variants exist, but the massive amount of data available today exceeds the capacity of a single machine. To meet the demands of this ever-growing data, a multi-machine Apriori algorithm is therefore needed. For such distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks that use the MapReduce approach for distributed storage and processing of huge datasets on clusters built from commodity hardware. However, the heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms for parallel computing have been developed in recent years. Among them, two platforms, Spark and Flink, have attracted a lot of attention because of their built-in support for distributed computation. We earlier proposed a Reduced-Apriori algorithm on Spark that outperforms parallel Apriori, first because it uses Spark and second because of our improvement to standard Apriori. This work is a natural sequel: it implements, tests, and benchmarks Apriori, Reduced-Apriori, and our new algorithm ReducedAll-Apriori on Apache Flink and compares them with the Spark implementations.
Flink, a streaming dataflow engine, overcomes the disk I/O bottlenecks of MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelined structure allows the next iteration to start as soon as partial results of the previous iteration are available, so there is no need to wait for all reducers to finish before starting the next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Apriori and RA-Apriori algorithms on Flink.
Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining
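For readers unfamiliar with the iterative structure that makes Apriori costly on Hadoop, here is a single-machine sketch of the standard level-wise Apriori: join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, then count support in one pass over the data. This is the textbook algorithm, not RA-Apriori or the Flink or Spark implementations, whose details the abstract does not give. The function name and sample transactions are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all itemsets that occur
    in at least `min_support` transactions (standard level-wise Apriori)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets whose union has exactly k items
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset of a candidate must be frequent
                if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Support counting: one full pass over the data per level. This is the
        # per-iteration pass that costs a round of disk I/O on Hadoop.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, count in sorted(apriori(baskets, 3).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

Each pass through the while loop depends on the previous level's frequent sets. That dependency is why an engine that can start the next level from partial results, as described above for Flink, can reduce waiting between iterations.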
30498 Bankruptcy Prediction Analysis on Mining Sector Companies in Indonesia
Authors: Devina Aprilia Gunawan, Tasya Aspiranti, Inugrah Ratia Pratiwi
Abstract:
This research aims to classify mining sector companies using Altman's Z-score model, to analyze the model's financial ratios in order to describe the financial condition of mining sector companies in Indonesia and their future viability, and to determine the partial and simultaneous impact of each financial ratio in the model, namely (WC/TA), (RE/TA), (EBIT/TA), (MVE/TL), and (S/TA), on the financial condition represented by the Z-score itself. Of the 38 mining sector companies listed on the Indonesia Stock Exchange (IDX), 28 were selected as the research sample according to purposive sampling criteria. The results show that in each year of the three-year research period (2010-2012), fewer than half of the sample companies were predicted to be healthy. The multiple regression analysis shows that all research hypotheses are accepted, meaning that (WC/TA), (RE/TA), (EBIT/TA), (MVE/TL), and (S/TA), both partially and simultaneously, affect the company's financial condition.
Keywords: Altman's Z-score model, financial condition, mining companies, Indonesia
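The five ratios listed above are those of Altman's original (1968) Z-score for publicly traded firms. The sketch below combines them with the published coefficients 1.2, 1.4, 3.3, 0.6, and 1.0, and it uses the conventional cut-offs of 1.81 and 2.99. The abstract does not state which coefficient set or cut-offs the authors applied, so treat both as assumptions. The balance-sheet figures in the example are hypothetical.

```python
def altman_z(wc, retained, ebit, mve, sales, ta, tl):
    """Original Altman Z-score built from the five ratios named in the abstract."""
    x1 = wc / ta         # working capital / total assets
    x2 = retained / ta   # retained earnings / total assets
    x3 = ebit / ta       # earnings before interest and taxes / total assets
    x4 = mve / tl        # market value of equity / total liabilities
    x5 = sales / ta      # sales / total assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Conventional interpretation bands for the original model."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


# Hypothetical company (all figures in the same currency unit)
z = altman_z(wc=100, retained=200, ebit=200, mve=600, sales=1200, ta=1000, tl=400)
print(round(z, 2), zone(z))
```

In this example, the contributions are 0.12 + 0.28 + 0.66 + 0.90 + 1.20, giving Z = 3.16 and placing the company in the "safe" band. A regression of Z on the five ratios, as in the study, estimates how strongly each ratio moves this score in a given sample.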
30497 A Discovery on the Symmetrical Pattern of Mirror Primes in P²: Applications in the Formal Proof of the Goldbach Conjecture
Authors: Yingxu Wang
Abstract:
This work discovers the base-6 structure and properties of mirror primes as a step toward proving the Goldbach conjecture. The paper reveals a fundamental pattern of pairs of mirror primes adjacent to any even number nₑ > 2, at symmetrical distances on both sides, determined by a methodology called Mirror Prime Decomposition (MPD). MPD leads to a formal proof of the Goldbach conjecture, which is stated to hold because any pivot even number nₑ > 2 is half the sum of at least one adjacent pair of primes. This work has not only revealed the analytic pattern of base-6 primes but also proven the validity of the Goldbach conjecture for all even numbers.
Keywords: number theory, primes, mirror primes, double recursive patterns, Goldbach conjecture, formal proof, mirror-prime decomposition, applications
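The decomposition described above can be illustrated for small values. For a pivot n, the sketch searches for the smallest offset d such that n - d and n + d are both prime, so that n = ((n - d) + (n + d)) / 2, or equivalently 2n = (n - d) + (n + d). The function names are ours, and the search is a plain trial-division sketch, not the paper's base-6 MPD method. Checking finitely many cases only illustrates the pattern and does not prove anything about all even numbers.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def mirror_pair(n):
    """Return the primes (n - d, n + d) with the smallest offset d >= 0, or None."""
    for d in range(0, n - 1):  # keeps n - d >= 2
        if is_prime(n - d) and is_prime(n + d):
            return (n - d, n + d)
    return None


for n in (4, 6, 8, 10, 12):
    p, q = mirror_pair(n)
    print(f"{n} = ({p} + {q}) / 2")
```

For example, 8 has no pair at offset 1 (9 is composite) but does at offset 3, giving 5 and 11.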
30496 Estimation of Coefficients of Ridge and Principal Components Regressions with Multicollinear Data
Authors: Rajeshwar Singh
Abstract:
Multicollinearity is common when several explanatory variables are handled simultaneously, because they exhibit linear relationships among themselves. This makes it difficult to understand the impact of the explanatory variables on the dependent variable, and the least squares method then gives imprecise estimates. In this case, it is advisable to detect multicollinearity before proceeding further. Ridge regression reduces its effect, while principal components regression gives good estimates in this situation. This paper discusses the well-known techniques of ridge and principal components regression and applies both to estimate the coefficients. In addition, the paper discusses the conflicting claims about the discovery of ridge regression, based on available documents.
Keywords: conflicting claim on credit of discovery of ridge regression, multicollinearity, principal components and ridge regressions, variance inflation factor
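To show what ridge regression does to collinear predictors, here is a minimal sketch restricted to two centered predictors. This keeps the system (X'X + λI)b = X'y small enough to solve in closed form. It also computes the variance inflation factor from the keywords, which for two predictors is 1 / (1 - r²). Principal components regression is not shown. The data and the value λ = 10 are arbitrary illustrative choices, not the paper's.

```python
def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients (b1, b2) for y ~ x1 + x2 on centered data.
    Solves (X'X + lam*I) b = X'y; lam = 0 gives ordinary least squares."""
    a, b, c = center(x1), center(x2), center(y)
    s11 = sum(u * u for u in a)
    s22 = sum(u * u for u in b)
    s12 = sum(u * w for u, w in zip(a, b))
    t1 = sum(u * w for u, w in zip(a, c))
    t2 = sum(u * w for u, w in zip(b, c))
    # Solve the 2x2 system [[s11+lam, s12], [s12, s22+lam]] b = [t1, t2] by Cramer's rule
    det = (s11 + lam) * (s22 + lam) - s12 * s12
    b1 = ((s22 + lam) * t1 - s12 * t2) / det
    b2 = ((s11 + lam) * t2 - s12 * t1) / det
    return b1, b2


def vif_two_predictors(x1, x2):
    """With two predictors, VIF = 1 / (1 - r^2), where r is their correlation."""
    a, b = center(x1), center(x2)
    r = sum(u * w for u, w in zip(a, b)) / (sum(u * u for u in a) * sum(w * w for w in b)) ** 0.5
    return 1.0 / (1.0 - r * r)


x1 = [1, 2, 3, 4, 5]
x2 = [1, 3, 2, 5, 4]          # correlated with x1 (r = 0.8)
y = [5, 13, 12, 23, 22]       # exactly 2*x1 + 3*x2

print("VIF:", round(vif_two_predictors(x1, x2), 3))
print("OLS:", ridge_two_predictors(x1, x2, y, 0.0))
print("ridge (lam=10):", ridge_two_predictors(x1, x2, y, 10.0))
```

Adding λ to the diagonal keeps the system well conditioned as r approaches 1, at the cost of shrinking the coefficients toward zero. Here they move from (2, 3) to about (1.52, 1.69).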
30495 A Study on Big Data Analytics, Applications and Challenges
Authors: Chhavi Rana
Abstract:
The aim of this paper is to highlight existing developments in the field of big data analytics. Applications such as bioinformatics, smart infrastructure projects, healthcare, and business intelligence involve voluminous and incremental data that are hard to organise and analyse, and that can be handled with the frameworks and models of this field. An organization's decision-making strategy can be enhanced by using big data analytics and applying various machine learning techniques and statistical tools to such complex data sets, which in turn benefits society. This paper reviews the current state of the art in this field as well as the different application domains of big data analytics. It also describes various frameworks for analysis using different machine learning techniques. Finally, the paper concludes by stating the challenges and issues raised in existing research.
Keywords: big data, big data analytics, machine learning, review
30493 A General Framework for Measuring the Internal Fraud Risk of an Enterprise Resource Planning System
Authors: Imran Dayan, Ashiqul Khan
Abstract:
Internal corporate fraud, that is, fraud carried out by a company's internal stakeholders, harms the well-being of the organisation just as external fraud does. Even if such an act is carried out for the corporation's short-term benefit, it ultimately harms the entity in the long run. Internal fraud often exploits deviations from usual business processes. In the modern managerial context, business processes are the lifeblood of a company; they are developed and fine-tuned over time as a corporation grows through its life stages. Modern corporations have built technological innovations into their business processes, as the central place of Enterprise Resource Planning (ERP) systems in those processes shows. Since ERP systems record a huge amount of data in their event logs, the logs are a treasure trove for anyone trying to detect fraudulent activity hidden within day-to-day business operations and processes. This research uses the ERP systems already in place within corporations to assess the likelihood of future internal fraud, by developing a framework that measures fraud risk with process mining techniques and thereby identifies risky designs and loose ends in these business processes. The framework not only identifies existing cases of fraud recorded in the event log but also signals the overall riskiness of particular business processes. This draws attention to processes that should be redesigned to reduce the chance of future internal fraud, while improving internal control within the organisation.
The research adds value by applying process mining concepts to modern business process records, namely ERP event logs, and by developing a framework that should help internal stakeholders strengthen internal control, as well as give external auditors a useful tool when fraud is suspected. The research demonstrates its usefulness through several case studies of large corporations with complex business processes and an ERP system in place.
Keywords: enterprise resource planning, fraud risk framework, internal corporate fraud, process mining
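The abstract does not specify the framework's process mining techniques. As a rough illustration of the kind of building blocks such a framework can use, the sketch below groups an ERP-style event log into per-case traces, counts the directly-follows relation between activities (a basic process mining primitive), and flags cases where an activity happens before a required control step. The event log format (time-ordered case id and activity pairs), the activity names, and the rule itself are hypothetical.

```python
from collections import Counter, defaultdict


def group_traces(event_log):
    """Group time-ordered (case_id, activity) events into one trace per case."""
    traces = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)
    return dict(traces)


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def flag_missing_prerequisite(traces, activity, prerequisite):
    """Return the ids of cases where `activity` occurs before any `prerequisite`."""
    flagged = []
    for case_id, trace in traces.items():
        seen_prerequisite = False
        for act in trace:
            if act == prerequisite:
                seen_prerequisite = True
            elif act == activity and not seen_prerequisite:
                flagged.append(case_id)
                break
    return flagged


# Hypothetical purchase-to-pay log
log = [
    ("c1", "create_po"), ("c1", "approve_po"), ("c1", "pay_invoice"),
    ("c2", "create_po"), ("c2", "pay_invoice"), ("c2", "approve_po"),
]
traces = group_traces(log)
print(directly_follows(traces))
print(flag_missing_prerequisite(traces, "pay_invoice", "approve_po"))
```

Rare or rule-breaking edges in the directly-follows counts, such as payment followed by approval here, are the kind of "loose ends" a risk framework might score at the process level rather than case by case.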
30492 The Need of Sustainable Mining: Communities, Government and Legal Mining in Central Andes of Peru
Authors: Melissa R. Quispe-Zuniga, Daniel Callo-Concha, Christian Borgemeister, Klaus Greve
Abstract:
The Peruvian Andes have high mining potential, but many mining areas overlap with the lands of campesino communities, which are key actors in agriculture and livestock production. Led by economic incentives, some communities rent their lands to mining companies for exploration or exploitation. However, a growing number of campesino communities, usually socially and economically marginalized, have developed resistance, citing consequences such as water pollution, land-use change, and insufficient economic compensation, which eventually end in socio-environmental conflicts (SEC). We hypothesize that disclosing information on environmental pollution and increasing the involvement of communities in decision-making may help prevent SEC. To assess whether such complaints are grounded in the environmental impact of mining activities, we measured heavy metal concentrations in 24 indicative samples from rivers that run across mining exploitations and farming community lands. Samples were taken during the 2016 dry season and analyzed by inductively coupled plasma atomic emission spectroscopy. The results were compared against the standards of government monitoring institutions (i.e., OEFA). We also investigated water and environmental complaints related to mining in the 14 neighboring communities, and we explored the relationship between communities and mining companies through open-ended interviews with community authorities and non-participatory observation of community assemblies. We found that the concentrations of cadmium (0.023 mg/L), arsenic (0.562 mg/L), and copper (0.07 mg/L) exceed the national water quality standards for Andean rivers (0.00025 mg/L cadmium, 0.15 mg/L arsenic, and 0.01 mg/L copper). Of the communities, 57% had filed environmental complaints, while 21% of all communities were receiving annual economic benefits from mining projects.
However, 87.5% of the communities that had filed complaints have high heavy metal concentrations in their streams. The evidence shows that mining activities tend to be associated with the degradation and vulnerability of campesino community streams, which justifies the environmental complaints and, eventually, the occurrence of SEC.
Keywords: mining companies, campesino community, water, socio-environmental conflict
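How far the reported concentrations exceed the standards is easier to judge as ratios. The snippet below divides each measured value quoted in the abstract by its national standard. The abstract does not say whether the measured figures are maxima, means, or single samples among the 24 taken, so the ratios describe only the quoted values.

```python
# Measured concentrations and national standards quoted in the abstract (mg/L)
measured = {"cadmium": 0.023, "arsenic": 0.562, "copper": 0.07}
standard = {"cadmium": 0.00025, "arsenic": 0.15, "copper": 0.01}

# Exceedance ratio: how many times the standard each measured value is
exceedance = {metal: measured[metal] / standard[metal] for metal in measured}
for metal, ratio in exceedance.items():
    print(f"{metal}: {ratio:.1f}x the standard")
```

Cadmium stands out at about 92 times its standard, compared with roughly 3.7 times for arsenic and 7 times for copper.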
30491 Hierarchical Piecewise Linear Representation of Time Series Data
Authors: Vineetha Bettaiah, Heggere S. Ranganath
Abstract:
This paper presents a Hierarchical Piecewise Linear Approximation (HPLA) for representing time series data, in which the time series is treated as a curve in the time-amplitude image space. The curve is partitioned into segments by choosing perceptually important points as break points. Each segment between adjacent break points is recursively partitioned into two segments, at the best point or at the midpoint, until the error between the approximating line and the original curve falls below a pre-specified threshold. The HPLA representation achieves dimensionality reduction while preserving prominent local features and the general shape of the time series. The representation permits coarse-to-fine processing at different levels of detail, allows flexible definitions of similarity based on mathematical measures or on general time series shape, and supports time series data mining operations, including query by content, clustering, and classification based on whole-sequence or subsequence similarity.
Keywords: data mining, dimensionality reduction, piecewise linear representation, time series representation
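The recursive partitioning step described above can be sketched as a top-down split: fit a straight line between a segment's endpoints, and if the worst deviation exceeds the threshold, split at that point and recurse on both halves. This sketch makes several assumptions. It takes the "best point" to be the point of maximum vertical deviation (the abstract also mentions the midpoint as an option), it measures error as maximum absolute deviation, and it omits the initial selection of perceptually important points. The function names are ours.

```python
def max_deviation(series, lo, hi):
    """Index and size of the largest vertical gap between series[lo..hi]
    and the straight line joining its two endpoints."""
    best_i, best_d = lo, 0.0
    for i in range(lo + 1, hi):
        line = series[lo] + (series[hi] - series[lo]) * (i - lo) / (hi - lo)
        d = abs(series[i] - line)
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def piecewise_linear(series, threshold, lo=0, hi=None):
    """Recursively split [lo, hi] at the worst-fitting point until each
    segment's maximum deviation is within `threshold`; return break indices."""
    if hi is None:
        hi = len(series) - 1
    i, d = max_deviation(series, lo, hi)
    if d <= threshold:
        return [lo, hi]
    left = piecewise_linear(series, threshold, lo, i)
    right = piecewise_linear(series, threshold, i, hi)
    return left[:-1] + right  # the split point ends `left` and starts `right`


signal = [0, 1, 2, 1, 0, 0.2, 0, 3, 0]
for t in (4.0, 1.0, 0.1):  # coarse to fine
    print(t, piecewise_linear(signal, t))
```

Running the same split with decreasing thresholds yields progressively finer break-point sets. This gives a rough analogue of the coarse-to-fine levels the abstract describes.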
30490 Data Analysis to Uncover Terrorist Attacks Using Data Mining Techniques
Authors: Saima Nazir, Mustansar Ali Ghazanfar, Sanay Muhammad Umar Saeed, Muhammad Awais Azam, Saad Ali Alahmari
Abstract:
Terrorism is an important and challenging concern. The entire world is threatened by a few sophisticated terrorist groups, and terrorism has become an extremely destructive phenomenon in recent years, especially in the Gulf region and Pakistan. Predicting patterns of attack type, attacking group, and target type is an intricate task. This study offers new insight into terrorist groups' attack types and chosen targets. The paper proposes a framework for predicting terrorist attacks from historical data by associating terrorist groups with their attack types and targets. The analysis predicts that the number of attacks per year will keep increasing, and that Al-Harmayan in Saudi Arabia, Al-Qai'da in the Gulf region, and Tehreek-e-Taliban in Pakistan will remain responsible for many future terrorist attacks. Under constant circumstances, the main targets of each group will be private citizens and property, police, government, and the military sector.
Keywords: data mining, counter terrorism, machine learning, SVM
30489 i2kit: A Tool for Immutable Infrastructure Deployments
Authors: Pablo Chico De Guzman, Cesar Sanchez
Abstract:
Microservice architectures are increasingly used in distributed cloud applications due to their advantages in software composition, development speed, release cycle frequency, and time to market for business logic. However, these architectures also introduce challenges in the testing and release phases of applications. Container technology solves some of these issues by providing reproducible environments, easy software distribution, and process isolation. However, other issues remain unsolved in current container technology when dealing with multiple machines, such as networking for multi-host communication, service discovery, load balancing, and data persistence (even though traditional cloud vendors already solve some of these challenges in a mature and widespread manner). Container cluster management tools, such as Kubernetes, Mesos, or Docker Swarm, attempt to solve these problems by introducing a new control layer in which the unit of deployment is the container (or the pod, a set of strongly related containers that must be deployed on the same machine). These tools are complex to configure and manage, and they do not follow a pure immutable infrastructure approach, since servers are reused between deployments. Indeed, these tools introduce runtime dependencies to solve networking or service discovery problems. If an error occurs in the control layer and affects running applications, specific expertise is required for ad-hoc troubleshooting. It is therefore not surprising that container cluster support is becoming a source of revenue for consulting services. This paper presents i2kit, a deployment tool based on the immutable infrastructure pattern, in which the virtual machine is the unit of deployment. The input for i2kit is a declarative definition of a set of microservices, where each microservice is defined as a pod of containers.
Microservices are built into machine images using linuxkit, a tool for creating minimal Linux distributions specialized in running containers. These machine images are then deployed to one or more virtual machines, which are exposed through a cloud vendor load balancer. Finally, the load balancer endpoint is passed to other microservices through an environment variable, providing service discovery. i2kit reuses the best ideas from container technology to solve problems such as reproducible environments, process isolation, and software distribution, while relying on mature, proven cloud vendor technology for networking, load balancing, and persistence. The result is a more robust system with no learning curve for troubleshooting running applications. We have implemented an open-source prototype that transforms i2kit definitions into AWS CloudFormation templates, in which each microservice AMI (Amazon Machine Image) is created on the fly using linuxkit. Although container cluster management tools offer more flexibility for optimizing resource allocation, we argue that adding a new control layer brings more important disadvantages. Resource allocation is greatly improved by using linuxkit, which has a very small footprint (around 35 MB). The system is also more secure, since linuxkit installs only the minimum set of dependencies needed to run containers. i2kit is currently under development at the IMDEA Software Institute.
Keywords: container, deployment, immutable infrastructure, microservice
30488 A Resource Optimization Strategy for CPU (Central Processing Unit) Intensive Applications
Authors: Junjie Peng, Jinbao Chen, Shuai Kong, Danxu Liu
Abstract:
Under traditional resource allocation strategies, resource usage on the physical servers of cloud data centers is highly uncertain. Assigning too few tasks wastes resources, while assigning too many causes overload. This is especially evident when the applications are of the same type, because they share the same resource preferences. Since CPU-intensive applications are among the most common types of application in the cloud, we studied an optimization strategy for CPU-intensive applications running on the same server. We used resource preferences to analyze the case in which multiple CPU-intensive applications run simultaneously, and we propose a model that predicts the execution time of CPU-intensive applications running simultaneously. Based on this prediction model, we propose a method for selecting the appropriate number of applications for a machine. Experiments show that the model predicts the execution time of CPU-intensive applications accurately. To improve application execution efficiency, we also propose a priority-based scheduling model for CPU-intensive applications. Extensive experiments verify the validity of the scheduling model.
Keywords: cloud computing, CPU intensive applications, resource optimization, strategy
30487 Design of Middleware for Mobile Group Control in Physical Proximity
Authors: Moon-Tak Oh, Kyung-Min Park, Tae-Eun Yoon, Hoon Choi, Chil-Woo Lee
Abstract:
This paper presents middleware that enables group-user applications on mobile devices in physical proximity to interact with other devices without the intervention of a central server. The middleware's requirements are identified from service usage scenarios, and its functional architecture is specified. These requirements include group management, synchronization, and resource management. Group management must provide such applications with various capabilities for managing multiple users (e.g., creating groups, discovering groups or individual users, handling member join and leave, electing a group manager, and associating services with groups) using D2D (device-to-device) communication technology. We designed the middleware for these requirements on the Android platform.
Keywords: group user, middleware, mobile service, physical proximity
30486 The Role Of Data Gathering In NGOs
Authors: Hussaini Garba Mohammed
Abstract:
Background/Significance: A lack of data gathering prevents NGOs worldwide from obtaining good information about education- and health-related issues in communities in any country. For example, HIV/AIDS, smoking-related illness and tuberculosis, and COVID-19 infection are becoming serious public health problems, especially among older men and women. Yet in some countries there is no complete survey data from communities, villages, and rural areas showing the percentage of victims and patients, especially for COVID-19. Such data are essential for setting program targets, strategies, and priorities, and for obtaining good information through data gathering in any society.
Keywords: reliable information, data assessment, data mining, data communication
Procedia PDF Downloads 181
30485 Exploring Gaming-Learning Interaction in MMOG Using Data Mining Methods
Authors: Meng-Tzu Cheng, Louisa Rosenheck, Chen-Yen Lin, Eric Klopfer
Abstract:
The purpose of the research is to explore ways in which gameplay data can be analyzed to yield results that feed back into the learning ecosystem. Back-end data for all users playing an MMOG, The Radix Endeavor, were collected, and this study reports analyses of a specific genetics quest using data mining techniques, including the decision tree method. The study revealed different reasons for quest failure between participants who eventually succeeded and those who never succeeded. Regarding in-game tool use, the trait examiner was a key tool in the quest completion process. The decision tree results further showed that a lack of trait examiner usage can be compensated for by additional Punnett square uses, revealing multiple pathways to success in this quest. The analysis methods used in this study and the resulting usage patterns indicate useful ways in which gameplay data can provide insights in two main areas. First, game designers can learn how players interact with and learn from their game. Second, players and their teachers can obtain information on how players are progressing through the game, so that help can be provided based on the strategies and misconceptions identified in the data.
Keywords: MMOG, decision tree, genetics, gaming-learning interaction
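The abstract does not list the tree's features or splitting criterion. As a hedged illustration of how a decision tree can surface a finding like "missing trait-examiner use can be made up by extra Punnett-square use", the sketch below scores candidate splits by Gini impurity decrease (the CART criterion) on hypothetical toy records; the feature names and data are invented for illustration and are not the study's data.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows: list[dict], labels: list[str], feature: str, threshold: int) -> float:
    """Impurity decrease from splitting on `feature >= threshold`."""
    left = [y for r, y in zip(rows, labels) if r[feature] >= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] < threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical tool-use counts per player (not the study's data).
rows = [
    {"trait_examiner": 3, "punnett_square": 0},
    {"trait_examiner": 2, "punnett_square": 1},
    {"trait_examiner": 0, "punnett_square": 4},
    {"trait_examiner": 0, "punnett_square": 0},
    {"trait_examiner": 0, "punnett_square": 1},
]
labels = ["success", "success", "success", "fail", "fail"]

# Root split: which tool separates success from failure better?
root_te = split_gain(rows, labels, "trait_examiner", 1)
root_ps = split_gain(rows, labels, "punnett_square", 2)

# Within the branch that never used the trait examiner, try Punnett squares.
branch = [i for i, r in enumerate(rows) if r["trait_examiner"] < 1]
branch_ps = split_gain([rows[i] for i in branch], [labels[i] for i in branch],
                       "punnett_square", 2)
print(root_te, root_ps, branch_ps)
```

On this toy data the trait-examiner split wins at the root, and inside the no-trait-examiner branch a Punnett-square split yields pure leaves, which is the kind of "alternative pathway" a fitted tree would expose.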
Procedia PDF Downloads 358
30484 Understanding the Complexity of Corruption and Anti-Corruption in Indonesia's Mining Industry: Challenges and Opportunities
Authors: Ahmad Khoirul Umam, Iin Mayasari
Abstract:
Indonesia is blessed with rich natural resources and is frequently dubbed the 6th richest country in the world in terms of mining resources, including minerals and coal. Mining can contribute to socio-economic development by generating state revenue, alleviating poverty through employment, opening up and developing remote areas, providing basic infrastructure, and creating new centres of development. However, favouritism and rent-seeking behaviour by government officials, politicians, and business players in mining and forestry licensing and permitting have resisted reform. Even though Indonesia’s Corruption Eradication Commission (KPK) has successfully targeted previously untouchable actors, public criticism continues to ask why corruption apparently remains systemic in the country's mining industry. This paper reveals that structural anomalies, together with the legacy of Soeharto-era power inequities, have severely hampered Indonesia’s bureaucratic arrangements, which continue to undermine transparency and accountability in mining industry governance. In the more liberalized and decentralized political system, these deficiencies have gradually helped vested interest groups band together, creating a coalition that can challenge, resist, and contain anti-graft actions. Indonesia therefore needs far more serious anti-corruption action, which would require eliminating the monopoly over power, enhancing competition, limiting discretion, and clarifying the rules of business and political competition in the country's mining sector.
Keywords: anti-corruption, public integrity, private integrity, mining industry, democratization
Procedia PDF Downloads 114
30483 Sexting Phenomenon in Educational Settings: A Data Mining Approach
Authors: Koutsopoulou Ioanna, Gkintoni Evgenia, Halkiopoulos Constantinos, Antonopoulou Hera
Abstract:
Recent advances in Information and Communication Technology (ICT) and the ever-increasing use of technological devices among adolescents and young adults, along with unsupervised access to the internet and social media and uncontrolled use of smartphones and PCs, have given rise to social problems such as sexting. The main purpose of this article is, first, to present an analytic theoretical framework of sexting as a recent social phenomenon, based on studies conducted over roughly the last decade; and second, to investigate the sexting perceptions of Greek students who are also social network users, to record how often social media users exchange sexual messages, and to identify demographic predictors. Data from 1,000 students were collected, and all statistical analyses were performed with the WEKA software package. Among other findings, the results indicate that data mining methods are an important tool for drawing conclusions that could inform decision- and policy-making, especially in educational psychology and related social topics. In sum, sexting carries many risks for adolescent and young adult students in Greece and needs to be better addressed by stakeholders and by society in general. Furthermore, policymakers, legislators, and authorities will have to act to protect minors. Prevention strategies based on Greek cultural specificities are proposed. This social problem has raised concerns in recent years, and these concerns will most likely escalate in global communities in the future.
Keywords: educational ethics, sexting, Greek sexters, sex education, data mining
Procedia PDF Downloads 182
30482 Exploring the Correlation between Population Distribution and Urban Heat Island under Urban Data: Taking Shenzhen Urban Heat Island as an Example
Authors: Wang Yang
Abstract:
Shenzhen is a modern city born of China's reform and opening-up policy, and the development of its urban morphology has been shaped by the administration of the Chinese government. The city's planning paradigm is primarily affected by spatial structure and human behavior. The urban agglomeration center has been subjectively divided into several groups and centers, and in this process the city's natural development patterns tend to be neglected. With the continuous development of the internet, big data technology has been introduced in China, and data mining and data analysis have become important tools in urban research. Data mining has been used to improve data cleaning and to acquire business, traffic, and population data. Before data mining, government data were collected by traditional means and then analyzed through city-relationship research, which delayed the analysis of urban development, especially for contemporary cities whose data are updated very quickly via the internet. In this study, the city's points of interest (POIs) mined from the web serve as a data source relevant to urban design, while satellite remote sensing is used as a reference; analyzing the city from both directions breaks with the government's administrative paradigm and restores the perspective of urban research. Data mining is therefore very important for urban analysis. Satellite remote sensing data for Shenzhen from July 2018, measured by the MODIS sensor, were used to perform land surface temperature inversion and to analyze the distribution of Shenzhen's urban heat island. This article acquired and classified data for Shenzhen using web crawler technology. Shenzhen's heat island and POI data were simulated and analyzed on a GIS platform to identify the main features of the functional distribution. Shenzhen's urban area is elongated in the east-west direction.
The city’s main streets also follow this direction of development, so the city's functional areas are likewise distributed east-west. The urban heat island map corresponds to the functional urban areas, as do the regional POIs. The results clearly show a one-to-one correspondence between the distribution of the urban heat island and the distribution of urban POIs. The urban heat island is primarily influenced by the properties of the underlying surface, setting aside the effects of urban climate. Using urban POIs as the object of analysis shows that the distribution of POIs is closely linked to population aggregation, so the distribution of the population corresponds to the distribution of the urban heat island.
Keywords: POI, satellite remote sensing, the population distribution, urban heat island thermal map
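The abstract compares POI distribution with retrieved land surface temperature (LST) but does not say how the two were aligned or compared. One common approach, sketched here under stated assumptions, is to bin POIs into square grid cells, average LST per cell, and compute Pearson's correlation coefficient across cells. The projected metre coordinates, the cell size, and the per-cell values below are hypothetical, not the study's data.

```python
import math
from collections import Counter
from statistics import fmean


def poi_counts_per_cell(points: list[tuple[float, float]], cell_size: float) -> Counter:
    """Count POIs per square grid cell; points are projected (x, y) in metres."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-cell values: POI count and mean LST (degrees C) per grid cell.
poi = [5, 12, 30, 45, 60]
lst_celsius = [29.1, 30.4, 32.8, 34.0, 35.9]
r = pearson_r(poi, lst_celsius)
print(f"r = {r:.3f}")
```

Pearson's r captures only linear association across cells; it does not by itself establish a one-to-one or causal relationship, so spatial patterns would still need to be checked on the GIS map.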
Procedia PDF Downloads 105
30481 Shark Detection and Classification with Deep Learning
Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti
Abstract:
Effective shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population status, but species-specific indices of abundance and distribution from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 shark species from the web application sharkPulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image-classification models into a Shark Detector bundle, which we developed to recognize and classify sharks in videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common shark data-generation approaches: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested how the correctness of genus and species predictions depends on training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89% and classified located subjects to the species level with 69% accuracy (n = eight species). It sorted heterogeneous image datasets sourced from Instagram with 91% accuracy and classified species with 70% accuracy (n = 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy, as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68% across 25 genera. The average base accuracy of species prediction within each genus class was 85%. The Shark Detector can classify 45 species.
All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to become a dominant method for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. The prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.
Keywords: classification, data mining, Instagram, remote monitoring, sharks
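The abstract reports both an overall genus accuracy and an "average base accuracy of species prediction within each genus class". The exact averaging is not stated; one plausible reading, assumed here, is a mean of per-class accuracies, which can differ noticeably from pooled accuracy when classes are imbalanced. The sketch contrasts the two on hypothetical labels, not the Shark Detector's outputs.

```python
from collections import defaultdict


def overall_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of all predictions that are correct (pooled accuracy)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_class_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Average of per-class accuracies, so every class counts equally."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: four Carcharodon images, one Sphyrna image.
y_true = ["Carcharodon"] * 4 + ["Sphyrna"]
y_pred = ["Carcharodon"] * 3 + ["Sphyrna", "Sphyrna"]
print(overall_accuracy(y_true, y_pred), mean_per_class_accuracy(y_true, y_pred))
```

Here pooled accuracy is 0.8 while the per-class mean is 0.875, because the rare class is predicted perfectly; reporting which average was used avoids ambiguity.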
Procedia PDF Downloads 122
30480 Design of a Small and Medium Enterprise Growth Prediction Model Based on Web Mining
Authors: Yiea Funk Te, Daniel Mueller, Irena Pletikosa Cvijikj
Abstract:
Small and medium enterprises (SMEs) play an important role in the economies of many countries. Considering the world economy as a whole, SMEs represent 95% of all businesses and account for 66% of total employment. Existing studies characterize the current business environment as highly turbulent and strongly influenced by modern information and communication technologies, which makes it harder for SMEs to maintain their existence and expand their business. To help SMEs improve their competitiveness, researchers have recently turned to data mining techniques to build risk and growth prediction models. However, the data used to assess risk and growth indicators are mainly obtained via questionnaires, which is laborious and time-consuming, or provided by financial institutions, which makes them highly sensitive to privacy issues. Web mining (WM) has recently emerged as a new approach to obtaining valuable insights in the business world. WM enables automatic, large-scale collection and analysis of potentially valuable data from various online platforms, including companies’ websites. While WM methods have frequently been studied for anticipating sales growth on e-commerce platforms, their application to assessing SME risk and growth indicators is still scarce. Since a large proportion of SMEs own a website, WM has great potential for revealing valuable information hidden in SME websites, which can be used to understand SME risk and growth indicators and to enhance current SME risk and growth prediction models. This study aims to develop an automated system that collects business-relevant data from the Web and predicts future SME growth trends by means of WM and data mining techniques. The envisioned system should serve as an 'early recognition system' for future growth opportunities.
In an initial step, we examine how structured and semi-structured Web data from governmental or SME websites can be used to explain SME success. WM methods are applied to extract Web data as additional input features for the growth prediction model. SME data provided by a large Swiss insurance company serve as ground truth (i.e., growth-labeled data) to train the growth prediction model. Different machine learning classification algorithms, such as the Support Vector Machine, Random Forest, and Artificial Neural Network, are applied and compared with the goal of optimizing prediction performance. The results are compared with those of previous studies to assess how much the growth indicators retrieved from the Web increase the predictive power of the model.
Keywords: data mining, SME growth, success factors, web mining
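The classifier comparison described above can be sketched with scikit-learn's cross-validation utilities. This is a minimal sketch under stated assumptions: synthetic data from `make_classification` stands in for the insurer's growth-labeled SMEs and the web-mined features, the hyperparameters are illustrative, and F1 is an assumed metric (the abstract does not state which one was used).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, folds: int = 5) -> dict[str, float]:
    """Mean cross-validated F1 per model (binary labels: growth / no growth)."""
    models = {
        # SVMs and neural networks are sensitive to feature scale, so standardize.
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
        ),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="f1").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for SME features (e.g. web-mined indicators).
    X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                               random_state=0)
    for name, score in compare_classifiers(X, y).items():
        print(f"{name}: mean F1 = {score:.3f}")
```

Wrapping scaling inside each pipeline keeps it within the cross-validation folds, so no test-fold statistics leak into training.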
Procedia PDF Downloads 269
30479 Delivery Service and Online-and-Offline Purchasing for Collaborative Recommendations on Retail Cross-Channels
Authors: S. H. Liao, J. M. Huang
Abstract:
The delivery service business model is the final link in logistics for both online and offline businesses. The online-and-offline business model covers the customer's entire purchasing process, online and offline, and places greater emphasis on data for optimizing overall retail operations. For the retail industry, an important information-management task is to strengthen the collection and investigation of consumers' online and offline purchasing data in order to understand customers better and then recommend products. This study applies two-stage data mining analytics, clustering followed by association rule analysis, to investigate Taiwanese consumers' (n = 2,209) preferences for delivery service. This process clarifies online-and-offline purchasing behaviors and preferences and yields knowledge profiles, patterns, and rules for cross-channel collaborative recommendations. Finally, theoretical and practical implications for methodology and enterprise are presented.
Keywords: delivery service, online-and-offline purchasing, retail cross-channel, collaborative recommendations, data mining analytics
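The second stage above, association rule analysis, can be illustrated with the standard support and confidence measures. The sketch assumes the clustering stage has already grouped consumers and mines single-item rules (A → B) within one cluster. The transactions, item names, and thresholds are hypothetical, and this brute-force pair search stands in for Apriori-style algorithms used on real data.

```python
from itertools import combinations


def association_rules(transactions: list[set[str]], min_support: float,
                      min_confidence: float) -> list[tuple[str, str, float, float]]:
    """Return (antecedent, consequent, support, confidence) for rules A -> B."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset: set[str]) -> float:
        # Fraction of transactions containing every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})   # P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, s_ab, confidence))
    return rules


# Hypothetical purchase/delivery preferences for one consumer cluster.
cluster = [
    {"online_order", "home_delivery"},
    {"online_order", "home_delivery", "coupon"},
    {"online_order", "store_pickup"},
    {"store_visit", "coupon"},
    {"online_order", "home_delivery"},
]
for rule in association_rules(cluster, min_support=0.4, min_confidence=0.7):
    print(rule)
```

On this toy cluster only the pair {home_delivery, online_order} passes the support threshold, giving home_delivery → online_order with confidence 1.0 and the reverse rule with confidence 0.75, which could then feed a cross-channel recommendation.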
Procedia PDF Downloads 33
30478 The Significance of Picture Mining in the Fashion and Design as a New Research Method
Authors: Katsue Edo, Yu Hiroi
Abstract:
Since the beginning of the 21st century, increasing attention has been paid in the social sciences to using pictures and photographs in research. Meanwhile, we have been studying the usefulness of Picture Mining, one of the new approaches for such picture-based research. Picture Mining is an exploratory research analysis method that extracts useful information from pictures, photographs, and static or moving images. It is often compared with text mining methods. The Picture Mining concept includes observational research in the broad sense, because it also aims to analyze moving images (Ochihara and Edo 2013). In the recent literature, studies and reports using pictures are increasing because of environmental changes, identified as technological and social changes (Edo et al. 2013). Low-priced digital cameras and iPhones, fast information transmission, low costs of information transfer, and the high performance and resolution of mobile phone cameras have changed people's photographing behavior. Consequently, most people in developing countries are less reluctant to take and process photographs. In these studies, this method of collecting data from respondents is often called ‘participant-generated photography’ or ‘respondent-generated visual imagery’, with a focus on data collection and analysis (Pauwels 2011, Snyder 2012). However, few systematic and conceptual studies support the significance of these methods. In recent years we have worked to conceptualize these picture-based research methods and formalize theoretical findings (Edo et al. 2014). Inductively and through case studies, we have identified the fields in which Picture Mining is most effective: 1) research on consumer and customer lifestyles; 2) new product development; 3) research in fashion and design.
Although we have found that Picture Mining is likely to be useful in these fields, these assumptions must be verified. In this study we focus on fashion and design to determine whether Picture Mining methods are truly reliable in this area. To do so, we conducted empirical research on respondents’ attitudes and behavior concerning pictures and photographs. Comparing attitudes and behavior toward photographing fashion with those toward photographing meals, we found that taking pictures of fashion is not as easy as taking pictures of meals and food. Because fashion is more difficult to photograph, respondents take pictures of fashion and upload them online, for example to Facebook and Instagram, less often than pictures of meals and food. We conclude that pictures in the fashion area should be analyzed more carefully, because some bias may still exist even though the picture-taking environment has changed drastically in recent years.
Keywords: empirical research, fashion and design, Picture Mining, qualitative research
Procedia PDF Downloads 363
30477 Environmental Impact Assessments in Peru: Tools for Violence
Authors: Nadia Degregori
Abstract:
This paper focuses on the communication and participation mechanisms of Peru’s Environmental Impact Assessments, whose rationale is to prevent conflict by (supposedly) providing affected stakeholders with high-quality information about mining projects and their impacts. It argues that these mechanisms in fact heighten citizens’ fear and/or mistrust of mining projects and the companies behind them, because their top-down design limits “participation” to the passive reception of information and does not address power imbalances between communities and companies or government. The paper also contends that this way of managing the social aspects of Environmental Impact Assessments in Peru leads less powerful stakeholders (typically communities) to favour maintaining the status quo and to avoid negotiating with either the central government or mining companies, as a defence mechanism against a bad negotiation.
Keywords: community relations, environmental impact assessments, governance and participation, mining, Peru
Procedia PDF Downloads 434
30476 Analyzing the Water Quality of Settling Pond after Revegetation at Ex-Mining Area
Authors: Iis Diatin, Yani Hadiroseyani, Muhammad Mujahid, Ahmad Teduh, Juang R. Matangaran
Abstract:
One of the silica quarries managed by a mining company is located in Sukabumi District, West Java Province, Indonesia, and covers approximately 70 hectares. The company stopped mining activities there in 2013 and is trying to restore the post-mining ecosystem through rehabilitation activities such as reclamation and revegetation of the ex-mining area. Three years after planting, the trees have grown well, and besides several planted tree species, cover crops now cover the soil surface. Two settling ponds in the middle of the ex-mining area were built to prevent the effects of acid mine drainage. Acid mine drainage (AMD), or acidic water, is created when sulphide minerals are exposed to air and water and produce sulphuric acid through a natural chemical reaction. AMD is the main pollutant in open-pit mining. The objective of the research was to analyze the effect of revegetation on changes in water quality in the settling ponds. Physical and chemical water quality parameters were measured and analysed on site and in the laboratory. Physical parameters such as temperature, turbidity, and total organic matter were analysed, as were heavy metals and other chemical parameters such as dissolved oxygen, alkalinity, pH, total ammonia nitrogen, nitrate, and nitrite. The results showed that the acidity of the first settling pond was higher than that of the second. The water of both settling ponds contained heavy metals. Turbidity and total organic matter were the water quality parameters that improved after revegetation.
Keywords: acid mine drainage, ex-mining area, revegetation, settling pond, water quality
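The abstract does not name the sulphide mineral involved. Assuming pyrite (FeS₂), the most commonly cited source of AMD, the acid-generating oxidation can be written as the first balanced reaction below; the second is the overall reaction when the released iron is further oxidized and precipitates as ferric hydroxide. The pH definition shows why this matters for the measured acidity: each tenfold increase in hydrogen-ion concentration lowers pH by one unit.

```latex
\begin{aligned}
2\,\mathrm{FeS_2} + 7\,\mathrm{O_2} + 2\,\mathrm{H_2O} &\rightarrow 2\,\mathrm{Fe^{2+}} + 4\,\mathrm{SO_4^{2-}} + 4\,\mathrm{H^+} \\
4\,\mathrm{FeS_2} + 15\,\mathrm{O_2} + 14\,\mathrm{H_2O} &\rightarrow 4\,\mathrm{Fe(OH)_3} + 8\,\mathrm{H_2SO_4} \\
\mathrm{pH} &= -\log_{10}\,[\mathrm{H^+}]
\end{aligned}
```

Both reactions balance in Fe, S, O, H, and charge; the actual mineralogy of the quarry would need to be confirmed before applying them to these ponds.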
Procedia PDF Downloads 304