Search results for: data mining analytics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25009

24379 Brainbow Image Segmentation Using Bayesian Sequential Partitioning

Authors: Yayun Hsu, Henry Horng-Shing Lu

Abstract:

This paper proposes a data-driven, biology-inspired neural segmentation method for 3D Drosophila Brainbow images. We use the Bayesian Sequential Partitioning algorithm for probabilistic modeling, which can be used to detect somas and to eliminate crosstalk effects. This work attempts to develop an automatic methodology for neuron image segmentation, which still lacks a complete solution due to the complexity of the images. The proposed method does not need any predetermined, risk-prone thresholds since biological information is inherently included in the image processing procedure. Therefore, it is less sensitive to variations in neuron morphology; meanwhile, its flexibility would be beneficial for tracing the intertwining structure of neurons.

Keywords: brainbow, 3D imaging, image segmentation, neuron morphology, biological data mining, non-parametric learning

Procedia PDF Downloads 475
24378 Artificial Reproduction System and Imbalanced Dataset: A Mendelian Classification

Authors: Anita Kushwaha

Abstract:

We propose a new evolutionary computational model called the Artificial Reproduction System, which is based on the complex process of meiotic reproduction occurring between male and female cells of living organisms. The Artificial Reproduction System is an attempt at a new computational intelligence approach inspired by the theoretical reproduction mechanism and by observed reproduction functions, principles and mechanisms. A reproductive organism is programmed by genes and can be viewed as an automaton, mapping and reducing so as to create copies of those genes in its offspring. In the Artificial Reproduction System, the binding mechanism between male and female cells is studied, parameters are chosen, a network is constructed, and a feedback system for self-regularization is established. The model then applies Mendel’s law of inheritance and allele-allele associations, and can be used to perform data analysis of imbalanced, multivariate, multiclass and big data. In the experimental study, the Artificial Reproduction System is compared with other state-of-the-art classifiers such as SVM, Radial Basis Function, neural networks, and K-Nearest Neighbor on some benchmark datasets, and the comparison results indicate good performance.

Keywords: bio-inspired computation, nature-inspired computation, natural computing, data mining

Procedia PDF Downloads 258
24377 Occupational Safety and Health in the Wake of Drones

Authors: Hoda Rahmani, Gary Weckman

Abstract:

The body of research examining the integration of drones into various industries is expanding rapidly. Despite progress made in addressing the cybersecurity concerns for commercial drones, knowledge deficits remain in determining potential occupational hazards and risks of drone use to employees’ well-being and health in the workplace. This creates difficulty in identifying key approaches to risk mitigation and thus reflects the need to raise awareness among employers, safety professionals, and policymakers about workplace drone-related accidents. The purpose of this study is to investigate the prevalence of and possible risk factors for drone-related mishaps by comparing the application of drones in the construction and manufacturing industries. The chief reason for considering these specific sectors is to ascertain whether there is any significant difference between indoor and outdoor flights, since most construction sites fly drones outdoors while most manufacturing sites fly them indoors. Therefore, the current research seeks to examine the causes and patterns of workplace drone-related mishaps and suggest possible ergonomic interventions through data collection. Potential ergonomic practices to mitigate hazards associated with flying drones could include providing operators with professional training, conducting a risk analysis, and promoting the use of personal protective equipment. For the purpose of data analysis, two data mining techniques, the random forest and association rule mining algorithms, will be applied to find meaningful associations and trends in the data as well as influential features that have an impact on the occurrence of drone-related accidents in the construction and manufacturing sectors. In addition, Spearman’s correlation and chi-square tests will be used to measure the possible correlation between different variables.
Indeed, by recognizing risks and hazards, occupational safety stakeholders will be able to pursue data-driven and evidence-based policy change with the aim of reducing drone mishaps, increasing productivity, creating a safer work environment, and extending human performance in safe and fulfilling ways. This research study was supported by the National Institute for Occupational Safety and Health through the Pilot Research Project Training Program of the University of Cincinnati Education and Research Center Grant #T42OH008432.
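
As a rough illustration of the association rule mining step mentioned in this abstract, the sketch below computes support, confidence, and lift for a single candidate rule. The incident records, the attribute names, and the function name `rule_metrics` are hypothetical and are not data or code from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a = set(antecedent)
    c = set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # records containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # records containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical incident records, each a set of categorical attributes.
incidents = [
    {"indoor", "manufacturing", "collision"},
    {"indoor", "manufacturing", "collision"},
    {"outdoor", "construction", "collision"},
    {"outdoor", "construction", "signal_loss"},
    {"indoor", "manufacturing", "signal_loss"},
]

support, confidence, lift = rule_metrics(incidents, {"indoor"}, {"collision"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the antecedent and consequent co-occur more often than they would if independent; a full analysis would enumerate many candidate rules rather than test one.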

Keywords: commercial drones, ergonomic interventions, occupational safety, pattern recognition

Procedia PDF Downloads 192
24376 Power Recovery from Waste Air of Mine Ventilation Fans Using Wind Turbines

Authors: Soumyadip Banerjee, Tanmoy Maity

Abstract:

The recovery of power from waste air generated by mine ventilation fans presents a promising avenue for enhancing energy efficiency in mining operations. This abstract explores the feasibility and benefits of utilizing turbine generators to capture the kinetic energy present in waste air and convert it into electrical power. By integrating turbine generator systems into mine ventilation infrastructures, the potential to harness and utilize the previously untapped energy within the waste air stream is realized. This study examines the principles underlying turbine generator technology and its application within the context of mine ventilation systems. The process involves directing waste air from ventilation fans through specially designed turbines, where the kinetic energy of the moving air is converted into rotational motion. This mechanical energy is then transferred to connected generators, which convert it into electrical power. The recovered electricity can be employed for various on-site applications, including powering mining equipment, lighting, and control systems. The benefits of power recovery from waste air using turbine generators are manifold. Improved energy efficiency within the mining environment results in reduced dependence on external power sources and associated cost savings. Additionally, this approach contributes to environmental sustainability by utilizing a previously wasted resource for power generation. Resource conservation is further enhanced, aligning with modern principles of sustainable mining practices. However, successful implementation requires careful consideration of factors such as waste air characteristics, turbine design, generator efficiency, and integration into existing mine infrastructure. Maintenance and monitoring protocols are necessary to ensure consistent performance and longevity of the turbine generator systems. 
While there is an initial investment associated with equipment procurement, installation, and integration, the long-term benefits of reduced energy costs and environmental impact make this approach economically viable. In conclusion, the recovery of power from waste air from mine ventilation fans using turbine generators offers a tangible solution to enhance energy efficiency and sustainability within mining operations. By capturing and converting the kinetic energy of waste air into usable electrical power, mines can optimize resource utilization, reduce operational costs, and contribute to a greener future for the mining industry.

Keywords: waste to energy, wind power generation, exhaust air, power recovery

Procedia PDF Downloads 19
24375 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.
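
The three stages described in this abstract (query generation, negative mining, pair labeling) can be sketched as a small pipeline. Everything below is a toy stand-in under stated assumptions: `generate_query` simply reuses the first words of a passage in place of a generative model, and `overlap_score` uses word-set Jaccard overlap in place of both the retriever and the cross-encoder. Neither is the authors' model.

```python
def generate_query(passage, n_words=3):
    """Hypothetical stand-in for a generative model: reuse the passage's first words."""
    return " ".join(passage.lower().split()[:n_words])


def overlap_score(query, passage):
    """Hypothetical stand-in for a relevance model: Jaccard overlap of word sets."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0


def build_training_pairs(corpus, n_negatives=1):
    """Generate one query per passage, mine negatives, and attach soft labels."""
    pairs = []
    for i, passage in enumerate(corpus):
        query = generate_query(passage)
        # Negative mining: the highest-scoring passages other than the source passage.
        others = [p for j, p in enumerate(corpus) if j != i]
        negatives = sorted(others, key=lambda p: overlap_score(query, p), reverse=True)
        for candidate in [passage] + negatives[:n_negatives]:
            # Labeling: a real pipeline would score each pair with a cross-encoder here.
            pairs.append((query, candidate, overlap_score(query, candidate)))
    return pairs


corpus = [
    "dense retrieval encodes queries and passages as vectors",
    "dense retrieval needs large labeled training sets",
    "wind turbines convert kinetic energy into electricity",
]
pairs = build_training_pairs(corpus)
for query, candidate, label in pairs:
    print(f"{label:.2f}  {query!r} -> {candidate!r}")
```

The resulting (query, passage, label) triples are the kind of data a dense retriever could then be trained on; the training step itself is omitted.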

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 56
24374 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is their use of a file management system for keeping records. A lot of time is wasted when a patient visits the hospital, possibly in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this delay may lead to an unexpected outcome for the patient. This data mining application is to be designed using a structured system analysis and design method, which will support a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. This computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, access clinical records for decision-making, and reduce the time a patient waits to be attended to.

Keywords: programming, data, software development, innovation

Procedia PDF Downloads 71
24373 Measurement of Natural Radioactivity and Health Hazard Index Evaluation in Major Soils of Tin Mining Areas of Perak

Authors: Habila Nuhu

Abstract:

Natural radionuclides in the environment can significantly contribute to human exposure to ionizing radiation. Knowledge of their levels in an environment can help radiological protection agencies in policymaking. Measurement of natural radioactivity in major soils in the tin mining state of Perak, Malaysia, has been conducted using an HPGe detector. Seventy (70) soil samples were collected at widely distributed locations in the state. Six major soil types were sampled, and thirteen districts around the state were covered. The results for the 226Ra (238U), 228Ra (232Th), and 40K activity in the soil samples were as follows. 226Ra (238U) has a mean activity concentration of 191.83 Bq kg⁻¹, more than five times the UNSCEAR reference value of 35 Bq kg⁻¹. The mean activity concentration of 228Ra (232Th), 232.41 Bq kg⁻¹, is over seven times the UNSCEAR reference value of 30 Bq kg⁻¹. The average 40K activity concentration was 275.24 Bq kg⁻¹, which is less than the UNSCEAR reference value of 400 Bq kg⁻¹. The external hazard index (Hₑₓ) values ranged from 1.03 to 2.05, while the internal hazard index (Hᵢₙ) values ranged from 1.48 to 3.08. Hₑₓ and Hᵢₙ should be less than one for minimal external and internal radiation threats and for safe use of soil material in building construction. The Hₑₓ and Hᵢₙ results generally indicate that care must be taken when using these soil types and their derivatives as building materials in the study area.
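
The abstract does not say which formulation of the hazard indices was used. As a minimal sketch, assuming the widely used expressions Hₑₓ = A_Ra/370 + A_Th/259 + A_K/4810 and Hᵢₙ = A_Ra/185 + A_Th/259 + A_K/4810 (activities in Bq kg⁻¹), the reported mean activity concentrations can be plugged in as follows. The function names are illustrative.

```python
def external_hazard_index(a_ra, a_th, a_k):
    """External hazard index from 226Ra, 232Th and 40K activities in Bq/kg (assumed formula)."""
    return a_ra / 370.0 + a_th / 259.0 + a_k / 4810.0


def internal_hazard_index(a_ra, a_th, a_k):
    """Internal hazard index; the radium term is weighted twice as heavily (assumed formula)."""
    return a_ra / 185.0 + a_th / 259.0 + a_k / 4810.0


# Mean activity concentrations reported in the abstract (Bq/kg).
mean_ra, mean_th, mean_k = 191.83, 232.41, 275.24

h_ex = external_hazard_index(mean_ra, mean_th, mean_k)
h_in = internal_hazard_index(mean_ra, mean_th, mean_k)
print(f"H_ex = {h_ex:.2f}, H_in = {h_in:.2f}")  # roughly 1.47 and 1.99
```

Under this assumed formulation, both values computed from the means exceed one and fall inside the ranges the abstract reports for the individual samples.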

Keywords: activity concentration, hazard index, soil samples, tin mining

Procedia PDF Downloads 96
24372 A Data Driven Methodological Approach to Economic Pre-Evaluation of Reuse Projects of Ancient Urban Centers

Authors: Pietro D'Ambrosio, Roberta D'Ambrosio

Abstract:

The upgrading of the architectural and urban heritage of historic urban centers almost always involves planning for the reuse and refunctionalization of structures. Such interventions are complicated by the need to take into account the urban and social context in which a structure sits, as well as its intrinsic characteristics, such as its historical and artistic value. To these, of course, we have to add the need to make a preliminary estimate of recovery costs and, more generally, to assess the economic and financial sustainability of the whole re-socialization project. Particular difficulties are encountered during the pre-assessment of costs, since it is often impossible to perform analytical surveys and structural tests, both because of structural conditions and because of obvious cost and time constraints. The methodology proposed in this work, based on a multidisciplinary and data-driven approach, is aimed at obtaining, at very low cost, reasonably precise economic evaluations of the interventions to be carried out. In addition, the specific features of the approach, derived from the predictive analysis techniques typically applied in complex IT domains (big data analytics), yield as an indirect result of the evaluation process a shared database that can be used on a generalized basis to estimate other such projects. This makes the methodology particularly suitable in cases where massive intervention is expected across entire areas of historical city centers. The methodology has been partially tested during a study aimed at assessing the feasibility of a project for the reuse of the monumental complex of San Massimo, located in the historic center of Salerno, and is being further investigated.

Keywords: evaluation, methodology, restoration, reuse

Procedia PDF Downloads 164
24371 Application of a Modified Crank-Nicolson Method in Metallurgy

Authors: Kobamelo Mashaba

Abstract:

Molten slag has a high temperature range of 1723-1923, carrying a huge amount of useful energy that can reduce energy consumption and CO₂ emissions through heat recovery. In this study, we therefore investigated the performance of the modified Crank-Nicolson method for a delayed partial differential equation describing the heat recovery of molten slag in the metallurgical mining environment. It was proved that the proposed method converges more quickly than the classic method and that a unique solution exists. It was inferred from the numerical results that the proposed methodology is more viable and profitable for the mining industry.
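
For orientation, the sketch below implements the classic Crank-Nicolson scheme for the plain heat equation u_t = α·u_xx with zero boundary values. It is not the modified method of the paper and it ignores the delay term; the grid sizes, the initial condition, and the function names are assumptions chosen so the result can be compared with a known exact solution.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; lower[0] and upper[-1] are unused."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def crank_nicolson_heat(u0, alpha, dx, dt, steps):
    """Advance u_t = alpha * u_xx with zero Dirichlet boundaries (classic scheme)."""
    r = alpha * dt / dx ** 2
    u = list(u0)
    m = len(u) - 2  # number of interior nodes
    lower = [-r / 2] * m
    diag = [1 + r] * m
    upper = [-r / 2] * m
    for _ in range(steps):
        # Explicit half of the scheme forms the right-hand side.
        rhs = [r / 2 * u[i - 1] + (1 - r) * u[i] + r / 2 * u[i + 1]
               for i in range(1, m + 1)]
        # Implicit half is solved as a tridiagonal system.
        u = [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [0.0]
    return u


nx = 21
dx = 1.0 / (nx - 1)
u0 = [math.sin(math.pi * i * dx) for i in range(nx)]
u = crank_nicolson_heat(u0, alpha=1.0, dx=dx, dt=0.001, steps=100)

# Exact solution at x = 0.5, t = 0.1 is exp(-pi^2 * t) * sin(pi * x).
exact_mid = math.exp(-math.pi ** 2 * 0.1)
print(f"numerical={u[nx // 2]:.4f} exact={exact_mid:.4f}")
```

The scheme averages the explicit and implicit second-difference operators, which is why each time step needs one tridiagonal solve.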

Keywords: delayed partial differential equation, modified Crank-Nicolson Method, molten slag, heat recovery, parabolic equation

Procedia PDF Downloads 91
24370 Convergence and Stability in Federated Learning with Adaptive Differential Privacy Preservation

Authors: Rizwan Rizwan

Abstract:

This paper provides an overview of Federated Learning (FL) and its application in enhancing data security, privacy, and efficiency. FL utilizes three distinct architectures to ensure privacy is never compromised. It involves training models on individual edge devices and aggregating those models on a server without sharing raw data. This approach not only provides secure models without data sharing but also offers a highly efficient privacy-preserving solution with improved security and data access. We also discuss various frameworks used in FL and its integration with machine learning, deep learning, and data mining. To address the challenges of multi-party collaborative modeling scenarios, we briefly review an FL scheme combined with an adaptive gradient descent strategy and a differential privacy mechanism. The adaptive learning rate algorithm adjusts the gradient descent process to avoid issues such as model overfitting and fluctuations, thereby enhancing modeling efficiency and performance in multi-party computation scenarios. Additionally, to cater to ultra-large-scale distributed secure computing, the research introduces a differential privacy mechanism that defends against various background knowledge attacks.
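
The abstract gives no details of its aggregation or privacy mechanism. As a hedged sketch of the general idea, the code below averages client updates on a server after clipping each update and adding Gaussian noise, which is one common way to combine federated averaging with differential privacy. The clipping bound, the noise level, and the function names are assumptions, and `noise_std` is not calibrated to any privacy budget.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def private_federated_average(client_updates, max_norm, noise_std, rng):
    """Average clipped client updates and add Gaussian noise to each coordinate."""
    clipped = [clip(u, max_norm) for u in client_updates]
    n = len(clipped)
    dim = len(clipped[0])
    mean = [sum(u[k] for u in clipped) / n for k in range(dim)]
    return [m + rng.gauss(0.0, noise_std) for m in mean]


# Hypothetical model updates from three clients; raw data never leaves the clients.
updates = [[3.0, 4.0], [0.3, 0.4], [-0.6, 0.8]]
rng = random.Random(0)

noiseless = private_federated_average(updates, max_norm=1.0, noise_std=0.0, rng=rng)
noisy = private_federated_average(updates, max_norm=1.0, noise_std=0.1, rng=rng)
print("without noise:", noiseless)
print("with noise:   ", noisy)
```

Clipping bounds how much any single client can move the average, which is what makes the added noise meaningful as a privacy measure.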

Keywords: federated learning, differential privacy, gradient descent strategy, convergence, stability, threats

Procedia PDF Downloads 11
24369 Short Life Cycle Time Series Forecasting

Authors: Shalaka Kadam, Dinesh Apte, Sagar Mainkar

Abstract:

The life cycle of products is becoming shorter and shorter due to increased competition in the market, shorter product development times and increased product diversity. Short life cycles are normal in the retail industry, the style business, entertainment media, and the telecom and semiconductor industries. Accurate demand forecasting for short life cycle products is of special interest to many researchers and organizations. Due to the short life cycle of products, the amount of historical data available for forecasting is minimal, or even absent when new or modified products are launched in the market. Companies dealing with such products want to increase the accuracy of demand forecasting so that they can utilize the full potential of the market without oversupplying. This poses the challenge of developing a forecasting model that can forecast accurately while handling large variations in data and considering the complex relationships between various parameters of the data. Many statistical models have been proposed in the literature for forecasting time series data. Traditional time series forecasting models do not work well for short life cycles due to the lack of historical data. Also, artificial neural network (ANN) models are very time consuming for forecasting. We have studied the existing models that are used for forecasting and their limitations. This work proposes an effective and powerful approach for short life cycle time series forecasting. The approach takes into consideration different scenarios related to data availability for short life cycle products. We then suggest a methodology that combines statistical analysis with structured judgement. The defined approach can also be applied across domains. We then describe the method of creating a profile from analogous products. This profile can then be used for forecasting products with historical data of analogous products.
We have designed an application which combines data, analytics and domain knowledge using point-and-click technology. The forecasting results generated are compared using MAPE, MSE and RMSE error scores. Conclusion: Based on the results it is observed that no one approach is sufficient for short life-cycle forecasting and we need to combine two or more approaches for achieving the desired accuracy.
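
The three error scores named in this abstract have standard definitions, sketched below. The demand figures are hypothetical and only serve to exercise the functions.

```python
import math


def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical weekly demand and a forecast for the same weeks.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(f"MSE={mse(actual, forecast):.2f}")
print(f"RMSE={rmse(actual, forecast):.2f}")
print(f"MAPE={mape(actual, forecast):.2f}%")
```

MAPE is scale-free, which makes it convenient for comparing products with very different sales volumes, but it is undefined when an actual value is zero, as can happen early in a short life cycle.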

Keywords: forecast, short life cycle product, structured judgement, time series

Procedia PDF Downloads 340
24368 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining is a more practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression and support vector classifiers (SVC), K-nearest neighbours Classifiers (KNN), Decision Tree Classifiers, Random Forest classifiers and Gradient Boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time with more accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting

Procedia PDF Downloads 34
24367 A Methodology for Developing New Technology Ideas to Avoid Patent Infringement: F-Term Based Patent Analysis

Authors: Kisik Song, Sungjoo Lee

Abstract:

With the growing importance of intangible assets, the impact of patent infringement on a company's business has become more evident. Accordingly, it is essential for firms to estimate the risk of patent infringement before developing a technology and to create new technology ideas that avoid the risk. Recognizing this need, several attempts have been made to help develop new technology opportunities, and most of them have focused on identifying emerging vacant technologies from patent analysis. In these studies, the IPC (International Patent Classification) system or keywords obtained by applying text mining to patent documents were generally used to define vacant technologies. Unlike those studies, this study adopted F-term, which classifies patent documents according to the technical features of the inventions described in them. Since F-term analyzes technical features from various perspectives, it provides more detailed information about technologies than IPC and more systematic information than keywords. Therefore, if well utilized, it can be a useful guideline for creating a new technology idea. Recognizing the potential of F-term, this paper aims to suggest a novel approach to developing new technology ideas to avoid patent infringement based on F-term. For this purpose, we first collected data about F-term and then applied text mining to the descriptions of classification criteria and attributes. From the text-mining results, we could identify other technologies with technical features similar to those of the existing one, the patented technology. Finally, we compare the technologies and extract the technical features that are commonly used in other technologies but have not been used in the existing one.
These features are presented in terms of “purpose”, “function”, “structure”, “material”, “method”, “processing and operation procedure” and “control means” and so are useful for creating new technology ideas that help avoid infringing patent rights of other companies. Theoretically, this is one of the earliest attempts to adopt F-term to patent analysis; the proposed methodology can show how to best take advantage of F-term with the wealth of technical information. In practice, the proposed methodology can be valuable in the ideation process for successful product and service innovation without infringing the patents of other companies.

Keywords: patent infringement, new technology ideas, patent analysis, F-term

Procedia PDF Downloads 255
24366 Challenges in Achieving Profitability for MRO Companies in the Aviation Industry: An Analytical Approach

Authors: Nur Sahver Uslu, Ali Hakan Büyüklü

Abstract:

Maintenance, Repair, and Overhaul (MRO) costs are significant in the aviation industry. Companies that provide MRO services to the aviation industry but are not dominant in the sector need to determine the right strategies for sustainable profitability in a competitive environment. This study examined the real operational data of a small and medium-sized enterprise (SME) MRO company where analytical methods are not widely applied. The company's customers were divided into two categories, airline companies and non-airline companies; the variables that best explained profitability were analyzed with logistic regression for each category, and the results were compared. First, data reduction was applied to the transformed variables that had gone through the data cleaning and preparation stages, and the variables to be included in the model were decided. The misclassification rates of the logistic regression results for the two customer categories are similar, indicating consistent model performance across segments. A lower profit margin is obtained from airline customers, which can be explained by the variables part description, time to quotation (TTQ), turnaround time (TAT), manager, part cost, and labour cost. The higher profit margin obtained from non-airline customers is explained only by the variables part description, part cost, and labour cost. Based on the two models, it can be stated that it is significantly more challenging for the MRO company that is the subject of our study to achieve profitability from airline customers. While operational processes and organizational structure also affect the profit from airline customers, only the type of parts and costs determine the profit from non-airline customers.

Keywords: aircraft, aircraft components, aviation, data analytics, data science, gini index, maintenance, repair, and overhaul, MRO, logistic regression, profit, variable clustering, variable reduction

Procedia PDF Downloads 9
24365 Cultural Dynamics in Online Consumer Behavior: Exploring Cross-Country Variances in Review Influence

Authors: Eunjung Lee

Abstract:

This research investigates the intricate connection between cultural differences and online consumer behaviors by integrating Hofstede's Cultural Dimensions theory with analysis methodologies such as text mining, data mining, and topic analysis. Our aim is to provide a comprehensive understanding of how national cultural differences influence individuals' behaviors when engaging with online reviews. To ensure the relevance of our investigation, we systematically analyze and interpret the cultural nuances influencing online consumer behaviors, especially in the context of online reviews. By anchoring our research in Hofstede's Cultural Dimensions theory, we seek to offer valuable insights for marketers to tailor their strategies based on the cultural preferences of diverse global consumer bases. In our methodology, we employ advanced text mining techniques to extract insights from a diverse range of online reviews gathered globally for a specific product or service like Netflix. This approach allows us to reveal hidden cultural cues in the language used by consumers from various backgrounds. Complementing text mining, data mining techniques are applied to extract meaningful patterns from online review datasets collected from different countries, aiming to unveil underlying structures and gain a deeper understanding of the impact of cultural differences on online consumer behaviors. The study also integrates topic analysis to identify recurring subjects, sentiments, and opinions within online reviews. Marketers can leverage these insights to inform the development of culturally sensitive strategies, enhance target audience segmentation, and refine messaging approaches aligned with cultural preferences. Anchored in Hofstede's Cultural Dimensions theory, our research employs sophisticated methodologies to delve into the intricate relationship between cultural differences and online consumer behaviors. Applied to specific cultural dimensions, such as individualism vs. collectivism, masculinity vs. femininity, uncertainty avoidance, and long-term vs. short-term orientation, the study uncovers nuanced insights. For example, in exploring individualism vs. collectivism, we examine how reviewers from individualistic cultures prioritize personal experiences while those from collectivistic cultures emphasize communal opinions. Similarly, within masculinity vs. femininity, we investigate whether distinct topics align with cultural notions, such as robust features in masculine cultures and user-friendliness in feminine cultures. Examining information-seeking behaviors under uncertainty avoidance reveals how cultures differ in seeking detailed information or providing succinct reviews based on their comfort with ambiguity. Additionally, in assessing long-term vs. short-term orientation, the research explores how cultural focus on enduring benefits or immediate gratification influences reviews. These concrete examples contribute to the theoretical enhancement of Hofstede's Cultural Dimensions theory, providing a detailed understanding of cultural impacts on online consumer behaviors. As online reviews become increasingly crucial in decision-making, this research not only contributes to the academic understanding of cultural influences but also proposes practical recommendations for enhancing online review systems. Marketers can leverage these findings to design targeted and culturally relevant strategies, ultimately enhancing their global marketing effectiveness and optimizing online review systems for maximum impact.

Keywords: comparative analysis, cultural dimensions, marketing intelligence, national culture, online consumer behavior, text mining

Procedia PDF Downloads 32
24364 Automatic Adjustment of Thresholds via Closed-Loop Feedback Mechanism for Solder Paste Inspection

Authors: Chia-Chen Wei, Pack Hsieh, Jeffrey Chen

Abstract:

Surface Mount Technology (SMT) is widely used in electronic assembly, in which electronic components are mounted to the surface of the printed circuit board (PCB). Most of the defects in the SMT process are related to the quality of solder paste printing. These defects lead to considerable manufacturing costs in the electronics assembly industry. Therefore, the solder paste inspection (SPI) machine for controlling and monitoring the amount of solder paste printed has become an important part of the production process. So far, SPI threshold settings have been determined from statistical analysis and experts’ experience. Because the production data are not normally distributed and the production processes are subject to various variations, defects related to solder paste printing still occur. To solve this problem, this paper proposes an online machine learning algorithm, called the automatic threshold adjustment (ATA) algorithm, and a closed-loop architecture in the SMT process to determine the best threshold settings. Simulation experiments show that our proposed threshold settings improve the accuracy from 99.85% to 100%.

Keywords: big data analytics, Industry 4.0, SPI threshold setting, surface mount technology

Procedia PDF Downloads 105
24363 Hybridized Approach for Distance Estimation Using K-Means Clustering

Authors: Ritu Vashistha, Jitender Kumar

Abstract:

Clustering with the K-means algorithm is a very common way to understand and analyze output data. Grouping similar objects together is the basis of clustering. There are K objects and C clusters, where K is always supposed to be less than C and each cluster has its own centroid; the major problem is how to identify whether a cluster is correct based on the data. Cluster formation is not a one-off task for each tuple, row, record or entity; it is done by an iterative process. Every record, tuple or entity is checked, and its similarity or dissimilarity is examined. This iterative process is therefore very lengthy and unable to give optimal output in terms of the clusters and the time taken to find them. To overcome this drawback, we propose a formula to find the clusters at run time, so that the approach can give optimal results. The proposed approach uses the Euclidean distance formula as well as the Mahalanobis distance to find the minimum distance between slots, which are technically called clusters. We have also applied the same approach to the Ant Colony Optimization (ACO) algorithm, which results in the production of two-dimensional and multi-dimensional matrices.
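
The iterative process this abstract refers to can be sketched with the classic K-means (Lloyd) iteration using Euclidean distance. This is the baseline algorithm only, not the authors' run-time formula or their ACO variant; the sample points and the choice of initial centroids are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, initial_centroids, iterations=100):
    """Classic K-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(initial_centroids)
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        # A centroid is the coordinate-wise mean of its cluster; empty clusters keep theirs.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2D points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Every pass compares each point with every centroid, which is the per-record checking cost the abstract describes as lengthy for large datasets.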

Keywords: ant colony optimization, data clustering, centroids, data mining, k-means

Procedia PDF Downloads 117
24362 Information Needs and Information Usage of the Older Person Club’s Members in Bangkok

Authors: Siriporn Poolsuwan

Abstract:

This research aims to explore the information needs, information usage, and problems of information usage of the members of older people’s clubs in Dusit District, Bangkok. There are 12 clubs and 746 club members in this district. The research results are used for older person services in this district. Data were gathered from 252 club members using questionnaires. The quantitative analysis uses percentages, means, and standard deviations. The results are as follows: (1) The older people need information for entertainment, occupation, and academic purposes, in the fields of short stories, computer work, and religion and morality. (2) The participants use information from various sources. (3) The main problem of information usage is language skills, because of the older people’s literacy problems.

Keywords: information behavior, older person, information seeking, knowledge discovery and data mining

Procedia PDF Downloads 256
24361 Exploring Influence Range of Tainan City Using Electronic Toll Collection Big Data

Authors: Chen Chou, Feng-Tyan Lin

Abstract:

Big Data has attracted a lot of attention in many fields for analyzing research issues based on large volumes of data. Electronic Toll Collection (ETC) is one of the Intelligent Transportation System (ITS) applications in Taiwan, used to record the starting point, end point, distance, and travel time of vehicles on the national freeway. This study, taking advantage of ETC big data combined with urban planning theory, attempts to explore various phenomena of inter-city transportation activities. ETC data, part of the government's open data, are abundant, complete, and quickly updated. Living areas have traditionally been delimited by location, population, area, and subjective consciousness. However, these factors cannot appropriately reflect people’s movement paths in daily life. In this study, the concept of "Living Area" is replaced by "Influence Range" to show dynamics and variation with time and purposes of activities. This study uses data mining with Python and Excel, and visualizes the number of trips with GIS, to explore the influence range of Tainan city and the purposes of trips, and to discuss how living areas are currently delimited. It sets up a dialogue between the concepts of "Central Place Theory" and "Living Area", presents a new point of view, and integrates the application of big data, urban planning, and transportation. The findings will be valuable for resource allocation and land apportionment in spatial planning.

Keywords: Big Data, ITS, influence range, living area, central place theory, visualization

Procedia PDF Downloads 265
24360 The Reduction of Post-Blast Fumes to Improve Productivity and Safety: A Review Paper

Authors: Nhleko Monique Chiloane

Abstract:

The gold mining industry has predominantly used ammonium nitrate fuel oil (ANFO) explosives for decades, although these are known to be “gassier” and their detonation results in toxic fumes, for example, carbon monoxide (CO), nitrogen oxides (NOx) and ammonia. Re-entry into underground workings too soon after blasting can lead to fatal exposure to toxic fumes. It is, therefore, required that the polluted air be removed from the affected areas within a reasonable period before employees' re-entry into the working area. Post-blast re-entry times have therefore been described as a productivity bottleneck. The known causes of post-blast fumes are water ingress, incorrect fuel to oxygen ratio, confinement, explosive additives etc. To prevent or minimize post-blast fumes, some researchers have used neutralization, re-burning technique and non-explosive products or different oxidizing agents. The use of commercial explosives without nitrate oxidizing agents can also minimize the production of blasting fumes and thereby reduce the time needed for the clearance of these fumes to allow workers to re-enter the underground workings safely. The reduction in non-production time directly contributes to an increase in the available time per shift for productive work, thus leading to continuous mining. However, owing to its low cost and ease of use, ANFO is still widely used in South African underground blasting operations.

Keywords: post-blast fumes, continuous mining, ammonium nitrate explosive, non-explosive blasting, re-entry period

Procedia PDF Downloads 170
24359 Credit Card Fraud Detection with Ensemble Model: A Meta-Heuristic Approach

Authors: Gong Zhilin, Jing Yang, Jian Yin

Abstract:

The purpose of this paper is to develop a novel system for credit card fraud detection based on sequential modeling of data using hybrid deep learning models. The proposed model comprises five major phases: pre-processing, imbalanced-data handling, feature extraction, optimal feature selection, and fraud detection with an ensemble classifier. The collected raw data (input) are pre-processed to enhance data quality by handling missing data, noisy data, and null values. The pre-processed data are class-imbalanced in nature, and therefore they are handled with the K-means clustering-based SMOTE model. From the class-balanced data, the most relevant features are extracted, such as improved Principal Component Analysis (PCA) features, statistical features (mean, median, standard deviation), and higher-order statistical features (skewness and kurtosis). Among the extracted features, the optimal ones are selected with the Self-improved Arithmetic Optimization Algorithm (SI-AOA), a conceptual improvement of the standard Arithmetic Optimization Algorithm. The ensemble classifier combines deep learning models: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and optimized Quantum Deep Neural Network (QDNN). The LSTM and CNN are trained with the extracted optimal features. The outputs of the LSTM and CNN are fed as input to the optimized QDNN, which provides the final detection outcome. Since the QDNN is the final detector, its weight function is fine-tuned with the SI-AOA.
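The statistical and higher-order statistical features named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' feature extractor: the function name `statistical_features` is hypothetical, population (not sample) moments are used, and kurtosis is reported without subtracting 3. The abstract does not say which conventions the paper follows.

```python
import statistics


def statistical_features(values):
    """Mean, median, standard deviation, skewness, and kurtosis of a
    numeric sequence, using population moments (an assumption)."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant input: higher-order moments are undefined; report 0.0.
        skewness = kurtosis = 0.0
    else:
        skewness = sum((x - mean) ** 3 for x in values) / (n * std ** 3)
        kurtosis = sum((x - mean) ** 4 for x in values) / (n * std ** 4)
    return {
        "mean": mean,
        "median": median,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

In a pipeline like the one described, such a function would be applied per feature column or per window of transactions, and the resulting values would be passed on to the feature selection phase.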

Keywords: credit card, data mining, fraud detection, money transactions

Procedia PDF Downloads 115
24358 A Method to Evaluate and Compare Web Information Extractors

Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman

Abstract:

Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data gathered from the Web. Sometimes these data are relatively easy to gather, chiefly when they come from server logs. Unfortunately, there are cases in which the data to be mined are the data displayed on a web document. In such cases, it is necessary to apply a pre-processing step to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components that are typically configured by means of rules tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires evaluating and comparing the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4 200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which makes it difficult to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents; b) it provides a method that relies on statistically sound tests to support the conclusions drawn; the previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) we provide a novel method to compute the performance measures for unsupervised proposals, which would otherwise require the intervention of a user to compute them by using the annotations on the evaluation sets and the information extracted. Our contributions will help researchers in this area make sure that they have advanced the state of the art not only conceptually, but also from an empirical point of view; they will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.
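The abstract does not say which performance measures the method computes. As a hedged illustration, the sketch below assumes the common choice of precision, recall, and F1, computed by comparing the items an extractor returns with the annotations on an evaluation set. The function name `precision_recall_f1` and the representation of extracted items as hashable values are assumptions.

```python
def precision_recall_f1(extracted, annotated):
    """Compare extracted items with annotated (ground-truth) items.

    Both arguments are iterables of hashable items, for example
    (slot, text) pairs. Returns (precision, recall, f1).
    """
    extracted, annotated = set(extracted), set(annotated)
    true_positives = len(extracted & annotated)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Running this for several extractors over the same evaluation sets yields per-document scores that could then be fed into the kind of statistical test the abstract calls for.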

Keywords: web information extractors, information extraction evaluation method, Google scholar, web

Procedia PDF Downloads 238
24357 Improvement of Microstructure, Wear and Mechanical Properties of Modified G38NiCrMo8-4-4 Steel Used in Mining Industry

Authors: Mustafa Col, Funda Gul Koc, Merve Yangaz, Eylem Subasi, Can Akbasoglu

Abstract:

G38NiCrMo8-4-4 steel is widely used in the mining industry, machine parts, and gears due to its high strength and toughness. In this study, the microstructure, wear, and mechanical properties of boron-modified G38NiCrMo8-4-4 steel used in the mining industry were investigated. For this purpose, cast materials were alloyed by melting in an induction furnace to contain boron at 0 ppm, 15 ppm, and 50 ppm (wt.) and were cast into sand moulds with dimensions of 150x200x150 mm. Homogenization heat treatment was applied to the specimens at 1150°C for 7 hours. Then all specimens were austenitized at 930°C for 1 hour, quenched in a polymer solution, and tempered at 650°C for 1 hour. The microstructures of the specimens were investigated using a light microscope and a scanning electron microscope (SEM) to determine the effect of boron and heat treatment conditions. Microstructure investigations and hardness tests showed that microstructural properties and material hardness changed with increasing boron content and with heat treatment conditions. Wear tests were carried out using a pin-on-disc tribometer under dry sliding conditions. Charpy V-notch impact tests were performed to determine the toughness of the specimens. Fracture and worn surfaces were investigated with SEM. The results show that boron has a positive effect on the hardness and wear properties of G38NiCrMo8-4-4 steel.

Keywords: G38NiCrMo8-4-4 steel, boron, heat treatment, microstructure, wear, mechanical properties

Procedia PDF Downloads 182
24356 Impact of Coal Mining on River Sediment Quality in the Sydney Basin, Australia

Authors: A. Ali, V. Strezov, P. Davies, I. Wright, T. Kan

Abstract:

The environmental impacts arising from mining activities affect air, water, and soil quality. These impacts may result in unexpected and adverse environmental outcomes. This study reports on the impact of coal production on sediment in the Sydney region of Australia. Sediment samples were taken upstream and downstream of the discharge points of three mines, and 80 parameters were tested. The results were assessed against sediment quality guidelines based on the presence of metals. The study revealed an increase in metal content in the sediment downstream of the reference locations. In many cases, the sediment exceeded the Australian and New Zealand Environment and Conservation Council and international sediment quality guidelines value (SQGV). The major outliers to the guidelines were nickel (Ni) and zinc (Zn).

Keywords: coal mine, environmental impact, produced water, sediment quality guidelines value (SQGV)

Procedia PDF Downloads 293
24355 Information Communication Technology Based Road Traffic Accidents’ Identification, and Related Smart Solution Utilizing Big Data

Authors: Ghulam Haider Haidaree, Nsenda Lukumwena

Abstract:

Today the world of research enjoys abundant data, available in virtually any field: technology, science, business, politics, etc. This is commonly referred to as big data. It offers a great deal of precision and accuracy, supporting an in-depth look at any decision-making process. When well used, Big Data affords its users the opportunity to produce well-supported results. This paper leans extensively on big data to investigate possible smart solutions to urban mobility and related issues, namely road traffic accidents and their casualties and fatalities, based on multiple factors including age, gender, and the locations where accidents occur. Multiple technologies were used in combination to produce an Information Communication Technology (ICT) based solution with embedded technology. Those technologies principally include Geographic Information System (GIS), Orange Data Mining Software, and Bayesian Statistics, to name a few. The study uses the Leeds accident 2016 data to illustrate the thinking process and extracts from it a model that can be tested, evaluated, and replicated. The authors optimistically believe that the proposed model will significantly and smartly help to flatten the curve of road traffic accidents in areas with fast-growing population densities, where motor-based mobility increases considerably.

Keywords: accident factors, geographic information system, information communication technology, mobility

Procedia PDF Downloads 198
24354 Improved Classification Procedure for Imbalanced and Overlapped Situations

Authors: Hankyu Lee, Seoung Bum Kim

Abstract:

The issue of imbalance and overlap in the class distribution has become important in various applications of data mining. An imbalanced dataset is a special case in classification problems in which the number of observations of one class (i.e., the major class) heavily exceeds the number of observations of the other class (i.e., the minor class). An overlapped dataset is one in which many observations are shared between the two classes. Imbalanced and overlapped data are frequently found in real examples, including fraud and abuse in healthcare, quality prediction in manufacturing, text classification, oil spill detection, remote sensing, and so on. The class imbalance and overlap problem is challenging because it degrades the performance of most standard classification algorithms. In this study, we propose a classification procedure that can effectively handle imbalanced and overlapped datasets by splitting the data space into three parts (nonoverlapping, light overlapping, and severe overlapping) and applying the classification algorithm in each part. These three parts were determined based on the Hausdorff distance and the margin of the modified support vector machine. An experimental study was conducted to examine the properties of the proposed method and to compare it with other classification algorithms. The results showed that the proposed method outperformed the competitors under various imbalanced and overlapped situations. Moreover, the applicability of the proposed method was demonstrated through an experiment with real data.
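The abstract relies on the Hausdorff distance to decide how the data space is split. A minimal sketch of the standard definition for two finite point sets follows. How the authors combine it with the SVM margin is not described in the abstract and is not attempted here; the function names are assumptions.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

For two classes given as lists of coordinate tuples, a small value of `hausdorff(major, minor)` means every point of each class lies close to some point of the other, which is one plausible indicator of overlap.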

Keywords: classification, imbalanced data with class overlap, split data space, support vector machine

Procedia PDF Downloads 297
24353 A Survey on Compression Methods for Table Constraints

Authors: N. Gharbi

Abstract:

Constraint satisfaction problems are mathematical problems often used to model real-world problems in which we ask whether there exists a solution satisfying all constraints. Table constraints are important for modeling parts of many problems since they list all combinations of allowed or forbidden values. However, they have practical limitations because they are sometimes too large to be represented directly. In this paper, we present a survey of the different categories of approaches proposed to compress table constraints in order to reduce both space and time complexity.
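To make the notion of a table constraint concrete, here is a minimal uncompressed sketch. The representation (a scope of variable names plus a set of value tuples) and the function names `satisfies` and `supported_values` are assumptions for illustration; none of the compression methods surveyed in the paper is implemented.

```python
def satisfies(scope, table, assignment, positive=True):
    """Check an assignment against a table constraint.

    scope: tuple of variable names; table: set of value tuples.
    A positive table lists allowed tuples, a negative one forbidden tuples.
    """
    values = tuple(assignment[var] for var in scope)
    return (values in table) == positive


def supported_values(scope, table, domains):
    """For a positive table, return the values of each variable that still
    appear in at least one tuple compatible with the current domains."""
    valid = [
        t for t in table
        if all(val in domains[var] for var, val in zip(scope, t))
    ]
    return {var: {t[i] for t in valid} for i, var in enumerate(scope)}
```

Both functions scan or store every tuple explicitly, so their cost grows with the table size. That growth is the limitation that motivates compressed representations.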

Keywords: constraint programming, compression, data mining, table constraints

Procedia PDF Downloads 314
24352 Semi-Automatic Method to Assist Expert for Association Rules Validation

Authors: Amdouni Hamida, Gammoudi Mohamed Mohsen

Abstract:

To help the expert validate association rules extracted from data, several quality measures have been proposed in the literature. We distinguish two categories: objective and subjective measures. The first depends on a fixed threshold and on the quality of the data from which the rules are extracted. The second consists of providing the expert with tools to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high, so mining the rules manually is very hard. To solve this problem, we propose in this paper a semi-automatic method to assist the expert during association rule validation. Our method uses rule-based classification as follows: (i) we transform association rules into classification rules (classifiers); (ii) we use the generated classifiers for data classification; (iii) we visualize association rules with their classification quality to give the expert an overview and to assist the expert during the validation process.
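The abstract does not name the objective measures it has in mind. Support and confidence are the two most widely used threshold-based measures for association rules, so the sketch below assumes them. The function names and the representation of transactions as sets of items are illustrative choices.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base
```

A rule is typically kept only if both values exceed fixed thresholds, which is the dependence on a threshold that the abstract mentions. Even so, the number of surviving rules can remain large.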

Keywords: association rules, rule-based classification, classification quality, validation

Procedia PDF Downloads 424
24351 The Impact of Mining Activities on the Surface Water Quality: A Case Study of the Kaap River in Barberton, Mpumalanga

Authors: M. F. Mamabolo

Abstract:

Mining activities are identified as the most significant source of heavy metal contamination in river basins, owing to inadequate disposal of mining waste, which results in acid mine drainage. Waste materials generated from gold mining and processing have severe and widespread impacts on water resources. Therefore, a total of 30 water samples were collected from Fig Tree Creek, the Kaap River, the Sheba mine stream, and the Suid Kaap River to investigate the impact of gold mines on the Kaap River system. Physicochemical parameters (pH, EC and TDS) were measured using a BANTE 900P portable water quality meter. The concentrations of Fe, Cu, Co, and SO₄²⁻ in the water samples were analysed using Inductively Coupled Plasma Mass Spectrometry (ICP-MS) at 0.01 mg/L. The results were compared with the regulatory guidelines of the World Health Organization (WHO) and the South African National Standards (SANS). Fe, Cu and Co were below the guideline values, while SO₄²⁻ detected in the Sheba mine stream exceeded the 250 mg/L limit in both seasons, which is attributed to mine wastewater. SO₄²⁻ was higher in the wet season due to high evaporation rates and greater interaction between rocks and water. The pH of all the streams was within the limit (≥5 to ≤9.7); however, the EC of the Sheba mine stream, the Suid Kaap River, and the point where the tributary connects with Fig Tree Creek exceeded 1700 uS/m, due to dissolved material. The TDS of the Sheba mine stream exceeded 1000 mg/L, which is attributed to the high SO₄²⁻ concentration, while the tributary connecting to Fig Tree Creek exceeded the value due to pollution from household waste, agricultural runoff, etc. In conclusion, the water from all sampled streams was safe for consumption due to low concentrations of physicochemical parameters. However, elevated concentrations of SO₄²⁻ should be monitored and managed to avoid water quality deterioration in the Kaap River system.

Keywords: Kaap river system, mines, heavy metals, sulphate

Procedia PDF Downloads 62
24350 Data-Driven Strategies for Enhancing Food Security in Vulnerable Regions: A Multi-Dimensional Analysis of Crop Yield Predictions, Supply Chain Optimization, and Food Distribution Networks

Authors: Sulemana Ibrahim

Abstract:

Food security remains a paramount global challenge, with vulnerable regions grappling with issues of hunger and malnutrition. This study embarks on a comprehensive exploration of data-driven strategies aimed at ameliorating food security in such regions. Our research employs a multifaceted approach, integrating data analytics to predict crop yields, optimizing supply chains, and enhancing food distribution networks. The study unfolds as a multi-dimensional analysis, commencing with the development of robust machine learning models harnessing remote sensing data, historical crop yield records, and meteorological data to foresee crop yields. These predictive models, underpinned by convolutional and recurrent neural networks, furnish critical insights into anticipated harvests, empowering proactive measures to confront food insecurity. Subsequently, the research scrutinizes supply chain optimization to address food security challenges, capitalizing on linear programming and network optimization techniques. These strategies intend to mitigate loss and wastage while streamlining the distribution of agricultural produce from field to fork. In conjunction, the study investigates food distribution networks with a particular focus on network efficiency, accessibility, and equitable food resource allocation. Network analysis tools, complemented by data-driven simulation methodologies, unveil opportunities for augmenting the efficacy of these critical lifelines. This study also considers the ethical implications and privacy concerns associated with the extensive use of data in the realm of food security. The proposed methodology outlines guidelines for responsible data acquisition, storage, and usage. The ultimate aspiration of this research is to forge a nexus between data science and food security policy, bestowing actionable insights to mitigate the ordeal of food insecurity. 
The holistic approach, converging data-driven crop yield forecasts, optimized supply chains, and improved distribution networks, aspires to revitalize food security in the most vulnerable regions, elevating the quality of life for millions worldwide.

Keywords: data-driven strategies, crop yield prediction, supply chain optimization, food distribution networks

Procedia PDF Downloads 45