Search results for: Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24815

Search results for: Data Mining

24125 A Practical and Theoretical Study on the Electromotor Bearing Defect Detection in a Wet Mill Using the Vibration Analysis Method and Defect Length Calculation in the Bearing

Authors: Mostafa Firoozabadi, Alireza Foroughi Nematollahi

Abstract:

Wet mills are one of the most important equipment in the mining industries and any defect occurrence in them can stop the production line and it can make some irrecoverable damages to the system. Electromotors are the significant parts of a mill and their monitoring is a necessary process to prevent unwanted defects. The purpose of this study is to investigate the Electromotor bearing defects, theoretically and practically, using the vibration analysis method. When a defect happens in a bearing, it can be transferred to the other parts of the equipment like inner ring, outer ring, balls, and the bearing cage. The electromotor defects source can be electrical or mechanical. Sometimes, the electrical and mechanical defect frequencies are modulated and the bearing defect detection becomes difficult. In this paper, to detect the electromotor bearing defects, the electrical and mechanical defect frequencies are extracted firstly. Then, by calculating the bearing defect frequencies, and the spectrum and time signal analysis, the bearing defects are detected. In addition, the obtained frequency determines that the bearing level in which the defect has happened and by comparing this level to the standards it determines the bearing remaining lifetime. Finally, the defect length is calculated by theoretical equations to demonstrate that there is no need to replace the bearing. The results of the proposed method, which has been implemented on the wet mills in the Golgohar mining and industrial company in Iran, show that this method is capable of detecting the electromotor bearing defects accurately and on time.

Keywords: bearing defect length, defect frequency, electromotor defects, vibration analysis

Procedia PDF Downloads 481
24124 Protecting the Cloud Computing Data Through the Data Backups

Authors: Abdullah Alsaeed

Abstract:

Virtualized computing and cloud computing infrastructures are no longer fuzz or marketing term. They are a core reality in today’s corporate Information Technology (IT) organizations. Hence, developing an effective and efficient methodologies for data backup and data recovery is required more than any time. The purpose of data backup and recovery techniques are to assist the organizations to strategize the business continuity and disaster recovery approaches. In order to accomplish this strategic objective, a variety of mechanism were proposed in the recent years. This research paper will explore and examine the latest techniques and solutions to provide data backup and restoration for the cloud computing platforms.

Keywords: data backup, data recovery, cloud computing, business continuity, disaster recovery, cost-effective, data encryption.

Procedia PDF Downloads 66
24123 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In terms of ITS, information on link characteristic is an essential factor for plan or operation. But in practical cases, not every link has installed sensors on it. The link that does not have data on it is called “Missing Link”. The purpose of this study is to impute data of these missing links. To get these data, this study applies the machine learning method. With the machine learning process, especially for the deep learning process, missing link data can be estimated from present link data. For deep learning process, this study uses “Recurrent Neural Network” to take time-series data of road. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero of Daegu Metropolitan Area had been fed into the learning process. Neural Network structure has 17 links with present data as input, 2 hidden layers, for 1 missing link data. As a result, forecasted data of target link show about 94% of accuracy compared with actual data.

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 497
24122 Analysis of the Development of Mining Companies Social Corporate Responsibility Based on the Rating Score

Authors: Tatiana Ponomarenko, Oksana Marinina, Marina Nevskaya

Abstract:

Modern corporate social responsibility (CSR) is a sphere of multilevel responsibility of a company toward society represented by various stakeholders. The relevance of CSR management grows due to the active development of socially responsible investing (principles for responsible investment) taking into account factors of environmental, social and corporate governance (ESG), growing attention of the investment community in general to the long-term stability of companies and the quality of control of nonfinancial risks. The modern approach to CSR strategic management is aimed at the creation of trustful relationships with stakeholders, on the basis of which a contribution to the sustainable development of companies, regions, and national economics is insured. However, the practical concepts of social responsibility in mining companies are different, which leads to various degrees of application of CSR. A number of companies implement CSR using a traditional (limited) understanding of responsibility toward employees and counteragents, the others understand CSR much wider and try to use leverages of efficient cooperation. As in large mining companies the scope of CSR measures is diverse and characterized by different indices, the study was aimed at evaluating CSR efficiency on the basis of a proprietary methodology and determining the level of development of CSR management in terms of anti-crisis, reactive and proactive development. The methodology of the research includes analysis of integrated global reporting initiative (GRI) reports of large mining companies; choice of most representative sectoral agents by a criterion of the regularity of issuance and publication of reports; calculation of indices of evaluation of CSR level of the selected companies in dynamics. The methodology of evaluation of CSR level is based on a rating score of changes in standard indices of GRI reports by economic, environmental, and social directions. Result. By the results of the analysis, companies of fuel and energy and metallurgic complexes, in overwhelming majority, reflecting three indices out of a wide range of possible indicators of SDGs (Sustainable Development Goals), were selected for the study. The evaluation of the scopes of CSR of the companies Gazprom, LUKOIL, Metalloinvest, Nornikel, Rosneft, Severstal, SIBUR, SUEK corresponds to the reactive type of development according to a scale of CSR strategic management, which is the average value out of the possible values. The chief drawback is that companies, in the process of analyzing global goals, often choose the goals which relate to their own activities, paying insufficient attention to the interests of the stakeholders inside the country. This fact evidences the necessity of searching for more effective mechanisms of CSR control. Acknowledgment: This article is prepared within grant support of the RFBR, project 19-510-44013 'Development of the concept of mineral resources value formation in the context of sustainable development in resource-oriented economies'.

Keywords: sustainable development, corporate social responsibility, development strategies, efficiency assessment

Procedia PDF Downloads 118
24121 The “Bright Side” of COVID-19: Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac Owusu Asante, Yushi Jiang, Hailin Tao

Abstract:

Live streaming marketing, the new electronic commerce element, became an optional marketing channel following the COVID-19 pandemic. Many sellers have leveraged the features presented by live streaming to increase sales. Studies on live streaming have focused on gaming and consumers’ loyalty to brands through live streaming, using interview questionnaires. This study, however, was conducted to measure real-time observable interactions between consumers and sellers. Based on the affordance theory, this study conceptualized constructs representing the interactive features and examined how they drive consumers’ purchase willingness during live streaming sessions using 1238 datasets from Amazon Live, following the manual observation of transaction records. Using structural equation modeling, the ordinary least square regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. The Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study introduces a new way of measuring interactions in live streaming commerce and proposes a way to manually gather data on consumer behaviors in live streaming platforms when the application programming interface (API) of such platforms does not support data mining algorithms.

Keywords: livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness

Procedia PDF Downloads 57
24120 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: big data, open data, productivity, data governance

Procedia PDF Downloads 351
24119 Infrastructure Project Management and Implementation: A Case Study Of the Mokolo-Crocodile Water Augmentation Project in South Africa

Authors: Elkington Sibusiso Mnguni

Abstract:

The Mokolo-Crocodile Water Augmentation Project (MCWAP) is located in the Limpopo Province in the northern-western part of South Africa. Its purpose is to increase water supply by 30 million cubic meters per year to meet current and future demand for users, including power stations, mining houses, and the local municipality in the Lephalale area. This paper documents the planning and implementation aspects of the MCWAP infrastructure project. The study will add to the body of knowledge with respect to bulk water infrastructure development in water-scarce regions. The method used to gather and collate relevant data and information was the desktop study. The key finding was that the project was successfully completed in 2015 using conventional project management and construction methods. The project is currently being operated and maintained by the National Department of Water and Sanitation.

Keywords: construction, contract management, infrastructure project, project management

Procedia PDF Downloads 285
24118 A Systematic Review on Challenges in Big Data Environment

Authors: Rimmy Yadav, Anmol Preet Kaur

Abstract:

Big Data has demonstrated the vast potential in streamlining, deciding, spotting business drifts in different fields, for example, producing, fund, Information Technology. This paper gives a multi-disciplinary diagram of the research issues in enormous information and its procedures, instruments, and system identified with the privacy, data storage management, network and energy utilization, adaptation to non-critical failure and information representations. Other than this, result difficulties and openings accessible in this Big Data platform have made.

Keywords: big data, privacy, data management, network and energy consumption

Procedia PDF Downloads 285
24117 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 262
24116 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computers technology and its recent applications provide access to new types of data, which have not been considered by the traditional data analysts. Two particularly interesting characteristics of such data sets include their huge size and streaming nature .Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey on the obstacles and the requirements issues classifying data streams with using decision tree. The most important issue is to maintain a balance between accuracy and efficiency, the algorithm should provide good classification performance with a reasonable time response.

Keywords: big data, data streams, classification, decision tree

Procedia PDF Downloads 500
24115 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

The knowledge of the relationship between characters can help readers to understand the overall story or plot of the literary fiction. In this paper, we present a method for extracting the specific relationship between characters from a Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters without considering Korean linguistic features. Furthermore, it is difficult to extract the relationship with direction from text, such as one-sided love, because they consider only the weight of relationship, without considering the direction of the relationship. Therefore, in order to identify specific relationships between characters, we propose a statistical method considering linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationship between the characters. Furthermore, we expect that proposed method could be applied to the relationship analysis between characters of other content like movie or TV drama.

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 359
24114 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning

Authors: Angelina A. Tzacheva, Jaishree Ranganathan

Abstract:

Interest in (STEM) Science Technology Engineering Mathematics education especially Computer Science education has seen a drastic increase across the country. This fuels effort towards recruiting and admitting a diverse population of students. Thus the changing conditions in terms of the student population, diversity and the expected teaching and learning outcomes give the platform for use of Innovative Teaching models and technologies. It is necessary that these methods adapted should also concentrate on raising quality of such innovations and have positive impact on student learning. Light-Weight Team is an Active Learning Pedagogy, which is considered to be low-stake activity and has very little or no direct impact on student grades. Emotion plays a major role in student’s motivation to learning. In this work we use the student feedback data with emotion classification using surveys at a public research institution in the United States. We use Actionable Pattern Discovery method for this purpose. Actionable patterns are patterns that provide suggestions in the form of rules to help the user achieve better outcomes. The proposed method provides meaningful insight in terms of changes that can be incorporated in the Light-Weight team activities, resources utilized in the course. The results suggest how to enhance student emotions to a more positive state, in particular focuses on the emotions ‘Trust’ and ‘Joy’.

Keywords: actionable pattern discovery, education, emotion, data mining

Procedia PDF Downloads 71
24113 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is one of important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy means single instance of the data to be accumulated. Though, indexing of each and every data is still maintained. Data deduplication is an approach for minimizing the part of storage space an organization required to retain its data. In most of the company, the storage systems carry identical copies of numerous pieces of data. Deduplication terminates these additional copies by saving just one copy of the data and exchanging the other copies with pointers that assist back to the primary copy. To ignore this duplication of the data and to preserve the confidentiality in the cloud here we are applying the concept of hybrid nature of cloud. A hybrid cloud is a fusion of minimally one public and private cloud. As a proof of concept, we implement a java code which provides security as well as removes all types of duplicated data from the cloud.

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

Procedia PDF Downloads 366
24112 Study of the Stability of the Slope Open-Pit Mines: Case of the Mine of Phosphates – Tebessa, Algeria

Authors: Mohamed Fredj, Abdallah Hafsaoui, Radouane Nakache

Abstract:

The study of the stability of the mining works in rock masses fractured is the major concern of the operating engineer. For geotechnical works in mines and quarries, it there is not today's general methodology for analysis and the quantification of the risks relating to the dangers inherent in these concrete types (falling boulders, landslides, etc.). The reasons for this are uncertainty, which weighs on available data or lack of knowledge of the values of the parameters required for this analysis type. Stability calculations must be based on reliable knowledge of the distribution of discontinuities that dissect the Rocky massif and the resistance to shear of the intact rock and discontinuities. This study is aimed to study the stability of slope of mine (Kef Sennoun - Tebessa, Algeria). The problem is analyzed using a numerical model based on the finite elements (software Plaxis 3D).

Keywords: stability, discontinuities, finite elements, rock mass, open-pit mine

Procedia PDF Downloads 299
24111 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in all engineering and science and many other domains. The potential of large or massive data is undoubtedly significant, make sense to require new ways of thinking and learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. In this paper, the latest advances and advancements in the researches on machine learning for big data processing. First, the machine learning techniques methods in recent studies, such as deep learning, representation learning, transfer learning, active learning and distributed and parallel learning. Then focus on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 417
24110 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

Procedia PDF Downloads 164
24109 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.

Keywords: Intelligent Transportation Systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning

Procedia PDF Downloads 449
24108 The Effects of Human Activities on Plant Diversity in Tropical Wetlands of Lake Tana (Ethiopia)

Authors: Abrehet Kahsay Mehari

Abstract:

Aquatic plants provide the physical structure of wetlands and increase their habitat complexity and heterogeneity, and as such, have a profound influence on other biotas. In this study, we investigated how human disturbance activities influenced the species richness and community composition of aquatic plants in the wetlands of Lake Tana, Ethiopia. Twelve wetlands were selected: four lacustrine, four river mouths, and four riverine papyrus swamps. Data on aquatic plants, environmental variables, and human activities were collected during the dry and wet seasons of 2018. A linear mixed effect model and a distance-based Redundancy Analysis (db-RDA) were used to relate aquatic plant species richness and community composition, respectively, to human activities and environmental variables. A total of 113 aquatic plant species, belonging to 38 families, were identified across all wetlands during the dry and wet seasons. Emergent species had the maximum area covered at 73.45 % and attained the highest relative abundance, followed by amphibious and other forms. The mean taxonomic richness of aquatic plants was significantly lower in wetlands with high overall human disturbance scores compared to wetlands with low overall human disturbance scores. Moreover, taxonomic richness showed a negative correlation with livestock grazing, tree plantation, and sand mining. The community composition also varied across wetlands with varying levels of human disturbance and was primarily driven by turnover (i.e., replacement of species) rather than nestedness resultant(i.e., loss of species). Distance-based redundancy analysis revealed that livestock grazing, tree plantation, sand mining, waste dumping, and crop cultivation were significant predictors of variation in aquatic plant communities’ composition in the wetlands. Linear mixed effect models and distance-based redundancy analysis also revealed that water depth, turbidity, conductivity, pH, sediment depth, and temperature were important drivers of variations in aquatic plant species richness and community composition. Papyrus swamps had the highest species richness and supported different plant communities. Conservation efforts should therefore focus on these habitats and measures should be taken to restore the highly disturbed and species poor wetlands near the river mouths.

Keywords: species richness, community composition, aquatic plants, wetlands, Lake Tana, human disturbance activities

Procedia PDF Downloads 98
24107 Artificial Intelligence Methods in Estimating the Minimum Miscibility Pressure Required for Gas Flooding

Authors: Emad A. Mohammed

Abstract:

Utilizing the capabilities of Data Mining and Artificial Intelligence in the prediction of the minimum miscibility pressure (MMP) required for multi-contact miscible (MCM) displacement of reservoir petroleum by hydrocarbon gas flooding using Fuzzy Logic models and Artificial Neural Network models will help a lot in giving accurate results. The factors affecting the (MMP) as it is proved from the literature and from the dataset are as follows: XC2-6: Intermediate composition in the oil-containing C2-6, CO2 and H2S, in mole %, XC1: Amount of methane in the oil (%),T: Temperature (°C), MwC7+: Molecular weight of C7+ (g/mol), YC2+: Mole percent of C2+ composition in injected gas (%), MwC2+: Molecular weight of C2+ in injected gas. Fuzzy Logic and Neural Networks have been used widely in prediction and classification, with relatively high accuracy, in different fields of study. It is well known that the Fuzzy Inference system can handle uncertainty within the inputs such as in our case. The results of this work showed that our proposed models perform better with higher performance indices than other emprical correlations.

Keywords: MMP, gas flooding, artificial intelligence, correlation

Procedia PDF Downloads 127
24106 The Various Legal Dimensions of Genomic Data

Authors: Amy Gooden

Abstract:

When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.

Keywords: artificial intelligence, data, law, genomics, rights

Procedia PDF Downloads 130
24105 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

Procedia PDF Downloads 221
24104 Gene Names Identity Recognition Using Siamese Network for Biomedical Publications

Authors: Micheal Olaolu Arowolo, Muhammad Azam, Fei He, Mihail Popescu, Dong Xu

Abstract:

As the quantity of biological articles rises, so does the number of biological route figures. Each route figure shows gene names and relationships. Annotating pathway diagrams manually is time-consuming. Advanced image understanding models could speed up curation, but they must be more precise. There is rich information in biological pathway figures. The first step to performing image understanding of these figures is to recognize gene names automatically. Classical optical character recognition methods have been employed for gene name recognition, but they are not optimized for literature mining data. This study devised a method to recognize an image bounding box of gene name as a photo using deep Siamese neural network models to outperform the existing methods using ResNet, DenseNet and Inception architectures, the results obtained about 84% accuracy.

Keywords: biological pathway, gene identification, object detection, Siamese network

Procedia PDF Downloads 263
24103 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, in the case of manual categorization, not only can the accuracy of the categorization be not guaranteed but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the limitation of the requirement of a multi-categorized training set by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend a category of a single-categorized document to multiple categorizes by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 255
24102 Geological Structure Identification in Semilir Formation: An Correlated Geological and Geophysical (Very Low Frequency) Data for Zonation Disaster with Current Density Parameters and Geological Surface Information

Authors: E. M. Rifqi Wilda Pradana, Bagus Bayu Prabowo, Meida Riski Pujiyati, Efraim Maykhel Hagana Ginting, Virgiawan Arya Hangga Reksa

Abstract:

The VLF (Very Low Frequency) method is an electromagnetic method that uses low frequencies between 10-30 KHz which results in a fairly deep penetration. In this study, the VLF method was used for zonation of disaster-prone areas by identifying geological structures in the form of faults. Data acquisition was carried out in Trimulyo Region, Jetis District, Bantul Regency, Special Region of Yogyakarta, Indonesia with 8 measurement paths. This study uses wave transmitters from Japan and Australia to obtain Tilt and Elipt values that can be used to create RAE (Rapat Arus Ekuivalen or Current Density) sections that can be used to identify areas that are easily crossed by electric current. This section will indicate the existence of a geological structure in the form of faults in the study area which is characterized by a high RAE value. In data processing of VLF method, it is obtained Tilt vs Elliptical graph and Moving Average (MA) Tilt vs Moving Average (MA) Elipt graph of each path that shows a fluctuating pattern and does not show any intersection at all. Data processing uses Matlab software and obtained areas with low RAE values that are 0%-6% which shows medium with low conductivity and high resistivity and can be interpreted as sandstone, claystone, and tuff lithology which is part of the Semilir Formation. Whereas a high RAE value of 10% -16% which shows a medium with high conductivity and low resistivity can be interpreted as a fault zone filled with fluid. The existence of the fault zone is strengthened by the discovery of a normal fault on the surface with strike N550W and dip 630E at coordinates X= 433256 and Y= 9127722 so that the activities of residents in the zone such as housing, mining activities and other activities can be avoided to reduce the risk of natural disasters.

Keywords: current density, faults, very low frequency, zonation

Procedia PDF Downloads 156
24101 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: biological ontology, linked data, semantic data integration, semantic web

Procedia PDF Downloads 428
24100 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 109
24099 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault

Authors: Lakshmi Prayaga, Chandra Prayaga. Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola

Abstract:

Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.

Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula

Procedia PDF Downloads 51
24098 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses to the research in the area of regression testing with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature is pointed out. The difference in approach between industry, with business model as basis, and academia, with focus on data mining, is highlighted. Test Metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. The comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 411
24097 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal

Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha

Abstract:

Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions. Optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. An Open-source CFD software – OpenFOAM simulates the flow around the drill bit, based on the field input data. A specifically developed console application integrates the entire CFD process including, domain extraction, meshing, and solving governing equations and post-processing. The results from the OpenFOAM solver are then compared with that of the ANSYS Fluent software. The data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as number of nozzles, nozzle velocity, nozzle radial position and orientations on the flow field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.

Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC dill bit

Procedia PDF Downloads 401
24096 Improving Trainings of Mineral Processing Operators Through Gamification and Modelling and Simulation

Authors: Pedro A. S. Bergamo, Emilia S. Streng, Jan Rosenkranz, Yousef Ghorbani

Abstract:

Within the often-hazardous mineral industry, simulation training has speedily gained appreciation as an important method of increasing site safety and productivity through enhanced operator skill and knowledge. Performance calculations related to froth flotation, one of the most important concentration methods, is probably the hardest topic taught during the training of plant operators. Currently, most training teach those skills by traditional methods like slide presentations and hand-written exercises with a heavy focus on memorization. To optimize certain aspects of these pieces of training, we developed “MinFloat”, which teaches the operation formulas of the froth flotation process with the help of gamification. The simulation core based on a first-principles flotation model was implemented in Unity3D and an instructor tutoring system was developed, which presents didactic content and reviews the selected answers. The game was tested by 25 professionals with extensive experience in the mining industry based on a questionnaire formulated for training evaluations. According to their feedback, the game scored well in terms of quality, didactic efficacy and inspiring character. The feedback of the testers on the main target audience and the outlook of the mentioned solution is presented. This paper aims to provide technical background on the construction of educational games for the mining industry besides showing how feedback from experts can more efficiently be gathered thanks to new technologies such as online forms.

Keywords: training evaluation, simulation based training, modelling, and simulation, froth flotation

Procedia PDF Downloads 102