Search results for: omics data analysis
41766 Using Discriminant Analysis to Forecast Crime Rate in Nigeria
Authors: O. P. Popoola, O. A. Alawode, M. O. Olayiwola, A. M. Oladele
Abstract:
This research work is based on using discriminant analysis to forecast crime rate in Nigeria between 1996 and 2008. The work is interested in how gender (male and female) relates to offences committed against the government, against other properties, disturbance in public places, murder/robbery offences and other offences. The data used was collected from the National Bureau of Statistics (NBS). SPSS, the statistical package was used to analyse the data. Time plot was plotted on all the 29 offences gotten from the raw data. Eigenvalues and Multivariate tests, Wilks’ Lambda, standardized canonical discriminant function coefficients and the predicted classifications were estimated. The research shows that the distribution of the scores from each function is standardized to have a mean O and a standard deviation of 1. The magnitudes of the coefficients indicate how strongly the discriminating variable affects the score. In the predicted group membership, 172 cases that were predicted to commit crime against Government group, 66 were correctly predicted and 106 were incorrectly predicted. After going through the predicted classifications, we found out that most groups numbers that were correctly predicted were less than those that were incorrectly predicted.Keywords: discriminant analysis, DA, multivariate analysis of variance, MANOVA, canonical correlation, and Wilks’ Lambda
Procedia PDF Downloads 46841765 Document-level Sentiment Analysis: An Exploratory Case Study of Low-resource Language Urdu
Authors: Ammarah Irum, Muhammad Ali Tahir
Abstract:
Document-level sentiment analysis in Urdu is a challenging Natural Language Processing (NLP) task due to the difficulty of working with lengthy texts in a language with constrained resources. Deep learning models, which are complex neural network architectures, are well-suited to text-based applications in addition to data formats like audio, image, and video. To investigate the potential of deep learning for Urdu sentiment analysis, we implemented five different deep learning models, including Bidirectional Long Short Term Memory (BiLSTM), Convolutional Neural Network (CNN), Convolutional Neural Network with Bidirectional Long Short Term Memory (CNN-BiLSTM), and Bidirectional Encoder Representation from Transformer (BERT). In this study, we developed a hybrid deep learning model called BiLSTM-Single Layer Multi Filter Convolutional Neural Network (BiLSTM-SLMFCNN) by fusing BiLSTM and CNN architecture. The proposed and baseline techniques are applied on Urdu Customer Support data set and IMDB Urdu movie review data set by using pre-trained Urdu word embedding that are suitable for sentiment analysis at the document level. Results of these techniques are evaluated and our proposed model outperforms all other deep learning techniques for Urdu sentiment analysis. BiLSTM-SLMFCNN outperformed the baseline deep learning models and achieved 83%, 79%, 83% and 94% accuracy on small, medium and large sized IMDB Urdu movie review data set and Urdu Customer Support data set respectively.Keywords: urdu sentiment analysis, deep learning, natural language processing, opinion mining, low-resource language
Procedia PDF Downloads 7241764 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data
Authors: Adarsh Shroff
Abstract:
Big data is a collection of dataset so large and complex that it becomes difficult to process using data base management tools. To perform operations like search, analysis, visualization on big data by using data mining; which is the process of extraction of patterns or knowledge from large data set. In recent years, the data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to Map Reduce, the most widely used framework for mining big data. I2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.Keywords: big data, map reduce, incremental processing, iterative computation
Procedia PDF Downloads 35041763 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach
Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini
Abstract:
Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing
Procedia PDF Downloads 16741762 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach
Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini
Abstract:
Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanismsKeywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing
Procedia PDF Downloads 15941761 Possible Risks for Online Orders in the Furniture Industry - Customer and Entrepreneur Perspective
Authors: Justyna Żywiołek, Marek Matulewski
Abstract:
Data, is information processed by enterprises for primary and secondary purposes as processes. Thanks to processing, the sales process takes place; in the case of the surveyed companies, sales take place online. However, this indirect form of contact with the customer causes many problems for both customers and furniture manufacturers. The article presents solutions that would solve problems related to the analysis of data and information in the order fulfillment process sent to post-warranty service. The article also presents an analysis of threats to the security of this information, both for customers and the enterprise.Keywords: ordering furniture online, information security, furniture industry, enterprise security, risk analysis
Procedia PDF Downloads 4841760 An Analysis of Privacy and Security for Internet of Things Applications
Authors: Dhananjay Singh, M. Abdullah-Al-Wadud
Abstract:
The Internet of Things is a concept of a large scale ecosystem of wireless actuators. The actuators are defined as things in the IoT, those which contribute or produces some data to the ecosystem. However, ubiquitous data collection, data security, privacy preserving, large volume data processing, and intelligent analytics are some of the key challenges into the IoT technologies. In order to solve the security requirements, challenges and threats in the IoT, we have discussed a message authentication mechanism for IoT applications. Finally, we have discussed data encryption mechanism for messages authentication before propagating into IoT networks.Keywords: Internet of Things (IoT), message authentication, privacy, security
Procedia PDF Downloads 38241759 Design and Development of Data Mining Application for Medical Centers in Remote Areas
Authors: Grace Omowunmi Soyebi
Abstract:
Data Mining is the extraction of information from a large database which helps in predicting a trend or behavior, thereby helping management make knowledge-driven decisions. One principal problem of most hospitals in rural areas is making use of the file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause an unexpected to happen to the patient. This Data Mining application is to be designed using a Structured System Analysis and design method, which will help in a well-articulated analysis of the existing file management system, feasibility study, and proper documentation of the Design and Implementation of a Computerized medical record system. This Computerized system will replace the file management system and help to easily retrieve a patient's record with increased data security, access clinical records for decision-making, and reduce the time range at which a patient gets attended to.Keywords: data mining, medical record system, systems programming, computing
Procedia PDF Downloads 20941758 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach
Authors: Theertha Chandroth
Abstract:
This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.Keywords: XML, JSON, data comparison, integration testing, Python, SQL
Procedia PDF Downloads 14041757 Application of Regularized Spatio-Temporal Models to the Analysis of Remote Sensing Data
Authors: Salihah Alghamdi, Surajit Ray
Abstract:
Space-time data can be observed over irregularly shaped manifolds, which might have complex boundaries or interior gaps. Most of the existing methods do not consider the shape of the data, and as a result, it is difficult to model irregularly shaped data accommodating the complex domain. We used a method that can deal with space-time data that are distributed over non-planner shaped regions. The method is based on partial differential equations and finite element analysis. The model can be estimated using a penalized least squares approach with a regularization term that controls the over-fitting. The model is regularized using two roughness penalties, which consider the spatial and temporal regularities separately. The integrated square of the second derivative of the basis function is used as temporal penalty. While the spatial penalty consists of the integrated square of Laplace operator, which is integrated exclusively over the domain of interest that is determined using finite element technique. In this paper, we applied a spatio-temporal regression model with partial differential equations regularization (ST-PDE) approach to analyze a remote sensing data measuring the greenness of vegetation, measure by an index called enhanced vegetation index (EVI). The EVI data consist of measurements that take values between -1 and 1 reflecting the level of greenness of some region over a period of time. We applied (ST-PDE) approach to irregular shaped region of the EVI data. The approach efficiently accommodates the irregular shaped regions taking into account the complex boundaries rather than smoothing across the boundaries. Furthermore, the approach succeeds in capturing the temporal variation in the data.Keywords: irregularly shaped domain, partial differential equations, finite element analysis, complex boundray
Procedia PDF Downloads 14041756 TRACE/FRAPTRAN Analysis of Kuosheng Nuclear Power Plant Dry-Storage System
Authors: J. R. Wang, Y. Chiang, W. Y. Li, H. T. Lin, H. C. Chen, C. Shih, S. W. Chen
Abstract:
The dry-storage systems of nuclear power plants (NPPs) in Taiwan have become one of the major safety concerns. There are two steps considered in this study. The first step is the verification of the TRACE by using VSC-17 experimental data. The results of TRACE were similar to the VSC-17 data. It indicates that TRACE has the respectable accuracy in the simulation and analysis of the dry-storage systems. The next step is the application of TRACE in the dry-storage system of Kuosheng NPP (BWR/6). Kuosheng NPP is the second BWR NPP of Taiwan Power Company. In order to solve the storage of the spent fuels, Taiwan Power Company developed the new dry-storage system for Kuosheng NPP. In this step, the dry-storage system model of Kuosheng NPP was established by TRACE. Then, the steady state simulation of this model was performed and the results of TRACE were compared with the Kuosheng NPP data. Finally, this model was used to perform the safety analysis of Kuosheng NPP dry-storage system. Besides, FRAPTRAN was used tocalculate the transient performance of fuel rods.Keywords: BWR, TRACE, FRAPTRAN, dry-storage
Procedia PDF Downloads 51941755 Using Machine Learning Techniques to Extract Useful Information from Dark Data
Authors: Nigar Hussain
Abstract:
It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.Keywords: big data, dark data, machine learning, heatmap, random forest
Procedia PDF Downloads 2841754 Evaluating Learning Outcomes in the Implementation of Flipped Teaching Using Data Envelopment Analysis
Authors: Huie-Wen Lin
Abstract:
This study integrated various teaching factors -based on the idea of a flipped classroom- in a financial management course. The study’s aim was to establish an effective teaching implementation strategy and evaluation mechanism with respect to learning outcomes, which can serve as a reference for the future modification of teaching methods. This study implemented a teaching method in five stages and estimated the learning efficiencies of 22 students (in the teaching scenario and over two semesters). Subsequently, data envelopment analysis (DEA) was used to compare, for each student, between the learning efficiencies before and after participation in the flipped classroom -in the first and second semesters, respectively- to identify the crucial external factors influencing learning efficiency. According to the results, the average overall student learning efficiency increased from 0.901 in the first semester to 0.967 in the second semester, which demonstrate that the flipped classroom approach can improve teaching effectiveness and learning outcomes. The results also revealed a difference in learning efficiency between male and female students.Keywords: data envelopment analysis, flipped classroom, learning outcome, teaching and learning
Procedia PDF Downloads 15641753 Using RASCAL and ALOHA Codes to Establish an Analysis Methodology for Hydrogen Fluoride Evaluation
Authors: J. R. Wang, Y. Chiang, W. S. Hsu, H. C. Chen, S. H. Chen, J. H. Yang, S. W. Chen, C. Shih
Abstract:
In this study, the RASCAL and ALOHA codes are used to establish an analysis methodology for hydrogen fluoride (HF) evaluation. There are three main steps in this study. First, the UF6 data were collected. Second, one postulated case was analyzed by using the RASCAL and UF6 data. This postulated case assumes that fire occurring and UF6 is releasing from a building. Third, the results of RASCAL for HF mass were as the input data of ALOHA. Two postulated cases of HF were analyzed by using ALOHA code and the results of RASCAL. These postulated cases assume fire occurring and HF is releasing with no raining (Case 1) or raining (Case 2) condition. According to the analysis results of ALOHA, the HF concentration of Case 2 is smaller than Case 1. The results can be a reference for the preparing of emergency plans for the release of HF.Keywords: RASCAL, ALOHA, UF₆, hydrogen fluoride
Procedia PDF Downloads 75041752 Ethical Leadership and Individual Creativity: The Mediating Role of Psychological Safety
Authors: Hyeondal Jeong, Yoonjung Baek
Abstract:
This study examines the relationship between ethical leadership and individual creativity and focused on mediating effects of psychological safety. In order to clarify the mechanism of ethical leadership, psychological safety of the members was set as a mediator. Using data gathered from a sample of 150 employees. For data analysis, exploratory factor analysis, correlation analysis, hierarchical regression analysis and Sobel-Test were performed. The results showed that ethical leadership had a positive effect on psychological safety and individual creativity, and psychological safety had a positive mediating effect. Since the mediating effect of psychological safety has been confirmed, we need to find ways to improve the psychological safety of the members in terms of organizational management. Psychological safety has a positive effect on individual creativity, which can have a positive impact on innovation throughout the organization.Keywords: ethical leadership, creativity, psychological safety, ethics management, innovative behaviors
Procedia PDF Downloads 24941751 Social Media Data Analysis for Personality Modelling and Learning Styles Prediction Using Educational Data Mining
Authors: Srushti Patil, Preethi Baligar, Gopalkrishna Joshi, Gururaj N. Bhadri
Abstract:
In designing learning environments, the instructional strategies can be tailored to suit the learning style of an individual to ensure effective learning. In this study, the information shared on social media like Facebook is being used to predict learning style of a learner. Previous research studies have shown that Facebook data can be used to predict user personality. Users with a particular personality exhibit an inherent pattern in their digital footprint on Facebook. The proposed work aims to correlate the user's’ personality, predicted from Facebook data to the learning styles, predicted through questionnaires. For Millennial learners, Facebook has become a primary means for information sharing and interaction with peers. Thus, it can serve as a rich bed for research and direct the design of learning environments. The authors have conducted this study in an undergraduate freshman engineering course. Data from 320 freshmen Facebook users was collected. The same users also participated in the learning style and personality prediction survey. The Kolb’s Learning style questionnaires and Big 5 personality Inventory were adopted for the survey. The users have agreed to participate in this research and have signed individual consent forms. A specific page was created on Facebook to collect user data like personal details, status updates, comments, demographic characteristics and egocentric network parameters. This data was captured by an application created using Python program. The data captured from Facebook was subjected to text analysis process using the Linguistic Inquiry and Word Count dictionary. An analysis of the data collected from the questionnaires performed reveals individual student personality and learning style. The results obtained from analysis of Facebook, learning style and personality data were then fed into an automatic classifier that was trained by using the data mining techniques like Rule-based classifiers and Decision trees. This helps to predict the user personality and learning styles by analysing the common patterns. Rule-based classifiers applied for text analysis helps to categorize Facebook data into positive, negative and neutral. There were totally two models trained, one to predict the personality from Facebook data; another one to predict the learning styles from the personalities. The results show that the classifier model has high accuracy which makes the proposed method to be a reliable one for predicting the user personality and learning styles.Keywords: educational data mining, Facebook, learning styles, personality traits
Procedia PDF Downloads 23141750 Study and Analysis of the Factors Affecting Road Safety Using Decision Tree Algorithms
Authors: Naina Mahajan, Bikram Pal Kaur
Abstract:
The purpose of traffic accident analysis is to find the possible causes of an accident. Road accidents cannot be totally prevented but by suitable traffic engineering and management the accident rate can be reduced to a certain extent. This paper discusses the classification techniques C4.5 and ID3 using the WEKA Data mining tool. These techniques use on the NH (National highway) dataset. With the C4.5 and ID3 technique it gives best results and high accuracy with less computation time and error rate.Keywords: C4.5, ID3, NH(National highway), WEKA data mining tool
Procedia PDF Downloads 33841749 A Study on the HTML5 Based Multi Media Contents Authority Tool
Authors: Heesuk Seo, Yongtae Kim
Abstract:
Online learning started in the 1990s, the spread of the Internet has been through the era of e-learning paradigm of online education in the era of smart learning change. Reflecting the different nature of the mobile to anywhere anytime, anywhere was also allows the form of learning, it was also available through the learning content and interaction. We are developing a cloud system, 'TLINKS CLOUD' that allows you to configure the environment of the smart learning without the need for additional infrastructure. Using the big-data analysis for e-learning contents, we provide an integrated solution for e-learning tailored to individual study.Keywords: authority tool, big data analysis, e-learning, HTML5
Procedia PDF Downloads 40641748 Multichannel Analysis of the Surface Waves of Earth Materials in Some Parts of Lagos State, Nigeria
Authors: R. B. Adegbola, K. F. Oyedele, L. Adeoti
Abstract:
We present a method that utilizes Multi-channel Analysis of Surface Waves, which was used to measure shear wave velocities with a view to establishing the probable causes of road failure, subsidence and weakening of structures in some Local Government Area, Lagos, Nigeria. Multi channel Analysis of Surface waves (MASW) data were acquired using 24-channel seismograph. The acquired data were processed and transformed into two-dimensional (2-D) structure reflective of depth and surface wave velocity distribution within a depth of 0–15m beneath the surface using SURFSEIS software. The shear wave velocity data were compared with other geophysical/borehole data that were acquired along the same profile. The comparison and correlation illustrates the accuracy and consistency of MASW derived-shear wave velocity profiles. Rigidity modulus and N-value were also generated. The study showed that the low velocity/very low velocity are reflective of organic clay/peat materials and thus likely responsible for the failed, subsidence/weakening of structures within the study areas.Keywords: seismograph, road failure, rigidity modulus, N-value, subsidence
Procedia PDF Downloads 36341747 Fuzzy Approach for Fault Tree Analysis of Water Tube Boiler
Authors: Syed Ahzam Tariq, Atharva Modi
Abstract:
This paper presents a probabilistic analysis of the safety of water tube boilers using fault tree analysis (FTA). A fault tree has been constructed by considering all possible areas where a malfunction could lead to a boiler accident. Boiler accidents are relatively rare, causing a scarcity of data. The fuzzy approach is employed to perform a quantitative analysis, wherein theories of fuzzy logic are employed in conjunction with expert elicitation to calculate failure probabilities. The Fuzzy Fault Tree Analysis (FFTA) provides a scientific and contingent method to forecast and prevent accidents.Keywords: fault tree analysis water tube boiler, fuzzy probability score, failure probability
Procedia PDF Downloads 12741746 Big Data: Appearance and Disappearance
Authors: James Moir
Abstract:
The mainstay of Big Data is prediction in that it allows practitioners, researchers, and policy analysts to predict trends based upon the analysis of large and varied sources of data. These can range from changing social and political opinions, patterns in crimes, and consumer behaviour. Big Data has therefore shifted the criterion of success in science from causal explanations to predictive modelling and simulation. The 19th-century science sought to capture phenomena and seek to show the appearance of it through causal mechanisms while 20th-century science attempted to save the appearance and relinquish causal explanations. Now 21st-century science in the form of Big Data is concerned with the prediction of appearances and nothing more. However, this pulls social science back in the direction of a more rule- or law-governed reality model of science and away from a consideration of the internal nature of rules in relation to various practices. In effect Big Data offers us no more than a world of surface appearance and in doing so it makes disappear any context-specific conceptual sensitivity.Keywords: big data, appearance, disappearance, surface, epistemology
Procedia PDF Downloads 42141745 Recommendations Using Online Water Quality Sensors for Chlorinated Drinking Water Monitoring at Drinking Water Distribution Systems Exposed to Glyphosate
Authors: Angela Maria Fasnacht
Abstract:
Detection of anomalies due to contaminants’ presence, also known as early detection systems in water treatment plants, has become a critical point that deserves an in-depth study for their improvement and adaptation to current requirements. The design of these systems requires a detailed analysis and processing of the data in real-time, so it is necessary to apply various statistical methods appropriate to the data generated, such as Spearman’s Correlation, Factor Analysis, Cross-Correlation, and k-fold Cross-validation. Statistical analysis and methods allow the evaluation of large data sets to model the behavior of variables; in this sense, statistical treatment or analysis could be considered a vital step to be able to develop advanced models focused on machine learning that allows optimized data management in real-time, applied to early detection systems in water treatment processes. These techniques facilitate the development of new technologies used in advanced sensors. In this work, these methods were applied to identify the possible correlations between the measured parameters and the presence of the glyphosate contaminant in the single-pass system. The interaction between the initial concentration of glyphosate and the location of the sensors on the reading of the reported parameters was studied.Keywords: glyphosate, emergent contaminants, machine learning, probes, sensors, predictive
Procedia PDF Downloads 12141744 Cross Project Software Fault Prediction at Design Phase
Authors: Pradeep Singh, Shrish Verma
Abstract:
Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. The earlier we predict the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven data sets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.Keywords: software metrics, fault prediction, cross project, within project.
Procedia PDF Downloads 34441743 Reviewing Privacy Preserving Distributed Data Mining
Authors: Sajjad Baghernezhad, Saeideh Baghernezhad
Abstract:
Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.Keywords: data mining, distributed data mining, privacy protection, privacy preserving
Procedia PDF Downloads 52541742 Statistical Correlation between Logging-While-Drilling Measurements and Wireline Caliper Logs
Authors: Rima T. Alfaraj, Murtadha J. Al Tammar, Khaqan Khan, Khalid M. Alruwaili
Abstract:
OBJECTIVE/SCOPE (25-75): Caliper logging data provides critical information about wellbore shape and deformations, such as stress-induced borehole breakouts or washouts. Multiarm mechanical caliper logs are often run using wireline, which can be time-consuming, costly, and/or challenging to run in certain formations. To minimize rig time and improve operational safety, it is valuable to develop analytical solutions that can estimate caliper logs using available Logging-While-Drilling (LWD) data without the need to run wireline caliper logs. As a first step, the objective of this paper is to perform statistical analysis using an extensive datasetto identify important physical parameters that should be considered in developing such analytical solutions. METHODS, PROCEDURES, PROCESS (75-100): Caliper logs and LWD data of eleven wells, with a total of more than 80,000 data points, were obtained and imported into a data analytics software for analysis. Several parameters were selected to test the relationship of the parameters with the measured maximum and minimum caliper logs. These parameters includegamma ray, porosity, shear, and compressional sonic velocities, bulk densities, and azimuthal density. The data of the eleven wells were first visualized and cleaned.Using the analytics software, several analyses were then preformed, including the computation of Pearson’s correlation coefficients to show the statistical relationship between the selected parameters and the caliper logs. RESULTS, OBSERVATIONS, CONCLUSIONS (100-200): The results of this statistical analysis showed that some parameters show good correlation to the caliper log data. For instance, the bulk density and azimuthal directional densities showedPearson’s correlation coefficients in the range of 0.39 and 0.57, which wererelatively high when comparedto the correlation coefficients of caliper data with other parameters. Other parameters such as porosity exhibited extremely low correlation coefficients to the caliper data. Various crossplots and visualizations of the data were also demonstrated to gain further insights from the field data. NOVEL/ADDITIVE INFORMATION (25-75): This study offers a unique and novel look into the relative importance and correlation between different LWD measurements and wireline caliper logs via an extensive dataset. The results pave the way for a more informed development of new analytical solutions for estimating the size and shape of the wellbore in real-time while drilling using LWD data.Keywords: LWD measurements, caliper log, correlations, analysis
Procedia PDF Downloads 12141741 The Right to Data Portability and Its Influence on the Development of Digital Services
Authors: Roman Bieda
Abstract:
The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.Keywords: data portability, digital market, GDPR, personal data
Procedia PDF Downloads 47341740 Performance Analysis of Multichannel OCDMA-FSO Network under Different Pervasive Conditions
Authors: Saru Arora, Anurag Sharma, Harsukhpreet Singh
Abstract:
To meet the growing need of high data rate and bandwidth, various efforts has been made nowadays for the efficient communication systems. Optical Code Division Multiple Access over Free space optics communication system seems an effective role for providing transmission at high data rate with low bit error rate and low amount of multiple access interference. This paper demonstrates the OCDMA over FSO communication system up to the range of 7000 m at a data rate of 5 Gbps. Initially, the 8 user OCDMA-FSO system is simulated and pseudo orthogonal codes are used for encoding. Also, the simulative analysis of various performance parameters like power and core effective area that are having an effect on the Bit error rate (BER) of the system is carried out. The simulative analysis reveals that the length of the transmission is limited by the multi-access interference (MAI) effect which arises when the number of users increases in the system.Keywords: FSO, PSO, bit error rate (BER), opti system simulation, multiple access interference (MAI), q-factor
Procedia PDF Downloads 36541739 Analysis of Delivery of Quad Play Services
Authors: Rahul Malhotra, Anurag Sharma
Abstract:
Fiber based access networks can deliver performance that can support the increasing demands for high speed connections. One of the new technologies that have emerged in recent years is Passive Optical Networks. This paper is targeted to show the simultaneous delivery of triple play service (data, voice, and video). The comparative investigation and suitability of various data rates is presented. It is demonstrated that as we increase the data rate, number of users to be accommodated decreases due to increase in bit error rate.Keywords: FTTH, quad play, play service, access networks, data rate
Procedia PDF Downloads 41441738 Dynamic Analysis and Vibration Response of Thermoplastic Rolling Elements in a Rotor Bearing System
Authors: Nesrine Gaaliche
Abstract:
This study provides a finite element dynamic model for analyzing rolling bearing system vibration response. The vibration responses of polypropylene bearings with and without defects are studied using FE analysis and compared to experimental data. The viscoelastic behavior of thermoplastic is investigated in this work to evaluate the influence of material flexibility and damping viscosity. The vibrations are detected using 3D dynamic analysis. Peak vibrations are more noticeable in an inner ring defect than in an outer ring defect, according to test data. The performance of thermoplastic bearings is compared to that of metal parts using vibration signals. Both the test and numerical results show that Polypropylene bearings exhibit less vibration than steel counterparts. Unlike bearings made from metal, polypropylene bearings absorb vibrations and handle shaft misalignments. Following validation of the overall vibration spectrum data, Von Mises stresses inside the rings are assessed under high loads. Stress is significantly high under the balls, according to the simulation findings. For the test cases, the computational findings correspond closely to the experimental results.Keywords: viscoelastic, FE analysis, polypropylene, bearings
Procedia PDF Downloads 10441737 Infrastructural Investment and Economic Growth in Indian States: A Panel Data Analysis
Authors: Jonardan Koner, Basabi Bhattacharya, Avinash Purandare
Abstract:
The study is focused to find out the impact of infrastructural investment on economic development in Indian states. The study uses panel data analysis to measure the impact of infrastructural investment on Real Gross Domestic Product in Indian States. Panel data analysis incorporates Unit Root Test, Cointegration Teat, Pooled Ordinary Least Squares, Fixed Effect Approach, Random Effect Approach, Hausman Test. The study analyzes panel data (annual in frequency) ranging from 1991 to 2012 and concludes that infrastructural investment has a desirable impact on economic development in Indian. Finally, the study reveals that the infrastructural investment significantly explains the variation of economic indicator.Keywords: infrastructural investment, real GDP, unit root test, cointegration teat, pooled ordinary least squares, fixed effect approach, random effect approach, Hausman test
Procedia PDF Downloads 402