Search results for: data envelopment analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 40502

Search results for: data envelopment analysis

40142 Statistical Correlation between Logging-While-Drilling Measurements and Wireline Caliper Logs

Authors: Rima T. Alfaraj, Murtadha J. Al Tammar, Khaqan Khan, Khalid M. Alruwaili

Abstract:

OBJECTIVE/SCOPE (25-75): Caliper logging data provides critical information about wellbore shape and deformations, such as stress-induced borehole breakouts or washouts. Multiarm mechanical caliper logs are often run using wireline, which can be time-consuming, costly, and/or challenging to run in certain formations. To minimize rig time and improve operational safety, it is valuable to develop analytical solutions that can estimate caliper logs using available Logging-While-Drilling (LWD) data without the need to run wireline caliper logs. As a first step, the objective of this paper is to perform statistical analysis using an extensive datasetto identify important physical parameters that should be considered in developing such analytical solutions. METHODS, PROCEDURES, PROCESS (75-100): Caliper logs and LWD data of eleven wells, with a total of more than 80,000 data points, were obtained and imported into a data analytics software for analysis. Several parameters were selected to test the relationship of the parameters with the measured maximum and minimum caliper logs. These parameters includegamma ray, porosity, shear, and compressional sonic velocities, bulk densities, and azimuthal density. The data of the eleven wells were first visualized and cleaned.Using the analytics software, several analyses were then preformed, including the computation of Pearson’s correlation coefficients to show the statistical relationship between the selected parameters and the caliper logs. RESULTS, OBSERVATIONS, CONCLUSIONS (100-200): The results of this statistical analysis showed that some parameters show good correlation to the caliper log data. For instance, the bulk density and azimuthal directional densities showedPearson’s correlation coefficients in the range of 0.39 and 0.57, which wererelatively high when comparedto the correlation coefficients of caliper data with other parameters. Other parameters such as porosity exhibited extremely low correlation coefficients to the caliper data. Various crossplots and visualizations of the data were also demonstrated to gain further insights from the field data. NOVEL/ADDITIVE INFORMATION (25-75): This study offers a unique and novel look into the relative importance and correlation between different LWD measurements and wireline caliper logs via an extensive dataset. The results pave the way for a more informed development of new analytical solutions for estimating the size and shape of the wellbore in real-time while drilling using LWD data.

Keywords: LWD measurements, caliper log, correlations, analysis

Procedia PDF Downloads 88
40141 Dynamic Analysis and Vibration Response of Thermoplastic Rolling Elements in a Rotor Bearing System

Authors: Nesrine Gaaliche

Abstract:

This study provides a finite element dynamic model for analyzing rolling bearing system vibration response. The vibration responses of polypropylene bearings with and without defects are studied using FE analysis and compared to experimental data. The viscoelastic behavior of thermoplastic is investigated in this work to evaluate the influence of material flexibility and damping viscosity. The vibrations are detected using 3D dynamic analysis. Peak vibrations are more noticeable in an inner ring defect than in an outer ring defect, according to test data. The performance of thermoplastic bearings is compared to that of metal parts using vibration signals. Both the test and numerical results show that Polypropylene bearings exhibit less vibration than steel counterparts. Unlike bearings made from metal, polypropylene bearings absorb vibrations and handle shaft misalignments. Following validation of the overall vibration spectrum data, Von Mises stresses inside the rings are assessed under high loads. Stress is significantly high under the balls, according to the simulation findings. For the test cases, the computational findings correspond closely to the experimental results.

Keywords: viscoelastic, FE analysis, polypropylene, bearings

Procedia PDF Downloads 73
40140 Big Data: Appearance and Disappearance

Authors: James Moir

Abstract:

The mainstay of Big Data is prediction in that it allows practitioners, researchers, and policy analysts to predict trends based upon the analysis of large and varied sources of data. These can range from changing social and political opinions, patterns in crimes, and consumer behaviour. Big Data has therefore shifted the criterion of success in science from causal explanations to predictive modelling and simulation. The 19th-century science sought to capture phenomena and seek to show the appearance of it through causal mechanisms while 20th-century science attempted to save the appearance and relinquish causal explanations. Now 21st-century science in the form of Big Data is concerned with the prediction of appearances and nothing more. However, this pulls social science back in the direction of a more rule- or law-governed reality model of science and away from a consideration of the internal nature of rules in relation to various practices. In effect Big Data offers us no more than a world of surface appearance and in doing so it makes disappear any context-specific conceptual sensitivity.

Keywords: big data, appearance, disappearance, surface, epistemology

Procedia PDF Downloads 384
40139 Infrastructural Investment and Economic Growth in Indian States: A Panel Data Analysis

Authors: Jonardan Koner, Basabi Bhattacharya, Avinash Purandare

Abstract:

The study is focused to find out the impact of infrastructural investment on economic development in Indian states. The study uses panel data analysis to measure the impact of infrastructural investment on Real Gross Domestic Product in Indian States. Panel data analysis incorporates Unit Root Test, Cointegration Teat, Pooled Ordinary Least Squares, Fixed Effect Approach, Random Effect Approach, Hausman Test. The study analyzes panel data (annual in frequency) ranging from 1991 to 2012 and concludes that infrastructural investment has a desirable impact on economic development in Indian. Finally, the study reveals that the infrastructural investment significantly explains the variation of economic indicator.

Keywords: infrastructural investment, real GDP, unit root test, cointegration teat, pooled ordinary least squares, fixed effect approach, random effect approach, Hausman test

Procedia PDF Downloads 374
40138 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. The earlier we predict the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven data sets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.

Keywords: software metrics, fault prediction, cross project, within project.

Procedia PDF Downloads 309
40137 Analysis of Delivery of Quad Play Services

Authors: Rahul Malhotra, Anurag Sharma

Abstract:

Fiber based access networks can deliver performance that can support the increasing demands for high speed connections. One of the new technologies that have emerged in recent years is Passive Optical Networks. This paper is targeted to show the simultaneous delivery of triple play service (data, voice, and video). The comparative investigation and suitability of various data rates is presented. It is demonstrated that as we increase the data rate, number of users to be accommodated decreases due to increase in bit error rate.

Keywords: FTTH, quad play, play service, access networks, data rate

Procedia PDF Downloads 380
40136 Using RASCAL Code to Analyze the Postulated UF6 Fire Accident

Authors: J. R. Wang, Y. Chiang, W. S. Hsu, S. H. Chen, J. H. Yang, S. W. Chen, C. Shih, Y. F. Chang, Y. H. Huang, B. R. Shen

Abstract:

In this research, the RASCAL code was used to simulate and analyze the postulated UF6 fire accident which may occur in the Institute of Nuclear Energy Research (INER). There are four main steps in this research. In the first step, the UF6 data of INER were collected. In the second step, the RASCAL analysis methodology and model was established by using these data. Third, this RASCAL model was used to perform the simulation and analysis of the postulated UF6 fire accident. Three cases were simulated and analyzed in this step. Finally, the analysis results of RASCAL were compared with the hazardous levels of the chemicals. According to the compared results of three cases, Case 3 has the maximum danger in human health.

Keywords: RASCAL, UF₆, safety, hydrogen fluoride

Procedia PDF Downloads 187
40135 Mobile Learning: Toward Better Understanding of Compression Techniques

Authors: Farouk Lawan Gambo

Abstract:

Data compression shrinks files into fewer bits then their original presentation. It has more advantage on internet because the smaller a file, the faster it can be transferred but learning most of the concepts in data compression are abstract in nature therefore making them difficult to digest by some students (Engineers in particular). To determine the best approach toward learning data compression technique, this paper first study the learning preference of engineering students who tend to have strong active, sensing, visual and sequential learning preferences, the paper also study the advantage that mobility of learning have experienced; Learning at the point of interest, efficiency, connection, and many more. A survey is carried out with some reasonable number of students, through random sampling to see whether considering the learning preference and advantages in mobility of learning will give a promising improvement over the traditional way of learning. Evidence from data analysis using Ms-Excel as a point of concern for error-free findings shows that there is significance different in the students after using learning content provided on smart phone, also the result of the findings presented in, bar charts and pie charts interpret that mobile learning has to be promising feature of learning.

Keywords: data analysis, compression techniques, learning content, traditional learning approach

Procedia PDF Downloads 323
40134 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 488
40133 Anomaly Detection in Financial Markets Using Tucker Decomposition

Authors: Salma Krafessi

Abstract:

The financial markets have a multifaceted, intricate environment, and enormous volumes of data are produced every day. To find investment possibilities, possible fraudulent activity, and market oddities, accurate anomaly identification in this data is essential. Conventional methods for detecting anomalies frequently fail to capture the complex organization of financial data. In order to improve the identification of abnormalities in financial time series data, this study presents Tucker Decomposition as a reliable multi-way analysis approach. We start by gathering closing prices for the S&P 500 index across a number of decades. The information is converted to a three-dimensional tensor format, which contains internal characteristics and temporal sequences in a sliding window structure. The tensor is then broken down using Tucker Decomposition into a core tensor and matching factor matrices, allowing latent patterns and relationships in the data to be captured. A possible sign of abnormalities is the reconstruction error from Tucker's Decomposition. We are able to identify large deviations that indicate unusual behavior by setting a statistical threshold. A thorough examination that contrasts the Tucker-based method with traditional anomaly detection approaches validates our methodology. The outcomes demonstrate the superiority of Tucker's Decomposition in identifying intricate and subtle abnormalities that are otherwise missed. This work opens the door for more research into multi-way data analysis approaches across a range of disciplines and emphasizes the value of tensor-based methods in financial analysis.

Keywords: tucker decomposition, financial markets, financial engineering, artificial intelligence, decomposition models

Procedia PDF Downloads 21
40132 Analyzing Keyword Networks for the Identification of Correlated Research Topics

Authors: Thiago M. R. Dias, Patrícia M. Dias, Gray F. Moita

Abstract:

The production and publication of scientific works have increased significantly in the last years, being the Internet the main factor of access and distribution of these works. Faced with this, there is a growing interest in understanding how scientific research has evolved, in order to explore this knowledge to encourage research groups to become more productive. Therefore, the objective of this work is to explore repositories containing data from scientific publications and to characterize keyword networks of these publications, in order to identify the most relevant keywords, and to highlight those that have the greatest impact on the network. To do this, each article in the study repository has its keywords extracted and in this way the network is  characterized, after which several metrics for social network analysis are applied for the identification of the highlighted keywords.

Keywords: bibliometrics, data analysis, extraction and data integration, scientometrics

Procedia PDF Downloads 218
40131 Confirmatory Factor Analysis of Smartphone Addiction Inventory (SPAI) in the Yemeni Environment

Authors: Mohammed Al-Khadher

Abstract:

Currently, we are witnessing rapid advancements in the field of information and communications technology, forcing us, as psychologists, to combat the psychological and social effects of such developments. It also drives us to continually look for the development and preparation of measurement tools compatible with the changes brought about by the digital revolution. In this context, the current study aimed to identify the factor analysis of the Smartphone Addiction Inventory (SPAI) in the Republic of Yemen. The sample consisted of (1920) university students (1136 males and 784 females) who answered the inventory, and the data was analyzed using the statistical software (AMOS V25). The factor analysis results showed a goodness-of-fit of the data five-factor model with excellent indicators, as RMSEA-(.052), CFI-(.910), GFI-(.931), AGFI-(.915), TLI-(.897), NFI-(.895), RFI-(.880), and RMR-(.032). All within the ideal range to prove the model's fit of the scale’s factor analysis. The confirmatory factor analysis results showed factor loading in (4) items on (Time Spent), (4) items on (Compulsivity), (8) items on (Daily Life Interference), (5) items on (Craving), and (3) items on (Sleep interference); and all standard values of factor loading were statistically significant at the significance level (>.001).

Keywords: smartphone addiction inventory (SPAI), confirmatory factor analysis (CFA), yemeni students, people at risk of smartphone addiction

Procedia PDF Downloads 57
40130 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 292
40129 A Safety Analysis Method for Multi-Agent Systems

Authors: Ching Louis Liu, Edmund Kazmierczak, Tim Miller

Abstract:

Safety analysis for multi-agent systems is complicated by the, potentially nonlinear, interactions between agents. This paper proposes a method for analyzing the safety of multi-agent systems by explicitly focusing on interactions and the accident data of systems that are similar in structure and function to the system being analyzed. The method creates a Bayesian network using the accident data from similar systems. A feature of our method is that the events in accident data are labeled with HAZOP guide words. Our method uses an Ontology to abstract away from the details of a multi-agent implementation. Using the ontology, our methods then constructs an “Interaction Map,” a graphical representation of the patterns of interactions between agents and other artifacts. Interaction maps combined with statistical data from accidents and the HAZOP classifications of events can be converted into a Bayesian Network. Bayesian networks allow designers to explore “what it” scenarios and make design trade-offs that maintain safety. We show how to use the Bayesian networks, and the interaction maps to improve multi-agent system designs.

Keywords: multi-agent system, safety analysis, safety model, integration map

Procedia PDF Downloads 391
40128 The Development of the Website Learning the Local Wisdom in Phra Nakhon Si Ayutthaya Province

Authors: Bunthida Chunngam, Thanyanan Worasesthaphong

Abstract:

This research had objective to develop of the website learning the local wisdom in Phra Nakhon Si Ayutthaya province and studied satisfaction of system user. This research sample was multistage sample for 100 questionnaires, analyzed data to calculated reliability value with Cronbach’s alpha coefficient method α=0.82. This system had 3 functions which were system using, system feather evaluation and system accuracy evaluation which the statistics used for data analysis was descriptive statistics to explain sample feature so these statistics were frequency, percentage, mean and standard deviation. This data analysis result found that the system using performance quality had good level satisfaction (4.44 mean), system feather function analysis had good level satisfaction (4.11 mean) and system accuracy had good level satisfaction (3.74 mean).

Keywords: website, learning, local wisdom, Phra Nakhon Si Ayutthaya province

Procedia PDF Downloads 92
40127 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review

Procedia PDF Downloads 27
40126 Autonomic Threat Avoidance and Self-Healing in Database Management System

Authors: Wajahat Munir, Muhammad Haseeb, Adeel Anjum, Basit Raza, Ahmad Kamran Malik

Abstract:

Databases are the key components of the software systems. Due to the exponential growth of data, it is the concern that the data should be accurate and available. The data in databases is vulnerable to internal and external threats, especially when it contains sensitive data like medical or military applications. Whenever the data is changed by malicious intent, data analysis result may lead to disastrous decisions. Autonomic self-healing is molded toward computer system after inspiring from the autonomic system of human body. In order to guarantee the accuracy and availability of data, we propose a technique which on a priority basis, tries to avoid any malicious transaction from execution and in case a malicious transaction affects the system, it heals the system in an isolated mode in such a way that the availability of system would not be compromised. Using this autonomic system, the management cost and time of DBAs can be minimized. In the end, we test our model and present the findings.

Keywords: autonomic computing, self-healing, threat avoidance, security

Procedia PDF Downloads 474
40125 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 442
40124 Seismic Performance Evaluation of Existing Building Using Structural Information Modeling

Authors: Byungmin Cho, Dongchul Lee, Taejin Kim, Minhee Lee

Abstract:

The procedure for the seismic retrofit of existing buildings includes the seismic evaluation. In the evaluation step, it is assessed whether the buildings have satisfactory performance against seismic load. Based on the results of that, the buildings are upgraded. To evaluate seismic performance of the buildings, it usually goes through the model transformation from elastic analysis to inelastic analysis. However, when the data is not delivered through the interwork, engineers should manually input the data. In this process, since it leads to inaccuracy and loss of information, the results of the analysis become less accurate. Therefore, in this study, the process for the seismic evaluation of existing buildings using structural information modeling is suggested. This structural information modeling makes the work economic and accurate. To this end, it is determined which part of the process could be computerized through the investigation of the process for the seismic evaluation based on ASCE 41. The structural information modeling process is developed to apply to the seismic evaluation using Perform 3D program usually used for the nonlinear response history analysis. To validate this process, the seismic performance of an existing building is investigated.

Keywords: existing building, nonlinear analysis, seismic performance, structural information modeling

Procedia PDF Downloads 351
40123 Implementation and Performance Analysis of Data Encryption Standard and RSA Algorithm with Image Steganography and Audio Steganography

Authors: S. C. Sharma, Ankit Gambhir, Rajeev Arya

Abstract:

In today’s era data security is an important concern and most demanding issues because it is essential for people using online banking, e-shopping, reservations etc. The two major techniques that are used for secure communication are Cryptography and Steganography. Cryptographic algorithms scramble the data so that intruder will not able to retrieve it; however steganography covers that data in some cover file so that presence of communication is hidden. This paper presents the implementation of Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) Algorithm with Image and Audio Steganography and Data Encryption Standard (DES) Algorithm with Image and Audio Steganography. The coding for both the algorithms have been done using MATLAB and its observed that these techniques performed better than individual techniques. The risk of unauthorized access is alleviated up to a certain extent by using these techniques. These techniques could be used in Banks, RAW agencies etc, where highly confidential data is transferred. Finally, the comparisons of such two techniques are also given in tabular forms.

Keywords: audio steganography, data security, DES, image steganography, intruder, RSA, steganography

Procedia PDF Downloads 260
40122 Customers’ Acceptability of Islamic Banking: Employees’ Perspective in Peshawar

Authors: Tahira Imtiaz, Karim Ullah

Abstract:

This paper aims to incorporate the banks employees’ perspective on acceptability of Islamic banking by the customers of Peshawar. A qualitative approach is adopted for which six in-depth interviews with employees of Islamic banks are conducted. The employees were asked to share their experience regarding customers’ acceptance attitude towards acceptability of Islamic banking. Collected data was analyzed through thematic analysis technique and its synthesis with the current literature. Through data analysis a theoretical framework is developed, which highlights the factors which drive customers towards Islamic banking, as witnessed by the employees. The practical implication of analyzed data evident that a new model could be developed on the basis of four determinants of human preference namely: inner satisfaction, time, faith and market forces.

Keywords: customers’ attraction, employees’ perspective, Islamic banking, Riba

Procedia PDF Downloads 302
40121 An Architectural Model for APT Detection

Authors: Nam-Uk Kim, Sung-Hwan Kim, Tai-Myoung Chung

Abstract:

Typical security management systems are not suitable for detecting APT attack, because they cannot draw the big picture from trivial events of security solutions. Although SIEM solutions have security analysis engine for that, their security analysis mechanisms need to be verified in academic field. Although this paper proposes merely an architectural model for APT detection, we will keep studying on correlation analysis mechanism in the future.

Keywords: advanced persistent threat, anomaly detection, data mining

Procedia PDF Downloads 495
40120 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 466
40119 TAXAPRO, A Streamlined Pipeline to Analyze Shotgun Metagenomes

Authors: Sofia Sehli, Zainab El Ouafi, Casey Eddington, Soumaya Jbara, Kasambula Arthur Shem, Islam El Jaddaoui, Ayorinde Afolayan, Olaitan I. Awe, Allissa Dillman, Hassan Ghazal

Abstract:

The ability to promptly sequence whole genomes at a relatively low cost has revolutionized the way we study the microbiome. Microbiologists are no longer limited to studying what can be grown in a laboratory and instead are given the opportunity to rapidly identify the makeup of microbial communities in a wide variety of environments. Analyzing whole genome sequencing (WGS) data is a complex process that involves multiple moving parts and might be rather unintuitive for scientists that don’t typically work with this type of data. Thus, to help lower the barrier for less-computationally inclined individuals, TAXAPRO was developed at the first Omics Codeathon held virtually by the African Society for Bioinformatics and Computational Biology (ASBCB) in June 2021. TAXAPRO is an advanced metagenomics pipeline that accurately assembles organelle genomes from whole-genome sequencing data. TAXAPRO seamlessly combines WGS analysis tools to create a pipeline that automatically processes raw WGS data and presents organism abundance information in both a tabular and graphical format. TAXAPRO was evaluated using COVID-19 patient gut microbiome data. Analysis performed by TAXAPRO demonstrated a high abundance of Clostridia and Bacteroidia genera and a low abundance of Proteobacteria genera relative to others in the gut microbiome of patients hospitalized with COVID-19, consistent with the original findings derived using a different analysis methodology. This provides crucial evidence that the TAXAPRO workflow dispenses reliable organism abundance information overnight without the hassle of performing the analysis manually.

Keywords: metagenomics, shotgun metagenomic sequence analysis, COVID-19, pipeline, bioinformatics

Procedia PDF Downloads 173
40118 Geographic Information Systems and Remotely Sensed Data for the Hydrological Modelling of Mazowe Dam

Authors: Ellen Nhedzi Gozo

Abstract:

Unavailability of adequate hydro-meteorological data has always limited the analysis and understanding of hydrological behaviour of several dam catchments including Mazowe Dam in Zimbabwe. The problem of insufficient data for Mazowe Dam catchment analysis was solved by extracting catchment characteristics and aerial hydro-meteorological data from ASTER, LANDSAT, Shuttle Radar Topographic Mission SRTM remote sensing (RS) images using ILWIS, ArcGIS and ERDAS Imagine geographic information systems (GIS) software. Available observed hydrological as well as meteorological data complemented the use of the remotely sensed information. Ground truth land cover was mapped using a Garmin Etrex global positioning system (GPS) system. This information was then used to validate land cover classification detail that was obtained from remote sensing images. A bathymetry survey was conducted using a SONAR system connected to GPS. Hydrological modelling using the HBV model was then performed to simulate the hydrological process of the catchment in an effort to verify the reliability of the derived parameters. The model output shows a high Nash-Sutcliffe Coefficient that is close to 1 indicating that the parameters derived from remote sensing and GIS can be applied with confidence in the analysis of Mazowe Dam catchment.

Keywords: geographic information systems, hydrological modelling, remote sensing, water resources management

Procedia PDF Downloads 296
40117 The Data Quality Model for the IoT based Real-time Water Quality Monitoring Sensors

Authors: Rabbia Idrees, Ananda Maiti, Saurabh Garg, Muhammad Bilal Amin

Abstract:

IoT devices are the basic building blocks of IoT network that generate enormous volume of real-time and high-speed data to help organizations and companies to take intelligent decisions. To integrate this enormous data from multisource and transfer it to the appropriate client is the fundamental of IoT development. The handling of this huge quantity of devices along with the huge volume of data is very challenging. The IoT devices are battery-powered and resource-constrained and to provide energy efficient communication, these IoT devices go sleep or online/wakeup periodically and a-periodically depending on the traffic loads to reduce energy consumption. Sometime these devices get disconnected due to device battery depletion. If the node is not available in the network, then the IoT network provides incomplete, missing, and inaccurate data. Moreover, many IoT applications, like vehicle tracking and patient tracking require the IoT devices to be mobile. Due to this mobility, If the distance of the device from the sink node become greater than required, the connection is lost. Due to this disconnection other devices join the network for replacing the broken-down and left devices. This make IoT devices dynamic in nature which brings uncertainty and unreliability in the IoT network and hence produce bad quality of data. Due to this dynamic nature of IoT devices we do not know the actual reason of abnormal data. If data are of poor-quality decisions are likely to be unsound. It is highly important to process data and estimate data quality before bringing it to use in IoT applications. In the past many researchers tried to estimate data quality and provided several Machine Learning (ML), stochastic and statistical methods to perform analysis on stored data in the data processing layer, without focusing the challenges and issues arises from the dynamic nature of IoT devices and how it is impacting data quality. A comprehensive review on determining the impact of dynamic nature of IoT devices on data quality is done in this research and presented a data quality model that can deal with this challenge and produce good quality of data. This research presents the data quality model for the sensors monitoring water quality. DBSCAN clustering and weather sensors are used in this research to make data quality model for the sensors monitoring water quality. An extensive study has been done in this research on finding the relationship between the data of weather sensors and sensors monitoring water quality of the lakes and beaches. The detailed theoretical analysis has been presented in this research mentioning correlation between independent data streams of the two sets of sensors. With the help of the analysis and DBSCAN, a data quality model is prepared. This model encompasses five dimensions of data quality: outliers’ detection and removal, completeness, patterns of missing values and checks the accuracy of the data with the help of cluster’s position. At the end, the statistical analysis has been done on the clusters formed as the result of DBSCAN, and consistency is evaluated through Coefficient of Variation (CoV).

Keywords: clustering, data quality, DBSCAN, and Internet of things (IoT)

Procedia PDF Downloads 107
40116 Positive Affect, Negative Affect, Organizational and Motivational Factor on the Acceptance of Big Data Technologies

Authors: Sook Ching Yee, Angela Siew Hoong Lee

Abstract:

Big data technologies have become a trend to exploit business opportunities and provide valuable business insights through the analysis of big data. However, there are still many organizations that have yet to adopt big data technologies especially small and medium organizations (SME). This study uses the technology acceptance model (TAM) to look into several constructs in the TAM and other additional constructs which are positive affect, negative affect, organizational factor and motivational factor. The conceptual model proposed in the study will be tested on the relationship and influence of positive affect, negative affect, organizational factor and motivational factor towards the intention to use big data technologies to produce an outcome. Empirical research is used in this study by conducting a survey to collect data.

Keywords: big data technologies, motivational factor, negative affect, organizational factor, positive affect, technology acceptance model (TAM)

Procedia PDF Downloads 323
40115 An Analysis System for Integrating High-Throughput Transcript Abundance Data with Metabolic Pathways in Green Algae

Authors: Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Wen-Chi Chang

Abstract:

As the most important non-vascular plants, algae have many research applications, including high species diversity, biofuel sources, adsorption of heavy metals and, following processing, health supplements. With the increasing availability of next-generation sequencing (NGS) data for algae genomes and transcriptomes, an integrated resource for retrieving gene expression data and metabolic pathway is essential for functional analysis and systems biology in algae. However, gene expression profiles and biological pathways are displayed separately in current resources, and making it impossible to search current databases directly to identify the cellular response mechanisms. Therefore, this work develops a novel AlgaePath database to retrieve gene expression profiles efficiently under various conditions in numerous metabolic pathways. AlgaePath, a web-based database, integrates gene information, biological pathways, and next-generation sequencing (NGS) datasets in Chlamydomonasreinhardtii and Neodesmus sp. UTEX 2219-4. Users can identify gene expression profiles and pathway information by using five query pages (i.e. Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-Expression Analysis). The gene expression data of 45 and 4 samples can be obtained directly on pathway maps in C. reinhardtii and Neodesmus sp. UTEX 2219-4, respectively. Genes that are differentially expressed between two conditions can be identified in Folds Search. Furthermore, the Gene Group Analysis of AlgaePath includes pathway enrichment analysis, and can easily compare the gene expression profiles of functionally related genes in a map. Finally, Co-Expression Analysis provides co-expressed transcripts of a target gene. The analysis results provide a valuable reference for designing further experiments and elucidating critical mechanisms from high-throughput data. More than an effective interface to clarify the transcript response mechanisms in different metabolic pathways under various conditions, AlgaePath is also a data mining system to identify critical mechanisms based on high-throughput sequencing.

Keywords: next-generation sequencing (NGS), algae, transcriptome, metabolic pathway, co-expression

Procedia PDF Downloads 377
40114 Chemometric-Based Voltammetric Method for Analysis of Vitamins and Heavy Metals in Honey Samples

Authors: Marwa A. A. Ragab, Amira F. El-Yazbi, Amr El-Hawiet

Abstract:

The analysis of heavy metals in honey samples is crucial. When found in honey, they denote environmental pollution. Some of these heavy metals as lead either present at low or high concentrations are considered to be toxic. Other heavy metals, for example, copper and zinc, if present at low concentrations, they considered safe even vital minerals. On the contrary, if they present at high concentrations, they are toxic. Their voltammetric determination in honey represents a challenge due to the presence of other electro-active components as vitamins, which may overlap with the peaks of the metal, hindering their accurate and precise determination. The simultaneous analysis of some vitamins: nicotinic acid (B3) and riboflavin (B2), and heavy metals: lead, cadmium, and zinc, in honey samples, was addressed. The analysis was done in 0.1 M Potassium Chloride (KCl) using a hanging mercury drop electrode (HMDE), followed by chemometric manipulation of the voltammetric data using the derivative method. Then the derivative data were convoluted using discrete Fourier functions. The proposed method allowed the simultaneous analysis of vitamins and metals though their varied responses and sensitivities. Although their peaks were overlapped, the proposed chemometric method allowed their accurate and precise analysis. After the chemometric treatment of the data, metals were successfully quantified at low levels in the presence of vitamins (1: 2000). The heavy metals limit of detection (LOD) values after the chemometric treatment of data decreased by more than 60% than those obtained from the direct voltammetric method. The method applicability was tested by analyzing the selected metals and vitamins in real honey samples obtained from different botanical origins.

Keywords: chemometrics, overlapped voltammetric peaks, derivative and convoluted derivative methods, metals and vitamins

Procedia PDF Downloads 120
40113 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment Analysis is the process of detecting the contextual polarity of text. In other words, it determines whether a piece of writing is positive, negative or neutral.Sentiment analysis of documents holds great importance in today's world, when numerous information is stored in databases and in the world wide web. An efficient algorithm to illicit such information, would be beneficial for social, economic as well as medical purposes. In this project, we have developed an algorithm to classify a document into positive or negative. Using our algorithm, we obtained a feature set from the data, and classified the documents based on this feature set. It is important to note that, in the classification, we have not used the independence assumption, which is considered by many procedures like the Naive Bayes. This makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use empirical distribution for estimation, but developed a method by finding degree of close clustering of the data points. We have applied our algorithm on a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 372