Search results for: data databases

24812 SPARK: An Open-Source Knowledge Discovery Platform That Leverages Non-Relational Databases and Massively Parallel Computational Power for Heterogeneous Genomic Datasets

Authors: Thilina Ranaweera, Enes Makalic, John L. Hopper, Adrian Bickerstaffe

Abstract:

Data are the primary asset of biomedical researchers, and the engine for both discovery and research translation. As the volume and complexity of research datasets increase, especially with new technologies such as large single nucleotide polymorphism (SNP) chips, so too does the requirement for software to manage, process and analyze the data. Researchers often need to execute complicated queries and conduct complex analyses of large-scale datasets. Existing tools to analyze such data, and other types of high-dimensional data, unfortunately suffer from one or more major problems. They typically require a high level of computing expertise, are too simplistic (i.e., do not fit realistic models that allow for complex interactions), are limited by computing power, do not exploit the computing power of large-scale parallel architectures (e.g., supercomputers and GPU clusters), or are limited in the types of analysis available, compounded by the fact that integrating new analysis methods is not straightforward. Solutions to these problems, such as those developed and implemented on parallel architectures, are currently available to only a relatively small portion of medical researchers with access and know-how. The past decade has seen a rapid expansion of data management systems for the medical domain. Much attention has been given to systems that manage phenotype datasets generated by medical studies. The introduction of heterogeneous genomic data for research subjects that reside in these systems has highlighted the need for substantial improvements in software architecture. To address this problem, we have developed SPARK, an enabling and translational system for medical research, leveraging existing high-performance computing resources and analysis techniques currently available or being developed. It builds these into The Ark, an open-source web-based system designed to manage medical data.
SPARK provides a next-generation biomedical data management solution that is based upon a novel Micro-Service architecture and Big Data technologies. The system serves to demonstrate the applicability of Micro-Service architectures for the development of high performance computing applications. When applied to high-dimensional medical datasets such as genomic data, relational data management approaches with normalized data structures suffer from unfeasibly high execution times for basic operations such as insert (i.e. importing a GWAS dataset) and the queries that are typical of the genomics research domain. SPARK resolves these problems by incorporating non-relational NoSQL databases that have been driven by the emergence of Big Data. SPARK provides researchers across the world with user-friendly access to state-of-the-art data management and analysis tools while eliminating the need for high-level informatics and programming skills. The system will benefit health and medical research by eliminating the burden of large-scale data management, querying, cleaning, and analysis. SPARK represents a major advancement in genome research technologies, vastly reducing the burden of working with genomic datasets, and enabling cutting edge analysis approaches that have previously been out of reach for many medical researchers.

Keywords: biomedical research, genomics, information systems, software

Procedia PDF Downloads 255

24811 The Influence of the Form of Grain on the Mechanical Behaviour of Sand

Authors: Mohamed Boualem Salah

Abstract:

The size and shape of soil particles reflect the formation history of the grains. In turn, the macro-scale behavior of the soil mass results from particle-level interactions, which are affected by particle shape. Sphericity, roundness and smoothness characterize different scales associated with particle shape. New experimental data and data from previously published studies are gathered into two databases to explore the effects of particle shape on packing as well as small and large-strain properties of sandy soils. Data analysis shows that increased particle irregularity (angularity and/or eccentricity) leads to: an increase in emax and emin, a decrease in stiffness yet with increased sensitivity to the state of stress, an increase in compressibility under zero-lateral strain loading, and an increase in critical state friction angle φcs and intercept Γ with a weak effect on slope λ. Therefore, particle shape emerges as a significant soil index property that needs to be properly characterized and documented, particularly in clean sands and gravels. The systematic assessment of particle shape will lead to a better understanding of sand behavior.

Keywords: angularity, eccentricity, shape particle, behavior of soil

Procedia PDF Downloads 400

24810 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection, in which the dataset is categorized into training, testing, and validation sets. The second step is discretization, which partitions the data in consideration of accuracy vs. precision. The third step is normalization, where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection, where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combinations of preprocessing steps and machine learning algorithms in producing more consistent and accurate predictions.
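Two of the preprocessing steps named above, instance selection and normalization to the range 0 to 1, can be illustrated with a minimal Python sketch. The function names, the 60/20/20 split ratios, and the fixed random seed are assumptions made for illustration; the abstract does not say how the authors implemented these steps.

```python
import random


def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def split_dataset(rows, train=0.6, test=0.2, seed=0):
    """Shuffle rows and split them into training, testing, and validation sets.

    The ratios are hypothetical; whatever is left after the training and
    testing portions becomes the validation set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )
```

Min-max scaling is only one common way to map values into 0 to 1; it is sensitive to outliers, so a real pipeline might clip or transform the data first.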

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 261

24809 The Inequality Effects of Natural Disasters: Evidence from Thailand

Authors: Annop Jaewisorn

Abstract:

This study explores the relationship between natural disasters and inequalities (both income and expenditure inequality) at the micro level in Thailand, and is the first study of this nature for the country. The analysis uses a unique panel and remote-sensing dataset constructed for the purpose of this research. It contains provincial inequality measures and other economic and social indicators based on the Thailand Household Survey during the period between 1992 and 2019. Meanwhile, the data on natural disasters, which are remote-sensing data, are obtained from several official geophysical or meteorological databases. Estimates from a panel fixed-effects model show that natural disasters significantly reduce household income and expenditure inequality as measured by the Gini index, implying that rich people in Thailand bear a higher cost of natural disasters when compared to poor people. The effect on income inequality is mainly driven by droughts, while the effect on expenditure inequality is mainly driven by flood events. The results are robust across heterogeneity of the samples, lagged effects, outliers, and an alternative inequality measure.
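For readers unfamiliar with the Gini index used as the inequality measure here, the following is a minimal sketch of one standard way to compute it from a list of non-negative incomes or expenditures. The function name and the handling of empty or all-zero input are assumptions; the abstract does not describe the exact estimator or weighting used in the study.

```python
def gini(values):
    """Gini index of non-negative values: 0 means perfect equality.

    Uses the sorted-rank formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Convention chosen for this sketch: no data or no income means 0.
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

With this unweighted formula, a population in which one of n households holds everything has an index of (n - 1) / n, which approaches 1 as n grows.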

Keywords: inequality, natural disasters, remote-sensing data, Thailand

Procedia PDF Downloads 108

24808 Graph Neural Network-Based Classification for Disease Prediction in Health Care Heterogeneous Data Structures of Electronic Health Record

Authors: Raghavi C. Janaswamy

Abstract:

In the healthcare sector, heterogeneous data elements such as patients, diagnosis, symptoms, conditions, observation text from physician notes, and prescriptions form the essentials of the Electronic Health Record (EHR). The data in the form of clear text and images are stored or processed in a relational format in most systems. However, the intrinsic structure restrictions and complex joins of relational databases limit their wider utility. In this regard, the design and development of realistic mapping and deep connections as real-time objects offer unparalleled advantages. Herein, a graph neural network-based classification of EHR data has been developed. The patient conditions have been predicted as a node classification task using graph-based open-source EHR data, the Synthea database, stored in TigerGraph. The Synthea dataset is leveraged because it is voluminous and closely represents real-time data. The graph model is built from the heterogeneous EHR data using Python modules, namely pyTigerGraph to get nodes and edges from the TigerGraph database, PyTorch to tensorize the nodes and edges, and PyTorch Geometric (PyG) to train the Graph Neural Network (GNN). Self-supervised learning techniques with autoencoders are adopted to generate the node embeddings, which are then used to perform the node classification. The model predicts patient conditions ranging from common to rare situations. The outcome is deemed to open up opportunities for data querying toward better predictions and accuracy.

Keywords: electronic health record, graph neural network, heterogeneous data, prediction

Procedia PDF Downloads 76

24807 Ensuring Consistency under the Snapshot Isolation

Authors: Carlos Roberto Valêncio, Fábio Renato de Almeida, Thatiane Kawabata, Leandro Alves Neves, Julio Cesar Momente, Mario Luiz Tronco, Angelo Cesar Colombini

Abstract:

By running transactions under the Snapshot isolation we can achieve a good level of concurrency, especially in databases with read-intensive workloads. However, Snapshot is not immune to all the problems that arise from competing transactions, and therefore no serializability guarantee exists. We propose in this paper a technique to obtain data consistency with Snapshot by using some special triggers that we named Daemon Triggers. Besides keeping the benefits of the Snapshot isolation, the technique is especially useful for those database systems that do not have an isolation level that ensures serializability, like Firebird and Oracle. We describe all the anomalies that might arise when using the Snapshot isolation and show how to preclude them with Daemon Triggers. Based on the methodology presented here, we also propose the creation of a new isolation level: Daemon Snapshot.
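The kind of anomaly that Snapshot isolation permits can be illustrated with a toy simulation of write skew, the classic example. This is not the paper's Daemon Trigger technique, and it does not model Firebird or Oracle: the `SnapshotStore` and `Transaction` classes, the on-call rota, and the doctor names are all hypothetical and exist only to show why a check for overlapping writes is not enough for serializability.

```python
class SnapshotStore:
    """Toy versioned store that rejects only write-write conflicts."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)        # private view taken at start
        self.seen_version = dict(store.version)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort only if a key we wrote was changed
        # by another transaction after our snapshot was taken.
        for key in self.writes:
            if self.store.version[key] != self.seen_version[key]:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


def go_off_call(txn, doctor):
    """Leave the rota only if the snapshot shows another doctor on call."""
    on_call = [d for d in ("alice", "bob") if txn.read(d)]
    if len(on_call) >= 2:
        txn.write(doctor, False)


# Invariant the application wants: at least one doctor stays on call.
store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
go_off_call(t1, "alice")
go_off_call(t2, "bob")
both_committed = t1.commit() and t2.commit()
```

Both transactions commit because they write different rows, yet the final state has nobody on call. Each transaction's decision depended on a row the other one changed, which is exactly the read-write dependency that Snapshot isolation does not check.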

Keywords: data consistency, serialization, snapshot, isolation

Procedia PDF Downloads 309

24806 Programming Language Extension Using Structured Query Language for Database Access

Authors: Chapman Eze Nnadozie

Abstract:

Relational databases constitute a vital tool for the effective management and administration of both personal and organizational data. Data access ranges from single-user database management software to more complex distributed server systems. This paper intends to appraise the use of a programming language extension such as Structured Query Language (SQL) to establish links to a relational database (Microsoft Access 2013) using the Visual C++ 9 programming language environment. The methodology used involves the creation of tables to form a database using Microsoft Access 2013, which is Object Linking and Embedding (OLE) database compliant. SQL commands are used to query the tables in the database for easy extraction of expected records inside the Visual C++ environment. The findings of this paper reveal that records can easily be accessed and manipulated to filter exactly what the user wants, such as retrieval of records with specified criteria, updating of records, and deletion of part or all of the records in a table.
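The three operations the abstract lists (retrieval by criteria, update, and deletion) can be sketched with embedded SQL. The paper itself uses Visual C++ with Microsoft Access 2013; the sketch below substitutes Python's standard `sqlite3` module and an in-memory database so that it runs anywhere, and the `students` table and its rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Access database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ada", 91), ("Ben", 58), ("Chi", 74)],
)

# Retrieval of records that meet a specified criterion.
passed = conn.execute(
    "SELECT name FROM students WHERE score >= ? ORDER BY name", (60,)
).fetchall()

# Updating a record.
conn.execute("UPDATE students SET score = ? WHERE name = ?", (65, "Ben"))

# Deleting part of the records in a table.
conn.execute("DELETE FROM students WHERE name = ?", ("Chi",))

remaining = conn.execute(
    "SELECT name, score FROM students ORDER BY name"
).fetchall()
conn.close()
```

Passing values through `?` placeholders rather than building the SQL string by hand keeps the host language and the query cleanly separated and avoids SQL injection.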

Keywords: data access, database, database management system, OLE, programming language, records, relational database, software, SQL, table

Procedia PDF Downloads 171

24805 Anomaly Detection with ANN and SVM for Telemedicine Networks

Authors: Edward Guillén, Jeisson Sánchez, Carlos Omar Ramos

Abstract:

In recent years, a wide variety of applications have been developed with Support Vector Machine (SVM) methods and Artificial Neural Networks (ANN). In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA, among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, detectors are employed to detect anomalies in network communication scenarios according to the behavior of users’ connections. The first detector, based on a training dataset, is deployed in different real-world networks with mobile and non-mobile devices to analyze the performance and accuracy over static detection. The vulnerabilities are based on previous work on telemedicine apps that were developed by the research group. This paper presents the differences in detection results between some network scenarios when applying traditional detectors deployed with artificial neural networks and support vector machines.

Keywords: anomaly detection, back-propagation neural networks, network intrusion detection systems, support vector machines

Procedia PDF Downloads 343

24804 Examining Statistical Monitoring Approach against Traditional Monitoring Techniques in Detecting Data Anomalies during Conduct of Clinical Trials

Authors: Sheikh Omar Sillah

Abstract:

Introduction: Monitoring is an important means of ensuring the smooth implementation and quality of clinical trials. For many years, traditional site monitoring approaches have been critical in detecting data errors but not optimal in identifying fabricated and implanted data as well as non-random data distributions that may significantly invalidate study results. The objective of this paper was to provide recommendations based on best statistical monitoring practices for detecting data-integrity issues suggestive of fabrication and implantation early in the study conduct to allow implementation of meaningful corrective and preventive actions. Methodology: Electronic bibliographic databases (Medline, Embase, PubMed, Scopus, and Web of Science) were used for the literature search, and both qualitative and quantitative studies were sought. Search results were uploaded into Eppi-Reviewer Software, and only publications written in the English language from 2012 were included in the review. Gray literature not considered to present reproducible methods was excluded. Results: A total of 18 peer-reviewed publications were included in the review. The publications demonstrated that traditional site monitoring techniques are not efficient in detecting data anomalies. By specifying project-specific parameters such as laboratory reference range values, visit schedules, etc., with appropriate interactive data monitoring, statistical monitoring can offer early signals of data anomalies to study teams. The review further revealed that statistical monitoring is useful to identify unusual data patterns that might be revealing issues that could impact data integrity or may potentially impact study participants' safety. However, subjective measures may not be good candidates for statistical monitoring. 
Conclusion: The statistical monitoring approach requires a combination of education, training, and experience sufficient to implement its principles in detecting data anomalies for the statistical aspects of a clinical trial.

Keywords: statistical monitoring, data anomalies, clinical trials, traditional monitoring

Procedia PDF Downloads 56

24803 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large sample sizes and high dimensionality reduce the effectiveness of conventional data mining methodologies. Data mining techniques are important tools for extracting useful information from a variety of databases; they provide supervised learning in the form of classification to build models that describe important data classes, where the structure of the classifier is based on the class attribute. Classification efficiency and accuracy are often influenced to a great extent by noisy and undesirable features in real application datasets. The inherent nature of a dataset greatly masks analysis of its quality and leaves us with few practical approaches to use. To our knowledge, we present for the first time a new approach for investigating the structure and quality of datasets by providing a targeted analysis that localizes noisy and irrelevant features. Machine learning relies heavily on feature selection as a pre-processing step, which selects a small subset of features from the full set, reducing the feature space according to a certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching for a small set of important features that may result in good classification performance. For this purpose, a heuristic for wrapper-based feature selection using a genetic algorithm is proposed, with an external classifier used for discriminative feature selection. A feature is selected based on its number of occurrences in the chosen chromosomes. A sample dataset is used to demonstrate the proposed idea. The proposed method improves accuracy, with an average accuracy across different datasets of about 95%. Experimental results illustrate that the proposed algorithm increases the accuracy of prediction of different diseases.
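The rule that a feature is selected according to how often it occurs in the chosen chromosomes can be sketched as follows. The sketch assumes each chromosome is a binary mask over the feature indices and that a feature is kept once it appears in at least `min_count` chromosomes; both the encoding and the threshold are assumptions, since the abstract does not specify them, and the genetic search and external classifier that produce the chromosomes are not shown.

```python
from collections import Counter


def select_by_occurrence(chromosomes, min_count):
    """Keep the feature indices that are switched on in at least
    `min_count` of the given binary chromosomes."""
    counts = Counter()
    for chromosome in chromosomes:
        for index, bit in enumerate(chromosome):
            if bit:
                counts[index] += 1
    return sorted(i for i, c in counts.items() if c >= min_count)


# Hypothetical best chromosomes from three runs over four features.
chosen = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
selected = select_by_occurrence(chosen, min_count=2)
```

Here feature 0 occurs three times and feature 2 twice, so both pass the threshold, while features 1 and 3 are dropped.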

Keywords: data mining, genetic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 306

24802 Analysis of Citation Rate and Data Reuse for Openly Accessible Biodiversity Datasets on Global Biodiversity Information Facility

Authors: Nushrat Khan, Mike Thelwall, Kayvan Kousha

Abstract:

Making research data openly accessible has been mandated by most funders over the last 5 years, as it promotes reproducibility in science and reduces duplication of effort to collect the same data. There is evidence that articles that publicly share research data have higher citation rates in the biological and social sciences. However, how and whether shared data is being reused is not always clear, as such information is not easily accessible from the majority of research data repositories. This study aims to understand the practice of data citation and how data is being reused over the years, focusing on biodiversity since research data is frequently reused in this field. Metadata of 38,878 datasets, including citation counts, were collected through the Global Biodiversity Information Facility (GBIF) API for this purpose. GBIF was used as a data source since it provides citation counts for datasets, a feature not commonly available in most repositories. Analysis of dataset types, citation counts, and creation and update times of datasets suggests that citation rate varies for different types of datasets, where occurrence datasets that have more granular information have higher citation rates than checklist and metadata-only datasets. Another finding is that biodiversity datasets on GBIF are frequently updated, which is unique to this field. The majority of the datasets from the earliest year, 2007, were updated after 11 years, and every dataset had been updated at least once since creation. For each year between 2007 and 2017, we compared the correlations between update time and citation rate of four different types of datasets. While recent datasets do not show any correlation, datasets that are 3 to 4 years old show a weak correlation, where datasets that were updated more recently received more citations. The results suggest that it takes several years to accumulate citations for research datasets.
However, this investigation found that when the same datasets are searched for in the Google Scholar or Scopus databases, the number of citations is often not the same as on GBIF. Hence, a future aim is to further explore the citation count system adopted by GBIF to evaluate its reliability and whether it can be applied to other fields of study as well.

Keywords: data citation, data reuse, research data sharing, webometrics

Procedia PDF Downloads 164

24801 Trend Analysis of Africa’s Entrepreneurial Framework Conditions

Authors: Sheng-Hung Chen, Grace Mmametena Mahlangu, Hui-Cheng Wang

Abstract:

This study aims to explore the trends of the Entrepreneurial Framework Conditions (EFCs) in the five African regions. The Global Entrepreneurship Monitor (GEM) is the primary source of data. The data drawn were organized into a panel (2000-2021) and obtained from the National Expert Survey (NES) databases as harmonized by the GEM. The methodology used is descriptive and relies mainly on charts and tables, in line with the approach used by the GEM. The GEM draws its data from the NES, which is administered to experts in each country. The GEM collects entrepreneurship data specific to each country and provides information about entrepreneurial ecosystems and their impact on entrepreneurship. The secondary source is the literature review. This study focuses on the following GEM indicators: Financing for Entrepreneurs, Government Support and Policies, Taxes and Bureaucracy, Government Programs, Basic School Entrepreneurial Education and Training, Post-School Entrepreneurial Education and Training, R&D Transfer, Commercial and Professional Infrastructure, Internal Market Dynamics, Internal Market Openness, Physical and Service Infrastructure, and Cultural and Social Norms, based on the GEM Report 2020/21. The limitation of the study is the lack of updated data from some countries. Countries have to fund their own regional studies; African countries do not regularly participate due to a lack of resources.

Keywords: trend analysis, entrepreneurial framework conditions (EFCs), African region, government programs

Procedia PDF Downloads 53

24800 Using Data Mining in Automotive Safety

Authors: Carine Cridelich, Pablo Juesas Cano, Emmanuel Ramasso, Noureddine Zerhouni, Bernd Weiler

Abstract:

Safety is one of the most important considerations when buying a new car. While active safety aims at avoiding accidents, passive safety systems such as airbags and seat belts protect the occupant in case of an accident. In addition to legal regulations, organizations like Euro NCAP provide consumers with an independent assessment of the safety performance of cars and drive the development of safety systems in the automobile industry. Those ratings are mainly based on injury assessment reference values derived from physical parameters measured in dummies during a car crash test. The components and sub-systems of a safety system are designed to achieve the required restraint performance. Sled tests and other types of tests are then carried out by car makers and their suppliers to confirm the protection level of the safety system. A Knowledge Discovery in Databases (KDD) process is proposed in order to minimize the number of tests. The KDD process is based on the data emerging from sled tests according to Euro NCAP specifications. About 30 parameters of the passive safety systems from different data sources (crash data, dummy protocol) are first analysed together with experts' opinions. A procedure is proposed to manage missing data and is validated on real data sets. Finally, a procedure is developed to estimate a set of rough initial parameters of the passive system before testing, with the aim of reducing the number of tests.

Keywords: KDD process, passive safety systems, sled test, dummy injury assessment reference values, frontal impact

Procedia PDF Downloads 369

24799 Content-Based Mammograms Retrieval Based on Breast Density Criteria Using Bidimensional Empirical Mode Decomposition

Authors: Sourour Khouaja, Hejer Jlassi, Nadia Feddaoui, Kamel Hamrouni

Abstract:

Most medical images, and especially mammograms, are now stored in large databases. Retrieving a desired image is of great importance for finding the diagnoses of previous similar cases. Our method is implemented to assist radiologists in retrieving mammographic images containing breasts with a density aspect similar to that seen on the query mammogram. This is becoming a challenge given the importance of density criteria in cancer provision and its effect on segmentation issues. We used BEMD (Bidimensional Empirical Mode Decomposition) to characterize the content of images and the Euclidean distance to measure similarity between images. Through experiments on the MIAS mammography image database, we confirm that the results are promising. The performance was evaluated using precision and recall curves comparing query and retrieved images. Computing recall-precision proved the effectiveness of applying CBIR in large mammographic image databases. We found a precision of 91.2% for mammography with a recall of 86.8%.
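The retrieval and evaluation steps described above (rank database images by Euclidean distance to the query, then score the result with precision and recall) can be sketched in a few lines. The feature vectors below are placeholders; in the paper they would come from the BEMD decomposition, which is not reproduced here, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, k):
    """Return the names of the k database entries closest to the query.

    `database` maps an image name to its feature vector.
    """
    ranked = sorted(database, key=lambda name: euclidean(query, database[name]))
    return ranked[:k]


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved images that are relevant.
    Recall: share of relevant images that were retrieved."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Placeholder two-dimensional features for three images.
features = {"img_a": [0.0, 0.0], "img_b": [1.0, 0.0], "img_c": [5.0, 5.0]}
top2 = retrieve([0.0, 0.0], features, k=2)
```

Varying `k` and recomputing the two scores at each value is one common way to trace out a precision-recall curve like the one the abstract mentions.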

Keywords: BEMD, breast density, content-based, image retrieval, mammography

Procedia PDF Downloads 220

24798 Using Large Databases and Interviews to Explore the Temporal Phases of Technology-Based Entrepreneurial Ecosystems

Authors: Elsie L. Echeverri-Carroll

Abstract:

Entrepreneurial ecosystems have become an important concept to explain the birth and sustainability of technology-based entrepreneurship within regions. However, as a theoretical concept, the temporal evolution of entrepreneurial ecosystems remains underdeveloped, making it difficult to understand their dynamic contributions to entrepreneurs. This paper argues that successful technology-based ecosystems go through three cumulative spawning stages: corporate spawning, entrepreneurial spawning, and community spawning. The importance of corporate incubation in vibrant entrepreneurial ecosystems is well documented in the entrepreneurial literature. Similarly, entrepreneurial spawning processes for venture capital-backed startups are well documented in the financial literature. In contrast, there is little understanding of both the third stage of entrepreneurial spawning (when a community of entrepreneurs becomes a source of firm spawning) and the temporal sequence in which spawning effects occur in a region. We test this three-stage model of entrepreneurial spawning using data from two large databases on firm births, the Secretary of State (160,000 observations) and the National Establishment Time Series (NETS, with 150,000 observations), and information collected from 60 1½-hour interviews with startup founders and representatives of key entrepreneurial organizations. This temporal model is illustrated with a case study of Austin, Texas, ranked by the Kauffman Foundation as the number one entrepreneurial city in the United States in 2015 and 2016. The 1½-year study, funded by the Kauffman Foundation, demonstrates the importance of taking into consideration the temporal contributions of both large and entrepreneurial firms in understanding the factors that contribute to the birth and growth of technology-based entrepreneurial regions. More importantly, these lessons could offer an important road map for regions seeking to advance their entrepreneurial ecosystems.

Keywords: entrepreneurial ecosystems, entrepreneurial industrial clusters, high-technology, temporal changes

Procedia PDF Downloads 256

24797 Information Literacy Skills of Legal Practitioners in Khyber Pakhtunkhwa-Pakistan: An Empirical Study

Authors: Saeed Ullah Jan, Shaukat Ullah

Abstract:

Purpose of the study: The main theme of this study is to explore the information literacy skills of the law practitioners in Khyber Pakhtunkhwa-Pakistan under the heading "Information Literacy Skills of Legal Practitioners in Khyber Pakhtunkhwa-Pakistan: An Empirical Study." Research Method and Procedure: To conduct this quantitative study, the simple random sampling approach is used. An adapted questionnaire is distributed among 254 lawyers of Dera Ismail Khan through personal visits and electronic means. The data collected are analyzed with SPSS (Statistical Package for Social Sciences) software. Delimitations of the study: The study is delimited to the southern district of Khyber Pakhtunkhwa: Dera Ismail Khan. Key Findings: Most of the lawyers of District Dera Ismail Khan of Khyber Pakhtunkhwa can recognize and understand the needed information. A large number of lawyers are capable of presenting information in both written and electronic forms. They are not comfortable with different legal databases or with using various searching and keyword techniques. They have limited knowledge of Boolean operators for locating online information. Conclusion and Recommendations: Efforts should be made to arrange refresher courses and training workshops on the utilization of different legal databases and different search techniques for retrieval of information sources. This practice will enhance the information literacy skills of lawyers, which will ultimately result in a better legal system in Pakistan. Practical implication(s): The findings of the study will motivate the policymakers and authorities of legal forums to restructure the information literacy programs to fulfill the lawyers' information needs. Contribution to the knowledge: No significant work has been done on the lawyers' information literacy skills in Khyber Pakhtunkhwa-Pakistan.
It will bring a clear picture of the information literacy skills of law practitioners and address the problems faced by them during the seeking process.

Keywords: information literacy-Pakistan, information literacy-lawyers, information literacy-lawyers-KP, law practitioners-Pakistan

Procedia PDF Downloads 135

24796 Exploring the Landscape of Information Visualization through a Mark Lombardi Lens

Authors: Alon Friedman, Antonio Sanchez Chinchon

Abstract:

This bibliometric study takes an artistic and storytelling approach to explore the term “information visualization.” Analyzing over 1008 titles collected from databases that specialize in data visualization research, we examine the titles of these publications to report on the characteristics and development trends in the field. Employing a qualitative methodology, we delve into the titles of these publications, extracting leading terms and exploring the co-occurrence of these terms to gain deeper insights. By systematically analyzing the leading terms and their relationships within the titles, we shed light on the prevailing themes that shape the landscape of “information visualization,” employing the artist Mark Lombardi’s techniques to visualize our findings. By doing so, this study provides valuable insights into bibliometrics visualization while also opening new avenues for leveraging art and storytelling to enhance data representation.

Keywords: bibliometrics analysis, Mark Lombardi design, information visualization, qualitative methodology

Procedia PDF Downloads 69

24795 A Blockchain-Based Privacy-Preserving Physical Delivery System

Authors: Shahin Zanbaghi, Saeed Samet

Abstract:

The internet has transformed the way we shop. Previously, most of our purchases came in the form of shopping trips to a nearby store. Now, it’s as easy as clicking a mouse. But with great convenience comes great responsibility. We have to be constantly vigilant about our personal information. In this work, our proposed approach is to encrypt the information printed on physical packages, which includes personal information in plain text, using a symmetric encryption algorithm; we then store that encrypted information in a blockchain network rather than in the centralized databases of companies or corporations. We present, implement and assess a blockchain-based system using Ethereum smart contracts. We present detailed algorithms that explain the details of our smart contract. We present the security, cost, and performance analysis of the proposed method. Our work indicates that the proposed solution is economically attainable and provides data integrity, security, transparency, and data traceability.

Keywords: blockchain, Ethereum, smart contract, commit-reveal scheme

Procedia PDF Downloads 135

24794 Review on Effective Texture Classification Techniques

Authors: Sujata S. Kulkarni

Abstract:

Effective and efficient texture feature extraction and classification is an important problem in image understanding and recognition. This paper reviews effective texture classification methods. The objective of texture representation is to reduce the amount of raw data presented by the image while preserving the information needed for the task. Texture analysis is important in many applications of computer image analysis. Classification applications include industrial and biomedical surface inspection (for example, for defects and disease), ground classification of satellite or aerial imagery, and content-based access to image databases.

Keywords: compressed sensing, feature extraction, image classification, texture analysis

Procedia PDF Downloads 416

24793 The Quality Assessment of Seismic Reflection Survey Data Using Statistical Analysis: A Case Study of Fort Abbas Area, Cholistan Desert, Pakistan

Authors: U. Waqas, M. F. Ahmed, A. Mehmood, M. A. Rashid

Abstract:

In geophysical exploration surveys, the quality of acquired data holds significant importance before executing the data processing and interpretation phases. In this study, 2D seismic reflection survey data of the Fort Abbas area, Cholistan Desert, Pakistan, was taken as a test case in order to assess its quality on a statistical basis using normalized root mean square error (NRMSE), Cronbach’s alpha test (α), and null hypothesis tests (t-test and F-test). The analysis challenged the quality of the acquired data and highlighted significant errors in the acquired database. It is established that the study area is plain, tectonically least affected, and rich in oil and gas reserves. However, subsurface 3D modeling and contouring using the acquired database revealed high degrees of structural complexity and intense folding. The NRMSE had the highest percentage of residuals between the estimated and predicted cases. The outcomes of hypothesis testing also demonstrated the bias and erratic nature of the acquired database. A low estimated value of alpha (α) in Cronbach’s alpha test confirmed the poor reliability of the acquired database. Such a low-quality database needs excessive static correction or, in some cases, reacquisition of data, which is most of the time not feasible on economic grounds. The outcomes of this study could be used to assess the quality of large databases and as a guideline for establishing database quality assessment models that support much more informed decisions in the hydrocarbon exploration field.
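The two summary statistics named in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: NRMSE is normalized here by the range of the observed values (other conventions divide by the mean), Cronbach’s alpha uses sample variances, and the function names and sample numbers are hypothetical rather than taken from the study.

```python
import math
from statistics import variance

def nrmse(observed, predicted):
    """RMSE divided by the range of the observed values (one common convention)."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (max(observed) - min(observed))

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item (column)."""
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in items)
    # Variance of each respondent's total score across all items.
    totals = [sum(row) for row in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Made-up numbers, used only to exercise the functions.
error = nrmse([1, 2, 3, 4, 5], [1, 2, 3, 4, 7])
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])
```

A low alpha, as the abstract reports, indicates that the items being compared do not vary together consistently.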

Keywords: data quality, null hypothesis, seismic lines, seismic reflection survey

Procedia PDF Downloads 144

24792 The Role of Cyfra 21-1 in Diagnosing Non Small Cell Lung Cancer (NSCLC)

Authors: H. J. T. Kevin Mozes, Dyah Purnamasari

Abstract:

Background: Lung cancer is the fourth most common cancer in Indonesia, and 85% of all lung cancer cases are non-small cell lung cancer (NSCLC). The indistinct signs and symptoms of NSCLC sometimes lead to misdiagnosis. The gold standard for diagnosing NSCLC is histopathological biopsy, which is invasive. Cyfra 21-1 is a tumor marker that can be found in the intermediate protein structure of the epithelium. The accuracy of Cyfra 21-1 in diagnosing NSCLC is not yet known, so this report seeks to answer that question. Methods: A literature search was conducted using the online databases ProQuest and PubMed. Literature was then selected according to inclusion and exclusion criteria. The selected literature was appraised using the criteria of validity, importance, and applicability. Results: Of the six journal articles appraised, five were valid. The sensitivity reported across these five articles ranges from 50% to 84.5%, while the specificity ranges from 87.8% to 94.4%. The likelihood ratio across the appraised literature ranges from 5.09 to 10.54, which is categorized as intermediate-high. Conclusion: Serum Cyfra 21-1 is a sensitive and very specific tumor marker for the diagnosis of non-small cell lung cancer (NSCLC).
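For readers unfamiliar with the measures reported in this abstract, the sketch below shows how sensitivity, specificity, and likelihood ratios are conventionally derived from a 2x2 diagnostic table. The function name `diagnostic_metrics` and the counts are hypothetical; they do not reproduce any of the appraised studies.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)   # share of diseased patients who test positive
    specificity = tn / (tn + fp)   # share of healthy patients who test negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Positive likelihood ratio: how much a positive result raises the odds.
        "lr_positive": sensitivity / (1 - specificity),
        # Negative likelihood ratio: how much a negative result lowers the odds.
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Made-up counts, used only to exercise the function.
metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
```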

Keywords: Cyfra 21-1, diagnosis, non-small cell lung cancer, NSCLC, tumor marker

Procedia PDF Downloads 220

24791 Traditional Chinese Medicine Treatment for Coronary Heart Disease: a Meta-Analysis

Authors: Yuxi Wang, Xuan Gao

Abstract:

Traditional Chinese medicine has been used in the treatment of coronary heart disease (CHD) for centuries, and in recent years, the research data on the efficacy of traditional Chinese medicine through clinical trials has gradually increased to explore its real efficacy and internal pharmacology. However, due to the complexity of traditional Chinese medicine prescriptions, the efficacy of each component is difficult to clarify, and pharmacological research is challenging. This study aims to systematically review and clarify the clinical efficacy of traditional Chinese medicine in the treatment of coronary heart disease through a meta-analysis. From PubMed, the CNKI database, Wanfang Data, and other databases, eleven randomized controlled trials and 1091 CHD subjects were included. Two researchers conducted a systematic review of the papers and a meta-analysis, which supported the positive therapeutic effect of traditional Chinese medicine in the treatment of CHD.

Keywords: coronary heart disease, Chinese medicine, treatment, meta-analysis

Procedia PDF Downloads 106

24790 Effectiveness of the Bundle Care to Relieve the Thirst for Intensive Care Unit Patients: Meta-Analysis

Authors: Wen Hsin Hsu, Pin Lin

Abstract:

Objective: Thirst discomfort is the most common yet often overlooked symptom in patients in the intensive care unit (ICU), with an incidence rate of 69.8%. If not properly cared for, it can easily lead to irritability, affect sleep quality, and increase the incidence of delirium, thereby extending the length of hospital stay. Research points out that the sensation of coldness is an effective strategy to alleviate thirst. Using a combined care approach for thirst can prolong the sensation of coldness in the mouth and reduce thirst discomfort. Therefore, this approach needs to be further analyzed and its effectiveness reviewed. Methods: This study used systematic literature review and meta-analysis methodologies and searched databases including PubMed, MEDLINE, EMBASE, Cochrane, CINAHL, and two Chinese databases (CEPS and CJTD) based on keywords. JBI was used to appraise the quality of the literature. The RevMan 5.4 software package was used, and a fixed-effect model was applied for data analysis. We selected experimental articles, in English or Chinese, that met the inclusion and exclusion criteria. Three research articles were included in total, with a sample size of 416 people. Two were randomized controlled trials, and one was a quasi-experimental design. Results: The results show that combined care for thirst, which includes ice water spray or oral swab wipes, menthol mouthwash, and lip balm, can significantly relieve thirst intensity (MD = -1.36; 3 studies; 95% CI -1.77 to -0.95; p < 0.001) and thirst distress (MD = -0.71; 2 studies; 95% CI -1.32 to -0.10; p = 0.02). Therefore, it is recommended that medical staff identify high-risk groups for thirst early on. Implications for Practice: For patients who cannot eat orally, providing combined care for thirst can increase oral comfort and improve the quality of care.
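The fixed-effect model mentioned in the Methods is commonly computed by inverse-variance weighting, which the following Python sketch illustrates. It is a generic textbook calculation, not the RevMan implementation; the function name `fixed_effect_pool` and the study values are made up and do not correspond to the three included articles.

```python
import math

def fixed_effect_pool(effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect pooling of per-study mean differences.

    Returns the pooled effect and its confidence interval (95% by default).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)

# Made-up mean differences and standard errors for two hypothetical studies.
pooled_md, (ci_low, ci_high) = fixed_effect_pool([-1.0, -2.0], [0.5, 0.5])
```

Because more precise studies receive larger weights, the pooled estimate sits closer to the studies with the smallest standard errors.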

Keywords: thirst bundle care, intensive care units, meta-analysis, ice water spray, menthol

Procedia PDF Downloads 51

24789 An Architecture Based on Capsule Networks for the Identification of Handwritten Signature Forgery

Authors: Luisa Mesquita Oliveira Ribeiro, Alexei Manso Correa Machado

Abstract:

A handwritten signature is a unique means of recognizing an individual, used to validate documents and to carry out investigations in criminal, legal, banking, and other areas. Signature verification is based on large amounts of biometric data, as signatures are simple and easy to acquire, among other characteristics. Given this scenario, signature forgery is a recurring worldwide problem, and fast and precise techniques are needed to prevent crimes of this nature. This article studies the efficiency of the Capsule Network in analyzing and recognizing signatures. The chosen architecture achieved accuracies of 98.11% and 80.15% on the CEDAR and GPDS databases, respectively.

Keywords: biometrics, deep learning, handwriting, signature forgery

Procedia PDF Downloads 67

24788 Streamlining .NET Data Access: Leveraging JSON for Data Operations in .NET

Authors: Tyler T. Procko, Steve Collins

Abstract:

New features in .NET (6 and above) permit streamlined access to information residing in JSON-capable relational databases, such as SQL Server (2016 and above). Traditional methods of data access now comparatively involve unnecessary steps which compromise system performance. This work posits that the established ORM (Object Relational Mapping) based methods of data access in applications and APIs result in common issues, e.g., object-relational impedance mismatch. Recent developments in C# and .NET Core combined with a framework of modern SQL Server coding conventions have allowed better technical solutions to the problem. As an amelioration, this work details the language features and coding conventions which enable this streamlined approach, resulting in an open-source .NET library implementation called Codeless Data Access (CODA). Canonical approaches rely on ad-hoc mapping code to perform type conversions between the client and back-end database; with CODA, no mapping code is needed, as JSON is freely mapped to SQL and vice versa. CODA streamlines API data access by improving on three aspects of immediate concern to web developers, database engineers and cybersecurity professionals: Simplicity, Speed and Security. Simplicity is engendered by cutting out the “middleman” steps, effectively making API data access a whitebox, whereas traditional methods are blackbox. Speed is improved because of the fewer translational steps taken, and security is improved as attack surfaces are minimized. An empirical evaluation of the speed of the CODA approach in comparison to ORM approaches is provided and demonstrates that the CODA approach is significantly faster. CODA presents substantial benefits for API developer workflows by simplifying data access, resulting in better speed and security and allowing developers to focus on productive development rather than being mired in data access code. Future considerations include a generalization of the CODA method and extension outside of the .NET ecosystem to other programming languages.

Keywords: API data access, database, JSON, .NET Core, SQL Server

Procedia PDF Downloads 54

24787 Formulating Rough Approximations in Information Tables with Possibilistic Information

Authors: Michinori Nakata, Hiroshi Sakai

Abstract:

A rough set, which consists of lower and upper approximations, is formulated in information tables containing possibilistic information. First, lower and upper approximations are shown on the basis of possible world semantics, in the same way as Lipski did in the field of incomplete databases, in order to clarify the fundamentals of rough sets under possibilistic information. Possibility and necessity measures are used, as is done in possibilistic databases. As a result, each object has certain and possible membership degrees to the lower and upper approximations; these degrees are the lower and upper bounds. Therefore, the degree to which the object belongs to the lower and upper approximations is expressed by an interval value. The complementary property linking the lower and upper approximations also holds, as it does under complete information. Second, the approach based on indiscernibility relations, which was proposed by Dubois and Prade, is extended in three cases. The first case is that objects used to approximate a set of objects are characterized by possibilistic information. The second case is that objects used to approximate a set of objects with possibilistic information are characterized by complete information. The third case is that objects that are characterized by possibilistic information approximate a set of objects with possibilistic information. The extended approach creates the same results as the approach based on possible world semantics. This justifies our extension.
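As background for this abstract, the sketch below computes classical lower and upper approximations from an indiscernibility relation under complete information. It deliberately covers only the crisp baseline that the paper extends, not the possibilistic formulation itself; the function name `approximations`, the toy table, and the target set are all hypothetical.

```python
from collections import defaultdict

def approximations(table, target):
    """Classical rough-set approximations of `target`.

    `table` maps each object to a tuple of attribute values; objects with
    identical tuples are indiscernible and fall into the same class.
    """
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[values].add(obj)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # class lies entirely inside the target
            lower |= block
        if block & target:       # class overlaps the target
            upper |= block
    return lower, upper

# Toy information table with a single attribute (illustrative values only).
table = {"o1": ("a",), "o2": ("a",), "o3": ("b",), "o4": ("c",)}
target = {"o1", "o3"}
lower, upper = approximations(table, target)
```

The complementary property mentioned in the abstract can be checked on this toy table: the lower approximation of a set equals the universe minus the upper approximation of the set's complement.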

Keywords: rough sets, possibilistic information, possible world semantics, indiscernibility relations, lower approximations, upper approximations

Procedia PDF Downloads 310

24786 Assimilating Remote Sensing Data Into Crop Models: A Global Systematic Review

Authors: Luleka Dlamini, Olivier Crespo, Jos van Dam

Abstract:

Accurately estimating crop growth and yield is pivotal for timely sustainable agricultural management and ensuring food security. Crop models and remote sensing can complement each other and form a robust analysis tool to improve crop growth and yield estimations when combined. This study thus aims to systematically evaluate how research that exclusively focuses on assimilating RS data into crop models varies among countries, crops, data assimilation methods, and farming conditions. A strict search string was applied in the Scopus and Web of Science databases, and 497 potential publications were obtained. After screening for relevance with predefined inclusion/exclusion criteria, 123 publications were considered in the final review. Results indicate that over 81% of the studies were conducted in countries associated with high socio-economic and technological advancement, mainly China, the United States of America, France, Germany, and Italy. Many of these studies integrated MODIS or Landsat data into WOFOST to improve crop growth and yield estimation of staple crops at the field and regional scales. Most studies use recalibration or updating methods alongside various algorithms to assimilate remotely sensed leaf area index into crop models. However, these methods cannot account for the uncertainties in remote sensing observations and the crop model itself. Over 85% of the studies were based on commercial and irrigated farming systems. Despite a great global interest in data assimilation into crop models, limited research has been conducted in resource- and data-limited regions like Africa. We foresee great potential for such applications in those conditions, and hence for facilitating and expanding the use of an approach from which developing farming communities could benefit.

Keywords: crop models, remote sensing, data assimilation, crop yield estimation

Procedia PDF Downloads 112

24785 Assimilating Remote Sensing Data into Crop Models: A Global Systematic Review

Authors: Luleka Dlamini, Olivier Crespo, Jos van Dam

Abstract:

Accurately estimating crop growth and yield is pivotal for timely sustainable agricultural management and ensuring food security. Crop models and remote sensing can complement each other and form a robust analysis tool to improve crop growth and yield estimations when combined. This study thus aims to systematically evaluate how research that exclusively focuses on assimilating RS data into crop models varies among countries, crops, data assimilation methods, and farming conditions. A strict search string was applied in the Scopus and Web of Science databases, and 497 potential publications were obtained. After screening for relevance with predefined inclusion/exclusion criteria, 123 publications were considered in the final review. Results indicate that over 81% of the studies were conducted in countries associated with high socio-economic and technological advancement, mainly China, the United States of America, France, Germany, and Italy. Many of these studies integrated MODIS or Landsat data into WOFOST to improve crop growth and yield estimation of staple crops at the field and regional scales. Most studies use recalibration or updating methods alongside various algorithms to assimilate remotely sensed leaf area index into crop models. However, these methods cannot account for the uncertainties in remote sensing observations and the crop model itself. Over 85% of the studies were based on commercial and irrigated farming systems. Despite a great global interest in data assimilation into crop models, limited research has been conducted in resource- and data-limited regions like Africa. We foresee great potential for such applications in those conditions, and hence for facilitating and expanding the use of an approach from which developing farming communities could benefit.

Keywords: crop models, remote sensing, data assimilation, crop yield estimation

Procedia PDF Downloads 69

24784 e-Learning Security: A Distributed Incident Response Generator

Authors: Bel G Raggad

Abstract:

An e-Learning setting is a distributed computing environment where information resources can be connected to any public network. Public networks are very insecure, which can compromise the reliability of an e-Learning environment. This study is concerned only with the intrusion detection aspect of e-Learning security and how incident responses are planned. The literature reports great advances in intrusion detection systems (IDS) but neglects an important IDS weakness: suspected events are detected, but an intrusion is not determined because it is not defined in IDS databases. We propose a distributed incident response generator (DIRG) that produces incident responses when the working IDS suspects an event that does not correspond to a known intrusion. Data involved in intrusion detection when ample uncertainty is present are often not suited to formal statistical models, including Bayesian ones. We instead adopt Dempster-Shafer theory to process intrusion data for the unknown event. The DIRG engine transforms data into a belief structure using incident scenarios deduced by the security administrator. Belief values associated with various incident scenarios are then derived and evaluated to choose the most appropriate scenario, for which an automatic incident response is generated. This article provides a numerical example demonstrating the working of the DIRG system.
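To make the Dempster-Shafer step more concrete, here is a minimal Python sketch of Dempster's rule of combination for two mass functions. It shows the general rule only and is not the DIRG engine; the function name `combine`, the scenario labels, and the mass values are hypothetical.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is discarded.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

# Two hypothetical evidence sources over the scenarios "scan" and "dos".
SCAN, DOS = frozenset({"scan"}), frozenset({"dos"})
EITHER = SCAN | DOS
source_1 = {SCAN: 0.6, EITHER: 0.4}
source_2 = {DOS: 0.5, EITHER: 0.5}
belief = combine(source_1, source_2)
```

Mass left on the whole set (`EITHER` here) represents ignorance rather than support for a specific scenario, which is the feature that distinguishes this approach from a single Bayesian probability assignment.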

Keywords: decision support system, distributed computing, e-Learning security, incident response, intrusion detection, security risk, stateful inspection

Procedia PDF Downloads 421

24783 An Informetrics Analysis of Research on Phishing in Scopus and Web of Science Databases from 2012 to 2021

Authors: Nkosingiphile Mbusozayo Zungu

Abstract:

The purpose of the current study is to adopt informetrics methods to analyse the research on phishing from 2012 to 2021 in three selected databases in order to contribute to global cybersecurity through impactful research. The study follows a quantitative research methodology. We opted for the positivist epistemology and objectivist ontology. The analysis focuses on: (i) the productivity of individual authors, institutions, and countries; (ii) the research contributions, using co-authorship as a measure of collaboration; (iii) the altmetrics of selected research contributions; (iv) the citation patterns and research impact of research on phishing; and (v) research contributions by keywords, to discover the concepts that are related to phishing. The preliminary findings favour developed countries in terms of quantity and quality of research in the domain. There are unique research trends and patterns in the developing countries, including those in Africa, that provide opportunities for research development in the domain in the region. This study explores an important research domain by using a method unexplored in the region. The study supports the SDG Agenda 2030, including aims such as ending abuse, exploitation, trafficking, and all other forms of violence and torture of children through the use of cyberspace (SDG 16). Further, the results from this study can inform research, teaching, and learning largely in Africa. Invariably, the study contributes to cybersecurity awareness that will mitigate cybersecurity threats against vulnerable communities.

Keywords: phishing, cybersecurity, informetrics, information security

Procedia PDF Downloads 95