Search results for: sequence data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25251

24711 Update on Epithelial Ovarian Cancer (EOC), Types, Origin, Molecular Pathogenesis, and Biomarkers

Authors: Salina Yahya Saddick

Abstract:

Ovarian cancer remains the most lethal gynecological malignancy due to the lack of highly sensitive and specific screening tools for the detection of early-stage disease. The ovarian surface epithelium (OSE) provides the progenitor cells for 90% of human ovarian cancers. Recent morphologic, immunohistochemical, and molecular genetic studies have led to a new paradigm for the pathogenesis and origin of epithelial ovarian cancer (EOC) based on a dualistic model of carcinogenesis that divides EOC into two broad categories, designated Types I and II, characterized by specific mutations, including KRAS, BRAF, ERBB2, CTNNB1, PTEN, PIK3CA, ARID1A, and PPP2R1A, which target specific cell signaling pathways. Type I tumors are relatively genetically stable and typically display somatic sequence mutations in KRAS, BRAF, PTEN, PIK3CA, CTNNB1 (the gene encoding beta-catenin), ARID1A, and PPP2R1A, but very rarely in TP53. The cancer stem cell (CSC) hypothesis postulates that the tumorigenic potential of CSCs is confined to a very small subset of tumor cells and is defined by their ability to self-renew and differentiate, leading to the formation of a tumor mass. Among potential protein biomarkers, miRNAs are promising because they are remarkably stable, allowing isolation and analysis from tissues and from blood, in which they can be found as free circulating nucleic acids and in mononuclear cells. Recently, genomic analyses have identified biomarkers and potential therapeutic targets for ovarian cancer, notably FGF18, which plays an active role in controlling migration, invasion, and tumorigenicity of ovarian cancer cells through NF-κB activation, increasing the production of oncogenic cytokines and chemokines. This review summarizes updated information on epithelial ovarian cancer and points to the most recent ongoing research.

Keywords: epithelial ovarian cancers, somatic sequence mutations, cancer stem cell (CSC), potential protein, biomarker, genomic analysis, FGF18 biomarker

Procedia PDF Downloads 361
24710 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. Drug-resistant TB occurs when the bacteria become resistant to the drugs used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based and take a long time to identify the resistant bacteria and treat the patient accordingly, but machine learning (ML) and data science can offer new approaches to the problem. In this study, we propose to develop an ML-based model that predicts the antibiotic resistance phenotypes of TB isolates in minutes, so that the right treatment can be given to the patient immediately. The study used whole-genome sequence (WGS) data of TB isolates, extracted from the NCBI repository and covering samples from different countries, as training data. Samples from different countries were included to generalize over the large group of TB isolates from different regions of the world; this exposes the model to different behaviors of the TB bacteria and makes it robust. Model training considered three pieces of information extracted from the WGS data: all variants found within the candidate genes (F1), predetermined resistance-associated variants (F2), and resistance-associated gene information for the particular drug. Two major datasets were constructed from this information: F1 and F2 were treated as two independent datasets, and the third piece of information was used as the class label for both. Five machine learning algorithms were considered to train the models: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Gradient Boosting, and AdaBoost.
The models were trained on the datasets F1, F2, and F1F2, the latter being F1 and F2 merged. Additionally, an ensemble approach was used: the F1 and F2 datasets were each run through the Gradient Boosting algorithm, their outputs were combined into a single dataset, called the F1F2 ensemble dataset, and a model was trained on this dataset with each of the five algorithms. As the experiments show, the ensemble model trained with the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient Boosting model, to predict the antibiotic resistance phenotypes of TB isolates.
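The two-stage ensemble described above can be sketched as follows. This is a minimal illustration with synthetic stand-in data: the variant matrices, sample sizes, and the Random Forest final stage are assumptions for demonstration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
F1 = rng.integers(0, 2, size=(n, 30))   # stand-in: all variants in candidate genes
F2 = rng.integers(0, 2, size=(n, 10))   # stand-in: predetermined resistance variants
y = rng.integers(0, 2, size=n)          # resistant / susceptible label

F1_tr, F1_te, F2_tr, F2_te, y_tr, y_te = train_test_split(
    F1, F2, y, test_size=0.3, random_state=0)

# Stage 1: one gradient-boosting model per feature set.
gb1 = GradientBoostingClassifier(random_state=0).fit(F1_tr, y_tr)
gb2 = GradientBoostingClassifier(random_state=0).fit(F2_tr, y_tr)

# Stage 2: stack the class-probability outputs into the "F1F2 ensemble"
# dataset and train a final classifier on it.
stack_tr = np.column_stack([gb1.predict_proba(F1_tr)[:, 1],
                            gb2.predict_proba(F2_tr)[:, 1]])
stack_te = np.column_stack([gb1.predict_proba(F1_te)[:, 1],
                            gb2.predict_proba(F2_te)[:, 1]])
final = RandomForestClassifier(random_state=0).fit(stack_tr, y_tr)
accuracy = final.score(stack_te, y_te)
```

On real WGS-derived features, the stage-1 outputs would summarize each feature set's evidence for resistance, which is what makes the merged ensemble dataset compact.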

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 32
24709 A Critical Re-Evaluation of Knowledge Management Definitions and Terminologies

Authors: Raymond Olayinka

Abstract:

The last three decades have witnessed myriad definitions of knowledge management proposed by researchers and industry practitioners. Despite the magnitude of research and available literature on knowledge management, there is yet to be a consensus on what constitutes a good definition. There exists a seemingly inexhaustible list of definitions, which can appear confusing, conflicting, and overlapping. What is even more daunting is the lack of common terminology for describing knowledge management processes and the inconsistency in the sequence in which those processes take place. Whilst newcomers to knowledge management research struggle to make sense of the definitions, industry practitioners struggle with their applicability. Against this backdrop, this study aimed to re-evaluate knowledge management definitions and terminologies. The objectives were threefold: (1) to conduct a critical review of the existing body of work around knowledge management concepts and definitions, (2) to analyse and synthesise the findings, and (3) to present conclusions and recommendations. The methodology centres on a review of the literature and secondary data sources. A total of 48 knowledge management processes were found and extracted from various definitions (e.g. ‘identify’, ‘capture’, ‘codify’, ‘store’…). A taxonomy of the processes was created based on the commonality of the entities: the 48 processes were classified under 8 headings, which were further converged into 3 main headings, namely ‘acquire’, ‘exploit’ and ‘evaluate’, on which all definitions hinge. The study concludes that, in the multitude of knowledge management definitions, there is a consistent pattern by which the processes are organised and should be utilised. The contribution of this study is the synthesis of previous work by various authors and the presentation of a more holistic approach to knowledge management definitions and terminologies.

Keywords: knowledge management definitions, knowledge management terminologies, knowledge management processes, literature review

Procedia PDF Downloads 240
24708 Data Mining in Medicine Domain Using Decision Trees and Support Vector Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, the complex biomedical data collected in population studies are treated by statistical methods; although these methods are robust, they are not sufficient in themselves to harness the potential wealth of the data. To that end, two supervised learning algorithms were used: decision trees and the support vector machine (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.
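A hedged sketch of this two-classifier setup is shown below, using a synthetic stand-in for the thyroid dataset (the original data, features, and hyperparameters are not given in the abstract):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for the thyroid-disease data (8 assumed features).
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# The two supervised classifiers compared in the paper.
tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
tree_acc = tree.score(X_te, y_te)
svm_acc = svm.score(X_te, y_te)
```

The decision tree yields explicit symbolic rules, which is the "symbolic data mining" aspect the abstract advocates; the SVM provides a strong non-symbolic baseline.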

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 536
24707 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analyzing data to extract useful, predictive information. It is a field of research that addresses various types of problems. Within data mining, classification is an important technique for categorizing different kinds of data. Diabetes is one of the most common diseases. This paper applies different classification techniques to a diabetes dataset using the Waikato Environment for Knowledge Analysis (WEKA) and determines which algorithm is most suitable. The best classification algorithm on the diabetic data is Naïve Bayes, with an accuracy of 76.31% and a model build time of 0.06 seconds.
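The reported workflow, fitting Naïve Bayes on a diabetes-style dataset and measuring accuracy and build time, might look like the following in Python with scikit-learn rather than WEKA. The dataset here is synthetic (sized like the common Pima diabetes data), so the numbers will not match the 76.31% reported:

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in shaped like the Pima diabetes dataset (768 x 8).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Time the model build, as WEKA reports it.
start = time.perf_counter()
model = GaussianNB().fit(X, y)
build_time = time.perf_counter() - start

# 10-fold cross-validated accuracy, WEKA's default evaluation.
acc = cross_val_score(GaussianNB(), X, y, cv=10).mean()
```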

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 132
24706 Analysis of Aquifer Productivity in the Mbouda Area (West Cameroon)

Authors: Folong Tchoffo Marlyse Fabiola, Anaba Onana Achille Basile

Abstract:

Located in the western region of Cameroon, in the Bamboutos department, the city of Mbouda belongs to the Pan-African basement. The water resources exploited in this region consist of surface water and groundwater from weathered and fractured aquifers within the same basement. To study the factors determining the productivity of aquifers in the Mbouda area, we adopted a methodology based on collecting data from boreholes drilled in the region, identifying the different types of rocks, analyzing structures, and conducting geophysical surveys in the field. The results allowed us to distinguish two main types of rocks: metamorphic rocks, composed of amphibolites and migmatitic gneisses, and igneous rocks, namely granodiorites and granites. Several types of structures were also observed, including planar structures (foliation and schistosity), folded structures (folds), and brittle structures (fractures and lineaments). A structural synthesis combines all these elements into three major deformation phases: phase D1 is characterized by foliation and schistosity, phase D2 is marked by shear planes, and phase D3 is characterized by open and sealed fractures. The analysis of structures (fractures in outcrops, Landsat lineaments, subsurface structures) shows a predominance of ENE-WSW and WNW-ESE directions. Through electrical surveys and borehole data, we identified the sequence of the different geological formations. Four geo-electric layers were identified, each with a different electrical behavior: conductive, semi-resistive, or resistive. The last conductive layer is considered a potentially aquiferous zone. The flow rates of the boreholes ranged from 2.6 to 12 m³/h, classified as moderate to high according to the CIEH classification. The boreholes were mainly located in basalts, which are mineralogically rich in ferromagnesian minerals; this mineral composition contributes to their high productivity, as they are more likely to be weathered. The boreholes were positioned along linear structures or at their intersections.

Keywords: Mbouda, Pan-African basement, productivity, west-Cameroon

Procedia PDF Downloads 46
24705 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation is totally dependent on technology, which uses data as its fuel. The present study covers innovations and developments in data science and gives an idea of how to use the available data efficiently, helping the reader understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing, whose main principle was to create an artificial system that can run independently of human-given programs and can function by analyzing data to understand the requirements of its users. Data science comprises business understanding, data analysis, ethical concerns, programming languages, the various fields and sources of data, skills, and more, and its usage has evolved over the years. In this review article, we cover one part of data science, namely machine learning, which builds on data science for its work: machines learn through experience, which helps them do any task more efficiently. The article includes a comparative illustration of human versus machine understanding, along with the advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the lives of human beings. Since its advent, we have seen its benefits, how it leads to a better understanding of people, and how it caters to individual needs. It has improved business strategies, the services they provide, forecasting, and the ability to attain sustainable development. This study also aims at a better understanding of data science, which will help us create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 63
24704 Identification and Molecular Profiling of a Family I Cystatin Homologue from Sebastes schlegeli Deciphering Its Putative Role in Host Immunity

Authors: Don Anushka Sandaruwan Elvitigala, P. D. S. U. Wickramasinghe, Jehee Lee

Abstract:

Cystatins are a large superfamily of proteins that act as reversible inhibitors of cysteine proteases; papain proteases and cysteine cathepsins are their predominant substrates. The cystatin superfamily can be further clustered into three groups: stefins, cystatins, and kininogens. Among them, stefins are also known as family 1 cystatins, which harbor cystatin As and cystatin Bs. In this study, a family 1 cystatin homologue more closely related to cystatin B was identified from Korean black rockfish (Sebastes schlegeli) using a previously constructed cDNA (complementary deoxyribonucleic acid) database and designated RfCyt1. The full-length cDNA of RfCyt1 consisted of 573 bp, with a coding region of 294 bp, a 5´-untranslated region (UTR) of 55 bp, and a 3´-UTR of 263 bp. The coding sequence encodes a polypeptide of 97 amino acids with a predicted molecular weight of 11 kDa and a theoretical isoelectric point of 6.3. RfCyt1 shared homology with other teleost and vertebrate species and contained the conserved features of the cystatin family signature, including a single cystatin-like domain, the pentapeptide cysteine protease inhibitory consensus sequence (QXVXG), and two conserved neighboring N-terminal glycine residues (⁸GG⁹). As expected, a phylogenetic reconstruction built with the neighbor-joining method showed that RfCyt1 clusters with cystatin family 1 members, most closely with its teleostan orthologues. A SYBR Green qPCR (quantitative polymerase chain reaction) assay was performed to quantify RfCyt1 transcripts in different tissues of healthy and immune-stimulated fish. RfCyt1 was ubiquitously expressed in all tissue types of healthy animals, with the highest levels in gill and spleen. Temporal expression of RfCyt1 displayed significant up-regulation upon infection with Aeromonas salmonicida. Recombinantly expressed RfCyt1 showed concentration-dependent papain inhibitory activity. Collectively, these findings provide evidence for the protease inhibitory and immunity-relevant roles of RfCyt1 in Sebastes schlegeli.

Keywords: Sebastes schlegeli, family 1 cystatin, immune stimulation, expressional modulation

Procedia PDF Downloads 122
24703 Quality Service Standard of Food and Beverage Service Staff in Hotel

Authors: Thanasit Suksutdhi

Abstract:

This survey research studies the service quality standard of food and beverage service staff in the hotel business through three sample hotels: Siam Kempinski Hotel Bangkok, Four Seasons Resort Chiang Mai, and Banyan Tree Phuket. To establish an international service standard for food and beverage service, methodological triangulation (quantitative, qualitative, and survey methods) was employed. Questionnaires and in-depth interviews were used to collect information on the sequence and method of service. The questionnaire had three modified parts measuring service quality and guest satisfaction, covering service facilities, attentiveness, responsibility, reliability, and circumspection. Subjects were obtained by simple random sampling; the questionnaire return rate was 70%, or 280 questionnaires. Data were analyzed with SPSS to obtain the arithmetic mean, standard deviation, and percentages, and comparisons were made by t-test and one-way ANOVA. The results revealed that the service quality of the three hotels was at an international level, able to create high satisfaction among international customers. The research recommends maintaining the areas of good service quality and improving some dimensions, such as reliability. Training in service standards, product knowledge, and new technology should be provided for employees. Furthermore, to develop the service quality of the industry, training collaboration between hotel organizations and educational institutions in food and beverage service should be considered.
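The pairwise and multi-group comparisons mentioned (t-test and one-way ANOVA) can be reproduced with SciPy instead of SPSS. The satisfaction scores below are simulated placeholders, not the survey data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical satisfaction scores (5-point scale) from the three hotels.
h1 = rng.normal(4.2, 0.5, 90)
h2 = rng.normal(4.1, 0.5, 95)
h3 = rng.normal(4.3, 0.5, 95)

# Independent-samples t-test compares two hotels at a time.
t_stat, t_p = stats.ttest_ind(h1, h2)
# One-way ANOVA tests for any difference across all three hotels.
f_stat, f_p = stats.f_oneway(h1, h2, h3)
```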

Keywords: service standard, food and beverage department, sequence of service, service method

Procedia PDF Downloads 334
24702 Methodologies for Deriving Semantic Technical Information Using an Unstructured Patent Text Data

Authors: Jaehyung An, Sungjoo Lee

Abstract:

Patent documents constitute an up-to-date and reliable source of knowledge reflecting technological advances, so patent analysis has been widely used to identify technological trends and formulate technology strategies. However, identifying technological information from patent data entails limitations such as high cost, complexity, and inconsistency, because it relies on expert knowledge. To overcome these limitations, researchers have applied quantitative analysis based on keyword techniques. With this method, one can capture the technological implications of patent documents by extracting keywords that indicate the important contents. However, it uses only a simple keyword-frequency count, so it cannot take into account the semantic relationships among keywords or semantic information such as how technologies are used in their technology area and how they affect other technologies. To automatically analyze unstructured technological information in patents and extract semantic information, the text should be transformed into an abstracted form that includes the technological key concepts. The specific sentence structure 'SAO' (subject, action, object) has emerged to represent such key concepts and can be extracted by NLP (natural language processing). An SAO structure can be organized in a problem-solution format if the action-object (AO) pair states the problem and the subject (S) forms the solution. In this paper, we propose a new methodology that extracts SAO structures through technical-element extraction rules. Although sentences in patent texts have a unique format, prior studies have depended on general NLP tools built for common documents such as newspapers, research papers, and Twitter mentions, and so cannot take into account the specific sentence structures of patent documents.
To overcome this limitation, we identified the unique form of patent sentences and defined the SAO structures in the patent text data. There are four types of technical elements: technology adoption purpose, application area, tool for technology, and technical components. Each of these four sentence-structure types has its own specific word structure, determined by the location and sequence of the parts of speech in the sentence. Finally, we developed algorithms for extracting SAOs; the results offer insight into the technology innovation process by providing different perspectives on technology.
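As a deliberately simplified illustration of the target SAO structure (not the authors' extraction rules, which rely on patent-specific parsing), a toy extractor for short active-voice sentences could look like this:

```python
import re

# Toy SAO (subject-action-object) extractor for simple active-voice
# sentences of the form "<subject> <verb> <object>." A real system
# would use a full NLP parser; this regex only illustrates the target
# structure by treating the rightmost inflected word as the action.
PATTERN = re.compile(
    r"^(?P<S>[\w\s]+)\s+(?P<A>\w+(?:s|es|ed|ing))\s+(?P<O>[\w\s]+?)\.?$"
)

def extract_sao(sentence: str):
    """Return (subject, action, object) or None if no match."""
    m = PATTERN.match(sentence.strip())
    return (m.group("S"), m.group("A"), m.group("O")) if m else None

sao = extract_sao("The coating layer reduces surface friction.")
# → ("The coating layer", "reduces", "surface friction")
```

In a problem-solution reading, "reduces surface friction" (AO) states the problem being solved and "The coating layer" (S) is the solution.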

Keywords: NLP, patent analysis, SAO, semantic-analysis

Procedia PDF Downloads 253
24701 Rapid Discrimination of Porcine and Tilapia Fish Gelatin by Fourier Transform Infrared-Attenuated Total Reflection Combined with Two-Dimensional Infrared Correlation Analysis

Authors: Norhidayu Muhamad Zain

Abstract:

Gelatin, a purified protein derived mostly from porcine and bovine sources, is used widely in the food manufacturing, pharmaceutical, and cosmetic industries. However, porcine-related products are strictly forbidden for Muslim and Jewish consumption, so analytical methods offering reliable results for differentiating the sources of gelatin are needed. The aim of this study was to differentiate gelatin sources (porcine and tilapia fish) using Fourier transform infrared-attenuated total reflection (FTIR-ATR) combined with two-dimensional infrared (2DIR) correlation analysis. Porcine gelatin (PG) and tilapia fish gelatin (FG) samples were diluted in distilled water at concentrations ranging from 4-20% (w/v), then analysed using FTIR-ATR and 2DIR correlation software. The results showed a significant difference in the pattern map of the synchronous spectra in the region of 1000 cm⁻¹ to 1100 cm⁻¹ between PG and FG samples. The auto peak at 1080 cm⁻¹, attributed to the C-O functional group, was observed at high intensity in PG samples compared to FG samples, while two auto peaks (1080 cm⁻¹ and 1030 cm⁻¹) of lower intensity were identified in FG samples. In addition, using 2D correlation analysis, the originally broad water OH bands in the 1D IR spectra could be effectively differentiated into six auto peaks located at 3630, 3340, 3230, 3065, 2950 and 2885 cm⁻¹ for PG samples and five auto peaks at 3630, 3330, 3230, 3060 and 2940 cm⁻¹ for FG samples. Based on the rule proposed by Noda, the sequence of spectral changes in PG samples is: NH₃⁺ amino acid > CH₂ and CH₃ aliphatic > OH stretch > carboxylic acid OH stretch > NH in secondary amide > NH in primary amide. In contrast, the sequence was in exactly the opposite direction for FG samples, so the two samples give different 2D correlation spectra over the range 2800 cm⁻¹ to 3700 cm⁻¹. This method may provide rapid determination of the gelatin source for applications in food, pharmaceutical, and cosmetic products.
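Noda's synchronous 2D correlation map, whose auto peaks are discussed above, is essentially the covariance of the perturbation-dependent dynamic spectra. A minimal NumPy sketch, with simulated spectra rather than the gelatin data, is:

```python
import numpy as np

def synchronous_map(spectra):
    """Noda synchronous 2D correlation map.

    spectra: array of shape (m perturbations, n wavenumbers),
    ordered by the perturbation variable (here, concentration).
    """
    dynamic = spectra - spectra.mean(axis=0)   # subtract reference spectrum
    m = spectra.shape[0]
    return dynamic.T @ dynamic / (m - 1)

# Hypothetical series: 5 concentrations x 100 wavenumber points,
# one Gaussian band whose intensity scales with concentration.
rng = np.random.default_rng(0)
conc = np.linspace(4, 20, 5)[:, None]
band = np.exp(-np.linspace(-3, 3, 100)[None, :] ** 2)
spectra = conc * band + rng.normal(0, 0.01, (5, 100))

sync = synchronous_map(spectra)   # auto peaks appear on the diagonal
```

The diagonal of `sync` carries the auto peaks (bands varying with concentration), while off-diagonal cross peaks reveal which bands vary together, which is what resolves the broad OH envelope into distinct components.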

Keywords: 2 dimensional infrared (2DIR) correlation analysis, Fourier transform infrared- attenuated total reflection (FTIR-ATR), porcine gelatin, tilapia fish gelatin

Procedia PDF Downloads 227
24700 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create a neural network intelligent system for the diagnosis of asthma. Methods: The study population comprised patients who had visited one of the lung clinics in Tehran. Data were analyzed using SPSS, and Pearson's chi-square test was the basis for ranking the data. The neural network was trained using the back-propagation learning technique. Results: Based on the SPSS analysis, the 13 most effective factors were selected. The data were combined in various forms to build different models for training and testing the networks, and in all modes the network was able to predict 100% of the cases correctly. Conclusion: Using data mining methods before designing the system structure, in order to reduce the data dimensionality and make an optimal choice of data, leads to a more accurate system. Therefore, considering data mining approaches is necessary given the nature of medical data.
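The described pipeline, chi-square-based factor selection followed by a back-propagation network, might be sketched as follows. The data are a synthetic stand-in; the 13-factor count is taken from the abstract, and everything else (layer size, total factor count) is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the clinic data: 40 candidate factors.
X, y = make_classification(n_samples=400, n_features=40,
                           n_informative=10, random_state=0)

# Chi-square ranking requires non-negative features; keep the 13 best.
X = MinMaxScaler().fit_transform(X)
X13 = SelectKBest(chi2, k=13).fit_transform(X, y)

# Back-propagation network (MLP) trained on the reduced data.
X_tr, X_te, y_tr, y_te = train_test_split(X13, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
acc = net.score(X_te, y_te)
```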

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 258
24699 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With the increase in internet communication technology (ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has been accompanied by an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in United States courts because of a failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms in economic or physical terms negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research uses a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical perspective negates the fact that data insecurity may result in harms that run counter to the functions of privacy in our lives: the promotion of liberty, selfhood, autonomy, and human social relations, and the furtherance of a free society. No economic value can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 176
24698 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. The data comprised 140 rows describing the performance of two batches of students. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data were used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the best-fitting ML model, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy against the real data. ROC-AUC analysis was performed for all ten classes of the target variable (grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, while the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each grade value separately. The LR model with Gaussian data showed consistently better AUC scores than the ADA model with CTGAN data, except for two grade values, C- and A-.
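The per-class ROC-AUC evaluation described above corresponds to a one-vs-rest analysis over the ten grade classes. The sketch below shows that evaluation step with scikit-learn on synthetic stand-in data (it does not reproduce the CTGAN/PyCaret pipeline or the reported AUC values):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 10-class stand-in for the grade-prediction task (A ... F).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=8,
                           n_classes=10, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = lr.predict_proba(X_te)

# One-vs-rest AUC per grade class, macro-averaged into a mean AUC score.
mean_auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
```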

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 29
24697 Influence of Geometry on Performance of Type-4 Filament Wound Composite Cylinder for Compressed Gas Storage

Authors: Pranjali Sharma, Swati Neogi

Abstract:

Composite pressure vessels are low-weight structures used mainly in applications such as automobiles, aeronautics, and chemical engineering. Fiber reinforced polymer (FRP) composite materials offer simplicity of design and use, high fuel storage capacity, rapid refueling capability, excellent shelf life, minimal infrastructure impact, high safety due to the inherent strength of the pressure vessel, and little to no development risk. Beyond these merits, the reduced weight of composite vessels relative to metallic cylinders is their biggest asset to the automotive industry, increasing fuel efficiency. The result is a lightweight, flexible, non-explosive, and non-fragmenting pressure vessel that can be tailor-made for specific applications. The winding pattern of the composite over-wrap is a primary focus in designing a pressure vessel: the critical stresses in the system depend on the thickness, angle, and sequence of the composite layers. The composite over-wrap is wound over a plastic liner, whose geometry can be varied for ease of winding. In the present study, we aim to optimize the FRP vessel geometry to provide ease of winding and aid weight reduction, enhancing vessel performance. Finite element analysis is used to study the effect of dome geometry, yielding a design with the maximum burst pressure and the least vessel weight. The stress and strain analysis of different dome ends, along with the cylindrical portion, is carried out in ANSYS 19.2. Failure is predicted using different failure theories: the Tsai-Wu theory, the Tsai-Hill theory, and the maximum stress theory. For a given winding sequence, the optimum dome geometry is determined at a fixed internal pressure to identify the theoretical burst pressure. Finally, this geometry is used to decrease the number of layers until the set value of safety is reached, in accordance with the available safety standards. This decreases the weight of the composite over-wrap and the manufacturing cost of the pressure vessel. The improvement in overall weight performance gives higher fuel efficiency for automobile applications.
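Of the failure theories mentioned, the Tsai-Wu criterion reduces to a single quadratic failure index per ply. A sketch with illustrative strength and stress values (not the study's material data) is:

```python
def tsai_wu_index(s1, s2, t12, Xt, Xc, Yt, Yc, S):
    """Tsai-Wu failure index for a lamina under in-plane stress.

    s1, s2, t12: ply stresses (fiber, transverse, shear), MPa.
    Xt/Xc, Yt/Yc: tensile/compressive strengths (fiber, transverse), MPa.
    S: in-plane shear strength, MPa. Failure is predicted at index >= 1.
    """
    F1 = 1.0 / Xt - 1.0 / Xc
    F2 = 1.0 / Yt - 1.0 / Yc
    F11 = 1.0 / (Xt * Xc)
    F22 = 1.0 / (Yt * Yc)
    F66 = 1.0 / S ** 2
    F12 = -0.5 * (F11 * F22) ** 0.5   # common default interaction term
    return (F1 * s1 + F2 * s2 + F11 * s1 ** 2 + F22 * s2 ** 2
            + F66 * t12 ** 2 + 2.0 * F12 * s1 * s2)

# Hypothetical ply stresses under internal pressure (MPa).
index = tsai_wu_index(s1=600.0, s2=20.0, t12=15.0,
                      Xt=1500.0, Xc=1200.0, Yt=50.0, Yc=250.0, S=70.0)
safe = index < 1.0
```

In a layer-reduction loop like the one described, plies would be removed until the largest per-ply index approaches the target safety margin.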

Keywords: compressed gas storage, dome geometry, theoretical analysis, type-4 composite pressure vessel, improvement in vessel weight performance

Procedia PDF Downloads 132
24696 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that, ceteris paribus, countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Large countries, which inherently have greater data resources, therefore tend to have higher incomes than smaller countries, and the former may be more hesitant than the latter to liberalize cross-border data flows in order to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade, as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, economies of scale

Procedia PDF Downloads 51
24695 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays, multimedia data are used to store secure information. All previous methods allocate space in the image for data embedding after encryption. In this paper, we propose a novel method that reserves space in the image, with a surrounding boundary, before encryption using a traditional RDH algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted image. The proposed method achieves real-time performance; that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed, which improves efficiency tenfold compared to the other processes discussed.
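For intuition only, plain LSB substitution, the bit-plane building block behind reserving room before encryption, though not the full reversible scheme, works like this:

```python
import numpy as np

def embed(cover, bits):
    """Write payload bits into the least significant bits of the cover."""
    stego = cover.copy()
    flat = stego.ravel()                     # view into stego's pixels
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | bits
    return stego

def extract(stego, n):
    """Read n payload bits back out of the least significant bit plane."""
    return stego.ravel()[:n] & 1

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)  # toy 8x8 image
bits = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=np.uint8)  # toy payload

stego = embed(cover, bits)
recovered = extract(stego, len(bits))
assert (recovered == bits).all()   # payload survives; pixels change by <= 1
```

A true RDH scheme additionally records the original LSBs (the "reserved room") so the cover image itself can be restored bit-exactly after extraction.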

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 396
24694 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality, with the goal of improving the speed of identification. We use gene data that was initially collected for autism detection and ask whether, and how accurately, this data can support identification applications. Our main goal is to determine whether our data preprocessing technique yields data that is useful as a biometric identification tool. We experiment with the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1, and the rate remains close to optimal as the noise standard deviation increases to 3. This shows that the data can be used for identity verification with high accuracy using a classifier as simple as the k-nearest neighbor (k-NN).
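The verification setup described above can be illustrated with a minimal nearest-neighbor sketch (hypothetical enrollment vectors and noise level; the paper's actual genetic features and preprocessing are not given in the abstract):

```python
import math
import random

random.seed(0)

# Hypothetical enrollment gallery: one feature vector per subject.
gallery = {
    "subject_a": [0.0, 0.0, 0.0],
    "subject_b": [10.0, 10.0, 10.0],
    "subject_c": [-10.0, 10.0, -10.0],
}

def identify(probe):
    """1-NN: return the enrolled subject closest to the probe vector."""
    return min(gallery, key=lambda s: math.dist(gallery[s], probe))

def corrupt(vector, sd):
    """Simulate a noisy test sample, as in the abstract's noise experiment."""
    return [v + random.gauss(0.0, sd) for v in vector]

# With noise sd = 1 (the abstract's reported optimum), well-separated
# subjects are still identified correctly.
print(identify(corrupt(gallery["subject_b"], sd=1.0)))
```

With the enrollment vectors well separated relative to the noise, the nearest neighbor remains the correct subject, which mirrors the abstract's finding that accuracy stays close to optimal up to a noise standard deviation of 3.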

Keywords: biometrics, genetic data, identity verification, k-nearest neighbor

Procedia PDF Downloads 235
24693 A Review on Intelligent Systems for Geoscience

Authors: R. Palson Kennedy, P. Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as to the opportunities for improvement in both ML and the geosciences. Numerous facets of the geosciences pose unique difficulties for the study of intelligent systems: geoscience data are notoriously difficult to analyze, since they are frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. To meet this need, the article presents a review from the data life cycle perspective. The first half addresses data science's essential concepts and theoretical underpinnings, while the second contains key themes and shared experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: data science, intelligent systems, machine learning, big data, data life cycle, recent developments, geoscience

Procedia PDF Downloads 123
24692 Defect Identification in Partial Discharge Patterns of Gas Insulated Switchgear and Straight Cable Joint

Authors: Chien-Kuo Chang, Yu-Hsiang Lin, Yi-Yun Tang, Min-Chiu Wu

Abstract:

With the trend of technological advancement, the harm caused by power outages is substantial, mostly due to problems in the power grid. This highlights the necessity of further improving the reliability of the power system, in which gas-insulated switchgear (GIS) and power cables play a crucial role. Long-term operation under high voltage can cause the insulation materials in this equipment to crack, potentially leading to partial discharges. If these partial discharges (PD) can be analyzed, preventive maintenance and replacement of equipment can be carried out, thereby improving the reliability of the power grid. This research diagnoses defects by identifying three different defects in GIS and three different defects in straight cable joints, for a total of six defect types. The measured partial discharge data are converted through phase analysis diagrams and pulse sequence analysis. Discharge features are extracted using convolutional image processing, and three deep learning models, CNN, ResNet18, and MobileNet, are used for training and evaluation. Class Activation Mapping is utilized to interpret the black-box problem of deep learning models, with each model achieving an accuracy rate of over 95%. Lastly, the overall model performance is enhanced through an ensemble learning voting method.
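The final voting step can be sketched in a few lines (illustrative only; the variable names stand in for predictions of the trained CNN, ResNet18, and MobileNet classifiers, and the defect labels are made up):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: pick the class most models agree on."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model defect predictions for one PD sample.
cnn_pred, resnet_pred, mobilenet_pred = "void_defect", "void_defect", "tip_defect"
print(majority_vote([cnn_pred, resnet_pred, mobilenet_pred]))  # → void_defect
```

Because each base model already exceeds 95% accuracy, a simple hard vote like this can only help when the models' errors are not strongly correlated.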

Keywords: partial discharge, gas-insulated switchgear, straight cable joint, defect identification, deep learning, ensemble learning

Procedia PDF Downloads 61
24691 Rheumatoid Arthritis, Periodontitis and the Subgingival Microbiome: A Circular Relationship

Authors: Isabel Lopez-Oliva, Akshay Paropkari, Shweta Saraswat, Stefan Serban, Paola de Pablo, Karim Raza, Andrew Filer, Iain Chapple, Thomas Dietrich, Melissa Grant, Purnima Kumar

Abstract:

Objective: We aimed to explicate the role of the subgingival microbiome in the causal link between rheumatoid arthritis (RA) and periodontitis (PD). Methods: Subjects with/without RA and with/without PD were randomized for treatment with scaling and root planing (SRP) or oral hygiene instructions. Subgingival biofilm, gingival crevicular fluid, and serum were collected at baseline and at 3 and 6 months post-operatively. Correlations were generated between 72 million 16S rDNA sequences, immuno-inflammatory mediators, circulating antibodies to oral microbial antigens, serum inflammatory molecules, and clinical metrics of RA. The dynamics of inter-microbial and host-microbial interactions were modeled using differential network analysis. Results: RA superseded periodontitis as a determinant of microbial composition, and DAS28 score superseded the severity of periodontitis as a driver of microbial assemblages (p=0.001, ANOSIM). RA subjects evidenced higher serum anti-PPAD (p=0.0013), anti-Pg-enolase (p=0.0031), anti-RPP3, anti-Pg-OMP and anti-Pi-OMP (p=0.001) antibodies than non-RA controls (with and without periodontitis). Following SRP, bacterial networks anchored by IL-1b, IL-4, IL-6, IL-10, IL-13, MIP-1b, and PDGF-b underwent ≥5-fold greater rewiring, and serum antibodies to microbial antigens decreased significantly. Conclusions: Our data suggest a circular relationship between RA and PD, beginning with an RA-influenced dysbiosis within the healthy subgingival microbiome that leads to exaggerated local inflammation in periodontitis, circulating antibodies to periodontal pathogens, and a positive correlation between the severity of periodontitis and RA activity. Periodontal therapy restores host-microbial homeostasis, reduces local inflammation, and decreases circulating microbial antigens. Our data highlight the importance of integrating periodontal care into the management of RA patients.

Keywords: rheumatoid arthritis, periodontal, subgingival, DNA sequence analysis, oral microbiome

Procedia PDF Downloads 87
24690 Novel AdoMet Analogs as Tools for Nucleic Acid Labeling

Authors: Milda Nainyte, Viktoras Masevicius

Abstract:

Biological methylation is the transfer of a methyl group from S-adenosyl-L-methionine (AdoMet) onto N-, C-, O- or S-nucleophiles in DNA, RNA, proteins or small biomolecules. The reaction is catalyzed by enzymes called AdoMet-dependent methyltransferases (MTases), which represent more than 3% of the proteins in the cell. In the general mechanism, the methyl group from AdoMet replaces a hydrogen atom of the nucleophilic center, producing methylated DNA and S-adenosyl-L-homocysteine (AdoHcy). Recently, DNA methyltransferases have been used for the sequence-specific, covalent labeling of biopolymers. Two types of MTase-catalyzed labeling are known, referred to as two-step and one-step. In two-step labeling, an alkylating fragment is transferred onto DNA in a sequence-specific manner, and the reporter group, such as biotin, is then attached for selective visualization using a suitable coupling chemistry. This labeling approach is rather laborious, and the chemical coupling does not always proceed to 100% completion; on the other hand, a variety of reporter groups can be selected in the second step, which gives the method its flexibility. In one-step labeling, the AdoMet analog is designed with the reporter group already attached to the functional group. The one-step method is thus a more convenient tool for labeling biopolymers, as it avoids additional chemical reactions and the selection of reaction conditions, and reduces time costs. However, an effective AdoMet analog suitable for one-step labeling of biopolymers that also contains a cleavable bond, required to reduce interference with PCR, has not yet been reported.
To expand the practical utility of this important enzymatic reaction, cofactors with activated sulfonium-bound side chains have been produced; these can serve as surrogate cofactors for a variety of wild-type and mutant DNA and RNA MTases, enabling covalent attachment of the side chains to their target sites in DNA, RNA or proteins (an approach named methyltransferase-directed Transfer of Activated Groups, mTAG). Compounds containing a hex-2-yn-1-yl moiety have proved to be efficient alkylating agents for DNA labeling. Herein we describe synthetic procedures for the preparation of N-biotinoyl-N’-(pent-4-ynoyl)cystamine, starting from the coupling of cystamine with pentynoic acid and finally attaching biotin as the reporter group. This constitutes the synthesis of the first AdoMet-based cofactor that contains a cleavable reporter group and is suitable for one-step labeling.

Keywords: AdoMet analogs, DNA alkylation, cofactor, methyltransferases

Procedia PDF Downloads 182
24689 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without it, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. It is therefore important for an organization to ensure that the data it uses is of high quality. This is where the concept of data mesh comes in. Data mesh is a decentralized organizational and architectural approach to data management that can help organizations improve data quality. The concept was first introduced in 2020; its purpose is to decentralize data ownership, making it easier for domain experts to manage the data, which can improve data quality by reducing reliance on centralized data teams. This paper discusses how a set of elements, including data mesh, can serve as tools for increasing data quality. One key benefit of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata and thereby improving data quality. Another benefit is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity about responsibilities. With data mesh, domain experts are responsible for managing their own data, which provides clarity in roles and responsibilities and improves data quality. Additionally, data mesh can contribute to a new form of organization that is more agile and adaptable.
By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn improves overall performance through better insights into the business, delivered by better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring and analytics are decentralized, allowing domain experts to monitor and analyze their own data; this helps identify and address data quality problems quickly. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which helps create a data-driven culture within the organization, leading to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can enhance data quality by automating many data-related tasks, such as data cleaning and data validation; by integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts above are illustrated by experience feedback from AEKIDEN, an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing its experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 118
24688 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze big data, which grows exponentially, with traditional technologies; Hadoop is a technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop. Using RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis on actual data of different sizes. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop implementation with the lm function and with the biglm package based on bigmemory. The results showed that our RHadoop implementation was faster than the other packages, owing to parallel processing that increases the number of map tasks as the data size grows.
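The parallel regression pattern can be sketched without Hadoop: each map task computes sufficient statistics on its chunk of data, and a reduce step combines them to solve the normal equations. The following is a stdlib illustration of the pattern for simple regression, not the authors' RHadoop code, and the data are made up:

```python
def map_chunk(chunk):
    """Per-chunk sufficient statistics for simple regression y = a + b*x."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def reduce_stats(stats):
    """Combine per-chunk statistics and solve for the coefficients."""
    n, sx, sy, sxx, sxy = (sum(t) for t in zip(*stats))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Data following y = 1 + 2x, split into two "map" chunks.
chunks = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
a, b = reduce_stats([map_chunk(c) for c in chunks])
print(a, b)  # → 1.0 2.0
```

Because the statistics are additive, adding more map tasks (or data nodes) splits the expensive pass over the data without changing the fitted coefficients, which is why throughput scales with the number of nodes.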

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 420
24687 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To address memorization overfitting in the MAML meta-learning algorithm, a method for generating mutually exclusive tasks based on data augmentation is proposed. The method generates a mutually exclusive (mutex) task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution of the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key-data extraction method that extracts only part of the data for mutex task generation. Experiments show that the proposed mutex task generation method effectively alleviates memorization overfitting in MAML.
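The core idea, one feature mapped to different labels across tasks, can be sketched by permuting the label assignment per generated task (a toy illustration with assumed data; the paper's key-data extraction criterion is not specified in the abstract):

```python
import random

random.seed(1)

def make_mutex_task(examples, n_labels):
    """Relabel examples with a random non-identity label permutation, so the
    same feature maps to a different label than in the original dataset."""
    perm = list(range(n_labels))
    while perm == list(range(n_labels)):  # force a non-identity mapping
        random.shuffle(perm)
    return [(x, perm[y]) for x, y in examples]

original = [("feat_a", 0), ("feat_b", 1), ("feat_c", 2)]
mutex_task = make_mutex_task(original, n_labels=3)
# Same features, but at least one feature now carries a different label.
print(mutex_task != original)  # → True
```

Because the feature-to-label mapping changes from task to task, a model can no longer solve new tasks by memorizing the training mapping, which is the mechanism the abstract relies on to break memorization overfitting.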

Keywords: data augmentation, mutex task generation, meta-learning, text classification

Procedia PDF Downloads 79
24686 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network

Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan

Abstract:

Data aggregation is a helpful technique for reducing the data communication overhead in a wireless sensor network. One of its important tasks is the positioning of the aggregator points. Much work has been done on data aggregation, but the efficient positioning of the aggregation points has received little attention. In this paper, the authors focus on the positioning, or placement, of the aggregation points in a wireless sensor network and propose an algorithm to select the aggregator positions for a scenario in which the aggregator nodes are more powerful than the sensor nodes.
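The abstract does not state the placement algorithm, but a common baseline for this problem is clustering the sensor locations and placing one aggregator at each cluster centroid; a minimal k-means sketch under assumed coordinates (not the authors' algorithm):

```python
def place_aggregators(points, k, iters=10):
    """Place k aggregators at the centroids of k clusters of sensors."""
    centroids = points[:k]                     # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each sensor ...
            j = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)              # ... to its nearest aggregator
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            for cl in clusters if cl
        ]
    return centroids

sensors = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(place_aggregators(sensors, k=2))  # → [(0.5, 0.5), (10.5, 10.5)]
```

Placing an aggregator at each centroid minimizes the average sensor-to-aggregator distance for the chosen k, which is a reasonable proxy for communication cost when aggregator nodes are more powerful than sensors.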

Keywords: aggregation point, data communication, data aggregation, wireless sensor network

Procedia PDF Downloads 144
24685 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects of implementing an explicit spatial perspective in econometrics for modelling non-continuous data in general, and count data in particular. It provides an overview of the several spatial econometric approaches available for modelling data collected with reference to location in space, from classical spatial econometrics to recent developments for modelling count data in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework necessary for structurally consistent spatial econometric count models incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures under different assumptions, and to the constraints and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from the hierarchical modelling and analysis of spatial data, in order to identify possible new directions for the processing of count data in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

Procedia PDF Downloads 573
24684 A NoSQL-Based Approach for the Real-Time Management of Robotics Data

Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir

Abstract:

This paper deals with the continual growth of data, for which new data management solutions have emerged: NoSQL databases. These databases span several areas, such as personalization, profile management, real-time big data, content management, catalogs, customer views, mobile applications, the internet of things, digital communication, and fraud detection, and their adoption keeps increasing. These systems store data very well, but with the trend of big data, new storage challenges demand new structures and methods for managing enterprise data. New intelligent machines, for instance in the e-learning sector, thrive on more data, so smart machines can learn more and faster. Robotics is the use case on which we focus our tests. An implementation of NoSQL for robotics wrestles all the data robots acquire into usable form, because with ordinary approaches we face severe limits in managing and finding exact information in real time. Our proposed approach is demonstrated by experimental studies and a running example used as a use case.

Keywords: NoSQL databases, database management systems, robotics, big data

Procedia PDF Downloads 332
24683 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications, and multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide different information about the data, so linking multiple sources is essential to improve clustering performance. In practice, however, multi-source data is often heterogeneous, uncertain, and large, which is considered a major challenge of multi-source data. The ensemble is a versatile machine learning model in which learning techniques can work in parallel on big data, and clustering ensembles have been shown to outperform standard clustering algorithms in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis, the fuzzy optimized multi-objective clustering ensemble (FOMOCE). First, a clustering ensemble mathematical model is introduced, based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge. Then, rules for extracting dark knowledge from the input data, the clustering algorithms, and the base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. Experiments were performed on standard sample data sets, and the results demonstrate the superior performance of the FOMOCE method compared to existing clustering ensemble methods and multi-source clustering methods.
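A common building block for clustering ensembles, the co-association matrix, can be sketched as follows (a generic illustration of the ensemble idea, not the FOMOCE algorithm itself; the base clusterings are made up):

```python
def co_association(base_clusterings):
    """Fraction of base clusterings in which each pair of items co-clusters."""
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    return [[sum(c[i] == c[j] for c in base_clusterings) / m
             for j in range(n)] for i in range(n)]

def consensus(base_clusterings, threshold=0.5):
    """Consensus clustering: connected components over the pairs whose
    co-association exceeds the threshold."""
    ca = co_association(base_clusterings)
    n = len(ca)
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] == -1:
            stack = [i]
            while stack:                      # flood-fill one component
                u = stack.pop()
                if labels[u] == -1:
                    labels[u] = cluster
                    stack += [v for v in range(n)
                              if labels[v] == -1 and ca[u][v] > threshold]
            cluster += 1
    return labels

# Three base clusterings of four items; two agree, one partly disagrees.
base = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 0, 1, 0]]
print(consensus(base))  # → [0, 0, 1, 1]
```

The consensus follows the majority of base clusterings, which illustrates the robustness claim: a single deviating base clustering does not change the final partition.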

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 166
24682 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data

Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim

Abstract:

Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. The focus of this study is a method that trains on person trip surveys to supplement smart-card data, which do not contain the purpose of each trip. We selected only the features available from smart-card data, such as spatiotemporal information on the trip and geographic information system (GIS) data near the stations, to train on the survey data. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train on data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and achieves very fast execution times with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose; GIS data near the station and the duration of stay at the destination were significant features in modeling trip purpose.
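The kind of input features described, boarding time and duration of stay, can be derived from raw tap-in/tap-out records as sketched below (hypothetical record layout; the study's actual schema and GIS features are not given in the abstract):

```python
from datetime import datetime

def trip_features(tap_out, next_tap_in):
    """Spatiotemporal features for one activity between two trips:
    arrival hour and duration of stay at the destination, in minutes."""
    arrive = datetime.fromisoformat(tap_out)
    leave = datetime.fromisoformat(next_tap_in)
    stay_min = (leave - arrive).total_seconds() / 60
    return {"arrival_hour": arrive.hour, "stay_minutes": stay_min}

# A rider arrives at 08:55 and taps in again at 18:10.
feats = trip_features("2024-05-13T08:55:00", "2024-05-13T18:10:00")
print(feats)  # → {'arrival_hour': 8, 'stay_minutes': 555.0}
```

Feature vectors of this form, augmented with GIS attributes near the station, are what the gradient-boosted classifier would consume; a long morning-to-evening stay like this one is the kind of pattern the abstract reports as predictive of trip purpose.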

Keywords: activity pattern, data fusion, smart card, XGBoost

Procedia PDF Downloads 227