Search results for: ERA-5 analysis data
41526 The Importance of including All Data in a Linear Model for the Analysis of RNAseq Data
Authors: Roxane A. Legaie, Kjiana E. Schwab, Caroline E. Gargett
Abstract:
Studies looking at the changes in gene expression from RNAseq data often make use of linear models. It is also common practice to focus on a subset of data for a comparison of interest, leaving aside the samples not involved in this particular comparison. This work shows the importance of including all observations in the modeling process to better estimate variance parameters, even when the samples included are not directly used in the comparison under test. The human endometrium is a dynamic tissue, which undergoes cycles of growth and regression with each menstrual cycle. The mesenchymal stem cells (MSCs) present in the endometrium are likely responsible for this remarkable regenerative capacity. However recent studies suggest that MSCs also plays a role in the pathogenesis of endometriosis, one of the most common medical conditions affecting the lower abdomen in women in which the endometrial tissue grows outside the womb. In this study we compared gene expression profiles between MSCs and non-stem cell counterparts (‘non-MSC’) obtained from women with (‘E’) or without (‘noE’) endometriosis from RNAseq. Raw read counts were used for differential expression analysis using a linear model with the limma-voom R package, including either all samples in the study or only the samples belonging to the subset of interest (e.g. for the comparison ‘E vs noE in MSC cells’, including only MSC samples from E and noE patients but not the non-MSC ones). Using the full dataset we identified about 100 differentially expressed (DE) genes between E and noE samples in MSC samples (adj.p-val < 0.05 and |logFC|>1) while only 9 DE genes were identified when using only the subset of data (MSC samples only). Important genes known to be involved in endometriosis such as KLF9 and RND3 were missed in the latter case. When looking at the MSC vs non-MSC cells comparison, the linear model including all samples identified 260 genes for noE samples (including the stem cell marker SUSD2) while the subset analysis did not identify any DE genes. When looking at E samples, 12 genes were identified with the first approach and only 1 with the subset approach. Although the stem cell marker RGS5 was found in both cases, the subset test missed important genes involved in stem cell differentiation such as NOTCH3 and other potentially related genes to be used for further investigation and pathway analysis.Keywords: differential expression, endometriosis, linear model, RNAseq
Procedia PDF Downloads 43241525 Development of a Data Security Model Using Steganography
Authors: Terungwa Simon Yange, Agana Moses A.
Abstract:
This paper studied steganography and designed a simplistic approach to a steganographic tool for hiding information in image files with the view of addressing the security challenges with data by hiding data from unauthorized users to improve its security. The Structured Systems Analysis and Design Method (SSADM) was used in this work. The system was developed using Java Development Kit (JDK) 1.7.0_10 and MySQL Server as its backend. The system was tested with some hypothetical health records which proved the possibility of protecting data from unauthorized users by making it secret so that its existence cannot be easily recognized by fraudulent users. It further strengthens the confidentiality of patient records kept by medical practitioners in the health setting. In conclusion, this work was able to produce a user friendly steganography software that is very fast to install and easy to operate to ensure privacy and secrecy of sensitive data. It also produced an exact copy of the original image and the one carrying the secret message when compared with each.Keywords: steganography, cryptography, encryption, decryption, secrecy
Procedia PDF Downloads 26741524 Time Series Analysis the Case of China and USA Trade Examining during Covid-19 Trade Enormity of Abnormal Pricing with the Exchange rate
Authors: Md. Mahadi Hasan Sany, Mumenunnessa Keya, Sharun Khushbu, Sheikh Abujar
Abstract:
Since the beginning of China's economic reform, trade between the U.S. and China has grown rapidly, and has increased since China's accession to the World Trade Organization in 2001. The US imports more than it exports from China, reducing the trade war between China and the U.S. for the 2019 trade deficit, but in 2020, the opposite happens. In international and U.S. trade, Washington launched a full-scale trade war against China in March 2016, which occurred a catastrophic epidemic. The main goal of our study is to measure and predict trade relations between China and the U.S., before and after the arrival of the COVID epidemic. The ML model uses different data as input but has no time dimension that is present in the time series models and is only able to predict the future from previously observed data. The LSTM (a well-known Recurrent Neural Network) model is applied as the best time series model for trading forecasting. We have been able to create a sustainable forecasting system in trade between China and the US by closely monitoring a dataset published by the State Website NZ Tatauranga Aotearoa from January 1, 2015, to April 30, 2021. Throughout the survey, we provided a 180-day forecast that outlined what would happen to trade between China and the US during COVID-19. In addition, we have illustrated that the LSTM model provides outstanding outcome in time series data analysis rather than RFR and SVR (e.g., both ML models). The study looks at how the current Covid outbreak affects China-US trade. As a comparative study, RMSE transmission rate is calculated for LSTM, RFR and SVR. From our time series analysis, it can be said that the LSTM model has given very favorable thoughts in terms of China-US trade on the future export situation.Keywords: RFR, China-U.S. trade war, SVR, LSTM, deep learning, Covid-19, export value, forecasting, time series analysis
Procedia PDF Downloads 19841523 In-Depth Analysis of Involved Factors to Car-Motorcycle Accidents in Budapest City
Authors: Danish Farooq, Janos Juhasz
Abstract:
Car-motorcycle accidents have been observed higher in recent years, which caused mainly riders’ fatalities and serious injuries. In-depth crash investigation methods aim to investigate the main factors which are likely involved in fatal road accidents and injury outcomes. The main objective of this study is to investigate the involved factors in car-motorcycle accidents in Budapest city. The procedure included statistical analysis and data sampling to identify car-motorcycle accidents by dominant accident types based on collision configurations. The police report was used as a data source for specified accidents, and simulation models were plotted according to scale (M 1:200). Car-motorcycle accidents were simulated in Virtual Crash software for 5 seconds before the collision. The simulation results showed that the main involved factors to car-motorcycle accidents were human behavior and view obstructions. The comprehensive, in-depth analysis also found that most of the car drivers and riders were unable to perform collision avoidance manoeuvres before the collision. This study can help the traffic safety authorities to focus on simulated involved factors to solve road safety issues in car-motorcycle accidents. The study also proposes safety measures to improve safe movements among road users.Keywords: car motorcycle accidents, in-depth analysis, microscopic simulation, safety measures
Procedia PDF Downloads 15141522 Comparative Study between Herzberg’s and Maslow’s Theories in Maritime Transport Education
Authors: Nermin Mahmoud Gohar, Aisha Tarek Noour
Abstract:
Learner satisfaction has been a vital field of interest in the literature. Accordingly, the paper will explore the reasons behind individual differences in motivation and satisfaction. This study examines the effect of both; Herzberg’s and Maslow’s theories on learners satisfaction. A self-administered questionnaire was used to collect data from learners who were geographically widely spread around the College of Maritime Transport and Technology (CMTT) at the Arab Academy for Science, Technology and Maritime Transport (AAST&MT) in Egypt. One hundred and fifty undergraduates responded to a questionnaire survey. Respondents were drawn from two branches in Alexandria and Port Said. The data analysis used was SPSS 22 and AMOS 18. Factor analysis technique was used to find out the dimensions under study verified by Herzberg’s and Maslow’s theories. In addition, regression analysis and structural equation modeling were applied to find the effect of the above-mentioned theories on maritime transport learners’ satisfaction. Concerning the limitation of this study, it used the available number of learners in the CMTT due to the relatively low population in this field.Keywords: motivation, satisfaction, needs, education, Herzberg’s and Maslow’s theories
Procedia PDF Downloads 43641521 The Various Legal Dimensions of Genomic Data
Authors: Amy Gooden
Abstract:
When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.Keywords: artificial intelligence, data, law, genomics, rights
Procedia PDF Downloads 14041520 Variance-Aware Routing and Authentication Scheme for Harvesting Data in Cloud-Centric Wireless Sensor Networks
Authors: Olakanmi Oladayo Olufemi, Bamifewe Olusegun James, Badmus Yaya Opeyemi, Adegoke Kayode
Abstract:
The wireless sensor network (WSN) has made a significant contribution to the emergence of various intelligent services or cloud-based applications. Most of the time, these data are stored on a cloud platform for efficient management and sharing among different services or users. However, the sensitivity of the data makes them prone to various confidentiality and performance-related attacks during and after harvesting. Various security schemes have been developed to ensure the integrity and confidentiality of the WSNs' data. However, their specificity towards particular attacks and the resource constraint and heterogeneity of WSNs make most of these schemes imperfect. In this paper, we propose a secure variance-aware routing and authentication scheme with two-tier verification to collect, share, and manage WSN data. The scheme is capable of classifying WSN into different subnets, detecting any attempt of wormhole and black hole attack during harvesting, and enforcing access control on the harvested data stored in the cloud. The results of the analysis showed that the proposed scheme has more security functionalities than other related schemes, solves most of the WSNs and cloud security issues, prevents wormhole and black hole attacks, identifies the attackers during data harvesting, and enforces access control on the harvested data stored in the cloud at low computational, storage, and communication overheads.Keywords: data block, heterogeneous IoT network, data harvesting, wormhole attack, blackhole attack access control
Procedia PDF Downloads 8541519 Restricted Boltzmann Machines and Deep Belief Nets for Market Basket Analysis: Statistical Performance and Managerial Implications
Authors: H. Hruschka
Abstract:
This paper presents the first comparison of the performance of the restricted Boltzmann machine and the deep belief net on binary market basket data relative to binary factor analysis and the two best-known topic models, namely Dirichlet allocation and the correlated topic model. This comparison shows that the restricted Boltzmann machine and the deep belief net are superior to both binary factor analysis and topic models. Managerial implications that differ between the investigated models are treated as well. The restricted Boltzmann machine is defined as joint Boltzmann distribution of hidden variables and observed variables (purchases). It comprises one layer of observed variables and one layer of hidden variables. Note that variables of the same layer are not connected. The comparison also includes deep belief nets with three layers. The first layer is a restricted Boltzmann machine based on category purchases. Hidden variables of the first layer are used as input variables by the second-layer restricted Boltzmann machine which then generates second-layer hidden variables. Finally, in the third layer hidden variables are related to purchases. A public data set is analyzed which contains one month of real-world point-of-sale transactions in a typical local grocery outlet. It consists of 9,835 market baskets referring to 169 product categories. This data set is randomly split into two halves. One half is used for estimation, the other serves as holdout data. Each model is evaluated by the log likelihood for the holdout data. Performance of the topic models is disappointing as the holdout log likelihood of the correlated topic model – which is better than Dirichlet allocation - is lower by more than 25,000 compared to the best binary factor analysis model. On the other hand, binary factor analysis on its own is clearly surpassed by both the restricted Boltzmann machine and the deep belief net whose holdout log likelihoods are higher by more than 23,000. Overall, the deep belief net performs best. We also interpret hidden variables discovered by binary factor analysis, the restricted Boltzmann machine and the deep belief net. Hidden variables characterized by the product categories to which they are related differ strongly between these three models. To derive managerial implications we assess the effect of promoting each category on total basket size, i.e., the number of purchased product categories, due to each category's interdependence with all the other categories. The investigated models lead to very different implications as they disagree about which categories are associated with higher basket size increases due to a promotion. Of course, recommendations based on better performing models should be preferred. The impressive performance advantages of the restricted Boltzmann machine and the deep belief net suggest continuing research by appropriate extensions. To include predictors, especially marketing variables such as price, seems to be an obvious next step. It might also be feasible to take a more detailed perspective by considering purchases of brands instead of purchases of product categories.Keywords: binary factor analysis, deep belief net, market basket analysis, restricted Boltzmann machine, topic models
Procedia PDF Downloads 20241518 Meta Model for Optimum Design Objective Function of Steel Frames Subjected to Seismic Loads
Authors: Salah R. Al Zaidee, Ali S. Mahdi
Abstract:
Except for simple problems of statically determinate structures, optimum design problems in structural engineering have implicit objective functions where structural analysis and design are essential within each searching loop. With these implicit functions, the structural engineer is usually enforced to write his/her own computer code for analysis, design, and searching for optimum design among many feasible candidates and cannot take advantage of available software for structural analysis, design, and searching for the optimum solution. The meta-model is a regression model used to transform an implicit objective function into objective one and leads in turn to decouple the structural analysis and design processes from the optimum searching process. With the meta-model, well-known software for structural analysis and design can be used in sequence with optimum searching software. In this paper, the meta-model has been used to develop an explicit objective function for plane steel frames subjected to dead, live, and seismic forces. Frame topology is assumed as predefined based on architectural and functional requirements. Columns and beams sections and different connections details are the main design variables in this study. Columns and beams are grouped to reduce the number of design variables and to make the problem similar to that adopted in engineering practice. Data for the implicit objective function have been generated based on analysis and assessment for many design proposals with CSI SAP software. These data have been used later in SPSS software to develop a pure quadratic nonlinear regression model for the explicit objective function. Good correlations with a coefficient, R2, in the range from 0.88 to 0.99 have been noted between the original implicit functions and the corresponding explicit functions generated with meta-model.Keywords: meta-modal, objective function, steel frames, seismic analysis, design
Procedia PDF Downloads 24541517 Comparative Analysis of Effecting Factors on Fertility by Birth Order: A Hierarchical Approach
Authors: Ali Hesari, Arezoo Esmaeeli
Abstract:
Regarding to dramatic changes of fertility and higher order births during recent decades in Iran, access to knowledge about affecting factors on different birth orders has crucial importance. In this study, According to hierarchical structure of many of social sciences data and the effect of variables of different levels of social phenomena that determine different birth orders in 365 days ending to 1390 census have been explored by multilevel approach. In this paper, 2% individual row data for 1390 census is analyzed by HLM software. Three different hierarchical linear regression models are estimated for data analysis of the first and second, third, fourth and more birth order. Research results displays different outcomes for three models. Individual level variables entered in equation are; region of residence (rural/urban), age, educational level and labor participation status and province level variable is GDP per capita. Results show that individual level variables have different effects in these three models and in second level we have different random and fixed effects in these models.Keywords: fertility, birth order, hierarchical approach, fixe effects, random effects
Procedia PDF Downloads 33941516 A Review Paper on Data Mining and Genetic Algorithm
Authors: Sikander Singh Cheema, Jasmeen Kaur
Abstract:
In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining
Procedia PDF Downloads 59341515 Gender Analysis of the Influence of Sources of Information on the Adoption of Tenera Oil Palm Technology among Smallholder Farmers in Edo State, Nigeria
Authors: Cornelius Michael Ekenta
Abstract:
The research made a gender comparative analysis of the influence of sources of information on the adoption of tenera improved oil palm technology. Purposive, stratified and random sampling techniques were used to sample a total of 292 farmers (155 males and 137 females) for the study. Structured questionnaire was used to obtain primary data used for analysis. Obtained data were analyzed with descriptive statistics and Logit regressions analysis. Findings revealed that radio, extension office, television and farmers’ group were the most preferred sources of information by the farmers both male and female. Also, males perceived information from radio (92%) and farmers’ group (84%) to be available and information from Research Institutes as credible (95%). Similarly, the female perceived information from Research Institutes to be reliable (70%). The study showed that 38% of men adopted the variety, 25% of the women adopted the variety while 32% of both men and women adopted the variety in the study area. Regressions analysis indicated that radio, extension office, television, farmers’ group and research institute were significant at 0.5% of probability for men and female farmers. The study concluded that the adoption of tenera improved oil palm technology was low among male and female farmers though men adopted more than the women. It was recommended therefore that Agricultural Development Programme (ADP) in other states of the country should partner with their state radio and television stations to broadcast agricultural programmes periodically to ensure efficient dissemination of agricultural information to the farmers.Keywords: analysis, Edo, gender, influence, information, sources, tenera
Procedia PDF Downloads 11341514 One of the Missing Pieces of Inclusive Education: Sexual Orientations
Authors: Sıla Uzkul
Abstract:
As a requirement of human rights and children's rights, the basic condition of inclusive education is that it covers all children. However, the reforms made in the context of education in Turkey and around the world include a limited level of inclusiveness. Generally, the inclusiveness mentioned is for individuals who need special education. Educational reforms superficially state that differences are tolerated, but these differences are extremely limited and often do not include sexual orientation. When we look at the education modules of the Ministry of National Education within the scope of inclusive education in Turkey, there are children with special needs, bilingual children, children exposed to violence, children under temporary protection, children affected by migration and terrorism, and children affected by natural disasters. No training modules or inclusion terms regarding sexual orientations could be found. This research aimed to understand the perspectives of research assistants working in the preschool education department regarding sexual orientations within the scope of inclusive education. Six research assistants working in the preschool teaching department at a public university in Ankara (Turkey) participated in this qualitative research study. Participants were determined by typical case sampling, which is one of the purposeful sampling methods. The data of this research was obtained through a "survey consisting of open-ended questions". Raw data from the surveys were analyzed and interpreted using the "content analysis technique" (Yıldırım & Şimşek, 2005). During the data analysis process, the data from the participants were first numbered, then all the data were read, and content analysis was performed, and possible themes, categories, and codes were extracted. The opinions of the participants in the research regarding sexual orientations in inclusive education are presented under three main headings within the scope of the research questions. These are: (a) their views on inclusive education, (b) their views on sexual orientations (c) their views on sexual orientations in the preschool period.Keywords: sexual orientation, inclusive education, child rights, preschool education
Procedia PDF Downloads 6541513 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring
Authors: Seung-Lock Seo
Abstract:
This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.Keywords: data mining, process data, monitoring, safety, industrial processes
Procedia PDF Downloads 40141512 Prediction of Anticancer Potential of Curcumin Nanoparticles by Means of Quasi-Qsar Analysis Using Monte Carlo Method
Authors: Ruchika Goyal, Ashwani Kumar, Sandeep Jain
Abstract:
The experimental data for anticancer potential of curcumin nanoparticles was calculated by means of eclectic data. The optimal descriptors were examined using Monte Carlo method based CORAL SEA software. The statistical quality of the model is following: n = 14, R² = 0.6809, Q² = 0.5943, s = 0.175, MAE = 0.114, F = 26 (sub-training set), n =5, R²= 0.9529, Q² = 0.7982, s = 0.086, MAE = 0.068, F = 61, Av Rm² = 0.7601, ∆R²m = 0.0840, k = 0.9856 and kk = 1.0146 (test set) and n = 5, R² = 0.6075 (validation set). This data can be used to build predictive QSAR models for anticancer activity.Keywords: anticancer potential, curcumin, model, nanoparticles, optimal descriptors, QSAR
Procedia PDF Downloads 32041511 A Fault Analysis Cracked-Rotor-to-Stator Rub and Unbalance by Vibration Analysis Technique
Authors: B. X. Tchomeni, A. A. Alugongo, L. M. Masu
Abstract:
An analytical 4-DOF nonlinear model of a de Laval rotor-stator system based on Energy Principles has been used theoretically and experimentally to investigate fault symptoms in a rotating system. The faults, namely rotor-stator-rub, crack and unbalance are modelled as excitations on the rotor shaft. Mayes steering function is used to simulate the breathing behaviour of the crack. The fault analysis technique is based on waveform signal, orbits and Fast Fourier Transform (FFT) derived from simulated and real measured signals. Simulated and experimental results manifest considerable mutual resemblance of elliptic-shaped orbits and FFT for a same range of test data.Keywords: a breathing crack, fault, FFT, nonlinear, orbit, rotor-stator rub, vibration analysis
Procedia PDF Downloads 30941510 An Online Adaptive Thresholding Method to Classify Google Trends Data Anomalies for Investor Sentiment Analysis
Authors: Duygu Dere, Mert Ergeneci, Kaan Gokcesu
Abstract:
Google Trends data has gained increasing popularity in the applications of behavioral finance, decision science and risk management. Because of Google’s wide range of use, the Trends statistics provide significant information about the investor sentiment and intention, which can be used as decisive factors for corporate and risk management fields. However, an anomaly, a significant increase or decrease, in a certain query cannot be detected by the state of the art applications of computation due to the random baseline noise of the Trends data, which is modelled as an Additive white Gaussian noise (AWGN). Since through time, the baseline noise power shows a gradual change an adaptive thresholding method is required to track and learn the baseline noise for a correct classification. To this end, we introduce an online method to classify meaningful deviations in Google Trends data. Through extensive experiments, we demonstrate that our method can successfully classify various anomalies for plenty of different data.Keywords: adaptive data processing, behavioral finance , convex optimization, online learning, soft minimum thresholding
Procedia PDF Downloads 16941509 The Trigger-DAQ System in the Mu2e Experiment
Authors: Antonio Gioiosa, Simone Doanti, Eric Flumerfelt, Luca Morescalchi, Elena Pedreschi, Gianantonio Pezzullo, Ryan A. Rivera, Franco Spinella
Abstract:
The Mu2e experiment at Fermilab aims to measure the charged-lepton flavour violating neutrino-less conversion of a negative muon into an electron in the field of an aluminum nucleus. With the expected experimental sensitivity, Mu2e will improve the previous limit of four orders of magnitude. The Mu2e data acquisition (DAQ) system provides hardware and software to collect digitized data from the tracker, calorimeter, cosmic ray veto, and beam monitoring systems. Mu2e’s trigger and data acquisition system (TDAQ) uses otsdaq as its solution. developed at Fermilab, otsdaq uses the artdaq DAQ framework and art analysis framework, under-the-hood, for event transfer, filtering, and processing. Otsdaq is an online DAQ software suite with a focus on flexibility and scalability while providing a multi-user, web-based interface accessible through the Chrome or Firefox web browser. The detector read out controller (ROC) from the tracker and calorimeter stream out zero-suppressed data continuously to the data transfer controller (DTC). Data is then read over the PCIe bus to a software filter algorithm that selects events which are finally combined with the data flux that comes from a cosmic ray veto system (CRV).Keywords: trigger, daq, mu2e, Fermilab
Procedia PDF Downloads 15541508 The Influence of Job Recognition and Job Motivation on Organizational Commitment in Public Sector: The Mediation Role of Employee Engagement
Authors: Muhammad Tayyab, Saba Saira
Abstract:
It is an established fact that organizations across the globe consider employees as their assets and try to advance their well-being. However, the local firms of developing countries are mostly profit oriented and do not have much concern about their employees’ engagement or commitment. Like other developing countries, the local organizations of Pakistan are also less concerned about the well-being of their employees. Especially public sector organizations lack concern regarding engagement, satisfaction or commitment of the employees. Therefore, this study aimed at investigating the impact of job recognition and job motivation on organizational commitment in the mediation role of employee engagement. The data were collected from land record officers of board of revenue, Punjab, Pakistan. Structured questionnaire was used to collect data through physically visiting land record officers and also through the internet. A total of 318 land record officers’ responses were finalized to perform data analysis. The data were analyzed through confirmatory factor analysis and structural equation modeling technique. The findings revealed that job recognition and job motivation have direct as well as indirect positive and significant impact on organizational commitment. The limitations, practical implications and future research indications are also explained.Keywords: job motivation, job recognition, employee engagement, employee commitment, public sector, land record officers
Procedia PDF Downloads 13641507 Extracting Opinions from Big Data of Indonesian Customer Reviews Using Hadoop MapReduce
Authors: Veronica S. Moertini, Vinsensius Kevin, Gede Karya
Abstract:
Customer reviews have been collected by many kinds of e-commerce websites selling products, services, hotel rooms, tickets and so on. Each website collects its own customer reviews. The reviews can be crawled, collected from those websites and stored as big data. Text analysis techniques can be used to analyze that data to produce summarized information, such as customer opinions. Then, these opinions can be published by independent service provider websites and used to help customers in choosing the most suitable products or services. As the opinions are analyzed from big data of reviews originated from many websites, it is expected that the results are more trusted and accurate. Indonesian customers write reviews in Indonesian language, which comes with its own structures and uniqueness. We found that most of the reviews are expressed with “daily language”, which is informal, do not follow the correct grammar, have many abbreviations and slangs or non-formal words. Hadoop is an emerging platform aimed for storing and analyzing big data in distributed systems. A Hadoop cluster consists of master and slave nodes/computers operated in a network. Hadoop comes with distributed file system (HDFS) and MapReduce framework for supporting parallel computation. However, MapReduce has weakness (i.e. inefficient) for iterative computations, specifically, the cost of reading/writing data (I/O cost) is high. Given this fact, we conclude that MapReduce function is best adapted for “one-pass” computation. In this research, we develop an efficient technique for extracting or mining opinions from big data of Indonesian reviews, which is based on MapReduce with one-pass computation. In designing the algorithm, we avoid iterative computation and instead adopt a “look up table” technique. The stages of the proposed technique are: (1) Crawling the data reviews from websites; (2) cleaning and finding root words from the raw reviews; (3) computing the frequency of the meaningful opinion words; (4) analyzing customers sentiments towards defined objects. The experiments for evaluating the performance of the technique were conducted on a Hadoop cluster with 14 slave nodes. The results show that the proposed technique (stage 2 to 4) discovers useful opinions, is capable of processing big data efficiently and scalable.Keywords: big data analysis, Hadoop MapReduce, analyzing text data, mining Indonesian reviews
Procedia PDF Downloads 20141506 Graph Based Traffic Analysis and Delay Prediction Using a Custom Built Dataset
Authors: Gabriele Borg, Alexei Debono, Charlie Abela
Abstract:
There on a constant rise in the availability of high volumes of data gathered from multiple sources, resulting in an abundance of unprocessed information that can be used to monitor patterns and trends in user behaviour. Similarly, year after year, Malta is also constantly experiencing ongoing population growth and an increase in mobilization demand. This research takes advantage of data which is continuously being sourced and converting it into useful information related to the traffic problem on the Maltese roads. The scope of this paper is to provide a methodology to create a custom dataset (MalTra - Malta Traffic) compiled from multiple participants from various locations across the island to identify the most common routes taken to expose the main areas of activity. This use of big data is seen being used in various technologies and is referred to as ITSs (Intelligent Transportation Systems), which has been concluded that there is significant potential in utilising such sources of data on a nationwide scale. Furthermore, a series of traffic prediction graph neural network models are conducted to compare MalTra to large-scale traffic datasets.Keywords: graph neural networks, traffic management, big data, mobile data patterns
Procedia PDF Downloads 13341505 Meanings and Concepts of Standardization in Systems Medicine
Authors: Imme Petersen, Wiebke Sick, Regine Kollek
Abstract:
In systems medicine, high-throughput technologies produce large amounts of data on different biological and pathological processes, including (disturbed) gene expressions, metabolic pathways and signaling. The large volume of data of different types, stored in separate databases and often located at different geographical sites have posed new challenges regarding data handling and processing. Tools based on bioinformatics have been developed to resolve the upcoming problems of systematizing, standardizing and integrating the various data. However, the heterogeneity of data gathered at different levels of biological complexity is still a major challenge in data analysis. To build multilayer disease modules, large and heterogeneous data of disease-related information (e.g., genotype, phenotype, environmental factors) are correlated. Therefore, a great deal of attention in systems medicine has been put on data standardization, primarily to retrieve and combine large, heterogeneous datasets into standardized and incorporated forms and structures. However, this data-centred concept of standardization in systems medicine is contrary to the debate in science and technology studies (STS) on standardization that rather emphasizes the dynamics, contexts and negotiations of standard operating procedures. Based on empirical work on research consortia that explore the molecular profile of diseases to establish systems medical approaches in the clinic in Germany, we trace how standardized data are processed and shaped by bioinformatics tools, how scientists using such data in research perceive such standard operating procedures and which consequences for knowledge production (e.g. modeling) arise from it. Hence, different concepts and meanings of standardization are explored to get a deeper insight into standard operating procedures not only in systems medicine, but also beyond.Keywords: data, science and technology studies (STS), standardization, systems medicine
Procedia PDF Downloads 34241504 Discourse Analysis of the Concept of Citizenship in Textbooks in Iran
Authors: Jafar Ahmadi
Abstract:
This research has been done as a discourse analysis of the concept of citizenship in textbooks in Iran. The purpose of this study is to identify the dominant citizenship discourse in textbooks in the content of textbooks. The research method in this research is qualitative and qualitative content analysis. The statistical sample was selected in a purposeful manner and according to the research topic of books related to Persian literature, religious education and social education. The selected theoretical framework of this research is the three theories of citizenship (pre-modern, modern and postmodern). For each of these discourses, components and indicators have been extracted that are the basis of data analysis. The research findings show that the dominant citizenship discourse on the content of Iranian textbooks is pre-modern discourse and is the basis of this type of religious citizenship discourse. Finally, the findings show that the government uses the institution of education to reproduce its power.Keywords: citizenship, textbooks, discourse analysis, religious citizenship, representation
Procedia PDF Downloads 19941503 Effects of Elastic, Plyometric and Strength Training on Selected Anaerobic Factors in Sanandaj Elite Volleyball Players
Authors: Majed Zobairy, Fardin Kalvandi, Kamal Azizbaigi
Abstract:
This research was carried out for evaluation of elastic, plyometric and resistance training on selected anaerobic factors in men volleyball players. For these reason 30 elite volleyball players of Sanandaj city randomly divided into 3 groups as follow: elastic training, plyometric training and resistance training. Pre-exercise tests which include vertical jumping, 50 yard speed running and scat test were done and data were recorded. Specific exercise protocol regimen was done for each group and then post-exercise tests again were done. Data analysis showed that there were significant increases in exercise test in each group. One way ANOVA analysis showed that increases in speed records in elastic group were significantly higher than the other groups (p<0/05),based on research data it seems that elastic training can be a useful method and new approach in improving functional test and training regimen.Keywords: elastic training, plyometric training, strength training, anaerobic power
Procedia PDF Downloads 53041502 PaSA: A Dataset for Patent Sentiment Analysis to Highlight Patent Paragraphs
Authors: Renukswamy Chikkamath, Vishvapalsinhji Ramsinh Parmar, Christoph Hewel, Markus Endres
Abstract:
Given a patent document, identifying distinct semantic annotations is an interesting research aspect. Text annotation helps the patent practitioners such as examiners and patent attorneys to quickly identify the key arguments of any invention, successively providing a timely marking of a patent text. In the process of manual patent analysis, to attain better readability, recognising the semantic information by marking paragraphs is in practice. This semantic annotation process is laborious and time-consuming. To alleviate such a problem, we proposed a dataset to train machine learning algorithms to automate the highlighting process. The contributions of this work are: i) we developed a multi-class dataset of size 150k samples by traversing USPTO patents over a decade, ii) articulated statistics and distributions of data using imperative exploratory data analysis, iii) baseline Machine Learning models are developed to utilize the dataset to address patent paragraph highlighting task, and iv) future path to extend this work using Deep Learning and domain-specific pre-trained language models to develop a tool to highlight is provided. This work assists patent practitioners in highlighting semantic information automatically and aids in creating a sustainable and efficient patent analysis using the aptitude of machine learning.Keywords: machine learning, patents, patent sentiment analysis, patent information retrieval
Procedia PDF Downloads 9341501 Investors' Ratio Analysis and the Profitability of Listed Firms: Evidence from Nigeria
Authors: Abisola Akinola, Akinsulere Femi
Abstract:
The stock market has continually been a source of economic development in most developing countries. This study examined the relationship between investors’ ratio analysis and profitability of quoted companies in Nigeria using secondary data obtained from the annual reports of forty-two (42) companies. The study employed the multiple regression technique to analyze the relationship between investors’ ratio analysis (measured by dividend per share and earning per share) and profitability (measured by the return on equity). The results from the analysis show that investors’ ratio analysis, when measured by earnings per share, have a positive and significant impact on profitability. However, the study noted that investors’ ratio analysis, when measured by dividend per share, tend to have a positive impact on profitability but it is statistically insignificant. By implication, investors and other stakeholders that are interested in investing in stocks can predict the earning capacity of listed firms in the stock market.Keywords: dividend per share, earnings per share, profitability, return on equity
Procedia PDF Downloads 13741500 Data Management System for Environmental Remediation
Authors: Elizaveta Petelina, Anton Sizo
Abstract:
Environmental remediation projects deal with a wide spectrum of data, including data collected during site assessment, execution of remediation activities, and environmental monitoring. Therefore, an appropriate data management is required as a key factor for well-grounded decision making. The Environmental Data Management System (EDMS) was developed to address all necessary data management aspects, including efficient data handling and data interoperability, access to historical and current data, spatial and temporal analysis, 2D and 3D data visualization, mapping, and data sharing. The system focuses on support of well-grounded decision making in relation to required mitigation measures and assessment of remediation success. The EDMS is a combination of enterprise and desktop level data management and Geographic Information System (GIS) tools assembled to assist to environmental remediation, project planning, and evaluation, and environmental monitoring of mine sites. EDMS consists of seven main components: a Geodatabase that contains spatial database to store and query spatially distributed data; a GIS and Web GIS component that combines desktop and server-based GIS solutions; a Field Data Collection component that contains tools for field work; a Quality Assurance (QA)/Quality Control (QC) component that combines operational procedures for QA and measures for QC; Data Import and Export component that includes tools and templates to support project data flow; a Lab Data component that provides connection between EDMS and laboratory information management systems; and a Reporting component that includes server-based services for real-time report generation. The EDMS has been successfully implemented for the Project CLEANS (Clean-up of Abandoned Northern Mines). Project CLEANS is a multi-year, multimillion-dollar project aimed at assessing and reclaiming 37 uranium mine sites in northern Saskatchewan, Canada. The EDMS has effectively facilitated integrated decision-making for CLEANS project managers and transparency amongst stakeholders.Keywords: data management, environmental remediation, geographic information system, GIS, decision making
Procedia PDF Downloads 16341499 Development of a Predictive Model to Prevent Financial Crisis
Authors: Tengqin Han
Abstract:
Delinquency has been a crucial factor in economics throughout the years. Commonly seen in credit card and mortgage, it played one of the crucial roles in causing the most recent financial crisis in 2008. In each case, a delinquency is a sign of the loaner being unable to pay off the debt, and thus may cause a lost of property in the end. Individually, one case of delinquency seems unimportant compared to the entire credit system. China, as an emerging economic entity, the national strength and economic strength has grown rapidly, and the gross domestic product (GDP) growth rate has remained as high as 8% in the past decades. However, potential risks exist behind the appearance of prosperity. Among the risks, the credit system is the most significant one. Due to long term and a large amount of balance of the mortgage, it is critical to monitor the risk during the performance period. In this project, about 300,000 mortgage account data are analyzed in order to develop a predictive model to predict the probability of delinquency. Through univariate analysis, the data is cleaned up, and through bivariate analysis, the variables with strong predictive power are detected. The project is divided into two parts. In the first part, the analysis data of 2005 are split into 2 parts, 60% for model development, and 40% for in-time model validation. The KS of model development is 31, and the KS for in-time validation is 31, indicating the model is stable. In addition, the model is further validation by out-of-time validation, which uses 40% of 2006 data, and KS is 33. This indicates the model is still stable and robust. In the second part, the model is improved by the addition of macroeconomic economic indexes, including GDP, consumer price index, unemployment rate, inflation rate, etc. The data of 2005 to 2010 is used for model development and validation. Compared with the base model (without microeconomic variables), KS is increased from 41 to 44, indicating that the macroeconomic variables can be used to improve the separation power of the model, and make the prediction more accurate.Keywords: delinquency, mortgage, model development, model validation
Procedia PDF Downloads 22841498 Nanowire Sensor Based on Novel Impedance Spectroscopy Approach
Authors: Valeriy M. Kondratev, Ekaterina A. Vyacheslavova, Talgat Shugabaev, Alexander S. Gudovskikh, Alexey D. Bolshakov
Abstract:
Modern sensorics imposes strict requirements on the biosensors characteristics, especially technological feasibility, and selectivity. There is a growing interest in the analysis of human health biological markers, which indirectly testifying the pathological processes in the body. Such markers are acids and alkalis produced by the human, in particular - ammonia and hydrochloric acid, which are found in human sweat, blood, and urine, as well as in gastric juice. Biosensors based on modern nanomaterials, especially low dimensional, can be used for this markers detection. Most classical adsorption sensors based on metal and silicon oxides are considered non-selective, because they identically change their electrical resistance (or impedance) under the action of adsorption of different target analytes. This work demonstrates a feasible frequency-resistive method of electrical impedance spectroscopy data analysis. The approach allows to obtain of selectivity in adsorption sensors of a resistive type. The method potential is demonstrated with analyzis of impedance spectra of silicon nanowires in the presence of NH3 and HCl vapors with concentrations of about 125 mmol/L (2 ppm) and water vapor. We demonstrate the possibility of unambiguous distinction of the sensory signal from NH3 and HCl adsorption. Moreover, the method is found applicable for analysis of the composition of ammonia and hydrochloric acid vapors mixture without water cross-sensitivity. Presented silicon sensor can be used to find diseases of the gastrointestinal tract by the qualitative and quantitative detection of ammonia and hydrochloric acid content in biological samples. The method of data analysis can be directly translated to other nanomaterials to analyze their applicability in the field of biosensory.Keywords: electrical impedance spectroscopy, spectroscopy data analysis, selective adsorption sensor, nanotechnology
Procedia PDF Downloads 11441497 A Low-Cost of Foot Plantar Shoes for Gait Analysis
Authors: Zulkifli Ahmad, Mohd Razlan Azizan, Nasrul Hadi Johari
Abstract:
This paper presents a study on development and conducting of a wearable sensor system for gait analysis measurement. For validation, the method of plantar surface measurement by force plate was prepared. In general gait analysis, force plate generally represents a studies about barefoot in whole steps and do not allow analysis of repeating movement step in normal walking and running. The measurements that were usually perform do not represent the whole daily plantar pressures in the shoe insole and only obtain the ground reaction force. The force plate measurement is usually limited a few step and it is done indoor and obtaining coupling information from both feet during walking is not easily obtained. Nowadays, in order to measure pressure for a large number of steps and obtain pressure in each insole part, it could be done by placing sensors within an insole. With this method, it will provide a method for determine the plantar pressures while standing, walking or running of a shoe wearing subject. Inserting pressure sensors in the insole will provide specific information and therefore the point of the sensor placement will result in obtaining the critical part under the insole. In the wearable shoe sensor project, the device consists left and right shoe insole with ten FSR. Arduino Mega was used as a micro-controller that read the analog input from FSR. The analog inputs were transmitted via bluetooth data transmission that gains the force data in real time on smartphone. Blueterm software which is an android application was used as an interface to read the FSR reading on the shoe wearing subject. The subject consist of two healthy men with different age and weight doing test while standing, walking (1.5 m/s), jogging (5 m/s) and running (9 m/s) on treadmill. The data obtain will be saved on the android device and for making an analysis and comparison graph.Keywords: gait analysis, plantar pressure, force plate, earable sensor
Procedia PDF Downloads 454