Search results for: time-to-event data
23035 Proposal Method of Prediction of the Early Stages of Dementia Using IoT and Magnet Sensors
Authors: João Filipe Papel, Tatsuji Munaka
Abstract:
With society aging and the number of elderly people with dementia rising, researchers have been actively studying how to support the elderly in the early stages of dementia, with the objective of giving them a better quality of life and as much independence as possible. To make this possible, most researchers in this field use the Internet of Things (IoT) to monitor the activities of the elderly and assist in performing them. The most common sensor used to monitor these activities is the camera, due to its easy installation and configuration. The other commonly used sensor is the sound sensor. However, privacy must be considered when using these sensors. This research aims to develop a system capable of predicting the early stages of dementia based on monitoring and controlling the elderly's activities of daily living. To make this system possible, two issues need to be addressed. The first is privacy: detecting and monitoring Activities of Daily Living raises a serious privacy concern. One of the purposes of this research is to achieve this detection and monitoring without putting the privacy of the elderly at risk. To that end, the study uses magnet sensors to collect binary data. The second is to use the data collected by monitoring Activities of Daily Living to predict the early stages of dementia. To make this possible, the research team suggests developing a proprietary ontology combined with both data-driven and knowledge-driven approaches.
Keywords: dementia, activity recognition, magnet sensors, ontology, data driven and knowledge driven, IoT, activities of daily living
Procedia PDF Downloads 104
23034 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify the environmental and economic factors that contribute to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case counts. To identify relevant variables and avoid overfitting, a Poisson generalized linear model and regularized regression methods with ridge, lasso, and elastic net penalties were employed. Cross-validation was performed to select the tuning parameters. The proposed methods can automatically identify relevant covariates of disease counts. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.
Keywords: Lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression
Procedia PDF Downloads 137
23033 Graph-Based Semantical Extractive Text Analysis
Authors: Mina Samizadeh
Abstract:
In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous amount of data requires effective computational tools to explore it. This has led to intense and growing interest in the research community in developing computational methods for processing text data. One line of study focuses on condensing text so that a higher level of understanding can be reached in a shorter time. The two important tasks for doing this are keyword extraction and text summarization. In keyword extraction, we are interested in finding the key words of a text, which conveys its general topic. In text summarization, we are interested in producing a short text that includes the important information in the document. The TextRank algorithm, an unsupervised learning method that extends PageRank (the algorithm underlying the Google search engine's ranking of pages), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and return them as a result. However, it neglects the semantic similarity between the different parts. In this work, we improved the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used on its own or as part of generating the summary to overcome coverage problems.
Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis
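The graph-ranking idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the Jaccard word-overlap similarity, the damping factor, and the iteration count are all assumptions. A semantic variant in the spirit of the abstract would swap `overlap_similarity` for a similarity computed on word or sentence embeddings.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def overlap_similarity(a, b):
    """Assumed edge weight: Jaccard overlap between two word sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style iteration on a similarity graph."""
    n = len(sentences)
    words = [tokenize(s) for s in sentences]
    # Symmetric weighted adjacency matrix; no self-loops.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = overlap_similarity(words[i], words[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge (j, i).
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores
```

Sorting sentence indices by score and keeping the top few gives an extractive summary; a sentence that shares no words with the others keeps only the baseline score `(1 - damping) / n`.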
Procedia PDF Downloads 71
23032 One of the Missing Pieces of Inclusive Education: Sexual Orientations
Authors: Sıla Uzkul
Abstract:
As a requirement of human rights and children's rights, the basic condition of inclusive education is that it covers all children. However, the reforms made in the context of education in Turkey and around the world include a limited level of inclusiveness. Generally, the inclusiveness mentioned is for individuals who need special education. Educational reforms superficially state that differences are tolerated, but these differences are extremely limited and often do not include sexual orientation. When we look at the education modules of the Ministry of National Education within the scope of inclusive education in Turkey, there are children with special needs, bilingual children, children exposed to violence, children under temporary protection, children affected by migration and terrorism, and children affected by natural disasters. No training modules or inclusion terms regarding sexual orientations could be found. This research aimed to understand the perspectives of research assistants working in the preschool education department regarding sexual orientations within the scope of inclusive education. Six research assistants working in the preschool teaching department at a public university in Ankara (Turkey) participated in this qualitative research study. Participants were determined by typical case sampling, which is one of the purposeful sampling methods. The data of this research was obtained through a "survey consisting of open-ended questions". Raw data from the surveys were analyzed and interpreted using the "content analysis technique" (Yıldırım & Şimşek, 2005). During the data analysis process, the data from the participants were first numbered, then all the data were read, and content analysis was performed, and possible themes, categories, and codes were extracted. The opinions of the participants in the research regarding sexual orientations in inclusive education are presented under three main headings within the scope of the research questions. 
These are: (a) their views on inclusive education, (b) their views on sexual orientations, and (c) their views on sexual orientations in the preschool period.
Keywords: sexual orientation, inclusive education, child rights, preschool education
Procedia PDF Downloads 63
23031 Discovering the Effects of Meteorological Variables on the Air Quality of Bogota, Colombia, by Data Mining Techniques
Authors: Fabiana Franceschi, Martha Cobo, Manuel Figueredo
Abstract:
Bogotá, the capital of Colombia, is its largest city and one of the most polluted in Latin America due to fast economic growth over the last ten years. Bogotá has been affected by high pollution events that led to high concentrations of PM10 and NO2, exceeding the local 24-hour legal limits (100 and 150 µg/m³, respectively). The most important pollutants in the city are PM10 and PM2.5 (which are associated with respiratory and cardiovascular problems), and it is known that their concentrations in the atmosphere depend on local meteorological factors. Therefore, it is necessary to establish a relationship between the meteorological variables and the concentrations of atmospheric pollutants such as PM10, PM2.5, CO, SO2, NO2 and O3. This study aims to determine the interrelations between meteorological variables and air pollutants in Bogotá using data mining techniques. Data from 13 monitoring stations were collected from the Bogotá Air Quality Monitoring Network for the period 2010-2015. The Principal Component Analysis (PCA) algorithm was applied to obtain primary relations between all the parameters, and afterwards the K-means clustering technique was implemented to corroborate those relations and to find patterns in the data. PCA was also applied on a per-shift basis (morning, afternoon, night and early morning) to check for possible variation in the previous trends, and on a per-year basis to verify that the identified trends persisted throughout the study period. Results demonstrated that wind speed, wind direction, temperature, and NO2 are the most influential factors on PM10 concentrations. Furthermore, it was confirmed that high-humidity episodes increased PM2.5 levels. It was also found that O3 levels are directly proportional to wind speed and radiation, and inversely related to humidity.
Concentrations of SO2 increase with the presence of PM10 and decrease with wind speed and wind direction. The results also showed a decreasing trend in pollutant concentrations over the last five years. In addition, in rainy periods (March-June and September-December) some trends regarding precipitation were stronger. Results obtained with K-means demonstrated that it was possible to find patterns in the data, and they also showed similar conditions and data distribution among the Carvajal, Tunal and Puente Aranda stations, and also between Parque Simon Bolivar and Las Ferias. Applying the same technique per year verified that the aforementioned trends prevailed during the study period. It was concluded that the PCA algorithm is useful for establishing preliminary relationships among variables, and K-means clustering for finding patterns in the data and understanding its distribution. The discovery of patterns in the data allows these clusters to be used as input to an Artificial Neural Network prediction model.
Keywords: air pollution, air quality modelling, data mining, particulate matter
Procedia PDF Downloads 258
23030 Comparative Analysis of Effecting Factors on Fertility by Birth Order: A Hierarchical Approach
Authors: Ali Hesari, Arezoo Esmaeeli
Abstract:
Given the dramatic changes in fertility and higher-order births in Iran during recent decades, knowledge of the factors affecting different birth orders is of crucial importance. In this study, given the hierarchical structure of much social science data, the effects of variables at different levels that determine different birth orders in the 365 days ending at the 1390 census were explored using a multilevel approach. The 2% individual raw data of the 1390 census were analyzed with HLM software. Three hierarchical linear regression models were estimated: one for first and second births, one for third births, and one for fourth and higher-order births. The results differ across the three models. The individual-level variables entered in the equation are region of residence (rural/urban), age, educational level, and labor participation status; the province-level variable is GDP per capita. Results show that the individual-level variables have different effects in the three models, and that at the second level the random and fixed effects also differ across the models.
Keywords: fertility, birth order, hierarchical approach, fixed effects, random effects
Procedia PDF Downloads 339
23029 Axial Load Capacity of Drilled Shafts from In-Situ Test Data at Semani Site, in Albania
Authors: Neritan Shkodrani, Klearta Rrushi, Anxhela Shaha
Abstract:
Generally, the design of the axial load capacity of deep foundations is based on data from field tests such as the SPT (Standard Penetration Test) and the CPT (Cone Penetration Test). This paper reports the results of an axial load capacity analysis of drilled shafts at a construction site at Semani, in Fier county, Fier prefecture, Albania. In this case, the axial load capacity analyses are based on the data of 416 SPT tests and 12 CPTU tests carried out at this construction site using 12 boreholes (10 borings to a depth of 30.0 m and 2 borings to a depth of 80.0 m). The considered foundation widths range from 0.5 m to 2.5 m, and the foundation embedment length is fixed at 25 m. SPT-based analytical methods from Japanese design practice (Building Standard Law of Japan) and the CPT-based analytical methods of Eslami and Fellenius are used to obtain the ultimate axial load capacity of the drilled shafts. The considered drilled shaft (25 m long and 0.5 m to 2.5 m in diameter) is analyzed for the soil conditions of each borehole. The values obtained from the sets of calculations are shown in different charts. The axial load capacity values obtained from SPT and CPTU data are then compared, and some conclusions are drawn about the calculation methods mentioned.
Keywords: deep foundations, drilled shafts, axial load capacity, ultimate load capacity, allowable load capacity, SPT test, CPTU test
Procedia PDF Downloads 104
23028 A Grounded Theory on Marist Spirituality/Charism from the Perspective of the Lay Marists in the Philippines
Authors: Nino M. Pizarro
Abstract:
To the author’s knowledge, despite the written documents about Marist spirituality/charism, no clear theoretical framework has been developed that highlights Marist spirituality/charism from the perspective or lived experience of the lay Marists of St. Marcellin Champagnat. The participants of the study are lay Marist educators from Marist Schools in the Philippines. Since the study seeks to find out the respondents’ own concepts and meanings of Marist spirituality/charism, a qualitative methodology is considered the appropriate approach. In particular, the study will use the qualitative methods of Barney Glaser. The theory will be generated systematically from data collection, coding and analysis through memoing, theoretical sampling, sorting and writing, using the constant comparative method. The data collection method employed in this grounded theory research is the in-depth interview, which is semi-structured and participant driven. Participants will be recruited through purposive snowball sampling. The study aims to develop a theoretical framework that will help the lay Marists deepen their understanding of Marist spirituality/charism and of their vocation as lay partners of the Marist Brothers of the Schools.
Keywords: grounded theory, Lay Marists, lived experience, Marist spirituality/charism
Procedia PDF Downloads 311
23027 Annexing the Strength of Information and Communication Technology (ICT) for Real-time TB Reporting Using TB Situation Room (TSR) in Nigeria: Kano State Experience
Authors: Ibrahim Umar, Ashiru Rajab, Sumayya Chindo, Emmanuel Olashore
Abstract:
INTRODUCTION: Kano is the most populous state in Nigeria and one of the two states with the highest TB burden in the country. The state notifies an average of 8,000+ TB cases quarterly and had the highest yearly notification of all the states in Nigeria from 2020 to 2022. The contribution of the state TB program to the national TB notification varied from 9% to 10% quarterly between the first quarter of 2022 and the second quarter of 2023. The Kano State TB Situation Room is an innovative platform for timely data collection, collation and analysis for informed decision-making in the health system. During the second 2023 National TB Testing Week (NTBTW), the Kano TB program aimed at early TB detection, prevention and treatment. The state TB Situation Room gave the state an avenue for coordination and surveillance through real-time data reporting, review, analysis and use during the NTBTW. OBJECTIVES: To assess the role of an innovative information and communication technology platform for real-time TB reporting during the second National TB Testing Week in Nigeria in 2023, and to showcase the NTBTW data cascade analysis using the TSR as an innovative ICT platform. METHODOLOGY: The state TB program deployed a real-time virtual dashboard for NTBTW reporting, analysis and feedback. A data room team was set up, which received real-time data through a Google link. The data received were analyzed using the Power BI analytics tool with a statistical significance level (alpha) of <0.05. RESULTS: At the end of the week-long activity, using the real-time dashboard with onsite mentorship of the field workers, the state TB program screened a total of 52,054 people for TB out of 72,112 individuals eligible for screening (72% screening rate). A total of 9,910 presumptive TB clients were identified and evaluated for TB, leading to the diagnosis of 445 TB patients (5% yield from presumptives) and the placement of 435 TB patients on treatment (98% enrolment).
CONCLUSION: The TB Situation Room (TSR) has been a great asset to the Kano State TB Control Program in meeting the growing demand for timely data reporting in TB and other global health responses. The use of real-time surveillance data during the 2023 NTBTW has in no small measure improved the TB response and feedback in Kano State. Scaling up this intervention to other disease areas, states and nations is a positive step towards global TB eradication.
Keywords: tuberculosis (tb), national tb testing week (ntbtw), tb situation room (tsr), information communication technology (ict)
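The cascade percentages in the RESULTS can be recomputed from the reported counts. The sketch below is only an arithmetic check; the function and key names are illustrative and do not come from the study's dashboard.

```python
def cascade_rates(eligible, screened, presumptive, diagnosed, enrolled):
    """Return the three cascade ratios as fractions between 0 and 1."""
    return {
        "screening_rate": screened / eligible,
        "yield_from_presumptives": diagnosed / presumptive,
        "enrolment_rate": enrolled / diagnosed,
    }


# Counts as reported in the abstract.
rates = cascade_rates(
    eligible=72112,
    screened=52054,
    presumptive=9910,
    diagnosed=445,
    enrolled=435,
)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
# screening_rate: 72.2%
# yield_from_presumptives: 4.5%
# enrolment_rate: 97.8%
```

The screening and enrolment figures agree with the reported 72% and 98%; the yield works out to about 4.5%, which the abstract gives as 5%.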
Procedia PDF Downloads 71
23026 Density Measurement of Mixed Refrigerants R32+R1234yf and R125+R290 from 0°C to 100°C and at Pressures up to 10 MPa
Authors: Xiaoci Li, Yonghua Huang, Hui Lin
Abstract:
Optimizing the concentration of components in mixed refrigerants can improve either the thermodynamic cycle performance or the safety performance of heat pumps and refrigerators. R32+R1234yf and R125+R290 are two promising binary mixed refrigerants for heat pumps working in cold areas. The p-ρ-T data of these mixtures are among the fundamental properties needed for the design and performance evaluation of the heat pumps. Although the property data of mixtures can be predicted by mixing models based on the pure substances, as incorporated in programs such as the NIST database Refprop, direct property measurement is still helpful to reveal the true state behavior and verify the models. Densities of the R32+R1234yf and R125+R290 mixtures are measured with an Anton Paar DMA-4500 U-shaped oscillating-tube digital densimeter at temperatures from 0 °C to 100 °C and pressures up to 10 MPa. The accuracy of the measurement reaches 0.00005 g/cm³. The experimental data are compared with the predictions of Refprop in the corresponding range of pressure and temperature.
Keywords: mixed refrigerant, density measurement, densimeter, thermodynamic property
Procedia PDF Downloads 296
23025 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting
Authors: Yiannis G. Smirlis
Abstract:
The classification and prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large-scale problems or when new units frequently enter the set under assessment. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals, combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores without applying any DEA models.
Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction
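The grid idea can be illustrated with a small Python sketch. It is not the paper's model: the breakpoints, the function names, and the strict cell-level dominance rule below are assumptions chosen so that dominance between two cells implies dominance between any two units inside them.

```python
from bisect import bisect_right


def cell_index(values, breakpoints):
    """Map a unit's input/output values to the index of its hyper-rectangle.

    `breakpoints[k]` is the sorted list of interval boundaries assumed for
    dimension k; the returned tuple holds one interval index per dimension.
    """
    return tuple(bisect_right(bp, v) for v, bp in zip(values, breakpoints))


def cell_dominates(cell_a, cell_b, n_inputs):
    """Assumed dominance rule between two grid cells.

    Cell A dominates cell B when A lies in a strictly lower interval on every
    input and a strictly higher interval on every output, so any unit in A
    uses less of every input and produces more of every output than any unit
    in B.
    """
    inputs_better = all(
        a < b for a, b in zip(cell_a[:n_inputs], cell_b[:n_inputs])
    )
    outputs_better = all(
        a > b for a, b in zip(cell_a[n_inputs:], cell_b[n_inputs:])
    )
    return inputs_better and outputs_better
```

For example, with one input segmented at 10 and 20 and one output segmented at 5 and 10, a unit with input 8 and output 12 falls in cell `(0, 2)` and a unit with input 25 and output 3 falls in cell `(2, 0)`, so the first cell dominates the second. A new unit landing in a dominated cell could then be flagged as inefficient without solving a DEA model.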
Procedia PDF Downloads 164
23024 Mapping of Traffic Noise in Riyadh City-Saudi Arabia
Authors: Khaled A. Alsaif, Mosaad A. Foda
Abstract:
The present work aims to develop traffic noise maps for Riyadh City using the software LimA. Road traffic data were estimated or measured as accurately as possible in order to obtain consistent noise maps. The predicted noise levels at selected sites are validated against field measurements obtained with a system that consists of a sound level meter, a GPS receiver and a database to manage the measured data. The maps show that noise levels remain over 50 dBA and can exceed 70 dBA at the nearside of major roads and highways.
Keywords: noise pollution, road traffic noise, LimA predictor, GPS
Procedia PDF Downloads 384
23023 The Introduction of a Tourniquet Checklist to Identify and Record Tourniquet Related Complications
Authors: Akash Soogumbur
Abstract:
Tourniquets are commonly used in orthopaedic surgery to provide hemostasis during procedures on the upper and lower limbs. However, there is a risk of complications associated with tourniquet use, such as nerve damage, skin necrosis, and compartment syndrome. The British Orthopaedic Association (BOAST) guidelines recommend the use of tourniquets at a pressure of 300 mmHg or less for a maximum of 2 hours. Research Aim: The aim of this study was to evaluate the effectiveness of a tourniquet checklist in improving compliance with the BOAST guidelines. Methodology: This was a retrospective study of all orthopaedic procedures performed at a single institution over a 12-month period. The study population included patients who had a tourniquet applied during surgery. Data were collected from the patients' medical records, including the duration of tourniquet use, the pressure used, and the method of exsanguination. Findings: The results showed that the use of the tourniquet checklist significantly improved compliance with the BOAST guidelines. Prior to the introduction of the checklist, compliance with the guidelines was 83% for the duration of tourniquet use and 73% for pressure used. After the introduction of the checklist, compliance increased to 100% for both duration of tourniquet use and pressure used. Theoretical Importance: The findings of this study suggest that the use of a tourniquet checklist can be an effective way to improve compliance with the BOAST guidelines. This is important because it can help to reduce the risk of complications associated with tourniquet use. Data Collection: Data were collected from the patients' medical records. The data included the following information: Patient demographics, procedure performed, duration of tourniquet use, pressure used, method of exsanguination. Analysis Procedures: The data were analyzed using descriptive statistics. 
Compliance with the BOAST guidelines was calculated as the percentage of patients who met the guidelines for the duration of tourniquet use and the pressure used. Question Addressed: The question addressed by this study was whether the use of a tourniquet checklist could improve compliance with the BOAST guidelines. Conclusion: The results of this study suggest that the use of a tourniquet checklist can be an effective way to improve compliance with the BOAST guidelines. This is important because it can help to reduce the risk of complications associated with tourniquet use.
Keywords: tourniquet, pressure, duration, complications, surgery
Procedia PDF Downloads 71
23022 Data Analysis for Taxonomy Prediction and Annotation of 16S rRNA Gene Sequences from Metagenome Data
Authors: Suchithra V., Shreedhanya, Kavya Menon, Vidya Niranjan
Abstract:
Skin metagenomics has a wide range of applications with direct relevance to the health of the organism. It gives insight into the diverse community of microorganisms (the microbiome) harbored on the skin. In recent years, it has become increasingly apparent that the interaction between the skin microbiome and the human body plays a prominent role in immune system development, cancer development, disease pathology, and many other biological processes. Next Generation Sequencing has led to a faster and better understanding of environmental organisms and their mutual interactions. This project studies the human skin microbiome of different individuals with varied skin conditions. Bacterial 16S rRNA data of the skin microbiome were downloaded using the SRA Toolkit provided by NCBI to perform metagenomics analysis. Twelve samples were selected, with two controls, across three categories: sex (male/female), skin type (moist/intermittently moist/sebaceous), and occlusion (occluded/intermittently occluded/exposed). Data quality was improved using Cutadapt and assessed using FastQC. USearch, a tool for analyzing NGS data, provides a suitable platform to obtain the taxonomic classification and abundance of bacteria from the metagenome data. The statistical tool used for analyzing the USearch results is METAGENassist. The results revealed that the three most abundant organisms were Prevotella, Corynebacterium, and Anaerococcus. Prevotella is known to be an infectious bacterium found in wounds, tooth cavities, etc. Corynebacterium and Anaerococcus are opportunistic bacteria responsible for skin odor. The results suggest that Prevotella thrives easily in sebaceous skin conditions. Therefore, for the sebaceous skin type it is better to treat wounds with intermittently occluded treatment, such as applying ointments or creams. Exposing the wound should be avoided, as it leads to an increase in Prevotella abundance.
Individuals with a moist skin type can opt for occluded or intermittently occluded treatment, as these have been shown to decrease the abundance of bacteria during treatment.
Keywords: bacterial 16S rRNA, next generation sequencing, skin metagenomics, skin microbiome, taxonomy
Procedia PDF Downloads 172
23021 Development of a Predictive Model to Prevent Financial Crisis
Authors: Tengqin Han
Abstract:
Delinquency has been a crucial factor in economics throughout the years. Commonly seen in credit cards and mortgages, it played a crucial role in causing the most recent financial crisis in 2008. In each case, a delinquency is a sign that the borrower is unable to pay off the debt, which may eventually cause a loss of property. Individually, one case of delinquency seems unimportant compared to the entire credit system. In China, an emerging economy, national and economic strength have grown rapidly, and the gross domestic product (GDP) growth rate has remained as high as 8% in the past decades. However, potential risks exist behind the appearance of prosperity. Among these risks, the credit system is the most significant. Because mortgages have long terms and large balances, it is critical to monitor the risk during the performance period. In this project, data on about 300,000 mortgage accounts are analyzed in order to develop a model that predicts the probability of delinquency. Through univariate analysis, the data are cleaned up, and through bivariate analysis, the variables with strong predictive power are identified. The project is divided into two parts. In the first part, the 2005 analysis data are split into two parts: 60% for model development and 40% for in-time model validation. The KS of model development is 31, and the KS for in-time validation is 31, indicating that the model is stable. In addition, the model is further validated by out-of-time validation, which uses 40% of the 2006 data, and the KS is 33. This indicates the model is still stable and robust. In the second part, the model is improved by the addition of macroeconomic indexes, including GDP, consumer price index, unemployment rate, inflation rate, etc. The data of 2005 to 2010 are used for model development and validation.
Compared with the base model (without macroeconomic variables), KS increased from 41 to 44, indicating that the macroeconomic variables can be used to improve the separation power of the model and make the prediction more accurate.
Keywords: delinquency, mortgage, model development, model validation
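The KS figures quoted above are read here as the Kolmogorov-Smirnov separation statistic commonly used for credit scorecards, on a 0 to 100 scale. The abstract does not spell this out, so this reading and the function below are assumptions: a minimal sketch that computes the maximum gap between the cumulative score distributions of delinquent and non-delinquent accounts.

```python
def ks_statistic(scores, labels):
    """Return the KS separation (0-100) between bad (1) and good (0) accounts.

    `scores` are model outputs and `labels` mark delinquency; both names are
    illustrative. Ties in score are processed together so the result does not
    depend on the input order.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every account that shares the current score.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        gap = abs(cum_bad / n_bad - cum_good / n_good)
        best = max(best, gap)
    return 100 * best
```

Computing this on the development sample, the in-time holdout, and an out-of-time sample, and checking that the values stay close, is the kind of stability comparison the abstract describes.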
Procedia PDF Downloads 228
23020 Self-Supervised Learning for Hate-Speech Identification
Authors: Shrabani Ghosh
Abstract:
Automatic offensive language detection in social media has become a pressing task in today's NLP. Manual offensive language detection is tedious and laborious work, for which automatic methods based on machine learning are the only alternative. Previous works have performed sentiment analysis over social media in supervised, semi-supervised, and unsupervised ways. Semi-supervised domain adaptation has also been explored in NLP, where the source domain and the target domain are different. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers such as BERT and RoBERTa are further pre-trained on masked language modeling (MLM) tasks in an unsupervised manner and fine-tuned to perform text classification. In previous work, hate speech detection has been explored on Gab.ai, a free speech platform described as hosting extremists of varying degrees. In the domain adaptation process, Twitter data is used as the source domain, and Gab data is used as the target domain. The performance of domain adaptation also depends on the cross-domain similarity. Different distance measures such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL have been used to estimate domain similarity. In-domain distances are small, and between-domain distances are expected to be large. The previous work shows that a pretrained masked language model (MLM) fine-tuned with a mixture of posts from the source and target domains gives higher accuracy. However, the in-domain accuracy of the hate classifier on Twitter data is 71.78%, and its out-of-domain accuracy on Gab data drops to 56.53%. Recently, self-supervised learning has received a lot of attention, as it is more applicable when labeled data are scarce.
A few works have already explored applying self-supervised learning to NLP tasks such as sentiment classification. The self-supervised language representation model ALBERT focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. The self-supervised attention learning approach shows better performance, as it exploits extracted context words in the training process. In this work, a self-supervised attention mechanism is proposed to detect hate speech on Gab.ai. The framework first classifies the Gab dataset in an attention-based self-supervised manner. In the next step, a semi-supervised classifier is trained on the combination of labeled data from the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and also with optimized outcomes obtained from different optimization techniques.
Keywords: attention learning, language model, offensive language detection, self-supervised learning
Procedia PDF Downloads 105
23019 Time and Cost Prediction Models for Language Classification Over a Large Corpus on Spark
Authors: Jairson Barbosa Rodrigues, Paulo Romero Martins Maciel, Germano Crispim Vasconcelos
Abstract:
This paper investigates how varying five factors (input data size, node number, cores, memory, and disks) affects performance when a distributed implementation of Naïve Bayes is applied to text classification of a large corpus on the Spark big data processing framework. Problem: The algorithm's performance depends on multiple factors, and knowing the effect of each factor beforehand becomes especially critical because hardware is priced by time slice in cloud environments. Objectives: To explain the functional relationship between factors and performance and to develop linear predictor models for time and cost. Methods: The statistical principles of Design of Experiments (DoE), particularly the randomized two-level fractional factorial design with replications. This research involved 48 real clusters with different hardware arrangements. The metrics were analyzed using linear models for screening, ranking, and measurement of each factor's impact. Results: Our findings include prediction models and show some non-intuitive results: cores have a small influence and memory and disks have a neutral effect on total execution time, and the input data scale has a non-significant impact on cost, although it notably affects execution time.
Keywords: big data, design of experiments, distributed machine learning, natural language processing, spark
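As an illustration of the two-level fractional factorial idea, the sketch below builds a 16-run half fraction for five factors and estimates main effects. The abstract does not state which fraction or generator was used; the generator (fifth factor = product of the other four), the factor names, and the synthetic response are assumptions for the example only.

```python
from itertools import product

# Hypothetical factor names, taken from the five factors listed in the abstract.
FACTORS = ["data_size", "nodes", "cores", "memory", "disks"]

def half_fraction_design():
    """2^(5-1) design: a full factorial in the first four factors,
    with the fifth level set to the product of the other four."""
    return [(a, b, c, d, a * b * c * d)
            for a, b, c, d in product((-1, 1), repeat=4)]

def main_effects(design, responses):
    """Main effect of a factor: mean response at its high level (+1)
    minus mean response at its low level (-1)."""
    effects = {}
    for j, name in enumerate(FACTORS):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects

design = half_fraction_design()
# Synthetic execution times in which only data_size and nodes matter.
times = [10 + 3 * run[0] - 2 * run[1] for run in design]
print(main_effects(design, times))
```

Because the columns of this design are mutually orthogonal, each main effect is estimated independently of the others, which is what makes screening and ranking of factors with a linear model straightforward.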
Procedia PDF Downloads 120
23018 The Developing of Teaching Materials Online for Students in Thailand
Authors: Pitimanus Bunlue
Abstract:
The objectives of this study were to identify the unique characteristics of Salaya Old Market, Phutthamonthon, Nakhon Pathom, and to develop effective video media to promote homeland awareness among local people. The characteristic features of this community were summarized based on historical data, community observation, and interviews with local people. The acquired data were used to develop media describing prominent features of the community. The quality of the media was later assessed by interviewing local people in the old market about content accuracy, video and narration quality, and their sense of homeland awareness after watching the video. As a result, a 6-minute video containing historical data and outstanding features of this community was developed. Based on the interviews, the content accuracy was good, and the picture quality and narration were very good. Most people developed a sense of homeland awareness after watching the video.
Keywords: audio-visual, creating homeland awareness, Phutthamonthon Nakhon Pathom, research and development
Procedia PDF Downloads 291
23017 A Decision Support System for the Detection of Illicit Substance Production Sites
Authors: Krystian Chachula, Robert Nowak
Abstract:
Manufacturing home-made explosives and synthetic drugs is an increasing problem in Europe. To combat it, a data fusion system is proposed for the detection and localization of production sites in urban environments. The data consist of measurements of wastewater properties performed by various sensors installed in a sewage network. A four-stage fusion strategy allows detecting sources of waste products from known chemical reactions. First, suspicious measurements are used to compute the amount and position of discharged compounds. Then, this information is propagated through the sewage network to account for missing sensors. The next step is clustering and the formation of tracks. Finally, tracks are used to reconstruct discharge events. Sensor measurements are simulated by a subsystem based on real-world data. In this paper, different discharge scenarios are considered to show how the parameters of the algorithms used affect the effectiveness of the proposed system. This research is part of the SYSTEM project (SYnergy of integrated Sensors and Technologies for urban sEcured environMent).
Keywords: continuous monitoring, information fusion and sensors, internet of things, multisensor fusion
Procedia PDF Downloads 115
23016 Implementation of CNV-CH Algorithm Using Map-Reduce Approach
Authors: Aishik Deb, Rituparna Sinha
Abstract:
We have developed an algorithm to detect abnormal segments (structural variations) in the genome across a number of samples. We have worked on simulated as well as real data from BAM files and have designed a segmentation algorithm that detects abnormal segments. This algorithm aims to improve the accuracy and performance of the existing CNV-CH algorithm. The next-generation sequencing (NGS) approach is very fast and can generate large sequences in a reasonable time. The resulting volume of sequence information gives rise to the need for Big Data and parallel approaches to segmentation. Therefore, we have designed a map-reduce approach for the existing CNV-CH algorithm in which a large amount of sequence data can be segmented and structural variations in the human genome can be detected. We have compared the efficiency of the traditional and map-reduce algorithms with respect to precision, sensitivity, and F-score. The advantages of our algorithm are that it is fast and more accurate. This algorithm can be applied to detect structural variations within a genome, which in turn can be used to detect genetic disorders such as cancer. The defects may be caused by new mutations or changes to the DNA and generally result in abnormally high or low base coverage and quantification values.
Keywords: cancer detection, convex hull segmentation, map reduce, next generation sequencing
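The evaluation metrics named above (precision, sensitivity, F-score) can themselves be computed in a map-reduce style. The sketch below is only an illustration of that pattern and does not reproduce CNV-CH or its convex hull segmentation; the chunk layout and the (predicted, actual) flag pairs are hypothetical.

```python
from functools import reduce

def map_chunk(chunk):
    """Map step: count (TP, FP, FN) within one chunk of
    (predicted_abnormal, actually_abnormal) flag pairs."""
    tp = sum(1 for pred, actual in chunk if pred and actual)
    fp = sum(1 for pred, actual in chunk if pred and not actual)
    fn = sum(1 for pred, actual in chunk if not pred and actual)
    return (tp, fp, fn)

def reduce_counts(left, right):
    """Reduce step: add two partial (TP, FP, FN) tuples element-wise."""
    return tuple(l + r for l, r in zip(left, right))

def scores(chunks):
    """Combine per-chunk counts into precision, sensitivity, and F-score."""
    tp, fp, fn = reduce(reduce_counts, map(map_chunk, chunks), (0, 0, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    total = precision + sensitivity
    f_score = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f_score

# Two hypothetical chunks of genome bins, as they might be split across workers.
chunks = [
    [(True, True), (True, False)],
    [(False, True), (True, True)],
]
print(scores(chunks))
```

Since the counts are simple sums, the reduce step is associative, so chunks can be processed in any order or in parallel without changing the result.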
Procedia PDF Downloads 136
23015 Inferring Human Mobility in India Using Machine Learning
Authors: Asra Yousuf, Ajaykumar Tannirkulum
Abstract:
Inferring rural-urban migration trends can help design effective policies that promote better urban planning and rural development. In this paper, we describe how machine learning algorithms can be applied to predict people's internal migration decisions. We use data collected from household surveys in Tamil Nadu to train our model. To measure the performance of the model, we use data on past migration from the National Sample Survey Organisation of India. The factors used to train the model include each individual's socioeconomic characteristics, such as age, gender, place of residence, outstanding loans, and strength of the household, as well as their past migration history. We perform a comparative analysis of a number of machine learning algorithms to determine their prediction accuracy. Our results show that machine learning algorithms provide higher prediction accuracy than statistical models. Our goal through this research is to propose the use of data science techniques in understanding human decisions and behaviour in developing countries.
Keywords: development, migration, internal migration, machine learning, prediction
Procedia PDF Downloads 271
23014 Temperament as a Success Determinant in Formative Assessment
Authors: George Fomunyam Kehdinga
Abstract:
Assessment is a vital part of the educational process, and formative assessment is a way of ensuring that higher education achieves the desired effects. Different factors influence how students perform in assessments in general and formative assessment in particular, and temperament is one such determining factor. This paper, a qualitative case study of four universities in four different countries, examines how the temperamental make-up of students either empowers them to perform excellently in formative assessment or incapacitates their performance. The four universities were chosen from Cameroon, South Africa, the United Kingdom, and the United States of America, and three students were chosen from each institution, six of whom were undergraduate students and six postgraduate students. Data in this paper were generated through qualitative interviews and document analyses, which were preceded by a temperament test. From the data generated, it was discovered that cholerics, who are natural leaders and hence do not struggle to express themselves, often perform excellently in formative assessment, while sanguines, who are also extroverts like cholerics, perform relatively well. Phlegmatics and melancholics performed averagely and poorly, respectively, in formative assessment because they are naturally prone to fear and dislike such activities, preferring to keep to themselves. The paper therefore suggests that temperament is a success determinant in formative assessment. It also proposes that lecturers need an understanding of temperaments to be able to fully administer formative assessment in the lecture room. It further suggests that assessment should be balanced in the classroom so that some students are not naturally disadvantaged by their temperamental make-up while others perform excellently. Lastly, the paper suggests that since formative assessment is a process of generating data, it should be contextualised or given an individualised approach so as to ensure that trustworthy data are generated.
Keywords: temperament, formative assessment, academic success, students
Procedia PDF Downloads 248
23013 Effect of Measured and Calculated Static Torque on Instantaneous Torque Profile of Switched Reluctance Motor
Authors: Ali Asghar Memon
Abstract:
The simulation modeling of a switched reluctance (SR) machine often relies on three data tables, identified as static characteristics: flux linkage characteristics, co-energy characteristics, and static torque characteristics. According to the literature, the static torque data used in the simulation model are usually calculated. This paper presents a simulation model that includes the measured and the calculated static torque data separately to see their effect on the instantaneous torque profile of the machine. As far as the literature review shows, this is probably the first time that static torque calculated from co-energy information and static torque measured directly in experiments are used separately in the model. This research is helpful for accurate modeling of switched reluctance drives.
Keywords: static characteristics, current chopping, flux linkage characteristics, switched reluctance motor
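For readers unfamiliar with how static torque is calculated from co-energy, the standard textbook relation is shown below. The notation is an assumption, since the abstract introduces no symbols: $\psi$ is flux linkage, $\theta$ rotor position, $i$ phase current, $W'$ co-energy, and $T$ static torque.

```latex
% Co-energy obtained from the flux linkage characteristics at a fixed rotor position
W'(\theta, i) = \int_{0}^{i} \psi(\theta, \iota)\, \mathrm{d}\iota

% Static torque as the rate of change of co-energy with position at constant current
T(\theta, i) = \left. \frac{\partial W'(\theta, i)}{\partial \theta} \right|_{i = \text{const}}
```

A calculated torque table therefore inherits any error in the flux linkage data and in the numerical differentiation, which is one plausible reason for comparing it with directly measured torque.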
Procedia PDF Downloads 292
23012 Hardware Implementation on Field Programmable Gate Array of Two-Stage Algorithm for Rough Set Reduct Generation
Authors: Tomasz Grzes, Maciej Kopczynski, Jaroslaw Stepaniuk
Abstract:
The rough sets theory developed by Prof. Z. Pawlak is one of the tools that can be used in intelligent systems for data analysis and processing. Banking, medicine, image recognition, and security are among the possible fields of application. In all these fields, the amount of collected data is increasing quickly, and as the data grow, computation speed becomes the critical factor. Data reduction is one solution to this problem. Redundancy in rough sets can be removed with the reduct. Many algorithms for generating the reduct have been developed, but most of them are only software implementations and therefore have many limitations. A microprocessor uses a fixed word length and spends a lot of time on both fetching and processing instructions and data; consequently, software-based implementations are relatively slow. Hardware systems do not have these limitations and can process the data faster than software. A reduct is a subset of the condition attributes that preserves the discernibility of the objects. For a given decision table there can be more than one reduct. The core is the set of all indispensable condition attributes. None of its elements can be removed without affecting the classification power of all condition attributes. Moreover, every reduct contains all the attributes from the core. In this paper, a hardware implementation of the two-stage greedy algorithm for finding one reduct is presented. The decision table is used as the input. The output of the algorithm is the superreduct, which is a reduct with some additional removable attributes. The first stage of the algorithm calculates the core using the discernibility matrix. The second stage generates the superreduct by enriching the core with the most common attributes, i.e., attributes that are more frequent in the decision table. The algorithm described above has two disadvantages: i) it generates the superreduct instead of the reduct, and ii) the additional first stage may be unnecessary if the core is empty. But for systems focused on fast computation of the reduct, the first disadvantage is not the key problem. The core calculation can be achieved with a combinational logic block and thus adds relatively little time to the whole process. The algorithm presented in this paper was implemented in a Field Programmable Gate Array (FPGA) as a digital device consisting of blocks that process the data in a single step. The core is calculated by comparators connected to a block called the 'singleton detector', which detects whether the input word contains only a single 'one'. The number of occurrences of each attribute is calculated in a combinational block made up of a cascade of adders. The superreduct generation process is iterative and thus needs a sequential circuit to control the calculations. For research purposes, the algorithm was also implemented in the C language and run on a PC. The execution times of the reduct calculation in hardware and software were compared. Results show an increase in the speed of data processing.
Keywords: data reduction, digital systems design, field programmable gate array (FPGA), reduct, rough set
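The two stages described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' FPGA or C implementation: discernibility entries are stored as bitmasks, attribute frequency is counted over entries not yet covered (the abstract only says "more frequent in the decision table"), and the tiny decision table is hypothetical.

```python
from itertools import combinations

def discernibility_masks(table):
    """For each pair of objects with different decisions, build a bitmask
    whose bit i is set when condition attribute i differs."""
    masks = []
    for (cond_a, dec_a), (cond_b, dec_b) in combinations(table, 2):
        if dec_a == dec_b:
            continue
        mask = 0
        for i, (x, y) in enumerate(zip(cond_a, cond_b)):
            if x != y:
                mask |= 1 << i
        if mask:
            masks.append(mask)
    return masks

def is_singleton(mask):
    """Software analogue of the 'singleton detector': exactly one bit set."""
    return mask != 0 and (mask & (mask - 1)) == 0

def core(masks):
    """Stage 1: the core is the union of all singleton entries."""
    result = 0
    for mask in masks:
        if is_singleton(mask):
            result |= mask
    return result

def superreduct(masks, n_attrs):
    """Stage 2: greedily add the attribute that occurs most often in the
    entries not yet covered, until every entry is covered."""
    selected = core(masks)
    uncovered = [m for m in masks if (m & selected) == 0]
    while uncovered:
        counts = [sum((m >> i) & 1 for m in uncovered) for i in range(n_attrs)]
        best = max(range(n_attrs), key=lambda i: counts[i])
        selected |= 1 << best
        uncovered = [m for m in uncovered if (m & selected) == 0]
    return selected

# Hypothetical decision table: (condition attribute values, decision).
table = [((0, 0, 0), "no"), ((1, 0, 0), "yes"), ((0, 1, 1), "yes")]
masks = discernibility_masks(table)
print(bin(core(masks)), bin(superreduct(masks, 3)))
```

The bit trick in `is_singleton` mirrors what a combinational block can do in one step, while the `while` loop in `superreduct` corresponds to the iterative part that, per the abstract, needs a sequential control circuit.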
Procedia PDF Downloads 219
23011 Digitally Mapping Aboriginal Journey Ways
Authors: Paul Longley Arthur
Abstract:
This paper reports on an Australian Research Council-funded project utilising the Australian digital research infrastructure the ‘Time-Layered Cultural Map of Australia’ (TLCMap) (https://www.tlcmap.org/) [1]. This resource has been developed to help researchers create digital maps from cultural, textual, and historical data, layered with datasets registered on the platform. TLCMap is a set of online tools that allows humanities researchers to compile humanities data using spatio-temporal coordinates: to upload, gather, analyse, and visualise data. It is the only purpose-designed, Australian-developed research tool for humanities and social science researchers to identify geographical clusters and parallel journeys by sight. This presentation discusses a series of Aboriginal mapping and visualisation experiments using TLCMap to show how Indigenous knowledge can reconfigure contemporary understandings of space, including the urbanised landscape [2, 3]. The research data being generated, which investigate the historical movements of Aboriginal people, the distribution of networks, and their relation to land, lend themselves to mapping and geo-spatial visualisation and analysis. TLCMap allows researchers to create layers on a 3D map that pinpoint locations with accompanying information, and this has enabled our research team to plot traditional historical journeys undertaken by Aboriginal people as well as to compile a gazetteer of Aboriginal place names, many of which have been largely undocumented until now [4]. The documented journeys intersect with and overlay many of today’s urban formations, including main roads, municipal boundaries, and state borders. The paper questions how such data can be incorporated into a more culturally and ethically responsive understanding of contemporary urban spaces as well as natural environments [5].
Keywords: spatio-temporal mapping, visualisation, Indigenous knowledge, mobility and migration, research infrastructure
Procedia PDF Downloads 18
23010 Japanese and Europe Legal Frameworks on Data Protection and Cybersecurity: Asymmetries from a Comparative Perspective
Authors: S. Fantin
Abstract:
This study is the result of the legal research on cybersecurity and data protection within the EUNITY (Cybersecurity and Privacy Dialogue between Europe and Japan) project, aimed at fostering the dialogue between the European Union and Japan. Based on the research undertaken therein, the author offers an outline of the main asymmetries in the laws governing these fields in the two regions. The research is a comparative analysis of the two legal frameworks, taking into account specific provisions, ratio legis, and policy initiatives. Recent doctrine was taken into account, too, as well as empirical interviews with EU and Japanese stakeholders and project partners. With respect to the protection of personal data, the European Union has recently reformed its legal framework with a package that includes a regulation (General Data Protection Regulation) and a directive (Directive 680 on personal data processing in the law enforcement domain). In turn, the Japanese law under scrutiny for this study has been the Act on Protection of Personal Information. Based on a comparative analysis, some asymmetries arise. The main ones refer to the definition of personal information and the scope of the two frameworks. Furthermore, the rights of the data subjects are articulated differently in the two regions, while the two regimes take opposite approaches to the nature of sanctions. Regarding the cybersecurity framework, the situation looks similarly misaligned. Japan’s main text of reference is the Basic Cybersecurity Act, while the European Union has a more fragmented legal structure (to name a few, the Network and Information Security Directive, the Critical Infrastructure Directive, and the Directive on Attacks against Information Systems). On a related note, unlike the more industry-oriented European approach, the concept of cyber hygiene seems to be neatly embedded in the Japanese legal framework, with a number of provisions that alleviate operators’ liability by turning such a burden into a set of recommendations to be primarily observed by citizens. The reasons to fill such normative gaps rest mostly on three grounds. Firstly, the cross-border nature of cybercrime calls for considering both the magnitude of the issue and its regulatory stance globally. Secondly, empirical findings from the EUNITY project showed that recent data breaches and cyber-attacks had shared implications for Europe and Japan. Thirdly, the geopolitical context is currently moving the two regions toward significant agreements from a trade standpoint, but also from a data protection perspective (with an imminent signature by both parties of a so-called ‘Adequacy Decision’). The research conducted in this study reveals two asymmetric legal frameworks on cybersecurity and data protection. In view of the future challenges presented by the strengthening of the collaboration between the two regions and the transnational nature of cybercrime, it is urged that solutions be found to fill these gaps, in order to allow the European Union and Japan to strengthen their partnership wisely.
Keywords: cybersecurity, data protection, European Union, Japan
Procedia PDF Downloads 123
23009 The Causality between Corruption and Economic Growth in MENA Countries: A Dynamic Panel-Data Analysis
Authors: Nour Mohamad Fayad
Abstract:
The impact of corruption on economic growth is complex and has been researched extensively. Many experts believe that corruption reduces economic development. However, counterarguments have suggested that corruption either promotes growth and development or has no significant impact on economic performance. Clearly, there is no consensus in the economics literature regarding the possible relationship between corruption and economic development. Corruption's complex and clandestine nature, which makes it difficult to define and measure, is one of the obstacles that must be overcome when investigating its effect on an economy. In an attempt to contribute to the ongoing debate, this study examines the impact of corruption on economic growth in the Middle East and North Africa (MENA) region between 2000 and 2021 using a Customized Corruption Index (CCI) and panel data on MENA countries. These countries were selected because they are understudied in the economic literature, and despite the World Bank's recent emphasis on corruption in the developing world, the MENA countries have received little attention. The researcher used a Cobb-Douglas functional form to test the effect of corruption in MENA, tracked corruption over roughly 20 years with the CCI, and then applied dynamic panel-data estimation. The findings indicate that there is a positive correlation between corruption and economic growth, but this is not consistent across all MENA nations. The study has limitations. The first is the relatively recent lack of data from MENA nations. This issue is related to the inaccessibility of data for many MENA countries, particularly regarding the returns on resources, private malfeasance, and other variables in Gulf countries. In addition, the researcher encountered several restrictions, such as electricity and internet outages, because he is from Lebanon, a country whose citizens have endured difficult living conditions since the Lebanese crisis began in 2019. The study presents the CCI as an index that suits the characteristics of MENA countries and measures corruption specifically in this region; the outcome of the CCI is then compared with the Corruption Perceptions Index (CPI) and the Control of Corruption indicator from the Worldwide Governance Indicators (CC from WGI).
Keywords: corruption, economic growth, corruption measurements, empirical review, impact of corruption
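The abstract does not give the estimated equation. Purely as an illustration of how a Cobb-Douglas form is commonly turned into a dynamic panel specification, one possible setup is shown below; every symbol ($Y$, $K$, $L$, $A$, $\alpha$, $\beta$, $\gamma$, $\rho$, $\mu_i$, $\varepsilon_{it}$) and the way the corruption index enters are assumptions, not the author's model.

```latex
% Cobb-Douglas production function for country i in year t
Y_{it} = A_{it}\, K_{it}^{\alpha}\, L_{it}^{\beta}

% Log-linear form
\ln Y_{it} = \ln A_{it} + \alpha \ln K_{it} + \beta \ln L_{it}

% Assumed: the corruption index shifts productivity, and a lagged dependent
% variable makes the panel model dynamic
\ln Y_{it} = \rho \ln Y_{i,t-1} + \alpha \ln K_{it} + \beta \ln L_{it}
           + \gamma\, \mathrm{CCI}_{it} + \mu_i + \varepsilon_{it}
```

Under this sketch, the sign of $\gamma$ would carry the reported positive or negative association between corruption and growth, and $\mu_i$ would absorb time-invariant country differences.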
Procedia PDF Downloads 74
23008 Evaluation of Illegal Hunting of Red Deer and Conservation Policy of Department of Environment in Iran
Authors: Tahere Fazilat
Abstract:
Caspian red deer or maral (Cervus elaphus maral) is the largest type of deer in Iran. In the past, maral lived in the northern forests of Iran, from the Caspian Sea coast and the Alborz mountain chain to the oak forests of the Zagros margin, from Azarbaijan to Fars province. However, the population has been completely destroyed in the north-west and west of Iran. According to reports about 50 years and out of reach of humans. In the present study, data were collected from 2004 to 2014 in the Hyrcanian forest of Mazandaran province by environmental guards and the judiciary office of the Department of Environment of Mazandaran. All recorded arrests for illegal hunting of red deer and the population census estimates were examined, and the correlation between these data was assessed. We provide a first evaluation of how suitable these methods are by comparing the results with population estimates obtained using cohort analysis and by analyzing the within-season variation in the number of deer seen. The data indicate the future of red deer in the northern forests of Iran and the results of the Department of Environment's red deer conservation policy in Iran.
Keywords: illegal hunting, red deer, census, conservation
Procedia PDF Downloads 552
23007 Research on Straightening Process Model Based on Iteration and Self-Learning
Authors: Hong Lu, Xiong Xiao
Abstract:
Shaft parts are widely used in the machinery industry; however, bending deformation often occurs when such parts are heat treated. These parts need to be straightened to meet straightness requirements. In the pressure straightening process, a good straightening stroke algorithm affects both the precision and the efficiency of straightening. In this paper, the relationship between straightening load and deflection during the straightening process is analyzed, and a mathematical model of the straightening process is established. Based on this model, an iterative method is used to solve for the straightening stroke. Compared with the traditional straightening stroke algorithm, the stroke calculated by this method is much more precise because it can adapt to changes in material performance parameters. Considering that straightening is widely used in the mass production of shaft parts, a knowledge base is used to store straightening process data, and a straightening stroke algorithm based on empirical data is set up. A straightening process control model that combines the iteration-based straightening stroke method with the empirical-data-based straightening stroke algorithm is then established. Finally, an experiment is designed to verify the straightening process control model.
Keywords: straightness, straightening stroke, deflection, shaft parts
Procedia PDF Downloads 328
23006 Effects of Elastic, Plyometric and Strength Training on Selected Anaerobic Factors in Sanandaj Elite Volleyball Players
Authors: Majed Zobairy, Fardin Kalvandi, Kamal Azizbaigi
Abstract:
This research evaluated the effects of elastic, plyometric, and resistance training on selected anaerobic factors in male volleyball players. For this purpose, 30 elite volleyball players from Sanandaj city were randomly divided into three groups: elastic training, plyometric training, and resistance training. Pre-exercise tests, which included vertical jump, 50-yard sprint, and the scat test, were performed and the data recorded. A specific exercise protocol was carried out by each group, and the post-exercise tests were then performed. Data analysis showed significant increases in test performance in each group. One-way ANOVA showed that the improvement in speed records in the elastic group was significantly higher than in the other groups (p<0.05). Based on the research data, it seems that elastic training can be a useful method and a new approach for improving functional test performance and training regimens.
Keywords: elastic training, plyometric training, strength training, anaerobic power
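For readers who want to see what the one-way ANOVA comparison involves, the sketch below computes the F statistic for three groups using only the standard library. The numbers are made up for illustration and are not the study's data; the function name is also an assumption.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA:
    (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical improvements in sprint time (arbitrary units) per training group.
elastic = [5.0, 6.0, 7.0]
plyometric = [2.0, 3.0, 4.0]
resistance = [1.0, 2.0, 3.0]
print(one_way_anova_f([elastic, plyometric, resistance]))
```

The F value is then compared against the F distribution with (k - 1, N - k) degrees of freedom to obtain the p-value; a post hoc test would be needed to say which specific group differs from the others.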
Procedia PDF Downloads 528