Search results for: clustered data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25217

25187 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent-related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study the relation between science, technology, and innovation in various domains. A problem in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules, in combination with string similarity measures, are used to detect pairs of records that are potential duplicates. Because of the scoring, different rules can be combined to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that score above a certain threshold are clustered by means of a single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases such as the Web of Science or Scopus.
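
The thresholding and single-linkage step described in this abstract can be illustrated with a minimal sketch. It assumes that pair scores have already been produced by some rule set; the function name `cluster_pairs`, the example scores, and the threshold value are hypothetical and not taken from the paper. Linking every pair that scores above the threshold and taking connected components is equivalent to single-linkage clustering cut at that threshold.

```python
from collections import defaultdict


def cluster_pairs(n_records, scored_pairs, threshold):
    """Group records into connected components.

    scored_pairs holds (record_a, record_b, score) tuples; two records are
    linked when their score is strictly above the threshold.
    """
    parent = list(range(n_records))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for record in range(n_records):
        groups[find(record)].append(record)
    return sorted(groups.values())


# Hypothetical scores for six references (indices 0..5).
pairs = [(0, 1, 9.0), (1, 2, 7.5), (2, 3, 2.0), (3, 4, 8.0)]
clusters = cluster_pairs(6, pairs, threshold=5.0)
print(clusters)
```

In this toy input, references 2 and 3 stay in separate clusters because their score is below the threshold, which mirrors the precision-over-recall behaviour the abstract describes.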

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 194
25186 Transcriptome and Metabolome Analysis of a Tomato Solanum Lycopersicum STAYGREEN1 Null Line Generated Using Clustered Regularly Interspaced Short Palindromic Repeats/Cas9 Technology

Authors: Jin Young Kim, Kwon Kyoo Kang

Abstract:

The SGR1 (STAYGREEN1) protein is a critical regulator of chlorophyll degradation and senescence in plant leaves. The functions and mechanisms of tomato SGR1 action are poorly understood and worthy of further investigation. To investigate the function of the SGR1 gene, we generated an SGR1-knockout (KO) null line via clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated gene editing and conducted RNA sequencing and gas chromatography tandem mass spectrometry (GC-MS/MS) analysis to identify the differentially expressed genes. The SlSGR1 (Solanum lycopersicum SGR1) knockout null line clearly showed a turbid brown color with significantly higher chlorophyll and carotenoid content compared to wild-type (WT) fruit. Differential gene expression analysis revealed 728 differentially expressed genes (DEGs) between the WT and the sgr1 #1-6 line, including 263 and 465 downregulated and upregulated genes, respectively, for which the fold change was >2 and the adjusted p-value was <0.05. Most of the DEGs were related to photosynthesis and chloroplast function. In addition, the pigment and carotenoid changes in the sgr1 #1-6 line were accompanied by the accumulation of key primary metabolites such as sucrose and its derivatives (fructose, galactinol, raffinose), glycolytic intermediates (glucose, G6P, Fru6P) and tricarboxylic acid cycle (TCA) intermediates (malate and fumarate). Taken together, the transcriptome and metabolite profiles of SGR1-KO lines presented here provide evidence for the mechanisms underlying the effects of SGR1 and the molecular pathways involved in chlorophyll degradation and carotenoid biosynthesis.

Keywords: tomato, CRISPR/Cas9, null line, RNA-sequencing, metabolite profiling

Procedia PDF Downloads 121
25185 Potential Ecological Risk Assessment of Selected Heavy Metals in Sediments of Tidal Flat Marsh, the Case Study: Shuangtai Estuary, China

Authors: Chang-Fa Liu, Yi-Ting Wang, Yuan Liu, Hai-Feng Wei, Lei Fang, Jin Li

Abstract:

Heavy metals in sediments can cause adverse ecological effects when they exceed given criteria. The present study investigated sediment environmental quality, pollutant enrichment, ecological risk, and source identification for copper, cadmium, lead, zinc, mercury, and arsenic in the sediments collected from the tidal flat marsh of Shuangtai estuary, China. The arithmetic mean integrated pollution index, geometric mean integrated pollution index, fuzzy integrated pollution index, and principal component score were used to characterize sediment environmental quality; fuzzy similarity and the geo-accumulation index were used to evaluate pollutant enrichment; correlation matrix, principal component analysis, and cluster analysis were used to identify sources of pollution; the environmental risk index and potential ecological risk index were used to assess ecological risk. The environmental quality of the sediments is classified as a very low degree of contamination or low contamination. According to the pollutant enrichment analysis, the regions rank by similarity to the soil element background of the Liaohe plain in the order Sanjiaozhou, Honghaitan, Sandaogou, Xiaohe. The source identification indicates that correlations among the metals are significant, except between copper and cadmium. Cadmium, lead, zinc, mercury, and arsenic cluster together as the first principal component. Copper clusters as the second principal component. The environmental risk assessment level is no risk in the studied area. The order of potential ecological risk is As > Cd > Hg > Cu > Pb > Zn.
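
Two of the indices named in this abstract can be sketched in a few lines. The sketch assumes the commonly used definitions, which the abstract itself does not spell out: the geo-accumulation index as log2(C / (1.5 B)) for a measured concentration C and background value B, and the single-metal potential ecological risk factor as T * C / B with Hakanson's toxic response factors T. The concentrations in the example are hypothetical and are not data from the study.

```python
import math

# Toxic response factors as commonly cited from Hakanson's index
# (assumed here; the abstract does not list the values it used).
TOXIC_RESPONSE = {"Hg": 40, "Cd": 30, "As": 10, "Cu": 5, "Pb": 5, "Zn": 1}


def geo_accumulation(conc, background):
    """Geo-accumulation index: log2 of the concentration over 1.5 x background."""
    return math.log2(conc / (1.5 * background))


def risk_factor(metal, conc, background):
    """Single-metal potential ecological risk factor."""
    return TOXIC_RESPONSE[metal] * conc / background


def rank_by_risk(samples):
    """Return metals ordered from highest to lowest risk, plus the summed index.

    samples maps a metal symbol to a (concentration, background) pair.
    """
    factors = {m: risk_factor(m, c, b) for m, (c, b) in samples.items()}
    order = sorted(factors, key=factors.get, reverse=True)
    return order, sum(factors.values())


# Hypothetical concentrations and background values in mg/kg.
samples = {"Cu": (20.0, 10.0), "Cd": (2.0, 1.0), "Zn": (100.0, 50.0)}
order, total_risk = rank_by_risk(samples)
print(order, total_risk)
```

Because the toxic response factors differ by more than an order of magnitude, a metal with modest enrichment such as cadmium can still dominate the ranking, which is why the risk order need not follow the concentration order.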

Keywords: ecological risk assessment, heavy metals, sediment, marsh, Shuangtai estuary

Procedia PDF Downloads 347
25184 A Technique for Image Segmentation Using K-Means Clustering Classification

Authors: Sadia Basar, Naila Habib, Awais Adnan

Abstract:

The paper presents a technique for image segmentation using K-means clustering classification. Previously presented algorithms were specific; however, they missed neighboring information and required high-speed computers to run the segmentation. Clustering is the process of partitioning a group of data points into a small number of clusters. The proposed method is a content-aware feature extraction method that can run on low-end computers, uses a simple algorithm, requires only low-quality streaming, is efficient, and can be used for security purposes. It has the capability to highlight the boundary and the object. First, the user enters the data as input. In the next step, the digital image is converted into groups of clusters. Clusters are divided into many regions. Clusters with the same features are assembled within a group, and different clusters are placed in other groups. Finally, the clusters are combined with respect to similar features and then represented in the form of segments. The clustered image gives a clear representation of the digital image, highlighting its regions and boundaries. The final image is presented in the form of segments, with all colors of the image separated into clusters.
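
As a rough illustration of colour-based K-means segmentation, the sketch below clusters a flat list of RGB pixels with plain Lloyd iterations. It is not the authors' implementation: the deterministic initialisation, the fixed iteration count, and the tiny pixel list are assumptions made to keep the example self-contained.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-means; returns one cluster label per point and the centres."""
    # Deterministic start: take k points spread evenly through the input.
    step = max(1, len(points) // k)
    centers = [points[min(i * step, len(points) - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers


# Hypothetical 8-pixel image: reddish and bluish pixels as (R, G, B) tuples.
pixels = [
    (250, 10, 10), (245, 5, 12), (10, 10, 240), (12, 8, 250),
    (0, 0, 255), (255, 0, 0), (248, 12, 8), (5, 15, 245),
]
labels, centers = kmeans(pixels, k=2)
print(labels)
```

Reshaping the label list back to the image's width and height gives the segment map; each label corresponds to one colour cluster.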

Keywords: clustering, image segmentation, K-means function, local and global minimum, region

Procedia PDF Downloads 376
25183 Spatiotemporal Propagation and Pattern of Epileptic Spike Predict Seizure Onset Zone

Authors: Mostafa Mohammadpour, Christoph Kapeller, Christy Li, Josef Scharinger, Christoph Guger

Abstract:

Interictal spikes provide valuable information on electrocorticography (ECoG), which aids in surgical planning for patients who suffer from refractory epilepsy. However, the shape and temporal dynamics of these spikes remain unclear. The purpose of this work was to analyze the shape of interictal spikes and measure their distance to the seizure onset zone (SOZ) for use in epilepsy surgery. Data from thirteen patients from the iEEG portal were retrospectively studied. For analysis, half an hour of ECoG data was used from each patient, with the data being truncated before the onset of a seizure. Spikes were first detected and grouped in a sequence, then clustered into interictal epileptiform discharge (IED) and non-IED groups using two-step clustering. The distance of the spikes from IED and non-IED groups to the SOZ was quantified and compared using the Wilcoxon rank-sum test. Spikes in the IED group tended to be in the SOZ or close to it, while spikes in the non-IED group were distant from the SOZ or in non-SOZ areas. At the group level, the distribution for sharp wave, positive baseline shift, slow wave, and slow wave to sharp wave ratio was significantly different for IED and non-IED groups. The distance of the IED cluster was 10.00 mm, significantly closer to the SOZ than the 17.65 mm for non-IEDs. These findings provide insights into the shape and spatiotemporal dynamics of spikes that could influence the network mechanisms underlying refractory epilepsy.

Keywords: spike propagation, spike pattern, clustering, SOZ

Procedia PDF Downloads 63
25182 Mainstreaming Willingness among Black Owned Informal Small Medium Micro Enterprises in South Africa

Authors: Harris Maduku, Irrshad Kaseeram

Abstract:

The objective of this paper is to understand the factors behind the formalisation willingness of South African black-owned SMMEs. Cross-sectional data were collected using a questionnaire from 390 informal businesses in Johannesburg and Pretoria using stratified random sampling and clustered sampling. This study employed a multinomial logistic regression to quantitatively understand what encourages informal SMMEs to be willing to mainstream their operations. We find government support, corruption, employment compensation, family labour, success perception, education status, age, and financing to be key drivers of the willingness of SMMEs to formalise their operations. The findings of our study point to the need for government departments to invest more in both financial and non-financial strategies, like capacity building and business education for informal SMMEs, to cultivate their willingness to mainstream.

Keywords: mainstreaming, transition, informal, willingness, multinomial logit

Procedia PDF Downloads 154
25181 Distance and Coverage: An Assessment of Location-Allocation Models for Fire Stations in Kuwait City, Kuwait

Authors: Saad M. Algharib

Abstract:

The major concern of planners when placing fire stations is finding their optimal locations such that the fire companies can reach fire locations within reasonable response time or distance. Planners are also concerned with the numbers of fire stations that are needed to cover all service areas and the fires, as demands, with standard response time or distance. One of the tools for such analysis is location-allocation models. Location-allocation models enable planners to determine the optimal locations of facilities in an area in order to serve regional demands in the most efficient way. The purpose of this study is to examine the geographic distribution of the existing fire stations in Kuwait City. This study utilized location-allocation models within the Geographic Information System (GIS) environment and a number of statistical functions to assess the current locations of fire stations in Kuwait City. Further, this study investigated how well all service areas are covered and how many and where additional fire stations are needed. Four different location-allocation models were compared to find which models cover more demands than the others, given the same number of fire stations. This study tests many ways to combine variables instead of using one variable at a time when applying these models in order to create a new measurement that influences the optimal locations for locating fire stations. This study also tests how location-allocation models are sensitive to different levels of spatial dependency. The results indicate that there are some districts in Kuwait City that are not covered by the existing fire stations. These uncovered districts are clustered together. This study also identifies where to locate the new fire stations. This study provides users of these models a new variable that can assist them to select the best locations for fire stations. 
The results include information about how the location-allocation models behave in response to different levels of spatial dependency of demands. The results show that these models perform better with clustered demands. From the additional analysis carried out in this study, it can be concluded that these models behave differently under different spatial patterns.

Keywords: geographic information science, GIS, location-allocation models, geography

Procedia PDF Downloads 177
25180 Relative Entropy Used to Determine the Divergence of Cells in Single Cell RNA Sequence Data Analysis

Authors: An Chengrui, Yin Zi, Wu Bingbing, Ma Yuanzhu, Jin Kaixiu, Chen Xiao, Ouyang Hongwei

Abstract:

Single cell RNA sequencing (scRNA-seq) is one of the effective tools to study the transcriptomics of biological processes. Currently, the similarity of cells is usually measured by Euclidean distance or its derivatives. However, the process of scRNA-seq is a multivariate Bernoulli event model; thus we hypothesize that it would be more efficient to measure the divergence between cells with relative entropy than with Euclidean distance. In this study, we compared the performance of Euclidean distance, Spearman correlation distance, and relative entropy using scRNA-seq data of the early, medial, and late stages of limb development generated in our lab. Relative entropy performed better than the other methods according to the cluster potential test. Furthermore, we developed KL-SNE, an algorithm that modifies t-SNE by changing its definition of divergence between cells from Euclidean distance to Kullback–Leibler divergence. Results showed that KL-SNE was more effective at dissecting cell heterogeneity than t-SNE, indicating the better performance of relative entropy compared with Euclidean distance. Specifically, the chondrocytes expressing Comp were clustered together with KL-SNE but not with t-SNE. Surprisingly, cells in the early stage were surrounded by cells in the medial stage with KL-SNE, while medial cells neighbored late-stage cells with t-SNE. These results parallel the heatmap, which showed that cells in the medial stage were more heterogeneous than cells in other stages. In addition, we found that the results of KL-SNE tend to follow a Gaussian distribution compared with those of t-SNE, which could also be verified with the analysis of scRNA-seq data from another study on human embryo development. Therefore, it is also an effective way to convert a non-Gaussian distribution to a Gaussian distribution and facilitate the subsequent statistical processes. Thus, relative entropy is potentially a better way to determine the divergence of cells in scRNA-seq data analysis.
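
To make the contrast between the two measures concrete, here is a minimal sketch that computes Kullback–Leibler divergence (relative entropy) and Euclidean distance between two expression profiles. The normalisation with a pseudocount is an assumption added so that zero counts do not produce a division by zero; the abstract does not say how KL-SNE handles them, and the count vectors are made up.

```python
import math


def normalise(counts, pseudocount=1.0):
    """Turn raw counts into a probability vector, smoothing zeros."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; p and q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def euclidean(p, q):
    """Ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical read counts for four genes in two cells.
cell_a = normalise([120, 0, 30, 50])
cell_b = normalise([10, 90, 40, 60])

print(kl_divergence(cell_a, cell_b))
print(kl_divergence(cell_b, cell_a))
print(euclidean(cell_a, cell_b))
```

Unlike Euclidean distance, the divergence is asymmetric, so D(a || b) and D(b || a) generally differ; any method built on it has to decide which direction to use or how to symmetrise it.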

Keywords: Single cell RNA sequence, Similarity measurement, Relative Entropy, KL-SNE, t-SNE

Procedia PDF Downloads 340
25179 Laser Data Based Automatic Generation of Lane-Level Road Map for Intelligent Vehicles

Authors: Zehai Yu, Hui Zhu, Linglong Lin, Huawei Liang, Biao Yu, Weixin Huang

Abstract:

With the development of intelligent vehicle systems, a high-precision road map is increasingly needed in many aspects. The automatic lane lines extraction and modeling are the most essential steps for the generation of a precise lane-level road map. In this paper, an automatic lane-level road map generation system is proposed. To extract the road markings on the ground, the multi-region Otsu thresholding method is applied, which calculates the intensity value of laser data that maximizes the variance between background and road markings. The extracted road marking points are then projected to the raster image and clustered using a two-stage clustering algorithm. Lane lines are subsequently recognized from these clusters by the shape features of their minimum bounding rectangle. To ensure the storage efficiency of the map, the lane lines are approximated to cubic polynomial curves using a Bayesian estimation approach. The proposed lane-level road map generation system has been tested on urban and expressway conditions in Hefei, China. The experimental results on the datasets show that our method can achieve excellent extraction and clustering effect, and the fitted lines can reach a high position accuracy with an error of less than 10 cm.
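
The thresholding idea in this abstract, choosing the intensity that maximises the variance between background and road markings, is the classic Otsu criterion. The sketch below applies it once to a whole list of integer intensities; the multi-region variant used in the paper would apply the same routine separately per region, and that partitioning is not shown. The intensity values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximises between-class variance.

    values are integer intensities in [0, levels); points with value <= t
    are treated as background and points with value > t as foreground.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of points at or below the candidate threshold
    sum0 = 0    # sum of their intensities
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical laser intensities: dark asphalt and bright paint.
intensities = [10, 12, 11, 13, 200, 210, 205]
threshold = otsu_threshold(intensities)
markings = [v for v in intensities if v > threshold]
print(threshold, markings)
```

Ties are resolved in favour of the lowest threshold, so with a wide empty gap between the two intensity groups the result sits at the upper edge of the darker group.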

Keywords: curve fitting, lane-level road map, line recognition, multi-thresholding, two-stage clustering

Procedia PDF Downloads 128
25178 A Concept for Flexible Battery Cell Manufacturing from Low to Medium Volumes

Authors: Tim Giesen, Raphael Adamietz, Pablo Mayer, Philipp Stiefel, Patrick Alle, Dirk Schlenker

Abstract:

The competitiveness and success of new electrical energy storages such as battery cells are significantly dependent on a short time-to-market. Producers who decide to supply new battery cells to the market need to be easily adaptable in manufacturing with respect to the early customers’ needs in terms of cell size, materials, delivery time and quantity. In the initial state, the required output rates do not yet allow the producers to have a fully automated manufacturing line nor to supply handmade battery cells. Until now, there was no solution for manufacturing battery cells in low to medium volumes in a reproducible way. Thus, in terms of cell format and output quantity, a concept for the flexible assembly of battery cells was developed by the Fraunhofer Institute for Manufacturing Engineering and Automation. Based on clustered processes, the modular system platform can be modified, enlarged or retrofitted in a short time frame according to the ordered product. The paper shows the analysis of the production steps from a conventional battery cell assembly line. Process solutions were found by using I/O-analysis, functional structures, and morphological boxes. The identified elementary functions were subsequently clustered by functional coherences for automation solutions and thus the single process cluster was generated. The result presented in this paper enables the manufacture of different cell products on the same production system using seven process clusters. The paper shows the solution for a batch-wise flexible battery cell production using advanced process control. Further, the performed tests and the benefits of using the process clusters as cyber-physical systems for an integrated production and value chain are discussed. The solution lowers the hurdles for SMEs to launch innovative cell products on the global market.

Keywords: automation, battery production, carrier, advanced process control, cyber-physical system

Procedia PDF Downloads 337
25177 The Stock Price Effect of Apple Keynotes

Authors: Ethan Petersen

Abstract:

In this paper, we analyze the volatility of Apple’s stock beginning January 3, 2005 up to October 9, 2014, then focus on a range from 30 days prior to each product announcement until 30 days after. Product announcements are filtered; announcements whose 60 day range is devoid of other events are separated. This filtration is chosen to isolate, and study, a potential cross-effect. Concerning Apple keynotes, there are two significant dates: the day the invitations to the event are received and the day of the event itself. As such, the statistical analysis is conducted for both invite-centered and event-centered time frames. A comparison to the VIX is made to determine if the trend is simply following the market or deviating. Regardless of the filtration, we find that there is a clear deviation from the market. Comparing these data sets, there are significantly different trends: isolated events have a constantly decreasing, erratic trend in volatility but an increasing, linear trend is observed for clustered events. According to the Efficient Market Hypothesis, we would expect a change when new information is publicly known and the results of this study support this claim.

Keywords: efficient market hypothesis, event study, volatility, VIX

Procedia PDF Downloads 280
25176 Knowledge Representation Based on Interval Type-2 CFCM Clustering

Authors: Lee Myung-Won, Kwak Keun-Chang

Abstract:

This paper is concerned with knowledge representation and extraction of fuzzy if-then rules using Interval Type-2 Context-based Fuzzy C-Means clustering (IT2-CFCM) with the aid of fuzzy granulation. This proposed clustering algorithm is based on information granulation in the form of IT2 based Fuzzy C-Means (IT2-FCM) clustering and estimates the cluster centers by preserving the homogeneity between the clustered patterns from the IT2 contexts produced in the output space. Furthermore, we can obtain the automatic knowledge representation in the design of Radial Basis Function Networks (RBFN), Linguistic Model (LM), and Adaptive Neuro-Fuzzy Networks (ANFN) from the numerical input-output data pairs. We shall focus on a design of ANFN in this paper. The experimental results on an estimation problem of energy performance reveal that the proposed method showed a good knowledge representation and performance in comparison with the previous works.

Keywords: IT2-FCM, IT2-CFCM, context-based fuzzy clustering, adaptive neuro-fuzzy network, knowledge representation

Procedia PDF Downloads 322
25175 Heterogeneity of Soil Moisture and Its Impacts on the Mountainous Watershed Hydrology in Northwest China

Authors: Chansheng He, Zhongfu Wang, Xiao Bai, Jie Tian, Xin Jin

Abstract:

Heterogeneity of soil hydraulic properties directly affects hydrological processes at different scales. Understanding heterogeneity of soil hydraulic properties such as soil moisture is therefore essential for modeling watershed ecohydrological processes, particularly in hard to access, topographically complex mountainous watersheds. This study maps spatial variations of soil moisture using an in situ observation network that consists of sampling points, zones, and tributaries, and monitors corresponding hydrological variables of air and soil temperatures, evapotranspiration, infiltration, and runoff in the Upper Reach of the Heihe River Watershed, the second largest inland river (terminal lake) with a drainage area of over 128,000 km² in Northwest China. Subsequently, the study uses a hydrological model, SWAT (Soil and Water Assessment Tool), to simulate the effects of heterogeneity of soil moisture on watershed hydrological processes. The spatial clustering method Full-Order-CLK was employed to derive five soil heterogeneous zones (Configuration 97, 80, 65, 40, and 20) for soil input to SWAT. Results show that the simulations by the SWAT model with the spatially clustered soil hydraulic information from the field sampling data represented the soil heterogeneity much better and performed more accurately than the model using the average soil property values for each soil type derived from the coarse soil datasets. Thus, incorporating detailed field sampling soil heterogeneity data greatly improves performance in hydrologic modeling.

Keywords: heterogeneity, soil moisture, SWAT, up-scaling

Procedia PDF Downloads 346
25174 Identification and Molecular Profiling of A Family I Cystatin Homologue from Sebastes schlegeli Deciphering Its Putative Role in Host Immunity

Authors: Don Anushka Sandaruwan Elvitigala, P. D. S. U. Wickramasinghe, Jehee Lee

Abstract:

Cystatins are a large superfamily of proteins which act as reversible inhibitors of cysteine proteases. Papain proteases and cysteine cathepsins are predominant substrates of cystatins. The cystatin superfamily can be further clustered into three groups: stefins, cystatins, and kininogens. Among them, stefins are also known as family 1 cystatins, which comprise cystatin Bs and cystatin As. In this study, a homologue of family 1 cystatins closer to cystatin Bs was identified from Korean black rockfish (Sebastes schlegeli) using a previously constructed cDNA (complementary deoxyribonucleic acid) database and designated as RfCyt1. The full-length cDNA of RfCyt1 consisted of 573 bp, with a coding region of 294 bp. It comprised a 5´-untranslated region (UTR) of 55 bp and a 3´-UTR of 263 bp. The coding sequence encodes a polypeptide consisting of 97 amino acids with a predicted molecular weight of 11 kDa and a theoretical isoelectric point of 6.3. RfCyt1 shared homology with other teleost and vertebrate species and contained conserved features of the cystatin family signature, including a single cystatin-like domain, the cysteine protease inhibitory signature pentapeptide (QXVXG) consensus sequence, and two conserved neighboring N-terminal glycine (⁸GG⁹) residues. As expected, phylogenetic reconstruction using the neighbor-joining method showed that RfCyt1 clustered with the cystatin family 1 members, most closely with its teleostean orthologues. A SYBR Green qPCR (quantitative polymerase chain reaction) assay was performed to quantify the RfCyt1 transcripts in different tissues in healthy and immune-stimulated fish. RfCyt1 was ubiquitously expressed in all tissue types of healthy animals, with gill and spleen being the highest. Temporal expression of RfCyt1 displayed significant up-regulation upon infection with Aeromonas salmonicida. Recombinantly expressed RfCyt1 showed concentration-dependent papain inhibitory activity. Collectively, these findings provide evidence for detectable protease inhibitory and immunity-relevant roles of RfCyt1 in Sebastes schlegeli.

Keywords: Sebastes schlegeli, family 1 cystatin, immune stimulation, expressional modulation

Procedia PDF Downloads 136
25173 Data Transformations in Data Envelopment Analysis

Authors: Mansour Mohammadpour

Abstract:

Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.

Keywords: data transformation, data envelopment analysis, undesirable data, negative data

Procedia PDF Downloads 20
25172 Prevalence and Spatial Distribution of Anaemia in Ethiopia using 2011 EDHS

Authors: Bedilu A. Ejigu, Eshetu Wencheko, Kiros Berhane

Abstract:

Anaemia is a condition in which the haemoglobin concentration falls below an established cut-off value due to a decrease in the number and size of red blood cells. The current study aimed to assess the spatial pattern and identify predictors related to anaemia using the third Ethiopian demographic health survey which was conducted in 2010. To achieve this objective, this study took into account the clustered nature of the data. As a result, multilevel modeling has been used in the statistical analysis. For analysis purposes, only complete cases from 15,909 females and 13,903 males were considered. Among all subjects who agreed to a haemoglobin test, 5.49% of males and 19.86% of females were anaemic. In both binary and ordinal outcome modeling approaches, educational level, age, wealth index, BMI and HIV status were identified to be significant predictors for anaemia prevalence. Furthermore, it was noted that pregnant women were more anaemic than non-pregnant women. As revealed by Moran's I test, significant spatial autocorrelation was noted across clusters. The risk of anaemia was found to vary across different regions, and higher prevalence was observed in the Somali and Affar regions.
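
For readers unfamiliar with the Moran's I statistic mentioned in this abstract, a small sketch of the global statistic follows. It uses the standard formula, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. The four-cluster chain of neighbours and the prevalence values are invented for illustration, and the significance test used in the study is not reproduced.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights is an n x n matrix where weights[i][j] > 0 when locations
    i and j are neighbours (the diagonal is expected to be zero).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / w_total) * cross / denom


# Hypothetical layout: four survey clusters in a row, each adjacent
# to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)        # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)   # dissimilar neighbours
print(smooth, alternating)
```

Positive values indicate that neighbouring clusters have similar prevalence, and negative values indicate that neighbours tend to differ.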

Keywords: anaemia, Moran's I test, multilevel models, spatial pattern

Procedia PDF Downloads 424
25171 Spatial Temporal Change of COVID-19 Vaccination Condition in the US: An Exploration Based on Space Time Cube

Authors: Yue Hao

Abstract:

COVID-19 vaccines protect not only individuals but society as a whole. Understanding the change and trend of vaccination conditions may therefore shed some light on revising and making up-to-date policies regarding large-scale public health promotions and calls in order to lead and encourage the adoption of COVID-19 vaccines. However, vaccination status changes over time and varies from place to place, with hidden patterns that were not fully explored in previous research. In our research, we took advantage of the spatial-temporal analytical methods in the domain of geographic information science and captured the spatial-temporal changes in COVID-19 vaccination status in the United States during 2020 and 2021. After conducting emerging hot spot analysis on both the state-level data of the US and the county-level data of California, we found that: (1) at the macroscopic level, there is a continuously increasing trend in the vaccination rate in the US, but the spatial clusters vary at the county level; (2) spatial hotspots and clusters with high vaccination amounts over time were concentrated around the west and east coasts, in regions like California and New York City, which are densely populated and have considerable economic conditions; (3) in terms of the growth trend of the daily vaccination amount, Los Angeles County alone has very high statistics and dramatic increases over time. We hope that our findings can be valuable guidance for supporting future decision-making regarding vaccination policies as well as directing new research on relevant topics.

Keywords: COVID-19 vaccine, GIS, space time cube, spatial-temporal analysis

Procedia PDF Downloads 79
25170 Examining the Teaching and Learning Needs of Science and Mathematics Educators in South Africa

Authors: M. Shaheed Hartley

Abstract:

There has been increasing pressure on education researchers and practitioners at higher education institutions to focus on the development of South Africa’s rural and peri-urban communities and improving their quality of life. Many tertiary institutions are obliged to review their outreach interventions in schools. To ensure that the support provided to schools is still relevant, a systemic evaluation of science educator needs is central to this process. These prioritised needs will serve as a guide not only for the outreach projects of tertiary institutions, but also for service providers in general, so that the process of addressing educators' needs becomes coordinated, organised and delivered in a systemic manner. This paper describes one area of a broader needs assessment exercise to collect data regarding the needs of educators in a district of 45 secondary schools in the Western Cape Province of South Africa. This research focuses on the needs and challenges faced by science educators at these schools as articulated by the relevant stakeholders. The objectives of this investigation are two-fold: (1) to create a database that will capture the needs and challenges identified by science educators of the selected secondary schools; and (2) to develop a needs profile for each of the participating secondary schools that will serve as a strategic asset to be shared with the various service providers as part of a community of practice whose core business is to support science educators and science education at large. The data was collected by means of a needs assessment questionnaire (NAQ) which was developed in both actual and preferred versions. An open-ended questionnaire was also administered which allowed teachers to express their views. The categories of the questionnaire were predetermined by participating researchers, educators and education department officials. Group interviews were also held with the science teachers at each of the schools. An analysis of the data revealed important trends in terms of science educator needs and identified schools that can be clustered around priority needs, logistic reasoning and educator profiles. The needs database also provides an opportunity for the community of practice to strategise and coordinate their interventions.

Keywords: needs assessment, science and mathematics education, evaluation, teaching and learning, South Africa

Procedia PDF Downloads 182
25169 Sarcasm Recognition System Using Hybrid Tone-Word Spotting Audio Mining Technique

Authors: Sandhya Baskaran, Hari Kumar Nagabushanam

Abstract:

Sarcasm sentiment recognition is an area of natural language processing (NLP) that has been actively explored in recent times. Even with the advancements in NLP, typical interpretations of words and sentences in their context fail to provide exact information on the sentiment or emotion of a user. For example, if something bad happens, the statement ‘That's just what I need, great! Terrific!’ is expressed in a sarcastic tone, which could be misread as a positive sign by any text-based analyzer. In this paper, we present a real-time ‘word with its tone’ spotting technique that provides sentiment analysis based on the tone or pitch of a voice in combination with the words being expressed. This hybrid approach increases the probability of identifying special sentiments such as sarcasm, bringing it much closer to the real world than mining text or speech individually. The system uses a tone analyzer such as YIN-FFT, which extracts pitch segment-wise and is used in parallel with a speech recognition system. The clustered data are classified for sentiment, and a sarcasm score is determined for each item. Our simulations demonstrate an improvement in F-measure of around 12% compared to existing detection techniques, with increased precision and recall.

Keywords: sarcasm recognition, tone-word spotting, natural language processing, pitch analyzer

Procedia PDF Downloads 293
25168 Mycobacterium tuberculosis and Molecular Epidemiology: An Overview

Authors: Asho Ali

Abstract:

Tuberculosis is a disease of grave concern which infects one-third of the global population. The high incidence of tuberculosis is further compounded by the increasing emergence of drug-resistant strains, including multidrug-resistant (MDR) strains. The global incidence of MDR-TB is ~4%. Molecular epidemiological studies, based on the assumption that patients infected with clustered strains are epidemiologically linked, have helped in understanding the transmission dynamics of the disease. They have also helped to investigate the basis of variation in Mycobacterium tuberculosis (MTB) strains, differences in transmission, and severity of disease or drug resistance mechanisms from across the globe. This has helped in developing strategies for the treatment and prevention of the disease, including MDR-TB.

Keywords: Mycobacterium tuberculosis, molecular epidemiology, drug resistance, disease

Procedia PDF Downloads 403
25167 Impact Location From Instrumented Mouthguard Kinematic Data In Rugby

Authors: Jazim Sohail, Filipe Teixeira-Dias

Abstract:

Mild traumatic brain injury (mTBI) within non-helmeted contact sports is a growing concern due to the serious risk of potential injury. Extensive research is being conducted looking into head kinematics in non-helmeted contact sports utilizing instrumented mouthguards that allow researchers to record accelerations and velocities of the head during and after an impact. This does not, however, allow the location of the impact on the head, and its magnitude and orientation, to be determined. This research proposes and validates two methods to quantify impact locations from instrumented mouthguard kinematic data, one using rigid body dynamics, the other utilizing machine learning. The rigid body dynamics technique focuses on establishing and matching moments from Euler’s and torque equations in order to find the impact location on the head. The methodology is validated with impact data collected from a lab test with the dummy head fitted with an instrumented mouthguard. Additionally, a Hybrid III Dummy head finite element model was utilized to create synthetic kinematic data sets for impacts from varying locations to validate the impact location algorithm. The algorithm calculates accurate impact locations; however, it will require preprocessing of live data, which is currently being done by cross-referencing data timestamps to video footage. The machine learning technique focuses on eliminating the preprocessing aspect by establishing trends within time-series signals from instrumented mouthguards to determine the impact location on the head. An unsupervised learning technique is used to cluster together impacts within similar regions from an entire time-series signal. The kinematic signals established from mouthguards are converted to the frequency domain before using a clustering algorithm to cluster together similar signals within a time series that may span the length of a game. Impacts are clustered within predetermined location bins. 
The same Hybrid III Dummy finite element model is used to create impacts that closely replicate on-field impacts in order to create synthetic time-series datasets consisting of impacts in varying locations. These time-series datasets are used to validate the machine learning technique. The rigid body dynamics technique provides a good method to establish accurate impact locations for impact signals that have already been labeled as true impacts and filtered out of the entire time series. The machine learning technique, in contrast, can be applied to long time-series signal data, but provides impact locations only within predetermined regions on the head. Additionally, the machine learning technique can be used to eliminate false impacts captured by sensors, saving additional time for data scientists using instrumented mouthguard kinematic data, as validating true impacts with video footage would not be required.
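
The frequency-domain clustering step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the transform implementation or the clustering algorithm, so everything here is an assumption made for illustration: a direct discrete Fourier transform, plain k-means seeded with the first k signals, and synthetic sine waves standing in for mouthguard signals. The function names `magnitude_spectrum`, `kmeans` and `sine` are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of the first half of the discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]


def kmeans(points, k, iterations=20):
    """Plain k-means (an assumed stand-in for the paper's unnamed algorithm).

    The first k points seed the centroids, which keeps the result deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sine(cycles, phase, n=32):
    """Synthetic stand-in signal: a sine wave with a given phase offset."""
    return [math.sin(2 * math.pi * cycles * t / n + phase) for t in range(n)]


# Two low-frequency and two high-frequency signals with different phases.
signals = [sine(2, 0.0), sine(8, 0.3), sine(2, 1.0), sine(8, 2.0)]
labels = kmeans([magnitude_spectrum(s) for s in signals], k=2)
```

Working on magnitude spectra rather than raw samples makes the grouping insensitive to where in time a signal starts, which is one plausible reason for converting to the frequency domain before clustering.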

Keywords: head impacts, impact location, instrumented mouthguard, machine learning, mTBI

Procedia PDF Downloads 217
25166 Modified Active (MA) Algorithm to Generate Semantic Web Related Clustered Hierarchy for Keyword Search

Authors: G. Leena Giri, Archana Mathur, S. H. Manjula, K. R. Venugopal, L. M. Patnaik

Abstract:

Keyword search in XML documents is based on the notion of lowest common ancestors in the labelled trees model of XML documents and has recently gained a lot of research interest in the database community. In this paper, we propose the Modified Active (MA) algorithm which is an improvement over the active clustering algorithm by taking into consideration the entity aspect of the nodes to find the level of the node pertaining to a particular keyword input by the user. A portion of the bibliography database is used to experimentally evaluate the modified active algorithm and results show that it performs better than the active algorithm. Our modification improves the response time of the system and thereby increases the efficiency of the system.

Keywords: keyword matching patterns, MA algorithm, semantic search, knowledge management

Procedia PDF Downloads 413
25165 A Study of Social Media Users’ Switching Behavior

Authors: Chiao-Chen Chang, Yang-Chieh Chin

Abstract:

Social media has changed the way the network community is clustered, especially in terms of the location of the community, which has moved from the original virtual space to an intertwined network; communication between people has accordingly shifted from face-to-face communication to a social media-based communication model. However, social media users who have had a fixed engagement may intend to switch to another service provider because of the emergence of new forms of social media. For example, some Facebook or Twitter users switched to Instagram in 2014 because of social media message or image overload, and users may seek simpler and more immediate social media as their main social networking tool. This study explores the impact of system feature overload, information overload, social monitoring concerns, problematic use and privacy concerns as antecedents of social media fatigue, dissatisfaction, and alternative attractiveness, which in turn influence social media switching. This study uses an online questionnaire survey to collect the sample data, and then performs confirmatory factor analysis, path analysis, model fit analysis and mediation analysis with a structural equation model (SEM). The research findings demonstrate significant effects on multiple paths. Based on these findings, this study puts forward implications for theory and practice.

Keywords: social media, switching, social media fatigue, alternative attractiveness

Procedia PDF Downloads 140
25164 Cluster Analysis of Retailers’ Benefits from Their Cooperation with Manufacturers: Business Models Perspective

Authors: M. K. Witek-Hajduk, T. M. Napiórkowski

Abstract:

A number of studies have discussed the benefits of retailer-manufacturer cooperation and coopetition. However, there are only a few publications focused on the benefits of cooperation and coopetition between retailers and their suppliers of durable consumer goods, especially in the context of the business models of the cooperating partners. This paper aims to provide a clustering approach to segment retailers selling consumer durables according to the benefits they obtain from their cooperation with key manufacturers and to differentiate the said retailers in terms of the business models of the cooperating partners. For the purpose of the study, a survey (using the CATI method) collected data on 603 consumer durables retailers present on the Polish market. Retailers are clustered with both hierarchical and non-hierarchical methods. Five distinctive groups of consumer durables retailers are identified (based on the studied benefits) using the two-stage clustering approach. The clusters are then characterized with a set of exogenous variables, key among which are the business models employed by the retailer and its partnering key manufacturer. The paper finds that a combination of a medium-sized retailer classified as an Integrator with chiefly domestic capital and a manufacturer categorized as a Market Player will yield the highest benefits. On the other side of the spectrum is a medium-sized Distributor retailer with solely domestic capital; in this case, the business model of the cooperating manufacturer appears to be irrelevant. This paper is one of the first empirical studies using cluster analysis on primary data to define the types of cooperation between consumer durables retailers and manufacturers, their key suppliers. The analysis integrates the perspective of both retailers’ and manufacturers’ business models and matches them with individual and joint benefits.

Keywords: benefits of cooperation, business model, cluster analysis, retailer-manufacturer cooperation

Procedia PDF Downloads 256
25163 Toward an Appropriate Index for Corporate Governance

Authors: Bita Mashayekhi, Farzaneh Jalali, Alemeh Yazdanian

Abstract:

This study contributes to identifying the corporate governance indices used in previous research by applying content analysis to relevant papers published from 1990 to 2016 in the 20 top accounting journals according to the Google Scholar ranking. For this purpose, 65 papers are scrutinized in depth, and the concepts of corporate governance are coded and categorized. The extracted indices are then clustered into 10 categories and 51 subcategories, and their frequencies are determined. Results show that the board of directors’ characteristics are employed most frequently in the reviewed papers, and the board of directors’ independence is the most frequent index, appearing in 97 percent of our sample. Duality, board size, and ownership structure occur more frequently than the other extracted corporate governance indices.

Keywords: corporate governance, content analysis, corporate governance index, top accounting journals

Procedia PDF Downloads 354
25162 Phylogenetic Analyses of Newcastle Disease Virus Isolated from Unvaccinated Chicken Flocks in Kyrgyzstan from 2015 to 2016

Authors: Giang Tran Thi Huong, Hieu Dong Van, Tung Dao Duy, Saadanov Iskender, Isakeev Mairambek, Tsutomu Omatsu, Yukie Katayama, Tetsuya Mizutani, Yuki Ozeki, Yohei Takeda, Haruko Ogawa, Kunitoshi Imai

Abstract:

Newcastle disease (ND), caused by Newcastle disease virus (NDV), is a contagious viral disease affecting the poultry industry and other birds throughout the world. At present, very little is known about molecular epidemiological data regarding the causes of ND outbreaks in commercial poultry farms in Kyrgyzstan. In the current study, the virus isolated from one of three samples from an unvaccinated flock was confirmed as NDV. Phylogenetic analysis indicated that this NDV strain is clustered in Class II subgenotype VIId and is closely related to the Chinese NDV isolate. Phylogenetic analyses revealed that the isolated NDV strain has an origin different from that of the 4 NDV strains previously identified in Kyrgyzstan. According to the mean death time (MDT: 61.1 h) and a multibasic amino acid (aa) sequence at the F0 proteolytic cleavage site (¹¹²R-R-Q-K-R-F¹¹⁷), the NDV isolate was determined to be a mesogenic strain. Several mutations in the neutralizing epitopes (notably, ³⁴⁷E→K) and the globular head were observed in the hemagglutinin-neuraminidase (HN) protein of the current isolate. The present study represents the molecular characterization of the coding gene region of NDV in Kyrgyzstan. Further study will investigate the antigenic characterization using monoclonal antibodies.

Keywords: Kyrgyzstan, Newcastle disease, genotype, genome characterization

Procedia PDF Downloads 142
25161 Water Accessibility at Household Levels in Zambia: A Case Study of Fitobaula Settlement

Authors: Emmanuel Sachikumba, Micheal Msoni, Westone Mafuleka

Abstract:

Zambia has a good climate with a favourable rainfall pattern, which provides sufficient recharge for the surface and groundwater resources. In spite of the sufficient surface and groundwater resources, access to water at household level is problematic in both quality and quantity. The study examined water accessibility as well as water quality at the household level. The research looked at the sources of water for the households and considered the complications of access to water and the available opportunities therein. The investigation involved fifty households, and the data were collected by the use of questionnaires (to assess accessibility) and laboratory tests (to ascertain water quality). In addition, government departments such as health, agriculture, forestry and education, as well as the municipal council, were interviewed on the topic under study. The study was descriptive in nature, and clustered sampling procedures using simple random methods were utilised to select the households that were to participate in the study. The key findings were that access to water at household level is still a challenge in the settlement, as most of the point sources (shallow wells, the stream and the river) were found to be contaminated. In addition, it was found that there was no direct relationship between the economic performance of a household and its access to water. The study also observed that there were opportunities for the people in the settlement, as they were increasingly entering the education system and adult literacy was being encouraged in the settlement. Furthermore, the settlement has groundwater resources, which indicates that there can be sufficient water provision for the settlers.

Keywords: accessibility, household, water, settlement

Procedia PDF Downloads 450
25160 The Nature and the Structure of Scientific and Innovative Collaboration Networks

Authors: Afshin Moazami, Andrea Schiffauerova

Abstract:

The objective of this work is to investigate the development and the role of collaboration networks in the creation of knowledge and innovations in the US and Canada, with a special focus on Quebec. In order to create scientific networks, the data on journal articles were extracted from SCOPUS, and the networks were built based on the co-authorship of the journal papers. For innovation networks, the USPTO database was used, and the networks were built on patent co-inventorship. Various indicators characterizing the evolution of the network structure and the positions of the researchers and inventors in the networks were calculated. The comparison between the United States, Canada, and Quebec was then carried out. The preliminary results show that the nature of scientific collaboration networks differs from the one seen in innovation networks. Scientists work in bigger teams and are mostly interconnected within one giant network component, whereas the innovation network is much more clustered and fragmented; the inventors work more repetitively with the same partners, often in smaller isolated groups. In both Canada and the US, an increasing tendency towards collaboration was observed, and it was found that networks are getting bigger and more centralized with time. Moreover, a declining share of knowledge transfers per scientist was detected, suggesting an increasing specialization of science. The US collaboration networks tend to be more centralized than the Canadian ones. Quebec shares a lot of features with the Canadian network, but some differences were observed; for example, Quebec inventors rely more on knowledge transmission through intermediaries.
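
As a rough illustration of the network construction described in this abstract, the sketch below links every pair of co-authors on a paper and then measures how much of the network sits in its largest connected component. It is a minimal sketch under stated assumptions, not the authors' pipeline: the input is a hypothetical list of author lists (the SCOPUS and USPTO extraction is not modeled), and the function names `build_coauthorship_graph`, `connected_components` and `giant_component_share` are invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_graph(papers):
    """Return an adjacency map linking every pair of co-authors on a paper."""
    graph = defaultdict(set)
    for authors in papers:
        for author in authors:
            graph[author]  # touch the key so solo authors become isolated nodes
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the components as sets of authors, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def giant_component_share(graph):
    """Fraction of all authors that belong to the largest component."""
    if not graph:
        return 0.0
    return len(connected_components(graph)[0]) / len(graph)


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B", "C"], ["C", "D"], ["E", "F"], ["G"]]
graph = build_coauthorship_graph(papers)
components = connected_components(graph)
share = giant_component_share(graph)
```

A share close to 1 corresponds to the "one giant network component" pattern the abstract reports for scientists, while many small components correspond to the fragmented pattern reported for inventors.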

Keywords: Canada, collaboration, innovation network, scientific network, Quebec, United States

Procedia PDF Downloads 201
25159 Identification of Damage Mechanisms in Interlock Reinforced Composites Using a Pattern Recognition Approach of Acoustic Emission Data

Authors: M. Kharrat, G. Moreau, Z. Aboura

Abstract:

The latest advances in the weaving industry, combined with increasingly sophisticated means of materials processing, have made it possible to produce complex 3D composite structures. Mainly used in aeronautics, composite materials with 3D architecture offer better mechanical properties than 2D reinforced composites. Nevertheless, these materials require a good understanding of their behavior. Because of the complexity of such materials, the damage mechanisms are multiple, and the scenario of their appearance and evolution depends on the nature of the applied loading. The acoustic emission (AE) technique is a well-established tool for discriminating between the damage mechanisms. Suitable sensors are used during the mechanical test to monitor the structural health of the material. Relevant AE features are then extracted from the recorded signals, followed by a data analysis using pattern recognition techniques. In order to better understand the damage scenarios of interlock composite materials, a multi-instrument setup was established in this work for tracking damage initiation and development, especially in the vicinity of the first significant damage, called macro-damage. The deployed instrumentation includes video-microscopy, Digital Image Correlation, Acoustic Emission and micro-tomography. In this study, a multi-variable AE data analysis approach was developed for the discrimination between the different signal classes representing the different emission sources during testing. An unsupervised classification technique was adopted to perform AE data clustering without a priori knowledge. The multi-instrument observations and the clustered data served to label the different signal families and to build a learning database. The latter is used to construct a supervised classifier for automatic recognition of the AE signals. Several materials with different constituents were tested under various loading conditions in order to feed and enrich the learning database. 
The methodology presented in this work was useful for refining the damage threshold for the new-generation materials. The damage mechanisms around this threshold were highlighted. The obtained signal classes were assigned to the different mechanisms. The isolation of a 'noise' class makes it possible to discriminate between the signals emitted by damage without resorting to spatial filtering or increasing the AE detection threshold. The approach was validated on different material configurations. For the same material and the same type of loading, the identified classes are reproducible and show little disturbance. The supervised classifier constructed from the learning database was able to predict the labels of the classified signals.

Keywords: acoustic emission, classifier, damage mechanisms, first damage threshold, interlock composite materials, pattern recognition

Procedia PDF Downloads 155
25158 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technologies; it collects data from various sensors, and those data are used in many fields. Data retrieval is one of the major issues, as there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to point out the exact data available in the storage space. Here, the available data are streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 341