Search results for: clustering analysis
28148 Development of a Robust Protein Classifier to Predict EMT Status of Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) Tumors
Authors: ZhenlinJu, Christopher P. Vellano, RehanAkbani, Yiling Lu, Gordon B. Mills
Abstract:
The epithelial–mesenchymal transition (EMT) is a process by which epithelial cells acquire mesenchymal characteristics, such as profound disruption of cell-cell junctions, loss of apical-basolateral polarity, and extensive reorganization of the actin cytoskeleton to induce cell motility and invasion. A hallmark of EMT is its capacity to promote metastasis, which is due in part to activation of several transcription factors and subsequent downregulation of E-cadherin. Unfortunately, current approaches have yet to uncover robust protein marker sets that can classify tumors as possessing strong EMT signatures. In this study, we utilize reverse phase protein array (RPPA) data and consensus clustering methods to successfully classify a subset of cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) tumors into an EMT protein signaling group (EMT group). The overall survival (OS) of patients in the EMT group is significantly worse than those in the other Hormone and PI3K/AKT signaling groups. In addition to a shrinkage and selection method for linear regression (LASSO), we applied training/test set and Monte Carlo resampling approaches to identify a set of protein markers that predicts the EMT status of CESC tumors. We fit a logistic model to these protein markers and developed a classifier, which was fixed in the training set and validated in the testing set. The classifier robustly predicted the EMT status of the testing set with an area under the curve (AUC) of 0.975 by Receiver Operating Characteristic (ROC) analysis. This method not only identifies a core set of proteins underlying an EMT signature in cervical cancer patients, but also provides a tool to examine protein predictors that drive molecular subtypes in other diseases.Keywords: consensus clustering, TCGA CESC, Silhouette, Monte Carlo LASSO
Procedia PDF Downloads 47128147 Microsatellite-Based Genetic Variations and Relationships among Some Farmed Nile Tilapia Populations in Ghana: Implications for Nile Tilapia Culture
Authors: Acheampong Addo, Emmanuel Odartei Armah, Seth Koranteng Agyakwah, Ruby Asmah, Emmanuel Tetteh-Doku Mensah, Rhoda Lims Diyie, Sena Amewu, Catherine Ragasa, Edward Kofi Abban, Mike Yaw Osei-Atweneboana
Abstract:
The study investigated genetic variation and relationships among populations of Nile tilapia cultured in small-scale fish farms in selected regions of Ghana. A total of 700 samples were collected. All samples were screened with five microsatellite markers and results were analyzed using (Genetic Analysis in Excel), (Molecular and Evolutionary Genetic Analysis software, and Genpop on the web for Heterozygosity and Shannon diversity, (Analysis of Molecular Variance), and (Principal Coordinate Analysis). Fish from the 16 populations (made up of 14 farms and 2 selectively bred populations) clustered into three groups: 7 populations clustered with the GIFT-derived strain, 4 populations clustered with the Akosombo strain, and three populations were in a separate cluster. The clustering pattern indicated groups of different strains of Nile tilapia cultured. Mantel correlation test also showed low genetic variations among the 16 populations hence the need to boost seed quality in order to accelerate aquaculture production in Ghana.Keywords: microsatellites, small- scale, Nile tilapia, akosombo strain, GIFT strain
Procedia PDF Downloads 17828146 Performance Evaluation of Various Segmentation Techniques on MRI of Brain Tissue
Authors: U.V. Suryawanshi, S.S. Chowhan, U.V Kulkarni
Abstract:
Accuracy of segmentation methods is of great importance in brain image analysis. Tissue classification in Magnetic Resonance brain images (MRI) is an important issue in the analysis of several brain dementias. This paper portraits performance of segmentation techniques that are used on Brain MRI. A large variety of algorithms for segmentation of Brain MRI has been developed. The objective of this paper is to perform a segmentation process on MR images of the human brain, using Fuzzy c-means (FCM), Kernel based Fuzzy c-means clustering (KFCM), Spatial Fuzzy c-means (SFCM) and Improved Fuzzy c-means (IFCM). The review covers imaging modalities, MRI and methods for noise reduction and segmentation approaches. All methods are applied on MRI brain images which are degraded by salt-pepper noise demonstrate that the IFCM algorithm performs more robust to noise than the standard FCM algorithm. We conclude with a discussion on the trend of future research in brain segmentation and changing norms in IFCM for better results.Keywords: image segmentation, preprocessing, MRI, FCM, KFCM, SFCM, IFCM
Procedia PDF Downloads 33628145 The Data Quality Model for the IoT based Real-time Water Quality Monitoring Sensors
Authors: Rabbia Idrees, Ananda Maiti, Saurabh Garg, Muhammad Bilal Amin
Abstract:
IoT devices are the basic building blocks of IoT network that generate enormous volume of real-time and high-speed data to help organizations and companies to take intelligent decisions. To integrate this enormous data from multisource and transfer it to the appropriate client is the fundamental of IoT development. The handling of this huge quantity of devices along with the huge volume of data is very challenging. The IoT devices are battery-powered and resource-constrained and to provide energy efficient communication, these IoT devices go sleep or online/wakeup periodically and a-periodically depending on the traffic loads to reduce energy consumption. Sometime these devices get disconnected due to device battery depletion. If the node is not available in the network, then the IoT network provides incomplete, missing, and inaccurate data. Moreover, many IoT applications, like vehicle tracking and patient tracking require the IoT devices to be mobile. Due to this mobility, If the distance of the device from the sink node become greater than required, the connection is lost. Due to this disconnection other devices join the network for replacing the broken-down and left devices. This make IoT devices dynamic in nature which brings uncertainty and unreliability in the IoT network and hence produce bad quality of data. Due to this dynamic nature of IoT devices we do not know the actual reason of abnormal data. If data are of poor-quality decisions are likely to be unsound. It is highly important to process data and estimate data quality before bringing it to use in IoT applications. In the past many researchers tried to estimate data quality and provided several Machine Learning (ML), stochastic and statistical methods to perform analysis on stored data in the data processing layer, without focusing the challenges and issues arises from the dynamic nature of IoT devices and how it is impacting data quality. A comprehensive review on determining the impact of dynamic nature of IoT devices on data quality is done in this research and presented a data quality model that can deal with this challenge and produce good quality of data. This research presents the data quality model for the sensors monitoring water quality. DBSCAN clustering and weather sensors are used in this research to make data quality model for the sensors monitoring water quality. An extensive study has been done in this research on finding the relationship between the data of weather sensors and sensors monitoring water quality of the lakes and beaches. The detailed theoretical analysis has been presented in this research mentioning correlation between independent data streams of the two sets of sensors. With the help of the analysis and DBSCAN, a data quality model is prepared. This model encompasses five dimensions of data quality: outliers’ detection and removal, completeness, patterns of missing values and checks the accuracy of the data with the help of cluster’s position. At the end, the statistical analysis has been done on the clusters formed as the result of DBSCAN, and consistency is evaluated through Coefficient of Variation (CoV).Keywords: clustering, data quality, DBSCAN, and Internet of things (IoT)
Procedia PDF Downloads 14428144 Remote Assessment and Change Detection of GreenLAI of Cotton Crop Using Different Vegetation Indices
Authors: Ganesh B. Shinde, Vijaya B. Musande
Abstract:
Cotton crop identification based on the timely information has significant advantage to the different implications of food, economic and environment. Due to the significant advantages, the accurate detection of cotton crop regions using supervised learning procedure is challenging problem in remote sensing. Here, classifiers on the direct image are played a major role but the results are not much satisfactorily. In order to further improve the effectiveness, variety of vegetation indices are proposed in the literature. But, recently, the major challenge is to find the better vegetation indices for the cotton crop identification through the proposed methodology. Accordingly, fuzzy c-means clustering is combined with neural network algorithm, trained by Levenberg-Marquardt for cotton crop classification. To experiment the proposed method, five LISS-III satellite images was taken and the experimentation was done with six vegetation indices such as Simple Ratio, Normalized Difference Vegetation Index, Enhanced Vegetation Index, Green Atmospherically Resistant Vegetation Index, Wide-Dynamic Range Vegetation Index, Green Chlorophyll Index. Along with these indices, Green Leaf Area Index is also considered for investigation. From the research outcome, Green Atmospherically Resistant Vegetation Index outperformed with all other indices by reaching the average accuracy value of 95.21%.Keywords: Fuzzy C-Means clustering (FCM), neural network, Levenberg-Marquardt (LM) algorithm, vegetation indices
Procedia PDF Downloads 32428143 Occurrence of Porcine circovirus Type 2 in Pigs of Eastern Cape Province South Africa
Authors: Kayode O. Afolabi, Benson C. Iweriebor, Anthony I. Okoh, Larry C. Obi
Abstract:
Porcine circovirus type 2 (PCV2) is the major etiological viral agent of porcine multisystemic wasting syndrome (PWMS) and other porcine circovirus-associated diseases (PCVAD) of great economic importance in pig industry globally. In an effort to determine the status of swine herds in the Province as regarding the ‘small but powerful’ viral pathogen; a total of 375 blood, faecal and nasal swab samples were obtained from seven pig farms (commercial and communal) in Amathole, O.R. Tambo and Chris-Hani District Municipalities of Eastern Cape Province between the year 2015 and 2016. Three hundred and thirty nine (339) samples out of the total sample were subjected to molecular screening using PCV2 specific primers by conventional polymerase chain reaction (PCR). Selected sequences were further analyzed and confirmed through genome sequencing and phylogenetic analyses. The data obtained revealed that 15.93% of the screened samples (54/339) from the swine herds of the studied areas were positive for PCV2; while the severity of occurrence of the viral pathogen as observed at farm level ranges from approximately 5.6% to 60% in the studied farms. The Majority, precisely 15 out of 17 (88%) analyzed sequences were found clustering with other PCV2b reference strains in the phylogenetic analysis. More interestingly, two other sequences obtained were also found clustering within PCV2d genogroup, which is presently another fast-spreading genotype with observable higher virulence in global swine herds. This finding confirmed the presence of this all-important viral pathogen in pigs of the region; which could result in a serious outbreak of PCVAD and huge economic loss at the instances of triggering factors if no appropriate measures are taken to curb its spread effectively.Keywords: pigs, polymerase chain reaction, porcine circovirus type 2, South Africa
Procedia PDF Downloads 21328142 Genetic Structuring of Four Tectona grandis L. F. Seed Production Areas in Southern India
Authors: P. M. Sreekanth
Abstract:
Teak (Tectona grandis L. f.) is a tree species indigenous to India and other Southeastern countries. It produces high-value timber and is easily established in plantations. Reforestation requires a constant supply of high quality seeds. Seed Production Areas (SPA) of teak are improved stands used for collection of open-pollinated quality seeds in large quantities. Information on the genetic diversity of major teak SPAs in India is scanty. The genetic structure of four important seed production areas of Kerala State in Southern India was analyzed employing amplified fragment length polymorphism markers using ten selective primer combinations on 80 samples (4 populations X 20 trees). The study revealed that the gene diversity of the SPAs varied from 0.169 (Konni SPA) to 0.203 (Wayanad SPA). The percentage of polymorphic loci ranged from 74.42 (Parambikulam SPA) to 84.06 (Konni SPA). The mean total gene diversity index (HT) of all the four SPAs was 0.2296 ±0.02. A high proportion of genetic diversity was observed within the populations (83%) while diversity between populations was lower (17%) (GST = 0.17). Principal coordinate analysis and STRUCTURE analysis of the genotypes indicated that the pattern of clustering was in accordance with the origin and geographic location of SPAs, indicating specific identity of each population. A UPGMA dendrogram was prepared and showed that all the twenty samples from each of Konni and Parambikulam SPAs clustered into two separate groups, respectively. However, five Nilambur genotypes and one Wayanad genotype intruded into the Konni cluster. The higher gene flow estimated (Nm = 2.4) reflected the inclusion of Konni origin planting stock in the Nilambur and Wayanad plantations. Evidence for population structure investigated using 3D Principal Coordinate Analysis of FAMD software 1.30 indicated that the pattern of clustering was in accordance with the origin of SPAs. The present study showed that assessment of genetic diversity in seed production plantations can be achieved using AFLP markers. The AFLP fingerprinting was also capable of identifying the geographical origin of planting stock and there by revealing the occurrence of the errors in genotype labeling. Molecular marker-based selective culling of genetically similar trees from a stand so as to increase the genetic base of seed production areas could be a new proposition to improve quality of seeds required for raising commercial plantations of teak. The technique can also be used to assess the genetic diversity status of plus trees within provenances during their selection for raising clonal seed orchards for assuring the quality of seeds available for raising future plantations.Keywords: AFLP, genetic structure, spa, teak
Procedia PDF Downloads 31028141 Changes in Geospatial Structure of Households in the Czech Republic: Findings from Population and Housing Census
Authors: Jaroslav Kraus
Abstract:
Spatial information about demographic processes are a standard part of outputs in the Czech Republic. That was also the case of Population and Housing Census which was held on 2011. This is a starting point for a follow up study devoted to two basic types of households: single person households and households of one completed family. Single person households and one family households create more than 80 percent of all households, but the share and spatial structure is in long-term changing. The increase of single households is results of long-term fertility decrease and divorce increase, but also possibility of separate living. There are regions in the Czech Republic with traditional demographic behavior, and regions like capital Prague and some others with changing pattern. Population census is based - according to international standards - on the concept of currently living population. Three types of geospatial approaches will be used for analysis: (i) firstly measures of geographic distribution, (ii) secondly mapping clusters to identify the locations of statistically significant hot spots, cold spots, spatial outliers, and similar features and (iii) finally analyzing pattern approach as a starting point for more in-depth analyses (geospatial regression) in the future will be also applied. For analysis of this type of data, number of households by types should be distinct objects. All events in a meaningful delimited study region (e.g. municipalities) will be included in an analysis. Commonly produced measures of central tendency and spread will include: identification of the location of the center of the point set (by NUTS3 level); identification of the median center and standard distance, weighted standard distance and standard deviational ellipses will be also used. Identifying that clustering exists in census households datasets does not provide a detailed picture of the nature and pattern of clustering but will be helpful to apply simple hot-spot (and cold spot) identification techniques to such datasets. Once the spatial structure of households will be determined, any particular measure of autocorrelation can be constructed by defining a way of measuring the difference between location attribute values. The most widely used measure is Moran’s I that will be applied to municipal units where numerical ratio is calculated. Local statistics arise naturally out of any of the methods for measuring spatial autocorrelation and will be applied to development of localized variants of almost any standard summary statistic. Local Moran’s I will give an indication of household data homogeneity and diversity on a municipal level.Keywords: census, geo-demography, households, the Czech Republic
Procedia PDF Downloads 10128140 Genetic Diversity in Capsicum Germplasm Based on Inter Simple Sequence Repeat Markers
Authors: Siwapech Silapaprayoon, Januluk Khanobdee, Sompid Samipak
Abstract:
Chili peppers are the fruits of Capsicum pepper plants well known for their fiery burning sensation on the tongue after consumption. They are members of the Solanaceae or common nightshade family along with potato, tomato and eggplant. Thai cuisine has gained popularity for its distinct flavors due to usages of various spices and its heat from the addition of chili pepper. Though being used in little quantity for each dish, chili pepper holds a special place in Thai cuisine. There are many varieties of chili peppers in Thailand, and thirty accessions were collected at Rajamangala University of Technology Lanna, Lampang, Thailand. To effectively manage any germplasm it is essential to know the diversity and relationships among members. Thirty-six Inter Simple Sequence Repeat (ISSRs) DNA markers were used to analyze the germplasm. Total of 335 polymorphic bands was obtained giving the average of 9.3 alleles per marker. Unweighted pair-group mean arithmetic method (UPGMA) clustering of data using NTSYS-pc software indicated that the accessions showed varied levels of genetic similarity ranging from 0.57-1.00 similarity coefficient index indicating significant levels of variation. At SM coefficient of 0.81, the germplasm was separated into four groups. Phenotypic variation was discussed in context of phylogenetic tree clustering.Keywords: diversity, germplasm, Chili pepper, ISSR
Procedia PDF Downloads 15328139 Study of Mobile Game Addiction Using Electroencephalography Data Analysis
Authors: Arsalan Ansari, Muhammad Dawood Idrees, Maria Hafeez
Abstract:
Use of mobile phones has been increasing considerably over the past decade. Currently, it is one of the main sources of communication and information. Initially, mobile phones were limited to calls and messages, but with the advent of new technology smart phones were being used for many other purposes including video games. Despite of positive outcomes, addiction to video games on mobile phone has become a leading cause of psychological and physiological problems among many people. Several researchers examined the different aspects of behavior addiction with the use of different scales. Objective of this study is to examine any distinction between mobile game addicted and non-addicted players with the use of electroencephalography (EEG), based upon psycho-physiological indicators. The mobile players were asked to play a mobile game and EEG signals were recorded by BIOPAC equipment with AcqKnowledge as data acquisition software. Electrodes were places, following the 10-20 system. EEG was recorded at sampling rate of 200 samples/sec (12,000samples/min). EEG recordings were obtained from the frontal (Fp1, Fp2), parietal (P3, P4), and occipital (O1, O2) lobes of the brain. The frontal lobe is associated with behavioral control, personality, and emotions. The parietal lobe is involved in perception, understanding logic, and arithmetic. The occipital lobe plays a role in visual tasks. For this study, a 60 second time window was chosen for analysis. Preliminary analysis of the signals was carried out with Acqknowledge software of BIOPAC Systems. From the survey based on CGS manual study 2010, it was concluded that five participants out of fifteen were in addictive category. This was used as prior information to group the addicted and non-addicted by physiological analysis. Statistical analysis showed that by applying clustering analysis technique authors were able to categorize the addicted and non-addicted players specifically on theta frequency range of occipital area.Keywords: mobile game, addiction, psycho-physiology, EEG analysis
Procedia PDF Downloads 16928138 Semantic Based Analysis in Complaint Management System with Analytics
Authors: Francis Alterado, Jennifer Enriquez
Abstract:
Semantic Based Analysis in Complaint Management System with Analytics is an enhanced tool of providing complaints by the clients as well as a mechanism for Palawan Polytechnic College to gather, process, and monitor status of these complaints. The study has a mobile application that serves as a remote facility of communication between the students and the school management on the issues encountered by the student and the solution of every complaint received. In processing the complaints, text mining and clustering algorithms were utilized. Every module of the systems was tested and based on the results; these are 100% free from error before integration was done. A system testing was also done by checking the expected functionality of the system which was 100% functional. The system was tested by 10 students by forwarding complaints to 10 departments. Based on results, the students were able to submit complaints, the system was able to process accordingly by identifying to which department the complaints are intended, and the concerned department was able to give feedback on the complaint received to the student. With this, the system gained 4.7 rating which means Excellent.Keywords: technology adoption, emerging technology, issues challenges, algorithm, text mining, mobile technology
Procedia PDF Downloads 20328137 Institutional Segmantation and Country Clustering: Implications for Multinational Enterprises Over Standardized Management
Authors: Jung-Hoon Han, Jooyoung Kwak
Abstract:
Distances between cultures, institutions are gaining academic attention once again since the classical debate on the validity of globalization. Despite the incessant efforts to define international segments with various concepts, no significant attempts have been made considering the institutional dimensions. Resource-based theory and institutional theory provides useful insights in assessing market environment and understanding when and how MNEs loose or gain advantages. This study consists of two parts: identifying institutional clusters and predicting the effect of MNEs’ origin on the applicability of competitive advantages. MNEs in one country cluster are expected to use similar management systems.Keywords: institutional theory, resource-based theory, institutional environment, cultural dimensions, cluster analysis, standardized management
Procedia PDF Downloads 49128136 High-Risk Gene Variant Profiling Models Ethnic Disparities in Diabetes Vulnerability
Authors: Jianhua Zhang, Weiping Chen, Guanjie Chen, Jason Flannick, Emma Fikse, Glenda Smerin, Yanqin Yang, Yulong Li, John A. Hanover, William F. Simonds
Abstract:
Ethnic disparities in many diseases are well recognized and reflect the consequences of genetic, behavior, and environmental factors. However, direct scientific evidence connecting the ethnic genetic variations and the disease disparities has been elusive, which may have led to the ethnic inequalities in large scale genetic studies. Through the genome-wide analysis of data representing 185,934 subjects, including 14,955 from our own studies of the African America Diabetes Mellitus, we discovered sets of genetic variants either unique to or conserved in all ethnicities. We further developed a quantitative gene function-based high-risk variant index (hrVI) of 20,428 genes to establish profiles that strongly correlate with the subjects' self-identified ethnicities. With respect to the ability to detect human essential and pathogenic genes, the hrVI analysis method is both comparable with and complementary to the well-known genetic analysis methods, pLI and VIRlof. Application of the ethnicity-specific hrVI analysis to the type 2 diabetes mellitus (T2DM) national repository, containing 20,791 cases and 24,440 controls, identified 114 candidate T2DM-associated genes, 8.8-fold greater than that of ethnicity-blind analysis. All the genes identified are defined as either pathogenic or likely-pathogenic in ClinVar database, with 33.3% diabetes-associated and 54.4% obesity-associated genes. These results demonstrate the utility of hrVI analysis and provide the first genetic evidence by clustering patterns of how genetic variations among ethnicities may impede the discovery of diabetes and foreseeably other disease-associated genes.Keywords: diabetes-associated genes, ethnic health disparities, high-risk variant index, hrVI, T2DM
Procedia PDF Downloads 14028135 On Musical Information Geometry with Applications to Sonified Image Analysis
Authors: Shannon Steinmetz, Ellen Gethner
Abstract:
In this paper, a theoretical foundation is developed for patterned segmentation of audio using the geometry of music and statistical manifold. We demonstrate image content clustering using conic space sonification. The algorithm takes a geodesic curve as a model estimator of the three-parameter Gamma distribution. The random variable is parameterized by musical centricity and centric velocity. Model parameters predict audio segmentation in the form of duration and frame count based on the likelihood of musical geometry transition. We provide an example using a database of randomly selected images, resulting in statistically significant clusters of similar image content.Keywords: sonification, musical information geometry, image, content extraction, automated quantification, audio segmentation, pattern recognition
Procedia PDF Downloads 24728134 Scientific Linux Cluster for BIG-DATA Analysis (SLBD): A Case of Fayoum University
Authors: Hassan S. Hussein, Rania A. Abul Seoud, Amr M. Refaat
Abstract:
Scientific researchers face in the analysis of very large data sets that is increasing noticeable rate in today’s and tomorrow’s technologies. Hadoop and Spark are types of software that developed frameworks. Hadoop framework is suitable for many Different hardware platforms. In this research, a scientific Linux cluster for Big Data analysis (SLBD) is presented. SLBD runs open source software with large computational capacity and high performance cluster infrastructure. SLBD composed of one cluster contains identical, commodity-grade computers interconnected via a small LAN. SLBD consists of a fast switch and Gigabit-Ethernet card which connect four (nodes). Cloudera Manager is used to configure and manage an Apache Hadoop stack. Hadoop is a framework allows storing and processing big data across the cluster by using MapReduce algorithm. MapReduce algorithm divides the task into smaller tasks which to be assigned to the network nodes. Algorithm then collects the results and form the final result dataset. SLBD clustering system allows fast and efficient processing of large amount of data resulting from different applications. SLBD also provides high performance, high throughput, high availability, expandability and cluster scalability.Keywords: big data platforms, cloudera manager, Hadoop, MapReduce
Procedia PDF Downloads 36328133 Designing Floor Planning in 2D and 3D with an Efficient Topological Structure
Authors: V. Nagammai
Abstract:
Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining thousands of transistors into a single chip. Development of technology increases the complexity in IC manufacturing which may vary the power consumption, increase the size and latency period. Topology defines a number of connections between network. In this project, NoC topology is generated using atlas tool which will increase performance in turn determination of constraints are effective. The routing is performed by XY routing algorithm and wormhole flow control. In NoC topology generation, the value of power, area and latency are predetermined. In previous work, placement, routing and shortest path evaluation is performed using an algorithm called floor planning with cluster reconstruction and path allocation algorithm (FCRPA) with the account of 4 3x3 switch, 6 4x4 switch, and 2 5x5 switches. The usage of the 4x4 and 5x5 switch will increase the power consumption and area of the block. In order to avoid the problem, this paper has used one 8x8 switch and 4 3x3 switches. This paper uses IPRCA which of 3 steps they are placement, clustering, and shortest path evaluation. The placement is performed using min – cut placement and clustering are performed using an algorithm called cluster generation. The shortest path is evaluated using an algorithm called Dijkstra's algorithm. The power consumption of each block is determined. The experimental result shows that the area, power, and wire length improved simultaneously.Keywords: application specific noc, b* tree representation, floor planning, t tree representation
Procedia PDF Downloads 39628132 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data
Authors: Haifa Ben Saber, Mourad Elloumi
Abstract:
In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.
Procedia PDF Downloads 37428131 Short Association Bundle Atlas for Lateralization Studies from dMRI Data
Authors: C. Román, M. Guevara, P. Salas, D. Duclap, J. Houenou, C. Poupon, J. F. Mangin, P. Guevara
Abstract:
Diffusion Magnetic Resonance Imaging (dMRI) allows the non-invasive study of human brain white matter. From diffusion data, it is possible to reconstruct fiber trajectories using tractography algorithms. Our previous work consists in an automatic method for the identification of short association bundles of the superficial white matter (SWM), based on a whole brain inter-subject hierarchical clustering applied to a HARDI database. The method finds representative clusters of similar fibers, belonging to a group of subjects, according to a distance measure between fibers, using a non-linear registration (DTI-TK). The algorithm performs an automatic labeling based on the anatomy, defined by a cortex mesh parcelated with FreeSurfer software. The clustering was applied to two independent groups of 37 subjects. The clusters resulting from both groups were compared using a restrictive threshold of mean distance between each pair of bundles from different groups, in order to keep reproducible connections. In the left hemisphere, 48 reproducible bundles were found, while 43 bundles where found in the right hemisphere. An inter-hemispheric bundle correspondence was then applied. The symmetric horizontal reflection of the right bundles was calculated, in order to obtain the position of them in the left hemisphere. Next, the intersection between similar bundles was calculated. The pairs of bundles with a fiber intersection percentage higher than 50% were considered similar. The similar bundles between both hemispheres were fused and symmetrized. We obtained 30 common bundles between hemispheres. An atlas was created with the resulting bundles and used to segment 78 new subjects from another HARDI database, using a distance threshold between 6-8 mm according to the bundle length. Finally, a laterality index was calculated based on the bundle volume. Seven bundles of the atlas presented right laterality (IP_SP_1i, LO_LO_1i, Op_Tr_0i, PoC_PoC_0i, PoC_PreC_2i, PreC_SM_0i, y RoMF_RoMF_0i) and one presented left laterality (IP_SP_2i), there is no tendency of lateralization according to the brain region. Many factors can affect the results, like tractography artifacts, subject registration, and bundle segmentation. Further studies are necessary in order to establish the influence of these factors and evaluate SWM laterality.Keywords: dMRI, hierarchical clustering, lateralization index, tractography
Procedia PDF Downloads 33328130 Investigation and Analysis of Residential Building Energy End-Use Profile in Hot and Humid Area with Reference to Zhuhai City in China
Authors: Qingqing Feng, S. Thomas Ng, Frank Xu
Abstract:
Energy consumption in domestic sector has been increasing rapidly in China all along these years. Confronted with environmental challenges, the international society has made a concerted effort by setting the Paris Agreement, the Sustainable Development Goals, and the New Urban Agenda. Thus it’s very important for China to put forward reasonable countermeasures to boost building energy conservation which necessitates looking into the actuality of residential energy end-use profile and its influence factors. In this study, questionnaire surveys have been conducted in Zhuhai city in China, a typical city in hot summer warm winter climate zone. The data solicited mainly include the occupancy schedule, building’s information, residents’ information, household energy uses, the type, quantity and use patterns of appliances and occupants’ satisfaction. Over 200 valid samples have been collected through face-to-face interviews. Descriptive analysis, clustering analysis, correlation analysis and sensitivity analysis were then conducted on the dataset to understand the energy end-use profile. The findings identify: 1) several typical clusters of occupancy patterns and appliances utilization patterns; 2) the top three sensitive factors influencing energy consumption; 3) the correlations between satisfaction and energy consumption. For China with many different climates zones, it’s difficult to find a silver bullet on energy conservation. The aim of this paper is to provide a theoretical basis for multi-stakeholders including policy makers, residents, and academic communities to formulate reasonable energy saving blueprints for hot and humid urban residential buildings in China.Keywords: residential building, energy end-use profile, questionnaire survey, sustainability
Procedia PDF Downloads 13728129 Analysis Of Non-uniform Characteristics Of Small Underwater Targets Based On Clustering
Authors: Tianyang Xu
Abstract:
Small underwater targets generally have a non-centrosymmetric geometry, and the acoustic scattering field of the target has spatial inhomogeneity under active sonar detection conditions. In view of the above problems, this paper takes the hemispherical cylindrical shell as the research object, and considers the angle continuity implied in the echo characteristics, and proposes a cluster-driven research method for the non-uniform characteristics of target echo angle. First, the target echo features are extracted, and feature vectors are constructed. Secondly, the t-SNE algorithm is used to improve the internal connection of the feature vector in the low-dimensional feature space and to construct the visual feature space. Finally, the implicit angular relationship between echo features is extracted under unsupervised condition by cluster analysis. The reconstruction results of the local geometric structure of the target corresponding to different categories show that the method can effectively divide the angle interval of the local structure of the target according to the natural acoustic scattering characteristics of the target.Keywords: underwater target;, non-uniform characteristics;, cluster-driven method;, acoustic scattering characteristics
Procedia PDF Downloads 14028128 A Robust Spatial Feature Extraction Method for Facial Expression Recognition
Authors: H. G. C. P. Dinesh, G. Tharshini, M. P. B. Ekanayake, G. M. R. I. Godaliyadda
Abstract:
This paper presents a new spatial feature extraction method based on principle component analysis (PCA) and Fisher Discernment Analysis (FDA) for facial expression recognition. It not only extracts reliable features for classification, but also reduces the feature space dimensions of pattern samples. In this method, first each gray scale image is considered in its entirety as the measurement matrix. Then, principle components (PCs) of row vectors of this matrix and variance of these row vectors along PCs are estimated. Therefore, this method would ensure the preservation of spatial information of the facial image. Afterwards, by incorporating the spectral information of the eigen-filters derived from the PCs, a feature vector was constructed, for a given image. Finally, FDA was used to define a set of basis in a reduced dimension subspace such that the optimal clustering is achieved. The method of FDA defines an inter-class scatter matrix and intra-class scatter matrix to enhance the compactness of each cluster while maximizing the distance between cluster marginal points. In order to matching the test image with the training set, a cosine similarity based Bayesian classification was used. The proposed method was tested on the Cohn-Kanade database and JAFFE database. It was observed that the proposed method which incorporates spatial information to construct an optimal feature space outperforms the standard PCA and FDA based methods.Keywords: facial expression recognition, principle component analysis (PCA), fisher discernment analysis (FDA), eigen-filter, cosine similarity, bayesian classifier, f-measure
Procedia PDF Downloads 43228127 The Survey Research and Evaluation of Green Residential Building Based on the Improved Group Analytical Hierarchy Process Method in Yinchuan
Abstract:
Due to the economic downturn and the deterioration of the living environment, the development of residential buildings as high energy consuming building is gradually changing from “extensive” to green building in China. So, the evaluation system of green building is continuously improved, but the current evaluation work has the following problems: (1) There are differences in the cost of the actual investment and the purchasing power of residents, also construction target of green residential building is single and lacks multi-objective performance development. (2) Green building evaluation lacks regional characteristics and cannot reflect the different regional residents demand. (3) In the process of determining the criteria weight, the experts’ judgment matrix is difficult to meet the requirement of consistency. Therefore, to solve those problems, questionnaires which are about the green residential building for Ningxia area are distributed, and the results of questionnaires can feedback the purchasing power of residents and the acceptance of the green building cost. Secondly, combined with the geographical features of Ningxia minority areas, the evaluation criteria system of green residential building is constructed. Finally, using the improved group AHP method and the grey clustering method, the criteria weight is determined, and a real case is evaluated, which is located in Xing Qing district, Ningxia. A conclusion can be obtained that the professional evaluation for this project and good social recognition is basically the same.Keywords: evaluation, green residential building, grey clustering method, group AHP
Procedia PDF Downloads 39928126 Longevity of Soybean Seeds Submitted to Different Mechanized Harvesting Conditions
Authors: Rute Faria, Digo Moraes, Amanda Santos, Dione Morais, Maria Sartori
Abstract:
Seed vigor is a fundamental component for the good performance of the entire soybean production process. Seeds with mechanical damage at harvest time will be more susceptible to fungal and insect attack during storage, which will invariably reduce their vigor to the field, compromising uniformity and final stand performance. Harvesters, even the most modern ones, when not properly regulated or operated, can cause irreversible damages to the seeds, compromising even their commercialization. Therefore, the control of an efficient harvest is necessary in order to guarantee a good quality final product. In this work, the damage caused by two different harvesters (one rented, and another one) was evaluated, traveling in two speeds (4 and 8 km / h). The design was completely randomized in 2 x 2 factorial, with four replications. To evaluate the physiological quality seed germination and vigor tests were carried out over a period of six months. A multivariate analysis of Principal Components (PCA) and clustering allowed us to verify that the leased machine had better performance in the incidence of immediate damages in the seeds, but after a storage period of 6 months the vigor of these seeds reduced more than own machine evidencing that such a machine would bring more damages to the seeds.Keywords: Glycine max (L.), cluster analysis, PCA, vigor
Procedia PDF Downloads 26328125 Global Low Carbon Transitions in the Power Sector: A Machine Learning Archetypical Clustering Approach
Authors: Abdullah Alotaiq, David Wallom, Malcolm McCulloch
Abstract:
This study presents an archetype-based approach to designing effective strategies for low-carbon transitions in the power sector. To achieve global energy transition goals, a renewable energy transition is critical, and understanding diverse energy landscapes across different countries is essential to design effective renewable energy policies and strategies. Using a clustering approach, this study identifies 12 energy archetypes based on the electricity mix, socio-economic indicators, and renewable energy contribution potential of 187 UN countries. Each archetype is characterized by distinct challenges and opportunities, ranging from high dependence on fossil fuels to low electricity access, low economic growth, and insufficient contribution potential of renewables. Archetype A, for instance, consists of countries with low electricity access, high poverty rates, and limited power infrastructure, while Archetype J comprises developed countries with high electricity demand and installed renewables. The study findings have significant implications for renewable energy policymaking and investment decisions, with policymakers and investors able to use the archetype approach to identify suitable renewable energy policies and measures and assess renewable energy potential and risks. Overall, the archetype approach provides a comprehensive framework for understanding diverse energy landscapes and accelerating decarbonisation of the power sector.Keywords: fossil fuels, power plants, energy transition, renewable energy, archetypes
Procedia PDF Downloads 5828124 Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping
Authors: Endrick Barnacin, Jean-Luc Henry, Jack Molinié, Jimmy Nagau, Hélène Delatte, Gérard Lebreton
Abstract:
Palynology is a field of interest for many disciplines. It has multiple applications such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time-consuming task that requires the intervention of experts in the field, which is becoming increasingly rare due to economic and social conditions. So, the automation of this task is a necessity. Pollen slides analysis is mainly a visual process as it is carried out with the naked eye. That is the reason why a primary method to automate palynology is the use of digital image processing. This method presents the lowest cost and has relatively good accuracy in pollen retrieval. In this work, we propose a system combining recognition and grouping of pollen. It consists of using a Logistic Model Tree to classify pollen already known by the proposed system while detecting any unknown species. Then, the unknown pollen species are divided using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.Keywords: pollen recognition, logistic model tree, expectation-maximization, local binary pattern
Procedia PDF Downloads 18528123 Recognition and Counting Algorithm for Sub-Regional Objects in a Handwritten Image through Image Sets
Authors: Kothuri Sriraman, Mattupalli Komal Teja
Abstract:
In this paper, a novel algorithm is proposed for the recognition of hulls in a hand written images that might be irregular or digit or character shape. Identification of objects and internal objects is quite difficult to extract, when the structure of the image is having bulk of clusters. The estimation results are easily obtained while going through identifying the sub-regional objects by using the SASK algorithm. Focusing mainly to recognize the number of internal objects exist in a given image, so as it is shadow-free and error-free. The hard clustering and density clustering process of obtained image rough set is used to recognize the differentiated internal objects, if any. In order to find out the internal hull regions it involves three steps pre-processing, Boundary Extraction and finally, apply the Hull Detection system. By detecting the sub-regional hulls it can increase the machine learning capability in detection of characters and it can also be extend in order to get the hull recognition even in irregular shape objects like wise black holes in the space exploration with their intensities. Layered hulls are those having the structured layers inside while it is useful in the Military Services and Traffic to identify the number of vehicles or persons. This proposed SASK algorithm is helpful in making of that kind of identifying the regions and can useful in undergo for the decision process (to clear the traffic, to identify the number of persons in the opponent’s in the war).Keywords: chain code, Hull regions, Hough transform, Hull recognition, Layered Outline Extraction, SASK algorithm
Procedia PDF Downloads 35328122 Human Behavior Modeling in Video Surveillance of Conference Halls
Authors: Nour Charara, Hussein Charara, Omar Abou Khaled, Hani Abdallah, Elena Mugellini
Abstract:
In this paper, we present a human behavior modeling approach in videos scenes. This approach is used to model the normal behaviors in the conference halls. We exploited the Probabilistic Latent Semantic Analysis technique (PLSA), using the 'Bag-of-Terms' paradigm, as a tool for exploring video data to learn the model by grouping similar activities. Our term vocabulary consists of 3D spatio-temporal patch groups assigned by the direction of motion. Our video representation ensures the spatial information, the object trajectory, and the motion. The main importance of this approach is that it can be adapted to detect abnormal behaviors in order to ensure and enhance human security.Keywords: activity modeling, clustering, PLSA, video representation
Procedia PDF Downloads 39728121 Using Two-Mode Network to Access the Connections of Film Festivals
Authors: Qiankun Zhong
Abstract:
In a global cultural context, film festival awards become authorities to define the aesthetic value of films. To study which genres and producing countries are valued by different film festivals and how those evaluations interact with each other, this research explored the interactions between the film festivals through their selection of movies and the factors that lead to the tendency of film festivals to nominate the same movies. To do this, the author employed a two-mode network on the movies that won the highest awards at five international film festivals with the highest attendance in the past ten years (the Venice Film Festival, the Cannes Film Festival, the Toronto International Film Festival, Sundance Film Festival, and the Berlin International Film Festival) and the film festivals that nominated those movies. The title, genre, producing country and language of 50 movies, and the range (regional, national or international) and organizing country or area of 129 film festivals were collected. These created networks connected by nominating the same films and awarding the same movies. The author then assessed the density and centrality of these networks to answer the question: What are the film festivals that tend to have more shared values with other festivals? Based on the Eigenvector centrality of the two-mode network, Palm Springs, Robert Festival, Toronto, Chicago, and San Sebastian are the festivals that tend to nominate commonly appreciated movies. In contrast, Black Movie Film Festival has the unique value of generally not sharing nominations with other film festivals. A homophily test was applied to access the clustering effects of film and film festivals. The result showed that movie genres (E-I index=0.55) and geographic location (E-I index=0.35) are possible indicators of film festival clustering. A blockmodel was also created to examine the structural roles of the film festivals and their meaning in real-world context. By analyzing the same blocks with film festival attributes, it was identified that film festivals either organized in the same area, with the same history, or with the same attitude on independent films would occupy the same structural roles in the network. Through the interpretation of the blocks, language was identified as an indicator that contributes to the role position of a film festival. Comparing the result of blockmodeling in the different periods, it is seen that international film festivals contrast with the Hollywood industry’s dominant value. The structural role dynamics provide evidence for a multi-value film festival network.Keywords: film festivals, film studies, media industry studies, network analysis
Procedia PDF Downloads 32128120 Data-Driven Market Segmentation in Hospitality Using Unsupervised Machine Learning
Authors: Rik van Leeuwen, Ger Koole
Abstract:
Within hospitality, marketing departments use segmentation to create tailored strategies to ensure personalized marketing. This study provides a data-driven approach by segmenting guest profiles via hierarchical clustering based on an extensive set of features. The industry requires understandable outcomes that contribute to adaptability for marketing departments to make data-driven decisions and ultimately driving profit. A marketing department specified a business question that guides the unsupervised machine learning algorithm. Features of guests change over time; therefore, there is a probability that guests transition from one segment to another. The purpose of the study is to provide steps in the process from raw data to actionable insights, which serve as a guideline for how hospitality companies can adopt an algorithmic approach.Keywords: hierarchical cluster analysis, hospitality, market segmentation
Procedia PDF Downloads 11128119 A Review of Blog Assisted Language Learning Research: Based on Bibliometric Analysis
Authors: Bo Ning Lyu
Abstract:
Blog assisted language learning (BALL) has been trialed by educators in language teaching with the development of Web 2.0 technology. Understanding the development trend of related research helps grasp the whole picture of the use of blog in language education. This paper reviews current research related to blogs enhanced language learning based on bibliometric analysis, aiming at (1) identifying the most frequently used keywords and their co-occurrence, (2) clustering research topics based on co-citation analysis, (3) finding the most frequently cited studies and authors and (4) constructing the co-authorship network. 330 articles were searched out in Web of Science, 225 peer-viewed journal papers were finally collected according to selection criteria. Bibexcel and VOSviewer were used to visualize the results. Studies reviewed were published between 2005 to 2016, most in the year of 2014 and 2015 (35 papers respectively). The top 10 most frequently appeared keywords are learning, language, blog, teaching, writing, social, web 2.0, technology, English, communication. 8 research themes could be clustered by co-citation analysis: blogging for collaborative learning, blogging for writing skills, blogging in higher education, feedback via blogs, blogging for self-regulated learning, implementation of using blogs in classroom, comparative studies and audio/video blogs. Early studies focused on the introduction of the classroom implementation while recent studies moved to the audio/video blogs from their traditional usage. By reviewing the research related to BALL quantitatively and objectively, this paper reveals the evolution and development trends as well as identifies influential research, helping researchers and educators quickly grasp this field overall and conducting further studies.Keywords: blog, bibliometric analysis, language learning, literature review
Procedia PDF Downloads 212