Search results for: k-means clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 591

Search results for: k-means clustering

81 Lexical Semantic Analysis to Support Ontology Modeling of Maintenance Activities– Case Study of Offshore Riser Integrity

Authors: Vahid Ebrahimipour

Abstract:

Word representation and context meaning of text-based documents play an essential role in knowledge modeling. Business procedures written in natural language are meant to store technical and engineering information, management decision and operation experience during the production system life cycle. Context meaning representation is highly dependent upon word sense, lexical relativity, and sematic features of the argument. This paper proposes a method for lexical semantic analysis and context meaning representation of maintenance activity in a mass production system. Our approach constructs a straightforward lexical semantic approach to analyze facilitates semantic and syntactic features of context structure of maintenance report to facilitate translation, interpretation, and conversion of human-readable interpretation into computer-readable representation and understandable with less heterogeneity and ambiguity. The methodology will enable users to obtain a representation format that maximizes shareability and accessibility for multi-purpose usage. It provides a contextualized structure to obtain a generic context model that can be utilized during the system life cycle. At first, it employs a co-occurrence-based clustering framework to recognize a group of highly frequent contextual features that correspond to a maintenance report text. Then the keywords are identified for syntactic and semantic extraction analysis. The analysis exercises causality-driven logic of keywords’ senses to divulge the structural and meaning dependency relationships between the words in a context. The output is a word contextualized representation of maintenance activity accommodating computer-based representation and inference using OWL/RDF.

Keywords: lexical semantic analysis, metadata modeling, contextual meaning extraction, ontology modeling, knowledge representation

Procedia PDF Downloads 77
80 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's Single-cell RNA-seq data related to the peripheral blood mononuclear cells (PBMC) was extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell-type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices in the network, which includes the encoder and classifier components, were multiplied, and focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 29
79 Improving Efficiencies of Planting Configurations on Draft Environment of Town Square: The Case Study of Taichung City Hall in Taichung, Taiwan

Authors: Yu-Wen Huang, Yi-Cheng Chiang

Abstract:

With urban development, lots of buildings are built around the city. The buildings always affect the urban wind environment. The accelerative situation of wind caused of buildings often makes pedestrians uncomfortable, even causes the accidents and dangers. Factors influencing pedestrian level wind including atmospheric boundary layer, wind direction, wind velocity, planting, building volume, geometric shape of the buildings and adjacent interference effects, etc. Planting has many functions including scraping and slowing urban heat island effect, creating a good visual landscape, increasing urban green area and improve pedestrian level wind. On the other hand, urban square is an important space element supporting the entrance to buildings, city landmarks, and activity collections, etc. The appropriateness of urban square environment usually dominates its success. This research focuses on the effect of tree-planting on the wind environment of urban square. This research studied the square belt of Taichung City Hall. Taichung City Hall is a cuboid building with a large mass opening. The square belt connects the front square, the central opening and the back square. There is often wind draft on the square belt. This phenomenon decreases the activities on the squares. This research applies tree-planting to improve the wind environment and evaluate the effects of two types of planting configuration. The Computational Fluid Dynamics (CFD) simulation analysis and extensive field measurements are applied to explore the improve efficiency of planting configuration on wind environment. This research compares efficiencies of different kinds of planting configuration, including the clustering array configuration and the dispersion, and evaluates the efficiencies by the SET*.

Keywords: micro-climate, wind environment, planting configuration, comfortableness, computational fluid dynamics (CFD)

Procedia PDF Downloads 271
78 Analysis of the Role of Population Ageing on Crosstown Roads' Traffic Accidents Using Latent Class Clustering

Authors: N. Casado-Sanz, B. Guirao

Abstract:

The population aged 65 and over is projected to double in the coming decades. Due to this increase, driver population is expected to grow and in the near future, all countries will be faced with population aging of varying intensity and in unique time frames. This is the greatest challenge facing industrialized nations and due to this fact, the study of the relationships of dependency between population aging and road safety is becoming increasingly relevant. Although the deterioration of driving skills in the elderly has been analyzed in depth, to our knowledge few research studies have focused on the road infrastructure and the mobility of this particular group of users. In Spain, crosstown roads have one of the highest fatality rates. These rural routes have a higher percentage of elderly people who are more dependent on driving due to the absence or limitations of urban public transportation. Analysing road safety in these routes is very complex because of the variety of the features, the dispersion of the data and the complete lack of related literature. The objective of this paper is to identify key factors that cause traffic accidents. The individuals under study were the accidents with killed or seriously injured in Spanish crosstown roads during the period 2006-2015. Latent cluster analysis was applied as a preliminary tool for segmentation of accidents, considering population aging as the main input among other socioeconomic indicators. Subsequently, a linear regression analysis was carried out to estimate the degree of dependence between the accident rate and the variables that define each group. The results show that segmenting the data is very interesting and provides further information. Additionally, the results revealed the clear influence of the aging variable in the clusters obtained. Other variables related to infrastructure and mobility levels, such as the crosstown roads layout and the traffic intensity aimed to be one of the key factors in the causality of road accidents.

Keywords: cluster analysis, population ageing, rural roads, road safety

Procedia PDF Downloads 81
77 Developing Indicators in System Mapping Process Through Science-Based Visual Tools

Authors: Cristian Matti, Valerie Fowles, Eva Enyedi, Piotr Pogorzelski

Abstract:

The system mapping process can be defined as a knowledge service where a team of facilitators, experts and practitioners facilitate a guided conversation, enable the exchange of information and support an iterative curation process. System mapping processes rely on science-based tools to introduce and simplify a variety of components and concepts of socio-technical systems through metaphors while facilitating an interactive dialogue process to enable the design of co-created maps. System maps work then as “artifacts” to provide information and focus the conversation into specific areas around the defined challenge and related decision-making process. Knowledge management facilitates the curation of that data gathered during the system mapping sessions through practices of documentation and subsequent knowledge co-production for which common practices from data science are applied to identify new patterns, hidden insights, recurrent loops and unexpected elements. This study presents empirical evidence on the application of these techniques to explore mechanisms by which visual tools provide guiding principles to portray system components, key variables and types of data through the lens of climate change. In addition, data science facilitates the structuring of elements that allow the analysis of layers of information through affinity and clustering analysis and, therefore, develop simple indicators for supporting the decision-making process. This paper addresses methodological and empirical elements on the horizontal learning process that integrate system mapping through visual tools, interpretation, cognitive transformation and analysis. The process is designed to introduce practitioners to simple iterative and inclusive processes that create actionable knowledge and enable a shared understanding of the system in which they are embedded.

Keywords: indicators, knowledge management, system mapping, visual tools

Procedia PDF Downloads 160
76 Analysis of the Impact of Suez Canal on the Robustness of Global Shipping Networks

Authors: Zimu Li, Zheng Wan

Abstract:

The Suez Canal plays an important role in global shipping networks and is one of the most frequently used waterways in the world. The 2021 canal obstruction by ship Ever Given in March 2021, however, completed blocked the Suez Canal for a week and caused significant disruption to world trade. Therefore, it is very important to quantitatively analyze the impact of the accident on the robustness of the global shipping network. However, the current research on maritime transportation networks is usually limited to local or small-scale networks in a certain region. Based on the complex network theory, this study establishes a global shipping complex network covering 2713 nodes and 137830 edges by using the real trajectory data of the global marine transport ship automatic identification system in 2018. At the same time, two attack modes, deliberate (Suez Canal Blocking) and random, are defined to calculate the changes in network node degree, eccentricity, clustering coefficient, network density, network isolated nodes, betweenness centrality, and closeness centrality under the two attack modes, and quantitatively analyze the actual impact of Suez Canal Blocking on the robustness of global shipping network. The results of the network robustness analysis show that Suez Canal blocking was more destructive to the shipping network than random attacks of the same scale. The network connectivity and accessibility decreased significantly, and the decline decreased with the distance between the port and the canal, showing the phenomenon of distance attenuation. This study further analyzes the impact of the blocking of the Suez Canal on Chinese ports and finds that the blocking of the Suez Canal significantly interferes withChina's shipping network and seriously affects China's normal trade activities. Finally, the impact of the global supply chain is analyzed, and it is found that blocking the canal will seriously damage the normal operation of the global supply chain.

Keywords: global shipping networks, ship AIS trajectory data, main channel, complex network, eigenvalue change

Procedia PDF Downloads 138
75 Economic Cost of Malaria: A Threat to Household Income in Nigeria

Authors: Nsikan Affiah, Kayode Osungbade, Williams Uzoma

Abstract:

Malaria remains one of the major killers of humans worldwide, threatening the lives of more than one-third of the world’s population. Some people refers it to; a disease of poverty because it contributes towards national poverty through its impact on foreign direct investment, tourism, labour productivity, and trade. At the micro level, it may cause poverty through spending on health care, income losses, and premature deaths. Unfortunately, malaria is a disease that affects both low-income household and its high-income counterpart, but low-income households are still at greater risk because significant part of the available monthly income is dedicated to various preventive and treatment measures. The objective of this study is to estimate direct and indirect cost of malaria treatment in households in a section of South-South Region (Akwa Ibom State) of Nigeria. A cross-sectional study of Six Hundred and Forty (640) heads of households or any adult representative of households in three local government areas of Akwa Ibom State, Nigeria from May 1-31, 2015 were ascertained through interviewer-administered questionnaire adapted from Nigerian Malaria Indicator Survey Report. The clustering technique was used to select 640 households with the help of Primary Health Care (PHC) house numbering system. Using exchange rate of 197 Naira/USD, result shows that direct cost of malaria treatment was 8,894.44 USD while the indirect cost of malaria treatment was 11,012.81 USD. Total cost of treatment made up of 44.7% direct cost and 55.3% indirect cost, with average direct cost of malaria treatment per household estimated at 20.6 USD and the average indirect cost of treatment per household estimated at 25.1 USD. Average total cost for each episode (888) of malaria was estimated at 22.4 USD. While at household level, the average total cost was estimated at 45.5 USD. From the average total cost, low-income households would spend 36% of monthly household income on treating malaria and the impact could be said to be catastrophic, compared to high-income households where only 1.2% of monthly household income is spent on malaria treatment. It could be concluded that the cost of malaria treatment is well beyond the means of households and given the reality of repeated bouts of malaria and its contribution to the impoverishment of households, there is a need for urgent action.

Keywords: direct cost, indirect cost, low income households, malaria

Procedia PDF Downloads 224
74 Bean in Turkey: Characterization, Inter Gene Pool Hybridization Events, Breeding, Utilizations

Authors: Faheem Shahzad Baloch, Muhammad Azhar Nadeem, Muhammad Amjad Nawaz, Ephrem Habyarimana, Gonul Comertpay, Tolga Karakoy, Rustu Hatipoglu, Mehmet Zahit Yeken, Vahdettin Ciftci

Abstract:

Turkey is considered a bridge between Europe, Asia, and Africa and possibly played an important role in the distribution of many crops including common bean. Hundreds of common bean landraces can be found in Turkey, particularly in farmers’ fields, and they consistently contribute to the overall production. To investigate the existing genetic diversity and hybridization events between the Andean and Mesoamerican gene pools in the Turkish common bean, 188 common bean accessions (182 landraces and 6 modern cultivars as controls) were collected from 19 different Turkish geographic regions. These accessions were characterized using phenotypic data (growth habit and seed weight), geographic provenance, 12557 high-quality whole-genome DArTseq markers, and 3767 novel DArTseq loci were also identified. The clustering algorithms resolved the Turkish common bean landrace germplasm into the two recognized gene pools, the Mesoamerican and Andean gene pools. Hybridization events were observed in both gene pools (14.36% of the accessions) but mostly in the Mesoamerican (7.97% of the accessions), and was low relative to previous European studies. The lower level of hybridization witnessed the existence of Turkish common bean germplasm in its original form as compared to Europe. Mesoamerican gene pool reflected a higher level of diversity, while the Andean gene pool was predominant (56.91% of the accessions), but genetically less diverse and phenotypically more pure, reflecting farmers greater preference for the Andean gene pool. We also found some genetically distinct landraces and overall, a meaningful level of genetic variability which can be used by the scientific community in breeding efforts to develop superior common bean strains.

Keywords: bean germplasm, DArTseq markers, genotyping by sequencing, Turkey, whole genome diversity

Procedia PDF Downloads 208
73 Associations between Sharing Bike Usage and Characteristics of Urban Street Built Environment in Wuhan, China

Authors: Miao Li, Mengyuan Xu

Abstract:

As a low-carbon travel mode, bicycling has drawn increasing political interest in the contemporary Chinese urban context, and the public sharing bikes have become the most popular ways of bike usage in China now. This research aims to explore the spatial-temporal relationship between sharing bike usage and different characteristics of the urban street built environment. In the research, street segments were used as the analytic unit of the street built environment defined by street intersections. The sharing bike usage data in the research include a total of 2.64 million samples that are the entire sharing bike distribution data recorded in two days in 2018 within a neighborhood of 185.4 hectares in the city of Wuhan, China. And these data are assigned to the 97 urban street segments in this area based on their geographic location. The built environment variables used in this research are categorized into three sections: 1) street design characteristics, such as street width, street greenery, types of bicycle lanes; 2) condition of other public transportation, such as the availability of metro station; 3) Street function characteristics that are described by the categories and density of the point of interest (POI) along the segments. Spatial Lag Models (SLM) were used in order to reveal the relationships of specific urban streets built environment characteristics and the likelihood of sharing bicycling usage in whole and different periods a day. The results show: 1) there is spatial autocorrelation among sharing bicycling usage of urban streets in case area in general, non-working day, working day and each period of a day, which presents a clustering pattern in the street space; 2) a statistically strong association between bike sharing usage and several different built environment characteristics such as POI density, types of bicycle lanes and street width; 3) the pattern that bike sharing usage is influenced by built environment characteristics depends on the period within a day. These findings could be useful for policymakers and urban designers to better understand the factors affecting bike sharing system and thus propose guidance and strategy for urban street planning and design in order to promote the use of sharing bikes.

Keywords: big data, sharing bike usage, spatial statistics, urban street built environment

Procedia PDF Downloads 111
72 Visualization of PM₂.₅ Time Series and Correlation Analysis of Cities in Bangladesh

Authors: Asif Zaman, Moinul Islam Zaber, Amin Ahsan Ali

Abstract:

In recent years of industrialization, the South Asian countries are being affected by air pollution due to a severe increase in fine particulate matter 2.5 (PM₂.₅). Among them, Bangladesh is one of the most polluting countries. In this paper, statistical analyses were conducted on the time series of PM₂.₅ from various districts in Bangladesh, mostly around Dhaka city. Research has been conducted on the dynamic interactions and relationships between PM₂.₅ concentrations in different zones. The study is conducted toward understanding the characteristics of PM₂.₅, such as spatial-temporal characterization, correlation of other contributors behind air pollution such as human activities, driving factors and environmental casualties. Clustering on the data gave an insight on the districts groups based on their AQI frequency as representative districts. Seasonality analysis on hourly and monthly frequency found higher concentration of fine particles in nighttime and winter season, respectively. Cross correlation analysis discovered a phenomenon of correlations among cities based on time-lagged series of air particle readings and visualization framework is developed for observing interaction in PM₂.₅ concentrations between cities. Significant time-lagged correlations were discovered between the PM₂.₅ time series in different city groups throughout the country by cross correlation analysis. Additionally, seasonal heatmaps depict that the pooled series correlations are less significant in warmer months, and among cities of greater geographic distance as well as time lag magnitude and direction of the best shifted correlated particulate matter time series among districts change seasonally. The geographic map visualization demonstrates spatial behaviour of air pollution among districts around Dhaka city and the significant effect of wind direction as the vital actor on correlated shifted time series. The visualization framework has multipurpose usage from gathering insight of general and seasonal air quality of Bangladesh to determining the pathway of regional transportation of air pollution.

Keywords: air quality, particles, cross correlation, seasonality

Procedia PDF Downloads 84
71 Genetic Diversity Analysis of Pearl Millet (Pennisetum glaucum [L. R. Rr.]) Accessions from Northwestern Nigeria

Authors: Sa’adu Mafara Abubakar, Muhammad Nuraddeen Danjuma, Adewole Tomiwa Adetunji, Richard Mundembe, Salisu Mohammed, Francis Bayo Lewu, Joseph I. Kiok

Abstract:

Pearl millet is the most drought tolerant of all domesticated cereals, is cultivated extensively to feed millions of people who mainly live in hash agroclimatic zones. It serves as a major source of food for more than 40 million smallholder farmers living in the marginal agricultural lands of Northern Nigeria. Pearl millet grain is more nutritious than other cereals like maize, is also a principal source of energy, protein, vitamins, and minerals for millions of poorest people in the regions where it is cultivated. Pearl millet has recorded relatively little research attention compared with other crops and no sufficient work has analyzed its genetic diversity in north-western Nigeria. Therefore, this study was undertaken with the objectives to analyze the genetic diversity of pearl millet accessions using SSR marker and to analyze the extent of evolutionary relationship among pearl millet accessions at the molecular level. The result of the present study confirmed diversity among accessions of pearl millet in the study area. Simple Sequence Repeats (SSR) markers were used for genetic analysis and evolutionary relationship of the accessions of pearl millet. To analyze the level of genetic diversity, 8 polymorphic SSR markers were used to screen 69 accessions collected based on three maturity periods. SSR markers result reveal relationships among the accessions in terms of genetic similarities, evolutionary and ancestral origin, it also reveals a total of 53 alleles recorded with 8 microsatellites and an average of 6.875 per microsatellite, the range was from 3 to 9 alleles in PSMP2248 and PSMP2080 respectively. Moreover, both the factorial analysis and the dendrogram of phylogeny tree grouping patterns and cluster analysis were almost in agreement with each other that diversity is not clustering according to geographical patterns but, according to similarity, the result showed maximum similarity among clusters with few numbers of accessions. It has been recommended that other molecular markers should be tested in the same study area.

Keywords: pearl millet, genetic diversity, simple sequence repeat (SSR)

Procedia PDF Downloads 219
70 Exploring the Unintended Consequences of Loyalty programs in the Gambling Sector

Authors: Violet Justine Mtonga, Cecilia Diaz

Abstract:

this paper explores the prevalence of loyalty programs in the UK gambling industry and their association with unintended consequences and harm amongst program members. The use of loyalty programs within the UK gambling industry has risen significantly with over 40 million cards in circulation. Some research suggests that as of 2013-2014, nearly 95% of UK consumers have at least one loyalty card with 78% being members of two or more programs, and the average household possesses ‘22 loyalty programs’, nearly half of which tend to be used actively. The core design of loyalty programs is to create a relational ‘win-win’ approach where value is jointly created between the parties involved through repetitive engagement. However, main concern about the diffusion of gambling organisations’ loyalty programs amongst consumers, might be the use by the organisations within the gambling industry to over influence customer engagement and potentially cause unintended harm. To help understand the complex phenomena of the diffusions and adaptation of the use of loyalty programs in the gambling industry, and the potential unintended outcomes, this study is theoretically underpinned by the social exchange theory of relationships entrenched in the processes of social exchanges of resources, rewards, and costs for long-term interactions and mutual benefits. Qualitative data were collected via in-depth interviews from 14 customers and 12 employees within the UK land-based gambling firms. Data were analysed using a combination of thematic and clustering analysis to help reveal and discover the emerging themes regarding the use of loyalty cards for gambling companies and exploration of subgroups within the sample. The study’s results indicate that there are different unintended consequences and harm of loyalty program engagement and usage such as maladaptive gambling behaviours, risk of compulsiveness, and loyalty programs promoting gambling from home. Furthermore, there is a strong indication of a rite of passage among loyalty program members. There is also strong evidence to support other unfavorable behaviors such as amplified gambling habits and risk-taking practices. Additionally, in pursuit of rewards, loyalty program incentives effectuate overconsumption and heighten expenditure. Overall, the primary findings of this study show that loyalty programs in the gambling industry should be designed with an ethical perspective and practice.

Keywords: gambling, loyalty programs, social exchange theory, unintended harm

Procedia PDF Downloads 62
69 Leveraging Natural Language Processing for Legal Artificial Intelligence: A Longformer Approach for Taiwanese Legal Cases

Authors: Hsin Lee, Hsuan Lee

Abstract:

Legal artificial intelligence (LegalAI) has been increasing applications within legal systems, propelled by advancements in natural language processing (NLP). Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. Most existing language models have difficulty understanding the long-distance dependencies between different structures. Another unique challenge is that while the Judiciary of Taiwan has released legal judgments from various levels of courts over the years, there remains a significant obstacle in the lack of labeled datasets. This deficiency makes it difficult to train models with strong generalization capabilities, as well as accurately evaluate model performance. To date, models in Taiwan have yet to be specifically trained on judgment data. Given these challenges, this research proposes a Longformer-based pre-trained language model explicitly devised for retrieving similar judgments in Taiwanese legal documents. This model is trained on a self-constructed dataset, which this research has independently labeled to measure judgment similarities, thereby addressing a void left by the lack of an existing labeled dataset for Taiwanese judgments. This research adopts strategies such as early stopping and gradient clipping to prevent overfitting and manage gradient explosion, respectively, thereby enhancing the model's performance. The model in this research is evaluated using both the dataset and the Average Entropy of Offense-charged Clustering (AEOC) metric, which utilizes the notion of similar case scenarios within the same type of legal cases. Our experimental results illustrate our model's significant advancements in handling similarity comparisons within extensive legal judgments. By enabling more efficient retrieval and analysis of legal case documents, our model holds the potential to facilitate legal research, aid legal decision-making, and contribute to the further development of LegalAI in Taiwan.

Keywords: legal artificial intelligence, computation and language, language model, Taiwanese legal cases

Procedia PDF Downloads 45
68 Career Guidance System Using Machine Learning

Authors: Mane Darbinyan, Lusine Hayrapetyan, Elen Matevosyan

Abstract:

Artificial Intelligence in Education (AIED) has been created to help students get ready for the workforce, and over the past 25 years, it has grown significantly, offering a variety of technologies to support academic, institutional, and administrative services. However, this is still challenging, especially considering the labor market's rapid change. While choosing a career, people face various obstacles because they do not take into consideration their own preferences, which might lead to many other problems like shifting jobs, work stress, occupational infirmity, reduced productivity, and manual error. Besides preferences, people should properly evaluate their technical and non-technical skills, as well as their personalities. Professional counseling has become a difficult undertaking for counselors due to the wide range of career choices brought on by changing technological trends. It is necessary to close this gap by utilizing technology that makes sophisticated predictions about a person's career goals based on their personality. Hence, there is a need to create an automated model that would help in decision-making based on user inputs. Improving career guidance can be achieved by embedding machine learning into the career consulting ecosystem. There are various systems of career guidance that work based on the same logic, such as the classification of applicants, matching applications with appropriate departments or jobs, making predictions, and providing suitable recommendations. Methodologies like KNN, Neural Networks, K-means clustering, D-Tree, and many other advanced algorithms are applied in the fields of data and compute some data, which is helpful to predict the right careers. Besides helping users with their career choice, these systems provide numerous opportunities which are very useful while making this hard decision. They help the candidate to recognize where he/she specifically lacks sufficient skills so that the candidate can improve those skills. They are also capable to offer an e-learning platform, taking into account the user's lack of knowledge. Furthermore, users can be provided with details on a particular job, such as the abilities required to excel in that industry.

Keywords: career guidance system, machine learning, career prediction, predictive decision, data mining, technical and non-technical skills

Procedia PDF Downloads 52
67 Career Guidance System Using Machine Learning

Authors: Mane Darbinyan, Lusine Hayrapetyan, Elen Matevosyan

Abstract:

Artificial Intelligence in Education (AIED) has been created to help students get ready for the workforce, and over the past 25 years, it has grown significantly, offering a variety of technologies to support academic, institutional, and administrative services. However, this is still challenging, especially considering the labor market's rapid change. While choosing a career, people face various obstacles because they do not take into consideration their own preferences, which might lead to many other problems like shifting jobs, work stress, occupational infirmity, reduced productivity, and manual error. Besides preferences, people should evaluate properly their technical and non-technical skills, as well as their personalities. Professional counseling has become a difficult undertaking for counselors due to the wide range of career choices brought on by changing technological trends. It is necessary to close this gap by utilizing technology that makes sophisticated predictions about a person's career goals based on their personality. Hence, there is a need to create an automated model that would help in decision-making based on user inputs. Improving career guidance can be achieved by embedding machine learning into the career consulting ecosystem. There are various systems of career guidance that work based on the same logic, such as the classification of applicants, matching applications with appropriate departments or jobs, making predictions, and providing suitable recommendations. Methodologies like KNN, neural networks, K-means clustering, D-Tree, and many other advanced algorithms are applied in the fields of data and compute some data, which is helpful to predict the right careers. Besides helping users with their career choice, these systems provide numerous opportunities which are very useful while making this hard decision. They help the candidate to recognize where he/she specifically lacks sufficient skills so that the candidate can improve those skills. They are also capable of offering an e-learning platform, taking into account the user's lack of knowledge. Furthermore, users can be provided with details on a particular job, such as the abilities required to excel in that industry.

Keywords: career guidance system, machine learning, career prediction, predictive decision, data mining, technical and non-technical skills

Procedia PDF Downloads 43
66 A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs Based on Machine Learning Algorithms

Authors: Kottaridi Klimentia, Demopoulos Vasilis, Sidiropoulos Anastasios, Ihara Diego, Nikolaidis Vasileios, Antonopoulos Dimitrios

Abstract:

Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus spp. that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as population, pathogenicity, and aflatoxinogenic capacity of the strains, topography, soil, and climate parameters of the fig orchards, are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high performance liquid chromatography (HPLC), and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive, and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a risk assessment tool for the contamination of aflatoxin on dried figs, based on the location and altitude of the fig orchards, the population of the fungus Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), the concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P, and trace of elements (B, Fe, Mn, Zn and Cu), by employing machine learning methods. In particular, our proposed method integrates three machine learning techniques, i.e., dimensionality reduction on the original dataset (principal component analysis), metric learning (Mahalanobis metric for clustering), and k-nearest neighbors learning algorithm (KNN), into an enhanced model, with mean performance equal to 85% by terms of the Pearson correlation coefficient (PCC) between observed and predicted values.

Keywords: aflatoxins, Aspergillus spp., dried figs, k-nearest neighbors, machine learning, prediction

Procedia PDF Downloads 147
65 Examining Social Connectivity through Email Network Analysis: Study of Librarians' Emailing Groups in Pakistan

Authors: Muhammad Arif Khan, Haroon Idrees, Imran Aziz, Sidra Mushtaq

Abstract:

Social platforms like online discussion and mailing groups are well aligned with academic as well as professional learning spaces. Professional communities are increasingly moving to online forums for sharing and capturing the intellectual abilities. This study investigated dynamics of social connectivity of yahoo mailing groups of Pakistani Library and Information Science (LIS) professionals using Graph Theory technique. Design/Methodology: Social Network Analysis is the increasingly concerned domain for scientists in identifying whether people grow together through online social interaction or, whether they just reflect connectivity. We have conducted a longitudinal study using Network Graph Theory technique to analyze the large data-set of email communication. The data was collected from three yahoo mailing groups using network analysis software over a period of six months i.e. January to June 2016. Findings of the network analysis were reviewed through focus group discussion with LIS experts and selected respondents of the study. Data were analyzed in Microsoft Excel and network diagrams were visualized using NodeXL and ORA-Net Scene package. Findings: Findings demonstrate that professionals and students exhibit intellectual growth the more they get tied within a network by interacting and participating in communication through online forums. The study reports on dynamics of the large network by visualizing the email correspondence among group members in a network consisting vertices (members) and edges (randomized correspondence). The model pair wise relationship between group members was illustrated to show characteristics, reasons, and strength of ties. Connectivity of nodes illustrated the frequency of communication among group members through examining node coupling, diffusion of networks, and node clustering has been demonstrated in-depth. Network analysis was found to be a useful technique in investigating the dynamics of the large network.

Keywords: emailing networks, network graph theory, online social platforms, yahoo mailing groups

Procedia PDF Downloads 208
64 Ischemic Stroke Detection in Computed Tomography Examinations

Authors: Allan F. F. Alves, Fernando A. Bacchim Neto, Guilherme Giacomini, Marcela de Oliveira, Ana L. M. Pavan, Maria E. D. Rosa, Diana R. Pina

Abstract:

Stroke is a worldwide concern, only in Brazil it accounts for 10% of all registered deaths. There are 2 stroke types, ischemic (87%) and hemorrhagic (13%). Early diagnosis is essential to avoid irreversible cerebral damage. Non-enhanced computed tomography (NECT) is one of the main diagnostic techniques used due to its wide availability and rapid diagnosis. Detection depends on the size and severity of lesions and the time spent between the first symptoms and examination. The Alberta Stroke Program Early CT Score (ASPECTS) is a subjective method that increases the detection rate. The aim of this work was to implement an image segmentation system to enhance ischemic stroke and to quantify the area of ischemic and hemorrhagic stroke lesions in CT scans. We evaluated 10 patients with NECT examinations diagnosed with ischemic stroke. Analyzes were performed in two axial slices, one at the level of the thalamus and basal ganglion and one adjacent to the top edge of the ganglionic structures with window width between 80 and 100 Hounsfield Units. We used different image processing techniques such as morphological filters, discrete wavelet transform and Fuzzy C-means clustering. Subjective analyzes were performed by a neuroradiologist according to the ASPECTS scale to quantify ischemic areas in the middle cerebral artery region. These subjective analysis results were compared with objective analyzes performed by the computational algorithm. Preliminary results indicate that the morphological filters actually improve the ischemic areas for subjective evaluations. The comparison in area of the ischemic region contoured by the neuroradiologist and the defined area by computational algorithm showed no deviations greater than 12% in any of the 10 examination tests. Although there is a tendency that the areas contoured by the neuroradiologist are smaller than those obtained by the algorithm. These results show the importance of a computer aided diagnosis software to assist neuroradiology decisions, especially in critical situations as the choice of treatment for ischemic stroke.

Keywords: ischemic stroke, image processing, CT scans, Fuzzy C-means

Procedia PDF Downloads 339
63 Recommendations for Data Quality Filtering of Opportunistic Species Occurrence Data

Authors: Camille Van Eupen, Dirk Maes, Marc Herremans, Kristijn R. R. Swinnen, Ben Somers, Stijn Luca

Abstract:

In ecology, species distribution models are commonly implemented to study species-environment relationships. These models increasingly rely on opportunistic citizen science data when high-quality species records collected through standardized recording protocols are unavailable. While these opportunistic data are abundant, uncertainty is usually high, e.g., due to observer effects or a lack of metadata. Data quality filtering is often used to reduce these types of uncertainty in an attempt to increase the value of studies relying on opportunistic data. However, filtering should not be performed blindly. In this study, recommendations are built for data quality filtering of opportunistic species occurrence data that are used as input for species distribution models. Using an extensive database of 5.7 million citizen science records from 255 species in Flanders, the impact on model performance was quantified by applying three data quality filters, and these results were linked to species traits. More specifically, presence records were filtered based on record attributes that provide information on the observation process or post-entry data validation, and changes in the area under the receiver operating characteristic (AUC), sensitivity, and specificity were analyzed using the Maxent algorithm with and without filtering. Controlling for sample size enabled us to study the combined impact of data quality filtering, i.e., the simultaneous impact of an increase in data quality and a decrease in sample size. Further, the variation among species in their response to data quality filtering was explored by clustering species based on four traits often related to data quality: commonness, popularity, difficulty, and body size. Findings show that model performance is affected by i) the quality of the filtered data, ii) the proportional reduction in sample size caused by filtering and the remaining absolute sample size, and iii) a species ‘quality profile’, resulting from a species classification based on the four traits related to data quality. The findings resulted in recommendations on when and how to filter volunteer generated and opportunistically collected data. This study confirms that correctly processed citizen science data can make a valuable contribution to ecological research and species conservation.

Keywords: citizen science, data quality filtering, species distribution models, trait profiles

Procedia PDF Downloads 167
62 An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Authors: Ben Soltane Cheima, Ittansa Yonas Kelbesa

Abstract:

Speaker Identification (SI) is the task of establishing identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still need of improvement. In this paper, a Closed-Set Tex-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficient (MFCC) as feature extraction and suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with Expectation Maximization algorithm (EM) for speaker modeling. The use of Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. Also investigation of Linde-Buzo-Gray (LBG) clustering algorithm for initialization of GMM, for estimating the underlying parameters, in the EM step improved the convergence rate and systems performance. It also uses relative index as confidence measures in case of contradiction in identification process by GMM and VQ as well. Simulation results carried out on voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.

Keywords: feature extraction, speaker modeling, feature matching, Mel frequency cepstrum coefficient (MFCC), Gaussian mixture model (GMM), vector quantization (VQ), Linde-Buzo-Gray (LBG), expectation maximization (EM), pre-processing, voice activity detection (VAD), short time energy (STE), background noise statistical modeling, closed-set tex-independent speaker identification system (CISI)

Procedia PDF Downloads 280
61 Investigating Homicide Offender Typologies Based on Their Clinical Histories and Crime Scene Behaviour Patterns

Authors: Valeria Abreu Minero, Edward Barker, Hannah Dickson, Francois Husson, Sandra Flynn, Jennifer Shaw

Abstract:

Purpose – The purpose of this paper is to identify offender typologies based on aspects of the offenders’ psychopathology and their associations with crime scene behaviours using data derived from the National Confidential Enquiry into Suicide and Safety in Mental Health concerning homicides in England and Wales committed by offenders in contact with mental health services in the year preceding the offence (n=759). Design/methodology/approach – The authors used multiple correspondence analysis to investigate the interrelationships between the variables and hierarchical agglomerative clustering to identify offender typologies. Variables describing: the offender’s mental health history; the offenders’ mental state at the time of offence; characteristics useful for police investigations; and patterns of crime scene behaviours were included. Findings – Results showed differences in the offender’s histories in relation to their crime scene behaviours. Further, analyses revealed three homicide typologies: externalising, psychosis and depression. Analyses revealed three homicide typologies: externalising, psychotic and depressive. Practical implications – These typologies may assist the police during homicide investigations by: furthering their understanding of the crime or likely suspect; offering insights into crime patterns; provide advice as to what an offender’s offence behaviour might signify about his/her mental health background; findings suggest information concerning offender psychopathology may be useful for offender profiling purposes in cases of homicide offenders with schizophrenia, depression and comorbid diagnosis of personality disorder and alcohol/drug dependence. Originality/value – Empirical studies with an emphasis on offender profiling have almost exclusively focussed on the inference of offender demographic characteristics. This study provides a first step in the exploration of offender psychopathology and its integration to the multivariate analysis of offence information for the purposes of investigative profiling of homicide by identifying the dominant patterns of mental illness within homicidal behaviour.

Keywords: offender profiling, mental illness, psychopathology, multivariate analysis, homicide, crime scene analysis, crime scene behviours, investigative advice

Procedia PDF Downloads 95
60 Statistical Pattern Recognition for Biotechnological Process Characterization Based on High Resolution Mass Spectrometry

Authors: S. Fröhlich, M. Herold, M. Allmer

Abstract:

Early stage quantitative analysis of host cell protein (HCP) variations is challenging yet necessary for comprehensive bioprocess development. High resolution mass spectrometry (HRMS) provides a high-end technology for accurate identification alongside with quantitative information. Hereby we describe a flexible HRMS assay platform to quantify HCPs relevant in microbial expression systems such as E. Coli in both up and downstream development by means of MVDA tools. Cell pellets were lysed and proteins extracted, purified samples not further treated before applying the SMART tryptic digest kit. Peptides separation was optimized using an RP-UHPLC separation platform. HRMS-MSMS analysis was conducted on an Orbitrap Velos Elite applying CID. Quantification was performed label-free taking into account ionization properties and physicochemical peptide similarities. Results were analyzed using SIEVE 2.0 (Thermo Fisher Scientific) and SIMCA (Umetrics AG). The developed HRMS platform was applied to an E. Coli expression set with varying productivity and the corresponding downstream process. Selected HCPs were successfully quantified within the fmol range. Analysing HCP networks based on pattern analysis facilitated low level quantification and enhanced validity. This approach is of high relevance for high-throughput screening experiments during upstream development, e.g. for titer determination, dynamic HCP network analysis or product characterization. Considering the downstream purification process, physicochemical clustering of identified HCPs is of relevance to adjust buffer conditions accordingly. However, the technology provides an innovative approach for label-free MS based quantification relying on statistical pattern analysis and comparison. Absolute quantification based on physicochemical properties and peptide similarity score provides a technological approach without the need of sophisticated sample preparation strategies and is therefore proven to be straightforward, sensitive and highly reproducible in terms of product characterization.

Keywords: process analytical technology, mass spectrometry, process characterization, MVDA, pattern recognition

Procedia PDF Downloads 225
59 Towards Real-Time Classification of Finger Movement Direction Using Encephalography Independent Components

Authors: Mohamed Mounir Tellache, Hiroyuki Kambara, Yasuharu Koike, Makoto Miyakoshi, Natsue Yoshimura

Abstract:

This study explores the practicality of using electroencephalographic (EEG) independent components to predict eight-direction finger movements in pseudo-real-time. Six healthy participants with individual-head MRI images performed finger movements in eight directions with two different arm configurations. The analysis was performed in two stages. The first stage consisted of using independent component analysis (ICA) to separate the signals representing brain activity from non-brain activity signals and to obtain the unmixing matrix. The resulting independent components (ICs) were checked, and those reflecting brain-activity were selected. Finally, the time series of the selected ICs were used to predict eight finger-movement directions using Sparse Logistic Regression (SLR). The second stage consisted of using the previously obtained unmixing matrix, the selected ICs, and the model obtained by applying SLR to classify a different EEG dataset. This method was applied to two different settings, namely the single-participant level and the group-level. For the single-participant level, the EEG dataset used in the first stage and the EEG dataset used in the second stage originated from the same participant. For the group-level, the EEG datasets used in the first stage were constructed by temporally concatenating each combination without repetition of the EEG datasets of five participants out of six, whereas the EEG dataset used in the second stage originated from the remaining participants. The average test classification results across datasets (mean ± S.D.) were 38.62 ± 8.36% for the single-participant, which was significantly higher than the chance level (12.50 ± 0.01%), and 27.26 ± 4.39% for the group-level which was also significantly higher than the chance level (12.49% ± 0.01%). The classification accuracy within [–45°, 45°] of the true direction is 70.03 ± 8.14% for single-participant and 62.63 ± 6.07% for group-level which may be promising for some real-life applications. Clustering and contribution analyses further revealed the brain regions involved in finger movement and the temporal aspect of their contribution to the classification. These results showed the possibility of using the ICA-based method in combination with other methods to build a real-time system to control prostheses.

Keywords: brain-computer interface, electroencephalography, finger motion decoding, independent component analysis, pseudo real-time motion decoding

Procedia PDF Downloads 114
58 RNA-Seq Analysis of the Wild Barley (H. spontaneum) Leaf Transcriptome under Salt Stress

Authors: Ahmed Bahieldin, Ahmed Atef, Jamal S. M. Sabir, Nour O. Gadalla, Sherif Edris, Ahmed M. Alzohairy, Nezar A. Radhwan, Mohammed N. Baeshen, Ahmed M. Ramadan, Hala F. Eissa, Sabah M. Hassan, Nabih A. Baeshen, Osama Abuzinadah, Magdy A. Al-Kordy, Fotouh M. El-Domyati, Robert K. Jansen

Abstract:

Wild salt-tolerant barley (Hordeum spontaneum) is the ancestor of cultivated barley (Hordeum vulgare or H. vulgare). Although the cultivated barley genome is well studied, little is known about genome structure and function of its wild ancestor. In the present study, RNA-Seq analysis was performed on young leaves of wild barley treated with salt (500 mM NaCl) at four different time intervals. Transcriptome sequencing yielded 103 to 115 million reads for all replicates of each treatment, corresponding to over 10 billion nucleotides per sample. Of the total reads, between 74.8 and 80.3% could be mapped and 77.4 to 81.7% of the transcripts were found in the H. vulgare unigene database (unigene-mapped). The unmapped wild barley reads for all treatments and replicates were assembled de novo and the resulting contigs were used as a new reference genome. This resultedin94.3 to 95.3%oftheunmapped reads mapping to the new reference. The number of differentially expressed transcripts was 9277, 3861 of which were uni gene-mapped. The annotated unigene- and de novo-mapped transcripts (5100) were utilized to generate expression clusters across time of salt stress treatment. Two-dimensional hierarchical clustering classified differential expression profiles into nine expression clusters, four of which were selected for further analysis. Differentially expressed transcripts were assigned to the main functional categories. The most important groups were ‘response to external stimulus’ and ‘electron-carrier activity’. Highly expressed transcripts are involved in several biological processes, including electron transport and exchanger mechanisms, flavonoid biosynthesis, reactive oxygen species (ROS) scavenging, ethylene production, signaling network and protein refolding. The comparisons demonstrated that mRNA-Seq is an efficient method for the analysis of differentially expressed genes and biological processes under salt stress.

Keywords: electron transport, flavonoid biosynthesis, reactive oxygen species, rnaseq

Procedia PDF Downloads 359
57 Optimal Pricing Based on Real Estate Demand Data

Authors: Vanessa Kummer, Maik Meusel

Abstract:

Real estate demand estimates are typically derived from transaction data. However, in regions with excess demand, transactions are driven by supply and therefore do not indicate what people are actually looking for. To estimate the demand for housing in Switzerland, search subscriptions from all important Swiss real estate platforms are used. These data do, however, suffer from missing information—for example, many users do not specify how many rooms they would like or what price they would be willing to pay. In economic analyses, it is often the case that only complete data is used. Usually, however, the proportion of complete data is rather small which leads to most information being neglected. Also, the data might have a strong distortion if it is complete. In addition, the reason that data is missing might itself also contain information, which is however ignored with that approach. An interesting issue is, therefore, if for economic analyses such as the one at hand, there is an added value by using the whole data set with the imputed missing values compared to using the usually small percentage of complete data (baseline). Also, it is interesting to see how different algorithms affect that result. The imputation of the missing data is done using unsupervised learning. Out of the numerous unsupervised learning approaches, the most common ones, such as clustering, principal component analysis, or neural networks techniques are applied. By training the model iteratively on the imputed data and, thereby, including the information of all data into the model, the distortion of the first training set—the complete data—vanishes. In a next step, the performances of the algorithms are measured. This is done by randomly creating missing values in subsets of the data, estimating those values with the relevant algorithms and several parameter combinations, and comparing the estimates to the actual data. After having found the optimal parameter set for each algorithm, the missing values are being imputed. Using the resulting data sets, the next step is to estimate the willingness to pay for real estate. This is done by fitting price distributions for real estate properties with certain characteristics, such as the region or the number of rooms. Based on these distributions, survival functions are computed to obtain the functional relationship between characteristics and selling probabilities. Comparing the survival functions shows that estimates which are based on imputed data sets do not differ significantly from each other; however, the demand estimate that is derived from the baseline data does. This indicates that the baseline data set does not include all available information and is therefore not representative for the entire sample. Also, demand estimates derived from the whole data set are much more accurate than the baseline estimation. Thus, in order to obtain optimal results, it is important to make use of all available data, even though it involves additional procedures such as data imputation.

Keywords: demand estimate, missing-data imputation, real estate, unsupervised learning

Procedia PDF Downloads 258
56 Phenotypic Diversity of the Tomato Germplasm from the Lazio Region in Central Italy, with a Case Study on Molecular Distinctiveness

Authors: Barbara Farinon, Maurizio E. Picarella, Lorenzo Mancini, Andrea Mazzucato

Abstract:

Italy is notoriously a secondary center of diversification for cultivated tomatoes (Solanum lycopersicum L.). The study of phenotypic and genetic diversity in landrace collections is important for germplasm conservation and biodiversity protection. Here, we set up to study the germplasm collected in the region of Lazio in Central Italy with a focus on the distinctiveness among landraces and the attribution of membership to unnamed accessions. Our regional collection included 30 accessions belonging to six different locally recognized landraces and 21 unnamed accessions. All accessions were gathered in Lazio and belonged to the collection held at the Regional Agency for the Development and Innovation of Agriculture in Lazio (ARSIAL, in the application of the Regional Act n. 15/2000, funded by Lazio Rural Development Plan 2014 – 2020 Agro-environmental Measure, Action 10.2.1) and at the University of Tuscia. We included 13 control genotypes as references. The collection showed wide phenotypic variability for several traits, such as fruit weight (range 14-277 g), locule number (2-12), shape index (0.54-2.65), yield (0.24-3.08 kg/plant), and soluble solids (3.4-7.5 °B). A few landraces showed uncommon phenotypes, such as potato leaf, colorless fruit epidermis, or delayed ripening. Multivariate analysis of 25 cardinal phenotypic variables grouped the named varieties and allowed to assign of some of the unnamed to recognized groups. A case study for distinctiveness is presented for the flattened-ribbed types that presented overlapping distribution according to the phenotypic data. Molecular markers retrieved by previous studies revealed differences compared to the phenotyping clustering, indicating that the named varieties “Scatolone di Bolsena” and “Pantano Romanesco” belong to the Marmande group, together with the reference landrace from Tuscany “Costoluto Fiorentino”. Differently, the landrace “Spagnoletta di Formia e Gaeta” was clearly distinct from the former at the molecular level. Therefore, a genotypic analysis of the analyzed collection appears needed to better define the molecular distinctiveness among the flattened-ribbed accessions, as well as to properly attribute the membership group of the unnamed accessions.

Keywords: distinctiveness, flattened-ribbed fruits, regional landraces, tomato

Procedia PDF Downloads 96
55 Comprehensive Longitudinal Multi-omic Profiling in Weight Gain and Insulin Resistance

Authors: Christine Y. Yeh, Brian D. Piening, Sarah M. Totten, Kimberly Kukurba, Wenyu Zhou, Kevin P. F. Contrepois, Gucci J. Gu, Sharon Pitteri, Michael Snyder

Abstract:

Three million deaths worldwide are attributed to obesity. However, the biomolecular mechanisms that describe the link between adiposity and subsequent disease states are poorly understood. Insulin resistance characterizes approximately half of obese individuals and is a major cause of obesity-mediated diseases such as Type II diabetes, hypertension and other cardiovascular diseases. This study makes use of longitudinal quantitative and high-throughput multi-omics (genomics, epigenomics, transcriptomics, glycoproteomics etc.) methodologies on blood samples to develop multigenic and multi-analyte signatures associated with weight gain and insulin resistance. Participants of this study underwent a 30-day period of weight gain via excessive caloric intake followed by a 60-day period of restricted dieting and return to baseline weight. Blood samples were taken at three different time points per patient: baseline, peak-weight and post weight loss. Patients were characterized as either insulin resistant (IR) or insulin sensitive (IS) before having their samples processed via longitudinal multi-omic technologies. This comparative study revealed a wealth of biomolecular changes associated with weight gain after using methods in machine learning, clustering, network analysis etc. Pathways of interest included those involved in lipid remodeling, acute inflammatory response and glucose metabolism. Some of these biomolecules returned to baseline levels as the patient returned to normal weight whilst some remained elevated. IR patients exhibited key differences in inflammatory response regulation in comparison to IS patients at all time points. These signatures suggest differential metabolism and inflammatory pathways between IR and IS patients. Biomolecular differences associated with weight gain and insulin resistance were identified on various levels: in gene expression, epigenetic change, transcriptional regulation and glycosylation. This study was not only able to contribute to new biology that could be of use in preventing or predicting obesity-mediated diseases, but also matured novel biomedical informatics technologies to produce and process data on many comprehensive omics levels.

Keywords: insulin resistance, multi-omics, next generation sequencing, proteogenomics, type ii diabetes

Procedia PDF Downloads 393
54 A Construction Management Tool: Determining a Project Schedule Typical Behaviors Using Cluster Analysis

Authors: Natalia Rudeli, Elisabeth Viles, Adrian Santilli

Abstract:

Delays in the construction industry are a global phenomenon. Many construction projects experience extensive delays exceeding the initially estimated completion time. The main purpose of this study is to identify construction projects typical behaviors in order to develop a prognosis and management tool. Being able to know a construction projects schedule tendency will enable evidence-based decision-making to allow resolutions to be made before delays occur. This study presents an innovative approach that uses Cluster Analysis Method to support predictions during Earned Value Analyses. A clustering analysis was used to predict future scheduling, Earned Value Management (EVM), and Earned Schedule (ES) principal Indexes behaviors in construction projects. The analysis was made using a database with 90 different construction projects. It was validated with additional data extracted from literature and with another 15 contrasting projects. For all projects, planned and executed schedules were collected and the EVM and ES principal indexes were calculated. A complete linkage classification method was used. In this way, the cluster analysis made considers that the distance (or similarity) between two clusters must be measured by its most disparate elements, i.e. that the distance is given by the maximum span among its components. Finally, through the use of EVM and ES Indexes and Tukey and Fisher Pairwise Comparisons, the statistical dissimilarity was verified and four clusters were obtained. It can be said that construction projects show an average delay of 35% of its planned completion time. Furthermore, four typical behaviors were found and for each of the obtained clusters, the interim milestones and the necessary rhythms of construction were identified. In general, detected typical behaviors are: (1) Projects that perform a 5% of work advance in the first two tenths and maintain a constant rhythm until completion (greater than 10% for each remaining tenth), being able to finish on the initially estimated time. (2) Projects that start with an adequate construction rate but suffer minor delays culminating with a total delay of almost 27% of the planned time. (3) Projects which start with a performance below the planned rate and end up with an average delay of 64%, and (4) projects that begin with a poor performance, suffer great delays and end up with an average delay of a 120% of the planned completion time. The obtained clusters compose a tool to identify the behavior of new construction projects by comparing their current work performance to the validated database, thus allowing the correction of initial estimations towards more accurate completion schedules.

Keywords: cluster analysis, construction management, earned value, schedule

Procedia PDF Downloads 232
53 The Extent of Virgin Olive-Oil Prices' Distribution Revealing the Behavior of Market Speculators

Authors: Fathi Abid, Bilel Kaffel

Abstract:

The olive tree, the olive harvest during winter season and the production of olive oil better known by professionals under the name of the crushing operation have interested institutional traders such as olive-oil offices and private companies such as food industry refining and extracting pomace olive oil as well as export-import public and private companies specializing in olive oil. The major problem facing producers of olive oil each winter campaign, contrary to what is expected, it is not whether the harvest will be good or not but whether the sale price will allow them to cover production costs and achieve a reasonable margin of profit or not. These questions are entirely legitimate if we judge by the importance of the issue and the heavy complexity of the uncertainty and competition made tougher by a high level of indebtedness and the experience and expertise of speculators and producers whose objectives are sometimes conflicting. The aim of this paper is to study the formation mechanism of olive oil prices in order to learn about speculators’ behavior and expectations in the market, how they contribute by their industry knowledge and their financial alliances and the size the financial challenge that may be involved for them to build private information hoses globally to take advantage. The methodology used in this paper is based on two stages, in the first stage we study econometrically the formation mechanisms of olive oil price in order to understand the market participant behavior by implementing ARMA, SARMA, GARCH and stochastic diffusion processes models, the second stage is devoted to prediction purposes, we use a combined wavelet- ANN approach. Our main findings indicate that olive oil market participants interact with each other in a way that they promote stylized facts formation. The unstable participant’s behaviors create the volatility clustering, non-linearity dependent and cyclicity phenomena. By imitating each other in some periods of the campaign, different participants contribute to the fat tails observed in the olive oil price distribution. The best prediction model for the olive oil price is based on a back propagation artificial neural network approach with input information based on wavelet decomposition and recent past history.

Keywords: olive oil price, stylized facts, ARMA model, SARMA model, GARCH model, combined wavelet-artificial neural network, continuous-time stochastic volatility mode

Procedia PDF Downloads 312
52 The Relationship between Violence against Women in the Family and Common Mental Disorders in Urban Informal Settlements of Mumbai, India: A Cross-Sectional Study

Authors: Abigail Bentley, Audrey Prost, Nayreen Daruwalla, Apoorwa Gupta, David Osrin

Abstract:

BACKGROUND: Intimate partner violence (IPV) can impact a woman’s physical, reproductive and mental health, including common mental disorders such as anxiety and depression. However, people other than an intimate partner may also perpetrate violence against women in the family, particularly in India. This study aims to investigate the relationship between experiences of violence perpetrated by the husband and other members of the wider household and symptoms of common mental disorders in women residing in informal settlement (slum) areas of Mumbai. METHODS: Experiences of violence were assessed through a detailed cross-sectional survey of 598 women, including questions about specific acts of emotional, economic, physical and sexual violence across different time points in the woman’s life and the main perpetrator of each act. Symptoms of common mental disorders were assessed using the 12-item General Health Questionnaire (GHQ-12). The GHQ-12 scores were divided into four groups and the relationship between experiences of each type of violence in the last 12 months and GHQ-12 score group was analyzed using ordinal logistic regression, adjusted for the woman’s age and clustering. RESULTS: 482 (81%) women consented to interview. On average, they were 28.5 years old, had completed 7 years of education and had been married 9 years. 88% were Muslim and 47% lived in joint and 53% in nuclear families. 44% of women had experienced at least one act of violence in their lifetime (33% emotional, 22% economic, 23% physical, 12% sexual). 7% had a high GHQ-12 score (6 or above). For violence experiences in the last 12 months, the odds of being in the highest GHQ-12 score group versus the lower groups combined were 13.1 for emotional violence, 6.5 for economic, 5.7 for physical and 6.3 for sexual (p<0.001 for all outcomes). DISCUSSION: The high level of violence reported across the lifetime could be due to the detailed assessment of violent acts at multiple time points and the inclusion of perpetrators within the family other than the husband. Each type of violence was associated with greater odds of a higher GHQ-12 score and therefore more symptoms of common mental disorders. Emotional violence was far more strongly associated with symptoms of common mental disorders than physical or sexual violence. However, it is not possible to attribute causal directionality to the association. Further work to investigate the relationship between differing severity of violence experiences and women’s mental health and the components of emotional violence that make it so strongly associated with symptoms of common mental disorders would be beneficial.

Keywords: common mental disorders, family violence, India, informal settlements, mental health, violence against women

Procedia PDF Downloads 337