Search results for: unsupervised
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 142

Search results for: unsupervised

22 Preliminary Study of Gold Nanostars/Enhanced Filter for Keratitis Microorganism Raman Fingerprint Analysis

Authors: Chi-Chang Lin, Jian-Rong Wu, Jiun-Yan Chiu

Abstract:

Myopia, ubiquitous symptom that is necessary to correct the eyesight by optical lens struggles many people for their daily life. Recent years, younger people raise interesting on using contact lens because of its convenience and aesthetics. In clinical, the risk of eye infections increases owing to the behavior of incorrectly using contact lens unsupervised cleaning which raising the infection risk of cornea, named ocular keratitis. In order to overcome the identification needs, new detection or analysis method with rapid and more accurate identification for clinical microorganism is importantly needed. In our study, we take advantage of Raman spectroscopy having unique fingerprint for different functional groups as the distinct and fast examination tool on microorganism. As we know, Raman scatting signals are normally too weak for the detection, especially in biological field. Here, we applied special SERS enhancement substrates to generate higher Raman signals. SERS filter we designed in this article that prepared by deposition of silver nanoparticles directly onto cellulose filter surface and suspension nanoparticles - gold nanostars (AuNSs) also be introduced together to achieve better enhancement for lower concentration analyte (i.e., various bacteria). Research targets also focusing on studying the shape effect of synthetic AuNSs, needle-like surface morphology may possible creates more hot-spot for getting higher SERS enhance ability. We utilized new designed SERS technology to distinguish the bacteria from ocular keratitis under strain level, and specific Raman and SERS fingerprint were grouped under pattern recognition process. We reported a new method combined different SERS substrates can be applied for clinical microorganism detection under strain level with simple, rapid preparation and low cost. Our presenting SERS technology not only shows the great potential for clinical bacteria detection but also can be used for environmental pollution and food safety analysis.

Keywords: bacteria, gold nanostars, Raman spectroscopy surface-enhanced Raman scattering filter

Procedia PDF Downloads 137
21 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principals of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleargrams as the image. In this paper atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arraignment or of the atomic indices in the T-F vector are used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speak has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between then probabilities from the training sentences and the test sentences. Training is done at a 30dB Signal-to-Noise Ratio (SNR). Testing is performed at SNR’s of 0 dB, 5 dB, 10 dB and 30dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and produces ~93% accuracy at 0dB SNR.

Keywords: time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder

Procedia PDF Downloads 260
20 Machine Learning in Gravity Models: An Application to International Recycling Trade Flow

Authors: Shan Zhang, Peter Suechting

Abstract:

Predicting trade patterns is critical to decision-making in public and private domains, especially in the current context of trade disputes among major economies. In the past, U.S. recycling has relied heavily on strong demand for recyclable materials overseas. However, starting in 2017, a series of new recycling policies (bans and higher inspection standards) was enacted by multiple countries that were the primary importers of recyclables from the U.S. prior to that point. As the global trade flow of recycling shifts, some new importers, mostly developing countries in South and Southeast Asia, have been overwhelmed by the sheer quantities of scrap materials they have received. As the leading exporter of recyclable materials, the U.S. now has a pressing need to build its recycling industry domestically. With respect to the global trade in scrap materials used for recycling, the interest in this paper is (1) predicting how the export of recyclable materials from the U.S. might vary over time, and (2) predicting how international trade flows for recyclables might change in the future. Focusing on three major recyclable materials with a history of trade, this study uses data-driven and machine learning (ML) algorithms---supervised (shrinkage and tree methods) and unsupervised (neural network method)---to decipher the international trade pattern of recycling. Forecasting the potential trade values of recyclables in the future could help importing countries, to which those materials will shift next, to prepare related trade policies. Such policies can assist policymakers in minimizing negative environmental externalities and in finding the optimal amount of recyclables needed by each country. Such forecasts can also help exporting countries, like the U.S understand the importance of healthy domestic recycling industry. The preliminary result suggests that gravity models---in addition to particular selection macroeconomic predictor variables--are appropriate predictors of the total export value of recyclables. With the inclusion of variables measuring aspects of the political conditions (trade tariffs and bans), predictions show that recyclable materials are shifting from more policy-restricted countries to less policy-restricted countries in international recycling trade. Those countries also tend to have high manufacturing activities as a percentage of their GDP.

Keywords: environmental economics, machine learning, recycling, international trade

Procedia PDF Downloads 138
19 A Virtual Set-Up to Evaluate Augmented Reality Effect on Simulated Driving

Authors: Alicia Yanadira Nava Fuentes, Ilse Cervantes Camacho, Amadeo José Argüelles Cruz, Ana María Balboa Verduzco

Abstract:

Augmented reality promises being present in future driving, with its immersive technology let to show directions and maps to identify important places indicating with graphic elements when the car driver requires the information. On the other side, driving is considered a multitasking activity and, for some people, a complex activity where different situations commonly occur that require the immediate attention of the car driver to make decisions that contribute to avoid accidents; therefore, the main aim of the project is the instrumentation of a platform with biometric sensors that allows evaluating the performance in driving vehicles with the influence of augmented reality devices to detect the level of attention in drivers, since it is important to know the effect that it produces. In this study, the physiological sensors EPOC X (EEG), ECG06 PRO and EMG Myoware are joined in the driving test platform with a Logitech G29 steering wheel and the simulation software City Car Driving in which the level of traffic can be controlled, as well as the number of pedestrians that exist within the simulation obtaining a driver interaction in real mode and through a MSP430 microcontroller achieves the acquisition of data for storage. The sensors bring a continuous analog signal in time that needs signal conditioning, at this point, a signal amplifier is incorporated due to the acquired signals having a sensitive range of 1.25 mm/mV, also filtering that consists in eliminating the frequency bands of the signal in order to be interpretative and without noise to convert it from an analog signal into a digital signal to analyze the physiological signals of the drivers, these values are stored in a database. Based on this compilation, we work on the extraction of signal features and implement K-NN (k-nearest neighbor) classification methods and decision trees (unsupervised learning) that enable the study of data for the identification of patterns and determine by classification methods different effects of augmented reality on drivers. The expected results of this project include are a test platform instrumented with biometric sensors for data acquisition during driving and a database with the required variables to determine the effect caused by augmented reality on people in simulated driving.

Keywords: augmented reality, driving, physiological signals, test platform

Procedia PDF Downloads 106
18 A Method to Evaluate and Compare Web Information Extractors

Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman

Abstract:

Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data that is gathered from the Web. Sometimes, these data are relatively easy to gather, chiefly when it comes from server logs. Unfortunately, there are cases in which the data to be mined is the data that is displayed on a web document. In such cases, it is necessary to apply a pre-processing step to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components that are typically configured by means of rules that are tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires to evaluate and compare the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4 200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which leads to difficulties to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents. b) It provides a method that relies on statistically sound tests to support the conclusions drawn; the previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) We provide a novel method to compute the performance measures regarding unsupervised proposals; otherwise they would require the intervention of a user to compute them by using the annotations on the evaluation sets and the information extracted. Our contributions will definitely help researchers in this area make sure that they have advanced the state of the art not only conceptually, but from an empirical point of view; it will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss on our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.

Keywords: web information extractors, information extraction evaluation method, Google scholar, web

Procedia PDF Downloads 226
17 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: Gaelle Candel, David Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n²) to O(n²=k), and the memory requirement from n² to 2(n=k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning

Procedia PDF Downloads 114
16 Identification of Damage Mechanisms in Interlock Reinforced Composites Using a Pattern Recognition Approach of Acoustic Emission Data

Authors: M. Kharrat, G. Moreau, Z. Aboura

Abstract:

The latest advances in the weaving industry, combined with increasingly sophisticated means of materials processing, have made it possible to produce complex 3D composite structures. Mainly used in aeronautics, composite materials with 3D architecture offer better mechanical properties than 2D reinforced composites. Nevertheless, these materials require a good understanding of their behavior. Because of the complexity of such materials, the damage mechanisms are multiple, and the scenario of their appearance and evolution depends on the nature of the exerted solicitations. The AE technique is a well-established tool for discriminating between the damage mechanisms. Suitable sensors are used during the mechanical test to monitor the structural health of the material. Relevant AE-features are then extracted from the recorded signals, followed by a data analysis using pattern recognition techniques. In order to better understand the damage scenarios of interlock composite materials, a multi-instrumentation was set-up in this work for tracking damage initiation and development, especially in the vicinity of the first significant damage, called macro-damage. The deployed instrumentation includes video-microscopy, Digital Image Correlation, Acoustic Emission (AE) and micro-tomography. In this study, a multi-variable AE data analysis approach was developed for the discrimination between the different signal classes representing the different emission sources during testing. An unsupervised classification technique was adopted to perform AE data clustering without a priori knowledge. The multi-instrumentation and the clustered data served to label the different signal families and to build a learning database. This latter is useful to construct a supervised classifier that can be used for automatic recognition of the AE signals. Several materials with different ingredients were tested under various solicitations in order to feed and enrich the learning database. The methodology presented in this work was useful to refine the damage threshold for the new generation materials. The damage mechanisms around this threshold were highlighted. The obtained signal classes were assigned to the different mechanisms. The isolation of a 'noise' class makes it possible to discriminate between the signals emitted by damages without resorting to spatial filtering or increasing the AE detection threshold. The approach was validated on different material configurations. For the same material and the same type of solicitation, the identified classes are reproducible and little disturbed. The supervised classifier constructed based on the learning database was able to predict the labels of the classified signals.

Keywords: acoustic emission, classifier, damage mechanisms, first damage threshold, interlock composite materials, pattern recognition

Procedia PDF Downloads 131
15 Recent Policy Changes in Israeli Early Childhood Frameworks: Hope for the Future

Authors: Yaara Shilo

Abstract:

Early childhood education and care (ECEC)in Israel has undergone extensive reform and now requires daycare centers to meet internationally recognized professional standards. Since 1948, one of the aims of childcare facilities was to enable women’s participation in the workforce.A 1965 law grouped daycare centers for young children with facilities for the elderly and for disabled persons under the same authority. In the 1970’s, ECEC leaders sought to change childcare from proprietary to educational facilities. From 1976 deliberations in the Knesset regarding appropriate attribution of ECEC frameworks resulted in their being moved to various authorities that supported women’s employment: Ministries of Finance, Industry, and Commerce, as well as the Welfare Department. Prior to 2018, 75% of infants and toddlers in institutional care were in unlicensed and unsupervised settings. Legislative processes accompanied the conceptual change to an eventual appropriate attribution of ECEC frameworks. Position papers over the past two decades resulted in recommendations for standards conforming to OECD regulations. Simultaneous incidents of child abuse, some resulting in death, riveted public attention to the need for adequate government supervision, accelerating the legislative process. Appropriate care for very young children must center on quality interactions with caregivers, thus requiring adequate staff training. Finally, in 2018 a law was passed stipulating standards for staff training, proper facilities, child-adult ratios, and safety measures. The Ariav commission expanded training to caregivers for ages 0-3. Transfer of the ECEC to the Ministry of Education ensured establishment of basic training. Groundwork created by new legislation initiated professional development of EC educators for ages 0-3. This process should raise salaries and bolster the system’s ability to attract quality employees. In 2022 responsibility for ECEC ages 0-3 was transferred from the Ministry of Finance to the Ministry of Education, shifting emphasis from proprietary care to professional considerations focusing on wellbeing and early childhood education. The recent revolutionary changes in ECEC point to a new age in the care and education of Israel’s youngest citizens. Implementation of international standards, adequate training, and professionalization of the workforce focus on the child’s needs.

Keywords: policy, early childhood, care and education, daycare, development

Procedia PDF Downloads 68
14 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms

Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga

Abstract:

Today Websites contain very interesting applications. But there are only few methodologies to analyze User navigations through the Websites and formulating if the Website is put to correct use. The web logs are only used if some major attack or malfunctioning occurs. Web Logs contain lot interesting dealings on users in the system. Analyzing web logs has become a challenge due to the huge log volume. Finding interesting patterns is not as easy as it is due to size, distribution and importance of minor details of each log. Web logs contain very important data of user and site which are not been put to good use. Retrieving interesting information from logs gives an idea of what the users need, group users according to their various needs and improve site to build an effective and efficient site. The model we built is able to detect attacks or malfunctioning of the system and anomaly detection. Logs will be more complex as volume of traffic and the size and complexity of web site grows. Unsupervised techniques are used in this solution which is fully automated. Expert knowledge is only used in validation. In our approach first clean and purify the logs to bring them to a common platform with a standard format and structure. After cleaning module web session builder is executed. It outputs two files, Web Sessions file and Indexed URLs file. The Indexed URLs file contains the list of URLs accessed and their indices. Web Sessions file lists down the indices of each web session. Then DBSCAN and EM Algorithms are used iteratively and recursively to get the best clustering results of the web sessions. Using homogeneity, completeness, V-measure, intra and inter cluster distance and silhouette coefficient as parameters these algorithms self-evaluate themselves to input better parametric values to run the algorithms. If a cluster is found to be too large then micro-clustering is used. Using Cluster Signature Module the clusters are annotated with a unique signature called finger-print. In this module each cluster is fed to Associative Rule Learning Module. If it outputs confidence and support as value 1 for an access sequence it would be a potential signature for the cluster. Then the access sequence occurrences are checked in other clusters. If it is found to be unique for the cluster considered then the cluster is annotated with the signature. These signatures are used in anomaly detection, prevent cyber attacks, real-time dashboards that visualize users, accessing web pages, predict actions of users and various other applications in Finance, University Websites, News and Media Websites etc.

Keywords: anomaly detection, clustering, pattern recognition, web sessions

Procedia PDF Downloads 257
13 Analysis and Design Modeling for Next Generation Network Intrusion Detection and Prevention System

Authors: Nareshkumar Harale, B. B. Meshram

Abstract:

The continued exponential growth of successful cyber intrusions against today’s businesses has made it abundantly clear that traditional perimeter security measures are no longer adequate and effective. We evolved the network trust architecture from trust-untrust to Zero-Trust, With Zero Trust, essential security capabilities are deployed in a way that provides policy enforcement and protection for all users, devices, applications, data resources, and the communications traffic between them, regardless of their location. Information exchange over the Internet, in spite of inclusion of advanced security controls, is always under innovative, inventive and prone to cyberattacks. TCP/IP protocol stack, the adapted standard for communication over network, suffers from inherent design vulnerabilities such as communication and session management protocols, routing protocols and security protocols are the major cause of major attacks. With the explosion of cyber security threats, such as viruses, worms, rootkits, malwares, Denial of Service attacks, accomplishing efficient and effective intrusion detection and prevention is become crucial and challenging too. In this paper, we propose a design and analysis model for next generation network intrusion detection and protection system as part of layered security strategy. The proposed system design provides intrusion detection for wide range of attacks with layered architecture and framework. The proposed network intrusion classification framework deals with cyberattacks on standard TCP/IP protocol, routing protocols and security protocols. It thereby forms the basis for detection of attack classes and applies signature based matching for known cyberattacks and data mining based machine learning approaches for unknown cyberattacks. Our proposed implemented software can effectively detect attacks even when malicious connections are hidden within normal events. The unsupervised learning algorithm applied to network audit data trails results in unknown intrusion detection. Association rule mining algorithms generate new rules from collected audit trail data resulting in increased intrusion prevention though integrated firewall systems. Intrusion response mechanisms can be initiated in real-time thereby minimizing the impact of network intrusions. Finally, we have shown that our approach can be validated and how the analysis results can be used for detecting and protection from the new network anomalies.

Keywords: network intrusion detection, network intrusion prevention, association rule mining, system analysis and design

Procedia PDF Downloads 200
12 Winter Wheat Yield Forecasting Using Sentinel-2 Imagery at the Early Stages

Authors: Chunhua Liao, Jinfei Wang, Bo Shan, Yang Song, Yongjun He, Taifeng Dong

Abstract:

Winter wheat is one of the main crops in Canada. Forecasting of within-field variability of yield in winter wheat at the early stages is essential for precision farming. However, the crop yield modelling based on high spatial resolution satellite data is generally affected by the lack of continuous satellite observations, resulting in reducing the generalization ability of the models and increasing the difficulty of crop yield forecasting at the early stages. In this study, the correlations between Sentinel-2 data (vegetation indices and reflectance) and yield data collected by combine harvester were investigated and a generalized multivariate linear regression (MLR) model was built and tested with data acquired in different years. It was found that the four-band reflectance (blue, green, red, near-infrared) performed better than their vegetation indices (NDVI, EVI, WDRVI and OSAVI) in wheat yield prediction. The optimum phenological stage for wheat yield prediction with highest accuracy was at the growing stages from the end of the flowering to the beginning of the filling stage. The best MLR model was therefore built to predict wheat yield before harvest using Sentinel-2 data acquired at the end of the flowering stage. Further, to improve the ability of the yield prediction at the early stages, three simple unsupervised domain adaptation (DA) methods were adopted to transform the reflectance data at the early stages to the optimum phenological stage. The winter wheat yield prediction using multiple vegetation indices showed higher accuracy than using single vegetation index. The optimum stage for winter wheat yield forecasting varied with different fields when using vegetation indices, while it was consistent when using multispectral reflectance and the optimum stage for winter wheat yield prediction was at the end of flowering stage. The average testing RMSE of the MLR model at the end of the flowering stage was 604.48 kg/ha. Near the booting stage, the average testing RMSE of yield prediction using the best MLR was reduced to 799.18 kg/ha when applying the mean matching domain adaptation approach to transform the data to the target domain (at the end of the flowering) compared to that using the original data based on the models developed at the booting stage directly (“MLR at the early stage”) (RMSE =1140.64 kg/ha). This study demonstrated that the simple mean matching (MM) performed better than other DA methods and it was found that “DA then MLR at the optimum stage” performed better than “MLR directly at the early stages” for winter wheat yield forecasting at the early stages. The results indicated that the DA had a great potential in near real-time crop yield forecasting at the early stages. This study indicated that the simple domain adaptation methods had a great potential in crop yield prediction at the early stages using remote sensing data.

Keywords: wheat yield prediction, domain adaptation, Sentinel-2, within-field scale

Procedia PDF Downloads 37
11 Unsupervised Detection of Burned Area from Remote Sensing Images Using Spatial Correlation and Fuzzy Clustering

Authors: Tauqir A. Moughal, Fusheng Yu, Abeer Mazher

Abstract:

Land-cover and land-use change information are important because of their practical uses in various applications, including deforestation, damage assessment, disasters monitoring, urban expansion, planning, and land management. Therefore, developing change detection methods for remote sensing images is an important ongoing research agenda. However, detection of change through optical remote sensing images is not a trivial task due to many factors including the vagueness between the boundaries of changed and unchanged regions and spatial dependence of the pixels to its neighborhood. In this paper, we propose a binary change detection technique for bi-temporal optical remote sensing images. As in most of the optical remote sensing images, the transition between the two clusters (change and no change) is overlapping and the existing methods are incapable of providing the accurate cluster boundaries. In this regard, a methodology has been proposed which uses the fuzzy c-means clustering to tackle the problem of vagueness in the changed and unchanged class by formulating the soft boundaries between them. Furthermore, in order to exploit the neighborhood information of the pixels, the input patterns are generated corresponding to each pixel from bi-temporal images using 3×3, 5×5 and 7×7 window. The between images and within image spatial dependence of the pixels to its neighborhood is quantified by using Pearson product moment correlation and Moran’s I statistics, respectively. The proposed technique consists of two phases. At first, between images and within image spatial correlation is calculated to utilize the information that the pixels at different locations may not be independent. Second, fuzzy c-means technique is used to produce two clusters from input feature by not only taking care of vagueness between the changed and unchanged class but also by exploiting the spatial correlation of the pixels. To show the effectiveness of the proposed technique, experiments are conducted on multispectral and bi-temporal remote sensing images. A subset (2100×1212 pixels) of a pan-sharpened, bi-temporal Landsat 5 thematic mapper optical image of Los Angeles, California, is used in this study which shows a long period of the forest fire continued from July until October 2009. Early forest fire and later forest fire optical remote sensing images were acquired on July 5, 2009 and October 25, 2009, respectively. The proposed technique is used to detect the fire (which causes change on earth’s surface) and compared with the existing K-means clustering technique. Experimental results showed that proposed technique performs better than the already existing technique. The proposed technique can be easily extendable for optical hyperspectral images and is suitable for many practical applications.

Keywords: burned area, change detection, correlation, fuzzy clustering, optical remote sensing

Procedia PDF Downloads 139
10 Post-Soviet LULC Analysis of Tbilisi, Batumi and Kutaisi Using of Remote Sensing and Geo Information System

Authors: Lela Gadrani, Mariam Tsitsagi

Abstract:

Human is a part of the urban landscape and responsible for it. Urbanization of cities includes the longest phase; thus none of the environment ever undergoes such anthropogenic impact as the area of large cities. The post-Soviet period is very interesting in terms of scientific research. The changes that have occurred in the cities since the collapse of the Soviet Union have not yet been analyzed best to our knowledge. In this context, the aim of this paper is to analyze the changes in the land use of the three large cities of Georgia (Tbilisi, Kutaisi, Batumi). Tbilisi as a capital city, Batumi as a port city, and Kutaisi as a former industrial center. Data used during the research process are conventionally divided into satellite and supporting materials. For this purpose, the largest topographic maps (1:10 000) of all three cities were analyzed, Tbilisi General Plans (1896, 1924), Tbilisi and Kutaisi historical maps. The main emphasis was placed on the classification of Landsat images. In this case, we have classified the images LULC (LandUse / LandCover) of all three cities taken in 1987 and 2016 using the supervised and unsupervised methods. All the procedures were performed in the programs: Arc GIS 10.3.1 and ENVI 5.0. In each classification we have singled out the following classes: built-up area, water bodies, agricultural lands, green cover and bare soil, and calculated the areas occupied by them. In order to check the validity of the obtained results, additionally we used the higher resolution images of CORONA and Sentinel. Ultimately we identified the changes that took place in the land use in the post-Soviet period in the above cities. According to the results, a large wave of changes touched Tbilisi and Batumi, though in different periods. It turned out that in the case of Tbilisi, the area of developed territory has increased by 13.9% compared to the 1987 data, which is certainly happening at the expense of agricultural land and green cover, in particular, the area of agricultural lands has decreased by 4.97%; and the green cover by 5.67%. It should be noted that Batumi has obviously overtaken the country's capital in terms of development. With the unaided eye it is clear that in comparison with other regions of Georgia, everything is different in Batumi. In fact, Batumi is an unofficial summer capital of Georgia. Undoubtedly, Batumi’s development is very important both in economic and social terms. However, there is a danger that in the uneven conditions of urban development, we will eventually get a developed center - Batumi, and multiple underdeveloped peripheries around it. Analysis of the changes in the land use is of utmost importance not only for quantitative evaluation of the changes already implemented, but for future modeling and prognosis of urban development. Raster data containing the classes of land use is an integral part of the city's prognostic models.

Keywords: analysis, geo information system, remote sensing, LULC

Procedia PDF Downloads 428
9 Applying Miniaturized near Infrared Technology for Commingled and Microplastic Waste Analysis

Authors: Monika Rani, Claudio Marchesi, Stefania Federici, Laura E. Depero

Abstract:

Degradation of the aquatic environment by plastic litter, especially microplastics (MPs), i.e., any water-insoluble solid plastic particle with the longest dimension in the range 1µm and 1000 µm (=1 mm) size, is an unfortunate indication of the advancement of the Anthropocene age on Earth. Microplastics formed due to natural weathering processes are termed as secondary microplastics, while when these are synthesized in industries, they are called primary microplastics. Their presence from the highest peaks to the deepest points in oceans explored and their resistance to biological and chemical decay has adversely affected the environment, especially marine life. Even though the presence of MPs in the marine environment is well-reported, a legitimate and authentic analytical technique to sample, analyze, and quantify the MPs is still under progress and testing stages. Among the characterization techniques, vibrational spectroscopic techniques are largely adopted in the field of polymers. And the ongoing miniaturization of these methods is on the way to revolutionize the plastic recycling industry. In this scenario, the capability and the feasibility of a miniaturized near-infrared (MicroNIR) spectroscopy combined with chemometrics tools for qualitative and quantitative analysis of urban plastic waste collected from a recycling plant and microplastic mixture fragmented in the lab were investigated. Based on the Resin Identification Code, 250 plastic samples were used for macroplastic analysis and to set up a library of polymers. Subsequently, MicroNIR spectra were analysed through the application of multivariate modelling. Principal Components Analysis (PCA) was used as an unsupervised tool to find trends within the data. After the exploratory PCA analysis, a supervised classification tool was applied in order to distinguish the different plastic classes, and a database containing the NIR spectra of polymers was made. For the microplastic analysis, the three most abundant polymers in the plastic litter, PE, PP, PS, were mechanically fragmented in the laboratory to micron size. The distinctive arrangement of blends of these three microplastics was prepared in line with a designed ternary composition plot. After the PCA exploratory analysis, a quantitative model Partial Least Squares Regression (PLSR) allowed to predict the percentage of microplastics in the mixtures. With a complete dataset of 63 compositions, PLS was calibrated with 42 data-points. The model was used to predict the composition of 21 unknown mixtures of the test set. The advantage of the consolidated NIR Chemometric approach lies in the quick evaluation of whether the sample is macro or micro, contaminated, coloured or not, and with no sample pre-treatment. The technique can be utilized with bigger example volumes and even considers an on-site evaluation and in this manner satisfies the need for a high-throughput strategy.

Keywords: chemometrics, microNIR, microplastics, urban plastic waste

Procedia PDF Downloads 131
8 A Machine Learning Approach for Assessment of Tremor: A Neurological Movement Disorder

Authors: Rajesh Ranjan, Marimuthu Palaniswami, A. A. Hashmi

Abstract:

With the changing lifestyle and environment around us, the prevalence of the critical and incurable disease has proliferated. One such condition is the neurological disorder which is rampant among the old age population and is increasing at an unstoppable rate. Most of the neurological disorder patients suffer from some movement disorder affecting the movement of their body parts. Tremor is the most common movement disorder which is prevalent in such patients that infect the upper or lower limbs or both extremities. The tremor symptoms are commonly visible in Parkinson’s disease patient, and it can also be a pure tremor (essential tremor). The patients suffering from tremor face enormous trouble in performing the daily activity, and they always need a caretaker for assistance. In the clinics, the assessment of tremor is done through a manual clinical rating task such as Unified Parkinson’s disease rating scale which is time taking and cumbersome. Neurologists have also affirmed a challenge in differentiating a Parkinsonian tremor with the pure tremor which is essential in providing an accurate diagnosis. Therefore, there is a need to develop a monitoring and assistive tool for the tremor patient that keep on checking their health condition by coordinating them with the clinicians and caretakers for early diagnosis and assistance in performing the daily activity. In our research, we focus on developing a system for automatic classification of tremor which can accurately differentiate the pure tremor from the Parkinsonian tremor using a wearable accelerometer-based device, so that adequate diagnosis can be provided to the correct patient. In this research, a study was conducted in the neuro-clinic to assess the upper wrist movement of the patient suffering from Pure (Essential) tremor and Parkinsonian tremor using a wearable accelerometer-based device. Four tasks were designed in accordance with Unified Parkinson’s disease motor rating scale which is used to assess the rest, postural, intentional and action tremor in such patient. Various features such as time-frequency domain, wavelet-based and fast-Fourier transform based cross-correlation were extracted from the tri-axial signal which was used as input feature vector space for the different supervised and unsupervised learning tools for quantification of severity of tremor. A minimum covariance maximum correlation energy comparison index was also developed which was used as the input feature for various classification tools for distinguishing the PT and ET tremor types. An automatic system for efficient classification of tremor was developed using feature extraction methods, and superior performance was achieved using K-nearest neighbors and Support Vector Machine classifiers respectively.

Keywords: machine learning approach for neurological disorder assessment, automatic classification of tremor types, feature extraction method for tremor classification, neurological movement disorder, parkinsonian tremor, essential tremor

Procedia PDF Downloads 131
7 Impact Location From Instrumented Mouthguard Kinematic Data In Rugby

Authors: Jazim Sohail, Filipe Teixeira-Dias

Abstract:

Mild traumatic brain injury (mTBI) within non-helmeted contact sports is a growing concern due to the serious risk of potential injury. Extensive research is being conducted looking into head kinematics in non-helmeted contact sports utilizing instrumented mouthguards that allow researchers to record accelerations and velocities of the head during and after an impact. This does not, however, allow the location of the impact on the head, and its magnitude and orientation, to be determined. This research proposes and validates two methods to quantify impact locations from instrumented mouthguard kinematic data, one using rigid body dynamics, the other utilizing machine learning. The rigid body dynamics technique focuses on establishing and matching moments from Euler’s and torque equations in order to find the impact location on the head. The methodology is validated with impact data collected from a lab test with the dummy head fitted with an instrumented mouthguard. Additionally, a Hybrid III Dummy head finite element model was utilized to create synthetic kinematic data sets for impacts from varying locations to validate the impact location algorithm. The algorithm calculates accurate impact locations; however, it will require preprocessing of live data, which is currently being done by cross-referencing data timestamps to video footage. The machine learning technique focuses on eliminating the preprocessing aspect by establishing trends within time-series signals from instrumented mouthguards to determine the impact location on the head. An unsupervised learning technique is used to cluster together impacts within similar regions from an entire time-series signal. The kinematic signals established from mouthguards are converted to the frequency domain before using a clustering algorithm to cluster together similar signals within a time series that may span the length of a game. Impacts are clustered within predetermined location bins. The same Hybrid III Dummy finite element model is used to create impacts that closely replicate on-field impacts in order to create synthetic time-series datasets consisting of impacts in varying locations. These time-series data sets are used to validate the machine learning technique. The rigid body dynamics technique provides a good method to establish accurate impact location of impact signals that have already been labeled as true impacts and filtered out of the entire time series. However, the machine learning technique provides a method that can be implemented with long time series signal data but will provide impact location within predetermined regions on the head. Additionally, the machine learning technique can be used to eliminate false impacts captured by sensors saving additional time for data scientists using instrumented mouthguard kinematic data as validating true impacts with video footage would not be required.

Keywords: head impacts, impact location, instrumented mouthguard, machine learning, mTBI

Procedia PDF Downloads 173
6 Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values in the Context of the Manufacture of Aircraft Engines

Authors: Sara Rejeb, Catherine Duveau, Tabea Rebafka

Abstract:

To monitor the production process of turbofan aircraft engines, multiple measurements of various geometrical parameters are systematically recorded on manufactured parts. Engine parts are subject to extremely high standards as they can impact the performance of the engine. Therefore, it is essential to analyze these databases to better understand the influence of the different parameters on the engine's performance. Self-organizing maps are unsupervised neural networks which achieve two tasks simultaneously: they visualize high-dimensional data by projection onto a 2-dimensional map and provide clustering of the data. This technique has become very popular for data exploration since it provides easily interpretable results and a meaningful global view of the data. As such, self-organizing maps are usually applied to aircraft engine condition monitoring. As databases in this field are huge and complex, they naturally contain multiple missing entries for various reasons. The classical Kohonen algorithm to compute self-organizing maps is conceived for complete data only. A naive approach to deal with partially observed data consists in deleting items or variables with missing entries. However, this requires a sufficient number of complete individuals to be fairly representative of the population; otherwise, deletion leads to a considerable loss of information. Moreover, deletion can also induce bias in the analysis results. Alternatively, one can first apply a common imputation method to create a complete dataset and then apply the Kohonen algorithm. However, the choice of the imputation method may have a strong impact on the resulting self-organizing map. Our approach is to address simultaneously the two problems of computing a self-organizing map and imputing missing values, as these tasks are not independent. In this work, we propose an extension of self-organizing maps for partially observed data, referred to as missSOM. First, we introduce a criterion to be optimized, that aims at defining simultaneously the best self-organizing map and the best imputations for the missing entries. As such, missSOM is also an imputation method for missing values. To minimize the criterion, we propose an iterative algorithm that alternates the learning of a self-organizing map and the imputation of missing values. Moreover, we develop an accelerated version of the algorithm by entwining the iterations of the Kohonen algorithm with the updates of the imputed values. This method is efficiently implemented in R and will soon be released on CRAN. Compared to the standard Kohonen algorithm, it does not come with any additional cost in terms of computing time. Numerical experiments illustrate that missSOM performs well in terms of both clustering and imputation compared to the state of the art. In particular, it turns out that missSOM is robust to the missingness mechanism, which is in contrast to many imputation methods that are appropriate for only a single mechanism. This is an important property of missSOM as, in practice, the missingness mechanism is often unknown. An application to measurements on one type of part is also provided and shows the practical interest of missSOM.

Keywords: imputation method of missing data, partially observed data, robustness to missingness mechanism, self-organizing maps

Procedia PDF Downloads 125
5 Characterization of Agroforestry Systems in Burkina Faso Using an Earth Observation Data Cube

Authors: Dan Kanmegne

Abstract:

Africa will become the most populated continent by the end of the century, with around 4 billion inhabitants. Food security and climate changes will become continental issues since agricultural practices depend on climate but also contribute to global emissions and land degradation. Agroforestry has been identified as a cost-efficient and reliable strategy to address these two issues. It is defined as the integrated management of trees and crops/animals in the same land unit. Agroforestry provides benefits in terms of goods (fruits, medicine, wood, etc.) and services (windbreaks, fertility, etc.), and is acknowledged to have a great potential for carbon sequestration; therefore it can be integrated into reduction mechanisms of carbon emissions. Particularly in sub-Saharan Africa, the constraint stands in the lack of information about both areas under agroforestry and the characterization (composition, structure, and management) of each agroforestry system at the country level. This study describes and quantifies “what is where?”, earliest to the quantification of carbon stock in different systems. Remote sensing (RS) is the most efficient approach to map such a dynamic technology as agroforestry since it gives relatively adequate and consistent information over a large area at nearly no cost. RS data fulfill the good practice guidelines of the Intergovernmental Panel On Climate Change (IPCC) that is to be used in carbon estimation. Satellite data are getting more and more accessible, and the archives are growing exponentially. To retrieve useful information to support decision-making out of this large amount of data, satellite data needs to be organized so to ensure fast processing, quick accessibility, and ease of use. A new solution is a data cube, which can be understood as a multi-dimensional stack (space, time, data type) of spatially aligned pixels and used for efficient access and analysis. A data cube for Burkina Faso has been set up from the cooperation project between the international service provider WASCAL and Germany, which provides an accessible exploitation architecture of multi-temporal satellite data. The aim of this study is to map and characterize agroforestry systems using the Burkina Faso earth observation data cube. The approach in its initial stage is based on an unsupervised image classification of a normalized difference vegetation index (NDVI) time series from 2010 to 2018, to stratify the country based on the vegetation. Fifteen strata were identified, and four samples per location were randomly assigned to define the sampling units. For safety reasons, the northern part will not be part of the fieldwork. A total of 52 locations will be visited by the end of the dry season in February-March 2020. The field campaigns will consist of identifying and describing different agroforestry systems and qualitative interviews. A multi-temporal supervised image classification will be done with a random forest algorithm, and the field data will be used for both training the algorithm and accuracy assessment. The expected outputs are (i) map(s) of agroforestry dynamics, (ii) characteristics of different systems (main species, management, area, etc.); (iii) assessment report of Burkina Faso data cube.

Keywords: agroforestry systems, Burkina Faso, earth observation data cube, multi-temporal image classification

Procedia PDF Downloads 112
4 Recognizing Human Actions by Multi-Layer Growing Grid Architecture

Authors: Z. Gharaee

Abstract:

Recognizing actions performed by others is important in our daily lives since it is necessary for communicating with others in a proper way. We perceive an action by observing the kinematics of motions involved in the performance. We use our experience and concepts to make a correct recognition of the actions. Although building the action concepts is a life-long process, which is repeated throughout life, we are very efficient in applying our learned concepts in analyzing motions and recognizing actions. Experiments on the subjects observing the actions performed by an actor show that an action is recognized after only about two hundred milliseconds of observation. In this study, hierarchical action recognition architecture is proposed by using growing grid layers. The first-layer growing grid receives the pre-processed data of consecutive 3D postures of joint positions and applies some heuristics during the growth phase to allocate areas of the map by inserting new neurons. As a result of training the first-layer growing grid, action pattern vectors are generated by connecting the elicited activations of the learned map. The ordered vector representation layer receives action pattern vectors to create time-invariant vectors of key elicited activations. Time-invariant vectors are sent to second-layer growing grid for categorization. This grid creates the clusters representing the actions. Finally, one-layer neural network developed by a delta rule labels the action categories in the last layer. System performance has been evaluated in an experiment with the publicly available MSR-Action3D dataset. There are actions performed by using different parts of human body: Hand Clap, Two Hands Wave, Side Boxing, Bend, Forward Kick, Side Kick, Jogging, Tennis Serve, Golf Swing, Pick Up and Throw. The growing grid architecture was trained by applying several random selections of generalization test data fed to the system during on average 100 epochs for each training of the first-layer growing grid and around 75 epochs for each training of the second-layer growing grid. The average generalization test accuracy is 92.6%. A comparison analysis between the performance of growing grid architecture and self-organizing map (SOM) architecture in terms of accuracy and learning speed show that the growing grid architecture is superior to the SOM architecture in action recognition task. The SOM architecture completes learning the same dataset of actions in around 150 epochs for each training of the first-layer SOM while it takes 1200 epochs for each training of the second-layer SOM and it achieves the average recognition accuracy of 90% for generalization test data. In summary, using the growing grid network preserves the fundamental features of SOMs, such as topographic organization of neurons, lateral interactions, the abilities of unsupervised learning and representing high dimensional input space in the lower dimensional maps. The architecture also benefits from an automatic size setting mechanism resulting in higher flexibility and robustness. Moreover, by utilizing growing grids the system automatically obtains a prior knowledge of input space during the growth phase and applies this information to expand the map by inserting new neurons wherever there is high representational demand.

Keywords: action recognition, growing grid, hierarchical architecture, neural networks, system performance

Procedia PDF Downloads 133
3 Quantified Metabolomics for the Determination of Phenotypes and Biomarkers across Species in Health and Disease

Authors: Miroslava Cuperlovic-Culf, Lipu Wang, Ketty Boyle, Nadine Makley, Ian Burton, Anissa Belkaid, Mohamed Touaibia, Marc E. Surrette

Abstract:

Metabolic changes are one of the major factors in the development of a variety of diseases in various species. Metabolism of agricultural plants is altered the following infection with pathogens sometimes contributing to resistance. At the same time, pathogens use metabolites for infection and progression. In humans, metabolism is a hallmark of cancer development for example. Quantified metabolomics data combined with other omics or clinical data and analyzed using various unsupervised and supervised methods can lead to better diagnosis and prognosis. It can also provide information about resistance as well as contribute knowledge of compounds significant for disease progression or prevention. In this work, different methods for metabolomics quantification and analysis from Nuclear Magnetic Resonance (NMR) measurements that are used for investigation of disease development in wheat and human cells will be presented. One-dimensional 1H NMR spectra are used extensively for metabolic profiling due to their high reliability, wide range of applicability, speed, trivial sample preparation and low cost. This presentation will describe a new method for metabolite quantification from NMR data that combines alignment of spectra of standards to sample spectra followed by multivariate linear regression optimization of spectra of assigned metabolites to samples’ spectra. Several different alignment methods were tested and multivariate linear regression result has been compared with other quantification methods. Quantified metabolomics data can be analyzed in the variety of ways and we will present different clustering methods used for phenotype determination, network analysis providing knowledge about the relationships between metabolites through metabolic network as well as biomarker selection providing novel markers. These analysis methods have been utilized for the investigation of fusarium head blight resistance in wheat cultivars as well as analysis of the effect of estrogen receptor and carbonic anhydrase activation and inhibition on breast cancer cell metabolism. Metabolic changes in spikelet’s of wheat cultivars FL62R1, Stettler, MuchMore and Sumai3 following fusarium graminearum infection were explored. Extensive 1D 1H and 2D NMR measurements provided information for detailed metabolite assignment and quantification leading to possible metabolic markers discriminating resistance level in wheat subtypes. Quantification data is compared to results obtained using other published methods. Fusarium infection induced metabolic changes in different wheat varieties are discussed in the context of metabolic network and resistance. Quantitative metabolomics has been used for the investigation of the effect of targeted enzyme inhibition in cancer. In this work, the effect of 17 β -estradiol and ferulic acid on metabolism of ER+ breast cancer cells has been compared to their effect on ER- control cells. The effect of the inhibitors of carbonic anhydrase on the observed metabolic changes resulting from ER activation has also been determined. Metabolic profiles were studied using 1D and 2D metabolomic NMR experiments, combined with the identification and quantification of metabolites, and the annotation of the results is provided in the context of biochemical pathways.

Keywords: metabolic biomarkers, metabolic network, metabolomics, multivariate linear regression, NMR quantification, quantified metabolomics, spectral alignment

Procedia PDF Downloads 316
2 Phenotype and Psychometric Characterization of Phelan-Mcdermid Syndrome Patients

Authors: C. Bel, J. Nevado, F. Ciceri, M. Ropacki, T. Hoffmann, P. Lapunzina, C. Buesa

Abstract:

Background: The Phelan-McDermid syndrome (PMS) is a genetic disorder caused by the deletion of the terminal region of chromosome 22 or mutation of the SHANK3 gene. Shank3 disruption in mice leads to dysfunction of synaptic transmission, which can be restored by epigenetic regulation with both Lysine Specific Demethylase 1 (LSD1) inhibitors. PMS subjects result in a variable degree of intellectual disability, delay or absence of speech, autistic spectrum disorders symptoms, low muscle tone, motor delays and epilepsy. Vafidemstat is an LSD1 inhibitor in Phase II clinical development with a well-established and favorable safety profile, and data supporting the restoration of memory and cognition defects as well as reduction of agitation and aggression in several animal models and clinical studies. Therefore, vafidemstat has the potential to become a first-in-class precision medicine approach to treat PMS patients. Aims: The goal of this research is to perform an observational trial to psychometrically characterize individuals carrying deletions in SHANK3 and build a foundation for subsequent precision psychiatry clinical trials with vafidemstat. Methodology: This study is characterizing the clinical profile of 20 to 40 subjects, > 16-year-old, with genotypically confirmed PMS diagnosis. Subjects will complete a battery of neuropsychological scales, including the Repetitive Behavior Questionnaire (RBQ), Vineland Adaptive Behavior Scales, Escala de Observación para el Diagnostico del Autismo (Autism Diagnostic Observational Scale) (ADOS)-2, the Battelle Developmental Inventory and the Behavior Problems Inventory (BPI). Results: By March 2021, 19 patients have been enrolled. Unsupervised hierarchical clustering of the results obtained so far identifies 3 groups of patients, characterized by different profiles of cognitive and behavioral scores. The first cluster is characterized by low Battelle age, high ADOS and low Vineland, RBQ and BPI scores. Low Vineland, RBQ and BPI scores are also detected in the second cluster, which in contrast has high Battelle age and low ADOS scores. The third cluster is somewhat in the middle for the Battelle, Vineland and ADOS scores while displaying the highest levels of aggression (high BPI) and repeated behaviors (high RBQ). In line with the observation that female patients are generally affected by milder forms of autistic symptoms, no male patients are present in the second cluster. Dividing the results by gender highlights that male patients in the third cluster are characterized by a higher frequency of aggression, whereas female patients from the same cluster display a tendency toward higher repetitive behavior. Finally, statistically significant differences in deletion sizes are detected comparing the three clusters (also after correcting for gender), and deletion size appears to be positively correlated with ADOS and negatively correlated with Vineland A and C scores. No correlation is detected between deletion size and the BPI and RBQ scores. Conclusions: Precision medicine may open a new way to understand and treat Central Nervous System disorders. Epigenetic dysregulation has been proposed to be an important mechanism in the pathogenesis of schizophrenia and autism. Vafidemstat holds exciting therapeutic potential in PMS, and this study will provide data regarding the optimal endpoints for a future clinical study to explore vafidemstat ability to treat shank3-associated psychiatric disorders.

Keywords: autism, epigenetics, LSD1, personalized medicine

Procedia PDF Downloads 136
1 Predicting Open Chromatin Regions in Cell-Free DNA Whole Genome Sequencing Data by Correlation Clustering  

Authors: Fahimeh Palizban, Farshad Noravesh, Amir Hossein Saeidian, Mahya Mehrmohamadi

Abstract:

In the recent decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the blood and contribute to a pool of circulating fragments called cell-free DNA. Accordingly, identifying the tissue origin of these DNA fragments from the plasma can result in more accurate and fast disease diagnosis and precise treatment protocols. Open chromatin regions are important epigenetic features of DNA that reflect cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. There have been several studies in the area of cancer liquid biopsy that integrate distinct genomic and epigenomic features for early cancer detection along with tissue of origin detection. However, multimodal analysis requires several types of experiments to cover the genomic and epigenomic aspects of a single sample, which will lead to a huge amount of cost and time. To overcome these limitations, the idea of predicting OCRs from WGS is of particular importance. In this regard, we proposed a computational approach to target the prediction of open chromatin regions as an important epigenetic feature from cell-free DNA whole genome sequence data. To fulfill this objective, local sequencing depth will be fed to our proposed algorithm and the prediction of the most probable open chromatin regions from whole genome sequencing data can be carried out. Our method integrates the signal processing method with sequencing depth data and includes count normalization, Discrete Fourie Transform conversion, graph construction, graph cut optimization by linear programming, and clustering. To validate the proposed method, we compared the output of the clustering (open chromatin region+, open chromatin region-) with previously validated open chromatin regions related to human blood samples of the ATAC-DB database. The percentage of overlap between predicted open chromatin regions and the experimentally validated regions obtained by ATAC-seq in ATAC-DB is greater than 67%, which indicates meaningful prediction. As it is evident, OCRs are mostly located in the transcription start sites (TSS) of the genes. In this regard, we compared the concordance between the predicted OCRs and the human genes TSS regions obtained from refTSS and it showed proper accordance around 52.04% and ~78% with all and the housekeeping genes, respectively. Accurately detecting open chromatin regions from plasma cell-free DNA-seq data is a very challenging computational problem due to the existence of several confounding factors, such as technical and biological variations. Although this approach is in its infancy, there has already been an attempt to apply it, which leads to a tool named OCRDetector with some restrictions like the need for highly depth cfDNA WGS data, prior information about OCRs distribution, and considering multiple features. However, we implemented a graph signal clustering based on a single depth feature in an unsupervised learning manner that resulted in faster performance and decent accuracy. Overall, we tried to investigate the epigenomic pattern of a cell-free DNA sample from a new computational perspective that can be used along with other tools to investigate genetic and epigenetic aspects of a single whole genome sequencing data for efficient liquid biopsy-related analysis.

Keywords: open chromatin regions, cancer, cell-free DNA, epigenomics, graph signal processing, correlation clustering

Procedia PDF Downloads 110