Search results for: sequence data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25119

Search results for: sequence data

24789 The Impact of the Variation of Sky View Factor on Landscape Degree of Enclosure of Urban Blue and Green Belt

Authors: Yi-Chun Huang, Kuan-Yun Chen, Chuang-Hung Lin

Abstract:

Urban Green Belt and Blue is a part of the city landscape, it is an important constituent element of the urban environment and appearance. The Hsinchu East Gate Moat is situated in the center of the city, which not only has a wealth of historical and cultural resources, but also combines the Green Belt and the Blue Belt qualities at the same time. The Moat runs more than a thousand meters through the vital Green Belt and the Blue Belt in downtown, and each section is presented in different qualities of moat from south to north. The water area and the green belt of surroundings are presented linear and banded spread. The water body and the rich diverse river banks form an urban green belt of rich layers. The watercourse with green belt design lets users have connections with blue belts in different ways; therefore, the integration of Hsinchu East Gate and moat have become one of the unique urban landscapes in Taiwan. The study is based on the fact-finding case of Hsinchu East Gate Moat where situated in northern Taiwan, to research the impact between the SVF variation of the city and spatial sequence of Urban Green Belt and Blue landscape and visual analysis by constituent cross-section, and then comparing the influence of different leaf area index – the variable ecological factors to the degree of enclosure. We proceed to survey the landscape design of open space, to measure existing structural features of the plant canopy which contain the height of plants and branches, the crown diameter, breast-height diameter through access to diagram of Geographic Information Systems (GIS) and on-the-spot actual measurement. The north and south districts of blue green belt areas are divided 20 meters into a unit from East Gate Roundabout as the epicenter, and to set up a survey points to measure the SVF above the survey points; then we proceed to quantitative analysis from the data to calculate open landscape degree of enclosure. The results can be reference for the composition of future river landscape and the practical operation for dynamic space planning of blue and green belt landscape.

Keywords: sky view factor, degree of enclosure, spatial sequence, leaf area indices

Procedia PDF Downloads 535
24788 An Unbiased Profiling of Immune Repertoire via Sequencing and Analyzing T-Cell Receptor Genes

Authors: Yi-Lin Chen, Sheng-Jou Hung, Tsunglin Liu

Abstract:

Adaptive immune system recognizes a wide range of antigens via expressing a large number of structurally distinct T cell and B cell receptor genes. The distinct receptor genes arise from complex rearrangements called V(D)J recombination, and constitute the immune repertoire. A common method of profiling immune repertoire is via amplifying recombined receptor genes using multiple primers and high-throughput sequencing. This multiplex-PCR approach is efficient; however, the resulting repertoire can be distorted because of primer bias. To eliminate primer bias, 5’ RACE is an alternative amplification approach. However, the application of RACE approach is limited by its low efficiency (i.e., the majority of data are non-regular receptor sequences, e.g., containing intronic segments) and lack of the convenient tool for analysis. We propose a computational tool that can correctly identify non-regular receptor sequences in RACE data via aligning receptor sequences against the whole gene instead of only the exon regions as done in all other tools. Using our tool, the remaining regular data allow for an accurate profiling of immune repertoire. In addition, a RACE approach is improved to yield a higher fraction of regular T-cell receptor sequences. Finally, we quantify the degree of primer bias of a multiplex-PCR approach via comparing it to the RACE approach. The results reveal significant differences in frequency of VJ combination by the two approaches. Together, we provide a new experimental and computation pipeline for an unbiased profiling of immune repertoire. As immune repertoire profiling has many applications, e.g., tracing bacterial and viral infection, detection of T cell lymphoma and minimal residual disease, monitoring cancer immunotherapy, etc., our work should benefit scientists who are interested in the applications.

Keywords: immune repertoire, T-cell receptor, 5' RACE, high-throughput sequencing, sequence alignment

Procedia PDF Downloads 172
24787 Microbubbles Enhanced Synthetic Phorbol Ester Degradation by Ozonolysis

Authors: D. Kuvshinov, A. Siswanto, W. Zimmerman

Abstract:

A phorbol-12-myristate-13-acetate (TPA) is a synthetic analogue of phorbol ester (PE), a natural toxic compound of Euphorbiaceae plant. The oil extracted from plants of this family is useful source for primarily biofuel. However this oil can also be used as a food stock due to its significant nutrition content. The limitations for utilizing the oil as a food stock are mainly due to a toxicity of PE. Nowadays a majority of PE detoxification processes are expensive as include multi steps alcohol extraction sequence. Ozone is considered as a strong oxidative agent. It reaction with PE it attacks the carbon double bond of PE. This modification of PE molecular structure results into nontoxic ester with high lipid content. This report presents data on development of simple and cheap PE detoxification process with water application as a buffer and ozone as reactive component. The core of this new technique is a simultaneous application of new microscale plasma unit for ozone production and patented gas oscillation technology. In combination with a reactor design the technology permits ozone injection to the water-TPA mixture in form of microbubbles. The efficacy of a heterogeneous process depends on diffusion coefficient which can be controlled by contact time and interface area. The low velocity of rising microbubbles and high surface to volume ratio allow fast mass transfer to be achieved during the process. Direct injection of ozone is the most efficient process for a highly reactive and short lived chemical. Data on the plasma unit behavior are presented and influence of the gas oscillation technology to the microbubbles production mechanism has been discussed. Data on overall process efficacy for TPA degradation is shown.

Keywords: microbubble, ozonolysis, synthetic phorbol ester, chemical engineering

Procedia PDF Downloads 199
24786 A Quinary Coding and Matrix Structure Based Channel Hopping Algorithm for Blind Rendezvous in Cognitive Radio Networks

Authors: Qinglin Liu, Zhiyong Lin, Zongheng Wei, Jianfeng Wen, Congming Yi, Hai Liu

Abstract:

The multi-channel blind rendezvous problem in distributed cognitive radio networks (DCRNs) refers to how users in the network can hop to the same channel at the same time slot without any prior knowledge (i.e., each user is unaware of other users' information). The channel hopping (CH) technique is a typical solution to this blind rendezvous problem. In this paper, we propose a quinary coding and matrix structure-based CH algorithm called QCMS-CH. The QCMS-CH algorithm can guarantee the rendezvous of users using only one cognitive radio in the scenario of the asynchronous clock (i.e., arbitrary time drift between the users), heterogeneous channels (i.e., the available channel sets of users are distinct), and symmetric role (i.e., all users play a same role). The QCMS-CH algorithm first represents a randomly selected channel (denoted by R) as a fixed-length quaternary number. Then it encodes the quaternary number into a quinary bootstrapping sequence according to a carefully designed quaternary-quinary coding table with the prefix "R00". Finally, it builds a CH matrix column by column according to the bootstrapping sequence and six different types of elaborately generated subsequences. The user can access the CH matrix row by row and accordingly perform its channel, hoping to attempt rendezvous with other users. We prove the correctness of QCMS-CH and derive an upper bound on its Maximum Time-to-Rendezvous (MTTR). Simulation results show that the QCMS-CH algorithm outperforms the state-of-the-art in terms of the MTTR and the Expected Time-to-Rendezvous (ETTR).

Keywords: channel hopping, blind rendezvous, cognitive radio networks, quaternary-quinary coding

Procedia PDF Downloads 67
24785 Full Characterization of Heterogeneous Antibody Samples under Denaturing and Native Conditions on a Hybrid Quadrupole-Orbitrap Mass Spectrometer

Authors: Rowan Moore, Kai Scheffler, Eugen Damoc, Jennifer Sutton, Aaron Bailey, Stephane Houel, Simon Cubbon, Jonathan Josephs

Abstract:

Purpose: MS analysis of monoclonal antibodies (mAbs) at the protein and peptide levels is critical during development and production of biopharmaceuticals. The compositions of current generation therapeutic proteins are often complex due to various modifications which may affect efficacy. Intact proteins analyzed by MS are detected in higher charge states that also provide more complexity in mass spectra. Protein analysis in native or native-like conditions with zero or minimal organic solvent and neutral or weakly acidic pH decreases charge state value resulting in mAb detection at higher m/z ranges with more spatial resolution. Methods: Three commercially available mAbs were used for all experiments. Intact proteins were desalted online using size exclusion chromatography (SEC) or reversed phase chromatography coupled on-line with a mass spectrometer. For streamlined use of the LC- MS platform we used a single SEC column and alternately selected specific mobile phases to perform separations in either denaturing or native-like conditions: buffer A (20 % ACN, 0.1 % FA) with Buffer B (100 mM ammonium acetate). For peptide analysis mAbs were proteolytically digested with and without prior reduction and alkylation. The mass spectrometer used for all experiments was a commercially available Thermo Scientific™ hybrid Quadrupole-Orbitrap™ mass spectrometer, equipped with the new BioPharma option which includes a new High Mass Range (HMR) mode that allows for improved high mass transmission and mass detection up to 8000 m/z. Results: We have analyzed the profiles of three mAbs under reducing and native conditions by direct infusion with offline desalting and with on-line desalting via size exclusion and reversed phase type columns. The presence of high salt under denaturing conditions was found to influence the observed charge state envelope and impact mass accuracy after spectral deconvolution. The significantly lower charge states observed under native conditions improves the spatial resolution of protein signals and has significant benefits for the analysis of antibody mixtures, e.g. lysine variants, degradants or sequence variants. This type of analysis requires the detection of masses beyond the standard mass range ranging up to 6000 m/z requiring the extended capabilities available in the new HMR mode. We have compared each antibody sample that was analyzed individually with mixtures in various relative concentrations. For this type of analysis, we observed that apparent native structures persist and ESI is benefited by the addition of low amounts of acetonitrile and formic acid in combination with the ammonium acetate-buffered mobile phase. For analyses on the peptide level we analyzed reduced/alkylated, and non-reduced proteolytic digests of the individual antibodies separated via reversed phase chromatography aiming to retrieve as much information as possible regarding sequence coverage, disulfide bridges, post-translational modifications such as various glycans, sequence variants, and their relative quantification. All data acquired were submitted to a single software package for analysis aiming to obtain a complete picture of the molecules analyzed. Here we demonstrate the capabilities of the mass spectrometer to fully characterize homogeneous and heterogeneous therapeutic proteins on one single platform. Conclusion: Full characterization of heterogeneous intact protein mixtures by improved mass separation on a quadrupole-Orbitrap™ mass spectrometer with extended capabilities has been demonstrated.

Keywords: disulfide bond analysis, intact analysis, native analysis, mass spectrometry, monoclonal antibodies, peptide mapping, post-translational modifications, sequence variants, size exclusion chromatography, therapeutic protein analysis, UHPLC

Procedia PDF Downloads 344
24784 Taxonomic Classification for Living Organisms Using Convolutional Neural Networks

Authors: Saed Khawaldeh, Mohamed Elsharnouby, Alaa Eddin Alchalabi, Usama Pervaiz, Tajwar Aleef, Vu Hoang Minh

Abstract:

Taxonomic classification has a wide-range of applications such as finding out more about the evolutionary history of organisms that can be done by making a comparison between species living now and species that lived in the past. This comparison can be made using different kinds of extracted species’ data which include DNA sequences. Compared to the estimated number of the organisms that nature harbours, humanity does not have a thorough comprehension of which specific species they all belong to, in spite of the significant development of science and scientific knowledge over many years. One of the methods that can be applied to extract information out of the study of organisms in this regard is to use the DNA sequence of a living organism as a marker, thus making it available to classify it into a taxonomy. The classification of living organisms can be done in many machine learning techniques including Neural Networks (NNs). In this study, DNA sequences classification is performed using Convolutional Neural Networks (CNNs) which is a special type of NNs.

Keywords: deep networks, convolutional neural networks, taxonomic classification, DNA sequences classification

Procedia PDF Downloads 414
24783 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression consists of sequence characters which allow describing a text path. Usually, in clinical research, regular expressions are manually created by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm named REX was designed, and then, implemented as a simplified method to create regexes to classify Spanish text automatically. In order to classify ambiguous cases, such as, when multiple labels are assigned to a testing example, REX includes an information gain method Two sets of data were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statically better regarding accuracy and F-measure than Support Vector Machine and Naïve Bayes for both datasets.

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 298
24782 Towards Creative Movie Title Generation Using Deep Neural Models

Authors: Simon Espigolé, Igor Shalyminov, Helen Hastie

Abstract:

Deep machine learning techniques including deep neural networks (DNN) have been used to model language and dialogue for conversational agents to perform tasks, such as giving technical support and also for general chit-chat. They have been shown to be capable of generating long, diverse and coherent sentences in end-to-end dialogue systems and natural language generation. However, these systems tend to imitate the training data and will only generate the concepts and language within the scope of what they have been trained on. This work explores how deep neural networks can be used in a task that would normally require human creativity, whereby the human would read the movie description and/or watch the movie and come up with a compelling, interesting movie title. This task differs from simple summarization in that the movie title may not necessarily be derivable from the content or semantics of the movie description. Here, we train a type of DNN called a sequence-to-sequence model (seq2seq) that takes as input a short textual movie description and some information on e.g. genre of the movie. It then learns to output a movie title. The idea is that the DNN will learn certain techniques and approaches that the human movie titler may deploy that may not be immediately obvious to the human-eye. To give an example of a generated movie title, for the movie synopsis: ‘A hitman concludes his legacy with one more job, only to discover he may be the one getting hit.’; the original, true title is ‘The Driver’ and the one generated by the model is ‘The Masquerade’. A human evaluation was conducted where the DNN output was compared to the true human-generated title, as well as a number of baselines, on three 5-point Likert scales: ‘creativity’, ‘naturalness’ and ‘suitability’. Subjects were also asked which of the two systems they preferred. The scores of the DNN model were comparable to the scores of the human-generated movie title, with means m=3.11, m=3.12, respectively. There is room for improvement in these models as they were rated significantly less ‘natural’ and ‘suitable’ when compared to the human title. In addition, the human-generated title was preferred overall 58% of the time when pitted against the DNN model. These results, however, are encouraging given the comparison with a highly-considered, well-crafted human-generated movie title. Movie titles go through a rigorous process of assessment by experts and focus groups, who have watched the movie. This process is in place due to the large amount of money at stake and the importance of creating an effective title that captures the audiences’ attention. Our work shows progress towards automating this process, which in turn may lead to a better understanding of creativity itself.

Keywords: creativity, deep machine learning, natural language generation, movies

Procedia PDF Downloads 306
24781 Data Refinement Enhances The Accuracy of Short-Term Traffic Latency Prediction

Authors: Man Fung Ho, Lap So, Jiaqi Zhang, Yuheng Zhao, Huiyang Lu, Tat Shing Choi, K. Y. Michael Wong

Abstract:

Nowadays, a tremendous amount of data is available in the transportation system, enabling the development of various machine learning approaches to make short-term latency predictions. A natural question is then the choice of relevant information to enable accurate predictions. Using traffic data collected from the Taiwan Freeway System, we consider the prediction of short-term latency of a freeway segment with a length of 17 km covering 5 measurement points, each collecting vehicle-by-vehicle data through the electronic toll collection system. The processed data include the past latencies of the freeway segment with different time lags, the traffic conditions of the individual segments (the accumulations, the traffic fluxes, the entrance and exit rates), the total accumulations, and the weekday latency profiles obtained by Gaussian process regression of past data. We arrive at several important conclusions about how data should be refined to obtain accurate predictions, which have implications for future system-wide latency predictions. (1) We find that the prediction of median latency is much more accurate and meaningful than the prediction of average latency, as the latter is plagued by outliers. This is verified by machine-learning prediction using XGBoost that yields a 35% improvement in the mean square error of the 5-minute averaged latencies. (2) We find that the median latency of the segment 15 minutes ago is a very good baseline for performance comparison, and we have evidence that further improvement is achieved by machine learning approaches such as XGBoost and Long Short-Term Memory (LSTM). (3) By analyzing the feature importance score in XGBoost and calculating the mutual information between the inputs and the latencies to be predicted, we identify a sequence of inputs ranked in importance. It confirms that the past latencies are most informative of the predicted latencies, followed by the total accumulation, whereas inputs such as the entrance and exit rates are uninformative. It also confirms that the inputs are much less informative of the average latencies than the median latencies. (4) For predicting the latencies of segments composed of two or three sub-segments, summing up the predicted latencies of each sub-segment is more accurate than the one-step prediction of the whole segment, especially with the latency prediction of the downstream sub-segments trained to anticipate latencies several minutes ahead. The duration of the anticipation time is an increasing function of the traveling time of the upstream segment. The above findings have important implications to predicting the full set of latencies among the various locations in the freeway system.

Keywords: data refinement, machine learning, mutual information, short-term latency prediction

Procedia PDF Downloads 151
24780 TMIF: Transformer-Based Multi-Modal Interactive Fusion for Rumor Detection

Authors: Jiandong Lv, Xingang Wang, Cuiling Shao

Abstract:

The rapid development of social media platforms has made it one of the important news sources. While it provides people with convenient real-time communication channels, fake news and rumors are also spread rapidly through social media platforms, misleading the public and even causing bad social impact in view of the slow speed and poor consistency of artificial rumor detection. We propose an end-to-end rumor detection model-TIMF, which captures the dependencies between multimodal data based on the interactive attention mechanism, uses a transformer for cross-modal feature sequence mapping and combines hybrid fusion strategies to obtain decision results. This paper verifies two multi-modal rumor detection datasets and proves the superior performance and early detection performance of the proposed model.

Keywords: hybrid fusion, multimodal fusion, rumor detection, social media, transformer

Procedia PDF Downloads 204
24779 Phylogenetic Relationships of the Malaysian Primates Cercopithecine Based on COI Gene Sequences

Authors: B. M. Md-Zain, N. A. Rahman, M. A. B. Abdul-Latiff, W. M. R. Idris

Abstract:

We conducted molecular research to portray phylogenetic relationships of Malaysian primates particularly in the genus of Macaca. We have sequenced cytochrome C oxidase subunit I (COI) of mitochondrial DNA of several individuals from M. fascicularis and M. arctoides. PCR amplifications were performed and COI DNA sequences were aligned using ClustalW. Phylogenetic trees were constructed using distance analyses by employing neighbor-joining algorithm (NJ). We managed to sequence 700 bp of COI DNA sequences. The tree topology showed that M. fascicularis did not clump based on phyleogeography division in Peninsular Malaysia. Individuals from Negeri Sembilan merged together with samples from Perak and Penang into one clade. In addition, phylogenetic analyses indicated that M. arctoides was classified into sinica group instead of fascicularis group supported by genetic distance data. COI gene is an effective locus to clarify phylogenetic position of M. arctoides but not in discriminating M. fascicularis population in Peninsular Malaysia.

Keywords: cercopithecine, long-tailed macaque, Macaca fascicularis, Macaca arctoides

Procedia PDF Downloads 331
24778 Modelling Biological Treatment of Dye Wastewater in SBR Systems Inoculated with Bacteria by Artificial Neural Network

Authors: Yasaman Sanayei, Alireza Bahiraie

Abstract:

This paper presents a systematic methodology based on the application of artificial neural networks for sequencing batch reactor (SBR). The SBR is a fill-and-draw biological wastewater technology, which is specially suited for nutrient removal. Employing reactive dye by Sphingomonas paucimobilis bacteria at sequence batch reactor is a novel approach of dye removal. The influent COD, MLVSS, and reaction time were selected as the process inputs and the effluent COD and BOD as the process outputs. The best possible result for the discrete pole parameter was a= 0.44. In orderto adjust the parameters of ANN, the Levenberg-Marquardt (LM) algorithm was employed. The results predicted by the model were compared to the experimental data and showed a high correlation with R2> 0.99 and a low mean absolute error (MAE). The results from this study reveal that the developed model is accurate and efficacious in predicting COD and BOD parameters of the dye-containing wastewater treated by SBR. The proposed modeling approach can be applied to other industrial wastewater treatment systems to predict effluent characteristics. Note that SBR are normally operated with constant predefined duration of the stages, thus, resulting in low efficient operation. Data obtained from the on-line electronic sensors installed in the SBR and from the control quality laboratory analysis have been used to develop the optimal architecture of two different ANN. The results have shown that the developed models can be used as efficient and cost-effective predictive tools for the system analysed.

Keywords: artificial neural network, COD removal, SBR, Sphingomonas paucimobilis

Procedia PDF Downloads 392
24777 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, it community, industry, big data

Procedia PDF Downloads 171
24776 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 490
24775 A Sequence of Traumatic Pain: Feminist Issues within Laila Al-Othman’s Ṣamt al-Farāshāt (Silence of the Butterflies)

Authors: Khaled Igbaria

Abstract:

Laila Al-Othman is a well-known feminist writer in Kuwait and the entire Arab world. She was born in 1943 in Kuwait to a large and wealthy family. The author has written several short stories, as well as novels, such as The Woman and the Cat (1985) and Wasumayya Comes out of the Sea (1986), which was chosen as one of the best 100 Arab novels of the 21st century. Another prominent novel of hers is Ṣamt al-Farāshāt [Silence of the Butterflies] (2007), which was highly controversial in her native Kuwait upon publication. For this study, her engagement in feminism was achieved by exploring the different ways in which her novel, Ṣamt al-Farāshāt [Silence of the Butterflies], addresses several feminist issues, mainly forced marriage, rape and sexual abuse, gender-based physical, sexual violence, and enforced silence. This paper focuses on demonstrating social obstacles and continuous trauma caused by a sequence of pain experienced by Arab females in their patriarchal society. This study argues that the novel reveals a sustained effort to raise the banner of feminism and a strong desire to liberate Arab women from patriarchal domination. Al-Othman successfully and uniquely represents women as gender-based traumatic victims of sexual and physical violence, forced silence, and general oppression in the patriarchal Arab society, as those needing help, support, protection, and liberation. They are not represented as independent or free. Methodologically, the study employs a qualitative literary analysis method in addition to trauma theory psychoanalysis, concentrating on feminist issues highlighted in the novel.

Keywords: Al-Othman, Arab women pain, trauma within narration., Silence of the Butterflies

Procedia PDF Downloads 34
24774 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms

Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga

Abstract:

Today Websites contain very interesting applications. But there are only few methodologies to analyze User navigations through the Websites and formulating if the Website is put to correct use. The web logs are only used if some major attack or malfunctioning occurs. Web Logs contain lot interesting dealings on users in the system. Analyzing web logs has become a challenge due to the huge log volume. Finding interesting patterns is not as easy as it is due to size, distribution and importance of minor details of each log. Web logs contain very important data of user and site which are not been put to good use. Retrieving interesting information from logs gives an idea of what the users need, group users according to their various needs and improve site to build an effective and efficient site. The model we built is able to detect attacks or malfunctioning of the system and anomaly detection. Logs will be more complex as volume of traffic and the size and complexity of web site grows. Unsupervised techniques are used in this solution which is fully automated. Expert knowledge is only used in validation. In our approach first clean and purify the logs to bring them to a common platform with a standard format and structure. After cleaning module web session builder is executed. It outputs two files, Web Sessions file and Indexed URLs file. The Indexed URLs file contains the list of URLs accessed and their indices. Web Sessions file lists down the indices of each web session. Then DBSCAN and EM Algorithms are used iteratively and recursively to get the best clustering results of the web sessions. Using homogeneity, completeness, V-measure, intra and inter cluster distance and silhouette coefficient as parameters these algorithms self-evaluate themselves to input better parametric values to run the algorithms. If a cluster is found to be too large then micro-clustering is used. Using Cluster Signature Module the clusters are annotated with a unique signature called finger-print. In this module each cluster is fed to Associative Rule Learning Module. If it outputs confidence and support as value 1 for an access sequence it would be a potential signature for the cluster. Then the access sequence occurrences are checked in other clusters. If it is found to be unique for the cluster considered then the cluster is annotated with the signature. These signatures are used in anomaly detection, prevent cyber attacks, real-time dashboards that visualize users, accessing web pages, predict actions of users and various other applications in Finance, University Websites, News and Media Websites etc.

Keywords: anomaly detection, clustering, pattern recognition, web sessions

Procedia PDF Downloads 267
24773 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 302
24772 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 432
24771 Automatic Reporting System for Transcriptome Indel Identification and Annotation Based on Snapshot of Next-Generation Sequencing Reads Alignment

Authors: Shuo Mu, Guangzhi Jiang, Jinsa Chen

Abstract:

The analysis of Indel for RNA sequencing of clinical samples is easily affected by sequencing experiment errors and software selection. In order to improve the efficiency and accuracy of analysis, we developed an automatic reporting system for Indel recognition and annotation based on image snapshot of transcriptome reads alignment. This system includes sequence local-assembly and realignment, target point snapshot, and image-based recognition processes. We integrated high-confidence Indel dataset from several known databases as a training set to improve the accuracy of image processing and added a bioinformatical processing module to annotate and filter Indel artifacts. Subsequently, the system will automatically generate data, including data quality levels and images results report. Sanger sequencing verification of the reference Indel mutation of cell line NA12878 showed that the process can achieve 83% sensitivity and 96% specificity. Analysis of the collected clinical samples showed that the interpretation accuracy of the process was equivalent to that of manual inspection, and the processing efficiency showed a significant improvement. This work shows the feasibility of accurate Indel analysis of clinical next-generation sequencing (NGS) transcriptome. This result may be useful for RNA study for clinical samples with microsatellite instability in immunotherapy in the future.

Keywords: automatic reporting, indel, next-generation sequencing, NGS, transcriptome

Procedia PDF Downloads 157
24770 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 225
24769 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 250
24768 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 338
24767 Approach to Honey Volatiles' Profiling by Gas Chromatography and Mass Spectrometry

Authors: Igor Jerkovic

Abstract:

Biodiversity of flora provides many different nectar sources for the bees. Unifloral honeys possess distinctive flavours, mainly derived from their nectar sources (characteristic volatile organic components (VOCs)). Specific or nonspecific VOCs (chemical markers) could be used for unifloral honey characterisation as addition to the melissopalynologycal analysis. The main honey volatiles belong, in general, to three principal categories: terpenes, norisoprenoids, and benzene derivatives. Some of these substances have been described as characteristics of the floral source, and other compounds, like several alcohols, branched aldehydes, and furan derivatives, may be related to the microbial purity of honey processing and storage conditions. Selection of the extraction method for the honey volatiles profiling should consider that heating of the honey produce different artefacts and therefore conventional methods of VOCs isolation (such as hydrodistillation) cannot be applied for the honey. Two-way approach for the isolation of the honey VOCs was applied using headspace solid-phase microextraction (HS-SPME) and ultrasonic solvent extraction (USE). The extracts were analysed by gas chromatography and mass spectrometry (GC-MS). HS-SPME (with the fibers of different polarity such as polydimethylsiloxane/ divinylbenzene (PDMS/DVB) or divinylbenzene/carboxene/ polydimethylsiloxane (DVB/CAR/PDMS)) enabled isolation of high volatile headspace VOCs of the honey samples. Among them, some characteristic or specific compounds can be found such as 3,4-dihydro-3-oxoedulan (in Centaurea cyanus L. honey) or 1H-indole, methyl anthranilate, and cis-jasmone (in Citrus unshiu Marc. honey). USE with different solvents (mainly dichloromethane or the mixture pentane : diethyl ether 1 : 2 v/v) enabled isolation of less volatile and semi-volatile VOCs of the honey samples. Characteristic compounds from C. unshiu honey extracts were caffeine, 1H-indole, 1,3-dihydro-2H-indol-2-one, methyl anthranilate, and phenylacetonitrile. Sometimes, the selection of solvent sequence was useful for more complete profiling such as sequence I: pentane → diethyl ether or sequence II: pentane → pentane/diethyl ether (1:2, v/v) → dichloromethane). The extracts with diethyl ether contained hydroquinone and 4-hydroxybenzoic acid as the major compounds, while (E)-4-(r-1’,t-2’,c-4’-trihydroxy-2’,6’,6’-trimethylcyclo-hexyl)but-3-en-2-one predominated in dichloromethane extracts of Allium ursinum L. honey. With this two-way approach, it was possible to obtain a more detailed insight into the honey volatile and semi-volatile compounds and to minimize the risks of compound discrimination due to their partial extraction that is of significant importance for the complete honey profiling and identification of the chemical biomarkers that can complement the pollen analysis.

Keywords: honey chemical biomarkers, honey volatile compounds profiling, headspace solid-phase microextraction (HS-SPME), ultrasonic solvent extraction (USE)

Procedia PDF Downloads 181
24766 Combination of Geological, Geophysical and Reservoir Engineering Analyses in Field Development: A Case Study

Authors: Atif Zafar, Fan Haijun

Abstract:

A sequence of different Reservoir Engineering methods and tools in reservoir characterization and field development are presented in this paper. The real data of Jin Gas Field of L-Basin of Pakistan is used. The basic concept behind this work is to enlighten the importance of well test analysis in a broader way (i.e. reservoir characterization and field development) unlike to just determine the permeability and skin parameters. Normally in the case of reservoir characterization we rely on well test analysis to some extent but for field development plan, the well test analysis has become a forgotten tool specifically for locations of new development wells. This paper describes the successful implementation of well test analysis in Jin Gas Field where the main uncertainties are identified during initial stage of field development when location of new development well was marked only on the basis of G&G (Geologic and Geophysical) data. The seismic interpretation could not encounter one of the boundary (fault, sub-seismic fault, heterogeneity) near the main and only producing well of Jin Gas Field whereas the results of the model from the well test analysis played a very crucial rule in order to propose the location of second well of the newly discovered field. The results from different methods of well test analysis of Jin Gas Field are also integrated with and supported by other tools of Reservoir Engineering i.e. Material Balance Method and Volumetric Method. In this way, a comprehensive way out and algorithm is obtained in order to integrate the well test analyses with Geological and Geophysical analyses for reservoir characterization and field development. On the strong basis of this working and algorithm, it was successfully evaluated that the proposed location of new development well was not justified and it must be somewhere else except South direction.

Keywords: field development plan, reservoir characterization, reservoir engineering, well test analysis

Procedia PDF Downloads 344
24765 A Basic Modeling Approach for the 3D Protein Structure of Insulin

Authors: Daniel Zarzo Montes, Manuel Zarzo Castelló

Abstract:

Proteins play a fundamental role in biology, but their structure is complex, and it is a challenge for teachers to conceptually explain the differences between their primary, secondary, tertiary, and quaternary structures. On the other hand, there are currently many computer programs to visualize the 3D structure of proteins, but they require advanced training and knowledge. Moreover, it becomes difficult to visualize the sequence of amino acids in these models, and how the protein conformation is reached. Given this drawback, a simple and instructive procedure is proposed in order to teach the protein structure to undergraduate and graduate students. For this purpose, insulin has been chosen because it is a protein that consists of 51 amino acids, a relatively small number. The methodology has consisted of the use of plastic atom models, which are frequently used in organic chemistry and biochemistry to explain the chirality of biomolecules. For didactic purposes, when the aim is to teach the biochemical foundations of proteins, a manipulative system seems convenient, starting from the chemical structure of amino acids. It has the advantage that the bonds between amino acids can be conveniently rotated, following the pattern marked by the 3D models. First, the 51 amino acids were modeled, and then they were linked according to the sequence of this protein. Next, the three disulfide bonds that characterize the stability of insulin have been established, and then the alpha-helix structure has been formed. In order to reach the tertiary 3D conformation of this protein, different interactive models available on the Internet have been visualized. In conclusion, the proposed methodology seems very suitable for biology and biochemistry students because they can learn the fundamentals of protein modeling by means of a manipulative procedure as a basis for understanding the functionality of proteins. This methodology would be conveniently useful for a biology or biochemistry laboratory practice, either at the pre-graduate or university level.

Keywords: protein structure, 3D model, insulin, biomolecule

Procedia PDF Downloads 32
24764 Role of HLA Typing in Celiac Disease

Authors: Meriche Hacene

Abstract:

Introduction: Celiac disease (CD) is a chronic immune-mediated enteropathy triggered by gluten found in wheat or oats or rye. Celiac disease is associated with the HLA-DQ2 and HLA-DQ8 susceptibility alleles. This association with the HLA DQ2/DQ8 molecules confirmed the responsibility of genetic factors that intervene in the triggering of the autoimmune process of this condition. Objective: To evaluate the results of HLA DQ2 and HLA DQ8 typing of 40 patients suspected of having CD by PCR-SSP (Polymerase Chain Reaction Sequence Specific Primers). Material and method : 40 patients suspected of celiac disease with IgA transglutaminase serology (-) and duodenal biopsy (+). HLADR/DQ PCR-SSP (fluogen-innotrain) typing was carried out. Results : The average age of adults was 40 years, children: 4 years, the sex ratio was 1M/3F. In our patients the HLA DQ2 allele is found with a frequency of 75%, the DQ8 with a frequency of 25%, 17.5% were HLA-DQ2 homozygous and 15% were HLADQ2/HLADQ8. In our series, HLADQ2, DQ8 are found in almost all patients with a frequency of 95%. 30% of patients in our study had associated positivity of HLA-DRB3, DRB4 or DRB5 alleles. Conclusion : A high prevalence of positivity of HLADQ2 alleles at the expense of HLA DQ8 was found, which is consistent with literature data. These molecules constitute an additional marker for screening and diagnosis of CD.

Keywords: HLA typing, coeliac disease, HLA DQ 2, HLA DQ8

Procedia PDF Downloads 37
24763 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 60
24762 Development of One-Pot Sequential Cyclizations and Photocatalyzed Decarboxylative Radical Cyclization: Application Towards Aspidospermatan Alkaloids

Authors: Guillaume Bélanger, Jean-Philippe Fontaine, Clémence Hauduc

Abstract:

There is an undeniable thirst from organic chemists and from the pharmaceutical industry to access complex alkaloids with short syntheses. While medicinal chemists are interested in the fascinating wide range of biological properties of alkaloids, synthetic chemists are rather interested in finding new routes to access these challenging natural products of often low availability from nature. To synthesize complex polycyclic cores of natural products, reaction cascades or sequences performed one-pot offer a neat advantage over classical methods for their rapid increase in molecular complexity in a single operation. In counterpart, reaction cascades need to be run on substrates bearing all the required functional groups necessary for the key cyclizations. Chemoselectivity is thus a major issue associated with such a strategy, in addition to diastereocontrol and regiocontrol for the overall transformation. In the pursuit of synthetic efficiency, our research group developed an innovative one-pot transformation of linear substrates into bi- and tricyclic adducts applied to the construction of Aspidospermatan-type alkaloids. The latter is a rich class of indole alkaloids bearing a unique bridged azatricyclic core. Despite many efforts toward the synthesis of members of this family, efficient and versatile synthetic routes are still coveted. Indeed, very short, non-racemic approaches are rather scarce: for example, in the cases of aspidospermidine and aspidospermine, syntheses are all fifteen steps and over. We envisaged a unified approach to access several members of the Aspidospermatan alkaloids family. The key sequence features a highly chemoselective formamide activation that triggers a Vilsmeier-Haack cyclization, followed by an azomethine ylide generation and intramolecular cycloaddition. Despite the high density and variety of functional groups on the substrates (electron-rich and electron-poor alkenes, nitrile, amide, ester, enol ether), the sequence generated three new carbon-carbon bonds and three rings in a single operation with good yield and high chemoselectivity. A detailed study of amide, nucleophile, and dipolarophile variations to finally get to the successful combination required for the key transformation will be presented. To complete the indoline fragment of the natural products, we developed an original approach. Indeed, all reported routes to Aspidospermatan alkaloids introduce the indoline or indole early in the synthesis. In our work, the indoline needs to be installed on the azatricyclic core after the key cyclization sequence. As a result, typical Fischer indolization is not suited since this reaction is known to fail on such substrates. We thus envisaged a unique photocatalyzed decarboxylative radical cyclization. The development of this reaction as well as the scope and limitations of the methodology, will also be presented. The original Vilsmeier-Haack and azomethine ylide cyclization sequence as well as the new photocatalyzed decarboxylative radical cyclization will undoubtedly open access to new routes toward polycyclic indole alkaloids and derivatives of pharmaceutical interest in general.

Keywords: Aspidospermatan alkaloids, azomethine ylide cycloaddition, decarboxylative radical cyclization, indole and indoline synthesis, one-pot sequential cyclizations, photocatalysis, Vilsmeier-Haack Cyclization

Procedia PDF Downloads 58
24761 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 308
24760 Biodegradation of Malathion by Acinetobacter baumannii Strain AFA Isolated from Domestic Sewage in Egypt

Authors: Ahmed F. Azmy, Amal E. Saafan, Tamer M. Essam, Magdy A. Amin, Shaban H. Ahmed

Abstract:

Bacterial strains capable of degradation of malathion from the domestic sewage were isolated by an enrichment culture technique. Three bacterial strains were screened and identified as Acinetobacter baumannii (AFA), Pseudomonas aeruginosae (PS1),andPseudomonas mendocina (PS2) based on morphological, biochemical identification and 16S rRNA sequence analysis. Acinetobacter baumannii AFA was the most efficient malathion degrading bacterium, so used for further biodegradation study. AFA was able to grow in mineral salt medium (MSM) supplemented with malathion (100 mg/l) as a sole carbon source, and within 14 days, 84% of the initial dose was degraded by the isolate measured by high performance liquid chromatography. Strain AFA could also degrade other organophosphorus compounds including diazenon, chlorpyrifos and fenitrothion. The effect of different culture conditions on the degradation of malathion like inoculum density, other carbon or nitrogen sources, temperature and shaking were examined. Degradation of malathion and bacterial cell growth were accelerated when culture media were supplemented with yeast extract, glucose and citrate. The optimum conditions for malathion degradation by strain AFA were; an inoculum density of 1.5x 1012CFU/ml at 30°C with shaking. A specific polymerase chain reaction primers were designed manually using multiple sequence alignment of the corresponding carboxylesterase enzymes of Acinetobacter species. Sequencing result of amplified PCR product and phylogenetic analysis showed low degree of homology with the other carboxylesterase enzymes of Acinetobacter strains, so we suggested that this enzyme is a novel esterase enzyme. Isolated bacterial strains may have potential role for use in bioremediation of malathion contaminated.

Keywords: Acinetobacter baumannii, biodegradation, malathion, organophosphate pesticides

Procedia PDF Downloads 464