Search results for: omics data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 42066

Search results for: omics data analysis

41736 Mobile Learning: Toward Better Understanding of Compression Techniques

Authors: Farouk Lawan Gambo

Abstract:

Data compression shrinks files into fewer bits then their original presentation. It has more advantage on internet because the smaller a file, the faster it can be transferred but learning most of the concepts in data compression are abstract in nature therefore making them difficult to digest by some students (Engineers in particular). To determine the best approach toward learning data compression technique, this paper first study the learning preference of engineering students who tend to have strong active, sensing, visual and sequential learning preferences, the paper also study the advantage that mobility of learning have experienced; Learning at the point of interest, efficiency, connection, and many more. A survey is carried out with some reasonable number of students, through random sampling to see whether considering the learning preference and advantages in mobility of learning will give a promising improvement over the traditional way of learning. Evidence from data analysis using Ms-Excel as a point of concern for error-free findings shows that there is significance different in the students after using learning content provided on smart phone, also the result of the findings presented in, bar charts and pie charts interpret that mobile learning has to be promising feature of learning.

Keywords: data analysis, compression techniques, learning content, traditional learning approach

Procedia PDF Downloads 347
41735 Autonomic Threat Avoidance and Self-Healing in Database Management System

Authors: Wajahat Munir, Muhammad Haseeb, Adeel Anjum, Basit Raza, Ahmad Kamran Malik

Abstract:

Databases are the key components of the software systems. Due to the exponential growth of data, it is the concern that the data should be accurate and available. The data in databases is vulnerable to internal and external threats, especially when it contains sensitive data like medical or military applications. Whenever the data is changed by malicious intent, data analysis result may lead to disastrous decisions. Autonomic self-healing is molded toward computer system after inspiring from the autonomic system of human body. In order to guarantee the accuracy and availability of data, we propose a technique which on a priority basis, tries to avoid any malicious transaction from execution and in case a malicious transaction affects the system, it heals the system in an isolated mode in such a way that the availability of system would not be compromised. Using this autonomic system, the management cost and time of DBAs can be minimized. In the end, we test our model and present the findings.

Keywords: autonomic computing, self-healing, threat avoidance, security

Procedia PDF Downloads 504
41734 Implementation and Performance Analysis of Data Encryption Standard and RSA Algorithm with Image Steganography and Audio Steganography

Authors: S. C. Sharma, Ankit Gambhir, Rajeev Arya

Abstract:

In today’s era data security is an important concern and most demanding issues because it is essential for people using online banking, e-shopping, reservations etc. The two major techniques that are used for secure communication are Cryptography and Steganography. Cryptographic algorithms scramble the data so that intruder will not able to retrieve it; however steganography covers that data in some cover file so that presence of communication is hidden. This paper presents the implementation of Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) Algorithm with Image and Audio Steganography and Data Encryption Standard (DES) Algorithm with Image and Audio Steganography. The coding for both the algorithms have been done using MATLAB and its observed that these techniques performed better than individual techniques. The risk of unauthorized access is alleviated up to a certain extent by using these techniques. These techniques could be used in Banks, RAW agencies etc, where highly confidential data is transferred. Finally, the comparisons of such two techniques are also given in tabular forms.

Keywords: audio steganography, data security, DES, image steganography, intruder, RSA, steganography

Procedia PDF Downloads 290
41733 Anomaly Detection in Financial Markets Using Tucker Decomposition

Authors: Salma Krafessi

Abstract:

The financial markets have a multifaceted, intricate environment, and enormous volumes of data are produced every day. To find investment possibilities, possible fraudulent activity, and market oddities, accurate anomaly identification in this data is essential. Conventional methods for detecting anomalies frequently fail to capture the complex organization of financial data. In order to improve the identification of abnormalities in financial time series data, this study presents Tucker Decomposition as a reliable multi-way analysis approach. We start by gathering closing prices for the S&P 500 index across a number of decades. The information is converted to a three-dimensional tensor format, which contains internal characteristics and temporal sequences in a sliding window structure. The tensor is then broken down using Tucker Decomposition into a core tensor and matching factor matrices, allowing latent patterns and relationships in the data to be captured. A possible sign of abnormalities is the reconstruction error from Tucker's Decomposition. We are able to identify large deviations that indicate unusual behavior by setting a statistical threshold. A thorough examination that contrasts the Tucker-based method with traditional anomaly detection approaches validates our methodology. The outcomes demonstrate the superiority of Tucker's Decomposition in identifying intricate and subtle abnormalities that are otherwise missed. This work opens the door for more research into multi-way data analysis approaches across a range of disciplines and emphasizes the value of tensor-based methods in financial analysis.

Keywords: tucker decomposition, financial markets, financial engineering, artificial intelligence, decomposition models

Procedia PDF Downloads 69
41732 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review

Procedia PDF Downloads 65
41731 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 316
41730 Analyzing Keyword Networks for the Identification of Correlated Research Topics

Authors: Thiago M. R. Dias, Patrícia M. Dias, Gray F. Moita

Abstract:

The production and publication of scientific works have increased significantly in the last years, being the Internet the main factor of access and distribution of these works. Faced with this, there is a growing interest in understanding how scientific research has evolved, in order to explore this knowledge to encourage research groups to become more productive. Therefore, the objective of this work is to explore repositories containing data from scientific publications and to characterize keyword networks of these publications, in order to identify the most relevant keywords, and to highlight those that have the greatest impact on the network. To do this, each article in the study repository has its keywords extracted and in this way the network is  characterized, after which several metrics for social network analysis are applied for the identification of the highlighted keywords.

Keywords: bibliometrics, data analysis, extraction and data integration, scientometrics

Procedia PDF Downloads 257
41729 Using RASCAL Code to Analyze the Postulated UF6 Fire Accident

Authors: J. R. Wang, Y. Chiang, W. S. Hsu, S. H. Chen, J. H. Yang, S. W. Chen, C. Shih, Y. F. Chang, Y. H. Huang, B. R. Shen

Abstract:

In this research, the RASCAL code was used to simulate and analyze the postulated UF6 fire accident which may occur in the Institute of Nuclear Energy Research (INER). There are four main steps in this research. In the first step, the UF6 data of INER were collected. In the second step, the RASCAL analysis methodology and model was established by using these data. Third, this RASCAL model was used to perform the simulation and analysis of the postulated UF6 fire accident. Three cases were simulated and analyzed in this step. Finally, the analysis results of RASCAL were compared with the hazardous levels of the chemicals. According to the compared results of three cases, Case 3 has the maximum danger in human health.

Keywords: RASCAL, UF₆, safety, hydrogen fluoride

Procedia PDF Downloads 222
41728 A Safety Analysis Method for Multi-Agent Systems

Authors: Ching Louis Liu, Edmund Kazmierczak, Tim Miller

Abstract:

Safety analysis for multi-agent systems is complicated by the, potentially nonlinear, interactions between agents. This paper proposes a method for analyzing the safety of multi-agent systems by explicitly focusing on interactions and the accident data of systems that are similar in structure and function to the system being analyzed. The method creates a Bayesian network using the accident data from similar systems. A feature of our method is that the events in accident data are labeled with HAZOP guide words. Our method uses an Ontology to abstract away from the details of a multi-agent implementation. Using the ontology, our methods then constructs an “Interaction Map,” a graphical representation of the patterns of interactions between agents and other artifacts. Interaction maps combined with statistical data from accidents and the HAZOP classifications of events can be converted into a Bayesian Network. Bayesian networks allow designers to explore “what it” scenarios and make design trade-offs that maintain safety. We show how to use the Bayesian networks, and the interaction maps to improve multi-agent system designs.

Keywords: multi-agent system, safety analysis, safety model, integration map

Procedia PDF Downloads 417
41727 Hybrid Approach for Country’s Performance Evaluation

Authors: C. Slim

Abstract:

This paper presents an integrated model, which hybridized data envelopment analysis (DEA) and support vector machine (SVM) together, to class countries according to their efficiency and performance. This model takes into account aspects of multi-dimensional indicators, decision-making hierarchy and relativity of measurement. Starting from a set of indicators of performance as exhaustive as possible, a process of successive aggregations has been developed to attain an overall evaluation of a country’s competitiveness.

Keywords: Artificial Neural Networks (ANN), Support vector machine (SVM), Data Envelopment Analysis (DEA), Aggregations, indicators of performance

Procedia PDF Downloads 338
41726 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 493
41725 The Development of the Website Learning the Local Wisdom in Phra Nakhon Si Ayutthaya Province

Authors: Bunthida Chunngam, Thanyanan Worasesthaphong

Abstract:

This research had objective to develop of the website learning the local wisdom in Phra Nakhon Si Ayutthaya province and studied satisfaction of system user. This research sample was multistage sample for 100 questionnaires, analyzed data to calculated reliability value with Cronbach’s alpha coefficient method α=0.82. This system had 3 functions which were system using, system feather evaluation and system accuracy evaluation which the statistics used for data analysis was descriptive statistics to explain sample feature so these statistics were frequency, percentage, mean and standard deviation. This data analysis result found that the system using performance quality had good level satisfaction (4.44 mean), system feather function analysis had good level satisfaction (4.11 mean) and system accuracy had good level satisfaction (3.74 mean).

Keywords: website, learning, local wisdom, Phra Nakhon Si Ayutthaya province

Procedia PDF Downloads 120
41724 Positive Affect, Negative Affect, Organizational and Motivational Factor on the Acceptance of Big Data Technologies

Authors: Sook Ching Yee, Angela Siew Hoong Lee

Abstract:

Big data technologies have become a trend to exploit business opportunities and provide valuable business insights through the analysis of big data. However, there are still many organizations that have yet to adopt big data technologies especially small and medium organizations (SME). This study uses the technology acceptance model (TAM) to look into several constructs in the TAM and other additional constructs which are positive affect, negative affect, organizational factor and motivational factor. The conceptual model proposed in the study will be tested on the relationship and influence of positive affect, negative affect, organizational factor and motivational factor towards the intention to use big data technologies to produce an outcome. Empirical research is used in this study by conducting a survey to collect data.

Keywords: big data technologies, motivational factor, negative affect, organizational factor, positive affect, technology acceptance model (TAM)

Procedia PDF Downloads 362
41723 The Data Quality Model for the IoT based Real-time Water Quality Monitoring Sensors

Authors: Rabbia Idrees, Ananda Maiti, Saurabh Garg, Muhammad Bilal Amin

Abstract:

IoT devices are the basic building blocks of IoT network that generate enormous volume of real-time and high-speed data to help organizations and companies to take intelligent decisions. To integrate this enormous data from multisource and transfer it to the appropriate client is the fundamental of IoT development. The handling of this huge quantity of devices along with the huge volume of data is very challenging. The IoT devices are battery-powered and resource-constrained and to provide energy efficient communication, these IoT devices go sleep or online/wakeup periodically and a-periodically depending on the traffic loads to reduce energy consumption. Sometime these devices get disconnected due to device battery depletion. If the node is not available in the network, then the IoT network provides incomplete, missing, and inaccurate data. Moreover, many IoT applications, like vehicle tracking and patient tracking require the IoT devices to be mobile. Due to this mobility, If the distance of the device from the sink node become greater than required, the connection is lost. Due to this disconnection other devices join the network for replacing the broken-down and left devices. This make IoT devices dynamic in nature which brings uncertainty and unreliability in the IoT network and hence produce bad quality of data. Due to this dynamic nature of IoT devices we do not know the actual reason of abnormal data. If data are of poor-quality decisions are likely to be unsound. It is highly important to process data and estimate data quality before bringing it to use in IoT applications. In the past many researchers tried to estimate data quality and provided several Machine Learning (ML), stochastic and statistical methods to perform analysis on stored data in the data processing layer, without focusing the challenges and issues arises from the dynamic nature of IoT devices and how it is impacting data quality. A comprehensive review on determining the impact of dynamic nature of IoT devices on data quality is done in this research and presented a data quality model that can deal with this challenge and produce good quality of data. This research presents the data quality model for the sensors monitoring water quality. DBSCAN clustering and weather sensors are used in this research to make data quality model for the sensors monitoring water quality. An extensive study has been done in this research on finding the relationship between the data of weather sensors and sensors monitoring water quality of the lakes and beaches. The detailed theoretical analysis has been presented in this research mentioning correlation between independent data streams of the two sets of sensors. With the help of the analysis and DBSCAN, a data quality model is prepared. This model encompasses five dimensions of data quality: outliers’ detection and removal, completeness, patterns of missing values and checks the accuracy of the data with the help of cluster’s position. At the end, the statistical analysis has been done on the clusters formed as the result of DBSCAN, and consistency is evaluated through Coefficient of Variation (CoV).

Keywords: clustering, data quality, DBSCAN, and Internet of things (IoT)

Procedia PDF Downloads 139
41722 Customers’ Acceptability of Islamic Banking: Employees’ Perspective in Peshawar

Authors: Tahira Imtiaz, Karim Ullah

Abstract:

This paper aims to incorporate the banks employees’ perspective on acceptability of Islamic banking by the customers of Peshawar. A qualitative approach is adopted for which six in-depth interviews with employees of Islamic banks are conducted. The employees were asked to share their experience regarding customers’ acceptance attitude towards acceptability of Islamic banking. Collected data was analyzed through thematic analysis technique and its synthesis with the current literature. Through data analysis a theoretical framework is developed, which highlights the factors which drive customers towards Islamic banking, as witnessed by the employees. The practical implication of analyzed data evident that a new model could be developed on the basis of four determinants of human preference namely: inner satisfaction, time, faith and market forces.

Keywords: customers’ attraction, employees’ perspective, Islamic banking, Riba

Procedia PDF Downloads 333
41721 Confirmatory Factor Analysis of Smartphone Addiction Inventory (SPAI) in the Yemeni Environment

Authors: Mohammed Al-Khadher

Abstract:

Currently, we are witnessing rapid advancements in the field of information and communications technology, forcing us, as psychologists, to combat the psychological and social effects of such developments. It also drives us to continually look for the development and preparation of measurement tools compatible with the changes brought about by the digital revolution. In this context, the current study aimed to identify the factor analysis of the Smartphone Addiction Inventory (SPAI) in the Republic of Yemen. The sample consisted of (1920) university students (1136 males and 784 females) who answered the inventory, and the data was analyzed using the statistical software (AMOS V25). The factor analysis results showed a goodness-of-fit of the data five-factor model with excellent indicators, as RMSEA-(.052), CFI-(.910), GFI-(.931), AGFI-(.915), TLI-(.897), NFI-(.895), RFI-(.880), and RMR-(.032). All within the ideal range to prove the model's fit of the scale’s factor analysis. The confirmatory factor analysis results showed factor loading in (4) items on (Time Spent), (4) items on (Compulsivity), (8) items on (Daily Life Interference), (5) items on (Craving), and (3) items on (Sleep interference); and all standard values of factor loading were statistically significant at the significance level (>.001).

Keywords: smartphone addiction inventory (SPAI), confirmatory factor analysis (CFA), yemeni students, people at risk of smartphone addiction

Procedia PDF Downloads 94
41720 Geographic Information Systems and Remotely Sensed Data for the Hydrological Modelling of Mazowe Dam

Authors: Ellen Nhedzi Gozo

Abstract:

Unavailability of adequate hydro-meteorological data has always limited the analysis and understanding of hydrological behaviour of several dam catchments including Mazowe Dam in Zimbabwe. The problem of insufficient data for Mazowe Dam catchment analysis was solved by extracting catchment characteristics and aerial hydro-meteorological data from ASTER, LANDSAT, Shuttle Radar Topographic Mission SRTM remote sensing (RS) images using ILWIS, ArcGIS and ERDAS Imagine geographic information systems (GIS) software. Available observed hydrological as well as meteorological data complemented the use of the remotely sensed information. Ground truth land cover was mapped using a Garmin Etrex global positioning system (GPS) system. This information was then used to validate land cover classification detail that was obtained from remote sensing images. A bathymetry survey was conducted using a SONAR system connected to GPS. Hydrological modelling using the HBV model was then performed to simulate the hydrological process of the catchment in an effort to verify the reliability of the derived parameters. The model output shows a high Nash-Sutcliffe Coefficient that is close to 1 indicating that the parameters derived from remote sensing and GIS can be applied with confidence in the analysis of Mazowe Dam catchment.

Keywords: geographic information systems, hydrological modelling, remote sensing, water resources management

Procedia PDF Downloads 336
41719 Optimizing Quantum Machine Learning with Amplitude and Phase Encoding Techniques

Authors: Om Viroje

Abstract:

Quantum machine learning represents a frontier in computational technology, promising significant advancements in data processing capabilities. This study explores the significance of data encoding techniques, specifically amplitude and phase encoding, in this emerging field. By employing a comparative analysis methodology, the research evaluates how these encoding techniques affect the accuracy, efficiency, and noise resilience of quantum algorithms. Our findings reveal that amplitude encoding enhances algorithmic accuracy and noise tolerance, whereas phase encoding significantly boosts computational efficiency. These insights are crucial for developing robust quantum frameworks that can be effectively applied in real-world scenarios. In conclusion, optimizing encoding strategies is essential for advancing quantum machine learning, potentially transforming various industries through improved data processing and analysis.

Keywords: quantum machine learning, data encoding, amplitude encoding, phase encoding, noise resilience

Procedia PDF Downloads 13
41718 Seismic Performance Evaluation of Existing Building Using Structural Information Modeling

Authors: Byungmin Cho, Dongchul Lee, Taejin Kim, Minhee Lee

Abstract:

The procedure for the seismic retrofit of existing buildings includes the seismic evaluation. In the evaluation step, it is assessed whether the buildings have satisfactory performance against seismic load. Based on the results of that, the buildings are upgraded. To evaluate seismic performance of the buildings, it usually goes through the model transformation from elastic analysis to inelastic analysis. However, when the data is not delivered through the interwork, engineers should manually input the data. In this process, since it leads to inaccuracy and loss of information, the results of the analysis become less accurate. Therefore, in this study, the process for the seismic evaluation of existing buildings using structural information modeling is suggested. This structural information modeling makes the work economic and accurate. To this end, it is determined which part of the process could be computerized through the investigation of the process for the seismic evaluation based on ASCE 41. The structural information modeling process is developed to apply to the seismic evaluation using Perform 3D program usually used for the nonlinear response history analysis. To validate this process, the seismic performance of an existing building is investigated.

Keywords: existing building, nonlinear analysis, seismic performance, structural information modeling

Procedia PDF Downloads 384
41717 Comparative Sustainability Performance Analysis of Australian Companies Using Composite Measures

Authors: Ramona Zharfpeykan, Paul Rouse

Abstract:

Organizational sustainability is important to both organizations themselves and their stakeholders. Despite its increasing popularity and increasing numbers of organizations reporting sustainability, research on evaluating and comparing the sustainability performance of companies is limited. The aim of this study was to develop models to measure sustainability performance for both cross-sectional and longitudinal comparisons across companies in the same or different industries. A secondary aim was to see if sustainability reports can be used to evaluate sustainability performance. The study used both a content analysis of Australian sustainability reports in mining and metals and financial services for 2011-2014 and a survey of Australian and New Zealand organizations. Two methods ranging from a composite index using uniform weights to data envelopment analysis (DEA) were employed to analyze the data and develop the models. The results show strong statistically significant relationships between the developed models, which suggests that each model provides a consistent, systematic and reasonably robust analysis. The results of the models show that for both industries, companies that had sustainability scores above or below the industry average stayed almost the same during the study period. These indices and models can be used by companies to evaluate their sustainability performance and compare it with previous years, or with other companies in the same or different industries. These methods can also be used by various stakeholders and sustainability ranking companies such as the Global Reporting Initiative (GRI).

Keywords: data envelopment analysis, sustainability, sustainability performance measurement system, sustainability performance index, global reporting initiative

Procedia PDF Downloads 181
41716 An Architectural Model for APT Detection

Authors: Nam-Uk Kim, Sung-Hwan Kim, Tai-Myoung Chung

Abstract:

Typical security management systems are not suitable for detecting APT attack, because they cannot draw the big picture from trivial events of security solutions. Although SIEM solutions have security analysis engine for that, their security analysis mechanisms need to be verified in academic field. Although this paper proposes merely an architectural model for APT detection, we will keep studying on correlation analysis mechanism in the future.

Keywords: advanced persistent threat, anomaly detection, data mining

Procedia PDF Downloads 528
41715 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment Analysis is the process of detecting the contextual polarity of text. In other words, it determines whether a piece of writing is positive, negative or neutral.Sentiment analysis of documents holds great importance in today's world, when numerous information is stored in databases and in the world wide web. An efficient algorithm to illicit such information, would be beneficial for social, economic as well as medical purposes. In this project, we have developed an algorithm to classify a document into positive or negative. Using our algorithm, we obtained a feature set from the data, and classified the documents based on this feature set. It is important to note that, in the classification, we have not used the independence assumption, which is considered by many procedures like the Naive Bayes. This makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use empirical distribution for estimation, but developed a method by finding degree of close clustering of the data points. We have applied our algorithm on a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 402
41714 Rodriguez Diego, Del Valle Martin, Hargreaves Matias, Riveros Jose Luis

Authors: Nathainail Bashir, Neil Anderson

Abstract:

The objective of this study site was to investigate the current state of the practice with regards to karst detection methods and recommend the best method and pattern of arrays to acquire the desire results. Proper site investigation in karst prone regions is extremely valuable in determining the location of possible voids. Two geophysical techniques were employed: multichannel analysis of surface waves (MASW) and electric resistivity tomography (ERT).The MASW data was acquired at each test location using different array lengths and different array orientations (to increase the probability of getting interpretable data in karst terrain). The ERT data were acquired using a dipole-dipole array consisting of 168 electrodes. The MASW data was interpreted (re: estimated depth to physical top of rock) and used to constrain and verify the interpretation of the ERT data. The ERT data indicates poorer quality MASW data were acquired in areas where there was significant local variation in the depth to top of rock.

Keywords: dipole-dipole, ERT, Karst terrains, MASW

Procedia PDF Downloads 315
41713 An Analysis System for Integrating High-Throughput Transcript Abundance Data with Metabolic Pathways in Green Algae

Authors: Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Wen-Chi Chang

Abstract:

As the most important non-vascular plants, algae have many research applications, including high species diversity, biofuel sources, adsorption of heavy metals and, following processing, health supplements. With the increasing availability of next-generation sequencing (NGS) data for algae genomes and transcriptomes, an integrated resource for retrieving gene expression data and metabolic pathway is essential for functional analysis and systems biology in algae. However, gene expression profiles and biological pathways are displayed separately in current resources, and making it impossible to search current databases directly to identify the cellular response mechanisms. Therefore, this work develops a novel AlgaePath database to retrieve gene expression profiles efficiently under various conditions in numerous metabolic pathways. AlgaePath, a web-based database, integrates gene information, biological pathways, and next-generation sequencing (NGS) datasets in Chlamydomonasreinhardtii and Neodesmus sp. UTEX 2219-4. Users can identify gene expression profiles and pathway information by using five query pages (i.e. Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-Expression Analysis). The gene expression data of 45 and 4 samples can be obtained directly on pathway maps in C. reinhardtii and Neodesmus sp. UTEX 2219-4, respectively. Genes that are differentially expressed between two conditions can be identified in Folds Search. Furthermore, the Gene Group Analysis of AlgaePath includes pathway enrichment analysis, and can easily compare the gene expression profiles of functionally related genes in a map. Finally, Co-Expression Analysis provides co-expressed transcripts of a target gene. The analysis results provide a valuable reference for designing further experiments and elucidating critical mechanisms from high-throughput data. More than an effective interface to clarify the transcript response mechanisms in different metabolic pathways under various conditions, AlgaePath is also a data mining system to identify critical mechanisms based on high-throughput sequencing.

Keywords: next-generation sequencing (NGS), algae, transcriptome, metabolic pathway, co-expression

Procedia PDF Downloads 407
41712 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 404
41711 Chemometric-Based Voltammetric Method for Analysis of Vitamins and Heavy Metals in Honey Samples

Authors: Marwa A. A. Ragab, Amira F. El-Yazbi, Amr El-Hawiet

Abstract:

The analysis of heavy metals in honey samples is crucial. When found in honey, they denote environmental pollution. Some of these heavy metals as lead either present at low or high concentrations are considered to be toxic. Other heavy metals, for example, copper and zinc, if present at low concentrations, they considered safe even vital minerals. On the contrary, if they present at high concentrations, they are toxic. Their voltammetric determination in honey represents a challenge due to the presence of other electro-active components as vitamins, which may overlap with the peaks of the metal, hindering their accurate and precise determination. The simultaneous analysis of some vitamins: nicotinic acid (B3) and riboflavin (B2), and heavy metals: lead, cadmium, and zinc, in honey samples, was addressed. The analysis was done in 0.1 M Potassium Chloride (KCl) using a hanging mercury drop electrode (HMDE), followed by chemometric manipulation of the voltammetric data using the derivative method. Then the derivative data were convoluted using discrete Fourier functions. The proposed method allowed the simultaneous analysis of vitamins and metals though their varied responses and sensitivities. Although their peaks were overlapped, the proposed chemometric method allowed their accurate and precise analysis. After the chemometric treatment of the data, metals were successfully quantified at low levels in the presence of vitamins (1: 2000). The heavy metals limit of detection (LOD) values after the chemometric treatment of data decreased by more than 60% than those obtained from the direct voltammetric method. The method applicability was tested by analyzing the selected metals and vitamins in real honey samples obtained from different botanical origins.

Keywords: chemometrics, overlapped voltammetric peaks, derivative and convoluted derivative methods, metals and vitamins

Procedia PDF Downloads 150
41710 A Statistical Approach to Classification of Agricultural Regions

Authors: Hasan Vural

Abstract:

Turkey is a favorable country to produce a great variety of agricultural products because of her different geographic and climatic conditions which have been used to divide the country into four main and seven sub regions. This classification into seven regions traditionally has been used in order to data collection and publication especially related with agricultural production. Afterwards, nine agricultural regions were considered. Recently, the governmental body which is responsible of data collection and dissemination (Turkish Institute of Statistics-TIS) has used 12 classes which include 11 sub regions and Istanbul province. This study aims to evaluate these classification efforts based on the acreage of ten main crops in a ten years time period (1996-2005). The panel data grouped in 11 subregions has been evaluated by cluster and multivariate statistical methods. It was concluded that from the agricultural production point of view, it will be rather meaningful to consider three main and eight sub-agricultural regions throughout the country.

Keywords: agricultural region, factorial analysis, cluster analysis,

Procedia PDF Downloads 416
41709 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 641
41708 Charting Sentiments with Naive Bayes and Logistic Regression

Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri

Abstract:

The swift progress of web technology has not only amassed a vast reservoir of internet data but also triggered a substantial surge in data generation. The internet has metamorphosed into one of the dynamic hubs for online education, idea dissemination, as well as opinion-sharing. Notably, the widely utilized social networking platform Twitter is experiencing considerable expansion, providing users with the ability to share viewpoints, participate in discussions spanning diverse communities, and broadcast messages on a global scale. The upswing in online engagement has sparked a significant curiosity in subjective analysis, particularly when it comes to Twitter data. This research is committed to delving into sentiment analysis, focusing specifically on the realm of Twitter. It aims to offer valuable insights into deciphering information within tweets, where opinions manifest in a highly unstructured and diverse manner, spanning a spectrum from positivity to negativity, occasionally punctuated by neutrality expressions. Within this document, we offer a comprehensive exploration and comparative assessment of modern approaches to opinion mining. Employing a range of machine learning algorithms such as Naive Bayes and Logistic Regression, our investigation plunges into the domain of Twitter data streams. We delve into overarching challenges and applications inherent in the realm of subjectivity analysis over Twitter.

Keywords: machine learning, sentiment analysis, visualisation, python

Procedia PDF Downloads 56
41707 Response Analysis of a Steel Reinforced Concrete High-Rise Building during the 2011 Tohoku Earthquake

Authors: Naohiro Nakamura, Takuya Kinoshita, Hiroshi Fukuyama

Abstract:

The 2011 off The Pacific Coast of Tohoku Earthquake caused considerable damage to wide areas of eastern Japan. A large number of earthquake observation records were obtained at various places. To design more earthquake-resistant buildings and improve earthquake disaster prevention, it is necessary to utilize these data to analyze and evaluate the behavior of a building during an earthquake. This paper presents an earthquake response simulation analysis (hereafter a seismic response analysis) that was conducted using data recorded during the main earthquake (hereafter the main shock) as well as the earthquakes before and after it. The data were obtained at a high-rise steel-reinforced concrete (SRC) building in the bay area of Tokyo. We first give an overview of the building, along with the characteristics of the earthquake motion and the building during the main shock. The data indicate that there was a change in the natural period before and after the earthquake. Next, we present the results of our seismic response analysis. First, the analysis model and conditions are shown, and then, the analysis result is compared with the observational records. Using the analysis result, we then study the effect of soil-structure interaction on the response of the building. By identifying the characteristics of the building during the earthquake (i.e., the 1st natural period and the 1st damping ratio) by the Auto-Regressive eXogenous (ARX) model, we compare the analysis result with the observational records so as to evaluate the accuracy of the response analysis. In this study, a lumped-mass system SR model was used to conduct a seismic response analysis using observational data as input waves. The main results of this study are as follows: 1) The observational records of the 3/11 main shock put it between a level 1 and level 2 earthquake. The result of the ground response analysis showed that the maximum shear strain in the ground was about 0.1% and that the possibility of liquefaction occurring was low. 2) During the 3/11 main shock, the observed wave showed that the eigenperiod of the building became longer; this behavior could be generally reproduced in the response analysis. This prolonged eigenperiod was due to the nonlinearity of the superstructure, and the effect of the nonlinearity of the ground seems to have been small. 3) As for the 4/11 aftershock, a continuous analysis in which the subject seismic wave was input after the 3/11 main shock was input was conducted. The analyzed values generally corresponded well with the observed values. This means that the effect of the nonlinearity of the main shock was retained by the building. It is important to consider this when conducting the response evaluation. 4) The first period and the damping ratio during a vibration were evaluated by an ARX model. Our results show that the response analysis model in this study is generally good at estimating a change in the response of the building during a vibration.

Keywords: ARX model, response analysis, SRC building, the 2011 off the Pacific Coast of Tohoku Earthquake

Procedia PDF Downloads 164