Search results for: Data Analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 42089

Search results for: Data Analysis

41759 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 404
41758 An Analysis System for Integrating High-Throughput Transcript Abundance Data with Metabolic Pathways in Green Algae

Authors: Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Wen-Chi Chang

Abstract:

As the most important non-vascular plants, algae have many research applications, including high species diversity, biofuel sources, adsorption of heavy metals and, following processing, health supplements. With the increasing availability of next-generation sequencing (NGS) data for algae genomes and transcriptomes, an integrated resource for retrieving gene expression data and metabolic pathway is essential for functional analysis and systems biology in algae. However, gene expression profiles and biological pathways are displayed separately in current resources, and making it impossible to search current databases directly to identify the cellular response mechanisms. Therefore, this work develops a novel AlgaePath database to retrieve gene expression profiles efficiently under various conditions in numerous metabolic pathways. AlgaePath, a web-based database, integrates gene information, biological pathways, and next-generation sequencing (NGS) datasets in Chlamydomonasreinhardtii and Neodesmus sp. UTEX 2219-4. Users can identify gene expression profiles and pathway information by using five query pages (i.e. Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-Expression Analysis). The gene expression data of 45 and 4 samples can be obtained directly on pathway maps in C. reinhardtii and Neodesmus sp. UTEX 2219-4, respectively. Genes that are differentially expressed between two conditions can be identified in Folds Search. Furthermore, the Gene Group Analysis of AlgaePath includes pathway enrichment analysis, and can easily compare the gene expression profiles of functionally related genes in a map. Finally, Co-Expression Analysis provides co-expressed transcripts of a target gene. The analysis results provide a valuable reference for designing further experiments and elucidating critical mechanisms from high-throughput data. More than an effective interface to clarify the transcript response mechanisms in different metabolic pathways under various conditions, AlgaePath is also a data mining system to identify critical mechanisms based on high-throughput sequencing.

Keywords: next-generation sequencing (NGS), algae, transcriptome, metabolic pathway, co-expression

Procedia PDF Downloads 407
41757 Chemometric-Based Voltammetric Method for Analysis of Vitamins and Heavy Metals in Honey Samples

Authors: Marwa A. A. Ragab, Amira F. El-Yazbi, Amr El-Hawiet

Abstract:

The analysis of heavy metals in honey samples is crucial. When found in honey, they denote environmental pollution. Some of these heavy metals as lead either present at low or high concentrations are considered to be toxic. Other heavy metals, for example, copper and zinc, if present at low concentrations, they considered safe even vital minerals. On the contrary, if they present at high concentrations, they are toxic. Their voltammetric determination in honey represents a challenge due to the presence of other electro-active components as vitamins, which may overlap with the peaks of the metal, hindering their accurate and precise determination. The simultaneous analysis of some vitamins: nicotinic acid (B3) and riboflavin (B2), and heavy metals: lead, cadmium, and zinc, in honey samples, was addressed. The analysis was done in 0.1 M Potassium Chloride (KCl) using a hanging mercury drop electrode (HMDE), followed by chemometric manipulation of the voltammetric data using the derivative method. Then the derivative data were convoluted using discrete Fourier functions. The proposed method allowed the simultaneous analysis of vitamins and metals though their varied responses and sensitivities. Although their peaks were overlapped, the proposed chemometric method allowed their accurate and precise analysis. After the chemometric treatment of the data, metals were successfully quantified at low levels in the presence of vitamins (1: 2000). The heavy metals limit of detection (LOD) values after the chemometric treatment of data decreased by more than 60% than those obtained from the direct voltammetric method. The method applicability was tested by analyzing the selected metals and vitamins in real honey samples obtained from different botanical origins.

Keywords: chemometrics, overlapped voltammetric peaks, derivative and convoluted derivative methods, metals and vitamins

Procedia PDF Downloads 150
41756 A Statistical Approach to Classification of Agricultural Regions

Authors: Hasan Vural

Abstract:

Turkey is a favorable country to produce a great variety of agricultural products because of her different geographic and climatic conditions which have been used to divide the country into four main and seven sub regions. This classification into seven regions traditionally has been used in order to data collection and publication especially related with agricultural production. Afterwards, nine agricultural regions were considered. Recently, the governmental body which is responsible of data collection and dissemination (Turkish Institute of Statistics-TIS) has used 12 classes which include 11 sub regions and Istanbul province. This study aims to evaluate these classification efforts based on the acreage of ten main crops in a ten years time period (1996-2005). The panel data grouped in 11 subregions has been evaluated by cluster and multivariate statistical methods. It was concluded that from the agricultural production point of view, it will be rather meaningful to consider three main and eight sub-agricultural regions throughout the country.

Keywords: agricultural region, factorial analysis, cluster analysis,

Procedia PDF Downloads 416
41755 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 641
41754 Charting Sentiments with Naive Bayes and Logistic Regression

Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri

Abstract:

The swift progress of web technology has not only amassed a vast reservoir of internet data but also triggered a substantial surge in data generation. The internet has metamorphosed into one of the dynamic hubs for online education, idea dissemination, as well as opinion-sharing. Notably, the widely utilized social networking platform Twitter is experiencing considerable expansion, providing users with the ability to share viewpoints, participate in discussions spanning diverse communities, and broadcast messages on a global scale. The upswing in online engagement has sparked a significant curiosity in subjective analysis, particularly when it comes to Twitter data. This research is committed to delving into sentiment analysis, focusing specifically on the realm of Twitter. It aims to offer valuable insights into deciphering information within tweets, where opinions manifest in a highly unstructured and diverse manner, spanning a spectrum from positivity to negativity, occasionally punctuated by neutrality expressions. Within this document, we offer a comprehensive exploration and comparative assessment of modern approaches to opinion mining. Employing a range of machine learning algorithms such as Naive Bayes and Logistic Regression, our investigation plunges into the domain of Twitter data streams. We delve into overarching challenges and applications inherent in the realm of subjectivity analysis over Twitter.

Keywords: machine learning, sentiment analysis, visualisation, python

Procedia PDF Downloads 56
41753 Response Analysis of a Steel Reinforced Concrete High-Rise Building during the 2011 Tohoku Earthquake

Authors: Naohiro Nakamura, Takuya Kinoshita, Hiroshi Fukuyama

Abstract:

The 2011 off The Pacific Coast of Tohoku Earthquake caused considerable damage to wide areas of eastern Japan. A large number of earthquake observation records were obtained at various places. To design more earthquake-resistant buildings and improve earthquake disaster prevention, it is necessary to utilize these data to analyze and evaluate the behavior of a building during an earthquake. This paper presents an earthquake response simulation analysis (hereafter a seismic response analysis) that was conducted using data recorded during the main earthquake (hereafter the main shock) as well as the earthquakes before and after it. The data were obtained at a high-rise steel-reinforced concrete (SRC) building in the bay area of Tokyo. We first give an overview of the building, along with the characteristics of the earthquake motion and the building during the main shock. The data indicate that there was a change in the natural period before and after the earthquake. Next, we present the results of our seismic response analysis. First, the analysis model and conditions are shown, and then, the analysis result is compared with the observational records. Using the analysis result, we then study the effect of soil-structure interaction on the response of the building. By identifying the characteristics of the building during the earthquake (i.e., the 1st natural period and the 1st damping ratio) by the Auto-Regressive eXogenous (ARX) model, we compare the analysis result with the observational records so as to evaluate the accuracy of the response analysis. In this study, a lumped-mass system SR model was used to conduct a seismic response analysis using observational data as input waves. The main results of this study are as follows: 1) The observational records of the 3/11 main shock put it between a level 1 and level 2 earthquake. The result of the ground response analysis showed that the maximum shear strain in the ground was about 0.1% and that the possibility of liquefaction occurring was low. 2) During the 3/11 main shock, the observed wave showed that the eigenperiod of the building became longer; this behavior could be generally reproduced in the response analysis. This prolonged eigenperiod was due to the nonlinearity of the superstructure, and the effect of the nonlinearity of the ground seems to have been small. 3) As for the 4/11 aftershock, a continuous analysis in which the subject seismic wave was input after the 3/11 main shock was input was conducted. The analyzed values generally corresponded well with the observed values. This means that the effect of the nonlinearity of the main shock was retained by the building. It is important to consider this when conducting the response evaluation. 4) The first period and the damping ratio during a vibration were evaluated by an ARX model. Our results show that the response analysis model in this study is generally good at estimating a change in the response of the building during a vibration.

Keywords: ARX model, response analysis, SRC building, the 2011 off the Pacific Coast of Tohoku Earthquake

Procedia PDF Downloads 164
41752 Performance Evaluation and Comparison between the Empirical Mode Decomposition, Wavelet Analysis, and Singular Spectrum Analysis Applied to the Time Series Analysis in Atmospheric Science

Authors: Olivier Delage, Hassan Bencherif, Alain Bourdier

Abstract:

Signal decomposition approaches represent an important step in time series analysis, providing useful knowledge and insight into the data and underlying dynamics characteristics while also facilitating tasks such as noise removal and feature extraction. As most of observational time series are nonlinear and nonstationary, resulting of several physical processes interaction at different time scales, experimental time series have fluctuations at all time scales and requires the development of specific signal decomposition techniques. Most commonly used techniques are data driven, enabling to obtain well-behaved signal components without making any prior-assumptions on input data. Among the most popular time series decomposition techniques, most cited in the literature, are the empirical mode decomposition and its variants, the empirical wavelet transform and singular spectrum analysis. With increasing popularity and utility of these methods in wide ranging applications, it is imperative to gain a good understanding and insight into the operation of these algorithms. In this work, we describe all of the techniques mentioned above as well as their ability to denoise signals, to capture trends, to identify components corresponding to the physical processes involved in the evolution of the observed system and deduce the dimensionality of the underlying dynamics. Results obtained with all of these methods on experimental total ozone columns and rainfall time series will be discussed and compared

Keywords: denoising, empirical mode decomposition, singular spectrum analysis, time series, underlying dynamics, wavelet analysis

Procedia PDF Downloads 118
41751 Exploring the Correlation between Population Distribution and Urban Heat Island under Urban Data: Taking Shenzhen Urban Heat Island as an Example

Authors: Wang Yang

Abstract:

Shenzhen is a modern city of China's reform and opening-up policy, the development of urban morphology has been established on the administration of the Chinese government. This city`s planning paradigm is primarily affected by the spatial structure and human behavior. The subjective urban agglomeration center is divided into several groups and centers. In comparisons of this effect, the city development law has better to be neglected. With the continuous development of the internet, extensive data technology has been introduced in China. Data mining and data analysis has become important tools in municipal research. Data mining has been utilized to improve data cleaning such as receiving business data, traffic data and population data. Prior to data mining, government data were collected by traditional means, then were analyzed using city-relationship research, delaying the timeliness of urban development, especially for the contemporary city. Data update speed is very fast and based on the Internet. The city's point of interest (POI) in the excavation serves as data source affecting the city design, while satellite remote sensing is used as a reference object, city analysis is conducted in both directions, the administrative paradigm of government is broken and urban research is restored. Therefore, the use of data mining in urban analysis is very important. The satellite remote sensing data of the Shenzhen city in July 2018 were measured by the satellite Modis sensor and can be utilized to perform land surface temperature inversion, and analyze city heat island distribution of Shenzhen. This article acquired and classified the data from Shenzhen by using Data crawler technology. Data of Shenzhen heat island and interest points were simulated and analyzed in the GIS platform to discover the main features of functional equivalent distribution influence. Shenzhen is located in the east-west area of China. The city’s main streets are also determined according to the direction of city development. Therefore, it is determined that the functional area of the city is also distributed in the east-west direction. The urban heat island can express the heat map according to the functional urban area. Regional POI has correspondence. The research result clearly explains that the distribution of the urban heat island and the distribution of urban POIs are one-to-one correspondence. Urban heat island is primarily influenced by the properties of the underlying surface, avoiding the impact of urban climate. Using urban POIs as analysis object, the distribution of municipal POIs and population aggregation are closely connected, so that the distribution of the population corresponded with the distribution of the urban heat island.

Keywords: POI, satellite remote sensing, the population distribution, urban heat island thermal map

Procedia PDF Downloads 104
41750 Nonlinear Analysis in Investigating the Complexity of Neurophysiological Data during Reflex Behavior

Authors: Juliana A. Knocikova

Abstract:

Methods of nonlinear signal analysis are based on finding that random behavior can arise in deterministic nonlinear systems with a few degrees of freedom. Considering the dynamical systems, entropy is usually understood as a rate of information production. Changes in temporal dynamics of physiological data are indicating evolving of system in time, thus a level of new signal pattern generation. During last decades, many algorithms were introduced to assess some patterns of physiological responses to external stimulus. However, the reflex responses are usually characterized by short periods of time. This characteristic represents a great limitation for usual methods of nonlinear analysis. To solve the problems of short recordings, parameter of approximate entropy has been introduced as a measure of system complexity. Low value of this parameter is reflecting regularity and predictability in analyzed time series. On the other side, increasing of this parameter means unpredictability and a random behavior, hence a higher system complexity. Reduced neurophysiological data complexity has been observed repeatedly when analyzing electroneurogram and electromyogram activities during defence reflex responses. Quantitative phrenic neurogram changes are also obvious during severe hypoxia, as well as during airway reflex episodes. Concluding, the approximate entropy parameter serves as a convenient tool for analysis of reflex behavior characterized by short lasting time series.

Keywords: approximate entropy, neurophysiological data, nonlinear dynamics, reflex

Procedia PDF Downloads 300
41749 On-line Control of the Natural and Anthropogenic Safety in Krasnoyarsk Region

Authors: T. Penkova, A. Korobko, V. Nicheporchuk, L. Nozhenkova, A. Metus

Abstract:

This paper presents an approach of on-line control of the state of technosphere and environment objects based on the integration of Data Warehouse, OLAP and Expert systems technologies. It looks at the structure and content of data warehouse that provides consolidation and storage of monitoring data. There is a description of OLAP-models that provide a multidimensional analysis of monitoring data and dynamic analysis of principal parameters of controlled objects. The authors suggest some criteria of emergency risk assessment using expert knowledge about danger levels. It is demonstrated now some of the proposed solutions could be adopted in territorial decision making support systems. Operational control allows authorities to detect threat, prevent natural and anthropogenic emergencies and ensure a comprehensive safety of territory.

Keywords: decision making support systems, emergency risk assessment, natural and anthropogenic safety, on-line control, territory

Procedia PDF Downloads 406
41748 Combining Shallow and Deep Unsupervised Machine Learning Techniques to Detect Bad Actors in Complex Datasets

Authors: Jun Ming Moey, Zhiyaun Chen, David Nicholson

Abstract:

Bad actors are often hard to detect in data that imprints their behaviour patterns because they are comparatively rare events embedded in non-bad actor data. An unsupervised machine learning framework is applied here to detect bad actors in financial crime datasets that record millions of transactions undertaken by hundreds of actors (<0.01% bad). Specifically, the framework combines ‘shallow’ (PCA, Isolation Forest) and ‘deep’ (Autoencoder) methods to detect outlier patterns. Detection performance analysis for both the individual methods and their combination is reported.

Keywords: detection, machine learning, deep learning, unsupervised, outlier analysis, data science, fraud, financial crime

Procedia PDF Downloads 95
41747 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Models

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the direct and indirect effects of variables in path models. One or more structural regression equations are used to estimate a series of parameters in path models to find the better fit of data. However, sometimes the assumptions of classical regression models, such as ordinary least squares (OLS), are violated by the nature of the data, resulting in insignificant direct and indirect effects of exogenous variables. This article aims to explore the effectiveness of a copula-based regression approach as an alternative to classical regression, specifically when variables are linked through an elliptical copula.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 42
41746 Landslide Susceptibility Analysis in the St. Lawrence Lowlands Using High Resolution Data and Failure Plane Analysis

Authors: Kevin Potoczny, Katsuichiro Goda

Abstract:

The St. Lawrence lowlands extend from Ottawa to Quebec City and are known for large deposits of sensitive Leda clay. Leda clay deposits are responsible for many large landslides, such as the 1993 Lemieux and 2010 St. Jude (4 fatalities) landslides. Due to the large extent and sensitivity of Leda clay, regional hazard analysis for landslides is an important tool in risk management. A 2018 regional study by Farzam et al. on the susceptibility of Leda clay slopes to landslide hazard uses 1 arc second topographical data. A qualitative method known as Hazus is used to estimate susceptibility by checking for various criteria in a location and determine a susceptibility rating on a scale of 0 (no susceptibility) to 10 (very high susceptibility). These criteria are slope angle, geological group, soil wetness, and distance from waterbodies. Given the flat nature of St. Lawrence lowlands, the current assessment fails to capture local slopes, such as the St. Jude site. Additionally, the data did not allow one to analyze failure planes accurately. This study majorly improves the analysis performed by Farzam et al. in two aspects. First, regional assessment with high resolution data allows for identification of local locations that may have been previously identified as low susceptibility. This then provides the opportunity to conduct a more refined analysis on the failure plane of the slope. Slopes derived from 1 arc second data are relatively gentle (0-10 degrees) across the region; however, the 1- and 2-meter resolution 2022 HRDEM provided by NRCAN shows that short, steep slopes are present. At a regional level, 1 arc second data can underestimate the susceptibility of short, steep slopes, which can be dangerous as Leda clay landslides behave retrogressively and travel upwards into flatter terrain. At the location of the St. Jude landslide, slope differences are significant. 1 arc second data shows a maximum slope of 12.80 degrees and a mean slope of 4.72 degrees, while the HRDEM data shows a maximum slope of 56.67 degrees and a mean slope of 10.72 degrees. This equates to a difference of three susceptibility levels when the soil is dry and one susceptibility level when wet. The use of GIS software is used to create a regional susceptibility map across the St. Lawrence lowlands at 1- and 2-meter resolutions. Failure planes are necessary to differentiate between small and large landslides, which have so far been ignored in regional analysis. Leda clay failures can only retrogress as far as their failure planes, so the regional analysis must be able to transition smoothly into a more robust local analysis. It is expected that slopes within the region, once previously assessed at low susceptibility scores, contain local areas of high susceptibility. The goal is to create opportunities for local failure plane analysis to be undertaken, which has not been possible before. Due to the low resolution of previous regional analyses, any slope near a waterbody could be considered hazardous. However, high-resolution regional analysis would allow for more precise determination of hazard sites.

Keywords: hazus, high-resolution DEM, leda clay, regional analysis, susceptibility

Procedia PDF Downloads 77
41745 The Comparative Analysis of International Financial Reporting Standart Adoption through Earnings Response Coefficient and Conservatism Principle: Case Study in Jakarta Islamic Index 2010 – 2014

Authors: Dwi Wijiastutik, Tarjo, Yuni Rimawati

Abstract:

The purpose of this empirical study is to analyse how to the market reaction and the conservative degree changes on the adoption of International Financial Reporting Standart (IFRS) through Jakarta Islamic Index. The study also has given others additional analysis on the profitability, capital structure and size company toward IFRS adoption. The data collection methods used in this study reveals as secondary data and deep analysis to the company’s annual report and daily price stock at yahoo finance. We analyse 40 companies listed on Jakarta Islamic Index from 2010 to 2014. The result of the study concluded that IFRS has given a different on the depth analysis to the two of variance analysis: Moderated Regression Analysis and Wilcoxon Signed Rank to test developed hypotheses. Our result on the regression analysis shows that market response and conservatism principle is not significantly after IFRS Adoption in Jakarta Islamic Index. Furthermore, in addition, analysis on profitability, capital structure, and company size show that significantly after IFRS adoption. The findings of our study help investor by showing the impact of IFRS for making decided investment.

Keywords: IFRS, earnings response coefficient, conservatism principle

Procedia PDF Downloads 273
41744 The Establishment of Probabilistic Risk Assessment Analysis Methodology for Dry Storage Concrete Casks Using SAPHIRE 8

Authors: J. R. Wang, W. Y. Cheng, J. S. Yeh, S. W. Chen, Y. M. Ferng, J. H. Yang, W. S. Hsu, C. Shih

Abstract:

To understand the risk for dry storage concrete casks in the cask loading, transfer, and storage phase, the purpose of this research is to establish the probabilistic risk assessment (PRA) analysis methodology for dry storage concrete casks by using SAPHIRE 8 code. This analysis methodology is used to perform the study of Taiwan nuclear power plants (NPPs) dry storage system. The process of research has three steps. First, the data of the concrete casks and Taiwan NPPs are collected. Second, the PRA analysis methodology is developed by using SAPHIRE 8. Third, the PRA analysis is performed by using this methodology. According to the analysis results, the maximum risk is the multipurpose canister (MPC) drop case.

Keywords: PRA, dry storage, concrete cask, SAPHIRE

Procedia PDF Downloads 212
41743 Aesthetic Analysis and Socio-Cultural Significance of Eku Idowo and Anipo Masquerades of the Anetuno (Ebira Chao)

Authors: Lamidi Lawal Aduozava

Abstract:

Masquerade tradition is an indigenous culture of the Anetuno an extraction of the Ebira referred to as Ebira chao. This paper seeks to make aesthetic analysis of the masquerades in terms of their costumes and socio-cultural significance. To this end, the study examined and documented the functions and roles of Anipo and Idowo masquerades in terms of therapeutic, economic, prophetic and divination, entertainment, and funeral functions to the owner community(Eziobe group of families) in Igarra, Edo State of Nigeria, West Africa. For the purpose of data collection, focus group discussion, participatory, visual and observatory methods of data collection were used. All the data collected were aesthetically, descriptively and historically analyzed.

Keywords: Aesthetics, , Costume, , Masquerades, , Significance.

Procedia PDF Downloads 163
41742 Sociocultural Foundations of Psychological Well-Being among Ethiopian Adults

Authors: Kassahun Tilahun

Abstract:

Most of the studies available on adult psychological well-being have been centered on Western countries. However, psychological well-being does not have the same meaning across the world. The Euro-American and African conceptions and experiences of psychological well-being differ systematically. As a result, questions like, how do people living in developing African countries, like Ethiopia, report their psychological well-being; what would the context-specific prominent determinants of their psychological well-being be, needs a definitive answer. This study was, therefore, aimed at developing a new theory that would address these socio-cultural issues of psychological well-being. Consequently, data were obtained through interview and open ended questionnaire. A total of 438 adults, working in governmental and non-governmental organizations situated in Addis Ababa, participated in the study. Appropriate qualitative method of data analysis, i.e. thematic content analysis, was employed for analyzing the data. The thematic analysis involves a type of abductive analysis, driven both by theoretical interest and the nature of the data. Reliability and credibility issues were addressed appropriately. The finding identified five major categories of themes, which are viewed as essential in determining the conceptions and experiences of psychological well-being of Ethiopian adults. These were; socio-cultural harmony, social cohesion, security, competence and accomplishment, and the self. Detailed discussion on the rational for including these themes was made and appropriate positive psychology interventions were proposed. Researchers are also encouraged to expand this qualitative research and in turn develop a suitable instrument taping the psychological well-being of adults with different sociocultural orientations.

Keywords: sociocultural, psychological, well-being Ethiopia, adults

Procedia PDF Downloads 546
41741 A Review of Methods for Handling Missing Data in the Formof Dropouts in Longitudinal Clinical Trials

Authors: A. Satty, H. Mwambi

Abstract:

Much clinical trials data-based research are characterized by the unavoidable problem of dropout as a result of missing or erroneous values. This paper aims to review some of the various techniques to address the dropout problems in longitudinal clinical trials. The fundamental concepts of the patterns and mechanisms of dropout are discussed. This study presents five general techniques for handling dropout: (1) Deletion methods; (2) Imputation-based methods; (3) Data augmentation methods; (4) Likelihood-based methods; and (5) MNAR-based methods. Under each technique, several methods that are commonly used to deal with dropout are presented, including a review of the existing literature in which we examine the effectiveness of these methods in the analysis of incomplete data. Two application examples are presented to study the potential strengths or weaknesses of some of the methods under certain dropout mechanisms as well as to assess the sensitivity of the modelling assumptions.

Keywords: incomplete longitudinal clinical trials, missing at random (MAR), imputation, weighting methods, sensitivity analysis

Procedia PDF Downloads 415
41740 Anomaly Detection Based Fuzzy K-Mode Clustering for Categorical Data

Authors: Murat Yazici

Abstract:

Anomalies are irregularities found in data that do not adhere to a well-defined standard of normal behavior. The identification of outliers or anomalies in data has been a subject of study within the statistics field since the 1800s. Over time, a variety of anomaly detection techniques have been developed in several research communities. The cluster analysis can be used to detect anomalies. It is the process of associating data with clusters that are as similar as possible while dissimilar clusters are associated with each other. Many of the traditional cluster algorithms have limitations in dealing with data sets containing categorical properties. To detect anomalies in categorical data, fuzzy clustering approach can be used with its advantages. The fuzzy k-Mode (FKM) clustering algorithm, which is one of the fuzzy clustering approaches, by extension to the k-means algorithm, is reported for clustering datasets with categorical values. It is a form of clustering: each point can be associated with more than one cluster. In this paper, anomaly detection is performed on two simulated data by using the FKM cluster algorithm. As a significance of the study, the FKM cluster algorithm allows to determine anomalies with their abnormality degree in contrast to numerous anomaly detection algorithms. According to the results, the FKM cluster algorithm illustrated good performance in the anomaly detection of data, including both one anomaly and more than one anomaly.

Keywords: fuzzy k-mode clustering, anomaly detection, noise, categorical data

Procedia PDF Downloads 54
41739 Hydrology and Hydraulics Analysis of Beko Abo Dam and Appurtenant Structre Design, Ethiopia

Authors: Azazhu Wassie

Abstract:

This study tried to evaluate the maximum design flood for appurtenance structure design using the given climatological and hydrological data analysis on the referenced study area. The maximum design flood is determined by using flood frequency analysis. Using this method, the peak discharge is 32,583.67 m3/s, but the data is transferred because the dam site is not on the gauged station. Then the peak discharge becomes 38,115 m3/s. The study was conducted in June 2023. This dam is built across a river to create a reservoir on its upstream side for impounding water. The water stored in the reservoir is used for various purposes, such as irrigation, hydropower, navigation, fishing, etc. The total average volume of annual runoff is estimated to be 115.1 billion m3. The total potential of the land for irrigation development can go beyond 3 million ha.

Keywords: dam design, flow duration curve, peak flood, rainfall, reservoir capacity, risk and reliability

Procedia PDF Downloads 28
41738 The Effect of Data Integration to the Smart City

Authors: Richard Byrne, Emma Mulliner

Abstract:

Smart cities are a vision for the future that is increasingly becoming a reality. While a key concept of the smart city is the ability to capture, communicate, and process data that has long been produced through day-to-day activities of the city, much of the assessment models in place neglect this fact to focus on ‘smartness’ concepts. Although it is true technology often provides the opportunity to capture and communicate data in more effective ways, there are also human processes involved that are just as important. The growing importance with regards to the use and ownership of data in society can be seen by all with companies such as Facebook and Google increasingly coming under the microscope, however, why is the same scrutiny not applied to cities? The research area is therefore of great importance to the future of our cities here and now, while the findings will be of just as great importance to our children in the future. This research aims to understand the influence data is having on organisations operating throughout the smart cities sector and employs a mixed-method research approach in order to best answer the following question: Would a data-based evaluation model for smart cities be more appropriate than a smart-based model in assessing the development of the smart city? A fully comprehensive literature review concluded that there was a requirement for a data-driven assessment model for smart cities. This was followed by a documentary analysis to understand the root source of data integration to the smart city. A content analysis of city data platforms enquired as to the alternative approaches employed by cities throughout the UK and draws on best practice from New York to compare and contrast. Grounded in theory, the research findings to this point formulated a qualitative analysis framework comprised of: the changing environment influenced by data, the value of data in the smart city, the data ecosystem of the smart city and organisational response to the data orientated environment. The framework was applied to analyse primary data collected through the form of interviews with both public and private organisations operating throughout the smart cities sector. The work to date represents the first stage of data collection that will be built upon by a quantitative research investigation into the feasibility of data network effects in the smart city. An analysis into the benefits of data interoperability supporting services to the smart city in the areas of health and transport will conclude the research to achieve the aim of inductively forming a framework that can be applied to future smart city policy. To conclude, the research recognises the influence of technological perspectives in the development of smart cities to date and highlights this as a challenge to introduce theory applied with a planning dimension. The primary researcher has utilised their experience working in the public sector throughout the investigation to reflect upon what is perceived as a gap in practice of where we are today, to where we need to be tomorrow.

Keywords: data, planning, policy development, smart cities

Procedia PDF Downloads 310
41737 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: data mining, textile production, decision trees, classification

Procedia PDF Downloads 350
41736 Ranking All of the Efficient DMUs in DEA

Authors: Elahe Sarfi, Esmat Noroozi, Farhad Hosseinzadeh Lotfi

Abstract:

One of the important issues in Data Envelopment Analysis is the ranking of Decision Making Units. In this paper, a method for ranking DMUs is presented through which the weights related to efficient units should be chosen in a way that the other units preserve a certain percentage of their efficiency with the mentioned weights. To this end, a model is presented for ranking DMUs on the base of their superefficiency by considering the mentioned restrictions related to weights. This percentage can be determined by decision Maker. If the specific percentage is unsuitable, we can find a suitable and feasible one for ranking DMUs accordingly. Furthermore, the presented model is capable of ranking all of the efficient units including nonextreme efficient ones. Finally, the presented models are utilized for two sets of data and related results are reported.

Keywords: data envelopment analysis, efficiency, ranking, weight

Procedia PDF Downloads 457
41735 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. This paper partially bridges this gap due to its focus on one of the Arabic dialects which is the Saudi dialect. This paper presents annotated data set of 4700 for Saudi dialect sentiment analysis with (K= 0.807). Our next work is to extend this corpus and creation a large-scale lexicon for Saudi dialect from the corpus.

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 630
41734 A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing

Authors: Youngji Yoo, Seung Hwan Park, Daewoong An, Sung-Shick Kim, Jun-Geol Baek

Abstract:

The yield management system is very important to produce high-quality semiconductor chips in the semiconductor manufacturing process. In order to improve quality of semiconductors, various tests are conducted in the post fabrication (FAB) process. During the test process, large amount of data are collected and the data includes a lot of information about defect. In general, the defect on the wafer is the main causes of yield loss. Therefore, analyzing the defect data is necessary to improve performance of yield prediction. The wafer bin map (WBM) is one of the data collected in the test process and includes defect information such as the fail bit patterns. The fail bit has characteristics of spatial point patterns. Therefore, this paper proposes the feature extraction method using the spatial point pattern analysis. Actual data obtained from the semiconductor process is used for experiments and the experimental result shows that the proposed method is more accurately recognize the fail bit patterns.

Keywords: semiconductor, wafer bin map, feature extraction, spatial point patterns, contour map

Procedia PDF Downloads 384
41733 A Study of Variables Affecting on a Quality Assessment of Mathematics Subject in Thailand by Using Value Added Analysis on TIMSS 2011

Authors: Ruangdech Sirikit

Abstract:

The purposes of this research were to study the variables affecting the quality assessment of mathematics subject in Thailand by using value-added analysis on TIMSS 2011. The data used in this research is the secondary data from the 2011 Trends in International Mathematics and Science Study (TIMSS), collected from 6,124 students in 172 schools from Thailand, studying only mathematics subjects. The data were based on 14 assessment tests of knowledge in mathematics. There were 3 steps of data analysis: 1) To analyze descriptive statistics 2) To estimate competency of students from the assessment of their mathematics proficiency by using MULTILOG program; 3) analyze value added in the model of quality assessment using Value-Added Model with Hierarchical Linear Modeling (HLM) and 2 levels of analysis. The research results were as follows: 1. Student level variables that had significant effects on the competency of students at .01 levels were Parental care, Resources at home, Enjoyment of learning mathematics and Extrinsic motivation in learning mathematics. Variable that had significant effects on the competency of students at .05 levels were Education of parents and self-confident in learning mathematics. 2. School level variable that had significant effects on competency of students at .01 levels was Extra large school. Variable that had significant effects on competency of students at .05 levels was medium school.

Keywords: quality assessment, value-added model, TIMSS, mathematics, Thailand

Procedia PDF Downloads 283
41732 An Analysis of Public Environmental Investment on the Sustainable Development in China

Authors: K. Y. Chen, Y. N. Jia, H. Chua, C. W. Kan

Abstract:

As the largest developing country in the world, China is now facing the problem arising from the environment. Thus, China government increases the environmental investment yearly. In this study, we will analyse the effect of the public environmental investment on the sustainable development in China. Firstly, we will review the current situation of China's environmental issue. Secondly, we will collect the yearly environmental data as well as the information of public environmental investment. Finally, we will use the collected data to analyse and project the SWOT of public environmental investment in China. Therefore, the aim of this paper is to provide the relationship between public environmental investment and sustainable development in China. Based on the data collected, it was revealed that the public environmental investment had a positive impact on the sustainable development in China as well as the GDP growth. Acknowledgment: Authors would like to thank the financial support from the Hong Kong Polytechnic University for this work.

Keywords: China, public environmental investment, sustainable development, analysis

Procedia PDF Downloads 370
41731 Change Point Analysis in Average Ozone Layer Temperature Using Exponential Lomax Distribution

Authors: Amjad Abdullah, Amjad Yahya, Bushra Aljohani, Amani Alghamdi

Abstract:

Change point detection is an important part of data analysis. The presence of a change point refers to a significant change in the behavior of a time series. In this article, we examine the detection of multiple change points of parameters of the exponential Lomax distribution, which is broad and flexible compared with other distributions while fitting data. We used the Schwarz information criterion and binary segmentation to detect multiple change points in publicly available data on the average temperature in the ozone layer. The change points were successfully located.

Keywords: binary segmentation, change point, exponentialLomax distribution, information criterion

Procedia PDF Downloads 175
41730 Image-Based (RBG) Technique for Estimating Phosphorus Levels of Different Crops

Authors: M. M. Ali, Ahmed Al- Ani, Derek Eamus, Daniel K. Y. Tan

Abstract:

In this glasshouse study, we developed the new image-based non-destructive technique for detecting leaf P status of different crops such as cotton, tomato and lettuce. Plants were allowed to grow on nutrient media containing different P concentrations, i.e. 0%, 50% and 100% of recommended P concentration (P0 = no P, L; P1 = 2.5 mL 10 L-1 of P and P2 = 5 mL 10 L-1 of P as NaH2PO4). After 10 weeks of growth, plants were harvested and data on leaf P contents were collected using the standard destructive laboratory method and at the same time leaf images were collected by a handheld crop image sensor. We calculated leaf area, leaf perimeter and RGB (red, green and blue) values of these images. This data was further used in the linear discriminant analysis (LDA) to estimate leaf P contents, which successfully classified these plants on the basis of leaf P contents. The data indicated that P deficiency in crop plants can be predicted using the image and morphological data. Our proposed non-destructive imaging method is precise in estimating P requirements of different crop species.

Keywords: image-based techniques, leaf area, leaf P contents, linear discriminant analysis

Procedia PDF Downloads 382