Search results for: data clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24800

Search results for: data clustering

24290 Fuzzy Rules Based Improved BEENISH Protocol for Wireless Sensor Networks

Authors: Rishabh Sharma

Abstract:

The main design parameter of WSN (wireless sensor network) is the energy consumption. To compensate this parameter, hierarchical clustering is a technique that assists in extending duration of the networks life by efficiently consuming the energy. This paper focuses on dealing with the WSNs and the FIS (fuzzy interface system) which are deployed to enhance the BEENISH protocol. The node energy, mobility, pause time and density are considered for the selection of CH (cluster head). The simulation outcomes exhibited that the projected system outperforms the traditional system with regard to the energy utilization and number of packets transmitted to sink.

Keywords: wireless sensor network, sink, sensor node, routing protocol, fuzzy rule, fuzzy inference system

Procedia PDF Downloads 90
24289 Data-driven Decision-Making in Digital Entrepreneurship

Authors: Abeba Nigussie Turi, Xiangming Samuel Li

Abstract:

Data-driven business models are more typical for established businesses than early-stage startups that strive to penetrate a market. This paper provided an extensive discussion on the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we developed data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access, technical and financial constraints, to state some. The startup DDDM framework proposed in this paper is novel in its form encompassing startup data analytics enablers and metrics aligning with startups' business models ranging from customer-centric product development to servitization which is the future of modern digital entrepreneurship.

Keywords: startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship

Procedia PDF Downloads 306
24288 Investigating Spatial Disparities in Health Status and Access to Health-Related Interventions among Tribals in Jharkhand

Authors: Parul Suraia, Harshit Sosan Lakra

Abstract:

Indigenous communities represent some of the most marginalized populations globally, with India labeled as tribals, experiencing particularly pronounced marginalization and a concerning decline in their numbers. These communities often inhabit geographically challenging regions characterized by low population densities, posing significant challenges to providing essential infrastructure services. Jharkhand, a Schedule 5 state, is infamous for its low-level health status due to disparities in access to health care. The primary objective of this study is to investigate the spatial inequalities in healthcare accessibility among tribal populations within the state and pinpoint critical areas requiring immediate attention. Health indicators were selected based on the tribal perspective and association of Sustainable Goal 3 (Good Health and Wellbeing) with other SDGs. Focused group discussions in which tribal people and tribal experts were done in order to finalize the indicators. Employing Principal Component Analysis, two essential indices were constructed: the Tribal Health Index (THI) and the Tribal Health Intervention Index (THII). Index values were calculated based on the district-wise secondary data for Jharkhand. The bivariate spatial association technique, Moran’s I was used to assess the spatial pattern of the variables to determine if there is any clustering (positive spatial autocorrelation) or dispersion (negative spatial autocorrelation) of values across Jharkhand. The results helped in facilitating targeting policy interventions in deprived areas of Jharkhand.

Keywords: tribal health, health spatial disparities, health status, Jharkhand

Procedia PDF Downloads 74
24287 RNA-Seq Based Transcriptomic Analysis of Wheat Cultivars for Unveiling of Genomic Variations and Isolation of Drought Tolerant Genes for Genome Editing

Authors: Ghulam Muhammad Ali

Abstract:

Unveiling of genes involved in drought and root architecture using transcriptomic analyses remained fragmented for further improvement of wheat through genome editing. The purpose of this research endeavor was to unveil the variations in different genes implicated in drought tolerance and root architecture in wheat through RNA-seq data analysis. In this study seedlings of 8 days old, 6 cultivars of wheat namely, Batis, Blue Silver, Local White, UZ888, Chakwal 50 and Synthetic wheat S22 were subjected to transcriptomic analysis for root and shoot genes. Total of 12 RNA samples was sequenced by Illumina. Using updated wheat transcripts from Ensembl and IWGC references with 54,175 gene models, we found that 49,621 out of 54,175 (91.5%) genes are expressed at an RPKM of 0.1 or more (in at least 1 sample). The number of genes expressed was higher in Local White than Batis. Differentially expressed genes (DEG) were higher in Chakwal 50. Expression-based clustering indicated conserved function of DRO1and RPK1 between Arabidopsis and wheat. Dendrogram showed that Local White is sister to Chakwal 50 while Batis is closely related to Blue Silver. This study flaunts transcriptomic sequence variations in different cultivars that showed mutations in genes associated with drought that may directly contribute to drought tolerance. DRO1 and RPK1 genes were fetched/isolated for genome editing. These genes are being edited in wheat through CRISPR-Cas9 for yield enhancement.

Keywords: transcriptomic, wheat, genome editing, drought, CRISPR-Cas9, yield enhancement

Procedia PDF Downloads 134
24286 Molecular Comparison of HEV Isolates from Sewage & Humans at Western India

Authors: Nidhi S. Chandra, Veena Agrawal, Debprasad Chattopadhyay

Abstract:

Background: Hepatitis E virus (HEV) is a major cause of acute viral hepatitis in developing countries. It spreads feco orally mainly due to contamination of drinking water by sewage. There is limited data on the genotypic comparison of HEV isolates from sewage water and humans. The aim of this study was to identify genotype and conduct phylogenetic analysis of HEV isolates from sewage water and humans. Materials and Methods: 14 sewage water and 60 serum samples from acute sporadic hepatitis E cases (negative for hepatitis A, B, C) were tested for HEV-RNA by nested polymerase chain reaction (RTnPCR) using primers designed with in RdRp (RNA dependent RNA polymerase) region of open reading frame-1 (ORF-1). Sequencing was done by ABI prism 310. The sequences (343 nucleotides) were compared with each other and were aligned with previously reported HEV sequences obtained from GeneBank, using Clustal W software. A Phylogenetic tree was constructed by using PHYLIP version 3.67 software. Results: HEV-RNA was detected in 49/ 60 (81.67%) serum and 5/14 (35.71%) sewage samples. The sequences obtained from 17 serums and 2 sewage specimens belonged to genotype I with 85% similarity and clustering with previously reported human HEV sequences from India. HEV isolates from human and sewage in North West India are genetically closely related to each other. Conclusion: These finding suggest that sewage acts as reservoir of HEV. Therefore it is important that measures are taken for proper waste disposal and treatment of drinking water to prevent outbreaks and epidemics due to HEV.

Keywords: hepatitis E virus, nested polymerase chain reaction, open reading frame-1, nucleotidies

Procedia PDF Downloads 366
24285 A Geographical Information System Supported Method for Determining Urban Transformation Areas in the Scope of Disaster Risks in Kocaeli

Authors: Tayfun Salihoğlu

Abstract:

Following the Law No: 6306 on Transformation of Disaster Risk Areas, urban transformation in Turkey found its legal basis. In the best practices all over the World, the urban transformation was shaped as part of comprehensive social programs through the discourses of renewing the economic, social and physical degraded parts of the city, producing spaces resistant to earthquakes and other possible disasters and creating a livable environment. In Turkish practice, a contradictory process is observed. In this study, it is aimed to develop a method for better understanding of the urban space in terms of disaster risks in order to constitute a basis for decisions in Kocaeli Urban Transformation Master Plan, which is being prepared by Kocaeli Metropolitan Municipality. The spatial unit used in the study is the 50x50 meter grids. In order to reflect the multidimensionality of urban transformation, three basic components that have spatial data in Kocaeli were identified. These components were named as 'Problems in Built-up Areas', 'Disaster Risks arising from Geological Conditions of the Ground and Problems of Buildings', and 'Inadequacy of Urban Services'. Each component was weighted and scored for each grid. In order to delimitate urban transformation zones Optimized Outlier Analysis (Local Moran I) in the ArcGIS 10.6.1 was conducted to test the type of distribution (clustered or scattered) and its significance on the grids by assuming the weighted total score of the grid as Input Features. As a result of this analysis, it was found that the weighted total scores were not significantly clustering at all grids in urban space. The grids which the input feature is clustered significantly were exported as the new database to use in further mappings. Total Score Map reflects the significant clusters in terms of weighted total scores of 'Problems in Built-up Areas', 'Disaster Risks arising from Geological Conditions of the Ground and Problems of Buildings' and 'Inadequacy of Urban Services'. Resulting grids with the highest scores are the most likely candidates for urban transformation in this citywide study. To categorize urban space in terms of urban transformation, Grouping Analysis in ArcGIS 10.6.1 was conducted to data that includes each component scores in significantly clustered grids. Due to Pseudo Statistics and Box Plots, 6 groups with the highest F stats were extracted. As a result of the mapping of the groups, it can be said that 6 groups can be interpreted in a more meaningful manner in relation to the urban space. The method presented in this study can be magnified due to the availability of more spatial data. By integrating with other data to be obtained during the planning process, this method can contribute to the continuation of research and decision-making processes of urban transformation master plans on a more consistent basis.

Keywords: urban transformation, GIS, disaster risk assessment, Kocaeli

Procedia PDF Downloads 110
24284 Cryptographic Protocol for Secure Cloud Storage

Authors: Luvisa Kusuma, Panji Yudha Prakasa

Abstract:

Cloud storage, as a subservice of infrastructure as a service (IaaS) in Cloud Computing, is the model of nerworked storage where data can be stored in server. In this paper, we propose a secure cloud storage system consisting of two main components; client as a user who uses the cloud storage service and server who provides the cloud storage service. In this system, we propose the protocol schemes to guarantee against security attacks in the data transmission. The protocols are login protocol, upload data protocol, download protocol, and push data protocol, which implement hybrid cryptographic mechanism based on data encryption before it is sent to the cloud, so cloud storage provider does not know the user's data and cannot analysis user’s data, because there is no correspondence between data and user.

Keywords: cloud storage, security, cryptographic protocol, artificial intelligence

Procedia PDF Downloads 345
24283 Decentralized Data Marketplace Framework Using Blockchain-Based Smart Contract

Authors: Meshari Aljohani, Stephan Olariu, Ravi Mukkamala

Abstract:

Data is essential for enhancing the quality of life. Its value creates chances for users to profit from data sales and purchases. Users in data marketplaces, however, must share and trade data in a secure and trusted environment while maintaining their privacy. The first main contribution of this paper is to identify enabling technologies and challenges facing the development of decentralized data marketplaces. The second main contribution is to propose a decentralized data marketplace framework based on blockchain technology. The proposed framework enables sellers and buyers to transact with more confidence. Using a security deposit, the system implements a unique approach for enforcing honesty in data exchange among anonymous individuals. Before the transaction is considered complete, the system has a time frame. As a result, users can submit disputes to the arbitrators which will review them and respond with their decision. Use cases are presented to demonstrate how these technologies help data marketplaces handle issues and challenges.

Keywords: blockchain, data, data marketplace, smart contract, reputation system

Procedia PDF Downloads 146
24282 Real-Time Classification of Marbles with Decision-Tree Method

Authors: K. S. Parlak, E. Turan

Abstract:

The separation of marbles according to the pattern quality is a process made according to expert decision. The classification phase is the most critical part in terms of economic value. In this study, a self-learning system is proposed which performs the classification of marbles quickly and with high success. This system performs ten feature extraction by taking ten marble images from the camera. The marbles are classified by decision tree method using the obtained properties. The user forms the training set by training the system at the marble classification stage. The system evolves itself in every marble image that is classified. The aim of the proposed system is to minimize the error caused by the person performing the classification and achieve it quickly.

Keywords: decision tree, feature extraction, k-means clustering, marble classification

Procedia PDF Downloads 371
24281 Localization of Geospatial Events and Hoax Prediction in the UFO Database

Authors: Harish Krishnamurthy, Anna Lafontant, Ren Yi

Abstract:

Unidentified Flying Objects (UFOs) have been an interesting topic for most enthusiasts and hence people all over the United States report such findings online at the National UFO Report Center (NUFORC). Some of these reports are a hoax and among those that seem legitimate, our task is not to establish that these events confirm that they indeed are events related to flying objects from aliens in outer space. Rather, we intend to identify if the report was a hoax as was identified by the UFO database team with their existing curation criterion. However, the database provides a wealth of information that can be exploited to provide various analyses and insights such as social reporting, identifying real-time spatial events and much more. We perform analysis to localize these time-series geospatial events and correlate with known real-time events. This paper does not confirm any legitimacy of alien activity, but rather attempts to gather information from likely legitimate reports of UFOs by studying the online reports. These events happen in geospatial clusters and also are time-based. We look at cluster density and data visualization to search the space of various cluster realizations to decide best probable clusters that provide us information about the proximity of such activity. A random forest classifier is also presented that is used to identify true events and hoax events, using the best possible features available such as region, week, time-period and duration. Lastly, we show the performance of the scheme on various days and correlate with real-time events where one of the UFO reports strongly correlates to a missile test conducted in the United States.

Keywords: time-series clustering, feature extraction, hoax prediction, geospatial events

Procedia PDF Downloads 362
24280 Genetic Diversity and Variation of Nigerian Pigeon (Columba livia domestica) Populations Based on the Mitochondrial Coi Gene

Authors: Foluke E. Sola-Ojo, Ibraheem A. Abubakar, Semiu F. Bello, Isiaka H. Fatima, Sule Bisola, Adesina M. Olusegun, Adeniyi C. Adeola

Abstract:

The domesticated pigeon, Columba livia domestica, has many valuable characteristics, including high nutritional value and fast growth rate. There is a lack of information on its genetic diversity in Nigeria; thus, the genetic variability in mitochondrial cytochrome oxidase subunit I (COI) sequences of 150 domestic pigeons from four different locations was examined. Three haplotypes (HT) were identified in Nigerian populations; the most common haplotype, HT1, was shared with wild and domestic pigeons from Europe, America, and Asia, while HT2 and HT3 were unique to Nigeria. The overall haplotype diversity was 0.052± 0.025, and nucleotide diversity was 0.026± 0.068 across the four investigated populations. The phylogenetic tree showed significant clustering and genetic relationship of Nigerian domestic pigeons with other global pigeons. The median-joining network showed a star-like pattern suggesting population expansion. AMOVA results indicated that genetic variations in Nigerian pigeons mainly occurred within populations (99.93%), while the Neutrality tests results suggested that the Nigerian domestic pigeons’ population experienced recent expansion. This study showed a low genetic diversity and population differentiation among Nigerian domestic pigeons consistent with a relatively conservative COI sequence with few polymorphic sites. Furthermore, the COI gene could serve as a candidate molecular marker to investigate the genetic diversity and origin of pigeon species. The current data is insufficient for further conclusions; therefore, more research evidence from multiple molecular markers is required.

Keywords: Nigeria pigeon, COI, genetic diversity, genetic variation, conservation

Procedia PDF Downloads 171
24279 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.

Keywords: hybrid storage system, data mining, recurrent neural network, support vector machine

Procedia PDF Downloads 297
24278 Discussion on Big Data and One of Its Early Training Application

Authors: Fulya Gokalp Yavuz, Mark Daniel Ward

Abstract:

This study focuses on a contemporary and inevitable topic of Data Science and its exemplary application for early career building: Big Data and Leaving Learning Community (LLC). ‘Academia’ and ‘Industry’ have a common sense on the importance of Big Data. However, both of them are in a threat of missing the training on this interdisciplinary area. Some traditional teaching doctrines are far away being effective on Data Science. Practitioners needs some intuition and real-life examples how to apply new methods to data in size of terabytes. We simply explain the scope of Data Science training and exemplified its early stage application with LLC, which is a National Science Foundation (NSF) founded project under the supervision of Prof. Ward since 2014. Essentially, we aim to give some intuition for professors, researchers and practitioners to combine data science tools for comprehensive real-life examples with the guides of mentees’ feedback. As a result of discussing mentoring methods and computational challenges of Big Data, we intend to underline its potential with some more realization.

Keywords: Big Data, computation, mentoring, training

Procedia PDF Downloads 347
24277 Towards a Secure Storage in Cloud Computing

Authors: Mohamed Elkholy, Ahmed Elfatatry

Abstract:

Cloud computing has emerged as a flexible computing paradigm that reshaped the Information Technology map. However, cloud computing brought about a number of security challenges as a result of the physical distribution of computational resources and the limited control that users have over the physical storage. This situation raises many security challenges for data integrity and confidentiality as well as authentication and access control. This work proposes a security mechanism for data integrity that allows a data owner to be aware of any modification that takes place to his data. The data integrity mechanism is integrated with an extended Kerberos authentication that ensures authorized access control. The proposed mechanism protects data confidentiality even if data are stored on an untrusted storage. The proposed mechanism has been evaluated against different types of attacks and proved its efficiency to protect cloud data storage from different malicious attacks.

Keywords: access control, data integrity, data confidentiality, Kerberos authentication, cloud security

Procedia PDF Downloads 322
24276 Ontological Modeling Approach for Statistical Databases Publication in Linked Open Data

Authors: Bourama Mane, Ibrahima Fall, Mamadou Samba Camara, Alassane Bah

Abstract:

At the level of the National Statistical Institutes, there is a large volume of data which is generally in a format which conditions the method of publication of the information they contain. Each household or business data collection project includes a dissemination platform for its implementation. Thus, these dissemination methods previously used, do not promote rapid access to information and especially does not offer the option of being able to link data for in-depth processing. In this paper, we present an approach to modeling these data to publish them in a format intended for the Semantic Web. Our objective is to be able to publish all this data in a single platform and offer the option to link with other external data sources. An application of the approach will be made on data from major national surveys such as the one on employment, poverty, child labor and the general census of the population of Senegal.

Keywords: Semantic Web, linked open data, database, statistic

Procedia PDF Downloads 168
24275 The Role of Data Protection Officer in Managing Individual Data: Issues and Challenges

Authors: Nazura Abdul Manap, Siti Nur Farah Atiqah Salleh

Abstract:

For decades, the misuse of personal data has been a critical issue. Malaysia has accepted responsibility by implementing the Malaysian Personal Data Protection Act 2010 to secure personal data (PDPA 2010). After more than a decade, this legislation is set to be revised by the current PDPA 2023 Amendment Bill to align with the world's key personal data protection regulations, such as the European Union General Data Protection Regulations (GDPR). Among the other suggested adjustments is the Data User's appointment of a Data Protection Officer (DPO) to ensure the commercial entity's compliance with the PDPA 2010 criteria. The change is expected to be enacted in parliament fairly soon; nevertheless, based on the experience of the Personal Data Protection Department (PDPD) in implementing the Act, it is projected that there will be a slew of additional concerns associated with the DPO mandate. Consequently, the goal of this article is to highlight the issues that the DPO will encounter and how the Personal Data Protection Department should respond to this subject. The study result was produced using a qualitative technique based on an examination of the current literature. This research reveals that there are probable obstacles experienced by the DPO, and thus, there should be a definite, clear guideline in place to aid DPO in executing their tasks. It is argued that appointing a DPO is a wise measure in ensuring that the legal data security requirements are met.

Keywords: guideline, law, data protection officer, personal data

Procedia PDF Downloads 66
24274 Data Collection Based on the Questionnaire Survey In-Hospital Emergencies

Authors: Nouha Mhimdi, Wahiba Ben Abdessalem Karaa, Henda Ben Ghezala

Abstract:

The methods identified in data collection are diverse: electronic media, focus group interviews and short-answer questionnaires [1]. The collection of poor-quality data resulting, for example, from poorly designed questionnaires, the absence of good translators or interpreters, and the incorrect recording of data allow conclusions to be drawn that are not supported by the data or to focus only on the average effect of the program or policy. There are several solutions to avoid or minimize the most frequent errors, including obtaining expert advice on the design or adaptation of data collection instruments; or use technologies allowing better "anonymity" in the responses [2]. In this context, we opted to collect good quality data by doing a sizeable questionnaire-based survey on hospital emergencies to improve emergency services and alleviate the problems encountered. At the level of this paper, we will present our study, and we will detail the steps followed to achieve the collection of relevant, consistent and practical data.

Keywords: data collection, survey, questionnaire, database, data analysis, hospital emergencies

Procedia PDF Downloads 97
24273 Evaluating Traffic Congestion Using the Bayesian Dirichlet Process Mixture of Generalized Linear Models

Authors: Ren Moses, Emmanuel Kidando, Eren Ozguven, Yassir Abdelrazig

Abstract:

This study applied traffic speed and occupancy to develop clustering models that identify different traffic conditions. Particularly, these models are based on the Dirichlet Process Mixture of Generalized Linear regression (DML) and change-point regression (CR). The model frameworks were implemented using 2015 historical traffic data aggregated at a 15-minute interval from an Interstate 295 freeway in Jacksonville, Florida. Using the deviance information criterion (DIC) to identify the appropriate number of mixture components, three traffic states were identified as free-flow, transitional, and congested condition. Results of the DML revealed that traffic occupancy is statistically significant in influencing the reduction of traffic speed in each of the identified states. Influence on the free-flow and the congested state was estimated to be higher than the transitional flow condition in both evening and morning peak periods. Estimation of the critical speed threshold using CR revealed that 47 mph and 48 mph are speed thresholds for congested and transitional traffic condition during the morning peak hours and evening peak hours, respectively. Free-flow speed thresholds for morning and evening peak hours were estimated at 64 mph and 66 mph, respectively. The proposed approaches will facilitate accurate detection and prediction of traffic congestion for developing effective countermeasures.

Keywords: traffic congestion, multistate speed distribution, traffic occupancy, Dirichlet process mixtures of generalized linear model, Bayesian change-point detection

Procedia PDF Downloads 281
24272 Federated Learning in Healthcare

Authors: Ananya Gangavarapu

Abstract:

Convolutional Neural Networks (CNN) based models are providing diagnostic capabilities on par with the medical specialists in many specialty areas. However, collecting the medical data for training purposes is very challenging because of the increased regulations around data collections and privacy concerns around personal health data. The gathering of the data becomes even more difficult if the capture devices are edge-based mobile devices (like smartphones) with feeble wireless connectivity in rural/remote areas. In this paper, I would like to highlight Federated Learning approach to mitigate data privacy and security issues.

Keywords: deep learning in healthcare, data privacy, federated learning, training in distributed environment

Procedia PDF Downloads 126
24271 Disclosure on Adherence of the King Code's Audit Committee Guidance: Cluster Analyses to Determine Strengths and Weaknesses

Authors: Philna Coetzee, Clara Msiza

Abstract:

In modern society, audit committees are seen as the custodians of accountability and the conscience of management and the board. But who holds the audit committee accountable for their actions or non-actions and how do we know what they are supposed to be doing and what they are doing? The purpose of this article is to provide greater insight into the latter part of this problem, namely, determine what best practises for audit committees and the disclosure of what is the realities are. In countries where governance is well established, the roles and responsibilities of the audit committee are mostly clearly guided by legislation and/or guidance documents, with countries increasingly providing guidance on this topic. With high cost involved to adhere to governance guidelines, the public (for public organisations) and shareholders (for private organisations) expect to see the value of their ‘investment’. For audit committees, the dividends on the investment should reflect in less fraudulent activities, less corruption, higher efficiency and effectiveness, improved social and environmental impact, and increased profits, to name a few. If this is not the case (which is reflected in the number of fraudulent activities in both the private and the public sector), stakeholders have the right to ask: where was the audit committee? Therefore, the objective of this article is to contribute to the body of knowledge by comparing the adherence of audit committee to best practices guidelines as stipulated in the King Report across public listed companies, national and provincial government departments, state-owned enterprises and local municipalities. After constructs were formed, based on the literature, factor analyses were conducted to reduce the number of variables in each construct. Thereafter, cluster analyses, which is an explorative analysis technique that classifies a set of objects in such a way that objects that are more similar are grouped into the same group, were conducted. The SPSS TwoStep Clustering Component was used, being capable of handling both continuous and categorical variables. In the first step, a pre-clustering procedure clusters the objects into small sub-clusters, after which it clusters these sub-clusters into the desired number of clusters. The cluster analyses were conducted for each construct and the measure, namely the audit opinion as listed in the external audit report, were included. Analysing 228 organisations' information, the results indicate that there is a clear distinction between the four spheres of business that has been included in the analyses, indicating certain strengths and certain weaknesses within each sphere. The results may provide the overseers of audit committees’ insight into where a specific sector’s strengths and weaknesses lie. Audit committee chairs will be able to improve the areas where their audit committee is lacking behind. The strengthening of audit committees should result in an improvement of the accountability of boards, leading to less fraud and corruption.

Keywords: audit committee disclosure, cluster analyses, governance best practices, strengths and weaknesses

Procedia PDF Downloads 150
24270 The Utilization of Big Data in Knowledge Management Creation

Authors: Daniel Brian Thompson, Subarmaniam Kannan

Abstract:

The huge weightage of knowledge in this world and within the repository of organizations has already reached immense capacity and is constantly increasing as time goes by. To accommodate these constraints, Big Data implementation and algorithms are utilized to obtain new or enhanced knowledge for decision-making. With the transition from data to knowledge provides the transformational changes which will provide tangible benefits to the individual implementing these practices. Today, various organization would derive knowledge from observations and intuitions where this information or data will be translated into best practices for knowledge acquisition, generation and sharing. Through the widespread usage of Big Data, the main intention is to provide information that has been cleaned and analyzed to nurture tangible insights for an organization to apply to their knowledge-creation practices based on facts and figures. The translation of data into knowledge will generate value for an organization to make decisive decisions to proceed with the transition of best practices. Without a strong foundation of knowledge and Big Data, businesses are not able to grow and be enhanced within the competitive environment.

Keywords: big data, knowledge management, data driven, knowledge creation

Procedia PDF Downloads 98
24269 Survey on Data Security Issues Through Cloud Computing Amongst Sme’s in Nairobi County, Kenya

Authors: Masese Chuma Benard, Martin Onsiro Ronald

Abstract:

Businesses have been using cloud computing more frequently recently because they wish to take advantage of its advantages. However, employing cloud computing also introduces new security concerns, particularly with regard to data security, potential risks and weaknesses that could be exploited by attackers, and various tactics and strategies that could be used to lessen these risks. This study examines data security issues on cloud computing amongst sme’s in Nairobi county, Kenya. The study used the sample size of 48, the research approach was mixed methods, The findings show that data owner has no control over the cloud merchant's data management procedures, there is no way to ensure that data is handled legally. This implies that you will lose control over the data stored in the cloud. Data and information stored in the cloud may face a range of availability issues due to internet outages; this can represent a significant risk to data kept in shared clouds. Integrity, availability, and secrecy are all mentioned.

Keywords: data security, cloud computing, information, information security, small and medium-sized firms (SMEs)

Procedia PDF Downloads 71
24268 Cloud Design for Storing Large Amount of Data

Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás

Abstract:

Main goal of this paper is to introduce our design of private cloud for storing large amount of data, especially pictures, and to provide good technological backend for data analysis based on parallel processing and business intelligence. We have tested hypervisors, cloud management tools, storage for storing all data and Hadoop to provide data analysis on unstructured data. Providing high availability, virtual network management, logical separation of projects and also rapid deployment of physical servers to our environment was also needed.

Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization

Procedia PDF Downloads 343
24267 Estimation of Missing Values in Aggregate Level Spatial Data

Authors: Amitha Puranik, V. S. Binu, Seena Biju

Abstract:

Missing data is a common problem in spatial analysis especially at the aggregate level. Missing can either occur in covariate or in response variable or in both in a given location. Many missing data techniques are available to estimate the missing data values but not all of these methods can be applied on spatial data since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both response variable and covariates in spatial data by taking account of the spatial autocorrelation. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) Spatial autocorrelation of the response variable (b) Spatial autocorrelation of covariates and (c) Correlation between covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly account for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates and between a response variable and covariates. The precise estimation of the missing data points in spatial data will result in an increased precision of the estimated effects of independent variables on the response variable in spatial regression analysis.

Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis

Procedia PDF Downloads 364
24266 Molecular Clustering and Velocity Increase in Converging-Diverging Nozzle in Molecular Dynamics Simulation

Authors: Jeoungsu Na, Jaehawn Lee, Changil Hong, Suhee Kim

Abstract:

A molecular dynamics simulation in a converging-diverging nozzle was performed to study molecular collisions and their influence to average flow velocity according to a variety of vacuum levels. The static pressures and the dynamic pressure exerted by the molecule collision on the selected walls were compared to figure out the intensity variances of the directional flows. With pressure differences constant between the entrance and the exit of the nozzle, the numerical experiment was performed for molecular velocities and directional flows. The result shows that the velocities increased at the nozzle exit as the vacuum level gets higher in that area because less molecular collisions.

Keywords: cavitation, molecular collision, nozzle, vacuum, velocity increase

Procedia PDF Downloads 420
24265 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti

Abstract:

Autonomous structural health monitoring (SHM) of many structures and bridges became a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compare different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by density-based time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., mean value, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)) that combine the fundamental frequencies to extract new damage sensitive features in a low dimensional feature space. Finally, one class classifier (OCC) algorithms perform anomaly detection, trained with standard condition points, and tested with normal and anomaly ones. In particular, a new anomaly detector strategy is proposed, namely one class classifier neural network two (OCCNN2), which exploit the classification capability of standard classifiers in an anomaly detection problem, finding the standard class (the boundary of the features space in normal operating conditions) through a two-step approach: coarse and fine boundary estimation. The coarse estimation uses classics OCC techniques, while the fine estimation is performed through a feedforward neural network (NN) trained that exploits the boundaries estimated in the coarse step. The detection algorithms vare then compared with known methods based on principal component analysis (PCA), kernel principal component analysis (KPCA), and auto-associative neural network (ANN). In many cases, the proposed solution increases the performance with respect to the standard OCC algorithms in terms of F1 score and accuracy. In particular, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 96% with the proposed method.

Keywords: anomaly detection, frequencies selection, modal analysis, neural network, sensor network, structural health monitoring, vibration measurement

Procedia PDF Downloads 113
24264 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 149
24263 Immunization-Data-Quality in Public Health Facilities in the Pastoralist Communities: A Comparative Study Evidence from Afar and Somali Regional States, Ethiopia

Authors: Melaku Tsehay

Abstract:

The Consortium of Christian Relief and Development Associations (CCRDA), and the CORE Group Polio Partners (CGPP) Secretariat have been working with Global Alliance for Vac-cines and Immunization (GAVI) to improve the immunization data quality in Afar and Somali Regional States. The main aim of this study was to compare the quality of immunization data before and after the above interventions in health facilities in the pastoralist communities in Ethiopia. To this end, a comparative-cross-sectional study was conducted on 51 health facilities. The baseline data was collected in May 2019, while the end line data in August 2021. The WHO data quality self-assessment tool (DQS) was used to collect data. A significant improvment was seen in the accuracy of the pentavalent vaccine (PT)1 (p = 0.012) data at the health posts (HP), while PT3 (p = 0.010), and Measles (p = 0.020) at the health centers (HC). Besides, a highly sig-nificant improvment was observed in the accuracy of tetanus toxoid (TT)2 data at HP (p < 0.001). The level of over- or under-reporting was found to be < 8%, at the HP, and < 10% at the HC for PT3. The data completeness was also increased from 72.09% to 88.89% at the HC. Nearly 74% of the health facilities timely reported their respective immunization data, which is much better than the baseline (7.1%) (p < 0.001). These findings may provide some hints for the policies and pro-grams targetting on improving immunization data qaulity in the pastoralist communities.

Keywords: data quality, immunization, verification factor, pastoralist region

Procedia PDF Downloads 90
24262 Identifying Critical Success Factors for Data Quality Management through a Delphi Study

Authors: Maria Paula Santos, Ana Lucas

Abstract:

Organizations support their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue, the literature being unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM) that were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, organizational culture focused on quality of the data and obtaining top management commitment and support.

Keywords: critical success factors, data quality, data quality management, Delphi, Q-Sort

Procedia PDF Downloads 204
24261 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 541