Search results for: data imputations
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24100

24100 Single Imputation for Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputation in an attempt to improve accuracy. It implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data come from patients who had, or were candidates for, cochlear implants. Accuracy is compared across two different nested cross-validation k values. Over 4000 audiograms from 800 unique patients were used. Additionally, training on combined left and right ear audiograms is compared with training on single-ear audiograms. For the best Random Forest models, Root Mean Square Error (RMSE) ranges from 4.74 to 6.37 and R² ranges from .91 to .96. For the best KNN models, RMSE ranges from 5.00 to 7.72 and R² ranges from .89 to .95. Overall, the best imputation models achieved R² between .89 and .96 and RMSE values below 8 dB. We also show that classification models were about two percent more accurate with our best imputation models than with constant imputation.

Keywords: machine learning, audiograms, data imputations, single imputations

Procedia PDF Downloads 51
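
The single-imputation workflow described in the audiogram abstract above (fill a missing threshold from similar records, then score the result with RMSE) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names `knn_impute` and `rmse`, the toy thresholds, and the choice of k = 2 are all assumptions made for the example.

```python
import math

def knn_impute(records, target, k=2):
    """Fill missing `target` values with the mean of the k nearest complete records."""
    complete = [r for r in records if r[target] is not None]
    imputed = []
    for r in records:
        if r[target] is not None:
            imputed.append(dict(r))
            continue
        # Distance uses every frequency except the one being imputed
        # (assumed to be fully observed in this toy data).
        features = [f for f in r if f != target]

        def dist(other):
            return math.sqrt(sum((r[f] - other[f]) ** 2 for f in features))

        nearest = sorted(complete, key=dist)[:k]
        filled = dict(r)
        filled[target] = sum(n[target] for n in nearest) / len(nearest)
        imputed.append(filled)
    return imputed

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical hearing thresholds (dB) keyed by frequency in Hz.
audiograms = [
    {500: 20, 1000: 25, 2000: 30},
    {500: 40, 1000: 45, 2000: 50},
    {500: 60, 1000: 65, 2000: 70},
    {500: 22, 1000: 27, 2000: None},  # held-out true value assumed to be 35
]

filled = knn_impute(audiograms, target=2000, k=2)
error = rmse([filled[3][2000]], [35])
```

In a real study the held-out values would come from masking known entries inside a cross-validation loop, so that RMSE is computed over many imputed thresholds rather than one.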
24099 Hormone Replacement Therapy (HRT) and Its Impact on the All-Cause Mortality of UK Women: A Matched Cohort Study 1984-2017

Authors: Nurunnahar Akter, Elena Kulinskaya, Nicholas Steel, Ilyas Bakbergenuly

Abstract:

Although Hormone Replacement Therapy (HRT) is an effective treatment in ameliorating menopausal symptoms, it has mixed effects on different health outcomes, increasing, for instance, the risk of breast cancer. Because of this, many symptomatic women are left untreated. Untreated menopausal symptoms may result in other health issues, which eventually place an extra burden and cost on the health care system. All-cause mortality analysis may explain the net benefits and risks of HRT. However, it has received far less attention in HRT studies. This study investigated the impact of HRT on all-cause mortality using electronically recorded primary care data from The Health Improvement Network (THIN), which broadly represents the female population in the United Kingdom (UK). The study entry date was the record of the first HRT prescription from 1984, and patients were followed up until death, transfer to another GP practice, or the study end date of January 2017. 112,354 HRT users (cases) were matched with 245,320 non-users by age at HRT initiation and general practice (GP). The hazards of all-cause mortality associated with HRT were estimated by a parametric Weibull-Cox model adjusting for a wide range of important medical, lifestyle, and socio-demographic factors. Multilevel multiple imputation techniques were used to deal with missing data. This study found that during 32 years of follow-up, combined HRT reduced the hazard of all-cause mortality by 9% (hazard ratio (HR): 0.91; 95% Confidence Interval (CI), 0.88-0.94) in women aged 46 to 65 at first treatment compared to non-users of the same age. Age-specific mortality analyses found that combined HRT decreased mortality by 13% (HR: 0.87; 95% CI, 0.82-0.92), 12% (HR: 0.88; 95% CI, 0.82-0.93), and 8% (HR: 0.92; 95% CI, 0.85-0.98) in the 51 to 55, 56 to 60, and 61 to 65 age groups at first treatment, respectively. There was no association between estrogen-only HRT and women’s all-cause mortality. The findings from this study may help to inform the choices of women at menopause and to further educate clinicians and resource planners.

Keywords: hormone replacement therapy, multiple imputations, primary care data, the health improvement network (THIN)

Procedia PDF Downloads 142
24098 Efficacy and Safety of Electrical Vestibular Stimulation on Adults with Symptoms of Insomnia: A Double-Blind, Randomized, Sham-Controlled Trial

Authors: Teris Cheung, Joyce Yuen Ting Lam, Kwan Hin Fong, Calvin Pak-Wing Cheng, Julie Sittlington, Yu-Tao Xiang, Tim Man Ho Li

Abstract:

Insomnia is one of the most common health problems in the general population. Insomnia can be acute or intermittent and can become chronic, often due to comorbidity with other physical and mental health conditions. Although there are conventional pharmaceutical and psychotherapeutic treatments for symptoms of insomnia, there has been no robust randomized controlled trial (RCT) using transdermal neurostimulation on individuals with insomnia symptoms. This gives us the impetus to execute the first nationwide RCT. Aim: To evaluate the efficacy of Electrical Vestibular Stimulation (VeNS) on individuals with insomnia in Hong Kong. Design: This study was a two-armed, double-blinded, randomized, sham-controlled trial. Sampling: 60 community-dwelling adults aged between 18 and 60 years with moderate or more severe insomnia symptoms (Insomnia Severity Index > 14) were recruited. All subjects were randomized by computer into either the active VeNS group or the sham VeNS group at a 1:1 ratio. Intervention: All participants received a home-use VeNS device and completed 30-min VeNS sessions on five consecutive days across a 4-week period (total treatment hours: 10). Baseline measurements and post-VeNS evaluation of the psychological outcomes, including 1) insomnia severity, 2) sleep quality, and 3) quality of life, were investigated. The short- and long-term sustainability of the VeNS intervention was assessed immediately after stimulation and at 1-month and 3-month follow-ups. Data analysis: A mixed GEE model was used to analyze the repeated measures data. Missing data were managed by multiple imputation. The level of significance was set to p < 0.05. Significance of the study: This is the first trial to examine the efficacy and safety of VeNS among adults with insomnia symptoms in Hong Kong. The findings were used to determine whether this VeNS device can be considered a self-help technological device to reduce the severity of insomnia in the community setting and to reduce the global disease burden. Clinical Trial Registration: ClinicalTrials.gov, identifier: NCT04452981.

Keywords: adults, insomnia, neuromodulation, rct, vestibular stimulation

Procedia PDF Downloads 49
24097 Primary Analysis of a Randomized Controlled Trial of Topical Analgesia Post Haemorrhoidectomy

Authors: James Jin, Weisi Xia, Runzhe Gao, Alain Vandal, Darren Svirkis, Andrew Hill

Abstract:

Background: Post-haemorrhoidectomy pain is a concern for patients and clinicians, and minimizing postoperative pain is of high clinical interest. Combinations of topical creams targeting three hypothesised post-haemorrhoidectomy pain mechanisms were developed and their effectiveness was evaluated. Specifically, a multi-centre, double-blinded randomized clinical trial (RCT) was conducted in adults undergoing excisional haemorrhoidectomy. The primary analysis was conducted on the collected data to evaluate the effectiveness of the cream combinations after the operations. Methods: 192 patients were randomly allocated to 4 arms (48 patients per arm), and the arms received pain cream containing 10% metronidazole (M), M and 2% diltiazem (MD), M with 4% lidocaine (ML), or MDL, respectively. Patients were instructed to apply topical treatments three times a day for 7 days and to record outcomes for 14 days after the operations. The primary outcome was VAS pain on day 4. Covariates and models were selected in the blind review stage. Multiple imputation was applied to handle missing data. LMER and GLMER models together with natural splines were applied. Sandwich estimators and Wald statistics were used. P-values < 0.05 were considered significant. Conclusions: The addition of topical lidocaine or diltiazem to metronidazole does not add any benefit. ML had significantly better pain and recovery scores than combination MDL. Multimodal topical analgesia with ML after haemorrhoidectomy could be considered for further evaluation. Further trials considering only 3 arms (M, ML, MD) might be worth exploring.

Keywords: RCT, primary analysis, multiple imputation, pain scores, haemorrhoidectomy, analgesia, lmer

Procedia PDF Downloads 73
24096 Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values in the Context of the Manufacture of Aircraft Engines

Authors: Sara Rejeb, Catherine Duveau, Tabea Rebafka

Abstract:

To monitor the production process of turbofan aircraft engines, multiple measurements of various geometrical parameters are systematically recorded on manufactured parts. Engine parts are subject to extremely high standards as they can impact the performance of the engine. Therefore, it is essential to analyze these databases to better understand the influence of the different parameters on the engine's performance. Self-organizing maps are unsupervised neural networks which achieve two tasks simultaneously: they visualize high-dimensional data by projection onto a 2-dimensional map and provide clustering of the data. This technique has become very popular for data exploration since it provides easily interpretable results and a meaningful global view of the data. As such, self-organizing maps are usually applied to aircraft engine condition monitoring. As databases in this field are huge and complex, they naturally contain multiple missing entries for various reasons. The classical Kohonen algorithm to compute self-organizing maps is conceived for complete data only. A naive approach to deal with partially observed data consists in deleting items or variables with missing entries. However, this requires a sufficient number of complete individuals to be fairly representative of the population; otherwise, deletion leads to a considerable loss of information. Moreover, deletion can also induce bias in the analysis results. Alternatively, one can first apply a common imputation method to create a complete dataset and then apply the Kohonen algorithm. However, the choice of the imputation method may have a strong impact on the resulting self-organizing map. Our approach is to address simultaneously the two problems of computing a self-organizing map and imputing missing values, as these tasks are not independent. In this work, we propose an extension of self-organizing maps for partially observed data, referred to as missSOM. 
First, we introduce a criterion to be optimized, that aims at defining simultaneously the best self-organizing map and the best imputations for the missing entries. As such, missSOM is also an imputation method for missing values. To minimize the criterion, we propose an iterative algorithm that alternates the learning of a self-organizing map and the imputation of missing values. Moreover, we develop an accelerated version of the algorithm by entwining the iterations of the Kohonen algorithm with the updates of the imputed values. This method is efficiently implemented in R and will soon be released on CRAN. Compared to the standard Kohonen algorithm, it does not come with any additional cost in terms of computing time. Numerical experiments illustrate that missSOM performs well in terms of both clustering and imputation compared to the state of the art. In particular, it turns out that missSOM is robust to the missingness mechanism, which is in contrast to many imputation methods that are appropriate for only a single mechanism. This is an important property of missSOM as, in practice, the missingness mechanism is often unknown. An application to measurements on one type of part is also provided and shows the practical interest of missSOM.

Keywords: imputation method of missing data, partially observed data, robustness to missingness mechanism, self-organizing maps

Procedia PDF Downloads 124
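
The idea in the abstract above of alternating between learning prototypes and imputing missing entries can be illustrated with a deliberately simplified sketch. This is not missSOM: it uses plain prototype averaging (k-means-like) with no 2-dimensional map or neighbourhood function, and the function names, toy data, and initial prototypes are assumptions chosen for the example.

```python
def nearest_prototype(row, prototypes):
    """Index of the closest prototype, measured on observed coordinates only."""
    def dist(proto):
        return sum((x - p) ** 2 for x, p in zip(row, proto) if x is not None)
    return min(range(len(prototypes)), key=lambda i: dist(prototypes[i]))

def alternate_fit_impute(data, prototypes, n_iter=30):
    """Alternate between imputing from prototypes and re-estimating prototypes."""
    prototypes = [list(p) for p in prototypes]
    completed = [list(r) for r in data]
    labels = []
    for _ in range(n_iter):
        # Step 1: assign each row using its observed entries, then fill
        # its missing entries from the assigned prototype.
        labels = [nearest_prototype(r, prototypes) for r in data]
        for i, row in enumerate(data):
            for j, x in enumerate(row):
                if x is None:
                    completed[i][j] = prototypes[labels[i]][j]
        # Step 2: update each prototype as the mean of its completed rows.
        for k in range(len(prototypes)):
            members = [completed[i] for i, lab in enumerate(labels) if lab == k]
            if members:
                prototypes[k] = [sum(col) / len(members) for col in zip(*members)]
    return completed, prototypes, labels

# Hypothetical measurements with two missing entries (None).
data = [[1.0, 1.0], [1.2, 0.8], [1.1, None],
        [5.0, 5.0], [5.2, 4.8], [None, 5.1]]
completed, prototypes, labels = alternate_fit_impute(data, [[0.0, 0.0], [6.0, 6.0]])
```

Because each imputed value is drawn from a prototype that is itself recomputed from the completed data, the two steps influence each other, which is the dependence between mapping and imputation that the abstract emphasises.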
24095 The First Trial of Transcranial Pulse Stimulation on Young Adolescents With Autism Spectrum Disorder in Hong Kong

Authors: Teris Cheung, Joyce Yuen Ting Lam, Kwan Hin Fong, Yuen Shan Ho, Tim Man Ho Li, Andy Choi-Yeung Tse, Cheng-Ta Li, Calvin Pak-Wing Cheng, Roland Beisteiner

Abstract:

Transcranial pulse stimulation (TPS) is a non-intrusive brain stimulation technology that has been proven effective in older adults with mild neurocognitive disorders and adults with major depressive disorder. Given this robust evidence, TPS might be an adjunct treatment option in neuropsychiatric disorders, for example, autism spectrum disorder (ASD), which is a common neurodevelopmental disorder in children. This trial aimed to investigate the effects of TPS on the right temporoparietal junction, a key node for social cognition in ASD, and to examine the association between TPS, executive functions and social functions. Design: This trial adopted a two-armed (verum TPS group vs. sham TPS group), double-blinded, randomized, sham-controlled design. Sampling: 32 subjects aged between 12 and 17 and diagnosed with ASD were recruited. All subjects were randomized by computer into either the verum TPS group or the sham TPS group at a 1:1 ratio. All subjects undertook functional MRI before and after the TPS interventions. Intervention: Six 30-min TPS sessions were administered to subjects on alternate days over 2 weeks, assessing neural connectivity changes. Baseline measurements and post-TPS evaluation of the ASD symptoms, executive functions, and social functions were conducted. Participants were followed up at 2 weeks, 1 month, and 3 months to assess the short- and long-term sustainability of the TPS intervention. Data analysis: Generalized Estimating Equations with repeated measures were used to analyze the group and time difference. Missing data were managed by multiple imputation. The level of significance was set at p < 0.05. To the best of our knowledge, this is the first study evaluating the efficacy and safety of TPS among adolescents with ASD in Hong Kong and nationwide. Results emerging from this study will provide insight into whether TPS can be used as an adjunct treatment for ASD in neuroscience and clinical psychiatry. Clinical Trial Registration: ClinicalTrials.gov, identifier: NCT05408793.

Keywords: adolescents, autism spectrum disorder, neuromodulation, rct, transcranial pulse stimulation

Procedia PDF Downloads 43
24094 Law Verses Tradition: Beliefs in and Practices of Witchcraft in Contemporary Ghana and the Law

Authors: Baba Iddrisu Musah

Abstract:

Many Ghanaians, including the rich and downtrodden, elite and unlettered, rural and urban dwellers, politicians and civil servants, in one way or another, believe in and practice witchcraft. The existence of witches’ camps in northern Ghana, the rise of Pentecostal churches, especially in southern Ghana, with the penchant to cleanse people of witchcraft, as well as media reports of witchcraft imputations assuming wider dimensions in the country, often classified as a citadel of democracy, good governance and human rights in Africa, buttress the pervasive nature of belief in and the practice of witchcraft in the country. This is in spite of the fact that tremendous efforts, especially by British colonial authorities, were made to regulate witchcraft beliefs and their associated practices. Informed by Western values and philosophy, colonial authorities considered witchcraft illogical and unscientific. This paper, which is largely a review of existing literature, supplemented by archival information from the national archives of Ghana, focuses on the nature of witchcraft regulation in Ghana’s pre-colonial and colonial past, as well as immediately after Ghana obtained her independence in 1957. This article concludes by rhetorically questioning whether belief in and the practice of witchcraft in contemporary Ghana in general, and the existence of witches’ camps in the northern region of the country, are attributable to the failure of past regulations, as well as the failure of present government policies.

Keywords: colonial, natives, regulation, witchcraft

Procedia PDF Downloads 226
24093 Higher Freshwater Fish and Sea Fish Intake Is Inversely Associated with Liver Cancer in Patients with Hepatitis B

Authors: Maomao Cao

Abstract:

Background and aims: Although the association between higher fish consumption and lower liver cancer risk has been confirmed, the association between intake of specific fish types and liver cancer risk remains unknown. We aimed to identify the association between specific fish consumption and the risk of liver cancer. Methods: In a community-based seropositive hepatitis B cohort of 18404 individuals, face-to-face interviews were conducted using a standardized questionnaire to acquire baseline information. Three common fish types were analyzed: freshwater fish, sea fish, and small fish (shrimp, crab, conch, and shell). All participants received liver cancer screening, and possible cases were identified by CT or MRI. Multivariable logistic models were applied to estimate the odds ratio (OR) and 95% confidence intervals (CI). Multivariate multiple imputation was used to impute observations with missing values. Results: 179 liver cancer cases were identified. Consumption of freshwater fish and sea fish at least once a week had a strong inverse association with liver cancer risk compared with the lowest intake level, with an adjusted OR of 0.53 (95% CI, 0.38-0.75) and 0.38 (95% CI, 0.19-0.73), respectively. This inverse association was also observed after the imputation. There was no statistically significant association between intake of small fish and liver cancer risk (OR=0.58; 95% CI, 0.32-1.08). Conclusions: Our findings suggest that consumption of freshwater fish and sea fish at least once a week could reduce liver cancer risk.

Keywords: cross-sectional study, fish intake, liver cancer, risk factor

Procedia PDF Downloads 236
24092 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technologies; it collects data from various sensors, and those data are used in many fields. Data retrieval is one of the major issues, as there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to pinpoint the exact data available in the storage space. Here the available data is streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 304
24091 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 380
24090 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

With growing user demand and the growth of large volumes of free data, it is becoming more challenging for storage solutions to protect, store, and retrieve data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. On the other hand, environmental conditions make it challenging to establish and maintain new data warehouses and data centers while guarding against global warming threats. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 510
24089 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, proposes several key steps of a typical cleaning process, and puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation-data discovery algorithms expressed as formal formulas, which can find data inconsistent with condition attribute dependencies across all target columns, whether the data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 530
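
The notion of discovering data that violates an attribute dependency, as described in the data cleaning abstract above, can be illustrated with a small sketch. It checks a single rule of the form "condition attribute determines target attribute" and is only one plausible reading of the idea, not the paper's algorithms; the function name `find_violations` and the sample records are hypothetical.

```python
from collections import defaultdict

def find_violations(rows, condition, target):
    """Return rows that break the rule `condition -> target`.

    A group of rows sharing the same condition value violates the rule
    when it contains more than one distinct target value.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[condition]].append(row)
    violations = []
    for members in groups.values():
        if len({m[target] for m in members}) > 1:
            violations.extend(members)
    return violations

# Hypothetical records: a postal code is assumed to determine the city.
records = [
    {"id": 1, "zip": "10001", "city": "New York"},
    {"id": 2, "zip": "10001", "city": "Newark"},
    {"id": 3, "zip": "60601", "city": "Chicago"},
]
bad = find_violations(records, condition="zip", target="city")
```

The check only reports which rows conflict; deciding which value is correct is a separate repair step.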
24088 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 364
24087 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its own syntax and structure. The objective is to explore various techniques and methodologies for validating the comparison and integration of JSON data with XML data and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 83
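
One way to check JSON data against XML data, in the spirit of the abstract above, is to flatten both documents into path-to-value maps and compare the maps. The sketch below uses only the Python standard library and is an assumed approach, not the method from the paper; the helper names and sample documents are hypothetical, and lists, repeated elements, XML attributes, and numeric formatting differences (for example 19.5 versus 19.50) are not handled.

```python
import json
import xml.etree.ElementTree as ET

def flatten_json(value, prefix=""):
    """Map each leaf of a nested JSON object to a '/a/b' style path."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(flatten_json(child, f"{prefix}/{key}"))
        return flat
    return {prefix: str(value)}

def flatten_xml(element, prefix=""):
    """Map each leaf element of an XML tree to the same path style."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    flat = {}
    for child in children:
        flat.update(flatten_xml(child, path))
    return flat

def compare(json_text, xml_text):
    """Return {path: (json_value, xml_value)} for every mismatching path."""
    left = flatten_json(json.loads(json_text))
    right = flatten_xml(ET.fromstring(xml_text))
    return {
        path: (left.get(path), right.get(path))
        for path in sorted(set(left) | set(right))
        if left.get(path) != right.get(path)
    }

json_doc = '{"order": {"id": 42, "customer": "Ada", "total": 19.5}}'
xml_doc = "<order><id>42</id><customer>Ada</customer><total>20.0</total></order>"
diff = compare(json_doc, xml_doc)
```

Values are compared as strings here, so a test suite would normally add type-aware normalisation before treating a difference as a defect.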
24086 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form, and their sources differ because they may come from different sectors. To reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources collected in different ways raise several issues that need to be resolved. Problems of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata must be available for reference whenever an application needs to access, manipulate and display the data. Uniform metadata management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 347
24085 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given human involvement in the ever-increasing growth of data, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: usually the databases that create or receive such data belong to corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. Sending data and then gathering them through vertical or horizontal software depends on the type of privacy preservation used and is also done to improve data privacy. This study attempts a comprehensive comparison of data-preserving methods; general methods such as data randomization and encoding are examined, along with the strong and weak points of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 487
24084 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 439
24083 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 364
24082 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their adversaries. Of the many areas of Big Data usage, logistics has a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 608
24081 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, using the board to collect data presents many difficulties, especially when the user is not a computer programmer or is using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 342
24080 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity and come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data-related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we drew ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 127
24079 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, and veracity and that comes from a variety of sources is generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to its successful implementation. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the literature on government data ecosystems. As this is a new topic, we did not find articles specifically on the government (big) data ecosystem and therefore focused our research on relevant areas such as humanitarian data, open government data, scientific research data, and industry data.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 179
24078 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, only about 1% of generated data is ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm for the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 136
24077 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we are generating a lot of data, and we need a specific device to store all these data. Generally, we store data on pen drives, hard drives, etc. Sometimes we may lose the data due to the corruption of these devices. To overcome these issues, we implemented a cloud space for storing the data, which also provides more security for the data. We can access the data from anywhere in the world using just the internet. We implemented all of this in Java using the NetBeans IDE. Once a user uploads the data, they do not have any rights to change the data. Users' uploaded files are stored in the cloud with the system time as the file name, and the directory is created with some random words. The cloud accepts the data only if the size of the file is less than 2 MB.
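The storage rules described in this abstract (a size limit, a file name taken from the system time, a directory named with random words) can be sketched as follows. This is only an illustration in Python, not the authors' Java implementation: the function name `store_upload`, the word list, the reading of "2 MB" as 2 MiB, and the use of a local directory in place of the cloud are all assumptions, and the AES encryption step is deliberately left out.

```python
import secrets
import tempfile
import time
from pathlib import Path

# Assumption: "2 MB" is read as 2 MiB; the abstract does not say which.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024

# Hypothetical word list used to build the random directory name.
WORDS = ["amber", "birch", "cedar", "delta", "ember", "fjord"]


def store_upload(root: Path, payload: bytes) -> Path:
    """Store payload under root following the rules in the abstract."""
    # Accept the data only if the file is smaller than the limit.
    if len(payload) >= MAX_UPLOAD_BYTES:
        raise ValueError("file must be smaller than 2 MB")

    # The directory name is built from randomly chosen words.
    directory = root / "-".join(secrets.choice(WORDS) for _ in range(3))
    directory.mkdir(parents=True, exist_ok=True)

    # The file name is the current system time (nanoseconds since the epoch).
    target = directory / str(time.time_ns())

    # Mode "xb" refuses to overwrite an existing file, a loose stand-in
    # for "the user cannot change the data after uploading".
    with open(target, "xb") as handle:
        handle.write(payload)
    return target


def demo() -> bytes:
    """Store a small payload in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        stored = store_upload(Path(tmp), b"example payload")
        return stored.read_bytes()
```

A real deployment would encrypt the payload (for example with AES, as in the paper's title) before writing it, and would enforce the no-modification rule on the server side rather than relying on the file-open mode.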

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 172
24076 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business intelligence is a methodology that exploits data to produce information and knowledge systematically, and it can support the decision-making process. Data warehousing and data mining are two business intelligence methods. A data warehouse stores historical data derived from transactional data. For data modelling in the data warehouse, we apply Kimball's dimensional modelling. Data mining is used to extract patterns from the data and gain insight from them. Data mining has many techniques, one of which is segmentation. To profile telecommunication customers, we segment customers according to their usage of services, invoices, and payments. Customers can be grouped according to their characteristics, and the profitable customers can be identified. We apply the K-Means clustering algorithm for segmentation, using the RFM (Recency, Frequency, and Monetary) model as its input variables. All data mining processes are carried out with the IBM SPSS Modeler tool.
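As a rough illustration of the segmentation step, the sketch below derives RFM values from payment records and clusters them with a small k-means routine. It is written in plain Python rather than IBM SPSS Modeler, and the transaction data, the function names, the min-max scaling, and the farthest-first initialization are all assumptions made for the example, not details taken from the paper.

```python
import math
from datetime import date

# Hypothetical payment records: (customer_id, payment_date, amount).
TRANSACTIONS = [
    ("A", date(2024, 6, 20), 100.0), ("A", date(2024, 6, 1), 120.0),
    ("A", date(2024, 5, 10), 90.0),
    ("B", date(2024, 6, 25), 150.0), ("B", date(2024, 5, 30), 130.0),
    ("B", date(2024, 6, 10), 110.0),
    ("C", date(2024, 1, 15), 20.0),
    ("D", date(2024, 2, 1), 15.0), ("D", date(2024, 1, 10), 10.0),
]


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary_total)}."""
    rfm = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = rfm.get(customer, (math.inf, 0, 0.0))
        rfm[customer] = (
            min(recency, (today - day).days),  # days since the latest payment
            frequency + 1,                     # number of payments
            monetary + amount,                 # total amount paid
        )
    return rfm


def min_max_scale(rows):
    """Scale every column to [0, 1] so no RFM dimension dominates."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))
            for row in rows]


def k_means(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels


def segment_customers(transactions, today, k):
    """Map each customer to a cluster label based on scaled RFM values."""
    rfm = rfm_table(transactions, today)
    customers = sorted(rfm)
    labels = k_means(min_max_scale([rfm[c] for c in customers]), k)
    return dict(zip(customers, labels))
```

Calling `segment_customers(TRANSACTIONS, date(2024, 7, 1), 2)` groups the recent, frequent, high-spending customers separately from the lapsed, low-spending ones. The cluster labels themselves are arbitrary, so which segment counts as "profitable" has to be read from the RFM values of its members.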

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 442
24075 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes for microarray data sets is feature selection, which finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 531
24074 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are used in various applications such as agriculture, health monitoring, air and water pollution monitoring, and traffic monitoring and control, and hence play a vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensor data is important for designing an efficient big data framework. Current research does not focus on aggregating and filtering data at multiple layers of a sensor-based big data framework. Thus, this paper introduces (i) a three-layer data aggregation framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer, at the sensors. Simulation results show that PDDA outperforms existing tree-based and cluster-based data aggregation schemes in terms of overall network energy consumption and end-to-end data transmission delay.

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 300
24073 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have grown within a short period of time. Consequently, this fast-increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and their current trends. This paper presents a complete discussion of state-of-the-art big data technologies based on group and stream data processing. Moreover, the strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendors, several open research challenges, and the opportunities brought about by big data. The similarities and differences of these techniques and technologies, based on important limitations, are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, IT community, industry, big data

Procedia PDF Downloads 157
24072 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the rapid growth of data, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data from the standpoint of transaction processing and of strategic and tactical analytics. The purpose of this article is to provide a historical and global perspective on the recent trend in high-performance computing architectures, especially those related to analytics and data mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 483
24071 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry and identify the registration workflow for research data repositories. A further objective is to chart the present development of research data repositories in India. The study first examines the re3data.org registry framework and schema design, then explores the status of Indian research data repositories in the registry. Research data repositories are gaining wider relevance due to e-research concepts. The re3data.org registry, now available, is a good tool for users and researchers to identify appropriate research data repositories for their research requirements. In the Indian environment, a compatible National Research Data Policy is needed to boost the management of research data. A registry of research data repositories is a crucial tool for discovering specific information in a specific domain. Research data repositories in India have not previously been studied. Both the re3data.org registry and the status of Indian research data repositories are discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 293