Search results for: data mining analytics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25609

Search results for: data mining analytics

24379 Prototype of Over Dimension Over Loading (ODOL) Freight Transportation Monitoring System Based on Arduino Mega 'Sabrang': A Case Study in Klaten, Indonesia

Authors: Chairul Fajar, Muhammad Nur Hidayat, Muksalmina

Abstract:

The issue of Over Dimension Over Loading (ODOL) in Indonesia remains a significant challenge, causing traffic accidents, disrupting traffic flow, accelerating road damage, and potentially leading to bridge collapses. Klaten Regency, located on the slopes of Mount Merapi along the Woro River in Kemalang District, has potential Class C excavation materials such as sand and stone. Data from the Klaten Regency Transportation Department indicates that ODOL violations account for 72%, while non-violating vehicles make up only 28%. ODOL involves modifying factory-standard vehicles beyond the limits specified in the Type Test Registration Certificate (SRUT) to save costs and travel time. This study aims to develop a prototype ‘Sabrang’ monitoring system based on Arduino Mega to control and monitor ODOL freight transportation in the mining of Class C excavation materials in Klaten Regency. The prototype is designed to automatically measure the dimensions and weight of objects using a microcontroller. The data analysis techniques used in this study include the Normality Test and Paired T-Test, comparing sensor measurement results on scaled objects. The study results indicate differences in measurement validation under room temperature and ambient temperature conditions. Measurements at room temperature showed that the majority of H0 was accepted, meaning there was no significant difference in measurements when the prototype tool was used. Conversely, measurements at ambient temperature showed that the majority of H0 was rejected, indicating a significant difference in measurements when the prototype tool was used. In conclusion, the ‘Sabrang’ monitoring system prototype is effective for controlling ODOL, although measurement results are influenced by temperature conditions. This study is expected to assist in the monitoring and control of ODOL, thereby enhancing traffic safety and road infrastructure.

Keywords: over dimension over loading, prototype, microcontroller, Arduino, normality test, paired t-test

Procedia PDF Downloads 34
24378 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach

Authors: Sarisa Pinkham, Kanyarat Bussaban

Abstract:

The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.

Keywords: daily rainfall, image processing, approximation, pixel value data

Procedia PDF Downloads 387
24377 A Next-Generation Blockchain-Based Data Platform: Leveraging Decentralized Storage and Layer 2 Scaling for Secure Data Management

Authors: Kenneth Harper

Abstract:

The rapid growth of data-driven decision-making across various industries necessitates advanced solutions to ensure data integrity, scalability, and security. This study introduces a decentralized data platform built on blockchain technology to improve data management processes in high-volume environments such as healthcare and financial services. The platform integrates blockchain networks using Cosmos SDK and Polkadot Substrate alongside decentralized storage solutions like IPFS and Filecoin, and coupled with decentralized computing infrastructure built on top of Avalanche. By leveraging advanced consensus mechanisms, we create a scalable, tamper-proof architecture that supports both structured and unstructured data. Key features include secure data ingestion, cryptographic hashing for robust data lineage, and Zero-Knowledge Proof mechanisms that enhance privacy while ensuring compliance with regulatory standards. Additionally, we implement performance optimizations through Layer 2 scaling solutions, including ZK-Rollups, which provide low-latency data access and trustless data verification across a distributed ledger. The findings from this exercise demonstrate significant improvements in data accessibility, reduced operational costs, and enhanced data integrity when tested in real-world scenarios. This platform reference architecture offers a decentralized alternative to traditional centralized data storage models, providing scalability, security, and operational efficiency.

Keywords: blockchain, cosmos SDK, decentralized data platform, IPFS, ZK-Rollups

Procedia PDF Downloads 27
24376 The Effect of Measurement Distribution on System Identification and Detection of Behavior of Nonlinearities of Data

Authors: Mohammad Javad Mollakazemi, Farhad Asadi, Aref Ghafouri

Abstract:

In this paper, we considered and applied parametric modeling for some experimental data of dynamical system. In this study, we investigated the different distribution of output measurement from some dynamical systems. Also, with variance processing in experimental data we obtained the region of nonlinearity in experimental data and then identification of output section is applied in different situation and data distribution. Finally, the effect of the spanning the measurement such as variance to identification and limitation of this approach is explained.

Keywords: Gaussian process, nonlinearity distribution, particle filter, system identification

Procedia PDF Downloads 516
24375 Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R

Authors: Jaya Mathew

Abstract:

Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.

Keywords: predictive maintenance, machine learning, big data, cloud based, on premise solution, R

Procedia PDF Downloads 379
24374 Wireless Transmission of Big Data Using Novel Secure Algorithm

Authors: K. Thiagarajan, K. Saranya, A. Veeraiah, B. Sudha

Abstract:

This paper presents a novel algorithm for secure, reliable and flexible transmission of big data in two hop wireless networks using cooperative jamming scheme. Two hop wireless networks consist of source, relay and destination nodes. Big data has to transmit from source to relay and from relay to destination by deploying security in physical layer. Cooperative jamming scheme determines transmission of big data in more secure manner by protecting it from eavesdroppers and malicious nodes of unknown location. The novel algorithm that ensures secure and energy balance transmission of big data, includes selection of data transmitting region, segmenting the selected region, determining probability ratio for each node (capture node, non-capture and eavesdropper node) in every segment, evaluating the probability using binary based evaluation. If it is secure transmission resume with the two- hop transmission of big data, otherwise prevent the attackers by cooperative jamming scheme and transmit the data in two-hop transmission.

Keywords: big data, two-hop transmission, physical layer wireless security, cooperative jamming, energy balance

Procedia PDF Downloads 490
24373 One Step Further: Pull-Process-Push Data Processing

Authors: Romeo Botes, Imelda Smit

Abstract:

In today’s modern age of technology vast amounts of data needs to be processed in real-time to keep users satisfied. This data comes from various sources and in many formats, including electronic and mobile devices such as GPRS modems and GPS devices. They make use of different protocols including TCP, UDP, and HTTP/s for data communication to web servers and eventually to users. The data obtained from these devices may provide valuable information to users, but are mostly in an unreadable format which needs to be processed to provide information and business intelligence. This data is not always current, it is mostly historical data. The data is not subject to implementation of consistency and redundancy measures as most other data usually is. Most important to the users is that the data are to be pre-processed in a readable format when it is entered into the database. To accomplish this, programmers build processing programs and scripts to decode and process the information stored in databases. Programmers make use of various techniques in such programs to accomplish this, but sometimes neglect the effect some of these techniques may have on database performance. One of the techniques generally used,is to pull data from the database server, process it and push it back to the database server in one single step. Since the processing of the data usually takes some time, it keeps the database busy and locked for the period of time that the processing takes place. Because of this, it decreases the overall performance of the database server and therefore the system’s performance. This paper follows on a paper discussing the performance increase that may be achieved by utilizing array lists along with a pull-process-push data processing technique split in three steps. The purpose of this paper is to expand the number of clients when comparing the two techniques to establish the impact it may have on performance of the CPU storage and processing time.

Keywords: performance measures, algorithm techniques, data processing, push data, process data, array list

Procedia PDF Downloads 244
24372 Extreme Temperature Forecast in Mbonge, Cameroon Through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution

Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph

Abstract:

In this paper, temperature extremes are forecast by employing the block maxima method of the generalized extreme value (GEV) distribution to analyse temperature data from the Cameroon Development Corporation (CDC). By considering two sets of data (raw data and simulated data) and two (stationary and non-stationary) models of the GEV distribution, return levels analysis is carried out and it was found that in the stationary model, the return values are constant over time with the raw data, while in the simulated data the return values show an increasing trend with an upper bound. In the non-stationary model, the return levels of both the raw data and simulated data show an increasing trend with an upper bound. This clearly shows that although temperatures in the tropics show a sign of increase in the future, there is a maximum temperature at which there is no exceedance. The results of this paper are very vital in agricultural and environmental research.

Keywords: forecasting, generalized extreme value (GEV), meteorology, return level

Procedia PDF Downloads 478
24371 Impact of Stack Caches: Locality Awareness and Cost Effectiveness

Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang

Abstract:

Treating data based on its location in memory has received much attention in recent years due to its different properties, which offer important aspects for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One of the important aspects of stack data is that it has high spatial and temporal locality. In this work, we simulate non-unified cache design that split data cache into stack and non-stack caches in order to maintain stack data and non-stack data separate in different caches. We observe that the overall hit rate of non-unified cache design is sensitive to the size of non-stack cache. Then, we investigate the appropriate size and associativity for stack cache to achieve high hit ratio especially when over 99% of accesses are directed to stack cache. The result shows that on average more than 99% of stack cache accuracy is achieved by using 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding small, fixed, size of stack cache at level1 to unified cache architecture. The result shows that the overall hit rate of unified cache design with adding 1KB of stack cache is improved by approximately, on average, 3.9% for Rijndael benchmark. The stack cache is simulated by using SimpleScalar toolset.

Keywords: hit rate, locality of program, stack cache, stack data

Procedia PDF Downloads 303
24370 Autonomic Threat Avoidance and Self-Healing in Database Management System

Authors: Wajahat Munir, Muhammad Haseeb, Adeel Anjum, Basit Raza, Ahmad Kamran Malik

Abstract:

Databases are the key components of the software systems. Due to the exponential growth of data, it is the concern that the data should be accurate and available. The data in databases is vulnerable to internal and external threats, especially when it contains sensitive data like medical or military applications. Whenever the data is changed by malicious intent, data analysis result may lead to disastrous decisions. Autonomic self-healing is molded toward computer system after inspiring from the autonomic system of human body. In order to guarantee the accuracy and availability of data, we propose a technique which on a priority basis, tries to avoid any malicious transaction from execution and in case a malicious transaction affects the system, it heals the system in an isolated mode in such a way that the availability of system would not be compromised. Using this autonomic system, the management cost and time of DBAs can be minimized. In the end, we test our model and present the findings.

Keywords: autonomic computing, self-healing, threat avoidance, security

Procedia PDF Downloads 504
24369 Information Extraction Based on Search Engine Results

Authors: Mohammed R. Elkobaisi, Abdelsalam Maatuk

Abstract:

The search engines are the large scale information retrieval tools from the Web that are currently freely available to all. This paper explains how to convert the raw resulted number of search engines into useful information. This represents a new method for data gathering comparing with traditional methods. When a query is submitted for a multiple numbers of keywords, this take a long time and effort, hence we develop a user interface program to automatic search by taking multi-keywords at the same time and leave this program to collect wanted data automatically. The collected raw data is processed using mathematical and statistical theories to eliminate unwanted data and converting it to usable data.

Keywords: search engines, information extraction, agent system

Procedia PDF Downloads 430
24368 A Bibliometric Analysis of Ukrainian Research Articles on SARS-COV-2 (COVID-19) in Compliance with the Standards of Current Research Information Systems

Authors: Sabina Auhunas

Abstract:

These days in Ukraine, Open Science dramatically develops for the sake of scientists of all branches, providing an opportunity to take a more close look on the studies by foreign scientists, as well as to deliver their own scientific data to national and international journals. However, when it comes to the generalization of data on science activities by Ukrainian scientists, these data are often integrated into E-systems that operate inconsistent and barely related information sources. In order to resolve these issues, developed countries productively use E-systems, designed to store and manage research data, such as Current Research Information Systems that enable combining uncompiled data obtained from different sources. An algorithm for selecting SARS-CoV-2 research articles was designed, by means of which we collected the set of papers published by Ukrainian scientists and uploaded by August 1, 2020. Resulting metadata (document type, open access status, citation count, h-index, most cited documents, international research funding, author counts, the bibliographic relationship of journals) were taken from Scopus and Web of Science databases. The study also considered the info from COVID-19/SARS-CoV-2-related documents published from December 2019 to September 2020, directly from documents published by authors depending on territorial affiliation to Ukraine. These databases are enabled to get the necessary information for bibliometric analysis and necessary details: copyright, which may not be available in other databases (e.g., Science Direct). Search criteria and results for each online database were considered according to the WHO classification of the virus and the disease caused by this virus and represented (Table 1). First, we identified 89 research papers that provided us with the final data set after consolidation and removing duplication; however, only 56 papers were used for the analysis. The total number of documents by results from the WoS database came out at 21641 documents (48 affiliated to Ukraine among them) in the Scopus database came out at 32478 documents (41 affiliated to Ukraine among them). According to the publication activity of Ukrainian scientists, the following areas prevailed: Education, educational research (9 documents, 20.58%); Social Sciences, interdisciplinary (6 documents, 11.76%) and Economics (4 documents, 8.82%). The highest publication activity by institution types was reported in the Ministry of Education and Science of Ukraine (its percent of published scientific papers equals 36% or 7 documents), Danylo Halytsky Lviv National Medical University goes next (5 documents, 15%) and P. L. Shupyk National Medical Academy of Postgraduate Education (4 documents, 12%). Basically, research activities by Ukrainian scientists were funded by 5 entities: Belgian Development Cooperation, the National Institutes of Health (NIH, U.S.), The United States Department of Health & Human Services, grant from the Whitney and Betty MacMillan Center for International and Area Studies at Yale, a grant from the Yale Women Faculty Forum. Based on the results of the analysis, we obtained a set of published articles and preprints to be assessed on the variety of features in upcoming studies, including citation count, most cited documents, a bibliographic relationship of journals, reference linking. Further research on the development of the national scientific E-database continues using brand new analytical methods.

Keywords: content analysis, COVID-19, scientometrics, text mining

Procedia PDF Downloads 115
24367 Implementation and Performance Analysis of Data Encryption Standard and RSA Algorithm with Image Steganography and Audio Steganography

Authors: S. C. Sharma, Ankit Gambhir, Rajeev Arya

Abstract:

In today’s era data security is an important concern and most demanding issues because it is essential for people using online banking, e-shopping, reservations etc. The two major techniques that are used for secure communication are Cryptography and Steganography. Cryptographic algorithms scramble the data so that intruder will not able to retrieve it; however steganography covers that data in some cover file so that presence of communication is hidden. This paper presents the implementation of Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) Algorithm with Image and Audio Steganography and Data Encryption Standard (DES) Algorithm with Image and Audio Steganography. The coding for both the algorithms have been done using MATLAB and its observed that these techniques performed better than individual techniques. The risk of unauthorized access is alleviated up to a certain extent by using these techniques. These techniques could be used in Banks, RAW agencies etc, where highly confidential data is transferred. Finally, the comparisons of such two techniques are also given in tabular forms.

Keywords: audio steganography, data security, DES, image steganography, intruder, RSA, steganography

Procedia PDF Downloads 290
24366 Experiments on Weakly-Supervised Learning on Imperfect Data

Authors: Yan Cheng, Yijun Shao, James Rudolph, Charlene R. Weir, Beth Sahlmann, Qing Zeng-Treitler

Abstract:

Supervised predictive models require labeled data for training purposes. Complete and accurate labeled data, i.e., a ‘gold standard’, is not always available, and imperfectly labeled data may need to serve as an alternative. An important question is if the accuracy of the labeled data creates a performance ceiling for the trained model. In this study, we trained several models to recognize the presence of delirium in clinical documents using data with annotations that are not completely accurate (i.e., weakly-supervised learning). In the external evaluation, the support vector machine model with a linear kernel performed best, achieving an area under the curve of 89.3% and accuracy of 88%, surpassing the 80% accuracy of the training sample. We then generated a set of simulated data and carried out a series of experiments which demonstrated that models trained on imperfect data can (but do not always) outperform the accuracy of the training data, e.g., the area under the curve for some models is higher than 80% when trained on the data with an error rate of 40%. Our experiments also showed that the error resistance of linear modeling is associated with larger sample size, error type, and linearity of the data (all p-values < 0.001). In conclusion, this study sheds light on the usefulness of imperfect data in clinical research via weakly-supervised learning.

Keywords: weakly-supervised learning, support vector machine, prediction, delirium, simulation

Procedia PDF Downloads 199
24365 Transforming Healthcare Data Privacy: Integrating Blockchain with Zero-Knowledge Proofs and Cryptographic Security

Authors: Kenneth Harper

Abstract:

Blockchain technology presents solutions for managing healthcare data, addressing critical challenges in privacy, integrity, and access. This paper explores how privacy-preserving technologies, such as zero-knowledge proofs (ZKPs) and homomorphic encryption (HE), enhance decentralized healthcare platforms by enabling secure computations and patient data protection. An examination of the mathematical foundations of these methods, their practical applications, and how they meet the evolving demands of healthcare data security is unveiled. Using real-world examples, this research highlights industry-leading implementations and offers a roadmap for future applications in secure, decentralized healthcare ecosystems.

Keywords: blockchain, cryptography, data privacy, decentralized data management, differential privacy, healthcare, healthcare data security, homomorphic encryption, privacy-preserving technologies, secure computations, zero-knowledge proofs

Procedia PDF Downloads 18
24364 Operating Speed Models on Tangent Sections of Two-Lane Rural Roads

Authors: Dražen Cvitanić, Biljana Maljković

Abstract:

This paper presents models for predicting operating speeds on tangent sections of two-lane rural roads developed on continuous speed data. The data corresponds to 20 drivers of different ages and driving experiences, driving their own cars along an 18 km long section of a state road. The data were first used for determination of maximum operating speeds on tangents and their comparison with speeds in the middle of tangents i.e. speed data used in most of operating speed studies. Analysis of continuous speed data indicated that the spot speed data are not reliable indicators of relevant speeds. After that, operating speed models for tangent sections were developed. There was no significant difference between models developed using speed data in the middle of tangent sections and models developed using maximum operating speeds on tangent sections. All developed models have higher coefficient of determination then models developed on spot speed data. Thus, it can be concluded that the method of measuring has more significant impact on the quality of operating speed model than the location of measurement.

Keywords: operating speed, continuous speed data, tangent sections, spot speed, consistency

Procedia PDF Downloads 452
24363 Comparative Study Using WEKA for Red Blood Cells Classification

Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal, or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithm tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital-alaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.

Keywords: K-nearest neighbors algorithm, radial basis function neural network, red blood cells, support vector machine

Procedia PDF Downloads 410
24362 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in the data pre-processing. Missing values creates a noisy environment in all applications and it is an unavoidable problem in big data management and analysis. Numerous techniques likes discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques and hot deck imputation have been introduced by researchers for handling missing data. Among these, imputation techniques plays a positive role in filling missing values when it is necessary to use all records in the data and not to discard records with missing values. In this paper we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2(ART2) for imputation of missing values in mixed attribute data sets. The process of ART2 can recognize learned models fast and be adapted to new objects rapidly. It carries out model-based clustering by using competitive learning and self-steady mechanism in dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling the outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 274
24361 The Effect That the Data Assimilation of Qinghai-Tibet Plateau Has on a Precipitation Forecast

Authors: Ruixia Liu

Abstract:

Qinghai-Tibet Plateau has an important influence on the precipitation of its lower reaches. Data from remote sensing has itself advantage and numerical prediction model which assimilates RS data will be better than other. We got the assimilation data of MHS and terrestrial and sounding from GSI, and introduced the result into WRF, then got the result of RH and precipitation forecast. We found that assimilating MHS and terrestrial and sounding made the forecast on precipitation, area and the center of the precipitation more accurate by comparing the result of 1h,6h,12h, and 24h. Analyzing the difference of the initial field, we knew that the data assimilating about Qinghai-Tibet Plateau influence its lower reaches forecast by affecting on initial temperature and RH.

Keywords: Qinghai-Tibet Plateau, precipitation, data assimilation, GSI

Procedia PDF Downloads 234
24360 The Artificial Intelligence Driven Social Work

Authors: Avi Shrivastava

Abstract:

Our world continues to grapple with a lot of social issues. Economic growth and scientific advancements have not completely eradicated poverty, homelessness, discrimination and bias, gender inequality, health issues, mental illness, addiction, and other social issues. So, how do we improve the human condition in a world driven by advanced technology? The answer is simple: we will have to leverage technology to address some of the most important social challenges of the day. AI, or artificial intelligence, has emerged as a critical tool in the battle against issues that deprive marginalized and disadvantaged groups of the right to enjoy benefits that a society offers. Social work professionals can transform their lives by harnessing it. The lack of reliable data is one of the reasons why a lot of social work projects fail. Social work professionals continue to rely on expensive and time-consuming primary data collection methods, such as observation, surveys, questionnaires, and interviews, instead of tapping into AI-based technology to generate useful, real-time data and necessary insights. By leveraging AI’s data-mining ability, we can gain a deeper understanding of how to solve complex social problems and change lives of people. We can do the right work for the right people and at the right time. For example, AI can enable social work professionals to focus their humanitarian efforts on some of the world’s poorest regions, where there is extreme poverty. An interdisciplinary team of Stanford scientists, Marshall Burke, Stefano Ermon, David Lobell, Michael Xie, and Neal Jean, used AI to spot global poverty zones – identifying such zones is a key step in the fight against poverty. The scientists combined daytime and nighttime satellite imagery with machine learning algorithms to predict poverty in Nigeria, Uganda, Tanzania, Rwanda, and Malawi. In an article published by Stanford News, Stanford researchers use dark of night and machine learning, Ermon explained that they provided the machine-learning system, an application of AI, with the high-resolution satellite images and asked it to predict poverty in the African region. “The system essentially learned how to solve the problem by comparing those two sets of images [daytime and nighttime].” This is one example of how AI can be used by social work professionals to reach regions that need their aid the most. It can also help identify sources of inequality and conflict, which could reduce inequalities, according to Nature’s study, titled The role of artificial intelligence in achieving the Sustainable Development Goals, published in 2020. The report also notes that AI can help achieve 79 percent of the United Nation’s (UN) Sustainable Development Goals (SDG). AI is impacting our everyday lives in multiple amazing ways, yet some people do not know much about it. If someone is not familiar with this technology, they may be reluctant to use it to solve social issues. So, before we talk more about the use of AI to accomplish social work objectives, let’s put the spotlight on how AI and social work can complement each other.

Keywords: social work, artificial intelligence, AI based social work, machine learning, technology

Procedia PDF Downloads 102
24359 Positive Affect, Negative Affect, Organizational and Motivational Factor on the Acceptance of Big Data Technologies

Authors: Sook Ching Yee, Angela Siew Hoong Lee

Abstract:

Big data technologies have become a trend to exploit business opportunities and provide valuable business insights through the analysis of big data. However, there are still many organizations that have yet to adopt big data technologies especially small and medium organizations (SME). This study uses the technology acceptance model (TAM) to look into several constructs in the TAM and other additional constructs which are positive affect, negative affect, organizational factor and motivational factor. The conceptual model proposed in the study will be tested on the relationship and influence of positive affect, negative affect, organizational factor and motivational factor towards the intention to use big data technologies to produce an outcome. Empirical research is used in this study by conducting a survey to collect data.

Keywords: big data technologies, motivational factor, negative affect, organizational factor, positive affect, technology acceptance model (TAM)

Procedia PDF Downloads 362
24358 Big Data Analysis with Rhipe

Authors: Byung Ho Jung, Ji Eun Shin, Dong Hoon Lim

Abstract:

Rhipe that integrates R and Hadoop environment made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data. Experimental results for comparing the performance of our Rhipe with stats and biglm packages available on bigmemory, showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases. We also compared the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster. The results showed that fully-distributed mode was faster than pseudo-distributed mode, and computing speeds of fully-distributed mode were faster as the number of data nodes increases.

Keywords: big data, Hadoop, Parallel regression analysis, R, Rhipe

Procedia PDF Downloads 497
24357 Security in Resource Constraints Network Light Weight Encryption for Z-MAC

Authors: Mona Almansoori, Ahmed Mustafa, Ahmad Elshamy

Abstract:

Wireless sensor network was formed by a combination of nodes, systematically it transmitting the data to their base stations, this transmission data can be easily compromised if the limited processing power and the data consistency from these nodes are kept in mind; there is always a discussion to address the secure data transfer or transmission in actual time. This will present a mechanism to securely transmit the data over a chain of sensor nodes without compromising the throughput of the network by utilizing available battery resources available in the sensor node. Our methodology takes many different advantages of Z-MAC protocol for its efficiency, and it provides a unique key by sharing the mechanism using neighbor node MAC address. We present a light weighted data integrity layer which is embedded in the Z-MAC protocol to prove that our protocol performs well than Z-MAC when we introduce the different attack scenarios.

Keywords: hybrid MAC protocol, data integrity, lightweight encryption, neighbor based key sharing, sensor node dataprocessing, Z-MAC

Procedia PDF Downloads 144
24356 Survival Data with Incomplete Missing Categorical Covariates

Authors: Madaki Umar Yusuf, Mohd Rizam B. Abubakar

Abstract:

The survival censored data with incomplete covariate data is a common occurrence in many studies in which the outcome is survival time. With model when the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM by the method of weights. The survival outcome for the class of generalized linear model is applied and this method requires the estimation of the parameters of the distribution of the covariates. In this paper, we propose some clinical trials with ve covariates, four of which have some missing values which clearly show that they were fully censored data.

Keywords: EM algorithm, incomplete categorical covariates, ignorable missing data, missing at random (MAR), Weibull Distribution

Procedia PDF Downloads 405
24355 A Study of Blockchain Oracles

Authors: Abdeljalil Beniiche

Abstract:

The limitation with smart contracts is that they cannot access external data that might be required to control the execution of business logic. Oracles can be used to provide external data to smart contracts. An oracle is an interface that delivers data from external data outside the blockchain to a smart contract to consume. Oracle can deliver different types of data depending on the industry and requirements. In this paper, we study and describe the widely used blockchain oracles. Then, we elaborate on his potential role, technical architecture, and design patterns. Finally, we discuss the human oracle and its key role in solving the truth problem by reaching a consensus about a certain inquiry and tasks.

Keywords: blockchain, oracles, oracles design, human oracles

Procedia PDF Downloads 136
24354 Building a Transformative Continuing Professional Development Experience for Educators through a Principle-Based, Technological-Driven Knowledge Building Approach: A Case Study of a Professional Learning Team in Secondary Education

Authors: Melvin Chan, Chew Lee Teo

Abstract:

There has been a growing emphasis in elevating the teachers’ proficiency and competencies through continuing professional development (CPD) opportunities. In this era of a Volatile, Uncertain, Complex, Ambiguous (VUCA) world, teachers are expected to be collaborative designers, critical thinkers and creative builders. However, many of the CPD structures are still revolving in the model of transmission, which stands in contradiction to the cultivation of future-ready teachers for the innovative world of emerging technologies. This article puts forward the framing of CPD through a Principle-Based, Technological-Driven Knowledge Building Approach grounded in the essence of andragogy and progressive learning theories where growth is best exemplified through an authentic immersion in a social/community experience-based setting. Putting this Knowledge Building Professional Development Model (KBPDM) in operation via a Professional Learning Team (PLT) situated in a Secondary School in Singapore, research findings reveal that the intervention has led to a fundamental change in the learning paradigm of the teachers, henceforth equipping and empowering them successfully in their pedagogical design and practices for a 21st century classroom experience. This article concludes with the possibility in leveraging the Learning Analytics to deepen the CPD experiences for educators.

Keywords: continual professional development, knowledge building, learning paradigm, principle-based

Procedia PDF Downloads 130
24353 A Fast Community Detection Algorithm

Authors: Chung-Yuan Huang, Yu-Hsiang Fu, Chuen-Tsai Sun

Abstract:

Community detection represents an important data-mining tool for analyzing and understanding real-world complex network structures and functions. We believe that at least four criteria determine the appropriateness of a community detection algorithm: (a) it produces useable normalized mutual information (NMI) and modularity results for social networks, (b) it overcomes resolution limitation problems associated with synthetic networks, (c) it produces good NMI results and performance efficiency for Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks, and (d) it produces good modularity and performance efficiency for large-scale real-world complex networks. To our knowledge, no existing community detection algorithm meets all four criteria. In this paper, we describe a simple hierarchical arc-merging (HAM) algorithm that uses network topologies and rule-based arc-merging strategies to identify community structures that satisfy the criteria. We used five well-studied social network datasets and eight sets of LFR benchmark networks to validate the ground-truth community correctness of HAM, eight large-scale real-world complex networks to measure its performance efficiency, and two synthetic networks to determine its susceptibility to resolution limitation problems. Our results indicate that the proposed HAM algorithm is capable of providing satisfactory performance efficiency and that HAM-identified communities were close to ground-truth communities in social and LFR benchmark networks while overcoming resolution limitation problems.

Keywords: complex network, social network, community detection, network hierarchy

Procedia PDF Downloads 227
24352 Multi Data Management Systems in a Cluster Randomized Trial in Poor Resource Setting: The Pneumococcal Vaccine Schedules Trial

Authors: Abdoullah Nyassi, Golam Sarwar, Sarra Baldeh, Mamadou S. K. Jallow, Bai Lamin Dondeh, Isaac Osei, Grant A. Mackenzie

Abstract:

A randomized controlled trial is the "gold standard" for evaluating the efficacy of an intervention. Large-scale, cluster-randomized trials are expensive and difficult to conduct, though. To guarantee the validity and generalizability of findings, high-quality, dependable, and accurate data management systems are necessary. Robust data management systems are crucial for optimizing and validating the quality, accuracy, and dependability of trial data. Regarding the difficulties of data gathering in clinical trials in low-resource areas, there is a scarcity of literature on this subject, which may raise concerns. Effective data management systems and implementation goals should be part of trial procedures. Publicizing the creative clinical data management techniques used in clinical trials should boost public confidence in the study's conclusions and encourage further replication. In the ongoing pneumococcal vaccine schedule study in rural Gambia, this report details the development and deployment of multi-data management systems and methodologies. We implemented six different data management, synchronization, and reporting systems using Microsoft Access, RedCap, SQL, Visual Basic, Ruby, and ASP.NET. Additionally, data synchronization tools were developed to integrate data from these systems into the central server for reporting systems. Clinician, lab, and field data validation systems and methodologies are the main topics of this report. Our process development efforts across all domains were driven by the complexity of research project data collected in real-time data, online reporting, data synchronization, and ways for cleaning and verifying data. Consequently, we effectively used multi-data management systems, demonstrating the value of creative approaches in enhancing the consistency, accuracy, and reporting of trial data in a poor resource setting.

Keywords: data management, data collection, data cleaning, cluster-randomized trial

Procedia PDF Downloads 27
24351 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 278
24350 An Efficient Traceability Mechanism in the Audited Cloud Data Storage

Authors: Ramya P, Lino Abraham Varghese, S. Bose

Abstract:

By cloud storage services, the data can be stored in the cloud, and can be shared across multiple users. Due to the unexpected hardware/software failures and human errors, which make the data stored in the cloud be lost or corrupted easily it affected the integrity of data in cloud. Some mechanisms have been designed to allow both data owners and public verifiers to efficiently audit cloud data integrity without retrieving the entire data from the cloud server. But public auditing on the integrity of shared data with the existing mechanisms will unavoidably reveal confidential information such as identity of the person, to public verifiers. Here a privacy-preserving mechanism is proposed to support public auditing on shared data stored in the cloud. It uses group signatures to compute verification metadata needed to audit the correctness of shared data. The identity of the signer on each block in shared data is kept confidential from public verifiers, who are easily verifying shared data integrity without retrieving the entire file. But on demand, the signer of the each block is reveal to the owner alone. Group private key is generated once by the owner in the static group, where as in the dynamic group, the group private key is change when the users revoke from the group. When the users leave from the group the already signed blocks are resigned by cloud service provider instead of owner is efficiently handled by efficient proxy re-signature scheme.

Keywords: data integrity, dynamic group, group signature, public auditing

Procedia PDF Downloads 392