Search results for: big data types. big data ecosystem
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28449

28179 Scientific Linux Cluster for BIG-DATA Analysis (SLBD): A Case of Fayoum University

Authors: Hassan S. Hussein, Rania A. Abul Seoud, Amr M. Refaat

Abstract:

Scientific researchers face the analysis of very large data sets, which are growing at a noticeable rate with today’s and tomorrow’s technologies. Hadoop and Spark are software frameworks developed for this purpose. The Hadoop framework is suitable for many different hardware platforms. In this research, a scientific Linux cluster for Big Data analysis (SLBD) is presented. SLBD runs open-source software on a high-performance cluster infrastructure with large computational capacity. SLBD is composed of one cluster of identical, commodity-grade computers interconnected via a small LAN. It consists of a fast switch and Gigabit Ethernet cards that connect four nodes. Cloudera Manager is used to configure and manage an Apache Hadoop stack. Hadoop is a framework that allows storing and processing big data across the cluster using the MapReduce algorithm. The MapReduce algorithm divides a task into smaller tasks that are assigned to the network nodes. The algorithm then collects the results and forms the final result dataset. The SLBD clustering system allows fast and efficient processing of large amounts of data resulting from different applications. SLBD also provides high performance, high throughput, high availability, expandability, and cluster scalability.
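
The divide-and-collect flow that the abstract attributes to MapReduce can be illustrated with a small single-process sketch. This is not the SLBD or Hadoop implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the word-count task are assumptions chosen only for illustration, and each input chunk stands in for the sub-task a cluster node would receive.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: three chunks, one per imagined worker node.
counts = map_reduce(["big data cluster", "data node", "big cluster"])
```

On a real cluster the map calls would run in parallel on different nodes and the shuffle would move data over the network; the sketch only shows how the partial results are merged into the final dataset.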

Keywords: big data platforms, cloudera manager, Hadoop, MapReduce

Procedia PDF Downloads 332
28178 A Proposal for a Secure and Interoperable Data Framework for Energy Digitalization

Authors: Hebberly Ahatlan

Abstract:

The process of digitizing energy systems involves transforming traditional energy infrastructure into interconnected, data-driven systems that enhance efficiency, sustainability, and responsiveness. As smart grids become increasingly integral to the efficient distribution and management of electricity from both fossil and renewable energy sources, the energy industry faces strategic challenges associated with digitalization and interoperability — particularly in the context of modern energy business models, such as virtual power plants (VPPs). The critical challenge in modern smart grids is to seamlessly integrate diverse technologies and systems, including virtualization, grid computing and service-oriented architecture (SOA), across the entire energy ecosystem. Achieving this requires addressing issues like semantic interoperability, IT/OT convergence, and digital asset scalability, all while ensuring security and risk management. This paper proposes a four-layer digitalization framework to tackle these challenges, encompassing persistent data protection, trusted key management, secure messaging, and authentication of IoT resources. Data assets generated through this framework enable AI systems to derive insights for improving smart grid operations, security, and revenue generation. Furthermore, this paper also proposes a Trusted Energy Interoperability Alliance as a universal guiding standard in the development of this digitalization framework to support more dynamic and interoperable energy markets.

Keywords: digitalization, IT/OT convergence, semantic interoperability, VPP, energy blockchain

Procedia PDF Downloads 134
28177 Investigative Study to Analyze the Impact of Incubator Practices on the Performance of Pakistani Incubation Centers

Authors: Sadaf Zahra Usman

Abstract:

Business incubation has become a pervasive phenomenon in numerous parts of the world and is seen as a tool for creating a startup ecosystem. The greatest barriers to the advancement of business incubation centers are the need for an entrepreneurial ecosystem and the underdeveloped financial assistance and angel investor networks for startup firms. Business incubation helps in creating successful startup ventures by providing administrative support services and assistance in creating their ventures. We classify incubators into three categories: university incubation centers (UICs), private incubators (PICs), and government incubator centers (GICs), and measure the influence of different types of business incubation practices on their performance using a survey questionnaire administered to incubation managers across Pakistan. The analysis is conducted on eight business incubators. Results suggest that the quality of incubation centers is extremely important in this regard. The research is expected to help policymakers, government officials, and incubation management utilize business incubation more effectively to “hatch” innovation-based entrepreneurial development.

Keywords: entrepreneurship, unemployment, startups, economy, business incubation practice

Procedia PDF Downloads 51
28176 Detection of Change Points in Earthquakes Data: A Bayesian Approach

Authors: F. A. Al-Awadhi, D. Al-Hulail

Abstract:

In this study, we applied the Bayesian hierarchical model to detect single and multiple change points for daily earthquake body wave magnitude. The change point analysis is used in both backward (off-line) and forward (on-line) statistical research. In this study, it is used with the backward approach. Different types of change parameters are considered (mean, variance, or both). The posterior model and the conditional distributions for single and multiple change points are derived and implemented using BUGS software. The model is applicable to any set of data. The sensitivity of the model is tested using different prior and likelihood functions. Using Mb data, we concluded that between January 2002 and December 2003, three changes occurred in the mean magnitude of Mb in Kuwait and its vicinity.

Keywords: multiple change points, Markov Chain Monte Carlo, earthquake magnitude, hierarchical Bayesian model

Procedia PDF Downloads 428
28175 Time-Series Load Data Analysis for User Power Profiling

Authors: Mahdi Daghmhehci Firoozjaei, Minchang Kim, Dima Alhadidi

Abstract:

In this paper, we present a power profiling model for smart grid consumers based on real-time load data acquired from smart meters. It profiles consumers’ power consumption behaviour using the dynamic time warping (DTW) clustering algorithm. Because this algorithm is invariant to signal warping, time-disordered load data can be profiled and consumption features can be extracted. Two load types are defined, and the related load patterns are extracted for classifying consumption behaviour by DTW. The classification methodology is discussed in detail. To evaluate the performance of the method, we analyze the time-series load data measured by a smart meter in a real case. The results verify the effectiveness of the proposed profiling method, with a 90.91% true positive rate for load type clustering in the best case.
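
The warping invariance mentioned in the abstract comes from how the DTW distance is computed. The following is a minimal sketch of the classic dynamic-programming form of DTW, not the authors' profiling model; the function name `dtw_distance`, the absolute-difference local cost, and the toy load curves are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses the absolute difference as the local cost (an assumption; other
    local costs are possible).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches the current b[j-1]
                cost[i][j - 1],      # b[j-1] also matches the current a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical load curves: the second is the first with a repeated reading.
same_shape = dtw_distance([1, 2, 3], [1, 1, 2, 3])
different_level = dtw_distance([0, 0], [1, 1])
```

Because one sample may be matched to several samples of the other series, a stretched or delayed copy of a load pattern stays at distance zero from the original, which is why time-shifted consumption curves can still be grouped under the same load type.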

Keywords: power profiling, user privacy, dynamic time warping, smart grid

Procedia PDF Downloads 107
28174 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in population studies are treated by statistical methods; although these methods are robust, they are not sufficient in themselves to harness the potential wealth of the data. To that end, we used two learning algorithms: Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 513
28173 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analyzing data to predict helpful information. It is a field of research that solves various types of problems. In data mining, classification is an important technique for classifying different kinds of data. Diabetes is one of the most common diseases. This paper implements different classification techniques using the Waikato Environment for Knowledge Analysis (WEKA) on a diabetes dataset and finds which algorithm is most suitable. The best classification algorithm on the diabetic data is Naïve Bayes. Naïve Bayes achieves an accuracy of 76.31% and takes 0.06 seconds to build the model.

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 121
28172 The Triple Threat: Microplastic, Nanoplastic, and Macroplastic Pollution and Their Cumulative Impacts on Marine Ecosystem

Authors: Tabugbo B. Ifeyinwa, Josephat O. Ogbuagu, Okeke A. Princewill, Victor C. Eze

Abstract:

The increasing amount of plastic pollution in maritime settings poses a substantial risk to the functioning of ecosystems and the preservation of biodiversity. This comprehensive analysis combines the most recent data on the environmental effects of pollution from macroplastics, microplastics, and nanoplastics within marine ecosystems. Our goal is to provide a comprehensive understanding of the cumulative impacts that plastic waste accumulates on marine life by outlining the origins, processes, and ecological repercussions connected with each size category of plastic debris. Microplastics and nanoplastics have more sneaky effects that are controlled by chemicals. These effects can get through biological barriers and affect the health of cells and the whole body. Compared to macroplastics, which primarily contribute to physical harm through entanglement and ingestion by marine fauna, microplastics, and nanoplastics are associated with non-physical effects. The review underlines a vital need for research that crosses disciplinary boundaries to untangle the intricate interactions that the various sizes of plastic pollution have with marine animals, evaluate the long-term ecological repercussions, and identify effective measures for mitigating the effects of plastic pollution. Additionally, we urge governmental interventions and worldwide cooperation to solve this pervasive environmental concern. Specifically, we identify significant knowledge gaps in the detection and effect assessment of nanoplastics. To protect marine biodiversity and preserve ecosystem services, this review highlights how urgent it is to address the broad spectrum of plastic pollution.

Keywords: macroplastic pollution, marine ecosystem, microplastic pollution, nanoplastic pollution

Procedia PDF Downloads 25
28171 Utilising an Online Data Collection Platform for the Development of a Community Engagement Database: A Case Study on Building Inter-Institutional Partnerships at UWC

Authors: P. Daniels, T. Adonis, P. September-Brown, R. Comalie

Abstract:

The community engagement unit at the University of the Western Cape was tasked with establishing a community engagement database. The database would store information of all community engagement projects related to the university. The wealth of knowledge obtained from the various disciplines would be used to facilitate interdisciplinary collaboration within the university, as well as facilitating community university partnership opportunities. The purpose of this qualitative study was to explore electronic data collection through the development of a database. Two types of electronic data collection platforms were used, namely online questionnaire and email. The semi structured questionnaire was used to collect data related to community engagement projects from different faculties and departments at the university. There are many benefits for using an electronic data collection platform, such as reduction of costs and time, ease in reaching large numbers of potential respondents, and the possibility of providing anonymity to participants. Despite all the advantages of using the electronic platform, there were as many challenges, as depicted in our findings. The findings suggest that certain barriers existed by using an electronic platform for data collection, even though it was in an academic environment, where knowledge and resources were in abundance. One of the challenges experienced in this process was the lack of dissemination of information via email to staff within faculties. The actual online software used for the questionnaire had its own limitations, such as only being able to access the questionnaire from the same electronic device. In a few cases, academics only completed the questionnaire after a telephonic prompt or face to face meeting about "Is higher education in South Africa ready to embrace electronic platform in data collection?"

Keywords: community engagement, database, data collection, electronic platform, electronic tools, knowledge sharing, university

Procedia PDF Downloads 235
28170 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation is totally dependent on technology that uses data as its fuel. The present study is all about innovations and developments in data science and gives an idea about how efficiently to use the data provided. This study will help to understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing in which the main principle was to create an artificial system that can run independently of human-given programs and can function with the help of analyzing data to understand the requirements of the users. Data science comprises business understanding, analyzing data, ethical concerns, understanding programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we have covered a part of data science, i.e., machine learning. Machine learning uses data science for its work. Machines learn through their experience, which helps them to do any work more efficiently. This article includes a comparative study image between human understanding and machine understanding, advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the life of human beings. Since the advent of data science, we have found its benefits and how it leads to a better understanding of people, and how it cherishes individual needs. It has improved business strategies, services provided by them, forecasting, the ability to attend sustainable developments, etc. This study also focuses on a better understanding of data science which will help us to create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 45
28169 Utilization of Biodiversity of Peaces Herbals Used as Food and Treat the Path of Economic Phu Sing District in Sisaket Province Thailand

Authors: Nopparet Thammasaranyakun

Abstract:

The objectives of this research are: (1) to study the biodiversity of medicinal plants used for food and medicinal tourism economies along the Phu Sing district, Sisaket province; (2) to study the use of medicinal plants used for food and medicinal tourism economies along the Phu Sing district, Sisaket province; (3) to provide a database of information on biodiversity for food and medicinal plants and medicinal tourism economies along the Phu Sing district, Sisaket province; and (4) to create learning about the biodiversity of medicinal plants used as food and treatment along the economic route of Phu Sing district, Sisaket province. The study area was the Phu Sing district. The population included the Agricultural Development Center, rayong Mun due to the initiative, local youth, government health officials, community leaders, teachers, students, schools, local people, and tourists, as well as local sages who know the herbs and the OTOP women's groups of Phu Sing district in Sisaket province, chosen by purposive selection. The study followed a participatory action research (PAR) process, a form of community-based research. Qualitative data were collected from the community context and community areas through interviews, taped recordings, observation, and focus groups, and were analyzed using descriptive statistics. The findings are: (1) The study of the biodiversity of plants used for food and medicinal tourism economies along the Phu Sing district, Sisaket province, conducted in the dry season and the rainy season, found 251 species of medicinal plants in 41 types of drugs. (2) The study of medicinal plants used as food and in indigenous treatment in Phu Sing, Sisaket province, found 251 species with medicinal properties that are used for food and medicinal purposes, in 41 types of drugs. (3) The biodiversity database of food and medicinal plants used in local treatment in Phu Sing district, Sisaket province, covers 251 medicinal species in 41 types of drugs used for food and medicinal properties in Sisaket province. (4) Learning about the biodiversity of medicinal plants used for food and medicinal tourism economies along the Phu Sing district, Sisaket province.

Keywords: utilization of biodiversity, peaces herbals, used as Food, Sing district, sisaket

Procedia PDF Downloads 331
28168 Effects of Nitrogen Addition on Litter Decomposition and Nutrient Release in a Temperate Grassland in Northern China

Authors: Lili Yang, Jirui Gong, Qinpu Luo, Min Liu, Bo Yang, Zihe Zhang

Abstract:

Anthropogenic activities have increased nitrogen (N) inputs to grassland ecosystems. Knowledge of the impact of N addition on litter decomposition is critical to understand ecosystem carbon cycling and their responses to global climate change. The aim of this study was to investigate the effects of N addition and litter types on litter decomposition of a semi-arid temperate grassland during growing and non-growing seasons in Inner Mongolia, northern China, and to identify the relation between litter decomposition and C: N: P stoichiometry in the litter-soil continuum. Six levels of N addition were conducted: CK, N1 (0 g Nm−2 yr−1), N2 (2 g Nm−2 yr−1), N3 (5 g Nm−2 yr−1), N4 (10 g Nm−2 yr−1) and N5 (25 g Nm−2 yr−1). Litter decomposition rates and nutrient release differed greatly among N addition gradients and litter types. N addition promoted litter decomposition of S. grandis, but exhibited no significant influence on L. chinensis litter, indicating that the S. grandis litter decomposition was more sensitive to N addition than L. chinensis. The critical threshold for N addition to promote mixed litter decomposition was 10 -25g Nm−2 yr−1. N addition altered the balance of C: N: P stoichiometry between litter, soil and microbial biomass. During decomposition progress, the L. chinensis litter N: P was higher in N2-N4 plots compared to CK, while the S. grandis litter C: N was lower in N3 and N4 plots, indicating that litter N or P content doesn’t satisfy microbial decomposers with the increasing of N addition. As a result, S. grandis litter exhibited net N immobilization, while L. chinensis litter net P immobilization. Mixed litter C: N: P stoichiometry satisfied the demand of microbial decomposers, showed net mineralization during the decomposition process. With the increasing N deposition in the future, mixed litter would potentially promote C and nutrient cycling in grassland ecosystem by increasing litter decomposition and nutrient release.

Keywords: C: N: P stoichiometry, litter decomposition, nitrogen addition, nutrient release

Procedia PDF Downloads 455
28167 Implication to Environmental Education of Indigenous Knowledge and the Ecosystem of Upland Farmers in Aklan, Philippines

Authors: Emily Arangote

Abstract:

This paper defined the association between indigenous knowledge, cultural practices, and the ecosystem, and its implications for the environmental education of farmers. Farmers recognize the need for sustainability of the ecosystem they inhabit. The cultural practices of farmers, such as the use of indigenous pest control, the use of insect-repellent plants, and soil management practices that suppress diseases and harmful pests and conserve soil moisture, are deemed to be ecologically friendly. Indigenous plant materials that were more drought- and pest-resistant were grown. Crop rotation was implemented with various crop seeds to increase their disease resistance. Multi-cropping, planting of perennial crops, categorization of soil and planting of appropriate crops, planting of appropriate and leguminous crops, allotting land as watershed, and preserving traditional palay seed varieties were found to be beneficial in preserving the environment. The study also found that indigenous knowledge about crops is still relevant and useful to the current generation. This ensured the sustainability of our environment, and it is incumbent on policy makers and educators to support and preserve this knowledge for generations yet to come.

Keywords: cultural practices, ecosystem, environmental education, indigenous knowledge

Procedia PDF Downloads 295
28166 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population consisted of patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and Pearson's chi-square coefficient was the basis of decision making for data ranking. The neural network was trained using the back-propagation learning technique. Results: Based on the SPSS analysis used to select the top factors, 13 effective factors were selected. The data were combined in various forms across different runs, so different models were built for training and testing the networks; in all of these modes, the network correctly predicted 100% of cases. Conclusion: Using data mining methods before designing the structure of the system, in order to reduce the data dimension and select the data optimally, leads to a more accurate system. Therefore, given the nature of medical data, data mining approaches are necessary.

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 244
28165 An Alternative Credit Scoring System in China’s Consumer Lending Market: A System Based on Digital Footprint Data

Authors: Minjuan Sun

Abstract:

Ever since the late 1990s, China has experienced explosive growth in consumer lending, especially in short-term consumer loans, among which, the growth rate of non-bank lending has surpassed bank lending due to the development in financial technology. On the other hand, China does not have a universal credit scoring and registration system that can guide lenders during the processes of credit evaluation and risk control, for example, an individual’s bank credit records are not available for online lenders to see and vice versa. Given this context, the purpose of this paper is three-fold. First, we explore if and how alternative digital footprint data can be utilized to assess borrower’s creditworthiness. Then, we perform a comparative analysis of machine learning methods for the canonical problem of credit default prediction. Finally, we analyze, from an institutional point of view, the necessity of establishing a viable and nationally universal credit registration and scoring system utilizing online digital footprints, so that more people in China can have better access to the consumption loan market. Two different types of digital footprint data are utilized to match with bank’s loan default records. Each separately captures distinct dimensions of a person’s characteristics, such as his shopping patterns and certain aspects of his personality or inferred demographics revealed by social media features like profile image and nickname. We find both datasets can generate either acceptable or excellent prediction results, and different types of data tend to complement each other to get better performances. 
Typically, the traditional types of data banks normally use like income, occupation, and credit history, update over longer cycles, hence they can’t reflect more immediate changes, like the financial status changes caused by the business crisis; whereas digital footprints can update daily, weekly, or monthly, thus capable of providing a more comprehensive profile of the borrower’s credit capabilities and risks. From the empirical and quantitative examination, we believe digital footprints can become an alternative information source for creditworthiness assessment, because of their near-universal data coverage, and because they can by and large resolve the "thin-file" issue, due to the fact that digital footprints come in much larger volume and higher frequency.

Keywords: credit score, digital footprint, Fintech, machine learning

Procedia PDF Downloads 131
28164 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With increased Internet Communication Technology (ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has come in tandem with an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in the United States courts for failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective negates the fact that data insecurity may result in harms that run counter to the functions of privacy in our lives: the promotion of liberty, selfhood, autonomy, and human social relations, and the furtherance of the existence of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 166
28163 An Approach for Ensuring Data Flow in Freight Delivery and Management Systems

Authors: Aurelija Burinskienė, Dalė Dzemydienė, Arūnas Miliauskas

Abstract:

This research aims at developing the approach for more effective freight delivery and transportation process management. The road congestions and the identification of causes are important, as well as the context information recognition and management. The measure of many parameters during the transportation period and proper control of driver work became the problem. The number of vehicles per time unit passing at a given time and point for drivers can be evaluated in some situations. The collection of data is mainly used to establish new trips. The flow of the data is more complex in urban areas. Herein, the movement of freight is reported in detail, including the information on street level. When traffic density is extremely high in congestion cases, and the traffic speed is incredibly low, data transmission reaches the peak. Different data sets are generated, which depend on the type of freight delivery network. There are three types of networks: long-distance delivery networks, last-mile delivery networks and mode-based delivery networks; the last one includes different modes, in particular, railways and other networks. When freight delivery is switched from one type of the above-stated network to another, more data could be included for reporting purposes and vice versa. In this case, a significant amount of these data is used for control operations, and the problem requires an integrated methodological approach. The paper presents an approach for providing e-services for drivers by including the assessment of the multi-component infrastructure needed for delivery of freights following the network type. The construction of such a methodology is required to evaluate data flow conditions and overloads, and to minimize the time gaps in data reporting. 
The results obtained show the potential of the proposed methodological approach to support management and decision-making processes by incorporating networking specifics and helping to minimize overloads in data reporting.

Keywords: transportation networks, freight delivery, data flow, monitoring, e-services

Procedia PDF Downloads 99
28162 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.
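
The per-grade ROC-AUC analysis described above can be sketched as a one-vs-rest computation followed by an average. The paper does not state how its mean AUC was obtained, so the unweighted mean over classes used here is an assumption, as are the function names `binary_auc` and `macro_ovr_auc` and the toy three-grade data.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, labels, classes):
    """Unweighted mean of one-vs-rest AUCs, one per class (assumed averaging)."""
    aucs = []
    for idx, cls in enumerate(classes):
        scores = [row[idx] for row in prob_rows]           # predicted P(class = cls)
        binary = [1 if y == cls else 0 for y in labels]    # cls versus all others
        aucs.append(binary_auc(scores, binary))
    return sum(aucs) / len(aucs)


# Hypothetical predictions for three grades; each row is [P(A), P(B), P(C)].
classes = ["A", "B", "C"]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]
grades = ["A", "B", "C", "A"]
mean_auc = macro_ovr_auc(probs, grades, classes)
```

An AUC of 0.5 corresponds to random ranking, which helps put the reported mean scores of 0.4377 and 0.6149 in context.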

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 16
28161 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that ceteris paribus countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, scale of economics

Procedia PDF Downloads 37
28160 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays, multimedia data are used to store secure information. Previous methods allocate space in the image for data embedding after encryption. In this paper, we propose a novel method that reserves space in the image, surrounded by a boundary, before encryption with a traditional RDH algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real-time performance, that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves efficiency tenfold compared with the other processes discussed.

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 375
28159 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We use gene data that were initially collected for autism detection to determine whether, and how accurately, these data can be used for identification applications. Our main goal is to find out whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate remains close to optimal at higher noise levels, up to a standard deviation of 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
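The evaluation described above, nearest-neighbor identification of subjects from a noise-corrupted test set, can be sketched as follows. The abstract does not specify the distance metric, the value of k, or the preprocessing, so Euclidean distance, k = 1, and the three hypothetical subject vectors are assumptions for illustration only.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=1):
    """Return the majority label among the k nearest training vectors."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def add_gaussian_noise(samples, sigma, rng):
    """Corrupt every feature with zero-mean Gaussian noise of std sigma."""
    return [[v + rng.gauss(0.0, sigma) for v in s] for s in samples]

def classification_rate(train_x, train_y, test_x, test_y, k=1):
    """Fraction of test vectors assigned to the correct subject."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Hypothetical enrolled vectors, one per subject (not the paper's data).
train_x = [[0.0] * 5, [50.0] * 5, [100.0] * 5]
train_y = ["s1", "s2", "s3"]

rng = random.Random(0)  # fixed seed so the run is repeatable
for sigma in (0.0, 1.0, 3.0):
    test_x = add_gaussian_noise(train_x, sigma, rng)
    rate = classification_rate(train_x, train_y, test_x, train_y)
    print(f"sigma={sigma}: classification rate={rate:.2f}")
```

Sweeping `sigma` in this way is one simple means of checking how quickly the classification rate degrades as the test data become noisier.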

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 225
28158 A Review on Intelligent Systems for Geoscience

Authors: R. Palson Kennedy, P. Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and the geosciences. To meet that need, it presents a review from the data life cycle perspective. Numerous facets of the geosciences present unique difficulties for the study of intelligent systems. Geoscience data are notoriously difficult to analyze since they are frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses the essential concepts and theoretical underpinnings of data science, while the second half covers key themes and shared experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science

Procedia PDF Downloads 113
28157 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. 
By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by providing better insight into the business through better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This helps in identifying and addressing data quality problems quickly, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, such as data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing its experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 94
28156 An Application of Contingent Valuation Method in Valuing Protected Area: A Case Study of Pulau Kukup National Parks

Authors: A. Mukrimah, M. Mohd Parid, H. F. Lim

Abstract:

Wetland ecosystems have valuable resources that contribute to national income generation and public well-being, either directly through resources that have a market value or indirectly through resources that have no market value. An economic approach is used to evaluate these resources and determine the best use of wetland resources, and it should be emphasized in policy development planning. This approach helps prevent imbalance in the allocation of resources and welfare benefits. A case study was conducted in 2016 to assess the economic value of wetland ecosystem services at Pulau Kukup National Parks (PKNP). This study applied a dichotomous choice survey design within the Contingent Valuation Method (CVM) to investigate empirically the public's willingness-to-pay (WTP). The study interviewed 400 household respondents at Pontian, Johor. Analysis showed that 81% of the households interviewed were willing to contribute to the Wetland Conservation Trust Fund. The results also indicated that, on average, a household was willing to pay RM87 annually. Taking into account the 21,664 households in Pontian district in 2016, the public’s contribution to conserving the wetland ecosystem at PKNP was calculated to be RM1,884,334. The public’s interest in contributing to the conservation of wetland ecosystem services at PKNP indicates that a more concerted effort is needed by both the federal and state governments to conserve and rehabilitate the mangrove ecosystem in Malaysia.
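The aggregation step above multiplies the mean household WTP by the number of households. As a worked check, the short sketch below reproduces that arithmetic from the three figures given in the abstract; the interpretation that RM87 is a rounded mean is an assumption, not something the abstract states.

```python
# Figures quoted in the abstract.
households = 21_664          # households in Pontian district, 2016
mean_wtp_rounded = 87        # RM per household per year, as reported
reported_total = 1_884_334   # RM, aggregate contribution as reported

# Aggregate value if the mean were exactly RM87.
total_from_rounded_mean = households * mean_wtp_rounded

# Mean WTP implied by the reported aggregate (assumes total = households x mean).
implied_mean_wtp = reported_total / households

print(f"households x RM87     = RM{total_from_rounded_mean:,}")
print(f"implied mean WTP      = RM{implied_mean_wtp:.2f}")
print(f"gap to reported total = RM{total_from_rounded_mean - reported_total:,}")
```

The small gap between the two totals is what one would expect if the reported RM87 were rounded from a mean slightly below it.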

Keywords: environmental economy, economic valuation, choice experiment, Pulau Kukup national parks

Procedia PDF Downloads 160
28155 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze exponentially increasing big data with traditional technologies. Hadoop is a new technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop system with the lm function and the biglm package available with bigmemory. The results showed that our RHadoop system was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of the data increases.
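One common way to parallelize multiple regression in a map/reduce style is to have each map task compute the partial sums X'X and X'y for its chunk, add the partials in the reduce step, and solve the normal equations once. The sketch below simulates that in memory in plain Python. It is not the authors' RHadoop code, and the function names and toy data are hypothetical.

```python
def map_chunk(rows):
    """Map step: partial X'X and X'y for one chunk of (features, y) rows."""
    p = len(rows[0][0]) + 1            # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def reduce_partials(partials):
    """Reduce step: element-wise sum of the per-chunk partial results."""
    partials = list(partials)
    p = len(partials[0][1])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for part_xtx, part_xty in partials:
        for i in range(p):
            xty[i] += part_xty[i]
            for j in range(p):
                xtx[i][j] += part_xtx[i][j]
    return xtx, xty

def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(chunks):
    """Least-squares coefficients [intercept, b1, b2, ...] from data chunks."""
    xtx, xty = reduce_partials(map_chunk(c) for c in chunks)
    return solve(xtx, xty)

# Toy data generated from y = 2 + 3*x1 - x2, split into four "map" chunks.
data = [((x1, x2), 2 + 3 * x1 - x2) for x1 in range(5) for x2 in range(4)]
chunks = [data[i:i + 5] for i in range(0, len(data), 5)]
beta = fit(chunks)
print([round(b, 6) for b in beta])
```

Because each chunk yields only a small p-by-p matrix and a length-p vector, the reduce step stays cheap no matter how many rows each map task processes.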

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 408
28154 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

In order to solve memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve memorization overfitting in the meta-learning MAML algorithm.

Keywords: data augmentation, mutex task generation, meta-learning, text classification.

Procedia PDF Downloads 65
28153 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network

Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan

Abstract:

Data aggregation is a helpful technique for reducing data communication overhead in wireless sensor networks. One of the important tasks in data aggregation is positioning the aggregator points. Much work has been done on data aggregation, but efficient positioning of the aggregator points has received little attention. In this paper, the authors focus on the positioning, or placement, of the aggregation points in a wireless sensor network. The authors propose an algorithm to select the aggregator positions for a scenario in which aggregator nodes are more powerful than sensor nodes.

Keywords: aggregation point, data communication, data aggregation, wireless sensor network

Procedia PDF Downloads 126
28152 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects of implementing an explicit spatial perspective in econometrics for modelling non-continuous data in general, and count data in particular. It provides an overview of the spatial econometric approaches that are available to model data collected with reference to location in space, from the classical spatial econometrics approaches to recent developments in spatial econometrics for modelling count data in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework necessary for structurally consistent spatial econometric count models incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures under different assumptions, and to the constraints and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions in the processing of count data in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

Procedia PDF Downloads 556
28151 A Comparative Analysis of Thermal Performance of Building Envelope Types over Time

Authors: Aram Yeretzian, Yaser Abunnasr, Zahraa Makki, Betina Abi Habib

Abstract:

Developments in architectural building typologies that are informed by prevalent construction techniques and socio-cultural practices generate different adaptations in the building envelope. While different building envelope types exhibit different climate-responsive passive strategies, the individual and comparative thermal performance resulting from these technologies is yet to be understood. This research aims to develop this analysis by selecting three building envelope types from three distinct building traditions and measuring their heat transmission in the city of Beirut. The three typical residential buildings are selected from the 1920s, 1940s, and 1990s within the same street to ensure similar climatic and urban conditions. Climatic data loggers are installed inside and outside of the three locations to measure indoor and outdoor temperatures, relative humidity, and heat flow. The analysis of the thermal measurements is complemented by site surveys on window opening, lighting, and occupancy in the three selected locations and by research on building technology from the three periods. Apart from defining the U-value of the building envelopes, the collected data will help evaluate the indoor environments with respect to the thermal comfort zone. This research thus validates and contextualizes the role of building technologies in relation to climate-responsive design.
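A U-value can be estimated from logged heat flow and indoor and outdoor temperatures by dividing the summed heat flux by the summed temperature difference, an approach often called the average method. The abstract does not state which calculation the authors used, so the method, the function name, and the readings below are assumptions for illustration.

```python
def u_value_average_method(heat_flux, t_in, t_out):
    """Estimate U (W/m^2.K) as sum(q) / sum(T_in - T_out) over all readings.

    heat_flux: heat flux density readings in W/m^2
    t_in, t_out: indoor and outdoor temperatures in degrees C (or K)
    """
    if not (len(heat_flux) == len(t_in) == len(t_out)):
        raise ValueError("all series must have the same length")
    delta_sum = sum(ti - to for ti, to in zip(t_in, t_out))
    if delta_sum == 0:
        raise ValueError("summed temperature difference is zero")
    return sum(heat_flux) / delta_sum

# Hypothetical logger readings (not measurements from the study).
q = [10.0, 15.0, 5.0]        # W/m^2
t_in = [21.0, 22.0, 20.0]    # degrees C
t_out = [11.0, 12.0, 15.0]   # degrees C

u = u_value_average_method(q, t_in, t_out)
print(f"U = {u:.2f} W/m^2.K")
```

Summing over the whole logging period, rather than averaging instantaneous ratios, damps the effect of thermal storage in heavy walls, which is why long measurement campaigns are typical for this kind of estimate.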

Keywords: architecture, wall construction, envelope performance, thermal comfort

Procedia PDF Downloads 209
28150 A NoSQL Based Approach for Real-Time Managing of Robotics's Data

Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir

Abstract:

This paper deals with the continual growth of data, in response to which new data management solutions have emerged: NoSQL databases. They have spread across several areas, such as personalization, profile management, big data in real time, content management, catalogs, customer views, mobile applications, the internet of things, digital communication, and fraud detection. Nowadays, the use of these database management systems is increasing. These systems store data very well, and with the trend of big data, new storage challenges demand new structures and methods for managing enterprise data. New intelligent machines in the e-learning sector thrive on more data, so smart machines can learn more and faster. Robotics is the use case on which we focus our tests. Implementing NoSQL for robotics wrangles all the data that robots acquire into a usable form, because with ordinary robotics we face significant limits in managing and finding exact information in real time. Our proposed approach was demonstrated by experimental studies and by a running example used as a use case.

Keywords: NoSQL databases, database management systems, robotics, big data

Procedia PDF Downloads 321