Search results for: large amounts of data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 29262


29232 Analyzing Large Scale Recurrent Event Data with a Divide-And-Conquer Approach

Authors: Jerry Q. Cheng

Abstract:

Currently, in analyzing large-scale recurrent event data, there are many challenges such as memory limitations, unscalable computing time, etc. In this research, a divide-and-conquer method is proposed using parametric frailty models. Specifically, the data is randomly divided into many subsets, and the maximum likelihood estimator from each individual data set is obtained. Then a weighted method is proposed to combine these individual estimators as the final estimator. It is shown that this divide-and-conquer estimator is asymptotically equivalent to the estimator based on the full data. Simulation studies are conducted to demonstrate the performance of this proposed method. This approach is applied to a large real dataset of repeated heart failure hospitalizations.

Keywords: big data analytics, divide-and-conquer, recurrent event data, statistical computing

Procedia PDF Downloads 136
29231 Hyperspectral Data Classification Algorithm Based on the Deep Belief and Self-Organizing Neural Network

Authors: Li Qingjian, Li Ke, He Chun, Huang Yong

Abstract:

In this paper, a method combining Pohl Seidman's deep belief network with a self-organizing neural network is proposed to classify targets. The method is mainly aimed at the high nonlinearity of hyperspectral images, the high sample dimension, and the difficulty of designing the classifier. The main features of the original data are extracted by the deep belief network. During feature extraction, samples with known labels are added to fine-tune the network, enriching the main characteristics. Then, the extracted feature vectors are classified by the self-organizing neural network. This method can effectively reduce the spectral dimension of the data while preserving a large amount of the raw data information. It addresses the shortcomings of traditional clustering and the long training time of deep learning algorithms when few labeled samples are available, and it improves classification accuracy and robustness. Data simulation results show that the proposed network structure achieves higher classification precision when only a small number of labeled samples are known.

Keywords: DBN, SOM, pattern classification, hyperspectral, data compression

Procedia PDF Downloads 314
29230 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases when entry-specific details should remain hidden. Symbolic data mining is quickly gaining popularity as the datasets in need of analysis become ever larger. One type of symbolic data is the histogram, which makes it possible to store huge amounts of information in a single variable with a high level of granularity. Other types of symbolic data can also be described as histograms, which makes the histogram a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Due to their complex structure, histograms are complicated to analyze. This paper proposes a method that makes it possible to compare two histogram-valued variables and thereby find the dissimilarity between two histograms. The proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which makes it possible to compare histograms with different numbers of bins and bin widths (so-called general histograms). The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, together with the concept of cluster compactness to measure the quality of clustering. The method is then validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results with general histograms without loss of detail or the need to transform the data.

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

Procedia PDF Downloads 133
29229 A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures

Authors: Silvina Caíno-Lores, Jesús Carretero

Abstract:

Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from access to vastly distributed data. Recent works have suggested that improving data locality is key to moving towards exascale infrastructures efficiently, as solutions to this problem aim to reduce the bandwidth consumed in data transfers and the overheads that arise from them. There are several techniques that attempt to move computations closer to the data. In this survey, we analyse the different mechanisms that have been proposed to provide data locality for large scale high-performance and high-throughput systems. This survey intends to assist the scientific computing community in understanding the various technical aspects and strategies that have been reported in recent literature regarding data locality. As a result, we present an overview of locality-oriented techniques, which are grouped in four main categories: application development, task scheduling, in-memory computing and storage platforms. Finally, we include a discussion on future research lines and synergies among the former techniques.

Keywords: data locality, data-centric computing, large scale infrastructures, cloud computing

Procedia PDF Downloads 228
29228 ROOP: Translating Sequential Code Fragments to Distributed Code Fragments Using Deep Reinforcement Learning

Authors: Arun Sanjel, Greg Speegle

Abstract:

Every second, massive amounts of data are generated, and Data Intensive Scalable Computing (DISC) frameworks have evolved into effective tools for analyzing such massive amounts of data. Since the underlying architecture of these distributed computing platforms is often new to users, building a DISC application can often be time-consuming and prone to errors. The automated conversion of a sequential program to a DISC program will consequently significantly improve productivity. However, synthesizing a user’s intended program from an input specification is complex, with several important applications, such as distributed program synthesizing and code refactoring. Existing works such as Tyro and Casper rely entirely on deductive synthesis techniques or similar program synthesis approaches. Our approach is to develop a data-driven synthesis technique to identify sequential components and translate them to equivalent distributed operations. We emphasize using reinforcement learning and unit testing as feedback mechanisms to achieve our objectives.

Keywords: program synthesis, distributed computing, reinforcement learning, unit testing, DISC

Procedia PDF Downloads 72
29227 A Framework for Event-Based Monitoring of Business Processes in the Supply Chain Management of Industry 4.0

Authors: Johannes Atug, Andreas Radke, Mitchell Tseng, Gunther Reinhart

Abstract:

In modern supply chains, large numbers of SKUs (Stock-Keeping-Units) need to be managed in a timely manner, and any delay in noticing disruptions of items often limits the ability to defer the impact on customer order fulfillment. However, in supply chains of IoT-connected enterprises, the ERP (Enterprise-Resource-Planning), the MES (Manufacturing-Execution-System) and the SCADA (Supervisory-Control-and-Data-Acquisition) systems generate large amounts of data, from which much earlier notice of deviations in the business process steps can generally be gleaned. That is, analyzing these streams of data with process mining techniques allows the monitoring of the supply chain business processes and thus the identification of items that deviate from the standard order fulfillment process. In this paper, a framework to enable event-based SCM (Supply-Chain-Management) processes, including an overview of core enabling technologies, is presented; it is based on the RAMI (Reference-Architecture-Model for Industrie 4.0) architecture. The application of this framework in industry is presented, and implications for SCM in Industry 4.0 and further research are outlined.

Keywords: cyber-physical production systems, event-based monitoring, supply chain management, RAMI (Reference-Architecture-Model for Industrie 4.0)

Procedia PDF Downloads 205
29226 Polyphenols from Winery Wastes as Potential Source of Antioxidants

Authors: Lucia Gharwalova, Irena Kolouchova, Jan Masak

Abstract:

A large amount of waste products is generated throughout the whole winemaking process as well as during work in the vineyard. This waste is a source of phenolic compounds, such as resveratrol and polydatin, which possess a strong antioxidant capacity. Changes in the amounts of phenols were compared depending on the growing conditions and wine variety. Wastes (grape stems, marc and shoots) from two wineries in the Czech Republic were analyzed. Phenols from these samples were extracted with 40% ethanol. The amount of polyphenols in these extracts was determined by HPLC and their antioxidant capacity by DPPH. We compared changes in the amounts of phenols depending on the type of waste and the wine variety. The most significant source of stilbenoids was waste from pruning (shoots). These results show that winery waste could be further reused thanks to its antioxidant content.

Keywords: antioxidants, polyphenols, resveratrol, winery waste

Procedia PDF Downloads 379
29225 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.

Keywords: big data, social networks, sentiment analysis, twitter

Procedia PDF Downloads 540
29224 Holistic Risk Assessment Based on Continuous Data from the User’s Behavior and Environment

Authors: Cinzia Carrodano, Dimitri Konstantas

Abstract:

Risk is part of our lives. In today’s society, risk is connected to our safety, and safety has become a major priority in our lives. Each person lives his/her life based on the evaluation of the risk he/she is ready to accept and sustain, and the level of safety he/she wishes to reach, based on highly personal criteria. The assessment of the risk a person takes in a complex environment and the impact of other people’s actions and events on our perception of risk are elements to be considered. The concept of Holistic Risk Assessment (HRA) aims at developing a methodology and a model that will allow us to take into account elements outside the direct influence of the individual and provide a personalized risk assessment. The concept is based on the fact that in the near future, we will be able to gather and process extremely large amounts of data about an individual and his/her environment in real time. The interaction and correlation of these data is the key element of the holistic risk assessment. In this paper, we present the HRA concept and describe the most important elements and considerations.

Keywords: continuous data, dynamic risk, holistic risk assessment, risk concept

Procedia PDF Downloads 89
29223 Design and Development of an Algorithm to Predict Fluctuations of Currency Rates

Authors: Nuwan Kuruwitaarachchi, M. K. M. Peiris, C. N. Madawala, K. M. A. R. Perera, V. U. N Perera

Abstract:

Doing business with foreign markets has always held a special place in a country’s economy. Political and social factors come into play, making currency rates fluctuate rapidly. Currency rate prediction has become an important factor for larger international businesses, since large amounts of money are exchanged between countries. This research focuses on comparing the accuracy of three main models: Autoregressive Integrated Moving Average (ARIMA), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). A series of data on imports, exports, and the USD exchange rate with respect to LKR was selected for training with the above-mentioned algorithms. After training on the data set and comparing the algorithms, it was observed that SVM predicted better than the other models. The prediction was improved further by combining the SVM and SVR models.

Keywords: ARIMA, ANN, FFNN, RMSE, SVM, SVR

Procedia PDF Downloads 162
29222 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have grown within a short period of time. Consequently, this fast-increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and their current trends. This paper presents a complete discussion of state-of-the-art big data technologies based on group and stream data processing. Moreover, the strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendors, several open research challenges and the opportunities brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, it community, industry, big data

Procedia PDF Downloads 161
29221 Cloud Design for Storing Large Amount of Data

Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás

Abstract:

The main goal of this paper is to introduce our design of a private cloud for storing large amounts of data, especially pictures, and to provide a good technological backend for data analysis based on parallel processing and business intelligence. We have tested hypervisors, cloud management tools, storage for storing all data, and Hadoop to provide data analysis on unstructured data. High availability, virtual network management, logical separation of projects and rapid deployment of physical servers to our environment were also needed.

Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization

Procedia PDF Downloads 329
29220 Comparison of Stationary and Two-Axis Tracking System of 50MW Photovoltaic Power Plant in Al-Kufra, Libya: Landscape Impact and Performance

Authors: Yasser Aldali

Abstract:

The scope of this paper is to evaluate and compare the potential of LS-PV (Large Scale Photovoltaic Power Plant) power generation systems in the southern region of Libya at Al-Kufra for both stationary and tracking systems. A Microsoft Excel-VBA program has been developed to compute slope radiation, dew-point, sky temperature, and then cell temperature, maximum power output and module efficiency for the stationary system and for the tracking system. The results for energy production show that the total energy output is 114 GWh/year for the stationary system and 148 GWh/year for the tracking system. The average module efficiency is 16.6% for the stationary system and 16.2% for the tracking system. The values of electricity generation capacity factor (CF) and solar capacity factor (SCF) were found to be 26% and 62.5%, respectively, for the stationary system and 34% and 82% for the tracking system. The GCR (Ground Cover Ratio) for the stationary system is 0.7, which corresponds to a tilt angle of 24°. The GCR for the tracking system was found to be 0.12. The estimated ground area needed to build a 50 MW PV plant amounts to approx. 0.55 km2 for a stationary PV field constituted by HIT PV arrays, i.e. approx. 91 MW/km2. In the case of a tracker PV field, the required ground area amounts to approx. 2.4 km2, i.e. approx. 20.5 MW/km2.
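
The capacity factor and power density figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is not the authors' Excel-VBA program; it only assumes the usual definition of capacity factor (annual energy divided by rated power times 8760 hours), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def capacity_factor(annual_energy_gwh, rated_power_mw):
    """Annual energy output divided by the output at rated power all year."""
    return annual_energy_gwh * 1000.0 / (rated_power_mw * HOURS_PER_YEAR)


def power_density_mw_per_km2(rated_power_mw, ground_area_km2):
    """Installed power per unit of ground area."""
    return rated_power_mw / ground_area_km2


# Figures taken from the abstract: 50 MW plant, 114 and 148 GWh/year,
# approx. 0.55 km2 (stationary) and approx. 2.4 km2 (tracking).
cf_stationary = capacity_factor(114, 50)
cf_tracking = capacity_factor(148, 50)
density_stationary = power_density_mw_per_km2(50, 0.55)
density_tracking = power_density_mw_per_km2(50, 2.4)
```

Under this definition the two capacity factors round to the reported 26% and 34%, and the stationary power density comes out near the reported 91 MW/km2. The tracking value comes out slightly above the reported 20.5 MW/km2, which is consistent with the ground area being rounded to 2.4 km2.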

Keywords: large scale photovoltaic power plant, two-axis tracking system, stationary system, landscape impact

Procedia PDF Downloads 421
29219 Mathematical modeling of the calculation of the absorbed dose in uranium production workers with the genetic effects.

Authors: P. Kazymbet, G. Abildinova, K.Makhambetov, M. Bakhtin, D. Rybalkina, K. Zhumadilov

Abstract:

Cytogenetic research was conducted among workers of the Stepnogorsk Mining-Chemical Combine (Akmola region), with 26341 chromosomal metaphases studied. Using regression analysis with the program DataFit, version 5.0, the dependence between exposure dose and the following cytogenetic indicators was studied: frequency of aberrant cells, frequency of chromosomal aberrations, and frequency of dicentric chromosomes and centric rings. Experimental data on "dose-effect" calibration curves enabled the development of a mathematical model that allows the absorbed dose at the time of the study to be calculated from the frequency of aberrant cells, chromosomal aberrations, and dicentric chromosomes and centric rings. In the dose range of 0.1 Gy to 5.0 Gy, the dependence of the cytogenetic parameters on the dose was described by the following equations: Y = 0.0067e^(0.3307x) (R^2 = 0.8206) for the frequency of chromosomal aberrations; Y = 0.0057e^(0.3161x) (R^2 = 0.8832) for the frequency of cells with chromosomal aberrations; Y = 5E-05e^(0.6383x) (R^2 = 0.6321) for the frequency of dicentric chromosomes and centric rings per cell. The absorbed dose in uranium production workers, calculated on the basis of the cytogenetic parameters and regression equations, did not exceed 0.3 Gy at the time of the study.
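
As a rough illustration of how such a dose-effect curve is used, an exponential model Y = a·e^(b·x) can be inverted to x = ln(Y / a) / b to estimate the dose from an observed frequency. The sketch below uses only the two equations whose coefficients are unambiguous in the abstract; the function names are hypothetical, and this is not the authors' DataFit model.

```python
import math

# (a, b) in Y = a * exp(b * x), with x the dose in Gy, as quoted in the abstract.
CHROMOSOMAL_ABERRATIONS = (0.0067, 0.3307)
ABERRANT_CELLS = (0.0057, 0.3161)


def frequency_from_dose(dose_gy, coeffs):
    """Forward model: expected frequency at a given absorbed dose."""
    a, b = coeffs
    return a * math.exp(b * dose_gy)


def dose_from_frequency(frequency, coeffs):
    """Inverse model: absorbed dose implied by an observed frequency."""
    a, b = coeffs
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return math.log(frequency / a) / b


# Round trip at the 0.3 Gy upper bound mentioned in the abstract.
observed = frequency_from_dose(0.3, CHROMOSOMAL_ABERRATIONS)
estimated_dose = dose_from_frequency(observed, CHROMOSOMAL_ABERRATIONS)
```

The inversion is only meaningful inside the fitted range of 0.1 Gy to 5.0 Gy; an observed frequency below the coefficient a yields a negative dose and should be treated as outside the model.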

Keywords: Stepnogorsk, mathematical modeling, cytogenetic, dicentric chromosomes

Procedia PDF Downloads 447
29218 Large Time Asymptotic Behavior to Solutions of a Forced Burgers Equation

Authors: Satyanarayana Engu, Ahmed Mohd, V. Murugan

Abstract:

We study the large time asymptotics of solutions to the Cauchy problem for a forced Burgers equation (FBE) with initial data that is continuous and summable on R. To this end, we first derive explicit solutions of the FBE in terms of Hermite polynomials, assuming a different class of initial data. Later, by relaxing this assumption, we prove the existence of a solution to the considered Cauchy problem. Finally, we give an asymptotic approximate solution and establish that the error is of order O(t^(-1/2)) with respect to the L^p-norm, where 1≤p≤∞, for large time.

Keywords: Burgers equation, Cole-Hopf transformation, Hermite polynomials, large time asymptotics

Procedia PDF Downloads 295
29217 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technologies; it collects data from various sensors, and those data are used in many fields. Data retrieval is one of the major issues, as there is a need to extract exactly the data that is required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is the one that helps to point out the exact data available in the storage space. Here the available data is streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 311
29216 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines

Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay

Abstract:

One of the unique challenges posed by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data, including relevant data that can be used to generate the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations in a situation similar to that of the Philippines.

Keywords: big data, data analytics, higher education, republic of the philippines, assessment

Procedia PDF Downloads 311
29215 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0

Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini

Abstract:

Nowadays, companies face many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is also challenging to integrate data from multiple systems and technologies. Despite these difficulties, companies are still pursuing digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. This paper focuses on the issue of data being stored in data silos with different schemas and structures. The conventional approaches to addressing this issue involve utilizing data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. In this work, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilizes a knowledge graph for data storage and exploration.

Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling

Procedia PDF Downloads 67
29214 Knowledge Representation and Inconsistency Reasoning of Class Diagram Maintenance in Big Data

Authors: Chi-Lun Liu

Abstract:

Requirements modeling and analysis are important for the successful maintenance of information systems. Unified Modeling Language (UML) class diagrams are useful standards for modeling information systems. To the best of our knowledge, there is a lack of a systems development methodology described by the organism metaphor. The core concept of this metaphor is adaptation. The use of knowledge representation and reasoning approaches and ontologies to adopt new requirements has emerged in recent years. This paper proposes an organic methodology based on constructivism theory. This methodology is a knowledge representation and reasoning approach to analyzing new requirements in class diagram maintenance. The process and rules in the proposed methodology automatically analyze inconsistencies in the class diagram. In the big data era, developing an automatic tool based on the proposed methodology to analyze large amounts of class diagram data is an important topic for future research.

Keywords: knowledge representation, reasoning, ontology, class diagram, software engineering

Procedia PDF Downloads 210
29213 Healthcare Data Mining Innovations

Authors: Eugenia Jilinguirian

Abstract:

In the healthcare industry, data mining is essential since it transforms the field by extracting useful information from large datasets. Data mining is the process of applying advanced analytical methods to large sets of patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these data. Additionally, data mining supports personalized medicine by tailoring treatment to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to apply data mining effectively and ensure the proper use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital in the search for more effective, efficient, and individualized healthcare solutions as technology evolves.

Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database

Procedia PDF Downloads 40
29212 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm that aims to partition a set of data points into a predefined number of clusters. In this paper, we implement the K-means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner that operates on our map output to decrease the amount of data that needs to be processed by the reducers. The experimental results demonstrated that the K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with a combiner was faster than the regular algorithm without a combiner as the size of the data set increases.
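
The role of the combiner can be illustrated with a minimal single-process sketch in Python. It does not use RHadoop or Hadoop; the function names and the sample data are hypothetical, and the point is only to show that each input split emits one (sum, count) pair per cluster instead of one record per point.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_and_combine(split, centroids):
    """Map step plus local combiner for one input split.

    Returns {cluster_index: (coordinate_sums, point_count)}, so the amount of
    data passed on depends on the number of clusters, not the number of points.
    """
    partial = {}
    for point in split:
        k = nearest_centroid(point, centroids)
        sums, count = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def reduce_centroids(partials, centroids):
    """Reduce step: merge partial sums and counts into new centroids."""
    totals = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            if k in totals:
                total_sums, total_count = totals[k]
                merged = [a + b for a, b in zip(total_sums, sums)]
                totals[k] = (merged, total_count + count)
            else:
                totals[k] = (sums, count)
    new_centroids = []
    for k, old in enumerate(centroids):
        if k in totals:
            sums, count = totals[k]
            new_centroids.append([s / count for s in sums])
        else:
            new_centroids.append(list(old))  # empty cluster keeps its centroid
    return new_centroids


# Two illustrative input splits and two starting centroids.
splits = [
    [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0)],
    [(1.0, 0.5), (9.0, 9.5), (8.5, 9.0)],
]
centroids = [(0.0, 0.0), (10.0, 10.0)]

partials = [map_and_combine(split, centroids) for split in splits]
new_centroids = reduce_centroids(partials, centroids)
```

One call to `reduce_centroids` corresponds to a single K-means iteration; a full run would repeat the map, combine and reduce steps until the centroids stop moving.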

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 402
29211 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

Instance selection (IS) techniques are used to reduce the data size in order to improve the performance of data mining methods. Recently, to process very large data sets, several proposed methods divide the training set into disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitations of these methods and give our viewpoint on how to divide and conquer in the IS procedure. Then, based on the fast condensed nearest neighbor (FCNN) rule, we propose an instance selection method for large data sets using the MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: first, it reduces the workload of the aggregation node; second, and most important, it produces the same result as the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.

Keywords: instance selection, data reduction, MapReduce, kNN

Procedia PDF Downloads 232
29210 Governance, Risk Management, and Compliance Factors Influencing the Adoption of Cloud Computing in Australia

Authors: Tim Nedyalkov

Abstract:

A business decision to move to the cloud brings fundamental changes in how an organization develops and delivers its Information Technology solutions. The accelerated pace of digital transformation across businesses and government agencies increases the reliance on cloud-based services. Collecting, managing, and retaining large amounts of data in cloud environments makes information security and data privacy protection essential. It becomes even more important to understand what key factors drive successful cloud adoption following the commencement of the Privacy Amendment Notifiable Data Breaches (NDB) Act 2017 in Australia, as the regulatory changes impact many organizations and industries. This quantitative correlational research investigated the governance, risk management, and compliance factors contributing to cloud security success. The factors influence the adoption of cloud computing within an organizational context after the commencement of the NDB scheme. The results and findings demonstrated that corporate information security policies, data storage location, management understanding of data governance responsibilities, and regular compliance assessments are the factors influencing cloud computing adoption. The research has implications for organizations, future researchers, practitioners, policymakers, and cloud computing providers seeking to meet the rapidly changing regulatory and compliance requirements.

Keywords: cloud compliance, cloud security, data governance, privacy protection

Procedia PDF Downloads 93
29209 Development and Power Characterization of an IoT Network for Agricultural Imaging Applications

Authors: Jacob Wahl, Jane Zhang

Abstract:

This paper describes the development and characterization of a prototype IoT network for use with agricultural imaging and monitoring applications. The sensor and gateway nodes are designed using the ESP32 SoC with integrated Bluetooth Low Energy 4.2 and Wi-Fi. A development board, the Arducam IoTai ESP32, is used for prototyping, testing, and power measurements. Google’s Firebase is used as the cloud storage site for image data collected by the sensor. The sensor node captures images using the OV2640 2MP camera module and transmits the image data to the gateway via Bluetooth Low Energy. The gateway then uploads the collected images to Firebase via a known nearby Wi-Fi network connection. This image data can then be processed and analyzed by computer vision and machine learning pipelines to assess crop growth or other needs. The sensor node achieves a wireless transmission data throughput of 220kbps while consuming 150mA of current; the sensor sleeps at 162µA. The sensor node device lifetime is estimated to be 682 days on a 6600mAh LiPo battery while acquiring five images per day based on the development board power measurements. This network can be utilized by any application that requires high data rates, low power consumption, short-range communication, and large amounts of data to be transmitted at low-frequency intervals.
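
The reported power figures can be related to each other with a simple duty-cycle model. The sketch below is an assumption-laden back-of-the-envelope check, not the authors' measurement procedure: it assumes the node draws either the active current (150 mA) or the sleep current (162 µA), ignores battery derating and self-discharge, and uses hypothetical function names.

```python
SECONDS_PER_DAY = 86400.0

# Figures quoted in the abstract.
BATTERY_MAH = 6600.0
REPORTED_LIFETIME_DAYS = 682.0
ACTIVE_MA = 150.0
SLEEP_MA = 0.162  # 162 microamps
IMAGES_PER_DAY = 5


def average_current_ma(battery_mah, lifetime_days):
    """Average current that drains the battery in the given lifetime."""
    return battery_mah / (lifetime_days * 24.0)


def active_seconds_per_day(avg_ma, active_ma, sleep_ma):
    """Daily active time implied by a two-state (active or sleep) model."""
    active_fraction = (avg_ma - sleep_ma) / (active_ma - sleep_ma)
    return active_fraction * SECONDS_PER_DAY


def estimate_lifetime_days(battery_mah, active_s_per_day, active_ma, sleep_ma):
    """Lifetime for a given daily active time under the same model."""
    active_fraction = active_s_per_day / SECONDS_PER_DAY
    avg_ma = active_ma * active_fraction + sleep_ma * (1.0 - active_fraction)
    return battery_mah / avg_ma / 24.0


avg_ma = average_current_ma(BATTERY_MAH, REPORTED_LIFETIME_DAYS)
active_s = active_seconds_per_day(avg_ma, ACTIVE_MA, SLEEP_MA)
active_s_per_image = active_s / IMAGES_PER_DAY
```

Under these assumptions, the reported lifetime corresponds to an average draw of roughly 0.4 mA and a little under half a minute of active time per image. `estimate_lifetime_days` can then be used to explore how the lifetime would change with a different number of images per day.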

Keywords: Bluetooth low energy, ESP32, firebase cloud, IoT, smart farming

Procedia PDF Downloads 111
29208 An Architecture Based on Capsule Networks for the Identification of Handwritten Signature Forgery

Authors: Luisa Mesquita Oliveira Ribeiro, Alexei Manso Correa Machado

Abstract:

A handwritten signature is a unique means of identifying an individual and is used to authenticate documents and to support investigations in criminal, legal, banking, and other areas. Signature verification relies on large amounts of biometric data, since signatures are simple and easy to acquire, among other characteristics. Given this scenario, signature forgery is a recurring problem worldwide, and fast and precise techniques are needed to prevent crimes of this nature. This article studies the efficiency of the Capsule Network in analyzing and recognizing signatures. The chosen architecture achieved accuracies of 98.11% and 80.15% on the CEDAR and GPDS databases, respectively.

Keywords: biometrics, deep learning, handwriting, signature forgery

Procedia PDF Downloads 45
29207 Assessing Performance of Data Augmentation Techniques for a Convolutional Network Trained for Recognizing Humans in Drone Images

Authors: Masood Varshosaz, Kamyar Hasanpour

Abstract:

In recent years, we have seen growing interest in recognizing humans in drone images for post-disaster search and rescue operations. Deep learning algorithms have shown great promise in this area, but they often require large amounts of labeled data to train the models. To keep the data acquisition cost low, augmentation techniques can be used to create additional data from existing images. Many such techniques can generate variations of an original image to improve the performance of deep learning algorithms. While data augmentation is generally assumed to improve the accuracy and robustness of the models, it is important to ensure that the performance gains are not outweighed by the additional computational cost or complexity of implementing the techniques. To this end, it is important to evaluate the impact of data augmentation on the performance of the deep learning models. In this paper, we evaluated most of the currently available 2D data augmentation techniques on a standard convolutional network trained for recognizing humans in drone images. The techniques include rotation, scaling, random cropping, flipping, shifting, and their combination. The results showed that the augmented models perform 1-3% better than the base network. However, as the augmented images only contain the human parts already visible in the original images, a new data augmentation approach is needed to include the invisible parts of the human body. Thus, we suggest a new method that employs simulated 3D human models to generate new data for training the network.
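The 2D augmentation techniques listed above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a grayscale image stored as a list of rows, restricts rotation to 90-degree steps, and uses nearest-neighbour scaling, and all function names are hypothetical.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def shift(img, dx, dy, fill=0):
    """Translate the image by (dx, dy) pixels, padding exposed pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def random_crop(img, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def scale_nearest(img, factor):
    """Resize the image by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
         for x in range(new_w)]
        for y in range(new_h)
    ]
```

Each function returns a new image, so combinations can be built by chaining calls, for example `flip_horizontal(rotate90(img))`. As the abstract notes, none of these transforms adds image content that was not already visible in the original.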

Keywords: human recognition, deep learning, drones, disaster mitigation

Procedia PDF Downloads 65
29206 An Analysis of Privacy and Security for Internet of Things Applications

Authors: Dhananjay Singh, M. Abdullah-Al-Wadud

Abstract:

The Internet of Things (IoT) is the concept of a large-scale ecosystem of wireless actuators. The actuators are the "things" in the IoT: devices that contribute or produce data for the ecosystem. However, ubiquitous data collection, data security, privacy preservation, large-volume data processing, and intelligent analytics are some of the key challenges for IoT technologies. To address the security requirements, challenges, and threats in the IoT, we discuss a message authentication mechanism for IoT applications. Finally, we discuss a data encryption mechanism for authenticating messages before they propagate into IoT networks.
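The abstract does not specify which authentication mechanism is used. As a generic illustration of message authentication with a shared key, the sketch below appends an HMAC-SHA256 tag to each payload and checks it on receipt. HMAC is a stand-in chosen for this example, not the authors' scheme, the `sign` and `verify` names are hypothetical, and encryption of the payload is not shown.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def sign(key: bytes, payload: bytes) -> bytes:
    """Return the payload followed by its authentication tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, message: bytes):
    """Return the payload if the tag is valid, otherwise None."""
    if len(message) < TAG_LEN:
        return None
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes via timing.
    return payload if hmac.compare_digest(tag, expected) else None
```

A receiving node would call `verify` before forwarding a message and drop anything that returns `None`, so tampered or unauthenticated data does not propagate further into the network.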

Keywords: Internet of Things (IoT), message authentication, privacy, security

Procedia PDF Downloads 346
29205 Distributional and Developmental Analysis of PM2.5 in Beijing, China

Authors: Alexander K. Guo

Abstract:

PM2.5 poses a serious threat to people’s health and the environment and is an issue of great concern in Beijing, brought to the attention of the government by the media. In addition, both the United States Embassy in Beijing and the government of China have increased monitoring of PM2.5 in recent years, and have made real-time data available to the public. This report utilizes hourly historical data (2008-2016) from the U.S. Embassy in Beijing for the first time. The first objective was to attempt to fit probability distributions to the data to better predict the number of days exceeding the standard, and the second was to uncover any yearly, seasonal, monthly, daily, and hourly patterns and trends that may arise, to gain a better understanding of air control policy. In these data, 66,650 hours and 2687 days provided valid data. Lognormal, gamma, and Weibull distributions were fit to the data through parameter estimation. The chi-squared test was employed to compare the actual data with the fitted distributions. The data were used to uncover trends, patterns, and improvements in PM2.5 concentration over the period with valid data, as well as in specific periods that received large amounts of media attention, which were analyzed to gain a better understanding of the causes of air pollution. The data clearly indicate that Beijing’s air quality is unhealthy, with an average of 94.07 µg/m³ across all 66,650 hours with valid data. It was found that no distribution fit the entire dataset of all 2687 days well, but each of the three distribution types above was optimal in at least one of the yearly data sets, with the lognormal distribution found to fit recent years better. An improvement in air quality beginning in 2014 was discovered, with the first five months of 2016 reporting an average PM2.5 concentration that is 23.8% lower than the average of the same period in all years, perhaps the result of various new pollution-control policies.
It was also found that the winter and fall months contained more days in both good and extremely polluted categories, leading to a higher average but a comparable median in these months. Additionally, the evening hours, especially in the winter, reported much higher PM2.5 concentrations than the afternoon hours, possibly due to the prohibition of trucks in the city in the daytime and the increased use of coal for heating in the colder months when residents are home in the evening. Lastly, through analysis of special intervals that attracted media attention for either unusually good or bad air quality, the government’s temporary pollution control measures, such as more intensive road-space rationing and factory closures, are shown to be effective. In summary, air quality in Beijing is improving steadily and does follow standard probability distributions to an extent, but still needs improvement. Analysis will be updated when new data become available.
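The distribution-fitting step described in this abstract can be sketched for the lognormal case. This is a minimal illustration under stated assumptions, not the author's code: parameters are estimated by maximum likelihood on the log-transformed values, the bin edges passed to the chi-squared statistic are hypothetical concentration thresholds, and the final comparison against a critical value is omitted.

```python
import math
import statistics


def fit_lognormal(samples):
    """Maximum-likelihood estimates (mu, sigma) of a lognormal distribution."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of the lognormal distribution."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def chi_squared_statistic(samples, edges, mu, sigma):
    """Pearson chi-squared statistic over the bins (0, e0], (e0, e1], ..., (e_last, inf)."""
    n = len(samples)
    bounds = [0.0] + list(edges) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in samples if lo < x <= hi)
        p_hi = 1.0 if hi == math.inf else lognormal_cdf(hi, mu, sigma)
        expected = n * (p_hi - lognormal_cdf(lo, mu, sigma))
        stat += (observed - expected) ** 2 / expected
    return stat
```

A smaller statistic indicates closer agreement between observed and expected bin counts. Repeating the same procedure with gamma and Weibull fits would allow the three candidate distributions to be compared for each yearly data set.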

Keywords: Beijing, distribution, patterns, PM2.5, trends

Procedia PDF Downloads 220
29204 Influence of Sewage Sludge on Agricultural Land Quality and Crop

Authors: Catalina Iticescu, Lucian P. Georgescu, Mihaela Timofti, Gabriel Murariu

Abstract:

Since the accumulation of large quantities of sewage sludge causes serious environmental problems, numerous environmental specialists are looking for solutions. The sewage sludge obtained by treatment of municipal wastewater may be used as fertilizer on agricultural soils because such sludge contains large amounts of nitrogen, phosphorus and organic matter. In many countries, sewage sludge is used instead of chemical fertilizers in agriculture, this being the most feasible method of reducing the ever larger quantities of sludge. The use of sewage sludge on agricultural soils is allowed only with strict monitoring of its physical and chemical parameters, because heavy metals exist in varying amounts in sewage sludge. Exceeding the maximum permitted quantities of harmful substances may pollute agricultural soils and may cause them to be taken out of use, because plants may take up the heavy metals in the soil, and these metals will most probably reach humans and animals through food. The sewage sludge analyzed for the present paper was taken from the Wastewater Treatment Plant (WWTP) in Galati, Romania. The physico-chemical parameters determined were: pH (upH), total organic carbon (TOC) (mg L⁻¹), N-total (mg L⁻¹), P-total (mg L⁻¹), N-NH₄ (mg L⁻¹), N-NO₂ (mg L⁻¹), N-NO₃ (mg L⁻¹), Fe-total (mg L⁻¹), Cr-total (mg L⁻¹), Cu (mg L⁻¹), Zn (mg L⁻¹), Cd (mg L⁻¹), Pb (mg L⁻¹), Ni (mg L⁻¹). The determination methods were electrometric (pH, C, TSD), using a portable HANNA HI 9828 multiparameter instrument with dedicated electrodes, and spectrophotometric, using a Merck Spectroquant NOVA 60 spectrophotometer with specific Merck parameter kits. The tests showed that the analyzed sludge is low in heavy metals and within the legal limits, the quantities of metals measured being much lower than the maximum allowed.
The tests made to determine the nutrient content of the sewage sludge showed that the nutrients it contains may be used to increase the fertility of agricultural soils. Other tests were carried out on land where sewage sludge was applied, in order to establish the maximum quantity of sludge that may be used without constituting a source of pollution. The tests were made on three plots: a first plot with no sludge and no chemical fertilizers applied, a second plot on which only sewage sludge was applied, and a third plot on which small amounts of chemical fertilizers were applied in addition to sewage sludge. The results showed that crop production increases when the soil is treated with sludge and small amounts of chemical fertilizers. Based on the results of the present research, a fertilization plan has been suggested. This plan should be reconsidered each year based on the crops planned, the yields proposed, the agrochemical indications, the sludge analysis, etc.

Keywords: agricultural use, crops, physico-chemical parameters, sewage sludge

Procedia PDF Downloads 260
29203 A Theoretical Model for Pattern Extraction in Large Datasets

Authors: Muhammad Usman

Abstract:

Pattern extraction has been used in the past to extract hidden and interesting patterns from large datasets. Recently, these techniques have been advanced by adding multi-level mining, effective dimension reduction, advanced evaluation, and visualization support. This paper reviews the current techniques in the literature on the basis of these parameters. The literature review suggests that most of the techniques that provide multi-level mining and dimension reduction do not handle mixed-type data during the process. Patterns are not extracted using advanced algorithms for large datasets. Moreover, the evaluation of patterns is not done using advanced measures suited to high-dimensional data. Techniques that provide visualization support are unable to handle a large number of rules in a small space. We present a theoretical model to handle these issues. The implementation of the model is beyond the scope of this paper.

Keywords: association rule mining, data mining, data warehouses, visualization of association rules

Procedia PDF Downloads 198