Search results for: data weighting
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24972

Search results for: data weighting

24852 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business Intelligence is a methodology that exploits the data to produce information and knowledge systematically, business intelligence can support the decision-making process. Some methods in business intelligence are data warehouse and data mining. A data warehouse can store historical data from transactional data. For data modelling in data warehouse, we apply dimensional modelling by Kimball. While data mining is used to extracting patterns from the data and get insight from the data. Data mining has many techniques, one of which is segmentation. For profiling of telecommunication customer, we use customer segmentation according to customer’s usage of services, customer invoice and customer payment. Customers can be grouped according to their characteristics and can be identified the profitable customers. We apply K-Means Clustering Algorithm for segmentation. The input variable for that algorithm we use RFM (Recency, Frequency and Monetary) model. All process in data mining, we use tools IBM SPSS modeller.

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 477
24851 A Hybrid Pareto-Based Swarm Optimization Algorithm for the Multi-Objective Flexible Job Shop Scheduling Problems

Authors: Aydin Teymourifar, Gurkan Ozturk

Abstract:

In this paper, a new hybrid particle swarm optimization algorithm is proposed for the multi-objective flexible job shop scheduling problem that is very important and hard combinatorial problem. The Pareto approach is used for solving the multi-objective problem. Several new local search heuristics are integrated into an algorithm based on the critical block concept to enhance the performance of the algorithm. The algorithm is compared with the recently published multi-objective algorithms based on benchmarks selected from the literature. Several metrics are used for quantifying performance and comparison of the achieved solutions. The algorithms are also compared based on the Weighting summation of objectives approach. The proposed algorithm can find the Pareto solutions more efficiently than the compared algorithms in less computational time.

Keywords: swarm-based optimization, local search, Pareto optimality, flexible job shop scheduling, multi-objective optimization

Procedia PDF Downloads 364
24850 Physical Exertion and Fatigue: A Breakthrough in Choking Sphere

Authors: R. Maher, D. Marchant, F. Fazel

Abstract:

Choking in sport has been defined as ‘an acute performance breakdown’, and is generally explained through a range of contributory antecedents, factors, and explanatory theories. The influence of mental antecedents on an athlete’s performance under pressure has been widely examined through numerous studies. Researchers have only recently begun to investigate the influence of physical effort and associated residual fatigue as a potential contributor to choking in sport. Consequently, the initial aim of the present study was to examine the extent to which both physical exertion and pressure affect free-throw shooting performance. It was hypothesized that the free-throw shooting scores would decline under manipulated conditions. Design and Methods: Using a within-subjects design, 50 student-athletes were assigned to four manipulated conditions: (a) higher pressure-running, (b) higher pressure-no running, (c) lower pressure-running, and (d) lower pressure-no running. The physical exertion was manipulated by including a 56 meter shuttle-run in two of the running conditions. The pressure was manipulated with the presence of an audience, video-recording, performance contingent rewards, and weighting successful shots in the higher pressure conditions. A repeated measure analysis of variance was used to analyse the data. Results: The free-throw performance significantly deteriorated under manipulated physical exertion F (1, 49) = 10.13, p = .003, ηp 2 = .17 and pressure conditions F (1, 49) = 5.25, p = .02, ηp 2 = .09. The lowest free-throw scores were observed in the higher pressure-running condition, whereas the highest free-throw scores were reported in the lower pressure-no running condition. Conclusions: Physical exertion and the associated residual fatigue were contributors to choking. The results of the present study herald a new concept in choking research and yield a practical platform for use by athletes, coaches, and sport psychologists to better manage the psychological and physiological aspects of performance under pressure.

Keywords: anxiety, basketball, choking, fatigue, free-throw shooting, physical exertion

Procedia PDF Downloads 283
24849 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 569
24848 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are being used in various applications such as agriculture, health monitoring, air and water pollution monitoring, traffic monitoring and control and hence, play the vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensors data are significantly important to design an efficient big data framework. Current researches do not focus on aggregating and filtering data at multiple layers of sensor-based big data framework. Thus, this paper introduces (i) three layers data aggregation and framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer at sensors. Simulation results show that the PDDA outperforms existing tree and cluster-based data aggregation scheme in terms of overall network energy consumptions and end-to-end data transmission delay.

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 338
24847 Knowledge of Sexually Transmitted Infections and Socio-Demographic Factors Affecting High Risk Sex among Unmarried Youths in Nigeria

Authors: Obasanjo Afolabi Bolarinwa

Abstract:

This study assesses the levels of knowledge of sexually transmitted infections among unmarried youths in Nigeria; examines the pattern of high risk sex among unmarried youths in Nigeria; investigate the socio-demographic factors (age, place of residence, religion, level of education, wealth index and employment status) affecting the practice of high-risk sexual behaviour and ascertain the relationships between knowledge of sexually transmitted infections and practice of high risk sex. The goal of the study is to identify the factors associated with the practice of high risk sex among youth. These were with a view to identifying critical actions needed to reduce high risk sexual behaviour among youths. The study employed secondary data. The data for the study were extracted from the 2013 Nigeria Demographic and Health Survey (NDHS). The 2013 NDHS collected information from 38,948 Women ages 15-49 years and 17,359 men ages 15-49. A total of 7,744 female and 6,027 male respondents were utilized in the study. In order to adjust for the effect of oversampling of the population, the weighting factor provided by Measure DHS was applied. The data were analysed using frequency distribution and logistic regression. The results show that both male (92.2%) and female (93.6%) have accurate knowledge of sexually transmitted infections. The study also revealed that prevalence of high risk sexual behavior is high among Nigerian youths; this is evident as 77.7% (female) and 78.4% (male) are engaging in high risk sexual behavior. The bivariate analysis shows that age of respondent (χ2=294.2; p < 0.05), religion (χ2=136.64; p < 0.05), wealth index (χ2=17.38; p < 0.05), level of education (χ2=34.73; p < 0.05) and employment status (χ2=94.54; p < 0.05) were individual factors significantly associated with high risk sexual behaviour among male while age of respondent (χ2=327.07; p < 0.05), place of residence (χ2=6.71; p < 0.05), religion (χ2=81.04; p < 0.05), wealth index (χ2=7.41; p < 0.05), level of education (χ2=18.12; p < 0.05) and employment status (χ2=51.02; p < 0.05) were individual factors significantly associated with high risk sexual behaviour among female. Furthermore, the study shows that there is a relationship between knowledge of sexually transmitted infections and high risk sex among male (χ2=38.32; p < 0.05) and female (χ2=18.37; p < 0.05). At multivariate level, the study revealed that individual characteristics such as age, religion, place of residence, wealth index, levels of education and employment status were statistically significantly related with high risk sexual behaviour among male and female (p < 0.05). Lastly, the study shows that knowledge of sexually transmitted infection was significantly related to high risk sexual behaviour among youths (p < 0.05). The study concludes that there is a high level of knowledge of sexually transmitted infections among unmarried youths in Nigeria. The practice of high risk sex is high among unmarried youths but higher among male youths. The prevalence of high risk sexual activity is higher for males when they are at disadvantage and higher for females when they are at advantage. Socio-demographic factors like age of respondents, religion, wealth index, place of residence, employment status and highest level of education are factors influencing high risk sexual behaviour among youths.

Keywords: high risk sex, wealth index, sexual behaviour, knowledge

Procedia PDF Downloads 250
24846 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, it community, industry, big data

Procedia PDF Downloads 188
24845 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 514
24844 A Practical and Efficient Evaluation Function for 3D Model Based Vehicle Matching

Authors: Yuan Zheng

Abstract:

3D model-based vehicle matching provides a new way for vehicle recognition, localization and tracking. Its key is to construct an evaluation function, also called fitness function, to measure the degree of vehicle matching. The existing fitness functions often poorly perform when the clutter and occlusion exist in traffic scenarios. In this paper, we present a practical and efficient fitness function. Unlike the existing evaluation functions, the proposed fitness function is to study the vehicle matching problem from both local and global perspectives, which exploits the pixel gradient information as well as the silhouette information. In view of the discrepancy between 3D vehicle model and real vehicle, a weighting strategy is introduced to differently treat the fitting of the model’s wireframes. Additionally, a normalization operation for the model’s projection is performed to improve the accuracy of the matching. Experimental results on real traffic videos reveal that the proposed fitness function is efficient and robust to the cluttered background and partial occlusion.

Keywords: 3D-2D matching, fitness function, 3D vehicle model, local image gradient, silhouette information

Procedia PDF Downloads 393
24843 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 318
24842 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 458
24841 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 241
24840 Product Features Extraction from Opinions According to Time

Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou

Abstract:

Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.

Keywords: opinion mining, product feature extraction, sentiment analysis, SentiWordNet

Procedia PDF Downloads 403
24839 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 266
24838 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 357
24837 Cluster-Based Exploration of System Readiness Levels: Mathematical Properties of Interfaces

Authors: Justin Fu, Thomas Mazzuchi, Shahram Sarkani

Abstract:

A key factor in technological immaturity in defense weapons acquisition is lack of understanding critical integrations at the subsystem and component level. To address this shortfall, recent research in integration readiness level (IRL) combines with technology readiness level (TRL) to form a system readiness level (SRL). SRL can be enriched with more robust quantitative methods to provide the program manager a useful tool prior to committing to major weapons acquisition programs. This research harnesses previous mathematical models based on graph theory, Petri nets, and tropical algebra and proposes a modification of the desirable SRL mathematical properties such that a tightly integrated (multitude of interfaces) subsystem can display a lower SRL than an inherently less coupled subsystem. The synthesis of these methods informs an improved decision tool for the program manager to commit to expensive technology development. This research ties the separately developed manufacturing readiness level (MRL) into the network representation of the system and addresses shortfalls in previous frameworks, including the lack of integration weighting and the over-importance of a single extremely immature component. Tropical algebra (based on the minimum of a set of TRLs or IRLs) allows one low IRL or TRL value to diminish the SRL of the entire system, which may not be reflective of actuality if that component is not critical or tightly coupled. Integration connections can be weighted according to importance and readiness levels are modified to be a cardinal scale (based on an analytic hierarchy process). Integration arcs’ importance are dependent on the connected nodes and the additional integrations arcs connected to those nodes. Lack of integration is not represented by zero, but by a perfect integration maturity value. Naturally, the importance (or weight) of such an arc would be zero. To further explore the impact of grouping subsystems, a multi-objective genetic algorithm is then used to find various clusters or communities that can be optimized for the most representative subsystem SRL. This novel calculation is then benchmarked through simulation and using past defense acquisition program data, focusing on the newly introduced Middle Tier of Acquisition (rapidly field prototypes). The model remains a relatively simple, accessible tool, but at higher fidelity and validated with past data for the program manager to decide major defense acquisition program milestones.

Keywords: readiness, maturity, system, integration

Procedia PDF Downloads 88
24836 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 79
24835 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 330
24834 Integrated Grey Rational Analysis-Standard Deviation Method for Handover in Heterogeneous Networks

Authors: Mohanad Alhabo, Naveed Nawaz, Mahmoud Al-Faris

Abstract:

The dense deployment of small cells is a promising solution to enhance the coverage and capacity of the heterogeneous networks (HetNets). However, the unplanned deployment could bring new challenges to the network ranging from interference, unnecessary handovers and handover failures. This will cause a degradation in the quality of service (QoS) delivered to the end user. In this paper, we propose an integrated Grey Rational Analysis Standard Deviation based handover method (GRA-SD) for HetNet. The proposed method integrates the Standard Deviation (SD) technique to acquire the weight of the handover metrics and the GRA method to select the best handover base station. The performance of the GRA-SD method is evaluated and compared with the traditional Multiple Attribute Decision Making (MADM) methods including Simple Additive Weighting (SAW) and VIKOR methods. Results reveal that the proposed method has outperformed the other methods in terms of minimizing the number of frequent unnecessary handovers and handover failures, in addition to improving the energy efficiency.

Keywords: energy efficiency, handover, HetNets, MADM, small cells

Procedia PDF Downloads 112
24833 Bounds on the Laplacian Vertex PI Energy

Authors: Ezgi Kaya, A. Dilek Maden

Abstract:

A topological index is a number related to graph which is invariant under graph isomorphism. In theoretical chemistry, molecular structure descriptors (also called topological indices) are used for modeling physicochemical, pharmacologic, toxicologic, biological and other properties of chemical compounds. Let G be a graph with n vertices and m edges. For a given edge uv, the quantity nu(e) denotes the number of vertices closer to u than v, the quantity nv(e) is defined analogously. The vertex PI index defined as the sum of the nu(e) and nv(e). Here the sum is taken over all edges of G. The energy of a graph is defined as the sum of the eigenvalues of adjacency matrix of G and the Laplacian energy of a graph is defined as the sum of the absolute value of difference of laplacian eigenvalues and average degree of G. In theoretical chemistry, the π-electron energy of a conjugated carbon molecule, computed using the Hückel theory, coincides with the energy. Hence results on graph energy assume special significance. The Laplacian matrix of a graph G weighted by the vertex PI weighting is the Laplacian vertex PI matrix and the Laplacian vertex PI eigenvalues of a connected graph G are the eigenvalues of its Laplacian vertex PI matrix. In this study, Laplacian vertex PI energy of a graph is defined of G. We also give some bounds for the Laplacian vertex PI energy of graphs in terms of vertex PI index, the sum of the squares of entries in the Laplacian vertex PI matrix and the absolute value of the determinant of the Laplacian vertex PI matrix.

Keywords: energy, Laplacian energy, laplacian vertex PI eigenvalues, Laplacian vertex PI energy, vertex PI index

Procedia PDF Downloads 239
24832 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 370
24831 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 83
24830 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 429
24829 Health and Climate Changes: "Ippocrate" a New Alert System to Monitor and Identify High Risk

Authors: A. Calabrese, V. F. Uricchio, D. di Noia, S. Favale, C. Caiati, G. P. Maggi, G. Donvito, D. Diacono, S. Tangaro, A. Italiano, E. Riezzo, M. Zippitelli, M. Toriello, E. Celiberti, D. Festa, A. Colaianni

Abstract:

Climate change has a severe impact on human health. There is a vast literature demonstrating temperature increase is causally related to cardiovascular problem and represents a high risk for human health, but there are not study that improve a solution. In this work, it is studied how the clime influenced the human parameter through the analysis of climatic conditions in an area of the Apulia Region: Capurso Municipality. At the same time, medical personnel involved identified a set of variables useful to define an index describing health condition. These scientific studies are the base of an innovative alert system, IPPOCRATE, whose aim is to asses climate risk and share information to population at risk to support prevention and mitigation actions. IPPOCRATE is an e-health system, it is designed to provide technological support to analysis of health risk related to climate and provide tools for prevention and management of critical events. It is the first integrated system of prevention of human risk caused by climate change. IPPOCRATE calculates risk weighting meteorological data with the vulnerability of monitored subjects and uses mobile and cloud technologies to acquire and share information on different data channels. It is composed of four components: Multichannel Hub. Multichannel Hub is the ICT infrastructure used to feed IPPOCRATE cloud with a different type of data coming from remote monitoring devices, or imported from meteorological databases. Such data are ingested, transformed and elaborated in order to be dispatched towards mobile app and VoIP phone systems. IPPOCRATE Multichannel Hub uses open communication protocols to create a set of APIs useful to interface IPPOCRATE with 3rd party applications. Internally, it uses non-relational paradigm to create flexible and highly scalable database. WeHeart and Smart Application The wearable device WeHeart is equipped with sensors designed to measure following biometric variables: heart rate, systolic blood pressure and diastolic blood pressure, blood oxygen saturation, body temperature and blood glucose for diabetic subjects. WeHeart is designed to be easy of use and non-invasive. For data acquisition, users need only to wear it and connect it to Smart Application by Bluetooth protocol. Easy Box was designed to take advantage from new technologies related to e-health care. EasyBox allows user to fully exploit all IPPOCRATE features. Its name, Easy Box, reveals its purpose of container for various devices that may be included depending on user needs. Territorial Registry is the IPPOCRATE web module reserved to medical personnel for monitoring, research and analysis activities. Territorial Registry allows to access to all information gathered by IPPOCRATE using GIS system in order to execute spatial analysis combining geographical data (climatological information and monitored data) with information regarding the clinical history of users and their personal details. Territorial Registry was designed for different type of users: control rooms managed by wide area health facilities, single health care center or single doctor. Territorial registry manages such hierarchy diversifying the access to system functionalities. IPPOCRATE is the first e-Health system focused on climate risk prevention.

Keywords: climate change, health risk, new technological system

Procedia PDF Downloads 866
24828 The Role Of Data Gathering In NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering is affecting NGOs world-wide in general to have good data information about educational and health related issues among communities in any country and around the world. For example, HIV/AIDS smoking (Tuberculosis diseases) and COVID-19 virus carriers is becoming a serious public health problem, especially among old men and women. But there is no full details data survey assessment from communities, villages, and rural area in some countries to show the percentage of victims and patients, especial with this world COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities in getting good information about data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 176
24827 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 848
24826 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 78
24825 Environmental and Socioeconomic Determinants of Climate Change Resilience in Rural Nigeria: Empirical Evidence towards Resilience Building

Authors: Ignatius Madu

Abstract:

The study aims at assessing the environmental and socioeconomic determinants of climate change resilience in rural Nigeria. This is necessary because researches and development efforts on building climate change resilience of rural areas in developing countries are usually made without the knowledge of the impacts of the inherent rural characteristics that determine resilient capacities of the households. This has, in many cases, led to costly mistakes, delayed responses, inaccurate outcomes, and other difficulties. Consequently, this assessment becomes crucial not only to policymakers and people living in risk-prone environments in rural areas but also to fill the research gap. To achieve the aim, secondary data were obtained from the Annual Abstract of Statistics 2017, LSMS-Integrated Surveys on Agriculture and General Household Survey Panel 2015/2016, and National Agriculture Sample Survey (NASS), 2010/2011.Resilience was calculated by weighting and adding the adaptive, absorptive and anticipatory measures of households variables aggregated at state levels and then regressed against rural environmental and socioeconomic characteristics influencing it. From the regression, the coefficients of the variables were used to compute the impacts of the variables using the Stochastic Regression of Impacts on Population, Affluence and Technology (STIRPAT) Model. The results showed that the northern States are generally low in resilient indices and are impacted less by the development indicators. The major determining factors are percentage of non-poor, environmental protection, road transport development, landholding, agricultural input, population density, dependency ratio (inverse), household asserts, education and maternal care. The paper concludes that any effort to a successful resilient building in rural areas of the country should first address these key factors that enhance rural development and wellbeing since it is better to take action before shocks take place.

Keywords: climate change resilience; spatial impacts; STIRPAT model; Nigeria

Procedia PDF Downloads 145
24824 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 102
24823 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 173