Search results for: data filtering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24551

Search results for: data filtering

24281 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 342
24280 Particle Filter Implementation of a Non-Linear Dynamic Fall Model

Authors: T. Kobayashi, K. Shiba, T. Kaburagi, Y. Kurihara

Abstract:

For the elderly living alone, falls can be a serious problem encountered in daily life. Some elderly people are unable to stand up without the assistance of a caregiver. They may become unconscious after a fall, which can lead to serious aftereffects such as hypothermia, dehydration, and sometimes even death. We treat the subject as an inverted pendulum and model its angle from the equilibrium position and its angular velocity. As the model is non-linear, we implement the filtering method with a particle filter which can estimate true states of the non-linear model. In order to evaluate the accuracy of the particle filter estimation results, we calculate the root mean square error (RMSE) between the estimated angle/angular velocity and the true values generated by the simulation. The experimental results give the highest accuracy RMSE of 0.0141 rad and 0.1311 rad/s for the angle and angular velocity, respectively.

Keywords: fall, microwave Doppler sensor, non-linear dynamics model, particle filter

Procedia PDF Downloads 191
24279 Access to Health Data in Medical Records in Indonesia in Terms of Personal Data Protection Principles: The Limitation and Its Implication

Authors: Anny Retnowati, Elisabeth Sundari

Abstract:

This research aims to elaborate the meaning of personal data protection principles on patient access to health data in medical records in Indonesia and its implications. The method uses normative legal research by examining health law in Indonesia regarding the patient's right to access their health data in medical records. The data will be analysed qualitatively using the interpretation method to elaborate on the limitation of the meaning of personal data protection principles on patients' access to their data in medical records. The results show that patients only have the right to obtain copies of their health data in medical records. There is no right to inspect directly at any time. Indonesian health law limits the principle of patients' right to broad access to their health data in medical records. This restriction has implications for the reduction of personal data protection as part of human rights. This research contribute to show that a limitaion of personal data protection may abuse the human rights.

Keywords: access, health data, medical records, personal data, protection

Procedia PDF Downloads 65
24278 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: data management, digitization, industry 4.0, knowledge engineering, metamodel

Procedia PDF Downloads 335
24277 Emphasis on Difference: Ethnic and National Cultural Heritage Identities and Issues in East Asia Focusing on Korea Cases

Authors: Hyuk-Jin Lee

Abstract:

Even though 23 years have passed in the 21st century, nation-state and nationality-centered cultural identities are still the sentiments and ideologies that dominate the world. Nevertheless, as seen in many cases in Europe, a new perspective is needed to recognize mutual exchanges and influences and to view them as natural cultural exchanges between countries. The situation in East Asia is completely different from Europe. This is presumed to be from the long tradition of having an ethnocentric state concept for at least hundreds of years, quite different from Europe, where the concept of a nation-state was established relatively recently. In other words, unlike Europe, where active exchanges took place, the problem stems from the unique characteristics of East Asia, which has a strong tradition of finding its identity in 'difference'. Thus, it would not be hard to find cultural studies or news of the three East Asian countries emphasizing differences among one another. This applies to all cultural areas, including traditional architecture. For example, in the Korean traditional architecture field, buildings with effects from neighboring countries tend to be ignored, even if they are traditional Korean architecture. In addition to this, in the case of Korea, there seems to be one more cultural harmful aftereffect caused by the 36 years of Japanese colonial rule in the early 20th century; the obsessive filtering concept of 'it must be different from Japan'. In other words, the implicit ideological coercion that the definition of 'Korean cultural heritage' should not be influenced by exchanges with Japan may be found throughout Korean studies. The architectural and cultural aspects of the vast period of time, from the Three Kingdoms era to the beginning of Joseon, which was a period in which cultural influence exchanges with neighboring countries were relatively strong compared to the late Joseon Dynasty, also reflect the 'distorted filtering' caused by finding a repulsive identity against the Japanese colonial period. It is important to look the cultural heritage and traditions as they are inductively, not deductively. If not, we may often ignore or limit our own precious cultural heritage. Conversely, If Baekje, the ancient Korean Kingdom, helped Japan in construction and craftsmen played a big role in building the ancient temple, it would be a healthier perspective to view it as a cultural exchange rather than proudly seeing it as a cultural owner's perspective because this point of view is a proper reconstruction of our ancient and medieval Asian culture (strictly speaking, the color common to East Asia at the time). In particular, this study will examine this topic by giving specific examples from each field of Korean cultural studies. In the search for cultural identity, it would be more helpful for healthy relations between countries and collaborative research in the sensitive part of the interpretation of historical facts as well as cultural circles to minimize excessive meanings on originality and difference.

Keywords: cultural heritage identity, cultural ideology, East Asia, Korea

Procedia PDF Downloads 57
24276 Robust Inference with a Skew T Distribution

Authors: M. Qamarul Islam, Ergun Dogan, Mehmet Yazici

Abstract:

There is a growing body of evidence that non-normal data is more prevalent in nature than the normal one. Examples can be quoted from, but not restricted to, the areas of Economics, Finance and Actuarial Science. The non-normality considered here is expressed in terms of fat-tailedness and asymmetry of the relevant distribution. In this study a skew t distribution that can be used to model a data that exhibit inherent non-normal behavior is considered. This distribution has tails fatter than a normal distribution and it also exhibits skewness. Although maximum likelihood estimates can be obtained by solving iteratively the likelihood equations that are non-linear in form, this can be problematic in terms of convergence and in many other respects as well. Therefore, it is preferred to use the method of modified maximum likelihood in which the likelihood estimates are derived by expressing the intractable non-linear likelihood equations in terms of standardized ordered variates and replacing the intractable terms by their linear approximations obtained from the first two terms of a Taylor series expansion about the quantiles of the distribution. These estimates, called modified maximum likelihood estimates, are obtained in closed form. Hence, they are easy to compute and to manipulate analytically. In fact the modified maximum likelihood estimates are equivalent to maximum likelihood estimates, asymptotically. Even in small samples the modified maximum likelihood estimates are found to be approximately the same as maximum likelihood estimates that are obtained iteratively. It is shown in this study that the modified maximum likelihood estimates are not only unbiased but substantially more efficient than the commonly used moment estimates or the least square estimates that are known to be biased and inefficient in such cases. Furthermore, in conventional regression analysis, it is assumed that the error terms are distributed normally and, hence, the well-known least square method is considered to be a suitable and preferred method for making the relevant statistical inferences. However, a number of empirical researches have shown that non-normal errors are more prevalent. Even transforming and/or filtering techniques may not produce normally distributed residuals. Here, a study is done for multiple linear regression models with random error having non-normal pattern. Through an extensive simulation it is shown that the modified maximum likelihood estimates of regression parameters are plausibly robust to the distributional assumptions and to various data anomalies as compared to the widely used least square estimates. Relevant tests of hypothesis are developed and are explored for desirable properties in terms of their size and power. The tests based upon modified maximum likelihood estimates are found to be substantially more powerful than the tests based upon least square estimates. Several examples are provided from the areas of Economics and Finance where such distributions are interpretable in terms of efficient market hypothesis with respect to asset pricing, portfolio selection, risk measurement and capital allocation, etc.

Keywords: least square estimates, linear regression, maximum likelihood estimates, modified maximum likelihood method, non-normality, robustness

Procedia PDF Downloads 385
24275 Analysis and Forecasting of Bitcoin Price Using Exogenous Data

Authors: J-C. Leneveu, A. Chereau, L. Mansart, T. Mesbah, M. Wyka

Abstract:

Extracting and interpreting information from Big Data represent a stake for years to come in several sectors such as finance. Currently, numerous methods are used (such as Technical Analysis) to try to understand and to anticipate market behavior, with mixed results because it still seems impossible to exactly predict a financial trend. The increase of available data on Internet and their diversity represent a great opportunity for the financial world. Indeed, it is possible, along with these standard financial data, to focus on exogenous data to take into account more macroeconomic factors. Coupling the interpretation of these data with standard methods could allow obtaining more precise trend predictions. In this paper, in order to observe the influence of exogenous data price independent of other usual effects occurring in classical markets, behaviors of Bitcoin users are introduced in a model reconstituting Bitcoin value, which is elaborated and tested for prediction purposes.

Keywords: big data, bitcoin, data mining, social network, financial trends, exogenous data, global economy, behavioral finance

Procedia PDF Downloads 339
24274 Adaptive Online Object Tracking via Positive and Negative Models Matching

Authors: Shaomei Li, Yawen Wang, Chao Gao

Abstract:

To improve tracking drift which often occurs in adaptive tracking, an algorithm based on the fusion of tracking and detection is proposed in this paper. Firstly, object tracking is posed as a binary classification problem and is modeled by partial least squares (PLS) analysis. Secondly, tracking object frame by frame via particle filtering. Thirdly, validating the tracking reliability based on both positive and negative models matching. Finally, relocating the object based on SIFT features matching and voting when drift occurs. Object appearance model is updated at the same time. The algorithm cannot only sense tracking drift but also relocate the object whenever needed. Experimental results demonstrate that this algorithm outperforms state-of-the-art algorithms on many challenging sequences.

Keywords: object tracking, tracking drift, partial least squares analysis, positive and negative models matching

Procedia PDF Downloads 501
24273 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment: A Practical Example

Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh

Abstract:

With increasingly more mobile health applications appearing due to the popularity of smartphones, the possibility arises that these data can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as of yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper, we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.

Keywords: mobile health, data integration, expert systems, disease-related malnutrition

Procedia PDF Downloads 461
24272 The Prospects of Leveraging (Big) Data for Accelerating a Just Sustainable Transition around Different Contexts

Authors: Sombol Mokhles

Abstract:

This paper tries to show the prospects of utilising (big)data for enabling just the transition of diverse cities. Our key purpose is to offer a framework of applications and implications of utlising (big) data in comparing sustainability transitions across different cities. Relying on the cosmopolitan comparison, this paper explains the potential application of (big) data but also its limitations. The paper calls for adopting a data-driven and just perspective in including different cities around the world. Having a just and inclusive approach at the front and centre ensures a just transition with synergistic effects that leave nobody behind.

Keywords: big data, just sustainable transition, cosmopolitan city comparison, cities

Procedia PDF Downloads 83
24271 Strategic Workplace Security: The Role of Malware and the Threat of Internal Vulnerability

Authors: Modesta E. Ezema, Christopher C. Ezema, Christian C. Ugwu, Udoka F. Eze, Florence M. Babalola

Abstract:

Some employees knowingly or unknowingly contribute to loss of data and also expose data to threat in the process of getting their jobs done. Many organizations today are faced with the challenges of how to secure their data as cyber criminals constantly devise new ways of attacking the organization’s secret data. However, this paper enlists the latest strategies that must be put in place in order to protect these important data from being attacked in a collaborative work place. It also introduces us to Advanced Persistent Threats (APTs) and how it works. The empirical study was conducted to collect data from the employee in data centers on how data could be protected from malicious codes and cyber criminals and their responses are highly considered to help checkmate the activities of malicious code and cyber criminals in our work places.

Keywords: data, employee, malware, work place

Procedia PDF Downloads 362
24270 Acceptance of Big Data Technologies and Its Influence towards Employee’s Perception on Job Performance

Authors: Jia Yi Yap, Angela S. H. Lee

Abstract:

With the use of big data technologies, organization can get result that they are interested in. Big data technologies simply load all the data that is useful for the organizations and provide organizations a better way of analysing data. The purpose of this research is to get employees’ opinion from films in Malaysia to explore the use of big data technologies in their organization in order to provide how it may affect the perception of the employees on job performance. Therefore, in order to identify will accepting big data technologies in the organization affect the perception of the employee, questionnaire will be distributed to different employee from different Small and medium-sized enterprises (SME) organization listed in Malaysia. The conceptual model proposed will test with other variables in order to see the relationship between variables.

Keywords: big data technologies, employee, job performance, questionnaire

Procedia PDF Downloads 275
24269 Data Poisoning Attacks on Federated Learning and Preventive Measures

Authors: Beulah Rani Inbanathan

Abstract:

In the present era, it is vivid from the numerous outcomes that data privacy is being compromised in various ways. Machine learning is one technology that uses the centralized server, and then data is given as input which is being analyzed by the algorithms present on this mentioned server, and hence outputs are predicted. However, each time the data must be sent by the user as the algorithm will analyze the input data in order to predict the output, which is prone to threats. The solution to overcome this issue is federated learning, where the models alone get updated while the data resides on the local machine and does not get exchanged with the other local models. Nevertheless, even on these local models, there are chances of data poisoning, and it is crystal clear from various experiments done by many people. This paper delves into many ways where data poisoning occurs and the many methods through which it is prevalent that data poisoning still exists. It includes the poisoning attacks on IoT devices, Edge devices, Autoregressive model, and also, on Industrial IoT systems and also, few points on how these could be evadible in order to protect our data which is personal, or sensitive, or harmful when exposed.

Keywords: data poisoning, federated learning, Internet of Things, edge computing

Procedia PDF Downloads 69
24268 Performance Analysis of Shunt Active Power Filter for Various Reference Current Generation Techniques

Authors: Vishal V. Choudhari, Gaurao A. Dongre, S. P. Diwan

Abstract:

A number of reference current generation have been developed for analysis of shunt active power filter to mitigate the load compensation. Depending upon the type of load the technique has to be chosen. In this paper, six reference current generation techniques viz. instantaneous reactive power theory(IRP), Synchronous reference frame theory(SRF), Perfect harmonic cancellation(PHC), Unity power factor method(UPF), Self-tuning filter method(STF), Predictive filtering method(PFM) are compared for different operating conditions. The harmonics are introduced because of non-linear loads in the system. These harmonics are eliminated using above techniques. The results and performance of system simulated on MATLAB/Simulink platform. The system is experimentally implemented using DS1104 card of dSPACE system.

Keywords: SAPF, power quality, THD, IRP, SRF, dSPACE module DS1104

Procedia PDF Downloads 564
24267 Artificial Neural Networks Controller for Active Power Filter Connected to a Photovoltaic Array

Authors: Rachid Dehini, Brahim Berbaoui

Abstract:

The main objectives of shunt active power filter (SAPF) is to preserve the power system from unwanted harmonic currents produced by nonlinear loads, as well as to compensate the reactive power. The aim of this paper is to present a (PAPF) supplied by the Photovoltaic cells ,in such a way that the (PAPF) feeds the linear and nonlinear loads by harmonics currents and the excess of the energy is injected into the power system. In order to improve the performances of conventional (PAPF) This paper also proposes artificial neural networks (ANN) for harmonics identification and DC link voltage control. The simulation study results of the new (SAPF) identification technique are found quite satisfactory by assuring good filtering characteristics and high system stability.

Keywords: SAPF, harmonics current, photovoltaic cells, MPPT, artificial neural networks (ANN)

Procedia PDF Downloads 313
24266 Simulation and Hardware Implementation of Data Communication Between CAN Controllers for Automotive Applications

Authors: R. M. Kalayappan, N. Kathiravan

Abstract:

In automobile industries, Controller Area Network (CAN) is widely used to reduce the system complexity and inter-task communication. Therefore, this paper proposes the hardware implementation of data frame communication between one controller to other. The CAN data frames and protocols will be explained deeply, here. The data frames are transferred without any collision or corruption. The simulation is made in the KEIL vision software to display the data transfer between transmitter and receiver in CAN. ARM7 micro-controller is used to transfer data’s between the controllers in real time. Data transfer is verified using the CRO.

Keywords: control area network (CAN), automotive electronic control unit, CAN 2.0, industry

Procedia PDF Downloads 379
24265 Improving the Statistics Nature in Research Information System

Authors: Rajbir Cheema

Abstract:

In order to introduce an integrated research information system, this will provide scientific institutions with the necessary information on research activities and research results in assured quality. Since data collection, duplication, missing values, incorrect formatting, inconsistencies, etc. can arise in the collection of research data in different research information systems, which can have a wide range of negative effects on data quality, the subject of data quality should be treated with better results. This paper examines the data quality problems in research information systems and presents the new techniques that enable organizations to improve their quality of research information.

Keywords: Research information systems (RIS), research information, heterogeneous sources, data quality, data cleansing, science system, standardization

Procedia PDF Downloads 135
24264 Switched Uses of a Bidirectional Microphone as a Microphone and Sensors with High Gain and Wide Frequency Range

Authors: Toru Shionoya, Yosuke Kurihara, Takashi Kaburagi, Kajiro Watanabe

Abstract:

Mass-produced bidirectional microphones have attractive characteristics. They work as a microphone as well as a sensor with high gain over a wide frequency range; they are also highly reliable and economical. We present novel multiple functional uses of the microphones. A mathematical model for explaining the high-pass-filtering characteristics of bidirectional microphones was presented. Based on the model, the characteristics of the microphone were investigated, and a novel use for the microphone as a sensor with a wide frequency range was presented. In this study, applications for using the microphone as a security sensor and a human biosensor were introduced. The mathematical model was validated through experiments, and the feasibility of the abovementioned applications for security monitoring and the biosignal monitoring were examined through experiments.

Keywords: bidirectional microphone, low-frequency, mathematical model, frequency response

Procedia PDF Downloads 518
24263 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research

Authors: Carla Silva

Abstract:

Recent development of information and communication technology enables us to acquire, collect, analyse data in various fields of socioeconomic – technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to effective and efficient usage of massive amount of educational data, in order to support institutions to a strategic planning and investment decision-making. In this article, we will address data from several different perspectives and define the applied data to sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in educational and furthermore in economics. Finally, we highlight a number of challenges and opportunities for future research.

Keywords: data mining, research analysis, investment decision-making, educational research

Procedia PDF Downloads 334
24262 Distribution of Synechococcus and Prochlorococcus in Southeastern Coast of Peninsular Malaysia

Authors: Roswati Md. Amin, Nurul Asmera Mudiman, Muhammad Faisal Abd. Rahman, Md-Suffian Idris, Noor Hazwani Mohd Azmi

Abstract:

Distribution of picophytoplankton from two genera, Synechococcus and Prochlorococcus at the surface water (0.5m) were observed from coastal to offshore area of the southeastern coast of Peninsular Malaysia, for a six day cruise in August 2014 during SouthWest monsoon. The picophytoplankton was divided into two different size fractions (0.7-2.7μm and <0.7 μm) by filtering through GF/D (2.7 μm) and GF/F (0.7 μm) filter papers and counted by using flow cytometer. Synechococcus and Prochlorococcus contribute higher at 0.7-2.7μm size range (ca. 90% and 95%, respectively) compared to <0.7 μm (ca. 10% and 5%, respectively). Synechococcus (>52%) dominated the total picophytoplankton compared to Prochlorococcus (<26%) for both size fractions in southeastern coast of Peninsular Malaysia. Total density (<2.7 μm) of Synechococcus was ranging between 1.72 x104 and 12.57 x104 cells ml-1, while Prochlorococcus varied from 1.50 x104 to 8.62 x104. Both Synechococcus and Prochlorococcus abundance showed a decreasing trend from coastal to offshore.

Keywords: Peninsular Malaysia, prochlorococcus, South China Sea, synechococcus

Procedia PDF Downloads 291
24261 Permeodynamic Particulate Matter Filtration for Improved Air Quality

Authors: Hamad M. Alnagran, Mohammed S. Imbabi

Abstract:

Particulate matter (PM) in the air we breathe is detrimental to health. Overcoming this problem has attracted interest and prompted research on the use of PM filtration in commercial buildings and homes to be carried out. The consensus is that tangible health benefits can result from the use of PM filters in most urban environments, to clean up the building’s fresh air supply and thereby reduce exposure of residents to airborne PM. The authors have investigated and are developing a new large-scale Permeodynamic Filtration Technology (PFT) capable of permanently filtering and removing airborne PMs from outdoor spaces, thus also benefiting internal spaces such as the interiors of buildings. Theoretical models were developed, and laboratory trials carried out to determine, and validate through measurement permeodynamic filtration efficiency and pressure drop as functions of PM particle size distributions. The conclusion is that PFT offers a potentially viable, cost effective end of pipe solution to the problem of airborne PM.

Keywords: air filtration, particulate matter, particle size distribution, permeodynamic

Procedia PDF Downloads 181
24260 A Method of Detecting the Difference in Two States of Brain Using Statistical Analysis of EEG Raw Data

Authors: Digvijaysingh S. Bana, Kiran R. Trivedi

Abstract:

This paper introduces various methods for the alpha wave to detect the difference between two states of brain. One healthy subject participated in the experiment. EEG was measured on the forehead above the eye (FP1 Position) with reference and ground electrode are on the ear clip. The data samples are obtained in the form of EEG raw data. The time duration of reading is of one minute. Various test are being performed on the alpha band EEG raw data.The readings are performed in different time duration of the entire day. The statistical analysis is being carried out on the EEG sample data in the form of various tests.

Keywords: electroencephalogram(EEG), biometrics, authentication, EEG raw data

Procedia PDF Downloads 447
24259 A Study on Big Data Analytics, Applications and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, Healthcare, and business intelligence contain voluminous and incremental data, which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organization's decision-making strategy can be enhanced using big data analytics and applying different machine learning techniques and statistical tools on such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of Analysis using different machine-learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 60
24258 A Study on Big Data Analytics, Applications, and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organisation decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 76
24257 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 409
24256 Framework for Integrating Big Data and Thick Data: Understanding Customers Better

Authors: Nikita Valluri, Vatcharaporn Esichaikul

Abstract:

With the popularity of data-driven decision making on the rise, this study focuses on providing an alternative outlook towards the process of decision-making. Combining quantitative and qualitative methods rooted in the social sciences, an integrated framework is presented with a focus on delivering a much more robust and efficient approach towards the concept of data-driven decision-making with respect to not only Big data but also 'Thick data', a new form of qualitative data. In support of this, an example from the retail sector has been illustrated where the framework is put into action to yield insights and leverage business intelligence. An interpretive approach to analyze findings from both kinds of quantitative and qualitative data has been used to glean insights. Using traditional Point-of-sale data as well as an understanding of customer psychographics and preferences, techniques of data mining along with qualitative methods (such as grounded theory, ethnomethodology, etc.) are applied. This study’s final goal is to establish the framework as a basis for providing a holistic solution encompassing both the Big and Thick aspects of any business need. The proposed framework is a modified enhancement in lieu of traditional data-driven decision-making approach, which is mainly dependent on quantitative data for decision-making.

Keywords: big data, customer behavior, customer experience, data mining, qualitative methods, quantitative methods, thick data

Procedia PDF Downloads 138
24255 Regularized Euler Equations for Incompressible Two-Phase Flow Simulations

Authors: Teng Li, Kamran Mohseni

Abstract:

This paper presents an inviscid regularization technique for the incompressible two-phase flow simulations. This technique is known as observable method due to the understanding of observability that any feature smaller than the actual resolution (physical or numerical), i.e., the size of wire in hotwire anemometry or the grid size in numerical simulations, is not able to be captured or observed. Differ from most regularization techniques that applies on the numerical discretization, the observable method is employed at PDE level during the derivation of equations. Difficulties in the simulation and analysis of realistic fluid flow often result from discontinuities (or near-discontinuities) in the calculated fluid properties or state. Accurately capturing these discontinuities is especially crucial when simulating flows involving shocks, turbulence or sharp interfaces. Over the past several years, the properties of this new regularization technique have been investigated that show the capability of simultaneously regularizing shocks and turbulence. The observable method has been performed on the direct numerical simulations of shocks and turbulence where the discontinuities are successfully regularized and flow features are well captured. In the current paper, the observable method will be extended to two-phase interfacial flows. Multiphase flows share the similar features with shocks and turbulence that is the nonlinear irregularity caused by the nonlinear terms in the governing equations, namely, Euler equations. In the direct numerical simulation of two-phase flows, the interfaces are usually treated as the smooth transition of the properties from one fluid phase to the other. However, in high Reynolds number or low viscosity flows, the nonlinear terms will generate smaller scales which will sharpen the interface, causing discontinuities. Many numerical methods for two-phase flows fail at high Reynolds number case while some others depend on the numerical diffusion from spatial discretization. The observable method regularizes this nonlinear mechanism by filtering the convective terms and this process is inviscid. The filtering effect is controlled by an observable scale which is usually about a grid length. Single rising bubble and Rayleigh-Taylor instability are studied, in particular, to examine the performance of the observable method. A pseudo-spectral method is used for spatial discretization which will not introduce numerical diffusion, and a Total Variation Diminishing (TVD) Runge Kutta method is applied for time integration. The observable incompressible Euler equations are solved for these two problems. In rising bubble problem, the terminal velocity and shape of the bubble are particularly examined and compared with experiments and other numerical results. In the Rayleigh-Taylor instability, the shape of the interface are studied for different observable scale and the spike and bubble velocities, as well as positions (under a proper observable scale), are compared with other simulation results. The results indicate that this regularization technique can potentially regularize the sharp interface in the two-phase flow simulations

Keywords: Euler equations, incompressible flow simulation, inviscid regularization technique, two-phase flow

Procedia PDF Downloads 478
24254 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.

Keywords: text mining, topic extraction, independent, incremental, independent component analysis

Procedia PDF Downloads 287
24253 Open Data for e-Governance: Case Study of Bangladesh

Authors: Sami Kabir, Sadek Hossain Khoka

Abstract:

Open Government Data (OGD) refers to all data produced by government which are accessible in reusable way by common people with access to Internet and at free of cost. In line with “Digital Bangladesh” vision of Bangladesh government, the concept of open data has been gaining momentum in the country. Opening all government data in digital and customizable format from single platform can enhance e-governance which will make government more transparent to the people. This paper presents a well-in-progress case study on OGD portal by Bangladesh Government in order to link decentralized data. The initiative is intended to facilitate e-service towards citizens through this one-stop web portal. The paper further discusses ways of collecting data in digital format from relevant agencies with a view to making it publicly available through this single point of access. Further, possible layout of this web portal is presented.

Keywords: e-governance, one-stop web portal, open government data, reusable data, web of data

Procedia PDF Downloads 330
24252 Comparative Analysis of Two Approaches to Joint Signal Detection, ToA and AoA Estimation in Multi-Element Antenna Arrays

Authors: Olesya Bolkhovskaya, Alexey Davydov, Alexander Maltsev

Abstract:

In this paper two approaches to joint signal detection, time of arrival (ToA) and angle of arrival (AoA) estimation in multi-element antenna array are investigated. Two scenarios were considered: first one, when the waveform of the useful signal is known a priori and, second one, when the waveform of the desired signal is unknown. For first scenario, the antenna array signal processing based on multi-element matched filtering (MF) with the following non-coherent detection scheme and maximum likelihood (ML) parameter estimation blocks is exploited. For second scenario, the signal processing based on the antenna array elements covariance matrix estimation with the following eigenvector analysis and ML parameter estimation blocks is applied. The performance characteristics of both signal processing schemes are thoroughly investigated and compared for different useful signals and noise parameters.

Keywords: antenna array, signal detection, ToA, AoA estimation

Procedia PDF Downloads 472