Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25921

Search results for: wind data

25111 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Abstract:

Coastal regions are among the most intensively used places, shaped by both the natural balance and a growing population. In coastal engineering, the most valuable data describe wave behavior. The volume of these data becomes very large because observations run for hours, days, and months. In this study, statistical methods such as wave spectrum analysis and standard statistical methods were used. The goal is to discover profiles of different coastal areas with these statistical methods and thereby derive an instance-based data set from the big data for analysis with data mining algorithms. In the experimental studies, six sample data sets on wave behavior, obtained from 20-minute observations in Mersin Bay, Turkey, were converted to an instance-based form, and different clustering techniques were used to discover similar coastal places. The study also discusses how this summarization approach can be used in other fields that collect big data, such as medicine.
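
The summarization step described above can be illustrated with a minimal sketch. It assumes each observation record is a plain list of individual wave heights and reduces it to one instance (feature row). The feature set, the function name `summarize_record`, and the sample values are illustrative assumptions, not the features used in the study.

```python
import statistics

def summarize_record(wave_heights):
    """Collapse one observation record into a single instance (feature row)."""
    ordered = sorted(wave_heights, reverse=True)
    # Highest one-third of the waves; at least one wave for very short records.
    top_third = ordered[: max(1, len(ordered) // 3)]
    return {
        "mean_height": statistics.fmean(wave_heights),
        "std_height": statistics.pstdev(wave_heights),
        "max_height": ordered[0],
        # Significant wave height: mean of the highest one-third of waves.
        "significant_height": statistics.fmean(top_third),
    }

# Hypothetical wave heights in metres for two short records.
records = {
    "site_a": [0.4, 0.6, 0.5, 0.9, 0.7, 0.3],
    "site_b": [1.2, 1.5, 0.9, 1.8, 1.1, 1.4],
}
instances = {site: summarize_record(heights) for site, heights in records.items()}
```

Each row in `instances` could then be passed to a clustering algorithm in place of the raw time series.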

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 360

25110 Access to Health Data in Medical Records in Indonesia in Terms of Personal Data Protection Principles: The Limitation and Its Implication

Authors: Anny Retnowati, Elisabeth Sundari

Abstract:

This research aims to elaborate the meaning of personal data protection principles on patient access to health data in medical records in Indonesia and its implications. The method uses normative legal research by examining health law in Indonesia regarding the patient's right to access their health data in medical records. The data will be analysed qualitatively using the interpretation method to elaborate on the limitation of the meaning of personal data protection principles on patients' access to their data in medical records. The results show that patients only have the right to obtain copies of their health data in medical records. There is no right to inspect the records directly at any time. Indonesian health law limits the principle of patients' right to broad access to their health data in medical records. This restriction has implications for the reduction of personal data protection as part of human rights. This research contributes by showing that a limitation of personal data protection may infringe human rights.

Keywords: access, health data, medical records, personal data, protection

Procedia PDF Downloads 90

25109 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: data management, digitization, industry 4.0, knowledge engineering, metamodel

Procedia PDF Downloads 355

25108 Analysis and Forecasting of Bitcoin Price Using Exogenous Data

Authors: J-C. Leneveu, A. Chereau, L. Mansart, T. Mesbah, M. Wyka

Abstract:

Extracting and interpreting information from Big Data will be a major challenge for years to come in several sectors, such as finance. Currently, numerous methods (such as Technical Analysis) are used to try to understand and anticipate market behavior, with mixed results, because it still seems impossible to predict a financial trend exactly. The increase in data available on the Internet and their diversity represent a great opportunity for the financial world. Indeed, it is possible, along with standard financial data, to focus on exogenous data in order to take more macroeconomic factors into account. Coupling the interpretation of these data with standard methods could yield more precise trend predictions. In this paper, in order to observe the influence of exogenous data on price independently of the other usual effects occurring in classical markets, the behaviors of Bitcoin users are introduced in a model reconstituting Bitcoin value, which is elaborated and tested for prediction purposes.

Keywords: big data, bitcoin, data mining, social network, financial trends, exogenous data, global economy, behavioral finance

Procedia PDF Downloads 354

25107 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment: A Practical Example

Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh

Abstract:

With more and more mobile health applications appearing due to the popularity of smartphones, the possibility arises that the data they generate can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper, we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.

Keywords: mobile health, data integration, expert systems, disease-related malnutrition

Procedia PDF Downloads 476

25106 The Prospects of Leveraging (Big) Data for Accelerating a Just Sustainable Transition around Different Contexts

Authors: Sombol Mokhles

Abstract:

This paper tries to show the prospects of utilising (big) data for enabling the just transition of diverse cities. Our key purpose is to offer a framework of applications and implications of utilising (big) data in comparing sustainability transitions across different cities. Relying on the cosmopolitan comparison, this paper explains the potential application of (big) data but also its limitations. The paper calls for adopting a data-driven and just perspective in including different cities around the world. Having a just and inclusive approach at the front and centre ensures a just transition with synergistic effects that leave nobody behind.

Keywords: big data, just sustainable transition, cosmopolitan city comparison, cities

Procedia PDF Downloads 98

25105 Strategic Workplace Security: The Role of Malware and the Threat of Internal Vulnerability

Authors: Modesta E. Ezema, Christopher C. Ezema, Christian C. Ugwu, Udoka F. Eze, Florence M. Babalola

Abstract:

Some employees knowingly or unknowingly contribute to the loss of data and expose data to threats in the process of getting their jobs done. Many organizations today face the challenge of securing their data as cyber criminals constantly devise new ways of attacking the organization’s secret data. This paper lists the latest strategies that must be put in place to protect these important data from attack in a collaborative workplace. It also introduces Advanced Persistent Threats (APTs) and how they work. An empirical study was conducted to collect data from employees in data centers on how data could be protected from malicious code and cyber criminals, and their responses are given strong consideration to help check the activities of malicious code and cyber criminals in our workplaces.

Keywords: data, employee, malware, work place

Procedia PDF Downloads 382

25104 Acceptance of Big Data Technologies and Its Influence towards Employee’s Perception on Job Performance

Authors: Jia Yi Yap, Angela S. H. Lee

Abstract:

With big data technologies, organizations can obtain the results they are interested in. Big data technologies load all the data that are useful to an organization and provide a better way of analysing them. The purpose of this research is to gather the opinions of employees from firms in Malaysia on the use of big data technologies in their organizations, in order to show how it may affect employees’ perception of job performance. To identify whether accepting big data technologies in the organization affects employees’ perception, a questionnaire will be distributed to employees of different small and medium-sized enterprises (SMEs) in Malaysia. The proposed conceptual model will be tested with other variables in order to examine the relationships between variables.

Keywords: big data technologies, employee, job performance, questionnaire

Procedia PDF Downloads 296

25103 Evaluation of the Self-Organizing Map and the Adaptive Neuro-Fuzzy Inference System Machine Learning Techniques for the Estimation of Crop Water Stress Index of Wheat under Varying Application of Irrigation Water Levels for Efficient Irrigation Scheduling

Authors: Aschalew C. Workneh, K. S. Hari Prasad, C. S. P. Ojha

Abstract:

The crop water stress index (CWSI) is a cost-effective, non-destructive, and simple technique for tracking the start of crop water stress. This study investigated the feasibility of CWSI derived from canopy temperature to detect the water status of wheat crops. Artificial intelligence (AI) techniques have become increasingly popular in recent years for determining CWSI. In this study, the performance of two AI techniques, adaptive neuro-fuzzy inference system (ANFIS) and self-organizing maps (SOM), is compared in determining the CWSI of paddy crops. Field experiments were conducted for varying irrigation water applications during two seasons in 2022 and 2023 at the irrigation field laboratory at the Civil Engineering Department, Indian Institute of Technology Roorkee, India. The ANFIS and SOM-simulated CWSI values were compared with the experimentally calculated CWSI (EP-CWSI). Multiple regression analysis was used to determine the upper and lower CWSI baselines. The upper CWSI baseline was found to be a function of crop height and wind speed, while the lower CWSI baseline was a function of crop height, air vapor pressure deficit, and wind speed. The performance of ANFIS and SOM was compared based on mean absolute error (MAE), mean bias error (MBE), root mean squared error (RMSE), index of agreement (d), Nash-Sutcliffe efficiency (NSE), and coefficient of correlation (R²). Both models successfully estimated the CWSI of the paddy crop with higher correlation coefficients and lower statistical errors. However, the ANFIS (R²=0.81, NSE=0.73, d=0.94, RMSE=0.04, MAE=0.00-1.76 and MBE=-2.13-1.32) outperformed the SOM model (R²=0.77, NSE=0.68, d=0.90, RMSE=0.05, MAE=0.00-2.13 and MBE=-2.29-1.45). Overall, the results suggest that ANFIS is a reliable tool for accurately determining CWSI in wheat crops compared to SOM.
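
For readers unfamiliar with the quantities named above, the following sketch shows the commonly used empirical form of CWSI (canopy-air temperature difference scaled between a lower and an upper baseline) together with the listed error metrics in their textbook definitions. The function names and sample numbers are illustrative assumptions; the abstract does not give the study's exact formulas.

```python
import math

def cwsi(canopy_minus_air, lower_baseline, upper_baseline):
    """Empirical CWSI: position of (Tc - Ta) between the two baselines."""
    return (canopy_minus_air - lower_baseline) / (upper_baseline - lower_baseline)

def evaluate(observed, predicted):
    """Compare simulated values with observed ones using standard metrics.

    Assumes the observed values are not all identical (otherwise the
    NSE and d denominators are zero).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    squared_error = sum(e * e for e in errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,
        "RMSE": math.sqrt(squared_error / n),
        # Nash-Sutcliffe efficiency: 1 is a perfect fit.
        "NSE": 1.0 - squared_error / sum((o - mean_obs) ** 2 for o in observed),
        # Willmott's index of agreement: 1 is a perfect fit.
        "d": 1.0 - squared_error / sum(
            (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
            for o, p in zip(observed, predicted)
        ),
    }

# Hypothetical observed and simulated CWSI values.
scores = evaluate([0.2, 0.4, 0.6], [0.3, 0.4, 0.5])
```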

Keywords: adaptive neuro-fuzzy inference system, canopy temperature, crop water stress index, self-organizing map, wheat

Procedia PDF Downloads 53

25102 Understanding Hydrodynamic in Lake Victoria Basin in a Catchment Scale: A Literature Review

Authors: Seema Paul, John Mango Magero, Prosun Bhattacharya, Zahra Kalantari, Steve W. Lyon

Abstract:

The purpose of this review paper is to develop an understanding of lake hydrodynamics and the potential climate impact at the Lake Victoria (LV) catchment scale. This paper briefly discusses the main problems of lake hydrodynamics and their solutions in relation to quality assessment and climate effects. An empirical methodology for modeling and mapping was considered for understanding lake hydrodynamics and for visualizing the results of long-term observational daily, monthly, and yearly mean datasets using geographical information system (GIS) and Comsol techniques. Data were obtained for the whole lake and five different meteorological stations, and several geoprocessing tools with spatial analysis were used to produce results. Linear regression analyses were developed to build climate scenarios and a linear trend on lake rainfall data over a long period. The potential evapotranspiration rate was described using MODIS and the Thornthwaite method. The rainfall effect on lake water level was examined using partial differential equations (PDE), and water quality was characterized by a few nutrient parameters. The study revealed that monthly and yearly rainfall varies with monthly and yearly maximum and minimum temperatures; rainfall is high during cool years, and high temperatures are associated with below-average and average rainfall patterns. Rising temperatures are likely to accelerate evapotranspiration rates, and more evapotranspiration is likely to lead to more rainfall; drought is more strongly correlated with temperature, and cloud with rainfall. There is a trend in lake rainfall, and long-term rainfall on the lake water surface has affected the lake level. Onshore and offshore nutrient concentrations were characterized using initial nutrient data from the literature.
The study recommends that further studies fully consider lake bathymetry development with flow analysis and its water balance, hydro-meteorological processes, solute transport, wind hydrodynamics, pollution, and eutrophication, which are crucial for lake water quality, climate impact assessment, and water sustainability.

Keywords: climograph, climate scenarios, evapotranspiration, linear trend flow, rainfall event on LV, concentration

Procedia PDF Downloads 97

25101 Study on Eco-Feedback of Thermal Comfort and Cost Efficiency for Low Energy Residence

Authors: Y. Jin, N. Zhang, X. Luo, W. Zhang

Abstract:

China's urban residential floor area grows by 0.5-0.6 billion square meters annually, which has brought enormous energy consumption by HVAC facilities and other appliances. In this regard, governments and researchers are encouraging the use of renewable energy, such as solar and geothermal energy, in houses. However, the high cost of equipment and low energy conversion result in very low acceptance among residents. So what is the equilibrium point of eco-feedback at which both economic benefit and thermal comfort are reached? That is the main question to be answered. In this paper, the object of study is a house with on-site solar PV and heater, which has been evaluated as a low energy building. Since the HVAC system is considered the main energy-consuming equipment, the residence was fitted with a 24-hour monitoring system to measure temperature, wind velocity, and energy in-out values with no HVAC system running for one month each of summer and winter. The thermal comfort time period will be analyzed and confirmed; then the air-conditioner will be run during the thermally uncomfortable periods for the following summer and winter month. The same data will be recorded to calculate the average monthly energy consumption needed for whole-day thermal comfort. Finally, two analyses will be done: 1) the building thermal simulation performed by computer at the design stage will be compared with the temperatures actually measured after construction; 2) the cost of renewable energy facilities and power consumption will be converted to a cost-efficiency rate to assess the feasibility of renewable energy input for the residence. The results of the experiment showed that a certain deviation exists between the measured data and the simulated data for human thermal comfort, especially in the summer period. Moreover, the cost-effectiveness is high for a house in the target city of Guilin, with a cost-recovery period of at least 11 years.
The conclusion shows that the eco-feedback of a low energy residence is never a matter of its net energy value alone; cost efficiency is also the critical factor in making renewable energy acceptable to the public.

Keywords: cost efficiency, eco-feedback, low energy residence, thermal comfort

Procedia PDF Downloads 255

25100 Data Poisoning Attacks on Federated Learning and Preventive Measures

Authors: Beulah Rani Inbanathan

Abstract:

In the present era, numerous incidents make it evident that data privacy is being compromised in various ways. Machine learning typically relies on a centralized server: data are given as input, analyzed by the algorithms on that server, and outputs are predicted. However, the user must send the data each time so that the algorithm can analyze them to predict the output, which makes the data prone to threats. Federated learning addresses this issue: only the models are updated, while the data reside on the local machine and are not exchanged with the other local models. Nevertheless, even these local models are exposed to data poisoning, as various experiments have made clear. This paper examines the many ways in which data poisoning occurs and the evidence that it remains prevalent. It covers poisoning attacks on IoT devices, edge devices, autoregressive models, and industrial IoT systems, and offers a few points on how these attacks could be avoided in order to protect data that are personal, sensitive, or harmful when exposed.
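
To make the threat concrete, the sketch below aggregates model updates from several clients in two ways: a plain average (the basic idea behind federated averaging) and a coordinate-wise median, one well-known robust alternative. It is an illustration only, not a method taken from the paper. The update vectors are invented, and real systems aggregate full model weights rather than two numbers.

```python
import statistics

def federated_mean(updates):
    """Average each model coordinate across all client updates."""
    return [statistics.fmean(column) for column in zip(*updates)]

def federated_median(updates):
    """Take the median of each coordinate; less sensitive to outliers."""
    return [statistics.median(column) for column in zip(*updates)]

# Three honest clients and one poisoned client (hypothetical values).
client_updates = [
    [1.0, 1.0],
    [1.2, 0.8],
    [0.9, 1.1],
    [100.0, -100.0],  # poisoned update
]

mean_model = federated_mean(client_updates)      # dragged far from the honest values
median_model = federated_median(client_updates)  # stays close to the honest values
```

The raw data never leave the clients in either case; only the update vectors are shared, which is why poisoning has to be handled at the aggregation step.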

Keywords: data poisoning, federated learning, Internet of Things, edge computing

Procedia PDF Downloads 85

25099 Rain Gauges Network Optimization in Southern Peninsular Malaysia

Authors: Mohd Khairul Bazli Mohd Aziz, Fadhilah Yusof, Zulkifli Yusop, Zalina Mohd Daud, Mohammad Afif Kasno

Abstract:

Recently developed rainfall network design techniques have been discussed and compared by many researchers worldwide due to the demand for higher levels of accuracy from collected data. In many studies, rain-gauge networks are designed to provide good estimation for areal rainfall and for flood modelling and prediction. One study showed that, even with lumped models for flood forecasting, a proper gauge network can significantly improve the results. Therefore, the existing rainfall network in Johor must be optimized and redesigned in order to meet the required level of accuracy preset by rainfall data users. The well-known geostatistics method (variance-reduction method) combined with simulated annealing was used as the optimization algorithm in this study to obtain the optimal number and locations of the rain gauges. Rain gauge network structure is not only dependent on the station density; station location also plays an important role in determining whether information is acquired accurately. The existing network of 84 rain gauges in Johor is optimized and redesigned by using rainfall, humidity, solar radiation, temperature and wind speed data during monsoon season (November – February) for the period of 1975 – 2008. Three different semivariogram models, Spherical, Gaussian and Exponential, were used and their performances were also compared in this study. A cross-validation technique was applied to compute the errors, and the result showed that the exponential model is the best semivariogram. It was found that the proposed method was satisfied by a network of 64 rain gauges with the minimum estimated variance, and 20 of the existing ones were removed and relocated. An existing network may consist of redundant stations that may make little or no contribution to the network performance for providing quality data. Therefore, two different cases were considered in this study.
The first case considered the removed stations that were optimally relocated into new locations to investigate their influence in the calculated estimated variance and the second case explored the possibility to relocate all 84 existing stations into new locations to determine the optimal position. The relocations of the stations in both cases have shown that the new optimal locations have managed to reduce the estimated variance and it has proven that locations played an important role in determining the optimal network.
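
The three semivariogram models and the simulated annealing acceptance step mentioned above can be sketched as follows. The formulas are the common textbook forms (some references scale the exponential and Gaussian ranges by a factor of 3 to obtain a practical range). The parameter names and the acceptance helper are illustrative assumptions rather than the study's implementation.

```python
import math
import random

def spherical(h, nugget, partial_sill, range_a):
    """Spherical model: reaches the sill exactly at distance range_a."""
    if h == 0:
        return 0.0
    if h >= range_a:
        return nugget + partial_sill
    ratio = h / range_a
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

def exponential(h, nugget, partial_sill, range_a):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-h / range_a))

def gaussian(h, nugget, partial_sill, range_a):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-((h / range_a) ** 2)))

def accept_move(delta_variance, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_variance <= 0:
        return True
    return rng.random() < math.exp(-delta_variance / temperature)

# A relocation that lowers the estimated variance is always accepted.
accepted = accept_move(-0.2, temperature=1.0, rng=random.Random(0))
```

In an annealing loop, `delta_variance` would be the change in estimated variance caused by moving or removing one gauge, and `temperature` would be lowered gradually so that worse moves become rarer.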

Keywords: geostatistics, simulated annealing, semivariogram, optimization

Procedia PDF Downloads 301

25098 Simulation and Hardware Implementation of Data Communication Between CAN Controllers for Automotive Applications

Authors: R. M. Kalayappan, N. Kathiravan

Abstract:

In the automobile industry, Controller Area Network (CAN) is widely used to reduce system complexity and simplify inter-task communication. This paper therefore proposes a hardware implementation of data frame communication from one controller to another. The CAN data frames and protocols are explained in depth. The data frames are transferred without any collision or corruption. The simulation is carried out in the Keil µVision software to display the data transfer between transmitter and receiver in CAN. An ARM7 microcontroller is used to transfer data between the controllers in real time. Data transfer is verified using the CRO.
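
As a rough illustration of why CAN frames do not collide destructively, the sketch below models a classic CAN 2.0 standard frame (11-bit identifier, up to 8 data bytes) and bus arbitration, where the frame with the numerically lowest identifier wins. The class and function names are hypothetical, and the sketch omits the CRC, bit stuffing, and acknowledgement fields of a real frame.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanFrame:
    """Simplified CAN 2.0 standard data frame (identifier and payload only)."""
    identifier: int
    data: bytes = b""

    def __post_init__(self):
        # Standard frames carry an 11-bit identifier and at most 8 data bytes.
        if not 0 <= self.identifier <= 0x7FF:
            raise ValueError("identifier must fit in 11 bits")
        if len(self.data) > 8:
            raise ValueError("classic CAN carries at most 8 data bytes")

    @property
    def dlc(self):
        """Data length code: number of payload bytes."""
        return len(self.data)

def arbitrate(frames):
    """Return the frame that wins arbitration: the lowest identifier."""
    return min(frames, key=lambda frame: frame.identifier)

# Two nodes start transmitting at the same time (hypothetical identifiers).
pending = [CanFrame(0x200, b"\x01"), CanFrame(0x100, b"\x02\x03")]
winner = arbitrate(pending)
```

The losing node simply retries once the bus is idle, so no frame is corrupted by the contention.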

Keywords: controller area network (CAN), automotive electronic control unit, CAN 2.0, industry

Procedia PDF Downloads 397

25097 Optimization of Marine Waste Collection Considering Dynamic Transport and Ship’s Wake Impact

Authors: Guillaume Richard, Sarra Zaied

Abstract:

Marine waste quantities keep increasing: 5 million tons of plastic waste enter the ocean every year. Their spatiotemporal distribution is never homogeneous and depends mainly on the hydrodynamic characteristics of the environment, as well as on the size and location of the waste. To optimize the collection of marine plastic waste, it is important to measure and monitor its evolution over time. In this context, diverse studies have been dedicated to describing waste behavior in order to identify its accumulation in ocean areas. None of the existing tools that track objects at sea was designed to track a slick of waste. Moreover, applications related to marine waste are in the minority compared to rescue or oil slick tracking applications. These approaches can accurately simulate an object's behavior over time but not during the collection mission of a waste sheet. This paper presents numerical modeling of the impact of a boat’s wake on the behavior of floating marine waste during a collection mission. The aim is to predict the trajectory of a marine waste slick in order to optimize its collection, using meteorological data on ocean currents, wind, and possibly waves. We chose Ocean Parcels, a Python library suited to computing particle trajectories in the ocean. The modeling results showed the important role of advection and diffusion processes in the spatiotemporal distribution of floating plastic litter. The performance of the proposed method was evaluated on real data collected from the Copernicus Marine Environment Monitoring Service (CMEMS). The results of the evaluation at the Cape of Good Hope (South Africa) show that the proposed approach can effectively predict the position and velocity of marine litter during collection, which allowed for optimizing time and more than 90% of the amount of collected waste.
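
The advection and diffusion processes named above can be illustrated with a plain random-walk particle step: the particle is carried by the local current and receives a random displacement whose spread is set by a constant diffusivity. This sketch deliberately does not use the Ocean Parcels API. The uniform current, the diffusivity value, and the function names are assumptions made for illustration.

```python
import math
import random

def advect_diffuse(position, velocity_at, diffusivity, dt, steps, rng):
    """Track one floating particle with advection plus random-walk diffusion."""
    x, y = position
    # Standard deviation of the random displacement for constant diffusivity.
    sigma = math.sqrt(2.0 * diffusivity * dt)
    track = [(x, y)]
    for i in range(steps):
        u, v = velocity_at(x, y, i * dt)
        x += u * dt + sigma * rng.gauss(0.0, 1.0)
        y += v * dt + sigma * rng.gauss(0.0, 1.0)
        track.append((x, y))
    return track

def uniform_current(x, y, t):
    """Hypothetical surface current: 0.5 m/s eastward everywhere."""
    return 0.5, 0.0

# Ten one-minute steps: pure advection, then advection with diffusion.
drift_only = advect_diffuse((0.0, 0.0), uniform_current, 0.0, 60.0, 10, random.Random(1))
with_diffusion = advect_diffuse((0.0, 0.0), uniform_current, 1.0, 60.0, 10, random.Random(1))
```

A wake or wind contribution could be added by having `velocity_at` return the sum of several velocity fields.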

Keywords: marine litter, advection-diffusion equation, sea current, numerical model

Procedia PDF Downloads 86

25096 Improving the Statistics Nature in Research Information System

Authors: Rajbir Cheema

Abstract:

An integrated research information system provides scientific institutions with the necessary information on research activities and research results in assured quality. Because duplicates, missing values, incorrect formatting, inconsistencies, and similar problems can arise when research data are collected in different research information systems, with a wide range of negative effects on data quality, the subject of data quality deserves closer treatment. This paper examines the data quality problems in research information systems and presents new techniques that enable organizations to improve the quality of their research information.

Keywords: Research information systems (RIS), research information, heterogeneous sources, data quality, data cleansing, science system, standardization

Procedia PDF Downloads 155

25095 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research

Authors: Carla Silva

Abstract:

Recent developments in information and communication technology enable us to acquire, collect, and analyse data in various fields of socioeconomic-technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to the effective and efficient use of massive amounts of educational data, in order to support institutions in strategic planning and investment decision-making. In this article, we address data from several different perspectives and define data as applied to the sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in education and, furthermore, in economics. Finally, we highlight a number of challenges and opportunities for future research.

Keywords: data mining, research analysis, investment decision-making, educational research

Procedia PDF Downloads 356

25094 A Method of Detecting the Difference in Two States of Brain Using Statistical Analysis of EEG Raw Data

Authors: Digvijaysingh S. Bana, Kiran R. Trivedi

Abstract:

This paper introduces various methods that use the alpha wave to detect the difference between two states of the brain. One healthy subject participated in the experiment. EEG was measured on the forehead above the eye (FP1 position), with the reference and ground electrodes on an ear clip. The data samples are obtained in the form of EEG raw data. Each reading lasts one minute. Various tests are performed on the alpha band EEG raw data. The readings are taken at different times throughout the day. The statistical analysis is carried out on the EEG sample data in the form of various tests.

Keywords: electroencephalogram (EEG), biometrics, authentication, EEG raw data

Procedia PDF Downloads 461

25093 Aerodynamic Interaction between Two Speed Skaters Measured in a Closed Wind Tunnel

Authors: Ola Elfmark, Lars M. Bardal, Luca Oggiano, Håvard Myklebust

Abstract:

Team pursuit is a relatively new event in international long track speed skating. For a single speed skater the aerodynamic drag will account for up to 80% of the braking force, thus reducing the drag can greatly improve the performance. In a team pursuit the interactions between athletes in near proximity will also be essential, but are not well studied. In this study, systematic measurements of the aerodynamic drag, body posture and relative positioning of speed skaters have been performed in the low speed wind tunnel at the Norwegian University of Science and Technology, in order to investigate the aerodynamic interaction between two speed skaters. Drag measurements of static speed skaters drafting, leading, side-by-side, and dynamic drag measurements in a synchronized and unsynchronized movement at different distances, were performed. The projected frontal area was measured for all postures and movements and a blockage correction was performed, as the blockage ratio ranged from 5-15% in the different setups. The static drag measurements were performed on two test subjects in two different postures, a low posture and a high posture, and at two different distances between the test subjects, 1.5T and 3T, where T is the length of the torso (T=0.63m). A drag reduction was observed for all distances and configurations, from 39% to 11.4%, for the drafting test subject. The drag of the leading test subject was only influenced at -1.5T, with the biggest drag reduction of 5.6%. An increase in drag was seen for all side-by-side measurements; the biggest increase was observed to be 25.7%, at the closest distance between the test subjects, and the lowest 2.7%, with ∼ 0.7 m between the test subjects. A clear aerodynamic interaction between the test subjects and their postures was observed for most measurements during static measurements, with results corresponding well to recent studies.
For the dynamic measurements, the leading test subject had a drag reduction of 3% even at -3T. The drafting test subject showed a drag reduction of 15% when in a synchronized (sync) motion with the leading test subject at 4.5T. The maximal drag reduction for both the leading and the drafting test subject was observed when they were as close as possible in sync, with a drag reduction of 8.5% and 25.7% respectively. This study emphasizes the importance of keeping a synchronized movement by showing that the maximal gain for the leading and drafting test subjects dropped to 3.2% and 3.3% respectively when the skaters are in opposite phase. Individual differences in technique also appear to influence the drag of the other test subject.

Keywords: aerodynamic interaction, drag force, frontal area, speed skating

Procedia PDF Downloads 130

25092 A Study on Big Data Analytics, Applications and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data, which are hard to organise and analyse and can be dealt with using the frameworks and models in this field of study. An organization's decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets, which will consequently benefit society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of analysis using different machine-learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 81

25091 A Study on Big Data Analytics, Applications, and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data, which are hard to organise and analyse and can be dealt with using the frameworks and models in this field of study. An organisation's decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets, which will consequently benefit society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 93

25090 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known algorithm that partitions a set of data points into a predefined number of clusters. In this paper, we implement the K-means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large-scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to apply a combiner to the map output to decrease the amount of data that the reducers need to process. The experimental results demonstrated that the K-means algorithm using RHadoop scales well and efficiently processes large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with a combiner was faster than the regular algorithm without a combiner as the size of the data set increases.
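The role of the combiner can be illustrated with a minimal sketch. It is written in plain Python rather than R with RHadoop, and it simulates the map, combine and reduce steps in memory instead of on a Hadoop cluster; the function names and the sample data are assumptions made for illustration, not the authors' implementation.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split, centroids):
    """Map: emit (cluster_id, (point, 1)) for every point in one input split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def combine(pairs):
    """Combiner: collapse records into one (partial sum, count) per cluster."""
    acc = {}
    for cid, (vec, n) in pairs:
        if cid in acc:
            total, count = acc[cid]
            acc[cid] = ([a + b for a, b in zip(total, vec)], count + n)
        else:
            acc[cid] = (list(vec), n)
    return list(acc.items())


def reduce_phase(pairs, centroids):
    """Reduce: merge partial sums from all mappers and average them."""
    merged = dict(combine(pairs))
    return [
        [x / merged[i][1] for x in merged[i][0]] if i in merged else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans_iteration(splits, centroids, use_combiner=True):
    """One K-means iteration; also returns how many records reach the reducer."""
    shuffled = []
    for split in splits:
        out = map_phase(split, centroids)
        shuffled.extend(combine(out) if use_combiner else out)
    return reduce_phase(shuffled, centroids), len(shuffled)


# Hypothetical data: two input splits and two starting centroids.
splits = [[(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)], [(1.0, 1.0), (10.0, 12.0)]]
start = [[0.0, 0.0], [10.0, 10.0]]
with_comb, sent_with = kmeans_iteration(splits, start, use_combiner=True)
without_comb, sent_without = kmeans_iteration(splits, start, use_combiner=False)
```

Both variants produce the same centroids; the combiner only reduces the number of records passed from the mappers to the reducer, which is the saving the abstract describes.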

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 437

25089 Meteorological Risk Assessment for Ships with Fuzzy Logic Designer

Authors: Ismail Karaca, Ridvan Saracoglu, Omer Soner

Abstract:

Fuzzy logic, an advanced method to support decision-making, is used by scientists in many disciplines. Fuzzy programming is a product of fuzzy logic, fuzzy rules, and implication. In marine science, fuzzy programming for ships is increasing dramatically together with autonomous ship studies. In this paper, a program to support the decision-making process for ship navigation has been designed. The program is built on fuzzy logic and rules, taking marine accidents and expert opinions into account. After the program was designed, it was tested on 46 ship accidents reported by the Transportation Safety Investigation Center of Turkey. Wind speed, sea condition, visibility, and day/night ratio have been used as input data. They have been converted into a risk factor using the Fuzzy Logic Designer application and fuzzy rules set by marine experts. Finally, the expert's meteorological risk factor for each accident was compared with the program's risk factor, and the error rate was calculated. The main objective of this study is to improve the navigational safety of ships by using the advanced decision support model. According to the study results, fuzzy programming is a robust model that supports safe navigation.
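How fuzzy rules turn meteorological inputs into a single risk factor can be sketched as follows. This is an illustrative Python sketch, not the Fuzzy Logic Designer model from the study: it uses only two of the four inputs, and every membership range, rule and output value below is a made-up assumption, since the abstract does not give the experts' rules. Rule outputs are combined by a weighted average of crisp values (a zero-order Sugeno-style scheme), chosen here for brevity.

```python
def rising(x, lo, hi):
    """Membership that ramps linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def falling(x, lo, hi):
    """Complement of rising: 1 at or below lo, 0 at or above hi."""
    return 1.0 - rising(x, lo, hi)


def risk_factor(wind_knots, visibility_nm):
    """Risk in [0, 1] from wind speed and visibility (hypothetical ranges)."""
    wind_low = falling(wind_knots, 10.0, 40.0)
    wind_high = rising(wind_knots, 10.0, 40.0)
    vis_poor = falling(visibility_nm, 1.0, 5.0)
    vis_good = rising(visibility_nm, 1.0, 5.0)

    # Each rule: (firing strength, crisp risk). AND is modelled with min().
    rules = [
        (min(wind_low, vis_good), 0.1),   # light wind, good visibility
        (min(wind_low, vis_poor), 0.5),   # light wind, poor visibility
        (min(wind_high, vis_good), 0.6),  # strong wind, good visibility
        (min(wind_high, vis_poor), 0.9),  # strong wind, poor visibility
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * risk for strength, risk in rules) / total


calm_clear = risk_factor(5.0, 10.0)
storm_fog = risk_factor(45.0, 0.5)
in_between = risk_factor(25.0, 3.0)
```

A risk factor computed this way for each accident could then be compared with the value assigned by an expert, which is the comparison the abstract reports.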

Keywords: calculation of risk factor, fuzzy logic, fuzzy programming for ship, safety navigation of ships

Procedia PDF Downloads 188

25088 Framework for Integrating Big Data and Thick Data: Understanding Customers Better

Authors: Nikita Valluri, Vatcharaporn Esichaikul

Abstract:

With the popularity of data-driven decision-making on the rise, this study focuses on providing an alternative outlook on the decision-making process. Combining quantitative and qualitative methods rooted in the social sciences, an integrated framework is presented that aims to deliver a more robust and efficient approach to data-driven decision-making, drawing not only on Big data but also on 'Thick data', a new form of qualitative data. In support of this, an example from the retail sector is presented in which the framework is put into action to yield insights and leverage business intelligence. An interpretive approach has been used to analyze findings from both quantitative and qualitative data and to glean insights. Using traditional point-of-sale data as well as an understanding of customer psychographics and preferences, data mining techniques are applied along with qualitative methods (such as grounded theory, ethnomethodology, etc.). This study’s final goal is to establish the framework as a basis for providing a holistic solution encompassing both the Big and Thick aspects of any business need. The proposed framework is an enhancement of the traditional data-driven decision-making approach, which depends mainly on quantitative data.

Keywords: big data, customer behavior, customer experience, data mining, qualitative methods, quantitative methods, thick data

Procedia PDF Downloads 161

25087 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to a growing collection of document data. The volume of document data has been increasing since the spread of the Internet. ITA was proposed as one method for analyzing document data; it extracts independent topics from the document data by using Independent Component Analysis (ICA), a technique from signal processing. However, it is difficult to apply ITA to a growing collection of documents, because ITA must use all of the document data, so its temporal and spatial cost is very high. Therefore, we present Incremental ITA, which extracts independent topics from a growing collection of document data. Incremental ITA updates the independent topics when document data is added after the independent topics have been extracted from the previous data. Finally, we show the results of applying Incremental ITA to benchmark datasets.

Keywords: text mining, topic extraction, independent, incremental, independent component analysis

Procedia PDF Downloads 307

25086 Open Data for e-Governance: Case Study of Bangladesh

Authors: Sami Kabir, Sadek Hossain Khoka

Abstract:

Open Government Data (OGD) refers to all data produced by government that are accessible to ordinary people with Internet access, in a reusable way and free of cost. In line with the “Digital Bangladesh” vision of the Bangladesh government, the concept of open data has been gaining momentum in the country. Opening all government data in a digital and customizable format from a single platform can enhance e-governance, which will make government more transparent to the people. This paper presents a case study, well in progress, on the OGD portal by the Bangladesh Government, which aims to link decentralized data. The initiative is intended to facilitate e-services for citizens through this one-stop web portal. The paper further discusses ways of collecting data in digital format from relevant agencies with a view to making it publicly available through this single point of access. Further, a possible layout of this web portal is presented.

Keywords: e-governance, one-stop web portal, open government data, reusable data, web of data

Procedia PDF Downloads 354

25085 Mathematical Modelling of Drying Kinetics of Cantaloupe in a Solar Assisted Dryer

Authors: Melike Sultan Karasu Asnaz, Ayse Ozdogan Dolcek

Abstract:

Crop drying, which aims to reduce the moisture content to a certain level, is a method used to extend the shelf life of a crop and prevent it from spoiling. One of the oldest food preservation techniques is open sun or shade drying. Even though this technique is the most affordable of all drying methods, it has drawbacks such as contamination by insects, environmental pollution, windborne dust, and direct exposure to weather conditions such as wind, rain, and hail. However, solar dryers that provide a hygienic and controllable environment to preserve food and extend its shelf life have been developed and used to dry agricultural products. Thus, foods can be dried quickly without being affected by weather variables, and quality products can be obtained. This research is mainly devoted to investigating the modelling of drying kinetics of cantaloupe in a forced convection solar dryer. Mathematical models for the drying process should be defined to simulate the drying behavior of the foodstuff, which will greatly contribute to the development of solar dryer designs. Thus, drying experiments were conducted and replicated five times, and various data such as temperature, relative humidity, solar irradiation, drying air speed, and weight were continuously monitored and recorded. The moisture content of sliced and pretreated cantaloupe was converted into moisture ratio and then plotted against drying time to construct drying curves. Then, 10 quasi-theoretical and empirical drying models were applied to find the best drying curve equation using the Levenberg-Marquardt nonlinear optimization method. The best-fitting mathematical drying model was selected according to the highest coefficient of determination (R²), and the mean square of the deviations (χ²) and root mean square error (RMSE) criteria. The best-fitting model was utilized to simulate thin-layer solar drying of cantaloupe, and the simulation results were compared with the experimental data for validation purposes.
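The model-selection step can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in thin-layer drying studies, which the abstract does not spell out: moisture ratio MR = (M - Me) / (M0 - Me), χ² as the residual sum of squares divided by (N - n) for N observations and n model parameters, and RMSE as the square root of the mean squared residual. Only one candidate model is shown, the one-parameter Newton (Lewis) model MR = exp(-k t), and it is fitted by a log-linear least-squares shortcut rather than by the Levenberg-Marquardt method used in the study. The data are synthetic, not experimental.

```python
import math


def moisture_ratio(m, m0, me=0.0):
    """MR = (M - Me) / (M0 - Me); equilibrium moisture Me defaults to 0."""
    return (m - me) / (m0 - me)


def fit_newton_k(times, mr):
    """Least-squares slope through the origin of ln(MR) = -k * t."""
    numerator = sum(t * math.log(y) for t, y in zip(times, mr))
    return -numerator / sum(t * t for t in times)


def goodness_of_fit(observed, predicted, n_params):
    """R², reduced chi-square and RMSE between observed and predicted MR."""
    n = len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "chi2": ss_res / (n - n_params),
        "rmse": math.sqrt(ss_res / n),
    }


# Synthetic moisture contents generated from k = 0.5 per hour (not measured data).
times_h = [0.0, 1.0, 2.0, 3.0, 4.0]
moisture = [8.0 * math.exp(-0.5 * t) for t in times_h]

mr = [moisture_ratio(m, moisture[0]) for m in moisture]
k = fit_newton_k(times_h, mr)
stats = goodness_of_fit(mr, [math.exp(-k * t) for t in times_h], n_params=1)
```

Repeating the fit for each candidate model and comparing the resulting statistics gives a ranking of the kind the abstract describes.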

Keywords: solar dryer, mathematical modelling, drying kinetics, cantaloupe drying

Procedia PDF Downloads 125

25084 Airon Project: IoT-Based Agriculture System for the Optimization of Irrigation Water Consumption

Authors: África Vicario, Fernando J. Álvarez, Felipe Parralejo, Fernando Aranda

Abstract:

The irrigation systems of traditional agriculture, such as gravity-fed irrigation, waste a great deal of water because, generally, there is no control over the amount of water supplied in relation to the water needed. The AIRON Project tries to solve this problem by implementing an IoT-based system that equips the irrigation plots with sensors so that the state of the crops and the amount of water used for irrigation can be known remotely. The IoT system consists of a sensor network that measures the humidity of the soil, the weather conditions (temperature, relative humidity, wind and solar radiation) and the irrigation water flow. The communication between this network and a central gateway is conducted by means of long-range wireless communication that depends on the characteristics of the irrigation plot. The main objective of the AIRON project is to deploy an IoT sensor network in two different plots of the irrigation community of Aranjuez in the Spanish region of Madrid. The first plot is 2 km away from the central gateway, so LoRa has been used as the base communication technology. The problem with this plot is the absence of mains electric power, so devices with energy-saving modes have had to be used to maximize the external batteries' use time. An ESP32 SoC board with a LoRa module is employed in this case to gather data from the sensor network and send them to a gateway consisting of a Raspberry Pi with a LoRa HAT. The second plot is located 18 km away from the gateway, a range that hampers the use of LoRa technology. In order to establish reliable communication in this case, the long-term evolution (LTE) standard is used, which makes it possible to reach much greater distances by using the cellular network. As mains electric power is available in this plot, a Raspberry Pi has been used instead of the ESP32 board to collect sensor data.
All data received from the two plots are stored on a proprietary server located at the irrigation management company's headquarters. The analysis of these data by means of machine learning algorithms that are currently under development should allow a short-term prediction of the irrigation water demand that would significantly reduce the waste of this increasingly valuable natural resource. The major finding of this work is the real possibility of deploying a remote sensing system for irrigated plots by using Commercial-Off-The-Shelf (COTS) devices, easily scalable and adaptable to design requirements such as the distance to the control center or the availability of mains electrical power at the site.

Keywords: internet of things, irrigation water control, LoRa, LTE, smart farming

Procedia PDF Downloads 83

25083 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth, largely because of their ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on interesting data, from which interesting facts are generated for the knowledge base. The approach is to derive analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate triples from the given data and derive the interesting patterns. The methodology also illustrates data integration using RDF to obtain reliable, interesting patterns.
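The idea of turning extracted facts into RDF triples can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the `http://example.org/` namespace, the property names and the pattern check are hypothetical, and the facts are typed in by hand instead of being extracted with natural language processing. Triples are written in the N-Triples line format.

```python
from collections import Counter

EX = "http://example.org/"  # hypothetical namespace, for illustration only


def iri(local):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def literal(text):
    """Plain string literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def repeated_predicates(triples):
    """Toy pattern: (subject, predicate) pairs that occur more than once."""
    counts = Counter((s, p) for s, p, _ in triples)
    return [pair for pair, n in counts.items() if n > 1]


triples = [
    (iri("Marie_Curie"), iri("wonAward"), iri("Nobel_Prize_in_Physics")),
    (iri("Marie_Curie"), iri("wonAward"), iri("Nobel_Prize_in_Chemistry")),
    (iri("Marie_Curie"), iri("bornIn"), literal("Warsaw")),
]

document = to_ntriples(triples)
patterns = repeated_predicates(triples)
```

Here a subject that carries the same predicate twice stands in for an "interesting" pattern; the study's actual interestingness criteria are not given in the abstract.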

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 162

25082 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan

Authors: Dina Ahmad Alkhodary

Abstract:

This study aimed to investigate the practices of data mining in the telecommunication companies in Jordan, from the viewpoint of the respondents. In order to achieve the goal of the study and test the validity of the hypotheses, the researcher designed a questionnaire to collect data from managers and staff members of the main departments in the researched companies. The results show stages of improvement in the telecommunications companies' move toward data mining.

Keywords: data, mining, development, business

Procedia PDF Downloads 494