Search results for: incomplete data
25005 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide information about different aspects of the data; therefore, linking multi-source data is essential to improve clustering performance. However, in practice, multi-source data is often heterogeneous, uncertain, and large, which is considered a major challenge of multi-source data analysis. Ensemble learning is a versatile machine learning paradigm in which learning techniques can work in parallel on big data. Clustering ensembles have been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis: the fuzzy optimized multi-objective clustering ensemble method, called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on standard sample data sets. The experimental results demonstrate the superior performance of the FOMOCE method compared to existing clustering ensemble methods and multi-source clustering methods.
Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
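
The single-objective, single-source baseline that FOMOCE improves on can be pictured with a standard evidence-accumulation ensemble. The sketch below is a generic illustration under that assumption, not the authors' algorithm: base k-means partitions with perturbed k vote into a co-association matrix, and a consensus partition is cut from it.

```python
# Minimal clustering-ensemble sketch (co-association consensus), for illustration only.
# NOT the FOMOCE method: this is the generic single-objective baseline it is compared to.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import load_iris

X = load_iris().data
n, n_base = X.shape[0], 10

# Build base clusterings with perturbed k and random seeds.
co_assoc = np.zeros((n, n))
for seed in range(n_base):
    k = np.random.RandomState(seed).randint(2, 6)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    co_assoc += (labels[:, None] == labels[None, :])  # evidence accumulation
co_assoc /= n_base

# Consensus partition: hierarchical clustering on the co-association "distance".
consensus = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - co_assoc)
print(np.bincount(consensus))
```
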
Procedia PDF Downloads 189

25004 Seismic Hazard Study and Strong Ground Motion in Southwest Alborz, Iran
Authors: Fereshteh Pourmohammad, Mehdi Zare
Abstract:
The city of Karaj, having a population of 2.2 million (est. 2022), is located in the southwest of the Alborz Mountain Belt in Northern Iran. The region is known to be a highly active seismic zone. This study focuses on the geological and seismological analyses within a radius of 200 km from the center of Karaj. Five seismic zones and seven linear seismic sources were identified, and the maximum magnitude was calculated for the seismic zones. Since the seismicity catalog is incomplete, a parametric-historic algorithm with the Kijko and Sellevoll (1992) method was used to calculate the seismicity parameters, and the return periods and recurrence probabilities of earthquake magnitudes in each zone were obtained for the 475-year return period. According to the calculations, the highest and lowest earthquake magnitudes of 7.6 and 6.2 were obtained in Zones 1 and 4, respectively. This result is new and extremely important from the viewpoint of earthquake risk in a densely populated city. Using attenuation relationships, the maximum horizontal strong ground motion was calculated as 0.42g for the 475-year return period and 0.70g for the 2475-year return period, and the maximum vertical strong ground motion as 0.25g for the 475-year and 0.44g for the 2475-year return period. These acceleration levels are new and are about 25% higher than the values presented in the Iranian building code.
Keywords: seismic zones, ground motion, return period, hazard analysis
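
For context, the two return periods quoted above correspond to the conventional design hazard levels of 10% and 2% probability of exceedance in 50 years. The snippet below is a quick check of that standard relation; it is general hazard arithmetic, not a computation from the paper's data.

```python
# Relation between return period T and probability of exceedance p over t years:
# p = 1 - (1 - 1/T)**t, or the Poisson form p = 1 - exp(-t/T).
import math

def exceedance_probability(T, t=50):
    return 1.0 - (1.0 - 1.0 / T) ** t

for T in (475, 2475):
    p = exceedance_probability(T)
    p_poisson = 1.0 - math.exp(-50.0 / T)
    print(f"T = {T:4d} yr: p(50 yr) = {p:.3f} (binomial), {p_poisson:.3f} (Poisson)")
# T = 475 yr gives ~0.10 (10% in 50 years); T = 2475 yr gives ~0.02 (2% in 50 years).
```
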
Procedia PDF Downloads 97

25003 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data
Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim
Abstract:
Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. The focus of this study is to propose a method that trains on person trip surveys to supplement smart-card data, which do not contain the purpose of each trip. Only features available from smart-card data, such as spatiotemporal information on the trip and geographic information system (GIS) data near the stations, were selected to train on the survey data. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and shows very fast execution with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose. GIS data near the station and the duration of stay at the destination were significant features in modeling trip purpose.
Keywords: activity pattern, data fusion, smart-card, XGBoost
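
A sketch of this training setup is given below, with invented feature and purpose categories (boarding hour, stay duration, land use, a distance feature; four trip purposes) standing in for the authors' actual schema; the survey-derived labels are mocked with random data.

```python
# Hedged sketch of the trip-purpose model: an XGBoost classifier trained on
# survey-labelled trips with smart-card-style features. Features and classes
# are illustrative assumptions, not the authors' schema.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 24, n),      # boarding hour
    rng.exponential(30, n),      # duration of stay at destination (min)
    rng.integers(0, 5, n),       # GIS land-use category near station
    rng.random(n),               # distance-based feature
])
y = rng.integers(0, 4, n)        # 4 assumed trip purposes (work, school, shop, other)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = XGBClassifier(
    n_estimators=200, max_depth=5, learning_rate=0.1,
    reg_lambda=1.0,              # the regularization term the abstract refers to
    objective="multi:softprob",
)
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print("feature importances:", model.feature_importances_)
```
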
Procedia PDF Downloads 246

25002 A Mutually Exclusive Task Generation Method Based on Data Augmentation
Authors: Haojie Wang, Xun Li, Rui Yin
Abstract:
In order to solve memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to an exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex tasks. The experiments show that the method of generating mutually exclusive tasks can effectively solve memorization overfitting in the meta-learning MAML algorithm.
Keywords: mutex task generation, data augmentation, meta-learning, text classification
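
One common way to realize "one feature, multiple labels" is a per-task label permutation, as in the hedged sketch below; the abstract does not spell out its exact construction, so this is a generic reconstruction of the idea rather than the authors' procedure.

```python
# Illustrative sketch of mutually exclusive task generation: the same input is
# assigned a different label in each task (via a per-task label permutation),
# so no single feature-to-label mapping can be memorized across tasks.
import numpy as np

def make_mutex_tasks(X, y, n_tasks, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    tasks = []
    for _ in range(n_tasks):
        perm = rng.permutation(n_classes)   # per-task label shuffling
        tasks.append((X, perm[y]))          # same features, permuted labels
    return tasks

X = np.random.rand(100, 8)
y = np.random.randint(0, 5, 100)
tasks = make_mutex_tasks(X, y, n_tasks=4, n_classes=5)
# The same sample now carries a different label in each task:
print([labels[0] for _, labels in tasks])
```
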
Procedia PDF Downloads 143

25001 Revolutionizing Traditional Farming Using Big Data/Cloud Computing: A Review on Vertical Farming
Authors: Milind Chaudhari, Suhail Balasinor
Abstract:
Due to massive deforestation and an ever-increasing population, the organic content of the soil is depleting at a much faster rate. Because of this, there is a big chance that the entire food production in the world will drop by 40% in the next two decades. Vertical farming can help aid food production by leveraging big data and cloud computing to ensure plants are grown naturally, providing the optimum nutrients and sunlight determined by analyzing millions of data points. This paper outlines the most important parameters in vertical farming and how a combination of big data and AI helps in calculating and analyzing these millions of data points. Finally, the paper outlines how different organizations are controlling the indoor environment by leveraging big data to enhance food quantity and quality.
Keywords: big data, IoT, vertical farming, indoor farming
Procedia PDF Downloads 175

25000 Fulfillment of Models of Prenatal Care in Adolescents from Mexico and Chile
Authors: Alejandra Sierra, Gloria Valadez, Adriana Dávalos, Mirliana Ramírez
Abstract:
For years, the Pan American Health Organization/World Health Organization and other organizations have made efforts to improve access to and the quality of prenatal care as part of comprehensive programs for maternal and neonatal health, and the standards of care have been renewed in order to migrate from a medical perspective to a holistic one. However, despite these efforts, current antenatal care models have not been verified by scientific evaluation to determine their effectiveness. Teenage pregnancy is considered a very important phenomenon, since it has been strongly associated with inequalities, poverty, and the lack of gender equality; it is therefore important to analyze the antenatal care that is given, including not only the clinical interventions but also the surrounding activities of promotion and health education. The objective of this study was to describe whether the activities previously established in prenatal care models are performed in the care of pregnant teenagers attending prenatal care in health institutions in two cities in Mexico and Chile during 2013. Methods: Observational, descriptive, cross-sectional study. 170 pregnant women (13-19 years) receiving prenatal care in two health institutions were included (100 women from León, Mexico and 70 from Coquimbo, Chile). Data collection: direct survey and the perinatal clinical record card, used as checklists against the WHO antenatal care model (WHO, 2003), the Official Mexican Standard NOM-007-SSA2-1993, and the Personalized Service Manual on the Reproductive Process - Chile Crece Contigo; descriptive statistics were used for data analysis. The project was approved by the relevant ethics committees. Results: The interventions focused on physical and gynecological examination, immunizations, monitoring of signs, and biochemical parameters were fulfilled in more than 84% of cases in both groups. For the guidance and counseling of pregnant teenagers, compliance rates in León were below 50%; although pregnant women in Coquimbo had a higher percentage of compliance, none reached 100%. The topics least covered were family planning, signs and symptoms of complications, and labor. Conclusions: Although the coverage of the interventions indicated in the prenatal care models was high, there were still shortcomings in the fulfillment of orientation, education, and health promotion activities. Deficiencies in adherence to prenatal care guidelines could be due to different circumstances, such as lack of registration or incomplete filling of medical records, lack of medical supplies or health personnel, or absences from prenatal check-up appointments, among many others. Therefore, studies are required to evaluate the quality of prenatal care and the effectiveness of existing models, considering the role of the different actors (pregnant women, professionals, and health institutions) involved in the functionality and quality of prenatal care models, in order to create strategies to design or improve the application of a complete process of promotion and prevention of maternal and child health, as well as sexual and reproductive health in general.
Keywords: adolescent health, health systems, maternal health, primary health care
Procedia PDF Downloads 206

24999 Data Challenges Facing Implementation of Road Safety Management Systems in Egypt
Authors: A. Anis, W. Bekheet, A. El Hakim
Abstract:
Implementing a Road Safety Management System (SMS) in a crowded developing country such as Egypt is a necessity. Beginning a sustainable SMS requires a comprehensive, reliable data system for all information pertinent to road crashes. This paper surveys the data available in Egypt and validates it for use in an SMS. The research provides some missing data and points to the data that are unavailable in Egypt, looking forward to the contribution of the scientific society, the authorities, and the public in solving the problem of missing or unreliable crash data. The data required for implementing an SMS in Egypt are divided into three categories: the first is available data, such as fatality and injury rates, which this research shows may be inconsistent and unreliable; the second is data that are not available but may be estimated, for which an example of estimating vehicle cost is given in this research; the third is data that are not available and must be measured case by case, such as the functional and geometric properties of a facility. Some inquiries are posed to the scientific society, such as how to improve the links among road safety stakeholders in order to obtain a consistent, unbiased, and reliable data system.
Keywords: road safety management system, road crash, road fatality, road injury
Procedia PDF Downloads 148

24998 Big Data-Driven Smart Policing: Big Data-Based Patrol Car Dispatching in Abu Dhabi, UAE
Authors: Oualid Walid Ben Ali
Abstract:
Big Data has become one of the buzzwords of today. The recent explosion of digital data has led organizations, whether private or public, into a new era of more efficient decision making. At some point, businesses decided to use that concept to learn what makes their clients tick, with phrases like 'sales funnel' analysis, 'actionable insights', and 'positive business impact'. So, it stands to reason that Big Data was viewed through green (read: money) colored lenses. Somewhere along the line, however, someone realized that collecting and processing data does not have to serve business purposes only; it can also be used to assist law enforcement, improve policing, or support road safety. This paper presents briefly how Big Data has been used in the field of policing in order to improve decision making in the daily operations of the police. As an example, we present a big-data-driven system which is used to accurately dispatch patrol cars in a geographic environment. The system is also used to allocate, in real time, the nearest patrol car to the location of an incident. This system has been implemented and applied in the Emirate of Abu Dhabi in the UAE.
Keywords: big data, big data analytics, patrol car allocation, dispatching, GIS, intelligent, Abu Dhabi, police, UAE
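
In its simplest form, the nearest-car allocation step reduces to a distance query like the sketch below; the coordinates and car IDs are invented, and the production system described presumably works on live GPS feeds and road networks rather than straight-line distance.

```python
# Minimal sketch of nearest-patrol-car allocation using the haversine
# great-circle distance. Coordinates and IDs are invented for illustration.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

patrol_cars = {"car_1": (24.47, 54.37), "car_2": (24.43, 54.45), "car_3": (24.50, 54.40)}
incident = (24.45, 54.39)

nearest = min(patrol_cars, key=lambda c: haversine_km(*patrol_cars[c], *incident))
print("dispatch:", nearest)
```
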
Procedia PDF Downloads 490

24997 Mining Multicity Urban Data for Sustainable Population Relocation
Authors: Xu Du, Aparna S. Varde
Abstract:
In this research, we propose to conduct diagnostic and predictive analysis of the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data, along with other information such as population change and economic conditions. Multiple data mining methods are deployed on these data to analyze nonlinear relationships between parameters. The result determines the driving force of population relocation with respect to urban sprawl, urban sustainability, and their related parameters. Experiments so far reveal that the data mining methods discover useful knowledge from the multicity urban data. This work sets the stage for developing a comprehensive urban simulation model that caters to specific questions from targeted users. It contributes towards achieving sustainability as a whole.
Keywords: data mining, environmental modeling, sustainability, urban planning
Procedia PDF Downloads 308

24996 Model Order Reduction for Frequency Response and Effect of Order of Method for Matching Condition
Authors: Aref Ghafouri, Mohammad javad Mollakazemi, Farhad Asadi
Abstract:
In this paper, a model order reduction method is used to approximate linear and nonlinear aspects of some experimental data. The method can be used to obtain an offline reduced model that approximates and follows the experimental data and the order of the system, and that matches the experimental data at some frequency ratios. In this study, the method is compared across different experimental data sets, and the influence of the chosen reduction order on obtaining the best and sufficient matching condition for following the data is investigated in terms of the imaginary and real parts of the frequency response curve. Finally, the effect of the reduction order, an important parameter for nonlinear experimental data, is explained further.
Keywords: frequency response, order of model reduction, frequency matching condition, nonlinear experimental data
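
The abstract does not name a specific reduction algorithm, so the sketch below stands in with balanced truncation (the square-root method), one common choice, and shows how the frequency-response mismatch shrinks as the retained order grows; the random stable 8th-order system is a placeholder for a model fitted to experimental data.

```python
# Hedged sketch of model order reduction by balanced truncation (square-root method).
import numpy as np
from scipy import linalg, signal

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)   # shift to stability
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

# Gramians: A Wc + Wc A' + B B' = 0 and A' Wo + Wo A + C' C = 0
Wc = linalg.solve_continuous_lyapunov(A, -B @ B.T)
Wo = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
Lc, Lo = linalg.cholesky(Wc, lower=True), linalg.cholesky(Wo, lower=True)
U, s, Vt = np.linalg.svd(Lo.T @ Lc)                           # Hankel singular values

def truncate(r):
    # Balancing transformation restricted to the r dominant Hankel directions.
    T = Lc @ Vt.T[:, :r] @ np.diag(s[:r] ** -0.5)
    Ti = np.diag(s[:r] ** -0.5) @ U[:, :r].T @ Lo.T
    return Ti @ A @ T, Ti @ B, C @ T

w = np.logspace(-2, 2, 300)
_, H_full = signal.freqresp(signal.StateSpace(A, B, C, np.zeros((1, 1))), w)
for r in (2, 4, 6):                                           # effect of chosen order
    Ar, Br, Cr = truncate(r)
    _, H_r = signal.freqresp(signal.StateSpace(Ar, Br, Cr, np.zeros((1, 1))), w)
    print(f"order {r}: max |H_full - H_r| = {np.max(np.abs(H_full - H_r)):.3e}")
```
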
Procedia PDF Downloads 403

24995 An Empirical Study of the Impacts of Big Data on Firm Performance
Authors: Thuan Nguyen
Abstract:
In the present time, data is to a data-driven, knowledge-based economy what oil was to the industrial age hundreds of years ago. Data is everywhere in vast volumes! Big data analytics is expected to help firms not only efficiently improve performance but also completely transform how they run their business. However, employing the emergent technology successfully is not easy, and assessing the role of big data in improving firm performance is even harder. There has been a lack of studies examining the impacts of big data analytics on organizational performance; this study aimed to fill that gap. It suggested using firms' intellectual capital as a proxy for big data in evaluating its impact on organizational performance. The study employed the Value Added Intellectual Coefficient method to measure firm intellectual capital via its three main components: human capital efficiency, structural capital efficiency, and capital employed efficiency, and then used the structural equation modeling technique to model the data and test the models. The financial fundamentals and market data of 100 randomly selected publicly listed firms were collected. The results of the tests showed that only human capital efficiency had a significant positive impact on firm profitability, which highlights the prominent human role in the impact of big data technology.
Keywords: big data, big data analytics, intellectual capital, organizational performance, value added intellectual coefficient
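
The three efficiency components follow Pulic's standard VAIC formulation, reproduced below as a worked sketch with invented accounting figures (the study's actual firm data are not shown).

```python
# Worked sketch of the Value Added Intellectual Coefficient (VAIC) components
# named in the abstract, following Pulic's standard formulation.
def vaic(output, input_, human_capital, capital_employed):
    """output/input_: revenues and bought-in costs, in the same currency."""
    va = output - input_                    # value added
    hce = va / human_capital                # human capital efficiency
    sce = (va - human_capital) / va         # structural capital efficiency
    cee = va / capital_employed             # capital employed efficiency
    return hce + sce + cee, hce, sce, cee

v, hce, sce, cee = vaic(output=1000.0, input_=600.0, human_capital=250.0,
                        capital_employed=800.0)
print(f"HCE={hce:.2f}  SCE={sce:.2f}  CEE={cee:.2f}  VAIC={v:.2f}")
# HCE=1.60  SCE=0.38  CEE=0.50  VAIC=2.48
```
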
Procedia PDF Downloads 245

24994 Automated Test Data Generation for Some Types of Algorithm
Authors: Hitesh Tahbildar
Abstract:
The cost of test data generation for a program is computationally very high. In the general case, no algorithm to generate test data for all types of algorithms has been found, and the cost of generating test data differs between types of algorithms. To date, people have emphasized the need to generate test data for different types of programming constructs rather than different types of algorithms. Test data generation methods have been implemented to find heuristics for different types of algorithms. Some types of algorithms, including divide and conquer, backtracking, the greedy approach, and dynamic programming, have been tested to find the minimum cost of test data generation. Our experimental results indicate that some of these types of algorithms can be used as a necessary condition for selecting heuristics, while programming constructs are a sufficient condition for selecting our heuristics. Finally, we recommend the different heuristics for test data generation to be selected for different types of algorithms.
Keywords: longest path, saturation point, lmax, kL, kS
Procedia PDF Downloads 405

24993 A Comparative Analysis of Social Stratification in the Participation of Women in Agricultural Activity: A Case Study of District Khushab (Punjab) and D. I. Khan (KPK), Pakistan
Authors: Sohail Ahmad Umer
Abstract:
Over the last few decades, a question has been raised about the importance of women in different societies of the world, particularly in the developing societies of Asia and Africa. The female population constitutes almost 50% of the total population of the world and plays a significant role in the economy alongside the male population. In Pakistan, a developing Asian country with a majority Muslim population, the role of working women is a particular focus. Women of rural background work as voluntary workers, and their working hours are neither recorded nor recognized. Agricultural statistics show a female participation rate below 40%, while other sources put it below 20%. In the present study, another effort has been made to compare the role of women in two different provinces of Pakistan and to analyze their participation in agricultural activities such as sowing, picking, irrigating the fields, harvesting and threshing of crops, caring for and feeding the animals, and collecting firewood, without which the farming would be incomplete. One hundred villages in district Khushab (Punjab) and one hundred villages in district D. I. Khan (KPK) were selected, and 33% of the families of each village were interviewed to study their input in agricultural work. Another important feature is social stratification; therefore, the contribution of different variables such as ownership, tenancy, education, and caste has also been studied.
Keywords: caste, social stratification, tenancy, voluntary workers
Procedia PDF Downloads 370

24992 Endometrial Thickness Cut-Off for Evacuation of Retained Product of Conception
Authors: Nambiar Ritu, Ali Ban, Munawar Farida, Israell Imelda, T. Farouk Eman Rasheeda, Jangalgi Renuka, S. Boma Nellie
Abstract:
Aim: To define the ultrasonographic endometrial thickness (USG ET) cutoff for evacuation of retained products of conception (ERPC). Background: Studies of conservative management of 1st trimester miscarriage have questioned the need for post-miscarriage curettage. Therapeutic decision making based on post-miscarriage endometrial thickness on transvaginal scan, in patients clinically thought to have an incomplete miscarriage, is often not clear. Method: A retrospective analysis of all 1st trimester ERPC at Al Rahba Hospital from June 2012 to July 2013 was done. A total of 164 patients underwent ERPC. All cases were reviewed for pre-operative USG ET and post-ERPC histopathological examination. TVS was done to evaluate the maximum ET of the uterine cavity along the long axis of the uterus, and features of retained products were noted. All cases without pre-operative USG ET measurement were excluded; therefore, only 62 out of 164 cases were included in the study. The patients were divided into three groups: Group A, with retained products within the endometrial cavity; Group B, with endometrial thickness equal to or more than 20 mm; and Group C, with endometrial thickness equal to or less than 19.9 mm. The post-ERPC product was sent for HPE, and the results were compared. Transvaginal sonographic findings can be used as a deciding factor in the management of patients with 1st trimester miscarriage who need ERPC. Our proposed cutoff in clinically stable patients requiring ERPC is more than 20 mm.
Keywords: ERPC, histopathological examination, long axis of the uterus, USG ET
Procedia PDF Downloads 216

24991 The Perspective on Data Collection Instruments for Younger Learners
Authors: Hatice Kübra Koç
Abstract:
For academia, collecting reliable and valid data is one of the most significant issues for researchers. However, the procedure is not the same for all target groups: when collecting data from teenagers, young adults, or adults, researchers can use common data collection tools such as questionnaires, interviews, and semi-structured interviews; yet for young learners, and especially very young ones, such reliable and valid data collection tools cannot be easily designed or applied. In this study, firstly, common data collection tools are examined for the 'very young' and 'young learner' participant groups, since the quality and efficiency of an academic study rest mainly on a valid and correct data collection and data analysis procedure. Secondly, two different data collection instruments for very young and young learners are presented, and their efficacy is discussed. Finally, a suggested data collection tool, a performance-based questionnaire specifically developed for 'very young' and 'young learner' participant groups in the field of teaching English to young learners as a foreign language, is presented. The design procedure and suggested items/factors for this tool are revealed at the end of the study to help researchers who work with young and very young learners.
Keywords: data collection instruments, performance-based questionnaire, young learners, very young learners
Procedia PDF Downloads 93

24990 Generating Swarm Satellite Data Using Long Short-Term Memory and Generative Adversarial Networks for the Detection of Seismic Precursors
Authors: Yaxin Bi
Abstract:
Accurate prediction and understanding of the evolution mechanisms of earthquakes remain challenging in the fields of geology, geophysics, and seismology. This study leverages Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GANs), generative models tailored here to time-series data, for generating synthetic time series based on Swarm satellite data, to be used for detecting seismic anomalies. The LSTMs demonstrated commendable predictive performance in generating synthetic data across multiple countries. In contrast, the GAN models struggled to generate synthetic data, often producing non-informative values, although they were able to capture the distribution of the time series. These findings highlight both the promise and the challenges associated with applying deep learning techniques to synthetic data generation, underscoring the potential of deep learning for generating synthetic electromagnetic satellite data.
Keywords: LSTM, GAN, earthquake, synthetic data, generative AI, seismic precursors
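
The LSTM route can be pictured as a next-step predictor rolled forward autoregressively, as in the hedged sketch below; a sine wave stands in for the Swarm magnetic-field series, and the network size and training settings are assumptions rather than the study's configuration.

```python
# Minimal sketch of LSTM-based synthetic time-series generation: train a
# next-step predictor, then roll it forward to generate a continuation.
import numpy as np
import tensorflow as tf

series = np.sin(np.linspace(0, 60, 1500)).astype("float32")  # stand-in for Swarm data
win = 30
X = np.stack([series[i:i + win] for i in range(len(series) - win)])[..., None]
y = series[win:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(win, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

# Autoregressive generation of a synthetic segment.
window = series[-win:].copy()
synthetic = []
for _ in range(200):
    nxt = model.predict(window[None, :, None], verbose=0)[0, 0]
    synthetic.append(nxt)
    window = np.append(window[1:], nxt)
print("generated", len(synthetic), "synthetic points")
```
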
Procedia PDF Downloads 32

24989 Generation of Quasi-Measurement Data for On-Line Process Data Analysis
Authors: Hyun-Woo Cho
Abstract:
To ensure the safety of a manufacturing process, one should quickly identify the assignable cause of a fault on an on-line basis. To this end, many statistical techniques, including linear and nonlinear methods, have been frequently utilized. However, such methods suffer from the major problem of small sample size, which is mostly attributable to the characteristics of the empirical models used as reference models. This work presents a new method to overcome the insufficiency of measurement data in monitoring and diagnosis tasks. Quasi-measurement data are generated from existing data based on two indices, similarity and importance. The performance of the method is demonstrated using a real data set. The results show that the presented method is able to handle the insufficiency problem successfully. In addition, it is shown to be quite efficient in terms of computational speed and memory usage, so on-line implementation of the method for monitoring and diagnosis purposes is straightforward.
Keywords: data analysis, diagnosis, monitoring, process data, quality control
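
One plausible reading of "generated from existing data based on similarity and importance" is interpolation toward similar samples, sketched below; since the abstract does not define the two indices, the distance-based similarity and random convex weights here are explicit assumptions.

```python
# Generic reconstruction of quasi-measurement generation: create extra samples
# by interpolating between an existing sample and its most similar neighbors.
import numpy as np

def generate_quasi_measurements(X, n_new, k=3, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)          # similarity via distance
        neighbors = np.argsort(d)[1:k + 1]            # k most similar samples
        w = rng.dirichlet(np.ones(k))                 # random convex weights
        new_rows.append(alpha * X[i] + (1 - alpha) * (w @ X[neighbors]))
    return np.array(new_rows)

X = np.random.rand(50, 6)                              # small measurement set
X_aug = np.vstack([X, generate_quasi_measurements(X, n_new=100)])
print(X_aug.shape)  # (150, 6)
```
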
Procedia PDF Downloads 482

24988 Reversibility of Photosynthetic Activity and Pigment-Protein Complexes Expression During Seed Development of Soybean and Black Soybean
Authors: Tzan-Chain Lee
Abstract:
Seeds are non-leaf green tissues. Photosynthesis begins with light absorption by chlorophyll, followed by energy transfer between two pigment-protein complexes (PPC). Most studies of photosynthesis and PPC expression have focused on leaves; studies during seed development are rare. Developing seeds from beginning pod (stage R3) to dried seed (stage R8), and dried seeds 1-4 days after sowing, were analyzed for their chlorophyll contents. The Thornber and MARS gel systems were used to analyze the composition of the PPC, and chlorophyll fluorescence was used to measure the maximal photosynthetic efficiency (Fv/Fm). During soybean and black soybean seed development (stages R3-R6), Fv/Fm reached up to 0.8 and was then down-regulated after full seed (stage R7). In the dried seed (stage R8), both types of seeds lost photosynthetic activity (Fv/Fm = 0), but chlorophyll degradation occurred only in soybean after full seed. After sowing for 4 days, chlorophyll drastically increased in soybean seeds, and Fv/Fm recovered to 0.8 in both. Both types of seeds contained all PPC during seed development (stages R3-R6), including CPI, CPII, A1, AB1, AB2, and AB3. However, the proteins A1, AB1, AB2, and CPI were totally missing in the two dried seeds (stage R8); the deficiency of these proteins in dried seeds might be the cause of the incomplete photosynthetic activity. After seed germination and exposure of the seedlings to light for 4 days, all PPC were recovered, suggesting that PPC formation was completed in the two soybean seeds. This study shows the reversibility of photosynthetic activity and pigment-protein complexes during soybean and black soybean seed development.
Keywords: light-harvesting complex, pigment-protein complexes, soybean cotyledon, grana development
Procedia PDF Downloads 149

24987 Emerging Technology for Business Intelligence Applications
Authors: Hsien-Tsen Wang
Abstract:
Business Intelligence (BI) has long helped organizations make informed decisions based on data-driven insights and gain competitive advantages in the marketplace. In the past two decades, businesses have witnessed not only a dramatic increase in the volume and heterogeneity of business data but also the emergence of new technologies, such as Artificial Intelligence (AI), the Semantic Web (SW), Cloud Computing, and Big Data. It is plausible that the convergence of these technologies will bring more value out of business data by establishing linked data frameworks and connecting data in ways that enable advanced analytics and improved data utilization. In this paper, we first review and summarize current BI applications and methodology. Emerging technologies that can be integrated into BI applications are then discussed. Finally, we conclude with a proposed synergy framework that aims at achieving a more flexible, scalable, and intelligent BI solution.
Keywords: business intelligence, artificial intelligence, semantic web, big data, cloud computing
Procedia PDF Downloads 95

24986 Using Equipment Telemetry Data for Condition-Based Maintenance Decisions
Authors: John Q. Todd
Abstract:
Given that modern equipment can provide comprehensive health, status, and error-condition data via built-in sensors, maintenance organizations have a new and valuable source of insight to take advantage of. This presentation will show what these data payloads might look like and how they can be filtered, visualized, turned into metrics, used for machine learning, and made to generate alerts for further action.
Keywords: condition based maintenance, equipment data, metrics, alerts
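
A toy version of that filter-metric-alert chain is sketched below; the sensor fields, the rolling window, and the vibration limit are all illustrative assumptions, not values from the presentation.

```python
# Small sketch of a telemetry-to-alert pipeline: compute a rolling health
# metric from raw readings and flag threshold breaches or error codes.
import pandas as pd
import numpy as np

rng = np.random.default_rng(3)
telemetry = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=500, freq="min"),
    "vibration_mm_s": np.abs(rng.normal(2.0, 0.4, 500) + np.linspace(0, 2.5, 500)),
    "error_code": rng.choice([0, 0, 0, 0, 17], 500),
})

telemetry["vib_rolling"] = telemetry["vibration_mm_s"].rolling(30).mean()  # metric
ALERT_LIMIT = 4.0                      # assumed vibration limit, not a real standard
alerts = telemetry[(telemetry["vib_rolling"] > ALERT_LIMIT) | (telemetry["error_code"] != 0)]
print(f"{len(alerts)} readings need maintenance attention")
print(alerts.head()[["timestamp", "vib_rolling", "error_code"]])
```
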
Procedia PDF Downloads 188

24985 Intensive Use of Software in Teaching and Learning Calculus
Authors: Nodelman V.
Abstract:
Despite serious difficulties in the assimilation of the conceptual system of Calculus, software is used in the educational process only occasionally, and even then mainly for illustration purposes. There are a few reasons: the non-trivial nature of the studied material; lack of skills in working with software; fear of losing time working with software; the variety of the software itself, with its corresponding interfaces, syntax, and working methods; the need to find suitable models and become familiar with them; and the incomplete compatibility of the models found with the content and teaching methods of the studied material. This paper proposes an active use of the developed non-commercial software VusuMatica, which removes these restrictions through: broad support for the studied mathematical material (and not only Calculus), so there is no need to select the right software; an emphasis on the unity of mathematics and its intra-subject and interdisciplinary relations; a user-friendly interface; the absence of special syntax for defining mathematical objects; ease of building models of the studied material and manipulating them; and unlimited flexibility of the models thanks to the ability to redefine objects, which allows exploring the characteristics of objects and considering examples and counterexamples of the concepts under study. The construction of models is based on an original approach to the analysis of the structure of the studied concepts. Thanks to the ease of construction, students are able not only to use ready-made models but also to create them on their own and explore the studied material with their help. The presentation includes examples of using VusuMatica in studying the concepts of limit and continuity of a function, its derivative, and integral.
Keywords: counterexamples, limitations and requirements, software, teaching and learning calculus, user-friendly interface and syntax
Procedia PDF Downloads 81

24984 Ethics Can Enable Open Source Data Research
Authors: Dragana Calic
Abstract:
The openness, availability, and sheer volume of big data have provided what some regard as an invaluable and rich dataset. Researchers, businesses, advertising agencies, and medical institutions, to name only a few, collect, share, and analyze these data to enable their processes and decision making. However, there are important ethical considerations associated with the use of big data. The rapidly evolving nature of online technologies has overtaken the many legislative, privacy, and ethical frameworks and principles that exist. For example, should we obtain consent to use people's online data, and under what circumstances can privacy considerations be overridden? Current guidance on how to handle big data appropriately and ethically is inconsistent. Consequently, this paper focuses on two quite distinct but related ethical considerations that are at the core of the use of big data for research purposes: empowering the producers of data and empowering researchers who want to study big data. The first consideration focuses on informed consent, which is at the core of empowering producers of data. In this paper, we discuss some of the complexities associated with informed consent and consider studies of producers' perceptions to inform research ethics guidelines and practice. The second consideration focuses on the researcher; similarly, we explore studies that focus on researchers' perceptions and experiences.
Keywords: big data, ethics, producers' perceptions, researchers' perceptions
Procedia PDF Downloads 284

24983 Risk Assessments of Longest Dry Spells Phenomenon in Northern Tunisia
Authors: Majid Mathlouthi, Fethi Lebdi
Abstract:
Throughout the world, the extent and magnitude of droughts have economic, social, and environmental consequences. Today, climate change is felt more and more, and it most likely increases the frequency and duration of droughts. An event-based analysis of dry spells, from series of daily rainfall observations, is carried out, with a daily precipitation threshold value set in advance. A catchment located in Northern Tunisia, where the average rainfall is about 600 mm, has been studied. A rainfall event is defined as an uninterrupted series of rainfall days comprising at least one day with precipitation greater than or equal to the fixed threshold; a dry event consists of the series of dry days framed by two successive rainfall events. A rainfall event is described by a vector whose coordinates are the duration, the rainfall depth per event, and the duration of the following dry event. The depth and duration are found to be correlated, so conditional probabilities are used to analyse the depth per event. The negative binomial distribution fits the dry event well, the duration of the rainfall event follows a geometric distribution, and the length of the climatic cycle adjusts to the incomplete Gamma distribution. The results of this analysis were used to study the effects of climate change on water resources and crops and to calibrate precipitation models where rainfall records are scarce. In response to long droughts in the basin, the drought management system is based on three phases, and during each phase different measures are applied and executed: the first is before the drought (preparedness and early warning); the second is during the drought (management and mitigation); and the last is after the drought, once it is over.
Keywords: dry spell, precipitation threshold, climate vulnerability, adaptation measures
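
The two duration distributions named above can be fitted with a few lines of moment matching, as sketched below on invented sample durations (the catchment's actual records are not reproduced here).

```python
# Hedged sketch of the distribution fitting: a geometric fit to rainfall-event
# durations and a negative binomial (method of moments) fit to dry-event durations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
rain_dur = rng.geometric(p=0.45, size=300)              # stand-in rainfall durations
dry_dur = rng.negative_binomial(n=2, p=0.15, size=300)  # stand-in dry-event durations

# Geometric: the MLE of p is 1 / mean duration.
p_hat = 1.0 / rain_dur.mean()
print(f"geometric p ~ {p_hat:.3f}")

# Negative binomial via method of moments: mean = n(1-p)/p, var = n(1-p)/p^2.
m, v = dry_dur.mean(), dry_dur.var(ddof=1)
p_nb = m / v
n_nb = m * p_nb / (1.0 - p_nb)
print(f"negative binomial n ~ {n_nb:.2f}, p ~ {p_nb:.3f}")

# Example use: tail probability of a long dry spell under the fitted model.
print("P(dry spell > 20 days) ~", stats.nbinom.sf(20, n_nb, p_nb).round(4))
```
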
Procedia PDF Downloads 84

24982 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has seen big advances over recent years because of the spread of the internet, which generates a tremendous volume of data every day, and because of immense advances in the technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determines to which group each data instance belongs within a given dataset; they are used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine-learning based, and each type has its own limits. Nowadays, data are becoming increasingly heterogeneous; consequently, current classification techniques encounter many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which differs from existing algorithms in its reliability computations. The results of the proposed approach exceeded most common classification techniques, with an F-measure exceeding 97% on the Iris dataset.
Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification
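
A generic analogue of this idea, assuming a Gaussian-style resemblance function and a margin-based reliability gate (neither is the paper's actual definition), looks like the following on the same Iris dataset.

```python
# Illustrative similarity-based classifier: class scores are averaged
# similarities to each class's training instances, with a crude reliability
# margin on the winning score. The 97% figure belongs to the paper, not this sketch.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

def similarity(a, B):
    return np.exp(-np.linalg.norm(B - a, axis=1))   # one possible resemblance measure

preds, reliable = [], []
for x in X_te:
    scores = np.array([similarity(x, X_tr[y_tr == c]).mean() for c in np.unique(y_tr)])
    order = np.sort(scores)
    preds.append(np.argmax(scores))
    reliable.append(order[-1] - order[-2] > 0.01)    # reliability: winning margin

print("macro F1:", f1_score(y_te, preds, average="macro").round(3))
print("reliable predictions:", np.mean(reliable).round(3))
```
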
Procedia PDF Downloads 465

24981 Seismic Data Scaling: Uncertainties, Potential and Applications in Workstation Interpretation
Authors: Ankur Mundhra, Shubhadeep Chakraborty, Y. R. Singh, Vishal Das
Abstract:
Seismic data scaling affects the dynamic range of the data, and with the present-day lower cost of storage and higher reliability of hard disk data, scaling is not suggested. However, when dealing with data of different vintages, which were perhaps processed in 16 bits or even 8 bits and need to be processed together with available 32-bit data, scaling is performed. Scaling also amplifies low-amplitude events in the deeper region, which would otherwise disappear because high-amplitude shallow events saturate the amplitude scale. We have focused on the significance of scaling data to aid interpretation. This study elucidates a proper seismic loading procedure on workstations without using the default preset parameters available in most software suites. The differences and distribution of amplitude values at different depths are probed in this exercise. Proper loading parameters are identified, and the associated steps that need to be taken care of while loading data are explained. Finally, the exercise interprets the uncertainties which might arise when correlating scaled and unscaled versions of seismic data with synthetics. As the seismic well tie correlates seismic reflection events with well markers, it is used in our study to identify regions which are enhanced and/or affected by the scaling parameter(s).
Keywords: clipping, compression, resolution, seismic scaling
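
The dynamic-range point is easy to reproduce numerically: quantizing a trace that mixes a strong shallow event with a weak deep event into 8 bits wipes out the deep event unless a gain is applied first. The sketch below uses synthetic amplitudes chosen to show the effect; it is an illustration of the principle, not the study's data.

```python
# Dynamic-range illustration: the weak deep event survives in 32-bit float but
# is lost in 8-bit quantization unless a time gain is applied before scaling.
import numpy as np

t = np.linspace(0, 1, 1000)
shallow = 1.000 * np.sin(2 * np.pi * 30 * t) * np.exp(-((t - 0.2) / 0.05) ** 2)
deep    = 0.003 * np.sin(2 * np.pi * 30 * t) * np.exp(-((t - 0.8) / 0.05) ** 2)
trace32 = (shallow + deep).astype(np.float32)

def to_int8(x):
    return np.clip(np.round(x / np.max(np.abs(x)) * 127), -128, 127).astype(np.int8)

raw8 = to_int8(trace32)
print("deep event survives in int8 without gain:", np.any(raw8[700:900] != 0))

# With a simple time gain applied before quantization, the deep event is preserved.
gained8 = to_int8(trace32 * np.exp(4 * t))
print("deep event survives after gain scaling:  ", np.any(gained8[700:900] != 0))
```
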
Procedia PDF Downloads 470

24980 Selection of Qualitative Research Strategy for Bullying and Harassment in Sport
Authors: J. Vveinhardt, V. B. Fominiene, L. Jeseviciute-Ufartiene
Abstract:
Relevance of Research: Qualitative research is still regarded as highly subjective and not sufficiently scientific for achieving objective research results. However, it is agreed that a qualitative study allows revealing the hidden motives of the research participants, creating new theories, and highlighting the problem field. Enough research has been done to reveal these aspects of qualitative research. However, each research area has its own specificity, and sport is unique due to the image of its participants, who are understood as strong and invincible. Therefore, a sport participant might have personal issues recognizing himself as a victim in the context of bullying and harassment. Accordingly, the researcher faces a dilemma in getting a victim in sport to speak at all. The multitude of fields in sport also makes determining the research sample size a problem. Thus, the corresponding problem of this research is which qualitative research strategies are the most suitable for revealing the phenomenon of bullying and harassment in sport, and why. Object of research: qualitative research strategy for bullying and harassment in sport. Purpose of the research: to analyze strategies of qualitative research and select a suitable one for bullying and harassment in sport. Methods of research: analysis of scientific research applying qualitative methods to bullying and harassment. Research Results: Four main strategies are applied in qualitative research: inductive, deductive, retroductive, and abductive. Inductive and deductive strategies are commonly used in researching bullying and harassment in sport. The inductive strategy is applied as quantitative research in order to reveal and describe the prevalence of bullying and harassment in sport. The deductive strategy is used through qualitative methods in order to explain the causes of bullying and harassment and to predict the actions of the participants of bullying and harassment in sport and the possible consequences of these actions. The most commonly used qualitative method for researching bullying and harassment in sport is the semi-structured interview, spoken or written. However, these methods may restrict the openness of the participants when recording on a dictaphone, or may yield incomplete answers when the participant responds in writing, because it is not possible to refine the answers. Qualitative research is becoming more prevalent where the research data are technology-mediated. For example, focus group research in a closed forum allows participants to interact freely with each other because of the confidentiality afforded to the selected participants, and the moderator can purposefully formulate and submit problem-solving questions to them. Hence, the application of intelligent technology in in-depth qualitative research can help discover new and specific information on bullying and harassment in sport. Acknowledgement: This research is funded by the European Social Fund according to the activity 'Improvement of researchers' qualification by implementing world-class R&D projects' of Measure No. 09.3.3-LMT-K-712.
Keywords: bullying, focus group, harassment, narrative, sport, qualitative research
Procedia PDF Downloads 181

24979 Association of Social Data as a Tool to Support Government Decision Making
Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias
Abstract:
Based on data about child labor, this work raises questions about how to understand and locate the factors that make up child labor rates, and which properties are important in analyzing these cases. Using data mining techniques to discover valid patterns in Brazilian social databases, data on child labor in the State of Tocantins (located in the north of Brazil, with a territory of 277,000 km2 comprising 139 counties) were evaluated. This work aims to detect factors that are deterministic for the practice of child labor and their relationships with financial, educational, regional, and social indicators, generating information that is not explicit in the government database and thus enabling better monitoring and updating of policies for this purpose.
Keywords: social data, government decision making, association of social data, data mining
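
Association-rule mining is a natural fit for "valid patterns" over such indicators; the hedged sketch below uses the apriori implementation from the mlxtend package on invented boolean county-level indicators, which stand in for the actual Tocantins variables.

```python
# Hedged sketch of association-rule mining over child-labor indicators.
# The indicator columns are invented placeholders, not the study's data.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

counties = pd.DataFrame({
    "low_school_attendance": [1, 1, 0, 1, 0, 1, 1, 0],
    "low_family_income":     [1, 1, 0, 1, 1, 1, 0, 0],
    "rural_county":          [1, 0, 0, 1, 1, 1, 1, 0],
    "child_labor_present":   [1, 1, 0, 1, 0, 1, 1, 0],
}).astype(bool)

frequent = apriori(counties, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```
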
Procedia PDF Downloads 369

24978 A Particle Filter-Based Data Assimilation Method for Discrete Event Simulation
Authors: Zhi Zhu, Boquan Zhang, Tian Jing, Jingjing Li, Tao Wang
Abstract:
Data assimilation is a hybrid model- and data-driven method that dynamically fuses new observation data with a numerical model to iteratively approach the real system state. It is widely used in state prediction and parameter inference for continuous systems. Because of the discrete event system's non-linearity and non-Gaussianity, the traditional Kalman filter, based on linear and Gaussian assumptions, cannot perform data assimilation for such systems, so the particle filter has gradually become the technical approach for discrete event simulation data assimilation. Hence, we propose a particle filter-based discrete event simulation data assimilation method and take the unmanned aerial vehicle (UAV) maintenance service system as a proof of concept for simulation experiments. The experimental results showed that the filtered state data are closer to the real state of the system, which verifies the effectiveness of the proposed method. This research can provide a reference framework for the data assimilation process of other complex nonlinear systems, such as discrete-time and agent-based simulations.
Keywords: discrete event simulation, data assimilation, particle filter, model and data-driven
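
The filtering core of such a method is the bootstrap particle filter: propagate particles through the model, weight them by the observation likelihood, and resample. The sketch below runs it on a toy nonlinear scalar system, since the abstract does not give the UAV maintenance model's details.

```python
# Minimal bootstrap particle filter on a toy nonlinear scalar system.
import numpy as np

rng = np.random.default_rng(42)
T, N = 50, 1000                       # time steps, particles
Q, R = 0.5, 1.0                       # process and observation noise variances

# Simulate the "real" system and noisy observations.
x_true, obs = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.8 * x_true[t - 1] + np.sin(t / 5) + rng.normal(0, np.sqrt(Q))
    obs[t] = x_true[t] + rng.normal(0, np.sqrt(R))

particles = rng.normal(0, 1, N)
estimates = np.zeros(T)
for t in range(1, T):
    # 1. Propagate particles through the (nonlinear) model.
    particles = 0.8 * particles + np.sin(t / 5) + rng.normal(0, np.sqrt(Q), N)
    # 2. Weight by the observation likelihood.
    w = np.exp(-0.5 * (obs[t] - particles) ** 2 / R)
    w /= w.sum()
    # 3. Estimate, then resample to avoid degeneracy.
    estimates[t] = np.dot(w, particles)
    particles = rng.choice(particles, size=N, p=w)

print("RMSE filtered vs. truth:", np.sqrt(np.mean((estimates - x_true) ** 2)).round(3))
print("RMSE raw obs vs. truth: ", np.sqrt(np.mean((obs - x_true) ** 2)).round(3))
```
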
Procedia PDF Downloads 14

24977 Outlier Detection in Stock Market Data using Tukey Method and Wavelet Transform
Authors: Sadam Alwadi
Abstract:
Outlier values are a problem that frequently occurs in the data observation or recording process; thus, the need for data imputation has become an essential matter. This work makes use of the methods described in prior work to detect outlier values in a collection of stock market data. In order to implement the detection and find solutions that may be helpful for investors, real closing price data were obtained from the Amman Stock Exchange (ASE). The Tukey and Maximal Overlap Discrete Wavelet Transform (MODWT) methods are used to detect and impute the outlier values.
Keywords: outlier values, imputation, stock market data, detecting, estimation
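
A sketch of the two detection ingredients follows: Tukey's fences applied to price differences, and the same fences applied to the detail coefficients of an undecimated wavelet transform (PyWavelets' stationary wavelet transform is used here as an MODWT-style stand-in, since the two transforms are closely related). The price series and injected outliers are synthetic.

```python
# Tukey fences and wavelet-detail outlier detection on a synthetic price series.
import numpy as np
import pywt

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 512)) + 100
prices[[60, 250, 400]] += [15, -20, 18]          # injected outliers

def tukey_outliers(x, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.where((x < q1 - k * iqr) | (x > q3 + k * iqr))[0]

# Tukey on first differences (price levels drift; differences are the stationary part).
print("Tukey on differences:", tukey_outliers(np.diff(prices)))

# Undecimated wavelet transform: outliers concentrate in the detail coefficients.
_, detail = pywt.swt(prices, "haar", level=1)[0]
print("Tukey on wavelet details:", tukey_outliers(detail))
```
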
Procedia PDF Downloads 81

24976 PEINS: A Generic Compression Scheme Using Probabilistic Encoding and Irrational Number Storage
Authors: P. Jayashree, S. Rajkumar
Abstract:
With social networks and smart devices generating a multitude of data, effective data management is the need of the hour for networks and cloud applications. Some applications need effective storage, while others need effective communication over networks, and data reduction comes as a handy solution for meeting both requirements. Most data compression techniques are based on data statistics and may result in either lossy or lossless data reduction. Though lossy reduction produces better compression ratios than lossless methods, many applications require data accuracy and minute details to be preserved. A variety of data compression algorithms exists in the literature for different forms of data, such as text, image, and multimedia data. In the proposed work, a generic progressive compression algorithm based on probabilistic encoding, called PEINS, is projected as an enhancement of the irrational number storage coding technique, to cater to the storage issues of increasing data volumes as a cost-effective solution that also offers data security as a secondary outcome to some extent. The proposed work reveals cost effectiveness in terms of better compression ratio with no deterioration in compression time.
Keywords: compression ratio, generic compression, irrational number storage, probabilistic encoding
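
For intuition about the probabilistic-encoding half of the scheme, the snippet below computes the Shannon-entropy bound that any probability-based coder approaches; it is textbook information theory, not the PEINS algorithm itself (the irrational-number storage step is not reproduced).

```python
# Shannon entropy gives the best average bits per symbol a probability-based
# coder can approach, versus fixed-width 8-bit storage.
import math
from collections import Counter

data = b"abracadabra" * 100
counts = Counter(data)
n = len(data)

entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
print(f"entropy: {entropy:.3f} bits/symbol vs. 8 bits fixed-width")
print(f"ideal compression ratio: {8 / entropy:.2f}:1")   # about 3.9:1 here
```
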
Procedia PDF Downloads 294