Search results for: data mining analytics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25011

Search results for: data mining analytics

23631 Sampling Error and Its Implication for Capture Fisheries Management in Ghana

Authors: Temiloluwa J. Akinyemi, Denis W. Aheto, Wisdom Akpalu

Abstract:

Capture fisheries in developing countries provide significant animal protein and directly supports the livelihoods of several communities. However, the misperception of biophysical dynamics owing to a lack of adequate scientific data has contributed to the suboptimal management in marine capture fisheries. This is because yield and catch potentials are sensitive to the quality of catch and effort data. Yet, studies on fisheries data collection practices in developing countries are hard to find. This study investigates the data collection methods utilized by fisheries technical officers within the four fishing regions of Ghana. We found that the officers employed data collection and sampling procedures which were not consistent with the technical guidelines curated by FAO. For example, 50 instead of 166 landing sites were sampled, while 290 instead of 372 canoes were sampled. We argue that such sampling errors could result in the over-capitalization of capture fish stocks and significant losses in resource rents.

Keywords: Fisheries data quality, fisheries management, Ghana, Sustainable Fisheries

Procedia PDF Downloads 77
23630 Improvement of Data Transfer over Simple Object Access Protocol (SOAP)

Authors: Khaled Ahmed Kadouh, Kamal Ali Albashiri

Abstract:

This paper presents a designed algorithm involves improvement of transferring data over Simple Object Access Protocol (SOAP). The aim of this work is to establish whether using SOAP in exchanging XML messages has any added advantages or not. The results showed that XML messages without SOAP take longer time and consume more memory, especially with binary data.

Keywords: JAX-WS, SMTP, SOAP, web service, XML

Procedia PDF Downloads 484
23629 Enhancing Healthcare Data Protection and Security

Authors: Joseph Udofia, Isaac Olufadewa

Abstract:

Everyday, the size of Electronic Health Records data keeps increasing as new patients visit health practitioner and returning patients fulfil their appointments. As these data grow, so is their susceptibility to cyber-attacks from criminals waiting to exploit this data. In the US, the damages for cyberattacks were estimated at $8 billion (2018), $11.5 billion (2019) and $20 billion (2021). These attacks usually involve the exposure of PII. Health data is considered PII, and its exposure carry significant impact. To this end, an enhancement of Health Policy and Standards in relation to data security, especially among patients and their clinical providers, is critical to ensure ethical practices, confidentiality, and trust in the healthcare system. As Clinical accelerators and applications that contain user data are used, it is expedient to have a review and revamp of policies like the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA), the Fast Healthcare Interoperability Resources (FHIR), all aimed to ensure data protection and security in healthcare. FHIR caters for healthcare data interoperability, FHIR caters to healthcare data interoperability, as data is being shared across different systems from customers to health insurance and care providers. The astronomical cost of implementation has deterred players in the space from ensuring compliance, leading to susceptibility to data exfiltration and data loss on the security accuracy of protected health information (PHI). Though HIPAA hones in on the security accuracy of protected health information (PHI) and PCI DSS on the security of payment card data, they intersect with the shared goal of protecting sensitive information in line with industry standards. With advancements in tech and the emergence of new technology, it is necessary to revamp these policies to address the complexity and ambiguity, cost barrier, and ever-increasing threats in cyberspace. Healthcare data in the wrong hands is a recipe for disaster, and we must enhance its protection and security to protect the mental health of the current and future generations.

Keywords: cloud security, healthcare, cybersecurity, policy and standard

Procedia PDF Downloads 71
23628 Channels Splitting Strategy for Optical Local Area Networks of Passive Star Topology

Authors: Peristera Baziana

Abstract:

In this paper, we present a network configuration for a WDM LANs of passive star topology that assume that the set of data WDM channels is split into two separate sets of channels, with different access rights over them. Especially, a synchronous transmission WDMA access algorithm is adopted in order to increase the probability of successful transmission over the data channels and consequently to reduce the probability of data packets transmission cancellation in order to avoid the data channels collisions. Thus, a control pre-transmission access scheme is followed over a separate control channel. An analytical Markovian model is studied and the average throughput is mathematically derived. The performance is studied for several numbers of data channels and various values of control phase duration.

Keywords: access algorithm, channels division, collisions avoidance, wavelength division multiplexing

Procedia PDF Downloads 280
23627 Improving Security in Healthcare Applications Using Federated Learning System With Blockchain Technology

Authors: Aofan Liu, Qianqian Tan, Burra Venkata Durga Kumar

Abstract:

Data security is of the utmost importance in the healthcare area, as sensitive patient information is constantly sent around and analyzed by many different parties. The use of federated learning, which enables data to be evaluated locally on devices rather than being transferred to a central server, has emerged as a potential solution for protecting the privacy of user information. To protect against data breaches and unauthorized access, federated learning alone might not be adequate. In this context, the application of blockchain technology could provide the system extra protection. This study proposes a distributed federated learning system that is built on blockchain technology in order to enhance security in healthcare. This makes it possible for a wide variety of healthcare providers to work together on data analysis without raising concerns about the confidentiality of the data. The technical aspects of the system, including as the design and implementation of distributed learning algorithms, consensus mechanisms, and smart contracts, are also investigated as part of this process. The technique that was offered is a workable alternative that addresses concerns about the safety of healthcare while also fostering collaborative research and the interchange of data.

Keywords: data privacy, distributed system, federated learning, machine learning

Procedia PDF Downloads 105
23626 Frequent Pattern Mining for Digenic Human Traits

Authors: Atsuko Okazaki, Jurg Ott

Abstract:

Some genetic diseases (‘digenic traits’) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa (a genetic form of blindness) occur in the presence of two mutant variants, one in the ROM1 gene and one in the RDS gene, while the occurrence of only one of these mutant variants leads to a completely normal phenotype. Detecting such digenic traits by genetic methods is difficult. A common approach to finding disease-causing variants is to compare 100,000s of variants between individuals with a trait (cases) and those without the trait (controls). Such genome-wide association studies (GWASs) have been very successful but hinge on genetic effects of single variants, that is, there should be a difference in allele or genotype frequencies between cases and controls at a disease-causing variant. Frequent pattern mining (FPM) methods offer an avenue at detecting digenic traits even in the absence of single-variant effects. The idea is to enumerate pairs of genotypes (genotype patterns) with each of the two genotypes originating from different variants that may be located at very different genomic positions. What is needed is for genotype patterns to be significantly more common in cases than in controls. Let Y = 2 refer to cases and Y = 1 to controls, with X denoting a specific genotype pattern. We are seeking association rules, ‘X → Y’, with high confidence, P(Y = 2|X), significantly higher than the proportion of cases, P(Y = 2) in the study. Clearly, generally available FPM methods are very suitable for detecting disease-associated genotype patterns. We use fpgrowth as the basic FPM algorithm and built a framework around it to enumerate high-frequency digenic genotype patterns and to evaluate their statistical significance by permutation analysis. Application to a published dataset on opioid dependence furnished results that could not be found with classical GWAS methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. The single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease. An algorithm called Multifactor Dimension Reduction (MDR) was developed some 20 years ago and has been in use in human genetics ever since. This and our algorithms share some similar properties, but they are also very different in other respects. The main difference seems to be that our algorithm focuses on patterns of genotypes while the main object of inference in MDR is the 3 × 3 table of genotypes at two variants.

Keywords: digenic traits, DNA variants, epistasis, statistical genetics

Procedia PDF Downloads 109
23625 Speed-Up Data Transmission by Using Bluetooth Module on Gas Sensor Node of Arduino Board

Authors: Hiesik Kim, YongBeum Kim

Abstract:

Internet of Things (IoT) applications are widely serviced and spread worldwide. Local wireless data transmission technique must be developed to speed up with some technique. Bluetooth wireless data communication is wireless technique is technique made by Special Inter Group(SIG) using the frequency range 2.4 GHz, and it is exploiting Frequency Hopping to avoid collision with different device. To implement experiment, equipment for experiment transmitting measured data is made by using Arduino as Open source hardware, Gas sensor, and Bluetooth Module and algorithm controlling transmission speed is demonstrated. Experiment controlling transmission speed also is progressed by developing Android Application receiving measured data, and controlling this speed is available at the experiment result. it is important that in the future, improvement for communication algorithm be needed because few error occurs when data is transferred or received.

Keywords: Arduino, Bluetooth, gas sensor, internet of things, transmission Speed

Procedia PDF Downloads 472
23624 Exploring the Capabilities of Sentinel-1A and Sentinel-2A Data for Landslide Mapping

Authors: Ismayanti Magfirah, Sartohadi Junun, Samodra Guruh

Abstract:

Landslides are one of the most frequent and devastating natural disasters in Indonesia. Many studies have been conducted regarding this phenomenon. However, there is a lack of attention in the landslide inventory mapping. The natural condition (dense forest area) and the limited human and economic resources are some of the major problems in building landslide inventory in Indonesia. Considering the importance of landslide inventory data in susceptibility, hazard, and risk analysis, it is essential to generate landslide inventory based on available resources. In order to achieve this, the first thing we have to do is identify the landslides' location. The presence of Sentinel-1A and Sentinel-2A data gives new insights into land monitoring investigation. The free access, high spatial resolution, and short revisit time, make the data become one of the most trending open sources data used in landslide mapping. Sentinel-1A and Sentinel-2A data have been used broadly for landslide detection and landuse/landcover mapping. This study aims to generate landslide map by integrating Sentinel-1A and Sentinel-2A data use change detection method. The result will be validated by field investigation to make preliminary landslide inventory in the study area.

Keywords: change detection method, landslide inventory mapping, Sentinel-1A, Sentinel-2A

Procedia PDF Downloads 155
23623 Enhancing the Pricing Expertise of an Online Distribution Channel

Authors: Luis N. Pereira, Marco P. Carrasco

Abstract:

Dynamic pricing is a revenue management strategy in which hotel suppliers define, over time, flexible and different prices for their services for different potential customers, considering the profile of e-consumers and the demand and market supply. This means that the fundamentals of dynamic pricing are based on economic theory (price elasticity of demand) and market segmentation. This study aims to define a dynamic pricing strategy and a contextualized offer to the e-consumers profile in order to improve the number of reservations of an online distribution channel. Segmentation methods (hierarchical and non-hierarchical) were used to identify and validate an optimal number of market segments. A profile of the market segments was studied, considering the characteristics of the e-consumers and the probability of reservation a room. In addition, the price elasticity of demand was estimated for each segment using econometric models. Finally, predictive models were used to define rules for classifying new e-consumers into pre-defined segments. The empirical study illustrates how it is possible to improve the intelligence of an online distribution channel system through an optimal dynamic pricing strategy and a contextualized offer to the profile of each new e-consumer. A database of 11 million e-consumers of an online distribution channel was used in this study. The results suggest that an appropriate policy of market segmentation in using of online reservation systems is benefit for the service suppliers because it brings high probability of reservation and generates more profit than fixed pricing.

Keywords: dynamic pricing, e-consumers segmentation, online reservation systems, predictive analytics

Procedia PDF Downloads 221
23622 A DEA Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most DEA models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp DEA into DEA with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the DEA model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units’ efficiency. Finally, the developed DEA model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, DEA, fuzzy, decision making units, higher education institutions

Procedia PDF Downloads 31
23621 Author Profiling: Prediction of Learners’ Gender on a MOOC Platform Based on Learners’ Comments

Authors: Tahani Aljohani, Jialin Yu, Alexandra. I. Cristea

Abstract:

The more an educational system knows about a learner, the more personalised interaction it can provide, which leads to better learning. However, asking a learner directly is potentially disruptive, and often ignored by learners. Especially in the booming realm of MOOC Massive Online Learning platforms, only a very low percentage of users disclose demographic information about themselves. Thus, in this paper, we aim to predict learners’ demographic characteristics, by proposing an approach using linguistically motivated Deep Learning Architectures for Learner Profiling, particularly targeting gender prediction on a FutureLearn MOOC platform. Additionally, we tackle here the difficult problem of predicting the gender of learners based on their comments only – which are often available across MOOCs. The most common current approaches to text classification use the Long Short-Term Memory (LSTM) model, considering sentences as sequences. However, human language also has structures. In this research, rather than considering sentences as plain sequences, we hypothesise that higher semantic - and syntactic level sentence processing based on linguistics will render a richer representation. We thus evaluate, the traditional LSTM versus other bleeding edge models, which take into account syntactic structure, such as tree-structured LSTM, Stack-augmented Parser-Interpreter Neural Network (SPINN) and the Structure-Aware Tag Augmented model (SATA). Additionally, we explore using different word-level encoding functions. We have implemented these methods on Our MOOC dataset, which is the most performant one comparing with a public dataset on sentiment analysis that is further used as a cross-examining for the models' results.

Keywords: deep learning, data mining, gender predication, MOOCs

Procedia PDF Downloads 127
23620 Data-Driven Decision Making: Justification of Not Leaving Class without It

Authors: Denise Hexom, Judith Menoher

Abstract:

Teachers and administrators across America are being asked to use data and hard evidence to inform practice as they begin the task of implementing Common Core State Standards. Yet, the courses they are taking in schools of education are not preparing teachers or principals to understand the data-driven decision making (DDDM) process nor to utilize data in a much more sophisticated fashion. DDDM has been around for quite some time, however, it has only recently become systematically and consistently applied in the field of education. This paper discusses the theoretical framework of DDDM; empirical evidence supporting the effectiveness of DDDM; a process a department in a school of education has utilized to implement DDDM; and recommendations to other schools of education who attempt to implement DDDM in their decision-making processes and in their students’ coursework.

Keywords: data-driven decision making, institute of higher education, special education, continuous improvement

Procedia PDF Downloads 373
23619 Quantile Coherence Analysis: Application to Precipitation Data

Authors: Yaeji Lim, Hee-Seok Oh

Abstract:

The coherence analysis measures the linear time-invariant relationship between two data sets and has been studied various fields such as signal processing, engineering, and medical science. However classical coherence analysis tends to be sensitive to outliers and focuses only on mean relationship. In this paper, we generalized cross periodogram to quantile cross periodogram and provide richer inter-relationship between two data sets. This is a general version of Laplace cross periodogram. We prove its asymptotic distribution under the long range process and compare them with ordinary coherence through numerical examples. We also present real data example to confirm the usefulness of quantile coherence analysis.

Keywords: coherence, cross periodogram, spectrum, quantile

Procedia PDF Downloads 377
23618 Conception of a Predictive Maintenance System for Forest Harvesters from Multiple Data Sources

Authors: Lazlo Fauth, Andreas Ligocki

Abstract:

For cost-effective use of harvesters, expensive repairs and unplanned downtimes must be reduced as far as possible. The predictive detection of failing systems and the calculation of intelligent service intervals, necessary to avoid these factors, require in-depth knowledge of the machines' behavior. Such know-how needs permanent monitoring of the machine state from different technical perspectives. In this paper, three approaches will be presented as they are currently pursued in the publicly funded project PreForst at Ostfalia University of Applied Sciences. These include the intelligent linking of workshop and service data, sensors on the harvester, and a special online hydraulic oil condition monitoring system. Furthermore the paper shows potentials as well as challenges for the use of these data in the conception of a predictive maintenance system.

Keywords: predictive maintenance, condition monitoring, forest harvesting, forest engineering, oil data, hydraulic data

Procedia PDF Downloads 125
23617 Exploring Environmental, Social, and Governance (ESG) Standards for Space Exploration

Authors: Rachael Sullivan, Joshua Berman

Abstract:

The number of satellites orbiting earth are in the thousands now. Commercial launches are increasing, and civilians are venturing into the outer reaches of the atmosphere. As the space industry continues to grow and evolve, so too will the demand on resources, the disparities amongst socio-economic groups, and space company governance standards. Outside of just ensuring that space operations are compliant with government regulations, export controls, and international sanctions, companies should also keep in mind the impact their operations will have on society and the environment. Those looking to expand their operations into outer space should remain mindful of both the opportunities and challenges that they could encounter along the way. From commercial launches promoting civilian space travel—like the recent launches from Blue Origin, Virgin Galactic, and Space X—to regulatory and policy shifts, the commercial landscape beyond the Earth's atmosphere is evolving. But practices will also have to become sustainable. Through a review and analysis of space industry trends, international government regulations, and empirical data, this research explores how Environmental, Social, and Governance (ESG) reporting and investing will manifest within a fast-changing space industry.Institutions, regulators, investors, and employees are increasingly relying on ESG. Those working in the space industry will be no exception. Companies (or investors) that are already engaging or plan to engage in space operations should consider 1) environmental standards and objectives when tackling space debris and space mining, 2) social standards and objectives when considering how such practices may impact access and opportunities for different socioeconomic groups to the benefits of space exploration, and 3) how decision-making and governing boards will function ethically, equitably, and sustainably as we chart new paths and encounter novel challenges in outer space.

Keywords: climate, environment, ESG, law, outer space, regulation

Procedia PDF Downloads 135
23616 Sampled-Data Control for Fuel Cell Systems

Authors: H. Y. Jung, Ju H. Park, S. M. Lee

Abstract:

A sampled-data controller is presented for solid oxide fuel cell systems which is expressed by a sector bounded nonlinear model. The sector bounded nonlinear systems, which have a feedback connection with a linear dynamical system and nonlinearity satisfying certain sector type constraints. Also, the sampled-data control scheme is very useful since it is possible to handle digital controller and increasing research efforts have been devoted to sampled-data control systems with the development of modern high-speed computers. The proposed control law is obtained by solving a convex problem satisfying several linear matrix inequalities. Simulation results are given to show the effectiveness of the proposed design method.

Keywords: sampled-data control, fuel cell, linear matrix inequalities, nonlinear control

Procedia PDF Downloads 557
23615 How Western Donors Allocate Official Development Assistance: New Evidence From a Natural Language Processing Approach

Authors: Daniel Benson, Yundan Gong, Hannah Kirk

Abstract:

Advancement in national language processing techniques has led to increased data processing speeds, and reduced the need for cumbersome, manual data processing that is often required when processing data from multilateral organizations for specific purposes. As such, using named entity recognition (NER) modeling and the Organisation of Economically Developed Countries (OECD) Creditor Reporting System database, we present the first geotagged dataset of OECD donor Official Development Assistance (ODA) projects on a global, subnational basis. Our resulting data contains 52,086 ODA projects geocoded to subnational locations across 115 countries, worth a combined $87.9bn. This represents the first global, OECD donor ODA project database with geocoded projects. We use this new data to revisit old questions of how ‘well’ donors allocate ODA to the developing world. This understanding is imperative for policymakers seeking to improve ODA effectiveness.

Keywords: international aid, geocoding, subnational data, natural language processing, machine learning

Procedia PDF Downloads 56
23614 Compressed Suffix Arrays to Self-Indexes Based on Partitioned Elias-Fano

Authors: Guo Wenyu, Qu Youli

Abstract:

A practical and simple self-indexing data structure, Partitioned Elias-Fano (PEF) - Compressed Suffix Arrays (CSA), is built in linear time for the CSA based on PEF indexes. Moreover, the PEF-CSA is compared with two classical compressed indexing methods, Ferragina and Manzini implementation (FMI) and Sad-CSA on different type and size files in Pizza & Chili. The PEF-CSA performs better on the existing data in terms of the compression ratio, count, and locates time except for the evenly distributed data such as proteins data. The observations of the experiments are that the distribution of the φ is more important than the alphabet size on the compression ratio. Unevenly distributed data φ makes better compression effect, and the larger the size of the hit counts, the longer the count and locate time.

Keywords: compressed suffix array, self-indexing, partitioned Elias-Fano, PEF-CSA

Procedia PDF Downloads 238
23613 Data, Digital Identity and Antitrust Law: An Exploratory Study of Facebook’s Novi Digital Wallet

Authors: Wanjiku Karanja

Abstract:

Facebook has monopoly power in the social networking market. It has grown and entrenched its monopoly power through the capture of its users’ data value chains. However, antitrust law’s consumer welfare roots have prevented it from effectively addressing the role of data capture in Facebook’s market dominance. These regulatory blind spots are augmented in Facebook’s proposed Diem cryptocurrency project and its Novi Digital wallet. Novi, which is Diem’s digital identity component, shall enable Facebook to collect an unprecedented volume of consumer data. Consequently, Novi has seismic implications on internet identity as the network effects of Facebook’s large user base could establish it as the de facto internet identity layer. Moreover, the large tracts of data Facebook shall collect through Novi shall further entrench Facebook's market power. As such, the attendant lock-in effects of this project shall be very difficult to reverse. Urgent regulatory action is therefore required to prevent this expansion of Facebook’s data resources and monopoly power. This research thus highlights the importance of data capture to competition and market health in the social networking industry. It utilizes interviews with key experts to empirically interrogate the impact of Facebook’s data capture and control of its users’ data value chains on its market power. This inquiry is contextualized against Novi’s expansive effect on Facebook’s data value chains. It thus addresses the novel antitrust issues arising at the nexus of Facebook’s monopoly power and the privacy of its users’ data. It also explores the impact of platform design principles, specifically data portability and data portability, in mitigating Facebook’s anti-competitive practices. As such, this study finds that Facebook is a powerful monopoly that dominates the social media industry to the detriment of potential competitors. Facebook derives its power from its size, annexure of the consumer data value chain, and control of its users’ social graphs. Additionally, the platform design principles of data interoperability and data portability are not a panacea to restoring competition in the social networking market. Their success depends on the establishment of robust technical standards and regulatory frameworks.

Keywords: antitrust law, data protection law, data portability, data interoperability, digital identity, Facebook

Procedia PDF Downloads 112
23612 Recommendations for Data Quality Filtering of Opportunistic Species Occurrence Data

Authors: Camille Van Eupen, Dirk Maes, Marc Herremans, Kristijn R. R. Swinnen, Ben Somers, Stijn Luca

Abstract:

In ecology, species distribution models are commonly implemented to study species-environment relationships. These models increasingly rely on opportunistic citizen science data when high-quality species records collected through standardized recording protocols are unavailable. While these opportunistic data are abundant, uncertainty is usually high, e.g., due to observer effects or a lack of metadata. Data quality filtering is often used to reduce these types of uncertainty in an attempt to increase the value of studies relying on opportunistic data. However, filtering should not be performed blindly. In this study, recommendations are built for data quality filtering of opportunistic species occurrence data that are used as input for species distribution models. Using an extensive database of 5.7 million citizen science records from 255 species in Flanders, the impact on model performance was quantified by applying three data quality filters, and these results were linked to species traits. More specifically, presence records were filtered based on record attributes that provide information on the observation process or post-entry data validation, and changes in the area under the receiver operating characteristic (AUC), sensitivity, and specificity were analyzed using the Maxent algorithm with and without filtering. Controlling for sample size enabled us to study the combined impact of data quality filtering, i.e., the simultaneous impact of an increase in data quality and a decrease in sample size. Further, the variation among species in their response to data quality filtering was explored by clustering species based on four traits often related to data quality: commonness, popularity, difficulty, and body size. Findings show that model performance is affected by i) the quality of the filtered data, ii) the proportional reduction in sample size caused by filtering and the remaining absolute sample size, and iii) a species ‘quality profile’, resulting from a species classification based on the four traits related to data quality. The findings resulted in recommendations on when and how to filter volunteer generated and opportunistically collected data. This study confirms that correctly processed citizen science data can make a valuable contribution to ecological research and species conservation.

Keywords: citizen science, data quality filtering, species distribution models, trait profiles

Procedia PDF Downloads 184
23611 Data Quality Enhancement with String Length Distribution

Authors: Qi Xiu, Hiromu Hota, Yohsuke Ishii, Takuya Oda

Abstract:

Recently, collectable manufacturing data are rapidly increasing. On the other hand, mega recall is getting serious as a social problem. Under such circumstances, there are increasing needs for preventing mega recalls by defect analysis such as root cause analysis and abnormal detection utilizing manufacturing data. However, the time to classify strings in manufacturing data by traditional method is too long to meet requirement of quick defect analysis. Therefore, we present String Length Distribution Classification method (SLDC) to correctly classify strings in a short time. This method learns character features, especially string length distribution from Product ID, Machine ID in BOM and asset list. By applying the proposal to strings in actual manufacturing data, we verified that the classification time of strings can be reduced by 80%. As a result, it can be estimated that the requirement of quick defect analysis can be fulfilled.

Keywords: string classification, data quality, feature selection, probability distribution, string length

Procedia PDF Downloads 308
23610 Temporally Coherent 3D Animation Reconstruction from RGB-D Video Data

Authors: Salam Khalifa, Naveed Ahmed

Abstract:

We present a new method to reconstruct a temporally coherent 3D animation from single or multi-view RGB-D video data using unbiased feature point sampling. Given RGB-D video data, in form of a 3D point cloud sequence, our method first extracts feature points using both color and depth information. In the subsequent steps, these feature points are used to match two 3D point clouds in consecutive frames independent of their resolution. Our new motion vectors based dynamic alignment method then fully reconstruct a spatio-temporally coherent 3D animation. We perform extensive quantitative validation using novel error functions to analyze the results. We show that despite the limiting factors of temporal and spatial noise associated to RGB-D data, it is possible to extract temporal coherence to faithfully reconstruct a temporally coherent 3D animation from RGB-D video data.

Keywords: 3D video, 3D animation, RGB-D video, temporally coherent 3D animation

Procedia PDF Downloads 358
23609 Determining Abnomal Behaviors in UAV Robots for Trajectory Control in Teleoperation

Authors: Kiwon Yeom

Abstract:

Change points are abrupt variations in a data sequence. Detection of change points is useful in modeling, analyzing, and predicting time series in application areas such as robotics and teleoperation. In this paper, a change point is defined to be a discontinuity in one of its derivatives. This paper presents a reliable method for detecting discontinuities within a three-dimensional trajectory data. The problem of determining one or more discontinuities is considered in regular and irregular trajectory data from teleoperation. We examine the geometric detection algorithm and illustrate the use of the method on real data examples.

Keywords: change point, discontinuity, teleoperation, abrupt variation

Procedia PDF Downloads 155
23608 Phytoremediation of Lead Polluted Soils with Native Weeds in Nigeria

Authors: Comfort Adeoye, Anthony Eneji

Abstract:

Lead pollution by mining, industrial dumping, and other anthropogenic uses are corroding the environment. Efforts being made to control it include physical, chemical and biological methods. The failure of the aforementioned methods are largely due to the fact that they are cumbersome, expensive, and not eco-friendly. Some plant species can be used for remediation of these pollutants. The objective of this work is to investigate the abilities of two native weed species to remediate two lead-polluted soils: a) Battery dumpsite and, (b) Naturally occurring lead mine. Soil samples were taken from the two sites: a) Kumapayi in Ibadan, a battery dumpsite, (b) Zamfara, a natural lead mine. Screen house experiment in Complete Randomized Design (CRD) replicated three times was carried out at I.I.T.A. Unpolluted soils were collected and polluted with various rates of lead concentrations of 0, 0.1, 0.2, and 0.5%. These were planted with weed species. Plant growth parameters were monitored for twelve weeks, after which the plants were harvested. Dry weight and plant uptake of the lead were taken. Analysis of data was carried out using, Genstat, Excel and descriptive statistics. Relative concentration of lead (Pb) in the above and below ground parts of Gomphrena celusoides revealed that a higher amount of Pb is taken up in the root compared with the shoots at different levels of Pb pollution. However, lead uptake at 0.5% > 0.2% > 0.1% > Control. In essence, phytoremediation of Gomphrena is highest at soil pollution of 0.5% and its retention is greater in the root than the shoot.In S. pyramidalis, soil retention ranges from 0.1% > 0.5% > 0.2% > control. Uptake is highest at 0.5% > 0.1% > 0.2 in stem. Uptake in leaves is highest at 0.2%, but none in the 0.5% pollution. Therefore, different plant species exhibited different accumulative mode probably due to their physiological and rooting systems. Gomphrena spp. rooting system is tap root,while that of S.pyramidalis is fibrous.

Keywords: grass, lead, phytoremediation, pollution

Procedia PDF Downloads 311
23607 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 426
23606 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 182
23605 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization

Authors: K. Umbleja, M. Ichino, H. Yaguchi

Abstract:

In this paper, we propose a coloring method for multivariate data visualization by using parallel coordinates based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension for proximity-based coloring that suffers from a few undesired side effects if hierarchical tree structure is not balanced tree. We describe the algorithm by assigning colors based on dissimilarity information, show the application of proposed method on three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization where many individual objects have already been aggregated into a single symbolic object.

Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data

Procedia PDF Downloads 158
23604 A Framework on Data and Remote Sensing for Humanitarian Logistics

Authors: Vishnu Nagendra, Marten Van Der Veen, Stefania Giodini

Abstract:

Effective humanitarian logistics operations are a cornerstone in the success of disaster relief operations. However, for effectiveness, they need to be demand driven and supported by adequate data for prioritization. Without this data operations are carried out in an ad hoc manner and eventually become chaotic. The current availability of geospatial data helps in creating models for predictive damage and vulnerability assessment, which can be of great advantage to logisticians to gain an understanding on the nature and extent of the disaster damage. This translates into actionable information on the demand for relief goods, the state of the transport infrastructure and subsequently the priority areas for relief delivery. However, due to the unpredictable nature of disasters, the accuracy in the models need improvement which can be done using remote sensing data from UAVs (Unmanned Aerial Vehicles) or satellite imagery, which again come with certain limitations. This research addresses the need for a framework to combine data from different sources to support humanitarian logistic operations and prediction models. The focus is on developing a workflow to combine data from satellites and UAVs post a disaster strike. A three-step approach is followed: first, the data requirements for logistics activities are made explicit, which is done by carrying out semi-structured interviews with on field logistics workers. Second, the limitations in current data collection tools are analyzed to develop workaround solutions by following a systems design approach. Third, the data requirements and the developed workaround solutions are fit together towards a coherent workflow. The outcome of this research will provide a new method for logisticians to have immediately accurate and reliable data to support data-driven decision making.

Keywords: unmanned aerial vehicles, damage prediction models, remote sensing, data driven decision making

Procedia PDF Downloads 367
23603 Facility Data Model as Integration and Interoperability Platform

Authors: Nikola Tomasevic, Marko Batic, Sanja Vranes

Abstract:

Emerging Semantic Web technologies can be seen as the next step in evolution of the intelligent facility management systems. Particularly, this considers increased usage of open source and/or standardized concepts for data classification and semantic interpretation. To deliver such facility management systems, providing the comprehensive integration and interoperability platform in from of the facility data model is a prerequisite. In this paper, one of the possible modelling approaches to provide such integrative facility data model which was based on the ontology modelling concept was presented. Complete ontology development process, starting from the input data acquisition, ontology concepts definition and finally ontology concepts population, was described. At the beginning, the core facility ontology was developed representing the generic facility infrastructure comprised of the common facility concepts relevant from the facility management perspective. To develop the data model of a specific facility infrastructure, first extension and then population of the core facility ontology was performed. For the development of the full-blown facility data models, Malpensa and Fiumicino airports in Italy, two major European air-traffic hubs, were chosen as a test-bed platform. Furthermore, the way how these ontology models supported the integration and interoperability of the overall airport energy management system was analyzed as well.

Keywords: airport ontology, energy management, facility data model, ontology modeling

Procedia PDF Downloads 433
23602 Shear Strength Characterization of Coal Mine Spoil in Very-High Dumps with Large Scale Direct Shear Testing

Authors: Leonie Bradfield, Stephen Fityus, John Simmons

Abstract:

The shearing behavior of current and planned coal mine spoil dumps up to 400m in height is studied using large-sample-high-stress direct shear tests performed on a range of spoils common to the coalfields of Eastern Australia. The motivation for the study is to address industry concerns that some constructed spoil dump heights ( > 350m) are exceeding the scale ( ≤ 120m) for which reliable design information exists, and because modern geotechnical laboratories are not equipped to test representative spoil specimens at field-scale stresses. For more than two decades, shear strength estimation for spoil dumps has been based on either infrequent, very small-scale tests where oversize particles are scalped to comply with device specimen size capacity such that the influence of prototype-sized particles on shear strength is not captured; or on published guidelines that provide linear shear strength envelopes derived from small-scale test data and verified in practice by slope performance of dumps up to 120m in height. To date, these published guidelines appear to have been reliable. However, in the field of rockfill dam design there is a broad acceptance of a curvilinear shear strength envelope, and if this is applicable to coal mine spoils, then these industry-accepted guidelines may overestimate the strength and stability of dumps at higher stress levels. The pressing need to rationally define the shearing behavior of more representative spoil specimens at field-scale stresses led to the successful design, construction and operation of a large direct shear machine (LDSM) and its subsequent application to provide reliable design information for current and planned very-high dumps. The LDSM can test at a much larger scale, in terms of combined specimen size (720mm x 720mm x 600mm) and stress (σn up to 4.6MPa), than has ever previously been achieved using a direct shear machine for geotechnical testing of rockfill. The results of an extensive LDSM testing program on a wide range of coal-mine spoils are compared to a published framework that widely accepted by the Australian coal mining industry as the standard for shear strength characterization of mine spoil. A critical outcome is that the LDSM data highlights several non-compliant spoils, and stress-dependent shearing behavior, for which the correct application of the published framework will not provide reliable shear strength parameters for design. Shear strength envelopes developed from the LDSM data are also compared with dam engineering knowledge, where failure envelopes of rockfills are curved in a concave-down manner. The LDSM data indicates that shear strength envelopes for coal-mine spoils abundant with rock fragments are not in fact curved and that the shape of the failure envelope is ultimately determined by the strength of rock fragments. Curvilinear failure envelopes were found to be appropriate for soil-like spoils containing minor or no rock fragments, or hard-soil aggregates.

Keywords: coal mine, direct shear test, high dump, large scale, mine spoil, shear strength, spoil dump

Procedia PDF Downloads 154