Search results for: Data Mining
24441 An Embarrassingly Simple Semi-supervised Approach to Increase Recall in Online Shopping Domain to Match Structured Data with Unstructured Data
Authors: Sachin Nagargoje
Abstract:
Complete labeled data is often difficult to obtain in a practical scenario. Even if one manages to obtain the data, its quality is always in question. In the shopping vertical, offers are the input data, which advertisers provide with or without good-quality information. In this paper, the author investigates the possibility of using a very simple semi-supervised learning approach to increase the recall of unhealthy offers (those with a badly written offer title or partial product details) in the shopping vertical domain. The author found that the semi-supervised learning method improved recall in the Smart Phone category by 30% in A/B testing on 10% of traffic and increased the YoY (year-over-year) number of impressions per month by 33% in production. This also produced a significant increase in revenue, but that cannot be publicly disclosed.
Keywords: semi-supervised learning, clustering, recall, coverage
Procedia PDF Downloads 122
24440 Genodata: The Human Genome Variation Using BigData
Authors: Surabhi Maiti, Prajakta Tamhankar, Prachi Uttam Mehta
Abstract:
Since the accomplishment of the Human Genome Project, there has been an unparalleled escalation in the sequencing of genomic data. This project has been the first major leap in the field of medical research, especially in genomics. The project won accolades by using a concept called Bigdata, which had earlier been used extensively to gain value for business. Bigdata makes use of data sets that are generally in the form of files of terabytes, petabytes, or exabytes in size, and these data sets were traditionally used and managed with Excel sheets and RDBMS. The voluminous data made the process tedious and time-consuming, and hence a stronger framework called Hadoop was introduced in the field of genetic sciences to make data processing faster and more efficient. This paper focuses on using SPARK, which is gaining momentum with the advancement of BigData technologies. Cloud storage is an effective medium for storing the large data sets generated by genetic research and the resultant sets produced by SPARK analysis.
Keywords: human genome project, Bigdata, genomic data, SPARK, cloud storage, Hadoop
Procedia PDF Downloads 259
24439 Ontology for a Voice Transcription of OpenStreetMap Data: The Case of Space Apprehension by Visually Impaired Persons
Authors: Said Boularouk, Didier Josselin, Eitan Altman
Abstract:
In this paper, we present a vocal ontology of OpenStreetMap data for the apprehension of space by visually impaired people. The platform, which is based on produsage, gives data producers the freedom to choose the descriptors of geocoded locations. Unfortunately, this freedom, also called folksonomy, complicates subsequent searches of the data. We try to solve this issue with a simple but usable method that extracts data from OSM databases in order to send them to visually impaired people using Text To Speech technology. We focus on how to help people with visual disabilities plan their itinerary and comprehend a map by querying a computer and getting information about the surrounding environment in a mono-modal human-computer dialogue.
Keywords: TTS, ontology, open street map, visually impaired
Procedia PDF Downloads 295
24438 Design and Development of a Platform for Analyzing Spatio-Temporal Data from Wireless Sensor Networks
Authors: Walid Fantazi
Abstract:
The development of sensor technology (such as microelectromechanical systems (MEMS), wireless communications, embedded systems, distributed processing and wireless sensor applications) has contributed to a broad range of WSN applications which are capable of collecting a large amount of spatiotemporal data in real time. These systems require real-time data processing to manage storage in real time and query the data they process. In order to cover these needs, we propose in this paper a Snapshot spatiotemporal data model based on object-oriented concepts. This model saves storage and reduces data redundancy, which makes it easier to execute spatiotemporal queries and saves analysis time. Further, to ensure the robustness of the system as well as the elimination of congestion from the main access memory, we propose a spatiotemporal indexing technique in RAM called Captree*. As a result, we offer an RIA (Rich Internet Application)-based SOA application architecture which allows remote monitoring and control.
Keywords: WSN, indexing data, SOA, RIA, geographic information system
Procedia PDF Downloads 254
24437 Issue Reorganization Using the Measure of Relevance
Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim
Abstract:
Recently, the demand for extracting R&D keywords from issues and using them to retrieve R&D information has been increasing rapidly. However, it is hard to identify related issues or to distinguish them. Although the similarity between issues cannot be identified directly, an R&D lexicon makes it possible to determine which issues consistently share the same R&D keywords. In detail, the R&D keywords associated with a particular issue imply the key technology elements needed to solve the problem of that issue. Furthermore, related issues that share the same R&D keywords can be shown in a more systematic way through issue clustering constructed from the perspective of R&D. Thus, the sharing of R&D results and the reuse of R&D technology can be facilitated. Indirectly, redundant investment in the same R&D can be reduced, as R&D information can be shared between the corresponding issues and the reusability of the related R&D can be improved. Therefore, a methodology for constructing an issue clustering from the perspective of common R&D keywords is proposed to satisfy the demands mentioned.
Keywords: clustering, social network analysis, text mining, topic analysis
Procedia PDF Downloads 573
24436 Cloning and Characterization of UDP-Glucose Pyrophosphorylases from Lactobacillus kefiranofaciens and Rhodococcus wratislaviensis
Authors: Mesfin Angaw Tesfay
Abstract:
Uridine-5′-diphosphate (UDP)-glucose is one of the most versatile building blocks within the metabolism of prokaryotes and eukaryotes, serving as an activated sugar donor during the glycosylation of natural products. It is formed by the enzyme UDP-glucose pyrophosphorylase (UGPase) using uridine-5′-triphosphate (UTP) and α-D-glucose 1-phosphate as substrates. Herein, two UGPase genes from Lactobacillus kefiranofaciens ZW3 (LkUGPase) and Rhodococcus wratislaviensis IFP 2016 (RwUGPase) were identified through genome mining approaches. LkUGPase and RwUGPase have 299 and 306 amino acids, respectively. Both UGPases have the conserved UTP binding site (G-X-G-T-R-X-L-P) and the glucose-1-phosphate binding site (V-E-K-P). LkUGPase and RwUGPase were cloned in E. coli, and SDS-PAGE analysis showed the expression of both enzymes, each forming a protein band of about 36 kDa after induction. LkUGPase and RwUGPase have activities of 1549.95 and 671.53 U/mg, respectively. Their kinetic properties are currently under investigation.
Keywords: UGPase, LkUGPase, RwUGPase, UDP-glucose, glycosylation
Procedia PDF Downloads 25
24435 Prediction of Marine Ecosystem Changes Based on the Integrated Analysis of Multivariate Data Sets
Authors: Prozorkevitch D., Mishurov A., Sokolov K., Karsakov L., Pestrikova L.
Abstract:
The current body of knowledge about the marine environment and the dynamics of marine ecosystems includes a huge amount of heterogeneous data collected over decades. It generally includes a wide range of hydrological, biological and fishery data. Marine researchers collect these data and analyze how and why the ecosystem changes from past to present. Based on these historical records and the linkages between the processes, it is possible to predict future changes. Multivariate analysis of trends and their interconnection in the marine ecosystem may be used as an instrument for predicting further ecosystem evolution. A wide range of information about the components of the marine ecosystem, covering more than 50 years, needs to be used to investigate how these arrays can help to predict the future.
Keywords: Barents Sea ecosystem, abiotic, biotic, data sets, trends, prediction
Procedia PDF Downloads 117
24434 Optical Fiber Data Throughput in a Quantum Communication System
Authors: Arash Kosari, Ali Araghi
Abstract:
A mathematical model for an optical-fiber communication channel is developed, resulting in an expression that calculates the throughput and loss of the corresponding link. The data are assumed to be transmitted using separate photons with different polarizations. The derived model also shows the dependence of data throughput on the length of the channel and the depolarization factor. It is observed that the absorption of photons affects the throughput more strongly than depolarization does. In addition, the probability of depolarization and the absorption of radiated photons are obtained.
Keywords: absorption, data throughput, depolarization, optical fiber
Procedia PDF Downloads 286
24433 Event Driven Dynamic Clustering and Data Aggregation in Wireless Sensor Network
Authors: Ashok V. Sutagundar, Sunilkumar S. Manvi
Abstract:
Energy, delay and bandwidth are the prime issues of a wireless sensor network (WSN). Energy usage optimization and efficient bandwidth utilization are important issues in WSNs. Event-triggered data aggregation facilitates such optimization for the event-affected area of a WSN. Reliable delivery of critical information to the sink node is also a major challenge. To tackle these issues, we propose an event-driven dynamic clustering and data aggregation scheme for WSNs that enhances the lifetime of the network by minimizing redundant data transmission. The proposed scheme operates as follows: (1) Whenever an event is triggered, the event-triggered node selects the cluster head. (2) The cluster head gathers data from sensor nodes within the cluster. (3) The cluster head identifies and classifies the events in the collected data using a Bayesian classifier. (4) Data are aggregated using a statistical method. (5) The cluster head discovers the paths to the sink node using residual energy, path distance and bandwidth. (6) If the aggregated data are critical, the cluster head sends them over multiple paths for reliable data communication. (7) Otherwise, the aggregated data are transmitted towards the sink node over the single path that has the most bandwidth and residual energy. The performance of the scheme is validated for various WSN scenarios to evaluate the effectiveness of the proposed approach in terms of aggregation time, cluster formation time and energy consumed for aggregation.
Keywords: wireless sensor network, dynamic clustering, data aggregation, wireless communication
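Steps (5) to (7) of the scheme described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how residual energy, path distance and bandwidth are combined, so the `score` function, the `Path` fields, their units and the `multipath_count` parameter below are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    residual_energy: float  # assumed unit: joules left on the weakest node
    bandwidth: float        # assumed unit: kbit/s
    distance: float         # assumed unit: metres to the sink node


def score(path: Path) -> float:
    # Hypothetical ranking rule: reward energy and bandwidth, penalise distance.
    return path.residual_energy * path.bandwidth / path.distance


def select_paths(paths, critical, multipath_count=2):
    """Return several paths for critical data, otherwise the single best path."""
    ranked = sorted(paths, key=score, reverse=True)
    if critical:
        return ranked[:multipath_count]
    return ranked[:1]


candidates = [
    Path("a", residual_energy=5.0, bandwidth=100.0, distance=50.0),
    Path("b", residual_energy=4.0, bandwidth=50.0, distance=100.0),
    Path("c", residual_energy=3.0, bandwidth=120.0, distance=60.0),
]
critical_routes = [p.name for p in select_paths(candidates, critical=True)]
normal_route = [p.name for p in select_paths(candidates, critical=False)]
```

Critical data is sent redundantly over the top-ranked paths, while non-critical data uses only the best one, which mirrors the trade-off between reliability and energy use stated in the abstract.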
Procedia PDF Downloads 451
24432 Offshore Outsourcing: Global Data Privacy Controls and International Compliance Issues
Authors: Michelle J. Miller
Abstract:
In recent years, two emerging issues have arisen that affect the global employment and business market and that the legal community must review more closely: offshore outsourcing and data privacy. These two issues intersect because employment opportunities are shifting due to offshore outsourcing, and in some states, such as the United States, anti-outsourcing legislation has been passed or presented to retain jobs within the country. In addition, the legal requirement for a global employer to protect the privacy of data extends to employees and third-party service providers, including services outsourced to offshore locations. For this reason, this paper reviews the intersection of these two issues with a specific focus on data privacy.
Keywords: outsourcing, data privacy, international compliance, multinational corporations
Procedia PDF Downloads 411
24431 Weighted Data Replication Strategy for Data Grid Considering Economic Approach
Authors: N. Mansouri, A. Asadi
Abstract:
Data Grid is a geographically distributed environment that deals with data-intensive applications in scientific and enterprise computing. Data replication is a common method used to achieve efficient and fault-tolerant data access in Grids. In this paper, a dynamic data replication strategy called Enhanced Latest Access Largest Weight (ELALW) is proposed. This strategy is an enhanced version of the Latest Access Largest Weight strategy. However, replication should be used wisely because the storage capacity of each Grid site is limited. Thus, it is important to design an effective strategy for the replica replacement task. ELALW replaces replicas based on the number of future requests, the size of the replica, and the number of copies of the file. It also improves access latency by selecting the best replica when various sites hold replicas. The proposed replica selection chooses the best replica location from among the many replicas based on response time, which is determined by considering the data transfer time, the storage access latency, the replica requests waiting in the storage queue and the distance between nodes. Simulation results obtained with OptorSim show that our replication strategy achieves better overall performance than other strategies in terms of job execution time, effective network usage and storage resource usage.
Keywords: data grid, data replication, simulation, replica selection, replica placement
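The replica selection idea in this abstract (pick the site with the lowest estimated response time) can be sketched as follows. The abstract names the four contributing factors but not how they are combined, so the additive formula, the `ReplicaSite` fields, and the `service_time_s` and `per_hop_s` constants are assumptions for illustration and are not the ELALW definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSite:
    name: str
    bandwidth_mb_s: float     # assumed link bandwidth to the requesting site
    storage_latency_s: float  # assumed storage access latency
    queued_requests: int      # replica requests waiting in the storage queue
    hops: int                 # assumed stand-in for the distance between nodes


def estimated_response_time(site, file_size_mb, service_time_s=0.5, per_hop_s=0.01):
    # Hypothetical additive model of the four factors named in the abstract.
    transfer = file_size_mb / site.bandwidth_mb_s
    waiting = site.queued_requests * service_time_s
    return transfer + site.storage_latency_s + waiting + site.hops * per_hop_s


def best_replica(sites, file_size_mb):
    """Return the site with the smallest estimated response time."""
    return min(sites, key=lambda s: estimated_response_time(s, file_size_mb))


sites = [
    ReplicaSite("A", bandwidth_mb_s=10.0, storage_latency_s=0.1, queued_requests=4, hops=1),
    ReplicaSite("B", bandwidth_mb_s=5.0, storage_latency_s=0.05, queued_requests=0, hops=3),
    ReplicaSite("C", bandwidth_mb_s=20.0, storage_latency_s=0.2, queued_requests=10, hops=2),
]
large_file_choice = best_replica(sites, file_size_mb=100.0).name
small_file_choice = best_replica(sites, file_size_mb=1.0).name
```

With these made-up numbers, a large file favours the high-bandwidth site despite its long queue, while a small file favours the idle site, which shows why transfer time alone is not a sufficient selection criterion.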
Procedia PDF Downloads 260
24430 Evaluation of Satellite and Radar Rainfall Product over Seyhan Plain
Authors: Kazım Kaba, Erdem Erdi, M. Akif Erdoğan, H. Mustafa Kandırmaz
Abstract:
Rainfall is a crucial data source for many different disciplines, such as agriculture, hydrology and climate. Therefore, the rain rate should be well known both spatially and temporally for any area. For many years, rainfall has traditionally been measured using rain gauges at meteorological ground stations. At present, rainfall products are acquired from radar and satellite images with temporal and spatial continuity. In this study, we investigated the accuracy of these rainfall data against rain-gauge data. For this purpose, we used the Adana-Hatay radar hourly total precipitation product (RN1) and the Meteosat convective rainfall rate (CRR) product over the Seyhan plain. We calculated daily rainfall values from the RN1 and CRR hourly precipitation products. We used the data for rainy days at four stations located within range of the radar from October 2013 to November 2015. We examined the two rainfall data sets over the Seyhan plain and found that the correlation between the rain-gauge data and the two raster rainfall data sets was low.
Keywords: meteosat, radar, rainfall, rain-gauge, Turkey
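The two computations mentioned in this abstract, aggregating hourly precipitation to daily totals and correlating a raster product with rain-gauge values, can be sketched in Python. The abstract does not state which correlation measure was used; Pearson's coefficient is assumed here, and the function names and sample values are illustrative only, not data from the study.

```python
from math import sqrt


def daily_totals(hourly_mm):
    """Sum consecutive blocks of 24 hourly values into daily totals."""
    if len(hourly_mm) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly values")
    return [sum(hourly_mm[i:i + 24]) for i in range(0, len(hourly_mm), 24)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up example: two days of constant hourly rain, then a toy comparison.
two_days = daily_totals([1.0] * 48)
r_perfect = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A coefficient near 1 would indicate that the raster product tracks the gauge closely; the abstract reports that the observed correlation was low.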
Procedia PDF Downloads 328
24429 Data-Driven Dynamic Overbooking Model for Tour Operators
Authors: Kannapha Amaruchkul
Abstract:
We formulate a dynamic overbooking model for a tour operator, in which most reservations contain at least two people. The cancellation rate and the timing of the cancellation may depend on the group size. We propose two overbooking policies, namely economic- and service-based. In an economic-based policy, we want to minimize the expected oversold and underused cost, whereas, in a service-based policy, we ensure that the probability of an oversold situation does not exceed the pre-specified threshold. To illustrate the applicability of our approach, we used tour package data from 2016-2018 from a tour operator in Thailand to build a data-driven robust optimization model, and we tested the proposed overbooking policy in 2019. We also compare the data-driven approach to the conventional approach of fitting data into a probability distribution.
Keywords: applied stochastic model, data-driven robust optimization, overbooking, revenue management, tour operator
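The service-based policy in this abstract can be illustrated with a deliberately simplified sketch. It follows the conventional probability-distribution approach that the abstract mentions as a baseline, not the paper's data-driven robust model: each booking is assumed to be a single traveller who shows up independently with probability `show_prob`, so group sizes and cancellation timing are ignored. All names and numbers are hypothetical.

```python
from math import comb


def oversold_probability(bookings, capacity, show_prob):
    """P(number of show-ups > capacity) under a Binomial(bookings, show_prob) model."""
    return sum(
        comb(bookings, k) * show_prob**k * (1 - show_prob) ** (bookings - k)
        for k in range(capacity + 1, bookings + 1)
    )


def service_based_limit(capacity, show_prob, threshold):
    """Largest booking limit whose oversold probability stays within the threshold."""
    if not 0 < show_prob <= 1:
        raise ValueError("show_prob must be in (0, 1]")
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    limit = capacity
    # The oversold probability grows with the number of bookings, so stop at
    # the first limit that would violate the threshold.
    while oversold_probability(limit + 1, capacity, show_prob) <= threshold:
        limit += 1
    return limit


toy_limit = service_based_limit(capacity=2, show_prob=0.5, threshold=0.2)
```

In the toy case, accepting a third booking keeps the oversold probability at 0.125, while a fourth would raise it to 0.3125, so the limit is three bookings for two seats.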
Procedia PDF Downloads 134
24428 Modeling and Statistical Analysis of a Soap Production Mix in Bejoy Manufacturing Industry, Anambra State, Nigeria
Authors: Okolie Chukwulozie Paul, Iwenofu Chinwe Onyedika, Sinebe Jude Ebieladoh, M. C. Nwosu
Abstract:
The research work is based on the statistical analysis of the processing data. The essence is to analyze the data statistically and to generate a design model for the production mix of soap manufacturing products at the Bejoy manufacturing company, Nkpologwu, Aguata Local Government Area, Anambra State, Nigeria. The analysis presents the statistics and the correlation of the data. The T-test, partial correlation and bivariate correlation were used to understand what the data portray. The design model developed was used to model the production yield, and the correlation of the variables shows that R² is 98.7%. The results confirm that the data are fit for further analysis and modeling, as shown by the correlation and the R-squared.
Keywords: General Linear Model, correlation, variables, pearson, significance, T-test, soap, production mix and statistic
Procedia PDF Downloads 445
24427 The Importance of Imaging and Functional Tests for Early Detection of Occupational Diseases in Kosovo's Miners
Authors: Krenare Shabani, Kreshnike Dedushi Hoti, Serbeze Kabashi, Jeton Shatri, Arben Rroji, Mrikë Bunjaku, Leotrim Berisha, Jona Kosova, Edmond Puca, Bleriana Shabani
Abstract:
Introduction: Workers in Kosovo's mining industry are subjected to hazardous working conditions and airborne particles, such as silica dust, which can cause silicosis and other severe respiratory illnesses. The purpose of this research is to assess the health impacts of such exposures, as well as the importance of imaging and functional testing in detecting pathological changes early on. Methodology: The study is prospective and cross-sectional and was carried out during the year 2024. 626 people (446 miners and 180 non-miners) were enrolled in the study. Subjects underwent spirometry and chest radiography. Data were analysed with SPSS 24. Results: The average age of the participants is 48 years. Demographics and Smoking: Smoking was common among young miners. Radiological Changes: Radiographic abnormalities in the lungs were seen in 23.1% of miners and 10.6% of non-miners, including small irregular opacities and emphysematous changes. Lung Function: The FEV1/FVC ratio decreased with increased exposure time, indicating a decline in pulmonary function. Impact of Exposure Duration: Longer exposure duration was associated with a higher number of miners experiencing coughs and requiring medical consultations such as CT scans and biopsies. Conclusions: Medical imaging and functional testing are critical for early diagnosis of lung abnormalities in miners. Findings demonstrate a strong correlation between extended exposure to mine dust and the development of respiratory disorders, emphasising the importance of preventative measures and routine health monitoring.
Keywords: silicosis, miners, imaging, spirometry
Procedia PDF Downloads 28
24426 Examination of Occupational Health and Safety Practices in Ghana
Authors: Zakari Mustapha, Clinto Aigbavboa, Wellinton Didi Thwala
Abstract:
Occupational Health and Safety (OHS) issues have been a major challenge to the Ghanaian government. The purpose of the study was to examine OHS practices in Ghana. The study looked at the views of different scholars on OHS practices in order to achieve its objective. A literature review was conducted on OHS in Ghana. Findings from the study show that the Ministry of Roads and Transport (MRT) and the Ministry of Water Resources, Works and Housing (MWRWH) are the two government ministries in charge of construction and of implementing the construction sector policy. The Factories, Offices and Shops Act 1970, Act 328 and the Mining Regulations 1970 LI 665 are the two major edicts. The study presents a strong background on OHS practices in Ghana and contributes to the body of knowledge on solutions to the current trends and challenges of OHS in the construction sector.
Keywords: ILO convention, OHS challenges, OHS practices, OHS improvement
Procedia PDF Downloads 367
24425 Optimization of Real Time Measured Data Transmission, Given the Amount of Data Transmitted
Authors: Michal Kopcek, Tomas Skulavik, Michal Kebisek, Gabriela Krizanova
Abstract:
The operation of nuclear power plants involves continuous monitoring of the environment in their area. This monitoring is performed using a complex data acquisition system, which collects status information about the system itself and the values of many important physical variables, e.g., temperature, humidity and dose rate. This paper describes a proposal for, and the optimization of, the communication that takes place in a teledosimetric system between the central control server responsible for data processing and storage and the decentralized measuring stations, which measure the physical variables. Analyses of the ongoing communication were performed, and the system architecture and communication were optimized accordingly.
Keywords: communication protocol, transmission optimization, data acquisition, system architecture
Procedia PDF Downloads 519
24424 Industrial Kaolinite Resource Deposits Study in Grahamstown Area, Eastern Cape, South Africa
Authors: Adeola Ibukunoluwa Samuel, Afsoon Kazerouni
Abstract:
The industrial mineral kaolin has many favourable properties such as colour, shape, softness, non-abrasiveness, natural whiteness, as well as chemical stability. It occurs extensively north of Bedford Road, Grahamstown, South Africa. The relationship between its physical and chemical properties has led to its application in the production of certain industrial products used by the public; this includes the prospect of producing paper, ceramics, rubber, paint, and plastics. Despite its interesting economic potential, the kaolinite clay mineral remains undermined, and this is threatening its sustainability in the mineral industry. This research study focuses on a detailed evaluation of the kaolinite mineral and possible ways to increase its lifespan in the industry. The methods employed for this study include petrographic microscopy analysis, X-ray powder diffraction analysis (XRD), and a proper field reconnaissance survey. Results emanating from this research include updated geological information on Grahamstown. Also, mineral transformation phases such as quartz, kaolinite, calcite and muscovite were identified in the clay samples. Petrographic analysis of the samples showed that the study area has been subjected to intense tectonic deformation and cement replacement. Also, different dissolution patterns were identified in the Grahamstown kaolinitic clay deposits. Hence, on the basis of the analytical studies and data interpretation, possible measures are suggested, such as land reclamation and the establishment of a processing refinery near the mining plants, which will, in turn, provide employment for the locals. In addition, future sustainable industrial applications of the clay minerals seem to be possible if additives and cellulosic wastes are used to alter the clay mineral.
Keywords: kaolinite, industrial use, sustainability, Grahamstown, clay minerals
Procedia PDF Downloads 188
24423 Shear Strength Characterization of Coal Mine Spoil in Very-High Dumps with Large Scale Direct Shear Testing
Authors: Leonie Bradfield, Stephen Fityus, John Simmons
Abstract:
The shearing behavior of current and planned coal mine spoil dumps up to 400m in height is studied using large-sample-high-stress direct shear tests performed on a range of spoils common to the coalfields of Eastern Australia. The motivation for the study is to address industry concerns that some constructed spoil dump heights ( > 350m) are exceeding the scale ( ≤ 120m) for which reliable design information exists, and because modern geotechnical laboratories are not equipped to test representative spoil specimens at field-scale stresses. For more than two decades, shear strength estimation for spoil dumps has been based on either infrequent, very small-scale tests where oversize particles are scalped to comply with device specimen size capacity such that the influence of prototype-sized particles on shear strength is not captured; or on published guidelines that provide linear shear strength envelopes derived from small-scale test data and verified in practice by slope performance of dumps up to 120m in height. To date, these published guidelines appear to have been reliable. However, in the field of rockfill dam design there is a broad acceptance of a curvilinear shear strength envelope, and if this is applicable to coal mine spoils, then these industry-accepted guidelines may overestimate the strength and stability of dumps at higher stress levels. The pressing need to rationally define the shearing behavior of more representative spoil specimens at field-scale stresses led to the successful design, construction and operation of a large direct shear machine (LDSM) and its subsequent application to provide reliable design information for current and planned very-high dumps. The LDSM can test at a much larger scale, in terms of combined specimen size (720mm x 720mm x 600mm) and stress (σn up to 4.6MPa), than has ever previously been achieved using a direct shear machine for geotechnical testing of rockfill. 
The results of an extensive LDSM testing program on a wide range of coal-mine spoils are compared to a published framework that is widely accepted by the Australian coal mining industry as the standard for shear strength characterization of mine spoil. A critical outcome is that the LDSM data highlight several non-compliant spoils, and stress-dependent shearing behavior, for which the correct application of the published framework will not provide reliable shear strength parameters for design. Shear strength envelopes developed from the LDSM data are also compared with dam engineering knowledge, where failure envelopes of rockfills are curved in a concave-down manner. The LDSM data indicate that shear strength envelopes for coal-mine spoils abundant with rock fragments are not in fact curved and that the shape of the failure envelope is ultimately determined by the strength of rock fragments. Curvilinear failure envelopes were found to be appropriate for soil-like spoils containing minor or no rock fragments, or hard-soil aggregates.
Keywords: coal mine, direct shear test, high dump, large scale, mine spoil, shear strength, spoil dump
Procedia PDF Downloads 161
24422 The Duty of Application and Connection Providers Regarding the Supply of Internet Protocol by Court Order in Brazil to Determine Authorship of Acts Practiced on the Internet
Authors: João Pedro Albino, Ana Cláudia Pires Ferreira de Lima
Abstract:
Humanity has undergone a transformation from the physical to the virtual world, generating an enormous amount of data on the world wide web, known as big data. Many facts that occur in the physical world or in the digital world are proven through records made on the internet, such as digital photographs, posts on social media, contract acceptances by digital platforms, email, banking, and messaging applications, among others. These data recorded on the internet have been used as evidence in judicial proceedings. The identification of internet users is essential for the security of legal relationships. This research was carried out on scientific articles and materials from courses and lectures, with an analysis of Brazilian legislation and some judicial decisions on the request of static data from logs and Internet Protocols (IPs) from application and connection providers. In this article, we will address the determination of authorship of data processing on the internet by obtaining the IP address and the appropriate judicial procedure for this purpose under Brazilian law.
Keywords: IP address, digital forensics, big data, data analytics, information and communication technology
Procedia PDF Downloads 124
24421 The Comparison of Safety Factor in Dry and Rainy Condition at Coal Bearing Formation. Case Study: Lahat Area South Sumatera Province, Indonesia
Authors: Teguh Nurhidayat, Nurhamid, Dicky Muslim, Zufialdi Zakaria, Irvan Sophian
Abstract:
This paper presents the role of climate change as a factor that induces landslides. The case study is located in Lahat Regency, South Sumatera Province, Indonesia. The study area has coal reserves of high economic value (mostly subbituminous – bituminous), which could be developed for open pit coal mining in the future. The seams are found in the Muara Enim Formation. This formation lies in the South Sumatera basin, a back-arc basin formed in the Tertiary as a result of the collision between the Indian plate and the Eurasian plate. This study aims to unravel the relationship between slope stability and different seasonal conditions in a tropical climate. Undisturbed soil samples were obtained in the field along with other geological data. Laboratory work was carried out to obtain the physical and mechanical properties of the soils. The Bishop method was used to analyze slope stability and to identify the safety factor of the slope. The results show that slopes in rainy season conditions are more prone to landslides than in the dry season. In the dry season, with a moisture content of 22.65%, the safety factor is 1.28 and the slope is stable. When rain approaches and the moisture content increases to 97.8%, the slope begins to be critical. In wet conditions, the groundwater level rises, followed by changes in γ (unit weight), c (cohesion), and φ (angle of friction) to 18.04, 5.88 kN/m², and 28.04°, respectively, which ultimately brings the safety factor (FS) to 1.01 (slope in unstable condition).
Keywords: rainfall, moisture content, slope analysis, landslide prone
Procedia PDF Downloads 313
24420 Sourcing and Compiling a Maltese Traffic Dataset MalTra
Authors: Gabriele Borg, Alexei De Bono, Charlie Abela
Abstract:
There is a constant rise in the availability of high volumes of data gathered from multiple sources, resulting in an abundance of unprocessed information that can be used to monitor patterns and trends in user behaviour. Similarly, year after year, Malta is experiencing ongoing population growth and an increase in mobilization demand. This research takes advantage of data that is continuously being sourced and converts it into useful information related to the traffic problem on Maltese roads. The scope of this paper is to provide a methodology for creating a custom dataset (MalTra - Malta Traffic), compiled from multiple participants at various locations across the island, in order to identify the most common routes taken and expose the main areas of activity. This use of big data is seen in various technologies referred to as ITSs (Intelligent Transportation Systems), and it has been concluded that there is significant potential in utilising such sources of data on a nationwide scale.
Keywords: Big Data, vehicular traffic, traffic management, mobile data patterns
Procedia PDF Downloads 109
24419 Comparative Study of Accuracy of Land Cover/Land Use Mapping Using Medium Resolution Satellite Imagery: A Case Study
Authors: M. C. Paliwal, A. K. Jain, S. K. Katiyar
Abstract:
Classification of satellite imagery is very important for the assessment of its accuracy. In order to determine the accuracy of the classified image, the assumed-true data are usually derived from ground truth data using the Global Positioning System. The data collected from satellite imagery and the ground truth data are then compared to determine the accuracy of the data, and error matrices are prepared. Overall and individual accuracies are calculated using different methods. The study illustrates advanced classification and accuracy assessment of land use/land cover mapping using satellite imagery. IRS-1C-LISS IV data were used for classification of satellite imagery. The satellite image was classified using the software into fourteen classes, including water bodies, agricultural fields, forest land, urban settlement, barren land and unclassified area. Classification of satellite imagery and calculation of accuracy were done using ERDAS-Imagine software to find the best method. This study is based on the data collected for the Bhopal city boundaries of Madhya Pradesh State of India.
Keywords: resolution, accuracy assessment, land use mapping, satellite imagery, ground truth data, error matrices
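The overall and individual accuracies mentioned in this abstract are conventionally read off an error (confusion) matrix. A minimal Python sketch of those standard definitions follows, assuming rows are classified labels and columns are ground-truth labels; the two-class matrix is made-up illustrative data, not a result from the study, and the function names are hypothetical.

```python
def overall_accuracy(matrix):
    """Correctly classified samples (diagonal) divided by all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def users_accuracy(matrix):
    """Per class: diagonal divided by the row total (classified as that class)."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]


def producers_accuracy(matrix):
    """Per class: diagonal divided by the column total (truly that class)."""
    size = len(matrix)
    column_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    return [matrix[i][i] / column_totals[i] for i in range(size)]


# Illustrative matrix: rows = classified (water, forest), columns = ground truth.
error_matrix = [
    [45, 5],
    [10, 40],
]
overall = overall_accuracy(error_matrix)
users = users_accuracy(error_matrix)
producers = producers_accuracy(error_matrix)
```

User's accuracy reflects how reliable a mapped class is for someone reading the map, while producer's accuracy reflects how well the ground-truth class was captured, so the two can differ for the same class.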
Procedia PDF Downloads 508
24418 Glasshouse Experiment to Improve Phytomanagement Solutions for Cu-Polluted Mine Soils
Authors: Marc Romero-Estonllo, Judith Ramos-Castro, Yaiza San Miguel, Beatriz Rodríguez-Garrido, Carmela Monterroso
Abstract:
Mining activity is among the main sources of trace and heavy metal(loid) pollution worldwide, which is a hazard to human and environmental health. For this reason, several projects have emerged for the remediation of such polluted places. Phytomanagement strategies perform well and bring substantial side benefits. In this work, a glasshouse assay was set up with trace-element-polluted soils from an old Cu mine (NW Spain) that forms part of the PhytoSUDOE network of phytomanaged contaminated field sites (PhytoSUDOE Project (SOE1/P5/E0189)). The objective was to evaluate the improvements induced by the following phytoremediation-related treatments: three increasingly complex amendments, alone or together with plant growth (Populus nigra L. alone and together with Trifolium repens L.), and three different rhizosphere bioinocula (Plant Growth Promoting Bacteria (PGP), mycorrhiza (MYC), or mixed (PGP+MYC)). After 110 days of growth, plants were collected, biomass was weighed, and tree length was measured. Physical-chemical analyses were carried out to determine pH, effective Cation Exchange Capacity, carbon and nitrogen contents, bioavailable phosphorus (Olsen bicarbonate method), pseudo-total element content (microwave acid-digested fraction), EDTA-extractable metals (complexed fraction), and NH4NO3-extractable metals (easily bioavailable fraction). In plant material, nitrogen content and acid-digestion elements were determined. Amendment usage, plant growth, and bioinoculation were shown to improve soil fertility and/or plant health within the time span of this study. In particular, pH increased from 3 (highly acidic) to 5 (acidic) in the worst-case scenario, even reaching 7 (neutrality) in the best plots. Increases in organic matter and pH were related to decreases in the bioavailability of polluting metals. Plants grew better with both the most complex amendment and the intermediate one, with few differences due to bioinoculation. With the least complex amendment (compost only), the beneficial effects of the bioinoculants were more observable, although plants did not thrive. On unamended soils, plants neither sprouted nor bloomed. The scheme assayed in this study is suitable for the phytomanagement of soils of this kind affected by mining activity. These findings should now be tested on a larger scale.
Keywords: aided phytoremediation, mine pollution, phytostabilization, soil pollution, trace elements
Procedia PDF Downloads 66
24417 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence
Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno
Abstract:
Missing data is a common challenge in statistical analyses of most clinical survey datasets. A variety of methods have been developed to enable the analysis of survey data with missing values. Imputation is the most commonly used of these methods. However, in order to minimize the bias introduced by imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we have identified different types of missing values: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD), and applied rough set imputation to only the GMD portion of the missing data. We have used rough set imputation to evaluate the effect of such imputation on prediction by generating several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we have used p-values from the Wald test. To evaluate the accuracy of the prediction, we have considered the width of the 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed than for the non-imputed datasets (28.7 vs. 23.4). The average confidence interval width decreased by 10.4% when the imputed dataset was used, meaning higher precision. The results show that using the rough set method for missing data imputation on GMD data improves the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.
Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index
Procedia PDF Downloads 168
24416 Database Management System for Orphanages to Help Track of Orphans
Authors: Srivatsav Sanjay Sridhar, Asvitha Raja, Prathit Kalra, Soni Gupta
Abstract:
A database management system keeps track of details about the people in an organisation. Few orphanages these days have shifted to a computer- and program-based system; most still rely on pen-and-paper records, which not only consume space but are also not eco-friendly. Viewing one person's record is a hassle, since it means searching through multiple records, which takes time. This program organises all the data and can retrieve any information about anyone whose data has been entered. It is also a safer form of storage, as physical records degrade over time or, worse, are destroyed by natural disasters. In this developing world, it is only sensible to shift all data to an electronic storage system. The program provides the core features of creating, inserting, searching, and deleting data, as well as printing it.
Keywords: database, orphans, programming, C++
Procedia PDF Downloads 157
24415 Computerized Scoring System: A Stethoscope to Understand Consumer's Emotion through His or Her Feedback
Authors: Chen Yang, Jun Hu, Ping Li, Lili Xue
Abstract:
Most companies pay careful attention to collecting consumer feedback, so a ‘feedback’ button is common in all kinds of mobile apps. Yet it is much more challenging to analyze these feedback texts and to capture the true feelings of the consumer who submits the feedback, whether it concerns a problem or a compliment. For Chinese content in particular, the same feedback may be positive in one context and negative in another. For example, in Chinese, the feedback 'operating with loudness' can apply to both a refrigerator and a stereo system. This feedback is negative towards a refrigerator; however, it is positive towards a stereo system. By introducing Bradley, M. and Lang, P.'s Affective Norms for English Text (ANET) theory and Bucci W.’s Referential Activity (RA) theory, we, usability researchers at Pingan, are able to decipher the feedback and find the hidden feelings behind the content. We take two of the three ANET dimensions (‘valence’ and ‘dominance’) and two of the four RA dimensions (‘concreteness’ and ‘specificity’) to build our own rating system with a scale of 1 to 5 points. This rating system enables us to judge the feeling/emotion behind each piece of feedback, and it works well with both a single word or phrase and a whole paragraph. The rating reflects the strength of the consumer's feeling/emotion when he/she is typing the feedback. In our daily work, we first require a consumer to answer the net promoter score (NPS) question before writing the feedback, so that we can determine whether the feedback is positive or negative. Secondly, we code the feedback content according to the company's problem list, which contains 200 problem items. In this way, we can count how many pieces of feedback left by consumers belong to each typical problem. Thirdly, we rate each piece of feedback with the rating system described above to capture the strength of the feeling/emotion when the consumer writes it. In this way, we obtain two kinds of data: 1) the portion, i.e., how many pieces of feedback are ascribed to one problem item, and 2) the severity, i.e., how strong the negative feeling/emotion is when the consumer writes the feedback. By crossing the two, with portion on the X-axis and severity on the Y-axis, we can find which typical problems score high in both portion and severity. The higher a problem's score, the more urgently it should be solved, as it means more people express stronger negative feelings about this problem in their feedback. Moreover, by using a hidden Markov model to program our rating system, we are able to computerize the scoring system and process thousands of pieces of feedback in a short period of time, which is efficient and accurate enough for industrial purposes.
Keywords: computerized scoring system, feeling/emotion of consumer feedback, referential activity, text mining
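The crossing of portion and severity described in this abstract can be sketched as follows. This is an illustrative sketch only, not the authors' system. The function name, the sample problem items, and the choice to rank by the product of portion and mean severity are assumptions; the abstract does not state how the two axes are combined into one score.

```python
from collections import defaultdict


def rank_problems(feedback):
    """Rank problem items by portion and severity.

    feedback: list of (problem_item, severity_rating) pairs, where the
    rating is on the 1-to-5 scale mentioned in the abstract.
    Returns a list of (item, portion, mean_severity), most urgent first.
    """
    ratings = defaultdict(list)
    for item, rating in feedback:
        ratings[item].append(rating)

    total = len(feedback)
    rows = []
    for item, values in ratings.items():
        portion = len(values) / total          # X-axis: share of all feedback
        severity = sum(values) / len(values)   # Y-axis: mean strength of feeling
        rows.append((item, portion, severity))

    # Assumption: urgency is approximated by portion multiplied by severity.
    rows.sort(key=lambda row: row[1] * row[2], reverse=True)
    return rows


# Hypothetical coded feedback; item names and ratings are invented for illustration.
sample_feedback = [
    ("login", 5), ("login", 4), ("payment", 2),
    ("login", 3), ("payment", 3),
]
ranked = rank_problems(sample_feedback)
for item, portion, severity in ranked:
    print(item, portion, severity)
```

Keeping portion and severity as separate values, rather than returning only a combined score, preserves the two-axis view that the abstract uses to locate problems that are both frequent and strongly felt.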
Procedia PDF Downloads 176
24414 New Two-Way Map-Reduce Join Algorithm: Hash Semi Join
Authors: Marwa Hussein Mohamed, Mohamed Helmy Khafagy, Samah Ahmed Senbel
Abstract:
MapReduce is a programming model used to handle and support massive data sets. The rapid increase in data size makes the analysis of big data one of the most important issues today. MapReduce is used to analyze data and extract more helpful information using two simple functions, map and reduce, which are the only ones written by the programmer; the framework provides load balancing, fault tolerance, and high scalability. Join is the most important operation in data analysis, but MapReduce does not directly support it. This paper explains two two-way map-reduce join algorithms, semi-join and per-split semi-join, and proposes a new algorithm, hash semi-join, which uses a hash table to increase performance by eliminating unused records as early as possible and applies the join using the hash table, rather than using the map function to match the join key with the other data table in the second phase. Using hash tables does not affect memory size, because only the matched records from the second table are saved. Our experimental results show that the hash semi-join algorithm with a hash table has higher performance than the two other algorithms as the data size increases from 10 million to 500 million records, and that running time increases with the size of the joined records between the two tables.
Keywords: map reduce, hadoop, semi join, two way join
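The idea of filtering unused records early and joining through a hash table can be sketched in a single process. This is not the authors' MapReduce implementation and does not run on Hadoop. The function name, the three-phase split, and the sample tables are assumptions made to illustrate the general semi-join technique.

```python
from collections import defaultdict


def hash_semi_join(left, right, key):
    """Join two lists of dict records on `key`, keeping only matching rows."""
    # Phase 1: collect the distinct join keys that occur in the left table.
    left_keys = {row[key] for row in left}

    # Phase 2: scan the right table once and keep only rows whose key
    # appears on the left, so unused records are discarded early.
    matched = defaultdict(list)
    for row in right:
        if row[key] in left_keys:
            matched[row[key]].append(row)

    # Phase 3: probe the hash table of matched rows to build the joined output.
    joined = []
    for left_row in left:
        for right_row in matched.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined


# Hypothetical tables; field names and values are invented for illustration.
orders = [
    {"uid": 1, "item": "book"},
    {"uid": 2, "item": "pen"},
    {"uid": 1, "item": "lamp"},
]
users = [
    {"uid": 1, "name": "Ann"},
    {"uid": 3, "name": "Bob"},
]
result = hash_semi_join(orders, users, "uid")
print(result)
```

Only the right-table rows that survive the key filter are held in the hash table, which is the property the abstract relies on when it says memory size is not affected.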
Procedia PDF Downloads 513
24413 Enabling Quantitative Urban Sustainability Assessment with Big Data
Authors: Changfeng Fu
Abstract:
Sustainable urban development has been widely accepted as common sense in modern urban planning and design. However, the measurement and assessment of urban sustainability, especially quantitative assessment, has always been an issue troubling planning and design professionals. This paper presents ongoing research on principles and techniques for quantitative urban sustainability assessment, which aim to integrate indicators, geospatial and geo-referenced data, and assessment techniques into one mechanism. It is based on the principles and techniques of geospatial analysis with GIS and on statistical analysis methods. Decision-making methods such as AHP and SMART are also adopted to reach overall assessment conclusions. The possible interfaces and the presentation of data and quantitative assessment results are also described. This research is based on the knowledge, situations, and data sources of the UK, but it is potentially adaptable to other countries or regions. The implementation potential of the mechanism is also discussed.
Keywords: urban sustainability assessment, quantitative analysis, sustainability indicator, geospatial data, big data
Procedia PDF Downloads 359
24412 Development of Generalized Correlation for Liquid Thermal Conductivity of N-Alkane and Olefin
Authors: A. Ishag Mohamed, A. A. Rabah
Abstract:
The objective of this research is to develop a generalized correlation for the prediction of the thermal conductivity of n-alkanes and alkenes. Little research, and few correlations, on the thermal conductivity of liquids are available in the open literature. The available experimental data covering the n-alkane and alkene groups were collected. The data were assumed to correlate with temperature according to the Filippov correlation. Nonparametric regression with the Grace Algorithm was used to develop the generalized correlation model. A spreadsheet program based on Microsoft Excel was used to plot the data and calculate the values of the coefficients. The results obtained were compared with the data found in Perry's Chemical Engineering Hand Book. The experimental data were correlated with temperature over the range 273.15 to 673.15 K, with R² = 0.99. The developed correlation reproduced experimental data that were not included in the regression with an absolute average percent deviation (AAPD) of less than 7%. Thus, the spreadsheet was quite accurate and produces reliable data.
Keywords: N-Alkanes, N-Alkenes, nonparametric, regression
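The absolute average percent deviation (AAPD) quoted in this abstract can be computed as in the sketch below. It assumes the common definition, the mean of |predicted - experimental| / |experimental| expressed in percent; the abstract does not give the formula. The function name and the sample conductivity values are hypothetical and are not data from the study.

```python
def aapd(experimental, predicted):
    """Absolute average percent deviation between two equal-length sequences."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    deviations = [abs(p - e) / abs(e) for e, p in zip(experimental, predicted)]
    return 100.0 * sum(deviations) / len(deviations)


# Hypothetical thermal conductivities in W/(m K); invented for illustration.
measured = [0.120, 0.100, 0.080]
estimated = [0.126, 0.095, 0.080]
deviation_percent = aapd(measured, estimated)
print(round(deviation_percent, 2))
```

Evaluating this measure on data held out of the regression, as the abstract describes, indicates how the correlation generalizes rather than how well it fits the points used to build it.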
Procedia PDF Downloads 654