Search results for: environmental data
8173 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data
Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon
Abstract:
Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Keywords: Ant colony system, biological data, clustering, DNA chip.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19748172 The Resource Description Framework (RDF) as a Modern Structure for Medical Data
Authors: Gabriela Lindemann, Danilo Schmidt, Thomas Schrader, Dietmar Keune
Abstract:
The amount and heterogeneity of data in biomedical research, notably in interdisciplinary fields, requires new methods for the collection, presentation and analysis of information. Important data from laboratory experiments as well as patient trials are available but come out of distributed resources. The Charité - University Hospital Berlin has established together with the German Research Foundation (DFG) a new information service centre for kidney diseases and transplantation (Open European Nephrology Science Centre - OpEN.SC). Beside a collaborative aspect to create new research groups every single partner or institution of this science information centre making his own data available is allowed to search the whole data pool of the various involved centres. A core task is the implementation of a non-restricting open data structure for the various different data sources. We decided to use a modern RDF model and in a first phase transformed original data coming from the web-based Electronic Patient Record database TBase©.
Keywords: Medical databases, Resource Description Framework (RDF), metadata repository.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20328171 XML Data Management in Compressed Relational Database
Authors: Hongzhi Wang, Jianzhong Li, Hong Gao
Abstract:
XML is an important standard of data exchange and representation. As a mature database system, using relational database to support XML data may bring some advantages. But storing XML in relational database has obvious redundancy that wastes disk space, bandwidth and disk I/O when querying XML data. For the efficiency of storage and query XML, it is necessary to use compressed XML data in relational database. In this paper, a compressed relational database technology supporting XML data is presented. Original relational storage structure is adaptive to XPath query process. The compression method keeps this feature. Besides traditional relational database techniques, additional query process technologies on compressed relations and for special structure for XML are presented. In this paper, technologies for XQuery process in compressed relational database are presented..Keywords: XML, compression, query processing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18068170 Environmental and Toxicological Impacts of Glyphosate with Its Formulating Adjuvant
Authors: I. Székács, Á. Fejes, S. Klátyik, E. Takács, D. Patkó, J. Pomóthy, M. Mörtl, R. Horváth, E. Madarász, B. Darvas, A. Székács
Abstract:
Environmental and toxicological characteristics of formulated pesticides may substantially differ from those of their active ingredients or other components alone. This phenomenon is demonstrated in the case of the herbicide active ingredient glyphosate. Due to its extensive application, this active ingredient was found in surface and ground water samples collected in Békés County, Hungary, in the concentration range of 0.54–0.98 ng/ml. The occurrence of glyphosate appeared to be somewhat higher at areas under intensive agriculture, industrial activities and public road services, but the compound was detected at areas under organic (ecological) farming or natural grasslands, indicating environmental mobility. Increased toxicity of the formulated herbicide product Roundup compared to that of glyphosate was observed on the indicator aquatic organism Daphnia magna Straus. Acute LC50 values of Roundup and its formulating adjuvant polyethoxylated tallowamine (POEA) exceeded 20 and 3.1 mg/ml, respectively, while that of glyphosate (as isopropyl salt) was found to be substantially lower (690-900 mg/ml) showing good agreement with literature data. Cytotoxicity of Roundup, POEA and glyphosate has been determined on the neuroectodermal cell line, NE-4C measured both by cell viability test and holographic microscopy. Acute toxicity (LC50) of Roundup, POEA and glyphosate on NE-4C cells was found to be 0.013±0.002%, 0.017±0.009% and 6.46±2.25%, respectively (in equivalents of diluted Roundup solution), corresponding to 0.022±0.003 and 53.1±18.5 mg/ml for POEA and glyphosate, respectively, indicating no statistical difference between Roundup and POEA and 2.5 orders of magnitude difference between these and glyphosate. The same order of cellular toxicity seen in average cell area has been indicated under quantitative cell visualization. The results indicate that toxicity of the formulated herbicide is caused by the formulating agent, but in some parameters toxicological synergy occurs between POEA and glyphosate.
Keywords: Glyphosate, polyethoxylated tallowamine, Roundup, combined aquatic and cellular toxicity, synergy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 63688169 Mapping Soil Fertility at Different Scales to Support Sustainable Brazilian Agriculture
Authors: Rachel Bardy Prado, Vinícius de Melo Benites, José Carlos Polidoro, Carlos Eduardo Gonçalves, Alexey Naumov
Abstract:
Most agricultural crops cultivated in Brazil are highly nutrient demanding. Brazilian soils are generally acidic with low base saturation and available nutrients. Demand for fertilizer application has increased because the national agricultural sector expansion. To improve productivity without environmental impact, there is the need for the utilization of novel procedures and techniques to optimize fertilizer application. This includes the digital soil mapping and GIS application applied to mapping in different scales. This paper is based on research, realized during 2005 to 2010 by Brazilian Corporation for Agricultural Research (EMBRAPA) and its partners. The purpose was to map soil fertility in national and regional scales. A soil profile data set in national scale (1:5,000,000) was constructed from the soil archives of Embrapa Soils, Rio de Janeiro and in the regional scale (1:250,000) from COMIGO Cooperative soil data set, Rio Verde, Brazil. The mapping was doing using ArcGIS 9.1 tools from ESRI.Keywords: agricultural sustainability, fertilizer optimization, GIS, soil attributes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26188168 A System for Analyzing and Eliciting Public Grievances Using Cache Enabled Big Data
Authors: P. Kaladevi, N. Giridharan
Abstract:
The system for analyzing and eliciting public grievances serves its main purpose to receive and process all sorts of complaints from the public and respond to users. Due to the more number of complaint data becomes big data which is difficult to store and process. The proposed system uses HDFS to store the big data and uses MapReduce to process the big data. The concept of cache was applied in the system to provide immediate response and timely action using big data analytics. Cache enabled big data increases the response time of the system. The unstructured data provided by the users are efficiently handled through map reduce algorithm. The processing of complaints takes place in the order of the hierarchy of the authority. The drawbacks of the traditional database system used in the existing system are set forth by our system by using Cache enabled Hadoop Distributed File System. MapReduce framework codes have the possible to leak the sensitive data through computation process. We propose a system that add noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If the complaints are not processed in the ample time, then automatically it is forwarded to the higher authority. Hence it ensures assurance in processing. A copy of the filed complaint is sent as a digitally signed PDF document to the user mail id which serves as a proof. The system report serves to be an essential data while making important decisions based on legislation.Keywords: Big Data, Hadoop, HDFS, Caching, MapReduce, web personalization, e-governance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15928167 An Epidemiological Study on an Outbreak of Gastroenteritis Linked to Dinner Served at a Senior High School in Accra
Authors: Benjamin Osei Tutu, Rita Asante, Emefa Atsu
Abstract:
Background: An outbreak of gastroenteritis occurred in December 2019 after students of a Senior High School in Accra were served with kenkey and fish during their dinner. An investigation was conducted to characterize the affected people, the source of contamination, the etiologic food and agent. Methods: An epidemiological study was conducted with cases selected from the student population who were ill. Controls were selected from among students who also ate from the school canteen during dinner but were not ill. Food history of each case and control was taken to assess their exposure status. Epi Info 7 was used to analyze the data obtained from the outbreak. Attack rates and odds ratios were calculated to determine the risk of foodborne infection for each of the foods consumed by the population. The source of contamination of the foods was ascertained by conducting an environmental risk assessment at the school. Results: Data were obtained from 126 students, out of which 57 (45.2%) were cases and 69 (54.8%) were controls. The cases presented with symptoms such as diarrhea (85.96%), abdominal cramps (66.67%), vomiting (50.88%), headache (21.05%), fever (17.86%) and nausea (3.51%). The peak incubation period was 18 hours with a minimum and maximum incubation periods of 6 and 50 hours respectively. From the incubation period, duration of illness and the symptoms, non-typhoidal salmonellosis was suspected. Multivariate analysis indicated that the illness was associated with the consumption of the fried fish served, however this was statistically insignificant (AOR 3.1.00, P = 0.159). No stool, blood or food samples were available for organism isolation and confirmation of suspected etiologic agent. The environmental risk assessment indicated poor hand washing practices on the part of both the food handlers and students. Conclusion: The outbreak could probably be due to the consumption of the fried fish that might have been contaminated with Salmonella sp. as a result of poor hand washing practices in the school.
Keywords: Case control study, food poisoning, handwashing, Salmonella, school.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6708166 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
Authors: S.Aranganayagi, K.Thangavel
Abstract:
K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36908165 A Design Methodology and Tool to Support Ecodesign Implementation in Induction Hobs
Authors: Anna Costanza Russo, Daniele Landi, Michele Germani
Abstract:
Nowadays, the European Ecodesign Directive has emerged as a new approach to integrate environmental concerns into the product design and related processes. Ecodesign aims to minimize environmental impacts throughout the product life cycle, without compromising performances and costs. In addition, the recent Ecodesign Directives require products which are increasingly eco-friendly and eco-efficient, preserving high-performances. It is very important for producers measuring performances, for electric cooking ranges, hobs, ovens, and grills for household use, and a low power consumption of appliances represents a powerful selling point, also in terms of ecodesign requirements. The Ecodesign Directive provides a clear framework about the sustainable design of products and it has been extended in 2009 to all energy-related products, or products with an impact on energy consumption during the use. The European Regulation establishes measures of ecodesign of ovens, hobs, and kitchen hoods, and domestic use and energy efficiency of a product has a significant environmental aspect in the use phase which is the most impactful in the life cycle. It is important that the product parameters and performances are not affected by ecodesign requirements from a user’s point of view, and the benefits of reducing energy consumption in the use phase should offset the possible environmental impact in the production stage. Accurate measurements of cooking appliance performance are essential to help the industry to produce more energy efficient appliances. The development of ecodriven products requires ecoinnovation and ecodesign tools to support the sustainability improvement. The ecodesign tools should be practical and focused on specific ecoobjectives in order to be largely diffused. The main scope of this paper is the development, implementation, and testing of an innovative tool, which could be an improvement for the sustainable design of induction hobs. In particular, a prototypical software tool is developed in order to simulate the energy performances of the induction hobs. The tool is focused on a multiphysics model which is able to simulate the energy performances and the efficiency of induction hobs starting from the design data. The multiphysics model is composed by an electromagnetic simulation and a thermal simulation. The electromagnetic simulation is able to calculate the eddy current induced in the pot, which leads to the Joule heating of material. The thermal simulation is able to measure the energy consumption during the operational phase. The Joule heating caused from the eddy currents is the output of electromagnetic simulation and the input of thermal ones. The aims of the paper are the development of integrated tools and methodologies of virtual prototyping in the context of the ecodesign. This tool could be a revolutionary instrument in the field of industrial engineering and it gives consideration to the environmental aspects of product design and focus on the ecodesign of energy-related products, in order to achieve a reduced environmental impact.
Keywords: Ecodesign, induction hobs, virtual prototyping, energy efficiency.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12728164 Use of Hair as an Indicator of Environmental Lead Pollution: Characteristics and Seasonal Variation of Lead Pollution in Egypt
Authors: A. A. K. Abou-Arab, M. A. Abou Donia, Nevin E. Sharaf, Sherif R. Mohamed, A. K. Enab
Abstract:
Lead being a toxic heavy metal that mankind is exposed to the highest levels of this metal from environmental pollutants. A total of 180 Male scalp hair samples were collected from different environments in Greater Cairo (GC), i.e. industrial, heavy traffic and rural areas (60 samples from each) having different activities during the period of, 1/5/2010 to 1/11/2012. Hair samples were collected during five stages. Data proved that the concentration of lead in male industrial areas of Cairo ranged between 6.2847 to 19.0432 μg/g, with mean value of 12.3288 μg/g. On the other hand, lead content of hair samples of residential-traffic areas ranged between 2.8634 to 16.3311 μg/g with mean value of 9.7552 μg/g. While lead concentration on the hair of the male residents living in rural area ranged between 1.0499-9.0402μg/g with mean value of 4.7327 μg/g. The Pb concentration in scalp hair of Cairo residents of residential-traffic and rural traffic areas was observed to follow the same pattern. The pattern was that of decrease concentration of summer and its increase in winter. Then, there was a marked increase in Pb concentration of summer 2012, and this increase was significant. These were obviously seen for the residential-traffic and rural areas residents. Pb pollution in residents of industrial areas showed the same seasonal pattern, but there was marked to decrease in Pb concentration of summer 2012, and this decrease was significant. Lead pollution in residents of GC was serious. It is worth noting that the atmosphere is still contaminated by lead despite a decade of using unleaded gasoline. Strong seasonal variation in higher Pb concentration on winter than in summer was found. Major contributions to the pollution with Pb could include industry emissions, motor vehicle emissions and long transported dust from outside Cairo. More attention should be paid to the reduction of Pb content of the urban aerosol and to the Pb pollution health.
Keywords: Hair, lead, environmental exposure, seasonal variations, Egypt.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16918163 Mobile Phone as a Tool for Data Collection in Field Research
Authors: Sandro Mourão, Karla Okada
Abstract:
The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.Keywords: Data Gathering, Field Research, Mobile Phone, Survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20598162 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis
Authors: N. R. N. Idris, S. Baharom
Abstract:
A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.
Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17408161 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.
Keywords: Cluster analysis, education, mathematics, profiles.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8938160 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data
Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan
Abstract:
The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23898159 The Two Layers of Food Safety and GMOs in the Hungarian Agricultural Law
Authors: Gergely Horváth
Abstract:
The study presents the complexity of food safety dividing it into two layers. Beyond the basic layer of requirements, there is a more demanding higher level linked with quality and purity aspects. It would be important to give special prominence to both layers, given that massive illnesses are caused by foods even though officially licensed. Then the study discusses an exciting safety challenge stemming from the risks of genetically modified organisms (GMOs). Furthermore, it features legal case examples that illustrate how certain liability questions are solved or not yet decided in connection with the production of genetically modified crops. In addition, a special kind of land grabbing, more precisely land grabbing from non-GMO farming systems can also be noticed as well as a new phenomenon eroding food sovereignty. Coexistence, the state where organic, conventional, and GM farming systems are standing alongside each other is an unsuitable experiment that cannot be successful, because of biophysical reasons (such as cross-pollination). Agricultural and environmental lawyers both try to find the optimal solution. Agri-environmental measures are introduced as a special subfield of law maintaining also food safety. The important steps of agri-environmental legislation are aiming at the protection of natural values, the environmental media and strengthening food safety as well, practically the quality of agricultural products intended for human consumption. The major findings of the study focus on searching for the appropriate approach capable of solving the security and safety problems of food production. The most interesting concepts of the Hungarian national and EU food law legislation are analyzed in more detail with descriptive, analytic and comparative methods.
Keywords: Food law, food safety, food security, GMO, agri-environmental measures.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12218158 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning
Authors: Chunming Xu
Abstract:
Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14478157 “Green Growth” in Kazakhstan: Political Leadership, Business Strategies and Environmental Fiscal Reform for Competitive System Change
Authors: A. S. Salimzhanova, J. C. Sardinas, O. A. Yanovskaya
Abstract:
The objective of this research work is to discuss the concept of “green growth” in the Republic of Kazakhstan introduced by its government in the “National Sustainable Development Strategy” with the objective of transition to a resource-efficient, “green economy.” We believe that emerging economies like Kazakhstan can pursue a cleaner and more efficient development path by introducing an environmental tax system based on resource consumption rather than only income and labor. The key issues discussed in this article are the eco-efficiency, which refers to closing the gap between economic and ecological efficiencies, and the structural change of the economy toward “green growth.” We also strongly believe that studying the experience of East Asian countries on “green reform” including eco-innovation and “green solutions” in business is essential to the case of Kazakhstan. All of these will raise the status of Kazakhstan to the level of one of the thirty developed countries over the next decades.
Keywords: Economic strategy, green growth, green solutions, natural resource management, environmental tax system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21788156 Combining the Deep Neural Network with the K-Means for Traffic Accident Prediction
Authors: Celso L. Fernando, Toshio Yoshii, Takahiro Tsubota
Abstract:
Understanding the causes of a road accident and predicting their occurrence is key to prevent deaths and serious injuries from road accident events. Traditional statistical methods such as the Poisson and the Logistics regressions have been used to find the association of the traffic environmental factors with the accident occurred; recently, an artificial neural network, ANN, a computational technique that learns from historical data to make a more accurate prediction, has emerged. Although the ability to make accurate predictions, the ANN has difficulty dealing with highly unbalanced attribute patterns distribution in the training dataset; in such circumstances, the ANN treats the minority group as noise. However, in the real world data, the minority group is often the group of interest; e.g., in the road traffic accident data, the events of the accident are the group of interest. This study proposes a combination of the k-means with the ANN to improve the predictive ability of the neural network model by alleviating the effect of the unbalanced distribution of the attribute patterns in the training dataset. The results show that the proposed method improves the ability of the neural network to make a prediction on a highly unbalanced distributed attribute patterns dataset; however, on an even distributed attribute patterns dataset, the proposed method performs almost like a standard neural network.
Keywords: Accident risks estimation, artificial neural network, deep learning, K-mean, road safety.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9768155 Determining Cluster Boundaries Using Particle Swarm Optimization
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.
Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17208154 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm
Authors: Ameur Abdelkader, Abed Bouarfa Hafida
Abstract:
Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.
Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10768153 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain
Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami
Abstract:
To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.
Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8198152 A Bibliometric Assessment on Sustainability and Clustering
Authors: Fernanda M. Assef, Maria Teresinha A. Steiner, David Gabriel F. de Barros
Abstract:
Review researches are useful in terms of analysis of research problems. Between the types of review documents, we commonly find bibliometric studies. This type of application often helps the global visualization of a research problem and helps academics worldwide to understand the context of a research area better. In this document, a bibliometric view surrounding clustering techniques and sustainability problems is presented. The authors aimed at which issues mostly use clustering techniques and even which sustainability issue would be more impactful on today’s moment of research. During the bibliometric analysis, we found 10 different groups of research in clustering applications for sustainability issues: Energy; Environmental; Non-urban Planning; Sustainable Development; Sustainable Supply Chain; Transport; Urban Planning; Water; Waste Disposal; and, Others. Moreover, by analyzing the citations of each group, it was discovered that the Environmental group could be classified as the most impactful research cluster in the area mentioned. After the content analysis of each paper classified in the environmental group, it was found that the k-means technique is preferred for solving sustainability problems with clustering methods since it appeared the most amongst the documents. The authors finally conclude that a bibliometric assessment could help indicate a gap of researches on waste disposal – which was the group with the least amount of publications – and the most impactful research on environmental problems.
Keywords: Bibliometric assessment, clustering, sustainability, territorial partitioning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3918151 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach
Authors: Sarisa Pinkham, Kanyarat Bussaban
Abstract:
The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.
Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17588150 Automatic Generation of Ontology from Data Source Directed by Meta Models
Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas
Abstract:
Through this paper we present a method for automatic generation of ontological model from any data source using Model Driven Architecture (MDA), this generation is dedicated to the cooperation of the knowledge engineering and software engineering. Indeed, reverse engineering of a data source generates a software model (schema of data) that will undergo transformations to generate the ontological model. This method uses the meta-models to validate software and ontological models.
Keywords: Meta model, model, ontology, data source.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19988149 Climate Change and Environmental Education: The Application of Concept Map for Representing the Knowledge Complexity of Climate Change
Authors: Hsueh-Chih, Chen, Yau-Ting, Sung, Tsai-Wen, Lin, Hung-Teng, Chou
Abstract:
It has formed an essential issue that Climate Change, composed of highly knowledge complexity, reveals its significant impact on human existence. Therefore, specific national policies, some of which present the educational aspects, have been published for overcoming the imperative problem. Accordingly, the study aims to analyze as well as integrate the relationship between Climate Change and environmental education and apply the perspective of concept map to represent the knowledge contents and structures of Climate Change; by doing so, knowledge contents of Climate Change could be represented in an even more comprehensive way and manipulated as the tool for environmental education. The method adapted for this study is knowledge conversion model compounded of the platform for experts and teachers, who were the participants for this study, to cooperate and combine each participant-s standpoints into a complete knowledge framework that is the foundation for structuring the concept map. The result of this research contains the important concepts, the precise propositions and the entire concept map for representing the robust concepts of Climate Change.
Keywords: Climate Change, knowledge complexity, concept map.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17368148 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia
Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray
Abstract:
The proliferation of health data standards today is somewhat overlapping and conflicting, resulting in market confusion and leading to increasing proprietary interests. The government role and support in standardization for health data are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with the implementation of non-standard systems. The normative literature missed out the exploration of the different steps required to be undertaken by the government towards the development of national health data standards. Based on the lessons learned from a qualitative study investigating the different issues to the adoption of health data standards in the major tertiary hospitals in Saudi Arabia and the opinions and feedback from different experts in the areas of data exchange and standards and medical informatics in Saudi Arabia and UK, a list of steps required towards the development of national health data standards was constructed. Main steps are the existence of: a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan and a national accreditation body, and more important is the change management at the national and organizational level. The outcome of this study can be used by academics and practitioners to develop the planning of health data standards, and in particular those in developing countries.
Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20028147 Test Data Compression Using a Hybrid of Bitmask Dictionary and 2n Pattern Runlength Coding Methods
Authors: C. Kalamani, K. Paramasivam
Abstract:
In VLSI, testing plays an important role. Major problem in testing are test data volume and test power. The important solution to reduce test data volume and test time is test data compression. The Proposed technique combines the bit maskdictionary and 2n pattern run length-coding method and provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty. This method has been implemented using Mat lab and HDL Language to reduce test data volume and memory requirements. This method is applied on various benchmark test sets and compared the results with other existing methods. The proposed technique can achieve a compression ratio up to 86%.Keywords: Bit Mask dictionary, 2n pattern run length code, system-on-chip, SOC, test data compression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19218146 A Hybrid Data Mining Method for the Medical Classification of Chest Pain
Authors: Sung Ho Ha, Seong Hyeon Joo
Abstract:
Data mining techniques have been used in medical research for many years and have been known to be effective. In order to solve such problems as long-waiting time, congestion, and delayed patient care, faced by emergency departments, this study concentrates on building a hybrid methodology, combining data mining techniques such as association rules and classification trees. The methodology is applied to real-world emergency data collected from a hospital and is evaluated by comparing with other techniques. The methodology is expected to help physicians to make a faster and more accurate classification of chest pain diseases.Keywords: Data mining, medical decisions, medical domainknowledge, chest pain.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22228145 Knowledge Discovery and Data Mining Techniques in Textile Industry
Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler
Abstract:
This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.Keywords: Data mining, textile production, decision trees, classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15398144 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern
Authors: Mahdi Esmaeili, Mansour Tarafdar
Abstract:
The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1477