Search results for: Stated Preference Data.
7276 Words of Peace in the Speeches of the Egyptian President, Abdulfattah El-Sisi: A Corpus-Based Study
Authors: Mohamed S. Negm, Waleed S. Mandour
Abstract:
The present study aims primarily at investigating words of peace (lexemes of peace) in the formal speeches of the Egyptian president Abdulfattah El-Sisi in a two-year span of time, from 2018 to 2019. This paper attempts to shed light not only on the contextual use of the antonyms, war and peace, but also it underpins quantitative analysis through the current methods of corpus linguistics. As such, the researchers have deployed a corpus-based approach in collecting, encoding, and processing 30 presidential speeches over the stated period (23,411 words and 25,541 tokens in total). Further, semantic fields and collocational networkzs are identified and compared statistically. Results have shown a significant propensity of adopting peace, including its relevant collocation network, textually and therefore, ideationally, at the expense of war concept which in most cases surfaces euphemistically through the noun conflict. The president has not justified the action of war with an honorable cause or a valid reason. Such results, so far, have indicated a positive sociopolitical mindset the Egyptian president possesses and moreover, reveal national and international fair dealing on arising issues.
Keywords: Corpus-assisted discourse studies, critical discourse analysis, collocation network, corpus linguistics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16297275 Impact of Stack Caches: Locality Awareness and Cost Effectiveness
Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang
Abstract:
Treating data based on its location in memory has received much attention in recent years due to its different properties, which offer important aspects for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One of the important aspects of stack data is that it has high spatial and temporal locality. In this work, we simulate non-unified cache design that split data cache into stack and non-stack caches in order to maintain stack data and non-stack data separate in different caches. We observe that the overall hit rate of non-unified cache design is sensitive to the size of non-stack cache. Then, we investigate the appropriate size and associativity for stack cache to achieve high hit ratio especially when over 99% of accesses are directed to stack cache. The result shows that on average more than 99% of stack cache accuracy is achieved by using 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding small, fixed, size of stack cache at level1 to unified cache architecture. The result shows that the overall hit rate of unified cache design with adding 1KB of stack cache is improved by approximately, on average, 3.9% for Rijndael benchmark. The stack cache is simulated by using SimpleScalar toolset.
Keywords: Hit rate, Locality of program, Stack cache, and Stack data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15087274 Population Trend of Canola Aphid, Lipaphis Erysimi (Kalt.) (Homoptera: Aphididae) and its Associated Natural Enemies in Different Brassica Lines along with the Effect of Gamma Radiation on Their Population
Authors: Ahmad-Ur-Rahman Saljoqi, Rahib Zada, Imtiaz Ali Khan, Iqbal Munir, Sadur-Rehman, Hazrat Jabir Alam Khan
Abstract:
Studies regarding the determination of population trend of Lipaphis erysimi (kalt.) and its associated natural enemies in different Brassica lines along with the effect of gamma radiation on their population were conducted at Agricultural Research Farm, Malakandher, Khyber Pakhtunkhwa Agricultural University Peshawar during spring 2006. Three different Brassica lines F6B3, F6B6 and F6B7 were used, which were replicated four times in Randomized Complete Block Design. The data revealed that aphid infestation invariably stated in all three varieties during last week of February 2006 (1st observation). The peak population of 4.39 aphids leaf-1 was s recorded during 2nd week of March and lowest population of 1.02 aphids leaf-1 was recorded during 5th week of March. The species of lady bird beetle (Coccinella septempunctata) and Syrphid fly (Syrphus balteatus) first appeared on 24th February with a mean number of 0.40 lady bird beetle leaf-1 and 0.87 Syrphid fly leaf-1, respectively. At the time when aphid population started to increase the peak population of C. septempunctata (0.70 lady bird beetle leaf- 1) and S. balteatus (1.04 syrphid fly leaf-1) was recorded on the 2nd week of March. Chrysoperla carnea appeared in the 1st week of March and their peak population was recorded during the 3rd week of March with mean population of 1.46 C. carnea leaf-1. Among all the Brassica lines, F6B7 showed comparatively more resistance as compared to F6B3 F6B6. F6B3 showed least resistance against L. erysimi, which was found to be the most susceptible cultivar. F6B7 was also found superior in terms of natural enemies. Maximum number of all natural enemies was recorded on this variety followed by F6B6. Lowest number of natural enemies was recorded in F6B3. No significant effect was recorded for the effect of gamma radiation on the population of aphids, natural enemies and on the varieties.Keywords: Canola aphid, Lipaphis erysimi, natural enemies, brassica lines, gamma radiation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15607273 Cross Project Software Fault Prediction at Design Phase
Authors: Pradeep Singh, Shrish Verma
Abstract:
Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. Earlier we predicted the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven datasets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.Keywords: Software Metrics, Fault prediction, Cross project, Within project.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25467272 Extreme Temperature Forecast in Mbonge, Cameroon through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution
Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph
Abstract:
In this paper, temperature extremes are forecast by employing the block maxima method of the Generalized extreme value(GEV) distribution to analyse temperature data from the Cameroon Development Corporation (C.D.C). By considering two sets of data (Raw data and simulated data) and two (stationary and non-stationary) models of the GEV distribution, return levels analysis is carried out and it was found that in the stationary model, the return values are constant over time with the raw data while in the simulated data, the return values show an increasing trend but with an upper bound. In the non-stationary model, the return levels of both the raw data and simulated data show an increasing trend but with an upper bound. This clearly shows that temperatures in the tropics even-though show a sign of increasing in the future, there is a maximum temperature at which there is no exceedence. The results of this paper are very vital in Agricultural and Environmental research.Keywords: Return level, Generalized extreme value (GEV), Meteorology, Forecasting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21067271 Mining Multicity Urban Data for Sustainable Population Relocation
Authors: Xu Du, Aparna S. Varde
Abstract:
In this research, we propose to conduct diagnostic and predictive analysis about the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract the urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data with other information such as population change and economic conditions. Multiple data mining methods are deployed on this data to analyze nonlinear relationships between parameters. The result determines the driving force of population relocation with respect to urban sprawl and urban sustainability and their related parameters. This work sets the stage for developing a comprehensive urban simulation model for catering to specific questions by targeted users. It contributes towards achieving sustainability as a whole.Keywords: Data Mining, Environmental Modeling, Sustainability, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17827270 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data
Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon
Abstract:
Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Keywords: Ant colony system, biological data, clustering, DNA chip.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19747269 The Resource Description Framework (RDF) as a Modern Structure for Medical Data
Authors: Gabriela Lindemann, Danilo Schmidt, Thomas Schrader, Dietmar Keune
Abstract:
The amount and heterogeneity of data in biomedical research, notably in interdisciplinary fields, requires new methods for the collection, presentation and analysis of information. Important data from laboratory experiments as well as patient trials are available but come out of distributed resources. The Charité - University Hospital Berlin has established together with the German Research Foundation (DFG) a new information service centre for kidney diseases and transplantation (Open European Nephrology Science Centre - OpEN.SC). Beside a collaborative aspect to create new research groups every single partner or institution of this science information centre making his own data available is allowed to search the whole data pool of the various involved centres. A core task is the implementation of a non-restricting open data structure for the various different data sources. We decided to use a modern RDF model and in a first phase transformed original data coming from the web-based Electronic Patient Record database TBase©.
Keywords: Medical databases, Resource Description Framework (RDF), metadata repository.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20317268 Comparison of Growth and Biomass of Red Alga Cultured on Rope and Net
Authors: E. Kouhgardi, S. Dashti, H. Fekrandish
Abstract:
This research has been conducted to study the method of culture and comparing growth and biomass of Gracilaria corticata cultured on rope and net for 50 days through two treatments (first treatment: culture of alga on net and the second treatment: culture of alga on rope and each treatment was repeated by four cases). During culture period, the water of aquariums was replaced once every two days for 40-50%. Also, 0.3-0.5 grams of urea fertilizer was added to the culture environment for fertilization. Moreover, some of the environmental factors such as pH, salinity and temperature of the environment were measured on a daily basis. During the culture period, extent of longitudinal growth of the species of both treatments was equal. The said length was reached from 8-10 cm to 10.5-13 cm accordingly. The resulted weight in repetitions of the first treatment was higher than that of the second treatment in such a way as in the first treatment, its weight reached from 10 grams to 21.119 grams and in the second treatment, its weight reached from 10 grams to 17.663 grams. On a whole, it may be stated that that kind of alga being studied has a considerable growth with respect to its volume. The results have revealed that the percentage of daily growth and wet weight at the end of the first treatment was higher than that of the second treatment and it was registered as 0.934, 6.072 and 811.432 in the first treatment and 0.797, 4.990 and 758.071 in the second treatment respectively. This difference is significant (P<0.05). Growth and biomass of G. corticata through culture on net was more emphasizing on numerous branches due to wider bed. Moreover, higher level of the species in this method was exposed to sunlight and this increased biosynthesis and eventually increases of growth and biomass.
Keywords: Red alga, growth, biomass, culture, net, rope.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18737267 XML Data Management in Compressed Relational Database
Authors: Hongzhi Wang, Jianzhong Li, Hong Gao
Abstract:
XML is an important standard of data exchange and representation. As a mature database system, using relational database to support XML data may bring some advantages. But storing XML in relational database has obvious redundancy that wastes disk space, bandwidth and disk I/O when querying XML data. For the efficiency of storage and query XML, it is necessary to use compressed XML data in relational database. In this paper, a compressed relational database technology supporting XML data is presented. Original relational storage structure is adaptive to XPath query process. The compression method keeps this feature. Besides traditional relational database techniques, additional query process technologies on compressed relations and for special structure for XML are presented. In this paper, technologies for XQuery process in compressed relational database are presented..Keywords: XML, compression, query processing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18057266 A System for Analyzing and Eliciting Public Grievances Using Cache Enabled Big Data
Authors: P. Kaladevi, N. Giridharan
Abstract:
The system for analyzing and eliciting public grievances serves its main purpose to receive and process all sorts of complaints from the public and respond to users. Due to the more number of complaint data becomes big data which is difficult to store and process. The proposed system uses HDFS to store the big data and uses MapReduce to process the big data. The concept of cache was applied in the system to provide immediate response and timely action using big data analytics. Cache enabled big data increases the response time of the system. The unstructured data provided by the users are efficiently handled through map reduce algorithm. The processing of complaints takes place in the order of the hierarchy of the authority. The drawbacks of the traditional database system used in the existing system are set forth by our system by using Cache enabled Hadoop Distributed File System. MapReduce framework codes have the possible to leak the sensitive data through computation process. We propose a system that add noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If the complaints are not processed in the ample time, then automatically it is forwarded to the higher authority. Hence it ensures assurance in processing. A copy of the filed complaint is sent as a digitally signed PDF document to the user mail id which serves as a proof. The system report serves to be an essential data while making important decisions based on legislation.Keywords: Big Data, Hadoop, HDFS, Caching, MapReduce, web personalization, e-governance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15927265 Transportation Mode Choice Analysis for Accessibility of the Mehrabad International Airport by Statistical Models
Authors: N. Mirzaei Varzeghani, M. Saffarzadeh, A. Naderan, A. Taheri
Abstract:
Countries are progressing, and the world's busiest airports see year-on-year increases in travel demand. Passenger acceptability of an airport depends on the airport's appeals, which may include one of these routes between the city and the airport, as well as the facilities to reach them. One of the critical roles of transportation planners is to predict future transportation demand so that an integrated, multi-purpose system can be provided and diverse modes of transportation (rail, air, and land) can be delivered to a destination like an airport. In this study, 356 questionnaires were filled out in person over six days. First, the attraction of business and non-business trips was studied using data and a linear regression model. Lower travel costs, more passengers aged 55 and older using this airport, and other factors are essential for business trips. Non-business travelers, on the other hand, have prioritized using personal vehicles to get to the airport and ensuring convenient access to the airport. Business travelers are also less price-sensitive than non-business travelers regarding airport travel. Furthermore, carrying additional luggage (for example, more than one suitcase per person) undoubtedly decreases the attractiveness of public transit. Afterward, based on the manner and purpose of the trip, the locations with the highest trip generation to the airport were identified. The most famous district in Tehran was District 2, with 23 visits, while the most popular mode of transportation was an online taxi, with 12 trips from that location. Then, significant variables in separation and behavior of travel methods to access the airport were investigated for all systems. In this scenario, the most crucial factor is the time it takes to get to the airport, followed by the method's user-friendliness as a component of passenger preference. It has also been demonstrated that enhancing public transportation trip times reduces private transportation's market share, including taxicabs. Based on the responses of personal and semi-public vehicles, the desire of passengers to approach the airport via public transportation systems was explored to enhance present techniques and develop new strategies for providing the most efficient modes of transportation. Using the binary model, it was clear that business travelers and people who had already driven to the airport were the least likely to change.
Keywords: Multimodal transportation, travel behavior, demand modeling, statistical models.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5327264 Recycling Construction Waste Materials to Reduce the Environmental Pollutants
Authors: Mehrdad Abkenari, Alireza Rezaei, Naghmeh Pournayeb
Abstract:
There have recently been many studies and investments in developed and developing countries regarding the possibility of recycling construction waste, which are still ongoing. Since the term 'construction waste' covers a vast spectrum of materials in constructing buildings, roads and etc., many investigations are required to measure their technical performance in use as well as their time and place of use. Concrete is among the major and fundamental materials used in current construction industry. Along with the rise of population in developing countries, it is desperately required to meet the people's primary need in construction industry and on the other hand, dispose existing wastes for reducing the amount of environmental pollutants. Restrictions of natural resources and environmental pollution are the most important problems encountered by civil engineers. Reusing construction waste is an important and economic approach that not only assists the preservation of environment but also, provides us with primary raw materials. In line with consistent municipal development in disposal and reuse of construction waste, several approaches including, management of construction waste and materials, materials recycling and innovation and new inventions in materials have been predicted. This article has accordingly attempted to study the activities related to recycling of construction wastes and then, stated the economic, quantitative, qualitative and environmental results obtained.
Keywords: Civil engineering, environment, recycling, construction waste.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 29307263 Vague Multiple Criteria Decision Making Analysis Method for Fighter Aircraft Selection
Authors: C. Ardil
Abstract:
Fighter aircraft selection is one of the most critical strategies for defense multiple criteria decision-making analysis to increase the decisive power of air defense and its superior power in the defense strategy. Vague set theory is an adequate approach for modeling vagueness, uncertainty, and imprecision in decision-making problems. This study integrates vague set theory and the technique for order of preference by similarity to ideal solution (TOPSIS) to support fighter aircraft selection. The proposed method is applied in the selection of fighter aircraft for the Air Force. In the proposed approach, the ratings of alternatives and the importance weights of criteria for fighter aircraft selection are represented by the vague set theory. Finally, an illustrative example for fighter aircraft selection is given to demonstrate the applicability and effectiveness of the proposed approach. The fighter aircraft candidates were selected under six criteria including costability, payloadability, maneuverability, speedability, stealthility, and survivability. Analysis results show that the best fighter aircraft is selected with the highest closeness coefficient value. The proposed method can also be applied to solve other multiple criteria decision analysis problems.
Keywords: fighter aircraft selection, vague set theory, fuzzy set theory, neutrosophic set theory, multiple criteria decision making analysis, MCDMA, TOPSIS
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5417262 Infestations of Olive Fruit Fly, Bactrocera oleae (Rossi) (Diptera: Tephritidae), in Different Olive Cultivars in Çanakkale, Turkey
Authors: Hanife Genç
Abstract:
The olive fruit fly, Bactrocera oleae (Rossi), is an economically important and endemic pest in olive (Oleae europae) orchards in Turkey. The aim of this study was to determine olive fruit fly infestation in different olive cultivars in the laboratory. Olive fly infested fruits were collected in Çanakkale province to establish wild fly population. After having reproductive olive fly colonies, 14 olive cultivars were tested in the controlled laboratory conditions, at 23±2 °C, 65% RH and 16:8 h (light: dark) photoperiod. The olive samples from 14 different olive cultivars were collected in October 2015, in Campus of Dardanos, Çanakkale Onsekiz Mart University. Observations were carried out detecting some biological parameters such as the number of oviposition stings, active infestation, total infestation, the number of pupae and the adult emergence. The results indicated that oviposition stings were not associated with pupal yield. A few pupae were found within olive fruits which were not able to exit. Screening of the varieties suggested that less susceptible cultivar to olive fruit fly attacks was Arbequin while Gemlik-2M 2/3 showed significant susceptibility. Ovipositional preference of olive fly females and the success of larval development in different olive varieties are crucial for establishing new olive orchards to prevent high olive fruit fly infestation.
Keywords: Infestation, olive fruit fly, olive cultivars, oviposition sting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21517261 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
Authors: S.Aranganayagi, K.Thangavel
Abstract:
K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36897260 Mobile Phone as a Tool for Data Collection in Field Research
Authors: Sandro Mourão, Karla Okada
Abstract:
The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.Keywords: Data Gathering, Field Research, Mobile Phone, Survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20587259 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis
Authors: N. R. N. Idris, S. Baharom
Abstract:
A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.
Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17407258 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.
Keywords: Cluster analysis, education, mathematics, profiles.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8927257 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data
Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan
Abstract:
The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23897256 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning
Authors: Chunming Xu
Abstract:
Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14477255 Analysis of Factors Used by Farmers to Manage Risk: A Case Study on Italian Farms
Authors: A. Pontrandolfi, G. Enjolras, F. Capitanio
Abstract:
The study analyses the strategies Italian farmers use to cope with the risks that face their production. We specifically explore the potential and the limitations of the economic tools for climatic risk management in agriculture of the Common Agricultural Policy 2014-2020, that foresees contributions for economic tools for risk management, in relation to farms’ needs, exposure and vulnerability of agricultural areas to climatic risk. We consider at the farm level approaches to hedge risks in terms of the use of technical tools (agricultural practices, pesticides, fertilizers, irrigation) and economic/financial instruments (insurances, etc.). We develop cross-sectional and longitudinal analyses as well as analyses of correlation that underline the main differences between the way farms adapt their structure and management towards risk. The results show a preference for technical tools, despite the presence of important public aids on economic tools such as insurances. Therefore, there is a strong need for a more effective and integrated risk management policy scheme. Synergies between economic tools and risk reduction actions of a more technical, structural and management nature (production diversification, irrigation infrastructures, technological and management innovations and formation-information-consultancy, etc.) are emphasized.Keywords: Agriculture and climate change, climatic risk management, insurance schemes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12177254 Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach
Authors: Lin Yang, Zhian Mi, Jiacheng Xiao, Rong Li
Abstract:
Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.
Keywords: Harmonic analysis, machine learning, music classification and tagging, MIDI.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7587253 Determining Cluster Boundaries Using Particle Swarm Optimization
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.
Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17187252 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm
Authors: Ameur Abdelkader, Abed Bouarfa Hafida
Abstract:
Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.
Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10757251 Characterisation of Hydrocarbons in Atmospheric Aerosols from Different European Sites
Authors: C. A. Alves, A. Vicente, M. Evtyugina, C. A. Pio, A. Hoffer, G. Kiss, S. Decesari, R. Hillamo, E.Swietlicki
Abstract:
The concentrations of aliphatic and polycyclic aromatic hydrocarbons (PAH) were determined in atmospheric aerosol samples collected at a rural site in Hungary (K-puszta, summer 2008), a boreal forest (Hyytiälä, April 2007) and a polluted rural area in Italy (San Pietro Capofiume, Po Valley, April 2008). A clear distinction between “clean" and “polluted" periods was observed. Concentrations obtained for Hyytiälä are significantly lower than those for the other two sites. Source reconciliation was performed using diagnostic parameters, such as the carbon preference index and ratios between PAH. The presence of an unresolved complex mixture of hydrocarbons, especially for the Finnish and Italian samples, is indicative of petrogenic inputs. In K-puszta, the aliphatic hydrocarbons are dominated by leaf wax n-alkanes. The long range transport of anthropogenic pollution contributed to the Finnish aerosol. Industrial activities and vehicular emissions represent major sources in San Pietro Capofiume. PAH in K-puszta consist of both pyrogenic and petrogenic compounds.
Keywords: Particulate matter, n-alkanes, PAH, BaPE, ruralsites, source reconciliation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19437250 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain
Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami
Abstract:
To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.
Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8197249 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach
Authors: Sarisa Pinkham, Kanyarat Bussaban
Abstract:
The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.
Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17587248 Automatic Generation of Ontology from Data Source Directed by Meta Models
Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas
Abstract:
Through this paper we present a method for automatic generation of ontological model from any data source using Model Driven Architecture (MDA), this generation is dedicated to the cooperation of the knowledge engineering and software engineering. Indeed, reverse engineering of a data source generates a software model (schema of data) that will undergo transformations to generate the ontological model. This method uses the meta-models to validate software and ontological models.
Keywords: Meta model, model, ontology, data source.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19987247 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia
Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray
Abstract:
The proliferation of health data standards today is somewhat overlapping and conflicting, resulting in market confusion and leading to increasing proprietary interests. The government role and support in standardization for health data are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with the implementation of non-standard systems. The normative literature missed out the exploration of the different steps required to be undertaken by the government towards the development of national health data standards. Based on the lessons learned from a qualitative study investigating the different issues to the adoption of health data standards in the major tertiary hospitals in Saudi Arabia and the opinions and feedback from different experts in the areas of data exchange and standards and medical informatics in Saudi Arabia and UK, a list of steps required towards the development of national health data standards was constructed. Main steps are the existence of: a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan and a national accreditation body, and more important is the change management at the national and organizational level. The outcome of this study can be used by academics and practitioners to develop the planning of health data standards, and in particular those in developing countries.
Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2002