Search results for: Data mining techniques
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9245

7685 Compressed Suffix Arrays to Self-Indexes Based on Partitioned Elias-Fano

Authors: Guo Wenyu, Qu Youli

Abstract:

A simple and practical self-indexing data structure, the Partitioned Elias-Fano (PEF) Compressed Suffix Array (PEF-CSA), is built in linear time for the CSA on the basis of PEF indexes. The PEF-CSA is compared with two classical compressed indexing methods, the Ferragina and Manzini implementation (FMI) and Sad-CSA, on files of different types and sizes from Pizza & Chili. On the tested data, the PEF-CSA performs better in terms of compression ratio, count time, and locate time, except on evenly distributed data such as protein data. The experiments show that the distribution of φ matters more for the compression ratio than the alphabet size does: unevenly distributed φ compresses better, and the larger the number of hits, the longer the count and locate times.

Keywords: Compressed suffix array, self-indexing, partitioned Elias-Fano, PEF-CSA.
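
The Elias-Fano encoding that underlies PEF can be illustrated with a short sketch. This is plain (unpartitioned) Elias-Fano on a sorted integer list, not the authors' PEF-CSA; the function names `ef_encode` and `ef_decode` and the sample values are assumptions made for illustration.

```python
def ef_encode(values, universe):
    """Encode a non-empty, sorted list of integers in [0, universe)."""
    n = len(values)
    # Number of low bits stored verbatim: floor(log2(universe / n)), at least 0.
    low_bits = max(0, (universe // n).bit_length() - 1)
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]
    # High parts go into a bit vector: element i sets bit (high_i + i).
    high = [0] * (n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        high[(v >> low_bits) + i] = 1
    return low_bits, lows, high


def ef_decode(low_bits, lows, high):
    """Recover the original sorted list from its Elias-Fano parts."""
    out, i = [], 0
    for pos, bit in enumerate(high):
        if bit:
            # The i-th set bit sits at position high_i + i.
            out.append(((pos - i) << low_bits) | lows[i])
            i += 1
    return out


# Illustrative input only.
values = [3, 4, 7, 13, 14, 15, 21, 43]
low_bits, lows, high = ef_encode(values, 44)
decoded = ef_decode(low_bits, lows, high)
```

A partitioned variant would split the sequence into chunks and encode each chunk with its own parameters; that step is omitted here.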

7684 A Decision Matrix for the Evaluation of Triplestores for Use in a Virtual Research Environment

Authors: Tristan O’Neill, Trina Myers, Jarrod Trevathan

Abstract:

The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for cross-domain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.

Keywords: Virtual research environment, Semantic Web, performance analysis, tropical data hub.
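
A weighted decision matrix of the kind described in this abstract can be sketched as follows. The criteria names come from the abstract; the weights, the 0-10 scoring scale, and the two candidate stores are hypothetical and do not reflect the paper's actual values.

```python
# Hypothetical weights (summing to 1); the paper's real weights are not given here.
WEIGHTS = {
    "interoperability": 0.30,
    "functionality": 0.30,
    "performance": 0.25,
    "support": 0.15,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted sum of per-criterion scores (each assumed to be on a 0-10 scale)."""
    return sum(weights[c] * scores[c] for c in weights)


def rank(candidates, weights=WEIGHTS):
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Made-up candidates and scores, purely for illustration.
candidates = {
    "store_a": {"interoperability": 8, "functionality": 6, "performance": 7, "support": 5},
    "store_b": {"interoperability": 6, "functionality": 9, "performance": 5, "support": 8},
}
ranking = rank(candidates)
```

Changing the weights to match a different environment's requirements changes the ranking without touching the raw scores, which is the main appeal of this kind of matrix.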

7683 Real-time Performance Study of EPA Periodic Data Transmission

Authors: Liu Ning, Zhong Chongquan, Teng Hongfei

Abstract:

EPA (Ethernet for Plant Automation) resolves the nondeterminism of standard Ethernet and achieves real-time communication by means of a micro-segment topology and a deterministic scheduling mechanism. This paper studies the real-time performance of EPA periodic data transmission from theoretical and experimental perspectives. By analyzing information transmission characteristics and the EPA deterministic scheduling mechanism, five indicators that can be used to specify the real-time performance of EPA periodic data transmission are presented and investigated: delivery time, time synchronization accuracy, data-sending time offset accuracy, utilization percentage of the configured timeslice, and non-RTE bandwidth. On this basis, the test principles and test methods for each indicator are studied, and some formulas for the real-time performance of an EPA system are derived. Furthermore, an experiment platform is developed to test the indicators of EPA periodic data transmission in a micro-segment. Based on the analysis and the experiment, methods to improve the real-time performance of EPA periodic data transmission are proposed, including optimizing the network structure, studying a self-adaptive timeslice adjustment method, and providing data-sending time offset accuracy for configuration.

Keywords: EPA system, Industrial Ethernet, Periodic data, Real-time performance

7682 The Effects of Multipath on OFDM Systems for Broadband Power-Line Communications a Case of Medium Voltage Channel

Authors: Justinian Anatory, N. Theethayi, R. Thottappillil, C. Mwase, N.H. Mvungi

Abstract:

Power-line networks are widely used today for broadband data transmission. However, multipath propagation within broadband power-line communication (BPLC) systems, caused by stochastic changes in network load impedances, branches, etc., degrades network and channel capacity. This paper investigates the performance of typical medium voltage channels that use Orthogonal Frequency Division Multiplexing (OFDM) with Quadrature Amplitude Modulation (QAM) subcarriers. It has been observed that channel performance decreases when the load impedances differ from the line characteristic impedance. In addition, as the number of branches in the link between the transmitter and receiver increases, a loss of 4 dB per branch is found in the signal-to-noise ratio (SNR). The information presented in the paper could be useful for the appropriate design of BPLC systems.

Keywords: Communication channel model, Power-line communication, Transfer function, Multipath, Branched network, OFDM, QAM, performance evaluation

7681 Web Usability: A Fuzzy Approach to the Navigation Structure Enhancement in a Website System, Case of Iranian Civil Aviation Organization Website

Authors: Hamed Qahri Saremi, Gholam Ali Montazer

Abstract:

With the proliferation of the World Wide Web, the development of web-based technologies and the growth in web content, the structure of a website becomes more complex and web navigation becomes a critical issue for both web designers and users. In this paper we define the content and the web pages as two important and influential factors in website navigation, and we frame the enhancement of website navigation as making useful changes to the link structure of the website based on these factors. We then suggest a new method for proposing the changes, using a fuzzy approach to optimize the website architecture. Applying the proposed method to the real case of the Iranian Civil Aviation Organization (CAO) website, we discuss the results of the novel approach in the final section.

Keywords: Web content, Web navigation, Website system, Web usage mining.

7680 Data Quality Enhancement with String Length Distribution

Authors: Qi Xiu, Hiromu Hota, Yohsuke Ishii, Takuya Oda

Abstract:

Recently, the volume of collectable manufacturing data has been increasing rapidly. At the same time, mega recalls are becoming a serious social problem. Under these circumstances, there is a growing need to prevent mega recalls through defect analysis, such as root cause analysis and anomaly detection, using manufacturing data. However, classifying strings in manufacturing data with traditional methods takes too long to meet the requirements of quick defect analysis. Therefore, we present the String Length Distribution Classification (SLDC) method to classify strings correctly in a short time. This method learns character features, especially the string length distribution, from Product IDs and Machine IDs in the BOM and asset list. By applying the proposed method to strings in actual manufacturing data, we verified that string classification time can be reduced by 80%. As a result, we estimate that the requirement of quick defect analysis can be fulfilled.

Keywords: Data quality, feature selection, probability distribution, string classification, string length.
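
The idea of classifying strings by their length distribution can be sketched as below. This is a minimal illustration, not the SLDC method itself: the labels, the sample IDs, the `floor` probability for unseen lengths, and the function names are all assumptions.

```python
from collections import Counter


def learn_length_distributions(labeled):
    """Map each label to the relative frequency of the string lengths seen for it."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter())[len(text)] += 1
    return {
        label: {length: n / sum(c.values()) for length, n in c.items()}
        for label, c in counts.items()
    }


def classify(text, distributions, floor=1e-6):
    """Pick the label whose learned distribution gives len(text) the highest probability.

    Lengths never seen for a label fall back to a small floor probability.
    """
    return max(
        distributions,
        key=lambda label: distributions[label].get(len(text), floor),
    )


# Made-up training strings standing in for entries of a BOM and an asset list.
training = [
    ("P-000123", "product_id"),
    ("P-000456", "product_id"),
    ("P-0099", "product_id"),
    ("M01", "machine_id"),
    ("M17", "machine_id"),
    ("M204", "machine_id"),
]
model = learn_length_distributions(training)
```

Because only `len(text)` is inspected at classification time, the per-string cost is constant, which is consistent with the abstract's emphasis on speed.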

7679 Efficient and Extensible Data Processing Framework in Ubiquitous Sensor Networks

Authors: Junghoon Lee, Gyung-Leen Park, Ho-Young Kwak, Cheol Min Kim

Abstract:

This paper presents the design and a prototype implementation of an intelligent data processing framework for ubiquitous sensor networks. Much focus is put on how to handle the sensor data stream as well as on the interoperability between low-level sensor data and application clients. Our framework first addresses systematic middleware which mediates the interaction between the application layer and low-level sensors, in order to analyze a great volume of sensor data by filtering and integrating it to create value-added context information. Then, an agent-based architecture is proposed for real-time data distribution, to efficiently forward a specific event to the appropriate application registered in the directory service via the open interface. The prototype implementation demonstrates that our framework can host a sophisticated application on the ubiquitous sensor network and that it can autonomously evolve to new middleware, taking advantage of promising technologies such as software agents, XML, cloud computing, and the like.

Keywords: sensor network, intelligent farm, middleware, event detection

7678 Reduce, Reuse and Recycle: Grand Challenges in Construction Recovery Process

Authors: Abioye A. Oyenuga, Rao Bhamidimarri

Abstract:

Running a successful Construction and Demolition Waste (C&DW) recycling operation anywhere in the world is a challenge today, predominantly because secondary materials markets are yet to be integrated. Reducing, reusing and recycling of C&DW have been employed over the years, and various techniques have been investigated. However, the economic and environmental viability of their application seems limited. This paper discusses the costs and benefits of using secondary materials and focuses on investigating the reuse and recycling processes for five major types of construction materials: concrete, metal, wood, cardboard/paper and plasterboard. Data obtained from demolition specialists and contractors are considered and evaluated. The research found that the construction material recovery process fully incorporates the 3R principle, contributing to savings in energy and natural resources. This analysis leads to an understanding of the grand challenges in the construction material recovery process. Recommendations to deepen the material recovery process are also discussed.

Keywords: Construction & Demolition Waste (C&DW), 3R concept, Recycling, Reuse, Life-Cycle Assessment (LCA), Waste Management.

7677 Eliciting and Confirming Data, Information, Knowledge and Wisdom in a Specialist Health Care Setting: The WICKED Method

Authors: S. Impey, D. Berry, S. Furtado, M. Galvin, L. Grogan, O. Hardiman, L. Hederman, M. Heverin, V. Wade, L. Douris, D. O'Sullivan, G. Stephens

Abstract:

Healthcare is a knowledge-rich environment. This knowledge, while valuable, is not always accessible outside the borders of individual clinics. This research aims to address part of this problem (at a study site) by constructing a maximal data set (knowledge artefact) for motor neurone disease (MND). This data set is proposed as an initial knowledge base for a concurrent project to develop an MND patient data platform. It represents the domain knowledge at the study site for the duration of the research (12 months). A knowledge elicitation method was also developed from the lessons learned during this process - the WICKED method. WICKED is an anagram of the words: eliciting and confirming data, information, knowledge, wisdom. But it is also a reference to the concept of wicked problems, which are complex and challenging, as is eliciting expert knowledge. The method was evaluated at a second site, and benefits and limitations were noted. Benefits include that the method provided a systematic way to manage data, information, knowledge and wisdom (DIKW) from various sources, including healthcare specialists and existing data sets. Limitations surrounded the time required and how the data set produced only represents DIKW known during the research period. Future work is underway to address these limitations.

Keywords: Healthcare, knowledge acquisition, maximal data sets, action design science.

7676 Temporally Coherent 3D Animation Reconstruction from RGB-D Video Data

Authors: Salam Khalifa, Naveed Ahmed

Abstract:

We present a new method to reconstruct a temporally coherent 3D animation from single or multi-view RGB-D video data using unbiased feature point sampling. Given RGB-D video data in the form of a 3D point cloud sequence, our method first extracts feature points using both color and depth information. In the subsequent steps, these feature points are used to match two 3D point clouds in consecutive frames, independent of their resolution. Our new dynamic alignment method, based on motion vectors, then fully reconstructs a spatio-temporally coherent 3D animation. We perform extensive quantitative validation using novel error functions to analyze the results. We show that, despite the limiting factors of temporal and spatial noise associated with RGB-D data, it is possible to extract temporal coherence and faithfully reconstruct a temporally coherent 3D animation from RGB-D video data.

Keywords: 3D video, 3D animation, RGB-D video, Temporally Coherent 3D Animation.

7675 A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection

Authors: Niloofar Yousefi, Marie Alaghband, Ivan Garibay

Abstract:

With the increase in credit card usage, the volume of credit card misuse has also increased significantly, which may cause appreciable financial losses for both credit card holders and the financial organizations issuing credit cards. As a result, financial organizations are working hard to develop and deploy credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and to identify illicit transactions as quickly as possible to protect themselves and their customers. Compounding the complex nature of such adverse strategies, fraudulent credit card activities are rare events compared to the number of legitimate transactions. Hence, the challenge of developing fraud detection methods that are accurate and efficient is substantially intensified and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part, we focus on studies utilizing classical machine learning models, which mostly employ traditional transactional features to make fraud predictions. These models typically rely on some static physical characteristics, such as what the user knows (knowledge-based method), or what he/she has access to (object-based method). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while he/she is interacting with his/her electronic devices. These approaches rely on how people behave (instead of what they do), which cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the future research agenda for the community in order to develop more accurate, reliable and scalable models of credit card fraud detection.

Keywords: credit card fraud detection, user authentication, behavioral biometrics, machine learning, literature survey

7674 Application of Multi-Dimensional Principal Component Analysis to Medical Data

Authors: Naoki Yamamoto, Jun Murakami, Chiharu Okuma, Yutaro Shigeto, Satoko Saito, Takashi Izumi, Nozomi Hayashida

Abstract:

Multi-dimensional principal component analysis (PCA) extends PCA, which is widely used as a dimensionality reduction technique in multivariate data analysis, to handle multi-dimensional data. To calculate the PCA, the singular value decomposition (SVD) is commonly employed because of its numerical stability. As in the case of ordinary PCA, the multi-dimensional PCA can be calculated using the higher-order SVD (HOSVD), which was proposed by Lathauwer et al. In this paper, we apply the multi-dimensional PCA to multi-dimensional medical data, including the functional independence measure (FIM) score, and describe the results of the experimental analysis.

Keywords: multi-dimensional principal component analysis, higher-order SVD (HOSVD), functional independence measure (FIM), medical data, tensor decomposition

7673 Application of Computational Intelligence Techniques for Economic Load Dispatch

Authors: S.C. Swain, S. Panda, A.K. Mohanty, C. Ardil

Abstract:

This paper presents applications of computational intelligence techniques to economic load dispatch problems. The fuel cost equation of a thermal plant is generally expressed as a continuous quadratic equation. In real situations, the fuel cost equations can be discontinuous. In view of this, both continuous and discontinuous fuel cost equations are considered in the present paper. First, a genetic algorithm optimization technique is applied to a 6-generator, 26-bus test system with continuous fuel cost equations. Results are compared with the conventional quadratic programming method to show the superiority of the proposed computational intelligence technique. Further, a 10-generator system, with three fuel options per generator distributed over three areas, is considered, and a particle swarm optimization algorithm is employed to minimize the cost of generation. To show the superiority of the proposed approach, the results are compared with other published methods.

Keywords: Economic Load Dispatch, Continuous Fuel Cost, Quadratic Programming, Real-Coded Genetic Algorithm, Discontinuous Fuel Cost, Particle Swarm Optimization.

7672 Towards Achieving Energy Efficiency in Kazakhstan

Authors: Aigerim Uyzbayeva, Valeriya Tyo, Nurlan Ibrayev

Abstract:

Kazakhstan is currently one of the most dynamically developing states in its region. Stable growth in all sectors of the economy leads to a corresponding increase in energy consumption. The country consumes a significant amount of energy owing to its high level of industrialisation and the presence of energy-intensive industries such as mining and metallurgy, which in turn leads to low energy efficiency. With this in mind, the Government has set several priorities for the transition of the Republic of Kazakhstan to a “green economy”. This article provides an overview of Kazakhstan’s energy efficiency situation for the period 1991-2014. First, the dynamics of production and consumption of conventional energy resources are given. Second, the potential of renewable energy sources is summarised, followed by a description of GHG emission trends in the country. Third, Kazakhstan’s national initiatives, policies and locally implemented projects in the field of energy efficiency are described.

Keywords: Energy efficiency in Kazakhstan, greenhouse gases, renewable energy, sustainable development.

7671 Procedure Model for Data-Driven Decision Support Regarding the Integration of Renewable Energies into Industrial Energy Management

Authors: M. Graus, K. Westhoff, X. Xu

Abstract:

Climate change affects all aspects of society. While the expansion of renewable energies proceeds, general studies of the potential of demand side management have not convinced industry to reinforce smart grid considerations in its operational business. In this article, a procedure model for case-specific, data-driven decision support for industrial energy management, based on a holistic data analytics approach, is presented. The model is applied to the strategic decision problem of integrating renewable energies into industrial energy management. This question arises from considerations of changing the electricity contract model from a standard rate to volatile energy prices that follow the energy spot market, which is increasingly affected by renewable energies. The procedure model corresponds to a data analytics process consisting of data model, analysis, simulation and optimization steps. This procedure helps to quantify the potential of sustainable production concepts based on data from a factory. The model is validated with data from a printer, used as an analogue of a simple production machine. The overall goal is to establish smart grid principles for industry via the transformation from knowledge-driven to data-driven decisions within manufacturing companies.

Keywords: Data analytics, green production, industrial energy management, optimization, renewable energies, simulation.

7670 Intelligent Video-Based Monitoring of Freeway Traffic

Authors: Saad M. Al-Garni, Adel A. Abdennour

Abstract:

Freeways were originally designed to provide high mobility to road users. However, the increase in population and vehicle numbers has led to increasing congestion around the world. Daily recurrent congestion substantially reduces freeway capacity when it is most needed. Building new highways and expanding existing ones is an expensive solution and is impractical in many situations. Intelligent, vision-based techniques can, however, be efficient tools for monitoring highways and increasing the capacity of the existing infrastructure. The crucial step in highway monitoring is vehicle detection. In this paper, we propose one such technique. The approach is based on artificial neural networks (ANN) for vehicle detection and counting. The detection process uses freeway video images and starts by automatically extracting the image background from successive video frames. Once the background is identified, subsequent frames are used to detect moving objects through image subtraction. The result is segmented using the Sobel operator for edge detection. The ANN is then used in the detection and counting phase. Applying this technique to the busiest freeway in Riyadh (King Fahd Road) achieved higher than 98% detection accuracy despite changes in light intensity, occlusion, and shadows.

Keywords: Background Extraction, Neural Networks, Vehicle Detection, Freeway Traffic.
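
The preprocessing steps named in this abstract (background extraction, image subtraction, Sobel edge detection) can be sketched on tiny grayscale grids. The abstract does not say how the background is extracted, so the per-pixel median used here is an assumption, as are the threshold value and the function names. The ANN detection and counting stage is not shown.

```python
from statistics import median

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def estimate_background(frames):
    """Per-pixel median over frames (an assumed background model)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def foreground_mask(frame, background, threshold=30):
    """1 where the frame differs from the background by more than the threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy|; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out


# Synthetic 5x5 frames: a flat road, with one bright "vehicle" pixel in one frame.
empty = [[10] * 5 for _ in range(5)]
with_vehicle = [row[:] for row in empty]
with_vehicle[2][2] = 200
background = estimate_background([empty, with_vehicle, empty])
mask = foreground_mask(with_vehicle, background)
edges = sobel_magnitude(mask)
```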

7669 Value Engineering and Its Effect in Reduction of Industrial Organization Energy Expenses

Authors: Habibollah Najafi, Amir Abbas Yazdani, Hosseinali Nahavandi

Abstract:

A review of energy consumption and rate in Iran shows that the optimization and conservation of energy in the country's active industries lack a practical and effective method, and that in most factories energy consumption and rate are higher than in similar industries in industrialized countries. The increasing demand for electrical energy, and the overheads it imposes on organizations, force companies to search for suitable approaches to optimizing energy consumption and managing demand. Value engineering techniques are among these approaches. Value engineering is considered a powerful tool for improving profitability: it is used to reduce expenses, increase profits, improve quality, increase market share, shorten work durations, use resources more efficiently, and so on. In this article, we review value engineering and its capacity to bring about effective transformations in industrial organizations in order to reduce energy costs. The results are investigated and described through a case study of Mazandaran Wood and Paper Industries, the biggest consumer of energy in the north of Iran, to show the effect of value engineering techniques on the optimization of energy consumption.

Keywords: Value Engineering (VE), Expense, Energy, Industrial

7668 Estimating European Tourism Demand for Malaysia

Authors: Zainudin Arsad, Norul Baine Mat Johor

Abstract:

The tourism industry is an important sector of Malaysia's economy, and this motivates the examination of long-run relationships between tourist arrivals in Malaysia from three selected European countries and four possible determinants: relative prices, exchange rates, transportation cost and relative prices of substitute destinations. The study utilizes data from January 1999 to September 2008 and employs standard econometric techniques, including unit root tests and cointegration tests. The estimated demand model indicates that depreciation of the local currency and increases in prices at substitute destinations have a positive impact on tourist arrivals, while an increase in transportation cost has a negative impact. In addition, the model suggests that a higher rate of increase in local prices relative to prices in the tourists' country of origin may not deter tourists from coming to Malaysia.

Keywords: origin country, unit root test, cointegration test

7667 The Use of Substances and Sports Performance among Youth: Implications for Lagos State Sports

Authors: Osifeko Olalekan Remigious, Adesanya Adebisi Joseph, Omolade Akinmade Olatunde

Abstract:

This study sought to determine the factors associated with the use of substances for sports performance among youth in Lagos State sports. A descriptive research method was used, with a questionnaire as the instrument. The estimated population for the study was 2000 sportsmen and sportswomen, from which a sample of 200 respondents was drawn using purposive sampling. The instrument was validated for content and construct validity and was administered with the assistance of the coaches. All 200 copies administered were returned. The data obtained were analysed using simple percentages and the chi-square (χ²) test for the stated hypotheses at the 0.05 level of significance. The findings reveal that sports injuries, exercise-induced anaphylaxis and asthma, and a feeling of loss of efficacy are associated with alcohol use in sports performance among users of substances. Alcohol users are recommended to take part in sports such as swimming, basketball and volleyball, because these allow periods of rest during play. Government should take full responsibility for the health of sportsmen and sportswomen.

Keywords: Implications, Lagos state, substances, sports performance, youths.

7666 Data Oriented Model of Image: as a Framework for Image Processing

Authors: A. Habibizad Navin, A. Sadighi, M. Naghian Fesharaki, M. Mirnia, M. Teshnelab, R. Keshmiri

Abstract:

This paper presents a new data oriented model of image. A representation of it, ADBT, is then introduced. ADBT supports clustering, segmentation, measuring the similarity of images, etc., with the desired precision and corresponding speed.

Keywords: Data oriented modelling, image, clustering, segmentation, classification, ADBT and image processing.

7665 MIBiClus: Mutual Information based Biclustering Algorithm

Authors: Neelima Gupta, Seema Aggarwal

Abstract:

Most biclustering/projected clustering algorithms are based on either the Euclidean distance or the correlation coefficient, which capture only linear relationships. However, in many applications, such as gene expression data and word-document data, nonlinear relationships may exist between the objects. Mutual information between two variables provides a more general criterion for investigating dependencies amongst variables. In this paper, we improve upon our previous algorithm, which uses mutual information for biclustering, in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing biclustering algorithms have used mutual information as a similarity measure. We present experimental results on synthetic data as well as on yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.

Keywords: Biclustering, mutual information.
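
The point that mutual information captures dependencies that linear measures miss can be illustrated with a small sketch. This computes plain mutual information (in bits) between two discrete sequences from empirical frequencies; it is not the MIBiClus algorithm, and real expression data would first need to be discretized, which is assumed here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# y = x^2 is a nonlinear relationship: covariance is zero, yet y depends fully on x.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
mi = mutual_information(xs, ys)
```

Here `covariance` is zero, so a correlation-based similarity would report no relationship, while `mi` is clearly positive.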

7664 Isospectral Hulthén Potential

Authors: Anil Kumar

Abstract:

Supersymmetric Quantum Mechanics is an interesting framework to analyze nonrelativistic quantal problems. Using these techniques, we construct a family of strictly isospectral Hulthén potentials. Isospectral wave functions are generated and plotted for different values of the deformation parameter.

Keywords: Hulthén potential, Isospectral Hamiltonian.

7663 GeNS: a Biological Data Integration Platform

Authors: Joel Arrais, João E. Pereira, João Fernandes, José Luís Oliveira

Abstract:

The scientific achievements coming from molecular biology depend greatly on the capability of computational applications to analyze laboratory results. A comprehensive analysis of an experiment typically requires the simultaneous study of the obtained dataset together with data available in several distinct public databases. Nevertheless, developing centralized access to these distributed databases raises a set of challenges, such as: what is the best integration strategy, how to resolve nomenclature clashes, how to handle data that overlaps between databases, and how to deal with huge datasets. In this paper we present GeNS, a system that uses a simple yet innovative approach to address several biological data integration issues. Compared with existing systems, the main advantages of GeNS are its ease of maintenance and its coverage and scalability in terms of the number of supported databases and data types. To support our claims, we present the current use of GeNS in two concrete applications. GeNS currently contains more than 140 million biological relations, and it can be publicly downloaded or remotely accessed through SOAP web services.

Keywords: Data integration, biological databases

7662 User Intention Generation with Large Language Models Using Chain-of-Thought Prompting

Authors: Gangmin Li, Fan Yang

Abstract:

Personalized recommendation is crucial for any recommendation system. One technique for personalized recommendation is to identify the user's intention. Traditional user intention identification uses the user's selection when facing multiple items. This modeling relies primarily on historical behavior data, resulting in challenges such as the cold start, unintended choices, and failure to capture intention when items are new. Motivated by recent advancements in Large Language Models (LLMs) like ChatGPT, we present an approach to user intention identification that uses LLMs with Chain-of-Thought (CoT) prompting. We use the initial user profile as input to LLMs and design a collection of prompts to align the LLM's response across various recommendation tasks encompassing rating prediction, search and browse history, user clarification, etc. Our tests on real-world datasets demonstrate improvements in recommendation when user intention is identified explicitly and merged into a user model.

Keywords: Personalized recommendation, generative user modeling, user intention identification, large language models, chain-of-thought prompting.

Downloads 55
7661 Can Physical Activity and Dietary Fat Intake Influence Body Mass Index in a Cross-sectional Correlational Design?

Authors: D.O. Omondi, L.O.A. Othuon, G.M. Mbagaya

Abstract:

The purpose of this study was to determine the influence of physical activity and dietary fat intake on the Body Mass Index (BMI) of lecturers within a higher learning institutionalized setting. The study adopted a Cross-sectional Correlational Design and included 120 lecturers selected proportionately by simple random sampling techniques from a population of 600 lecturers. Data were collected using questionnaires, which had sections including a physical activity checklist adopted from the international physical activity questionnaire (IPAQ), a 24-hour food recall, and anthropometric measurements, mainly weight and height. Analysis involved the use of bivariate correlations and linear regression. A significant inverse association was registered between BMI and the duration (in minutes) spent doing moderate-intensity physical activity per day (r=-0.322, p<0.01). Physical activity also predicted BMI (r²=0.096, F=13.616, β=-3.22, t=-3.69, n=120, p<0.01). However, the association between Body Mass Index and dietary fat was not significant (r=0.038, p>0.05). Physical activity emerged as a more powerful determinant of BMI compared to dietary fat intake.

Keywords: Physical activity, dietary fat intake, Body Mass Index, Kenya.

Downloads 1702
7660 Comparison of Different Methods to Produce Fuzzy Tolerance Relations for Rainfall Data Classification in the Region of Central Greece

Authors: N. Samarinas, C. Evangelides, C. Vrekos

Abstract:

The aim of this paper is to compare three different methods for producing fuzzy tolerance relations for rainfall data classification. More specifically, the three methods are the correlation coefficient, cosine amplitude, and max-min methods. The data were obtained from seven rainfall stations in the region of central Greece and refer to 20-year time series of monthly rainfall height averages. The three methods were used to express these data as a fuzzy relation. For all three methods, the resulting fuzzy tolerance relation is transformed into an equivalence relation by max-min composition. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations. Stations with high similarity can be used interchangeably in water resource management scenarios, or to augment data from one to another. Due to the complexity of the calculations, it is important to find out which of the methods is computationally simpler and needs fewer compositions in order to give reliable results.
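As an illustration of the procedure the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes NumPy, covers only the cosine amplitude method, and uses hypothetical function names (`cosine_amplitude`, `max_min_composition`, `to_equivalence`, `lambda_cut_classes`). It builds a fuzzy tolerance relation from station records, composes it with itself by max-min composition until it becomes transitive, and then groups stations at a chosen confidence level (lambda-cut).

```python
import numpy as np


def cosine_amplitude(data):
    """Fuzzy tolerance relation r_ij = |x_i . x_j| / (|x_i| * |x_j|).

    `data` holds one row per station and one column per observation.
    Rows are assumed to be non-zero (otherwise the norm is 0).
    """
    x = np.asarray(data, dtype=float)
    dots = np.abs(x @ x.T)
    norms = np.sqrt(np.sum(x * x, axis=1))
    return dots / np.outer(norms, norms)


def max_min_composition(r, s):
    """Max-min composition: (R o S)_ij = max_k min(r_ik, s_kj)."""
    return np.max(np.minimum(r[:, :, None], s[None, :, :]), axis=1)


def to_equivalence(r):
    """Compose R with itself until it stops changing.

    Assumes R is reflexive and symmetric (a tolerance relation).
    Returns the transitive relation and the number of compositions
    performed, including the final one that confirms convergence.
    """
    r = np.asarray(r, dtype=float)
    count = 0
    while True:
        nxt = max_min_composition(r, r)
        count += 1
        if np.allclose(nxt, r):
            return r, count
        r = nxt


def lambda_cut_classes(eq, lam):
    """Label each station by its equivalence class at confidence level `lam`."""
    n = eq.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if eq[i, j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels


if __name__ == "__main__":
    # Made-up rainfall values for three stations, for illustration only.
    stations = [[50.0, 62.0, 40.0], [48.0, 65.0, 38.0], [10.0, 5.0, 90.0]]
    relation, compositions = to_equivalence(cosine_amplitude(stations))
    print(compositions, lambda_cut_classes(relation, 0.9))
```

The number of compositions returned by `to_equivalence` is one simple way to compare how much work each similarity method needs before the relation becomes transitive.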

Keywords: Classification, fuzzy logic, tolerance relations, rainfall data.

Downloads 1018
7659 The Implicit Methods for the Study of Tolerance

Authors: M. Bambulyaka

Abstract:

Tolerance is a tool for achieving social cohesion, particularly among individuals and groups with different values. The aim is to study the characteristics of the ethnic tolerance of the inhabitants of Latvia. Ethnic tolerance is understood as a set of conscious and unconscious orientations of the individual in social interaction and inter-ethnic communication. The study uses the tools of empirical research on ethnic tolerance, which make it possible to identify the explicit and implicit levels of the emotional component among Latvia's residents. Explicit measurements were made using self-report techniques, which revealed the index of ethnic tolerance and the ethnic identity of the participants. The implicit component was studied using methods based on the effect of emotional priming. During the processing of the results, indicators of the positive and negative implicit attitudes towards members of the participants' own and other ethnicities were calculated, as well as the explicit parameters of the ethnic tolerance and the ethnic identity of Latvia's residents. The implicit measurements of the attitudes of neighboring ethnic groups towards each other showed a mutual negative attitude, whereas the explicit measurements indicate a neutral attitude. The data obtained contribute to further study of the ethnic tolerance of Latvia's residents.

Keywords: ethnic tolerance, implicit measure, priming, ethnic attitudes

Downloads 1586
7658 An Advanced Time-Frequency Domain Method for PD Extraction with Non-Intrusive Measurement

Authors: Guomin Luo, Daming Zhang, Yong Kwee Koh, Kim Teck Ng, Helmi Kurniawan, Weng Hoe Leong

Abstract:

Partial discharge (PD) detection is an important method for evaluating the insulation condition of metal-clad apparatus. Non-intrusive sensors, which are easy to install and do not interrupt operation, are preferred in onsite PD detection. However, they often lack accuracy due to interference in the PD signals. In this paper, a novel PD extraction method that uses frequency analysis and entropy-based time-frequency (TF) analysis is introduced. The repetitive pulses from the converter are first removed via frequency analysis. Then, the relative entropy and relative peak frequency of each pulse (i.e. time-indexed vector TF spectrum) are calculated, and all pulses with similar parameters are grouped. According to the characteristics of the non-intrusive sensor and the frequency distribution of PDs, the PD pulses and the interferences are separated. Finally, the PD signal and the interferences are recovered via an inverse TF transform. The de-noised result of noisy PD data demonstrates that the combination of frequency and time-frequency techniques can discriminate PDs from interferences with various frequency distributions.

Keywords: Entropy, Fourier analysis, non-intrusive measurement, time-frequency analysis, partial discharge

Downloads 1578
7657 Semantic Indexing Approach of a Corpora Based On Ontology

Authors: Mohammed Erritali

Abstract:

The growth over centuries in the volume of text data, such as books and articles in libraries, has made it necessary to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing, and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus), allowing users to find the documents relevant to them, that is to say, the contents that match their information needs. This paper presents a new semantic indexing approach for a documentary corpus. The indexing process starts with a term weighting phase to determine the importance of terms in the documents. Then the use of a thesaurus like WordNet allows moving to the conceptual level. Each candidate concept is evaluated by determining how well it represents the document, that is to say, the importance of the concept in relation to the other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology the documents of the corpus in which that concept is found.
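The indexing pipeline outlined in this abstract (term weighting, mapping terms to concepts, attaching documents to concepts) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's method: the abstract does not name a weighting scheme, so TF-IDF is assumed here; the `term_to_concept` dictionary is a hypothetical stand-in for a thesaurus lookup such as WordNet; and a concept's score is assumed to be the sum of the weights of its terms in the document.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Weight each term per document (assumed scheme: TF-IDF).

    `docs` maps a document id to a non-empty list of tokens.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    weights = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        weights[doc_id] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def build_semantic_index(docs, term_to_concept, threshold=0.0):
    """Attach to each concept the documents in which it is represented.

    `term_to_concept` is a hypothetical term -> concept mapping standing
    in for a thesaurus. Concepts scoring at or below `threshold` in a
    document are not indexed for that document.
    """
    index = defaultdict(dict)
    for doc_id, term_weights in tf_idf(docs).items():
        scores = defaultdict(float)
        for term, weight in term_weights.items():
            concept = term_to_concept.get(term)
            if concept is not None:
                scores[concept] += weight
        for concept, score in scores.items():
            if score > threshold:
                index[concept][doc_id] = score
    return dict(index)


if __name__ == "__main__":
    # Toy corpus and mapping, invented for illustration.
    corpus = {
        "d1": ["car", "engine", "car"],
        "d2": ["automobile", "road"],
        "d3": ["river", "bank"],
    }
    mapping = {"car": "vehicle", "automobile": "vehicle", "river": "water"}
    print(build_semantic_index(corpus, mapping))
```

Because "car" and "automobile" map to the same concept, a query on that concept retrieves both documents even though they share no terms, which is the benefit of moving from the term level to the conceptual level.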

Keywords: Semantic, indexing, corpora, WordNet, ontology.

Downloads 1364
7656 Phytopathology Prediction in Dry Soil Using Artificial Neural Networks Modeling

Authors: F. Allag, S. Bouharati, M. Belmahdi, R. Zegadi

Abstract:

The rapid expansion of deserts in recent decades as a result of human actions combined with climatic changes has highlighted the necessity to understand biological processes in arid environments. Whereas physical processes and the biology of flora and fauna have been relatively well studied in marginally used arid areas, knowledge of desert soil micro-organisms remains fragmentary. The objective of this study is to conduct a diversity analysis of bacterial communities in unvegetated arid soils. Several biological phenomena in hot deserts are related to microbial populations and to the potential use of micro-organisms for restoring hot desert environments. Dry land ecosystems have a highly heterogeneous distribution of resources, with greater nutrient concentrations and microbial densities occurring in vegetated than in bare soils. In this work, we found it useful to apply artificial intelligence techniques, especially artificial neural networks (ANN), to process these data. The use of the ANN model demonstrates its capability for addressing complex problems involving uncertain data.

Keywords: Desert soil, Climatic changes, Bacteria, Vegetation, Artificial neural networks.

Downloads 1884