Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1639

Search results for: DNA database

1429 A Cloud-Based Spectrum Database Approach for Licensed Shared Spectrum Access

Authors: Hazem Abd El Megeed, Mohamed El-Refaay, Norhan Magdi Osman

Abstract:

Spectrum scarcity is a challenging obstacle in wireless communications systems. It hinders the introduction of innovative wireless services and technologies that require larger bandwidth comparing to legacy technologies. In addition, the current worldwide allocation of radio spectrum bands is already congested and can not afford additional squeezing or optimization to accommodate new wireless technologies. This challenge is a result of accumulative contributions from different factors that will be discussed later in this paper. One of these factors is the radio spectrum allocation policy governed by national regulatory authorities nowadays. The framework for this policy allocates specified portion of radio spectrum to a particular wireless service provider on exclusive utilization basis. This allocation is executed according to technical specification determined by the standard bodies of each Radio Access Technology (RAT). Dynamic access of spectrum is a framework for flexible utilization of radio spectrum resources. In this framework there is no exclusive allocation of radio spectrum and even the public safety agencies can share their spectrum bands according to a governing policy and service level agreements. In this paper, we explore different methods for accessing the spectrum dynamically and its associated implementation challenges.

Keywords: licensed shared access, cognitive radio, spectrum sharing, spectrum congestion, dynamic spectrum access, spectrum database, spectrum trading, reconfigurable radio systems, opportunistic spectrum allocation (OSA)

Procedia PDF Downloads 433

1428 Building Information Modeling-Based Information Exchange to Support Facilities Management Systems

Authors: Sandra T. Matarneh, Mark Danso-Amoako, Salam Al-Bizri, Mark Gaterell

Abstract:

Today’s facilities are ever more sophisticated and the need for available and reliable information for operation and maintenance activities is vital. The key challenge for facilities managers is to have real-time accurate and complete information to perform their day-to-day activities and to provide their senior management with accurate information for decision-making process. Currently, there are various technology platforms, data repositories, or database systems such as Computer-Aided Facility Management (CAFM) that are used for these purposes in different facilities. In most current practices, the data is extracted from paper construction documents and is re-entered manually in one of these computerized information systems. Construction Operations Building information exchange (COBie), is a non-proprietary data format that contains the asset non-geometric data which was captured and collected during the design and construction phases for owners and facility managers use. Recently software vendors developed add-in applications to generate COBie spreadsheet automatically. However, most of these add-in applications are capable of generating a limited amount of COBie data, in which considerable time is still required to enter the remaining data manually to complete the COBie spreadsheet. Some of the data which cannot be generated by these COBie add-ins is essential for facilities manager’s day-to-day activities such as job sheet which includes preventive maintenance schedules. To facilitate a seamless data transfer between BIM models and facilities management systems, we developed a framework that enables automated data generation using the data extracted directly from BIM models to external web database, and then enabling different stakeholders to access to the external web database to enter the required asset data directly to generate a rich COBie spreadsheet that contains most of the required asset data for efficient facilities management operations. The proposed framework is a part of ongoing research and will be demonstrated and validated on a typical university building. Moreover, the proposed framework supplements the existing body of knowledge in facilities management domain by providing a novel framework that facilitates seamless data transfer between BIM models and facilities management systems.

Keywords: building information modeling, BIM, facilities management systems, interoperability, information management

Procedia PDF Downloads 117

1427 Development of Internet of Things (IoT) with Mobile Voice Picking and Cargo Tracing Systems in Warehouse Operations of Third-Party Logistics

Authors: Eugene Y. C. Wong

Abstract:

The increased market competition, customer expectation, and warehouse operating cost in third-party logistics have motivated the continuous exploration in improving operation efficiency in warehouse logistics. Cargo tracing in ordering picking process consumes excessive time for warehouse operators when handling enormous quantities of goods flowing through the warehouse each day. Internet of Things (IoT) with mobile cargo tracing apps and database management systems are developed this research to facilitate and reduce the cargo tracing time in order picking process of a third-party logistics firm. An operation review is carried out in the firm with opportunities for improvement being identified, including inaccurate inventory record in warehouse management system, excessive tracing time on stored products, and product misdelivery. The facility layout has been improved by modifying the designated locations of various types of products. The relationship among the pick and pack processing time, cargo tracing time, delivery accuracy, inventory turnover, and inventory count operation time in the warehouse are evaluated. The correlation of the factors affecting the overall cycle time is analysed. A mobile app is developed with the use of MIT App Inventor and the Access management database to facilitate cargo tracking anytime anywhere. The information flow framework from warehouse database system to cloud computing document-sharing, and further to the mobile app device is developed. The improved performance on cargo tracing in the order processing cycle time of warehouse operators have been collected and evaluated. The developed mobile voice picking and tracking systems brings significant benefit to the third-party logistics firm, including eliminating unnecessary cargo tracing time in order picking process and reducing warehouse operators overtime cost. The mobile tracking device is further planned to enhance the picking time and cycle count of warehouse operators with voice picking system in the developed mobile apps as future development.

Keywords: warehouse, order picking process, cargo tracing, mobile app, third-party logistics

Procedia PDF Downloads 375

1426 A Framework for Security Risk Level Measures Using CVSS for Vulnerability Categories

Authors: Umesh Kumar Singh, Chanchala Joshi

Abstract:

With increasing dependency on IT infrastructure, the main objective of a system administrator is to maintain a stable and secure network, with ensuring that the network is robust enough against malicious network users like attackers and intruders. Security risk management provides a way to manage the growing threats to infrastructures or system. This paper proposes a framework for risk level estimation which uses vulnerability database National Institute of Standards and Technology (NIST) National Vulnerability Database (NVD) and the Common Vulnerability Scoring System (CVSS). The proposed framework measures the frequency of vulnerability exploitation; converges this measured frequency with standard CVSS score and estimates the security risk level which helps in automated and reasonable security management. In this paper equation for the Temporal score calculation with respect to availability of remediation plan is derived and further, frequency of exploitation is calculated with determined temporal score. The frequency of exploitation along with CVSS score is used to calculate the security risk level of the system. The proposed framework uses the CVSS vectors for risk level estimation and measures the security level of specific network environment, which assists system administrator for assessment of security risks and making decision related to mitigation of security risks.

Keywords: CVSS score, risk level, security measurement, vulnerability category

Procedia PDF Downloads 322

1425 Image Processing techniques for Surveillance in Outdoor Environment

Authors: Jayanth C., Anirudh Sai Yetikuri, Kavitha S. N.

Abstract:

This paper explores the development and application of computer vision and machine learning techniques for real-time pose detection, facial recognition, and number plate extraction. Utilizing MediaPipe for pose estimation, the research presents methods for detecting hand raises and ducking postures through real-time video analysis. Complementarily, facial recognition is employed to compare and verify individual identities using the face recognition library. Additionally, the paper demonstrates a robust approach for extracting and storing vehicle number plates from images, integrating Optical Character Recognition (OCR) with a database management system. The study highlights the effectiveness and versatility of these technologies in practical scenarios, including security and surveillance applications. The findings underscore the potential of combining computer vision techniques to address diverse challenges and enhance automated systems for both individual and vehicular identification. This research contributes to the fields of computer vision and machine learning by providing scalable solutions and demonstrating their applicability in real-world contexts.

Keywords: computer vision, pose detection, facial recognition, number plate extraction, machine learning, real-time analysis, OCR, database management

Procedia PDF Downloads 27

1424 Characteristics of Acute Poisoning in Emergency Departments: Multicenter Study in Korea

Authors: Hyuk-Hoon Kim, Young Gi Min

Abstract:

Background: Acute poisoning is the common cause of morbidity and mortality. Characteristics of acute poisoning differ between countries. While other countries operate the database system for poisoning, Korea has not collected the database for acute poisoning. Distribution of incidence of acute poisoning depending on the types of materials have also not studied in Korea. Our aims are to evaluate the etiologic and demographic characteristics of acute poisoning cases and to obtain up-to-date information on acute poisonings. Method: We retrospectively recorded cases of acute poisoning from eight emergency departments of second level or university hospitals from different cities in Gyeonggi province in Korea from April 2006 and March 2015. The distributions of incidence of acute poisoning depending on the types of materials are mapped by geographic information system. Result: A total of 3,449 poisoned cases were analyzed. Mean estimated age of patients was 39.56 ± 22.40 years. Mean male to female ratio of patients was 1:1.4. Mean proportion of intentional poisoning was 57.9%. Common materials are benzodiazepine (16.6%), carbon monoxide (10.5%), pesticide (8.1%) and zolpidem (7.1%) Common route of exposure is ingestion (79.5%) and followed by inhalation (16.5%). Common treatment methods are gastric lavage (20%) and activated charcoal (30%). Most cases had uneventful recovery; 61.4% were treated as outpatients and 0.1% of the poisoning resulted in death in ER. Conclusion: Even though the cases enrolled in our study is not the overall cases of acute poisoning in Korea, our study could be the basis of countermeasures for analysis and prevention of acute poisoning in Korea.

Keywords: acute poisoning, emergency department, epidemiology, Korea

Procedia PDF Downloads 405

1423 Medical Authorizations for Cannabis-Based Products in Canada: Sante Cannabis Data on Patient’s Safety and Treatment Profiles

Authors: Rihab Gamaoun, Cynthia El Hage, Laura Ruiz, Erin Prosk, Antonio Vigano

Abstract:

Introduction: Santé Cannabis (SC), a Canadian medical cannabis-specialized group of clinics based in Montreal and in the province of Québec, has served more than 5000 patients seeking cannabis-based treatment prescription for medical indications over the past five years. Within a research frame, data on the use of medical cannabis products from all the above patients were prospectively collected, leading to a large real-world database on the use of medical cannabis. The aim of this study was to gather information on the profiles of both patients and prescribed medical cannabis products at SC clinics and to assess the safety of medical cannabis among Canadian patients. Methods: Using a retrospective analysis of the database, records of 2585 patients who were prescribed medical cannabis products for therapeutic purposes between 01-November 2017 and 04-September 2019 were included. Patients’ demographics, primary diagnosis, route of administration, and chemovars recorded at the initial visits were investigated. Results: At baseline: 9% of SC patients were female, with a mean age of 57 (SD= 15.8, range= [18-96]); Cannabis products were prescribed mainly for patients with a diagnosis of chronic pain (65.9% of patients), cancer (9.4%), neurological disorders (6.5%), mood disorders (5.8 %) and inflammatory diseases (4.1%). Route of administration and chemovars of prescribed cannabis products were the following: 96% of patients received cannabis oil (51% CBD rich, 42.5% CBD:THC); 32.1% dried cannabis (21.3% CBD:THC, 7.4% THC rich, 3.4 CBD rich), and 2.1% oral spray cannabis (1.1% CBD:THC, 0.8% CBD rich, 0.2% THC rich). Most patients were prescribed simultaneously, a combination of products with different administration routes and chemovars. Safety analysis is undergoing. Conclusion: Our results provided initial information on the profile of medical cannabis products prescribed in a Canadian population and the experienced adverse events over the past three years. The Santé Cannabis database represents a unique opportunity for comparing clinical practices in prescribing and titrating cannabis-based medications across different centers. Ultimately real-world data, including information about safety and effectiveness, will help to create standardized and validated guidelines for choosing dose, route of administration, and chemovars types for the cannabis-based medication in different diseases and indications.

Keywords: medical cannabis, real-world data, safety, pharmacovigilance

Procedia PDF Downloads 108

1422 Developing an ANN Model to Predict Anthropometric Dimensions Based on Real Anthropometric Database

Authors: Waleed A. Basuliman, Khalid S. AlSaleh, Mohamed Z. Ramadan

Abstract:

Applying the anthropometric dimensions is considered one of the important factors when designing any human-machine system. In this study, the estimation of anthropometric dimensions has been improved by developing artificial neural network that aims to predict the anthropometric measurements of the male in Saudi Arabia. A total of 1427 Saudi males from age 6 to 60 participated in measuring twenty anthropometric dimensions. These anthropometric measurements are important for designing the majority of work and life applications in Saudi Arabia. The data were collected during 8 months from different locations in Riyadh City. Five of these dimensions were used as predictors variables (inputs) of the model, and the remaining fifteen dimensions were set to be the measured variables (outcomes). The hidden layers have been varied during the structuring stage, and the best performance was achieved with the network structure 6-25-15. The results showed that the developed Neural Network model was significantly able to predict the body dimensions for the population of Saudi Arabia. The network mean absolute percentage error (MAPE) and the root mean squared error (RMSE) were found 0.0348 and 3.225 respectively. The accuracy of the developed neural network was evaluated by compare the predicted outcomes with a multiple regression model. The ANN model performed better and resulted excellent correlation coefficients between the predicted and actual dimensions.

Keywords: artificial neural network, anthropometric measurements, backpropagation, real anthropometric database

Procedia PDF Downloads 578

1421 Improved Image Retrieval for Efficient Localization in Urban Areas Using Location Uncertainty Data

Authors: Mahdi Salarian, Xi Xu, Rashid Ansari

Abstract:

Accurate localization of mobile devices based on camera-acquired visual media information usually requires a search over a very large GPS-referenced image database. This paper proposes an efficient method for limiting the search space for image retrieval engine by extracting and leveraging additional media information about Estimated Positional Error (EP E) to address complexity and accuracy issues in the search, especially to be used for compensating GPS location inaccuracy in dense urban areas. The improved performance is achieved by up to a hundred-fold reduction in the search area used in available reference methods while providing improved accuracy. To test our procedure we created a database by acquiring Google Street View (GSV) images for down town of Chicago. Other available databases are not suitable for our approach due to lack of EP E for the query images. We tested the procedure using more than 200 query images along with EP E acquired mostly in the densest areas of Chicago with different phones and in different conditions such as low illumination and from under rail tracks. The effectiveness of our approach and the effect of size and sector angle of the search area are discussed and experimental results demonstrate how our proposed method can improve performance just by utilizing a data that is available for mobile systems such as smart phones.

Keywords: localization, retrieval, GPS uncertainty, bag of word

Procedia PDF Downloads 283

1420 Automatic Target Recognition in SAR Images Based on Sparse Representation Technique

Authors: Ahmet Karagoz, Irfan Karagoz

Abstract:

Synthetic Aperture Radar (SAR) is a radar mechanism that can be integrated into manned and unmanned aerial vehicles to create high-resolution images in all weather conditions, regardless of day and night. In this study, SAR images of military vehicles with different azimuth and descent angles are pre-processed at the first stage. The main purpose here is to reduce the high speckle noise found in SAR images. For this, the Wiener adaptive filter, the mean filter, and the median filters are used to reduce the amount of speckle noise in the images without causing loss of data. During the image segmentation phase, pixel values are ordered so that the target vehicle region is separated from other regions containing unnecessary information. The target image is parsed with the brightest 20% pixel value of 255 and the other pixel values of 0. In addition, by using appropriate parameters of statistical region merging algorithm, segmentation comparison is performed. In the step of feature extraction, the feature vectors belonging to the vehicles are obtained by using Gabor filters with different orientation, frequency and angle values. A number of Gabor filters are created by changing the orientation, frequency and angle parameters of the Gabor filters to extract important features of the images that form the distinctive parts. Finally, images are classified by sparse representation method. In the study, l₁ norm analysis of sparse representation is used. A joint database of the feature vectors generated by the target images of military vehicle types is obtained side by side and this database is transformed into the matrix form. In order to classify the vehicles in a similar way, the test images of each vehicle is converted to the vector form and l₁ norm analysis of the sparse representation method is applied through the existing database matrix form. As a result, correct recognition has been performed by matching the target images of military vehicles with the test images by means of the sparse representation method. 97% classification success of SAR images of different military vehicle types is obtained.

Keywords: automatic target recognition, sparse representation, image classification, SAR images

Procedia PDF Downloads 367

1419 Latitudinal Impact on Spatial and Temporal Variability of 7Be Activity Concentrations in Surface Air along Europe

Authors: M. A. Hernández-Ceballos, M. Marín-Ferrer, G. Cinelli, L. De Felice, T. Tollefsen, E. Nweke, P. V. Tognoli, S. Vanzo, M. De Cort

Abstract:

This study analyses the latitudinal impact of the spatial and temporal distribution on the cosmogenic isotope 7Be in surface air along Europe. The long-term database of the 6 sampling sites (Ivalo, Helsinki, Berlin, Freiburg, Sevilla and La Laguna), that regularly provide data to the Radioactivity Environmental Monitoring (REM) network managed by the Joint Research Centre (JRC) in Ispra, were used. The selection of the stations was performed attending to different factors, such as 1) heterogeneity in terms of latitude and altitude, and 2) long database coverage. The combination of these two parameters ensures a high degree of representativeness of the results. In the later, the temporal coverage varies between stations, being used in the present study sampling stations with a database more or less continuously from 1984 to 2011. The mean values of 7Be activity concentration presented a spatial distribution value ranging from 2.0 ± 0.9 mBq/m3 (Ivalo, north) to 4.8 ± 1.5 mBq/m3 (La Laguna, south). An increasing gradient with latitude was observed from the north to the south, 0.06 mBq/m3. However, there was no correlation with altitude, since all stations are sited within the atmospheric boundary layer. The analyses of the data indicated a dynamic range of 7Be activity for solar cycle and phase (maximum or minimum), having been observed different impact on stations according to their location. The results indicated a significant seasonal behavior, with the maximum concentrations occurring in the summer and minimum in the winter, although with differences in the values reached and in the month registered. Due to the large heterogeneity in the temporal pattern with which the individual radionuclide analyses were performed in each station, the 7Be monthly index was calculated to normalize the measurements and perform the direct comparison of monthly evolution among stations. Different intensity and evolution of the mean monthly index were observed. The knowledge of the spatial and temporal distribution of this natural radionuclide in the atmosphere is a key parameter for modeling studies of atmospheric processes, which are important phenomena to be taken into account in the case of a nuclear accident.

Keywords: Berilium-7, latitudinal impact in Europe, seasonal and monthly variability, solar cycle

Procedia PDF Downloads 339

1418 Expert Supporting System for Diagnosing Lymphoid Neoplasms Using Probabilistic Decision Tree Algorithm and Immunohistochemistry Profile Database

Authors: Yosep Chong, Yejin Kim, Jingyun Choi, Hwanjo Yu, Eun Jung Lee, Chang Suk Kang

Abstract:

For the past decades, immunohistochemistry (IHC) has been playing an important role in the diagnosis of human neoplasms, by helping pathologists to make a clearer decision on differential diagnosis, subtyping, personalized treatment plan, and finally prognosis prediction. However, the IHC performed in various tumors of daily practice often shows conflicting and very challenging results to interpret. Even comprehensive diagnosis synthesizing clinical, histologic and immunohistochemical findings can be helpless in some twisted cases. Another important issue is that the IHC data is increasing exponentially and more and more information have to be taken into account. For this reason, we reached an idea to develop an expert supporting system to help pathologists to make a better decision in diagnosing human neoplasms with IHC results. We gave probabilistic decision tree algorithm and tested the algorithm with real case data of lymphoid neoplasms, in which the IHC profile is more important to make a proper diagnosis than other human neoplasms. We designed probabilistic decision tree based on Bayesian theorem, program computational process using MATLAB (The MathWorks, Inc., USA) and prepared IHC profile database (about 104 disease category and 88 IHC antibodies) based on WHO classification by reviewing the literature. The initial probability of each neoplasm was set with the epidemiologic data of lymphoid neoplasm in Korea. With the IHC results of 131 patients sequentially selected, top three presumptive diagnoses for each case were made and compared with the original diagnoses. After the review of the data, 124 out of 131 were used for final analysis. As a result, the presumptive diagnoses were concordant with the original diagnoses in 118 cases (93.7%). The major reason of discordant cases was that the similarity of the IHC profile between two or three different neoplasms. The expert supporting system algorithm presented in this study is in its elementary stage and need more optimization using more advanced technology such as deep-learning with data of real cases, especially in differentiating T-cell lymphomas. Although it needs more refinement, it may be used to aid pathological decision making in future. A further application to determine IHC antibodies for a certain subset of differential diagnoses might be possible in near future.

Keywords: database, expert supporting system, immunohistochemistry, probabilistic decision tree

Procedia PDF Downloads 225

1417 Static Analysis of Security Issues of the Python Packages Ecosystem

Authors: Adam Gorine, Faten Spondon

Abstract:

Python is considered the most popular programming language and offers its own ecosystem for archiving and maintaining open-source software packages. This system is called the python package index (PyPI), the repository of this programming language. Unfortunately, one-third of these software packages have vulnerabilities that allow attackers to execute code automatically when a vulnerable or malicious package is installed. This paper contributes to large-scale empirical studies investigating security issues in the python ecosystem by evaluating package vulnerabilities. These provide a series of implications that can help the security of software ecosystems by improving the process of discovering, fixing, and managing package vulnerabilities. The vulnerable dataset is generated using the NVD, the national vulnerability database, and the Snyk vulnerability dataset. In addition, we evaluated 807 vulnerability reports in the NVD and 3900 publicly known security vulnerabilities in Python Package Manager (pip) from the Snyk database from 2002 to 2022. As a result, many Python vulnerabilities appear in high severity, followed by medium severity. The most problematic areas have been improper input validation and denial of service attacks. A hybrid scanning tool that combines the three scanners bandit, snyk and dlint, which provide a clear report of the code vulnerability, is also described.

Keywords: Python vulnerabilities, bandit, Snyk, Dlint, Python package index, ecosystem, static analysis, malicious attacks

Procedia PDF Downloads 140

1416 A Pattern Recognition Neural Network Model for Detection and Classification of SQL Injection Attacks

Authors: Naghmeh Moradpoor Sheykhkanloo

Abstract:

Structured Query Language Injection (SQLI) attack is a code injection technique in which malicious SQL statements are inserted into a given SQL database by simply using a web browser. Losing data, disclosing confidential information or even changing the value of data are the severe damages that SQLI attack can cause on a given database. SQLI attack has also been rated as the number-one attack among top ten web application threats on Open Web Application Security Project (OWASP). OWASP is an open community dedicated to enabling organisations to consider, develop, obtain, function, and preserve applications that can be trusted. In this paper, we propose an effective pattern recognition neural network model for detection and classification of SQLI attacks. The proposed model is built from three main elements of: a Uniform Resource Locator (URL) generator in order to generate thousands of malicious and benign URLs, a URL classifier in order to: 1) classify each generated URL to either a benign URL or a malicious URL and 2) classify the malicious URLs into different SQLI attack categories, and an NN model in order to: 1) detect either a given URL is a malicious URL or a benign URL and 2) identify the type of SQLI attack for each malicious URL. The model is first trained and then evaluated by employing thousands of benign and malicious URLs. The results of the experiments are presented in order to demonstrate the effectiveness of the proposed approach.

Keywords: neural networks, pattern recognition, SQL injection attacks, SQL injection attack classification, SQL injection attack detection

Procedia PDF Downloads 470

1415 Sequence Analysis and Molecular Cloning of PROTEOLYSIS 6 in Tomato

Authors: Nurulhikma Md Isa, Intan Elya Suka, Nur Farhana Roslan, Chew Bee Lynn

Abstract:

The evolutionarily conserved N-end rule pathway marks proteins for degradation by the Ubiquitin Proteosome System (UPS) based on the nature of their N-terminal residue. Proteins with a destabilizing N-terminal residue undergo a series of condition-dependent N-terminal modifications, resulting in their ubiquitination and degradation. Intensive research has been carried out in Arabidopsis previously. The group VII Ethylene Response Factor (ERFs) transcription factors are the first N-end rule pathway substrates found in Arabidopsis and their role in regulating oxygen sensing. ERFs also function as central hubs for the perception of gaseous signals in plants and control different plant developmental including germination, stomatal aperture, hypocotyl elongation and stress responses. However, nothing is known about the role of this pathway during fruit development and ripening aspect. The plant model system Arabidopsis cannot represent fleshy fruit model system therefore tomato is the best model plant to study. PROTEOLYSIS6 (PRT6) is an E3 ubiquitin ligase of the N-end rule pathway. Two homologs of PRT6 sequences have been identified in tomato genome database using the PRT6 protein sequence from model plant Arabidopsis thaliana. Homology search against Ensemble Plant database (tomato) showed Solyc09g010830.2 is the best hit with highest score of 1143, e-value of 0.0 and 61.3% identity compare to the second hit Solyc10g084760.1. Further homology search was done using NCBI Blast database to validate the data. The result showed best gene hit was XP_010325853.1 of uncharacterized protein LOC101255129 (Solanum lycopersicum) with highest score of 1601, e-value 0.0 and 48% identity. Both Solyc09g010830.2 and uncharacterized protein LOC101255129 were genes located at chromosome 9. Further validation was carried out using BLASTP program between these two sequences (Solyc09g010830.2 and uncharacterized protein LOC101255129) to investigate whether they were the same proteins represent PRT6 in tomato. Results showed that both proteins have 100 % identity, indicates that they were the same gene represents PRT6 in tomato. In addition, we used two different RNAi constructs that were driven under 35S and Polygalacturonase (PG) promoters to study the function of PRT6 during tomato developmental stages and ripening processes.

Keywords: ERFs, PRT6, tomato, ubiquitin

Procedia PDF Downloads 241

1414 Effective Nutrition Label Use on Smartphones

Authors: Vladimir Kulyukin, Tanwir Zaman, Sarat Kiran Andhavarapu

Abstract:

Research on nutrition label use identifies four factors that impede comprehension and retention of nutrition information by consumers: label’s location on the package, presentation of information within the label, label’s surface size, and surrounding visual clutter. In this paper, a system is presented that makes nutrition label use more effective for nutrition information comprehension and retention. The system’s front end is a smartphone application. The system’s back end is a four node Linux cluster for image recognition and data storage. Image frames captured on the smartphone are sent to the back end for skewed or aligned barcode recognition. When barcodes are recognized, corresponding nutrition labels are retrieved from a cloud database and presented to the user on the smartphone’s touchscreen. Each displayed nutrition label is positioned centrally on the touchscreen with no surrounding visual clutter. Wikipedia links to important nutrition terms are embedded to improve comprehension and retention of nutrition information. Standard touch gestures (e.g., zoom in/out) available on mainstream smartphones are used to manipulate the label’s surface size. The nutrition label database currently includes 200,000 nutrition labels compiled from public web sites by a custom crawler. Stress test experiments with the node cluster are presented. Implications for proactive nutrition management and food policy are discussed.

Keywords: mobile computing, cloud computing, nutrition label use, nutrition management, barcode scanning

Procedia PDF Downloads 375

1413 Leveraging SHAP Values for Effective Feature Selection in Peptide Identification

Authors: Sharon Li, Zhonghang Xia

Abstract:

Post-database search is an essential phase in peptide identification using tandem mass spectrometry (MS/MS) to refine peptide-spectrum matches (PSMs) produced by database search engines. These engines frequently face difficulty differentiating between correct and incorrect peptide assignments. Despite advances in statistical and machine learning methods aimed at improving the accuracy of peptide identification, challenges remain in selecting critical features for these models. In this study, two machine learning models—a random forest tree and a support vector machine—were applied to three datasets to enhance PSMs. SHAP values were utilized to determine the significance of each feature within the models. The experimental results indicate that the random forest model consistently outperformed the SVM across all datasets. Further analysis of SHAP values revealed that the importance of features varies depending on the dataset, indicating that a feature's role in model predictions can differ significantly. This variability in feature selection can lead to substantial differences in model performance, with false discovery rate (FDR) differences exceeding 50% between different feature combinations. Through SHAP value analysis, the most effective feature combinations were identified, significantly enhancing model performance.

Keywords: peptide identification, SHAP value, feature selection, random forest tree, support vector machine

Procedia PDF Downloads 30

1412 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques

Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian

Abstract:

Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damages to health and the environment, economic losses and material damages. Studies about traditional road traffic accidents in urban zones represents very high inversion of time and money, additionally, the result are not current. However, nowadays in many countries, the crowdsourced GPS based traffic and navigation apps have emerged as an important source of information to low cost to studies of road traffic accidents and urban congestion caused by them. In this article we identified the zones, roads and specific time in the CDMX in which the largest number of road traffic accidents are concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Discovery of knowledge in the database (KDD) for the discovery of patterns in the accidents reports. Furthermore, using data mining techniques with the help of Weka. The selected algorithms was the Maximization of Expectations (EM) to obtain the number ideal of clusters for the data and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.

Keywords: data mining, k-means, road traffic accidents, Waze, Weka

Procedia PDF Downloads 418

1411 Evaluation of Commercial Back-analysis Package in Condition Assessment of Railways

Authors: Shadi Fathi, Moura Mehravar, Mujib Rahman

Abstract:

Over the years,increased demands on railways, the emergence of high-speed trains and heavy axle loads, ageing, and deterioration of the existing tracks, is imposing costly maintenance actions on the railway sector. The need for developing a fast andcost-efficient non-destructive assessment method for the structural evaluation of railway tracksis therefore critically important. The layer modulus is the main parameter used in the structural design and evaluation of the railway track substructure (foundation). Among many recently developed NDTs, Falling Weight Deflectometer (FWD) test, widely used in pavement evaluation, has shown promising results for railway track substructure monitoring. The surface deflection data collected by FWD are used to estimate the modulus of substructure layers through the back-analysis technique. Although there are different commerciallyavailableback-analysis programs are used for pavement applications, there are onlya limited number of research-based techniques have been so far developed for railway track evaluation. In this paper, the suitability, accuracy, and reliability of the BAKFAAsoftware are investigated. The main rationale for selecting BAKFAA as it has a relatively straightforward user interfacethat is freely available and widely used in highway and airport pavement evaluation. As part of the study, a finite element (FE) model of a railway track section near Leominsterstation, Herefordshire, UK subjected to the FWD test, was developed and validated against available field data. Then, a virtual experimental database (including 218 sets of FWD testing data) was generated using theFE model and employed as the measured database for the BAKFAA software. This database was generated considering various layers’ moduli for each layer of track substructure over a predefined range. The BAKFAA predictions were compared against the cone penetration test (CPT) data (available from literature; conducted near to Leominster station same section as the FWD was performed). The results reveal that BAKFAA overestimatesthe layers’ moduli of each substructure layer. To adjust the BAKFA with the CPT data, this study introduces a correlation model to make the BAKFAA applicable in railway applications.

Keywords: back-analysis, bakfaa, railway track substructure, falling weight deflectometer (FWD), cone penetration test (CPT)

Procedia PDF Downloads 130

1410 Evaluation of NoSQL in the Energy Marketplace with GraphQL Optimization

Authors: Michael Howard

Abstract:

The growing popularity of electric vehicles in the United States requires an ever-expanding infrastructure of commercial DC fast charging stations. The U.S. Department of Energy estimates 33,355 publicly available DC fast charging stations as of September 2023. In 2017, 115,370 gasoline stations were operating in the United States, much more ubiquitous than DC fast chargers. Range anxiety is an important impediment to the adoption of electric vehicles and is even more relevant in underserved regions in the country. The peer-to-peer energy marketplace helps fill the demand by allowing private home and small business owners to rent their 240 Volt, level-2 charging facilities. The existing, publicly accessible outlets are wrapped with a Cloud-connected microcontroller managing security and charging sessions. These microcontrollers act as Edge devices communicating with a Cloud message broker, while both buyer and seller users interact with the framework via a web-based user interface. The database storage used by the marketplace framework is a key component in both the cost of development and the performance that contributes to the user experience. A traditional storage solution is the SQL database. The architecture and query language have been in existence since the 1970s and are well understood and documented. The Structured Query Language supported by the query engine provides fine granularity with user query conditions. However, difficulty in scaling across multiple nodes and cost of its server-based compute have resulted in a trend in the last 20 years towards other NoSQL, serverless approaches. In this study, we evaluate the NoSQL vs. SQL solutions through a comparison of Google Cloud Firestore and Cloud SQL MySQL offerings. The comparison pits Google's serverless, document-model, non-relational, NoSQL against the server-base, table-model, relational, SQL service. The evaluation is based on query latency, flexibility/scalability, and cost criteria. Through benchmarking and analysis of the architecture, we determine whether Firestore can support the energy marketplace storage needs and if the introduction of a GraphQL middleware layer can overcome its deficiencies.

Keywords: non-relational, relational, MySQL, mitigate, Firestore, SQL, NoSQL, serverless, database, GraphQL

Procedia PDF Downloads 63

1409 A Geo DataBase to Investigate the Maximum Distance Error in Quality of Life Studies

Authors: Paolino Di Felice

Abstract:

The background and significance of this study come from papers already appeared in the literature which measured the impact of public services (e.g., hospitals, schools, ...) on the citizens’ needs satisfaction (one of the dimensions of QOL studies) by calculating the distance between the place where they live and the location on the territory of the services. Those studies assume that the citizens' dwelling coincides with the centroid of the polygon that expresses the boundary of the administrative district, within the city, they belong to. Such an assumption “introduces a maximum measurement error equal to the greatest distance between the centroid and the border of the administrative district.”. The case study, this abstract reports about, investigates the implications descending from the adoption of such an approach but at geographical scales greater than the urban one, namely at the three levels of nesting of the Italian administrative units: the (20) regions, the (110) provinces, and the 8,094 municipalities. To carry out this study, it needs to be decided: a) how to store the huge amount of (spatial and descriptive) input data and b) how to process them. The latter aspect involves: b.1) the design of algorithms to investigate the geometry of the boundary of the Italian administrative units; b.2) their coding in a programming language; b.3) their execution and, eventually, b.4) archiving the results in a permanent support. The IT solution we implemented is centered around a (PostgreSQL/PostGIS) Geo DataBase structured in terms of three tables that fit well to the hierarchy of nesting of the Italian administrative units: municipality(id, name, provinceId, istatCode, regionId, geometry) province(id, name, regionId, geometry) region(id, name, geometry). The adoption of the DBMS technology allows us to implement the steps "a)" and "b)" easily. In particular, step "b)" is simplified dramatically by calling spatial operators and spatial built-in User Defined Functions within SQL queries against the Geo DB. The major findings coming from our experiments can be summarized as follows. The approximation that, on the average, descends from assimilating the residence of the citizens with the centroid of the administrative unit of reference is of few kilometers (4.9) at the municipalities level, while it becomes conspicuous at the other two levels (28.9 and 36.1, respectively). Therefore, studies such as those mentioned above can be extended up to the municipal level without affecting the correctness of the interpretation of the results, but not further. The IT framework implemented to carry out the experiments can be replicated for studies referring to the territory of other countries all over the world.

Keywords: quality of life, distance measurement error, Italian administrative units, spatial database

Procedia PDF Downloads 373

1408 State Estimator Performance Enhancement: Methods for Identifying Errors in Modelling and Telemetry

Authors: M. Ananthakrishnan, Sunil K Patil, Koti Naveen, Inuganti Hemanth Kumar

Abstract:

State estimation output of EMS forms the base case for all other advanced applications used in real time by a power system operator. Ensuring tuning of state estimator is a repeated process and cannot be left once a good solution is obtained. This paper attempts to demonstrate methods to improve state estimator solution by identifying incorrect modelling and telemetry inputs to the application. In this work, identification of database topology modelling error by plotting static network using node-to-node connection details is demonstrated with examples. Analytical methods to identify wrong transmission parameters, incorrect limits and mistakes in pseudo load and generator modelling are explained with various cases observed. Further, methods used for active and reactive power tuning using bus summation display, reactive power absorption summary, and transformer tap correction are also described. In a large power system, verifying all network static data and modelling parameter on regular basis is difficult .The proposed tuning methods can be easily used by operators to quickly identify errors to obtain the best possible state estimation performance. This, in turn, can lead to improved decision-support capabilities, ultimately enhancing the safety and reliability of the power grid.

Keywords: active power tuning, database modelling, reactive power, state estimator

Procedia PDF Downloads 11

1407 Finite Element Analysis of Raft Foundation on Various Soil Types under Earthquake Loading

Authors: Qassun S. Mohammed Shafiqu, Murtadha A. Abdulrasool

Abstract:

The design of shallow foundations to withstand different dynamic loads has given considerable attention in recent years. Dynamic loads may be due to the earthquakes, pile driving, blasting, water waves, and machine vibrations. But, predicting the behavior of shallow foundations during earthquakes remains a difficult task for geotechnical engineers. A database for dynamic and static parameters for different soils in seismic active zones in Iraq is prepared which has been collected from geophysical and geotechnical investigation works. Then, analysis of a typical 3-D soil-raft foundation system under earthquake loading is carried out using the database. And a parametric study has been carried out taking into consideration the influence of some parameters on the dynamic behavior of the raft foundation, such as raft stiffness, damping ratio as well as the influence of the earthquake acceleration-time records. The results of the parametric study show that the settlement caused by the earthquake can be decreased by about 72% with increasing the thickness from 0.5 m to 1.5 m. But, it has been noticed that reduction in the maximum bending moment by about 82% was predicted by decreasing the raft thickness from 1.5 m to 0.5 m in all sites model. Also, it has been observed that the maximum lateral displacement, the maximum vertical settlement and the maximum bending moment for damping ratio 0% is about 14%, 20%, and 18% higher than that for damping ratio 7.5%, respectively for all sites model.

Keywords: shallow foundation, seismic behavior, raft thickness, damping ratio

Procedia PDF Downloads 149

1406 KPI and Tool for the Evaluation of Competency in Warehouse Management for Furniture Business

Authors: Kritchakhris Na-Wattanaprasert

Abstract:

The objective of this research is to design and develop a prototype of a key performance indicator system this is suitable for warehouse management in a case study and use requirement. In this study, we design a prototype of key performance indicator system (KPI) for warehouse case study of furniture business by methodology in step of identify scope of the research and study related papers, gather necessary data and users requirement, develop key performance indicator base on balance scorecard, design pro and database for key performance indicator, coding the program and set relationship of database and finally testing and debugging each module. This study use Balance Scorecard (BSC) for selecting and grouping key performance indicator. The system developed by using Microsoft SQL Server 2010 is used to create the system database. In regard to visual-programming language, Microsoft Visual C# 2010 is chosen as the graphic user interface development tool. This system consists of six main menus: menu login, menu main data, menu financial perspective, menu customer perspective, menu internal, and menu learning and growth perspective. Each menu consists of key performance indicator form. Each form contains a data import section, a data input section, a data searches – edit section, and a report section. The system generates outputs in 5 main reports, the KPI detail reports, KPI summary report, KPI graph report, benchmarking summary report and benchmarking graph report. The user will select the condition of the report and period time. As the system has been developed and tested, discovers that it is one of the ways to judging the extent to warehouse objectives had been achieved. Moreover, it encourages the warehouse functional proceed with more efficiency. In order to be useful propose for other industries, can adjust this system appropriately. To increase the usefulness of the key performance indicator system, the recommendations for further development are as follows: -The warehouse should review the target value and set the better suitable target periodically under the situation fluctuated in the future. -The warehouse should review the key performance indicators and set the better suitable key performance indicators periodically under the situation fluctuated in the future for increasing competitiveness and take advantage of new opportunities.

Keywords: key performance indicator, warehouse management, warehouse operation, logistics management

Procedia PDF Downloads 432

1405 Factors Affecting Cesarean Section among Women in Qatar Using Multiple Indicator Cluster Survey Database

Authors: Sahar Elsaleh, Ghada Farhat, Shaikha Al-Derham, Fasih Alam

Abstract:

Background: Cesarean section (CS) delivery is one of the major concerns both in developing and developed countries. The rate of CS deliveries are on the rise globally, and especially in Qatar. Many socio-economic, demographic, clinical and institutional factors play an important role for cesarean sections. This study aims to investigate factors affecting the prevalence of CS among women in Qatar using the UNICEF’s Multiple Indicator Cluster Survey (MICS) 2012 database. Methods: The study has focused on the women’s questionnaire of the MICS, which was successfully distributed to 5699 participants. Following study inclusion and exclusion criteria, a final sample of 761 women aged 19- 49 years who had at least one delivery of giving birth in their lifetime before the survey were included. A number of socio-economic, demographic, clinical and institutional factors, identified through literature review and available in the data, were considered for the analyses. Bivariate and multivariate logistic regression models, along with a multi-level modeling to investigate clustering effect, were undertaken to identify the factors that affect CS prevalence in Qatar. Results: From the bivariate analyses the study has shown that, a number of categorical factors are statistically significantly associated with the dependent variable (CS). When identifying the factors from a multivariate logistic regression, the study found that only three categorical factors -‘age of women’, ‘place at delivery’ and ‘baby weight’ appeared to be significantly affecting the CS among women in Qatar. Although the MICS dataset is based on a cluster survey, an exploratory multi-level analysis did not show any clustering effect, i.e. no significant variation in results at higher level (households), suggesting that all analyses at lower level (individual respondent) are valid without any significant bias in results. Conclusion: The study found a statistically significant association between the dependent variable (CS delivery) and age of women, frequency of TV watching, assistance at birth and place of birth. These results need to be interpreted cautiously; however, it can be used as evidence-base for further research on cesarean section delivery in Qatar.

Keywords: cesarean section, factors, multiple indicator cluster survey, MICS database, Qatar

Procedia PDF Downloads 118

1404 Simultaneous Measurement of Wave Pressure and Wind Speed with the Specific Instrument and the Unit of Measurement Description

Authors: Branimir Jurun, Elza Jurun

Abstract:

The focus of this paper is the description of an instrument called 'Quattuor 45' and defining of wave pressure measurement. Special attention is given to measurement of wave pressure created by the wind speed increasing obtained with the instrument 'Quattuor 45' in the investigated area. The study begins with respect to theoretical attitudes and numerous up to date investigations related to the waves approaching the coast. The detailed schematic view of the instrument is enriched with pictures from ground plan and side view. Horizontal stability of the instrument is achieved by mooring which relies on two concrete blocks. Vertical wave peak monitoring is ensured by one float above the instrument. The synthesis of horizontal stability and vertical wave peak monitoring allows to create a representative database for wave pressure measuring. Instrument ‘Quattuor 45' is named according to the way the database is received. Namely, the electronic part of the instrument consists of the main chip ‘Arduino', its memory, four load cells with the appropriate modules and the wind speed sensor 'Anemometers'. The 'Arduino' chip is programmed to store two data from each load cell and two data from the anemometer on SD card each second. The next part of the research is dedicated to data processing. All measured results are stored automatically in the database and after that detailed processing is carried out in the MS Excel. The result of the wave pressure measurement is synthesized by the unit of measurement kN/m². This paper also suggests a graphical presentation of the results by multi-line graph. The wave pressure is presented on the left vertical axis, while the wind speed is shown on the right vertical axis. The time of measurement is displayed on the horizontal axis. The paper proposes an algorithm for wind speed measurements showing the results for two characteristic winds in the Adriatic Sea, called 'Bura' and 'Jugo'. The first of them is the northern wind that reaches high speeds, causing low and extremely steep waves, where the pressure of the wave is relatively weak. On the other hand, the southern wind 'Jugo' has a lower speed than the northern wind, but due to its constant duration and constant speed maintenance, it causes extremely long and high waves that cause extremely high wave pressure.

Keywords: instrument, measuring unit, waves pressure metering, wind seed measurement

Procedia PDF Downloads 198

1403 Molecular Portraits: The Role of Posttranslational Modification in Cancer Metastasis

Authors: Navkiran Kaur, Apoorva Mathur, Abhishree Agarwal, Sakshi Gupta, Tuhin Rashmi

Abstract:

Aim: Breast cancer is the most common cancer in women worldwide, and resistance to the current therapeutics, often concurrently, is an increasing clinical challenge. Glycosylation of proteins is one of the most important post-translational modifications. It is widely known that aberrant glycosylation has been implicated in many different diseases due to changes associated with biological function and protein folding. Alterations in cell surface glycosylation, can promote invasive behavior of tumor cells that ultimately lead to the progression of cancer. In breast cancer, there is an increasing evidence pertaining to the role of glycosylation in tumor formation and metastasis. In the present study, an attempt has been made to study the disease associated sialoglycoproteins in breast cancer by using bioinformatics tools. The sequence will be retrieved from UniProt database. A database in the form of a word document was made by a collection of FASTA sequences of breast cancer gene sequence. Glycosylation was studied using yinOyang tool on ExPASy and Differential genes expression and protein analysis was done in context of breast cancer metastasis. The number of residues predicted O-glc NAc threshold containing 50 aberrant glycosylation sites or more was detected and recorded for individual sequence. We found that the there is a significant change in the expression profiling of glycosylation patterns of various proteins associated with breast cancer. Differential aberrant glycosylated proteins in breast cancer cells with respect to non-neoplastic cells are an important factor for the overall progression and development of cancer.

Keywords: breast cancer, bioinformatics, cancer, metastasis, glycosylation

Procedia PDF Downloads 294

1402 A Pipeline for Detecting Copy Number Variation from Whole Exome Sequencing Using Comprehensive Tools

Authors: Cheng-Yang Lee, Petrus Tang, Tzu-Hao Chang

Abstract:

Copy number variations (CNVs) have played an important role in many kinds of human diseases, such as Autism, Schizophrenia and a number of cancers. Many diseases are found in genome coding regions and whole exome sequencing (WES) is a cost-effective and powerful technology in detecting variants that are enriched in exons and have potential applications in clinical setting. Although several algorithms have been developed to detect CNVs using WES and compared with other algorithms for finding the most suitable methods using their own samples, there were not consistent datasets across most of algorithms to evaluate the ability of CNV detection. On the other hand, most of algorithms is using command line interface that may greatly limit the analysis capability of many laboratories. We create a series of simulated WES datasets from UCSC hg19 chromosome 22, and then evaluate the CNV detective ability of 19 algorithms from OMICtools database using our simulated WES datasets. We compute the sensitivity, specificity and accuracy in each algorithm for validation of the exome-derived CNVs. After comparison of 19 algorithms from OMICtools database, we construct a platform to install all of the algorithms in a virtual machine like VirtualBox which can be established conveniently in local computers, and then create a simple script that can be easily to use for detecting CNVs using algorithms selected by users. We also build a table to elaborate on many kinds of events, such as input requirement, CNV detective ability, for all of the algorithms that can provide users a specification to choose optimum algorithms.

Keywords: whole exome sequencing, copy number variations, omictools, pipeline

Procedia PDF Downloads 320

1401 Analysis of Brownfield Soil Contamination Using Local Government Planning Data

Authors: Emma E. Hellawell, Susan J. Hughes

Abstract:

BBrownfield sites are currently being redeveloped for residential use. Information on soil contamination on these former industrial sites is collected as part of the planning process by the local government. This research project analyses this untapped resource of environmental data, using site investigation data submitted to a local Borough Council, in Surrey, UK. Over 150 site investigation reports were collected and interrogated to extract relevant information. This study involved three phases. Phase 1 was the development of a database for soil contamination information from local government reports. This database contained information on the source, history, and quality of the data together with the chemical information on the soil that was sampled. Phase 2 involved obtaining site investigation reports for development within the study area and extracting the required information for the database. Phase 3 was the data analysis and interpretation of key contaminants to evaluate typical levels of contaminants, their distribution within the study area, and relating these results to current guideline levels of risk for future site users. Preliminary results for a pilot study using a sample of the dataset have been obtained. This pilot study showed there is some inconsistency in the quality of the reports and measured data, and careful interpretation of the data is required. Analysis of the information has found high levels of lead in shallow soil samples, with mean and median levels exceeding the current guidance for residential use. The data also showed elevated (but below guidance) levels of potentially carcinogenic polyaromatic hydrocarbons. Of particular concern from the data was the high detection rate for asbestos fibers. These were found at low concentrations in 25% of the soil samples tested (however, the sample set was small). Contamination levels of the remaining chemicals tested were all below the guidance level for residential site use. These preliminary pilot study results will be expanded, and results for the whole local government area will be presented at the conference. The pilot study has demonstrated the potential for this extensive dataset to provide greater information on local contamination levels. This can help inform regulators and developers and lead to more targeted site investigations, improving risk assessments, and brownfield development.

Keywords: Brownfield development, contaminated land, local government planning data, site investigation

Procedia PDF Downloads 140

1400 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English

Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista

Abstract:

The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, as well as, to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words. In its present state, the lemmatiser Norna can assign lemma to around 80% of textual forms on an automatic basis, by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation insist on the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is pending for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.

Keywords: corpus linguistics, historical linguistics, old English, parallel corpus

Procedia PDF Downloads 213