Search results for: k nearest neighbor classifier
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 666

Search results for: k nearest neighbor classifier

6 Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, greatly suffer from vanishing gradient and low-efficiency problem when they sequentially process past states and compresses contextual information into a bottleneck with long input sequences. In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations of previous approaches, a transformer model was built in this work as a variation to the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with attention mechanism. In a previous work on genetic circuit design, the traditional approaches to classification and regression, such as Random Forrest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R2 accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: transformers, generative ai, gene expression design, classification

Procedia PDF Downloads 33
5 Management of the Experts in the Research Evaluation System of the University: Based on National Research University Higher School of Economics Example

Authors: Alena Nesterenko, Svetlana Petrikova

Abstract:

Research evaluation is one of the most important elements of self-regulation and development of researchers as it is impartial and independent process of assessment. The method of expert evaluations as a scientific instrument solving complicated non-formalized problems is firstly a scientifically sound way to conduct the assessment which maximum effectiveness of work at every step and secondly the usage of quantitative methods for evaluation, assessment of expert opinion and collective processing of the results. These two features distinguish the method of expert evaluations from long-known expertise widespread in many areas of knowledge. Different typical problems require different types of expert evaluations methods. Several issues which arise with these methods are experts’ selection, management of assessment procedure, proceeding of the results and remuneration for the experts. To address these issues an on-line system was created with the primary purpose of development of a versatile application for many workgroups with matching approaches to scientific work management. Online documentation assessment and statistics system allows: - To realize within one platform independent activities of different workgroups (e.g. expert officers, managers). - To establish different workspaces for corresponding workgroups where custom users database can be created according to particular needs. - To form for each workgroup required output documents. - To configure information gathering for each workgroup (forms of assessment, tests, inventories). - To create and operate personal databases of remote users. - To set up automatic notification through e-mail. The next stage is development of quantitative and qualitative criteria to form a database of experts. The inventory was made so that the experts may not only submit their personal data, place of work and scientific degree but also keywords according to their expertise, academic interests, ORCID, Researcher ID, SPIN-code RSCI, Scopus AuthorID, knowledge of languages, primary scientific publications. For each project, competition assessments are processed in accordance to ordering party demands in forms of apprised inventories, commentaries (50-250 characters) and overall review (1500 characters) in which expert states the absence of conflict of interest. Evaluation is conducted as follows: as applications are added to database expert officer selects experts, generally, two persons per application. Experts are selected according to the keywords; this method proved to be good unlike the OECD classifier. The last stage: the choice of the experts is approved by the supervisor, the e-mails are sent to the experts with invitation to assess the project. An expert supervisor is controlling experts writing reports for all formalities to be in place (time-frame, propriety, correspondence). If the difference in assessment exceeds four points, the third evaluation is appointed. As the expert finishes work on his expert opinion, system shows contract marked ‘new’, managers commence with the contract and the expert gets e-mail that the contract is formed and ready to be signed. All formalities are concluded and the expert gets remuneration for his work. The specificity of interaction of the examination officer with other experts will be presented in the report.

Keywords: expertise, management of research evaluation, method of expert evaluations, research evaluation

Procedia PDF Downloads 188
4 Human Identification and Detection of Suspicious Incidents Based on Outfit Colors: Image Processing Approach in CCTV Videos

Authors: Thilini M. Yatanwala

Abstract:

CCTV (Closed-Circuit-Television) Surveillance System is being used in public places over decades and a large variety of data is being produced every moment. However, most of the CCTV data is stored in isolation without having integrity. As a result, identification of the behavior of suspicious people along with their location has become strenuous. This research was conducted to acquire more accurate and reliable timely information from the CCTV video records. The implemented system can identify human objects in public places based on outfit colors. Inter-process communication technologies were used to implement the CCTV camera network to track people in the premises. The research was conducted in three stages and in the first stage human objects were filtered from other movable objects available in public places. In the second stage people were uniquely identified based on their outfit colors and in the third stage an individual was continuously tracked in the CCTV network. A face detection algorithm was implemented using cascade classifier based on the training model to detect human objects. HAAR feature based two-dimensional convolution operator was introduced to identify features of the human face such as region of eyes, region of nose and bridge of the nose based on darkness and lightness of facial area. In the second stage outfit colors of human objects were analyzed by dividing the area into upper left, upper right, lower left, lower right of the body. Mean color, mod color and standard deviation of each area were extracted as crucial factors to uniquely identify human object using histogram based approach. Color based measurements were written in to XML files and separate directories were maintained to store XML files related to each camera according to time stamp. As the third stage of the approach, inter-process communication techniques were used to implement an acknowledgement based CCTV camera network to continuously track individuals in a network of cameras. Real time analysis of XML files generated in each camera can determine the path of individual to monitor full activity sequence. Higher efficiency was achieved by sending and receiving acknowledgments only among adjacent cameras. Suspicious incidents such as a person staying in a sensitive area for a longer period or a person disappeared from the camera coverage can be detected in this approach. The system was tested for 150 people with the accuracy level of 82%. However, this approach was unable to produce expected results in the presence of group of people wearing similar type of outfits. This approach can be applied to any existing camera network without changing the physical arrangement of CCTV cameras. The study of human identification and suspicious incident detection using outfit color analysis can achieve higher level of accuracy and the project will be continued by integrating motion and gait feature analysis techniques to derive more information from CCTV videos.

Keywords: CCTV surveillance, human detection and identification, image processing, inter-process communication, security, suspicious detection

Procedia PDF Downloads 153
3 Salmon Diseases Connectivity between Fish Farm Management Areas in Chile

Authors: Pablo Reche

Abstract:

Since 1980’s aquaculture has become the biggest economic activity in southern Chile, being Salmo salar and Oncorhynchus mykiss the main finfish species. High fish density makes both species prone to contract diseases, what drives the industry to big losses, affecting greatly the local economy. Three are the most concerning infective agents, the infectious salmon anemia virus (ISAv), the bacteria Piscirickettsia salmonis and the copepod Caligus rogercresseyi. To regulate the industry the government arranged the salmon farms within management areas named as barrios, which coordinate the fallowing periods and antibiotics treatments of their salmon farms. In turn, barrios are gathered into larger management areas, named as macrozonas whose purpose is to minimize the risk of disease transmission between them and to enclose the outbreaks within their boundaries. However, disease outbreaks still happen and transmission to neighbor sites enlarges the initial event. Salmon disease agents are mostly transported passively by local currents. Thus, to understand how transmission occurs it must be firstly studied the physical environment. In Chile, salmon farming takes place in the inner seas of the southernmost regions of western Patagonia, between 41.5ºS-55ºS. This coastal marine system is characterised by western winds, latitudinally modulated by the position of the South-Eats Pacific high-pressure centre, high precipitation rates and freshwater inflows from the numerous glaciers (including the largest ice cap out of Antarctic and Greenland). All of these forcings meet in a complex bathymetry and coastline system - deep fjords, shallow sills, narrow straits, channels, archipelagos, inlets, and isolated inner seas- driving an estuarine circulation (fast outflows westwards on surface and slow deeper inflows eastwards). Such a complex system is modelled on the numerical model MIKE3, upon whose 3D current fields particle-track-biological models (one for each infective agent) are decoupled. Each agent biology is parameterized by functions for maturation and mortality (reproduction not included). Such parameterizations are depending upon environmental factors, like temperature and salinity, so their lifespan will depend upon the environmental conditions those virtual agents encounter on their way while passively transported. CLIC (Connectivity-Langrangian–IFOP-Chile) is a service platform that supports the graphical visualization of the connectivity matrices calculated from the particle trajectories files resultant of the particle-track-biological models. On CLIC users can select, from a high-resolution grid (~1km), the areas the connectivity will be calculated between them. These areas can be barrios and macrozonas. Users also can select what nodes of these areas are allowed to release and scatter particles from, depth and frequency of the initial particle release, climatic scenario (winter/summer) and type of particle (ISAv, Piscirickettsia salmonis, Caligus rogercresseyi plus an option for lifeless particles). Results include probabilities downstream (where the particles go) and upstream (where the particles come from), particle age and vertical distribution, all of them aiming to understand how currently connectivity works to eventually propose a minimum risk zonation for aquaculture purpose. Preliminary results in Chiloe inner sea shows that the risk depends not only upon dynamic conditions but upon barrios location with respect to their neighbors.

Keywords: aquaculture zonation, Caligus rogercresseyi, Chilean Patagonia, coastal oceanography, connectivity, infectious salmon anemia virus, Piscirickettsia salmonis

Procedia PDF Downloads 135
2 Revolutionizing Oil Palm Replanting: Geospatial Terrace Design for High-precision Ground Implementation Compared to Conventional Methods

Authors: Nursuhaili Najwa Masrol, Nur Hafizah Mohammed, Nur Nadhirah Rusyda Rosnan, Vijaya Subramaniam, Sim Choon Cheak

Abstract:

Replanting in oil palm cultivation is vital to enable the introduction of planting materials and provides an opportunity to improve the road, drainage, terrace design, and planting density. Oil palm replanting is fundamentally necessary every 25 years. The adoption of the digital replanting blueprint is imperative as it can assist the Malaysia Oil Palm industry in addressing challenges such as labour shortages and limited expertise related to replanting tasks. Effective replanting planning should commence at least 6 months prior to the actual replanting process. Therefore, this study will help to plan and design the replanting blueprint with high-precision translation on the ground. With the advancement of geospatial technology, it is now feasible to engage in thoroughly researched planning, which can help maximize the potential yield. A blueprint designed before replanting is to enhance management’s ability to optimize the planting program, address manpower issues, or even increase productivity. In terrace planting blueprints, geographic tools have been utilized to design the roads, drainages, terraces, and planting points based on the ARM standards. These designs are mapped with location information and undergo statistical analysis. The geospatial approach is essential in precision agriculture and ensuring an accurate translation of design to the ground by implementing high-accuracy technologies. In this study, geospatial and remote sensing technologies played a vital role. LiDAR data was employed to determine the Digital Elevation Model (DEM), enabling the precise selection of terraces, while ortho imagery was used for validation purposes. Throughout the designing process, Geographical Information System (GIS) tools were extensively utilized. To assess the design’s reliability on the ground compared with the current conventional method, high-precision GPS instruments like EOS Arrow Gold and HIPER VR GNSS were used, with both offering accuracy levels between 0.3 cm and 0.5cm. Nearest Distance Analysis was generated to compare the design with actual planting on the ground. The analysis revealed that it could not be applied to the roads due to discrepancies between actual roads and the blueprint design, which resulted in minimal variance. In contrast, the terraces closely adhered to the GPS markings, with the most variance distance being less than 0.5 meters compared to actual terraces constructed. Considering the required slope degrees for terrace planting, which must be greater than 6 degrees, the study found that approximately 65% of the terracing was constructed at a 12-degree slope, while over 50% of the terracing was constructed at slopes exceeding the minimum degrees. Utilizing blueprint replanting promising strategies for optimizing land utilization in agriculture. This approach harnesses technology and meticulous planning to yield advantages, including increased efficiency, enhanced sustainability, and cost reduction. From this study, practical implementation of this technique can lead to tangible and significant improvements in agricultural sectors. In boosting further efficiencies, future initiatives will require more sophisticated techniques and the incorporation of precision GPS devices for upcoming blueprint replanting projects besides strategic progression aims to guarantee the precision of both blueprint design stages and its subsequent implementation on the field. Looking ahead, automating digital blueprints are necessary to reduce time, workforce, and costs in commercial production.

Keywords: replanting, geospatial, precision agriculture, blueprint

Procedia PDF Downloads 53
1 On the Utility of Bidirectional Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of the flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on the spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts, as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, greatly suffer from vanishing gradient and low-efficiency problem when they sequentially process past states and compresses contextual information into a bottleneck with long input sequences. In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations, a transformer model was built in this work as a variation to the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with an attention mechanism. In previous works on genetic circuit design, the traditional approaches to classification and regression, such as Random Forrest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R2 accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work, with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on the presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: machine learning, classification and regression, gene circuit design, bidirectional transformers

Procedia PDF Downloads 37