Search results for: text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3271

2611 A U-Net Based Architecture for Fast and Accurate Diagram Extraction

Authors: Revoti Prasad Bora, Saurabh Yadav, Nikita Katyal

Abstract:

In the context of educational data mining, extracting information from images that contain both text and diagrams is an important use case. Document analysis therefore requires extracting the diagrams from such images so that text and diagrams can be processed separately. To the best of the authors' knowledge, none of the many existing approaches for extracting tables, figures, etc., meets the need for real-time processing with high accuracy that multiple applications require. In the education domain, diagrams have varied characteristics, e.g., line-based geometric diagrams, chemical bonds, and mathematical formulas. Two broad categories of approaches try to solve similar problems: traditional computer vision based approaches and deep learning approaches. The traditional computer vision based approaches mainly leverage connected components and distance transform based processing and hence perform well only in very limited scenarios. The existing deep learning approaches leverage either YOLO or Faster R-CNN architectures. These approaches suffer from a performance-accuracy tradeoff. This paper proposes a U-Net based architecture that formulates diagram extraction as a segmentation problem. The proposed method provides similar accuracy with a much faster extraction time compared to the mentioned state-of-the-art approaches. Further, the segmentation mask in this approach allows the extraction of diagrams of irregular shapes.

Keywords: computer vision, deep-learning, educational data mining, faster-RCNN, figure extraction, image segmentation, real-time document analysis, text extraction, U-Net, YOLO

Procedia PDF Downloads 123
2610 Hybrid Structure Learning Approach for Assessing the Phosphate Laundries Impact

Authors: Emna Benmohamed, Hela Ltifi, Mounir Ben Ayed

Abstract:

Bayesian Network (BN) is one of the most efficient classification methods. It is widely used in several fields (e.g., medical diagnostics, risk analysis, bioinformatics research). The BN is defined as a probabilistic graphical model that provides a formalism for reasoning under uncertainty. This classification method performs well in extracting new knowledge from data. The construction of this model consists of two phases: structure learning and parameter learning. For structure learning, the K2 algorithm is one of the representative data-driven algorithms and is based on a score-and-search approach. In addition, integrating the expert's knowledge in the structure learning process allows the highest accuracy to be obtained. In this paper, we propose a hybrid approach that combines an improved K2 algorithm, called K2 algorithm for Parents and Children search (K2PC), with an expert-driven method for learning the structure of a BN. The evaluation of the experimental results on well-known benchmarks shows that our K2PC algorithm performs better in terms of correct structure detection. The real application of our model shows its efficiency in the analysis of the phosphate laundry effluents' impact on the watershed in the Gafsa area (southwestern Tunisia).

Keywords: Bayesian network, classification, expert knowledge, structure learning, surface water analysis

Procedia PDF Downloads 116
2609 An End-to-end Piping and Instrumentation Diagram Information Recognition System

Authors: Taekyong Lee, Joon-Young Kim, Jae-Min Cha

Abstract:

A piping and instrumentation diagram (P&ID) is an essential design drawing describing the interconnection of process equipment and the instrumentation installed to control the process. P&IDs are modified and managed throughout the whole life cycle of a process plant. For ease of data transfer, P&IDs are generally handed over from a design company to an engineering company in portable document format (PDF), which is hard to modify. Therefore, engineering companies have to deploy a great deal of time and human resources just to manually convert P&ID images into a computer aided design (CAD) file format. To reduce the inefficiency of the P&ID conversion, the various symbols and texts in P&ID images should be recognized automatically. However, recognizing information in P&ID images is not an easy task. A P&ID image usually contains hundreds of symbol and text objects. Most objects are small compared to the size of the whole image and are densely packed together. Traditional recognition methods based on geometrical features are not capable of recognizing every element of a P&ID image. To overcome these difficulties, two state-of-the-art deep learning models, RetinaNet and the connectionist text proposal network (CTPN), were used to build a system for recognizing symbols and texts in a P&ID image. Using the RetinaNet and CTPN models, carefully modified and tuned for a P&ID image dataset, the developed system recognizes texts, equipment symbols, piping symbols and instrumentation symbols from an input P&ID image and saves the recognition results in a pre-defined extensible markup language format. In a test using a commercial P&ID image, the P&ID information recognition system correctly recognized 97% of the symbols and 81.4% of the texts.

Keywords: object recognition system, P&ID, symbol recognition, text recognition

Procedia PDF Downloads 140
2608 A Framework for Automated Nuclear Waste Classification

Authors: Seonaid Hume, Gordon Dobie, Graeme West

Abstract:

Detecting and localizing radioactive sources is a necessity for safe and secure decommissioning of nuclear facilities. An important aspect for the management of the sort-and-segregation process is establishing the spatial distributions and quantities of the waste radionuclides, their type, corresponding activity, and ultimately classification for disposal. The data received from surveys directly informs decommissioning plans, on-site incident management strategies, the approach needed for a new cell, as well as protecting the workforce and the public. Manual classification of nuclear waste from a nuclear cell is time-consuming, expensive, and requires significant expertise to make the classification judgment call. Also, in-cell decommissioning is still in its relative infancy, and few techniques are well-developed. As with any repetitive and routine tasks, there is the opportunity to improve the task of classifying nuclear waste using autonomous systems. Hence, this paper proposes a new framework for the automatic classification of nuclear waste. This framework consists of five main stages; 3D spatial mapping and object detection, object classification, radiological mapping, source localisation based on gathered evidence and finally, waste classification. The first stage of the framework, 3D visual mapping, involves object detection from point cloud data. A review of related applications in other industries is provided, and recommendations for approaches for waste classification are made. Object detection focusses initially on cylindrical objects since pipework is significant in nuclear cells and indeed any industrial site. The approach can be extended to other commonly occurring primitives such as spheres and cubes. This is in preparation of stage two, characterizing the point cloud data and estimating the dimensions, material, degradation, and mass of the objects detected in order to feature match them to an inventory of possible items found in that nuclear cell. 
Many items in nuclear cells are one-offs, have limited or poor drawings available, have been modified since installation, or have complex interiors, which often pose difficulties when accessing certain zones and identifying waste remotely. Hence, feature matching objects may require expert input. The third stage, radiological mapping, similarly facilitates the characterization of the nuclear cell in terms of radiation fields, including the type of radiation, activity, and location within the nuclear cell. The fourth stage of the framework takes the visual map from stage 1, the object characterization from stage 2, and the radiation map from stage 3 and fuses them together, providing a more detailed scene of the nuclear cell by identifying the location of radioactive materials in three dimensions. The last stage involves combining the evidence from the fused data sets to reveal the classification of the waste in Bq/kg, thus enabling better decision making and monitoring for in-cell decommissioning. The presentation of the framework is supported by representative case study data drawn from a decommissioning application at a UK nuclear facility. This framework utilises recent advancements in the detection and mapping of complex radiation fields in three dimensions to make the process of classifying nuclear waste faster, more reliable, more cost-effective and safer.

Keywords: nuclear decommissioning, radiation detection, object detection, waste classification

Procedia PDF Downloads 189
2607 Temporality in Architecture and Related Knowledge

Authors: Gonca Z. Tuncbilek

Abstract:

Architectural research tends to define architecture in terms of its permanence. In this study, the term ‘temporality’ and its use in architectural discourse are revisited. The definition, proposition, and efficacy of temporality occur both in architecture and in its related knowledge. Temporary architecture not only fulfills the requirements of architectural programs but also plays a significant role in generating an environment of architectural discourse. In recent decades, there has been great interest in temporary architectural practices such as installations, exhibition spaces, pavilions, and expositions, which invite architects to experience and think about architecture. Temporary architecture has a significant role among architecture, the architect, and the architectural discourse. By experimenting with contemporary materials, methods, and techniques, these structures have proposed the possibilities of future architecture. They give architects a wide-ranging variety of freedoms to experience the ‘new’ in architecture. In addition to this experimentation, they can be considered an agent for redefining and reforming the boundaries of the architectural discipline itself. Although the definition of architecture is re-analyzed in terms of its temporality rather than its permanence, architecture, in reality, still relies on historically codified types and principles of formation. The concept of type can be considered in several different sciences, and there is a tendency to organize and understand the world in terms of classification in many different cultures and places. ‘Type’ is used as a classification tool with/without the scope of the critical invention. This study considers theories of type, putting forward epistemological and discursive arguments related to the form of architecture and to historical and formal disciplinary knowledge in architecture.
This study aims to emphasize the importance of temporality in architecture as a creative tool to reveal its position within the architectural discourse. Temporary architecture offers ‘new’ opportunities in the architectural field to be analyzed. In brief, temporary structures allow the architect the freedom to experiment in architecture. While architecture is redefined in terms of temporality, it still relies on historically codified types (pavilions, exhibitions, expositions, and installations). The notion of architectural types and its varying interpretations are analyzed based on the texts of architectural theorists since the Age of Enlightenment. To investigate the classification of type in architecture, particularly temporary architecture, it is necessary to return to the discussion of the origin of knowledge and its classification.

Keywords: classification of architecture, exhibition design, pavilion design, temporary architecture

Procedia PDF Downloads 359
2606 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction, leading to a rapid growth of social media that allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates a tree kernel with a linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation, capturing the syntactic and content information in tweets. The experimental results demonstrate that our method can effectively detect the public emotion of tweets while outperforming the other compared methods.

Keywords: emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining

Procedia PDF Downloads 208
2605 Psychological Nano-Therapy: A New Method in Family Therapy

Authors: Siamak Samani, Nadereh Sohrabi

Abstract:

Psychological nano-therapy is a new method based on systems theory. According to the theory, systems with severe dysfunctions are resistant to change. Psychological nano-therapy helps therapists to break this ice. Two key concepts in psychological nano-therapy are nano-functions and nano-behaviors. The most important step when applying psychological nano-therapy in family therapy is selecting the most effective nano-function and nano-behavior. The aim of this study was to check the effectiveness of psychological nano-therapy for family therapy. A one-group pre-test/post-test design (quasi-experimental design) was applied. The sample consisted of ten families with severe marital conflict. The important characteristic of these families was resistance to participating in family therapy. In this study, sending respectful (nano-function) text messages (nano-behavior) by cell phone was applied as the treatment. The cohesion/respect subscale from the self-report family processes scale and the family readiness for therapy scale were used to assess all family members in the pre-test and post-test. One of the family members was asked to send a respectful text message to the other family members every day for a week. The content of the text messages was selected and checked by the therapist. A paired-sample t-test was used to compare the scores of families in the pre-test and post-test. The results of the test showed significant differences in both the cohesion/respect score and family readiness for therapy between pre-test and post-test. The results revealed that these families found a better atmosphere for participation in a complete family therapy program. Indeed, this study showed that psychological nano-therapy is an effective method for building family readiness for therapy.

Keywords: family therapy, family conflicts, nano-therapy, family readiness

Procedia PDF Downloads 647
2604 Jalal-Ale-Ahmad and ‘Critical Consciousness’: A Comparative Study

Authors: Zohreh Ramin

Abstract:

One of the most important contributions that Edward Said has made to the realm of critical theory is his insistence on the worldliness of the text and the critic. By this, Said meant that the critic and the text must be considered in their ‘material’ contexts. Foregrounding the substantial role of a critic as embodying what he refers to as ‘critical consciousness’, Said maintains that a true critic is one who can stand between the ‘dominant culture’ and ‘the totalizing forms of critical systems.’ Considered one of Iran’s major contemporary intellectuals, Jalal Ale Ahmad is responsible for introducing the idea of ‘Westoxication’ in Iran, constructing a social paradigm of the necessity to return to tradition in contemporary Iran. The present paper intends to study Ale Ahmad’s definition of the orient versus the occident, his criticism of the ‘machination’ of contemporary Iranian society, and his solution to the problem of ‘Westoxication’. The objective of this study is to see whether Ale Ahmad can be considered as embodying the spirit of ‘critical consciousness’, described by Said as the necessary tool in the hands of an intellectual who is simultaneously attached filiatively to his culture but can detach himself affiliatively by employing critical consciousness.

Keywords: Westoxication, filiative, affiliative, machination

Procedia PDF Downloads 166
2603 Roof Material Detection Based on Object-Based Approach Using WorldView-2 Satellite Imagery

Authors: Ebrahim Taherzadeh, Helmi Z. M. Shafri, Kaveh Shahi

Abstract:

One of the most important tasks in urban remote sensing is the detection of impervious surfaces (IS), such as building roofs and roads. However, the detection of IS in heterogeneous areas remains one of the most challenging tasks. In this study, the detection of concrete roofs using an object-oriented approach is proposed. A new rule-based classification was developed to detect concrete roof tiles. The proposed rule-based classification was applied to a WorldView-2 image. Results showed that the proposed rule has good potential to predict concrete roof material from WorldView-2 images with 85% accuracy.

Keywords: object-based, roof material, concrete tile, WorldView-2

Procedia PDF Downloads 412
2602 Global Positioning System Match Characteristics as a Predictor of Badminton Players’ Group Classification

Authors: Yahaya Abdullahi, Ben Coetzee, Linda Van Den Berg

Abstract:

The study aimed at establishing the global positioning system (GPS) determined singles match characteristics that act as predictors of successful and less-successful male singles badminton players’ group classification. Twenty-two (22) male singles players (age: 23.39 ± 3.92 years; body stature: 177.11 ± 3.06 cm; body mass: 83.46 ± 14.59 kg) who represented 10 African countries participated in the study. Players were categorised as successful or less-successful according to the results of five championships of the 2014/2015 season. GPS units (MinimaxX V4.0), Polar Heart Rate Transmitter Belts and digital video cameras were used to collect match data. GPS-related variables were corrected for match duration, and independent t-tests, a cluster analysis and a binary forward stepwise logistic regression were calculated. A Receiver Operating Characteristic (ROC) curve was used to determine the validity of the group classification model. High-intensity accelerations per second were identified as the only GPS-determined variable that showed a significant difference between groups. Furthermore, only high-intensity accelerations per second (p=0.03) and low-intensity efforts per second (p=0.04) were identified as significant predictors of group classification, with 76.88% of players classified back into their original groups by the GPS-based logistic regression formula. The ROC showed a value of 0.87. The identification of the last-mentioned GPS-related variables for the attainment of badminton performances emphasizes the importance of using badminton drills and conditioning techniques to improve not only players’ physical fitness levels but also their abilities to accelerate at high intensities.
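The ROC value reported above is an area under the curve. As a minimal sketch of how such a value can be computed, the snippet below uses the pairwise rank formulation: the fraction of (successful, less-successful) player pairs in which the successful player receives the higher predicted score. The labels and scores are hypothetical illustration data, not the study's measurements, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical logistic-regression outputs for six players (1 = successful).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.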

Keywords: badminton, global positioning system, match analysis, inertial movement analysis, intensity, effort

Procedia PDF Downloads 182
2601 Ancient Latin Language and Haiku Poetry: A Case Study between Teaching and Translation Studies

Authors: Arianna Sacerdoti

Abstract:

The translation of Haiku poetry into Latin is fundamentally experimental in nature. One of the first seminal books containing such translations, alongside translations into different modern languages, 'A Piedi Scalzi', was written by Tartamella in 2016. The results of a text-oriented study of this book will be commented upon and analyzed. The author, Arianna Sacerdoti, made similar translations with high school students. Such an experiment garners interest across a diverse range of disciplines such as teaching, translation studies, and classics reception studies. The methodology employed is text-oriented, as the Haiku poem translations will be commented on by considering their relationship with the original. The results of this investigation, conducted within the field of experimental teaching, are expected to confirm the usefulness of this approach to the teaching of Latin and its potential to actively involve students in identifying the diachronic differences between the world of classical antiquity and the contemporary one.

Keywords: ancient latin, Haiku, translation studies, reception of classics

Procedia PDF Downloads 115
2600 Revisiting the Swadesh Wordlist: How Long Should It Be

Authors: Feda Negesse

Abstract:

One of the most important indicators of research quality is a good data-collection instrument that can yield reliable and valid data. The Swadesh wordlist has been used for more than half a century for collecting data in comparative and historical linguistics, though arbitrariness is observed in its application and size. This research compares the classification results of the 100-item Swadesh wordlist with those of its subsets to determine whether reducing the size of the wordlist impacts its effectiveness. In the comparison, the 100, 50 and 40 wordlists were used to compute lexical distances of 29 Cushitic and Semitic languages spoken in Ethiopia and neighbouring countries. Gabmap, a web-based application, was employed to compute the lexical distances and to divide the languages into related clusters. The study shows that the subsets are not as effective as the 100 wordlist in clustering languages into smaller subgroups, but they are equally effective in dividing languages into bigger groups such as subfamilies. It is noted that the subsets may lead to an erroneous classification whereby unrelated languages by chance form a cluster which is not attested by a comparative study. The chance of getting a wrong result is higher when the subsets are used to classify languages which are not closely related. Though further study is still needed to settle the issues around the size of the Swadesh wordlist, this study indicates that the 50 and 40 wordlists cannot be recommended as reliable substitutes for the 100 wordlist under all circumstances. The choice seems to be determined by the objective of a researcher and the degree of affiliation among the languages to be classified.
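To make the idea of a lexical distance between wordlists concrete, here is a minimal sketch that averages the length-normalised edit distance over the concepts two lists share. The abstract does not state which distance measure was configured, so the use of Levenshtein distance here is an assumption, and the word forms are invented placeholders rather than data from any real language.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_distance(words_a, words_b):
    """Mean length-normalised edit distance over concepts present in both lists."""
    shared = [c for c in words_a if c in words_b]
    if not shared:
        raise ValueError("no shared concepts")
    total = 0.0
    for concept in shared:
        wa, wb = words_a[concept], words_b[concept]
        total += levenshtein(wa, wb) / max(len(wa), len(wb), 1)
    return total / len(shared)


# Invented forms for three concepts in two hypothetical languages.
lang_a = {"one": "tak", "two": "lam", "water": "biso"}
lang_b = {"one": "tok", "two": "lam", "water": "bisa"}
distance_ab = lexical_distance(lang_a, lang_b)
```

Running the same function on a subset of the concepts (for example, 40 of 100) and comparing the resulting distance matrices is one way to probe how wordlist size affects the clustering.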

Keywords: classification, Cushitic, Swadesh, wordlist

Procedia PDF Downloads 286
2599 3D Classification Optimization of Low-Density Airborne Light Detection and Ranging Point Cloud by Parameters Selection

Authors: Baha Eddine Aissou, Aichouche Belhadj Aissa

Abstract:

Light detection and ranging (LiDAR) is an active remote sensing technology used for several applications. Airborne LiDAR is becoming an important technology for the acquisition of highly accurate dense point clouds. The classification of airborne laser scanning (ALS) point clouds is a very important task that remains a real challenge for many scientists. Support vector machine (SVM) is one of the most used kernel-based statistical learning algorithms. SVM is a non-parametric method, and it is recommended in cases where the data distribution cannot be well modeled by a standard parametric probability density function. Using a kernel, it performs a robust non-linear classification of samples. In practice, the data are rarely linearly separable. SVMs implicitly map the data into a higher-dimensional space where they become linearly separable, while the kernel allows all the computations to be performed in the original space. This is one of the main reasons that SVMs are well suited for high-dimensional classification problems. Only a few training samples, called support vectors, are required. SVM has also shown its potential to cope with uncertainty in data caused by noise and fluctuation, and it is computationally efficient compared to several other methods. Such properties are particularly suited for remote sensing classification problems and explain their recent adoption. In this poster, the SVM classification of ALS LiDAR data is proposed. Firstly, connected component analysis is applied for clustering the point cloud. Secondly, the resulting clusters are incorporated in the SVM classifier. The radial basis function (RBF) kernel is used because only a small number of parameters (C and γ) need to be chosen, which decreases the computation time. In order to optimize the classification rates, parameter selection is explored. It consists of finding the parameters (C and γ) leading to the best overall accuracy using grid search and 5-fold cross-validation.
The LiDAR point cloud used is provided by the German Society for Photogrammetry, Remote Sensing, and Geoinformation. The ALS data are characterized by a low density (4-6 points/m²) and cover an urban area located in residential parts of the city of Vaihingen in southern Germany. The ground class and three other classes belonging to roof superstructures are considered, i.e., a total of 4 classes. The training and test sets are selected randomly several times. The obtained results demonstrate that parameter selection can narrow the search to a restricted interval of (C and γ) that can be further explored, but it does not systematically lead to the optimal rates. The SVM classifier with selected hyper-parameters is compared with the most used classifiers in the literature for LiDAR data: random forest, AdaBoost, and decision tree. The comparison showed the superiority of the SVM classifier using parameter selection for LiDAR data compared to the other classifiers.
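The parameter selection described above (grid search over C and γ scored by 5-fold cross-validation) can be sketched as a small generic loop. This is an illustration under stated assumptions, not the authors' implementation: `fit_and_score` is a hypothetical caller-supplied callback that would train an RBF-kernel SVM on the training indices and return accuracy on the held-out fold, and `toy_score` is a stand-in used only so the sketch runs without an SVM library.

```python
import itertools
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(fit_and_score, n_samples, c_values, gamma_values, k=5):
    """Return ((C, gamma), mean accuracy) for the best point of the grid.

    fit_and_score(train_idx, test_idx, C, gamma) is a hypothetical callback
    that trains a classifier and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    for C, gamma in itertools.product(c_values, gamma_values):
        scores = []
        for i, test_idx in enumerate(folds):
            # Every fold except the i-th is used for training.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(train_idx, test_idx, C, gamma))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = (C, gamma), mean_score
    return best_params, best_score


def toy_score(train_idx, test_idx, C, gamma):
    """Stand-in for a real SVM: a made-up score that peaks at C=10, gamma=0.1."""
    return 1.0 / (1.0 + abs(math.log10(C) - 1) + abs(math.log10(gamma) + 1))


best, score = grid_search_cv(toy_score, 100, [0.1, 1, 10, 100], [0.001, 0.01, 0.1, 1])
```

A logarithmically spaced grid is a common starting point; a finer grid can then be placed around the best coarse cell, which matches the abstract's remark that the search narrows the interval without guaranteeing the optimum.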

Keywords: classification, airborne LiDAR, parameters selection, support vector machine

Procedia PDF Downloads 140
2598 Energy Detection Based Sensing and Primary User Traffic Classification for Cognitive Radio

Authors: Urvee B. Trivedi, U. D. Dalal

Abstract:

As wireless communication services grow quickly, spectrum utilization has become an increasingly serious concern. Cognitive radio has emerged as a technology to solve today’s spectrum scarcity problem. To support the spectrum reuse functionality, secondary users are required to sense the radio frequency environment, and once the primary users are found to be active, the secondary users are required to vacate the channel within a certain amount of time. Therefore, spectrum sensing is of significant importance. Once sensing is done, different prediction rules are applied to classify the traffic pattern of the primary user. A primary user follows one of two types of traffic patterns: periodic or stochastic ON-OFF patterns. A cognitive radio can learn the patterns in different channels over time. Two types of classification methods are discussed in this paper: one based on edge detection and one based on the autocorrelation function. The edge detection method has high accuracy, but it cannot tolerate sensing errors. Autocorrelation-based classification is applicable in a real environment, as it can tolerate some amount of sensing errors.
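The autocorrelation-based idea can be illustrated with a short sketch: a periodic ON-OFF occupancy sequence shows a strong autocorrelation peak at its period, whereas a stochastic one does not. The decision rule, the threshold of 0.7, and the function names below are assumptions chosen for illustration; the abstract does not give the paper's exact rule.

```python
def autocorrelation(x, lag):
    """Normalised sample autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # constant sequence: nothing to correlate
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def classify_traffic(occupancy, max_lag, threshold=0.7):
    """Label a 0/1 occupancy sequence as 'periodic' or 'stochastic'."""
    if not 1 <= max_lag < len(occupancy):
        raise ValueError("max_lag must be between 1 and len(occupancy) - 1")
    acf = [autocorrelation(occupancy, lag) for lag in range(1, max_lag + 1)]
    # Skip the initial decay: any bursty signal is correlated at short lags,
    # so only peaks after the first non-positive value count as evidence.
    first_dip = next((i for i, r in enumerate(acf) if r <= 0), None)
    if first_dip is None:
        return "stochastic"
    return "periodic" if max(acf[first_dip:]) >= threshold else "stochastic"


# Sensed channel states (1 = primary user ON), repeating with period 4.
periodic = [1, 1, 0, 0] * 10
label = classify_traffic(periodic, max_lag=12)
```

Because the statistic averages over the whole observation window, a few flipped samples lower the peak only slightly, which is consistent with the abstract's point that this approach tolerates some sensing errors.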

Keywords: cognitive radio (CR), probability of detection (PD), probability of false alarm (PF), primary user (PU), secondary user (SU), fast Fourier transform (FFT), signal to noise ratio (SNR)

Procedia PDF Downloads 337
2597 Move Analysis of Death Row Statements: An Explanatory Study Applied to Death Row Statements in Texas Department of Criminal Justice Website

Authors: Giya Erina

Abstract:

Linguists have analyzed the rhetorical structure of various forensic genres, but only a few have investigated the complete structure of death row statements. Unlike other forensic text types, such as suicide or ransom notes, the focus of death row statement analysis is not the authenticity or falsity of the text, but its intended meaning and its communicative purpose. As it constitutes their last statement before their execution, there are probably many things that inmates would like to express. This study mainly examines the rhetorical moves of 200 death row statements from the Texas Department of Criminal Justice website using rhetorical move analysis. The rhetorical moves identified in the statements will be classified based on their communicative purpose, and they will be grouped into moves and steps. A move structure will finally be suggested from the most common or characteristic moves and steps, as well as some sub-moves. However, because of some statements’ atypicality, some moves may appear in different parts of the texts or not at all.

Keywords: Death row statements, forensic linguistics, genre analysis, move analysis

Procedia PDF Downloads 287
2596 Predictive Analytics of Student Performance Determinants

Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi

Abstract:

Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine, Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis, and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.
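For reference, the four evaluation measures named above can be computed from a confusion matrix as in the following minimal sketch for the binary case. The labels are hypothetical (1 = above-average performance) and the function name `binary_metrics` is an assumption; in a 5-fold setting the same computation would be repeated per fold and averaged.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true and predicted labels for eight students.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```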

Keywords: student performance, supervised machine learning, classification, cross-validation, prediction

Procedia PDF Downloads 112
2595 Deep Learning Approach to Trademark Design Code Identification

Authors: Girish J. Showkatramani, Arthi M. Krishna, Sashi Nareddi, Naresh Nula, Aaron Pepe, Glen Brown, Greg Gabel, Chris Doninger

Abstract:

Trademark examination and approval is a complex process that involves analysis and review of the design components of the marks, such as the visual representation, as well as the textual data associated with marks, such as the marks' description. Currently, the process of identifying marks with similar visual representation is done manually at the United States Patent and Trademark Office (USPTO) and takes a considerable amount of time. Moreover, the accuracy of these searches depends heavily on the experts determining the trademark design codes used to catalog the visual design codes in the mark. In this study, we explore several methods to automate trademark design code classification. Based on recent successes of convolutional neural networks (CNNs) in image classification, we have used several different convolutional neural networks such as Google’s Inception v3, Inception-ResNet-v2, and Xception net. The study also looks into other techniques to augment the results from CNNs, such as using the Open Source Computer Vision Library (OpenCV) to pre-process the images. This paper reports the results of the various models trained on year of annotated trademark images.

Keywords: trademark design code, convolutional neural networks, trademark image classification, trademark image search, Inception-ResNet-v2

Procedia PDF Downloads 218
2594 The Mineralogy of Shales from the Pilbara and How Chemical Weathering Affects the Intact Strength

Authors: Arturo Maldonado

Abstract:

In the iron ore mining industry, the intact strength of rock units is defined using the uniaxial compressive strength (UCS). This parameter is very important for the classification of shale materials, allowing the split between rock and cohesive soils based on the magnitude of UCS. For this research, it is assumed that a UCS less than or equal to 1 MPa is representative of soils. Several researchers have anticipated that the magnitude of UCS reduces with weathering progression. Also, since UCS is a directional property, its magnitude depends upon the rock fabric orientation. Thus, the paper presents how the UCS of shales is affected by both weathering grade and bedding orientation. The mineralogy of shales has been characterized using hyperspectral and chemical assays to identify the mineral constituents of shale and other non-shale materials. Geological classification tools have been used to define distinct lithological types, and in this manner, the author uses mineralogical datasets to recognize and isolate shales from other rock types and develop tertiary plots for fresh and weathered shales. The mineralogical classification of shales has reduced the contamination of lithology types and facilitated the study of the physical factors affecting the intact strength of shales, such as anisotropic strength due to bedding orientation. The analysis of mineralogical characteristics of shales is perhaps the most important contribution of this paper to other researchers who may wish to explore similar methods.

Keywords: rock mechanics, mineralogy, shales, weathering, anisotropy

Procedia PDF Downloads 35
2593 Proposal for a Web System for the Control of Fungal Diseases in Grapes in Fruits Markets

Authors: Carlos Tarmeño Noriega, Igor Aguilar Alonso

Abstract:

Fungal diseases are common in vineyards; they cause a decrease in the quality of the products that can be sold, generating distrust of the customer towards the seller when buying fruit. Currently, technology allows the classification of fruits according to their characteristics thanks to artificial intelligence. This study proposes the implementation of a control system that allows the identification of the main fungal diseases present in the Italia grape, making use of a convolutional neural network (CNN), OpenCV, and TensorFlow. The methodology used was based on a collection of 20 articles referring to the proposed research on quality control, classification, and recognition of fruits through artificial vision techniques.

Keywords: computer vision, convolutional neural networks, quality control, fruit market, OpenCV, TensorFlow

Procedia PDF Downloads 66
2592 Financial Reports and Common Ownership: An Analysis of the Mechanisms Common Owners Use to Induce Anti-Competitive Behavior

Authors: Kevin Smith

Abstract:

Publicly traded companies in the US are legally obligated to host earnings calls that discuss their most recent financial reports. During these calls, investors are able to ask these companies questions about these financial reports and about the future direction of the company. This paper examines whether common institutional owners use these calls as a way to indirectly signal to companies in their portfolio not to take actions that could hurt the common owner's interests. This paper uses transcripts taken from the earnings calls of the six largest health insurance companies in the US from 2014 to 2019. This data is analyzed using text analysis and sentiment analysis to look for patterns in the statements made by common owners. The analysis found that common owners were more likely to recommend against direct price competition and instead redirect the insurance companies towards more passive actions, like investing in new technologies. This result indicates a mechanism that common owners use to reduce competition in the health insurance market.

Keywords: common ownership, text analysis, sentiment analysis, machine learning

Procedia PDF Downloads 61
2591 A Review of Research on Pre-training Technology for Natural Language Processing

Authors: Moquan Gong

Abstract:

In recent years, with the rapid development of deep learning, pre-training technology for natural language processing has made great progress. The field of natural language processing long relied on word vector methods such as Word2Vec to encode text. These word vector methods can also be regarded as static pre-training techniques. However, this context-free text representation brings very limited improvement to subsequent natural language processing tasks and cannot solve the problem of word polysemy. ELMo proposes a context-sensitive text representation method that can effectively handle polysemy. Since then, pre-trained language models such as GPT and BERT have been proposed one after another. Among them, the BERT model has significantly improved performance on many typical downstream tasks, greatly promoting technological development in the field, and natural language processing has since entered the era of dynamic pre-training technology. Since then, a large number of pre-trained language models based on BERT and XLNet have continued to emerge, and pre-training technology has become an indispensable mainstream technology in the field of natural language processing. This article first gives an overview of pre-training technology and its development history, and introduces in detail the classic pre-training technology in the field of natural language processing, including early static pre-training technology and classic dynamic pre-training technology. It then briefly sorts out a series of enlightening pre-training technologies, including improved models based on BERT and XLNet. On this basis, it analyzes the problems faced by current pre-training technology research. Finally, it looks forward to the future development trend of pre-training technology.

Keywords: natural language processing, pre-training, language model, word vectors

Procedia PDF Downloads 37
2590 StockTwits Sentiment Analysis on Stock Price Prediction

Authors: Min Chen, Rubi Gupta

Abstract:

Understanding and predicting stock market movements is a challenging problem. It is believed that stock markets are partially driven by public sentiment, which has led to numerous research efforts to predict stock market trends using public sentiment expressed on social media such as Twitter, but with limited success. Recently, the microblogging website StockTwits has become increasingly popular for users to share their discussions and sentiments about stocks and financial markets. In this project, we analyze the text content of StockTwits tweets and extract financial sentiment using text featurization and machine learning algorithms. StockTwits tweets are first pre-processed using techniques including stopword removal, special character removal, and case normalization to remove noise. Features are extracted from these pre-processed tweets through a text featurization process using bag-of-words, N-gram models, TF-IDF (term frequency-inverse document frequency), and latent semantic analysis. Machine learning models are then trained to classify the tweets' sentiment as positive (bullish) or negative (bearish). The correlation between the aggregated daily sentiment and daily stock price movement is then investigated using Pearson’s correlation coefficient. Finally, the sentiment information is applied together with time series stock data to predict stock price movement. The experiments on five companies (Apple, Amazon, General Electric, Microsoft, and Target) over a period of nine months demonstrate the effectiveness of our study in improving the prediction accuracy.
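The pipeline in this abstract (clean the tweet, weight terms with TF-IDF, correlate daily sentiment with price movement) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the stopword list, the function names, and the unsmoothed idf = log(N / df) variant are assumptions, not details taken from the paper.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "on", "in"}


def preprocess(text):
    """Case-normalize, strip special characters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = term count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty tweets
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A term that appears in every tweet receives weight zero under this idf variant, which is why ticker symbols shared by all tweets about one company contribute nothing to the classifier's features.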

Keywords: machine learning, sentiment analysis, stock price prediction, tweet processing

Procedia PDF Downloads 137
2589 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help software practitioners plan testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of software quality data. Data with very few instances of the minority outcome categories leads to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics when dealing with imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of the open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling methods and robust performance measures.
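The abstract does not say which sampling methods were used. As one common illustration of resampling for imbalanced data, the sketch below performs random oversampling: minority-class examples are duplicated at random until every class matches the size of the largest one. The function name, the seed parameter, and the class labels in the usage note are assumptions for illustration only.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are the same size.

    Illustrative random oversampling; not necessarily a method from the study.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Group samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

For example, a data set with 8 non-change-prone classes and 2 change-prone classes would come back with 8 of each. Oversampling should be applied to the training folds only, so duplicated examples never leak into the data used for evaluation.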

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 409
2588 Shaking the Iceberg: Metaphoric Shifting and Loss in the German Translations of 'The Sun Also Rises'

Authors: Christopher Dick

Abstract:

While the translation of 'literal language' poses numerous challenges for the translator, the translation of 'figurative language' creates even more complicated issues. It has been only in the last several decades that scholars have attempted to propose theories of figurative language translation, including metaphor translation. Even less work has applied these theories to metaphoric translation in literary texts. And almost no work has linked an analysis of metaphors in translation with the recent scholarship on conceptual metaphors. A study of literature in translation must not only examine the inevitable shifts that occur as specific metaphors move from source language to target language but also analyze the ways in which these shifts impact conceptual metaphors and, ultimately, the text as a whole. Doing so contributes to on-going efforts to bridge the sometimes wide gulf between considerations of content and form in literary studies. This paper attempts to add to the body of scholarly literature on metaphor translation and the function of metaphor in a literary text. Specifically, the study examines the metaphoric expressions in Hemingway’s The Sun Also Rises. First, the issue of Hemingway and metaphor is addressed. Next, the study examines the specific metaphors in the original novel in English and the German translations, first in Annemarie Horschitz’s 1928 German version and then in the recent Werner Schmitz 2013 translation. Hemingway’s metaphors, far from being random occurrences of figurative language, are linguistic manifestations of deeper conceptual metaphors that are central to an interpretation of the text. By examining the modifications that are made to these original metaphoric expressions as they are translated into German, one can begin to appreciate the shifts involved with metaphor translation. 
The translation of Hemingway’s metaphors into German represents significant metaphoric loss and shifting that subsequently shakes the important conceptual metaphors in the novel.

Keywords: Hemingway, Conceptual Metaphor, Translation, Stylistics

Procedia PDF Downloads 341
2587 Monitoring of Cannabis Cultivation with High-Resolution Images

Authors: Levent Basayigit, Sinan Demir, Burhan Kara, Yusuf Ucar

Abstract:

Cannabis is mostly used for drug production. In some countries, an excessive amount of illegal cannabis is cultivated and sold. Most illegal cannabis cultivation occurs on land far from settlements. In farmlands, it is cultivated with other crops: cannabis is either surrounded by tall plants like corn and sunflower or grown with tall crops as a mixed culture. The common method of determining illegal cultivation areas is to investigate information obtained from people. This method is not sufficient for detecting illegal cultivation in remote areas, so more effective methods are needed. Remote sensing is one of the most important technologies for monitoring plant growth on land. The aim of this study was to develop an applicable method for monitoring cannabis cultivation areas using satellite imagery. For this purpose, cannabis was grown in plots either alone or surrounded by corn and sunflower. The morphological characteristics of cannabis were recorded twice per month during the vegetation period. A spectral signature library was created with a spectroradiometer. The parcels were monitored with high-resolution satellite imagery. By processing the satellite imagery, the cultivation areas of cannabis were classified. To separate the cannabis plots from the other plants, the multiresolution segmentation algorithm was found to be the most successful for classification. WorldView Improved Vegetative Index (WV-VI) classification was the most accurate method for monitoring plant density. As a result, an object-based classification method and vegetation indices were sufficient for monitoring cannabis cultivation in multi-temporal Earthwiev images.

Keywords: Cannabis, drug, remote sensing, object-based classification

Procedia PDF Downloads 262
2586 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

Introduction: The problem of unbalanced data sets generally appears in real-world applications. Due to unequal class distribution, many research papers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors nonparametric discriminant analysis is one method that was proposed for classifying unbalanced classes with good performance. Hence, the methods of discriminant analysis are of interest to us in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. Objective: The purpose of this study was to compare the classification performance of parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification application of class-imbalanced data of diabetes risk groups. Methods: Data from a health project covering 599 staff members in a government hospital in Bangkok were obtained for the classification problem. The staff were classified into one of three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, with the variables diabetes risk group, age, gender, cholesterol, and BMI, were analyzed and bootstrapped up to 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples show non-normality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. In finding the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33:0.33:0.33) and unequal proportions with three choices of (0.90:0.05:0.05), (0.80:0.10:0.10), or (0.70:0.15:0.15). Results: The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k = 3 or k = 4 and prior probabilities of {non-risk:risk:diabetic} as {0.90:0.05:0.05} or {0.80:0.10:0.10} gave the smallest misclassification error rate. Conclusion: The k-nearest neighbors approach is suggested for classifying three-class imbalanced data of diabetes risk groups.
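The procedure above combines three ingredients: bootstrap resampling, a k-nearest neighbors discriminant rule, and class prior probabilities. The sketch below shows one common way to put them together and is not the authors' implementation. It assumes the usual nonparametric rule in which the score of a class is its prior multiplied by the share of that class's training points found among the k neighbors, with squared Euclidean distance. All function names and the toy data in the usage note are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations from rows with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def knn_predict(train, x, k, priors):
    """Classify x from (features, label) rows using prior-weighted k-NN.

    Assumed rule: score(c) = priors[c] * (neighbors in c) / (training size of c).
    """
    neighbors = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    class_sizes = Counter(label for _, label in train)
    votes = Counter(label for _, label in neighbors)
    scores = {c: priors[c] * votes[c] / class_sizes[c] for c in class_sizes}
    return max(scores, key=scores.get)


def error_rate(train, test, k, priors):
    """Fraction of test rows whose predicted label differs from the true one."""
    wrong = sum(1 for x, y in test if knn_predict(train, x, k, priors) != y)
    return wrong / len(test)
```

A usage sketch with made-up one-dimensional data: build `train` as a list such as `[((0.1,), "non-risk"), ((5.0,), "risk")]`, draw `bootstrap_sample(train, random.Random(0))` repeatedly, and average `error_rate` over the draws for each choice of `k` and `priors`. Dividing the neighbor count by the class size is what stops the majority class from winning simply because it has more points nearby.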

Keywords: error rate, bootstrap, diabetes risk groups, k-nearest neighbors

Procedia PDF Downloads 425
2585 2D Point Clouds Features from Radar for Helicopter Classification

Authors: Danilo Habermann, Aleksander Medella, Carla Cremon, Yusef Caceres

Abstract:

This paper aims to analyze the ability of 2D point cloud features to classify different models of helicopters using radar. This method does not need to estimate the blade length, the number of blades, or the period of the helicopters' micro-Doppler signatures. It is also not necessary to generate spectrograms (or any other image based on the time and frequency domains). This work transforms a radar return signal into a 2D point cloud and extracts features from it. Three classifiers are used to distinguish 9 different helicopter models in order to analyze the performance of the features used in this work. The high accuracy obtained with each of the classifiers demonstrates that 2D point cloud features are very useful for classifying helicopters from radar signals.

Keywords: helicopter classification, point clouds features, radar, supervised classifiers

Procedia PDF Downloads 208
2584 A Semiotic Framework for Edutainment Cinema

Authors: Robin Gengan

Abstract:

The film industry is one of the most impactful creative sectors in modern social influence. It has relational effects on knowledge and a psychological impact on the youth. Much focus in current filmmaking is on either fictional drama or documentary. The purpose of this article is to combine the two into a third genre, edutainment, in which film is approached as a visual educational text. Similar to language text, cinema can be subjected to semiotic reading. Film interpretation is of a phenomenological order, unique to each viewer. There are cultural norms and tropes that are more universal across the practice of semiotic reading, symbolism, and interpretation. Film semiotics and narration juxtapose moving visual texts and sound to create meaning through film codes and social conventions, forming an educational narrative that makes the medium effective for learning and teaching. The aim of this article is to explore and set a precedent for more creative building blocks in future research on edutainment cinema. This will further stimulate and benefit innovative entrepreneurial filmmaking and future academic research.

Keywords: cinema, edutainment, epistemology, multimodality, semiotics, structuralism

Procedia PDF Downloads 39
2583 The Power of Words: The Use of Language in Ethan Frome

Authors: Ritu Sharma

Abstract:

In order to be objective, critics must examine the dynamic relationships between the author, the reader, the text, and the outside world. However, it is also crucial to recognize that because the language was created by God, meaning is ingrained in it. Meaning is located in and discovered through literature rather than being limited to the author, reader, text, or the outside world. The link between the author, the reader, and the text is crucial because literature unites an author and a reader through the use of language. Literature is a potent kind of communication, and Ethan Frome's audience is forever changed as a result of the book's language and the language its characters use. The narrative of Ethan Frome and his wife Zeena is presented in Ethan Frome. Ethan's story is told throughout the course of the book, revealed through the eyes of the narrator, an outsider passing through Starkfield, as well as through the insight that the narrator gains from the townspeople and his stay on the Frome farm. The story is set in the rural New England community of Starkfield, Massachusetts. The weather provides the ideal setting for Ethan and the narrator to get to know one another as the narrator gets preoccupied with unraveling the narrative that underlies Ethan's physical anomalies. In addition to telling a gripping tale and capturing human nature as it is, Ethan Frome uses its storyline to achieve something more significant. The book by Edith Wharton supports language. Zeena's deliberate and convincing language challenges relativity and meaninglessness. Ethan and Mattie's effort to effectively use words reflects the complexity of language, and their battle illustrates the influence that language may have if and when it is used. Ethan Frome defends the written word, the foundation upon which it is constructed, as a literary work. 
Communication is based on language, and as the characters respond to and get involved in disputes throughout the book, Zeena, Ethan, and Mattie, each reflects particular theories of communication that help define their uses of communication within the broader context of language.

Keywords: dynamic relationships, potent, communication, complexity

Procedia PDF Downloads 77
2582 Improved Safety Science: Utilizing a Design Hierarchy

Authors: Ulrica Pettersson

Abstract:

Collection of information on incidents is regularly done through pre-printed incident report forms. These tend to be incomplete and frequently lack essential information. One consequence is that reports with inadequate information, which do not fulfil analysts’ requirements, are transferred into the analysis process. To improve an incident reporting form, theory from design science, witness psychology, and interview and questionnaire research has been used. Previously, three experiments have been conducted to evaluate the form and have shown significantly improved results. The form has proved to capture knowledge regardless of the incidents’ character or context. The aim of this paper is to describe how design science, and more specifically a design hierarchy, can be used to construct a collection form for improvements in safety science.

Keywords: data collection, design science, incident reports, safety science

Procedia PDF Downloads 211