Search results for: patent sentiment analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28015

27925 Deep Learning-Based Approach to Automatic Abstractive Summarization of Patent Documents

Authors: Sakshi V. Tantak, Vishap K. Malik, Neelanjney Pilarisetty

Abstract:

A patent is an exclusive right granted for an invention. The invention can be a product or a process that provides an innovative method of doing something, or offers a new technical perspective or solution to a problem. A patent can be obtained by making the technical information and details about the invention publicly available. The patent owner has exclusive rights to prevent or stop anyone from using the patented invention for commercial uses. Any commercial usage, distribution, import or export of a patented invention or product requires the patent owner’s consent. It has been observed that the central and important parts of patents are written in idiosyncratic and complex linguistic structures that can be difficult for a general audience to read, comprehend or interpret. The abstracts of these patents tend to obfuscate the precise nature of the patent instead of clarifying it via direct and simple linguistic constructs. This makes it necessary to have efficient access to this knowledge via concise and transparent summaries. However, as mentioned above, due to complex and repetitive linguistic constructs and extremely long sentences, common extraction-oriented automatic text summarization methods should not be expected to perform well when applied to patent documents. Other, more content-oriented or abstractive summarization techniques are able to perform much better and generate more concise summaries. This paper proposes an efficient summarization system for patents using artificial intelligence, natural language processing and deep learning techniques to condense the knowledge and essential information from a patent document into a single summary that is easier to understand, without redundant formatting or difficult jargon.

Keywords: abstractive summarization, deep learning, natural language processing, patent document

Procedia PDF Downloads 123
27924 Overview and Future Opportunities of Sarcasm Detection on Social Media Communications

Authors: Samaneh Nadali, Masrah Azrifah Azmi Murad, Nurfadhlina Mohammad Sharef

Abstract:

Sarcasm is a common phenomenon in social media: a nuanced form of language for stating the opposite of what is implied. Due to this intentional ambiguity, analyzing sarcasm is a difficult task not only for a machine but even for a human. Although sarcasm detection has an important effect on sentiment, it is usually ignored in social media analysis because sarcasm analysis is too complicated. While a few systems exist that can detect sarcasm, almost no work has been carried out to study and review the existing work in this area. This survey presents a nearly complete picture of sarcasm detection techniques and the related fields, with brief details. The main contributions of this paper are to illustrate the recent trend of research in sarcasm analysis, highlight the gaps, and propose a new framework that can be explored.

Keywords: sarcasm detection, sentiment analysis, social media, sarcasm analysis

Procedia PDF Downloads 458
27923 Centrality and Patent Impact: Coupled Network Analysis of Artificial Intelligence Patents Based on Co-Cited Scientific Papers

Authors: Xingyu Gao, Qiang Wu, Yuanyuan Liu, Yue Yang

Abstract:

In the era of the knowledge economy, the relationship between scientific knowledge and patents has garnered significant attention. Understanding the intricate interplay between the foundations of science and technological innovation has emerged as a pivotal challenge for both researchers and policymakers. This study establishes a coupled network of artificial intelligence patents based on co-cited scientific papers. Leveraging centrality metrics from network analysis offers a fresh perspective on understanding the influence of information flow and knowledge sharing within the network on patent impact. The study initially obtained patent numbers for 446,890 granted US AI patents from the United States Patent and Trademark Office’s artificial intelligence patent database for the years 2002-2020. Subsequently, specific information regarding these patents was acquired using the Lens patent retrieval platform. Additionally, a search and deduplication process was performed on scientific non-patent references (SNPRs) using the Web of Science database, resulting in the selection of 184,603 patents that cited 37,467 unique SNPRs. Finally, this study constructs a coupled network comprising 59,379 artificial intelligence patents by utilizing scientific papers co-cited in patent backward citations. In this network, nodes represent patents, and if patents reference the same scientific papers, connections are established between them, serving as edges within the network. Nodes and edges collectively constitute the patent coupling network. Structural characteristics such as node degree centrality, betweenness centrality, and closeness centrality are employed to assess the scientific connections between patents, while citation count is utilized as a quantitative metric for patent influence. Finally, a negative binomial model is employed to test the nonlinear relationship between these network structural features and patent influence. 
The research findings indicate that network structural features such as node degree centrality, betweenness centrality, and closeness centrality exhibit inverted U-shaped relationships with patent influence. Specifically, as these centrality metrics increase, patent influence initially shows an upward trend, but once these features reach a certain threshold, patent influence starts to decline. This discovery suggests that moderate network centrality is beneficial for enhancing patent influence, while excessively high centrality may have a detrimental effect on patent influence. This finding offers crucial insights for policymakers, emphasizing the importance of encouraging moderate knowledge flow and sharing to promote innovation when formulating technology policies. It suggests that in certain situations, data sharing and integration can contribute to innovation. Consequently, policymakers can take measures to promote data-sharing policies, such as open data initiatives, to facilitate the flow of knowledge and the generation of innovation. Additionally, governments and relevant agencies can achieve broader knowledge dissemination by supporting collaborative research projects, adjusting intellectual property policies to enhance flexibility, or nurturing technology entrepreneurship ecosystems.
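The construction of the coupling network described above can be illustrated with a minimal sketch. The patent IDs and paper names below are hypothetical toy data, and the helper names `coupling_edges` and `degree_centrality` are illustrative rather than taken from the study. Two patents are linked when they cite at least one scientific paper in common, and degree centrality is the node degree divided by the largest possible degree.

```python
from itertools import combinations

# Hypothetical toy data: patent ID -> set of cited scientific papers (SNPRs).
citations = {
    "P1": {"paper_a", "paper_b"},
    "P2": {"paper_b", "paper_c"},
    "P3": {"paper_c"},
    "P4": {"paper_d"},
}


def coupling_edges(citations):
    """Link two patents whenever they cite at least one paper in common."""
    edges = set()
    for p, q in combinations(sorted(citations), 2):
        if citations[p] & citations[q]:
            edges.add((p, q))
    return edges


def degree_centrality(nodes, edges):
    """Node degree divided by the maximum possible degree (n - 1)."""
    degree = {n: 0 for n in nodes}
    for p, q in edges:
        degree[p] += 1
        degree[q] += 1
    denom = max(len(degree) - 1, 1)
    return {n: d / denom for n, d in degree.items()}


edges = coupling_edges(citations)
centrality = degree_centrality(citations.keys(), edges)
```

In this toy network, P2 shares a paper with both P1 and P3 and therefore has the highest degree centrality, while P4 is isolated. Betweenness and closeness centrality, and the regression on citation counts, are outside the scope of this sketch.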

Keywords: centrality, patent coupling network, patent influence, social network analysis

Procedia PDF Downloads 54
27922 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. Vast amounts of data are generated on social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of Big Data technology into the tourism industry will allow companies to determine where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centers or hotels, and tourists can get a clear idea of places before visiting. From a technical perspective, we apply natural language processing to analyse the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The review sentences were first classified with the VADER and RoBERTa models to get the polarity of the reviews. In this paper, we have studied feature extraction methods, such as Count Vectorization and TF-IDF Vectorization, and implemented a Convolutional Neural Network (CNN) classifier for sentiment analysis to decide whether the tourist’s attitude towards a destination is positive, negative, or simply neutral based on the review text posted online. The results demonstrated that, after pre-processing and cleaning the dataset, the CNN algorithm achieved an accuracy of 96.12% for the positive and negative sentiment analysis.
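As an illustration of the TF-IDF weighting mentioned above, the following minimal sketch computes the plain textbook variant (term frequency times the logarithm of the inverse document frequency). The review texts are invented, the function name `tf_idf` is hypothetical, and the authors' actual pipeline may well use a library implementation with a smoothed formula.

```python
import math
from collections import Counter

# Hypothetical toy reviews, already lower-cased and cleaned.
reviews = [
    "great beach great food",
    "terrible hotel food",
    "great hotel",
]


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [d.split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(reviews)
```

Terms that occur in only one review, such as "beach", receive a higher inverse document frequency than terms shared across reviews, such as "food", which is why the weighting helps highlight distinctive words before classification.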

Keywords: count vectorization, convolutional neural network, crawler, data technology, long short-term memory, web scraping, sentiment analysis

Procedia PDF Downloads 88
27921 The Potential of Sentiment Analysis to Categorize Social Media Comments Using German Libraries

Authors: Felix Boehnisch, Alexander Lutz

Abstract:

Based on the number of users and the amount of content posted daily, Facebook is considered the largest social network in the world. This content includes images or text posts from companies but also private persons, which are also commented on by other users. However, it can sometimes be difficult for companies to keep track of all the posts and the reactions to them, especially when there are several posts a day that contain hundreds to thousands of comments. To facilitate this, the following paper deals with the possible applications of sentiment analysis to social media comments in order to support the work in social media marketing. In a first step, post comments were divided into positive and negative by a subjective rating; then the same comments were checked for their polarity value by the two German Python libraries TextBlobDE and SentiWS and grouped into positive, negative, or neutral. As a control, the subjective classifications were compared with the machine-generated ones by a confusion matrix, and relevant quality criteria were determined. The accuracy of both libraries was not very convincing, at 60% to 66%. However, many words or sentences were not evaluated at all, so there seems to be room for optimization to possibly get more accurate results. In future studies, the use of these specific German libraries can be optimized to gain better insights by either applying them to more strictly cleaned data or by adding a sentiment value to emojis, which were removed from the comments in advance because they are not contained in the libraries.
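The comparison of subjective and machine-generated labels can be sketched as follows. The label lists are invented, and the helper names are hypothetical; the sketch only shows how a confusion matrix and an accuracy value are obtained from two aligned label sequences, not how TextBlobDE or SentiWS produce their polarity values.

```python
from collections import Counter

# Hypothetical aligned labels for six comments.
human = ["pos", "neg", "pos", "neg", "pos", "neg"]      # subjective rating
library = ["pos", "neg", "neg", "neg", "pos", "pos"]    # machine-generated


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of comments where both labels agree."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


matrix = confusion_matrix(human, library)
acc = accuracy(human, library)
```

Here four of the six labels agree, so the accuracy is about 67%. Other quality criteria such as precision and recall can be read off the same matrix.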

Keywords: Facebook, German libraries, polarity, sentiment analysis, social media comments

Procedia PDF Downloads 182
27920 Second Time’s a Charm: The Intervention of the European Patent Office on the Strategic Use of Divisional Applications

Authors: Alissa Lefebre

Abstract:

It might seem intuitive to hope for a fast decision on the patent grant. After all, a granted patent provides you with a monopoly position, which allows you to obstruct others from using your technology. However, this does not take into account the strategic advantages one can obtain from keeping patent applications pending. First, you have the financial advantage of postponing certain fees, although many applicants would probably agree that this is not the main benefit. As the scope of the patent protection is only decided upon at the grant, the pendency period introduces uncertainty amongst rivals. This uncertainty entails not knowing whether the patent will actually get granted and what the scope of protection will be. Consequently, rivals can only depend upon limited and uncertain information when deciding what technology is worth pursuing. One way to keep patent applications pending is the use of divisional applications. These applications can be filed out of a parent application as long as that parent application is still pending. This allows the applicant to pursue (part of) the content of the parent application in another application, as the divisional application cannot exceed the scope of the parent application. In a fast-moving and complex market such as tele- and digital communications, it might allow applicants to obtain an actual monopoly position, as competitors are discouraged from pursuing a certain technology. Nevertheless, this practice also has downsides. First of all, it has an impact on the workload of the examiners at the patent office. As the number of patent filings has been increasing over the last decades, using strategies that increase this number even more is not desirable from the patent examiners’ point of view. Secondly, a pending patent does not provide you with the protection of a granted patent, thus creating uncertainty not only for the rivals but also for the applicant.
Consequently, the European Patent Office (EPO) has come up with a “raising the bar” initiative in which it has decided to tackle the strategic use of divisional applications. Over the past years, two rules have been implemented. The first rule, in 2010, introduced a time limit under which divisional applications could only be filed within 24 months of the first communication with the patent office. However, after carrying out a user feedback survey, the EPO abolished the rule again in 2014 and replaced it with a fee mechanism. The fee mechanism is still in place today, which might be an indication of a better result compared to the first rule change. This study tests the impact of these rules on the strategic use of divisional applications in the tele- and digital communication industry and provides empirical evidence on their success. Using three different survival models, we find overall evidence that divisional applications prolong the pendency time and that only the second rule is able to tackle the strategic patenting and thus decrease the pendency time.

Keywords: divisional applications, regulatory changes, strategic patenting, EPO

Procedia PDF Downloads 128
27919 Forum Shopping in Biotechnology Law: Understanding Conflict of Laws in Protecting GMO-Based Inventions as Part of a Patent Portfolio in the Greater China Region

Authors: Eugene C. Lim

Abstract:

This paper seeks to examine the extent to which ‘forum shopping’ is available to patent filers seeking protection of GMO (genetically modified organisms)-based inventions in Hong Kong. Under Hong Kong’s current re-registration system for standard patents, an inventor must first seek patent protection from one of three Designated Patent Offices (DPO) – those of the People’s Republic of China (PRC), the European Union (EU) (designating the UK), or the United Kingdom (UK). The ‘designated patent’ can then be re-registered by the successful patentee in Hong Kong. Interestingly, however, the EU and the PRC do not adopt a harmonized approach toward the patenting of GMOs, and there are discrepancies in their interpretation of the phrase ‘animal or plant variety’. In view of these divergences, the ability to effectively manage ‘conflict of law’ issues is an important priority for multinational biotechnology firms with a patent portfolio in the Greater China region. Generally speaking, both the EU and the PRC exclude ‘animal and plant varieties’ from the scope of patentable subject matter. However, in the EU, Article 4(2) of the Biotechnology Directive allows a genetically modified plant or animal to be patented if its ‘technical feasibility is not limited to a specific variety’. This principle has allowed for certain ‘transgenic’ mammals, such as the ‘Harvard Oncomouse’, to be the subject of a successful patent grant in the EU. There is no corresponding provision on ‘technical feasibility’ in the patent legislation of the PRC. Although the PRC has a sui generis system for protecting plant varieties, its patent legislation allows the patenting of non-biological methods for producing transgenic organisms, not the ‘organisms’ themselves.
This might lead to a situation where an inventor can obtain patent protection in Hong Kong over transgenic life forms through the re-registration of a patent from a more ‘biotech-friendly’ DPO, even though the subject matter in question might not be patentable per se in the PRC. Through a comparative doctrinal analysis of legislative provisions, cases and court interpretations, this paper argues that differences in the protection afforded to GMOs do not generally prejudice the ability of global MNCs to obtain patent protection in Hong Kong. Corporations which are able to first obtain patents for GMO-based inventions in Europe can generally use their European patent as the basis for re-registration in Hong Kong, even if such protection might not be available in the PRC itself. However, the more restrictive approach to GMO-based patents adopted in the PRC would be more acutely felt by enterprises and inventors based in mainland China. The broader scope of protection offered to GMO-based patents in Europe might not be available in Hong Kong to mainland Chinese patentees under the current re-registration model for standard patents, unless they have the resources to apply for patent protection as well from another (European) DPO as the basis for re-registration.

Keywords: biotechnology, forum shopping, genetically modified organisms (GMOs), greater China region, patent portfolio

Procedia PDF Downloads 327
27918 Automatic Lead Qualification with Opinion Mining in Customer Relationship Management Projects

Authors: Victor Radich, Tania Basso, Regina Moraes

Abstract:

Lead qualification is one of the main procedures in Customer Relationship Management (CRM) projects. Its main goal is to identify potential consumers who have the ideal characteristics to establish a profitable and long-term relationship with a certain organization. Social networks can be an important source of data for identifying and qualifying leads since interest in specific products or services can be identified from the users’ expressed feelings of (dis)satisfaction. In this context, this work proposes the use of machine learning techniques and sentiment analysis as an extra step in the lead qualification process in order to improve it. In addition to machine learning models, sentiment analysis or opinion mining can be used to understand the evaluation that the user makes of a particular service, product, or brand. The results obtained so far have shown that it is possible to extract data from social networks and combine the techniques for a more complete classification.

Keywords: lead qualification, sentiment analysis, opinion mining, machine learning, CRM, lead scoring

Procedia PDF Downloads 85
27917 Tweets to Touchdowns: Predicting National Football League Achievement from Social Media Optimism

Authors: Rohan Erasala, Ian McCulloh

Abstract:

The NFL Draft is a chance for every NFL team to select their next superstar. As a result, teams heavily invest in scouting, and millions of fans partake in the online discourse surrounding the draft. This paper investigates whether positive sentiment in individual draft selection threads from the subreddit r/NFL correlates with player achievement and whether this data can be used to make successful player recommendations. It is hypothesized that there will be limited correlations and nonviable recommendations made from these threads. The hypothesis is tested using sentiment analysis of draft thread comments and analyzing correlation and precision at k of top scores. The results indicate weak correlations between the percentage of positive comments in a draft selection thread and a player’s approximate value, but potentially viable recommendations from looking at players whose draft selection threads have the highest percentage of positive comments.
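Precision at k, the recommendation metric named above, can be sketched in a few lines. The player names, positive-comment shares and the set of "successful" players are all invented for illustration; how the study defines success from approximate value is not stated in the abstract.

```python
def precision_at_k(ranked_items, relevant, k):
    """Share of the top-k ranked items that are in the relevant set."""
    top_k = ranked_items[:k]
    return sum(item in relevant for item in top_k) / k


# Hypothetical share of positive comments per draft selection thread.
positive_share = {
    "player_a": 0.82,
    "player_b": 0.75,
    "player_c": 0.64,
    "player_d": 0.40,
}
# Hypothetical set of players later judged successful.
successful = {"player_a", "player_c"}

# Rank players from most to least positive thread.
ranked = sorted(positive_share, key=positive_share.get, reverse=True)
p_at_2 = precision_at_k(ranked, successful, 2)
p_at_3 = precision_at_k(ranked, successful, 3)
```

With this toy data, one of the top two and two of the top three recommended players are successful, so a ranking can yield usable recommendations at the top even when the overall correlation is weak.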

Keywords: national football league, NFL, NFL Draft, sentiment analysis, Reddit, social media, NLP

Procedia PDF Downloads 85
27916 Estimating Knowledge Flow Patterns of Business Method Patents with a Hidden Markov Model

Authors: Yoonjung An, Yongtae Park

Abstract:

Knowledge flows are a critical source of faster technological progress and stronger economic growth. Knowledge flows have been accelerated dramatically with the establishment of a patent system in which each patent is required by law to disclose sufficient technical information for the invention to be recreated. Patent analysis, thus, has been widely used to help investigate technological knowledge flows. However, the existing research is limited in terms of both subject and approach. In particular, most previous studies did not cover business method (BM) patents although, like other patents, they are important drivers of knowledge flows. In addition, these studies usually focus on the static analysis of knowledge flows. Some use approaches that incorporate the time dimension, yet they still fail to trace a true dynamic process of knowledge flows. Therefore, we investigate dynamic patterns of knowledge flows driven by BM patents using a Hidden Markov Model (HMM). An HMM is a popular statistical tool for modeling a wide range of time series data, with no general theoretical limit in regard to statistical pattern classification. Accordingly, it enables characterizing knowledge patterns that may differ by patent, sector, country and so on. We run the model on sets of backward citations and forward citations to compare the patterns of knowledge utilization and knowledge dissemination.
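To make the HMM idea concrete, the sketch below runs the standard forward algorithm, which computes the likelihood of an observed sequence under a given model. The two hidden states, the "low"/"high" citation-activity observations and all probabilities are invented for illustration; the abstract does not specify the states or parameters used in the study.

```python
# Hypothetical two-state HMM over yearly citation activity ("low" / "high").
states = ["absorbing", "diffusing"]
start = {"absorbing": 0.6, "diffusing": 0.4}
trans = {
    "absorbing": {"absorbing": 0.7, "diffusing": 0.3},
    "diffusing": {"absorbing": 0.2, "diffusing": 0.8},
}
emit = {
    "absorbing": {"low": 0.8, "high": 0.2},
    "diffusing": {"low": 0.3, "high": 0.7},
}


def forward(observations):
    """Likelihood of the observation sequence, summed over all state paths."""
    # Initialisation: start probability times emission of the first symbol.
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    # Recursion: propagate through the transition matrix, then emit.
    for obs in observations[1:]:
        alpha = {
            s: emit[s][obs] * sum(alpha[prev] * trans[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


likelihood = forward(["low", "high", "high"])
```

Comparing such likelihoods across models fitted to different groups of patents is one way sequences could be assigned to patterns, although the study's exact procedure may differ.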

Keywords: business method patents, dynamic pattern, Hidden Markov Model, knowledge flow

Procedia PDF Downloads 328
27915 The Fefe Indices: The Direction of Donald Trump’s Tweets Effect on the Stock Market

Authors: Sergio Andres Rojas, Julian Benavides Franco, Juan Tomas Sayago

Abstract:

A growing body of research demonstrates how market mood affects financial markets, but the primary goal of those studies is to show how Trump's tweets impacted US interest rate volatility. Following that lead, this work evaluates the effect that Trump's tweets had during his presidency on local and international stock markets, considering not just volatility but the direction of the movement. Three indexes for Trump's tweets were created relating his activity with movements in the S&P500 using natural language analysis and machine learning algorithms. The indexes consider Trump's tweet activity and the positive or negative market sentiment they might inspire. The first explores the relationship between tweets generating negative movements in the S&P500; the second explores positive movements, while the third explores the difference between up and down movements. A pseudo-investment strategy using the indexes produced statistically significant above-average abnormal returns. The findings also showed that the pseudo strategy generated a higher return in the local market if applied to intraday data. However, only a negative market sentiment caused this effect on daily data. These results suggest that the market reacted primarily to a negative idea reflected in the negative index. In the international market, it is not possible to identify a pervasive effect. A rolling window regression model was also estimated. The result shows that the impact on the local and international markets is heterogeneous, time-varying, and differs by market sentiment. However, the negative sentiment was more likely to show a significant correlation most of the time.

Keywords: market sentiment, Twitter market sentiment, machine learning, natural language analysis

Procedia PDF Downloads 64
27914 Framework for the Assessment of National Systems of Innovation in Biotechnology

Authors: Andrea Schiffauerova, Amnah Alzeyoudi

Abstract:

This paper studies patterns of innovation within the national institutional context. Its objective is to examine national systems of innovation in biotechnology in six leading innovative countries: the US, Japan, Germany, the UK, France and Canada. The framework proposed for this purpose consists of specific factors considered critical for the development of national systems of innovation, which are industry size, innovative activities, area of specialization, industry structure, national policy, the level of government intervention, the stock of knowledge in universities and industries, knowledge transfer from universities to industry and country-specific conditions for start-ups. The paper then uses the framework to provide detailed cross-country comparisons while highlighting particular features of the national institutional context which affect the creation and diffusion of scientific knowledge within the system. The study is primarily based on an extensive survey of literature and is complemented by a quantitative analysis of patent data extracted from the United States Patent and Trademark Office (USPTO). The empirical analysis provides numerous insights and greatly complements the data gained from the literature and other sources. The final cross-country comparative analysis identifies three patterns followed by the national innovation systems in the six countries. The proposed cross-country relative positioning analysis may help in drawing policy implications and strategies leading to the enhancement of national competitive advantage and innovation capabilities of nations.

Keywords: comparative analysis, framework, national systems of innovation, patent analysis, United States Patent and Trademark Office (USPTO)

Procedia PDF Downloads 313
27913 Real Time Classification of Political Tendency of Twitter Spanish Users based on Sentiment Analysis

Authors: Marc Solé, Francesc Giné, Magda Valls, Nina Bijedic

Abstract:

What people say on social media has turned into a rich source of information to understand social behavior. Specifically, the growing use of Twitter for political communication has created significant opportunities to learn the opinions of large numbers of politically active individuals in real time and to predict the global political tendencies of a specific country. This has led to an increasing body of research on the topic. The majority of these studies have focused on polarized political contexts characterized by only two alternatives. Unlike them, this paper tackles the challenge of forecasting Spanish political trends, characterized by multiple political parties, by analyzing the political tendency of Twitter users. To this end, a new strategy, named Tweets Analysis Strategy (TAS), is proposed. It is based on analyzing users’ tweets by discovering their sentiment (positive, negative or neutral) and classifying them according to the political party they support. From this individual political tendency, the global political prediction for each political party is calculated. To do this, two different strategies for performing the sentiment analysis are proposed: one is based on Positive and Negative words Matching (PNM) and the second one is based on a Neural Networks Strategy (NNS). The complete TAS strategy has been implemented in a Big Data environment. The experimental results presented in this paper reveal that the NNS strategy performs much better than the PNM strategy at analyzing tweet sentiment. In addition, this research analyzes the viability of the TAS strategy for obtaining the global trend in a political context made up of multiple parties with an error lower than 23%.
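A word-matching approach in the spirit of PNM can be sketched as below. The word lists are tiny English placeholders (the study works with Spanish tweets), and the function name `pnm_sentiment` and the scoring rule are assumptions made for illustration, not the authors' actual lexicon or implementation.

```python
# Hypothetical miniature lexicons; a real system would use far larger lists.
POSITIVE = {"good", "great", "support", "win"}
NEGATIVE = {"bad", "corrupt", "fail", "lose"}


def pnm_sentiment(tweet):
    """Label a tweet by counting matches against the two word lists."""
    # Lower-case and strip common punctuation from each token.
    words = [w.strip(".,!?;:") for w in tweet.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [
    pnm_sentiment("Great debate, we will win!"),
    pnm_sentiment("They fail and lose again"),
    pnm_sentiment("Voting opens at nine"),
]
```

Such matching ignores negation, irony and context, which is consistent with the abstract's finding that a neural network strategy analyzes tweet sentiment much better.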

Keywords: political tendency, prediction, sentiment analysis, Twitter

Procedia PDF Downloads 238
27912 Methodologies for Deriving Semantic Technical Information Using an Unstructured Patent Text Data

Authors: Jaehyung An, Sungjoo Lee

Abstract:

Patent documents constitute an up-to-date and reliable source of knowledge reflecting technological advances, so patent analysis has been widely used for the identification of technological trends and the formulation of technology strategies. However, identifying technological information from patent data entails limitations such as high cost, complexity, and inconsistency because it relies on expert knowledge. To overcome these limitations, researchers have applied quantitative analysis based on the keyword technique. This method can capture the technological implications of patent documents or extract keywords that indicate their important contents. However, it relies only on simple counting of keyword frequency, so it cannot take into account the semantic relationships among the keywords or semantic information such as how technologies are used in their technology area and how they affect other technologies. To automatically analyze unstructured technological information in patents and extract the semantic information, the text should be transformed into an abstracted form that includes the key technological concepts. The sentence structure ‘SAO’ (subject, action, object) has recently emerged as a way of representing such ‘key concepts’ and can be extracted by NLP (natural language processing). An SAO structure can be organized in a problem-solution format if the action-object (AO) states the problem and the subject (S) forms the solution. In this paper, we propose a new methodology that can extract the SAO structure through rules for extracting technical elements. Although sentence structures in patent text have a unique format, prior studies have depended on general NLP tools applied to common documents such as newspapers, research papers, and Twitter mentions, so they cannot take into account the specific sentence structure types of patent documents.
To overcome this limitation, we identified the unique form of patent sentences and defined the SAO structures in patent text data. There are four types of technical elements: technology adoption purpose, application area, tool for technology, and technical components. Each of these four types of sentence structure has its own specific word structure, determined by the location or sequence of the parts of speech in each sentence. Finally, we developed algorithms for extracting SAOs, and the results offer insight into the technology innovation process by providing different perspectives on technology.
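The idea of rule-based SAO extraction from part-of-speech sequences can be illustrated with a deliberately simple sketch. It assumes the sentence is already tagged (the tags and the example sentence are invented), and it applies a single naive rule: the nearest noun before the first verb is the subject and the first noun after it is the object. The patent-specific rules developed in the paper are not given in the abstract and are certainly richer than this.

```python
def extract_sao(tagged):
    """Return (subject, action, object) from a list of (word, pos) pairs.

    Naive rule: nearest NOUN before the first VERB is the subject,
    first NOUN after that verb is the object. Returns None if no match.
    """
    for i, (word, pos) in enumerate(tagged):
        if pos == "VERB":
            subject = next((w for w, p in reversed(tagged[:i]) if p == "NOUN"), None)
            obj = next((w for w, p in tagged[i + 1:] if p == "NOUN"), None)
            if subject and obj:
                return (subject, word, obj)
    return None


# Hypothetical pre-tagged sentence: "the sensor detects the leakage".
sentence = [
    ("the", "DET"),
    ("sensor", "NOUN"),
    ("detects", "VERB"),
    ("the", "DET"),
    ("leakage", "NOUN"),
]
sao = extract_sao(sentence)

# Problem-solution reading: the action-object states the problem,
# the subject forms the solution.
solution, problem = sao[0], f"{sao[1]} {sao[2]}"
```

In the problem-solution reading described above, "sensor" would be the solution and "detects leakage" the problem it addresses.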

Keywords: NLP, patent analysis, SAO, semantic-analysis

Procedia PDF Downloads 262
27911 Organizational Innovations of the 20th Century as High Tech of the 21st: Evidence from Patent Data

Authors: Valery Yakubovich, Shuping Wu

Abstract:

Organization theorists have long claimed that organizational innovations are nontechnological, in part because they are unpatentable. The claim rests on the assumption that organizational innovations are abstract ideas embodied in persons and contexts rather than in context-free practical tools. However, over the last three decades, organizational knowledge has been increasingly embodied in digital tools which, in principle, can be patented. To provide the first empirical evidence regarding the patentability of organizational innovations, we trained two machine learning algorithms to identify a population of 205,434 patent applications for organizational technologies (OrgTech) and, among them, 141,285 applications that use organizational innovations accumulated over the 20th century. Our event history analysis of the probability of patenting an OrgTech invention shows that ideas from organizational innovations decrease the probability of patent allowance unless they describe a practical tool. We conclude that the present-day digital transformation places organizational innovations in the realm of high tech and turns the debate about organizational technologies into the challenge of designing practical organizational tools that embody big ideas about organizing. We outline an agenda for patent-based research on OrgTech as an emerging phenomenon.

Keywords: organizational innovation, organizational technology, high tech, patents, machine learning

Procedia PDF Downloads 122
27910 Fuzzy Sentiment Analysis of Customer Product Reviews

Authors: Samaneh Nadali, Masrah Azrifah Azmi Murad

Abstract:

As a result of the growth of the web, people are able to express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in internet forums, discussion groups, and blogs. Therefore, the number of product reviews has grown rapidly. The large number of reviews makes it difficult for manufacturers or businesses to automatically classify them into different semantic orientations (positive, negative, and neutral). For sentiment classification, most existing methods utilize a list of opinion words, whereas this paper proposes a fuzzy approach for evaluating sentiments expressed in customer product reviews. The approach predicts the strength levels (e.g. very weak, weak, moderate, strong and very strong) of customer product reviews using combinations of adjectives, adverbs and verbs. The proposed fuzzy approach has been tested on eight benchmark datasets and obtained 74% accuracy, which can give organizations a clearer understanding of customer behavior in support of the business planning process.

Keywords: fuzzy logic, customer product review, sentiment analysis

Procedia PDF Downloads 363
27909 Information Disclosure And Financial Sentiment Index Using a Machine Learning Approach

Authors: Alev Atak

Abstract:

In this paper, we aim to create a financial sentiment index by investigating the company’s voluntary information disclosures. We retrieve structured content from BIST 100 companies’ financial reports for the period 1998-2018 and extract relevant financial information for sentiment analysis through Natural Language Processing. We measure strategy-related disclosures and their cross-sectional variation and classify report content into generic sections using synonym lists divided into four main categories according to their liquidity risk profile, risk positions, intra-annual information, and exposure to risk. We use Word Error Rate and Cosine Similarity for comparing and measuring text similarity and derivation in sets of texts. In addition to performing text extraction, we will provide a range of text analysis options, such as readability metrics, word counts using pre-determined lists (e.g., forward-looking, uncertainty, tone, etc.), and comparison with a reference corpus (word, parts of speech and semantic level). Therefore, we create an adequate analytical tool and a financial dictionary to depict the importance of granular financial disclosure for investors to correctly identify risk-taking behavior and hence make the aggregated effects traceable.
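The two text-comparison measures named in this abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption: tokens are whitespace-separated words, Word Error Rate is the word-level edit distance divided by the reference length, and cosine similarity is computed on raw word counts. The function names are hypothetical.

```python
import math
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

A lower Word Error Rate and a higher cosine similarity both indicate that two report sections are closer in wording. A production pipeline would likely add proper tokenization and term weighting.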

Keywords: financial sentiment, machine learning, information disclosure, risk

Procedia PDF Downloads 94
27908 Sunspot Cycles: Illuminating Humanity's Mysteries

Authors: Aghamusa Azizov

Abstract:

This study investigates the correlation between solar activity and sentiment in news media coverage, using a large-scale dataset of solar activity since 1750 and over 15 million articles from "The New York Times" dating from 1851 onwards. Employing Pearson's correlation coefficient and multiple Natural Language Processing (NLP) tools (TextBlob, VADER, and DistilBERT), the research examines the extent to which fluctuations in solar phenomena are reflected in the sentiment of historical news narratives. The findings reveal that the correlation between solar activity and media sentiment is generally negligible, suggesting at most a weak influence of solar patterns on the portrayal of events in news media. Notably, a moderate positive correlation was observed between the sentiments derived from TextBlob and VADER, indicating consistency across NLP tools. The analysis provides insights into the historical impact of solar activity on human affairs and highlights the importance of using multiple analytical methods to understand complex relationships in large datasets. The study contributes to the broader understanding of how extraterrestrial factors may intersect with media-reported events and underlines the intricate nature of interdisciplinary research in the data science and historical domains.
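Pearson's correlation coefficient, the statistic this abstract relies on, can be computed as in the following minimal sketch. The function name and the two input series are hypothetical; the abstract does not say how the study aligned or aggregated its solar activity and sentiment data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Values near 0 correspond to the negligible relationship the abstract reports between solar activity and sentiment, while values near 1 or -1 indicate a strong linear relationship.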

Keywords: solar activity correlation, media sentiment analysis, natural language processing, historical event patterns

Procedia PDF Downloads 77
27907 Network and Sentiment Analysis of U.S. Congressional Tweets

Authors: Chaitanya Kanakamedala, Hansa Pradhan, Carter Gilbert

Abstract:

Social media platforms, such as Twitter, are excellent sources of data for understanding human interactions and sentiments. This report explores social dynamics among US Congressional members through a network analysis applied to a dataset of tweets spanning 2008 to 2017 from the ’US Congressional Tweets Dataset’. In this report, we perform network analysis where connections between users (edges) are established based on a similarity threshold: two users are connected if the tweets they post are similar. By utilizing the Natural Language Toolkit (NLTK) and NetworkX, we quantified tweet similarity and constructed a graph comprising various interconnected components. Each component represents a cluster of users with closely aligned content. We then perform sentiment analysis on each cluster to explore the prevalent emotions and opinions within these groups. Our findings reveal that despite the initial expectation of distinct ideological divisions typically aligning with party lines, the analysis exposed a high degree of topical convergence across tweets from different political affiliations. The analysis performed in this report not only highlights the potential of social media as a tool for political communication but also suggests a complex layer of interaction that transcends traditional partisan boundaries, reflecting a complicated landscape of politics in the digital age.
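The graph construction described in this abstract (an edge between two users when their tweets are similar enough, then one cluster per connected component) can be sketched as follows. The report itself uses NLTK and NetworkX. This sketch is a standard-library stand-in and rests on assumptions not taken from the source: Jaccard overlap of word sets as the similarity measure, a threshold of 0.5, and hypothetical function names.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct words that two token sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_clusters(tweets_by_user: dict[str, str],
                  threshold: float = 0.5) -> list[set[str]]:
    """Link users whose tweet text is similar; return connected components."""
    tokens = {user: set(text.lower().split())
              for user, text in tweets_by_user.items()}
    parent = {user: user for user in tokens}

    def find(user: str) -> str:
        # Union-find lookup with path halving.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for u, v in combinations(tokens, 2):
        if jaccard(tokens[u], tokens[v]) >= threshold:
            parent[find(u)] = find(v)  # an edge merges the two components

    clusters: dict[str, set[str]] = {}
    for user in tokens:
        clusters.setdefault(find(user), set()).add(user)
    return list(clusters.values())
```

Each returned set corresponds to one component of the graph. A sentiment score could then be computed per component, as the abstract describes.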

Keywords: natural language processing, sentiment analysis, centrality analysis, topic modeling

Procedia PDF Downloads 33
27906 Evidence of a Negativity Bias in the Keywords of Scientific Papers

Authors: Kseniia Zviagintseva, Brett Buttliere

Abstract:

Science is fundamentally a problem-solving enterprise, and scientists pay more attention to negative things that cause them dissonance and a negative affective state of uncertainty or contradiction. While this is agreed upon by philosophers of science, there are few empirical demonstrations. Here we examine the keywords from papers published by PLOS in 2014 and show with several sentiment analyzers that negative keywords are studied more than positive keywords. Our dataset is the 927,406 keywords of 32,870 scientific articles in all fields published in 2014 by the journal PLOS ONE (collected from Altmetric.com). Counting how often the 47,415 unique keywords are used, we can examine whether negative topics are studied more than positive ones. In order to find the sentiment of the keywords, we utilized two sentiment analysis tools, Hu and Liu (2004) and SentiStrength (2014). The results below are for Hu and Liu as these are the less convincing results. The average keyword was utilized 19.56 times, with half of the keywords being utilized only 1 time and the maximum number of uses being 18,589 times. The keywords identified as negative were utilized 37.39 times on average, compared with 14.72 times for the positive keywords and 19.29 times for the neutral keywords. This difference is only marginally significant, with an F value of 2.82 and a p of .05, but one must keep in mind that more than half of the keywords are utilized only 1 time, artificially increasing the variance and driving the effect size down. To examine this more closely, we looked at the top 25 most utilized keywords that have a sentiment. Among the top 25, there are only two positive words, ‘care’ and ‘dynamics’, in positions 5 and 13 respectively, with all the rest being identified as negative. ‘Diseases’ is the most studied keyword with 8,790 uses, with ‘cancer’ and ‘infectious’ being the second and fourth most utilized sentiment-laden keywords.
The sentiment analysis is not perfect though, as the words ‘diseases’ and ‘disease’ are split by taking 1st and 3rd positions. Combining them, they remain as the most common sentiment-laden keyword, being utilized 13,236 times. More than just splitting the words, the sentiment analyzer logs ‘regression’ and ‘rat’ as negative, and these should probably be considered false positives. Despite these potential problems, the effect is apparent, as even the positive keywords like ‘care’ could or should be considered negative, since this word is most commonly utilized as a part of ‘health care’, ‘critical care’ or ‘quality of care’ and generally associated with how to improve it. All in all, the results suggest that negative concepts are studied more, also providing support for the notion that science is most generally a problem-solving enterprise. The results also provide evidence that negativity and contradiction are related to greater productivity and positive outcomes.

Keywords: bibliometrics, keywords analysis, negativity bias, positive and negative words, scientific papers, scientometrics

Procedia PDF Downloads 186
27905 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has recently become an important business and research priority. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends from these data would aid understanding and decision-making. Multiple analysis techniques are deployed for English content. However, Arabic is among the languages that produce a large amount of data over social networks yet are least analyzed. This paper is a survey of the research efforts to analyze Arabic content on Twitter, focusing on the tools and methods used to extract sentiment from that content.

Keywords: big data, social networks, sentiment analysis, twitter

Procedia PDF Downloads 576
27904 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent-related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study the relation between science, technology, and innovation in various domains. A problem in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules, in combination with string similarity measures, are used to detect pairs of records that are potential duplicates. Because of the scoring, different rules can be combined to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that score above a certain threshold are clustered by means of a single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases such as the Web of Science or Scopus.
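The scoring-then-clustering idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the author's rule set: the field names, rule weights, title-similarity cut-off, and score threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whichever string similarity measures the method actually uses.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical rule weights; the paper derives its scores from expert knowledge.
WEIGHTS = {"year": 2, "journal": 2, "author": 3, "title": 4}


def pair_score(a: dict, b: dict) -> int:
    """Sum the weights of all rules that fire for a pair of parsed references."""
    score = 0
    for field in ("year", "journal", "author"):
        # A rule fires only when the field is present and equal in both records.
        if a.get(field) and a.get(field) == b.get(field):
            score += WEIGHTS[field]
    title_a, title_b = a.get("title", ""), b.get("title", "")
    if title_a and title_b:
        similarity = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
        if similarity >= 0.9:
            score += WEIGHTS["title"]
    return score


def cluster_references(records: list[dict], threshold: int = 7) -> list[list[int]]:
    """Single-linkage clustering: connected components of high-scoring pairs."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if pair_score(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because rules add up, a pair with a missing journal can still be joined when year, author, and title agree, while a pair with too little evidence stays in separate clusters. That is the precision-over-recall behavior the abstract describes.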

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 194
27903 Beyond Text: Unveiling the Emotional Landscape in Academic Writing

Authors: Songyun Chen

Abstract:

Recent scholarly attention to sentiment analysis has provided researchers with a deeper understanding of how emotions are conveyed in writing and leveraged by academic authors as a persuasive tool. Using the National Research Council (NRC) Sentiment Lexicons (version 1.0) created by the National Research Council Canada, this study examined specific emotions in research articles (RAs) across four disciplines (literature, education, biology, and computer & information science) based on four datasets totaling over three million tokens, aiming to reveal how emotions are conveyed by authors in academic writing. The results showed that four emotions (trust, anticipation, joy, and surprise) were observed in all four disciplines, while sadness was found solely in literature. With trust being overwhelmingly prominent, the remaining emotions varied significantly across disciplines. The findings contribute to our understanding of the emotion strategies applied in academic writing and the genre characteristics of RAs.

Keywords: sentiment analysis, specific emotions, emotional landscape, research articles, academic writing

Procedia PDF Downloads 28
27902 Intellectual Property Rights and Health Rights: A Feasible Reform Proposal to Facilitate Access to Drugs in Developing Countries

Authors: M. G. Cattaneo

Abstract:

The non-effectiveness of certain codified human rights is particularly apparent with reference to the lack of access to essential drugs in developing countries, which represents a breach of the human right to receive adequate health assistance. This paper underlines the conflict and the legal contradictions between human rights, namely health rights, international Intellectual Property Rights, in particular patent law, and international trade law. The paper discusses the crucial links between R&D costs for innovation, patents and new medical drugs, with the goal of reformulating the hierarchies of priorities and of interests at stake in the international intellectual property (IP) law system. Unlike what happens today, international patent law should be a legal instrument capable of rebalancing an axiological asymmetry between the (conflicting) needs at stake. The core argument of the paper is the proposal of an alternative pathway, namely a feasible proposal for a patent law reform. IP laws tend to balance the benefits deriving from innovation with the costs of the provided monopoly, but since developing countries and industrialized countries are in completely different political and economic situations, it is necessary to (re)modulate such exchange according to the different needs. Based on this critical analysis, the paper puts forward a proposal, called Trading Time for Space (TTS), whereby a longer period of patent exclusivity in western countries (Time) is offered to the patent holder company, in exchange for the latter selling the medical drug at cost price in developing countries (Space). Accordingly, pharmaceutical companies should sell drugs in developing countries at cost price, or alternatively grant a free license for sale in such countries, without any royalties or fees. However, such a social service shall be duly compensated.
Therefore, the consideration for such a service shall be an extension of the duration of the patent’s exclusivity in the country of origin, which will compensate for the reduced profits caused by supplying at cost price in developing countries.

Keywords: global health, global justice, patent law reform, access to drugs

Procedia PDF Downloads 246
27901 Innovation Trends in Latin America Countries

Authors: José Carlos Rodríguez, Mario Gómez

Abstract:

This paper analyses innovation trends in Latin American countries by means of the number of patent applications filed by residents and non-residents during the period 1965 to 2012. Making use of patent data released by the World Intellectual Property Organization (WIPO), we search for the presence of multiple structural changes in patent application series in Argentina, Brazil, Chile, and Mexico. These changes may suggest that firms’ innovative activity has been modified as a result of implementing a particular science, technology and innovation (STI) policy. Accordingly, the new regulations implemented in these countries during the 1980s and 1990s have influenced their intellectual property regimes. The question guiding this research is thus how STI policies in these countries have affected their innovation activity. The results achieved in this research confirm the existence of multiple structural changes in the series of patent applications resulting from STI policies implemented in these countries.

Keywords: econometric methods, innovation activity, Latin America countries, patents, science, technology and innovation policy

Procedia PDF Downloads 283
27900 Intellectual Property and SMEs in the Baltic Sea Region: A Comparative Study on the Use of the Utility Model Protection

Authors: Christina Wainikka, Besrat Tesfaye

Abstract:

Several of the countries in the Baltic Sea region are ranked high in international innovation rankings, such as the Global Innovation Index and the European Innovation Scoreboard. There are, however, some concerns about the performance of different countries. For example, there is a widespread notion of “The Swedish Paradox”. Sweden is ranked high due to investments in R&D and patent activity, but the outcome is not as high as could be expected. SMEs in Sweden are also below the EU average when it comes to registering intellectual property rights such as patents and trademarks. This study concentrates on utility model protection. This intellectual property right does not exist in Sweden, but it does in, for example, Finland and Germany. Utility model protection is sometimes referred to as a “patent light” since it is easier to obtain than patent protection but still covers technical solutions. Examining statistics on patent activity and utility model registrations makes it clear that utility model protection is scarcely used in the countries that offer it. In Germany, 10 577 applications were made in 2021. In Finland, there were 259 applications in 2021. This can be compared with patent applications, which numbered 58 568 in Germany and 1 662 in Finland in 2021. In Sweden, there has never been protection for utility models. The only protection for technical solutions is patents and business secrets. The threshold for obtaining a patent is high, due to the legal requirements and the costs. Patent protection is therefore often not chosen by SMEs in Sweden. This study examines whether the protection of utility models in other countries in the Baltic region provides SMEs in these countries with better options to protect their innovations. The legal methodology is comparative law. In order to study the effects of the legal differences, statistics are examined and interviews are conducted with SMEs from different industries.

Keywords: baltic sea region, comparative law, SME, utility model

Procedia PDF Downloads 114
27899 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision-making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, we decided to conduct a systematic review of previous literature on sentiment classification and AI-based techniques that have been used in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that can correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy by assessing different artificial intelligence techniques. We evaluated over 250 articles from digital sources like ScienceDirect, ACM, Google Scholar, and IEEE Xplore and narrowed the selection down to 31 studies. Findings revealed that deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in terms of accuracy. A large dataset is also necessary for developing a robust sentiment classifier and can be obtained from places like Twitter, movie reviews, Kaggle, SST, and SemEval Task4. Hybrid deep learning techniques like CNN+LSTM, CNN+GRU, and CNN+BERT outperformed single deep learning techniques and machine learning techniques. The Python programming language outperformed the Java programming language in terms of sentiment analyzer development due to its simplicity and AI-based library functionalities. Based on some of the important findings from this study, we made a recommendation for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 115
27898 Automatic Lexicon Generation for Domain Specific Dataset for Mining Public Opinion on China Pakistan Economic Corridor

Authors: Tayyaba Azim, Bibi Amina

Abstract:

The increase in the popularity of opinion mining with the rapid growth in the availability of social networks has created many opportunities for research in the various domains of Sentiment Analysis and Natural Language Processing (NLP) using Artificial Intelligence approaches. The latest trend allows the public to actively use the internet for analyzing an individual’s opinion and exploring the effectiveness of published facts. The main theme of this research is to account for public opinion on one of the most crucial and extensively discussed development projects, the China Pakistan Economic Corridor (CPEC), considered a game changer due to its promise of bringing economic prosperity to the region. So far, to the best of our knowledge, the theme of CPEC has not been analyzed for sentiment determination through the ML approach. This research aims to demonstrate the use of ML approaches to automatically analyze public sentiment in tweets about CPEC. A Support Vector Machine (SVM) is used for the classification task, classifying tweets into positive, negative and neutral classes. Word2vec and TF-IDF features are used with the SVM model, and the model trained on manually labelled tweets is compared with the one trained on the automatically generated lexicon. The contributions of this work are: development of a sentiment analysis system for public tweets on the subject of CPEC; automatic generation of a lexicon from public tweets on CPEC; and identification of different themes among tweets, with sentiments assigned to each theme. It is worth noting that the applications of web mining that empower e-democracy by improving political transparency and public participation in decision making via social media have not yet been explored and practised in Pakistan with respect to CPEC.
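Of the two feature types mentioned in this abstract, TF-IDF is simple enough to illustrate with a short sketch. The variant below is one common formulation (relative term frequency multiplied by the natural log of inverse document frequency). The abstract does not say which variant or library the authors used, and the function name is hypothetical. The SVM classifier itself is not shown.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

With this formulation a term that occurs in every tweet gets weight 0, so the vectors emphasize the words that distinguish one tweet from another. Vectors like these would then be passed to a classifier such as an SVM.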

Keywords: machine learning, natural language processing, sentiment analysis, support vector machine, Word2vec

Procedia PDF Downloads 148
27897 Sarcasm Recognition System Using Hybrid Tone-Word Spotting Audio Mining Technique

Authors: Sandhya Baskaran, Hari Kumar Nagabushanam

Abstract:

Sarcasm sentiment recognition is an area of natural language processing that has been actively explored in recent times. Even with the advancements in NLP, typical interpretations of words and sentences in their context fail to provide exact information on the sentiment or emotion of a user. For example, if something bad happens, the statement ‘That's just what I need, great! Terrific!’ is expressed in a sarcastic tone, which could be misread as a positive sign by any text-based analyzer. In this paper, we present a unique real-time ‘word with its tone’ spotting technique that provides sentiment analysis for the tone or pitch of a voice in combination with the words being expressed. This hybrid approach increases the probability of identifying special sentiments like sarcasm, bringing it much closer to the real world than mining text or speech individually. The system uses a tone analyzer such as YIN-FFT, which extracts pitch segment-wise and is used in parallel with a speech recognition system. The clustered data is classified for sentiments, and a sarcasm score is determined for each cluster. Our simulations demonstrate an improvement in F-measure of around 12% compared to existing detection techniques, with increased precision and recall.

Keywords: sarcasm recognition, tone-word spotting, natural language processing, pitch analyzer

Procedia PDF Downloads 293
27896 Emerging Technologies in European Aeronautics: How Collaborative Innovation Efforts Are Shaping the Industry

Authors: Nikola Radovanovic, Petros Gkotsis, Mathieu Doussineau

Abstract:

Aeronautics is regarded as a strategically important sector for European competitiveness. It was at the heart of European entrepreneurial development since the industry was born. Currently, the EU is the world leader in the production of civil aircraft, including helicopters, aircraft engines, parts, and components. It is recording a surplus in trade relating to aerospace products, which are exported all over the globe. Also, this industry shows above-average investments in research and development, as demonstrated in the patent activity in this area. The post-pandemic recovery of the industry will partly depend on the possibilities to streamline collaboration in further research and innovation activities. Aeronautics features as one of the often selected priority domains in smart specialisation, which represents the main regional and national approach in developing and implementing innovation policies in Europe. The basis for the selection of priority domains for smart specialisation lies in the mapping of innovative potential, with research and patent activities being among the key elements of this analysis. This research is aimed at identifying characteristics of the trends in research and patent activities in the regions and countries that base their competitiveness on the aeronautics sector. It is also aimed at determining the scope and patterns of collaborations in aeronautics between innovators from the European regions, focusing on revealing new technology areas that emerge from these collaborations. For this purpose, we developed a methodology based on desk research and the analysis of the PATSTAT patent database as well as the databases of R&I framework programmes.

Keywords: aeronautics, smart specialisation, innovation, research, regional policy

Procedia PDF Downloads 106