Search results for: corpus database
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1927

1717 Investigating Iraqi EFL University Students' Productive Knowledge of Grammatical Collocations in English

Authors: Adnan Z. Mkhelif

Abstract:

Grammatical collocations (GCs) are word combinations containing a preposition or a grammatical structure, such as an infinitive (e.g. smile at, interested in, easy to learn, etc.). Such collocations tend to be difficult for Iraqi EFL university students (IUSs) to master. To help address this problem, it is important to identify the factors causing it. This study aims at investigating the effects of L2 proficiency, frequency of GCs and their transparency on IUSs’ productive knowledge of GCs. The study involves 112 undergraduate participants with different proficiency levels, learning English in formal contexts in Iraq. The data collection instruments include (but are not limited to) a productive knowledge test (designed by the researcher using the British National Corpus (BNC)), as well as the grammar part of the Oxford Placement Test (OPT). The study findings have shown that all the above-mentioned factors have significant effects on IUSs’ productive knowledge of GCs. In addition to establishing evidence of which factors of L2 learning might be relevant to learning GCs, it is hoped that the findings of the present study will contribute to more effective methods of teaching that can better address and help overcome the problems IUSs encounter in learning GCs. The study is thus hoped to have significant theoretical and pedagogical implications for researchers, syllabus designers as well as teachers of English as a foreign/second language.

Keywords: corpus linguistics, frequency, grammatical collocations, L2 vocabulary learning, productive knowledge, proficiency, transparency

Procedia PDF Downloads 229
1716 Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition

Authors: A. Shoiynbek, K. Kozhakhmet, P. Menezes, D. Kuanyshbay, D. Bayazitov

Abstract:

Speech emotion recognition has received increasing research interest in recent years. Most research work has used emotional speech collected under controlled conditions, recorded by actors imitating and artificially producing emotions in front of a microphone. There are four issues related to that approach, namely: (1) the emotions are not natural, which means that machines are learning to recognize fake emotions; (2) the emotions are very limited in quantity and poor in their variety of speaking; (3) speech emotion recognition (SER) is language-dependent; (4) consequently, each time researchers want to start working on SER, they need to find a good emotional database in their language. In this paper, we propose an approach to creating an automatic tool for speech emotion extraction based on facial emotion recognition and describe the sequence of actions of the proposed approach. One of the first steps in this sequence is speech detection. The paper gives a detailed description of the speech detection model based on a fully connected deep neural network for the Kazakh and Russian languages. Despite the high results in speech detection for Kazakh and Russian, the described process is suitable for any language. To illustrate the working capacity of the developed model, we have performed an analysis of speech detection and extraction from real tasks.
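
As an illustration of the kind of frame-level classifier the abstract describes, the sketch below builds a fully connected speech/non-speech model over MFCC features in Keras; the layer sizes, context window, and placeholder data are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): a fully connected speech/non-speech
# classifier over per-frame MFCC features, using Keras. Layer sizes, optimizer,
# and the shape of the MFCC input are illustrative assumptions.
import numpy as np
import tensorflow as tf

N_MFCC = 13          # MFCC coefficients per frame (assumed)
CONTEXT = 5          # frames of left/right context stacked together (assumed)
INPUT_DIM = N_MFCC * (2 * CONTEXT + 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(INPUT_DIM,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # 1 = speech, 0 = non-speech
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# X: stacked MFCC context windows, y: frame-level speech/non-speech labels
X = np.random.rand(1000, INPUT_DIM).astype("float32")   # placeholder data
y = np.random.randint(0, 2, size=(1000,))
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.1)
```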

Keywords: deep neural networks, speech detection, speech emotion recognition, Mel-frequency cepstrum coefficients, collecting speech emotion corpus, collecting speech emotion dataset, Kazakh speech dataset

Procedia PDF Downloads 74
1715 Complex Technology of Virtual Reconstruction: The Case of Kazan Imperial University of XIX-Early XX Centuries

Authors: L. K. Karimova, K. I. Shariukova, A. A. Kirpichnikova, E. A. Razuvalova

Abstract:

This article deals with the technology of virtual reconstruction of Kazan Imperial University of the XIX to early XX centuries. The paper describes technologies for 3D visualization of high-resolution models of objects in the university space, the creation of a multi-agent system and an organized database of historical sources connected with these objects, and options for using immersion technologies in the virtual environment.

Keywords: 3D-reconstruction, multi-agent system, database, university space, virtual reconstruction, virtual heritage

Procedia PDF Downloads 242
1714 Verbal Prefix Selection in Old Japanese: A Corpus-Based Study

Authors: Zixi You

Abstract:

There are a number of verbal prefixes in Old Japanese. However, the selection or compatibility of verbs and verbal prefixes is among the least investigated topics in the Old Japanese language. Unlike other types of prefixes, verbal prefixes are more often than not listed in dictionaries with very brief information such as ‘unknown meaning’ or ‘rhythmic function only’. To fill in a part of this knowledge gap, this paper presents an exhaustive investigation based on the newly developed ‘Oxford Corpus of Old Japanese’ (OCOJ), which includes nearly all existing resources of the Old Japanese language, with detailed linguistic information in TEI-XML tags. In this paper, we propose the possibility that the following three prefixes, i-, sa-, ta- (with ta- considered a variant of sa-), are relevant to split intransitivity in Old Japanese, with evidence that unaccusative verbs favor i- and that unergative verbs favor sa- (ta-). This might seem to be undermined by the fact that transitives are also found to follow i-. However, with several manifestations of split intransitivity in Old Japanese discussed, the behavior of transitives in verbal prefix selection is no longer as surprising as it may seem when one looks at the selection of verbal prefixes in isolation. It is possible that there are one or more features that play essential roles in determining the selection of i-, and the attested transitive verbs happen to have these features. The data suggest that this feature is a sense of ‘change’ of location or state involved in the event denoted by the verb, which is a feature of typical unaccusatives. This is further discussed in terms of the ‘affectedness’ hierarchy. The presentation of this paper, which includes a brief demonstration of the OCOJ, is expected to be of interest to both specialists and general audiences.

Keywords: old Japanese, split intransitivity, unaccusatives, unergatives, verbal prefix selection

Procedia PDF Downloads 388
1713 A Corpus-Based Study of Evaluative Language in Leading Articles in British Broadsheet and Tabloid Newspapers

Authors: Fatimah AlSaiari

Abstract:

In recent years, newspapers in the United Kingdom have no longer been just a means of sharing news about what happens in the world; they are also used to influence target readers by making them more up-to-date, well-informed, entertained, exasperated, delighted, and infuriated. To achieve these objectives and maintain influence on public opinion, journalists use a particular language in which they can convey emotions and opinions, organize their discourse, and establish solidarity with their audience. This type of language has been widely analyzed under different labels, such as evaluation, appraisal, and stance. There is a considerable amount of linguistic and non-linguistic research devoted to analyzing this type of interpersonal language in journalistic discourse, and most of these studies were carried out to challenge the traditional assumptions of the objectivity and impartiality of news reporting. However, very little research has been undertaken on evaluative language in newspaper institutional editorials, and there is hardly any systematic or exhaustive analysis of this type of language in British tabloid and broadsheet newspapers. This study will attempt to provide new insights into the nature of authorial and non-authorial evaluation in leading articles in popular and quality British newspapers, along with their targets, sources, and discourse functions. The study will also attempt to develop a framework of evaluation that can be applied to evaluative lexical items in newspaper opinion texts. The framework is both theory-driven (i.e., it builds on and modifies previous frameworks of evaluation such as appraisal theory and the parameter-based approach) and data-driven (i.e., it elicits the evaluative categories from the analysis of the corpus, which helps in the development of the current framework). To achieve this aim, a corpus of 140 leading articles was compiled. The findings revealed that the tabloids tended to express their stance through explicitness, dramatization, frequent reference to social actors’ emotions and beliefs, and exaggeration in negativity, while the broadsheets preferred to express their stance through mitigation, ambiguity, and implicitness. Conceptual themes and propositions were the preferred targets for expressing stance in the broadsheets, while human behavior and character were the preferred targets in the tabloids.

Keywords: appraisal theory, evaluative language, British newspapers, broadsheets & tabloids, evaluative adjectives

Procedia PDF Downloads 269
1712 Calculation of Methane Emissions from Wetlands in Slovakia via IPCC Methodology

Authors: Jozef Mindas, Jana Skvareninova

Abstract:

Wetlands are a major natural source of methane emissions, but they also represent important biodiversity reservoirs in the landscape. About 26 thousand hectares of wetlands in Slovakia have been identified via the wetlands monitoring program. The resulting database of wetlands in Slovakia makes it possible to analyze several ecological processes, including the estimation of methane emissions. Based on the information from the database, the first estimate of methane emissions from wetlands in Slovakia has been made. The IPCC methodology (Tier 1 approach) has been used, with proposed emission factors for the ice-free period derived from the climatic data. The highest methane emissions of nearly 550 Gg are associated with the category of fens. Almost 11 Gg of methane is emitted from bogs, and emissions from flooded lands represent less than 8 Gg.
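
To make the Tier 1 arithmetic concrete, the sketch below computes per-category emissions as area multiplied by a daily emission factor and the ice-free period, then converts to gigagrams; the areas, emission factors, and ice-free period used here are hypothetical placeholders, not the study's values.

```python
# Illustrative Tier 1-style calculation (not the study's actual factors):
# CH4 emission = area (ha) * daily emission factor (kg CH4/ha/day) * ice-free days,
# summed per wetland category and converted to gigagrams (1 Gg = 1e6 kg).
wetlands = {
    # category: (area_ha, ef_kg_ch4_per_ha_per_day), hypothetical values
    "fens":          (15000, 1.2),
    "bogs":          (6000,  0.4),
    "flooded_lands": (5000,  0.3),
}
ICE_FREE_DAYS = 210  # assumed ice-free period derived from climatic data

total_gg = 0.0
for category, (area_ha, ef) in wetlands.items():
    emission_gg = area_ha * ef * ICE_FREE_DAYS / 1e6  # kg -> Gg
    total_gg += emission_gg
    print(f"{category}: {emission_gg:.2f} Gg CH4 per year")
print(f"total: {total_gg:.2f} Gg CH4 per year")
```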

Keywords: bogs, methane emissions, Slovakia, wetlands

Procedia PDF Downloads 260
1711 A Corpus-Based Analysis on Code-Mixing Features in Mandarin-English Bilingual Children in Singapore

Authors: Xunan Huang, Caicai Zhang

Abstract:

This paper investigated code-mixing features in Mandarin-English bilingual children in Singapore. First, it examined whether the code-mixing rate was different in Mandarin Chinese and English contexts. Second, it explored the syntactic categories of code-mixing in Singapore bilingual children. Moreover, this study investigated whether morphological information was preserved when inserting syntactic components into the matrix language. Data are derived from the Singapore Bilingual Corpus, in which the recordings and transcriptions of sixty English-Mandarin 5-to-6-year-old children were preserved for analysis. Results indicated that the rate of code-mixing was asymmetrical in the two language contexts, being significantly higher in the Mandarin context than in the English context. The asymmetry is related to language dominance, in that children are more likely to code-mix when using their nondominant language. Concerning the syntactic categories of code-mixed words in the Singaporean bilingual children, we found that noun-mixing, verb-mixing, and adjective-mixing are the three most frequently used categories of code-mixing in the Mandarin context. This pattern mirrors the syntactic categories of code-mixing in the Cantonese context in Cantonese-English bilingual children, and the general trend observed in lexical borrowing. Third, our results also indicated that English words that carry morphological information are embedded in bare forms in the Mandarin context. These findings shed light upon how bilingual children take advantage of the two languages in mixed utterances in a bilingual environment.

Keywords: bilingual children, code-mixing, English, Mandarin Chinese

Procedia PDF Downloads 190
1710 CMPD: Cancer Mutant Proteome Database

Authors: Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Julie Lichieh Chu, Tin-Wen Chen, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang

Abstract:

Whole-exome sequencing, which focuses on the protein-coding regions of disease/cancer-associated genes based on a priori knowledge, is the most cost-effective method to study the association between genetic alterations and disease. Recent advances in high-throughput sequencing technologies and proteomic techniques have provided an opportunity to integrate genomics and proteomics, allowing mutated peptides corresponding to mutated genes to be readily detected. Since sequence database search is the most widely used method for protein identification using mass spectrometry (MS)-based proteomics technology, a mutant proteome database is required to better approximate the real protein pool and improve the identification of disease-associated mutated proteins. Large-scale whole exome/genome sequencing studies have been launched by the National Cancer Institute (NCI), the Broad Institute, and The Cancer Genome Atlas (TCGA); these provide not only comprehensive reports on the analysis of coding variants in diverse samples and cell lines but also an invaluable resource for the wider research community. However, no existing database collects the mutant protein sequences related to the variants identified in these studies. CMPD is designed to address this issue, serving as a bridge between genomic data and proteomic studies and focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations.
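
As a rough illustration of how a mutant proteome entry can be generated from a reported variant, the sketch below applies a single amino-acid substitution to a reference protein sequence and writes a FASTA record; the protein, variant, and file name are hypothetical and do not reflect CMPD's actual pipeline.

```python
# Minimal sketch (not CMPD's pipeline): apply a single amino-acid substitution
# reported for a variant to a reference protein sequence and emit a FASTA entry
# that a mass-spectrometry search engine could include in its database.
def apply_missense(ref_seq: str, pos: int, ref_aa: str, alt_aa: str) -> str:
    """pos is 1-based; raises if the reference residue does not match."""
    if ref_seq[pos - 1] != ref_aa:
        raise ValueError(f"reference mismatch at {pos}: {ref_seq[pos - 1]} != {ref_aa}")
    return ref_seq[:pos - 1] + alt_aa + ref_seq[pos:]

# Hypothetical protein and variant, for illustration only.
protein_id = "PROT_EXAMPLE"
reference = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
mutant = apply_missense(reference, pos=7, ref_aa="A", alt_aa="T")

with open("mutant_proteome.fasta", "w") as fasta:
    fasta.write(f">{protein_id}_A7T somatic missense\n{mutant}\n")
```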

Keywords: TCGA, cancer, mutant, proteome

Procedia PDF Downloads 566
1709 Digital Development of Cultural Heritage: Construction of Traditional Chinese Pattern Database

Authors: Shaojian Li

Abstract:

The traditional Chinese patterns, as an integral part of Chinese culture, possess unique values in history, culture, and art. However, with the passage of time and societal changes, many of these traditional patterns are at risk of being lost, damaged, or forgotten. To undertake the digital preservation and protection of these traditional patterns, this paper will collect and organize images of traditional Chinese patterns. It will provide exhaustive and comprehensive semantic annotations, creating a resource library of traditional Chinese pattern images. This will support the digital preservation and application of traditional Chinese patterns.

Keywords: digitization of cultural heritage, traditional Chinese patterns, digital humanities, database construction

Procedia PDF Downloads 31
1708 BiLex-Kids: A Bilingual Word Database for Children 5-13 Years Old

Authors: Aris R. Terzopoulos, Georgia Z. Niolaki, Lynne G. Duncan, Mark A. J. Wilson, Antonios Kyparissiadis, Jackie Masterson

Abstract:

As word databases for bilingual children are not available, researchers, educators and textbook writers must rely on monolingual databases. The aim of this study is thus to develop a bilingual word database, BiLex-kids, an online open access developmental word database for 5-13 year old bilingual children who learn Greek as a second language and have English as their dominant one. BiLex-kids is compiled from 120 Greek textbooks used in Greek-English bilingual education in the UK, USA and Australia, and provides word translations in the two languages, pronunciations in Greek, and psycholinguistic variables (e.g. Zipf, Frequency per million, Dispersion, Contextual Diversity, Neighbourhood size). After clearing the textbooks of non-relevant items (e.g. punctuation), algorithms were applied to extract the psycholinguistic indices for all words. As well as one total lexicon, the database produces values for all ages (one lexicon for each age) and for three age bands (one lexicon per age band: 5-8, 9-11, 12-13 years). BiLex-kids provides researchers with accurate figures for a wide range of psycholinguistic variables, making it a useful and reliable research tool for selecting stimuli to examine lexical processing among bilingual children. In addition, it offers children the opportunity to study word spelling, learn translations and listen to pronunciations in their second language. It further benefits educators in selecting age-appropriate words for teaching reading and spelling, while special educational needs teachers will have a resource to control the content of word lists when designing interventions for bilinguals with literacy difficulties.
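
For readers unfamiliar with the listed indices, the sketch below computes frequency per million, the Zipf value (log10 of frequency per million plus 3), and contextual diversity from a toy list of (textbook, token) pairs; it is illustrative only and not the BiLex-kids extraction code.

```python
# Illustrative computation (not the BiLex-kids code) of a few of the listed
# psycholinguistic indices from (textbook_id, token) pairs: frequency per million,
# Zipf = log10(frequency per million) + 3, and contextual diversity measured as
# the number of distinct textbooks containing the word.
import math
from collections import Counter, defaultdict

tokens = [("book1", "νερό"), ("book1", "σπίτι"), ("book2", "νερό")]  # toy corpus

freq = Counter(word for _, word in tokens)
docs = defaultdict(set)
for book, word in tokens:
    docs[word].add(book)

corpus_size = len(tokens)
for word, count in freq.items():
    per_million = count * 1_000_000 / corpus_size
    zipf = math.log10(per_million) + 3
    cd = len(docs[word])
    print(f"{word}\tfpm={per_million:.1f}\tZipf={zipf:.2f}\tCD={cd}")
```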

Keywords: bilingual children, psycholinguistics, vocabulary development, word databases

Procedia PDF Downloads 291
1707 Spanish Language Violence Corpus: An Analysis of Offensive Language in Twitter

Authors: Beatriz Botella-Gil, Patricio Martínez-Barco, Lea Canales

Abstract:

The Internet and ICTs are an integral and omnipresent element of our daily lives. Technologies have changed the way we see the world and relate to it. The number of companies in the ICT sector is increasing every year, and there has also been an increase in the work that occurs online, from sending e-mails to the way companies promote themselves. In social life, ICTs have gained momentum. Social networks are useful for keeping in contact with family or friends who live far away. This change in how we manage our relationships using electronic devices and social media has been experienced differently depending on the age of the person. According to currently available data, people are increasingly connected to social media and other forms of online communication. Therefore, it is no surprise that violent content has also made its way to digital media. One of the important reasons for this is the anonymity provided by social media, which creates a sense of impunity in the aggressor. Moreover, it is not uncommon to find derogatory comments attacking a person’s physical appearance, hobbies, or beliefs. This is why it is necessary to develop artificial intelligence tools that allow us to keep track of violent comments relating to violent events so that this type of violent online behavior can be deterred. The objective of our research is to create a guide for detecting and recording violent messages. Our annotation guide begins with a study of the problem of violent messages. First, we consider the characteristics that a message should contain for it to be categorized as violent. Second, we consider the possibility of establishing different levels of aggressiveness. To compile the corpus, we chose the social network Twitter because of the ease of obtaining publicly available messages. We chose two recent, highly visible violent cases that occurred in Spain, both of which received a high degree of social media coverage and user comments. Our corpus has a total of 633 messages, manually tagged according to the characteristics we considered important, such as the verbs used, the presence of exclamations or insults, and the presence of negations. We consider it necessary to create wordlists of items that appear in violent messages as indicators of violence, such as lists of negative verbs, insults, and negative phrases. As a final step, we will use machine learning systems to check the data obtained and the effectiveness of our guide.
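
A minimal sketch of the wordlist-based flagging the guide envisages is given below; the Spanish wordlists, the example tweet, and the crude aggressiveness score are illustrative assumptions, not the annotation guide itself.

```python
# Minimal sketch (not the authors' tool): flag tweets that contain items from
# indicator wordlists (insults, negative verbs, negations) or exclamations.
# The lists and scoring are illustrative assumptions.
import re

INSULTS = {"idiota", "imbécil"}
NEGATIVE_VERBS = {"odiar", "matar", "destruir"}
NEGATIONS = {"no", "nunca", "jamás"}

def violence_indicators(tweet: str) -> dict:
    words = set(re.findall(r"\w+", tweet.lower()))
    return {
        "insult": bool(words & INSULTS),
        "negative_verb": bool(words & NEGATIVE_VERBS),
        "negation": bool(words & NEGATIONS),
        "exclamation": "!" in tweet,
    }

tweet = "No te soporto, idiota!"
flags = violence_indicators(tweet)
score = sum(flags.values())   # crude aggressiveness level, for illustration only
print(flags, "level:", score)
```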

Keywords: human language technologies, language modelling, offensive language detection, violent online content

Procedia PDF Downloads 101
1706 Men Act, Women Are Acted Upon: Morphosyntactic Framing of the Sexual Intercourse in Online Pornography Titles

Authors: Aleksandra Tomic

Abstract:

According to reliable sources, 4% of all websites are devoted to pornographic material, yet these estimates are often reported to be much higher. The largest internet pornography streaming website reported 21.2 billion visits in 2015 alone. Considering the ubiquity of online pornography and the frequency of use, it is necessary to examine its potential influence on the construal of the sexual act and the roles of participants. Apart from the verbal and physical interactions in the pornographic movies themselves, the language in the titles of movies has the power to frame the sexual intercourse. In this study, Critical Discourse Analysis and corpus linguistics approaches will be used to examine the way sexual intercourse and the roles of the participants are ideologically construed and perpetuated in the Internet pornography discourse. To this end, the study will explore the association between specific morphosyntactic aspects of the references to performers of both genders, namely person and thematic role, and the gender of the referred performer in a corpus of online pornographic movie titles. Distinctive collexeme analysis will be conducted to uncover possible associations between the gender of the performer denoted by the linguistic expression and the person and thematic role assigned to it in the titles of online pornography movies. Initial results of the chi-square procedure performed on a sample of 295 online pornography movie titles on the largest pornography streaming website ‘Pornhub’ yielded significant results. The use of the three person categories was not equally distributed between genders, X2 (2, N = 106) = 32.52, p < 0.001, with female performers being referred to in the third person in 71.7% of the instances and speaking in the first person 20.8% of the time, whereas male performers spoke in the first person 68% of the time and were referred to in the third person in 17% of the instances. Moreover, there was a gender disparity in the assignment of thematic roles, with linguistic expressions for women being assigned the Patient role and men the Agent role in 58.8% of the cases, whereas the roles were reversed in 41.2% of the instances, X2 (1, N = 262) = 8.07633, p < 0.005. The results are discussed in terms of the ideologies surrounding female and male sexuality in the pornography discourse. Potential patterns of power imbalance, objectification, and discrimination are highlighted. Finally, evidence from psycholinguistic studies on the influence of language structure on event construal is related to the results of the study.
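
The kind of chi-square test reported above can be run as follows; the contingency counts in this sketch are hypothetical placeholders (the abstract reports only percentages), and the code is not the study's analysis script.

```python
# Illustrative chi-square test of independence with scipy; the counts below are
# hypothetical placeholders, not the study's data.
from scipy.stats import chi2_contingency

# rows: female / male referent; columns: Agent / Patient thematic role
table = [[64, 90],
         [84, 24]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```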

Keywords: corpus linguistics, gender studies, pornography, thematic roles

Procedia PDF Downloads 159
1705 A Lexicographic Approach to Obstacles Identified in the Ontological Representation of the Tree of Life

Authors: Sandra Young

Abstract:

The biodiversity literature is vast and heterogeneous. In today’s data age, a number of data integration and standardisation initiatives aim to facilitate simultaneous access to all the literature across biodiversity domains for research and forecasting purposes. Ontologies are being used increasingly to organise this information, but the rationalisation intrinsic to ontologies can hit obstacles when faced with the intrinsic fluidity and inconsistency found in the domains comprising biodiversity. Essentially the problem is a conceptual one: biological taxonomies are formed on the basis of specific, physical specimens, yet nomenclatural rules are used to provide labels to describe these physical objects. These labels are ambiguous representations of the physical specimen. An example of this is the genus name Melpomene, the scientific nomenclatural representation of a genus of ferns but also of a genus of spiders. The physical specimens for each of these are vastly different, but they have been assigned the same nomenclatural reference. While there is much research into the conceptual stability of the taxonomic concept versus the nomenclature used, to the best of our knowledge no research has yet looked empirically at the literature to see the conceptual plurality or singularity of the use of these species’ names, the linguistic representation of a physical entity. Language itself uses words as symbols to represent real-world concepts, whether physical entities or otherwise, and as such lexicography has a well-founded history in the conceptual mapping of words in context for dictionary making. This makes it an ideal candidate to explore this problem. The lexicographic approach uses corpus-based analysis to look at word use in context, with a specific focus on collocated word frequencies (the frequencies of words used in specific grammatical and collocational contexts). It allows for inconsistencies and contradictions in the source data and in fact includes these in the word characterisation so that 100% of the available evidence is counted. Corpus analysis is indeed suggested as one of the ways to identify concepts for ontology building, because of its ability to look empirically at data and show patterns in language usage, which can indicate conceptual ideas that go beyond words themselves. In this sense it could potentially be used to identify whether the hierarchical structures present within the empirical body of literature match those which have been identified in ontologies created to represent them. The first stages of this research have revealed a hierarchical structure that becomes apparent in the biodiversity literature when annotating scientific species’ names, common names and more general names as classes, which will be the focus of this paper. The next step in the research is focusing on a larger corpus in which specific words can be analysed and then compared with existing ontological structures looking at the same material, to evaluate the methods by means of an alternative perspective. This research aims to provide evidence as to the validity of the current methods in knowledge representation for biological entities, and also to shed light on the way that scientific nomenclature is used within the literature.

Keywords: ontology, biodiversity, lexicography, knowledge representation, corpus linguistics

Procedia PDF Downloads 110
1704 Change of Endocrine and Exocrine Insufficiency on Non-Diabetes Patients after Distal Pancreatectomy: A Nationwide Database Study

Authors: Jin-Ming Wu, Te-Wei Ho, Yu-Wen Tien

Abstract:

Background: The aim of this population-based study was to determine the occurrence of diabetes and exocrine pancreatic insufficiency (EPI) in non-diabetic subjects receiving distal pancreatectomy (DP). Method: A nationwide cohort between 2000 and 2010 was collected from the Taiwan National Health Insurance Research Database. Among 3264 DP patients, we identified 1410 non-diabetic patients and 966 non-diabetic, non-EPI patients. Results: Of the 1410 non-diabetic DP subjects, 312 patients (22.1%) developed newly diagnosed diabetes after DP. In a multiple logistic regression model, co-morbid hyperlipidemia (odds ratio, 1.640; 95% CI, 1.362–2.763; P < 0.001) and pancreatitis (odds ratio, 2.428; 95% CI, 1.889–3.121; P < 0.001) contributed significantly to higher incidences of diabetes after DP. Moreover, 380 subjects (39.3%) developed EPI, and pancreatic cancer was the statistically significant risk factor (odds ratio, 4.663; 95% CI, 2.108–6.085; P < 0.001). Conclusion: Patients with co-morbid hyperlipidemia and chronic pancreatitis had higher rates of newly diagnosed diabetes after DP; moreover, pancreatic cancer subjects had higher rates of pancreatic exocrine insufficiency after DP. Clinicians should be alert to follow up on glucose metabolism and clinical symptoms of fat intolerance in DP patients.

Keywords: distal pancreatectomy, National database, diabetes, exocrine insufficiency

Procedia PDF Downloads 176
1703 The Women-In-Mining Discourse: A Study Combining Corpus Linguistics and Discourse Analysis

Authors: Ylva Fältholm, Cathrine Norberg

Abstract:

One of the major threats identified to successful future mining is that women do not find the industry attractive. Many attempts have been made, for example in Sweden and Australia, to create organizational structures and mining communities attractive to both genders. Despite such initiatives, many mining areas are developing into gender-segregated fly-in/fly-out communities dominated by men, with both social and economic consequences. One of the challenges facing many mining companies is thus to break traditional gender patterns and structures. To do this, increased knowledge about gender in the context of mining is needed. Since language both constitutes and reproduces knowledge, increased knowledge can be gained through an exploration and description of the mining discourse from a gender perspective. The aim of this study is to explore what conceptual ideas are activated in connection with the physical/geographical mining area and with work within the mining industry. We use a combination of critical discourse analysis, involving close reading of selected texts such as policy documents, interview materials, applications and research and innovation agendas, and analyses of linguistic patterns found in large language corpora covering millions of words of contemporary language production. The quantitative corpus data serve as a point of departure for the qualitative analysis of the texts; that is, they suggest which patterns to explore further. The study shows that despite technological and organizational development, one of the most persistent discourses about mining is the conception of dangerous and unfriendly areas infused with traditional notions of masculinity ideals and manual hard work. Although some of the texts analyzed highlight gender issues and describe gender-equalizing initiatives, such as wage-mapping systems, female networks and recruitment efforts for women executives, and thereby render the discourse less straightforward, it is shown that these texts are not unambiguous examples of a counter-discourse. Rather, they illustrate that discourses are not stable but include opposing discourses in dialogue with each other. For example, many texts highlight why and how women are important to mining while at the same time suggesting that gender and diversity are all about women: why mining is a problem for them, how they should be, and what they should do to fit in. Drawing on a constitutive view of discourse, knowledge about such conflicting perceptions of women is a prerequisite for succeeding in attracting women to the mining industry and thereby contributing to the development of future mining.

Keywords: discourse, corpus linguistics, gender, mining

Procedia PDF Downloads 237
1702 Privacy Preserving in Association Rule Mining on Horizontally Partitioned Database

Authors: Manvar Sagar, Nikul Virpariya

Abstract:

The advancement of data mining techniques plays an important role in many applications. In the context of privacy and security issues, the problems caused by the association rule mining technique have been investigated by many research scholars. It has been shown that the misuse of this technique may reveal the database owner’s sensitive and private information to others. Many researchers have put effort into preserving privacy in association rule mining. Of the two basic approaches to privacy-preserving data mining, viz. randomization-based and cryptography-based, the latter provides a high level of privacy but incurs higher computational as well as communication overhead. Hence, it is necessary to explore alternative techniques that reduce these overheads. In this work, we propose an efficient, collusion-resistant, cryptography-based approach for distributed association rule mining using Shamir’s secret sharing scheme. As we show from theoretical and practical analysis, our approach is provably secure and requires a trusted third party only once. We use secret sharing for privately sharing the information and a code-based identification scheme to add protection against malicious adversaries.
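
A minimal sketch of Shamir's (k, n) secret sharing over a prime field is shown below to illustrate the underlying primitive; it is not the paper's full collusion-resistant protocol, and the example secret merely stands in for a party's private itemset count.

```python
# Sketch of Shamir's (k, n) secret sharing over a prime field (illustrative,
# not the paper's full protocol). A party's local support count could be split
# this way and recombined only when at least k shares are brought together.
import random

PRIME = 2_147_483_647  # a Mersenne prime, large enough for small counts

def split(secret: int, k: int, n: int):
    """Create n shares; any k of them reconstruct the secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = split(secret=4821, k=3, n=5)   # e.g. a private itemset count
print(reconstruct(shares[:3]))          # recovers 4821 from any 3 shares
```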

Keywords: privacy, privacy preservation in data mining (PPDM), horizontally partitioned database, EMHS, MFI, Shamir secret sharing

Procedia PDF Downloads 381
1701 SQL Generator Based on MVC Pattern

Authors: Chanchai Supaartagorn

Abstract:

Structured Query Language (SQL) is the de facto standard language for accessing and manipulating data in a relational database. Although SQL is a simple and powerful language, most novice users have trouble with its syntax. We therefore present an SQL generator tool that is capable of translating user actions into SQL and displaying the SQL commands and data sets simultaneously. The tool was developed based on the Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take full advantage of it to reduce complexity in architectural design and to increase flexibility and reuse of code. In addition, we use White-Box testing for code verification in the Model module.
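
The sketch below illustrates how the MVC separation might look for such a generator, with the Model building the SQL string, the View displaying commands and results, and the Controller translating user actions; the table names and stub executor are hypothetical, and this is not the presented tool's code.

```python
# Toy MVC split for an SQL generator (illustrative assumptions throughout).

class Model:
    """Holds query state and builds the SQL string."""
    def __init__(self, table):
        self.table, self.columns, self.conditions = table, ["*"], []
    def select(self, *cols):
        self.columns = list(cols)
    def where(self, condition):
        self.conditions.append(condition)
    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql + ";"

class View:
    """Displays the generated SQL alongside the (stub) result set."""
    def render(self, sql, rows):
        print(sql)
        for row in rows:
            print(row)

class Controller:
    """Translates user actions into model updates and refreshes the view."""
    def __init__(self, model, view):
        self.model, self.view = model, view
    def user_picks_columns(self, cols):
        self.model.select(*cols)
    def user_adds_filter(self, condition):
        self.model.where(condition)
    def run(self, execute):
        sql = self.model.to_sql()
        self.view.render(sql, execute(sql))

controller = Controller(Model("students"), View())
controller.user_picks_columns(["name", "gpa"])
controller.user_adds_filter("gpa >= 3.0")
controller.run(execute=lambda sql: [("Ann", 3.4), ("Bo", 3.9)])  # stub executor
```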

Keywords: MVC, relational database, SQL, White-Box testing

Procedia PDF Downloads 402
1700 Integrating a Universal Forensic DNA Database: Anticipated Deterrent Effects

Authors: Karen Fang

Abstract:

Investigative genetic genealogy has attracted much interest both in the field of ethics and in the public eye due to its global application in criminal cases. Arguments have been made regarding privacy and informed consent, especially with law enforcement using consumer genetic testing results to convict individuals. In terms of public interest, DNA databases have strong potential to significantly reduce crime, which in turn leads to safer communities and better futures. With the advancement of genetic technologies, the integration of a universal forensic DNA database for violent crimes, crimes against children, and missing person cases is expected to deter crime while protecting privacy. Rather than collecting whole genomes from the whole population, STR profiles can be used to identify unrelated individuals without compromising personal information such as physical appearance, disease risk, and geographical origin, while additionally reducing cost and storage space. STR DNA profiling is already used in the forensic science field, and going a step further benefits several areas, including reduced recidivism, improved criminal court case turnaround time, and just punishment. Furthermore, adding individuals to the database as early as possible prevents young offenders and first-time offenders from participating in criminal activity. It is important to highlight that DNA databases should be inclusive and tightly governed, and that misconceptions about the use of DNA arising from crime television series and other media sources should be addressed. Nonetheless, deterrent effects have been observed in countries like the US and Denmark with DNA databases that consist of serious violent offenders. Fewer crimes were reported, and fewer people were convicted of those crimes, a favorable outcome that not even the death penalty could provide. Currently, there is no better alternative than a universal forensic DNA database made up of STR profiles. It can open doors for investigative genetic genealogy and foster better communities. Expanding the appropriate use of DNA databases is ethically acceptable and positively impacts the public.

Keywords: bioethics, deterrent effects, DNA database, investigative genetic genealogy, privacy, public interest

Procedia PDF Downloads 132
1699 System of Quality Automation for Documents (SQAD)

Authors: R. Babi Saraswathi, K. Divya, A. Habeebur Rahman, D. B. Hari Prakash, S. Jayanth, T. Kumar, N. Vijayarangan

Abstract:

Document automation is the design of systems and workflows that assemble repetitive documents to meet specific business needs. In any organization or institution, documenting employees’ information is very important for both employees and management, as it shows an individual’s progress to the management. Many employee documents exist only on paper, so they are difficult to organize, and retrieving the exact document for future reference takes considerable time. Generating reports according to specific needs is also tedious, and the approval process is even more cumbersome and lacks security. This project overcomes the above-stated issues. By storing the details in a database and maintaining e-documents, the automation system greatly reduces manual work. The approval process for important documents can then be carried out in a much more secure manner by using digital signature and encryption techniques. Details are maintained in the database, e-documents are stored in specific folders, and various kinds of reports can be generated. Moreover, an efficient search method is implemented in the database. Automation supporting document maintenance is useful in many respects: it minimizes data entry, reduces the time spent on proofreading, avoids duplication, and reduces the risks associated with manual error.
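
As an illustration of the signed-approval step, the sketch below hashes and signs an e-document with an RSA key using the Python 'cryptography' package and then verifies the signature; the document content and key handling are assumptions, not SQAD's implementation.

```python
# Illustrative approval step (not SQAD's actual code): sign an e-document with
# an approver's RSA private key and verify the signature before accepting it.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

approver_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

document = b"Employee leave approval form #1042 (hypothetical content)"
signature = approver_key.sign(
    document,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# Verification raises InvalidSignature if the document or signature was altered.
approver_key.public_key().verify(
    signature,
    document,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)
print("signature verified; approval recorded")
```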

Keywords: e-documents, automation, digital signature, encryption

Procedia PDF Downloads 363
1698 A Corpus-Based Approach to Understanding Market Access in Fisheries and Aquaculture: A Systematic Literature Review

Authors: Cheryl Marie Cordeiro

Abstract:

Although fisheries and aquaculture studies might seem marginal to international business (IB) studies in general, fisheries and aquaculture IB (FAIB) management is currently facing increasing pressure to meet global demand and consumption for fish in the coming decades. To address this challenge in part, the purpose of this systematic literature review (SLR) study is to investigate the use of the term ‘market access’ in its context of use in the generic literature and business sector discourse, in comparison to the more specific literature and discourse in fisheries, aquaculture and seafood. This SLR aims to uncover the knowledge/interest gaps between the academic subject discourses and business sector practices. Corpus-driven in methodology and using a triangulation of three different text analysis tools, namely AntConc, VOSviewer and Web of Science (WoS) analytics, the SLR results indicate a gap in conceptual knowledge and business practices in how ‘market access’ is conceived and used in the context of the pharmaceutical healthcare industry and FAIB research and practice. While it is acknowledged that the product orientation of different business sectors might differ, this SLR study works with the assumption that both business sectors are global in orientation. These business sectors are complex in their operations from product to market. This SLR suggests a conceptual model for understanding the challenges, potential barriers and avenues for solutions in developing market access for FAIB.

Keywords: market access, fisheries and aquaculture, international business, systematic literature review

Procedia PDF Downloads 125
1697 Cognitive Translation and Conceptual Wine Tasting Metaphors: A Corpus-Based Research

Authors: Christine Demaecker

Abstract:

Many researchers have underlined the importance of metaphors in specialised language. Their use in specific domains helps us understand the conceptualisations employed to communicate new ideas or difficult topics. Within the wide area of specialised discourse, wine tasting is a very specific example because it is almost exclusively metaphoric. Wine tasting metaphors express various conceptualisations. They are not linguistic but rather conceptual, as defined by Lakoff & Johnson. They correspond to the linguistic expression of a mental projection from a well-known or more concrete source domain onto the target domain, which is the taste of wine. But unlike most specialised terminologies, the vocabulary is never clearly defined. When metaphorical terms are listed in dictionaries, their definitions remain vague, unclear, and circular. They cannot be replaced by literal linguistic expressions. This makes it impossible to transfer them into another language with traditional linguistic translation methods. This qualitative research investigates whether wine tasting metaphors could instead be translated with the cognitive translation process, as described by Nili Mandelblit (1995). The research is based on a corpus compiled from two high-profile wine guides: Parker’s Wine Buyer’s Guide and its translation into French, and the Guide Hachette des Vins and its translation into English. In this small corpus, with a total of 68,826 words, 170 metaphoric expressions have been identified in the original English text and 180 in the original French text. They have been selected with the MIPVU Metaphor Identification Procedure developed at the Vrije Universiteit Amsterdam. The selection demonstrates that both languages use the same set of conceptualisations, which are often combined in wine tasting notes, creating conceptual integrations or blends. The comparison of expressions in the source and target texts also demonstrates the use of the cognitive translation approach. In accordance with the principle of relevance, the translation always uses target-language conceptualisations, but compared to the original, the highlighting of the projection is often different. Also, when original metaphors are complex, with a combination of conceptualisations, at least one element of the original metaphor underlies the target expression. This approach integrates perfectly into Lederer’s interpretative model of translation (2006). In this triangular model, the transfer of conceptualisation could be included at the level of ‘deverbalisation/reverbalisation’, the crucial stage of the model, where the extraction of meaning combines with the encyclopedic background to generate the target text.

Keywords: cognitive translation, conceptual integration, conceptual metaphor, interpretative model of translation, wine tasting metaphor

Procedia PDF Downloads 106
1696 Bimodal Biometrics System Using Fusion of Iris and Fingerprint

Authors: Attallah Bilal, Hendel Fatiha

Abstract:

This paper proposes a bimodal biometric system for identity verification based on iris and fingerprint, using a matching-score-level architecture with a weighted sum-of-scores technique. Features are extracted from the preprocessed iris and fingerprint images. The features of a query image are compared with those of a database image to obtain matching scores. The individual scores generated after matching are passed to the fusion module. This module consists of three major steps, i.e., normalization, generation of similarity scores, and fusion of weighted scores. The final score is then used to declare the person genuine or an impostor. The system is tested on the CASIA database and gives an overall accuracy of 91.04%, with an FAR of 2.58% and an FRR of 8.34%.
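
A minimal sketch of weighted sum-of-scores fusion at the matching-score level is given below; the normalization bounds, weights, and decision threshold are assumptions for illustration, not the values used in the paper.

```python
# Minimal sketch of score-level fusion: normalize each matcher's score, take a
# weighted sum, and threshold. All constants are illustrative assumptions.
def min_max_normalize(score, lo, hi):
    return (score - lo) / (hi - lo)

def fuse(iris_score, finger_score, w_iris=0.6, w_finger=0.4):
    # Normalize each matcher's raw score to [0, 1] using its own score range.
    s_iris = min_max_normalize(iris_score, lo=0.0, hi=1.0)
    s_finger = min_max_normalize(finger_score, lo=0.0, hi=100.0)
    return w_iris * s_iris + w_finger * s_finger

THRESHOLD = 0.55  # decision threshold (assumed)

fused = fuse(iris_score=0.72, finger_score=58.0)
print("genuine" if fused >= THRESHOLD else "impostor", round(fused, 3))
```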

Keywords: iris, fingerprint, sum rule, fusion

Procedia PDF Downloads 341
1695 Characteristic Features and Action Mechanism of Some Country-Made Pistols

Authors: Ajitesh Pal, Arpan Datta Roy, H. K. Pratihari

Abstract:

Illegal firearms crudely made by skilled gunsmiths from scrap materials are popularly known as country-made firearms. Such firearms, along with improvised ammunition, are clandestinely marketed at a cheap price, without any license, to extremist groups, criminals, poachers and firearm lovers. As per the National Crime Records Bureau (NCRB), MHA, Govt. of India, about 80% of firearm cases are committed with country-made/improvised firearms. The ballistics division of the laboratory has examined a good number of such cases. The analysis of firearm cases received for forensic examination revealed that 7.65 mm calibre pistols, mostly improvised firearms, are commonly used in firearm-related crime cases. In the present communication, the physical parameters and other characteristic features of some 7.65 mm calibre pistols are discussed in detail. The detailed study of country-made (CM) firearms will help to prepare a database covering the type of material used, the origin of the raw material, and the tools used for inscription. The study also covers the chemistry of propellants and head-stamp patterns. The database will be helpful as reference material to firearm examiners, researchers, and students pursuing studies in forensic science.

Keywords: improvised pistol, stringent gun law, working mechanism, parameters, database

Procedia PDF Downloads 46
1694 The Modification of Convolutional Neural Network in Fin Whale Identification

Authors: Jiahao Cui

Abstract:

In past centuries, due to climate change and intense whaling, the global whale population dramatically declined. Among the various whale species, the fin whale experienced the most drastic drop in numbers due to its popularity in whaling. Against this background, identifying fin whale calls could be immensely beneficial to the preservation of the species. This paper uses feature extraction to process the input audio signal; then a network based on AlexNet and three networks based on the ResNet model were constructed to classify fin whale calls. A mixture of the DOSITS database and the Watkins database was used during training. The results demonstrate that a modified ResNet network has the best performance considering precision and network complexity.
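
The sketch below shows one common way to adapt a torchvision ResNet-18 to single-channel call spectrograms with two output classes; the input size and training details are assumptions, not the paper's exact modification.

```python
# Illustrative ResNet modification (not the paper's configuration): accept
# 1-channel spectrogram-like inputs and output two classes (call vs. noise).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None)
# Replace the RGB stem with a single-channel convolution.
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# Two output classes: fin whale call vs. background noise.
model.fc = nn.Linear(model.fc.in_features, 2)

batch = torch.randn(8, 1, 128, 128)            # placeholder feature images
logits = model(batch)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss.backward()
print(logits.shape)                            # torch.Size([8, 2])
```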

Keywords: convolutional neural network, ResNet, AlexNet, fin whale preservation, feature extraction

Procedia PDF Downloads 93
1693 Time and Cost Prediction Models for Language Classification Over a Large Corpus on Spark

Authors: Jairson Barbosa Rodrigues, Paulo Romero Martins Maciel, Germano Crispim Vasconcelos

Abstract:

This paper presents an investigation of the performance impacts of varying five factors (input data size, node number, cores, memory, and disks) when applying a distributed implementation of Naïve Bayes for text classification of a large corpus on the Spark big data processing framework. Problem: the algorithm’s performance depends on multiple factors, and knowing the effects of each factor beforehand becomes especially critical as hardware is priced by time slice in cloud environments. Objectives: to explain the functional relationship between the factors and performance, and to develop linear predictor models for time and cost. Methods: the solid statistical principles of Design of Experiments (DoE), particularly the randomized two-level fractional factorial design with replications. This research involved 48 real clusters with different hardware arrangements. The metrics were analyzed using linear models for screening, ranking, and measurement of each factor’s impact. Results: our findings include prediction models and show some non-intuitive results about the small influence of cores and the neutrality of memory and disks with respect to total execution time, as well as the non-significant impact of input data scale on costs, although it notably impacts the execution time.
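
For context, a distributed Naïve Bayes text-classification job of the kind whose run time is being modelled might look as follows in Spark's Python API; the column names, pipeline stages, and toy data are assumptions, not the authors' implementation.

```python
# Sketch of a distributed Naive Bayes text-classification pipeline on Spark
# (illustrative only; not the authors' exact job).
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import NaiveBayes

spark = SparkSession.builder.appName("language-classification").getOrCreate()

# label 0.0 / 1.0 stands for the language class of each document (toy data).
df = spark.createDataFrame(
    [("this is an english sentence", 0.0), ("isto é uma frase em português", 1.0)],
    ["text", "label"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features", numFeatures=1 << 18),
    NaiveBayes(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(df)
model.transform(df).select("text", "prediction").show(truncate=False)
```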

Keywords: big data, design of experiments, distributed machine learning, natural language processing, spark

Procedia PDF Downloads 86
1692 A Novel Framework for User-Friendly Ontology-Mediated Access to Relational Databases

Authors: Efthymios Chondrogiannis, Vassiliki Andronikou, Efstathios Karanastasis, Theodora Varvarigou

Abstract:

A large amount of data is typically stored in relational databases (DBs). The latter can efficiently handle user queries that intend to elicit the appropriate information from data sources. However, direct access and use of this data require the end users to have an adequate technical background, while they must also cope with the internal data structure and the values presented. Consequently, information retrieval is quite a difficult process even for IT or DB experts, taking into account the limited contribution of relational databases from the conceptual point of view. Ontologies enable users to formally describe a domain of knowledge in terms of concepts and the relations among them, and hence they can be used for unambiguously specifying the information captured by the relational database. However, accessing information residing in a database using ontologies is feasible only if the users are keen on using semantic web technologies. To enable users from different disciplines to retrieve the appropriate data, the design of a graphical user interface is necessary. In this work, we present an interactive, ontology-based, semantically enabled web tool that can be used for information retrieval purposes. The tool is entirely based on the ontological representation of the underlying database schema, while providing a user-friendly environment through which users can graphically form and execute their queries.
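
As a small illustration of ontology-mediated querying, the sketch below loads an ontology with rdflib and runs a SPARQL query against it; the ontology file, namespace, and class/property names are hypothetical stand-ins, and this is not the presented tool.

```python
# Illustrative ontology-backed query with rdflib (not the presented tool); the
# ontology file and vocabulary are hypothetical stand-ins for the ontological
# representation of an underlying database schema.
from rdflib import Graph

g = Graph()
g.parse("hospital_schema.ttl", format="turtle")   # hypothetical ontology export

query = """
PREFIX ex: <http://example.org/schema#>
SELECT ?name ?ward
WHERE {
    ?patient a ex:Patient ;
             ex:hasName ?name ;
             ex:admittedTo ?ward .
}
"""
for name, ward in g.query(query):
    print(name, ward)
```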

Keywords: ontologies, relational databases, SPARQL, web interface

Procedia PDF Downloads 251
1691 Rendering Cognition Based Learning in Coherence with Development within the Context of PostgreSQL

Authors: Manuela Nayantara Jeyaraj, Senuri Sucharitharathna, Chathurika Senarath, Yasanthy Kanagaraj, Indraka Udayakumara

Abstract:

PostgreSQL is an Object-Relational Database Management System (ORDBMS) that has been in existence for a long time. Despite the superior features it provides for managing databases and data, the database community has not fully realized the importance and advantages of PostgreSQL. Hence, this research focuses on providing a better development environment for PostgreSQL in order to encourage its use and elucidate its importance. PostgreSQL is also known as the world’s most elementary SQL-compliant open source ORDBMS. However, users have not yet turned to PostgreSQL, because it remains relatively hidden and because its persistently textual environment is complex for an introductory user. Simply stated, there is a dire need for an easy way of helping users comprehend the procedures and standards by which databases are created, tables and the relationships among them are defined, and queries and their conditional flow are manipulated in PostgreSQL, so that the community adopts PostgreSQL at a higher rate. Hence, this research, currently under development, first identifies the dominant features provided by PostgreSQL over its competitors. Following the identified merits, an analysis of why the database community is hesitant to migrate to PostgreSQL’s environment will be carried out. These findings will be modulated and tailored based on the scope and the constraints discovered. The research proposes a system that will serve as a design platform as well as a learning tool, providing an interactive method of learning via a visual editor mode and incorporating a textual editor for well-versed users. The study is based on devising viable solutions that analyze a user’s cognitive perception in comprehending human-computer interfaces and the behavioural processing of design elements. By providing a visually draggable and manipulable environment for working with PostgreSQL databases and table queries, the system is expected to highlight the elementary features PostgreSQL offers over other existing systems, in order to convey the importance and simplicity it offers to a hesitant user.
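
The sketch below shows the plain textual workflow such a visual editor would wrap: creating two related tables and running a conditional join query against PostgreSQL via psycopg2; the connection details and schema are hypothetical.

```python
# Minimal example of the textual PostgreSQL workflow a visual editor would wrap
# (hypothetical connection details and schema; illustrative only).
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="teachingdb",
                        user="student", password="secret")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS department (
        id   SERIAL PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS employee (
        id      SERIAL PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(id)
    );
""")
conn.commit()

cur.execute("""
    SELECT e.name, d.name
    FROM employee e JOIN department d ON e.dept_id = d.id
    WHERE d.name = %s;
""", ("Research",))
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```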

Keywords: cognition, database, PostgreSQL, text-editor, visual-editor

Procedia PDF Downloads 251
1690 Analysis and Prediction of COVID-19 by Using Recurrent LSTM Neural Network Model in Machine Learning

Authors: Grienggrai Rajchakit

Abstract:

As we all know, the coronavirus has been declared a pandemic by the WHO. It spread all over the world within a matter of days. To control this spread, maintaining social distance and taking self-preventive measures are the best strategies for every citizen. As of now, many researchers and scientists are continuing their research on finding an effective vaccine. Machine learning models find that the coronavirus disease spreads in an exponential manner. To mitigate the consequences of this pandemic, efficient steps should be taken to analyze this disease. In this paper, a recurrent neural network model is chosen to predict the number of active cases in a particular state. To make this prediction of active cases, we need a database. The COVID-19 database is downloaded from the Kaggle website and analyzed by applying a recurrent LSTM neural network with univariate features to predict the number of active cases of patients suffering from the coronavirus. The downloaded database is divided into training and testing sets for the chosen neural network model. The model is trained with the training dataset and tested with the testing dataset to predict the number of active cases in a particular state; here, we have concentrated on the state of Andhra Pradesh.
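
A minimal univariate LSTM forecaster of the kind described could be set up as follows; the toy series, window length, and layer sizes are illustrative assumptions, not the paper's configuration or the Kaggle data.

```python
# Sketch of a univariate LSTM forecaster for daily active-case counts
# (illustrative; the series below is a toy placeholder, not the Kaggle data).
import numpy as np
import tensorflow as tf

def make_windows(series, window=7):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

active_cases = np.array([10, 14, 21, 30, 44, 60, 83, 110, 150, 200,
                         260, 330, 410, 500, 600], dtype="float32")  # toy series
scale = active_cases.max()
X, y = make_windows(active_cases / scale, window=7)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, input_shape=(7, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

next_day = model.predict(X[-1:]) * scale
print("predicted active cases tomorrow:", float(next_day[0, 0]))
```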

Keywords: COVID-19, coronavirus, KAGGLE, LSTM neural network, machine learning

Procedia PDF Downloads 137
1689 Block Mining: Block Chain Enabled Process Mining Database

Authors: James Newman

Abstract:

Process mining is an emerging technology that looks to serialize enterprise data into time-series event data. It has been used by many companies and has been the subject of a variety of research papers. However, the majority of current efforts have looked at how best to perform process mining on top of standard relational databases. This paper is a first pass at outlining a database custom-built for a minimum viable product of process mining. We present Block Miner, a blockchain protocol to store process mining data across a distributed network. We demonstrate the feasibility of storing process mining data on the blockchain. We present a proof of concept and show how the intersection of these two technologies helps to solve a variety of issues, including but not limited to ransomware attacks, tax documentation, and conflict resolution.
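
As a conceptual illustration of hash-chained storage of process mining events, the sketch below links blocks of (case, activity, timestamp) records by their hashes; it is a toy example, not the Block Miner protocol.

```python
# Toy illustration of hash-chained blocks holding process-mining events
# (a conceptual sketch, not the Block Miner protocol itself).
import hashlib
import json
import time

def make_block(events, prev_hash):
    block = {
        "timestamp": time.time(),
        "events": events,          # e.g. (case_id, activity, timestamp) records
        "prev_hash": prev_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

genesis = make_block([], prev_hash="0" * 64)
block1 = make_block([("case-17", "Create Order", "2023-01-05T09:12:00")],
                    prev_hash=genesis["hash"])
block2 = make_block([("case-17", "Approve Order", "2023-01-05T10:03:00")],
                    prev_hash=block1["hash"])

# Tampering with an earlier block breaks every later prev_hash link.
print(block2["prev_hash"] == block1["hash"])   # True
```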

Keywords: blockchain, process mining, memory optimization, protocol

Procedia PDF Downloads 67
1688 Modified Active (MA) Algorithm to Generate Semantic Web Related Clustered Hierarchy for Keyword Search

Authors: G. Leena Giri, Archana Mathur, S. H. Manjula, K. R. Venugopal, L. M. Patnaik

Abstract:

Keyword search in XML documents is based on the notion of lowest common ancestors in the labelled tree model of XML documents and has recently gained a lot of research interest in the database community. In this paper, we propose the Modified Active (MA) algorithm, an improvement over the active clustering algorithm that takes into consideration the entity aspect of the nodes to find the level of the node pertaining to a particular keyword input by the user. A portion of a bibliography database is used to experimentally evaluate the Modified Active algorithm, and the results show that it performs better than the active algorithm. Our modification improves the response time of the system and thereby increases its efficiency.
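
For readers unfamiliar with the underlying notion, the sketch below computes lowest common ancestors of keyword match nodes from Dewey-style labels, the standard device behind XML keyword search; the labels are hypothetical, and this is a generic illustration rather than the MA algorithm.

```python
# Illustrative LCA computation over Dewey-style node labels (generic sketch,
# not the MA algorithm itself).
def lca(dewey_a, dewey_b):
    """LCA of two nodes labelled e.g. '1.2.3' is their longest common prefix."""
    a, b = dewey_a.split("."), dewey_b.split(".")
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return ".".join(prefix)

# Hypothetical match positions for two keywords in a bibliography XML tree.
matches_k1 = ["1.2.1.1", "1.3.2"]      # nodes containing keyword 1
matches_k2 = ["1.2.1.3", "1.4.1"]      # nodes containing keyword 2

lcas = {lca(m1, m2) for m1 in matches_k1 for m2 in matches_k2}
# Deeper LCAs (longer labels) are tighter, more relevant answer roots.
print(sorted(lcas, key=lambda d: d.count("."), reverse=True))
```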

Keywords: keyword matching patterns, MA algorithm, semantic search, knowledge management

Procedia PDF Downloads 380