Search results for: semantic data profiling
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25673

Search results for: semantic data profiling

25133 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 593
25132 Implementation of Metabolomics in Conjunction with Chemometrics for the Dentification of the Differential Chemical Markers of Different Grades of Sri Lankan White, Green and Black Tea: Camellia Sinenesis L.

Authors: Dina A. Selim, Eman Shawky, Rasha M. Abu El-Khair

Abstract:

In the current study, UPLC-MS/MS combined to chemometrics were applied on seven Sri Lankan tea grades; Orange Pekoe, Flowery Pekoe, Broken Orange Pekoe Fannings, Broken Orange Pekoe black tea, green tea, silver tips and golden tips white tea grades for their comprehensive metabolic profiling. Certain metabolites, namely, Theasensinin C and E, theaflavin and theacitrin appeared to be the main chemical markers of black tea type, catechin, epicatechin, epigallocatechin, methyl epigallocatechin were the main discriminatory markers of green tea type, while theanine, oolongotheanine and quercetin glycosides were the main chemical markers of white tea type. Theogalloflavin, epigallocatechin and flavonoid glycosides were the main down-accumulated metabolites while theaflavin gallate, and N-ethyl pyrrolidinone epicatechin were the chief up- accumulated metabolites between whole and broken black tea leave grades while puerin A and C and gallic acid was the main down- accumulated metabolites and N-ethyl pyrrolidinone epicatechin gallate was the main up-accumulated one between broken and fanning black tea grades.

Keywords: tea grading, Sri Lankan tea, chemometrics, metabolomics, chemical markers

Procedia PDF Downloads 140
25131 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo

Abstract:

This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

Procedia PDF Downloads 401
25130 An Event-Related Potentials Study on the Processing of English Subjunctive Mood by Chinese ESL Learners

Authors: Yan Huang

Abstract:

Event-related potentials (ERPs) technique helps researchers to make continuous measures on the whole process of language comprehension, with an excellent temporal resolution at the level of milliseconds. The research on sentence processing has developed from the behavioral level to the neuropsychological level, which brings about a variety of sentence processing theories and models. However, the applicability of these models to L2 learners is still under debate. Therefore, the present study aims to investigate the neural mechanisms underlying English subjunctive mood processing by Chinese ESL learners. To this end, English subject clauses with subjunctive moods are used as the stimuli, all of which follow the same syntactic structure, “It is + adjective + that … + (should) do + …” Besides, in order to examine the role that language proficiency plays on L2 processing, this research deals with two groups of Chinese ESL learners (18 males and 22 females, mean age=21.68), namely, high proficiency group (Group H) and low proficiency group (Group L). Finally, the behavioral and neurophysiological data analysis reveals the following findings: 1) Syntax and semantics interact with each other on the SECOND phase (300-500ms) of sentence processing, which is partially in line with the Three-phase Sentence Model; 2) Language proficiency does affect L2 processing. Specifically, for Group H, it is the syntactic processing that plays the dominant role in sentence processing while for Group L, semantic processing also affects the syntactic parsing during the THIRD phase of sentence processing (500-700ms). Besides, Group H, compared to Group L, demonstrates a richer native-like ERPs pattern, which further demonstrates the role of language proficiency in L2 processing. Based on the research findings, this paper also provides some enlightenment for the L2 pedagogy as well as the L2 proficiency assessment.

Keywords: Chinese ESL learners, English subjunctive mood, ERPs, L2 processing

Procedia PDF Downloads 131
25129 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 130
25128 Umbilical Cord-Derived Cells in Corneal Epithelial Regeneration

Authors: Hasan Mahmud Reza

Abstract:

Extensive studies of the human umbilical cord, both basic and translational, over the last three decades have unveiled a plethora of information. The cord lining harbors at least two phenotypically different multipotent stem cells: mesenchymal stem cells (MSCs) and cord lining epithelial stem cells (CLECs). These cells exhibit a mixed genetic profiling of both embryonic and adult stem cells, hence display a broader stem features than cells from other sources. We have observed that umbilical cord-derived cells are immunologically privileged and non-tumorigenic by animal study. These cells are ethically acceptable, thus provides a significant advantage over other stem cells. The high proliferative capacity, viability, differentiation potential, and superior harvest of these cells have made them better candidates in comparison to contemporary adult stem cells. Following 30 replication cycles, these cells have been observed to retain their stemness, with their phenotype and karyotype intact. Transplantation of bioengineered CLEC sheets in limbal stem cell-deficient rabbit eyes resulted in regeneration of clear cornea with phenotypic expression of the normal cornea-specific epithelial cytokeratin markers. The striking features of low immunogenicity protecting self along with co-transplanted allografts from rejection largely define the transplantation potential of umbilical cord-derived stem cells.

Keywords: cord lining epithelial stem cells, mesenchymal stem cell, regenerative medicine, umbilical cord

Procedia PDF Downloads 159
25127 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault

Authors: Lakshmi Prayaga, Chandra Prayaga. Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola

Abstract:

Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.

Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula

Procedia PDF Downloads 84
25126 5iD Viewer: Observation of Fish School Behaviour in Labyrinths and Use of Semantic and Syntactic Entropy for School Structure Definition

Authors: Dalibor Štys, Kryštof M. Stys, Maryia Chkalova, Petr Kouba, Aliaxandr Pautsina, Dalibor Štys Jr., Jana Pečenková, Denis Durniev, Tomáš Náhlík, Petr Císař

Abstract:

In this article, a construction and some properties of the 5iD viewer, the system recording simultaneously five views of a given experimental object is reported. Properties of the system are demonstrated on the analysis of fish schooling behavior. It is demonstrated the method of instrument calibration which allows inclusion of image distortion and it is proposed and partly tested also the method of distance assessment in the case that only two opposite cameras are available. Finally, we demonstrate how the state trajectory of the behavior of the fish school may be constructed from the entropy of the system.

Keywords: 3D positioning, school behavior, distance calibration, space vision, space distortion

Procedia PDF Downloads 390
25125 An Ontology for Smart Learning Environments in Music Education

Authors: Konstantinos Sofianos, Michail Stefanidakis

Abstract:

Nowadays, despite the great advances in technology, most educational frameworks lack a strong educational design basis. E-learning has become prevalent, but it faces various challenges such as student isolation and lack of quality in the learning process. An intelligent learning system provides a student with educational material according to their learning background and learning preferences. It records full information about the student, such as demographic information, learning styles, and academic performance. This information allows the system to be fully adapted to the student’s needs. In this paper, we propose a framework and an ontology for music education, consisting of the learner model and all elements of the learning process (learning objects, teaching methods, learning activities, assessment). This framework can be integrated into an intelligent learning system and used for music education in schools for the development of professional skills and beyond.

Keywords: intelligent learning systems, e-learning, music education, ontology, semantic web

Procedia PDF Downloads 140
25124 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name

Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing

Abstract:

Named Data Networking (NDN) replaces IP address of traditional network with data name, and adopts dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved because every data has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and user’s interest can hardly be guaranteed. In order to solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption to reduce the query overhead by optimizing the caching strategy. In this scheme, we use hash value to ensure the user’s query safe from each node in the process of search, so does the privacy of the requiring data content.

Keywords: NDN, order-preserving encryption, fuzzy search, privacy

Procedia PDF Downloads 487
25123 Healthcare Big Data Analytics Using Hadoop

Authors: Chellammal Surianarayanan

Abstract:

Healthcare industry is generating large amounts of data driven by various needs such as record keeping, physician’s prescription, medical imaging, sensor data, Electronic Patient Record(EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that they cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from large volume of data, the velocity with which the data is accumulated and different varieties such as structured, semi-structured and unstructured nature of data. Despite the complexity of big data, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare at lower cost can be provided. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include Hadoop Distributed File System which offers way to store large amount of data across multiple machines and MapReduce which offers way to process large data sets with a parallel, distributed algorithm on a cluster. Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), Hbase(a columnar data store), etc. In this paper an analysis has been done as how healthcare big data can be processed and analyzed using Hadoop ecosystem.

Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare

Procedia PDF Downloads 414
25122 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan

Abstract:

Subsection identification, in the context of Electronic Health Records (EHRs), is identifying the important sections for down-stream tasks like auto-coding. In this work, we classify the text present in EHRs according to their information, using machine learning and deep learning techniques. We initially describe briefly about the problem and formulate it as a text classification problem. Then, we discuss upon the methods from the literature. We try two approaches - traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based Machine Learning Models.

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 219
25121 Data Disorders in Healthcare Organizations: Symptoms, Diagnoses, and Treatments

Authors: Zakieh Piri, Shahla Damanabi, Peyman Rezaii Hachesoo

Abstract:

Introduction: Healthcare organizations like other organizations suffer from a number of disorders such as Business Sponsor Disorder, Business Acceptance Disorder, Cultural/Political Disorder, Data Disorder, etc. As quality in healthcare care mostly depends on the quality of data, we aimed to identify data disorders and its symptoms in two teaching hospitals. Methods: Using a self-constructed questionnaire, we asked 20 questions in related to quality and usability of patient data stored in patient records. Research population consisted of 150 managers, physicians, nurses, medical record staff who were working at the time of study. We also asked their views about the symptoms and treatments for any data disorders they mentioned in the questionnaire. Using qualitative methods we analyzed the answers. Results: After classifying the answers, we found six main data disorders: incomplete data, missed data, late data, blurred data, manipulated data, illegible data. The majority of participants believed in their important roles in treatment of data disorders while others believed in health system problems. Discussion: As clinicians have important roles in producing of data, they can easily identify symptoms and disorders of patient data. Health information managers can also play important roles in early detection of data disorders by proactively monitoring and periodic check-ups of data.

Keywords: data disorders, quality, healthcare, treatment

Procedia PDF Downloads 434
25120 Network Conditioning and Transfer Learning for Peripheral Nerve Segmentation in Ultrasound Images

Authors: Harold Mauricio Díaz-Vargas, Cristian Alfonso Jimenez-Castaño, David Augusto Cárdenas-Peña, Guillermo Alberto Ortiz-Gómez, Alvaro Angel Orozco-Gutierrez

Abstract:

Precise identification of the nerves is a crucial task performed by anesthesiologists for an effective Peripheral Nerve Blocking (PNB). Now, anesthesiologists use ultrasound imaging equipment to guide the PNB and detect nervous structures. However, visual identification of the nerves from ultrasound images is difficult, even for trained specialists, due to artifacts and low contrast. The recent advances in deep learning make neural networks a potential tool for accurate nerve segmentation systems, so addressing the above issues from raw data. The most widely spread U-Net network yields pixel-by-pixel segmentation by encoding the input image and decoding the attained feature vector into a semantic image. This work proposes a conditioning approach and encoder pre-training to enhance the nerve segmentation of traditional U-Nets. Conditioning is achieved by the one-hot encoding of the kind of target nerve a the network input, while the pre-training considers five well-known deep networks for image classification. The proposed approach is tested in a collection of 619 US images, where the best C-UNet architecture yields an 81% Dice coefficient, outperforming the 74% of the best traditional U-Net. Results prove that pre-trained models with the conditional approach outperform their equivalent baseline by supporting learning new features and enriching the discriminant capability of the tested networks.

Keywords: nerve segmentation, U-Net, deep learning, ultrasound imaging, peripheral nerve blocking

Procedia PDF Downloads 108
25119 Predicting Mass-School-Shootings: Relevance of the FBI’s ‘Threat Assessment Perspective’ Two Decades Later

Authors: Frazer G. Thompson

Abstract:

The 1990s in America ended with a mass-school-shooting (at least four killed by gunfire excluding the perpetrator(s)) at Columbine High School in Littleton, Colorado. Post-event, many demanded that government and civilian experts develop a ‘profile’ of the potential school shooter in order to identify and preempt likely future acts of violence. This grounded theory research study seeks to explore the validity of the original hypotheses proposed by the Federal Bureau of Investigation (FBI) in 2000, as it relates to the commonality of disclosure by perpetrators of mass-school-shootings, by evaluating fourteen mass-school-shooting events between 2000 and 2019 at locations around the United States. Methods: The strategy of inquiry seeks to investigate case files, public records, witness accounts, and available psychological profiles of the shooter. The research methodology is inclusive of one-on-one interviews with members of the FBI’s Critical Incident Response Group seeking perspective on commonalities between individuals; specifically, disclosure of intent pre-event. Results: The research determined that school shooters do not ‘unfailingly’ notify others of their plans. However, in nine of the fourteen mass-school-shooting events analyzed, the perpetrator did inform the third party of their intent pre-event in some form of written, oral, or electronic communication. In the remaining five instances, the so-called ‘red-flag’ indicators of the potential for an event to occur were profound, and unto themselves, might be interpreted as notification to others of an imminent deadly threat. Conclusion: Data indicates that conclusions drawn in the FBI’s threat assessment perspective published in 2000 are relevant and current. There is evidence that despite potential ‘red-flag’ indicators which may or may not include a variety of other characteristics, perpetrators of mass-school-shooting events are likely to share their intentions with others through some form of direct or indirect communication. More significantly, implications of this research might suggest that society is often informed of potential danger pre-event but lacks any equitable means by which to disseminate, prevent, intervene, or otherwise act in a meaningful way considering said revelation.

Keywords: columbine, FBI profiling, guns, mass shooting, mental health, school violence

Procedia PDF Downloads 119
25118 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines

Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay

Abstract:

One of the unique challenges provided by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data that contains relevant data that can be used to generate the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations similarly situated as the Philippines.

Keywords: big data, data analytics, higher education, republic of the philippines, assessment

Procedia PDF Downloads 349
25117 Written Narrative Texts as the Indicators of Communication Competence of Pupils and Students with Hearing Impairment in the Czech Language

Authors: Marie Komorna, Katerina Hadkova

Abstract:

One reason why hearing disabilities as compared to other disabilities are considered to be less serious, is the belief that deaf and hard of hearing persons can read and write without problems and can therefore fairly easily compensate for problems related to their limited ability to hear sound. However in reality this is not the case, especially as regards written Czech, deaf persons are often not able to communicate their message clearly to its recipients. Their inability to communicate fully in written language is one of the most severe problems facing a number of deaf persons, a problem which they face and which makes it difficult for them to function in a sound-based environment. Despite this fact, this issue is one which has been given only a minimum of attention in the Czech Republic. That is why we decided to focus our research on this issue, specifically targeting written communication of deaf pupils in primary and secondary schools. The paper summarizes the background and objectives of this research. The written work of deaf respondents was obtained in response to a narrative based on a series of images which depicted a continuous storyline. Based on an analysis of the obtained written work we tried to describe the specifics of the narrative abilities of the deaf authors of these texts. We also analyzed other aspects and specific traits of text written by deaf authors at a phonetic-phonological, lexical-semantic, morphological and syntactic, respectively pragmatic level. Based on the results of the project it will be possible to increase knowledge of the communication abilities of deaf persons in written Czech. The obtained data may be used during future research and for teaching purposes and/or education concepts for teaching Czech to deaf pupils.

Keywords: communication competence, deaf, narrative, written texts

Procedia PDF Downloads 339
25116 Profiling on the Holistic Identity of Malaysian Gifted Learners

Authors: Rorlinda Yusof, Siti Aishah Hassan, Afifah Mohamad Radzi, Mohd Hakimie Zainal Abidin, Amran Rasli, Inderbir Sandhu

Abstract:

The purpose of this study is to examine the self-identities of gifted and talented students and the relationship between self-identity and academic accomplishment. A random sample of 300 students enrolled in a secondary education programme at the Pusat GENIUS@pintar Negara was chosen as respondents of a 151-item holistic-identity component development tool. The validity of the instrument was assessed using Principal Components Analysis and Factor Analysis via an inter-Item Correlation Matrix (Loading values 0.44 to 0.86), which resulted in the formation of eight dimensions. The Cronbach's Alpha was calculated to determine the instrument's reliability (the overall result was 0.98). The results showed that students' holistic-identity profiles were relatively high (mean=4.09, standard deviation=0.449). In addition, spiritual identity received the greatest mean score (4.34) out of the eight components of identity investigated, while leadership identity received the lowest mean score (3.88). A conceptual framework for Islamic school leadership is recommended to implement spiritual values without differentiation to harmonize spiritual and intellectual intelligence among all the students. Some benchmarking studies with other centres for gifted and talented students are recommended for further research.

Keywords: holistic self-identity, academic achievement, self-development programme, counselling services, gifted and talented students

Procedia PDF Downloads 112
25115 Data Management and Analytics for Intelligent Grid

Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh

Abstract:

Power distribution utilities two decades ago would collect data from its customers not later than a period of at least one month. The origin of SmartGrid and AMI has subsequently increased the sampling frequency leading to 1000 to 10000 fold increase in data quantity. This increase is notable and this steered to coin the tern Big Data in utilities. Power distribution industry is one of the largest to handle huge and complex data for keeping history and also to turn the data in to significance. Majority of the utilities around the globe are adopting SmartGrid technologies as a mass implementation and are primarily focusing on strategic interdependence and synergies of the big data coming from new information sources like AMI and intelligent SCADA, there is a rising need for new models of data management and resurrected focus on analytics to dissect data into descriptive, predictive and dictatorial subsets. The goal of this paper is to is to bring load disaggregation into smart energy toolkit for commercial usage.

Keywords: data management, analytics, energy data analytics, smart grid, smart utilities

Procedia PDF Downloads 780
25114 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is the main concern in present days because the data being published through the internet has been increasing day by day. This huge amount of data was named as Big Data by its size. This project deals the privacy preservation in the context of Big Data using a data warehousing solution called hive. We implemented Nearest Similarity Based Clustering (NSB) with Bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with the sensitivity vulnerabilities and ensures the individual privacy. We also calculate the sensitivity levels by simple comparison method using the index values, by classifying the different levels of sensitivity. The experiments were carried out on the hive environment to verify the efficiency of algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms than existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 296
25113 A Linguistic Product of K-Pop: A Corpus-Based Study on the Korean-Originated Chinese Neologism Simida

Authors: Hui Shi

Abstract:

This article examines the online popularity of Chinese neologism simida, which is a loanword derived from Korean declarative sentence-final suffix seumnida. Facilitated by corpus data obtained from Weibo, the Chinese counterpart of Twitter, this study analyzes the morphological and syntactical processes behind simida’s coinage, as well as the causes of its prevalence on Chinese social media. The findings show that simida is used by Weibo bloggers in two manners: (1) as an alternative word of 'Korea' and 'Korean'; (2) as a redundant sentence-final particle which adds a Korean-like speech style to a statement. Additionally, Weibo user profile analysis further reveals demographical distribution patterns concerning this neologism and highlights young Weibo users in the third-tier cities as the leading adopters of simida. These results are accounted for under the theoretical framework of social indexicality, especially how variations generate style in the indexical field. This article argues that the creation of such an ethnically-targeted neologism is a linguistic demonstration of Chinese netizen’s two-sided attitudes toward the previously heated Korean-wave. The exotic suffix seumnida is borrowed to Chinese as simida due to its high-frequency in Korean cultural exports. Therefore, it gradually becomes a replacement of Korea-related lexical items due to markedness, regardless of semantic prosody. Its innovative implantation to Chinese syntax, on the other hand, reflects Chinese netizens’ active manipulation of language for their online identity building. This study has implications for research on the linguistic construction of identity and style and lays the groundwork for linguistic creativity in the Chinese new media.

Keywords: Chinese neologism, loanword, humor, new media

Procedia PDF Downloads 175
25112 On the Interactive Search with Web Documents

Authors: Mario Kubek, Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. Especially, the process of formulating queries with proper terms representing specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge on a subject and its technical terms is not sufficient enough to do so. This article presents the new and interactive search application DocAnalyser that addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking

Procedia PDF Downloads 394
25111 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 216
25110 Glyco-Biosensing as a Novel Tool for Prostate Cancer Early-Stage Diagnosis

Authors: Pavel Damborsky, Martina Zamorova, Jaroslav Katrlik

Abstract:

Prostate cancer is annually the most common newly diagnosed cancer among men. An extensive number of evidence suggests that traditional serum Prostate-specific antigen (PSA) assay still suffers from a lack of sufficient specificity and sensitivity resulting in vast over-diagnosis and overtreatment. Thus, the early-stage detection of prostate cancer (PCa) plays undisputedly a critical role for successful treatment and improved quality of life. Over the last decade, particular altered glycans have been described that are associated with a range of chronic diseases, including cancer and inflammation. These glycans differences enable a distinction to be made between physiological and pathological state and suggest a valuable biosensing tool for diagnosis and follow-up purposes. Aberrant glycosylation is one of the major characteristics of disease progression. Consequently, the aim of this study was to develop a more reliable tool for early-stage PCa diagnosis employing lectins as glyco-recognition elements. Biosensor and biochip technology putting to use lectin-based glyco-profiling is one of the most promising strategies aimed at providing fast and efficient analysis of glycoproteins. The proof-of-concept experiments based on sandwich assay employing anti-PSA antibody and an aptamer as a capture molecules followed by lectin glycoprofiling were performed. We present a lectin-based biosensing assay for glycoprofiling of serum biomarker PSA using different biosensor and biochip platforms such as label-free surface plasmon resonance (SPR) and microarray with fluorescent label. The results suggest significant differences in interaction of particular lectins with PSA. The antibody-based assay is frequently associated with the sensitivity, reproducibility, and cross-reactivity issues. Aptamers provide remarkable advantages over antibodies due to the nucleic acid origin, stability and no glycosylation. All these data are further step for construction of highly selective, sensitive and reliable sensors for early-stage diagnosis. The experimental set-up also holds promise for the development of comparable assays with other glycosylated disease biomarkers.

Keywords: biomarker, glycosylation, lectin, prostate cancer

Procedia PDF Downloads 406
25109 Democracy Bytes: Interrogating the Exploitation of Data Democracy by Radical Terrorist Organizations

Authors: Nirmala Gopal, Sheetal Bhoola, Audecious Mugwagwa

Abstract:

This paper discusses the continued infringement and exploitation of data by non-state actors for destructive purposes, emphasizing radical terrorist organizations. It will discuss how terrorist organizations access and use data to foster their nefarious agendas. It further examines how cybersecurity, designed as a tool to curb data exploitation, is ineffective in raising global citizens' concerns about how their data can be kept safe and used for its acquired purpose. The study interrogates several policies and data protection instruments, such as the Data Protection Act, Cyber Security Policies, Protection of Personal Information(PPI) and General Data Protection Regulations (GDPR), to understand data use and storage in democratic states. The study outcomes point to the fact that international cybersecurity and cybercrime legislation, policies, and conventions have not curbed violations of data access and use by radical terrorist groups. The study recommends ways to enhance cybersecurity and reduce cyber risks using democratic principles.

Keywords: cybersecurity, data exploitation, terrorist organizations, data democracy

Procedia PDF Downloads 205
25108 Healthcare Data Mining Innovations

Authors: Eugenia Jilinguirian

Abstract:

In the healthcare industry, data mining is essential since it transforms the field by collecting useful data from large datasets. Data mining is the process of applying advanced analytical methods to large patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these statistics. Additionally, data mining supports personalized medicine by personalizing treatment according to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to effectively apply data mining, however, and ensure the use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital for searching for more effective, efficient, and individualized healthcare solutions as technology evolves.

Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database

Procedia PDF Downloads 68
25107 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 361
25106 Molecular Profiling and Potential Bioactive Characteristics of Endophytic Fungi Isolated from Leptadenia Pyrotechnica

Authors: Walaa Al-Maghraby

Abstract:

Endophytes are organisms that colonize internal plant tissues without causing apparent harm to their host. Almost all groups of microorganisms have been found in endophytic association with plants may be fungi. They stimulate the production of secondary metabolites with a diverse range of biological activities. Leptadenia pyrotechnica is a more or less leafless, erect shrub with straight stems which is highly distributed in Saudi Arabia. Four endophytes fungi were isolated from Leptadenia pyrotechnica and identified using 18S ribosomal RNA sequences, which revealed four fungi genuses, namely Aspergillus terreus; Aspergillus welwitschiae; Aspergillus fumigatus and Aspergillus flavus. In this present study, four endophytic fungi from Leptadenia pyrotechnica were used for obtaining crude aqueous and ethyl acetate extracts for antimicrobial screening against 6 human pathogens, the antibacterial tests presented satisfactory results, where the pathogenic bacteria were inhibited by the four extracts tested, except for Escherichia coli that was inhibited by all extracts except ethyl acetate extract of Aspergillus terreus. Analysis of variance showed that the extract produced by endophyte Leptadenia pyrotechnica was the most effective against all bacteria, either gram-negative or positive. However, the extract was not efficient against pathogenic fungi. Therefore, this study indicates that endophytes from medicinal plant Leptadenia pyrotechnica could be potential sources of antibacterial substances.

Keywords: antimicrobial activity, Aspergillus sp, endophytes, Leptadenia pyrotechnica

Procedia PDF Downloads 141
25105 Access to Health Data in Medical Records in Indonesia in Terms of Personal Data Protection Principles: The Limitation and Its Implication

Authors: Anny Retnowati, Elisabeth Sundari

Abstract:

This research aims to elaborate the meaning of personal data protection principles on patient access to health data in medical records in Indonesia and its implications. The method uses normative legal research by examining health law in Indonesia regarding the patient's right to access their health data in medical records. The data will be analysed qualitatively using the interpretation method to elaborate on the limitation of the meaning of personal data protection principles on patients' access to their data in medical records. The results show that patients only have the right to obtain copies of their health data in medical records. There is no right to inspect directly at any time. Indonesian health law limits the principle of patients' right to broad access to their health data in medical records. This restriction has implications for the reduction of personal data protection as part of human rights. This research contribute to show that a limitaion of personal data protection may abuse the human rights.

Keywords: access, health data, medical records, personal data, protection

Procedia PDF Downloads 94
25104 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: data management, digitization, industry 4.0, knowledge engineering, metamodel

Procedia PDF Downloads 356