Search results for: Text Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2232

1452 Grid and Market Integration of Large Scale Wind Farms using Advanced Predictive Data Mining Techniques

Authors: Umit Cali

Abstract:

The integration of intermittent energy sources like wind farms into the electricity grid has become an important challenge for the utilization and control of electric power systems, because of the fluctuating behaviour of wind power generation. Wind power predictions improve the economic and technical integration of large amounts of wind energy into the existing electricity grid. Trading, balancing, grid operation, controllability and safety issues increase the importance of predicting power output from wind power operators. Therefore, wind power forecasting systems have to be integrated into the monitoring and control systems of the transmission system operator (TSO) and wind farm operators/traders. The wind forecasts are relatively precise only for a time period of a few hours and are, therefore, relevant with regard to Spot and Intraday markets. In this work, predictive data mining techniques are applied to identify a statistical and neural network model or set of models that can be used to predict the wind power output of large onshore and offshore wind farms. These advanced data analytic methods help us to amalgamate the information in very large meteorological, oceanographic and SCADA data sets into useful information and manageable systems. Accurate wind power forecasts are beneficial for wind plant operators, utility operators, and utility customers. An accurate forecast allows grid operators to schedule economically efficient generation to meet the demand of electrical customers. This study is also dedicated to an in-depth consideration of issues such as the comparison of day-ahead and short-term wind power forecasting results, the determination of the accuracy of the wind power prediction, and the evaluation of the energy economic and technical benefits of wind power forecasting.

Keywords: renewable energy sources, wind power, forecasting, data mining, big data, artificial intelligence, energy economics, power trading, power grids

Procedia PDF Downloads 508
1451 The Use of Neuter in Oedipus Lines to Refer to Antigone in Phoenissae of Seneca

Authors: Cíntia Martins Sanches

Abstract:

In the first part of Seneca's Phoenissae, Antigone guides Oedipus as they leave Thebes: he is blind and searching for death (inflicting on himself the punishment he wished on the killer of Laius, i.e., exile and death); she is trying to convince him to give up such punishment and to bring him back to Thebes. In Oedipus's lines, we observed a high frequency of the Latin neuter in the way the protagonist addresses his daughter Antigone. We considered in this study that such frequency may be related to the sanctification of the daughter, who is seen by him as an enlightened being without defects, free of the human condition (which by its essence entails failings). This study thus puts forward an analysis of the passages in which the said feature is present, relating them to the effect of meaning found in each occurrence. As part of a doctorate, this study investigates the stylistic idiom of Seneca in the Oedipus and Phoenissae tragedies, aiming at translating both tragedies expressively. The concept of stylistic idiom concerns the stylistic affinity required for a translation to be equivalent to the source text. Accordingly, this study inquires into how the Latin text is organized poetically, pointing out the expressive features frequently appearing in both dramas. The method we used is based on Semiotics theory: observing how connotation, i.e., a use of language in which the naturally polysemous poetic function prevails, acts to achieve each expressive effect.

Keywords: antigone, neuter, Oedipus, Phoenissae, Seneca

Procedia PDF Downloads 279
1450 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline Maria Ribeiro Vieira

Abstract:

Different strategies and tools are available in the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and little data, it may become difficult and susceptible to errors when big databases of images must be treated. Moreover, the patterns may differ across the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). Previously we developed and proposed a novel strategy capable of detecting patterns in borehole images that may point to regions that have tension and breakout characteristics, based on segmented images. In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. After some time spent using the system and manually marking the parts of borehole images that correspond to tension regions and breakout areas, these classifiers allow the system to indicate and suggest new candidate regions automatically, with higher accuracy. We suggest the use of different classifier methods in order to achieve different knowledge data set configurations.

Keywords: image segmentation, oil well visualization, classifiers, data-mining, visual computer

Procedia PDF Downloads 293
1449 Ranking Priorities for Digital Health in Portugal: Aligning Health Managers’ Perceptions with Official Policy Perspectives

Authors: Pedro G. Rodrigues, Maria J. Bárrios, Sara A. Ambrósio

Abstract:

The digitalisation of health is a profoundly transformative economic, political, and social process. As is often the case, such processes need to be carefully managed if misunderstandings, policy misalignments, or outright conflicts between the government and a wide gamut of stakeholders with competing interests are to be avoided. Thus, ensuring open lines of communication where all parties know what each other’s concerns are is key to good governance, as well as efficient and effective policymaking. This project aims to make a small but still significant contribution in this regard in that we seek to determine the extent to which health managers’ perceptions of what is a priority for digital health in Portugal are aligned with official policy perspectives. By applying state-of-the-art artificial intelligence technology first to the indexed literature on digital health and then to a set of official policy documents on the same topic, followed by a survey directed at health managers working in public and private hospitals in Portugal, we obtain two priority rankings that, when compared, will allow us to produce a synthesis and toolkit on digital health policy in Portugal, with a view to identifying areas of policy convergence and divergence. This project is also particularly peculiar in the sense that sophisticated digital methods related to text analytics are employed to study good governance aspects of digitalisation applied to health care.

Keywords: digital health, health informatics, text analytics, governance, natural language understanding

Procedia PDF Downloads 54
1448 Increasing the Capacity of Plant Bottlenecks by Using of Improving the Ratio of Mean Time between Failures to Mean Time to Repair

Authors: Jalal Soleimannejad, Mohammad Asadizeidabadi, Mahmoud Koorki, Mojtaba Azarpira

Abstract:

Maintenance costs account for a significant percentage of production costs, and analysis of maintenance costs could help achieve greater productivity and competitiveness. With this in mind, the maintenance of machines and installations is considered an essential part of organizational functions, and applying effective strategies adds significant value in manufacturing activities. Organizations are trying to achieve performance levels on a global scale, with emphasis on creating competitive advantage by different methods such as RCM (Reliability-Centered Maintenance), TPM (Total Productive Maintenance), etc. In this study, increasing the capacity of the Concentration Plant of Golgohar Iron Ore Mining & Industrial Company (GEG) was examined by using reliability and maintainability analyses. The results of this research showed that, instead of increasing the number of machines in order to solve the bottleneck problems, improving reliability and maintainability would solve them in the best way. It should be mentioned that the data set of the Concentration Plant of GEG was applied and analyzed as a case study.
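
The link between the MTBF-to-MTTR ratio and usable capacity can be illustrated with the standard steady-state availability formula, A = MTBF / (MTBF + MTTR), which is equivalent to r / (1 + r) for r = MTBF / MTTR. The sketch below is only an illustration: the function names are hypothetical, and treating effective capacity as nominal capacity times availability is a simplifying assumption, not a model taken from the abstract.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the machine is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability_from_ratio(ratio):
    """Same quantity expressed through r = MTBF / MTTR."""
    return ratio / (1.0 + ratio)


def effective_capacity(nominal_rate, mtbf_hours, mttr_hours):
    """Assumed simplification: throughput scales linearly with availability."""
    return nominal_rate * availability(mtbf_hours, mttr_hours)


if __name__ == "__main__":
    # Hypothetical bottleneck machine rated at 1000 t/h.
    before = effective_capacity(1000.0, mtbf_hours=90.0, mttr_hours=10.0)
    after = effective_capacity(1000.0, mtbf_hours=190.0, mttr_hours=10.0)
    print(f"capacity before: {before:.1f} t/h, after: {after:.1f} t/h")
```

Under these assumptions, raising the ratio from 9 to 19 lifts availability from 0.90 to 0.95 without adding a machine, which is the kind of effect the abstract describes.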

Keywords: bottleneck, golgohar iron ore mining & industrial company, maintainability, maintenance costs, reliability

Procedia PDF Downloads 349
1447 The Role of Digital Text in School and Vernacular Literacies: Students Digital Practices at Cybercafés in Mexico

Authors: Guadalupe López-Bonilla

Abstract:

Students of all educational levels participate in literacy practices that may involve print or digital media. Scholars from the New Literacy Studies distinguish practices that fulfill institutional purposes, such as those established at schools, from literate practices aimed at doing other kinds of activities, such as reading instructions in order to play a video game; the former are known as institutional practices while the latter are considered vernacular literacies. When students perform these kinds of activities they engage with print and digital media according to the demands of the task. This paper discusses the results of a research project focusing on the literacy practices of high school students at 10 urban cybercafés in Mexico. The main objective was to analyze the literacy practices of students performing both school tasks and vernacular literacies. The methodology included a focused ethnography with online and face-to-face observations of 10 high school students (5 male and 5 female) and interviews after performing each task. The results show that students treat texts as open, dynamic and relational artifacts when engaging in vernacular literacies, while texts are conceived as closed, authoritarian and fixed documents when performing school activities. Samples of each type of activity are shown, followed by a discussion of the pedagogical implications for improving school literacy.

Keywords: digital literacy, text, school literacy, vernacular practices

Procedia PDF Downloads 261
1446 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into an editable and searchable text file. Feature extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features per character are used in character recognition. Basically, there are three steps in character recognition: character segmentation, feature extraction and classification. In the segmentation step, a horizontal cropping method is used for line segmentation and a vertical cropping method is used for character segmentation. In the feature extraction step, features are extracted in two ways. In the first way, 8 features are extracted from the entire input character using eight-direction chain code frequency extraction. In the second way, the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through the eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a single feature for that block. Therefore, 16 features are extracted from those 16 blocks in the second way. We use the number-of-holes feature to cluster similar characters. We can recognize almost all common Myanmar characters with various font sizes by using these features. All 25 features are used in both the training part and the testing part. In the classification step, the characters are classified by matching all features of the input character with the already trained features of characters.
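
The 8 + 16 + 1 = 25 feature layout described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the character contour is already available as an ordered, closed list of 8-connected pixel coordinates, that raw counts (not normalized frequencies) are used, that each chain-code step is attributed to the block containing its starting pixel, and that the hole count is supplied by a separate step. All function names are hypothetical.

```python
from collections import Counter

# Freeman 8-direction codes keyed by (dx, dy); y grows downward as in image rows.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour):
    """Direction code of every step along a closed, 8-connected contour."""
    codes = []
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes


def global_features(contour):
    """8 features: how often each direction occurs over the whole character."""
    counts = Counter(chain_code(contour))
    return [counts.get(d, 0) for d in range(8)]


def block_features(contour, width, height, grid=4):
    """16 features: per block, the sum of its 8 direction counts.

    Summing the 8 counts of a block equals the number of chain-code steps
    that start in that block, so one step is counted per contour pixel.
    """
    feats = [0] * (grid * grid)
    for x, y in contour:
        col = min(x * grid // width, grid - 1)
        row = min(y * grid // height, grid - 1)
        feats[row * grid + col] += 1
    return feats


def feature_vector(contour, width, height, holes):
    """Concatenate 8 global + 16 block + 1 hole-count feature (25 in total)."""
    return global_features(contour) + block_features(contour, width, height) + [holes]


if __name__ == "__main__":
    # Toy contour: the outline of a 3x3 square inside a 4x4 image.
    square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    print(feature_vector(square, width=4, height=4, holes=1))
```

Classification by feature matching would then compare such a vector against stored training vectors, for example by nearest distance; the abstract does not specify the matching rule.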

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 307
1445 Teaching Tolerance in the Language Classroom through a Text

Authors: Natalia Kasatkina

Abstract:

In an era of ever-increasing globalization, one’s grasp of diversity and tolerance has never been more indispensable, and it is a vital duty for all those in the field of foreign language teaching to help children cultivate such values. The present study explores the role of DIVERSITY and TOLERANCE in the language classroom and elementary, middle, and high school students’ perceptions of these two concepts. It draws on several theoretical domains of language acquisition, cultural awareness, and school psychology. Relying on these frameworks, the major findings are synthesized, and a paradigm of teaching tolerance through language teaching is formulated. Upon analysing how tolerant our children are of ‘others’ in and outside the classroom, we have concluded that intolerance and aggression towards the ‘other’ increase with age, and that a feeling of supremacy over migrants and a sense of fear towards them begin to manifest more clearly when the students are in high school. In addition, we have also found that children in elementary school do not exhibit such prejudiced thoughts and behavior, which leads us to believe that tolerance as well as intolerance are learned. Therefore, it is within our reach to teach our children to be open-minded and accepting. We have used the novel ‘Uncle Tom’s Cabin’ by Harriet Beecher Stowe as a springboard for lessons which are not only targeted at shedding light on the role of language in the modern world, but also aim to stimulate an awareness of cultural diversity. We also intend to conduct further cross-cultural research in order to solidify the theory behind this study, and thus devise a language-based curriculum which would encourage tolerance through the examination of various literary texts.

Keywords: literary text, tolerance, EFL classroom, word-association test

Procedia PDF Downloads 285
1444 Ecological Risk Aspects of Essential Trace Metals in Soil Derived From Gold Mining Region, South Africa

Authors: Lowanika Victor Tibane, David Mamba

Abstract:

The human body, animals, and plants depend on certain essential metals in permissible quantities for their survival. Excessive metal concentrations may cause severe malfunctioning of organisms and can even be fatal in extreme cases. Because of gold mining in the Witwatersrand basin in South Africa, enormous untreated mine dumps contain elevated concentrations of essential trace elements. Elevated quantities of trace metals have a direct negative impact on the quality of soil for different land use types, reduce soil efficiency for plant growth, and affect the health of humans and animals. A total of 21 subsoil samples were examined using inductively coupled plasma optical emission spectrometry and X-ray fluorescence methods, and the results showed elevated mean concentrations of Fe (36,433.39) > S (5,071.83) > Cu (1,717.28) > Mn (612.81) > Cr (74.52) > Zn (68.67) > Ni (40.44) > Co (9.63) > P (3.49) > Mo (2.74), reported in mg/kg. Using various contamination indices, it was discovered that the sites surveyed are on average moderately contaminated with Co, Cr, Cu, Mn, Ni, S, and Zn. The ecological risk assessment revealed a low ecological risk for Cr, Ni and Zn, whereas Cu poses a very high ecological risk.

Keywords: essential trace elements, soil contamination, contamination indices, toxicity, descriptive statistics, ecological risk evaluation

Procedia PDF Downloads 85
1443 Factors Affecting Visual Environment in Mine Lighting

Authors: N. Lakshmipathy, Ch. S. N. Murthy, M. Aruna

Abstract:

The design of lighting systems for surface mines is not an easy task because of the unique environment and work procedures encountered in the mines. The primary objective of this paper is to identify the major problems encountered in mine lighting applications and to provide guidance on solving them. In surface mining, the reflectance of surrounding surfaces is one of the important factors that improve vision during the night hours. However, due to the nature of work in the mines, it is very difficult to fulfill these requirements, and the orientation of the light at the work site is also a challenging task. For this reason, machine operators and other workers in a mine need to be able to orient themselves in a difficult visual environment. The haul roads keep changing in step with the mining activity. Other critical areas such as dumpyards, stackyards, etc. also change over time, and it is difficult to illuminate such areas. Mining is a hazardous occupation, with workers exposed to adverse conditions; apart from the need for hard physical labor, there is exposure to stress and environmental pollutants like dust, noise, heat, vibration, poor illumination, radiation, etc. Visibility is restricted when operating load haul dumpers and Heavy Earth Moving Machinery (HEMM) vehicles, resulting in a number of serious accidents. One of the leading causes of these accidents is the inability of the equipment operator to clearly see people, objects or hazards around the machine. Results indicate that blind spots are caused primarily by posts, the back of the operator's cab, and by lights and light brackets. Carefully designed and implemented lighting systems provide mine workers with improved visibility and contribute to improved safety, productivity and morale. Properly designed lighting systems can improve visibility and safety during work in opencast mines.

Keywords: contrast, efficacy, illuminance, illumination, light, luminaire, luminance, reflectance, visibility

Procedia PDF Downloads 353
1442 Multimodal Content: Fostering Students’ Language and Communication Competences

Authors: Victoria L. Malakhova

Abstract:

The research is devoted to multimodal content and its effectiveness in developing students’ linguistic and intercultural communicative competences as an indefeasible constituent of their future professional activity. Description of multimodal content both as a linguistic and didactic phenomenon makes the study relevant. The objective of the article is the analysis of creolized texts and the effect they have on fostering higher education students’ skills and their productivity. The main methods used are linguistic text analysis, qualitative and quantitative methods, deduction, generalization. The author studies texts with full and partial creolization, their features and role in composing multimodal textual space. The main verbal and non-verbal markers and paralinguistic means that enhance the linguo-pragmatic potential of creolized texts are covered. To reveal the efficiency of multimodal content application in English teaching, the author conducts an experiment among both undergraduate students and teachers. This allows specifying main functions of creolized texts in the process of language learning, detecting ways of enhancing students’ competences, and increasing their motivation. The described stages of using creolized texts can serve as an algorithm for work with multimodal content in teaching English as a foreign language. The findings contribute to improving the efficiency of the academic process.

Keywords: creolized text, English language learning, higher education, language and communication competences, multimodal content

Procedia PDF Downloads 104
1441 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.
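
The abstract does not say which initialization strategy the authors propose, so the sketch below substitutes a well-known alternative, farthest-first (maximin) seeding, purely to illustrate where an initialization step plugs into the iterative relocation procedure of K-Means. Function names are hypothetical, and the toy points stand in for gene expression profiles.

```python
import math


def farthest_first_centers(points, k):
    """Pick k initial centers: start at points[0], then repeatedly take the
    point whose distance to its nearest already-chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


def kmeans(points, centers, max_iter=100):
    """Lloyd-style iterative relocation starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old center if a cluster ends up empty.
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else old
            )
        if updated == centers:  # converged: no center moved
            break
        centers = updated
    return centers, assign(points, centers)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (12.0, 12.0)]
    init = farthest_first_centers(pts, 2)
    centers, labels = kmeans(pts, init)
    print(init, centers, labels)
```

Because the seeds are spread out before relocation begins, the two well-separated groups in the toy data are each given one starting center; any other seeding rule could be passed to `kmeans` in the same way.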

Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization

Procedia PDF Downloads 184
1440 Sampling and Chemical Characterization of Particulate Matter in a Platinum Mine

Authors: Juergen Orasche, Vesta Kohlmeier, George C. Dragan, Gert Jakobi, Patricia Forbes, Ralf Zimmermann

Abstract:

Underground mining poses a difficult environment for both man and machines. At more than 1000 meters beneath the surface of the earth, ores and other mineral resources are still extracted by conventional and motorised mining. Adding to the hazards caused by blasting and stone-chipping, the working conditions are characterised by high temperatures of 35-40°C and high humidity, at low air exchange rates. Separate ventilation shafts lead fresh air into a mine and others lead expended air back to the surface. This is essential for humans and machines working deep underground. Nevertheless, mines are widely ramified. Thus the air flow rate at the far end of a tunnel is perceived to be close to zero. In recent years, conventional mining was supplemented by mining with heavy diesel machines. These very flat machines, called Load Haul Dump (LHD) vehicles, accelerate and ease work in areas favourable for heavy machines. On the other hand, they emit non-filtered diesel exhaust, which constitutes an occupational hazard for the miners. Combined with a low air exchange, high humidity and inorganic dust from the mining, it leads to 'black smog' underneath the earth. This work focuses on the air quality in mines employing LHDs. Therefore we performed personal sampling (samplers worn by miners during their work), stationary sampling and aethalometer (Microaeth MA200, Aethlabs) measurements in a platinum mine at around 1000 meters under the earth’s surface. We compared areas of high diesel exhaust emission with areas of conventional mining where no diesel machines were operated. For a better assessment of health risks caused by air pollution we applied a separated gas-/particle-sampling tool (or system), with a first denuder section collecting intermediate VOCs (IVOCs). These multi-channel silicone rubber denuders are able to trap IVOCs while allowing particles ranging from 10 nm to 1 µm in diameter to be transmitted with an efficiency of nearly 100%. 
The second section is represented by a quartz fibre filter collecting particles and adsorbed semi-volatile organic compounds (SVOC). The third part is a graphitized carbon black adsorber – collecting the SVOCs that evaporate from the filter. The compounds collected on these three sections were analyzed in our labs with different thermal desorption techniques coupled with gas chromatography and mass spectrometry (GC-MS). VOCs and IVOCs were measured with a Shimadzu Thermal Desorption Unit (TD20, Shimadzu, Japan) coupled to a GCMS-System QP 2010 Ultra with a quadrupole mass spectrometer (Shimadzu). The GC was equipped with a 30m, BP-20 wax column (0.25mm ID, 0.25µm film) from SGE (Australia). Filters were analyzed with In-situ derivatization thermal desorption gas chromatography time-of-flight-mass spectrometry (IDTD-GC-TOF-MS). The IDTD unit is a modified GL sciences Optic 3 system (GL Sciences, Netherlands). The results showed black carbon concentrations measured with the portable aethalometers up to several mg per m³. The organic chemistry was dominated by very high concentrations of alkanes. Typical diesel engine exhaust markers like alkylated polycyclic aromatic hydrocarbons were detected as well as typical lubrication oil markers like hopanes.

Keywords: diesel emission, personal sampling, aethalometer, mining

Procedia PDF Downloads 149
1439 A Postmodern Framework for Quranic Hermeneutics

Authors: Christiane Paulus

Abstract:

Post-Islamism assumes that the Quran should not be viewed in terms of what Lyotard identifies as a ‘meta-narrative’. However, its socio-ethical content can be viewed as critical of power discourse (Foucault). Practicing religion seems to be limited to rites and individual spirituality, taqwa. Alternatively, can we build on Muhammad Abduh's classic-modern reform and develop it through a postmodernist frame? This is the main question of this study. Through his general and vague remarks on the context of the Quran, Abduh was the first to refer to the historical and cultural distance of the text as an obstacle for interpretation. His application, however, corresponded to the modern absolute idea of authentic sharia. He was followed by Amin al-Khuli, who hermeneutically linked the content of the Quran to the theory of evolution. Fazlur Rahman and Nasr Hamid abu Zeid remain reluctant to go beyond the general level in terms of context. The hermeneutic circle, therefore, continues to pose the challenge of how to step outside it and overcome one’s own assumptions. The insight into and the acceptance of the lasting ambivalence of understanding can be grasped as a postmodern approach; it is documented in Derrida's discovery of the shift in text meanings, difference, and also in Lyotard's theory of différend. The resulting mixture of meanings (Wolfgang Welsch) can be read together with the classic ambiguity of the premodern interpreters of the Quran (Thomas Bauer). Confronting hermeneutic difficulties in general, Niklas Luhmann shows every description to be an attribution, a tautology, i.e., remaining in the circle. ‘De-tautologization’ is possible, namely by analyzing the distinctions in the sense of objective, temporal and social information that every text contains. This could be expanded with the Kantian aesthetic dimension of reason (critique of pure judgment) corresponding to the iʽgaz of the Quran. 
Luhmann asks, ‘What distinction does the observer/author make?’ The Quran, as a speech from God to the first listeners, could be seen as a discourse responding to the problems of everyday life of that time, which can be viewed as the general goal of the entire Quran. By reconstructing Quranic lifeworlds (Alfred Schütz) in detail, the social structure brings out the socio-economic differences and the enormous poverty. The Quranic instruction to provide for the basic needs of the neglected groups, which often intersect (old, poor, slaves, women, children), can be seen immediately in the text. First, the references to lifeworlds/social problems and discourses in longer Quranic passages should be hypothesized. Subsequently, information from the classic commentaries could be extracted; the classical Tafseer, in particular, contains rich narrative material for this reconstruction. By selecting and assigning suitable, specific context information, the meaning of the description becomes condensed (Clifford Geertz). In this manner, the text necessarily undergoes an alienation and becomes newly accessible. The socio-ethical implications can thus be grasped from the difference between the original problem and the revealed/improved order/procedure; this small step can be materialized as such, not as an absolute solution but as offering plausible patterns for today’s challenges such as the Agenda 2030.

Keywords: postmodern hermeneutics, condensed description, sociological approach, small steps of reform

Procedia PDF Downloads 208
1438 A General Framework for Measuring the Internal Fraud Risk of an Enterprise Resource Planning System

Authors: Imran Dayan, Ashiqul Khan

Abstract:

Internal corporate fraud, which is fraud carried out by internal stakeholders of a company, affects the well-being of the organisation just like its external counterpart. Even if such an act is carried out for the short-term benefit of a corporation, the act is ultimately harmful to the entity in the long run. Internal fraud is often carried out by relying upon aberrations from usual business processes. Business processes are the lifeblood of a company in modern managerial context. Such processes are developed and fine-tuned over time as a corporation grows through its life stages. Modern corporations have embraced technological innovations into their business processes, and Enterprise Resource Planning (ERP) systems being at the heart of such business processes is a testimony to that. Since ERP systems record a huge amount of data in their event logs, the logs are a treasure trove for anyone trying to detect any sort of fraudulent activities hidden within the day-to-day business operations and processes. This research utilises the ERP systems in place within corporations to assess the likelihood of prospective internal fraud through developing a framework for measuring the risks of fraud through Process Mining techniques and hence finds risky designs and loose ends within these business processes. This framework helps not only in identifying existing cases of fraud in the records of the event log, but also signals the overall riskiness of certain business processes, and hence draws attention for carrying out a redesign of such processes to reduce the chance of future internal fraud while improving internal control within the organisation. 
The research adds value by applying the concepts of Process Mining to the analysis of data from modern-day business process records, namely ERP event logs, and develops a framework that should be useful to internal stakeholders for strengthening internal control and should provide external auditors with a useful tool in case of suspicion. The research demonstrates its usefulness through a few case studies conducted on big corporations with complex business processes and an ERP in place.
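
One very simple way to surface aberrations from usual business processes in an event log is a variant-based conformance check: rebuild each case's activity sequence and flag cases whose sequence is not among the approved variants. The sketch below is only an illustration of that general idea and is not the framework described in the abstract; the log layout `(case_id, timestamp, activity)`, the activity names, and the function names are all assumptions.

```python
from collections import defaultdict


def traces_by_case(event_log):
    """Group (case_id, timestamp, activity) events into ordered activity
    sequences, one per case."""
    cases = defaultdict(list)
    for case_id, _timestamp, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        cases[case_id].append(activity)
    return dict(cases)


def deviation_rate(event_log, allowed_variants):
    """Return the share of cases that deviate from every allowed variant,
    together with the deviating traces themselves."""
    traces = traces_by_case(event_log)
    deviating = {
        case_id: trace
        for case_id, trace in traces.items()
        if tuple(trace) not in allowed_variants
    }
    return len(deviating) / len(traces), deviating


if __name__ == "__main__":
    # Hypothetical purchase-to-pay log; case "c2" skips the approval step.
    log = [
        ("c1", 1, "create"), ("c1", 2, "approve"), ("c1", 3, "pay"),
        ("c2", 1, "create"), ("c2", 2, "pay"),
        ("c3", 2, "approve"), ("c3", 1, "create"), ("c3", 3, "pay"),
    ]
    rate, flagged = deviation_rate(log, {("create", "approve", "pay")})
    print(rate, flagged)
```

A per-process deviation rate of this kind could serve as one crude input to a riskiness score; a real assessment would need richer signals such as who performed each step and how long it took.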

Keywords: enterprise resource planning, fraud risk framework, internal corporate fraud, process mining

Procedia PDF Downloads 325
1437 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can contain either categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are available: k-means for clustering numeric datasets and k-modes for categorical datasets. A main problem encountered in data mining applications is clustering categorical data, which are common in real datasets. One way to cluster categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead of k-modes. In this paper, we experiment with an approach of this kind, transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated. The obtained results show that our proposed method outperforms the binary method in all cases.
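
A relative-frequency transformation of this kind might look like the sketch below, in which every categorical value is replaced by the share of rows carrying that value in the same attribute. The abstract does not give the exact formula, so this is one plausible reading; the function name and the toy rows are hypothetical.

```python
from collections import Counter


def relative_frequency_encode(rows):
    """Replace each categorical value with its relative frequency
    (count / number of rows) within its own attribute column."""
    n = len(rows)
    # One counter per attribute (column).
    frequencies = [Counter(column) for column in zip(*rows)]
    return [
        [frequencies[j][value] / n for j, value in enumerate(row)]
        for row in rows
    ]


if __name__ == "__main__":
    rows = [
        ("red", "S"),
        ("red", "M"),
        ("blue", "S"),
        ("red", "S"),
    ]
    for original, encoded in zip(rows, relative_frequency_encode(rows)):
        print(original, "->", encoded)
```

The resulting numeric rows can be fed to any k-means implementation. Note that two different modalities with equal frequency map to the same number, a limitation of this reading that a binary (one-hot) encoding does not share.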

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 249
1436 Alphabet Recognition Using Pixel Probability Distribution

Authors: Vaidehi Murarka, Sneha Mehta, Dishant Upadhyay

Abstract:

Our project topic is “Alphabet Recognition using pixel probability distribution”. The project uses techniques of Image Processing and Machine Learning in Computer Vision. Alphabet recognition is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files. Alphabet-recognition-based OCR applications are sometimes used in signature recognition, which is used in banks and other high-security buildings. One popular mobile application reads a visiting card and stores it directly in the contacts. OCR is also known to be used in radar systems for reading speeders' license plates, among many other uses. The implementation of our project has been done using Visual Studio and OpenCV (Open Source Computer Vision). Our algorithm is based on Neural Networks (machine learning). The project was implemented in three modules: (1) Training: This module aims at “Database Generation”. The database was generated using two methods: (a) Run-time generation included database generation at compilation time using inbuilt fonts of the OpenCV library. Human intervention is not necessary for generating this database. (b) Contour-detection: a ‘jpeg’ template containing different fonts of an alphabet is converted to the weighted matrix using specialized functions (contour detection and blob detection) of OpenCV. The main advantage of this type of database generation is that the algorithm becomes self-learning and the final database requires little memory to be stored (119kb precisely). (2) Preprocessing: The input image is pre-processed using image processing concepts such as adaptive thresholding, binarizing and dilating, and is made ready for segmentation. “Segmentation” includes extraction of lines, words, and letters from the processed text image. 
(3) Testing and prediction: The extracted letters are classified and predicted using the neural networks algorithm. The algorithm recognizes an alphabet based on certain mathematical parameters calculated using the database and weight matrix of the segmented image.

Keywords: contour-detection, neural networks, pre-processing, recognition coefficient, runtime-template generation, segmentation, weight matrix

Procedia PDF Downloads 377
1435 Framework for Integrating Big Data and Thick Data: Understanding Customers Better

Authors: Nikita Valluri, Vatcharaporn Esichaikul

Abstract:

With the popularity of data-driven decision making on the rise, this study focuses on providing an alternative outlook towards the process of decision-making. Combining quantitative and qualitative methods rooted in the social sciences, an integrated framework is presented with a focus on delivering a much more robust and efficient approach towards the concept of data-driven decision-making with respect to not only Big data but also 'Thick data', a new form of qualitative data. In support of this, an example from the retail sector has been illustrated where the framework is put into action to yield insights and leverage business intelligence. An interpretive approach to analyze findings from both kinds of quantitative and qualitative data has been used to glean insights. Using traditional Point-of-sale data as well as an understanding of customer psychographics and preferences, techniques of data mining along with qualitative methods (such as grounded theory, ethnomethodology, etc.) are applied. This study’s final goal is to establish the framework as a basis for providing a holistic solution encompassing both the Big and Thick aspects of any business need. The proposed framework is a modified enhancement in lieu of traditional data-driven decision-making approach, which is mainly dependent on quantitative data for decision-making.

Keywords: big data, customer behavior, customer experience, data mining, qualitative methods, quantitative methods, thick data

Procedia PDF Downloads 146
1434 Exploring Gaming-Learning Interaction in MMOG Using Data Mining Methods

Authors: Meng-Tzu Cheng, Louisa Rosenheck, Chen-Yen Lin, Eric Klopfer

Abstract:

The purpose of the research is to explore some of the ways in which gameplay data can be analyzed to yield results that feed back into the learning ecosystem. Back-end data for all users as they played an MMOG, The Radix Endeavor, was collected, and this study reports the analyses of a specific genetics quest using data mining techniques, including the decision tree method. In the study, different reasons for quest failure between participants who eventually succeeded and those who never succeeded were revealed. Regarding in-game tool use, the trait examiner was a key tool in the quest completion process. Subsequently, the results of the decision tree showed that a lack of trait examiner usage can be made up for with additional Punnett square uses, displaying multiple pathways to success in this quest. The methods of analysis used in this study and the resulting usage patterns indicate some useful ways that gameplay data can provide insights in two main areas. The first is for game designers to know how players are interacting with and learning from their game. The second is for players themselves as well as their teachers to get information on how they are progressing through the game, and to provide help they may need based on strategies and misconceptions identified in the data.

Keywords: MMOG, decision tree, genetics, gaming-learning interaction

Procedia PDF Downloads 350
1433 Exploring the Correlation between Population Distribution and Urban Heat Island under Urban Data: Taking Shenzhen Urban Heat Island as an Example

Authors: Wang Yang

Abstract:

Shenzhen is a modern city of China's reform and opening-up policy, and the development of its urban morphology has been established under the administration of the Chinese government. This city's planning paradigm is primarily affected by the spatial structure and human behavior. The subjective urban agglomeration center is divided into several groups and centers. In comparisons of this effect, the city development law has better to be neglected. With the continuous development of the internet, extensive data technology has been introduced in China. Data mining and data analysis have become important tools in municipal research. Data mining has been utilized to improve data cleaning, such as for business data, traffic data and population data. Prior to data mining, government data were collected by traditional means and then analyzed using city-relationship research, delaying the timeliness of urban development, especially for the contemporary city. Data are now updated very quickly via the Internet. The city's points of interest (POI) obtained through mining serve as a data source affecting the city design, while satellite remote sensing is used as a reference object; city analysis is conducted in both directions, the administrative paradigm of government is broken and urban research is restored. Therefore, the use of data mining in urban analysis is very important. The satellite remote sensing data of Shenzhen in July 2018 were measured by the satellite MODIS sensor and can be utilized to perform land surface temperature inversion and analyze the urban heat island distribution of Shenzhen. This article acquired and classified the data from Shenzhen by using data crawler technology. Data on the Shenzhen heat island and points of interest were simulated and analyzed in the GIS platform to discover the main features of functional equivalent distribution influence. Shenzhen is located in the east-west area of China. 
The city’s main streets are also determined according to the direction of city development. Therefore, it is determined that the functional area of the city is also distributed in the east-west direction. The urban heat island can be expressed as a heat map according to the functional urban area, and regional POIs correspond to it. The research result indicates that the distribution of the urban heat island and the distribution of urban POIs are in one-to-one correspondence. The urban heat island is primarily influenced by the properties of the underlying surface, avoiding the impact of urban climate. Using urban POIs as the analysis object, the distribution of municipal POIs and population aggregation are closely connected, so the distribution of the population corresponds with the distribution of the urban heat island.

Keywords: POI, satellite remote sensing, the population distribution, urban heat island thermal map

Procedia PDF Downloads 94
1432 Research on Evaluation of Renewable Energy Technology Innovation Strategy Based on PMC Index Model

Authors: Xue Wang, Liwei Fan

Abstract:

Renewable energy technology innovation is an important way to realize the energy transformation. Our government has issued a series of policies to guide and support the development of renewable energy. The implementation of these policies will affect the further development, utilization and technological innovation of renewable energy. In this context, it is of great significance to systematically sort out and evaluate the renewable energy technology innovation policy for improving the existing policy system. Taking the 190 renewable energy technology innovation policies issued during 2005-2021 as a sample, from the perspectives of policy issuing departments and policy keywords, this study uses text mining and content analysis methods to analyze the current situation of the policies and conducts a semantic network analysis to identify the core issuing departments and core policy topic words. A PMC (Policy Modeling Consistency) index model is built to quantitatively evaluate the selected policies, analyze the overall pros and cons of each policy through its PMC index, and use the PMC values of the model's secondary indices to reflect the performance, in each dimension, of the policies published by the core departments and of the policies related to the core topic words. The research results show that renewable energy technology innovation policies focus on synergy between multiple departments, while the distribution of the issuers is uneven in terms of promulgation time; policies related to different topics have their own emphasis in terms of policy types, fields, functions, and support measures, but they still need to be improved, given the lack of policy forecasting and supervision functions, the lack of attention to product promotion, and the relatively limited range of support measures. 
Finally, this research puts forward policy optimization suggestions in terms of promoting joint policy release, strengthening policy coherence and timeliness, enhancing the comprehensiveness of policy functions, and enriching incentive measures for renewable energy technology innovation.

Keywords: renewable energy technology innovation, content analysis, policy evaluation, PMC index model

Procedia PDF Downloads 56
1431 Educational Data Mining: The Case of the Department of Mathematics and Computing in the Period 2009-2018

Authors: Mário Ernesto Sitoe, Orlando Zacarias

Abstract:

University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency toward evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias and variance and improve the performance of the models, the ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.
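
As an illustration of the cross-validation step mentioned in this abstract, the following minimal sketch splits 388 samples into seven folds. It is not the Weka procedure used by the authors; the function name is hypothetical, and it assumes contiguous, unshuffled folds for simplicity.

```python
def k_fold_indices(n_samples, n_folds):
    """Return a list of (train_indices, test_indices) pairs, one per fold.

    Assumption: folds are contiguous and unshuffled; the first
    `n_samples % n_folds` folds receive one extra sample.
    """
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# 388 students and seven folds, as reported in the abstract.
folds = k_fold_indices(388, 7)
fold_sizes = [len(test) for _, test in folds]
```

Each sample appears in exactly one test fold, so every student is used for evaluation once and for training in the remaining six rounds. In practice the indices would usually be shuffled (and possibly stratified by class) before splitting.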

Keywords: evasion and retention, cross-validation, bagging, stacking

Procedia PDF Downloads 71
1430 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources for understanding broadcasting contents. Since broadcast media wield great influence over the public, tools for understanding broadcasting contents are increasingly required. Topic modeling is a method for obtaining a summary of broadcasting contents from their scripts. Generally, scripts represent contents descriptively with directions and speeches. Scripts also provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scripts consist mainly of speeches, however, relatively few co-occurrences among words are observed in the scene segments. This inevitably degrades the quality of topics learned by statistical methods. To tackle this problem, we propose a method of learning with additional word co-occurrence information obtained using scene similarities. The main idea for improving topic quality is that the information that two or more texts are topically related can be useful for learning high-quality topics. In addition, by using high-quality topics, we can determine more accurately whether two texts are related. In this paper, we regard two scene segments as related if their topical similarity is high enough. We also consider words to co-occur if they appear together in topically related scene segments. In the experiments, we show that the proposed method generates higher-quality topics from Korean drama scripts than the baselines.

Keywords: broadcasting contents, scripts, text similarity, topic model

Procedia PDF Downloads 309
1429 Design and Implementation of a Platform for Adaptive Online Learning Based on Fuzzy Logic

Authors: Budoor Al Abid

Abstract:

Educational systems are increasingly provided as open online services, providing guidance and support for individual learners. To adapt the learning systems, a proper evaluation must be made. This paper builds an evaluation model, the Fuzzy C-Means Adaptive System (FCMAS), based on data mining techniques to assess the difficulty of the questions. The following steps are implemented. First, a dataset from an online international learning system (slepemapy.cz) is used; the dataset contains over 1,300,000 records with 9 features covering student, question, and answer information with feedback evaluation. Next, normalization is applied as a preprocessing step. Then the FCM clustering algorithm is used to adapt the difficulty of the questions. The result is data labeled with one of three clusters (easy, intermediate, difficult) according to the highest weight. The FCM algorithm labels all the questions one by one. Then a Random Forest (RF) classifier is constructed on the clustered dataset, using 70% of the dataset for training and 30% for testing; the model achieves a 99.9% accuracy rate. This approach improves the adaptive e-learning system because it depends on student behavior and gives more accurate results in the evaluation process than an evaluation system that depends on feedback only.
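
The labeling step described in this abstract can be sketched with the standard fuzzy c-means membership formula, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)). The sketch assumes one-dimensional normalized difficulty scores, fixed cluster centers, and a fuzzifier m = 2; all of these values, and the function name, are hypothetical and not taken from the paper.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one 1-D point in each cluster.

    Uses the standard update u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    """
    dists = [abs(point - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:
        # The point coincides with one or more centers: share membership among them.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical cluster centers on a normalized difficulty scale.
labels = ["easy", "intermediate", "difficult"]
centers = [0.2, 0.5, 0.8]

memberships = fcm_memberships(0.25, centers)
# Assign the label with the highest membership weight.
label = labels[memberships.index(max(memberships))]
```

A full FCM run would alternate this membership update with a center update until convergence; only the final highest-weight labeling is shown here.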

Keywords: machine learning, adaptive, fuzzy logic, data mining

Procedia PDF Downloads 179
1428 Comparative Analysis of the Computer Methods' Usage for Calculation of Hydrocarbon Reserves in the Baltic Sea

Authors: Pavel Shcherban, Vlad Golovanov

Abstract:

Nowadays, the depletion of hydrocarbon deposits on the land of the Kaliningrad region leads to active geological exploration and development of oil and natural gas reserves in the southeastern part of the Baltic Sea. LLC 'Lukoil-Kaliningradmorneft' implements a comprehensive program for the development of the region's shelf in 2014-2023. Due to the heterogeneity of reservoir rocks in various open fields, as well as ambiguous conclusions on the contours of deposits, additional geological prospecting and refinement of the recoverable oil reserves are carried out. The key element is the use of an effective technique of computer stock modeling at the first stage of processing the received data. The following step uses this information for cluster analysis, which makes it possible to optimize the field development approaches. The article analyzes the effectiveness of various methods for reserves calculation and computer modelling of offshore hydrocarbon fields. Cluster analysis makes it possible to measure the influence of the obtained data on the development of a technical and economic model for mining deposits. A relationship is observed between the accuracy of the calculation of recoverable reserves and the need for modernization of the existing mining infrastructure, as well as the optimization of the scheme of opening and development of oil deposits.

Keywords: cluster analysis, computer modelling of deposits, correction of the feasibility study, offshore hydrocarbon fields

Procedia PDF Downloads 160
1427 Use of Locally Effective Microorganisms in Conjunction with Biochar to Remediate Mine-Impacted Soils

Authors: Thomas F. Ducey, Kristin M. Trippe, James A. Ippolito, Jeffrey M. Novak, Mark G. Johnson, Gilbert C. Sigua

Abstract:

The Oronogo-Duenweg mining belt, approximately 20 square miles around the Joplin, Missouri area, is a designated United States Environmental Protection Agency Superfund site due to contamination of soil and groundwater with lead by former mining and smelting operations. Over almost a century of mining (from 1848 to the late 1960s), an estimated ten million tons of cadmium-, lead-, and zinc-containing material have been deposited on approximately 9,000 acres. In sites that have undergone remediation, in which the O, A, and B horizons have been removed along with the lead contamination, the exposed C horizon remains recalcitrant to revegetation efforts. These sites also suffer from poor soil microbial activity, as measured by soil extracellular enzymatic assays, though 16S ribosomal ribonucleic acid (rRNA) indicates that microbial diversity is equal to sites that have avoided mine-related contamination. Soil analysis reveals low soil organic carbon, along with high levels of bio-available zinc, that reflect the poor soil fertility conditions and low microbial activity. Our study looked at the use of several materials to restore and remediate these sites, with the goal of improving soil health. The materials, and their purposes for incorporation into the study, were as follows: manure-based biochar for the binding of zinc and other heavy metals responsible for phytotoxicity; locally sourced biosolids and compost to incorporate organic carbon into the depleted soils; and effective microorganisms harvested from nearby pristine sites to provide a stable community for nutrient cycling in the newly composited 'soil material'. Our results indicate that all four materials used in conjunction result in the greatest benefit to these mine-impacted soils, based on above ground biomass, microbial biomass, and soil enzymatic activities.

Keywords: locally effective microorganisms, biochar, remediation, reclamation

Procedia PDF Downloads 206
1426 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analyses, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g., “social”, “reward”) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as a normalized vector sum of all of the words in the studies that showed activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, allowing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations, and relies on state-of-the-art “open vocabulary” methods that go beyond mere word-counts. 
An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.
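
The voxel-to-term proximity described in this abstract can be sketched as follows. The two-dimensional word vectors are hypothetical toy values, not embeddings from the study, and the function names are illustrative; a voxel is represented as the normalized sum of the word vectors from studies reporting activation there, and proximity is the cosine similarity to a term vector.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def voxel_vector(word_vectors, words):
    """Normalized sum of the vectors of all words associated with a voxel."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(word_vectors[w]):
            total[i] += x
    return normalize(total)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings in a 2-D semantic space.
word_vectors = {
    "vision": [1.0, 0.0],
    "visual": [0.9, 0.1],
    "reward": [0.0, 1.0],
}

# A voxel whose activating studies mention mostly vision-related words.
voxel = voxel_vector(word_vectors, ["vision", "visual"])
sim_vision = cosine_similarity(voxel, word_vectors["vision"])
sim_reward = cosine_similarity(voxel, word_vectors["reward"])
```

In this toy setup the voxel lies much closer to "vision" than to "reward". Repeating the computation for every voxel in a brain mask would give the kind of continuous proximity map the abstract describes.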

Keywords: fMRI, machine learning, meta-analysis, text analysis

Procedia PDF Downloads 438
1425 Solar Power Generation in a Mining Town: A Case Study for Australia

Authors: Ryan Chalk, G. M. Shafiullah

Abstract:

Climate change is a pertinent issue facing governments and societies around the world. The industrial revolution has resulted in a steady increase in the average global temperature. The mining and energy production industries have been significant contributors to this change, prompting governments to intervene by promoting low-emission technology within these sectors. This paper initially reviews the energy problem in Australia and the mining sector, with a focus on the energy requirements and production methods utilised in Western Australia (WA). Renewable energy in the form of utility-scale solar photovoltaics (PV) provides a solution to these problems by providing emission-free energy which can be used to supplement the existing natural gas turbines in operation at the proposed site. This research presents a custom renewable solution for the mining site considering the specific township network, local weather conditions, and seasonal load profiles. A summary of the required PV output is presented to supply slightly over 50% of the town's power requirements during the peak (summer) period, resulting in close to full coverage in the trough (winter) period. DIgSILENT PowerFactory software has been used to simulate the characteristics of the existing infrastructure and to produce results for integrating PV. Large-scale PV penetration in the network introduces technical challenges, including voltage deviation, increased harmonic distortion, increased available fault current and low power factor. Results also show that cloud cover has a dramatic and unpredictable effect on the output of a PV system. The preliminary analyses conclude that mitigation strategies are needed to overcome voltage deviations, unacceptable levels of harmonics, excessive fault current and low power factor. Mitigation strategies are proposed to control these issues, predominantly through the use of high-quality, made-for-purpose inverters. 
Results show that the use of inverters with harmonic filtering reduces the level of harmonic injections to an acceptable level according to Australian standards. Furthermore, configuring inverters to supply active and reactive power assists in mitigating low power factor problems. The use of FACTS devices (SVC and STATCOM) also reduces the harmonics and improves the power factor of the network, and finally, energy storage helps to smooth the power supply.

Keywords: climate change, mitigation strategies, photovoltaic (PV), power quality

Procedia PDF Downloads 157
1424 Discerning Divergent Nodes in Social Networks

Authors: Mehran Asadi, Afrand Agah

Abstract:

In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules that can be applied to classification trees. In this research, we used an online social network dataset and all of its attributes (e.g., node features, labels, etc.) to determine what constitutes an above-average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we address overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is utilized to probe the structure of a data set, which allows us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the resulting classification methods and utilized decision trees (with k-fold cross-validation to prune the tree). The method applied to this dataset is decision trees. 
A decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.

Keywords: online social networks, data mining, social cloud computing, interaction and collaboration

Procedia PDF Downloads 144
1423 A Systematic Review: Prevalence and Risk Factors of Low Back Pain among Waste Collection Workers

Authors: Benedicta Asante, Brenna Bath, Olugbenga Adebayo, Catherine Trask

Abstract:

Background: Waste Collection Workers’ (WCWs) activities contribute greatly to the recycling sector and are an important component of the waste management industry. As the recycling sector evolves, reports of injuries and fatal accidents in the industry demand notice, particularly common and debilitating musculoskeletal disorders such as low back pain (LBP). WCWs are likely exposed to diverse work-related hazards that could contribute to LBP. However, to our knowledge there has never been a systematic review or other synthesis of LBP findings within this workforce. The aim of this systematic review was to determine the prevalence and risk factors of LBP among WCWs. Method: A comprehensive search was conducted in Ovid Medline, EMBASE, and Global Health e-publications with search term categories ‘low back pain’ and ‘waste collection workers’. Articles were screened at title, abstract, and full-text stages by two reviewers. Data were extracted on study design, sampling strategy, socio-demographics, geographical region, exposure definition, definition of LBP, risk factors, response rate, statistical techniques, and LBP prevalence. Risk of bias (ROB) was assessed based on Hoy Damien’s ROB scale. Results: The search of three databases generated 79 studies. Thirty-two studies met the study inclusion criteria at the title and abstract stage; thirteen articles met the study criteria at the full-text stage. Seven articles (54%) reported a 12-month prevalence of LBP of between 42% and 82% among WCWs. The major risk factors for LBP among WCWs included awkward posture, lifting, pulling, pushing, repetitive motions, work duration, and physical loads. Summary data and syntheses of findings were presented in trend-lines and tables to establish the several prevalence periods based on age and region distribution. Public health implications: LBP is a major occupational hazard among WCWs. 
In light of these risks and future growth in this industry, further research should focus on more detailed ergonomic exposure assessment and LBP prevention efforts.

Keywords: low back pain, scavenger, waste collection workers, waste pickers

Procedia PDF Downloads 315