Search results for: scientific data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26163

Search results for: scientific data mining

25893 Scientific Forecasting in International Relations

Authors: Djehich Mohamed Yousri

Abstract:

In this research paper, the future of international relations is believed to have an important place on the theoretical and applied levels because policy makers in the world are in dire need of such analyzes that are useful in drawing up the foreign policies of their countries, and protecting their national security from potential future threats, and in this context, The topic raised a lot of scientific controversy and intellectual debate, especially in terms of the extent of the effectiveness, accuracy, and ability of foresight methods to identify potential futures, and this is what attributed the controversy to the scientific foundations for foreseeing international relations. An arena for intellectual discussion between different thinkers in international relations belonging to different theoretical schools, which confirms to us the conceptual and implied development of prediction in order to reach the scientific level.

Keywords: foresight, forecasting, international relations, international relations theory, concept of international relations

Procedia PDF Downloads 194
25892 Active Learning Based on Science Experiments to Improve Scientific Literacy

Authors: Kunihiro Kamataki

Abstract:

In this study, active learning based on simple science experiments was developed in a university class of the freshman, in order to improve their scientific literacy. Through the active learning based on simple experiments of generation of cloud in a plastic bottle, students increased the interest in the global atmospheric problem and were able to discuss and find solutions about this problem positively from various viewpoints of the science technology, the politics, the economy, the diplomacy and the relations among nations. The results of their questionnaires and free descriptions of this class indicate that they improve the scientific literacy and motivations of other classroom lectures to acquire knowledge. It is thus suggested that the science experiment is strong tool to improve their intellectual curiosity rapidly and the connections that link the impression of science experiment and their interest of the social problem is very important to enhance their learning effect in this education.

Keywords: active learning, scientific literacy, simple scientific experiment, university education

Procedia PDF Downloads 241
25891 Digital Repositories in Algerian Universities: Content and Search Possibilities

Authors: Hakim Benoumelghar

Abstract:

The launch in 1999 of the open access Initiative (OAI) and the protocol for sharing metadata, OAI-PMH, in parallel with the provision of deposit platforms, open-source software, such as DSpace in 2002, which allow libraries to develop digital repositories and play a leading role in the open access movement, and by building institutional open archives alongside the theme. This study focuses on Algerian universities and their projects and platforms for digital repositories of theses and scientific papers and the possibilities of access to the university community to develop research and access to archives of scientific digital content offered by the scientific community. This contribution attempts to compare Algerian and foreign institutional deposits in developed countries in order to have development and perspectives to facilitate scientific research and give more possibilities to the scientific community in documentary matters.

Keywords: digital repository, repository software, university, algeria

Procedia PDF Downloads 66
25890 A Recommender System Fusing Collaborative Filtering and User’s Review Mining

Authors: Seulbi Choi, Hyunchul Ahn

Abstract:

Collaborative filtering (CF) algorithm has been popularly used for recommender systems in both academic and practical applications. It basically generates recommendation results using users’ numeric ratings. However, the additional use of the information other than user ratings may lead to better accuracy of CF. Considering that a lot of people are likely to share their honest opinion on the items they purchased recently due to the advent of the Web 2.0, user's review can be regarded as the new informative source for identifying user's preference with accuracy. Under this background, this study presents a hybrid recommender system that fuses CF and user's review mining. Our system adopts conventional memory-based CF, but it is designed to use both user’s numeric ratings and his/her text reviews on the items when calculating similarities between users.

Keywords: Recommender system, Collaborative filtering, Text mining, Review mining

Procedia PDF Downloads 327
25889 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that would enable predicting student success. These models could improve allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. For the purpose of collecting data, an anonymous survey was carried out in the last year of undergraduate degree student population using random sampling method. Decision trees were created of which two have been chosen that were most successful in predicting student success based on two criteria: Grade Point Average (GPA) and time that a student needs to finish the undergraduate program (time-to-degree). Decision trees have been shown as a good method of classification student success and they could be even more improved by increasing survey sample and developing specialized decision trees for each type of college. These types of methods have a big potential for use in decision support systems.

Keywords: data mining, knowledge discovery in databases, prediction models, student success

Procedia PDF Downloads 395
25888 Sexting Phenomenon in Educational Settings: A Data Mining Approach

Authors: Koutsopoulou Ioanna, Gkintoni Evgenia, Halkiopoulos Constantinos, Antonopoulou Hera

Abstract:

Recent advances in Internet Computer Technology (ICT) and the ever-increasing use of technological equipment amongst adolescents and young adults along with unattended access to the internet and social media and uncontrolled use of smart phones and PCs have caused social problems like sexting to emerge. The main purpose of the present article is first to present an analytic theoretical framework of sexting as a recent social phenomenon based on studies that have been conducted the last decade or so; and second to investigate Greek students’ and also social network users, sexting perceptions and to record how often social media users exchange sexual messages and to retrace demographic variables predictors. Data from 1,000 students were collected and analyzed and all statistical analysis was done by the software package WEKA. The results indicate among others, that the use of data mining methods is an important tool to draw conclusions that could affect decision and policy making especially in the field and related social topics of educational psychology. To sum up, sexting lurks many risks for adolescents and young adults students in Greece and needs to be better addressed in relevance to the stakeholders as well as society in general. Furthermore, policy makers, legislation makers and authorities will have to take action to protect minors. Prevention strategies based on Greek cultural specificities are being proposed. This social problem has raised concerns in recent years and will most likely escalate concerns in global communities in the future.

Keywords: educational ethics, sexting, Greek sexters, sex education, data mining

Procedia PDF Downloads 168
25887 A Near-Optimal Domain Independent Approach for Detecting Approximate Duplicates

Authors: Abdelaziz Fellah, Allaoua Maamir

Abstract:

We propose a domain-independent merging-cluster filter approach complemented with a set of algorithms for identifying approximate duplicate entities efficiently and accurately within a single and across multiple data sources. The near-optimal merging-cluster filter (MCF) approach is based on the Monge-Elkan well-tuned algorithm and extended with an affine variant of the Smith-Waterman similarity measure. Then we present constant, variable, and function threshold algorithms that work conceptually in a divide-merge filtering fashion for detecting near duplicates as hierarchical clusters along with their corresponding representatives. The algorithms take recursive refinement approaches in the spirit of filtering, merging, and updating, cluster representatives to detect approximate duplicates at each level of the cluster tree. Experiments show a high effectiveness and accuracy of the MCF approach in detecting approximate duplicates by outperforming the seminal Monge-Elkan’s algorithm on several real-world benchmarks and generated datasets.

Keywords: data mining, data cleaning, approximate duplicates, near-duplicates detection, data mining applications and discovery

Procedia PDF Downloads 368
25886 Design of a Small and Medium Enterprise Growth Prediction Model Based on Web Mining

Authors: Yiea Funk Te, Daniel Mueller, Irena Pletikosa Cvijikj

Abstract:

Small and medium enterprises (SMEs) play an important role in the economy of many countries. When the overall world economy is considered, SMEs represent 95% of all businesses in the world, accounting for 66% of the total employment. Existing studies show that the current business environment is characterized as highly turbulent and strongly influenced by modern information and communication technologies, thus forcing SMEs to experience more severe challenges in maintaining their existence and expanding their business. To support SMEs at improving their competitiveness, researchers recently turned their focus on applying data mining techniques to build risk and growth prediction models. However, data used to assess risk and growth indicators is primarily obtained via questionnaires, which is very laborious and time-consuming, or is provided by financial institutes, thus highly sensitive to privacy issues. Recently, web mining (WM) has emerged as a new approach towards obtaining valuable insights in the business world. WM enables automatic and large scale collection and analysis of potentially valuable data from various online platforms, including companies’ websites. While WM methods have been frequently studied to anticipate growth of sales volume for e-commerce platforms, their application for assessment of SME risk and growth indicators is still scarce. Considering that a vast proportion of SMEs own a website, WM bears a great potential in revealing valuable information hidden in SME websites, which can further be used to understand SME risk and growth indicators, as well as to enhance current SME risk and growth prediction models. This study aims at developing an automated system to collect business-relevant data from the Web and predict future growth trends of SMEs by means of WM and data mining techniques. The envisioned system should serve as an 'early recognition system' for future growth opportunities. In an initial step, we examine how structured and semi-structured Web data in governmental or SME websites can be used to explain the success of SMEs. WM methods are applied to extract Web data in a form of additional input features for the growth prediction model. The data on SMEs provided by a large Swiss insurance company is used as ground truth data (i.e. growth-labeled data) to train the growth prediction model. Different machine learning classification algorithms such as the Support Vector Machine, Random Forest and Artificial Neural Network are applied and compared, with the goal to optimize the prediction performance. The results are compared to those from previous studies, in order to assess the contribution of growth indicators retrieved from the Web for increasing the predictive power of the model.

Keywords: data mining, SME growth, success factors, web mining

Procedia PDF Downloads 247
25885 Q-Test of Undergraduate Epistemology and Scientific Thought: Development and Testing of an Assessment of Scientific Epistemology

Authors: Matthew J. Zagumny

Abstract:

The QUEST is an assessment of scientific epistemic beliefs and was developed to measure students’ intellectual development in regards to beliefs about knowledge and knowing. The QUEST utilizes Q-sort methodology, which requires participants to rate the degree to which statements describe them personally. As a measure of personal theories of knowledge, the QUEST instrument is described with the Q-sort distribution and scoring explained. A preliminary demonstration of the QUEST assessment is described with two samples of undergraduate students (novice/lower division compared to advanced/upper division students) being assessed and their average QUEST scores compared. The usefulness of an assessment of epistemology is discussed in terms of the principle that assessment tends to drive educational practice and university mission. The critical need for university and academic programs to focus on development of students’ scientific epistemology is briefly discussed.

Keywords: scientific epistemology, critical thinking, Q-sort method, STEM undergraduates

Procedia PDF Downloads 361
25884 Gender and Geographical Disparity in Editorial Boards of Lithuanian Scientific Journals: An Overview of Different Science Disciplines

Authors: Andrius Suminas

Abstract:

Editors-in-chief and members of editorial boards of scientific journals play an extremely important role in the development of science and assure research integrity, as scientific publications are the major results of research. While gender parity in tenure-track hiring decisions and promotion rates has improved, female academics remain underrepresented in senior career phases, including editors-in-chief and members of editorial boards positions of scientific journals. Journal editors and members of editorial boards exert considerable power over what is published and in certain cases the direction of an academic discipline and the career advancement of authors. For this reason it is important to minimize biases extrinsic to the merit of the work impacting publication decisions. One way to achieve this is to ensure a diverse pool of editors and members of editorial boards, ensuring the widest possible coverage of different competencies. This is in line with a diversity model of editorial appointment where editorial boards are structured to dismantle wider conditions of inequality. Another possible option, a distributive model would seek an editorial board reflective of existing proportions in the field at large. Paper presents comprehensive results of Lithuanian scientific journals study. During the research process were reviewed publicly available information from all scientific journals published in Lithuania to infer the proportions of members of editorial boards by gender and country of affiliation. The results of the study revealed differences the proportions of male and female members of editorial boards in different disciplines of science, as well as clear geographical disparity in Lithianian scientific journals editorial boards.

Keywords: scientific journals, editorial boards of scientific journals, gender disparity, geographical disparity, scientific communication

Procedia PDF Downloads 79
25883 Integrating of Multi-Criteria Decision Making and Spatial Data Warehouse in Geographic Information System

Authors: Zohra Mekranfar, Ahmed Saidi, Abdellah Mebrek

Abstract:

This work aims to develop multi-criteria decision making (MCDM) and spatial data warehouse (SDW) methods, which will be integrated into a GIS according to a ‘GIS dominant’ approach. The GIS operating tools will be operational to operate the SDW. The MCDM methods can provide many solutions to a set of problems with various and multiple criteria. When the problem is so complex, integrating spatial dimension, it makes sense to combine the MCDM process with other approaches like data mining, ascending analyses, we present in this paper an experiment showing a geo-decisional methodology of SWD construction, On-line analytical processing (OLAP) technology which combines both basic multidimensional analysis and the concepts of data mining provides powerful tools to highlight inductions and information not obvious by traditional tools. However, these OLAP tools become more complex in the presence of the spatial dimension. The integration of OLAP with a GIS is the future geographic and spatial information solution. GIS offers advanced functions for the acquisition, storage, analysis, and display of geographic information. However, their effectiveness for complex spatial analysis is questionable due to their determinism and their decisional rigor. A prerequisite for the implementation of any analysis or exploration of spatial data requires the construction and structuring of a spatial data warehouse (SDW). This SDW must be easily usable by the GIS and by the tools offered by an OLAP system.

Keywords: data warehouse, GIS, MCDM, SOLAP

Procedia PDF Downloads 156
25882 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization

Procedia PDF Downloads 175
25881 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 244
25880 Design of Personal Job Recommendation Framework on Smartphone Platform

Authors: Chayaporn Kaensar

Abstract:

Recently, Job Recommender Systems have gained much attention in industries since they solve the problem of information overload on the recruiting website. Therefore, we proposed Extended Personalized Job System that has the capability of providing the appropriate jobs for job seeker and recommending some suitable information for them using Data Mining Techniques and Dynamic User Profile. On the other hands, company can also interact to the system for publishing and updating job information. This system have emerged and supported various platforms such as web application and android mobile application. In this paper, User profiles, Implicit User Action, User Feedback, and Clustering Techniques in WEKA libraries have gained attention and implemented for this application. In additions, open source tools like Yii Web Application Framework, Bootstrap Front End Framework and Android Mobile Technology were also applied.

Keywords: recommendation, user profile, data mining, web and mobile technology

Procedia PDF Downloads 299
25879 Bankruptcy Prediction Analysis on Mining Sector Companies in Indonesia

Authors: Devina Aprilia Gunawan, Tasya Aspiranti, Inugrah Ratia Pratiwi

Abstract:

This research aims to classify the mining sector companies based on Altman’s Z-score model, and providing an analysis based on the Altman’s Z-score model’s financial ratios to provide a picture about the financial condition in mining sector companies in Indonesia and their viability in the future, and to find out the partial and simultaneous impact of each of the financial ratio variables in the Altman’s Z-score model, namely (WC/TA), (RE/TA), (EBIT/TA), (MVE/TL), and (S/TA), toward the financial condition represented by the Z-score itself. Among 38 mining sector companies listed in Indonesia Stock Exchange (IDX), 28 companies are selected as research sample according to the purposive sampling criteria.The results of this research showed that during 3 years research period at 2010-2012, the amount of the companies that was predicted to be healthy in each year was less than half of the total sample companies and not even reach up to 50%. The multiple regression analysis result showed that all of the research hypotheses are accepted, which means that (WC/TA), (RE/TA), (EBIT/TA), (MVE/TL), and (S/TA), both partially and simultaneously had an impact towards company’s financial condition.

Keywords: Altman’s Z-score model, financial condition, mining companies, Indonesia

Procedia PDF Downloads 512
25878 Discerning Divergent Nodes in Social Networks

Authors: Mehran Asadi, Afrand Agah

Abstract:

In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. Decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.

Keywords: online social networks, data mining, social cloud computing, interaction and collaboration

Procedia PDF Downloads 131
25877 The Significance of Picture Mining in the Fashion and Design as a New Research Method

Authors: Katsue Edo, Yu Hiroi

Abstract:

T Increasing attention has been paid to using pictures and photographs in research since the beginning of the 21th century in social sciences. Meanwhile we have been studying the usefulness of Picture mining, which is one of the new ways for a these picture using researches. Picture Mining is an explorative research analysis method that takes useful information from pictures, photographs and static or moving images. It is often compared with the methods of text mining. The Picture Mining concept includes observational research in the broad sense, because it also aims to analyze moving images (Ochihara and Edo 2013). In the recent literature, studies and reports using pictures are increasing due to the environmental changes. These are identified as technological and social changes (Edo et.al. 2013). Low price digital cameras and i-phones, high information transmission speed, low costs for information transferring and high performance and resolution of the cameras of mobile phones have changed the photographing behavior of people. Consequently, there is less resistance in taking and processing photographs for most of the people in the developing countries. In these studies, this method of collecting data from respondents is often called as ‘participant-generated photography’ or ‘respondent-generated visual imagery’, which focuses on the collection of data and its analysis (Pauwels 2011, Snyder 2012). But there are few systematical and conceptual studies that supports it significance of these methods. We have discussed in the recent years to conceptualize these picture using research methods and formalize theoretical findings (Edo et. al. 2014). We have identified the most efficient fields of Picture mining in the following areas inductively and in case studies; 1) Research in Consumer and Customer Lifestyles. 2) New Product Development. 3) Research in Fashion and Design. Though we have found that it will be useful in these fields and areas, we must verify these assumptions. In this study we will focus on the field of fashion and design, to determine whether picture mining methods are really reliable in this area. In order to do so we have conducted an empirical research of the respondents’ attitudes and behavior concerning pictures and photographs. We compared the attitudes and behavior of pictures toward fashion to meals, and found out that taking pictures of fashion is not as easy as taking meals and food. Respondents do not often take pictures of fashion and upload their pictures online, such as Facebook and Instagram, compared to meals and food because of the difficulty of taking them. We concluded that we should be more careful in analyzing pictures in the fashion area for there still might be some kind of bias existing even if the environment of pictures have drastically changed in these years.

Keywords: empirical research, fashion and design, Picture Mining, qualitative research

Procedia PDF Downloads 348
25876 Defining Processes of Gender Restructuring: The Case of Displaced Tribal Communities of North East India

Authors: Bitopi Dutta

Abstract:

Development Induced Displacement (DID) of subaltern groups has been an issue of intense debate in India. This research will do a gender analysis of displacement induced by the mining projects in tribal indigenous societies of North East India, centering on the primary research question which is 'How does DID reorder gendered relationship in tribal matrilineal societies?' This paper will not focus primarily on the impacts of the displacement induced by coal mining on indigenous tribal women in the North East India; it will rather study 'what' are the processes that lead to these transformations and 'how' do they operate. In doing so, the paper will locate the cracks in traditional social systems that the discourse of displacement manipulates for its own benefit. DID in this sense will not only be understood as only physical displacement, but also as social and cultural displacement. The study will cover one matrilineal tribe in the state of Meghalaya in the North East India affected by several coal mining projects in the last 30 years. In-depth unstructured interviews used to collect life narratives will be the primary mode of data collection because the indigenous culture of the tribes in Meghalaya, including the matrilineal tribes, is based on oral history where knowledge and experiences produced under a tradition of oral history exist in a continuum. This is unlike modern societies which produce knowledge in a compartmentalized system. An interview guide designed around specific themes will be used rather than specific questions to ensure the flow of narratives from the interviewee. In addition to this, a number of focus groups will be held. The data collected through the life narrative will be supplemented and contextualized through documentary research using government data, and local media sources of the region.

Keywords: displacement, gender-relations, matriliny, mining

Procedia PDF Downloads 175
25875 The Need of Sustainable Mining: Communities, Government and Legal Mining in Central Andes of Peru

Authors: Melissa R. Quispe-Zuniga, Daniel Callo-Concha, Christian Borgemeister, Klaus Greve

Abstract:

The Peruvian Andes have a high potential for mining, but many of the mining areas overlay with campesino community lands, being these key actors for agriculture and livestock production. Lead by economic incentives, some communities are renting their lands to mining companies for exploration or exploitation. However, a growing number of campesino communities, usually social and economically marginalized, have developed resistance, alluding consequences, such as water pollution, land-use change, insufficient economic compensation, etc. what eventually end up in Socio-Environmental Conflicts (SEC). It is hypothesized that disclosing the information on environmental pollution and enhance the involvement of communities in the decision-making process may contribute to prevent SEC. To assess whether such complains are grounded on the environmental impact of mining activities, we measured the heavy metals concentration in 24 indicative samples from rivers that run across mining exploitations and farming community lands. Samples were taken during the 2016 dry season and analyzed by inductively-coupled-plasma-atomic-emission-spectroscopy. The results were contrasted against the standards of monitoring government institutions (i.e., OEFA). Furthermore, we investigated the water/environmental complains related to mining in the neighboring 14 communities. We explored the relationship between communities and mining companies, via open-ended interviews with community authorities and non-participatory observations of community assemblies. We found that the concentrations of cadmium (0.023 mg/L), arsenic (0.562 mg/L) and copper (0.07 mg/L), surpass the national water quality standards for Andean rivers (0.00025 mg/L of cadmium, 0.15 mg/L of arsenic and 0.01 mg/L of copper). 57% of communities have posed environmental complains, but 21% of the total number of communities were receiving an annual economic benefit from mining projects. However, 87.5% of the communities who had posed complains have high concentration of heavy metals in their water streams. The evidence shows that mining activities tend to relate to the affectation and vulnerability of campesino community water streams, what justify the environmental complains and eventually the occurrence of a SEC.

Keywords: mining companies, campesino community, water, socio-environmental conflict

Procedia PDF Downloads 176
25874 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 366
25873 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance

Authors: Sokkhey Phauk, Takeo Okazaki

Abstract:

The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.

Keywords: academic performance prediction system, educational data mining, dominant factors, feature selection method, prediction model, student performance

Procedia PDF Downloads 93
25872 Evidence of Scientific-Ness of Scriptures

Authors: Shyam Sunder Gupta

Abstract:

GOD is the infinite source of knowledge and from time to time, as per the need of mankind, keeps revealing a portion of HIS knowledge as” Words” through chosen messengers. In the course of time, ” Words” get converted into scripture. This process of conversion happens after a long gap of time and with the involvement of a large number of persons; and unintentionally scientific and other types of errors get into scriptures; otherwise scriptures are in reality, truly scientific. Description of Chronology of life in the womb (Fetal Development), Five types of rotations of celestial bodies, Speed of the Sun, Rotation of Universe, Multiple verses, Spherical shape of the earth, Measurement of time, Classification of species by nature of birth, Evolution process of non-living matter and living species, etc., most convincing prove that scriptures are truly scientific. In fact, there are many facts for which, till date, science has not found answers, but are available in scriptures; like source of Singularity from which the Big Bang took place and the Universe was created, the Infinite number of Universes, the most fundamental particle, Param-anu( God particle), fundamental of measurement of time and many more.

Keywords: Big Bang, God Particle, Scientific, Scriptures, Singularity, Universe

Procedia PDF Downloads 19
25871 Improving the Statistics Nature in Research Information System

Authors: Rajbir Cheema

Abstract:

In order to introduce an integrated research information system, this will provide scientific institutions with the necessary information on research activities and research results in assured quality. Since data collection, duplication, missing values, incorrect formatting, inconsistencies, etc. can arise in the collection of research data in different research information systems, which can have a wide range of negative effects on data quality, the subject of data quality should be treated with better results. This paper examines the data quality problems in research information systems and presents the new techniques that enable organizations to improve their quality of research information.

Keywords: Research information systems (RIS), research information, heterogeneous sources, data quality, data cleansing, science system, standardization

Procedia PDF Downloads 139
25870 Mapping of Adrenal Gland Diseases Research in Middle East Countries: A Scientometric Analysis, 2007-2013

Authors: Zahra Emami, Mohammad Ebrahim Khamseh, Nahid Hashemi Madani, Iman Kermani

Abstract:

The aim of the study was to map scientific research on adrenal gland diseases in the Middle East countries through the Web of Science database using scientometric analysis. Data were analyzed with Excel software; and HistCite was used for mapping of the scientific texts. In this study, from a total of 268 retrieved records, 1125 authors from 328 institutions published their texts in 138 journals. Among 17 Middle East countries, Turkey ranked first with 164 documents (61.19%), Israel ranked second with 47 documents (15.53%) and Iran came in the third place with 26 documents. Most of the publications (185 documents, 69.2%) were articles. Among the universities of the Middle East, Istanbul University had the highest science production rate (9.7%). The Journal of Clinical Endocrinology & Metabolism had the highest TGCS (243 citations). In the scientific mapping, 7 clusters were formed based on TLCS (Total Local Citation Score) & TGCS (Total Global Citation Score). considering the study results, establishment of scientific connections and collaboration with other countries and use of publications on adrenal gland diseases from high ranking universities can help in the development of this field and promote the medical practice in this regard. Moreover, investigation of the formed clusters in relation to Congenital Hyperplasia and puberty related disorders can be research priorities for investigators.

Keywords: mapping, scientific research, adrenal gland diseases, scientometric

Procedia PDF Downloads 247
25869 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka

Procedia PDF Downloads 279
25868 Understanding the Complexity of Corruption and Anti-Corruption in Indonesia's Mining Industry: Challenges and Opportunities

Authors: Ahmad Khoirul Umam, Iin Mayasari

Abstract:

Indonesia is blessed with rich natural resources and frequently dubbed as the 6th richest country in the world in terms of mining resources, including minerals and coal. Mining can contribute to the socio-economic development by generating state revenue for development, elevating poverty through employment, opening and developing remote areas, putting in basic infrastructure and creating new centres of developments. However, favouritism and rent-seeking behaviour committed by government officials, politicians, and business players in licensing and permit giving in mining and forestry sectors have resisted reforms. Even though Indonesia’s Corruption Eradication Commission (KPK) successfully targeted untouchable actors, public criticism continues to focus on questions of why corruption apparently remains systemic in mining industry in the country? This paper revealed that structural anomalies, as well as legacies of the Soeharto era’s power inequities, have severely inhibited Indonesia’s bureaucratic arrangements that continue to influence adversely the elements of transparency and accountability in mining industry governance. In the more liberalized and decentralized political system, the deficiencies have gradually assisted vested interest groups to band together, thus creating a coalition that can challenge, resist, and contain anti-graft actions. Therefore, Indonesia needs much more serious anti-corruption actions that would require eliminating the monopoly over power, enhancing competition, limiting discretion, and clarifying the rules of business and political competition in the mining sector in the country.

Keywords: anti-corruption, public integrity, private integrity, mining industry, democratization

Procedia PDF Downloads 96
25867 Strategies for Achieving Application of Science in National Development

Authors: Orisakwe Chimuanya Favour Israel

Abstract:

In a world filled with the products of scientific inquiry, scientific literacy has become a necessity for everyone because it is indispensable to achieving technological development of any nation. Everyone needs to use scientific information to make choices that arise every day. Everyone needs to be able to engage intelligently in public discourse and debate about important issues that involves science and technology. And everyone deserves to share in the excitement and personal fulfillment that can come from -understanding and learning about the natural world. No doubt that industrialized countries have, through their control of science and technology education, developed the potential to increase production, and to improve the standard of living of their people. The main thrust of this paper therefore, is to present an overview of science education, strategies for achieving application of science in national development, such as teaching science with the right spirit of inquiry. Also, the paper discussed three research models that can help in national development and suggests the best out of the three which is more realistic for a developing country like ours (Nigeria) to follow for a sustainable national development and finally suggests some key ways of solving problems of development.

Keywords: scientific inquiry, scientific literacy, strategies, sustainable national development

Procedia PDF Downloads 357
25866 Feasibility of Washing/Extraction Treatment for the Remediation of Deep-Sea Mining Trailings

Authors: Kyoungrean Kim

Abstract:

Importance of deep-sea mineral resources is dramatically increasing due to the depletion of land mineral resources corresponding to increasing human’s economic activities. Korea has acquired exclusive exploration licenses at four areas which are the Clarion-Clipperton Fracture Zone in the Pacific Ocean (2002), Tonga (2008), Fiji (2011) and Indian Ocean (2014). The preparation for commercial mining of Nautilus minerals (Canada) and Lockheed martin minerals (USA) is expected by 2020. The London Protocol 1996 (LP) under International Maritime Organization (IMO) and International Seabed Authority (ISA) will set environmental guidelines for deep-sea mining until 2020, to protect marine environment. In this research, the applicability of washing/extraction treatment for the remediation of deep-sea mining tailings was mainly evaluated in order to present preliminary data to develop practical remediation technology in near future. Polymetallic nodule samples were collected at the Clarion-Clipperton Fracture Zone in the Pacific Ocean, then stored at room temperature. Samples were pulverized by using jaw crusher and ball mill then, classified into 3 particle sizes (> 63 µm, 63-20 µm, < 20 µm) by using vibratory sieve shakers (Analysette 3 Pro, Fritsch, Germany) with 63 µm and 20 µm sieve. Only the particle size 63-20 µm was used as the samples for investigation considering the lower limit of ore dressing process which is tens to 100 µm. Rhamnolipid and sodium alginate as biosurfactant and aluminum sulfate which are mainly used as flocculant were used as environmentally friendly additives. Samples were adjusted to 2% liquid with deionized water then mixed with various concentrations of additives. The mixture was stirred with a magnetic bar during specific reaction times and then the liquid phase was separated by a centrifugal separator (Thermo Fisher Scientific, USA) under 4,000 rpm for 1 h. The separated liquid was filtered with a syringe and acrylic-based filter (0.45 µm). The extracted heavy metals in the filtered liquid were then determined using a UV-Vis spectrometer (DR-5000, Hach, USA) and a heat block (DBR 200, Hach, USA) followed by US EPA methods (8506, 8009, 10217 and 10220). Polymetallic nodule was mainly composed of manganese (27%), iron (8%), nickel (1.4%), cupper (1.3 %), cobalt (1.3%) and molybdenum (0.04%). Based on remediation standards of various countries, Nickel (Ni), Copper (Cu), Cadmium (Cd) and Zinc (Zn) were selected as primary target materials. Throughout this research, the use of rhamnolipid was shown to be an effective approach for removing heavy metals in samples originated from manganese nodules. Sodium alginate might also be one of the effective additives for the remediation of deep-sea mining tailings such as polymetallic nodules. Compare to the use of rhamnolipid and sodium alginate, aluminum sulfate was more effective additive at short reaction time within 4 h. Based on these results, sequencing particle separation, selective extraction/washing, advanced filtration of liquid phase, water treatment without dewatering and solidification/stabilization may be considered as candidate technologies for the remediation of deep-sea mining tailings.

Keywords: deep-sea mining tailings, heavy metals, remediation, extraction, additives

Procedia PDF Downloads 141
25865 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 197
25864 Information Management Approach in the Prediction of Acute Appendicitis

Authors: Ahmad Shahin, Walid Moudani, Ali Bekraki

Abstract:

This research aims at presenting a predictive data mining model to handle an accurate diagnosis of acute appendicitis with patients for the purpose of maximizing the health service quality, minimizing morbidity/mortality, and reducing cost. However, acute appendicitis is the most common disease which requires timely accurate diagnosis and needs surgical intervention. Although the treatment of acute appendicitis is simple and straightforward, its diagnosis is still difficult because no single sign, symptom, laboratory or image examination accurately confirms the diagnosis of acute appendicitis in all cases. This contributes in increasing morbidity and negative appendectomy. In this study, the authors propose to generate an accurate model in prediction of patients with acute appendicitis which is based, firstly, on the segmentation technique associated to ABC algorithm to segment the patients; secondly, on applying fuzzy logic to process the massive volume of heterogeneous and noisy data (age, sex, fever, white blood cell, neutrophilia, CRP, urine, ultrasound, CT, appendectomy, etc.) in order to express knowledge and analyze the relationships among data in a comprehensive manner; and thirdly, on applying dynamic programming technique to reduce the number of data attributes. The proposed model is evaluated based on a set of benchmark techniques and even on a set of benchmark classification problems of osteoporosis, diabetes and heart obtained from the UCI data and other data sources.

Keywords: healthcare management, acute appendicitis, data mining, classification, decision tree

Procedia PDF Downloads 333