Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25722

Search results for: meteorological prediction data

22722 Clustering Performance Analysis using New Correlation-Based Cluster Validity Indices

Abstract:

There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one of the weaknesses that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points are located in. Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.

Keywords: clustering algorithm, cluster validity measure, correlation, data partitions, iris data set, marketing, pattern recognition

Procedia PDF Downloads 97

22721 Analyzing Competitive Advantage of Internet of Things and Data Analytics in Smart City Context

Authors: Petra Hofmann, Dana Koniel, Jussi Luukkanen, Walter Nieminen, Lea Hannola, Ilkka Donoghue

Abstract:

The Covid-19 pandemic forced people to isolate and become physically less connected. The pandemic hasnot only reshaped people’s behaviours and needs but also accelerated digital transformation (DT). DT of cities has become an imperative with the outlook of converting them into smart cities in the future. Embedding digital infrastructure and smart city initiatives as part of the normal design, construction, and operation of cities provides a unique opportunity to improve connection between people. Internet of Things (IoT) is an emerging technology and one of the drivers in DT. It has disrupted many industries by introducing different services and business models, and IoT solutions are being applied in multiple fields, including smart cities. As IoT and data are fundamentally linked together, IoT solutions can only create value if the data generated by the IoT devices is analysed properly. Extracting relevant conclusions and actionable insights by using established techniques, data analytics contributes significantly to the growth and success of IoT applications and investments. Companies must grasp DT and be prepared to redesign their offerings and business models to remain competitive in today’s marketplace. As there are many IoT solutions available today, the amount of data is tremendous. The challenge for companies is to understand what solutions to focus on and how to prioritise and which data to differentiate from the competition. This paper explains how IoT and data analytics can impact competitive advantage and how companies should approach IoT and data analytics to translate them into concrete offerings and solutions in the smart city context. The study was carried out as a qualitative, literature-based research. A case study is provided to validate the preservation of company’s competitive advantage through smart city solutions. The results of the researchcontribution provide insights into the different factors and considerations related to creating competitive advantage through IoT and data analytics deployment in the smart city context. Furthermore, this paper proposes a framework that merges the factors and considerations with examples of offerings and solutions in smart cities. The data collected through IoT devices, and the intelligent use of it, can create a competitive advantage to companies operating in smart city business. Companies should take into consideration the five forces of competition that shape industries and pay attention to the technological, organisational, and external contexts which define factors for consideration of competitive advantages in the field of IoT and data analytics. Companies that can utilise these key assets in their businesses will most likely conquer the markets and have a strong foothold in the smart city business.

Keywords: internet of things, data analytics, smart cities, competitive advantage

Procedia PDF Downloads 86

22720 Social Data-Based Users Profiles' Enrichment

Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick

Abstract:

In this paper, we propose a generic model of user profile integrating several elements that may positively impact the research process. We exploit the classical behavior of users and integrate a delimitation process of their research activities into several research sessions enriched with contextual and temporal information, which allows reflecting the current interests of these users in every period of time and infer data freshness. We argue that the annotation of resources gives more transparency on users' needs. It also strengthens social links among resources and users, and can so increase the scope of the user profile. Based on this idea, we integrate the social tagging practice in order to exploit the social users' behavior to enrich their profiles. These profiles are then integrated into a recommendation system in order to predict the interesting personalized items of users allowing to assist them in their researches and further enrich their profiles. In this recommendation, we provide users new research experiences.

Keywords: user profiles, topical ontology, contextual information, folksonomies, tags' clusters, data freshness, association rules, data recommendation

Procedia PDF Downloads 252

22719 Using Data Mining in Automotive Safety

Authors: Carine Cridelich, Pablo Juesas Cano, Emmanuel Ramasso, Noureddine Zerhouni, Bernd Weiler

Abstract:

Safety is one of the most important considerations when buying a new car. While active safety aims at avoiding accidents, passive safety systems such as airbags and seat belts protect the occupant in case of an accident. In addition to legal regulations, organizations like Euro NCAP provide consumers with an independent assessment of the safety performance of cars and drive the development of safety systems in automobile industry. Those ratings are mainly based on injury assessment reference values derived from physical parameters measured in dummies during a car crash test. The components and sub-systems of a safety system are designed to achieve the required restraint performance. Sled tests and other types of tests are then carried out by car makers and their suppliers to confirm the protection level of the safety system. A Knowledge Discovery in Databases (KDD) process is proposed in order to minimize the number of tests. The KDD process is based on the data emerging from sled tests according to Euro NCAP specifications. About 30 parameters of the passive safety systems from different data sources (crash data, dummy protocol) are first analysed together with experts opinions. A procedure is proposed to manage missing data and validated on real data sets. Finally, a procedure is developed to estimate a set of rough initial parameters of the passive system before testing aiming at reducing the number of tests.

Keywords: KDD process, passive safety systems, sled test, dummy injury assessment reference values, frontal impact

Procedia PDF Downloads 373

22718 Comparative Study of Greenhouse Locations through Satellite Images and Geographic Information System: Methodological Evaluation in Venezuela

Authors: Maria A. Castillo H., Andrés R. Leandro C.

Abstract:

During the last decades, agricultural productivity in Latin America has increased with precision agriculture and more efficient agricultural technologies. The use of automated systems, satellite images, geographic information systems, and tools for data analysis, and artificial intelligence have contributed to making more effective strategic decisions. Twenty years ago, the state of Mérida, located in the Venezuelan Andes, reported the largest area covered by greenhouses in the country, where certified seeds of potatoes, vegetables, ornamentals, and flowers were produced for export and consumption in the central region of the country. In recent years, it is estimated that production under greenhouses has changed, and the area covered has decreased due to different factors, but there are few historical statistical data in sufficient quantity and quality to support this estimate or to be used for analysis and decision making. The objective of this study is to compare data collected about geoposition, use, and covered areas of the greenhouses in 2007 to data available in 2021, as support for the analysis of the current situation of horticultural production in the main municipalities of the state of Mérida. The document presents the development of the work in the diagnosis and integration of geographic coordinates in GIS and data analysis phases. As a result, an evaluation of the process is made, a dashboard is presented with the most relevant data along with the geographical coordinates integrated into GIS, and an analysis of the obtained information is made. Finally, some recommendations for actions are added, and works that expand the information obtained and its geographical traceability over time are proposed. This study contributes to granting greater certainty in the supporting data for the evaluation of social, environmental, and economic sustainability indicators and to make better decisions according to the sustainable development goals in the area under review. At the same time, the methodology provides improvements to the agricultural data collection process that can be extended to other study areas and crops.

Keywords: greenhouses, geographic information system, protected agriculture, data analysis, Venezuela

Procedia PDF Downloads 88

22717 Modeling and Optimization of Algae Oil Extraction Using Response Surface Methodology

Authors: I. F. Ejim, F. L. Kamen

Abstract:

Aims: In this experiment, algae oil extraction with a combination of n-hexane and ethanol was investigated. The effects of extraction solvent concentration, extraction time and temperature on the yield and quality of oil were studied using Response Surface Methodology (RSM). Experimental Design: Optimization of algae oil extraction using Box-Behnken design was used to generate 17 experimental runs in a three-factor-three-level design where oil yield, specific gravity, acid value and saponification value were evaluated as the response. Result: In this result, a minimum oil yield of 17% and maximum of 44% was realized. The optimum values for yield, specific gravity, acid value and saponification value from the overlay plot were 40.79%, 0.8788, 0.5056 mg KOH/g and 180.78 mg KOH/g respectively with desirability of 0.801. The maximum point prediction was yield 40.79% at solvent concentration 66.68 n-hexane, temperature of 40.0°C and extraction time of 4 hrs. Analysis of Variance (ANOVA) results showed that the linear and quadratic coefficient were all significant at p<0.05. The experiment was validated and results obtained were with the predicted values. Conclusion: Algae oil extraction was successfully optimized using RSM and its quality indicated it is suitable for many industrial uses.

Keywords: algae oil, response surface methodology, optimization, Box-Bohnken, extraction

Procedia PDF Downloads 324

22716 Modelling Consistency and Change of Social Attitudes in 7 Years of Longitudinal Data

Authors: Paul Campbell, Nicholas Biddle

Abstract:

There is a complex, endogenous relationship between individual circumstances, attitudes, and behaviour. This study uses longitudinal panel data to assess changes in social and political attitudes over a 7-year period. Attitudes are captured with the question 'what is the most important issue facing Australia today', collected at multiple time points in a longitudinal survey of 2200 Australians. Consistency of attitudes, and factors predicting change over time, are assessed. The consistency of responses has methodological implications for data collection, specifically how often such questions ought to be asked of a population. When change in attitude is observed, this study assesses the extent to which individual demographic characteristics, personality traits, and broader societal events predict change.

Keywords: attitudes, longitudinal survey analysis, personality, social values

Procedia PDF Downloads 122

22715 Data Protection and Regulation Compliance on Handling Physical Child Abuse Scenarios- A Scoping Review

Authors: Ana Mafalda Silva, Rebeca Fontes, Ana Paula Vaz, Carla Carreira, Ana Corte-Real

Abstract:

Decades of research on the topic of interpersonal violence against minors highlight five main conclusions: 1) it causes harmful effects on children's development and health; 2) it is prevalent; 3) it violates children's rights; 4) it can be prevented and 5) parents are the main aggressors. The child abuse scenario is identified through clinical observation, administrative data and self-reports. The most used instruments are self-reports; however, there are no valid and reliable self-report instruments for minors, which consist of a retrospective interpretation of the situation by the victim already in her adult phase and/or by her parents. Clinical observation and collection of information, namely from the orofacial region, are essential in the early identification of these situations. The management of medical data, such as personal data, must comply with the General Data Protection Regulation (GDPR), in Europe, and with the General Law of Data Protection (LGPD), in Brazil. This review aims to answer the question: In a situation of medical assistance to minors, in the suspicion of interpersonal violence, due to mistreatment, is it necessary for the guardians to provide consent in the registration and sharing of personal data, namely medical ones. A scoping review was carried out based on a search by the Web of Science and Pubmed search engines. Four papers and two documents from the grey literature were selected. As found, the process of identifying and signaling child abuse by the health professional, and the necessary early intervention in defense of the minor as a victim of abuse, comply with the guidelines expressed in the GDPR and LGPD. This way, the notification in maltreatment scenarios by health professionals should be a priority and there shouldn’t be the fear or anxiety of legal repercussions that stands in the way of collecting and treating the data necessary for the signaling procedure that safeguards and promotes the welfare of children living with abuse.

Keywords: child abuse, disease notifications, ethics, healthcare assistance

Procedia PDF Downloads 89

22714 A Generic Middleware to Instantly Sync Intensive Writes of Heterogeneous Massive Data via Internet

Authors: Haitao Yang, Zhenjiang Ruan, Fei Xu, Lanting Xia

Abstract:

Industry data centers often need to sync data changes reliably and instantly from a large-scale of heterogeneous autonomous relational databases accessed via the not-so-reliable Internet, for which a practical universal sync middle of low maintenance and operation costs is most wanted, but developing such a product and adapting it for various scenarios are a very sophisticated and continuous practice. The authors have been devising, applying, and optimizing a generic sync middleware system, named GSMS since 2006, holding the principles or advantages that the middleware must be SyncML-compliant and transparent to data application layer logic, need not refer to implementation details of databases synced, does not rely on host computer operating systems deployed, and its construction is light weighted and hence, of low cost. A series of ultimate experiments with GSMS sync performance were conducted for a persuasive example of a source relational database that underwent a broad range of write loads, say, from one thousand to one million intensive writes within a few minutes. The tests proved that GSMS has achieved an instant sync level of well below a fraction of millisecond per record sync, and GSMS’ smooth performances under ultimate write loads also showed it is feasible and competent.

Keywords: heterogeneous massive data, instantly sync intensive writes, Internet generic middleware design, optimization

Procedia PDF Downloads 110

22713 Familial Exome Sequencing to Decipher the Complex Genetic Basis of Holoprosencephaly

Authors: Artem Kim, Clara Savary, Christele Dubourg, Wilfrid Carre, Houda Hamdi-Roze, Valerie Dupé, Sylvie Odent, Marie De Tayrac, Veronique David

Abstract:

Holoprosencephaly (HPE) is a rare congenital brain malformation resulting from the incomplete separation of the two cerebral hemispheres. It is characterized by a wide phenotypic spectrum and a high degree of locus heterogeneity. Genetic defects in 16 genes have already been implicated in HPE, but account for only 30% of cases, suggesting that a large part of genetic factors remains to be discovered. HPE has been recently redefined as a complex multigenic disorder, requiring the joint effect of multiple mutational events in genes belonging to one or several developmental pathways. The onset of HPE may result from accumulation of the effects of multiple rare variants in functionally-related genes, each conferring a moderate increase in the risk of HPE onset. In order to decipher the genetic basis of HPE, unconventional patterns of inheritance involving multiple genetic factors need to be considered. The primary objective of this study was to uncover possible disease causing combinations of multiple rare variants underlying HPE by performing trio-based Whole Exome Sequencing (WES) of familial cases where no molecular diagnosis could be established. 39 families were selected with no fully-penetrant causal mutation in known HPE gene, no chromosomic aberrations/copy number variants and without any implication of environmental factors. As the main challenge was to identify disease-related variants among a large number of nonpathogenic polymorphisms detected by WES classical scheme, a novel variant prioritization approach was established. It combined WES filtering with complementary gene-level approaches: transcriptome-driven (RNA-Seq data) and clinically-driven (public clinical data) strategies. Briefly, a filtering approach was performed to select variants compatible with disease segregation, population frequency and pathogenicity prediction to identify an exhaustive list of rare deleterious variants. The exome search space was then reduced by restricting the analysis to candidate genes identified by either transcriptome-driven strategy (genes sharing highly similar expression patterns with known HPE genes during cerebral development) or clinically-driven strategy (genes associated to phenotypes of interest overlapping with HPE). Deeper analyses of candidate variants were then performed on a family-by-family basis. These included the exploration of clinical information, expression studies, variant characteristics, recurrence of mutated genes and available biological knowledge. A novel bioinformatics pipeline was designed. Applied to the 39 families, this final integrated workflow identified an average of 11 candidate variants per family. Most of candidate variants were inherited from asymptomatic parents suggesting a multigenic inheritance pattern requiring the association of multiple mutational events. The manual analysis highlighted 5 new strong HPE candidate genes showing recurrences in distinct families. Functional validations of these genes are foreseen.

Keywords: complex genetic disorder, holoprosencephaly, multiple rare variants, whole exome sequencing

Procedia PDF Downloads 194

22712 Building Transparent Supply Chains through Digital Tracing

Authors: Penina Orenstein

Abstract:

In today’s world, particularly with COVID-19 a constant worldwide threat, organizations need greater visibility over their supply chains more than ever before, in order to find areas for improvement and greater efficiency, reduce the chances of disruption and stay competitive. The concept of supply chain mapping is one where every process and route is mapped in detail between each vendor and supplier. The simplest method of mapping involves sourcing publicly available data including news and financial information concerning relationships between suppliers. An additional layer of information would be disclosed by large, direct suppliers about their production and logistics sites. While this method has the advantage of not requiring any input from suppliers, it also doesn’t allow for much transparency beyond the first supplier tier and may generate irrelevant data—noise—that must be filtered out to find the actionable data. The primary goal of this research is to build data maps of supply chains by focusing on a layered approach. Using these maps, the secondary goal is to address the question as to whether the supply chain is re-engineered to make improvements, for example, to lower the carbon footprint. Using a drill-down approach, the end result is a comprehensive map detailing the linkages between tier-one, tier-two, and tier-three suppliers super-imposed on a geographical map. The driving force behind this idea is to be able to trace individual parts to the exact site where they’re manufactured. In this way, companies can ensure sustainability practices from the production of raw materials through the finished goods. The approach allows companies to identify and anticipate vulnerabilities in their supply chain. It unlocks predictive analytics capabilities and enables them to act proactively. The research is particularly compelling because it unites network science theory with empirical data and presents the results in a visual, intuitive manner.

Keywords: data mining, supply chain, empirical research, data mapping

Procedia PDF Downloads 167

22711 Prediction and Analysis of Human Transmembrane Transporter Proteins Based on SCM

Authors: Hui-Ling Huang, Tamara Vasylenko, Phasit Charoenkwan, Shih-Hsiang Chiu, Shinn-Ying Ho

Abstract:

The knowledge of the human transporters is still limited due to technically demanding procedure of crystallization for the structural characterization of transporters by spectroscopic methods. It is desirable to develop bioinformatics tools for effective analysis of available sequences in order to identify human transmembrane transporter proteins (HMTPs). This study proposes a scoring card method (SCM) based method for predicting HMTPs. We estimated a set of propensity scores of dipeptides to be HMTPs using SCM from the training dataset (HTS732) consisting of 366 HMTPs and 366 non-HMTPs. SCM using the estimated propensity scores of 20 amino acids and 400 dipeptides -as HMTPs, has a training accuracy of 87.63% and a test accuracy of 66.46%. The five top-ranked dipeptides include LD, NV, LI, KY, and MN with scores 996, 992, 989, 987, and 985, respectively. Five amino acids with the highest propensity scores are Ile, Phe, Met, Gly, and Leu, that hydrophobic residues are mostly highly-scored. Furthermore, obtained propensity scores were used to analyze physicochemical properties of human transporters.

Keywords: dipeptide composition, physicochemical property, human transmembrane transporter proteins, human transmembrane transporters binding propensity, scoring card method

Procedia PDF Downloads 363

22710 Role of Machine Learning in Internet of Things Enabled Smart Cities

Authors: Amit Prakash Singh, Shyamli Singh, Chavi Srivastav

Abstract:

This paper presents the idea of Internet of Thing (IoT) for the infrastructure of smart cities. Internet of Thing has been visualized as a communication prototype that incorporates myriad of digital services. The various component of the smart cities shall be implemented using microprocessor, microcontroller, sensors for network communication and protocols. IoT enabled systems have been devised to support the smart city vision, of which aim is to exploit the currently available precocious communication technologies to support the value-added services for function of the city. Due to volume, variety, and velocity of data, it requires analysis using Big Data concept. This paper presented the various techniques used to analyze big data using machine learning.

Keywords: IoT, smart city, embedded systems, sustainable environment

Procedia PDF Downloads 561

22709 Machine Learning Classification of Fused Sentinel-1 and Sentinel-2 Image Data Towards Mapping Fruit Plantations in Highly Heterogenous Landscapes

Authors: Yingisani Chabalala, Elhadi Adam, Khalid Adem Ali

Abstract:

Mapping smallholder fruit plantations using optical data is challenging due to morphological landscape heterogeneity and crop types having overlapped spectral signatures. Furthermore, cloud covers limit the use of optical sensing, especially in subtropical climates where they are persistent. This research assessed the effectiveness of Sentinel-1 (S1) and Sentinel-2 (S2) data for mapping fruit trees and co-existing land-use types by using support vector machine (SVM) and random forest (RF) classifiers independently. These classifiers were also applied to fused data from the two sensors. Feature ranks were extracted using the RF mean decrease accuracy (MDA) and forward variable selection (FVS) to identify optimal spectral windows to classify fruit trees. Based on RF MDA and FVS, the SVM classifier resulted in relatively high classification accuracy with overall accuracy (OA) = 0.91.6% and kappa coefficient = 0.91% when applied to the fused satellite data. Application of SVM to S1, S2, S2 selected variables and S1S2 fusion independently produced OA = 27.64, Kappa coefficient = 0.13%; OA= 87%, Kappa coefficient = 86.89%; OA = 69.33, Kappa coefficient = 69. %; OA = 87.01%, Kappa coefficient = 87%, respectively. Results also indicated that the optimal spectral bands for fruit tree mapping are green (B3) and SWIR_2 (B10) for S2, whereas for S1, the vertical-horizontal (VH) polarization band. Including the textural metrics from the VV channel improved crop discrimination and co-existing land use cover types. The fusion approach proved robust and well-suited for accurate smallholder fruit plantation mapping.

Keywords: smallholder agriculture, fruit trees, data fusion, precision agriculture

Procedia PDF Downloads 39

22708 A Tactic for a Cosmopolitan City Comparison through a Data-Driven Approach: Case of Climate City Networking

Authors: Sombol Mokhles

Abstract:

Tackling climate change requires expanding networking opportunities between a diverse range of cities to accelerate climate actions. Existing climate city networks have limitations in actively engaging “ordinary” cities in networking processes between cities, as they encourage a few powerful cities to be followed by the many “ordinary” cities. To reimagine the networking opportunities between cities beyond global cities, this paper incorporates “cosmopolitan comparison” to expand our knowledge of a diverse range of cities using a data-driven approach. Through a cosmopolitan perspective, a framework is presented on how to utilise large data to expand knowledge of cities beyond global cities to reimagine the existing hierarchical networking practices. The contribution of this framework is beyond urban climate governance but inclusive of different fields which strive for a more inclusive and cosmopolitan comparison attentive to the differences across cities.

Keywords: cosmopolitan city comparison, data-driven approach, climate city networking, urban climate governance

Procedia PDF Downloads 102

22707 River Bank Erosion Studies: A Review on Investigation Approaches and Governing Factors

Authors: Azlinda Saadon

Abstract:

This paper provides detail review on river bank erosion studies with respect to their processes, methods of measurements and factors governing river bank erosion. Bank erosion processes are commonly associated with river changes initiation and development, through width adjustment and planform evolution. It consists of two main types of erosion processes; basal erosion due to fluvial hydraulic force and bank failure under the influence of gravity. Most studies had only focused on one factor rather than integrating both factors. Evidences of previous works have shown integration between both processes of fluvial hydraulic force and bank failure. Bank failure is often treated as probabilistic phenomenon without having physical characteristics and the geotechnical aspects of the bank. This review summarizes the findings of previous investigators with respect to measurement techniques and prediction rates of river bank erosion through field investigation, physical model and numerical model approaches. Factors governing river bank erosion considering physical characteristics of fluvial erosion are defined.

Keywords: river bank erosion, bank erosion, dimensional analysis, geotechnical aspects

Procedia PDF Downloads 421

22706 A Predictive MOC Solver for Water Hammer Waves Distribution in Network

Authors: A. Bayle, F. Plouraboué

Abstract:

Water Distribution Network (WDN) still suffers from a lack of knowledge about fast pressure transient events prediction, although the latter may considerably impact their durability. Accidental or planned operating activities indeed give rise to complex pressure interactions and may drastically modified the local pressure value generating leaks and, in rare cases, pipe’s break. In this context, a numerical predictive analysis is conducted to prevent such event and optimize network management. A couple of Python/FORTRAN 90, home-made software, has been developed using Method Of Characteristic (MOC) solving for water-hammer equations. The solver is validated by direct comparison with theoretical and experimental measurement in simple configurations whilst afterward extended to network analysis. The algorithm's most costly steps are designed for parallel computation. A various set of boundary conditions and energetic losses models are considered for the network simulations. The results are analyzed in both real and frequencies domain and provide crucial information on the pressure distribution behavior within the network.

Keywords: energetic losses models, method of characteristic, numerical predictive analysis, water distribution network, water hammer

Procedia PDF Downloads 216

22705 An Analysis on Clustering Based Gene Selection and Classification for Gene Expression Data

Authors: K. Sathishkumar, V. Thiagarasu

Abstract:

Due to recent advances in DNA microarray technology, it is now feasible to obtain gene expression profiles of tissue samples at relatively low costs. Many scientists around the world use the advantage of this gene profiling to characterize complex biological circumstances and diseases. Microarray techniques that are used in genome-wide gene expression and genome mutation analysis help scientists and physicians in understanding of the pathophysiological mechanisms, in diagnoses and prognoses, and choosing treatment plans. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. This work presents an analysis of several clustering algorithms proposed to deals with the gene expression data effectively. The existing clustering algorithms like Support Vector Machine (SVM), K-means algorithm and evolutionary algorithm etc. are analyzed thoroughly to identify the advantages and limitations. The performance evaluation of the existing algorithms is carried out to determine the best approach. In order to improve the classification performance of the best approach in terms of Accuracy, Convergence Behavior and processing time, a hybrid clustering based optimization approach has been proposed.

Keywords: microarray technology, gene expression data, clustering, gene Selection

Procedia PDF Downloads 315

22704 A Theoretical Model for Pattern Extraction in Large Datasets

Authors: Muhammad Usman

Abstract:

Pattern extraction has been done in past to extract hidden and interesting patterns from large datasets. Recently, advancements are being made in these techniques by providing the ability of multi-level mining, effective dimension reduction, advanced evaluation and visualization support. This paper focuses on reviewing the current techniques in literature on the basis of these parameters. Literature review suggests that most of the techniques which provide multi-level mining and dimension reduction, do not handle mixed-type data during the process. Patterns are not extracted using advanced algorithms for large datasets. Moreover, the evaluation of patterns is not done using advanced measures which are suited for high-dimensional data. Techniques which provide visualization support are unable to handle a large number of rules in a small space. We present a theoretical model to handle these issues. The implementation of the model is beyond the scope of this paper.

Keywords: association rule mining, data mining, data warehouses, visualization of association rules

Procedia PDF Downloads 217

22703 Design of Data Management Software System Supporting Rendezvous and Docking with Various Spaceships

Authors: Zhan Panpan, Lu Lan, Sun Yong, He Xiongwen, Yan Dong, Gu Ming

Abstract:

The function of the two spacecraft docking network, the communication and control of a docking target with various spacecrafts is realized in the space lab data management system. In order to solve the problem of the complex data communication mode between the space lab and various spaceships, and the problem of software reuse caused by non-standard protocol, a data management software system supporting rendezvous and docking with various spaceships has been designed. The software system is based on CCSDS Spcecraft Onboard Interface Service(SOIS). It consists of Software Driver Layer, Middleware Layer and Appliaction Layer. The Software Driver Layer hides the various device interfaces using the uniform device driver framework. The Middleware Layer is divided into three lays, including transfer layer, application support layer and system business layer. The communication of space lab plaform bus and the docking bus is realized in transfer layer. Application support layer provides the inter tasks communitaion and the function of unified time management for the software system. The data management software functions are realized in system business layer, which contains telemetry management service, telecontrol management service, flight status management service, rendezvous and docking management service and so on. The Appliaction Layer accomplishes the space lab data management system defined tasks using the standard interface supplied by the Middleware Layer. On the basis of layered architecture, rendezvous and docking tasks and the rendezvous and docking management service are independent in the software system. The rendezvous and docking tasks will be activated and executed according to the different spaceships. In this way, the communication management functions in the independent flight mode, the combination mode of the manned spaceship and the combination mode of the cargo spaceship are achieved separately. The software architecture designed standard appliction interface for the services in each layer. Different requirements of the space lab can be supported by the use of standard services per layer, and the scalability and flexibility of the data management software can be effectively improved. It can also dynamically expand the number and adapt to the protocol of visiting spaceships. The software system has been applied in the data management subsystem of the space lab, and has been verified in the flight of the space lab. The research results of this paper can provide the basis for the design of the data manage system in the future space station.

Keywords: space lab, rendezvous and docking, data management, software system

Procedia PDF Downloads 359

22702 The Wear Recognition on Guide Surface Based on the Feature of Radar Graph

Authors: Youhang Zhou, Weimin Zeng, Qi Xie

Abstract:

Abstract: In order to solve the wear recognition problem of the machine tool guide surface, a new machine tool guide surface recognition method based on the radar-graph barycentre feature is presented in this paper. Firstly, the gray mean value, skewness, projection variance, flat degrees and kurtosis features of the guide surface image data are defined as primary characteristics. Secondly, data Visualization technology based on radar graph is used. The visual barycentre graphical feature is demonstrated based on the radar plot of multi-dimensional data. Thirdly, a classifier based on the support vector machine technology is used, the radar-graph barycentre feature and wear original feature are put into the classifier separately for classification and comparative analysis of classification and experiment results. The calculation and experimental results show that the method based on the radar-graph barycentre feature can detect the guide surface effectively.

Keywords: guide surface, wear defects, feature extraction, data visualization

Procedia PDF Downloads 509

22701 Aggregation Scheduling Algorithms in Wireless Sensor Networks

Authors: Min Kyung An

Abstract:

In Wireless Sensor Networks which consist of tiny wireless sensor nodes with limited battery power, one of the most fundamental applications is data aggregation which collects nearby environmental conditions and aggregates the data to a designated destination, called a sink node. Important issues concerning the data aggregation are time efficiency and energy consumption due to its limited energy, and therefore, the related problem, named Minimum Latency Aggregation Scheduling (MLAS), has been the focus of many researchers. Its objective is to compute the minimum latency schedule, that is, to compute a schedule with the minimum number of timeslots, such that the sink node can receive the aggregated data from all the other nodes without any collision or interference. For the problem, the two interference models, the graph model and the more realistic physical interference model known as Signal-to-Interference-Noise-Ratio (SINR), have been adopted with different power models, uniform-power and non-uniform power (with power control or without power control), and different antenna models, omni-directional antenna and directional antenna models. In this survey article, as the problem has proven to be NP-hard, we present and compare several state-of-the-art approximation algorithms in various models on the basis of latency as its performance measure.

Keywords: data aggregation, convergecast, gathering, approximation, interference, omni-directional, directional

Procedia PDF Downloads 218

22700 Reliable and Energy-Aware Data Forwarding under Sink-Hole Attack in Wireless Sensor Networks

Authors: Ebrahim Alrashed

Abstract:

Wireless sensor networks are vulnerable to attacks from adversaries attempting to disrupt their operations. Sink-hole attacks are a type of attack where an adversary node drops data forwarded through it and hence affecting the reliability and accuracy of the network. Since sensor nodes have limited battery power, it is essential that any solution to the sinkhole attack problem be very energy-aware. In this paper, we present a reliable and energy efficient scheme to forward data from source nodes to the base station while under sink-hole attack. The scheme also detects sink-hole attack nodes and avoid paths that includes them.

Keywords: energy-aware routing, reliability, sink-hole attack, WSN

Procedia PDF Downloads 386

22699 Sensitivity of the Estimated Output Energy of the Induction Motor to both the Asymmetry Supply Voltage and the Machine Parameters

Authors: Eyhab El-Kharashi, Maher El-Dessouki

Abstract:

The paper is dedicated to precise assessment of the induction motor output energy during the unbalanced operation. Since many years ago and until now the voltage complex unbalance factor (CVUF) is used only to assess the output energy of the induction motor while this output energy for asymmetry supply voltage does not depend on the value of unbalanced voltage only but also on the machine parameters. The paper illustrates the variation of the two unbalance factors, complex voltage unbalance factor (CVUF) and impedance unbalance factor (IUF), with positive sequence voltage component, reveals that degree and manner of unbalance in supply voltage. From this point of view the paper delineates the current unbalance factor (CUF) to exactly reflect the output energy during unbalanced operation. The paper proceeds to illustrate the importance of using this factor in the multi-machine system to precise prediction of the output energy during the unbalanced operation. The use of the proposed unbalance factor (CUF) avoids the accumulation of the error due to more than one machine in the system which is expected if only the complex voltage unbalance factor (CVUF) is used.

Keywords: induction motor, electromagnetic torque, voltage unbalance, energy conversion

Procedia PDF Downloads 550

22698 A Near-Optimal Domain Independent Approach for Detecting Approximate Duplicates

Authors: Abdelaziz Fellah, Allaoua Maamir

Abstract:

We propose a domain-independent merging-cluster filter approach complemented with a set of algorithms for identifying approximate duplicate entities efficiently and accurately within a single and across multiple data sources. The near-optimal merging-cluster filter (MCF) approach is based on the Monge-Elkan well-tuned algorithm and extended with an affine variant of the Smith-Waterman similarity measure. Then we present constant, variable, and function threshold algorithms that work conceptually in a divide-merge filtering fashion for detecting near duplicates as hierarchical clusters along with their corresponding representatives. The algorithms take recursive refinement approaches in the spirit of filtering, merging, and updating, cluster representatives to detect approximate duplicates at each level of the cluster tree. Experiments show a high effectiveness and accuracy of the MCF approach in detecting approximate duplicates by outperforming the seminal Monge-Elkan’s algorithm on several real-world benchmarks and generated datasets.

Keywords: data mining, data cleaning, approximate duplicates, near-duplicates detection, data mining applications and discovery

Procedia PDF Downloads 375

22697 Delivery Service and Online and Offline Purchasing for Collaborative Recommendations on Retail Cross-Channels

Authors: S. H. Liao, J. M. Huang

Abstract:

The delivery service business model is the final link in logistics for both online and offline businesses. The online and offline business model focuses on the entire customer purchasing process online and offline, placing greater emphasis on the importance of data to optimize overall retail operations. For the retail industry, it is an important task of information and management to strengthen the collection and investigation of consumers' online and offline purchasing data to better understand customers and then recommend products. This study implements two-stage data mining analytics for clustering and association rules analysis to investigate Taiwanese consumers' (n=2,209) preferences for delivery service. This process clarifies online and offline purchasing behaviors and preferences to find knowledge profiles/patterns/rules for cross-channel collaborative recommendations. Finally, theoretical and practical implications for methodology and enterprise are presented.

Keywords: delivery service, online and offline purchasing, retail cross-channel, collaborative recommendations, data mining analytics

Procedia PDF Downloads 6

22696 A High Reliable Space-Borne File System with Applications of Device Partition and Intra-Channel Pipeline in Nand Flash

Authors: Xin Li, Ji-Yang Yu, Yue-Hua Niu, Lu-Yuan Wang

Abstract:

As an inevitable chain of the space data acquirement system, space-borne storage system based on Nand Flash has gradually been implemented in spacecraft. In face of massive, parallel and varied data on board, efficient data management become an important issue of storage research. Face to the requirements of high-performance and reliability in Nand Flash storage system, a combination of hardware and file system design can drastically increase system dependability, even for missions with a very long duration. More sophisticated flash storage concepts with advanced operating systems have been researched to improve the reliability of Nand Flash storage system on satellites. In this paper, architecture of file system with multi-channel data acquisition and storage on board is proposed, which obtains large-capacity and high-performance with the combine of intra-channel pipeline and device partition in Nand Flash. Multi-channel data in different rate are stored as independent files with parallel-storage system in device partition, which assures the high-effective and reliable throughput of file treatments. For massive and high-speed data storage, an efficiency assessment model is established to calculate the bandwidth formula of intra-channel pipeline. Information tables designed in Magnetoresistive RAM (MRAM) hold the management of bad block in Nand Flash and the arrangement of file system address for the high-reliability of data storage. During the full-load test, the throughput of 3D PLUS Module 160Gb Nand Flash can reach 120Mbps for store and reach 120Mbps for playback, which efficiently satisfies the requirement of multi-channel data acquisition in Satellite. Compared with previous literature, the results of experiments verify the advantages of the proposed system.

Keywords: device partition architecture, intra-channel pipelining, nand flash, parallel storage

Procedia PDF Downloads 285

22695 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets

Authors: Najmeh Abedzadeh, Matthew Jacobs

Abstract:

An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.

Keywords: IDS, imbalanced datasets, sampling algorithms, big data

Procedia PDF Downloads 307

22694 The Role of Vocabulary in Reading Comprehension

Authors: Engku Haliza Engku Ibrahim, Isarji Sarudin, Ainon Jariah Muhamad

Abstract:

It is generally agreed that many factors contribute to one’s reading comprehension and there is consensus that vocabulary size one of the main factors. This study explores the relationship between second language learners’ vocabulary size and their reading comprehension scores. 130 Malay pre-university students of a public university participated in this study. They were students of an intensive English language programme doing preparatory English courses to pursue bachelors degree in English. A quantitative research method was employed based on the Vocabulary Levels Test by Nation (1990) and the reading comprehension score of the in-house English Proficiency Test. A review of the literature indicates that a somewhat positive correlation is to be expected though findings of this study can only be explicated once the final analysis has been carried out. This is an ongoing study and it is anticipated that results of this research will be finalized in the near future. The findings will help provide beneficial implications for the prediction of reading comprehension performance. It also has implications for the teaching of vocabulary in the ESL context. A better understanding of the relationship between vocabulary size and reading comprehension scores will enhance teachers’ and students’ awareness of the importance of vocabulary acquisition in the L2 classroom.

Keywords: vocabulary size, vocabulary learning, reading comprehension, ESL

Procedia PDF Downloads 438

22693 Tourism Satellite Account: Approach and Information System Development

Authors: Pappas Theodoros, Mihail Diakomihalis

Abstract:

Measuring the economic impact of tourism in a benchmark economy is a global concern, with previous measurements being partial and not fully integrated. Tourism is a phenomenon that requires individual consumption of visitors and which should be observed and measured to reveal, thus, the overall contribution of tourism to an economy. The Tourism Satellite Account (TSA) is a critical tool for assessing the annual growth of tourism, providing reliable measurements. This article introduces a system of TSA information that encompasses all the works of the TSA, including input, storage, management, and analysis of data, as well as additional future functions and enhances the efficiency of tourism data management and TSA collection utility. The methodology and results presented offer insights into the development and implementation of TSA.

Keywords: tourism satellite account, information system, data-based tourist account, relation database

Procedia PDF Downloads 67