Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25101

Search results for: clustering on flowing data

24261 Mathematical Analysis of Variation in Inlet Shock Wave Angle on Specific Impulse of Scramjet Engine

Abstract:

Study of shock waves generated in the Scramjet engine is typically restricted to pressure, temperature, density, entropy and Mach number variation across the shock wave. The present work discusses the impact of inlet shock wave angles on the specific impulse of the Scramjet engine. A mathematical analysis has done for the isentropic hypersonic flow of air flowing through a Scramjet with hydrogen fuel at an altitude of 30 km. Analysis has been done in order to get optimum shock wave angle to achieve maximum impulse. Since external drag has excluded from the analysis, the losses due to friction are not considered for the present analysis. When Mach number of the airflow at the entry of the nozzle reaches unity, then that flow is choked. This condition puts limitations on increasing the inlet shock wave angle. As inlet shock wave angle increases, speed of the flow entering into the nozzle decreases, which results in an increase in the specific impulse of the engine. When the speed of the flow at the entry of the nozzle reduces below sonic speed, then there is no further increase in the specific impulse of the engine. Here the Conclusion is the thrust and specific impulse of a scramjet engine, which increases gradually with an increase in inlet shock wave angle up to the condition when airflow speed reaches sonic velocity at the exit of the combustor. In addition to that, variation in drag force at the inlet of the scramjet and variation in hypersonic flow conditions at every stage of the scramjet also studied in order to understand variation on flow characteristics with respect to flow deflection angle. Essentially, it helps in designing inlet profile for the Scramjet engine to achieve optimum specific impulse.

Keywords: hypersonic flow, scramjet, shock waves, specific impulse, mathematical analysis

Procedia PDF Downloads 165

24260 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process is first introduced in 2001. This process originally works for financial time series pattern matching and it is then found suitable for time series dimensionality reduction and representation. Its strength is on preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data contributes a major proportion, especially on the data which generates by sensors in the Internet of Things (IoT) environment. According to the nature of PIP identification and the successful cases, it is worth to further explore the opportunity to apply PIP in time series ‘Big Data’. However, the performance of PIP identification is always considered as the limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches solve the bottleneck when running the PIP identification process in a standalone computer. Improvement in term of speed is obtained by the distributed versions.

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 425

24259 Visualization of PM₂.₅ Time Series and Correlation Analysis of Cities in Bangladesh

Authors: Asif Zaman, Moinul Islam Zaber, Amin Ahsan Ali

Abstract:

In recent years of industrialization, the South Asian countries are being affected by air pollution due to a severe increase in fine particulate matter 2.5 (PM₂.₅). Among them, Bangladesh is one of the most polluting countries. In this paper, statistical analyses were conducted on the time series of PM₂.₅ from various districts in Bangladesh, mostly around Dhaka city. Research has been conducted on the dynamic interactions and relationships between PM₂.₅ concentrations in different zones. The study is conducted toward understanding the characteristics of PM₂.₅, such as spatial-temporal characterization, correlation of other contributors behind air pollution such as human activities, driving factors and environmental casualties. Clustering on the data gave an insight on the districts groups based on their AQI frequency as representative districts. Seasonality analysis on hourly and monthly frequency found higher concentration of fine particles in nighttime and winter season, respectively. Cross correlation analysis discovered a phenomenon of correlations among cities based on time-lagged series of air particle readings and visualization framework is developed for observing interaction in PM₂.₅ concentrations between cities. Significant time-lagged correlations were discovered between the PM₂.₅ time series in different city groups throughout the country by cross correlation analysis. Additionally, seasonal heatmaps depict that the pooled series correlations are less significant in warmer months, and among cities of greater geographic distance as well as time lag magnitude and direction of the best shifted correlated particulate matter time series among districts change seasonally. The geographic map visualization demonstrates spatial behaviour of air pollution among districts around Dhaka city and the significant effect of wind direction as the vital actor on correlated shifted time series. The visualization framework has multipurpose usage from gathering insight of general and seasonal air quality of Bangladesh to determining the pathway of regional transportation of air pollution.

Keywords: air quality, particles, cross correlation, seasonality

Procedia PDF Downloads 103

24258 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks

Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam

Abstract:

In recent years, convolutional neural networks (CNN) have demonstrated high performance in image analysis, but oftentimes, there is only structured data available regarding a specific problem. By interpreting structured data as images, CNNs can effectively learn and extract valuable insights from tabular data, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. In applying a single neural network for analyzing multimodal data, e.g., both structured and unstructured information, significant advantages in terms of time complexity and energy efficiency can be achieved. Converting structured data into images and merging them with existing visual material offers a promising solution for applying CNN in multimodal datasets, as they often occur in a medical context. By employing suitable preprocessing techniques, structured data is transformed into image representations, where the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. This final image is finally analyzed using a CNN.

Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion

Procedia PDF Downloads 111

24257 Current Characteristic of Water Electrolysis to Produce Hydrogen, Alkaline, and Acid Water

Authors: Ekki Kurniawan, Yusuf Nur Jayanto, Erna Sugesti, Efri Suhartono, Agus Ganda Permana, Jaspar Hasudungan, Jangkung Raharjo, Rintis Manfaati

Abstract:

The purpose of this research is to study the current characteristic of the electrolysis of mineral water to produce hydrogen, alkaline water, and acid water. Alkaline and hydrogen water are believed to have health benefits. Alkaline water containing hydrogen can be an anti-oxidant that captures free radicals, which will increase the immune system. In Indonesia, there are two existing types of alkaline water producing equipment, but the installation is complicated, and the price is relatively expensive. The electrolysis process is slow (6-8 hours) since they are locally made using 311 VDC full bridge rectifier power supply. This paper intends to discuss how to make hydrogen and alkaline water by a simple portable mineral water ionizer. This is an electrolysis device that is easy to carry and able to separate ions of mineral water into acidic and alkaline water. With an electric field, positive ions will be attracted to the cathode, while negative ions will be attracted to the anode. The circuit equivalent can be depicted as RLC transient ciruit. The diode component ensures that the electrolytic current is direct current. Switch S divides the switching times t1, t2, and t3. In the first stage up to t1, the electrolytic current increases exponentially, as does the inductor charging current (L). The molecules in drinking water experience magnetic properties. The direction of the dipole ions, which are random in origin, will regularly flare with the direction of the electric field. In the second stage up to t2, the electrolytic current decreases exponentially, just like the charging current of a capacitor (C). In the 3rd stage, start t3 until it tends to be constant, as is the case with the current flowing through the resistor (R).

Keywords: current electrolysis, mineral water, ions, alkaline and acid waters, inductor, capacitor, resistor

Procedia PDF Downloads 102

24256 Order Fulfilment Strategy in E-Commerce Warehouse Based on Simulation: Business Customers Case

Authors: Aurelija Burinskiene

Abstract:

This paper presents the study for an e-commerce warehouse. The study is aiming to improve order fulfillment activity by identifying the strategy presenting the best performance. A simulation model was proposed to reach the target of this research. This model enables various scenario tests in an e-commerce warehouse, allowing them to find out for the best order fulfillment strategy. By using simulation, model authors investigated customers’ orders representing on-line purchases for one month. Experiments were designed to evaluate various order picking methods applicable to the fulfillment of customers’ orders. The research uses cost components analysis and helps to identify the best possible order picking method improving the overall performance of e-commerce warehouse and fulfillment service to the customers. The results presented show that the application of order batching strategy is the most applicable because it brings distance savings of around 6.7 percentage. This result could be improved by taking an assortment clustering action until 8.34 percentage. So, the recommendations were given to apply the method for future e-commerce warehouse operations.

Keywords: e-commerce, order, fulfilment, strategy, simulation

Procedia PDF Downloads 147

24255 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: data mining, textile production, decision trees, classification

Procedia PDF Downloads 344

24254 Investigation of Delivery of Triple Play Data in GE-PON Fiber to the Home Network

Authors: Ashima Anurag Sharma

Abstract:

Optical fiber based networks can deliver performance that can support the increasing demands for high speed connections. One of the new technologies that have emerged in recent years is Passive Optical Networks. This research paper is targeted to show the simultaneous delivery of triple play service (data, voice, and video). The comparison between various data rates is presented. It is demonstrated that as we increase the data rate, number of users to be decreases due to increase in bit error rate.

Keywords: BER, PON, TDMPON, GPON, CWDM, OLT, ONT

Procedia PDF Downloads 521

24253 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension reaching thousands. Therefore, hindering all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with is generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim at reducing the data dimension and improve it for classification. Here, we experiment with applying a multistage PCA on the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates which increases the possibility of building an automated system for autism detection.

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 556

24252 Design Components and Reliability Aspects of Municipal Waste Water and SEIG Based Micro Hydro Power Plant

Authors: R. K. Saket

Abstract:

This paper presents design aspects and probabilistic approach for generation reliability evaluation of an alternative resource: municipal waste water based micro hydro power generation system. Annual and daily flow duration curves have been obtained for design, installation, development, scientific analysis and reliability evaluation of the MHPP. The hydro potential of the waste water flowing through sewage system of the BHU campus has been determined to produce annual flow duration and daily flow duration curves by ordering the recorded water flows from maximum to minimum values. Design pressure, the roughness of the pipe’s interior surface, method of joining, weight, ease of installation, accessibility to the sewage system, design life, maintenance, weather conditions, availability of material, related cost and likelihood of structural damage have been considered for design of a particular penstock for reliable operation of the MHPP. A MHPGS based on MWW and SEIG is designed, developed, and practically implemented to provide reliable electric energy to suitable load in the campus of the Banaras Hindu University, Varanasi, (UP), India. Generation reliability evaluation of the developed MHPP using Gaussian distribution approach, safety factor concept, peak load consideration and Simpson 1/3rd rule has presented in this paper.

Keywords: self excited induction generator, annual and daily flow duration curve, sewage system, municipal waste water, reliability evaluation, Gaussian distribution, Simpson 1/3rd rule

Procedia PDF Downloads 556

24251 Spatio-temporal Distribution of Surface Water Quality in the Kebir Rhumel Basin, Algeria

Authors: Lazhar Belkhiri, Ammar Tiri, Lotfi Mouni, Fatma Elhadj Lakouas

Abstract:

This research aims to present a surface water quality assessment of hydrochemical parameters in the Kebir Rhumel Basin, Algeria. The water quality index (WQI), Mann–Kendall (MK) test, and hierarchical cluster analysis (HCA) were used in oder to understand the spatio-temporal distribution of the surface water quality in the study area. Eleven hydrochemical parameters were measured monthly at eight stations from January 2016 to December 2020. The dominant cation in the surface water was found to be calcium, followed by sodium, and the dominant anion was sulfate, followed by chloride. In terms of WQI, a significant percentage of surface water samples at stations Ain Smara (AS), Beni Haroune (BH), Grarem (GR), and Sidi Khlifa (SK) exhibited poor water quality, with approximately 89.5%, 90.6%, 78.2%, and 62.7%, respectively, falling into this category. Mann–Kendall trend analysis revealed a significantly increasing trend in WQI values at stations Oued Boumerzoug (ON) and SK, indicating that the temporal variation of WQI in these stations is significant. Hierarchical clustering analysis classified the data into three clusters. The first cluster contained approximately 22% of the total number of months, the second cluster included about 30%, and the third cluster had the highest representation, approximately 48% of the total number of months. Within these clusters, certain stations exhibited higher WQI values. In the first cluster, stations GR and ON had the highest WQI values. In the second cluster, stations Oued Boumerzoug (OB) and SK showed the highest WQI values, while in the last cluster, stations AS, BH, El Milia (EM), and Hammam Grouz (HG) had the highest mean WQI values. Also, approximately 38%, 41%, and 38% of the total water samples in the first, second, and third clusters, respectively, were classified as having poor water quality. The findings of this study can serve as a scientific basis for decision-makers to formulate strategies for surface water quality restoration and management in the region.

Keywords: surface water, water quality index (WQI), Mann Kendall (MK) test, hierarchical cluster analysis (HCA), spatial-temporal distribution, Kebir Rhumel Basin

Procedia PDF Downloads 10

24250 Potentials and Challenges of Adventure Tourism Development: A Case Study of Kashmir Valley, India

Authors: Abdul Hamid Mir

Abstract:

Tourism which is considered as the economic bonanza of Jammu and Kashmir plays an important role in the socio-economic development of Jammu and Kashmir. It is considered as a multi-segmented industry which provides different type of jobs like hotel managers, receptionists, guides, tour operators, travel agents, photographers, etc. Kashmir Valley which is one of the three meso regions (Jammu, Kashmir and Ladakh) of Jammu and Kashmir State is famous all over the world due to its natural beauty. It attracts tourists from the every corner of the globe that is why it has earned name as ‘Paradise on Earth’ and ‘Switzerland of Asia’. It is full of natural treasure and gratifies the several types of tourists. The tourists are experiences fun, thrilling events and safe experience in Kashmir valley, but on the other hand, the adventure tourists are experiencing the physical risks, dangers and losses (injuries, death etc) too. Kashmir valley has greater potential to become one of the best adventure tourism destinations in the world. It offers immense opportunities to the adventurists to explore the wonderful exotic Himalayan ranges and landscapes, in addition, to facing the challenges of fast flowing rivers. Adventure tourism is at the initial stage of development in Kashmir valley and virgin areas needs to be explored and develop which in result will increase not only tourist arrivals but also enhance the business opportunities and economy of local people in Kashmir. Thus the exploitation of virginity of adventure tourism potentials in Kashmir valley is need of the hour. Therefore the present study highlights the potentials of adventure tourism in Kashmir valley and also focuses on the problems in the development of adventure tourism. Furthermore, the study extends to give various recommendations and suggestions in order to develop adventure tourism and broaden the base of tourist arrivals on one hand and sustained the growth on the other hand.

Keywords: adventure, Kashmir, tourism, tourists, potential, rafting, skiing

Procedia PDF Downloads 239

24249 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic

Authors: Fei Gao, Rodolfo C. Raga Jr.

Abstract:

This research proposal will ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve diabetes early detection and management by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. The phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability using Diabetes Health Indicators Dataset from Kaggle’s data as the research data. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics like accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.

Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle

Procedia PDF Downloads 66

24248 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0

Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini

Abstract:

Nowadays, companies are facing lots of challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is also challenged to integrate data from multiple systems and technologies. Although with these pains, companies are still pursuing digitalization because by embracing advanced technologies, companies can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. In this paper, the issue that data is stored in data silos with different schema and structures is focused. The conventional approaches to addressing this issue involve utilizing data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. In this session, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilize a knowledge graph for data storage and exploration.

Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling

Procedia PDF Downloads 88

24247 A Geospatial Consumer Marketing Campaign Optimization Strategy: Case of Fuzzy Approach in Nigeria Mobile Market

Authors: Adeolu O. Dairo

Abstract:

Getting the consumer marketing strategy right is a crucial and complex task for firms with a large customer base such as mobile operators in a competitive mobile market. While empirical studies have made efforts to identify key constructs, no geospatial model has been developed to comprehensively assess the viability and interdependency of ground realities regarding the customer, competition, channel and the network quality of mobile operators. With this research, a geo-analytic framework is proposed for strategy formulation and allocation for mobile operators. Firstly, a fuzzy analytic network using a self-organizing feature map clustering technique based on inputs from managers and literature, which depicts the interrelationships amongst ground realities is developed. The model is tested with a mobile operator in the Nigeria mobile market. As a result, a customer-centric geospatial and visualization solution is developed. This provides a consolidated and integrated insight that serves as a transparent, logical and practical guide for strategic, tactical and operational decision making.

Keywords: geospatial, geo-analytics, self-organizing map, customer-centric

Procedia PDF Downloads 174

24246 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption

Authors: Waziri Victor Onomza, John K. Alhassan, Idris Ismaila, Noel Dogonyaro Moses

Abstract:

This paper describes the problem of building secure computational services for encrypted information in the Cloud Computing without decrypting the encrypted data; therefore, it meets the yearning of computational encryption algorithmic aspiration model that could enhance the security of big data for privacy, confidentiality, availability of the users. The cryptographic model applied for the computational process of the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute theoretical presentations in high-level computational processes that are based on number theory and algebra that can easily be integrated and leveraged in the Cloud computing with detail theoretic mathematical concepts to the fully homomorphic encryption models. This contribution enhances the full implementation of big data analytics based cryptographic security algorithm.

Keywords: big data analytics, security, privacy, bootstrapping, homomorphic, homomorphic encryption scheme

Procedia PDF Downloads 372

24245 Numerical Analysis of Laminar Reflux Condensation from Gas-Vapour Mixtures in Vertical Parallel Plate Channels

Authors: Foad Hassaninejadafarahani, Scott Ormiston

Abstract:

Reflux condensation occurs in a vertical channels and tubes when there is an upward core flow of vapor (or gas-vapor mixture) and a downward flow of the liquid film. The understanding of this condensation configuration is crucial in the design of reflux condensers, distillation columns, and in loss-of-coolant safety analyses in nuclear power plant steam generators. The unique feature of this flow is the upward flow of the vapor-gas mixture (or pure vapor) that retards the liquid flow via shear at the liquid-mixture interface. The present model solves the full, elliptic governing equations in both the film and the gas-vapor core flow. The computational mesh is non-orthogonal and adapts dynamically the phase interface, thus produces sharp and accurate interface. Shear forces and heat and mass transfer at the interface are accounted for fundamentally. This modeling is a big step ahead of current capabilities by removing the limitations of previous reflux condensation models which inherently cannot account for the detailed local balances of shear, mass, and heat transfer at the interface. Discretisation has been done based on a finite volume method and a co-located variable storage scheme. An in-house computer code was developed to implement the numerical solution scheme. Detailed results are presented for laminar reflux condensation from steam-air mixtures flowing in vertical parallel plate channels. The results include velocity and pressure profiles, as well as axial variations of film thickness, Nusselt number and interface gas mass fraction.

Keywords: Reflux, Condensation, CFD-Two Phase, Nusselt number

Procedia PDF Downloads 357

24244 Protecting Privacy and Data Security in Online Business

Authors: Bilquis Ferdousi

Abstract:

With the exponential growth of the online business, the threat to consumers’ privacy and data security has become a serious challenge. This literature review-based study focuses on a better understanding of those threats and what legislative measures have been taken to address those challenges. Research shows that people are increasingly involved in online business using different digital devices and platforms, although this practice varies based on age groups. The threat to consumers’ privacy and data security is a serious hindrance in developing trust among consumers in online businesses. There are some legislative measures taken at the federal and state level to protect consumers’ privacy and data security. The study was based on an extensive review of current literature on protecting consumers’ privacy and data security and legislative measures that have been taken.

Keywords: privacy, data security, legislation, online business

Procedia PDF Downloads 97

24243 An Analysis of Privacy and Security for Internet of Things Applications

Authors: Dhananjay Singh, M. Abdullah-Al-Wadud

Abstract:

The Internet of Things is a concept of a large scale ecosystem of wireless actuators. The actuators are defined as things in the IoT, those which contribute or produces some data to the ecosystem. However, ubiquitous data collection, data security, privacy preserving, large volume data processing, and intelligent analytics are some of the key challenges into the IoT technologies. In order to solve the security requirements, challenges and threats in the IoT, we have discussed a message authentication mechanism for IoT applications. Finally, we have discussed data encryption mechanism for messages authentication before propagating into IoT networks.

Keywords: Internet of Things (IoT), message authentication, privacy, security

Procedia PDF Downloads 377

24242 Leveraging Natural Language Processing for Legal Artificial Intelligence: A Longformer Approach for Taiwanese Legal Cases

Authors: Hsin Lee, Hsuan Lee

Abstract:

Legal artificial intelligence (LegalAI) has been increasing applications within legal systems, propelled by advancements in natural language processing (NLP). Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. Most existing language models have difficulty understanding the long-distance dependencies between different structures. Another unique challenge is that while the Judiciary of Taiwan has released legal judgments from various levels of courts over the years, there remains a significant obstacle in the lack of labeled datasets. This deficiency makes it difficult to train models with strong generalization capabilities, as well as accurately evaluate model performance. To date, models in Taiwan have yet to be specifically trained on judgment data. Given these challenges, this research proposes a Longformer-based pre-trained language model explicitly devised for retrieving similar judgments in Taiwanese legal documents. This model is trained on a self-constructed dataset, which this research has independently labeled to measure judgment similarities, thereby addressing a void left by the lack of an existing labeled dataset for Taiwanese judgments. This research adopts strategies such as early stopping and gradient clipping to prevent overfitting and manage gradient explosion, respectively, thereby enhancing the model's performance. The model in this research is evaluated using both the dataset and the Average Entropy of Offense-charged Clustering (AEOC) metric, which utilizes the notion of similar case scenarios within the same type of legal cases. Our experimental results illustrate our model's significant advancements in handling similarity comparisons within extensive legal judgments. By enabling more efficient retrieval and analysis of legal case documents, our model holds the potential to facilitate legal research, aid legal decision-making, and contribute to the further development of LegalAI in Taiwan.

Keywords: legal artificial intelligence, computation and language, language model, Taiwanese legal cases

Procedia PDF Downloads 65

24241 Cognitive Science Based Scheduling in Grid Environment

Authors: N. D. Iswarya, M. A. Maluk Mohamed, N. Vijaya

Abstract:

Grid is infrastructure that allows the deployment of distributed data in large size from multiple locations to reach a common goal. Scheduling data intensive applications becomes challenging as the size of data sets are very huge in size. Only two solutions exist in order to tackle this challenging issue. First, computation which requires huge data sets to be processed can be transferred to the data site. Second, the required data sets can be transferred to the computation site. In the former scenario, the computation cannot be transferred since the servers are storage/data servers with little or no computational capability. Hence, the second scenario can be considered for further exploration. During scheduling, transferring huge data sets from one site to another site requires more network bandwidth. In order to mitigate this issue, this work focuses on incorporating cognitive science in scheduling. Cognitive Science is the study of human brain and its related activities. Current researches are mainly focused on to incorporate cognitive science in various computational modeling techniques. In this work, the problem solving approach of human brain is studied and incorporated during the data intensive scheduling in grid environments. Here, a cognitive engine is designed and deployed in various grid sites. The intelligent agents present in CE will help in analyzing the request and creating the knowledge base. Depending upon the link capacity, decision will be taken whether to transfer data sets or to partition the data sets. Prediction of next request is made by the agents to serve the requesting site with data sets in advance. This will reduce the data availability time and data transfer time. Replica catalog and Meta data catalog created by the agents assist in decision making process.

Keywords: data grid, grid workflow scheduling, cognitive artificial intelligence

Procedia PDF Downloads 390

24240 Heritage and Tourism in the Era of Big Data: Analysis of Chinese Cultural Tourism in Catalonia

Authors: Xinge Liao, Francesc Xavier Roige Ventura, Dolores Sanchez Aguilera

Abstract:

With the development of the Internet, the study of tourism behavior has rapidly expanded from the traditional physical market to the online market. Data on the Internet is characterized by dynamic changes, and new data appear all the time. In recent years the generation of a large volume of data was characterized, such as forums, blogs, and other sources, which have expanded over time and space, together they constitute large-scale Internet data, known as Big Data. This data of technological origin that derives from the use of devices and the activity of multiple users is becoming a source of great importance for the study of geography and the behavior of tourists. The study will focus on cultural heritage tourist practices in the context of Big Data. The research will focus on exploring the characteristics and behavior of Chinese tourists in relation to the cultural heritage of Catalonia. Geographical information, target image, perceptions in user-generated content will be studied through data analysis from Weibo -the largest social networks of blogs in China. Through the analysis of the behavior of heritage tourists in the Big Data environment, this study will understand the practices (activities, motivations, perceptions) of cultural tourists and then understand the needs and preferences of tourists in order to better guide the sustainable development of tourism in heritage sites.

Keywords: Barcelona, Big Data, Catalonia, cultural heritage, Chinese tourism market, tourists’ behavior

Procedia PDF Downloads 133

24239 Towards A Framework for Using Open Data for Accountability: A Case Study of A Program to Reduce Corruption

Authors: Darusalam, Jorish Hulstijn, Marijn Janssen

Abstract:

Media has revealed a variety of corruption cases in the regional and local governments all over the world. Many governments pursued many anti-corruption reforms and have created a system of checks and balances. Three types of corruption are faced by citizens; administrative corruption, collusion and extortion. Accountability is one of the benchmarks for building transparent government. The public sector is required to report the results of the programs that have been implemented so that the citizen can judge whether the institution has been working such as economical, efficient and effective. Open Data is offering solutions for the implementation of good governance in organizations who want to be more transparent. In addition, Open Data can create transparency and accountability to the community. The objective of this paper is to build a framework of open data for accountability to combating corruption. This paper will investigate the relationship between open data, and accountability as part of anti-corruption initiatives. This research will investigate the impact of open data implementation on public organization.

Keywords: open data, accountability, anti-corruption, framework

Procedia PDF Downloads 324

24238 Probabilistic Gathering of Agents with Simple Sensors: Distributed Algorithm for Aggregation of Robots Equipped with Binary On-Board Detectors

Authors: Ariel Barel, Rotem Manor, Alfred M. Bruckstein

Abstract:

We present a probabilistic gathering algorithm for agents that can only detect the presence of other agents in front of or behind them. The agents act in the plane and are identical and indistinguishable, oblivious, and lack any means of direct communication. They do not have a common frame of reference in the plane and choose their orientation (direction of possible motion) at random. The analysis of the gathering process assumes that the agents act synchronously in selecting random orientations that remain fixed during each unit time-interval. Two algorithms are discussed. The first one assumes discrete jumps based on the sensing results given the randomly selected motion direction, and in this case, extensive experimental results exhibit probabilistic clustering into a circular region with radius equal to the step-size in time proportional to the number of agents. The second algorithm assumes agents with continuous sensing and motion, and in this case, we can prove gathering into a very small circular region in finite expected time.

Keywords: control, decentralized, gathering, multi-agent, simple sensors

Procedia PDF Downloads 159

24237 Syndromic Surveillance Framework Using Tweets Data Analytics

Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden

Abstract:

Syndromic surveillance is to detect or predict disease outbreaks through the analysis of medical sources of data. Using social media data like tweets to do syndromic surveillance becomes more and more popular with the aid of open platform to collect data and the advantage of microblogging text and mobile geographic location features. In this paper, a Syndromic Surveillance Framework is presented with machine learning kernel using tweets data analytics. Influenza and the three cities Abu Dhabi, Al Ain and Dubai of United Arabic Emirates are used as the test disease and trial areas. Hospital cases data provided by the Health Authority of Abu Dhabi (HAAD) are used for the correlation purpose. In our model, Latent Dirichlet allocation (LDA) engine is adapted to do supervised learning classification and N-Fold cross validation confusion matrix are given as the simulation results with overall system recall 85.595% performance achieved.

Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza

Procedia PDF Downloads 106

24236 Analysis of Urban Population Using Twitter Distribution Data: Case Study of Makassar City, Indonesia

Authors: Yuyun Wabula, B. J. Dewancker

Abstract:

In the past decade, the social networking app has been growing very rapidly. Geolocation data is one of the important features of social media that can attach the user's location coordinate in the real world. This paper proposes the use of geolocation data from the Twitter social media application to gain knowledge about urban dynamics, especially on human mobility behavior. This paper aims to explore the relation between geolocation Twitter with the existence of people in the urban area. Firstly, the study will analyze the spread of people in the particular area, within the city using Twitter social media data. Secondly, we then match and categorize the existing place based on the same individuals visiting. Then, we combine the Twitter data from the tracking result and the questionnaire data to catch the Twitter user profile. To do that, we used the distribution frequency analysis to learn the visitors’ percentage. To validate the hypothesis, we compare it with the local population statistic data and land use mapping released by the city planning department of Makassar local government. The results show that there is the correlation between Twitter geolocation and questionnaire data. Thus, integration the Twitter data and survey data can reveal the profile of the social media users.

Keywords: geolocation, Twitter, distribution analysis, human mobility

Procedia PDF Downloads 308

24235 Analysis and Rule Extraction of Coronary Artery Disease Data Using Data Mining

Authors: Rezaei Hachesu Peyman, Oliyaee Azadeh, Salahzadeh Zahra, Alizadeh Somayyeh, Safaei Naser

Abstract:

Coronary Artery Disease (CAD) is one major cause of disability in adults and one main cause of death in developed. In this study, data mining techniques including Decision Trees, Artificial neural networks (ANNs), and Support Vector Machine (SVM) analyze CAD data. Data of 4948 patients who had suffered from heart diseases were included in the analysis. CAD is the target variable, and 24 inputs or predictor variables are used for the classification. The performance of these techniques is compared in terms of sensitivity, specificity, and accuracy. The most significant factor influencing CAD is chest pain. Elderly males (age > 53) have a high probability to be diagnosed with CAD. SVM algorithm is the most useful way for evaluation and prediction of CAD patients as compared to non-CAD ones. Application of data mining techniques in analyzing coronary artery diseases is a good method for investigating the existing relationships between variables.

Keywords: classification, coronary artery disease, data-mining, knowledge discovery, extract

Procedia PDF Downloads 652

24234 Sensor Data Analysis for a Large Mining Major

Authors: Sudipto Shanker Dasgupta

Abstract:

One of the largest mining companies wanted to look at health analytics for their driverless trucks. These trucks were the key to their supply chain logistics. The automated trucks had multi-level sub-assemblies which would send out sensor information. The use case that was worked on was to capture the sensor signal from the truck subcomponents and analyze the health of the trucks from repair and replacement purview. Open source software was used to stream the data into a clustered Hadoop setup in Amazon Web Services cloud and Apache Spark SQL was used to analyze the data. All of this was achieved through a 10 node amazon 32 core, 64 GB RAM setup real-time analytics was achieved on ‘300 million records’. To check the scalability of the system, the cluster was increased to 100 node setup. This talk will highlight how Open Source software was used to achieve the above use case and the insights on the high data throughput on a cloud set up.

Keywords: streaming analytics, data science, big data, Hadoop, high throughput, sensor data

Procedia PDF Downloads 400

24233 Data-Centric Anomaly Detection with Diffusion Models

Authors: Sheldon Liu, Gordon Wang, Lei Liu, Xuefeng Liu

Abstract:

Anomaly detection, also referred to as one-class classification, plays a crucial role in identifying product images that deviate from the expected distribution. This study introduces Data-centric Anomaly Detection with Diffusion Models (DCADDM), presenting a systematic strategy for data collection and further diversifying the data with image generation via diffusion models. The algorithm addresses data collection challenges in real-world scenarios and points toward data augmentation with the integration of generative AI capabilities. The paper explores the generation of normal images using diffusion models. The experiments demonstrate that with 30% of the original normal image size, modeling in an unsupervised setting with state-of-the-art approaches can achieve equivalent performances. With the addition of generated images via diffusion models (10% equivalence of the original dataset size), the proposed algorithm achieves better or equivalent anomaly localization performance.

Keywords: diffusion models, anomaly detection, data-centric, generative AI

Procedia PDF Downloads 77

24232 Regulation on the Protection of Personal Data Versus Quality Data Assurance in the Healthcare System Case Report

Authors: Elizabeta Krstić Vukelja

Abstract:

Digitization of personal data is a consequence of the development of information and communication technologies that create a new work environment with many advantages and challenges, but also potential threats to privacy and personal data protection. Regulation (EU) 2016/679 of the European Parliament and of the Council is becoming a law and obligation that should address the issues of personal data protection and information security. The existence of the Regulation leads to the conclusion that national legislation in the field of virtual environment, protection of the rights of EU citizens and processing of their personal data is insufficiently effective. In the health system, special emphasis is placed on the processing of special categories of personal data, such as health data. The healthcare industry is recognized as a particularly sensitive area in which a large amount of medical data is processed, the digitization of which enables quick access and quick identification of the health insured. The protection of the individual requires quality IT solutions that guarantee the technical protection of personal categories. However, the real problems are the technical and human nature and the spatial limitations of the application of the Regulation. Some conclusions will be drawn by analyzing the implementation of the basic principles of the Regulation on the example of the Croatian health care system and comparing it with similar activities in other EU member states.

Keywords: regulation, healthcare system, personal dana protection, quality data assurance

Procedia PDF Downloads 32