Search results for: BRASSINOSTEROID INSENSITIVE 1
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 62

2 A Clustering-Based Approach for Weblog Data Cleaning

Authors: Amine Ganibardi, Cherif Arab Ali

Abstract:

This paper addresses data cleaning as part of web usage data preprocessing within the scope of Web Usage Mining. Weblog data recorded by web servers in log files reflect usage activity, i.e., end-users’ clicks and the underlying user-agents’ hits. As Web Usage Mining is interested in end-users’ behavior, user-agents’ hits are treated as noise to be cleaned off before mining. Filtering hits from clicks is not trivial for two reasons: a server records requests interlaced in sequential order regardless of their source or type, and website resources may be set up to be requestable interchangeably by end-users and user-agents. Current methods are content-centric, relying on filtering heuristics that classify items as relevant or irrelevant according to cleaning attributes such as the filetype extensions of website resources, the resources pointed to by hyperlinks/URIs, HTTP methods, and user-agents. These methods need exhaustive extra-weblog data and prior knowledge of which items are relevant or irrelevant, so that the filtering heuristics can label them as clicks or hits. Such methods are not appropriate for the dynamic/responsive Web for three reasons: resources may be set up as clickable by end-users regardless of their type, website resources are indexed by frame names without filetype extensions, and web content is generated and cancelled differently from one end-user to another. To overcome these constraints, a clustering-based cleaning method centered on the logging structure is proposed. This method focuses on the statistical properties of the logging structure at the level of the requested and referring resource attributes. It is insensitive to logging content and does not need extra-weblog data. The statistical property used reflects the structure of the logging feature that webpage requests generate in terms of clicks and hits.
Since a webpage consists of a single URI and several components, this feature results in a ratio of a single click to multiple hits in terms of the requested and referring resources. Thus, the clustering-based method is meant to identify two clusters by applying an appropriate distance to the frequency matrix of the requested and referring resources. Since the click-to-hit ratio is one to many, the clicks cluster is the one containing the fewest requests. Hierarchical Agglomerative Clustering based on a pairwise distance (Gower) and average linkage has been applied to four logfiles of dynamic/responsive websites whose click-to-hit ratios range from 1/2 to 1/15. The optimal clustering, selected on the basis of average linkage and maximum inter-cluster inertia, always yields two clusters. Evaluating the smaller cluster, referred to as the clicks cluster, with confusion-matrix indicators yields a true positive rate of 97%. The content-centric cleaning methods, i.e., conventional and advanced cleaning, achieved a lower rate of 91%. Thus, the proposed clustering-based cleaning outperforms the content-centric methods for dynamic and responsive web design without requiring any extra-weblog data. Such an improvement in cleaning quality is likely to refine dependent analyses.
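
A minimal sketch of the cleaning idea, for illustration only and not the authors' implementation. It assumes a hypothetical per-request feature pair (how often the requested URI is requested, and how often that URI appears as a referrer), uses the Gower distance restricted to numeric features (range-normalised absolute differences averaged over features), merges clusters with average linkage until two remain, and labels the smaller cluster as clicks. Here the number of clusters is fixed at two rather than selected by maximum inter-cluster inertia; the toy log and the helper names (`frequency_features`, `average_linkage_hac`, `true_positive_rate`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_features(log):
    """Hypothetical features per logged request: how often its requested URI is
    requested, and how often that same URI appears as a referrer."""
    requested = Counter(req for req, _ in log)
    referring = Counter(ref for _, ref in log)
    return [(requested[req], referring[req]) for req, _ in log]


def gower_distance(a, b, ranges):
    """Gower distance for numeric features: mean of range-normalised absolute
    differences. Constant features (range 0) contribute nothing."""
    total = sum(abs(x - y) / r for x, y, r in zip(a, b, ranges) if r > 0)
    return total / len(ranges)


def average_linkage_hac(rows, n_clusters=2):
    """Naive agglomerative clustering with average linkage; returns lists of row indices."""
    ranges = [max(col) - min(col) for col in zip(*rows)]
    dist = {}
    for i, j in combinations(range(len(rows)), 2):
        dist[i, j] = dist[j, i] = gower_distance(rows[i], rows[j], ranges)
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs of requests.
            d = sum(dist[i, j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a (a < b, so deleting b is safe)
        del clusters[b]
    return clusters


def true_positive_rate(predicted, actual):
    """TPR = TP / (TP + FN), over sets of request indices labelled as clicks."""
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    return tp / (tp + fn)


# Toy log of (requested URI, referrer) pairs; entries 0 and 4 are the page clicks.
log = [
    ("/home", "-"),
    ("/home/style.css", "/home"),
    ("/home/logo.png", "/home"),
    ("/home/app.js", "/home"),
    ("/news", "/home"),
    ("/news/photo.jpg", "/news"),
    ("/news/style2.css", "/news"),
]
clusters = average_linkage_hac(frequency_features(log), n_clusters=2)
clicks = sorted(min(clusters, key=len))  # clicks cluster = the one with fewer requests
tpr = true_positive_rate(set(clicks), {0, 4})
print(clicks, tpr)  # → [0, 4] 1.0
```

In the toy log, each page URI is referenced by its components, so pages stand apart on the referrer-frequency feature and land in the small cluster. The naive merge loop scales poorly with the number of requests; a real logfile would call for an optimized hierarchical clustering library.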

Keywords: clustering approach, data cleaning, data preprocessing, weblog data, web usage data

1 Electronic Raman Scattering Calibration for Quantitative Surface-Enhanced Raman Spectroscopy and Improved Biostatistical Analysis

Authors: Wonil Nam, Xiang Ren, Inyoung Kim, Masoud Agah, Wei Zhou

Abstract:

Despite its ultrasensitive detection capability, surface-enhanced Raman spectroscopy (SERS) faces challenges as a quantitative biochemical analysis tool because the local field intensity in hotspots depends strongly on nanoscale geometric variations of plasmonic nanostructures. Therefore, despite enormous progress in the plasmonic nanoengineering of high-performance SERS devices, it is still challenging to quantitatively correlate measured SERS signals with actual molecule concentrations at hotspots. Significant effort has been devoted to developing SERS calibration methods that introduce internal standards, which has been achieved by placing Raman tags at plasmonic hotspots. Raman tags undergo similar SERS enhancement at the same hotspots, so ratiometric SERS signals for analytes of interest can be generated with reduced dependence on geometric variations. However, Raman tags still face challenges in real-world applications, including spatial competition between the analyte and tags in hotspots, spectral interference, and laser-induced degradation/desorption due to plasmon-enhanced photochemical/photothermal effects. We show that electronic Raman scattering (ERS) signals from metallic nanostructures at hotspots can serve as an internal calibration standard to enable quantitative SERS analysis and improve biostatistical analysis. We perform SERS with Au-SiO₂ multilayered metal-insulator-metal nano-laminated plasmonic nanostructures. Since the ERS signal is proportional to the volume density of electron-hole occupation in hotspots, the ERS signal increases exponentially as the wavenumber approaches zero. By using a long-pass filter, as is common in backscattered SERS configurations, to cut off the ERS background continuum, we can observe an ERS pseudo-peak, I_ERS. Both ERS and SERS processes experience the |E|⁴ local enhancement during the excitation and inelastic scattering transitions.
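
Under a deliberately simplified model (an assumption, not the authors' derivation), both signals collected from the same hotspot scale with the same local enhancement factor |E|⁴, while hypothetical lumped constants K_SERS and K_ERS absorb everything that does not vary between hotspots. Taking the ratio then cancels the hotspot-dependent factor and leaves a quantity proportional to the analyte concentration c:

```latex
I_{\mathrm{SERS}} = K_{\mathrm{SERS}}\,|E|^{4}\,c,
\qquad
I_{\mathrm{ERS}} = K_{\mathrm{ERS}}\,|E|^{4}
\quad\Longrightarrow\quad
\frac{I_{\mathrm{SERS}}}{I_{\mathrm{ERS}}} = \frac{K_{\mathrm{SERS}}}{K_{\mathrm{ERS}}}\,c
```

Because the ERS pseudo-peak and the analyte bands sit at different Raman shifts, and the ERS term depends on the electron-hole density in the metal, this cancellation may hold only approximately in practice.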
We calibrated I_SERS of 10 μM Rhodamine 6G in solution by I_ERS. The results show that ERS calibration generates a new analytical value, I_SERS/I_ERS, that is insensitive to variations between hotspots and can thus quantitatively reflect molecular concentration. Given the calibration capability of ERS signals, we performed label-free SERS analysis of living biological systems using four breast cell lines, normal and cancerous, cultured on nano-laminated SERS devices. 2D Raman mapping was conducted over a 100 μm × 100 μm area containing several cells. The SERS spectra were subsequently analyzed by multivariate analysis using partial least squares discriminant analysis (PLS-DA). Remarkably, after ERS calibration, MCF-10A and MCF-7 cells are further separated while the two triple-negative breast cancer cell lines (MDA-MB-231 and HCC-1806) overlap more, in good agreement with the well-known cancer categorization by degree of malignancy. To assess the strength of ERS calibration, we further carried out a drug efficacy study using MDA-MB-231 cells and different concentrations of the anti-cancer drug paclitaxel (PTX). After ERS calibration, we can more clearly segregate the control/low-dosage groups (0 and 1.5 nM), the middle-dosage group (5 nM), and the group treated at the half-maximal inhibitory concentration (IC50, 15 nM). We therefore envision that ERS-calibrated SERS can find crucial opportunities in label-free molecular profiling of complex biological systems.
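
To illustrate why the calibrated value I_SERS/I_ERS is less sensitive to hotspot variation, the sketch below builds synthetic map pixels under the simplifying assumption that both signals at a pixel share one enhancement factor |E|⁴. The enhancement factors, the proportionality constants, and the function names are hypothetical, and the numbers are not measured data. It compares the coefficient of variation (population standard deviation divided by the mean) of the raw and ERS-calibrated intensities.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def ers_calibrate(i_sers, i_ers):
    """Divide each pixel's SERS intensity by its ERS pseudo-peak intensity."""
    return [s / e for s, e in zip(i_sers, i_ers)]


# Hypothetical per-pixel enhancement factors |E|^4 (synthetic, not measured).
enhancement = [1.0e6, 3.5e6, 0.8e6, 2.2e6, 5.0e6]
concentration = 10.0            # same analyte concentration at every pixel (assumption)
k_sers, k_ers = 2.0e-3, 5.0e-2  # hypothetical lumped proportionality constants

# Both signals scale with the same local enhancement at each pixel (simplified model).
i_sers = [k_sers * concentration * g for g in enhancement]
i_ers = [k_ers * g for g in enhancement]
calibrated = ers_calibrate(i_sers, i_ers)

cv_raw = coefficient_of_variation(i_sers)
cv_calibrated = coefficient_of_variation(calibrated)
print(f"raw CV = {cv_raw:.3f}, calibrated CV = {cv_calibrated:.1e}")
```

With these synthetic values the raw intensities vary by about 63% across pixels, whereas the calibrated values collapse to a constant (0.4 here) apart from floating-point rounding. Real maps would show residual spread from noise and from any enhancement that differs between the ERS and analyte bands.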

Keywords: cancer cell drug efficacy, plasmonics, surface-enhanced Raman spectroscopy (SERS), SERS calibration
