Deepak Mishra


3 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili


Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. This paper extends existing work in template-based document extraction and devises a system that reliably extracts all required fields for up to 70% of all documents in the data set, more than any previously reported method. Approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates from as few as one training sample and requires only the ground truth field values rather than detailed annotations such as bounding boxes, which are hard to obtain. The system is deployed and used inside commercial accounting software.
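The idea of composing an extraction rule from one training sample and its ground truth value can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `generalize_value` and `compose_rule`, the `context_chars` parameter, and the two-component design (a literal label anchor plus a generalized value pattern) are assumptions made purely for illustration.

```python
import re

def generalize_value(value):
    """Turn a ground-truth value into a character-class pattern.

    Runs of digits, letters, or whitespace collapse into one class token;
    any other character is kept as an escaped literal.
    """
    parts = []
    for ch in value:
        if ch.isdigit():
            token, is_class = r"\d+", True
        elif ch.isalpha():
            token, is_class = r"[^\W\d_]+", True
        elif ch.isspace():
            token, is_class = r"\s+", True
        else:
            token, is_class = re.escape(ch), False
        # Skip a class token that merely continues the current run.
        if is_class and parts and parts[-1] == token:
            continue
        parts.append(token)
    return "".join(parts)

def compose_rule(text, value, context_chars=20):
    """Compose a regex from two components (hypothetical design):
    the label text preceding the value on the same line, used as an anchor,
    and a generalized pattern for the value itself."""
    idx = text.find(value)
    if idx < 0:
        return None  # the ground-truth value does not occur in the text
    line_start = text.rfind("\n", 0, idx) + 1
    prefix = text[max(line_start, idx - context_chars):idx]
    anchor = r"\s*".join(re.escape(word) for word in prefix.split())
    return re.compile(anchor + r"\s*(" + generalize_value(value) + ")")

# One training sample with its ground-truth amount.
sample = "Invoice No: 4711\nTotal amount: 119.00 EUR\n"
rule = compose_rule(sample, "119.00")

# Apply the generated rule to a new document that follows the same template.
new_doc = "Invoice No: 4712\nTotal amount: 87.5 EUR\n"
extracted = rule.search(new_doc).group(1)
```

A rule built this way needs only the field value, not a bounding box, which mirrors the annotation requirement stated in the abstract; a production system would need to handle values that occur more than once or are split across OCR tokens.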

Keywords: Business, Data Mining, Information Retrieval, Feature Extraction, Business Data Processing, layout, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

2 Risks for Cyanobacteria Harmful Algal Blooms in Georgia Piedmont Waterbodies Due to Land Management and Climate Interactions

Authors: Sam Weber, Deepak Mishra, Susan Wilde, Elizabeth Kramer


The frequency and severity of cyanobacteria harmful algal blooms (CyanoHABs) have been increasing over time, with point and non-point source eutrophication and shifting climate paradigms blamed as the primary culprits. Excessive nutrients, warm temperatures, quiescent water, and heavy and less regular rainfall create more conducive environments for CyanoHABs. CyanoHABs have the potential to produce a spectrum of toxins that cause gastrointestinal stress, organ failure, and even death in humans and animals. To promote enhanced, proactive CyanoHAB management, risk modeling using geospatial tools can act as a predictive mechanism to supplement current CyanoHAB monitoring, management, and mitigation efforts. The risk maps would empower water managers to focus their efforts on high-risk waterbodies in an attempt to prevent CyanoHABs before they occur, and/or to observe those waterbodies more diligently. For this research, exploratory spatial data analysis techniques were used to identify the strongest predictors of CyanoHABs based on remote sensing-derived cyanobacteria cell density values for 771 waterbodies in the Georgia Piedmont and landscape characteristics of their watersheds. In-situ datasets for cyanobacteria cell density, nutrients, temperature, and rainfall patterns are not widely available, so free gridded geospatial datasets were used as proxy variables for assessing CyanoHAB risk. For example, the percent of a watershed that is agriculture was used as a proxy for nutrient loading, and the summer precipitation within a watershed was used as a proxy for water quiescence. Cyanobacteria cell density values were calculated using atmospherically corrected images from the European Space Agency’s Sentinel-2A satellite and multispectral instrument sensor at a 10-meter ground resolution.
Seventeen explanatory variables were calculated for each watershed utilizing the multi-petabyte geospatial catalogs available within the Google Earth Engine cloud computing interface. The seventeen variables were then used in a multiple linear regression model, and the strongest predictors of cyanobacteria cell density were selected for the final regression model. The seventeen explanatory variables included land cover composition, winter and summer temperature and precipitation data, topographic derivatives, vegetation index anomalies, and soil characteristics. Watershed maximum summer temperature, percent agriculture, percent forest, percent impervious, and waterbody area emerged as the strongest predictors of cyanobacteria cell density with an adjusted R-squared value of 0.31 and a p-value ~ 0. The final regression equation was used to make a normalized cyanobacteria cell density index, and a Jenks Natural Break classification was used to assign waterbodies designations of low, medium, or high risk. Of the 771 waterbodies, 24.38% were low risk, 37.35% were medium risk, and 38.26% were high risk. This study showed that there are significant relationships between free geospatial datasets representing summer maximum temperatures, nutrient loading associated with land use and land cover, and the area of a waterbody with cyanobacteria cell density. This data analytics approach to CyanoHAB risk assessment corroborated the literature-established environmental triggers for CyanoHABs, and presents a novel approach for CyanoHAB risk mapping in waterbodies across the greater southeastern United States.
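The last step described above, normalizing the predicted cell density and splitting it into low, medium, and high risk with natural breaks, can be sketched as follows. This is an illustrative brute-force version for three classes, not the procedure or software used in the study; the function names and the sample `predicted` values are hypothetical, and the exhaustive search is only practical for small inputs.

```python
def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def sum_sq_dev(xs):
    """Sum of squared deviations from the class mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def natural_breaks_3(values):
    """Brute-force natural breaks for exactly three classes.

    Tries every pair of split positions in the sorted data and keeps the one
    with the smallest total within-class squared deviation. Returns the upper
    bounds of the low and medium classes. Requires at least three values.
    """
    xs = sorted(values)
    n = len(xs)
    best = None
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            cost = sum_sq_dev(xs[:i]) + sum_sq_dev(xs[i:j]) + sum_sq_dev(xs[j:])
            if best is None or cost < best[0]:
                best = (cost, xs[i - 1], xs[j - 1])
    return best[1], best[2]

def classify(value, breaks):
    """Assign a risk label given the (low_max, medium_max) break values."""
    low_max, medium_max = breaks
    if value <= low_max:
        return "low"
    if value <= medium_max:
        return "medium"
    return "high"

# Hypothetical regression outputs for eight waterbodies (not study data).
predicted = [2.0, 2.2, 2.1, 5.0, 5.3, 5.1, 9.0, 9.4]
index = min_max_normalize(predicted)
breaks = natural_breaks_3(index)
labels = [classify(v, breaks) for v in index]
```

For a dataset the size of the one in the study (771 waterbodies), a dynamic-programming implementation of natural breaks would be the more sensible choice than this exhaustive search.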

Keywords: Remote Sensing, Cyanobacteria, Risk Mapping, land use/land cover

1 Development and Characterization of Double Liposomes Based Dual Drug Delivery System for H. Pylori Targeting

Authors: Ashish Kumar Jain, Deepak Mishra


The objective of the present investigation was to prepare and evaluate a vesicular dual drug delivery system for effective management of mucosal ulcers. Inner liposomes and double liposomes were prepared by the glass-bead and reverse-phase evaporation methods, respectively. The formulation consisted of inner liposomes bearing Ranitidine Bismuth Citrate (RBC) and outer liposomes encapsulating Amoxicillin trihydrate (AMOX). The optimized inner liposomes and double liposomes were extensively characterized for vesicle size, morphology, zeta potential, vesicle count, entrapment efficiency, and in vitro drug release. In vitro, the double liposomes demonstrated sustained release of AMOX and RBC, reaching 91.4±1.8% and 77.2±2.1%, respectively, at the end of 72 h. Furthermore, binding specificity and targeting propensity toward H. pylori (SKP-56) were confirmed by agglutination and in situ adherence assays. The absolute-alcohol-induced ulcerogenic index was reduced from 3.01 ± 0.25 to 0.31 ± 0.09, and a 100% H. pylori clearance rate was observed. These results suggest that double liposomes are a potential vector for dual drug delivery in the effective treatment of H. pylori-associated peptic ulcer.

Keywords: double liposomes, H. pylori targeting, PE liposomes, glass-beads method, peptic ulcers
