extract-text

dbashford / textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

extract-text extraction Node.js

HTML

1684

194

3 years ago

pd3f / pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

pdf text-extraction pdf-to-text pipeline Machine Learning OCR language-model extract-text parsr Python

HTML

326

2 years ago

ropensci-archive / fulltext

⚠ ARCHIVED ⚠ Search across and get full text for OA & closed journals

pdf metadata Open Access XML extract-text rstats R r-package

270

3 years ago

opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database

etl Python OCR enrichment solr elasticsearch extract extract-text extractor extract-information RDF (Resource Description Framework) documents pdf named-entity-recognition annotation ingestion-pipeline Natural Language Processing

Python

270

3 years ago

KevM / tikaondotnet

Use the Java Tika text extraction library on the .NET platform

tika extract-text

Rich Text Format

206

1 year ago

ahmedkhemiri95 / PDFs-TextExtract

Multiple and Large PDF Documents Text Extraction.

pdf Parser Data Science Python pdf-processing extract-text pdf-document pypdf2 pdfs

Python

130

6 months ago

lu4p / cat

Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.

text-extraction cross-platform Go cat extract-text

102

2 years ago

zetahernandez / pdf-to-text

Read PDF files in JavaScript

pdf extract-text JavaScript

JavaScript

5 years ago

BitMiracle / Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

pdf-library pdf-to-text pdf-signature pdf-generation extract-text net-core pdf-manipulation pdf-parser html-to-pdf

Visual Basic .NET

15 days ago

ropensci / antiword

R wrapper for antiword utility

extract-text R rstats r-package

4 months ago

ropensci / rtika

R Interface to Apache Tika

R rstats r-package peer-reviewed tika extract-text pdf-files Parsing Java tesseract

2 years ago

ApryseSDK / pdftron-document-search

Build search across multiple documents client-side in your file storage

algolia-instantsearch extract-text

JavaScript

2 years ago

OpenJarbas / simple_NER

simple rule based named entity recognition

ner named-entity-recognition annotation-tool extract-information extract-text Natural Language Processing nlp-library keywords information-extraction

Python

4 years ago

AllanCameron / PDFR

An R package to extract text from pdf.

pdf extract-text data-scientists

C++

2 years ago

maxim2266 / OCR

A collection of tools for OCR (optical character recognition).

OCR ocr-recognition Bash Linux tesseract extract-text C

10 months ago

datalogics / pdf-rest-api-samples

pdfRest API Toolkit is a REST API service for processing PDF documents, made by developers, for developers. Rapidly integrate PDF workflows with your existing projects and applications, simply and seamlessly. Get started for free in seconds.

pdf pdf-converter pdf-document pdf-document-processor pdf-files REST API web-api convert-to-pdf extract-text OCR pdf-library pdfa

Java

13 days ago

bhattbhavesh91 / google-vision-api-for-ocr-demo

A small demo of extracting text from images via OCR using the Google Vision API in Python

google-vision Python extract-text Demo

Jupyter Notebook

4 years ago

Zoltanar / Happy-Reader

VNDB explorer and VNR-like text hooker.

extract-text game-launcher WPF

3 months ago

rlayers / pawpaw

Text Processing & Segmentation Framework

Natural Language Processing text-processing information-extraction extract-text knowledge-graph Python Parser query-engine query-language tree xml-parser Parsing

Python

5 months ago

TwistAtom / ZWSP-Tool

ZWSP-Tool is a toolkit for manipulating zero-width spaces quickly and easily. In particular, it can detect, clean, hide, extract and brute-force text containing zero-width spaces.

Python Tools toolkit Steganography steganography-algorithms hide-messages extract-text bruteforce bruteforcing encryption

Python

5 years ago