pdf-parsing

py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

pypdf2 pdf Python pdf-parser pdf-parsing pdf-manipulation pdf-documents help-wanted

Python

9325

1488

5 days ago

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

pdf pdf-parsing table-extraction

Python

8160

766

1 month ago

galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams

pdf-generation pdf-parsing Node.js pdf-manipulation

1168

168

20 days ago

adithya-s-k / marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

FastAPI pdf-converter pdf-files pdf-parser pdf-parsing API REST API

Python

884

105

10 months ago

drmingler / docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.

API FastAPI markdown-parser pdf-conversion pdf-converter pdf-parser pdf-parsing pdf-to-markdown

Python

672

6 months ago

jstockwin / py-pdf-parser

A Python tool to help extract information from structured PDFs.

pdf Parsing pdf-parsing

Python

410

8 days ago

chunyenHuang / hummusRecipe

A powerful PDF tool for NodeJS based on HummusJS.

pdf pdf-files pdf-generation pdf-parsing pdf-manipulation Node.js

JavaScript

348

2 years ago

thoqbk / traprange

(Java) A Method to Extract Tabular Content from PDF Files

Java pdf pdfbox Parser pdf-parsing pdf-manipulation pdf-files

HTML

335

132

2 years ago

ck-unifr / pdf_parsing

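PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction based on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit

langchain large-language-model pdf pdf-parsing rwkv Python chatglm2-6b information-extraction chatpdf Streamlit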

Python

206

2 years ago

ScientaNL / pdf-extractor

Node.js module for rendering pdf pages to images, svgs, html files, text files and json metadata

pdf-parsing Node.js image-generation

JavaScript

100

2 years ago

iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

document-conversion document-processing information-retrieval pdf-parsing pdf-to-markdown Python rag retrieval-augmented-generation text-extraction pdf-converter

Python

9 months ago

rostrovsky / pdf-table

Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV

OpenCV opencv3 pdfbox tables table Java java-library pdf-parsing

Java

2 years ago

hellpanderrr / linkedin-pdf-parsing

Parsing resumes in PDF format from LinkedIn

linkedin Python pdf-parsing resume-parser

Python

9 years ago

tuffstuff9 / nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

content-extraction filepond Next pdf-parser pdf-parsing

TypeScript

2 years ago

dipietrantonio / pdf4py

A PDF parser written in Python 3 with no external dependencies.

pdf Parser pdf-parsing Python information-extraction

Python

5 years ago

abdullahshafiq-20 / ResumeTex

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.

automation developer-tools document-processing Express LaTeX Node.js Open Source pdf-parsing React resume Tailwind CSS TeX

JavaScript

2 days ago

DQ-Zhang / refchaser

A Python tool for checking reference lists in systematic reviews and literature reviews. It helps with backward and forward reference searching by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, and downloads full-text PDFs of research articles in batch.

research-paper text-mining pdf-parsing

Python

5 years ago

adrienjoly / npm-pdfreader-example

Example of use of pdfreader: parse a PDF résumé

pdf-parsing Example

JavaScript

3 years ago

malice-plugins / pdf

Malice PDF Plugin

malice Malware pdf plugin pdf-parsing Docker malware-analysis

Python

7 years ago

aimaster-dev / chatbot-using-rag-and-langchain

Chat with your PDFs using AI! This Streamlit app uses RAG, LangChain, FAISS, and OpenAI to let you ask questions and get answers with page and file references.

chatbot langchain large-language-model rag Streamlit artificial-intelligence chat-ui document-search embeddings faiss natural-language-processing openai pdf pdf-parsing Python semantic-search vector-store

Python

3 months ago