pdf-processing

Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.

openai TypeScript gpt-3 gpt-4 langchain Mongoose Next openai-api chat chatbot document-embedding pdf-processing pinecone React Tailwind CSS vectorization

TypeScript

863

147

2 years ago

allenai / papermage

library supporting NLP and CV research on scientific papers

computer-vision machine-learning multimodal natural-language-processing pdf-processing scientific-papers Python

Python

785

1 year ago

ahmedkhemiri95 / PDFs-TextExtract

Multiple and Large PDF Documents Text Extraction.

pdf Parser data-science Python pdf-processing extract-text pdf-document pypdf2 pdfs

Python

131

8 months ago

Tele-AI / doc-ops-mcp

MCP server for seamless document format conversion and processing

document-conversion document-processing docx-to-pdf file-converter markdown-converter pdf-conversion watermark pdf-processing

TypeScript

129

6 days ago

postralai / masquerade

The Privacy Firewall for LLMs

claude mcp pdf-processing privacy mcp-server model-context-protocol anonymization

Python

2 months ago

aws-samples / document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

machine-learning Amazon Web Services cdk aws-lambda amazon-web-services amazon-textract amazon-dynamodb amazon-s3 amazon-sqs aws-cdk pdf-processing image-processing data-analytics data-lineage data-governance

Python

4 years ago

PSPDFKit / nutrient-dws-client-python

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

ocr-python pdf-converter pdf-document-processor pdf-generation pdf-processing Python

Python

1 month ago

PSPDFKit-labs / nutrient-dws-client-typescript

This library provides a type-safe and ergonomic interface for document processing operations including conversion, merging, compression, watermarking, and text extraction using Nutrient DWS Processor API.

pdf-converter pdf-document-processor pdf-generation pdf-processing TypeScript

TypeScript

1 month ago

autollama / autollama

Anthropic's Contextual Retrieval implementation with visual chunk comparison. Preview context enrichment before/after embedding.

artificial-intelligence automation chatbot Docker document-processing embeddings knowledge-base large-language-models Node.js openai pdf-processing rag React semantic-search vector-database

HTML

10 days ago

Govind-S-B / pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. A separate script queries the Chroma DB for similarity search based on user input.

chromadb pdf-processing similarity-search text-extraction

Python

2 years ago
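The chunk, embed, store, and query pipeline described for pdf-to-text-chroma-search can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's code: it assumes the PDF text has already been extracted, and it swaps GPT4All embeddings and Chroma DB for a toy bag-of-words vector and an in-memory list. The names `split_into_chunks`, `embed`, `cosine`, and `InMemoryStore` are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of `chunk_size`, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous window's tail.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in (assumption) for an embedding model such as GPT4All: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical in-memory stand-in for a vector database such as Chroma."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        # Store each chunk alongside its vector representation.
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def query(self, question: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the k stored chunks most similar to the question, best first."""
        q = embed(question)
        scored = [(chunk, cosine(q, vec)) for chunk, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryStore()
    store.add(["the cat sat on the mat", "stock prices rose sharply today", "a cat chased the mouse"])
    for chunk, score in store.query("where did the cat sit", k=2):
        print(f"{score:.3f}  {chunk}")
```

The overlap between consecutive chunks keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; a real setup would replace `embed` with a learned embedding model and the list with a persistent vector store.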

tetratensor / ML-powered_resume_analyser

Local, privacy-friendly resume analysis: convert, classify, and get advice using TF‑IDF, Logistic Regression, and sentence-transformer embeddings.

data-science machine-learning natural-language-processing pdf-processing Python resume-analysis sentence-transformers scikit-learn text-classification

Python

11 days ago

ranguy9304 / LangGraphRAG

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

chatbot information-retrieval langgraph natural-language-processing openai-api pdf-processing Python rag vector-database web-scraping

Python

1 year ago

ManasMadan / pdf-actions

An npm package built on top of pdf-lib that provides functionality such as merging, rotating, and splitting PDFs, downloading PDFs to disk, and more.

pdf pdf-merge pdf-merger React react-component pdf-split pdf-processing pdf-lib pdf-rotate JavaScript npm

JavaScript

2 years ago

Remy2404 / Polymind

A powerful, multi-modal Telegram bot leveraging cutting-edge AI technologies including Gemini, DeepSeek, OpenRouter, and 50+ AI models for comprehensive conversational assistance, media processing, and collaborative features with MCP (Model Context Protocol) integration.

gemini Telegram deepseek-r1 image-processing voice voice-recognition ai-assistant multi-model openrouter pdf-processing

Python

7 days ago

DioCrafts / ai-book-summarizer

📚 AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool uses AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

artificial-intelligence automation document-analysis knowledge-extraction machine-learning Markdown natural-language-processing openai pdf pdf-processing Python study-materials text-analysis

Python

6 days ago

ManasMadan / PDFActions

Built with pdf-actions NPM package.

React pdf react-components react-component pdf-merge pdf-merger pdf-split pdf-rotate pdf-lib pdf-processing

JavaScript

1 year ago

Inc44 / MaTools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

application audio-processing file-management GUI image-processing OCR pdf-processing productivity Python Qt Rust speech-recognition video-processing youtube-downloader

Python

7 months ago

enesmanan / paper-bold

AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy

gemini-api langchain pdf-processing rag academic-paper

HTML

6 months ago

allanninal / document-summarizer

The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backend, it supports PDF and text files, offering real-time summarization for researchers, students, and professionals.

ai-tools Flask huggingface natural-language-processing pdf-processing React Vite

JavaScript

10 months ago

noorjotk / local-rag-engine

Local RAG app with zero-config Docker setup. FastAPI + Streamlit + Qdrant + Ollama. Just run `docker-compose up --build`! 🚀

large-language-models rag Docker FastAPI local-ai local-llm ollama pdf-processing Python qdrant qdrant-vector-database semantic-search Streamlit vector-database

Python

2 months ago