Repository navigation

#

pdf-processing

TypeScript
848
2 年前
Python
761
5 个月前
Python
128
2 个月前

Python scripts that converts PDF files to text, splits them into chunks, and stores their vector representations using GPT4All embeddings in a Chroma DB. It also provides a script to query the Chroma DB for similarity search based on user input.

Python
23
1 年前

A NPM Package built on top of pdf-lib that provides functonalities like merge, rotate, split,download pdf to disk and many more...

JavaScript
13
1 年前

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

Python
7
9 个月前

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

Python
6
1 个月前

AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy

HTML
6
1 个月前

📚 AI-Powered Book PDF Knowledge Extractor & Summarizer Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study

Python
4
4 个月前

A side project to easily get and annotate questions and answers to the PsychometryBot project DB using computer vision and pdf parsing

Python
3
3 年前

The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backend, it supports PDF and text files, offering real-time summarization for researchers, students, and professionals.

JavaScript
3
4 个月前

This is some useful mini projects that I had worked for self-learning Python programming.

Python
3
1 年前

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

Python
3
3 个月前

Azure Document Intelligence Result Processor: A toolset for annotating PDFs based on Azure Document Intelligence analysis results, featuring a React web application and a standalone Python script for processing and visualizing extracted data with confidence indicators.

JavaScript
2
1 个月前

A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books. تطبيق ويب لتحضير الكتب والمجلات للطباعة على مطابع الأوفست. يقوم تلقائياً بترتيب صفحات PDF للطباعة التجارية على ورق A3، مع دعم الكتب العربية والإنجليزية.

Python
2
3 个月前

This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on the PDF document. It utilizes Weaviate as a vector database for efficient retrieval of relevant information and Gemini to generate natural language responses.

Jupyter Notebook
2
3 个月前

A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.

Python
2
6 个月前