extract-data

A one-stop, open-source, high-quality data extraction tool for converting PDFs to Markdown and JSON.

Python
31378
2 days ago
pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python
6955
2 days ago

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

TypeScript
6754
4 months ago

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Python
2031
1 day ago
JavaScript
1294
2 months ago

Extracts data points from images of graphs

C++
1273
3 years ago

Crawly, a high-level web crawling & scraping framework for Elixir.

Elixir
1020
7 months ago

Extract structured data from websites. Website scraping.

Go
682
2 years ago

A simple resume parser used for extracting information from resumes

Python
300
1 year ago

Receipt scanner that extracts information from your PDF or image receipts, built in Node.js

JavaScript
300
6 years ago

Extract data from .trace documents generated by Instruments

Objective-C
225
5 years ago

Turn webpages into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.

Python
161
3 days ago
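The image and webpage link extraction mentioned above can be illustrated with Python's standard library alone. This is a minimal sketch, not that project's implementation: `LinkExtractor` and `extract_links` are hypothetical names, and the sketch only collects `<a href>` and `<img src>` values, resolving relative URLs against an optional base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect <a href> links and <img src> sources from HTML."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.images.append(urljoin(self.base_url, attributes["src"]))


def extract_links(html: str, base_url: str = "") -> tuple[list[str], list[str]]:
    """Return (page links, image URLs) found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images
```

Self-closing tags such as `<img ... />` are handled too, because `HTMLParser`'s default `handle_startendtag` forwards to `handle_starttag`.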

An R package for acquisition and processing of NASA SMAP data

R
85
1 year ago

Library and CLI for extracting data from HTML via CSS selectors

Go
69
7 months ago

Extract colors from an image. Colors are grouped based on visual similarities using the CIE76 formula.

Python
68
5 years ago
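The CIE76 formula named above measures color difference as the Euclidean distance between two colors in CIELAB space: ΔE*ab = √((L₂ − L₁)² + (a₂ − a₁)² + (b₂ − b₁)²). The sketch below assumes 8-bit sRGB input and a D65 reference white, and groups colors greedily using an arbitrary threshold of 10. The project's own grouping strategy may differ, and all function names here are hypothetical.

```python
import math


def srgb_to_lab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB color to CIELAB (assumes D65 reference white)."""
    def to_linear(channel: int) -> float:
        # Undo the sRGB transfer curve.
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(ch) for ch in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        # CIELAB companding: cube root above a small threshold, linear below it.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e_cie76(c1, c2) -> float:
    """CIE76 color difference: Euclidean distance between two colors in CIELAB."""
    return math.dist(srgb_to_lab(c1), srgb_to_lab(c2))


def group_colors(colors, threshold: float = 10.0):
    """Greedily put each color into the first group whose first color is within threshold."""
    groups: list[list[tuple[int, int, int]]] = []
    for color in colors:
        for group in groups:
            if delta_e_cie76(color, group[0]) < threshold:
                group.append(color)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([color])
    return groups
```

Comparing against each group's first color keeps the sketch simple. A real tool might instead compare against a running group average or use a proper clustering algorithm.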

FBLYZE is a Facebook scraping and analysis system.

Jupyter Notebook
64
4 years ago

Get lyrics for any song by passing in its name (correctly spelled or misspelled) in under 2 seconds with this Python library.

Python
57
1 year ago

Extract and parse structured data with jQuery selectors, XPath, or JsonPath from common web formats such as HTML, XML, and JSON.

Java
54
1 year ago
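To illustrate path-based extraction in general (this is not the Java library's API), Python's standard library can handle small cases of both styles. `xml.etree.ElementTree` supports a limited XPath subset, including attribute predicates. `get_path` is a hypothetical dotted-path helper that only stands in for JsonPath and implements none of its query syntax. The sample XML is made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
CATALOG = """<catalog>
  <book id="b1" lang="en"><title>Dune</title><price>9.99</price></book>
  <book id="b2" lang="fr"><title>Vendredi</title><price>7.50</price></book>
</catalog>"""


def extract_titles(xml_text: str, lang: str) -> list[str]:
    """Select <title> text of books with a given lang attribute via ElementTree's XPath subset."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(f"./book[@lang='{lang}']/title")]


def get_path(data, path: str):
    """Follow a dotted path such as 'items.0.name' (hypothetical stand-in, not full JsonPath)."""
    for key in path.split("."):
        # Numeric segments index into lists; other segments are dict keys.
        data = data[int(key)] if isinstance(data, list) else data[key]
    return data
```

For example, `extract_titles(CATALOG, "en")` selects the English book's title, and `get_path(json.loads(text), "items.0.name")` reads a nested JSON field.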

This program extracts insider trading data from the SEC website and stores it in an Excel file for a specified time frame.

Python
53
3 years ago