Max 3 URLs / pages / images per day.
In this free trial you can only extract one page at a time.
Some documents may require a custom solution. Contact us if you find the output promising.
One Tool for LLMs, RAG pipelines, data transformation, financial reports, invoices, and more.
Similar to Firecrawl🔥.
Works on bank statements, invoices, scientific papers, and even scanned or handwritten layouts. Table extraction is not only accurate but also much more affordable compared to others.
No need to juggle separate APIs for crawling webpages, parsing HTML and PDFs, and extracting tables as Excel sheets. ParseExtract.com handles it all💪, at a lower cost🤑.
One Subscription to Parse/Extract from both Webpages and Documents. Use the same API to extract Tables and Data. Saves Costs.
No need for multiple monthly subscriptions.
Taxes may apply. For output-token-based pricing, a minimum charge of $0.000625 (equivalent to 1600 pages per $) applies whenever the token cost falls below $0.000625. Every payment is valid for 30 days. A minimum payment of $9.99 is required.
This converts any webpage to LLM-ready text, so you can pass the clean parsed text to an LLM for any downstream operation.
Each URL is charged at $0.005. For a crawling job, if you crawl 10 URLs, it costs 10 × $0.005 = $0.05.
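The per-URL arithmetic above can be sketched in a few lines of Python. This is only an illustration of the quoted price; the function name `webpage_parsing_cost` is hypothetical and is not part of any ParseExtract API, and taxes are not included.

```python
from decimal import Decimal

# Price quoted in the text above: $0.005 per parsed URL.
PRICE_PER_URL = Decimal("0.005")


def webpage_parsing_cost(url_count: int) -> Decimal:
    """Return the estimated cost in dollars of parsing `url_count` URLs."""
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    return PRICE_PER_URL * url_count


# A crawl that covers 10 URLs: 10 * $0.005
print(webpage_parsing_cost(10))
```

Decimal is used instead of float so that small per-URL amounts add up without binary rounding noise.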
LLMs do not need perfect Markdown formatting, so we format the output to reduce the token count while keeping important data, such as image URLs and tables, easy for the LLM to understand. The low token count saves downstream costs.
For document/image parsing (Option B) and structured data extraction from documents/images, we count only the output tokens produced by parsing or extraction and charge you only for those output tokens.
On average, 1 token corresponds to ~4 characters of common English text, i.e. 1 token ≈ ¾ of a word, so ~75 words are equivalent to ~100 tokens.
You pay less for documents with less text. A normal page is about 500-750 tokens, which costs $0.00050 - $0.00075. A dense page can run 1200-1500 tokens, which costs $0.0012 - $0.0015. Most books have around 500 tokens per page, as they are not very dense. A PDF/DOCX page with multiple images costs less because it contains less text. You can see some examples and their output tokens here.
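As a rough sketch of the token arithmetic above, the Python snippet below estimates a token count from text length and turns it into a cost. The rate of $1 per million output tokens is inferred from the page costs quoted above (500 tokens for $0.00050), and the $0.000625 minimum comes from the pricing note; applying that minimum per page is an assumption. All function names are hypothetical, and real token counts depend on the tokenizer used.

```python
from decimal import Decimal

CHARS_PER_TOKEN = 4                     # rough average for common English text
PRICE_PER_TOKEN = Decimal("0.000001")   # assumed: $1 per million output tokens
MIN_CHARGE = Decimal("0.000625")        # minimum charge from the pricing note


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def token_cost(tokens: int) -> Decimal:
    """Raw cost of the output tokens, before any minimum charge or taxes."""
    return PRICE_PER_TOKEN * tokens


def billed_cost(tokens: int) -> Decimal:
    """Token cost raised to the minimum charge when it falls below it
    (assumption: the minimum applies to each page separately)."""
    return max(token_cost(tokens), MIN_CHARGE)


# Normal pages (500-750 tokens) and dense pages (1200-1500 tokens).
for tokens in (500, 750, 1200, 1500):
    print(tokens, token_cost(tokens), billed_cost(tokens))
```

Under these assumptions, only pages below 625 output tokens are affected by the minimum charge; denser pages are billed at their raw token cost.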
Option A is faster than Option B. Option B can sometimes be better and cheaper for documents with less text. For both, you have the option to replace each image with an image ID at the image's position. Refer to some examples here.
Yes, we can provide a custom API endpoint or a web interface. If the work requires a major time investment, we may charge a reasonable fee.
No. Your data is never stored. Files are deleted immediately after processing.
Take note: most of the time, the parsing logic sends your files to an external LLM API.
You can extract all the tables present in any single image or Document page.
It is charged at $0.01 per document page or image.
At present, multi-page documents and table extraction from webpages are not available. Contact us if you need these features.
Describe what you want to extract, along with any required schema, and extract that data from a single image or document page.
For documents/images, you are charged based on output tokens at the rate of $1 per million output tokens. You pay less when less data is extracted.
For URLs, you are charged $0.01 per URL.
At present, multi-page documents and data extraction from crawling a whole website are not available. Contact us if you need these features.
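The two data-extraction prices above (per output token for documents/images, flat per URL) can be combined in a small estimator. This is a minimal sketch of the arithmetic only; `extraction_cost` is a hypothetical helper, not a ParseExtract API call, and it ignores taxes and any minimum charge.

```python
from decimal import Decimal

PRICE_PER_OUTPUT_TOKEN = Decimal("0.000001")  # $1 per million output tokens
PRICE_PER_URL = Decimal("0.01")               # flat price per URL


def extraction_cost(output_tokens: int = 0, url_count: int = 0) -> Decimal:
    """Estimate the data-extraction cost in dollars.

    output_tokens: total output tokens extracted from documents/images.
    url_count: number of URLs data was extracted from.
    """
    if output_tokens < 0 or url_count < 0:
        raise ValueError("counts must be non-negative")
    return PRICE_PER_OUTPUT_TOKEN * output_tokens + PRICE_PER_URL * url_count


# 2,000 output tokens from document pages plus 5 URLs.
print(extraction_cost(output_tokens=2000, url_count=5))
```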
You can reach us at contact@parseextract.com