PDF to Markdown API Tool - Convert PDFs to Structured Markdown for LLMs & Web

PDF to Markdown is a REST API tool that converts PDF documents into clean, structured Markdown. It lets developers extract human-readable content while preserving document hierarchy, making PDFs easier to use for content repurposing, text manipulation, data analysis, and LLM training.

Key Benefits of PDF to Markdown API

Convert complex PDFs into lightweight, plain-text Markdown, ideal for web content, documentation systems, or blog posts.
Extract structured content from PDFs, accurately preserving headings, lists, tables, and other formatting elements in an easily parseable format.
Simplify PDF content for easier management and version control in text-based systems like Git.
Enable advanced data processing, LLM training, and analysis by providing clean, semantic text extracted from PDFs for AI, NLP, or search indexing.
Support accessibility initiatives by transforming inaccessible PDF content into universally readable Markdown.
Automate large-scale PDF to Markdown conversions, streamlining workflows for content migration or dynamic publishing.

Try Now with API Lab

Start right from your browser: upload files, choose parameters, generate code, and send API calls directly from API Lab.

Request

POST

Headers

Api-Key

Don't have a key? Create an account to get one.

Response-Type

Choose between a full response after processing completes or an immediate response containing only the requestId to poll for the processing status later.

Full Response

Request ID

Required Parameters

file

File to be uploaded and processed

id

Alphanumeric ID (UUID) of an existing file on the server to be processed

Optional Parameters

output_type

Specify whether to save output as a file with .md extension or return output directly in the JSON response

pages

Range of pages to process, given as a comma-delimited string of pages and page ranges, e.g. 1,3-5,9-last. Default: 1-last (all pages). Page ranges are handled the same way as in other pdfRest endpoints.

page_break_comments

Keeps page breaks in the Markdown as HTML comments. The comments are not rendered in the output, but they can be helpful context for LLMs. They can, however, break up the flow of the document. Default: off.
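
The optional parameters above can be checked before a request is sent. The following is a minimal Python sketch, not part of any pdfRest SDK: the helper name build_form_fields is hypothetical, it sends nothing, and the accepted values (file/json and on/off) are taken from the parameter descriptions on this page.

```python
# Values listed on this page for the two enumerated options.
ALLOWED_VALUES = {
    "output_type": {"file", "json"},
    "page_break_comments": {"on", "off"},
}
# Free-form option listed on this page.
FREE_FORM = {"pages"}


def build_form_fields(**options: str) -> dict[str, str]:
    """Validate optional settings and return them as form fields.

    Hypothetical helper for illustration; it performs no network request.
    """
    fields: dict[str, str] = {}
    for name, value in options.items():
        if name in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[name]:
                allowed = sorted(ALLOWED_VALUES[name])
                raise ValueError(f"{name} must be one of {allowed}")
        elif name not in FREE_FORM:
            raise ValueError(f"unknown parameter: {name}")
        fields[name] = value
    return fields


fields = build_form_fields(
    output_type="json",
    pages="1,3-5,9-last",
    page_break_comments="on",
)
print(fields)
```

The resulting dictionary could then be supplied as the form data of a multipart POST, alongside the Api-Key header and either an uploaded file or an id.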

Code

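# Replace the API key and the file path placeholder with your own values.
curl -X POST "https://api.pdfrest.com/markdown" \
  -H "Accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -H "Api-Key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  -F "file=@/path/to/file.pdf"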

Response

Once you've sent your POST request and received a valid response, you can download your output file using the output URL.

Why is pdfRest the best API to convert PDF to Markdown?

pdfRest offers the best solution for PDF to Markdown conversion because it delivers accurate content extraction, preserves structural integrity, and enables flexible content repurposing for modern applications.

Accurate PDF Content Extraction for Structured Markdown Output

Experience precise and reliable content extraction with pdfRest's PDF to Markdown API. Our tool intelligently parses PDF content to deliver clean, readable Markdown that accurately captures text, headings, and other key elements, maintaining the fidelity of your original document's information.

Unlike generic text extractors, our advanced algorithms are built to handle a wide array of PDF complexities, converting them into a semantic Markdown structure. This ensures your output is truly structured data, ready for immediate use:

Diverse Layouts: Accurately processes varied page designs and content arrangements.
Tabular Content: Converts data from tables into a clean, parseable Markdown table format.
Semantic Elements: Identifies and retains the logical flow and hierarchy of content.

This results in high-quality input for content management systems, publishing platforms, or advanced applications like large language models and AI.

Preserve PDF Document Structure and Formatting in Markdown

The true power of PDF to Markdown lies in its ability to preserve the inherent structure and formatting of your documents, transforming them into equivalent Markdown syntax. This ensures the integrity of your content's hierarchy, making it easier to work with.

Our API meticulously translates PDF elements, so you retain critical structural components:

Headings: Automatically converted to appropriate Markdown headers (e.g., #, ##).
Lists: Transformed into clean, readable Markdown bullet or numbered lists.
Tables: Rendered into a parseable Markdown table structure.
Links & Emphasis: Preserved as functional Markdown links and formatted text (bold, italics).

Many conversion tools strip away this vital structural information, leading to significant manual reformatting. pdfRest overcomes this by accurately reflecting the document hierarchy and relationships, saving you time and ensuring consistent content presentation.
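
To illustrate what "parseable" means for the table output described above, here is a minimal Python sketch that reads a pipe-style Markdown table into row dictionaries. It assumes the common pipe-table layout (header row, delimiter row, data rows); the sample table is made up for illustration and is not actual pdfRest output, and parse_markdown_table is a hypothetical helper name.

```python
def parse_markdown_table(block: str) -> list[dict[str, str]]:
    """Parse a simple pipe table into a list of row dictionaries."""

    def cells(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    lines = [line for line in block.splitlines() if line.strip()]
    header = cells(lines[0])
    # lines[1] is the delimiter row (e.g. |---|---|) and carries no data.
    return [dict(zip(header, cells(line))) for line in lines[2:]]


# Made-up sample table, used only to exercise the parser.
table = """
| Item   | Qty |
|--------|-----|
| Widget | 4   |
| Gadget | 12  |
"""
print(parse_markdown_table(table))
# [{'Item': 'Widget', 'Qty': '4'}, {'Item': 'Gadget', 'Qty': '12'}]
```

This sketch does not handle escaped pipe characters inside cells; a full Markdown parser would be the safer choice for production use.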

Unlock Content Repurposing and LLM Training from PDFs

pdfRest's PDF to Markdown API is a game-changer for unlocking the potential of your static PDF information, allowing you to quickly transform it into dynamic, editable Markdown. This opens up numerous possibilities for content utilization:

Modern Web Formats: Easily migrate legacy PDF documents for web publishing and responsive designs.
Content Generation: Quickly create articles for blogs, knowledge bases, or marketing materials.
Technical Documentation: Convert manuals and reports into structured, version-controlled documentation.

Furthermore, the clean, structured Markdown output is an ideal format for training Large Language Models (LLMs) and other AI/NLP applications. By providing semantic and well-organized text extracted from PDFs, you can significantly improve the quality and efficiency of your machine learning data pipelines, enabling more robust and intelligent AI-driven solutions tailored to your specific needs.
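
One way such structured output might feed a data pipeline is by splitting it into heading-delimited chunks. The sketch below is an assumption about downstream use, not a pdfRest feature: chunk_by_heading is a hypothetical helper, and the sample text is invented.

```python
import re

# ATX headings: one to six '#' characters followed by whitespace and a title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict[str, str]]:
    """Group Markdown lines under the nearest preceding ATX heading."""
    chunks: list[dict[str, str]] = []

    def flush(title: str, body: list[str]) -> None:
        text = "\n".join(body).strip()
        if title or text:
            chunks.append({"title": title, "text": text})

    title, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush(title, body)
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush(title, body)
    return chunks


# Invented sample document.
sample = "# Guide\n\nIntro.\n\n## Setup\n\nInstall it.\n"
for chunk in chunk_by_heading(sample):
    print(chunk)
```

Lines beginning with # inside fenced code blocks would be misread as headings by this sketch, so real pipelines may prefer a proper Markdown parser.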

Start from Code Examples

See more code examples in our GitHub repository

Need more help?

Start with a Tutorial for step-by-step guidance

Customize Your Solution

Learn about the parameters for this tool to create your custom solution.

File

The file parameter allows you to select a local file to be uploaded to pdfRest’s processing server.

ID

The id parameter lets you chain API requests by using the unique resource ID assigned to an output file from a previous pdfRest tool. This means you can process documents through multiple steps, such as converting a Word document to PDF and then that PDF to Markdown, without downloading intermediate files between operations.

Output

The output parameter lets you set a filename (without extension) for your Markdown file.

Output Type

output_type specifies how the converted Markdown content is returned in the API response.

file: Returns a resource ID and download URL for the .md file, allowing you to retrieve the Markdown content as a standalone file.
json: Returns the raw Markdown content directly embedded within the JSON response of the API call.

Pages

pages allows you to specify which pages from the input PDF document should be processed for Markdown conversion. It accepts a comma-delimited string of individual page numbers and/or page ranges. This functionality works identically to how page ranges are handled across other pdfRest endpoints.

Example: 1,3-5,9-last
Default: 1-last (all pages)
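
To make the range syntax concrete, here is a minimal Python sketch of how a string such as 1,3-5,9-last could be expanded into page numbers. The helper name expand_pages is hypothetical and this is not pdfRest's implementation; the API resolves ranges on the server, so the sketch only illustrates the syntax described above. Rejecting descending ranges is an assumption of the sketch, not documented API behavior.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a page-range string such as '1,3-5,9-last' into page numbers.

    Hypothetical helper for illustration; not part of the pdfRest API.
    """

    def to_number(token: str) -> int:
        token = token.strip().lower()
        number = page_count if token == "last" else int(token)
        if not 1 <= number <= page_count:
            raise ValueError(f"page {token!r} is outside 1-{page_count}")
        return number

    pages: list[int] = []
    for part in spec.split(","):
        start, sep, end = part.partition("-")
        first = to_number(start)
        last = to_number(end) if sep else first
        if first > last:
            # Assumption of this sketch: descending ranges are rejected.
            raise ValueError(f"descending range {part!r}")
        pages.extend(range(first, last + 1))
    return pages


print(expand_pages("1,3-5,9-last", page_count=10))
# [1, 3, 4, 5, 9, 10]
```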

Page Break Comments

page_break_comments determines whether page breaks from the original PDF are inserted into the Markdown output as HTML comments.

on: Includes page breaks as HTML comments. These comments are not rendered in the final visible output, so they can provide useful context for Large Language Models (LLMs) and other machine processing. Be aware, however, that they may break up the flow of the document in the raw Markdown file.
off: (Default) Page breaks are not explicitly marked in the Markdown output.
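
When page break comments are enabled, downstream code could use them to recover per-page chunks. The sketch below is illustrative only: the exact comment text is not shown on this page, so the marker `<!-- page break -->` is a placeholder assumption that should be replaced with whatever the API actually emits, and split_pages is a hypothetical helper. It also assumes one marker between consecutive pages.

```python
# Placeholder assumption: the real comment text is not shown on this page.
ASSUMED_MARKER = "<!-- page break -->"


def split_pages(markdown: str, marker: str = ASSUMED_MARKER) -> list[str]:
    """Split Markdown into chunks at each page-break comment.

    Empty chunks are kept so that chunk positions stay aligned with pages.
    """
    return [chunk.strip() for chunk in markdown.split(marker)]


# Invented sample with a single page break between two pages.
sample = (
    "# Title\n\nIntro text.\n\n"
    "<!-- page break -->\n\n"
    "## Section\n\nMore text."
)
for position, page in enumerate(split_pages(sample), start=1):
    print(position, repr(page))
```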