How to Export PDF Form Data with Python, Tutorial


Why Export PDF Form Data with Python?

The pdfRest Export Form Data API Tool is designed to extract data from fillable forms within a PDF document. This functionality is particularly useful for automating the collection of form data for analysis or integration into other systems. For instance, an organization might use this API to extract survey responses or application form data without manual data entry.

In this tutorial, we will demonstrate how to send an API call to the Export Form Data endpoint using Python. We will walk through the code to understand how it works and how to set up the necessary parameters for the API request.

Python Code Sample for Form Export

from requests_toolbelt import MultipartEncoder
import requests
import json

exported_form_data_endpoint_url = 'https://api.pdfrest.com/exported-form-data'

# Build the multipart/form-data body: the fillable PDF, an output name, and the export format.
# Replace '/path/to/file' with the path to your PDF.
mp_encoder_exportedFormData = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_exportedFormData_out',
        'data_format': 'xml',
    }
)

# Content-Type must come from the encoder so that it includes the multipart boundary.
# Replace the placeholder with your own pdfRest API key.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_exportedFormData.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}

print("Sending POST request to exported-form-data endpoint...")
response = requests.post(exported_form_data_endpoint_url, data=mp_encoder_exportedFormData, headers=headers)

print("Response status code: " + str(response.status_code))

# response.ok is True for any status code below 400
if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

The source of the provided code is available at GitHub.

Breaking Down the Python Code

The code snippet above demonstrates how to call the Export Form Data API endpoint:

from requests_toolbelt import MultipartEncoder
import requests
import json

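This imports the necessary libraries: MultipartEncoder (from the requests-toolbelt package) for building multipart/form-data payloads, requests for making HTTP requests, and json for pretty-printing the JSON response. Both third-party packages can be installed with pip install requests requests-toolbelt.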
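To make the multipart/form-data format less of a black box, here is a minimal standard-library sketch of the kind of body MultipartEncoder sends. The function name encode_multipart is hypothetical, and this is not how requests_toolbelt is implemented (it streams the body rather than building it in memory); the sketch only shows how each field is wrapped between boundary lines with its own Content-Disposition header.

```python
import uuid


def encode_multipart(fields, files):
    """Sketch of a multipart/form-data encoder (hypothetical helper).

    fields: dict of field name -> text value
    files:  dict of field name -> (filename, content bytes, content type)
    Returns (body bytes, Content-Type header value).
    """
    boundary = uuid.uuid4().hex  # random token that should not occur in the data
    delimiter = b"--" + boundary.encode("ascii") + b"\r\n"
    body = bytearray()
    # Plain text fields, such as 'output' and 'data_format'
    for name, value in fields.items():
        body += delimiter
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
        body += value.encode("utf-8") + b"\r\n"
    # File fields also carry a filename and their own Content-Type
    for name, (filename, content, content_type) in files.items():
        body += delimiter
        body += (
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        body += content + b"\r\n"
    # The closing delimiter ends with two extra dashes
    body += b"--" + boundary.encode("ascii") + b"--\r\n"
    return bytes(body), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = encode_multipart(
        {"output": "example_exportedFormData_out", "data_format": "xml"},
        {"file": ("file_name.pdf", b"%PDF-1.7 placeholder bytes", "application/pdf")},
    )
    print(content_type)
    print(body.decode("utf-8"))
```

The boundary in the returned Content-Type value is what the server uses to split the body back into fields, which is why the tutorial takes the Content-Type header from the encoder instead of typing it by hand.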

exported_form_data_endpoint_url = 'https://api.pdfrest.com/exported-form-data'

This sets the URL of the Export Form Data API endpoint.

mp_encoder_exportedFormData = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_exportedFormData_out',
        'data_format': 'xml',
    }
)

This creates a MultipartEncoder object with the fields required by the API:

file: The PDF file to process, opened in binary read mode ('rb'). The tuple also supplies the filename and content type sent with the upload.
output: The name to give the output file.
data_format: The format for the exported form data. This example uses 'xml'; see the API Reference for the other supported formats.
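Because the file field carries raw bytes, a quick sanity check before uploading can catch a wrong path early. The sketch below is a hypothetical helper, not part of the pdfRest API: it reads the file in binary mode (closing it automatically, unlike the bare open() call in the sample), checks for the %PDF- header near the start, and returns a tuple in the same (filename, content, content type) shape as the file field. It assumes MultipartEncoder accepts bytes in place of an open file object; check this against your installed requests_toolbelt version.

```python
from pathlib import Path


def read_pdf_for_upload(path):
    """Hypothetical helper: load a PDF and return a (filename, bytes, content type) tuple."""
    pdf_path = Path(path)
    data = pdf_path.read_bytes()  # reads in binary mode and closes the file for us
    # PDF files start with a "%PDF-" header; look in the first 1024 bytes to allow
    # for stray leading bytes that some PDF readers tolerate.
    if b"%PDF-" not in data[:1024]:
        raise ValueError(f"{pdf_path} does not look like a PDF file")
    return (pdf_path.name, data, "application/pdf")
```

With this helper, the file entry in the fields dictionary could become 'file': read_pdf_for_upload('/path/to/file'), leaving the output and data_format entries unchanged.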

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_exportedFormData.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}

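These headers ask the server to return JSON (Accept), describe the body as multipart/form-data, including the boundary string generated by the encoder (Content-Type), and authenticate the request with your pdfRest API key (Api-Key).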
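Hard-coding the API key is fine for a quick test, but it is easy to leak through version control. Here is a minimal sketch that reads the key from an environment variable instead; the variable name PDFREST_API_KEY and the build_headers helper are hypothetical choices, not something pdfRest requires.

```python
import os


def build_headers(content_type, env_var="PDFREST_API_KEY"):
    """Build request headers, reading the pdfRest API key from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Set the {env_var} environment variable to your pdfRest API key")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,  # e.g. mp_encoder_exportedFormData.content_type
        "Api-Key": api_key,
    }
```

In the tutorial script, the headers dictionary would then be replaced by headers = build_headers(mp_encoder_exportedFormData.content_type).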

response = requests.post(exported_form_data_endpoint_url, data=mp_encoder_exportedFormData, headers=headers)

This sends a POST request to the API endpoint with the multipart form data and headers.

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

If the request succeeds (response.ok is True for any status code below 400), the script pretty-prints the JSON response. Otherwise, it prints the raw error text returned by the server.
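If you prefer a one-line summary to the full JSON dump, you could parse the body yourself. This sketch assumes that a successful response includes an outputUrl field and that error responses carry an error field; both field names are assumptions here, so confirm them in the API Reference. The status-code rule mirrors the one requests uses for Response.ok, and summarize_response is a hypothetical helper name.

```python
import json


def summarize_response(status_code, body_text):
    """Return a short summary of a pdfRest-style JSON response (hypothetical helper)."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = None  # body was not JSON, e.g. an HTML error page
    if not isinstance(payload, dict):
        payload = {}
    if status_code < 400:  # same rule as requests' Response.ok
        # Assumption: a successful call reports where to fetch the result as "outputUrl"
        url = payload.get("outputUrl")
        return f"Success: {url}" if url else f"Success: {body_text}"
    # Assumption: error responses describe the problem in an "error" field
    detail = payload.get("error", body_text)
    return f"Error {status_code}: {detail}"
```

With the response object from the tutorial, usage would be print(summarize_response(response.status_code, response.text)).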

Next Steps with pdfRest

We've gone through the process of setting up and making an API call to the pdfRest Export Form Data endpoint using Python. By following this tutorial, you should be able to extract form data from PDFs programmatically.

Feel free to demo all of the pdfRest API Tools in the API Lab at https://pdfrest.com/apilab/ and refer to the API Reference documentation at https://pdfrest.com/documentation/.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at GitHub.