How to Convert PDF to PDF/A with Python
Why Use the Convert to PDF/A Tool with Python?
The pdfRest Convert to PDF/A API Tool converts PDF files to PDF/A, an ISO-standardized version of PDF (ISO 19005) designed for the long-term digital preservation of electronic documents. This tutorial walks through sending an API call to Convert to PDF/A using Python.
Converting to PDF/A is especially useful for archiving. The format is built to preserve a document's visual appearance over time (for example, PDF/A requires fonts to be embedded in the file), so the document can be reliably accessed and rendered in the future.
Python Code Sample for PDF/A
```python
from requests_toolbelt import MultipartEncoder
import requests
import json

pdfa_endpoint_url = 'https://api.pdfrest.com/pdfa'

# The /pdfa endpoint can take a single PDF file or id as input.
mp_encoder_pdfa = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output_type': 'PDF/A-1b',
        'rasterize_if_errors_encountered': 'on',
        'output': 'example_pdfa_out',
    }
)

# Let's set the headers that the pdfa endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header is set to
# 'multipart/form-data' (plus the boundary string) via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_pdfa.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  # place your api key here
}

print("Sending POST request to pdfa endpoint...")
response = requests.post(pdfa_endpoint_url, data=mp_encoder_pdfa, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.
```
Source: pdf-rest-api-samples on GitHub
Breaking Down the Python Code
The provided code block demonstrates how to make a multipart/form-data POST request to the pdfRest API to convert a PDF to PDF/A format.
```python
from requests_toolbelt import MultipartEncoder
import requests
import json
```
This imports the necessary modules: MultipartEncoder from requests_toolbelt encodes the multipart form data, requests sends the HTTP request, and json formats the response for printing.
```python
pdfa_endpoint_url = 'https://api.pdfrest.com/pdfa'
```
This sets the API endpoint URL for the PDF/A conversion.
```python
mp_encoder_pdfa = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output_type': 'PDF/A-1b',
        'rasterize_if_errors_encountered': 'on',
        'output': 'example_pdfa_out',
    }
)
```
Here we define the payload for the POST request. The fields include:
- 'file': A tuple of the file name, an open binary file handle, and the MIME type of the PDF to convert. Replace '/path/to/file' with the actual file path.
- 'output_type': The PDF/A version and conformance level to convert to (e.g., PDF/A-1b).
- 'rasterize_if_errors_encountered': If set to 'on', the service will rasterize the PDF if it encounters errors during conversion.
- 'output': The desired name for the output file.
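The sample passes open('/path/to/file', 'rb') straight into the fields, so the file handle is never explicitly closed. A minimal sketch of one alternative, assuming you want the handle closed once the request is done: the hypothetical helper pdfa_fields below (not part of pdfRest or requests_toolbelt) yields the same four fields from inside a with block. Because the tutorial only shows the 'on' value, the sketch sends rasterize_if_errors_encountered only when rasterize is True and otherwise leaves the service default in place (an assumption).

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def pdfa_fields(pdf_path, output_type='PDF/A-1b', rasterize=True,
                output_name='example_pdfa_out'):
    """Yield a fields dict for MultipartEncoder and close the PDF afterward.

    Hypothetical helper: the field names mirror the tutorial's request.
    """
    path = Path(pdf_path)
    with path.open('rb') as pdf_file:
        fields = {
            # (file name, open binary handle, MIME type), as in the tutorial
            'file': (path.name, pdf_file, 'application/pdf'),
            'output_type': output_type,
            'output': output_name,
        }
        if rasterize:
            # Only 'on' appears in the tutorial; when rasterize is False the
            # field is omitted so the service default applies (assumption).
            fields['rasterize_if_errors_encountered'] = 'on'
        yield fields
```

Since MultipartEncoder streams the file contents while the request is being sent, both the MultipartEncoder(fields=fields) call and requests.post should run inside the with pdfa_fields('/path/to/file') as fields: block, before the handle is closed.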
```python
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_pdfa.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}
```
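These are the headers for the request. Replace 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with your actual API key. The Content-Type value comes from mp_encoder_pdfa.content_type, which includes the multipart boundary the server needs to parse the request body.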
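Hard-coding the key is fine for a quick test, but it is easy to commit by accident. A minimal sketch of an alternative, assuming the key is stored in an environment variable: PDFREST_API_KEY is a hypothetical variable name chosen here (pdfRest does not define it), and pdfa_headers is an illustrative helper that builds the same three headers as the tutorial.

```python
import os


def pdfa_headers(content_type, api_key=None):
    """Build the tutorial's request headers (hypothetical helper).

    Falls back to the PDFREST_API_KEY environment variable, a name
    assumed for this sketch, when api_key is not passed explicitly.
    """
    key = api_key or os.environ.get('PDFREST_API_KEY')
    if not key:
        raise RuntimeError('Set PDFREST_API_KEY or pass api_key explicitly')
    return {
        'Accept': 'application/json',
        # Pass mp_encoder_pdfa.content_type here so the boundary is included
        'Content-Type': content_type,
        'Api-Key': key,
    }
```

In the tutorial's script this would be called as headers = pdfa_headers(mp_encoder_pdfa.content_type).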
```python
response = requests.post(pdfa_endpoint_url, data=mp_encoder_pdfa, headers=headers)
```
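This line sends the POST request with the encoded data and headers. Because data is a MultipartEncoder, requests streams the body from the encoder rather than building the whole request in memory first.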
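As written, requests.post has no timeout, and by default requests will wait indefinitely for a server that stops responding. A minimal sketch of a wrapper that sets a (connect, read) timeout and reports network failures instead of crashing; the function name send_pdfa_request and the timeout values are illustrative assumptions, not pdfRest recommendations.

```python
import requests


def send_pdfa_request(url, body, headers, timeout=(10, 120)):
    """POST the request body; return the Response, or None on a network error.

    Hypothetical helper. timeout is a (connect, read) pair in seconds; the
    values are placeholders, so raise the read timeout for large PDFs.
    """
    try:
        return requests.post(url, data=body, headers=headers, timeout=timeout)
    except requests.exceptions.Timeout:
        print('The request timed out; consider a longer read timeout.')
    except requests.exceptions.ConnectionError as exc:
        print(f'Could not connect to {url}: {exc}')
    return None
```

In the tutorial's script, send_pdfa_request(pdfa_endpoint_url, mp_encoder_pdfa, headers) would replace the direct requests.post call, and a None result means no HTTP response was received. Other requests exceptions are deliberately left to propagate.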
```python
if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)
```
If the request is successful (response.ok is True for any status code below 400), the parsed JSON response is printed with indentation. Otherwise, the raw error text is printed.
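Printing works for a one-off script, but in a larger program you may prefer to get the parsed result back or an exception. A minimal sketch of that design: the hypothetical parse_pdfa_response function and PdfRestError class below (neither comes from pdfRest or requests) return the decoded JSON on success and raise otherwise, including when a successful response body turns out not to be valid JSON.

```python
class PdfRestError(Exception):
    """Hypothetical exception for failed pdfRest calls (not a library class)."""


def parse_pdfa_response(response):
    """Return the decoded JSON body of a successful response, else raise.

    Accepts any object with .ok, .status_code, .text and .json(),
    such as a requests.Response.
    """
    if not response.ok:
        # Keep the server's error text so the caller can log or display it
        raise PdfRestError(f'HTTP {response.status_code}: {response.text}')
    try:
        return response.json()
    except ValueError as exc:
        # requests raises a ValueError subclass when the body is not JSON
        raise PdfRestError(f'Expected JSON, got: {response.text[:200]}') from exc
```

With this helper, the if/else block becomes print(json.dumps(parse_pdfa_response(response), indent=2)), and callers can catch PdfRestError where they want to handle failures.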
Beyond this Tutorial
We have now successfully made an API call to the pdfRest Convert to PDF/A endpoint using Python. This allows us to convert PDF documents to the archival PDF/A format programmatically. You can demo all of the pdfRest API Tools in the API Lab and refer to the API Reference documentation for more details.
Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at our GitHub Repository.