How to Apply TDM Reservations to PDF with Python, Tutorial

Why Use TDM Reserve PDF with Python?

The pdfRest TDM Reserve PDF API Tool gives Python developers a straightforward way to assert intellectual property rights against automated data mining. This tutorial walks through constructing and executing an API request to the TDM Reserve PDF endpoint using Python. By building this functionality into your document processing scripts, you can programmatically embed W3C-compliant metadata in your PDFs that tells AI scrapers and web crawlers, in machine-readable form, that text and data mining rights are reserved. The metadata is a declaration rather than a technical barrier, so it depends on crawlers reading and honoring it.

In a practical setting, a data engineering team at an academic publisher might use Python to manage a large library of scientific research and journals. Before distributing these PDFs online or to third-party databases, they want to declare that the documents may not be freely ingested to train commercial LLMs. By writing a Python script that calls the TDM Reserve PDF API, the team can process the entire repository, embedding a machine-readable "opt-out" policy into every file to support the institution's legal claim over its proprietary research at scale.

TDM Reserve PDF with Python Code Example

from requests_toolbelt import MultipartEncoder
import requests
import json

# This sample applies metadata to a PDF declaring Text and Data Mining (TDM) rights.

# By default, we use the US-based API service. This is the primary endpoint for global use.
api_url = "https://api.pdfrest.com"

# For GDPR compliance and enhanced performance for European users, you can switch to the EU-based service by uncommenting the URL below.
# For more information visit https://pdfrest.com/pricing#how-do-eu-gdpr-api-calls-work
#api_url = "https://eu-api.pdfrest.com"

tdm_reserved_pdf_endpoint_url = api_url+'/tdm-reserved-pdf'

# The /tdm-reserved-pdf endpoint can take a single PDF file or id as input.
mp_encoder_tdm_reserved_pdf = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'policy': 'https://example.com/tdm-policy',
        'output' : 'example_tdm_reserved_pdf_out'
    }
)

# Let's set the headers that the tdm-reserved-pdf endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_tdm_reserved_pdf.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

print("Sending POST request to tdm-reserved-pdf endpoint...")
response = requests.post(tdm_reserved_pdf_endpoint_url, data=mp_encoder_tdm_reserved_pdf, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.

Source: GitHub Repository

Breaking Down the Code

The code begins by importing necessary libraries: requests_toolbelt for handling multipart form data, requests for making HTTP requests, and json for handling JSON data.

api_url = "https://api.pdfrest.com"

This line sets the base URL for the API. The default is the US-based service, but you can switch to the EU-based service for GDPR compliance by uncommenting the alternative URL.
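
As a minimal sketch, the region switch could be wrapped in a small helper instead of commenting lines in and out. The `BASE_URLS` mapping and the `endpoint_url` function are illustrative names, not part of the pdfRest samples; the two base URLs and the `/tdm-reserved-pdf` path are taken from the code above.

```python
# Hypothetical helper (not part of the pdfRest samples): choose the base URL by region.
BASE_URLS = {
    "us": "https://api.pdfrest.com",     # default service from the sample
    "eu": "https://eu-api.pdfrest.com",  # EU-based service from the sample
}


def endpoint_url(region="us", path="/tdm-reserved-pdf"):
    """Return the full endpoint URL for the given region key."""
    try:
        base = BASE_URLS[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
    return base + path


print(endpoint_url())      # US-based endpoint
print(endpoint_url("eu"))  # EU-based endpoint
```

Keeping the choice in one function means the rest of the script does not need to change when you switch services.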

mp_encoder_tdm_reserved_pdf = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'policy': 'https://example.com/tdm-policy',
        'output' : 'example_tdm_reserved_pdf_out'
    }
)

The MultipartEncoder is used to prepare the data for the POST request. The fields dictionary includes:

file: The PDF file to which TDM rights will be applied. It is opened in binary mode.
policy: A URL pointing to the TDM policy document.
output: The desired name for the output file.
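
The sample opens the PDF inline and never closes it. One possible variation, sketched here under the assumption that the request is sent inside the `with` block, keeps the file open only for as long as it is needed. `build_fields` is a hypothetical helper, and an in-memory `io.BytesIO` object stands in for a real file so the snippet runs on its own.

```python
import io


def build_fields(pdf_file, policy_url, output_name, upload_name="file_name.pdf"):
    """Assemble the three form fields in the shape the sample passes to MultipartEncoder."""
    return {
        "file": (upload_name, pdf_file, "application/pdf"),
        "policy": policy_url,
        "output": output_name,
    }


# Stand-in for a real PDF; in practice use: with open("/path/to/file", "rb") as pdf_file:
with io.BytesIO(b"%PDF-1.7\n") as pdf_file:
    fields = build_fields(
        pdf_file,
        "https://example.com/tdm-policy",
        "example_tdm_reserved_pdf_out",
    )
    # Build the MultipartEncoder and send the request here, while the file is still open.
    print(sorted(fields))
```

The returned dictionary can be passed as `MultipartEncoder(fields=fields)` in place of the inline dictionary.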

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_tdm_reserved_pdf.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

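The headers dictionary specifies the request headers. Accept asks the service for a JSON response. Content-Type is read from the MultipartEncoder's content_type attribute, which carries the multipart boundary string the server needs to parse the request body. Api-Key holds a placeholder that you replace with your actual API key.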
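
To avoid committing a real key to source control, the key could be read from the environment instead of being pasted into the script. This is a sketch only: `build_headers` is a hypothetical helper and `PDFREST_API_KEY` is an assumed variable name, not one defined by pdfRest.

```python
import os


def build_headers(content_type, env_var="PDFREST_API_KEY"):
    """Build the request headers, reading the API key from an environment variable.

    The variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"set the {env_var} environment variable first")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,  # pass the encoder's content_type here
        "Api-Key": api_key,
    }
```

With the encoder from the sample, the call would be `headers = build_headers(mp_encoder_tdm_reserved_pdf.content_type)`.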

response = requests.post(tdm_reserved_pdf_endpoint_url, data=mp_encoder_tdm_reserved_pdf, headers=headers)

This line sends a POST request to the TDM Reserve PDF endpoint with the prepared data and headers.
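
The full sample then prints the JSON body on success and the raw text on failure. That branch could be folded into a small function such as the hypothetical `describe_response` below, which takes the status code and body text so it can be tried without making a network call. Treating any status below 400 as success roughly mirrors `response.ok` in requests.

```python
import json


def describe_response(status_code, body_text):
    """Return a printable summary of an API response."""
    if status_code < 400:
        try:
            # Pretty-print the JSON body, as the sample does.
            return json.dumps(json.loads(body_text), indent=2)
        except json.JSONDecodeError:
            # Fall back to the raw text if the body is not valid JSON.
            return body_text
    return f"Request failed with status {status_code}: {body_text}"
```

After the POST in the sample, this would be used as `print(describe_response(response.status_code, response.text))`.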

Beyond the Tutorial

In this tutorial, you learned how to apply TDM rights to a PDF using the pdfRest API with Python. This is particularly useful for automating the declaration of text and data mining reservations across large document collections. To explore more, you can demo all of the pdfRest API Tools in the API Lab. For further details, refer to the API Reference Guide.
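
The repository-wide scenario described at the start of this tutorial could be sketched as a loop over a folder. In this example `apply_to_folder` and its `apply_tdm` argument are hypothetical: `apply_tdm` stands for any function you write that sends one PDF to the endpoint, for instance by wrapping the request code above.

```python
from pathlib import Path


def apply_to_folder(folder, apply_tdm):
    """Call apply_tdm(path) for every .pdf file under folder.

    Returns a dict mapping each path to the function's result,
    or to the exception it raised.
    """
    results = {}
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        try:
            results[pdf_path] = apply_tdm(pdf_path)
        except Exception as exc:
            # Record the failure and continue so one bad file does not stop the batch.
            results[pdf_path] = exc
    return results
```

Collecting exceptions rather than stopping at the first one lets you review and retry failed files after the run.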

Note: This example demonstrates a multipart API call. Code samples using JSON payloads can be found in the GitHub repository.