How to Merge Different File Formats Together as a PDF with Python
Why Use Merge PDFs with Python?
The pdfRest Merge PDFs API Tool is a powerful resource for developers looking to combine multiple PDF files into a single document programmatically. With this tool, you can streamline workflows that require merging PDFs, making it a valuable asset for automation and efficiency. This tutorial will guide you through the process of sending an API call to merge PDFs using Python, demonstrating how to leverage the capabilities of pdfRest.
Consider a scenario where a user needs to compile various types of documents, such as images, PowerPoint presentations, and PDFs, into a single cohesive PDF file for a business report. By using the Merge PDFs API Tool, this task can be automated, saving time and reducing the potential for human error. This is just one example of how the Merge PDFs feature can be utilized in real-world applications, enhancing productivity and document management.
Merge PDFs with Python Code Example
    from requests_toolbelt import MultipartEncoder
    import requests
    import json

    # In this sample, we will show how to merge different file types together as
    # discussed in https://pdfrest.com/solutions/merge-multiple-types-of-files-together/.
    # First, we will upload an image file to the /pdf route and capture the output ID.
    # Next, we will upload a PowerPoint file to the /pdf route and capture its output
    # ID. Finally, we will pass both IDs to the /merged-pdf route to combine both inputs
    # into a single PDF.
    #
    # Note that there is nothing special about an image and a PowerPoint file, and
    # this sample could be easily used to convert and combine any two file types
    # that the /pdf route takes as inputs.

    api_key = 'xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here

    pdf_endpoint_url = 'https://api.pdfrest.com/pdf'

    # Step 1: convert the image to PDF.
    mp_encoder_image_pdf = MultipartEncoder(
        fields={
            'file': ('file_name.png', open('/path/to/file.png', 'rb'), 'image/png'),
        }
    )

    image_headers = {
        'Accept': 'application/json',
        'Content-Type': mp_encoder_image_pdf.content_type,
        'Api-Key': api_key
    }

    print("Sending POST request to pdf endpoint...")
    response = requests.post(pdf_endpoint_url, data=mp_encoder_image_pdf, headers=image_headers)

    print("Response status code: " + str(response.status_code))

    if response.ok:
        response_json = response.json()
        image_id = response_json["outputId"]
        print("Got the first output ID: " + image_id)

        # Step 2: convert the PowerPoint file to PDF.
        mp_encoder_ppt_pdf = MultipartEncoder(
            fields={
                'file': ('file_name.ppt', open('/path/to/file.ppt', 'rb'), 'application/vnd.ms-powerpoint'),
            }
        )

        ppt_headers = {
            'Accept': 'application/json',
            'Content-Type': mp_encoder_ppt_pdf.content_type,
            'Api-Key': api_key
        }

        print("Sending POST request to pdf endpoint...")
        response = requests.post(pdf_endpoint_url, data=mp_encoder_ppt_pdf, headers=ppt_headers)

        print("Response status code: " + str(response.status_code))

        if response.ok:
            response_json = response.json()
            ppt_id = response_json["outputId"]
            print("Got the second output ID: " + ppt_id)

            # Step 3: merge both converted files by their output IDs.
            merged_pdf_endpoint_url = 'https://api.pdfrest.com/merged-pdf'

            merge_request_data = [
                ('id', image_id), ('pages', '1-last'), ('type', 'id'),
                ('id', ppt_id), ('pages', '1-last'), ('type', 'id'),
                ('output', 'multiple_file_types'),
            ]

            mp_encoder_merge = MultipartEncoder(
                fields=merge_request_data
            )

            merge_headers = {
                'Accept': 'application/json',
                'Content-Type': mp_encoder_merge.content_type,
                'Api-Key': api_key
            }

            print("Sending POST request to merged-pdf endpoint...")
            response = requests.post(merged_pdf_endpoint_url, data=mp_encoder_merge, headers=merge_headers)

            print("Merge response status code: " + str(response.status_code))

            if response.ok:
                response_json = response.json()
                print(json.dumps(response_json, indent = 2))
            else:
                print(response.text)
        else:
            print(response.text)
    else:
        print(response.text)

    # If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.
Source: GitHub Repository
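The sample reads `outputId` straight from the parsed response once `response.ok` is true. As a minimal sketch, a small guard like the one below is one way to fail with a clear message if that key is ever absent. The function name `output_id_from` and the sample payload are hypothetical; only the `outputId` key comes from the sample.

```python
def output_id_from(payload):
    """Return payload['outputId'], or raise ValueError if it is missing."""
    if not isinstance(payload, dict) or "outputId" not in payload:
        raise ValueError(f"no outputId in response: {payload!r}")
    return payload["outputId"]


# Hypothetical stand-in for response.json() after a successful /pdf call.
sample = {"outputId": "example-output-id"}
print(output_id_from(sample))
```

In the sample, this would take the place of the direct `response_json["outputId"]` lookups.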
Breaking Down the Code
The code begins by importing necessary modules and setting up the API key:
    from requests_toolbelt import MultipartEncoder
    import requests
    import json

    api_key = 'xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
Here, `requests_toolbelt` provides `MultipartEncoder` for building multipart form data, `requests` makes the HTTP requests, and `json` pretty-prints the final response. The `api_key` value is a placeholder for your actual API key.
Next, the code prepares to upload an image file to the `/pdf` endpoint:
    pdf_endpoint_url = 'https://api.pdfrest.com/pdf'

    mp_encoder_image_pdf = MultipartEncoder(
        fields={
            'file': ('file_name.png', open('/path/to/file.png', 'rb'), 'image/png'),
        }
    )

    image_headers = {
        'Accept': 'application/json',
        'Content-Type': mp_encoder_image_pdf.content_type,
        'Api-Key': api_key
    }
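The `MultipartEncoder` encodes the image file as multipart form data. The `image_headers` dictionary specifies the headers for the request, including the API key and the content type reported by the encoder. The request is then sent with `requests.post`, and on success the `outputId` from the JSON response is stored as `image_id`.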
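Every request in the sample sends the same three headers, differing only in the content type taken from its encoder. As a sketch, they could be built by one helper. The name `build_headers` is hypothetical, and the boundary string below is a made-up stand-in for what `MultipartEncoder.content_type` would supply.

```python
def build_headers(api_key, content_type):
    """Return the three headers the sample sends with each request."""
    return {
        "Accept": "application/json",
        # Must come from the encoder: it carries the multipart boundary.
        "Content-Type": content_type,
        "Api-Key": api_key,
    }


# Placeholder values; in the sample these would be api_key and
# mp_encoder_image_pdf.content_type.
headers = build_headers("your-api-key", "multipart/form-data; boundary=example")
print(headers["Content-Type"])
```

The content type has to be read from the specific encoder being sent, because each encoder generates its own boundary.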
The same process is repeated for a PowerPoint file:
    mp_encoder_ppt_pdf = MultipartEncoder(
        fields={
            'file': ('file_name.ppt', open('/path/to/file.ppt', 'rb'), 'application/vnd.ms-powerpoint'),
        }
    )

    ppt_headers = {
        'Accept': 'application/json',
        'Content-Type': mp_encoder_ppt_pdf.content_type,
        'Api-Key': api_key
    }
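Since the two uploads differ only in file name and MIME type, those two values could be derived from the path instead of being typed by hand. The sketch below uses Python's standard `mimetypes` module for this. The helper name `describe_upload` is hypothetical, and raising on an unknown extension is a design assumption rather than anything the pdfRest sample prescribes.

```python
import mimetypes
import os


def describe_upload(path):
    """Return (file_name, mime_type) for the 'file' form field."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        # Assumption: stop early rather than send a guessed content type.
        raise ValueError(f"cannot guess a MIME type for {path!r}")
    return os.path.basename(path), mime_type


name, mime_type = describe_upload("/path/to/file.png")
print(name, mime_type)
```

The `file` entry would then be built as `(name, open(path, 'rb'), mime_type)`, ideally inside a `with` block so the file handle is closed after the request.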
After obtaining the output IDs for both files, they are sent to the `/merged-pdf` endpoint to be combined:
    merge_request_data = [
        ('id', image_id), ('pages', '1-last'), ('type', 'id'),
        ('id', ppt_id), ('pages', '1-last'), ('type', 'id'),
        ('output', 'multiple_file_types'),
    ]

    mp_encoder_merge = MultipartEncoder(
        fields=merge_request_data
    )

    merge_headers = {
        'Accept': 'application/json',
        'Content-Type': mp_encoder_merge.content_type,
        'Api-Key': api_key
    }
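The `merge_request_data` list specifies, for each input, the ID of the file to merge (`id`), the pages to include (`pages`, here `1-last` for every page), and how the input is referenced (`type`, here `id`). The final `output` field names the merged file. It is a list of tuples rather than a dictionary because the `id`, `pages`, and `type` keys repeat once per input. The `merge_headers` follow the same pattern as the earlier headers, with the content type taken from the merge encoder.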
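Because the merge fields repeat the same three entries per input, the list can be generated for any number of output IDs. The following is a minimal sketch: the helper name `build_merge_fields` and the example IDs are hypothetical, while the field names and values mirror the sample.

```python
def build_merge_fields(output_ids, output_name):
    """Build the /merged-pdf form fields for the given output IDs, in order."""
    fields = []
    for output_id in output_ids:
        fields.append(("id", output_id))
        fields.append(("pages", "1-last"))  # every page of this input
        fields.append(("type", "id"))       # input is referenced by ID
    fields.append(("output", output_name))
    return fields


# Placeholder IDs; in the sample these would be image_id and ppt_id.
fields = build_merge_fields(["image-id", "ppt-id"], "multiple_file_types")
for name, value in fields:
    print(name, value)
```

The returned list can be passed to `MultipartEncoder(fields=...)` exactly as `merge_request_data` is in the sample.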
Beyond the Tutorial
In this tutorial, you learned how to use Python to merge different file types into a single PDF using the pdfRest API. This process can be adapted for various file types supported by the API, providing flexibility and efficiency in document management.
To explore more capabilities of pdfRest, try out the pdfRest API Lab and refer to the API Reference Guide for detailed documentation on all available API tools.