How to Check PDF Conditions and Metadata with PHP

Learn how to use the pdfRest Query PDF API Tool with PHP to check PDF files for conditions and metadata.

Why Use Query PDF with PHP?

The pdfRest Query PDF API Tool is a powerful resource for extracting information from PDF files. It allows users to query various properties of a PDF, such as metadata, page count, whether it contains annotations, and much more. This tutorial will guide you through the process of making an API call to the Query PDF endpoint using PHP.

For example, you could use Query PDF to automatically catalog a large number of PDFs by their metadata, or to check whether every document in a batch complies with PDF standards.
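As a rough illustration of the cataloging use case, the sketch below groups Query PDF results by author once each response has been decoded into a PHP array. It assumes each result contains the `filename` and `author` properties requested in the `queries` field; the `catalogByAuthor` helper and the sample data are hypothetical, not part of the pdfRest API.

```php
<?php
// Hypothetical helper: groups decoded Query PDF results by their 'author' metadata.
// Assumes each result is an associative array with 'filename' and 'author' keys.
function catalogByAuthor(array $results): array
{
    $catalog = [];
    foreach ($results as $result) {
        // Use a placeholder when the author metadata is missing or empty.
        $author = trim((string) ($result['author'] ?? ''));
        if ($author === '') {
            $author = '(unknown)';
        }
        $catalog[$author][] = $result['filename'] ?? '(unnamed)';
    }
    // Sort the catalog alphabetically by author name.
    ksort($catalog);
    return $catalog;
}

// Illustrative sample data standing in for decoded API responses.
$results = [
    ['filename' => 'report.pdf', 'author' => 'Jane Doe'],
    ['filename' => 'memo.pdf', 'author' => ''],
    ['filename' => 'summary.pdf', 'author' => 'Jane Doe'],
];
print_r(catalogByAuthor($results));
```

The same pattern works for any other queried property, such as grouping by `pdf_version` or `doc_language`.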

Query PDF with PHP Code Example

```php
<?php
// Load Guzzle through Composer's autoloader.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Utils;

$client = new Client();

// Authenticate with your pdfRest API key.
$headers = [
  'Api-Key' => 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
];

$options = [
  'multipart' => [
    [
      // The PDF to inspect; Utils::tryFopen throws a RuntimeException if the file cannot be opened.
      'name' => 'file',
      'contents' => Utils::tryFopen('/path/to/file', 'r'),
      'filename' => '/path/to/file',
      'headers' => [
        'Content-Type' => 'application/pdf'
      ]
    ],
    [
      // Comma-separated list of the properties to report on.
      'name' => 'queries',
      'contents' => 'tagged,image_only,title,subject,author,producer,creator,creation_date,modified_date,keywords,doc_language,page_count,contains_annotations,contains_signature,pdf_version,file_size,filename,restrict_permissions_set,contains_xfa,contains_acroforms,contains_javascript,contains_transparency,contains_embedded_file,uses_embedded_fonts,uses_nonembedded_fonts,pdfa,requires_password_to_open,pdfua_claim,pdfe_claim,pdfx_claim'
    ]
  ]
];

// Build a POST request to the Query PDF endpoint.
$request = new Request('POST', 'https://api.pdfrest.com/pdf-info', $headers);

// Send the request and block until the response arrives.
$res = $client->sendAsync($request, $options)->wait();

// Print the query results returned by the API.
echo $res->getBody();
```

Source of the provided code: GitHub

Breaking Down the Code

The code above is a PHP script that uses the Guzzle HTTP client to send a multipart/form-data POST request to the pdfRest API. Here's how each part of the code works:

  • Autoload and Namespaces: The script begins by including the Composer autoload file and importing necessary classes from the Guzzle HTTP client library.
  • Client Initialization: A new Guzzle HTTP client instance is created.
  • Headers: The 'Api-Key' is set in the headers for authentication with the pdfRest API.
  • Options: The 'multipart' array describes the two form fields: the PDF to upload and the queries to run. In the 'file' part, 'contents' uses Utils::tryFopen to open the file for reading, and 'filename' sets the name sent with the upload.
  • Request: A new HTTP POST request is created with the API endpoint URL and the headers.
  • Response: sendAsync() returns a promise, and calling wait() on it blocks until the response arrives, so the call behaves synchronously. The response body, which contains the results of the queries, is then echoed.
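Echoing the body is fine for a quick test, but most scripts will want the results as a PHP array. The sketch below assumes the body is a JSON object keyed by the query names (an assumption; confirm the exact shape in the API Reference Guide). The `decodeQueryResult` helper is hypothetical; in the tutorial script you would pass it `(string) $res->getBody()`.

```php
<?php
// Hypothetical helper: decodes a Query PDF response body into an associative array.
// Assumes the body is a JSON object whose keys are the requested query names.
function decodeQueryResult(string $body): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on malformed JSON
    // instead of silently returning null.
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object in the response body.');
    }
    return $data;
}

// Sample string standing in for (string) $res->getBody().
$info = decodeQueryResult('{"page_count": 12, "title": "Quarterly Report"}');
echo $info['page_count'], PHP_EOL;
```

Throwing on malformed input keeps a bad response from being mistaken for an empty result further down the pipeline.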

The 'queries' field contains a comma-separated list of properties to query about the PDF. These properties are explained in the pdfRest Cloud API Reference Guide.
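You do not have to request every property. A minimal sketch of building the 'queries' value from an array is shown below; the `KNOWN_QUERIES` list simply mirrors the names used in the tutorial request, so treat it as an assumption and check the API Reference Guide for the authoritative set. The `buildQueries` helper is hypothetical.

```php
<?php
// Query names copied from the tutorial request; the API may support others.
const KNOWN_QUERIES = [
    'tagged', 'image_only', 'title', 'subject', 'author', 'producer', 'creator',
    'creation_date', 'modified_date', 'keywords', 'doc_language', 'page_count',
    'contains_annotations', 'contains_signature', 'pdf_version', 'file_size',
    'filename', 'restrict_permissions_set', 'contains_xfa', 'contains_acroforms',
    'contains_javascript', 'contains_transparency', 'contains_embedded_file',
    'uses_embedded_fonts', 'uses_nonembedded_fonts', 'pdfa',
    'requires_password_to_open', 'pdfua_claim', 'pdfe_claim', 'pdfx_claim',
];

// Hypothetical helper: joins query names into the comma-separated 'queries' value,
// rejecting names outside KNOWN_QUERIES and dropping duplicates.
function buildQueries(array $names): string
{
    $unknown = array_diff($names, KNOWN_QUERIES);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown query: ' . implode(', ', $unknown));
    }
    return implode(',', array_unique($names));
}

echo buildQueries(['title', 'page_count', 'pdf_version']), PHP_EOL;
```

The returned string can be used as the 'contents' of the 'queries' multipart field in place of the full list.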

Beyond the Tutorial

In this tutorial, we've learned how to send a multipart/form-data POST request to the pdfRest API to query information about a PDF file using PHP. By executing the code, you can retrieve detailed information about a PDF document, which can be used for various purposes such as document management, compliance checks, or content analysis.
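For a compliance check, one simple approach is to flag a document when any chosen property is true. The sketch below is hypothetical: the choice of risk properties is only an example, and it assumes boolean results may arrive either as JSON booleans or as "true"/"false" strings, which `filter_var` with `FILTER_VALIDATE_BOOLEAN` accepts in both forms.

```php
<?php
// Hypothetical compliance rule: returns the risk properties that are true for a document.
// Missing keys are treated as false.
function needsReview(
    array $info,
    array $riskKeys = ['contains_javascript', 'contains_embedded_file', 'requires_password_to_open']
): array {
    $flags = [];
    foreach ($riskKeys as $key) {
        // FILTER_VALIDATE_BOOLEAN maps true/"true"/"1"/"yes"/"on" to true, everything else to false.
        if (filter_var($info[$key] ?? false, FILTER_VALIDATE_BOOLEAN)) {
            $flags[] = $key;
        }
    }
    return $flags;
}

// Illustrative decoded result for one document.
$info = [
    'contains_javascript' => true,
    'contains_embedded_file' => 'false',
    'requires_password_to_open' => false,
];
print_r(needsReview($info));
```

An empty return value means the document passed this particular rule; a non-empty one lists exactly which properties triggered the flag.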

I encourage you to demo all of the pdfRest API Tools in the API Lab and refer to the API Reference documentation for more details and capabilities.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at GitHub.
