Running QPDF

This chapter describes how to run the qpdf program from the command line.

Basic Invocation

When running qpdf, the basic invocation is as follows:

qpdf [ options ] { infilename | --empty } outfilename

This converts PDF file infilename to PDF file outfilename. The output file is functionally identical to the input file but may have been structurally reorganized. Also, orphaned objects will be removed from the file. Many transformations are available as controlled by the options below. In place of infilename, the parameter --empty may be specified. This causes qpdf to use a dummy input file that contains zero pages. The only normal use case for using --empty would be if you were going to add pages from another source, as discussed in Page Selection Options.

If @filename appears as a word anywhere in the command-line, it will be read line by line, and each line will be treated as a command-line argument. Leading and trailing whitespace is intentionally not removed from lines, which makes it possible to handle arguments that start or end with spaces. The @- option allows arguments to be read from standard input. This allows qpdf to be invoked with an arbitrary number of arbitrarily long arguments. It is also very useful for avoiding having to pass passwords on the command line. Note that the @filename can’t appear in the middle of an argument, so constructs such as --arg=@option will not work. You would have to include the argument and its options together in the arguments file.
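As a rough illustration of the one-argument-per-line format described above, the following Python sketch writes an arguments file without trimming whitespace. The helper names write_args_file and read_args_file are hypothetical and not part of qpdf; the reader only mirrors the description so the round trip can be checked.

```python
def write_args_file(path, args):
    """Write one command-line argument per line, exactly as given."""
    for arg in args:
        # A line break inside an argument cannot be represented in this format.
        if "\n" in arg or "\r" in arg:
            raise ValueError("an argument cannot contain a line break")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for arg in args:
            f.write(arg + "\n")


def read_args_file(path):
    """Read the file back as described: one argument per line, whitespace kept."""
    with open(path, encoding="utf-8", newline="") as f:
        lines = f.read().split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    return lines
```

A file written this way could then be passed as qpdf @args.txt, with the whole --password=... option on one line of the file.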

outfilename does not have to be seekable, even when generating linearized files. Specifying “-” as outfilename means to write to standard output. If you want to overwrite the input file with the output, use the option --replace-input and omit the output file name. You can’t specify the same file as both the input and the output. If you do this, qpdf will tell you about the --replace-input option.

Most options require an output file, but some testing or inspection commands do not. These are specifically noted.

Exit Status

The exit status of qpdf may be interpreted as follows:

  • 0: no errors or warnings were found. The file may still have problems qpdf can’t detect. If --warning-exit-0 was specified, exit status 0 is used even if there are warnings.

  • 2: errors were found. qpdf was not able to fully process the file.

  • 3: qpdf encountered problems that it was able to recover from. In some cases, the resulting file may still be damaged. Note that qpdf still exits with status 3 if it finds warnings even when --no-warn is specified. With --warning-exit-0, warnings without errors exit with status 0 instead of 3.

Note that qpdf never exits with status 1. If you get an exit status of 1, it was something else, like the shell not being able to find or execute qpdf.
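A script that wraps qpdf might translate the exit status along these lines. This is a minimal sketch of the table above; the function name describe_exit_status and the wording of the messages are illustrative assumptions.

```python
def describe_exit_status(status, warning_exit_0=False):
    """Map a qpdf exit status to a short description, per the list above."""
    if status == 0:
        if warning_exit_0:
            # With --warning-exit-0, warnings alone no longer change the status.
            return "no errors (warnings, if any, did not affect the status)"
        return "no errors or warnings found"
    if status == 2:
        return "errors found: qpdf could not fully process the file"
    if status == 3:
        return "warnings: qpdf recovered, but the output may still be damaged"
    if status == 1:
        # qpdf itself never uses 1, so something else produced it.
        return "not from qpdf: the shell may have failed to find or run it"
    return "status not described in this chapter"
```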

Shell Completion

Starting in qpdf version 8.3.0, qpdf provides its own completion support for zsh and bash. You can enable bash completion with eval $(qpdf --completion-bash) and zsh completion with eval $(qpdf --completion-zsh). If qpdf is not in your path, you should invoke it above with an absolute path. If you invoke it with a relative path, it will warn you, and the completion won’t work if you’re in a different directory.

qpdf will use argv[0] to figure out where its executable is. This may produce unwanted results in some cases, especially if you are trying to use completion with a copy of qpdf that is built from source. You can specify a full path to the qpdf you want to use for completion in the QPDF_EXECUTABLE environment variable.

Basic Options

The following options are the most common ones and perform commonly needed transformations.

--help

Display command-line invocation help.

--version

Display the current version of qpdf.

--copyright

Show detailed copyright information.

--show-crypto

Show a list of available crypto providers, each on a line by itself. The default provider is always listed first. See Crypto Providers for more information about crypto providers.

--completion-bash

Output a completion command you can eval to enable shell completion from bash.

--completion-zsh

Output a completion command you can eval to enable shell completion from zsh.

--password=password

Specifies a password for accessing encrypted files. To read the password from a file or standard input, you can use --password-file, added in qpdf 10.2. Note that you can also use @filename or @- as described above to put the password in a file or pass it via standard input, but you would do so by specifying the entire --password=password option in the file. Syntax such as --password=@filename won’t work since @filename is not recognized in the middle of an argument.

--password-file=filename

Reads the first line from the specified file and uses it as the password for accessing encrypted files. filename may be - to read the password from standard input. Note that, in this case, the password is echoed and there is no prompt, so use with caution.

--is-encrypted

Silently exit with status 0 if the file is encrypted or status 2 if the file is not encrypted. This is useful for shell scripts. Other options are ignored if this is given. This option is mutually exclusive with --requires-password. Both this option and --requires-password exit with status 2 for non-encrypted files.

--requires-password

Silently exit with status 0 if a password (other than as supplied) is required. Exit with status 2 if the file is not encrypted. Exit with status 3 if the file is encrypted but requires no password or the correct password has been supplied. This is useful for shell scripts. Note that any supplied password is used when opening the file. When used with a --password option, this option can be used to check the correctness of the password. In that case, an exit status of 3 means the file works with the supplied password. This option is mutually exclusive with --is-encrypted. Both this option and --is-encrypted exit with status 2 for non-encrypted files.
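Because --is-encrypted and --requires-password give the exit status a different meaning from the usual one, a script may want a small lookup table. The sketch below restates the two descriptions above; the dictionary and function names are hypothetical.

```python
# Meanings of the exit status for the two inspection options described above.
IS_ENCRYPTED = {
    0: "file is encrypted",
    2: "file is not encrypted",
}
REQUIRES_PASSWORD = {
    0: "a password (other than any supplied) is required",
    2: "file is not encrypted",
    3: "file is encrypted, but no password is needed or the supplied one works",
}


def interpret_check(option, status):
    """Return the documented meaning of `status` for the given option."""
    tables = {
        "--is-encrypted": IS_ENCRYPTED,
        "--requires-password": REQUIRES_PASSWORD,
    }
    if option not in tables:
        raise ValueError("unsupported option: " + option)
    return tables[option].get(status, "status not described for this option")
```

For example, running qpdf with --requires-password and --password=... and getting status 3 would indicate that the supplied password opens the file.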

--verbose

Increase verbosity of output. For now, this just prints some indication of any file that it creates.

--progress

Indicate progress while writing files.

--no-warn

Suppress writing of warnings to stderr. If warnings were detected and suppressed, qpdf will still exit with exit code 3. See also --warning-exit-0.

--warning-exit-0

If warnings are found but no errors, exit with exit code 0 instead of 3. When combined with --no-warn, the effect is for qpdf to completely ignore warnings.

--linearize

Causes generation of a linearized (web-optimized) output file.

--replace-input

If specified, the output file name should be omitted. This option tells qpdf to replace the input file with the output. It does this by writing to infilename.~qpdf-temp# and, when done, overwriting the input file with the temporary file. If there were any warnings, the original input is saved as infilename.~qpdf-orig.

--copy-encryption=file

Encrypt the file using the same encryption parameters, including user and owner password, as the specified file. Use --encryption-file-password to specify a password if one is needed to open this file. Note that copying the encryption parameters from a file also copies the first half of /ID from the file since this is part of the encryption parameters.

--encryption-file-password=password

If the file specified with --copy-encryption requires a password, specify the password using this option. Note that only one of the user or owner password is required. Both passwords will be preserved since QPDF does not distinguish between the two passwords. It is possible to preserve encryption parameters, including the owner password, from a file even if you don’t know the file’s owner password.

--allow-weak-crypto

Starting with version 10.4, qpdf issues warnings when requested to create files using RC4 encryption. This option suppresses those warnings. In future versions of qpdf, qpdf will refuse to create files with weak cryptography when this flag is not given. See Weak Cryptography for additional details.

--encrypt options --

Causes generation of an encrypted output file. Please see Encryption Options for details on how to specify encryption parameters.

--decrypt

Removes any encryption on the file. A password must be supplied if the file is password protected.

--password-is-hex-key

Overrides the usual computation/retrieval of the PDF file’s encryption key from user/owner password with an explicit specification of the encryption key. When this option is specified, the argument to the --password option is interpreted as a hexadecimal-encoded key value. This only applies to the password used to open the main input file. It does not apply to other files opened by --pages or other options or to files being written.

Most users will never have a need for this option, and no standard viewers support this mode of operation, but it can be useful for forensic or investigatory purposes. For example, if a PDF file is encrypted with an unknown password, a brute-force attack using the key directly is sometimes more efficient than one using the password. Also, if a file is heavily damaged, it may be possible to derive the encryption key and recover parts of the file using it directly. To expose the encryption key used by an encrypted file that you can open normally, use the --show-encryption-key option.

--suppress-password-recovery

Ordinarily, qpdf attempts to automatically compensate for passwords specified in the wrong character encoding. This option suppresses that behavior. Under normal conditions, there are no reasons to use this option. See Unicode Passwords for a discussion.

--password-mode=mode

This option can be used to fine-tune how qpdf interprets Unicode (non-ASCII) password strings passed on the command line. With the exception of the hex-bytes mode, these only apply to passwords provided when encrypting files. The hex-bytes mode also applies to passwords specified for reading files. For additional discussion of the supported password modes and when you might want to use them, see Unicode Passwords. The following modes are supported:

  • auto: Automatically determine whether the specified password is a properly encoded Unicode (UTF-8) string, and transcode it as required by the PDF spec based on the type of encryption being applied. On Windows starting with version 8.4.0, and on almost all other modern platforms, incoming passwords will be properly encoded in UTF-8, so this is almost always what you want.

  • unicode: Tells qpdf that the incoming password is UTF-8, overriding whatever its automatic detection determines. The only difference between this mode and auto is that qpdf will fail with an error message if the password is not valid UTF-8 instead of falling back to bytes mode with a warning.

  • bytes: Interpret the password as a literal byte string. For non-Windows platforms, this is what versions of qpdf prior to 8.4.0 did. For Windows platforms, there is no way to specify strings of binary data on the command line directly, but you can use the @filename option to do it, in which case this option forces qpdf to respect the string of bytes as provided. This option will allow you to encrypt PDF files with passwords that will not be usable by other readers.

  • hex-bytes: Interpret the password as a hex-encoded string. This provides a way to pass binary data as a password on all platforms including Windows. As with bytes, this option may allow creation of files that can’t be opened by other readers. This mode affects qpdf’s interpretation of passwords specified for decrypting files as well as for encrypting them. It makes it possible to specify strings that are encoded in some manner other than the system’s default encoding.

--rotate=[+|-]angle[:page-range]

Apply rotation to specified pages. The page-range portion of the option value has the same format as page ranges in Page Selection Options. If the page range is omitted, the rotation is applied to all pages. The angle portion of the parameter may be either 0, 90, 180, or 270. If preceded by + or -, the angle is added to or subtracted from the specified pages’ original rotations. This is almost always what you want. Otherwise the pages’ rotations are set to the exact value, which may cause the appearances of the pages to be inconsistent, especially for scans. For example, the command qpdf in.pdf out.pdf --rotate=+90:2,4,6 --rotate=180:7-8 would rotate pages 2, 4, and 6 90 degrees clockwise from their original rotation and force the rotation of pages 7 through 8 to 180 degrees regardless of their original rotation, and the command qpdf in.pdf out.pdf --rotate=+180 would rotate all pages by 180 degrees.
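The difference between a relative angle (+90, -90) and an absolute one (180) can be sketched as simple arithmetic. The function below only models the description above and is not qpdf's implementation; the name new_rotation is an assumption.

```python
def new_rotation(original, angle_spec):
    """Return a page's rotation after applying the angle part of --rotate.

    `original` is the page's current rotation in degrees; `angle_spec` is a
    string such as '+90', '-90', or '180'.
    """
    relative = angle_spec.startswith(("+", "-"))
    angle = int(angle_spec)  # int() accepts a leading sign
    if abs(angle) not in (0, 90, 180, 270):
        raise ValueError("angle must be 0, 90, 180, or 270")
    if relative:
        # Added to or subtracted from the original rotation.
        return (original + angle) % 360
    # Without a sign, the rotation is set to the exact value.
    return angle
```

Under this model, a page scanned with rotation 90 ends up at 180 with either +90 or 180, while a page at 0 ends up at 90 or 180 respectively, which is why the relative form is usually preferable for mixed scans.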

--keep-files-open=[yn]

This option controls whether qpdf keeps individual files open while merging. Prior to version 8.1.0, qpdf always kept all files open, but this meant that the number of files that could be merged was limited by the operating system’s open file limit. Version 8.1.0 opened files as they were referenced and closed them after each read, but this caused a major performance impact. Version 8.2.0 optimized the performance but did so in a way that, for local file systems, there was a small but unavoidable performance hit, but for networked file systems, the performance impact could be very high. Starting with version 8.2.1, the default behavior is that files are kept open if no more than 200 files are specified, but this default behavior can be explicitly overridden with the --keep-files-open flag. If you are merging more than 200 files but less than the operating system’s max open files limit, you may want to use --keep-files-open=y, especially if working over a networked file system. If you are using a local file system where the overhead is low and you might sometimes merge more than the OS limit’s number of files from a script and are not worried about a few seconds additional processing time, you may want to specify --keep-files-open=n. The threshold for switching may be changed from the default 200 with the --keep-files-open-threshold option.

--keep-files-open-threshold=count

If specified, overrides the default value of 200 used as the threshold for qpdf deciding whether or not to keep files open. See --keep-files-open for details.
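The default decision described for versions 8.2.1 and later can be summarized in a few lines. This is a sketch of the rule as stated here, with a hypothetical function name; it is not taken from qpdf's source.

```python
def keeps_files_open(num_files, keep_files_open=None, threshold=200):
    """Model whether files stay open while merging.

    `keep_files_open` is 'y', 'n', or None when --keep-files-open is absent;
    `threshold` corresponds to --keep-files-open-threshold.
    """
    if keep_files_open is not None:
        # An explicit --keep-files-open overrides the default behavior.
        return keep_files_open == "y"
    # Default: keep files open if no more than `threshold` files are given.
    return num_files <= threshold
```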

--pages options --

Select specific pages from one or more input files. See Page Selection Options for details on how to do page selection (splitting and merging).

--collate=n

When specified, collate rather than concatenate pages from files specified with --pages. With a numeric argument, collate in groups of n. The default is 1. See Page Selection Options for additional details.

--flatten-rotation

For each page that is rotated using the /Rotate key in the page’s dictionary, remove the /Rotate key and implement the identical rotation semantics by modifying the page’s contents. This option can be useful to prepare files for buggy PDF applications that don’t properly handle rotated pages.

--split-pages=[n]

Write each group of n pages to a separate output file. If n is not specified, create single pages. Output file names are generated as follows:

  • If the string %d appears in the output file name, it is replaced with a range of zero-padded page numbers starting from 1.

  • Otherwise, if the output file name ends in .pdf (case insensitive), a zero-padded page range, preceded by a dash, is inserted before the file extension.

  • Otherwise, the file name is appended with a zero-padded page range preceded by a dash.

Page ranges are a single number in the case of single-page groups or two numbers separated by a dash otherwise. For example, if infile.pdf has 12 pages:

  • qpdf --split-pages infile.pdf %d-out would generate files 01-out through 12-out

  • qpdf --split-pages=2 infile.pdf outfile.pdf would generate files outfile-01-02.pdf through outfile-11-12.pdf

  • qpdf --split-pages infile.pdf something.else would generate files something.else-01 through something.else-12

Note that outlines, threads, and other global features of the original PDF file are not preserved. For each page of output, this option creates an empty PDF and copies a single page from the output into it. If you require the global data, you will have to run qpdf with the --pages option once for each file. Using --split-pages is much faster if you don’t require the global data.
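The naming rules for --split-pages can be modeled with the sketch below, which reproduces the three examples above. Two details are assumptions rather than statements from this chapter: the padding width is taken from the number of digits in the total page count, and a dash-separated range is used whenever n is greater than 1. The function name is hypothetical.

```python
def split_output_names(outfilename, num_pages, n=1):
    """Return the output file names --split-pages[=n] would be expected to use."""
    width = len(str(num_pages))  # assumption: pad to the widest page number
    names = []
    for first in range(1, num_pages + 1, n):
        last = min(first + n - 1, num_pages)
        page_range = str(first).zfill(width)
        if n > 1:  # assumption: groups always use first-last when n > 1
            page_range += "-" + str(last).zfill(width)
        if "%d" in outfilename:
            # Rule 1: %d is replaced with the page range.
            name = outfilename.replace("%d", page_range, 1)
        elif outfilename.lower().endswith(".pdf"):
            # Rule 2: insert before the extension, preceded by a dash.
            name = outfilename[:-4] + "-" + page_range + outfilename[-4:]
        else:
            # Rule 3: append, preceded by a dash.
            name = outfilename + "-" + page_range
        names.append(name)
    return names
```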

--overlay options --

Overlay pages from another file onto the output pages. See Overlay and Underlay Options for details on overlay/underlay.

--underlay options --

Underlay pages from another file beneath the output pages. See Overlay and Underlay Options for details on overlay/underlay.

Password-protected files may be opened by specifying a password. By default, qpdf will preserve any encryption data associated with a file. If --decrypt is specified, qpdf will attempt to remove any encryption information. If --encrypt is specified, qpdf will replace the document’s encryption parameters with whatever is specified.

Note that qpdf does not obey encryption restrictions already imposed on the file. Doing so would be meaningless since qpdf can be used to remove encryption from the file entirely. This functionality is not intended to be used for bypassing copyright restrictions or other restrictions placed on files by their producers.

Prior to 8.4.0, in the case of passwords that contain characters that fall outside of 7-bit US-ASCII, qpdf left the burden of supplying properly encoded encryption and decryption passwords to the user. Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For an in-depth discussion, please see Unicode Passwords. Previous versions of this manual described workarounds using the iconv command. Such workarounds are no longer required or recommended with qpdf 8.4.0. However, for backward compatibility, qpdf attempts to detect those workarounds and do the right thing in most cases.

Encryption Options

To change the encryption parameters of a file, use the --encrypt flag. The syntax is

--encrypt user-password owner-password key-length [ restrictions ] --

Note that “--” terminates parsing of encryption flags and must be present even if no restrictions are present.

Either or both of the user password and the owner password may be empty strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation of PDF files with a non-empty user password, an empty owner password, and a 256-bit key since such files can be opened with no password. If you want to create such files, specify the encryption option --allow-insecure, as described below.

The value for key-length may be 40, 128, or 256. The restriction flags are dependent upon key length. When no additional restrictions are given, the default is to be fully permissive.
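When qpdf is driven from a script, the --encrypt group can be assembled as a list of arguments. The sketch below follows the syntax shown above and checks the combination that qpdf 10.2 and later refuse unless the --allow-insecure encryption option is given; the function name encrypt_args is hypothetical.

```python
def encrypt_args(user_password, owner_password, key_length, restrictions=()):
    """Build the '--encrypt ... --' portion of a qpdf command line."""
    if key_length not in (40, 128, 256):
        raise ValueError("key-length must be 40, 128, or 256")
    restrictions = list(restrictions)
    # Non-empty user password + empty owner password + 256-bit key yields a
    # file that can be opened with no password.
    insecure = user_password != "" and owner_password == "" and key_length == 256
    if insecure and "--allow-insecure" not in restrictions:
        raise ValueError("this combination requires --allow-insecure")
    # The closing "--" is always required, even with no restrictions.
    return ["--encrypt", user_password, owner_password, str(key_length),
            *restrictions, "--"]
```

Passing the passwords as separate list elements (for example to subprocess.run) lets empty strings survive as real arguments, which is easy to get wrong with shell quoting.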

If key-length is 40, the following restriction options are available:

--print=[yn]

Determines whether or not to allow printing.

--modify=[yn]

Determines whether or not to allow document modification.

--extract=[yn]

Determines whether or not to allow text/image extraction.

--annotate=[yn]

Determines whether or not to allow comments and form fill-in and signing.

If key-length is 128, the following restriction options are available:

--accessibility=[yn]

Determines whether or not to allow accessibility to visually impaired. The qpdf library disregards this field when AES is used or when 256-bit encryption is used. You should really never disable accessibility, but qpdf lets you do it in case you need to configure a file this way for testing purposes. The PDF spec says that conforming readers should disregard this permission and always allow accessibility.

--extract=[yn]

Determines whether or not to allow text/graphic extraction.

--assemble=[yn]

Determines whether document assembly (rotation and reordering of pages) is allowed.

--annotate=[yn]

Determines whether modifying annotations is allowed. This includes adding comments and filling in form fields. Also allows editing of form fields if --modify-other=y is given.

--form=[yn]

Determines whether filling form fields is allowed.

--modify-other=[yn]

Allow all document editing except those controlled separately by the --assemble, --annotate, and --form options.

--print=print-opt

Controls printing access. print-opt may be one of the following:

  • full: allow full printing

  • low: allow low-resolution printing only

  • none: disallow printing

--modify=modify-opt

Controls modify access. This way of controlling modify access has less granularity than new options added in qpdf 8.4. modify-opt may be one of the following:

  • all: allow full document modification

  • annotate: allow comment authoring, form operations, and document assembly

  • form: allow form field fill-in and signing and document assembly

  • assembly: allow document assembly only

  • none: allow no modifications

Using the --modify option does not allow you to create certain combinations of permissions such as allowing form filling but not allowing document assembly. Starting with qpdf 8.4, you can either just use the other options to control fields individually, or you can use something like --modify=form --assemble=n to fine-tune.
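One way to picture how the --modify levels relate to the individual options is as nested sets of permissions. The mapping below is an interpretation of the list above, not something taken from qpdf's source, and the names MODIFY_LEVELS and effective_permissions are hypothetical.

```python
# Assumed correspondence between --modify levels and the individual options
# --modify-other, --annotate, --form, and --assemble.
MODIFY_LEVELS = {
    "all": {"modify-other", "annotate", "form", "assemble"},
    "annotate": {"annotate", "form", "assemble"},
    "form": {"form", "assemble"},
    "assembly": {"assemble"},
    "none": set(),
}


def effective_permissions(modify_opt, overrides=None):
    """Apply individual y/n overrides on top of a --modify level.

    `overrides` maps an option name such as 'assemble' to 'y' or 'n'.
    """
    allowed = set(MODIFY_LEVELS[modify_opt])
    for name, value in (overrides or {}).items():
        if value == "y":
            allowed.add(name)
        else:
            allowed.discard(name)
    return allowed
```

Under this reading, --modify=form combined with --assemble=n leaves only form filling allowed, which is the combination --modify alone cannot express.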

--cleartext-metadata

If specified, any metadata stream in the document will be left unencrypted even if the rest of the document is encrypted. This also forces the PDF version to be at least 1.5.

--use-aes=[yn]

If --use-aes=y is specified, AES encryption will be used instead of RC4 encryption. This forces the PDF version to be at least 1.6.

--allow-insecure

From qpdf 10.2, qpdf defaults to not allowing creation of PDF files where the user password is non-empty, the owner password is empty, and a 256-bit key is in use. Files created in this way are insecure since they can be opened without a password. Users would ordinarily never want to create such files. If you are using qpdf to intentionally create strange files for testing (a definite valid use of qpdf!), this option allows you to create such insecure files.

--force-V4

Use of this option forces the /V and /R parameters in the document’s encryption dictionary to be set to the value 4. As qpdf will automatically do this when required, there is no reason to ever use this option. It exists primarily for use in testing qpdf itself. This option also forces the PDF version to be at least 1.5.

If key-length is 256, the minimum PDF version is 1.7 with extension level 8, and the AES-based encryption format used is the PDF 2.0 encryption method supported by Acrobat X. The same options are available as with 128 bits with the following exceptions:

--use-aes

This option is not available with 256-bit keys. AES is always used with 256-bit encryption keys.

--force-V4

This option is not available with 256-bit keys.

--force-R5

If specified, qpdf sets the minimum version to 1.7 at extension level 3 and writes the deprecated encryption format used by Acrobat version IX. This option should not be used in practice to generate PDF files that will be in general use, but it can be useful to generate files if you are trying to test proper support in another application for PDF files encrypted in this way.

The default for each permission option is to be fully permissive.

Page Selection Options

Starting with qpdf 3.0, it is possible to split and merge PDF files by selecting pages from one or more input files. Whatever file is given as the primary input file is used as the starting point, but its pages are replaced with pages as specified.

--pages input-file [ --password=password ] [ page-range ] [ ... ] --

Multiple input files may be specified. Each one is given as the name of the input file, an optional password (if required to open the file), and the range of pages. Note that “--” terminates parsing of page selection flags.

Starting with qpdf 8.4, the special input file name “.” can be used as a shortcut for the primary input filename.

For each file that pages should be taken from, specify the file, a password needed to open the file (if any), and a page range. The password needs to be given only once per file. If any of the input files are the same as the primary input file or the file used to copy encryption parameters (if specified), you do not need to repeat the password here. The same file can be repeated multiple times. If a file that is repeated has a password, the password only has to be given the first time. All non-page data (info, outlines, page numbers, etc.) are taken from the primary input file. To discard these, use --empty as the primary input.

Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf sees a value in the place where it expects a page range and that value is not a valid range but is a valid file name, qpdf will implicitly use the range 1-z, meaning that it will include all pages in the file. This makes it possible to easily combine all pages in a set of files with a command like qpdf --empty out.pdf --pages *.pdf --.
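The shell expands *.pdf before qpdf sees it, so a script that does not go through a shell has to expand the pattern itself. The sketch below builds the equivalent argument list; the function name is hypothetical, and skipping the output file when it matches the pattern is a design choice of this example rather than qpdf behavior.

```python
import glob
import os


def merge_all_command(pattern, output):
    """Build an argument list equivalent to: qpdf --empty OUT --pages *.pdf --"""
    inputs = [
        path for path in sorted(glob.glob(pattern))
        # Avoid feeding a previous output file back in as an input.
        if os.path.abspath(path) != os.path.abspath(output)
    ]
    # No page ranges are given, so each file contributes all of its pages.
    return ["qpdf", "--empty", output, "--pages", *inputs, "--"]
```

The resulting list could be passed to subprocess.run. A file whose name happens to look like a valid page range would still be ambiguous, so such names are best avoided.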

The page range is a set of numbers separated by commas, ranges of numbers separated by dashes, or combinations of those. The character “z” represents the last page. A number preceded by an “r” indicates to count from the end, so r3-r1 would be the last three pages of the document. Pages can appear in any order. Ranges can appear with a high number followed by a low number, which causes the pages to appear in reverse. Numbers may be repeated in a page range. A page range may be optionally appended with :even or :odd to indicate only the even or odd pages in the given range. Note that even and odd refer to the positions within the specified range, not whether the original number is even or odd.

Example page ranges:

  • 1,3,5-9,15-12: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in that order.

  • z-1: all pages in the document in reverse

  • r3-r1: the last three pages of the document

  • r1-r3: the last three pages of the document in reverse order

  • 1-20:even: even pages from 2 to 20

  • 5,7-9,12:odd: pages 5, 8, and 12, which are the pages in odd positions from among the original range, which represents pages 5, 7, 8, 9, and 12.
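The page range syntax can be made concrete with a small parser. The sketch below implements the rules as described in this section (numbers, z, r-prefixed numbers, forward and reverse ranges, and the :even and :odd suffixes) and reproduces the example ranges; it is an illustration, not qpdf's own parser, and the function name is hypothetical.

```python
def parse_page_range(spec, num_pages):
    """Return the 1-based page numbers selected by `spec`, in order."""
    parity = None
    if spec.endswith(":even"):
        spec, parity = spec[:-5], "even"
    elif spec.endswith(":odd"):
        spec, parity = spec[:-4], "odd"

    def page(token):
        if token == "z":
            number = num_pages                      # last page
        elif token.startswith("r"):
            number = num_pages + 1 - int(token[1:])  # count from the end
        else:
            number = int(token)
        if not 1 <= number <= num_pages:
            raise ValueError("page out of range: " + token)
        return number

    pages = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (page(t) for t in part.split("-"))
            step = 1 if first <= last else -1       # high-low means reverse
            pages.extend(range(first, last + step, step))
        else:
            pages.append(page(part))

    # :odd and :even select by position within the range, not by page number.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    return pages
```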

Starting in qpdf version 8.3, you can specify the --collate option. Note that this option is specified outside of --pages ... --. When --collate is specified, it changes the meaning of --pages so that the specified files, as modified by page ranges, are collated rather than concatenated. For example, if you add the files odd.pdf and even.pdf containing odd and even pages of a document respectively, you could run qpdf --collate odd.pdf --pages odd.pdf even.pdf -- all.pdf to collate the pages. This would pick page 1 from odd, page 1 from even, page 2 from odd, page 2 from even, etc. until all pages have been included. Any number of files and page ranges can be specified. If any file has fewer pages, that file is just skipped when its pages have all been included. For example, if you ran qpdf --collate --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf r1 -- out.pdf, you would get the following pages in this order:

  • a.pdf page 1

  • b.pdf page 6

  • c.pdf last page

  • a.pdf page 2

  • b.pdf page 5

  • a.pdf page 3

  • b.pdf page 4

  • a.pdf page 4

  • a.pdf page 5

Starting in qpdf version 10.2, you may specify a numeric argument to --collate. With --collate=n, pull groups of n pages from each file, again, stopping when there are no more pages. For example, if you ran qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf r1 -- out.pdf, you would get the following pages in this order:

  • a.pdf page 1

  • a.pdf page 2

  • b.pdf page 6

  • b.pdf page 5

  • c.pdf last page

  • a.pdf page 3

  • a.pdf page 4

  • b.pdf page 4

  • a.pdf page 5
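The collation order in both examples can be reproduced with a short round-robin loop. This sketch models the behavior described above for --collate and --collate=n; the function name and the use of (file name, page list) pairs are assumptions made for illustration.

```python
def collate(page_lists, n=1):
    """Interleave pages from several sources in groups of n.

    `page_lists` is a list of (name, pages) pairs, where `pages` is the
    already-expanded page range for that file.
    """
    positions = [0] * len(page_lists)
    result = []
    # Keep cycling through the files until every one is exhausted.
    while any(pos < len(pages)
              for pos, (_, pages) in zip(positions, page_lists)):
        for i, (name, pages) in enumerate(page_lists):
            chunk = pages[positions[i]:positions[i] + n]
            # A file with no pages left contributes nothing and is skipped.
            result.extend((name, p) for p in chunk)
            positions[i] += len(chunk)
    return result
```

With inputs a.pdf 1-5, b.pdf 6-4, and c.pdf r1, calling this with n=1 and n=2 yields the two orderings listed above.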

Starting in qpdf version 8.3, when you split and merge files, any page labels (page numbers) are preserved in the final file. It is expected that more document features will be preserved by splitting and merging. In the meantime, semantics of splitting and merging vary across features. For example, the document’s outlines (bookmarks) point to actual page objects, so if you select some pages and not others, bookmarks that point to pages that are in the output file will work, and remaining bookmarks will not work. A future version of qpdf may do a better job at handling these issues. (Note that the qpdf library already contains all of the APIs required in order to implement this in your own application if you need it.) In the meantime, you can always use --empty as the primary input file to avoid copying all of that from the first file. For example, to take pages 1 through 5 from infile.pdf while preserving all metadata associated with that file, you could use

qpdf infile.pdf --pages . 1-5 -- outfile.pdf

If you wanted pages 1 through 5 from infile.pdf but you wanted the rest of the metadata to be dropped, you could instead run

qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf

If you wanted to take pages 1 through 5 from file1.pdf and pages 11 through 15 from file2.pdf in reverse, taking document-level metadata from file2.pdf, you would run

qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf

If, for some reason, you wanted to take the first page of an encrypted file called encrypted.pdf with password pass and repeat it twice in an output file, and if you wanted to drop document-level metadata but preserve encryption, you would use

qpdf --empty --copy-encryption=encrypted.pdf \
     --encryption-file-password=pass \
     --pages encrypted.pdf --password=pass 1 \
           ./encrypted.pdf --password=pass 1 -- \
     outfile.pdf

Note that we had to specify the password all three times because giving a password as --encryption-file-password doesn’t count for page selection, and as far as qpdf is concerned, encrypted.pdf and ./encrypted.pdf are separate files. These are all corner cases that most users should hopefully never have to be bothered with.

Prior to version 8.4, it was not possible to specify the same page from the same file directly more than once, and the workaround of specifying the same file in more than one way was required. Version 8.4 removes this limitation, but there is still a valid use case. When you specify the same page from the same file more than once, qpdf will share objects between the pages. If you are going to do further manipulation on the file and need the two instances of the same original page to be deep copies, then you can specify the file in two different ways. For example qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf would create a file with two copies of the first page of the input, and the two copies would not share any objects. This includes fonts, images, and anything else the page references.

Overlay and Underlay Options

Starting with qpdf 8.4, it is possible to overlay or underlay pages from other files onto the output generated by qpdf. Specify overlay or underlay as follows:

{ --overlay | --underlay } file [ options ] --

Overlay and underlay options are processed late, so they can be combined with other options like merging and will apply to the final output. The --overlay and --underlay options work the same way, except underlay pages are drawn underneath the page to which they are applied, possibly obscured by the original page, and overlay files are drawn on top of the page to which they are applied, possibly obscuring the page. You can combine overlay and underlay.

The default behavior of overlay and underlay is that pages are taken from the overlay/underlay file in sequence and applied to corresponding pages in the output until there are no more output pages. If the overlay or underlay file runs out of pages, remaining output pages are left alone. This behavior can be modified by options, which are provided between the --overlay or --underlay flag and the -- option. The following options are supported:

  • --password=password: supply a password if the overlay/underlay file is encrypted.

  • --to=page-range: a range of pages in the same format described in Page Selection Options indicates which pages in the output should have the overlay/underlay applied. If not specified, overlay/underlay are applied to all pages.

  • --from=[page-range]: a range of pages that specifies which pages in the overlay/underlay file will be used for overlay or underlay. If not specified, all pages will be used. This can be explicitly specified to be empty if --repeat is used.

  • --repeat=page-range: an optional range of pages that specifies which pages in the overlay/underlay file will be repeated after the “from” pages are used up. If you want to repeat a range of pages starting at the beginning, you can explicitly use --from=.

Here are some examples.

  • --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --: overlay the first three pages from file o.pdf onto the first three pages of the output, then overlay page 4 from o.pdf onto pages 4 and 5 of the output. Leave remaining output pages untouched.

  • --underlay footer.pdf --from= --repeat=1,2 --: Underlay page 1 of footer.pdf on all odd output pages, and underlay page 2 of footer.pdf on all even output pages.
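The way --to, --from, and --repeat interact can be modeled in a few lines. The following Python sketch is only an illustration of the behavior described above, not qpdf's implementation; the function name map_overlay_pages and its arguments are hypothetical, and page ranges are assumed to have already been expanded into lists of page numbers.

```python
from itertools import chain, cycle


def map_overlay_pages(to_pages, from_pages, repeat_pages=()):
    """Return {output_page: overlay_page} for one overlay/underlay spec.

    "from" pages are consumed in order. Once they are used up, "repeat"
    pages are cycled. With no repeat pages, the remaining output pages
    get no entry, which models "left alone".
    """
    if repeat_pages:
        source = chain(from_pages, cycle(repeat_pages))
    else:
        source = iter(from_pages)
    # zip stops at the end of to_pages or when the overlay pages run out.
    return dict(zip(to_pages, source))


# --to=1-5 --from=1-3 --repeat=4
first = map_overlay_pages(range(1, 6), [1, 2, 3], [4])

# --from= --repeat=1,2 applied to a six-page output
second = map_overlay_pages(range(1, 7), [], [1, 2])
```

In this model, first maps output pages 4 and 5 to overlay page 4, and second alternates overlay pages 1 and 2 across odd and even output pages, matching the two examples above.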

Embedded Files/Attachments Options

Starting with qpdf 10.2, you can work with file attachments in PDF files from the command line. The following options are available:

--list-attachments

Show the “key” and stream number for embedded files. With --verbose, additional information, including preferred file name, description, dates, and more, is also displayed. The key is usually but not always equal to the file name, and is needed by some of the other options.

--show-attachment=key

Write the contents of the specified attachment to standard output as binary data. The key should match one of the keys shown by --list-attachments. If specified multiple times, only the last attachment will be shown.

--add-attachment file options --

Add or replace an attachment with the contents of file. This may be specified more than once. The following additional options may appear before the -- that ends this option:

--key=key

The key to use to register the attachment in the embedded files table. Defaults to the last path element of file.

--filename=name

The file name to be used for the attachment. This is what is usually displayed to the user and is the name most graphical PDF viewers will use when saving a file. It defaults to the last path element of file.

--creationdate=date

The attachment’s creation date in PDF format; defaults to the current time. The date format is explained below.

--moddate=date

The attachment’s modification date in PDF format; defaults to the current time. The date format is explained below.

--mimetype=type/subtype

The mime type for the attachment, e.g. text/plain or application/pdf. Note that the mimetype appears in a field called /Subtype in the PDF but actually includes the full type and subtype of the mime type.

--description="text"

Descriptive text for the attachment, displayed by some PDF viewers.

--replace

Indicates that any existing attachment with the same key should be replaced by the new attachment. Otherwise, qpdf gives an error if an attachment with that key is already present.
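When qpdf is driven from a script, it can help to assemble the --add-attachment group programmatically so that the closing -- is never forgotten. The Python sketch below is one possible way to do that; the helper names add_attachment_args and default_key are hypothetical, and only the option names come from the list above. The default_key helper merely mirrors the documented default (the last path element of file).

```python
import os


def default_key(path):
    """Mirror the documented default: the last path element of the file."""
    return os.path.basename(path)


def add_attachment_args(path, key=None, filename=None, mimetype=None,
                        description=None, replace=False):
    """Build the argument list for one --add-attachment ... -- group."""
    args = ["--add-attachment", path]
    if key is not None:
        args.append(f"--key={key}")
    if filename is not None:
        args.append(f"--filename={filename}")
    if mimetype is not None:
        args.append(f"--mimetype={mimetype}")
    if description is not None:
        args.append(f"--description={description}")
    if replace:
        args.append("--replace")
    args.append("--")  # terminates the option group
    return args


group = add_attachment_args("docs/notes.txt", mimetype="text/plain", replace=True)
command = ["qpdf", "in.pdf", *group, "out.pdf"]
```

Passing command to subprocess.run as a list avoids shell quoting issues with descriptions that contain spaces. The file names in.pdf, out.pdf, and docs/notes.txt are placeholders.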

--remove-attachment=key

Remove the specified attachment. This doesn’t only remove the attachment from the embedded files table but also clears out the file specification. That means that any potential internal links to the attachment will be broken. This option may be specified multiple times. Run with --verbose to see status of the removal.

--copy-attachments-from file options --

Copy attachments from another file. This may be specified more than once. The following additional options may appear before the -- that ends this option:

--password=password

If required, the password needed to open file.

--prefix=prefix

Only required if the file from which attachments are being copied has attachments with keys that conflict with attachments already in the file. In this case, the specified prefix will be prepended to each key. This affects only the key in the embedded files table, not the file name. The PDF specification doesn’t preclude multiple attachments having the same file name.

When a date is required, the date should conform to the PDF date format specification, which is D:yyyymmddhhmmss<z>, where <z> is either Z for UTC or a timezone offset in the form -hh'mm' or +hh'mm'. Examples: D:20210207161528-05'00', D:20210207211528Z.
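As an illustration of the date format, the following Python sketch converts a timezone-aware datetime into a string of this form. The function name pdf_date is hypothetical, and the sketch assumes UTC offsets that are whole minutes.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt):
    """Format a timezone-aware datetime as D:yyyymmddhhmmss<z>."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    total_minutes = int(offset.total_seconds()) // 60
    if total_minutes == 0:
        return stamp + "Z"  # UTC
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


eastern = timezone(timedelta(hours=-5))
local_form = pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern))
utc_form = pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc))
```

The two values reproduce the examples given above and could be passed to --creationdate or --moddate.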

Advanced Parsing Options

These options control aspects of how qpdf reads PDF files. Mostly these are of use to people who are working with damaged files. There is little reason to use these options unless you are trying to solve specific problems. The following options are available:

--suppress-recovery

Prevents qpdf from attempting to recover damaged files.

--ignore-xref-streams

Tells qpdf to ignore any cross-reference streams.

Ordinarily, qpdf will attempt to recover from certain types of errors in PDF files. These include errors in the cross-reference table, certain types of object numbering errors, and certain types of stream length errors. Sometimes, qpdf may think it has recovered but may not have actually recovered, so care should be taken when using this option as some data loss is possible. The --suppress-recovery option will prevent qpdf from attempting recovery. In this case, it will fail on the first error that it encounters.

Ordinarily, qpdf reads cross-reference streams when they are present in a PDF file. If --ignore-xref-streams is specified, qpdf will ignore any cross-reference streams for hybrid PDF files. The purpose of hybrid files is to make some content available to viewers that are not aware of cross-reference streams. It is almost never desirable to ignore them. The only time when you might want to use this feature is if you are testing creation of hybrid PDF files and wish to see how a PDF consumer that doesn’t understand object and cross-reference streams would interpret such a file.

Advanced Transformation Options

These transformation options control fine points of how qpdf creates the output file. Mostly these are of use only to people who are very familiar with the PDF file format or who are PDF developers. The following options are available:

--compress-streams=[yn]

By default, or with --compress-streams=y, qpdf will compress any stream with no other filters applied to it with the /FlateDecode filter when it writes it. To suppress this behavior and preserve uncompressed streams as uncompressed, use --compress-streams=n.

--decode-level=option

Controls which streams qpdf tries to decode. The default is generalized. The following options are available:

  • none: do not attempt to decode any streams

  • generalized: decode streams filtered with supported generalized filters: /LZWDecode, /FlateDecode, /ASCII85Decode, and /ASCIIHexDecode. We define generalized filters as those to be used for general-purpose compression or encoding, as opposed to filters specifically designed for image data. Note that, by default, streams already compressed with /FlateDecode are not uncompressed and recompressed unless you also specify --recompress-flate.

  • specialized: in addition to generalized, decode streams with supported non-lossy specialized filters; currently this is just /RunLengthDecode

  • all: in addition to generalized and specialized, decode streams with supported lossy filters; currently this is just /DCTDecode (JPEG)
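The decode levels are cumulative, which can be expressed as a small lookup. The Python sketch below models only the filter lists given above; the names LEVELS, FILTER_LEVEL, and would_decode are hypothetical, and the sketch does not reflect any other conditions qpdf may apply when deciding whether to decode a stream.

```python
# Levels in increasing order of how much they decode.
LEVELS = ["none", "generalized", "specialized", "all"]

# Lowest level at which each supported filter is decoded.
FILTER_LEVEL = {
    "/LZWDecode": "generalized",
    "/FlateDecode": "generalized",
    "/ASCII85Decode": "generalized",
    "/ASCIIHexDecode": "generalized",
    "/RunLengthDecode": "specialized",
    "/DCTDecode": "all",
}


def would_decode(filters, level):
    """Return True if every filter on a stream is covered by the level."""
    allowed = LEVELS.index(level)  # raises ValueError for unknown levels
    if allowed == 0:
        return False  # "none": never attempt to decode
    return all(
        name in FILTER_LEVEL and LEVELS.index(FILTER_LEVEL[name]) <= allowed
        for name in filters
    )
```

For example, a stream filtered with both /ASCII85Decode and /DCTDecode is covered only by the all level in this model.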

--stream-data=option

Controls transformation of stream data. This option predates the --compress-streams and --decode-level options. Those options can be used to achieve the same effect with more control. The value of option may be one of the following:

  • compress: recompress stream data when possible (default); equivalent to --compress-streams=y --decode-level=generalized. Does not recompress streams already compressed with /FlateDecode unless --recompress-flate is also specified.

  • preserve: leave all stream data as is; equivalent to --compress-streams=n --decode-level=none

  • uncompress: uncompress stream data compressed with generalized filters when possible; equivalent to --compress-streams=n --decode-level=generalized
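Scripts that still use --stream-data can be migrated mechanically using the equivalences listed above. The Python sketch below records them as a table; the names STREAM_DATA_EQUIVALENTS and expand_stream_data are hypothetical.

```python
# Equivalent option pairs, taken from the list of --stream-data values.
STREAM_DATA_EQUIVALENTS = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def expand_stream_data(value):
    """Return the newer options equivalent to --stream-data=value."""
    return list(STREAM_DATA_EQUIVALENTS[value])
```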

--recompress-flate

By default, streams already compressed with /FlateDecode are left alone rather than being uncompressed and recompressed. This option causes qpdf to uncompress and recompress the streams. There is a significant performance cost to using this option, but you probably want to use it if you specify --compression-level.

--compression-level=level

When writing new streams that are compressed with /FlateDecode, use the specified compression level. The value of level should be a number from 1 to 9 and is passed directly to zlib, which implements deflate compression. Note that qpdf doesn’t uncompress and recompress streams by default. To have this option apply to already compressed streams, you should also specify --recompress-flate. If your goal is to shrink the size of PDF files, you should also use --object-streams=generate.

--normalize-content=[yn]

Enables or disables normalization of content streams. Content normalization is enabled by default in QDF mode. Please see QDF Mode for additional discussion of QDF mode.

--object-streams=mode

Controls handling of object streams. The value of mode may be one of the following:

  • preserve: preserve original object streams (default)

  • disable: don’t write any object streams

  • generate: use object streams wherever possible

--preserve-unreferenced

Tells qpdf to preserve objects that are not referenced when writing the file. Ordinarily any object that is not referenced in a traversal of the document from the trailer dictionary will be discarded. This may be useful in working with some damaged files or inspecting files with known unreferenced objects.

This flag is ignored for linearized files and has the effect of causing objects in the new file to be written in order by object ID from the original file. This does not mean that object numbers will be the same since qpdf may create stream lengths as direct or indirect differently from the original file, and the original file may have gaps in its numbering.

See also --preserve-unreferenced-resources, which does something completely different.

--remove-unreferenced-resources=option

The option may be auto, yes, or no. The default is auto.

Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt to remove images and fonts that are not used by a page even if they are referenced in the page’s resources dictionary. When shared resources are in use, this behavior can greatly reduce the file sizes of split pages, but the analysis is very slow. In versions from 8.1 through 9.1.1, qpdf did this analysis by default. Starting in qpdf 10.0.0, if auto is used, qpdf does a quick analysis of the file to determine whether the file is likely to have unreferenced objects on pages, a pattern that frequently occurs when resource dictionaries are shared across multiple pages and rarely occurs otherwise. If it discovers this pattern, then it will attempt to remove unreferenced resources. Usually this means you get the slower splitting speed only when it’s actually going to create smaller files. You can suppress removal of unreferenced resources altogether by specifying no or force it to do the full algorithm by specifying yes.

Other than cases in which you don’t care about file size and care a lot about runtime, there are few reasons to use this option, especially now that auto mode is supported. One reason to use this is if you suspect that qpdf is removing resources it shouldn’t be removing. If you encounter that case, please report it as a bug at https://github.com/qpdf/qpdf/issues/.

--preserve-unreferenced-resources

This is a synonym for --remove-unreferenced-resources=no.

See also --preserve-unreferenced, which does something completely different.

--newline-before-endstream

Tells qpdf to insert a newline before the endstream keyword, not counted in the length, after any stream content even if the last character of the stream was a newline. This may result in two newlines in some cases. This is a requirement of PDF/A. While qpdf doesn’t specifically know how to generate PDF/A-compliant PDFs, this at least prevents it from removing compliance on already compliant files.

--linearize-pass1=file

Write the first pass of linearization to the named file. The resulting file is not a valid PDF file. This option is useful only for debugging QPDFWriter’s linearization code. When qpdf linearizes files, it writes the file in two passes, using the first pass to calculate sizes and offsets that are required for hint tables and the linearization dictionary. Ordinarily, the first pass is discarded. This option enables it to be captured.

--coalesce-contents

When a page’s contents are split across multiple streams, this option causes qpdf to combine them into a single stream. Use of this option is never necessary for ordinary usage, but it can help when working with some files in some cases. For example, this can also be combined with QDF mode or content normalization to make it easier to look at all of a page’s contents at once.

--flatten-annotations=option

This option collapses annotations into the pages’ contents with special handling for form fields. Ordinarily, an annotation is rendered separately and on top of the page. Combining annotations into the page’s contents effectively freezes the placement of the annotations, making them look right after various page transformations. The library functionality backing this option was added for the benefit of programs that want to create n-up page layouts and other similar things that don’t work well with annotations. The option parameter may be any of the following:

  • all: include all annotations that are not marked invisible or hidden

  • print: only include annotations that indicate that they should appear when the page is printed

  • screen: omit annotations that indicate they should not appear on the screen

Note that form fields are special because the annotations that are used to render filled-in form fields may become out of date from the fields’ values if the form is filled in by a program that doesn’t know how to update the appearances. If qpdf detects this case, its default behavior is not to flatten those annotations because doing so would cause the value of the form field to be lost. This gives you a chance to go back and resave the form with a program that knows how to generate appearances. QPDF itself can generate appearances with some limitations. See the --generate-appearances option below.

--generate-appearances

If a file contains interactive form fields and indicates that the appearances are out of date with the values of the form, this flag will regenerate appearances, subject to a few limitations. Note that there is not usually a reason to do this, but it can be necessary before using the --flatten-annotations option. Most of the limitations are not a problem with well-behaved PDF files. The limitations are as follows:

  • Radio button and checkbox appearances use the pre-set values in the PDF file. QPDF just makes sure that the correct appearance is displayed based on the value of the field. This is fine for PDF files that create their forms properly. Some PDF writers save appearances for fields when they change, which could cause some controls to have inconsistent appearances.

  • For text fields and list boxes, any characters that fall outside of US-ASCII or, if detected, “Windows ANSI” or “Mac Roman” encoding, will be replaced by the ? character.

  • Quadding is ignored. Quadding is used to specify whether the contents of a field should be left, center, or right aligned with the field.

  • Rich text, multi-line, and other more elaborate formatting directives are ignored.

  • There is no support for multi-select fields or signature fields.

If qpdf doesn’t do a good enough job with your form, use an external application to save your filled-in form before processing it with qpdf.

--optimize-images

This flag causes qpdf to recompress all images that are not compressed with DCT (JPEG) using DCT compression as long as doing so decreases the size in bytes of the image data and the image does not fall below minimum specified dimensions. Useful information is provided when used in combination with --verbose. See also the --oi-min-width, --oi-min-height, and --oi-min-area options. By default, starting in qpdf 8.4, inline images are converted to regular images and optimized as well. Use --keep-inline-images to prevent inline images from being included.

--oi-min-width=width

Avoid optimizing images whose width is below the specified amount. If omitted, the default is 128 pixels. Use 0 for no minimum.

--oi-min-height=height

Avoid optimizing images whose height is below the specified amount. If omitted, the default is 128 pixels. Use 0 for no minimum.

--oi-min-area=area-in-pixels

Avoid optimizing images whose pixel count (width × height) is below the specified amount. If omitted, the default is 16,384 pixels. Use 0 for no minimum.
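The three size thresholds combine as a simple conjunction. The Python sketch below models only the dimension checks described for --oi-min-width, --oi-min-height, and --oi-min-area; the function name passes_size_thresholds is hypothetical, and the sketch does not cover the other conditions for --optimize-images, such as the image not already being DCT-compressed and the result being smaller.

```python
def passes_size_thresholds(width, height,
                           min_width=128, min_height=128, min_area=16384):
    """Return True if an image is not below any of the minimums.

    A minimum of 0 disables that check, since no dimension is below 0.
    """
    return (width >= min_width
            and height >= min_height
            and width * height >= min_area)
```

With the defaults, a 128 × 128 image is exactly at all three thresholds and is therefore not excluded, while a 100 × 200 image is excluded by its width even though its area is large enough.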

--externalize-inline-images

Convert inline images to regular images. By default, images whose data is at least 1,024 bytes are converted when this option is selected. Use --ii-min-bytes to change the size threshold. This option is implicitly selected when --optimize-images is selected. Use --keep-inline-images to exclude inline images from image optimization.

--ii-min-bytes=bytes

Avoid converting inline images whose size is below the specified minimum size to regular images. If omitted, the default is 1,024 bytes. Use 0 for no minimum.

--keep-inline-images

Prevent inline images from being included in image optimization. This option has no effect when --optimize-images is not specified.

--remove-page-labels

Remove page labels from the output file.

--qdf

Turns on QDF mode. For additional information on QDF, please see QDF Mode. Note that --linearize disables QDF mode.

--min-version=version

Forces the PDF version of the output file to be at least version. In other words, if the input file has a lower version than the specified version, the specified version will be used. If the input file has a higher version, the input file’s original version will be used. It is seldom necessary to use this option since qpdf will automatically increase the version as needed when adding features that require newer PDF readers.

The version number may be expressed in the form major.minor.extension-level, in which case the version is interpreted as major.minor at extension level extension-level. For example, version 1.7.8 represents version 1.7 at extension level 8. Note that minimal syntax checking is done on the command line.
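The version syntax and the “at least” rule can be sketched as follows. This Python fragment is an illustrative model, not qpdf's parser; the names parse_version and apply_min_version are hypothetical, and a missing extension level is assumed to mean 0.

```python
def parse_version(text):
    """Parse major.minor or major.minor.extension-level into a tuple."""
    parts = [int(piece) for piece in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # assumption: no extension level means 0
    if len(parts) != 3:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(parts)


def apply_min_version(input_version, min_version):
    """Return the higher of the input file's version and the requested one."""
    return max(parse_version(input_version), parse_version(min_version))
```

Tuples compare element by element, so (1, 7, 8) is higher than (1, 7, 0), which matches the idea of an extension level refining version 1.7.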

--force-version=version

This option forces the PDF version to be the exact version specified even when the file may have content that is not supported in that version. The version number is interpreted in the same way as with --min-version so that extension levels can be set. In some cases, forcing the output file’s PDF version to be lower than that of the input file will cause qpdf to disable certain features of the document. Specifically, 256-bit keys are disabled if the version is less than 1.7 with extension level 8 (except R5 is disabled if less than 1.7 with extension level 3), AES encryption is disabled if the version is less than 1.6, cleartext metadata and object streams are disabled if less than 1.5, 128-bit encryption keys are disabled if less than 1.4, and all encryption is disabled if less than 1.3. Even with these precautions, qpdf won’t be able to do things like eliminate use of newer image compression schemes, transparency groups, or other features that may have been added in more recent versions of PDF.

As a general rule, with the exception of big structural things like the use of object streams or AES encryption, PDF viewers are supposed to ignore features in files that they don’t support from newer versions. This means that forcing the version to a lower version may make it possible to open your PDF file with an older version, though bear in mind that some of the original document’s functionality may be lost.
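The version thresholds listed for --force-version can be summarized as a table. The Python sketch below restates those thresholds only; the names FEATURE_THRESHOLDS and disabled_features are hypothetical, versions are given as (major, minor, extension-level) tuples, and the feature labels are informal descriptions rather than anything qpdf prints.

```python
# (minimum version, feature) pairs: a feature is disabled when the forced
# version is lower than its minimum version.
FEATURE_THRESHOLDS = [
    ((1, 7, 8), "256-bit keys (other than R5)"),
    ((1, 7, 3), "256-bit keys (R5)"),
    ((1, 6, 0), "AES encryption"),
    ((1, 5, 0), "cleartext metadata"),
    ((1, 5, 0), "object streams"),
    ((1, 4, 0), "128-bit encryption keys"),
    ((1, 3, 0), "all encryption"),
]


def disabled_features(forced_version):
    """List the features disabled when forcing the given version."""
    return [name for minimum, name in FEATURE_THRESHOLDS
            if forced_version < minimum]
```

For instance, forcing version 1.4 in this model disables both kinds of 256-bit keys, AES encryption, cleartext metadata, and object streams, but leaves 128-bit keys available.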

By default, when a stream is encoded using non-lossy filters that qpdf understands and is not already compressed using a good compression scheme, qpdf will uncompress and recompress streams. Assuming proper filter implementations, this is safe and generally results in smaller files. This behavior may also be explicitly requested with --stream-data=compress.

When --normalize-content=y is specified, qpdf will attempt to normalize whitespace and newlines in page content streams. This is generally safe but could, in some cases, cause damage to the content streams. This option is intended for people who wish to study PDF content streams or to debug PDF content. You should not use this for “production” PDF files.

When normalizing content, if qpdf runs into any lexical errors, it will print a warning indicating that content may be damaged. The only situation in which qpdf is known to cause damage during content normalization is when a page’s contents are split across multiple streams and streams are split in the middle of a lexical token such as a string, name, or inline image. Note that files that do this are invalid since the PDF specification states that content streams are not to be split in the middle of a token. If you want to inspect the original content streams in an uncompressed format, you can always run with --qdf --normalize-content=n for a QDF file without content normalization, or alternatively --stream-data=uncompress for a regular non-QDF mode file with uncompressed streams. These will both uncompress all the streams but will not attempt to normalize content. Please note that if you are using content normalization or QDF mode for the purpose of manually inspecting files, you don’t have to care about this.

Object streams, also known as compressed objects, were introduced into the PDF specification at version 1.5, corresponding to Acrobat 6. Some older PDF viewers may not support files with object streams. qpdf can be used to transform files with object streams to files without object streams or vice versa. As mentioned above, there are three object stream modes: preserve, disable, and generate.

In preserve mode, the relationship between objects and the streams that contain them is preserved from the original file. In disable mode, all objects are written as regular, uncompressed objects. The resulting file should be readable by older PDF viewers. (Of course, the content of the files may include features not supported by older viewers, but at least the structure will be supported.) In generate mode, qpdf will create its own object streams. This will usually result in more compact PDF files, though they may not be readable by older viewers. In this mode, qpdf will also make sure the PDF version number in the header is at least 1.5.

The --qdf flag turns on QDF mode, which changes some of the defaults described above. Specifically, in QDF mode, by default, stream data is uncompressed, content streams are normalized, and encryption is removed. These defaults can still be overridden by specifying the appropriate options as described above. Additionally, in QDF mode, stream lengths are stored as indirect objects, objects are laid out in a less efficient but more readable fashion, and the documents are interspersed with comments that make it easier for the user to find things and also make it possible for fix-qdf to work properly. QDF mode is intended for people, mostly developers, who wish to inspect or modify PDF files in a text editor. For details, please see QDF Mode.

Testing, Inspection, and Debugging Options

These options can be useful for digging into PDF files or for use in automated test suites for software that uses the qpdf library. When any of the options in this section are specified, no output file should be given. The following options are available:

--deterministic-id

Causes generation of a deterministic value for /ID. This prevents use of timestamp and output file name information in the /ID generation. Instead, at some slight additional runtime cost, the /ID field is generated to include a digest of the significant parts of the content of the output PDF file. This means that a given qpdf operation should generate the same /ID each time it is run, which can be useful when caching results or for generation of some test data. Use of this flag is not compatible with creation of encrypted files.

--static-id

Causes generation of a fixed value for /ID. This is intended for testing only. Never use it for production files. If you are trying to get the same /ID each time for a given file and you are not generating encrypted files, consider using the --deterministic-id option.

--static-aes-iv

Causes use of a static initialization vector for AES-CBC. This is intended for testing only so that output files can be reproducible. Never use it for production files. This option in particular is not secure since it significantly weakens the encryption.

--no-original-object-ids

Suppresses inclusion of original object ID comments in QDF files. This can be useful when generating QDF files for test purposes, particularly when comparing them to determine whether two PDF files have identical content.

--show-encryption

Shows document encryption parameters. Also shows the document’s user password if the owner password is given.

--show-encryption-key

When encryption information is being displayed, as when --check or --show-encryption is given, display the computed or retrieved encryption key as a hexadecimal string. This value is not ordinarily useful to users, but it can be used as the argument to --password if the --password-is-hex-key is specified. Note that, when PDF files are encrypted, passwords and other metadata are used only to compute an encryption key, and the encryption key is what is actually used for encryption. This enables retrieval of that key.

--check-linearization

Checks file integrity and linearization status.

--show-linearization

Checks and displays all data in the linearization hint tables.

--show-xref

Shows the contents of the cross-reference table in a human-readable form. This is especially useful for files with cross-reference streams which are stored in a binary format.

--show-object=trailer|obj[,gen]

Show the contents of the given object. This is especially useful for inspecting objects that are inside of object streams (also known as “compressed objects”).

--raw-stream-data

When used along with the --show-object option, if the object is a stream, shows the raw stream data instead of the object’s contents.

--filtered-stream-data

When used along with the --show-object option, if the object is a stream, shows the filtered stream data instead of the object’s contents. If the stream is filtered using filters that qpdf does not support, an error will be issued.

--show-npages

Prints the number of pages in the input file on a line by itself. Since the number of pages appears by itself on a line, this option can be useful for scripting if you need to know the number of pages in a file.

--show-pages

Shows the object and generation number for each page dictionary object and for each content stream associated with the page. Having this information makes it more convenient to inspect objects from a particular page.

--with-images

When used along with --show-pages, also shows the object and generation numbers for the image objects on each page. (At present, information about images in shared resource dictionaries is not output by this command. This is discussed in a comment in the source code.)

--json

Generate a JSON representation of the file. This is described in depth in QPDF JSON.

--json-help

Describe the format of the JSON output.

--json-key=key

This option is repeatable. If specified, only top-level keys specified will be included in the JSON output. If not specified, all keys will be shown.

--json-object=trailer|obj[,gen]

This option is repeatable. If specified, only specified objects will be shown in the “objects” key of the JSON output. If absent, all objects will be shown.

--check

Checks file structure as well as encryption, linearization, and encoding of stream data. A file for which --check reports no errors may still have errors in stream data content but should otherwise be structurally sound. If --check finds any errors, qpdf will exit with a status of 2. There are some recoverable conditions that --check detects. These are issued as warnings instead of errors. If qpdf finds no errors but finds warnings, it will exit with a status of 3 (as of version 2.0.4). When --check is combined with other options, checks are always performed before any other options are processed. For erroneous files, --check will cause qpdf to attempt to recover, after which other options are effectively operating on the recovered file. Combining --check with other options in this way can be useful for manually recovering severely damaged files. Note that --check produces no output to standard output when everything is valid, so if you are using this to programmatically validate files in bulk, it is safe to run without output redirected to /dev/null and just check for a 0 exit code.
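For bulk validation, the exit statuses described for --check can be mapped to a small set of outcomes. The Python sketch below is one possible wrapper; the function names classify_check_status and check_pdf are hypothetical, and the sketch assumes a qpdf executable is available on the search path. Only the exit statuses 0, 2, and 3 come from the description above; any other status is reported as unknown.

```python
import subprocess


def classify_check_status(returncode):
    """Map a qpdf --check exit status to a short label."""
    if returncode == 0:
        return "ok"
    if returncode == 3:
        return "warnings"  # no errors, but recoverable conditions found
    if returncode == 2:
        return "errors"
    return "unknown"


def check_pdf(path, qpdf="qpdf"):
    """Run qpdf --check on one file and return (label, stderr text)."""
    result = subprocess.run([qpdf, "--check", path],
                            capture_output=True, text=True)
    return classify_check_status(result.returncode), result.stderr
```

A caller might loop over a directory, call check_pdf on each file, and log the returned text for anything that is not labeled ok.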

The --raw-stream-data and --filtered-stream-data options are ignored unless --show-object is given. Either of these options will cause the stream data to be written to standard output. In order to avoid commingling of stream data with other output, it is recommended that these options not be combined with other test/inspection options.

If --filtered-stream-data is given and --normalize-content=y is also given, qpdf will attempt to normalize the stream data as if it is a page content stream. This attempt will be made even if it is not a page content stream, in which case it will produce unusable results.

Unicode Passwords

At the library API level, all methods that perform encryption and decryption interpret passwords as strings of bytes. It is up to the caller to ensure that they are appropriately encoded. Starting with qpdf version 8.4.0, qpdf will attempt to make this easier for you when you interact with qpdf via its command line interface. The PDF specification requires passwords used to encrypt files with 40-bit or 128-bit encryption to be encoded with PDF Doc encoding. This encoding is a single-byte encoding that supports ISO-Latin-1 and a handful of other commonly used characters. It has a large overlap with Windows ANSI but is not exactly the same. There is generally not a way to provide PDF Doc encoded strings on the command line. As such, qpdf versions prior to 8.4.0 would often create PDF files that couldn’t be opened with other software when given a password with non-ASCII characters to encrypt a file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf recognizes the encoding of the parameter and transcodes it as needed. The rest of this section provides the details about exactly how qpdf behaves. Most users will not need to know this information, but it might be useful if you have been working around qpdf’s old behavior or if you are using qpdf to generate encrypted files for testing other PDF software.

A note about Windows: when qpdf builds, it attempts to determine what it has to do to use wmain instead of main on Windows. The wmain function is an alternative entry point that receives all arguments as UTF-16-encoded strings. When qpdf starts up this way, it converts all the strings to UTF-8 encoding and then invokes the regular main. This means that, as far as qpdf is concerned, it receives its command-line arguments with UTF-8 encoding, just as it would in any modern Linux or UNIX environment.

If a file is being encrypted with 40-bit or 128-bit encryption and the supplied password is not a valid UTF-8 string, qpdf will fall back to the behavior of interpreting the password as a string of bytes. If you have old scripts that encrypt files by passing the output of iconv to qpdf, you no longer need to do that, but if you do, qpdf should still work. The only exception would be for the extremely unlikely case of a password that is encoded with a single-byte encoding but also happens to be valid UTF-8. Such a password would contain strings of even numbers of characters that alternate between accented letters and symbols. In the extremely unlikely event that you are intentionally using such passwords and qpdf is thwarting you by interpreting them as UTF-8, you can use --password-mode=bytes to suppress qpdf’s automatic behavior.
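The fallback rule and the ambiguous case can be demonstrated with a short Python sketch. It is a model of the decision described above, not qpdf's code; the function name interpret_password is hypothetical, and ISO-Latin-1 is used here as a stand-in for a single-byte encoding.

```python
def interpret_password(raw):
    """Treat the bytes as UTF-8 if they are valid UTF-8, else as raw bytes."""
    try:
        return "unicode", raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes", raw


# A UTF-8 encoded password is recognized as Unicode text.
as_utf8 = interpret_password("caf\u00e9".encode("utf-8"))

# The same password in a single-byte encoding is not valid UTF-8,
# so it falls back to being treated as bytes.
as_latin1 = interpret_password("caf\u00e9".encode("latin-1"))

# The unlikely ambiguous case: an accented letter followed by a symbol
# in a single-byte encoding happens to form one valid UTF-8 character.
ambiguous = interpret_password("\u00c3\u00a9".encode("latin-1"))
```

In the last case, the two single-byte characters (A with tilde, then the copyright sign) have the same byte values as the UTF-8 encoding of e with acute accent, so this model interprets them as a one-character Unicode password. This is the situation in which --password-mode=bytes would be needed.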

The --password-mode option, as described earlier in this chapter, can be used to change qpdf’s interpretation of supplied passwords. There are very few reasons to use this option. One would be the unlikely case described in the previous paragraph in which the supplied password happens to be valid UTF-8 but isn’t supposed to be UTF-8. Your best bet would be just to provide the password as a valid UTF-8 string, but you could also use --password-mode=bytes. Another reason to use --password-mode=bytes would be to intentionally generate PDF files encrypted with passwords that are not properly encoded. The qpdf test suite does this to generate invalid files for the purpose of testing its password recovery capability. If you are trying to create intentionally incorrect files for similar purposes, the bytes password mode enables you to do this.

When qpdf attempts to decrypt a file with a password that contains non-ASCII characters, it will generate a list of alternative passwords by attempting to interpret the password as each of a handful of different coding systems and then transcoding the result to the required format. This helps to compensate for the supplied password being given in the wrong coding system, such as would happen if you used the iconv workaround that was previously needed. It also generates passwords by doing the reverse operation: translating from the correct encoding of the password to an incorrect one. This enables qpdf to decrypt files using passwords that were improperly encoded by whatever software encrypted the files, including older versions of qpdf invoked without properly encoded passwords. The combination of these two recovery methods should make qpdf transparently open most encrypted files when the password is supplied correctly but in the wrong coding system. There are no real downsides to this behavior, but if you don’t want qpdf to do this, you can use the --suppress-password-recovery option. One reason to do that is to ensure that you know the exact password that was used to encrypt the file.
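
The general idea of generating alternative passwords can be sketched as follows. This is a simplified illustration, not qpdf's actual algorithm: the function name candidate_passwords and the list of encodings are assumptions chosen for the example, and qpdf's real set of coding systems and its ordering may differ.

```python
from typing import List

# Assumed set of single-byte coding systems to try; illustrative only.
LEGACY_ENCODINGS = ("latin-1", "cp1252", "mac_roman")


def candidate_passwords(supplied: bytes) -> List[bytes]:
    """Hypothetical sketch: list alternative encodings of a password.

    The supplied bytes always come first. Alternatives are generated only
    when the password contains non-ASCII bytes.
    """
    candidates = [supplied]
    if supplied.isascii():
        return candidates

    def add(candidate: bytes) -> None:
        # Keep the list free of duplicates while preserving order.
        if candidate not in candidates:
            candidates.append(candidate)

    try:
        text = supplied.decode("utf-8")
    except UnicodeDecodeError:
        text = None

    for encoding in LEGACY_ENCODINGS:
        try:
            if text is not None:
                # Supplied as UTF-8: also try each single-byte form.
                add(text.encode(encoding))
            else:
                # Supplied in some single-byte form: also try UTF-8.
                add(supplied.decode(encoding).encode("utf-8"))
        except UnicodeError:
            # This coding system cannot represent the password; skip it.
            continue
    return candidates


utf8_candidates = candidate_passwords("caf\u00e9".encode("utf-8"))
legacy_candidates = candidate_passwords(b"caf\xe9")
```

A decryption routine would try each candidate in turn and stop at the first one that opens the file. Skipping this step, as --suppress-password-recovery does, corresponds to trying only the first entry in the list.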

With these changes, qpdf now generates compliant passwords in most cases. There are still some exceptions. In particular, the PDF specification directs compliant writers to normalize Unicode passwords and to perform certain transformations on passwords with bidirectional text. Implementing this functionality requires using a real Unicode library like ICU. If a client application that uses qpdf wants to do this, the qpdf library will accept the resulting passwords, but qpdf will not perform these transformations itself. It is possible that this will be addressed in a future version of qpdf. The QPDFWriter methods that enable encryption on the output file accept passwords as strings of bytes.

Please note that the --password-is-hex-key option is unrelated to all this. This flag bypasses the normal process of going from password to encryption string entirely, allowing the raw encryption key to be specified directly. This is useful for forensic purposes or for brute-force recovery of files with unknown passwords.