Welcome, folks. In this blog post we will compress a PDF document from the command line using Ghostscript and Python. The full source code of the application is shown below.
Get Started
To get started, download Ghostscript from the URL below and install it on your machine. Then add the folder that contains the Ghostscript executable to the PATH environment variable so that the script can find it.
https://ghostscript.com/releases/gsdnld.html
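Before moving on, it can help to confirm that the install and the environment variable setup worked. The sketch below is not part of app.py; it assumes the same executable names that the script in this post searches for, and find_ghostscript is a hypothetical helper name used only for this check.

```python
import shutil

# Executable names taken from get_ghostscript_path() later in this post.
GS_NAMES = ['gs', 'gswin32', 'gswin64']


def find_ghostscript(names=GS_NAMES):
    """Return the full path of the first matching executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == '__main__':
    path = find_ghostscript()
    if path:
        print("Ghostscript found at", path)
    else:
        print("Ghostscript not found on PATH; check the environment variable setup")
```

If this prints the "not found" message, the PATH entry probably does not point at the folder that holds the executable.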
Now create an app.py file and copy and paste the following code into it.
app.py
import argparse
import os.path
import shutil
import subprocess
import sys


def main():
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument('input', help='Relative or absolute path of the input PDF file')
    parser.add_argument('-o', '--out', help='Relative or absolute path of the output PDF file')
    parser.add_argument('-c', '--compress', type=int, help='Compression level from 0 to 4')
    parser.add_argument('-b', '--backup', action='store_true', help="Backup the old PDF file")
    parser.add_argument('--open', action='store_true', default=False,
                        help='Open PDF after compression')
    args = parser.parse_args()

    # In case no compression level is specified, default is 2 '/printer'.
    # Compare against None so that an explicit "-c 0" is not overridden.
    if args.compress is None:
        args.compress = 2
    # In case no output file is specified, store in temp file
    if not args.out:
        args.out = 'temp.pdf'

    # Run
    compress(args.input, args.out, power=args.compress)

    # In case no output file is specified, erase original file
    if args.out == 'temp.pdf':
        if args.backup:
            # splitext also handles an upper-case ".PDF" extension
            root, ext = os.path.splitext(args.input)
            shutil.copyfile(args.input, root + "_BACKUP" + ext)
        shutil.copyfile(args.out, args.input)
        os.remove(args.out)

    # In case we want to open the file after compression
    # ('open' is the macOS launcher; other platforms need a different command)
    if args.open:
        # temp.pdf has been removed by now, so open the overwritten input
        if args.out == 'temp.pdf':
            subprocess.call(['open', args.input])
        else:
            subprocess.call(['open', args.out])


# compress() and get_ghostscript_path() must be defined above this guard
if __name__ == '__main__':
    main()
As you can see, we first import the required modules and then define the main() function. Inside it we use the add_argument() method to set up the command-line arguments: the input path, an optional output path, a compression level from 0 to 4, a backup flag, and a flag to open the result. After parsing them, main() calls the compress() function, which holds the logic that actually reduces the size of the PDF document. Now we need to define compress() and its helper get_ghostscript_path() in the same app.py file, above the if __name__ == '__main__': line so that both exist by the time main() runs, as shown below
def compress(input_file_path, output_file_path, power=0):
    """Function to compress PDF via Ghostscript command line interface"""
    quality = {
        0: '/default',
        1: '/prepress',
        2: '/printer',
        3: '/ebook',
        4: '/screen'
    }

    # Basic controls
    # Check if valid path
    if not os.path.isfile(input_file_path):
        print("Error: invalid path for input PDF file")
        sys.exit(1)

    # Check if file is a PDF by extension
    if input_file_path.split('.')[-1].lower() != 'pdf':
        print("Error: input file is not a PDF")
        sys.exit(1)

    gs = get_ghostscript_path()
    print("Compress PDF...")
    initial_size = os.path.getsize(input_file_path)
    subprocess.call([gs, '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4',
                     '-dPDFSETTINGS={}'.format(quality[power]),
                     '-dNOPAUSE', '-dQUIET', '-dBATCH',
                     '-sOutputFile={}'.format(output_file_path),
                     input_file_path]
                    )
    final_size = os.path.getsize(output_file_path)
    ratio = 1 - (final_size / initial_size)
    print("Compression by {0:.0%}.".format(ratio))
    print("Final file size is {0:.1f}MB".format(final_size / 1000000))
    print("Done.")


def get_ghostscript_path():
    gs_names = ['gs', 'gswin32', 'gswin64']
    for name in gs_names:
        if shutil.which(name):
            return shutil.which(name)
    raise FileNotFoundError(f'No GhostScript executable was found on path ({"/".join(gs_names)})')
As you can see, compress() first checks that the input path exists and ends in .pdf. It then gets the path of the Ghostscript executable using the get_ghostscript_path() function and uses the subprocess module to run the Ghostscript command, which reduces the PDF size and saves the result as the output PDF file. Finally, it prints the compression ratio and the final file size.
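To make the Ghostscript call easier to inspect, here is a minimal sketch that builds the same argument list as compress() without running anything. The names build_gs_command and run_gs are hypothetical helpers that are not part of app.py, and using subprocess.run with check=True is a swapped-in choice here so that a failed Ghostscript run raises an error instead of passing silently.

```python
import subprocess

# Same mapping of compression level to -dPDFSETTINGS preset as in compress().
QUALITY = {0: '/default', 1: '/prepress', 2: '/printer', 3: '/ebook', 4: '/screen'}


def build_gs_command(gs, input_file_path, output_file_path, power=2):
    """Build the Ghostscript argument list used by compress(); runs nothing."""
    return [
        gs,
        '-sDEVICE=pdfwrite',
        '-dCompatibilityLevel=1.4',
        '-dPDFSETTINGS={}'.format(QUALITY[power]),
        '-dNOPAUSE', '-dQUIET', '-dBATCH',
        '-sOutputFile={}'.format(output_file_path),
        input_file_path,
    ]


def run_gs(command):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)


if __name__ == '__main__':
    # Print the command that level 3 would run, without executing it.
    print(' '.join(build_gs_command('gs', 'input.pdf', 'output.pdf', power=3)))
```

Printing the command first is a cheap way to see which preset a given level maps to before touching a real file.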
Usage
To run the PDF compression script, execute the command below
python app.py input.pdf -o output.pdf
Here input.pdf is the path of the input PDF file and output.pdf is the path of the output PDF file. If you leave out the -o option, the script replaces input.pdf with the compressed version.
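The other options can be combined in the same way. As a rough sketch, the snippet below rebuilds the parser from main() in a hypothetical build_parser helper (not part of app.py) and parses a few argument lists, so you can see what each flag produces without running Ghostscript.

```python
import argparse


def build_parser():
    """Recreate the argument parser from main() for experimentation."""
    parser = argparse.ArgumentParser()
    parser.add_argument('input', help='Relative or absolute path of the input PDF file')
    parser.add_argument('-o', '--out', help='Relative or absolute path of the output PDF file')
    parser.add_argument('-c', '--compress', type=int, help='Compression level from 0 to 4')
    parser.add_argument('-b', '--backup', action='store_true', help='Backup the old PDF file')
    parser.add_argument('--open', action='store_true', default=False,
                        help='Open PDF after compression')
    return parser


if __name__ == '__main__':
    parser = build_parser()

    # Equivalent to: python app.py input.pdf -o output.pdf
    args = parser.parse_args(['input.pdf', '-o', 'output.pdf'])
    print(args.input, args.out, args.compress, args.backup)

    # Equivalent to: python app.py input.pdf -c 4 -b
    # No -o here, so args.out stays None until main() fills in its default.
    args = parser.parse_args(['input.pdf', '-c', '4', '-b'])
    print(args.input, args.out, args.compress, args.backup)
```

Level 4 maps to the /screen preset in compress(), and -b asks the script to keep a backup copy before the input file is overwritten.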
If you now check the size of the output.pdf file, it has been compressed from 3.6 MB to 44 KB.
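To relate those sizes to what the script prints, here is the same ratio formula from compress() applied to the two figures. This is only an illustrative calculation: it assumes decimal units (1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes), and compression_ratio is a hypothetical helper name.

```python
def compression_ratio(initial_size, final_size):
    """Fraction of the original size that was removed, as computed in compress()."""
    return 1 - (final_size / initial_size)


initial_size = 3_600_000  # 3.6 MB, assuming 1 MB = 1,000,000 bytes
final_size = 44_000       # 44 KB, assuming 1 KB = 1,000 bytes

ratio = compression_ratio(initial_size, final_size)

# Same format strings as the print calls in compress()
print("Compression by {0:.0%}.".format(ratio))                      # Compression by 99%.
print("Final file size is {0:.1f}MB".format(final_size / 1000000))  # Final file size is 0.0MB
```

Note that with one decimal place in megabytes, an output this small shows up as 0.0MB in the script's message.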