Welcome, folks! In this blog post we will capture a website and export it to a compressed PDF document using Selenium and Ghostscript. The full source code of the application is shown below.
Get Started
To get started, install the following library for this application:
pip install pyhtml2pdf
This is the only package we install with pip. It uses the Selenium library to load the website in a headless Chrome environment and export it to a PDF document, and it uses Ghostscript to compress the exported PDF.
Dependencies
Selenium & Ghostscript
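Since Ghostscript is a separate program, it can help to check that it is reachable before trying to compress anything. Here is a minimal sketch of such a check. The helper names and the list of executable names (gs, gswin64c, gswin32c) are assumptions for illustration and are not part of pyhtml2pdf.

```python
import shutil

# Common Ghostscript executable names; which ones exist on a given
# machine is an assumption, so adjust this list for your setup.
GHOSTSCRIPT_NAMES = ["gs", "gswin64c", "gswin32c"]


def find_executable(candidates):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def ghostscript_available():
    """Return True if any of the assumed Ghostscript names is on PATH."""
    return find_executable(GHOSTSCRIPT_NAMES) is not None
```

Calling ghostscript_available() before using the compression features gives a clearer error than a failed conversion halfway through.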
Usage
To use this package, create an app.py file and paste in the code below.
app.py
from pyhtml2pdf import converter

converter.convert('https://youtube.com', 'sample.pdf')
At the very top we import the converter module from the pyhtml2pdf package and then call its convert function. It takes two arguments: the first is the URL of the website and the second is the filename of the exported PDF document. If you run this script, the PDF document is exported as sample.pdf. In this case we capture youtube.com and export it to a PDF document.
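If you want to export several sites in one go, the same convert call can be wrapped in a loop. The sketch below is only an illustration: pdf_name_for_url and convert_all are hypothetical helpers, not part of pyhtml2pdf, and naming the file after the host is just one possible choice.

```python
from urllib.parse import urlparse


def pdf_name_for_url(url):
    """Derive a file name such as 'youtube.com.pdf' from a URL (hypothetical helper)."""
    host = urlparse(url).netloc or "page"
    # A port separator is not welcome in file names on every platform.
    return host.replace(":", "_") + ".pdf"


def convert_all(urls):
    """Export every URL to its own PDF using the same call as in app.py."""
    # Imported here so the naming helper works even without Chrome installed.
    from pyhtml2pdf import converter

    for url in urls:
        converter.convert(url, pdf_name_for_url(url))
```

For example, convert_all(['https://youtube.com']) would write youtube.com.pdf into the current directory.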
Converting Local HTML5 File Template to PDF Document
Next we will export a local HTML5 template to a PDF document. For this, create an index.html file and paste in the HTML template shown below.
index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
    <style>
        /**
         * 01/28/2016
         * This pen is years old, and watching at the code after all
         * those years made me fall from my chair, so I:
         * - changed all IDs to classes
         * - converted all units to pixels and em units
         * - changed all global elements to classes or children of
         *   .login
         * - cleaned the syntax to be more consistent
         * - added a lot of spaces that I so hard tried to avoid
         *   a few years ago
         *   (because it's cool to not use them)
         * - and probably something else that I can't remember anymore
         *
         * I sticked to the same philosophy, meaning:
         * - the design is almost the same
         * - only pure HTML and CSS
         * - no frameworks, preprocessors or resets
         */

        /* 'Open Sans' font from Google Fonts */
        @import url(https://fonts.googleapis.com/css?family=Open+Sans:400,700);

        body {
            background: #456;
            font-family: 'Open Sans', sans-serif;
        }

        .login {
            width: 400px;
            margin: 16px auto;
            font-size: 16px;
        }

        /* Reset top and bottom margins from certain elements */
        .login-header,
        .login p {
            margin-top: 0;
            margin-bottom: 0;
        }

        /* The triangle form is achieved by a CSS hack */
        .login-triangle {
            width: 0;
            margin-right: auto;
            margin-left: auto;
            border: 12px solid transparent;
            border-bottom-color: #28d;
        }

        .login-header {
            background: #28d;
            padding: 20px;
            font-size: 1.4em;
            font-weight: normal;
            text-align: center;
            text-transform: uppercase;
            color: #fff;
        }

        .login-container {
            background: #ebebeb;
            padding: 12px;
        }

        /* Every row inside .login-container is defined with p tags */
        .login p {
            padding: 12px;
        }

        .login input {
            box-sizing: border-box;
            display: block;
            width: 100%;
            border-width: 1px;
            border-style: solid;
            padding: 16px;
            outline: 0;
            font-family: inherit;
            font-size: 0.95em;
        }

        .login input[type="email"],
        .login input[type="password"] {
            background: #fff;
            border-color: #bbb;
            color: #555;
        }

        /* Text fields' focus effect */
        .login input[type="email"]:focus,
        .login input[type="password"]:focus {
            border-color: #888;
        }

        .login input[type="submit"] {
            background: #28d;
            border-color: transparent;
            color: #fff;
            cursor: pointer;
        }

        .login input[type="submit"]:hover {
            background: #17c;
        }

        /* Buttons' focus effect */
        .login input[type="submit"]:focus {
            border-color: #05a;
        }
    </style>
</head>
<body>
    <div class="login">
        <div class="login-triangle"></div>
        <h2 class="login-header">Log in</h2>
        <form class="login-container">
            <p><input type="email" placeholder="Email"></p>
            <p><input type="password" placeholder="Password"></p>
            <p><input type="submit" value="Log in"></p>
        </form>
    </div>
</body>
</html>
As you can see, this is a simple login form with CSS styles. We will now export this login form template to a PDF document.
app.py
import os
from pyhtml2pdf import converter

path = os.path.abspath('index.html')
converter.convert(f'file:///{path}', 'file.pdf')
Here we use the os module to get the absolute path of index.html, and we call the convert function again, this time with a file:/// URL, to export the HTML template to a PDF document. If you open file.pdf, you will see the login form fully exported along with all of its CSS styles.
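One thing to keep in mind is that a path with spaces or a Windows drive letter may not turn into a clean URL when it is pasted after file:/// by hand. As an alternative sketch, the standard library's pathlib can build the URL for you. The helper name local_file_url is made up for this example.

```python
from pathlib import Path


def local_file_url(filename):
    """Return a file:// URL for a local file, with special characters escaped."""
    # resolve() makes the path absolute; as_uri() requires an absolute path.
    return Path(filename).resolve().as_uri()
```

The result can then be passed to the converter in place of the f-string, for example converter.convert(local_file_url('index.html'), 'file.pdf').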
Compress the Resultant PDF
The generated PDF can be quite large, so we may want to reduce its size. The package uses Ghostscript for this (it is listed under Dependencies above), and the convert function exposes it through two extra parameters. The Python code is shown below.
converter.convert(source, target, compress=True, power=0)
Here we call the convert function again, where source and target stand for the URL and the output filename used earlier. This time we also set the compress parameter to True and pass one more parameter, power, which is 0 here. It accepts the following values:
- 0: default
- 1: prepress
- 2: printer
- 3: ebook
- 4: screen
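These are the available compression levels for the PDF document. They share their names with Ghostscript's PDFSETTINGS presets, where screen generally gives the smallest file at the lowest image quality and prepress keeps the highest quality. Pick the value depending on how much compression you want.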
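To get a feel for what these values mean, here is a rough sketch of how a power value could be turned into a Ghostscript command line. This is not taken from the pyhtml2pdf source. It only shows the general technique with standard Ghostscript options, and the function name and the default executable name gs are assumptions.

```python
# Names taken from the list above; the leading slash is Ghostscript syntax.
PDF_SETTINGS = {
    0: "/default",
    1: "/prepress",
    2: "/printer",
    3: "/ebook",
    4: "/screen",
}


def ghostscript_args(source, target, power=0, executable="gs"):
    """Build (but do not run) a Ghostscript command that rewrites a PDF."""
    if power not in PDF_SETTINGS:
        raise ValueError(f"power must be one of {sorted(PDF_SETTINGS)}")
    return [
        executable,
        "-sDEVICE=pdfwrite",                     # write a PDF as output
        f"-dPDFSETTINGS={PDF_SETTINGS[power]}",  # quality/size preset
        "-dNOPAUSE",                             # do not pause between pages
        "-dQUIET",                               # suppress startup messages
        "-dBATCH",                               # exit when finished
        f"-sOutputFile={target}",
        source,
    ]
```

The returned list can be handed to subprocess.run(args, check=True) if you want to run Ghostscript yourself.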
from pyhtml2pdf import compressor

compressor.compress('sample.pdf', 'compressed_sample.pdf')
In the Python code above we import the compressor module from pyhtml2pdf and call its compress() function to reduce the size of an existing PDF document. The first argument is the input filename and the second argument is the output filename.
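To see how much space the compression actually saved, you can compare the two file sizes afterwards. Below is a small sketch using only the standard library. percent_saved and report are hypothetical helper names, not part of pyhtml2pdf.

```python
import os


def percent_saved(original_size, compressed_size):
    """Return the size reduction as a percentage, rounded to one decimal."""
    if original_size <= 0:
        return 0.0  # avoid dividing by zero for empty or missing input
    saved = original_size - compressed_size
    return round(100.0 * saved / original_size, 1)


def report(original_path, compressed_path):
    """Describe the size change between two files that already exist on disk."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(compressed_path)
    return f"{before} -> {after} bytes ({percent_saved(before, after)}% smaller)"
```

After running the compressor, print(report('sample.pdf', 'compressed_sample.pdf')) shows both sizes and the percentage saved.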