WebNinjaDeveloper.com

Programming Tutorials




Python 3 Selenium to Take Screenshot of Website and Export it to Compressed PDF Document & Using Ghostscript in Command Line

Posted on October 8, 2022

 

 

Welcome folks! In this blog post we will capture a website and export it to a compressed PDF document using Selenium and Ghostscript. The full source code of the application is shown below.

 

 

Get Started

 

 

To get started, install the library below for this application:

 

 

pip install pyhtml2pdf

 

 

This is the only package we need to install ourselves. It uses the Selenium library to load the website in a headless Chrome environment and export it to a PDF document, and it uses Ghostscript to compress the exported PDF.

 

 

Dependencies

 

Selenium & Ghostscript
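
Before running the examples, it can help to confirm that both dependencies are actually present. The snippet below is only a sketch of one way to check: it looks for the selenium and pyhtml2pdf Python packages and for a Ghostscript executable on the PATH. The helper names are made up for this example, and the executable names (gs on Linux/macOS, gswin64c or gswin32c on Windows) are the usual ones but are an assumption about your installation.

```python
import importlib.util
import shutil

# Usual Ghostscript executable names (Linux/macOS first, then Windows builds).
# These are assumptions; adjust them if your installation uses another name.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript():
    """Return the path of the first Ghostscript executable found on PATH, or None."""
    for name in GHOSTSCRIPT_NAMES:
        path = shutil.which(name)
        if path:
            return path
    return None


def missing_packages(names=("selenium", "pyhtml2pdf")):
    """Return the package names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("Missing Python packages:", missing_packages() or "none")
    print("Ghostscript executable:", find_ghostscript() or "not found")
```

If the script reports a missing package, run the pip command above again. If Ghostscript is not found, install it with your system's package manager or installer.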

 

 

Usage

 

 

Now, to use this package, create an app.py file and copy and paste the code below into it.

 

 

app.py

 

 

Python
from pyhtml2pdf import converter

# Load the page at the given URL and save it as sample.pdf
converter.convert('https://youtube.com', 'sample.pdf')

 

 

As you can see, at the very top we import the converter module from the pyhtml2pdf package. This module provides the convert method, which takes two arguments: the first is the URL of the website and the second is the file name of the exported PDF document. If you run this application, the PDF document is written to sample.pdf. In this case we are capturing youtube.com and exporting it to a PDF document.
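
If you want to export several websites in one go, the same convert() call can be wrapped in a small loop. This is only a sketch: pdf_name_for and convert_all are hypothetical helper names invented here, and https://example.com is a placeholder URL that is not part of the original example.

```python
from urllib.parse import urlparse


def pdf_name_for(url):
    """Derive a PDF file name from the host part of a URL (hypothetical helper)."""
    host = urlparse(url).netloc or "page"
    # A colon (from a port number) is not safe in file names on every system.
    return host.replace(":", "_") + ".pdf"


def convert_all(urls):
    """Export each URL to its own PDF using the convert() call shown above."""
    # Imported here so pdf_name_for() can be tried without the package installed.
    from pyhtml2pdf import converter

    for url in urls:
        converter.convert(url, pdf_name_for(url))


# Example call (starts headless Chrome, so it is left commented out):
# convert_all(["https://youtube.com", "https://example.com"])
```

Each website ends up in a PDF named after its host, for example youtube.com.pdf.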

 

 

 

 

 

Converting Local HTML5 File Template to PDF Document

 

 

Now we will take a local HTML5 template as an example and export it to a PDF document. For this, create an index.html file and copy and paste the HTML template shown below.

 

 

index.html

 

 

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<style>
    /**
* 01/28/2016
* This pen is years old, and watching at the code after all
* those years made me fall from my chair, so I:
* - changed all IDs to classes
* - converted all units to pixels and em units
* - changed all global elements to classes or children of
*   .login
* - cleaned the syntax to be more consistent
* - added a lot of spaces that I so hard tried to avoid
*   a few years ago
*   (because it's cool to not use them)
* - and probably something else that I can't remember anymore
*
* I sticked to the same philosophy, meaning:
* - the design is almost the same
* - only pure HTML and CSS
* - no frameworks, preprocessors or resets
*/
 
/* 'Open Sans' font from Google Fonts */
@import url(https://fonts.googleapis.com/css?family=Open+Sans:400,700);
 
body {
  background: #456;
  font-family: 'Open Sans', sans-serif;
}
 
.login {
  width: 400px;
  margin: 16px auto;
  font-size: 16px;
}
 
/* Reset top and bottom margins from certain elements */
.login-header,
.login p {
  margin-top: 0;
  margin-bottom: 0;
}
 
/* The triangle form is achieved by a CSS hack */
.login-triangle {
  width: 0;
  margin-right: auto;
  margin-left: auto;
  border: 12px solid transparent;
  border-bottom-color: #28d;
}
 
.login-header {
  background: #28d;
  padding: 20px;
  font-size: 1.4em;
  font-weight: normal;
  text-align: center;
  text-transform: uppercase;
  color: #fff;
}
 
.login-container {
  background: #ebebeb;
  padding: 12px;
}
 
/* Every row inside .login-container is defined with p tags */
.login p {
  padding: 12px;
}
 
.login input {
  box-sizing: border-box;
  display: block;
  width: 100%;
  border-width: 1px;
  border-style: solid;
  padding: 16px;
  outline: 0;
  font-family: inherit;
  font-size: 0.95em;
}
 
.login input[type="email"],
.login input[type="password"] {
  background: #fff;
  border-color: #bbb;
  color: #555;
}
 
/* Text fields' focus effect */
.login input[type="email"]:focus,
.login input[type="password"]:focus {
  border-color: #888;
}
 
.login input[type="submit"] {
  background: #28d;
  border-color: transparent;
  color: #fff;
  cursor: pointer;
}
 
.login input[type="submit"]:hover {
  background: #17c;
}
 
/* Buttons' focus effect */
.login input[type="submit"]:focus {
  border-color: #05a;
}
</style>
<body>
    <div class="login">
        <div class="login-triangle"></div>
        
        <h2 class="login-header">Log in</h2>
      
        <form class="login-container">
          <p><input type="email" placeholder="Email"></p>
          <p><input type="password" placeholder="Password"></p>
          <p><input type="submit" value="Log in"></p>
        </form>
      </div>
</body>
</html>

 

 

As you can see, this is a simple login form with its own CSS styles. We will now export this login form template to a PDF document.

 

 

app.py

 

 

Python
import os
from pyhtml2pdf import converter

# Build the absolute path of index.html and pass it as a file:// URL
path = os.path.abspath('index.html')
converter.convert(f'file:///{path}', 'file.pdf')

 

 

Here we use the os module to get the absolute path of index.html, and then we call the convert method again with a file:/// URL to export the HTML template to a PDF document. If you open file.pdf, you will see that the login form is fully exported along with all the CSS styles.
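
Building the URL by hand works, but the absolute path already starts with a slash on Linux and macOS, and paths with spaces are not escaped. As an alternative sketch, Python's standard pathlib module can build the file URL for us. The file_url helper name is made up for this example.

```python
import os
from pathlib import Path


def file_url(relative_path):
    """Return a file:// URL for a local file, resolved against the current directory."""
    # as_uri() needs an absolute path; it also percent-encodes spaces and
    # other special characters.
    return Path(os.path.abspath(relative_path)).as_uri()


print(file_url("index.html"))

# To export the template, pass the URL to the same convert() call used above:
# converter.convert(file_url("index.html"), "file.pdf")
```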

 

 

 

 

 

Compress the Resultant PDF

 

 

The generated PDF can be very large, so we need a way to reduce its size. For this we use Ghostscript, and support for it is built into this package. The Python code is shown below.

 

 

Python
converter.convert(source, target, compress=True, power=0)

 

 

Here we are calling the convert method again, but this time we also set the compress parameter to True. The other new parameter is power, which is set to 0 here. It accepts the following values:

 

 

  • 0: default
  • 1: prepress
  • 2: printer
  • 3: ebook
  • 4: screen

 

 

These are the available compression levels for the PDF document. The names correspond to Ghostscript's -dPDFSETTINGS presets, so choose the value depending on how much compression you want.
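
To decide which level suits a document, one option is to compress the same PDF once per level and compare the file sizes. The helper below is a sketch under assumptions: compare_levels is a made-up name, and the compression call is passed in as compress_fn so the helper does not depend on any particular backend.

```python
import os

# Power values and their names, copied from the list above.
POWER_LEVELS = {0: "default", 1: "prepress", 2: "printer", 3: "ebook", 4: "screen"}


def compare_levels(source, compress_fn, levels=POWER_LEVELS):
    """Compress `source` once per level and return {level name: size in bytes}.

    `compress_fn(source, target, power)` is supplied by the caller.
    Each result is written next to the source, e.g. sample_ebook.pdf.
    """
    stem, ext = os.path.splitext(source)
    sizes = {}
    for power, name in levels.items():
        target = f"{stem}_{name}{ext}"
        compress_fn(source, target, power)
        sizes[name] = os.path.getsize(target)
    return sizes
```

With pyhtml2pdf this could be called as compare_levels('sample.pdf', lambda s, t, p: compressor.compress(s, t, power=p)), assuming compress() accepts the same power argument as convert(). Check the package documentation before relying on that.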

 

 

Python
from pyhtml2pdf import compressor

# Compress an existing PDF: input file first, output file second
compressor.compress('sample.pdf', 'compressed_sample.pdf')

 

 

In the code above we import the compressor module from pyhtml2pdf and call its compress() method to reduce the size of an existing PDF document. The first argument is the input file name and the second argument is the output file name.
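
Compression does not always make a file smaller, especially for PDFs that are already small. As a rough sketch, the helper below keeps the compressed file only when it is actually smaller and otherwise falls back to a copy of the original. The compress_if_smaller name is invented for this example, and the compression call is passed in as compress_fn.

```python
import os
import shutil


def compress_if_smaller(source, target, compress_fn):
    """Write the compressed PDF to `target` only if it is smaller than `source`.

    Otherwise copy `source` to `target` unchanged. Returns the bytes saved.
    `compress_fn(source, target)` is whatever compression call you use.
    """
    tmp = target + ".tmp.pdf"
    compress_fn(source, tmp)
    saved = os.path.getsize(source) - os.path.getsize(tmp)
    if saved > 0:
        os.replace(tmp, target)      # keep the smaller, compressed file
        return saved
    os.remove(tmp)                   # compression did not help
    shutil.copyfile(source, target)  # fall back to the original bytes
    return 0
```

With the call shown above, this would be used as compress_if_smaller('sample.pdf', 'compressed_sample.pdf', compressor.compress).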

 
