Welcome, folks! In this blog post we will use the pdf2docx library inside a Flask application to bulk-convert PDF documents to DOCX files in the browser using Python. The full source code of the application is shown below.
Get Started
To get started, install the following libraries with pip:
pip install flask
pip install pdf2docx
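If you want to confirm that both packages ended up in the environment you will run the app from, a small check like the one below can help. This is only a sketch: the helper name installed_version is hypothetical, and it simply asks Python's standard importlib.metadata module for each package's version.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages this post depends on.
for name in ("flask", "pdf2docx"):
    print(name, installed_version(name) or "not installed")
```

If either line prints "not installed", rerun the matching pip command in the same environment.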
After that, create an app.py file and paste in the following code:
app.py
import os
from flask import Flask, render_template, request, send_file
from werkzeug.utils import secure_filename
from pdf2docx import Converter
import zipfile

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'uploads'


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/convert', methods=['POST'])
def convert():
    # Get the uploaded PDF files
    pdf_files = request.files.getlist('pdf_files')

    # Create a temporary folder to store the converted DOCX files
    temp_folder = os.path.join(app.config['UPLOAD_FOLDER'], 'temp')
    os.makedirs(temp_folder, exist_ok=True)

    # Convert each PDF file to DOCX
    docx_files = []
    for pdf_file in pdf_files:
        if pdf_file.filename != '':
            # Save the PDF file
            filename = secure_filename(pdf_file.filename)
            pdf_path = os.path.join(temp_folder, filename)
            pdf_file.save(pdf_path)

            # Convert PDF to DOCX
            docx_path = os.path.join(temp_folder, os.path.splitext(filename)[0] + '.docx')
            converter = Converter(pdf_path)
            converter.convert(docx_path)
            converter.close()

            docx_files.append(docx_path)

    # Create a ZIP archive of the converted DOCX files
    zip_path = os.path.join(app.config['UPLOAD_FOLDER'], 'converted_files.zip')
    with zipfile.ZipFile(zip_path, 'w') as zip_file:
        for docx_file in docx_files:
            zip_file.write(docx_file, os.path.basename(docx_file))

    # Send the ZIP file for download
    return send_file(zip_path, as_attachment=True)


if __name__ == '__main__':
    app.run(debug=True)
As you can see, app.py defines two routes. The GET route at / renders the index.html template, and the POST route at /convert saves each uploaded PDF, converts it to DOCX with the Converter class from the pdf2docx library imported at the top, and returns all the converted files as a single ZIP download. Next, we need to create the index.html template, as shown below.
templates/index.html
<!DOCTYPE html>
<html>
<head>
    <title>PDF to DOCX Conversion</title>
</head>
<body>
    <h1>PDF to DOCX Conversion</h1>
    <form action="/convert" method="post" enctype="multipart/form-data">
        <input type="file" name="pdf_files" multiple accept=".pdf">
        <input type="submit" value="Convert">
    </form>
</body>
</html>
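One thing to keep in mind: the /convert route in app.py writes every archive to the same uploads/converted_files.zip path, so two requests arriving at the same time could overwrite each other's download. Two uploads that share a file name would also collide inside the archive. Below is a minimal sketch of one way around this, assuming the converted files are small enough to hold in memory. It builds the ZIP in an io.BytesIO buffer and numbers duplicate names. The helper names unique_name and zip_in_memory are hypothetical and not part of the original app.

```python
import io
import os
import zipfile


def unique_name(name, taken):
    """Return name, or a numbered variant if it is already in taken."""
    stem, ext = os.path.splitext(name)
    candidate, counter = name, 1
    while candidate in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    taken.add(candidate)
    return candidate


def zip_in_memory(paths):
    """Pack the given files into a ZIP archive held in a BytesIO buffer."""
    buffer = io.BytesIO()
    taken = set()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its base name, renamed on collision.
            arcname = unique_name(os.path.basename(path), taken)
            archive.write(path, arcname)
    buffer.seek(0)  # rewind so the caller reads from the start
    return buffer
```

In the route, the buffer could then replace zip_path, for example send_file(zip_in_memory(docx_files), mimetype='application/zip', as_attachment=True, download_name='converted_files.zip'). Note that the download_name argument assumes Flask 2.0 or later.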