Welcome folks! In this blog post we will use the pdftotext tool from the Xpdf command line tools to extract the text from a PDF file, and then save that text as an audio file using the say.js module in Node.js. The full source code of the application is shown below.
Get Started
To get started, install the say package using the npm command:
npm i say
Installing XPDF in Windows
First of all, go to https://www.xpdfreader.com/download.html and download the Xpdf command line tools. Extract the archive, then add the folder that contains pdftotext.exe to your PATH environment variable so that the pdftotext command can be run from any terminal.
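Before moving on, it can help to confirm that the PATH change took effect. Here is a minimal sketch of such a check. The findOnPath helper is a hypothetical name made up for this post, not part of Xpdf or say.js. It only looks for a file with the right name in each PATH folder and does not run it.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of Xpdf or say.js): returns the full path of
// the first file named `command` found in a PATH-style string, or null.
function findOnPath(command, envPath = process.env.PATH || '') {
  // On Windows, executables carry an extension such as .EXE; PATHEXT lists them.
  const extensions = process.platform === 'win32'
    ? (process.env.PATHEXT || '.EXE;.CMD;.BAT').split(';')
    : [''];

  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    for (const ext of extensions) {
      const candidate = path.join(dir, command + ext);
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        return candidate;
      }
    }
  }
  return null;
}

const location = findOnPath('pdftotext');
console.log(location
  ? `pdftotext found at ${location}`
  : 'pdftotext is not on PATH yet');
```

If the script reports that pdftotext is not on PATH, open a new terminal first, because a terminal that was already open will not pick up the changed environment variables.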
Now write the code below inside the index.js file of your Node.js project:
index.js
const fs = require('fs');
const { exec } = require('child_process');
const say = require('say');

// convert the PDF to text using the pdftotext command-line tool
exec('pdftotext sample.pdf output.txt', (error, stdout, stderr) => {
  if (error) {
    console.error(`exec error: ${error}`);
    return;
  }

  // read the extracted text as a UTF-8 string rather than a raw Buffer
  const text = fs.readFileSync('output.txt', 'utf8');

  // convert the extracted text to speech and save it as a WAV file
  say.export(text, 'Microsoft Zira Desktop', 1, 'output.wav', function (err) {
    if (err) throw err;
    console.log('Audio file saved!');
  });
});
As you can see, we import the exec function from the child_process module along with the fs and say modules. We run pdftotext to convert the PDF file to a text file called output.txt. Once that command finishes, we read the content of the text file and pass it to say.export, which converts it to the output.wav audio file using the Microsoft Zira Desktop voice at speed 1.
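One optional extra: pdftotext keeps the hard line breaks of the PDF layout and puts a form feed character at each page break, which some voices may read awkwardly. Here is a rough sketch of a clean-up step you could run on the text first. The prepareForSpeech function is a hypothetical helper made up for this post, and its rules are assumptions you should tune for your own documents. In particular, it also joins a genuinely hyphenated compound if that compound happens to break across a line.

```javascript
// Hypothetical helper (not part of say.js or Xpdf): tidies pdftotext output
// before it is handed to the speech engine.
function prepareForSpeech(rawText) {
  return rawText
    .replace(/\f/g, '\n')                // form feeds mark page breaks
    .replace(/(\w)-\r?\n(\w)/g, '$1$2')  // rejoin words split by a line-end hyphen
    .replace(/\s+/g, ' ')                // collapse line breaks and repeated spaces
    .trim();
}

console.log(prepareForSpeech('Text to speech con-\nversion\nin Node.js\f'));
// prints "Text to speech conversion in Node.js"
```

To use it in index.js, you would pass prepareForSpeech(fs.readFileSync('output.txt', 'utf8')) as the first argument to say.export.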