Welcome, folks! In this blog post we will extract and parse text and tables from PDF documents using the pdfreader library in JavaScript. The full source code of the application is shown below.
Get Started
To get started, create a new Node.js project with the following npm command:
npm init -y
Next, install the pdfreader library:
npm i pdfreader
Then open the generated package.json file and set its "type" field to "module", so that Node.js treats .js files as ES modules and the import syntax used in the examples works:
"type": "module"
Finally, create an index.js file and paste in the following code:
index.js
```javascript
import { PdfReader } from "pdfreader";

// The callback runs once per parsed item; item is undefined at the end.
new PdfReader().parseFileItems("file.pdf", (err, item) => {
  if (err) console.error("error:", err);
  else if (!item) console.warn("end of file");
  else if (item.text) console.log(item.text); // print each text fragment
});
```
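Here we import PdfReader from the pdfreader library and pass the path of the PDF file to parseFileItems(). The callback is called once for each item in the file: errors are logged, a missing item signals the end of the file, and every item that carries text is printed to the terminal.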
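Text items are not only strings: pdfreader also reports where each fragment sits on the page (x and y coordinates), which is what makes simple table extraction possible. The sketch below groups items that share the same y value into one row, orders each row left to right by x, and starts a new page whenever a `page` item arrives. The helper names `groupIntoPages` and `toRows` are hypothetical, and the sketch assumes that all cells of a row share exactly the same y and that y grows down the page; neither holds for every PDF.

```javascript
// Sorts one page's rows top to bottom (ascending y, assumed to grow down the
// page) and each row's cells left to right (ascending x).
function toRows(rowsByY) {
  return [...rowsByY.keys()]
    .sort((a, b) => a - b)
    .map((y) =>
      rowsByY
        .get(y)
        .sort((a, b) => a.x - b.x)
        .map((item) => item.text)
    );
}

// Hypothetical helper: turns a list of pdfreader-style items into
// pages -> rows -> cell texts. Items sharing a y value form one row.
function groupIntoPages(items) {
  const pages = [];
  let rowsByY = null; // Map from y coordinate to the items on that line

  for (const item of items) {
    if (!item) continue;
    if (item.page !== undefined) {
      // A page item closes the previous page (kept even if it had no text).
      if (rowsByY) pages.push(toRows(rowsByY));
      rowsByY = new Map();
    } else if (item.text) {
      if (!rowsByY) rowsByY = new Map(); // text before any page marker
      if (!rowsByY.has(item.y)) rowsByY.set(item.y, []);
      rowsByY.get(item.y).push(item);
    }
    // Other items (e.g. file metadata) carry no text and are ignored.
  }
  if (rowsByY) pages.push(toRows(rowsByY));
  return pages;
}
```

To feed it, push every item into an array inside the parseFileItems() callback and call groupIntoPages(items) when the callback receives no item; joining each row with " | " then gives a rough plain-text view of a table.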
Parsing a password-protected PDF file
We can also parse a password-protected PDF file by passing its password to the PdfReader constructor, as shown below:
```javascript
import { PdfReader } from "pdfreader";

// Supply the document's password through the constructor options.
new PdfReader({ password: "YOUR_PASSWORD" }).parseFileItems(
  "test/sample-with-password.pdf",
  function (err, item) {
    if (err) console.error(err);
    else if (!item) console.warn("end of file");
    else if (item.text) console.log(item.text);
  }
);
```
In the code above we pass the password option to the PdfReader constructor, then call parseFileItems() to read the text content of the PDF file and print it on the command line.
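Hard-coding the password in index.js means it ends up in source control. One minimal alternative, assuming the password is supplied through an environment variable named PDF_PASSWORD (a name chosen here purely for illustration), is to build the options object from process.env. The helper readerOptionsFromEnv is hypothetical and simply fails early when the variable is missing.

```javascript
// Hypothetical helper: builds the PdfReader options from an environment-like
// object. PDF_PASSWORD is an assumed variable name, not a pdfreader setting.
function readerOptionsFromEnv(env) {
  const password = env.PDF_PASSWORD;
  if (!password) {
    // Fail before parsing instead of sending an empty password.
    throw new Error("PDF_PASSWORD is not set");
  }
  return { password };
}
```

You could then write `new PdfReader(readerOptionsFromEnv(process.env))` and start the script with `PDF_PASSWORD=secret node index.js` (POSIX shell syntax).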
Raw PDF reading from a PDF buffer
We can also read the PDF file into a Buffer first, parse that buffer with parseBuffer(), and print its text content in the terminal as shown below:
```javascript
import fs from "fs";
import { PdfReader } from "pdfreader";

fs.readFile("test/sample.pdf", (err, pdfBuffer) => {
  if (err) return console.error("could not read file:", err);
  // pdfBuffer contains the raw file content
  new PdfReader().parseBuffer(pdfBuffer, (err, item) => {
    if (err) console.error("error:", err);
    else if (!item) console.warn("end of buffer");
    else if (item.text) console.log(item.text);
  });
});
```
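The nested callbacks above can also be flattened with a Promise. This sketch wraps parseBuffer() so that it resolves with all text fragments once the end of the buffer is reached, or rejects on the first error. The name parseBufferText is hypothetical; it takes the reader as an argument (any object with a parseBuffer(buffer, callback) method, such as new PdfReader()) so the helper itself needs no imports.

```javascript
// Hypothetical helper: resolves with every text fragment found in `buffer`.
// `reader` is expected to expose parseBuffer(buffer, callback), as PdfReader does.
function parseBufferText(reader, buffer) {
  return new Promise((resolve, reject) => {
    const texts = [];
    reader.parseBuffer(buffer, (err, item) => {
      if (err) reject(err); // a settled Promise ignores any later calls
      else if (!item) resolve(texts); // no item: end of the buffer
      else if (item.text) texts.push(item.text); // skip file/page metadata
    });
  });
}
```

For example, inside an ES module you could write `const texts = await parseBufferText(new PdfReader(), await readFile("test/sample.pdf"));` with readFile imported from "fs/promises".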