
WebNinjaDeveloper.com

Programming Tutorials





Node.js Express Project to Extract Text From Image Using Tesseract OCR Library in Browser Using JS

Posted on January 20, 2023

 

 

Welcome, folks. In this blog post we will use the Tesseract OCR library to extract text from an uploaded image and display it in the browser, using Node.js, Express and JavaScript. The full source code of the application is shown below.

 

 

Get Started

 

 

To get started, initialize a new Node.js project and install the required packages using the commands below.

 

 

npm init -y

 

 

npm i express

 

 

npm i multer

 

 

npm i ejs

 

 

npm i node-tesseract-ocr

 

 

After that, the project folder will contain the package.json file and the node_modules directory. Now create an uploads directory, where the multer library will store the uploaded images, and a views directory, where all the EJS views for this web app will live. The index.js file created later sits in the project root next to them.
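The two directories can also be created from code at startup, so the app does not fail when it runs from a fresh copy of the project. This is a minimal sketch, assuming a hypothetical helper named ensureDirs that is not part of the original project:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: create each named directory under baseDir if it is missing.
// With { recursive: true }, mkdirSync does not throw when the directory already exists.
function ensureDirs(baseDir, names) {
  return names.map((name) => {
    const dir = path.join(baseDir, name);
    fs.mkdirSync(dir, { recursive: true });
    return dir;
  });
}

// In index.js this could be called once, before app.listen():
// ensureDirs(__dirname, ['uploads', 'views']);
```

Calling it a second time is harmless, so it can stay in the startup path permanently.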

 

 

Installing Tesseract Library in Windows

 

 

First, download the Tesseract installer (.exe) for Windows from the URL below.

 

https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w32-setup-v4.0.0.20181030.exe

 

 

After running the installer, register Tesseract in the Windows environment variables using the following steps.

 

 

Step 1: Add TESSDATA_PREFIX to the system environment variables

Variable Name: TESSDATA_PREFIX

Variable Value: C:\Program Files (x86)\Tesseract-OCR\tessdata

Step 2: Add another environment variable named “tesseract”

Variable Name: tesseract

Variable Value: C:\Program Files (x86)\Tesseract-OCR\tesseract.exe

Step 3: Add the Tesseract installation directory to the PATH environment variable

Variable Value: C:\Program Files (x86)\Tesseract-OCR
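Before writing the server, it can help to confirm that Node.js is able to start the tesseract executable through the PATH. The following is a small sketch using Node's built-in child_process module. The function name checkTesseract is made up for this example, and you may need to open a new terminal for the updated environment variables to take effect:

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run "<binary> --version" and report whether it could be started.
// `binary` defaults to "tesseract", i.e. whatever the PATH resolves that name to.
function checkTesseract(binary = 'tesseract') {
  const result = spawnSync(binary, ['--version'], { encoding: 'utf8' });
  if (result.error || result.status !== 0) {
    return { found: false, version: null };
  }
  // Some Tesseract builds print the version banner to stderr instead of stdout.
  const banner = (result.stdout || result.stderr || '').trim();
  return { found: true, version: banner.split(/\r?\n/)[0] };
}

const check = checkTesseract();
console.log(check.found ? `Found ${check.version}` : 'tesseract was not found on the PATH');
```

If the second message appears, recheck Step 3 above before moving on.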

 

 

Now create the index.js file and copy and paste the following code.

 

 

index.js

 

 

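const express = require('express');
const multer = require('multer');
const tesseract = require('node-tesseract-ocr');
const path = require('path');

const app = express();

// Serve the uploaded images as static files
app.use(express.static(path.join(__dirname, 'uploads')));

// Render the templates in the views directory with EJS
app.set('view engine', 'ejs');

// Store uploads on disk as <fieldname>-<timestamp><original extension>
const storage = multer.diskStorage({
  destination: function (req, file, cb) {
    cb(null, 'uploads');
  },
  filename: function (req, file, cb) {
    cb(
      null,
      file.fieldname + '-' + Date.now() + path.extname(file.originalname)
    );
  },
});

const upload = multer({ storage: storage });

// Home page: show the upload form with an empty result
app.get('/', (req, res) => {
  res.render('index', { data: '' });
});

// Receive one file from the "file" form field and run OCR on it
app.post('/extracttextfromimage', upload.single('file'), (req, res) => {
  // Guard against requests that arrive without a file
  if (!req.file) {
    return res.status(400).send('No image file was uploaded.');
  }

  console.log(req.file.path);

  const config = {
    lang: 'eng', // language data to use
    oem: 1, // OCR engine mode
    psm: 3, // page segmentation mode
  };

  tesseract
    .recognize(req.file.path, config)
    .then((text) => {
      console.log('Result:', text);
      res.render('index', { data: text });
    })
    .catch((error) => {
      console.log(error.message);
      // Always answer the request so the browser is not left waiting
      res.status(500).send('Could not extract text from the image.');
    });
});

app.listen(5000, () => {
  console.log('App is listening on port 5000');
});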
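The config object passed to recognize is easier to read once you see the command line it roughly corresponds to. In the Tesseract CLI, -l selects the language data, --oem 1 selects the LSTM neural network engine, and --psm 3 selects fully automatic page segmentation without orientation and script detection. The sketch below is only an approximation of what the wrapper does internally. The helper toTesseractArgs and the file name uploads/file-123.png are made up for illustration:

```javascript
// Hypothetical helper: approximates how the config object maps to tesseract CLI arguments.
function toTesseractArgs(inputPath, config) {
  // "stdout" as the output base tells tesseract to print the recognized text
  const args = [inputPath, 'stdout'];
  if (config.lang) args.push('-l', config.lang);
  if (config.oem !== undefined) args.push('--oem', String(config.oem));
  if (config.psm !== undefined) args.push('--psm', String(config.psm));
  return args;
}

const exampleConfig = { lang: 'eng', oem: 1, psm: 3 };
console.log(['tesseract', ...toTesseractArgs('uploads/file-123.png', exampleConfig)].join(' '));
// prints: tesseract uploads/file-123.png stdout -l eng --oem 1 --psm 3
```

Running the printed command by hand in a terminal is a quick way to tell whether a bad result comes from Tesseract itself or from the Express code.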

 

 

As you can see, we import all the libraries at the top. When the user visits the / home route, we render the index.ejs template with an empty result. We also define a POST route at /extracttextfromimage. Inside it, the image uploaded through the multer library is passed to Tesseract to extract its text, and the text is then passed to the EJS template, which displays it inside the textarea.
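The accept="image/*" attribute on the form is only a hint to the browser, so the server will still store whatever file it is sent. Multer accepts a fileFilter function and a limits option for this purpose. Here is a minimal sketch, where the function name imageFileFilter and the 5 MB limit are assumptions made for this example:

```javascript
// Accept only files whose reported MIME type starts with "image/".
// The (req, file, cb) signature is the one multer expects for its fileFilter option.
function imageFileFilter(req, file, cb) {
  const isImage = typeof file.mimetype === 'string' && file.mimetype.startsWith('image/');
  cb(null, isImage); // true keeps the file, false skips it without raising an error
}

// Assumed 5 MB cap; choose whatever limit suits the images you expect.
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;

// Wiring it into the multer setup in index.js would look like this:
// const upload = multer({
//   storage: storage,
//   fileFilter: imageFileFilter,
//   limits: { fileSize: MAX_UPLOAD_BYTES },
// });
```

When the filter skips a file, multer leaves req.file undefined, so the route has to check for that before reading req.file.path.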

 

 

Now create the index.ejs file inside the views directory and copy and paste the HTML code below.

 

 

views/index.ejs

 

 

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Extract Text From Image</title>
</head>
<body>
    <form action="/extracttextfromimage" method="post" enctype="multipart/form-data">
        <label for="file">Upload Image File:</label>
        <input type="file" name="file" accept="image/*" required id="file">
        <button>Extract Text From Image</button>
    </form>
 
    <textarea cols="40" rows="40"><%= data %></textarea>
</body>
</html>

 

 

This HTML contains a file input that lets the user select an image and a button that submits the form to extract the text from it. The extracted text is then displayed inside the textarea.
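The template prints the OCR result with the <%= %> tag, which HTML-escapes the value before it is written into the page (the <%- %> tag would output it raw). This matters here because text recognized from an image is untrusted input. The function below is a hand-written stand-in that shows the idea. It is not EJS's own implementation, and the exact entities EJS emits may differ:

```javascript
// Hypothetical stand-in for the escaping applied by <%= %>; not EJS's own code.
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Text like this could come out of an image that contains markup
console.log(escapeHtml('</textarea><script>alert(1)</script>'));
// prints: &lt;/textarea&gt;&lt;script&gt;alert(1)&lt;/script&gt;
```

Because the markup is escaped, recognized text cannot close the textarea early or inject a script into the page.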

 

 

 

 





©2023 WebNinjaDeveloper.com | Design: Newspaperly WordPress Theme