Welcome folks! In this blog post we will build a speech-to-text app using the Web Speech Recognition API in JavaScript. The full source code of the application is shown below.
Get Started
To get started, create an index.html file and copy-paste the following code.
index.html
For this we need to include the Bootstrap 5 CDN link as shown below.
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link
      href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta1/dist/css/bootstrap.min.css"
      rel="stylesheet"
      integrity="sha384-giJF6kkoqNQ00vy+HMDP7azOuL0xtbfIcaT9wjKHr8RbDVddVHyTfAAsrekwKmP1"
      crossorigin="anonymous"
    />
    <title>Speech To Text</title>
  </head>
  <body>
    <script src="./speechRecognition.js"></script>
  </body>
</html>
```
After including the Bootstrap CDN link, we now need to add the HTML markup shown below.
```html
<body class="container pt-5 bg-dark">
  <h2 class="mt-4 text-light">Transcript</h2>
  <div class="p-3" style="border: 1px solid gray; height: 300px; border-radius: 8px;">
    <span id="final" class="text-light"></span>
    <span id="interim" class="text-secondary"></span>
  </div>
  <div class="mt-4">
    <button class="btn btn-success" id="start">Start</button>
    <button class="btn btn-danger" id="stop">Stop</button>
    <p id="status" class="lead mt-3 text-light" style="display: none">Listening ...</p>
  </div>
</body>
```
As you can see, we attach Bootstrap classes to style the application. We have the heading, followed by the section that displays the transcribed speech, then the buttons to start and stop the user's microphone. At the bottom we show the status of the microphone.
Now we need to create the speechRecognition.js file, which will contain the JavaScript code for the speech-to-text app, as shown below.
```javascript
if ("webkitSpeechRecognition" in window) {
  // Initialize webkitSpeechRecognition
  let speechRecognition = new webkitSpeechRecognition();

  // String for the Final Transcript
  let final_transcript = "";

  // Set the properties for the Speech Recognition object
  speechRecognition.continuous = true;
  speechRecognition.interimResults = true;
} else {
  console.log("Speech Recognition Not Available");
}
```
First of all, we check whether the webkitSpeechRecognition API is available. If it is, we create a new webkitSpeechRecognition object and declare the final_transcript variable, where we will store all the text spoken by the user.

Then we set the continuous and interimResults properties on the speechRecognition object. These options are set because we need to continuously listen to the user's microphone, convert speech to text, and display it on the page.
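As an aside, some browsers expose the unprefixed SpeechRecognition constructor instead of the webkit-prefixed one. Here is a minimal sketch of resolving whichever constructor exists; note that getSpeechRecognitionCtor is a helper name invented for this post, not part of the API.

```javascript
// Sketch: resolve whichever Speech Recognition constructor the environment
// exposes. Returns null if neither the unprefixed nor the prefixed name exists.
function getSpeechRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Usage in the browser might look like:
//   const Ctor = getSpeechRecognitionCtor(window);
//   if (Ctor) {
//     const recognition = new Ctor();
//     recognition.continuous = true;
//     recognition.interimResults = true;
//   }
```

Passing the global object in as a parameter keeps the helper easy to test with a plain object standing in for window.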
Adding the SpeechRecognition Events
Now we will add the different events available in the Web Speech Recognition API: what should happen when recognition starts, stops, or hits an error, and when it returns results.
```javascript
speechRecognition.onstart = () => {
  // Show the Status Element
  document.querySelector("#status").style.display = "block";
};

speechRecognition.onerror = () => {
  // Hide the Status Element
  document.querySelector("#status").style.display = "none";
};

speechRecognition.onend = () => {
  // Hide the Status Element
  document.querySelector("#status").style.display = "none";
};
```
As you can see, we have defined all the events here. Inside the onstart event we display the status, i.e. that we are listening for the user's speech. If an error occurs, or when recognition ends or is stopped, we hide the status again.
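If you also want to surface what went wrong, the event passed to onerror carries a short error code string such as "no-speech" or "not-allowed". Here is a small sketch of mapping those codes to friendlier messages; describeRecognitionError is our own helper name, not part of the API, and the message wording is our own.

```javascript
// Sketch: turn a recognition error code (event.error) into a readable message.
// The codes listed here are common ones; unknown codes fall through to a
// generic message.
function describeRecognitionError(event) {
  const messages = {
    "no-speech": "No speech was detected.",
    "not-allowed": "Microphone access was denied.",
    "network": "A network error interrupted recognition.",
  };
  return messages[event.error] || `Recognition error: ${event.error}`;
}

// It could be wired into the handler like:
//   speechRecognition.onerror = (event) => {
//     console.log(describeRecognitionError(event));
//     document.querySelector("#status").style.display = "none";
//   };
```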
Now we will bind click handlers to the start and stop buttons in the DOM; this is where we start and stop the Web Speech API.
```javascript
// Set the onClick property of the start button
document.querySelector("#start").onclick = () => {
  // Start the Speech Recognition
  speechRecognition.start();
};

// Set the onClick property of the stop button
document.querySelector("#stop").onclick = () => {
  // Stop the Speech Recognition
  speechRecognition.stop();
};
```
As you can see, we use the start() and stop() methods to start and stop the Web Speech Recognition API. It's really that easy.
Now we need to define what happens when we receive speech results from the user.
```javascript
speechRecognition.onresult = (event) => {
  // Create the interim transcript string locally because we don't want it to persist like final transcript
  let interim_transcript = "";

  // Loop through the results from the speech recognition object.
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    // If the result item is Final, add it to Final Transcript, Else add it to Interim transcript
    if (event.results[i].isFinal) {
      final_transcript += event.results[i][0].transcript;
    } else {
      interim_transcript += event.results[i][0].transcript;
    }
  }

  // Set the Final transcript and Interim transcript.
  document.querySelector("#final").innerHTML = final_transcript;
  document.querySelector("#interim").innerHTML = interim_transcript;
};
```
As you can see, the onresult event of the Web Speech API fires whenever recognition results arrive, and here we append the user's spoken words to the transcript area. We use a for loop to walk through the results, appending each one to either the final or the interim span. The recognized text itself is available in the transcript property of each result.
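The accumulation logic inside onresult can be understood in isolation. Here is a pure-function sketch of it: splitTranscripts is our own name, and the results list is mocked as plain objects with the same shape (isFinal flag, transcript at index 0) as real SpeechRecognitionResult entries.

```javascript
// Sketch: split a list of recognition results into final and interim text,
// mirroring the loop inside the onresult handler above. `resultIndex` plays
// the role of event.resultIndex (where new results begin).
function splitTranscripts(results, resultIndex = 0) {
  let finalText = "";
  let interimText = "";
  for (let i = resultIndex; i < results.length; ++i) {
    const transcript = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText, interimText };
}
```

Keeping the loop in a pure function like this makes it easy to unit test without a browser or a microphone.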
Full Source Code
To wrap up the blog post, here is the full source code of the speechRecognition.js file.
```javascript
if ("webkitSpeechRecognition" in window) {
  // Initialize webkitSpeechRecognition
  let speechRecognition = new webkitSpeechRecognition();

  // String for the Final Transcript
  let final_transcript = "";

  // Set the properties for the Speech Recognition object
  speechRecognition.continuous = true;
  speechRecognition.interimResults = true;

  // Callback Function for the onStart Event
  speechRecognition.onstart = () => {
    // Show the Status Element
    document.querySelector("#status").style.display = "block";
  };

  speechRecognition.onerror = () => {
    // Hide the Status Element
    document.querySelector("#status").style.display = "none";
  };

  speechRecognition.onend = () => {
    // Hide the Status Element
    document.querySelector("#status").style.display = "none";
  };

  speechRecognition.onresult = (event) => {
    // Create the interim transcript string locally because we don't want it to persist like final transcript
    let interim_transcript = "";

    // Loop through the results from the speech recognition object.
    for (let i = event.resultIndex; i < event.results.length; ++i) {
      // If the result item is Final, add it to Final Transcript, Else add it to Interim transcript
      if (event.results[i].isFinal) {
        final_transcript += event.results[i][0].transcript;
      } else {
        interim_transcript += event.results[i][0].transcript;
      }
    }

    // Set the Final transcript and Interim transcript.
    document.querySelector("#final").innerHTML = final_transcript;
    document.querySelector("#interim").innerHTML = interim_transcript;
  };

  // Set the onClick property of the start button
  document.querySelector("#start").onclick = () => {
    // Start the Speech Recognition
    speechRecognition.start();
  };

  // Set the onClick property of the stop button
  document.querySelector("#stop").onclick = () => {
    // Stop the Speech Recognition
    speechRecognition.stop();
  };
} else {
  console.log("Speech Recognition Not Available");
}
```