Recently, one of our readers asked about converting live speech to text, and the topic sounded interesting to me. I have written about converting speech to text using Amazon Transcribe and Google Cloud Speech, but those services require you to pass in audio files, which are then converted into text.
Here, though, the question is about live speech to text. So I decided to explore a solution and came across the Web Speech API. It provides two features: speech recognition and speech synthesis. Speech recognition is the part that turns speech into text.
Speech recognition receives speech from your device’s microphone. The word or phrase is checked by a speech recognition service and then returned as a text string.
In this tutorial, we’ll convert live speech to text using the Web Speech API and additionally create a PDF of this speech.
Note that the Web Speech API is currently supported in only a limited set of browsers. You can use this service on the latest version of Chrome or Safari.
Getting Started
To see the flow in action, I’ll create the HTML with a few elements. We’ll have 2 buttons, Start and Stop, to initiate and end speech recognition. When you click the Start button, the browser first asks for permission to use the microphone. Once you grant permission, you can start speaking into your microphone. The words will appear on the page as you speak.
To end speech recognition, simply click the Stop button. As soon as you click it, a new button, Save to PDF, will appear. This button converts your speech to a PDF and sends it to the browser as a download.
Create the index.html file and add the following code to it.
<p>
    <input type="button" name="start" value="Start" class="start" />
</p>
<p>
    <input type="button" name="stop" value="Stop" class="stop" />
</p>
<div class="transcript"></div>
<div class="btn-pdf" style="display:none;">
    <button onclick="save_pdf()">Save to PDF</button>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
<script src="custom.js"></script>
Here, I am not adding any styling to the elements. The main purpose is to build the actual functionality. The design part will differ for each user.
I have included the html2canvas and jspdf libraries via CDN in the HTML. These libraries generate a PDF from the HTML provided to them. The page also loads custom.js, where we will write the actual code for speech recognition and PDF generation.
In the HTML, I’ve added a div container with the class transcript. The recognized text will be appended inside this div container at runtime.
Convert Live Speech to Text
First, we must check browser compatibility for speech recognition and alert the user if it’s not supported. The check covers both the standard name and the WebKit-prefixed one.
if ("SpeechRecognition" in window || "webkitSpeechRecognition" in window) {
    // actual code here
} else {
    alert("speech recognition API not supported");
}
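The same check can also be wrapped in a small function, which makes it easier to reuse. This is only a sketch: getSpeechRecognition is a hypothetical helper name of mine, not part of the Web Speech API. It takes the global object as a parameter, so in a browser you would pass window.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor found on the
// given global object, or null when neither the standard nor the prefixed
// name is present.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a browser you would pass `window`; globalThis refers to the same object
// there and also exists outside the browser.
const Recognition = getSpeechRecognition(globalThis);
if (Recognition === null) {
  console.log("speech recognition API not supported");
}
```

Returning null instead of throwing lets the calling code decide how to inform the user.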
Next, we have to create an object of the SpeechRecognition class. This class has a few members we need to interact with.
- continuous: If you want to continuously convert speech while speaking, set this property to true. It keeps speech recognition on until you explicitly end it.
- start: This method initiates the speech recognition service.
- stop: As the name suggests, this method terminates the speech recognition process.
// Use the standard constructor when available, otherwise the WebKit-prefixed one.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
// Start listening and hide the PDF button while recording.
document.querySelector(".start").onclick = () => {
    document.querySelector(".btn-pdf").style.display = "none";
    recognition.start();
};
// Stop listening and reveal the PDF button.
document.querySelector(".stop").onclick = () => {
    document.querySelector(".btn-pdf").style.display = "block";
    recognition.stop();
};
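Since clicking Start asks for microphone permission, it may help to tell the user what went wrong when permission is denied or no speech is heard. The Web Speech API reports such problems through the onerror handler, whose event carries an error code string. Below is a minimal sketch that maps some of those codes to messages. The function name describeRecognitionError and the message wording are my own assumptions, and the list of codes is not exhaustive.

```javascript
// Hypothetical helper: turns a SpeechRecognition error code into a message
// for the user. Only a few of the defined codes are handled explicitly.
function describeRecognitionError(code) {
  switch (code) {
    case "not-allowed":
    case "service-not-allowed":
      return "Microphone access was blocked. Allow it in the browser and try again.";
    case "no-speech":
      return "No speech was detected. Try speaking closer to the microphone.";
    case "audio-capture":
      return "No microphone was found.";
    case "network":
      return "A network error interrupted speech recognition.";
    default:
      return "Speech recognition error: " + code;
  }
}

// Browser usage, assuming the `recognition` object created above:
// recognition.onerror = (event) => alert(describeRecognitionError(event.error));
```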
When you talk into the microphone, the Web Speech API starts recognizing words or phrases, which we need to catch and print on the page. For this, we use the onresult event handler of the SpeechRecognition class.
let transcript = "";
recognition.onresult = (event) => {
    // Append only the results the service has marked as final.
    for (let i = event.resultIndex; i < event.results.length; i++) {
        if (event.results[i].isFinal) {
            transcript += event.results[i][0].transcript;
        }
    }
    // Update the page once per event, after all new results are processed.
    document.querySelector(".transcript").innerHTML = transcript;
};
This code receives the recognized text at runtime and keeps appending it to the specified div container. The process continues until you hit the Stop button.
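The handler above keeps only final results. If you also set recognition.interimResults = true, the service sends provisional text as well, which you may want to show separately while the user is still speaking. Here is a rough sketch of that separation as a standalone function. collectTranscripts is a hypothetical helper of mine, and the mock data only imitates the shape of event.results.

```javascript
// Hypothetical helper: walks a list shaped like event.results and returns the
// finalized text and the still-provisional text separately.
function collectTranscripts(results, startIndex = 0) {
  let finalText = "";
  let interimText = "";
  for (let i = startIndex; i < results.length; i++) {
    const best = results[i][0]; // first alternative, as in the handler above
    if (results[i].isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText, interimText };
}

// Mock data standing in for event.results: each entry is an array of
// alternatives that also carries an isFinal flag.
const mockResults = [
  Object.assign([{ transcript: "hello world" }], { isFinal: true }),
  Object.assign([{ transcript: " how are" }], { isFinal: false }),
];
console.log(collectTranscripts(mockResults));
// { finalText: 'hello world', interimText: ' how are' }
```

Inside onresult you would call it as collectTranscripts(event.results, event.resultIndex), append finalText to the running transcript, and display interimText in a lighter style.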
Convert Speech to PDF
Once you are done with the process, you might want to convert the speech to a PDF for offline use. To generate the PDF from your text string, write the code below in the save_pdf() function.
function save_pdf() {
    window.jsPDF = window.jspdf.jsPDF;
    var doc = new jsPDF();
    // Source HTMLElement or a string containing HTML.
    var elementHTML = document.querySelector(".transcript");
    doc.html(elementHTML, {
        callback: function(doc) {
            // Save the PDF
            doc.save('speech.pdf');
        },
        x: 15,
        y: 15,
        width: 170, //target width in the PDF document
        windowWidth: 650 //window width in CSS pixels
    });
}
It takes all the content from the div with the class transcript and passes it to the jspdf library, which then generates the PDF.
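The x, y, and width values are easier to choose with a little arithmetic. As far as I know, new jsPDF() without arguments creates a portrait A4 page measured in millimetres, so the page is 210 mm wide. A left offset of 15 and a width of 170 therefore leave 25 mm free on the right. The sketch below does this calculation. pdfLayout is a hypothetical helper of mine, and reusing the left margin as the top offset is just an assumption that matches the values used in this tutorial.

```javascript
// Hypothetical helper: derives placement options for doc.html() from the page
// width and the left/right margins, all in millimetres.
function pdfLayout(pageWidth, leftMargin, rightMargin) {
  return {
    x: leftMargin,
    y: leftMargin, // assumption: top offset equals the left margin
    width: pageWidth - leftMargin - rightMargin,
  };
}

const A4_WIDTH_MM = 210; // width of a portrait A4 page
console.log(pdfLayout(A4_WIDTH_MM, 15, 25)); // { x: 15, y: 15, width: 170 }
```

If you want equal margins on both sides, pdfLayout(210, 15, 15) gives a width of 180 instead.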
The final code of the custom.js file is as follows.
if ("SpeechRecognition" in window || "webkitSpeechRecognition" in window) {
    // Use the standard constructor when available, otherwise the WebKit-prefixed one.
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();
    recognition.continuous = true;
    let transcript = "";
    recognition.onresult = (event) => {
        // Append only the results the service has marked as final.
        for (let i = event.resultIndex; i < event.results.length; i++) {
            if (event.results[i].isFinal) {
                transcript += event.results[i][0].transcript;
            }
        }
        // Update the page once per event, after all new results are processed.
        document.querySelector(".transcript").innerHTML = transcript;
    };
    document.querySelector(".start").onclick = () => {
        document.querySelector(".btn-pdf").style.display = "none";
        recognition.start();
    };
    document.querySelector(".stop").onclick = () => {
        document.querySelector(".btn-pdf").style.display = "block";
        recognition.stop();
    };
} else {
    alert("speech recognition API not supported");
}
function save_pdf() {
    window.jsPDF = window.jspdf.jsPDF;
    var doc = new jsPDF();
    // Source HTMLElement or a string containing HTML.
    var elementHTML = document.querySelector(".transcript");
    doc.html(elementHTML, {
        callback: function(doc) {
            // Save the PDF
            doc.save('speech.pdf');
        },
        x: 15,
        y: 15,
        width: 170, //target width in the PDF document
        windowWidth: 650 //window width in CSS pixels
    });
}
You’re done with converting live speech to text using JavaScript. Give it a try and let me know your thoughts in the comment section below.