Recently I worked on a project where I was introduced to the Amazon Transcribe service. The client wanted to convert speech to text programmatically, and after some research, we found that Amazon Transcribe was the best fit for the task. Amazon Transcribe uses a deep learning process called automatic speech recognition (ASR) to convert audio to text. Some of the benefits of using Amazon Transcribe are:
- Get insights from customer conversations
- Search and analyze media content
- Create subtitles and meeting notes
- Improve clinical documentation
Depending on your business model, you can put the output of this service to work in many ways to grow your business.
In this article, we'll see how to convert speech to text using Amazon Transcribe in PHP. AWS provides an SDK for PHP, which we'll use throughout this tutorial.
Getting Started
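To get started, you need an AWS account. Log in to your AWS account and create your security credentials (an access key ID and a secret access key). It is good practice to create them for an IAM user that only has access to S3 and Amazon Transcribe rather than for the root account. We will need these credentials later in the tutorial: our PHP script uses them to communicate with the AWS services.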
Next, install the AWS SDK for PHP using Composer:
composer require aws/aws-sdk-php
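The credentials don't have to be hard-coded in the script. If the credentials key is omitted from a client's configuration, the AWS SDK for PHP falls back to its default credential provider chain, which (among other sources) reads the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. Below is a minimal sketch of a shared client configuration, assuming the region is supplied through an AWS_REGION environment variable; the function name awsClientConfig() and the us-east-1 fallback are illustrative choices, not part of the SDK.

```php
<?php
// Hypothetical helper: builds the configuration array that could be shared
// by the S3Client and TranscribeServiceClient constructors.
function awsClientConfig(string $defaultRegion = 'us-east-1'): array
{
    // Read the region from the environment; fall back to an assumed default.
    $region = getenv('AWS_REGION');
    if ($region === false || $region === '') {
        $region = $defaultRegion;
    }

    // No 'credentials' key on purpose: the SDK then resolves credentials
    // through its default provider chain (environment variables, shared
    // credentials file, instance or container roles, ...).
    return [
        'version' => 'latest',
        'region'  => $region,
    ];
}
```

With this helper, new S3Client(awsClientConfig()) and new TranscribeServiceClient(awsClientConfig()) could replace the arrays with hard-coded keys used later in this tutorial.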
To convert speech to text with Amazon Transcribe, you need a media file in a supported format: mp3, mp4, wav, flac, ogg, amr, or webm (check the Amazon Transcribe documentation for the current list). The speech must also be in one of the supported languages; the language codes are listed in the Amazon Transcribe documentation.
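A media file can be checked against this list before it is uploaded. Here is a minimal sketch, assuming the file extension is a good enough proxy for the format (it does not inspect the file's contents); the helper name isSupportedMediaFile() is illustrative. Lowercasing the extension means a name like Interview.MP3 is accepted, and a name without an extension is simply rejected instead of raising a warning.

```php
<?php
// Extensions listed in this tutorial as accepted by Amazon Transcribe.
const SUPPORTED_MEDIA_FORMATS = ['mp3', 'mp4', 'wav', 'flac', 'ogg', 'amr', 'webm'];

// Hypothetical helper: checks a file name's extension case-insensitively.
function isSupportedMediaFile(string $filename): bool
{
    // PATHINFO_EXTENSION returns '' when the name has no extension.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($extension, SUPPORTED_MEDIA_FORMATS, true);
}
```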
The integration of Amazon Transcribe involves the following steps:
- Upload the media file to an S3 bucket.
- Instantiate an Amazon Transcribe client.
- Start a transcription job with Amazon Transcribe. The job requires the S3 URL of the media file and a unique job name.
- Wait for the job to finish. Amazon Transcribe may take a few minutes to complete the transcription.
- Download the transcript once the job has completed.
Let’s see how to handle this flow by writing the actual PHP code.
Speech-To-Text using Amazon Transcribe in PHP
First, create an HTML form to select the media file. When the form is submitted, the script processes the media file and sends the transcribed text back to the browser as a .txt file.
<form method="post" enctype="multipart/form-data">
<p><input type="file" name="media" accept="audio/*,video/*" /></p>
<input type="submit" name="submit" value="Submit" />
</form>
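Before the script works with $_FILES['media'], it is worth confirming that the upload actually succeeded; large audio and video files can easily exceed PHP's upload_max_filesize limit. A minimal sketch using PHP's built-in UPLOAD_ERR_* constants follows; the function name uploadErrorMessage() and the messages are illustrative.

```php
<?php
// Hypothetical helper: turns the upload error code from
// $_FILES['media']['error'] into a message, or null on success.
function uploadErrorMessage(int $code): ?string
{
    switch ($code) {
        case UPLOAD_ERR_OK:
            return null;
        case UPLOAD_ERR_INI_SIZE:   // larger than upload_max_filesize
        case UPLOAD_ERR_FORM_SIZE:  // larger than the form's MAX_FILE_SIZE
            return 'The file is too large.';
        case UPLOAD_ERR_PARTIAL:
            return 'The file was only partially uploaded.';
        case UPLOAD_ERR_NO_FILE:
            return 'No file was uploaded.';
        default:
            // Missing temp directory, disk write failure, blocked by an extension...
            return 'The upload failed (error code ' . $code . ').';
    }
}
```

In the form handler this could be used as: if (($msg = uploadErrorMessage($_FILES['media']['error'])) !== null) { die($msg); }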
On the PHP side, the media file has to be sent to AWS. We first upload it to the S3 bucket and then start the transcription job. Begin by loading Composer's autoloader and importing the SDK classes.
<?php
require 'vendor/autoload.php';
use Aws\S3\S3Client;
use Aws\TranscribeService\TranscribeServiceClient;
// process the media file
Next, upload the media file to the S3 bucket and get the S3 URL of the uploaded object.
if ( isset($_POST['submit']) ) {
    // Make sure the upload succeeded before touching the file
    if (($_FILES['media']['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        die('File upload failed');
    }
    // Check if the media format is supported (case-insensitive)
    $allowed_formats = ['mp3', 'mp4', 'wav', 'flac', 'ogg', 'amr', 'webm'];
    $media_format = strtolower(pathinfo($_FILES['media']['name'], PATHINFO_EXTENSION));
    if ( !in_array($media_format, $allowed_formats) ) {
        die('File type is not allowed');
    }
    // pass AWS API credentials
    $region = 'PASS_REGION';
    $access_key = 'ACCESS_KEY';
    $secret_access_key = 'SECRET_ACCESS_KEY';
    // Specify S3 bucket name
    $bucketName = 'PASS_BUCKET_NAME';
    $key = basename($_FILES['media']['name']);
    // upload file to the S3 bucket
    try {
        // Instantiate an Amazon S3 client.
        $s3 = new S3Client([
            'version' => 'latest',
            'region' => $region,
            'credentials' => [
                'key' => $access_key,
                'secret' => $secret_access_key
            ]
        ]);
        // Upload the file; the object can stay private
        $result = $s3->putObject([
            'Bucket' => $bucketName,
            'Key' => $key,
            'Body' => fopen($_FILES['media']['tmp_name'], 'r'),
        ]);
        $audio_url = $result->get('ObjectURL');
        // Code for Amazon Transcribe Service - Start here
    } catch (Exception $e) {
        echo $e->getMessage();
    }
}
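Make sure to replace the placeholders with your actual values. The uploaded object does not need to be public: Amazon Transcribe reads it with the permissions of the IAM identity that starts the job, so that identity needs s3:GetObject access to the bucket. On newer buckets, where ACLs are disabled by default, requesting a public-read ACL even makes the upload fail. The S3 URL of the uploaded media is then passed to Amazon Transcribe. Each transcription job also needs a unique name to distinguish it from other jobs; here it is generated with PHP's uniqid() function.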
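uniqid() is derived from the current time, so two jobs started at nearly the same moment by different PHP processes could in principle get the same name. According to the StartTranscriptionJob API reference, job names may only contain letters, digits, '.', '_' and '-', and are limited to 200 characters. Here is a minimal sketch of a more defensive generator; the helper makeJobName() and its default prefix are hypothetical.

```php
<?php
// Hypothetical helper: builds a job name that is unique in practice and
// only uses the characters allowed for TranscriptionJobName.
function makeJobName(string $prefix = 'speech-to-text'): string
{
    // Replace disallowed characters and keep the prefix short enough
    // that the unique suffix always fits within the 200-character limit.
    $safePrefix = substr(preg_replace('/[^0-9a-zA-Z._-]/', '-', $prefix), 0, 160);

    // UTC timestamp for readability plus 8 random hex characters for uniqueness.
    return $safePrefix . '-' . gmdate('YmdHis') . '-' . bin2hex(random_bytes(4));
}
```

The call 'TranscriptionJobName' => makeJobName() could then replace the uniqid() value passed to startTranscriptionJob().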
// Create Amazon Transcribe Client
$awsTranscribeClient = new TranscribeServiceClient([
    'region' => $region,
    'version' => 'latest',
    'credentials' => [
        'key' => $access_key,
        'secret' => $secret_access_key
    ]
]);
// Start a Transcription Job
$job_id = uniqid();
$awsTranscribeClient->startTranscriptionJob([
    'LanguageCode' => 'en-US',
    'Media' => [
        'MediaFileUri' => $audio_url,
    ],
    'TranscriptionJobName' => $job_id,
]);
// Poll every 5 seconds until the job either completes or fails
while (true) {
    $status = $awsTranscribeClient->getTranscriptionJob([
        'TranscriptionJobName' => $job_id
    ]);
    $job_status = $status->get('TranscriptionJob')['TranscriptionJobStatus'];
    if ($job_status === 'COMPLETED' || $job_status === 'FAILED') {
        break;
    }
    sleep(5);
}
// delete s3 object; it is no longer needed
$s3->deleteObject([
    'Bucket' => $bucketName,
    'Key' => $key,
]);
if ($job_status === 'FAILED') {
    die('Transcription failed: ' . $status->get('TranscriptionJob')['FailureReason']);
}
// download the converted txt file
In the above code, we instantiate the Amazon Transcribe client and start the transcription job. The job may take a few minutes to complete, so the script polls it inside a while loop: it checks the job status every 5 seconds, pausing with the sleep() function in between, and exits the loop once the job has finished.
Once the transcription job is over, the media file is no longer needed, so the script deletes it from the S3 bucket.
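A transcription job ends in one of two terminal states, COMPLETED or FAILED, and it is also sensible to give up after a fixed time instead of letting the request hang indefinitely. Below is a minimal sketch of a reusable polling helper with an upper bound on the number of checks. The name waitForJob() and its parameters are illustrative; the status-fetching callback and the sleep function are passed in so the loop can be exercised without calling AWS.

```php
<?php
// Hypothetical helper: polls until the job reaches a terminal state.
// $fetchStatus must return the current TranscriptionJobStatus string
// (QUEUED, IN_PROGRESS, FAILED or COMPLETED).
function waitForJob(
    callable $fetchStatus,
    int $intervalSeconds = 5,
    int $maxAttempts = 120,
    ?callable $sleep = null
): string {
    $sleep = $sleep ?? 'sleep';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();
        if ($status === 'COMPLETED' || $status === 'FAILED') {
            return $status;
        }
        // Only wait if another check will follow.
        if ($attempt < $maxAttempts) {
            $sleep($intervalSeconds);
        }
    }
    throw new RuntimeException("Job still not finished after $maxAttempts checks.");
}

// Usage with the tutorial's client (not executed here):
// $final = waitForJob(function () use ($awsTranscribeClient, $job_id) {
//     $job = $awsTranscribeClient->getTranscriptionJob(['TranscriptionJobName' => $job_id]);
//     return $job->get('TranscriptionJob')['TranscriptionJobStatus'];
// });
```

With the defaults, the script gives up after about ten minutes (120 checks, 5 seconds apart); PHP's max_execution_time must allow at least that long.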
You can also follow the transcription job in the AWS Management Console under Amazon Transcribe => Transcription jobs, as shown in the screenshot below.
Once the transcript is ready, download it and send it to the browser as a text file using the code below.
// TranscriptFileUri points to the transcript, a JSON document
$url = $status->get('TranscriptionJob')['Transcript']['TranscriptFileUri'];
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$data = curl_exec($curl);
if ($data === false) {
    // Stop here: without the JSON there is no transcript to send
    $error_msg = curl_error($curl);
    curl_close($curl);
    die('Unable to download the transcript: ' . $error_msg);
}
curl_close($curl);
$arr_data = json_decode($data);
// Send converted txt file to a browser
$file = $job_id . ".txt";
$txt = fopen($file, "w") or die("Unable to open file!");
fwrite($txt, $arr_data->results->transcripts[0]->transcript);
fclose($txt);
header('Content-Description: File Transfer');
header('Content-Disposition: attachment; filename=' . basename($file));
header('Expires: 0');
header('Cache-Control: must-revalidate');
header('Pragma: public');
header('Content-Length: ' . filesize($file));
header('Content-Type: text/plain; charset=utf-8');
readfile($file);
// Remove the temporary file from the server
unlink($file);
exit();
This code sends the transcript to the browser as a .txt file, which the browser downloads automatically.
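The downloaded file is JSON rather than plain text. Besides the full transcript under results.transcripts, it contains word-level items with timestamps and confidence scores. Here is a minimal sketch of extracting the text with explicit error handling, relying only on the structure the code above already uses; extractTranscript() is an illustrative name.

```php
<?php
// Hypothetical helper: pulls the plain-text transcript out of the JSON
// document that TranscriptFileUri points to.
function extractTranscript(string $json): string
{
    // Throw a JsonException on malformed JSON instead of silently returning null.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // The ?? operator avoids warnings if any level of the structure is missing.
    $text = $data['results']['transcripts'][0]['transcript'] ?? null;
    if (!is_string($text)) {
        throw new UnexpectedValueException('No transcript found in the response.');
    }
    return $text;
}
```

The returned string could also be sent directly with header('Content-Type: text/plain; charset=utf-8') followed by echo, which avoids writing a temporary file on the server.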
Final Sample Code
The code above was presented in chunks. Here is the complete script; copy it into your application and replace the placeholders with your own values.
<?php
// Allow the script to keep running while the transcription job is in progress
set_time_limit(0);
require 'vendor/autoload.php';
use Aws\S3\S3Client;
use Aws\TranscribeService\TranscribeServiceClient;

if ( isset($_POST['submit']) ) {
    // Make sure the upload succeeded before touching the file
    if (($_FILES['media']['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        die('File upload failed');
    }
    // Check if the media format is supported (case-insensitive)
    $allowed_formats = ['mp3', 'mp4', 'wav', 'flac', 'ogg', 'amr', 'webm'];
    $media_format = strtolower(pathinfo($_FILES['media']['name'], PATHINFO_EXTENSION));
    if ( !in_array($media_format, $allowed_formats) ) {
        die('File type is not allowed');
    }
    // pass AWS API credentials
    $region = 'PASS_REGION';
    $access_key = 'ACCESS_KEY';
    $secret_access_key = 'SECRET_ACCESS_KEY';
    // Specify S3 bucket name
    $bucketName = 'PASS_BUCKET_NAME';
    // Instantiate an Amazon S3 client.
    $s3 = new S3Client([
        'version' => 'latest',
        'region' => $region,
        'credentials' => [
            'key' => $access_key,
            'secret' => $secret_access_key
        ]
    ]);
    $key = basename($_FILES['media']['name']);
    // upload file to the S3 bucket
    try {
        // Upload the file; the object can stay private
        $result = $s3->putObject([
            'Bucket' => $bucketName,
            'Key' => $key,
            'Body' => fopen($_FILES['media']['tmp_name'], 'r'),
        ]);
        $audio_url = $result->get('ObjectURL');
        // Code for Amazon Transcribe Service - Start here
        // Create Amazon Transcribe Client
        $awsTranscribeClient = new TranscribeServiceClient([
            'region' => $region,
            'version' => 'latest',
            'credentials' => [
                'key' => $access_key,
                'secret' => $secret_access_key
            ]
        ]);
        // Start a Transcription Job
        $job_id = uniqid();
        $awsTranscribeClient->startTranscriptionJob([
            'LanguageCode' => 'en-US',
            'Media' => [
                'MediaFileUri' => $audio_url,
            ],
            'TranscriptionJobName' => $job_id,
        ]);
        // Poll every 5 seconds until the job either completes or fails
        while (true) {
            $status = $awsTranscribeClient->getTranscriptionJob([
                'TranscriptionJobName' => $job_id
            ]);
            $job_status = $status->get('TranscriptionJob')['TranscriptionJobStatus'];
            if ($job_status === 'COMPLETED' || $job_status === 'FAILED') {
                break;
            }
            sleep(5);
        }
        // delete s3 object; it is no longer needed
        $s3->deleteObject([
            'Bucket' => $bucketName,
            'Key' => $key,
        ]);
        if ($job_status === 'FAILED') {
            die('Transcription failed: ' . $status->get('TranscriptionJob')['FailureReason']);
        }
        // download the transcript (a JSON document)
        $url = $status->get('TranscriptionJob')['Transcript']['TranscriptFileUri'];
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_HEADER, false);
        $data = curl_exec($curl);
        if ($data === false) {
            $error_msg = curl_error($curl);
            curl_close($curl);
            die('Unable to download the transcript: ' . $error_msg);
        }
        curl_close($curl);
        $arr_data = json_decode($data);
        // Send converted txt file to a browser
        $file = $job_id . ".txt";
        $txt = fopen($file, "w") or die("Unable to open file!");
        fwrite($txt, $arr_data->results->transcripts[0]->transcript);
        fclose($txt);
        header('Content-Description: File Transfer');
        header('Content-Disposition: attachment; filename=' . basename($file));
        header('Expires: 0');
        header('Cache-Control: must-revalidate');
        header('Pragma: public');
        header('Content-Length: ' . filesize($file));
        header('Content-Type: text/plain; charset=utf-8');
        readfile($file);
        // Remove the temporary file from the server
        unlink($file);
        exit();
    } catch (Exception $e) {
        echo $e->getMessage();
    }
}
?>
<form method="post" enctype="multipart/form-data">
<p><input type="file" name="media" accept="audio/*,video/*" /></p>
<input type="submit" name="submit" value="Submit" />
</form>
Conclusion
We have seen how the Amazon Transcribe service can be used to convert speech to text in PHP. As long as the media file is in a supported format and the speech is in a supported language, the PHP script above uploads it to S3, runs a transcription job, and returns the transcribed text as a downloadable file.
Related Articles
- Text-To-Speech using Amazon Polly in PHP
- Upload Files to Amazon S3 using AWS PHP SDK
- How to Upload file to S3 using Laravel Filesystem