Speech-to-Text in Python with Django

Speech-to-text, also known as automatic speech recognition (ASR), converts spoken language into written text. It automates transcription, which saves time, and adding it to your application can improve both accessibility and user engagement.

In this article, we will walk you through the process of integrating speech-to-text into a Django application using Picovoice's Leopard Speech-to-Text engine.

Note that it is also possible to perform speech-to-text directly in the front-end with Leopard. Check out the Leopard Web Quick Start guide to get started.

1. Prerequisites

Sign up for a free Picovoice Console account. Once you've created an account, copy your AccessKey from the main dashboard.

Also make sure that you have Python and Django installed on your device, and that your version of Django supports your version of Python. You can check if they are installed with the following commands:

python --version
python -m django --version
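
If you would rather check from inside Python, the standard library can report the same information. The following is a minimal sketch; the helper name installed_version is illustrative and is not part of Django or Picovoice.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the interpreter version and the Django version side by side
    print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
    print("Django:", installed_version("django") or "not installed")
```

You can then compare the two versions against the compatibility table in the Django release notes.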

2. Create a Django Project

If you don't already have a Django project, start by creating one with the following command:

django-admin startproject myproject

3. Create a Django App

Within your project, create a new Django app.

cd myproject
python manage.py startapp myapp

In your /myproject/settings.py file, add 'myapp' to the INSTALLED_APPS list.
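
For reference, this is roughly what the list looks like after the change. The sketch assumes the default entries that startproject generates; your project may list other apps.

```python
# myproject/settings.py (excerpt)
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'myapp',  # the app created in this step
]
```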

4. Install the Leopard Python SDK

Install pvleopard:

pip install pvleopard

5. Create a View

Replace the contents of /myapp/views.py with the following code. Make sure to replace ${ACCESS_KEY} with your actual AccessKey.

from django.core.files.storage import FileSystemStorage
from django.http import JsonResponse
from django.shortcuts import render
from pvleopard import create, LeopardActivationLimitError, LeopardError


def transcribe_audio(request):
    # .get() avoids a KeyError when the POST carries no 'audioFile' field
    if request.method == 'POST' and request.FILES.get('audioFile'):
        file = request.FILES['audioFile']
        fs = FileSystemStorage()
        filename = None
        leopard = None
        try:
            # save file to server
            filename = fs.save(file.name, file)

            # transcribe with Leopard Speech-to-Text
            leopard = create(access_key="${ACCESS_KEY}")
            # fs.path() resolves the saved name to an absolute path on disk
            transcript, words = leopard.process_file(fs.path(filename))
        except LeopardActivationLimitError:
            return JsonResponse({'error': "AccessKey has reached its processing limit."})
        except LeopardError:
            return JsonResponse({'error': "Unable to transcribe audio file."})
        finally:
            # clean up, even when transcription fails
            if leopard is not None:
                leopard.delete()
            if filename is not None:
                fs.delete(filename)

        # log per-word metadata to the terminal
        for word in words:
            print(
                "{word=\"%s\" start_sec=%.2f end_sec=%.2f confidence=%.2f}"
                % (word.word, word.start_sec, word.end_sec, word.confidence))
        return JsonResponse({'transcript': transcript})
    return render(request, "transcribe_audio.html")

This view function (transcribe_audio) will receive an audio file sent by a template (we will set this up in the next step), transcribe it using pvleopard, and send the transcript back to the template to be displayed.
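
Hard-coding the AccessKey is fine for a quick test, but you may prefer to keep it out of source control. Here is a minimal sketch, assuming you export the key in an environment variable. The variable name PICOVOICE_ACCESS_KEY and the helper get_access_key are hypothetical and are not defined by Picovoice or Django.

```python
import os


def get_access_key(env_var="PICOVOICE_ACCESS_KEY"):
    """Read the AccessKey from the environment, failing loudly if it is absent."""
    # env_var is a hypothetical name chosen for this example
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your AccessKey.")
    return access_key
```

In the view, you could then call create(access_key=get_access_key()) instead of passing the literal string.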

Note that Leopard also returns start and end timestamps and a confidence score for every word in the transcript. The view prints these to your terminal.
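
If you want to return the word-level details to the browser rather than printing them, each word can be converted to a plain dictionary first. This is a minimal sketch: words_to_dicts is a hypothetical helper, and the Word namedtuple below is only a stand-in carrying the four fields the view prints, so that the example runs without pvleopard.

```python
from collections import namedtuple

# Stand-in for the word objects returned by Leopard
# (assumption: only these four fields are needed).
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def words_to_dicts(words):
    """Convert word objects into JSON-serializable dictionaries."""
    return [
        {
            "word": w.word,
            "start_sec": round(w.start_sec, 2),
            "end_sec": round(w.end_sec, 2),
            "confidence": round(w.confidence, 2),
        }
        for w in words
    ]


if __name__ == "__main__":
    sample = [Word("hello", 0.0, 0.42, 0.97), Word("world", 0.5, 0.93, 0.88)]
    print(words_to_dicts(sample))
```

The view could then respond with JsonResponse({'transcript': transcript, 'words': words_to_dicts(words)}).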

Inside the /myapp directory, create a urls.py file and add the following code:

from django.urls import path
from . import views

urlpatterns = [
    path("", views.transcribe_audio, name="transcribe_audio"),
]

In /myproject/urls.py, add the following line to the urlpatterns list:

path("", include("myapp.urls")),

Make sure to also import include from django.urls.
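
Put together, the project-level file might look like the sketch below. It assumes the default urls.py that startproject generates, which already contains the admin route; if yours differs, only the include import and the new path entry matter.

```python
# myproject/urls.py
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path("admin/", admin.site.urls),  # generated by startproject
    path("", include("myapp.urls")),  # routes the site root to myapp
]
```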

6. Create a Template

Inside the /myapp directory, create a templates directory. Inside it, create a transcribe_audio.html file and place the following content inside the <body> tag:

<h3>File Upload</h3>
<input id="file-upload-input" type="file" accept="audio/*" name="audioFile" />
<button id="file-upload-btn" type="button">Upload</button>
<h3>Transcript</h3>
<p id="transcript"></p>
<script>
  const fileUploadInput = document.getElementById("file-upload-input")
  const fileUploadBtn = document.getElementById("file-upload-btn")
  const transcriptEl = document.getElementById("transcript")
  fileUploadBtn.addEventListener("click", handleFileUpload)

  async function handleFileUpload() {
    const file = fileUploadInput.files[0]
    if (!file) return

    // add audio file to FormData object
    let data = new FormData()
    data.append('audioFile', file)
    data.append('fileName', file.name)

    // send audio file to the `view` for transcription
    transcriptEl.style.color = 'black'
    transcriptEl.innerText = "Transcribing..."
    const rawResponse = await fetch("", {
      method: 'POST',
      body: data,
      headers: { "X-CSRFToken": '{{csrf_token}}' },
    })

    // handle transcript in response
    const { transcript, error } = await rawResponse.json()
    if (error) {
      transcriptEl.style.color = 'red'
      transcriptEl.innerText = error
    } else if (transcript?.trim() === "") {
      transcriptEl.innerText = "No audio detected."
    } else if (transcript) {
      transcriptEl.innerText = transcript
    }
  }
</script>

This template simply allows you to send an audio file to your view function (transcribe_audio), which will transcribe the audio and return the transcript to be displayed.

7. Run Project

Start the development server:

python manage.py runserver

Click the link in your terminal to access the development server in your browser. Finally, upload an audio file to view the transcript!
