OpenAI - Voice


OpenAI’s voice APIs transcribe speech to text and can also translate speech in many languages into English text.

[audio file] -> OpenAI Voice API -> [transcription]

HTTP API

Using curl, we can ask the voice API to transcribe audio:

export OPENAI_KEY="<your-key-here>"
curl https://api.openai.com/v1/audio/transcriptions \
  -X POST \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H 'Content-Type: multipart/form-data' \
  -F file=@recording.mp3 \
  -F model=whisper-1

The response is:

{"text":"Good morning."}
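
The response body is a small JSON object, so pulling out the transcript takes only a few lines. Here is a minimal Python sketch, assuming the success shape shown above. `extract_text` is a hypothetical helper name, and the `error` branch is an assumption about how failures are reported rather than something shown in this post:

```python
import json


def extract_text(body: str) -> str:
    """Return the transcript from a JSON response body.

    Assumes the success shape shown above: {"text": "..."}.
    The "error" branch is an assumption about failure responses.
    """
    payload = json.loads(body)
    if "error" in payload:
        raise RuntimeError(f"API error: {payload['error']}")
    return payload["text"]


if __name__ == "__main__":
    print(extract_text('{"text":"Good morning."}'))
```

The same helper works for any language the API returns, since the transcript is always read from the `text` field.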

It also works in various languages. For example, when running on a clip from the popular Demon Slayer anime:

curl https://api.openai.com/v1/audio/transcriptions \
  -X POST \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H 'Content-Type: multipart/form-data' \
  -F file=@inosuke.mp3 \
  -F model=whisper-1

We get the response:

{"text":"すごいだろう俺はすごいだろう俺は 2回言ってる 4月3日"}

We can also translate speech in many languages into English text:

curl https://api.openai.com/v1/audio/translations \
  -X POST \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H 'Content-Type: multipart/form-data' \
  -F file=@inosuke.mp3 \
  -F model=whisper-1

This returns the English translation:

{"text":"I'm amazing, right? I'm amazing, right? He said it twice... He said it three times..."}
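
The two curl commands differ only in the endpoint path, so one helper can cover both. The sketch below builds the same multipart POST with Python's standard library. `build_request`, `ENDPOINTS`, and the task names `"transcribe"` and `"translate"` are hypothetical names chosen for illustration, and sending the file part as `application/octet-stream` is an assumption:

```python
import urllib.request
import uuid

API_BASE = "https://api.openai.com/v1/audio"
# Hypothetical task names mapped to the two endpoint paths used above.
ENDPOINTS = {"transcribe": "transcriptions", "translate": "translations"}


def build_request(task, filename, audio, api_key, model="whisper-1"):
    """Build (but do not send) a multipart POST for the given task."""
    boundary = uuid.uuid4().hex
    # The text part carries the model name; the file part carries the audio.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed content type
    ).encode()
    body = head + audio + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{API_BASE}/{ENDPOINTS[task]}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    # Placeholder bytes and key: this only shows which URL would be called.
    req = build_request("translate", "inosuke.mp3", b"placeholder", "example-key")
    print(req.full_url)
```

To send it for real, read the audio with `open("inosuke.mp3", "rb").read()`, pass the bytes as `audio`, and hand the result to `urllib.request.urlopen`. The response body is the JSON shown above.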

API Key

To get an API key, create an OpenAI account. You can then generate keys under your profile.
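
Since the requests in this post read the key from the `OPENAI_KEY` environment variable, it can help to fail early with a clear message when the variable is missing. A minimal sketch follows; `load_api_key` is a hypothetical helper, not part of any OpenAI library:

```python
import os


def load_api_key(env=os.environ, name="OPENAI_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get(name, "").strip()
    # Treat an unset variable and the unedited placeholder the same way.
    if not key or key == "<your-key-here>":
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key
```

Calling `load_api_key()` at the top of a script surfaces a missing key before any audio is uploaded.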

Examples

Export your API key as an environment variable:

export OPENAI_KEY="<your-key-here>"

Go

package main

import (
  "context"
  "fmt"
  "os"

  openai "github.com/sashabaranov/go-openai"
)

func main() {
  c := openai.NewClient(os.Getenv("OPENAI_KEY"))
  ctx := context.Background()

  req := openai.AudioRequest{
    Model:    openai.Whisper1,
    FilePath: "./recording.mp3",
  }
  resp, err := c.CreateTranscription(ctx, req)
  if err != nil {
    fmt.Printf("Transcription error: %v\n", err)
    return
  }
  fmt.Println(resp.Text)
}

For translations, use c.CreateTranslation instead.

Python

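pip3 install openai

import os
from openai import OpenAI

# The 1.x releases of the openai package expose the API through a client object.
client = OpenAI(api_key=os.getenv("OPENAI_KEY"))

# Open the recording in binary mode and send it for transcription.
with open("./recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

To translate, use client.audio.translations.create instead.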