sarvamai-go SDK Documentation
Speech-to-Text

TranscribeStream (WebSocket)

Use `SpeechToText.TranscribeStream` to open a WebSocket stream for real-time transcription.

Package

```go
import "github.com/Shreehari-Acharya/sarvamai-go/stt"
```

Signature

```go
func (c *STTClient) TranscribeStream(ctx context.Context, language LanguageCode, opts ...StreamOption) (*speech.Stream, error)
```

Options

| Option | Type | Notes |
|---|---|---|
| `WithStreamLanguage` | `languages.Code` | Overrides the `language` argument when set. |
| `WithStreamModel` | `ModelSaarika` or `ModelSaaras` | If omitted, the SDK does not send `model`. |
| `WithStreamMode` | `ModeTranscribe`, `ModeTranslate`, `ModeVerbatim`, `ModeTranslit`, `ModeCodemix` | Accepted only with `ModelSaaras` or with the model omitted; see the compatibility table below. |
| `WithStreamSampleRate` | `SampleRate8000` or `SampleRate16000` | SDK validation accepts only these two values. |
| `WithStreamInputAudioCodec` | `CodecWAV`, `CodecPCMS16LE`, `CodecPCML16`, `CodecPCMRAW` | SDK validation rejects any other codec for streaming. |
| `WithStreamHighVADSensitivity` | `bool` | Enables higher voice activity detection (VAD) sensitivity. |
| `WithStreamVADSignals` | `bool` | Enables VAD signal events in stream responses. |
| `WithStreamFlushSignal` | `bool` | Enables the server's flush-signal behavior. |
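The `opts ...StreamOption` parameter follows Go's functional-options pattern. The sketch below is a self-contained toy, not the SDK's code: `streamConfig`, its fields, and the string/int option arguments are hypothetical stand-ins chosen for readability. It shows why `WithStreamLanguage` can override the positional `language` argument: options run after the config is seeded from that argument.

```go
package main

import "fmt"

// streamConfig is a HYPOTHETICAL stand-in for the SDK's internal stream
// settings; the real field names and types may differ.
type streamConfig struct {
	language   string
	model      string
	sampleRate int
}

// StreamOption is a function that mutates the config before the stream opens.
type StreamOption func(*streamConfig)

// Toy options named after the SDK's for readability; these are not the SDK's code.
func WithStreamLanguage(lang string) StreamOption {
	return func(c *streamConfig) { c.language = lang }
}

func WithStreamModel(model string) StreamOption {
	return func(c *streamConfig) { c.model = model }
}

func WithStreamSampleRate(rate int) StreamOption {
	return func(c *streamConfig) { c.sampleRate = rate }
}

// newStreamConfig seeds the config from the positional language argument,
// then applies options in order, so an option can override that argument.
func newStreamConfig(language string, opts ...StreamOption) streamConfig {
	cfg := streamConfig{language: language}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newStreamConfig("unknown",
		WithStreamLanguage("hi-IN"),
		WithStreamModel("saaras"),
		WithStreamSampleRate(8000),
	)
	fmt.Printf("%+v\n", cfg)
}
```

Because options are plain functions, new settings can be added without changing the `TranscribeStream` signature.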

Model + option compatibility

| Combination | Result |
|---|---|
| `WithStreamModel(ModelSaaras)` + `WithStreamMode(...)` | Valid |
| `WithStreamModel(ModelSaarika)` + `WithStreamMode(...)` | Validation error |
| `WithStreamMode(...)` with the model omitted | Valid in SDK validation (the mode check uses the `saaras:v3` spec) |
| `WithStreamSampleRate` with 8000 or 16000 | Valid |
| `WithStreamSampleRate` with 22050, 24000, etc. | Validation error |
| `WithStreamInputAudioCodec` with wav, pcm_s16le, pcm_l16, or pcm_raw | Valid |
| `WithStreamInputAudioCodec` with mp3, flac, etc. | Validation error |
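The compatibility rules above can be restated as a small validator. This is a hypothetical re-statement of the table, not the SDK's validator: model and mode are plain strings here (`"saarika"`, `"saaras"`, `"transcribe"` are illustrative values, not the SDK constants' actual values), and an empty string or zero means the option was omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// validateStream is a HYPOTHETICAL re-statement of the compatibility table.
// Empty strings and a zero sample rate mean "option omitted".
func validateStream(model, mode string, sampleRate int, codec string) error {
	// Mode is rejected with saarika; with the model omitted it is allowed
	// (the SDK checks it against the saaras:v3 spec).
	if mode != "" && model == "saarika" {
		return errors.New("mode is not supported with saarika")
	}
	if sampleRate != 0 && sampleRate != 8000 && sampleRate != 16000 {
		return fmt.Errorf("unsupported sample rate %d", sampleRate)
	}
	switch codec {
	case "", "wav", "pcm_s16le", "pcm_l16", "pcm_raw":
		// omitted or one of the four streaming codecs
	default:
		return fmt.Errorf("unsupported streaming codec %q", codec)
	}
	return nil
}

func main() {
	cases := []struct {
		model, mode string
		rate        int
		codec       string
	}{
		{"saaras", "transcribe", 16000, "wav"},
		{"saarika", "transcribe", 16000, "wav"},
		{"", "translate", 8000, "pcm_s16le"},
		{"", "", 22050, ""},
		{"", "", 0, "mp3"},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, validateStream(c.model, c.mode, c.rate, c.codec))
	}
}
```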

Default behavior

  1. If the language resolves to an empty value, the SDK sets it to `unknown` before validation.
  2. If the sample rate is not set, the stream object is created with a sample rate of 16000.
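The two defaulting rules can be sketched as plain functions. Both helpers are hypothetical; in particular, treating "when set" as "non-empty" for the `WithStreamLanguage` override, and using `0` to mean an unset sample rate, are assumptions made for this illustration.

```go
package main

import "fmt"

// resolveLanguage sketches rule 1 (HYPOTHETICAL helper): a non-empty
// override replaces the positional argument, and an empty result falls
// back to "unknown" before any validation runs.
func resolveLanguage(arg, override string) string {
	lang := arg
	if override != "" {
		lang = override
	}
	if lang == "" {
		lang = "unknown"
	}
	return lang
}

// resolveSampleRate sketches rule 2: zero means "not set" and becomes 16000.
func resolveSampleRate(rate int) int {
	if rate == 0 {
		return 16000
	}
	return rate
}

func main() {
	fmt.Println(resolveLanguage("", ""), resolveSampleRate(0))          // unknown 16000
	fmt.Println(resolveLanguage("hi-IN", "en-IN"), resolveSampleRate(8000)) // en-IN 8000
}
```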

WebSocket query keys sent by the SDK

  • language-code
  • model
  • mode
  • sample_rate
  • input_audio_codec
  • high_vad_sensitivity
  • vad_signals
  • flush_signal
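A minimal sketch of how these keys could be encoded with the standard `net/url` package, assuming plain string and boolean values. `streamParams` and `buildStreamQuery` are hypothetical; the values are illustrative rather than the SDK's exact strings. Note the hyphen in `language-code` versus the underscores in the other keys. This sketch always sends the three boolean keys; whether the SDK omits `false` values is not documented here.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// streamParams is a HYPOTHETICAL holder for the values behind each query key.
type streamParams struct {
	Language           string
	Model              string // empty means "not sent"
	Mode               string // empty means "not sent"
	SampleRate         int
	Codec              string // empty means "not sent"
	HighVADSensitivity bool
	VADSignals         bool
	FlushSignal        bool
}

// buildStreamQuery maps each field to the documented query key.
func buildStreamQuery(p streamParams) url.Values {
	q := url.Values{}
	q.Set("language-code", p.Language)
	if p.Model != "" {
		q.Set("model", p.Model)
	}
	if p.Mode != "" {
		q.Set("mode", p.Mode)
	}
	q.Set("sample_rate", strconv.Itoa(p.SampleRate))
	if p.Codec != "" {
		q.Set("input_audio_codec", p.Codec)
	}
	q.Set("high_vad_sensitivity", strconv.FormatBool(p.HighVADSensitivity))
	q.Set("vad_signals", strconv.FormatBool(p.VADSignals))
	q.Set("flush_signal", strconv.FormatBool(p.FlushSignal))
	return q
}

func main() {
	q := buildStreamQuery(streamParams{Language: "hi-IN", SampleRate: 16000, Codec: "wav"})
	// Encode sorts keys alphabetically.
	fmt.Println(q.Encode())
}
```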

Stream usage

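`TranscribeStream` returns a `*speech.Stream`.

Common flow:

  1. Create the stream with `TranscribeStream`.
  2. Send audio with `SendAudio`.
  3. Call `Flush` once all audio has been sent.
  4. Iterate over responses with `Next()` and `Current()`.
  5. Check `Err()`, then call `Close()`.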

Example

```go
// Assumes ctx (context.Context), client, and chunk1 (audio data encoded
// as WAV to match the codec option below) are defined elsewhere.
stream, err := client.SpeechToText.TranscribeStream(
    ctx,
    stt.LanguageUnknown,
    stt.WithStreamModel(stt.ModelSaaras),
    stt.WithStreamMode(stt.ModeTranscribe),
    stt.WithStreamSampleRate(stt.SampleRate16000),
    stt.WithStreamInputAudioCodec(stt.CodecWAV),
)
if err != nil {
    panic(err)
}
defer stream.Close()

// Send audio in the configured codec (WAV here), then flush when done.
if err := stream.SendAudio(chunk1); err != nil {
    panic(err)
}
if err := stream.Flush(); err != nil {
    panic(err)
}

// Read responses until the stream ends.
for stream.Next() {
    resp := stream.Current()
    _ = resp // handle each response here
}
if err := stream.Err(); err != nil {
    panic(err)
}
```
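The example sends a single `chunk1`. For longer recordings, audio is usually split into fixed-duration chunks. The sketch below is a hypothetical helper under stated assumptions: `SendAudio` takes a `[]byte` (not confirmed on this page), the audio is mono 16-bit PCM such as `pcm_s16le`, and the 100 ms chunk length is arbitrary. For WAV input, whether chunks must respect header boundaries is not covered here. The chunk size follows from bytes = sample rate × 2 bytes per sample × duration in ms / 1000, so 100 ms at 16000 Hz is 3200 bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// audioSender is a HYPOTHETICAL interface for the send side of the stream;
// the []byte parameter of SendAudio is an assumption.
type audioSender interface {
	SendAudio(chunk []byte) error
	Flush() error
}

// chunkBytes gives the byte size of durationMs of mono 16-bit PCM.
func chunkBytes(sampleRate, durationMs int) int {
	return sampleRate * 2 * durationMs / 1000
}

// sendInChunks sends audio in fixed-size slices, then flushes once.
func sendInChunks(s audioSender, audio []byte, size int) error {
	if size <= 0 {
		return errors.New("chunk size must be positive")
	}
	for start := 0; start < len(audio); start += size {
		end := start + size
		if end > len(audio) {
			end = len(audio)
		}
		if err := s.SendAudio(audio[start:end]); err != nil {
			return fmt.Errorf("send chunk at byte %d: %w", start, err)
		}
	}
	return s.Flush()
}

// recorder is a fake sender that records chunk sizes instead of using a network.
type recorder struct {
	sizes   []int
	flushed bool
}

func (r *recorder) SendAudio(chunk []byte) error {
	r.sizes = append(r.sizes, len(chunk))
	return nil
}

func (r *recorder) Flush() error {
	r.flushed = true
	return nil
}

func main() {
	size := chunkBytes(16000, 100) // 100 ms at 16 kHz
	r := &recorder{}
	if err := sendInChunks(r, make([]byte, 7000), size); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(size, r.sizes, r.flushed) // 3200 [3200 3200 600] true
}
```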
