sarvamai-go SDK Documentation
Speech-to-Text

Transcribe (REST)

`SpeechToText.Transcribe` performs file-based speech recognition over the REST API.

Package

import "github.com/Shreehari-Acharya/sarvamai-go/stt"

Signature

func (c *STTClient) Transcribe(ctx context.Context, file io.Reader, opts ...Option) (*TranscribeResponse, error)

Required input

  • file (io.Reader) is required.

Options

| Option | Type | Notes |
| --- | --- | --- |
| `WithModel` | `ModelSaarika`, `ModelSaaras` | If omitted, model is not explicitly sent by SDK. |
| `WithMode` | `ModeTranscribe`, `ModeTranslate`, `ModeVerbatim`, `ModeTranslit`, `ModeCodemix` | Mode support depends on model validation rules. |
| `WithLanguage` | `languages.Code` | Allowed values depend on selected model (or validation default when omitted). |
| `WithAudioCodec` | `speech.InputAudioCodec` | SDK does not locally validate codec values for REST transcribe. |

Model + option compatibility

| Combo | Result |
| --- | --- |
| `WithModel(ModelSaaras)` + `WithMode(...)` | Valid |
| `WithModel(ModelSaarika)` + `WithMode(...)` | Validation error (mode only supported with saaras:v3) |
| `WithMode(...)` with model omitted | Valid in SDK validation (mode check uses saaras:v3 spec) |
| `WithModel(ModelSaarika)` + `WithLanguage` in `SaarikaLanguages` | Valid |
| `WithModel(ModelSaarika)` + `WithLanguage` outside `SaarikaLanguages` | Validation error |
| `WithModel(ModelSaaras)` + `WithLanguage` in `SaarasLanguages` | Valid |
| `WithModel(ModelSaaras)` + `WithLanguage` outside `SaarasLanguages` | Validation error |
| `WithLanguage` with model omitted | Valid only when language is in `SaarasLanguages` |

SDK validation behavior

  1. file must be non-nil.
  2. If model is set, it must be saarika:v2.5 or saaras:v3.
  3. Mode is validated via speech.ValidateMode.
  4. Language is validated via speech.ValidateLanguageWithSpec.

Response

| Field | Type | Notes |
| --- | --- | --- |
| `Transcript` | `string` | Main transcription output. |
| `RequestID` | `*string` | Optional request identifier. |
| `LanguageCode` | `*string` | Optional detected/selected language code. |
| `LanguageProbability` | `*float64` | Optional confidence for language detection. |
| `Timestamps` | `*speech.Timestamps` | Optional word timing metadata. |
| `DiarizedTranscript` | `*speech.DiarizedTranscript` | Optional diarized output when available. |
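
Because every field except `Transcript` is a pointer, callers should check for nil before dereferencing. The sketch below shows one way to do that. It uses a local `transcribeResponse` struct that mirrors only the scalar fields documented above, and the `deref` and `summarize` helpers are hypothetical names that are not part of the SDK.

```go
package main

import "fmt"

// transcribeResponse is a local stand-in mirroring the documented scalar
// fields; it is NOT the SDK's TranscribeResponse type.
type transcribeResponse struct {
	Transcript          string
	RequestID           *string
	LanguageCode        *string
	LanguageProbability *float64
}

// deref returns the pointed-to value, or fallback when the pointer is nil.
func deref[T any](p *T, fallback T) T {
	if p == nil {
		return fallback
	}
	return *p
}

// summarize builds a one-line description without assuming that any
// optional field is present.
func summarize(r transcribeResponse) string {
	line := fmt.Sprintf("[%s] %s", deref(r.LanguageCode, "unknown"), r.Transcript)
	if r.LanguageProbability != nil {
		line += fmt.Sprintf(" (p=%.2f)", *r.LanguageProbability)
	}
	return line
}

func main() {
	code, prob := "hi-IN", 0.97
	full := transcribeResponse{Transcript: "namaste", LanguageCode: &code, LanguageProbability: &prob}
	bare := transcribeResponse{Transcript: "namaste"}
	fmt.Println(summarize(full))
	fmt.Println(summarize(bare))
}
```

The generic `deref` helper requires Go 1.18 or later; on older toolchains an explicit nil check per field does the same job.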

Example

resp, err := client.SpeechToText.Transcribe(
    ctx,
    audioFile,
    stt.WithModel(stt.ModelSaaras),
    stt.WithMode(stt.ModeTranslate),
    stt.WithLanguage(stt.LanguageHiIN),
    stt.WithAudioCodec(stt.CodecWAV),
)
if err != nil {
    panic(err)
}
fmt.Println(resp.Transcript)
