
ai-sage/GigaAM-v3



GigaAM-v3 is a Conformer-based foundation model with 220–240M parameters, pretrained on diverse Russian speech data using the HuBERT-CTC objective. It is the third generation of the GigaAM family and provides state-of-the-art performance on Russian ASR across a wide range of domains.

GigaAM-v3 includes the following model variants:

  • ssl — self-supervised HuBERT–CTC encoder pre-trained on 700,000 hours of Russian speech
  • ctc — ASR model fine-tuned with a CTC decoder
  • rnnt — ASR model fine-tuned with an RNN-T decoder
  • e2e_ctc — end-to-end CTC model with punctuation and text normalization
  • e2e_rnnt — end-to-end RNN-T model with punctuation and text normalization

GigaAM-v3 training incorporates new internal datasets: call-center conversations, speech with background music, natural speech, and speech with atypical characteristics. The models perform on average 30% better on these new domains, while maintaining the quality of previous GigaAM generations on public benchmarks.

The table below reports the Word Error Rate (%) for the GigaAM-v3 models and other publicly available models across diverse domains.

| Set Name          | V3_CTC | V3_RNNT | T-One + LM | Whisper |
|-------------------|-------:|--------:|-----------:|--------:|
| Open Datasets     |    3.0 |     2.6 |        5.7 |    12.0 |
| Golos Farfield    |    4.5 |     3.9 |       12.2 |    16.7 |
| Natural Speech    |    7.8 |     6.9 |       14.5 |    13.6 |
| Disordered Speech |   20.6 |    19.2 |       51.0 |    59.3 |
| Callcenter        |   10.3 |     9.5 |       13.5 |    23.9 |
| Average           |    9.2 |     8.4 |       19.4 |    25.1 |
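Word Error Rate is the word-level edit distance between a hypothesis and the reference, divided by the reference length. A minimal sketch (not the official evaluation script) that also shows how the Average row is the plain mean of the five domain WERs:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words -> 25% WER
print(wer("добрый день как дела", "добрый вечер как дела"))  # 0.25

# The Average row above is the mean of the five domain WERs, e.g. for V3_CTC:
v3_ctc = [3.0, 4.5, 7.8, 20.6, 10.3]
print(round(sum(v3_ctc) / len(v3_ctc), 1))  # 9.2
```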

The end-to-end ASR models (e2e_ctc and e2e_rnnt) produce punctuated, normalized text directly. In end-to-end ASR comparisons of e2e_ctc and e2e_rnnt against Whisper-large-v3, using Gemini 2.5 Pro as an LLM-as-a-judge, GigaAM-v3 models win by an average margin of 70:30.
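To illustrate the difference in output format (the strings below are made-up examples, not actual model transcriptions): the ctc and rnnt variants emit plain lowercase text without punctuation, while the e2e variants add punctuation and casing. Going from e2e-style output back to the plain format is easy to sketch:

```python
import re

def to_plain(text: str) -> str:
    """Strip punctuation and casing, approximating the plain ctc/rnnt output format."""
    return re.sub(r"[^\w\s]", "", text).lower().strip()

e2e_style = "Привет, как дела?"  # hypothetical e2e_rnnt-style output
print(to_plain(e2e_style))       # привет как дела
```

Note that the real e2e models also perform inverse text normalization (e.g. spelled-out numbers become digits), which this toy function does not capture.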

For detailed results, see metrics.

Usage

from transformers import AutoModel

revision = "e2e_rnnt"  # can be any v3 model: ssl, ctc, rnnt, e2e_ctc, e2e_rnnt
model = AutoModel.from_pretrained(
    "ai-sage/GigaAM-v3",
    revision=revision,
    trust_remote_code=True,
)

transcription = model.transcribe("example.wav")
print(transcription)
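For multiple recordings, the same call can be wrapped in a small helper. This is a sketch that only assumes the `model.transcribe(path)` interface shown above; it does not change any batching behavior inside the model:

```python
def transcribe_files(model, paths):
    """Transcribe each audio file in turn; returns a {path: text} mapping.

    Assumes the model exposes .transcribe(path) as in the snippet above.
    """
    results = {}
    for path in paths:
        results[path] = model.transcribe(path)
    return results

# transcriptions = transcribe_files(model, ["first.wav", "second.wav"])
```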

Recommended versions:

  • torch==2.8.0, torchaudio==2.8.0
  • transformers==4.57.1
  • pyannote-audio==4.0.0, torchcodec==0.7.0
  • hydra-core, omegaconf, sentencepiece (any version)

A full usage guide can be found in the example.

License: MIT

Paper: GigaAM: Efficient Self-Supervised Learner for Speech Recognition (InterSpeech 2025)
