
pyannote/speaker-diarization-community-1


community-1 speaker diarization

This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization.

  • stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
  • audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
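The downmixing rule is plain channel averaging. A minimal sketch on a hypothetical two-channel signal, using plain Python lists instead of the audio tensors the pipeline actually operates on:

```python
# hypothetical 2-channel signal, 5 samples per channel
stereo = [
    [0.2, 0.4, 0.0, -0.2, 0.6],  # left channel
    [0.0, 0.4, 0.2, -0.6, 0.2],  # right channel
]

# downmix to mono by averaging the channels sample by sample,
# which is what the pipeline does on load
mono = [sum(samples) / len(stereo) for samples in zip(*stereo)]
```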

The main improvements brought by Community-1 are:

  • improved speaker assignment and counting
  • simpler reconciliation with transcription timestamps with exclusive speaker diarization
  • easy offline use (i.e. without internet connection)
  • (optionally) hosted on pyannoteAI cloud

Setup

  1. pip install pyannote.audio
  2. Accept user conditions
  3. Create access token at hf.co/settings/tokens.

Quick start

# download the pipeline from Hugging Face
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", 
    token="{huggingface-token}")

# run the pipeline locally on your computer
output = pipeline("audio.wav")

# print the predicted speaker diarization
for turn, _, speaker in output.speaker_diarization.itertracks(yield_label=True):
    print(f"{speaker} speaks between t={turn.start:.3f}s and t={turn.end:.3f}s")
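Diarization results are commonly exchanged in the RTTM format (one `SPEAKER` line per turn). The snippet below is an illustrative formatter for `(speaker, start, end)` tuples; `to_rttm` is a hypothetical helper written for this example, not part of pyannote.audio, which ships its own RTTM serialization:

```python
def to_rttm(uri, turns):
    """Format (speaker, start, end) turns as RTTM 'SPEAKER' lines."""
    return "\n".join(
        f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} <NA> <NA> {speaker} <NA> <NA>"
        for speaker, start, end in turns
    )

rttm = to_rttm("audio", [("SPEAKER_00", 0.0, 2.5), ("SPEAKER_01", 2.5, 4.0)])
```

Each line carries the file URI, the turn start time, its duration, and the speaker label, which is enough for most scoring tools.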

Benchmark

Out of the box, Community-1 outperforms speaker-diarization-3.1 on nearly every benchmark below.

We report diarization error rates (in %) on a large collection of academic benchmarks (fully automatic processing, no forgiveness collar, no skipping of overlapping speech).

| Benchmark (last updated 2025-09) | legacy (3.1) | community-1 | precision-2 |
| --- | ---: | ---: | ---: |
| AISHELL-4 | 12.2 | 11.7 | 11.4 |
| AliMeeting (channel 1) | 24.5 | 20.3 | 15.2 |
| AMI (IHM) | 18.8 | 17.0 | 12.9 |
| AMI (SDM) | 22.7 | 19.9 | 15.6 |
| AVA-AVD | 49.7 | 44.6 | 37.1 |
| CALLHOME (part 2) | 28.5 | 26.7 | 16.6 |
| DIHARD 3 (full) | 21.4 | 20.2 | 14.7 |
| Ego4D (dev.) | 51.2 | 46.8 | 39.0 |
| MSDWild | 25.4 | 22.8 | 17.3 |
| RAMC | 22.2 | 20.8 | 10.5 |
| REPERE (phase 2) | 7.9 | 8.9 | 7.4 |
| VoxConverse (v0.3) | 11.2 | 11.2 | 8.5 |
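As a quick sanity check on the table, macro-averaging the reported error rates (numbers copied from the table above) confirms the overall ranking:

```python
# diarization error rates (%) copied from the benchmark table above
ders = {
    "legacy (3.1)": [12.2, 24.5, 18.8, 22.7, 49.7, 28.5, 21.4, 51.2, 25.4, 22.2, 7.9, 11.2],
    "community-1":  [11.7, 20.3, 17.0, 19.9, 44.6, 26.7, 20.2, 46.8, 22.8, 20.8, 8.9, 11.2],
    "precision-2":  [11.4, 15.2, 12.9, 15.6, 37.1, 16.6, 14.7, 39.0, 17.3, 10.5, 7.4, 8.5],
}

# macro-average DER across the 12 benchmarks
macro = {name: sum(values) / len(values) for name, values in ders.items()}
```

Averaged this way, community-1 lowers the macro DER by roughly two points relative to 3.1, and precision-2 by a further five.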

The Precision-2 model is even more accurate and can be tested in two steps:

  1. Create an API key on pyannoteAI dashboard (free credits included)
  2. Change one line of code
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-     'pyannote/speaker-diarization-community-1', token="{huggingface-token}")
+     'pyannote/speaker-diarization-precision-2', token="{pyannoteAI-api-key}")
diarization = pipeline("audio.wav")  # runs on pyannoteAI servers

Processing on GPU

pyannote.audio pipelines run on CPU by default. You can send them to GPU with the following lines:

import torch
pipeline.to(torch.device("cuda"))

Processing from memory

Pre-loading audio files in memory may result in faster processing:

import torchaudio

# load the audio file in memory once, then pass the waveform to the pipeline
waveform, sample_rate = torchaudio.load("audio.wav")
output = pipeline({"waveform": waveform, "sample_rate": sample_rate})

Monitoring progress

Hooks are available to monitor the progress of the pipeline:

from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    output = pipeline("audio.wav", hook=hook)

Controlling the number of speakers

In case the number of speakers is known in advance, one can use the num_speakers option:

output = pipeline("audio.wav", num_speakers=2)

One can also provide lower and/or upper bounds on the number of speakers using min_speakers and max_speakers options:

output = pipeline("audio.wav", min_speakers=2, max_speakers=5)

Exclusive speaker diarization

The Community-1 pretrained pipeline returns a new exclusive speaker diarization on top of the regular one, available as output.exclusive_speaker_diarization.

This feature is backported from our latest commercial model and simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes less precise) transcription timestamps.
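To illustrate the idea only (this is not the algorithm the pipeline uses), an exclusive diarization re-cuts overlapping turns so that at most one speaker is active at any time. A naive sketch that splits each overlap between consecutive turns at its midpoint:

```python
def naive_exclusive(turns):
    """Naive illustration: make (speaker, start, end) turns non-overlapping
    by splitting each overlap between consecutive turns at its midpoint."""
    result = []
    for speaker, start, end in sorted(turns, key=lambda t: t[1]):
        if result and start < result[-1][2]:  # overlaps previous turn
            prev_speaker, prev_start, prev_end = result[-1]
            cut = (start + prev_end) / 2
            result[-1] = (prev_speaker, prev_start, cut)
            start = cut
        result.append((speaker, start, end))
    return result

# both speakers are active between t=4s and t=5s
turns = [("SPEAKER_00", 0.0, 5.0), ("SPEAKER_01", 4.0, 9.0)]
exclusive = naive_exclusive(turns)
```

The exclusive output assigns the overlap half-and-half, so every instant belongs to exactly one speaker, which is what makes alignment with word-level transcription timestamps straightforward.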

Offline use

  1. In the terminal, copy the pipeline on disk:
# make sure git-lfs is installed (https://git-lfs.com)
git lfs install

# create a directory on disk
mkdir /path/to/directory

# when prompted for a password, use an access token with write permissions.
# generate one from your settings: https://huggingface.co/settings/tokens
git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
  2. In Python, use the pipeline without internet connection:
# load pipeline from disk (works without internet connection)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')

# run the pipeline locally on your computer
output = pipeline("audio.wav")

Citations

  1. Speaker segmentation model
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
  2. Speaker embedding model
@inproceedings{Wang2023,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
  3. Speaker clustering
@article{Landini2022,
  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
  year={2022},
  journal={Computer Speech \& Language},
}

Acknowledgment

Training and tuning made possible thanks to GENCI on the Jean Zay supercomputer.
