AMAImedia/Qwen3-1.7B-TTS-Cross-Darwin-NOESIS-AWQ-INT4

AWQ INT4 quantization of FINAL-Bench/Darwin-TTS-1.7B-Cross, optimized for low-VRAM consumer hardware (RTX 3060 6 GB).

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).


⚠️ License notice

This model is derived from FINAL-Bench/Darwin-TTS-1.7B-Cross, which is licensed under Apache 2.0. This AWQ quantization retains the same Apache 2.0 license; see the LICENSE file in this repository for the full text.


Model summary

| Property | Value |
| --- | --- |
| Base model | FINAL-Bench/Darwin-TTS-1.7B-Cross |
| Architecture | Qwen3TTSForConditionalGeneration |
| Model type | qwen3_tts |
| TTS model size | ~2.1B total (talker 28L + code_predictor + codec + encoder/decoder) |
| TTS model type | base with 3% FFN blend from Qwen3-1.7B LLM |
| Tokenizer | 12 Hz RVQ codec tokenization |
| Speaker encoder | x-vector based (x_vector_only_mode=True), sample_rate=24 000 Hz |
| Original precision | BF16 safetensors (~3.4 GB) |
| Quantized precision | AWQ INT4 (group_size=128, GEMM, zero_point=True), talker weights only |
| Languages | 10 (ko, en, ja, zh + 6 more) |
| Disk footprint | ~0.9 GB |
| Inference VRAM | ~1.2 GB (fits alongside other specialists) |
| Quantization library | AutoAWQ 0.2.9 |
| Calibration set | 128 TTS prompts (short natural speech samples), max_seq_len=512 |
| RNG seed | 1729 (NOESIS reproducibility lock) |

Architecture note: Darwin-TTS-1.7B-Cross is a merge model — not a fine-tuned model. It blends 3% of FFN weights from Qwen3-1.7B (LLM) into the talker backbone of Qwen3-TTS-12Hz-1.7B-Base (84 / 976 tensors: gate/up/down_proj × 28 talker layers). This weight-space arithmetic (pure lerp, no training) adds emotional expressiveness while preserving TTS stop-signal stability.
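The weight-space arithmetic above can be sketched in a few lines. This is an illustrative stand-in, not the actual merge script: the tensor names and shapes are assumptions, and random tensors replace the real checkpoint weights.

```python
import torch

# Hypothetical sketch of the 3% FFN lerp described above: blend an LLM
# projection weight into the matching TTS talker projection weight.
# Shapes below are stand-ins for a Qwen3-1.7B-style FFN, not checkpoint keys.
alpha = 0.03  # blend ratio; at alpha >= 0.10 the LLM reportedly overrides the TTS stop signal

torch.manual_seed(1729)
tts_gate_proj = torch.randn(6144, 2048)  # talker layer gate_proj (stand-in)
llm_gate_proj = torch.randn(6144, 2048)  # Qwen3-1.7B LLM gate_proj (stand-in)

# Pure linear interpolation in weight space -- no training involved.
merged = (1.0 - alpha) * tts_gate_proj + alpha * llm_gate_proj
```

Repeating this for gate/up/down_proj across all 28 talker layers yields the 84 blended tensors mentioned above; every other tensor is copied from the TTS base unchanged.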

4-module structure:

  • talker — 28-layer Qwen3 LLM backbone (AWQ-quantized)
  • code_predictor — 5-layer token head, hidden_size=1024 (BF16, untouched)
  • speech_tokenizer — 12 Hz RVQ codec (BF16, untouched)
  • encoder/decoder — audio waveform pipeline (BF16, untouched)

The "Cross" refers to cross-lingual voice cloning using x-vector speaker embedding (x_vector_only_mode=True). At α≥10% the LLM's generation pattern overrides the TTS stop signal — hence the conservative 3% blend ratio.

⚠️ AWQ compatibility: AutoAWQ quantizes the talker (language model head) component. Non-standard modules (code_predictor, speaker_encoder) are kept in BF16 within the checkpoint. Verify output with validate_awq_quality.py before using as a KD teacher.
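validate_awq_quality.py is not included in this card, but a typical check of this kind compares BF16 and AWQ logits on the same inputs. The sketch below is an assumption about what such a script might measure (KL divergence), with random tensors standing in for real model outputs:

```python
import torch
import torch.nn.functional as F

# Hedged sketch: compare reference (BF16) talker logits against the AWQ
# model's logits on identical inputs via KL divergence. Random tensors
# stand in for actual forward-pass outputs; the threshold is illustrative.
torch.manual_seed(1729)
bf16_logits = torch.randn(4, 151936)                       # reference logits
awq_logits = bf16_logits + 0.05 * torch.randn(4, 151936)   # quantized output

kl = F.kl_div(
    F.log_softmax(awq_logits, dim=-1),
    F.log_softmax(bf16_logits, dim=-1),
    log_target=True,
    reduction="batchmean",
)
print(f"mean KL(bf16 || awq) = {kl.item():.6f}")
```

A small mean KL on a held-out prompt set suggests the quantized logits are safe to use as KD soft targets; a large value means the INT4 talker has drifted too far from the BF16 teacher.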


Why this quantization

The original Darwin-TTS-1.7B-Cross weights are in BF16 (~3.4 GB). While this fits in 6 GB of VRAM in principle, AWQ INT4 cuts the inference footprint to ~1.2 GB, enabling parallel loading alongside other NOESIS specialists in the sequential swapping pipeline.

This AWQ build:

  1. Reduces VRAM from ~3.4 GB to ~1.2 GB — freeing headroom for KV cache
  2. Uses GEMM kernel — compatible with device_map={"":0} (no CPU offload)
  3. Provenance-tracked (noesis_provenance.json ships alongside the model)
  4. Calibrated on natural TTS prompts matching cross-lingual synthesis distribution
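A back-of-envelope check of the figures above, assuming ~1.7B talker parameters (the real footprints also include AWQ scales/zero-points, the untouched BF16 modules, and runtime buffers):

```python
# Rough size estimate from parameter count and bits per weight.
talker_params = 1.7e9

bf16_gb = talker_params * 2 / 1e9    # 2 bytes/param  -> ~3.4 GB
int4_gb = talker_params * 0.5 / 1e9  # 4 bits/param   -> ~0.85 GB

print(f"BF16 ~{bf16_gb:.1f} GB, AWQ INT4 ~{int4_gb:.2f} GB")
```

The gap between ~0.85 GB of INT4 weights and the ~1.2 GB inference figure is accounted for by group-wise scales/zero-points plus the BF16 code_predictor, codec, and encoder/decoder modules.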

Quantization methodology

This checkpoint was produced using a proprietary quantization pipeline developed by AMAImedia as part of the NOESIS DHCF-FNO framework (v14.7). The Qwen3TTSForConditionalGeneration architecture is not present in the transformers main branch and has no upstream AutoAWQ support; quantization required original engineering work developed internally at AMAImedia.


How to use

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "amaimedia/Qwen3-1.7B-TTS-Cross-Darwin-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map={"": 0},
    torch_dtype=torch.float16,
    fuse_layers=False,
    trust_remote_code=True,
)

# TTS inference requires additional processing — see base model card for full pipeline
prompt = "Hello, how are you today?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
```

Note: Full TTS synthesis (audio waveform output) requires the complete NOESIS TTS pipeline with NanoCodec vocoder. The AWQ checkpoint above is suitable for logit extraction (KD) and text-side token generation.
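For the KD use case, extracted teacher logits are typically converted into temperature-softened targets. The snippet below is illustrative only; the temperature value and shapes are assumptions, not NOESIS settings, and a random tensor stands in for real extracted logits:

```python
import torch
import torch.nn.functional as F

# Turn teacher logits into soft KD targets via temperature-scaled softmax.
T = 2.0  # assumed distillation temperature, not a NOESIS-published value
torch.manual_seed(1729)
teacher_logits = torch.randn(1, 8, 151936)  # (batch, seq, Qwen3 vocab)
soft_targets = F.softmax(teacher_logits / T, dim=-1)
```

A higher T flattens the distribution, exposing more of the teacher's ranking over near-miss tokens to the student.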


NOESIS context

In NOESIS this model serves as the TTS stream-A teacher for Specialist M3-TTS during knowledge distillation. It provides cross-lingual text-side logits (vocab space) used in build_ensemble_labels.py with proposed weight w=0.20 (secondary, combined with codec-side teachers via NanoCodec).

NOESIS TTS knowledge distillation uses two parallel streams:

| Stream | Vocab | Teachers |
| --- | --- | --- |
| Stream A (text-side) | 151 936 (Qwen3) | Darwin-TTS-1.7B-Cross + others |
| Stream B (codec-side) | 65 536 (NanoCodec) | nanocodec_distill.py |
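build_ensemble_labels.py is not public, so the exact blending rule is unknown; one plausible reading of the proposed w=0.20 is a probability-space weighted average across teachers, sketched here with random tensors standing in for real teacher outputs:

```python
import torch
import torch.nn.functional as F

# Hypothetical ensemble-label blend at w=0.20 for this (secondary) teacher.
w = 0.20
torch.manual_seed(1729)
primary = F.softmax(torch.randn(1, 8, 151936), dim=-1)  # primary teacher(s)
darwin = F.softmax(torch.randn(1, 8, 151936), dim=-1)   # this model's stream-A output
ensemble = (1.0 - w) * primary + w * darwin
```

Blending in probability space (rather than logit space) keeps the result a valid distribution without renormalization; whether NOESIS does this is an assumption.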

NOESIS specialists overview:

| ID | Role | Size |
| --- | --- | --- |
| M1 | ASR (150+ langs) | 10B/3B |
| M2 | Dubbing LM (30 langs full) | 10B/3B |
| M3 | TTS + voice cloning | 10B/3B |
| M4 | Chat + creative writing | 10B/3B |
| M5 | Code + math | 10B/3B |
| M6 | Deep research (1M ctx) | 10B/3B |
| M7 | Prompt engineering | 4B/0.8B |
| M8 | Quality control (PRM) | 4B/0.8B |
| M9 | Orchestrator + routing | 4B/0.8B |

Acknowledgements & citation

Base model: Darwin-TTS-1.7B-Cross by FINAL-Bench (Qwen3 TTS architecture, cross-lingual synthesis).

```bibtex
@misc{darwin_tts_cross,
  title     = {Darwin-TTS-1.7B-Cross},
  author    = {FINAL-Bench},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/FINAL-Bench/Darwin-TTS-1.7B-Cross}
}
```

Quantization & NOESIS integration:

```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```