# AMAImedia/Qwen3-1.7B-TTS-Cross-Darwin-NOESIS-AWQ-INT4
AWQ INT4 quantization of FINAL-Bench/Darwin-TTS-1.7B-Cross optimized for low-VRAM consumer hardware (RTX 3060 6 GB).
Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).
- Founder: Ilia Bolotnikov
- Organization: AMAImedia.com
- X (Twitter): @AMAImediacom
- LinkedIn: Ilia Bolotnikov
- Telegram: @djbionicl
- NOESIS version: v14.7
- Release date: 2026-04
## ⚠️ License notice
This model is derived from FINAL-Bench/Darwin-TTS-1.7B-Cross, which is licensed
under Apache 2.0. This AWQ quantization retains the same Apache 2.0 license;
see the LICENSE file in this repository for the full text.
## Model summary
| Property | Value |
|---|---|
| Base model | FINAL-Bench/Darwin-TTS-1.7B-Cross |
| Architecture | Qwen3TTSForConditionalGeneration |
| Model type | qwen3_tts |
| TTS model size | ~2.1B total (talker 28L + code_predictor + codec + encoder/decoder) |
| TTS model type | base with 3% FFN blend from Qwen3-1.7B LLM |
| Tokenizer | 12 Hz RVQ codec tokenization |
| Speaker encoder | x-vector based (`x_vector_only_mode=True`), sample rate 24,000 Hz |
| Original precision | BF16 safetensors (~3.4 GB) |
| Quantized precision | AWQ INT4 (group_size=128, GEMM, zero_point=True) — talker weights only |
| Languages | 10 (ko, en, ja, zh + 6 more) |
| Disk footprint | ~0.9 GB |
| Inference VRAM | ~1.2 GB (fits alongside other specialists) |
| Quantization library | AutoAWQ 0.2.9 |
| Calibration set | 128 TTS prompts (short natural speech samples), max_seq_len=512 |
| RNG seed | 1729 (NOESIS reproducibility lock) |
Architecture note: Darwin-TTS-1.7B-Cross is a merge model — not a fine-tuned model. It blends 3% of FFN weights from Qwen3-1.7B (LLM) into the talker backbone of Qwen3-TTS-12Hz-1.7B-Base (84 / 976 tensors: gate/up/down_proj × 28 talker layers). This weight-space arithmetic (pure lerp, no training) adds emotional expressiveness while preserving TTS stop-signal stability.
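The merge step described above is plain weight-space linear interpolation. A minimal sketch on dummy tensors (the real merge applies this to the 84 gate/up/down_proj tensors across the 28 talker layers):

```python
import torch

ALPHA = 0.03  # 3% LLM contribution, matching the blend ratio above

def lerp_ffn(tts_weight: torch.Tensor, llm_weight: torch.Tensor,
             alpha: float = ALPHA) -> torch.Tensor:
    """Pure lerp in weight space: (1 - alpha) * TTS + alpha * LLM, no training."""
    return (1.0 - alpha) * tts_weight + alpha * llm_weight

# Dummy FFN projection weights standing in for a gate/up/down_proj tensor
merged = lerp_ffn(torch.zeros(4, 4), torch.ones(4, 4))
assert torch.allclose(merged, torch.full((4, 4), 0.03))
```

Because the interpolation is linear and training-free, the result is fully determined by the two parent checkpoints and the single ratio α.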
4-module structure:
- talker — 28-layer Qwen3 LLM backbone (AWQ-quantized)
- code_predictor — 5-layer token head, hidden_size=1024 (BF16, untouched)
- speech_tokenizer — 12 Hz RVQ codec (BF16, untouched)
- encoder/decoder — audio waveform pipeline (BF16, untouched)
The "Cross" refers to cross-lingual voice cloning using x-vector speaker embedding (`x_vector_only_mode=True`). At blend ratios of α ≥ 10% the LLM's generation pattern overrides the TTS stop signal, hence the conservative 3% blend ratio.

⚠️ AWQ compatibility: AutoAWQ quantizes the talker (language model head) component. Non-standard modules (`code_predictor`, `speaker_encoder`) are kept in BF16 within the checkpoint. Verify output with `validate_awq_quality.py` before using this checkpoint as a KD teacher.
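`validate_awq_quality.py` is not reproduced here; as an illustration of the kind of check such a script could perform, here is a KL-divergence comparison between reference (BF16) and quantized logits on dummy data. The 0.05 threshold is an assumption for the sketch, not the script's actual criterion:

```python
import torch
import torch.nn.functional as F

def logit_kl(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean KL(ref || quant) per position, computed in log space for stability."""
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logp = F.log_softmax(quant_logits.float(), dim=-1)
    return F.kl_div(quant_logp, ref_logp, log_target=True,
                    reduction="batchmean").item()

torch.manual_seed(1729)  # the NOESIS reproducibility seed
ref = torch.randn(8, 151_936)               # dummy reference logits (Qwen3 vocab)
quant = ref + 0.01 * torch.randn_like(ref)  # dummy AWQ logits with small drift
assert logit_kl(ref, ref) < 1e-6    # identical distributions -> KL ~ 0
assert logit_kl(ref, quant) < 0.05  # small drift stays under the (assumed) threshold
```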
## Why this quantization
The original Darwin-TTS-1.7B-Cross weights are in BF16 (~3.4 GB). While this fits VRAM in principle, AWQ INT4 reduces VRAM to ~1.2 GB, enabling parallel loading alongside other NOESIS specialists in the sequential swapping pipeline.
This AWQ build:
- Reduces VRAM from ~3.4 GB to ~1.2 GB, freeing headroom for the KV cache
- Uses the GEMM kernel, compatible with `device_map={"": 0}` (no CPU offload)
- Provenance-tracked: `noesis_provenance.json` ships alongside the model
- Calibrated on natural TTS prompts matching the cross-lingual synthesis distribution
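The headline figures follow from simple byte arithmetic (parameter count approximate, per the summary table above):

```python
# Back-of-the-envelope footprint, assuming ~1.7B talker parameters (the quantized part)
TALKER_PARAMS = 1.7e9

bf16_gb = TALKER_PARAMS * 2 / 1e9    # BF16: 2 bytes per parameter -> ~3.4 GB
int4_gb = TALKER_PARAMS * 0.5 / 1e9  # INT4: 0.5 bytes per parameter -> ~0.85 GB

print(f"BF16 talker ~{bf16_gb:.2f} GB, raw INT4 talker ~{int4_gb:.2f} GB")
```

The gap between the raw INT4 number and the ~0.9 GB / ~1.2 GB figures is plausibly accounted for by per-group scale/zero tensors, the BF16 side modules, and activation memory.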
## Quantization methodology
This checkpoint was produced with a proprietary quantization pipeline developed
by AMAImedia as part of the NOESIS DHCF-FNO framework (v14.7). The
`Qwen3TTSForConditionalGeneration` architecture is not present in the transformers
main branch and has no upstream AutoAWQ support; quantization therefore required
custom in-house engineering at AMAImedia.
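While the pipeline itself is proprietary, the quantization settings listed in the summary table map onto the standard AutoAWQ 0.2.x config shape. This dict is shown for reference only and does not by itself reproduce the checkpoint:

```python
# AWQ settings from the model summary table, in AutoAWQ's quant_config convention
quant_config = {
    "w_bit": 4,           # INT4 weights
    "q_group_size": 128,  # group_size=128
    "zero_point": True,   # asymmetric quantization with zero points
    "version": "GEMM",    # GEMM kernel
}
```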
## How to use
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "amaimedia/Qwen3-1.7B-TTS-Cross-Darwin-NOESIS-AWQ-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map={"": 0},
    torch_dtype=torch.float16,
    fuse_layers=False,
    trust_remote_code=True,
)

# TTS inference requires additional processing — see base model card for full pipeline
prompt = "Hello, how are you today?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
```
Note: Full TTS synthesis (audio waveform output) requires the complete NOESIS TTS pipeline with NanoCodec vocoder. The AWQ checkpoint above is suitable for logit extraction (KD) and text-side token generation.
## NOESIS context
In NOESIS this model serves as the TTS stream-A teacher for Specialist
M3-TTS during knowledge distillation. It provides cross-lingual text-side
logits (vocab space) used in build_ensemble_labels.py with proposed weight
w=0.20 (secondary, combined with codec-side teachers via NanoCodec).
NOESIS TTS knowledge distillation uses two parallel streams:
| Stream | Vocab | Teachers |
|---|---|---|
| Stream A (text-side) | 151,936 (Qwen3) | Darwin-TTS-1.7B-Cross + others |
| Stream B (codec-side) | 65,536 (NanoCodec) | `nanocodec_distill.py` |
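The internals of `build_ensemble_labels.py` are not public; the shape of a stream-A soft-label blend with this teacher at `w=0.20` can be sketched as follows (function and argument names are illustrative, and a single "primary" teacher stands in for the other stream-A teachers):

```python
import torch
import torch.nn.functional as F

def blend_teacher_labels(primary_logits: torch.Tensor,
                         darwin_logits: torch.Tensor,
                         w_darwin: float = 0.20) -> torch.Tensor:
    """Weighted mix of teacher distributions over the 151,936-token Qwen3 vocab."""
    p_primary = F.softmax(primary_logits, dim=-1)
    p_darwin = F.softmax(darwin_logits, dim=-1)
    return (1.0 - w_darwin) * p_primary + w_darwin * p_darwin

# Dummy logits from two stream-A teachers
labels = blend_teacher_labels(torch.randn(4, 151_936), torch.randn(4, 151_936))
assert torch.allclose(labels.sum(dim=-1), torch.ones(4), atol=1e-4)  # still valid distributions
```

Blending in probability space (rather than logit space) keeps the result a proper distribution regardless of how the teachers' logit scales differ.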
NOESIS specialists overview:
| ID | Role | Size |
|---|---|---|
| M1 | ASR (150+ langs) | 10B/3B |
| M2 | Dubbing LM (30 langs full) | 10B/3B |
| M3 | TTS + voice cloning | 10B/3B |
| M4 | Chat + creative writing | 10B/3B |
| M5 | Code + math | 10B/3B |
| M6 | Deep research (1M ctx) | 10B/3B |
| M7 | Prompt engineering | 4B/0.8B |
| M8 | Quality control (PRM) | 4B/0.8B |
| M9 | Orchestrator + routing | 4B/0.8B |
## Acknowledgements & citation
Base model: Darwin-TTS-1.7B-Cross by FINAL-Bench (Qwen3 TTS architecture, cross-lingual synthesis).
```bibtex
@misc{darwin_tts_cross,
  title     = {Darwin-TTS-1.7B-Cross},
  author    = {FINAL-Bench},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/FINAL-Bench/Darwin-TTS-1.7B-Cross}
}
```
Quantization & NOESIS integration:
```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```