Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4

AWQ INT4 quantization of FINAL-Bench/Darwin-9B-Opus, optimized for low-VRAM consumer hardware (e.g. an RTX 3060 6 GB).

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).


⚠️ License notice

This model is derived from FINAL-Bench/Darwin-9B-Opus which is licensed under Apache 2.0. This AWQ quantization retains the same Apache 2.0 license — see the LICENSE file in this repository for the full text.


Model summary

| Property | Value |
|---|---|
| Base model | FINAL-Bench/Darwin-9B-Opus |
| Underlying architecture | Qwen3_5ForConditionalGeneration (VLM — text + vision encoder) |
| Model type | qwen3_5 |
| Original precision | BF16 safetensors (~18 GB) |
| Quantized precision | AWQ INT4 (group_size=128, GEMM, zero_point=True) |
| Text vocab size | 248 320 |
| Context length | 131 072 tokens |
| Hidden size (text) | 4 096 |
| Layers (text) | 32 (hybrid: 24 GDN/linear_attention + 8 full_attention, interval=4) |
| Languages | 201 (native Qwen3.5-9B multilingual) |
| Disk footprint | ~4.7 GB |
| Inference VRAM | ~5.2 GB (text-only path, no vision input) |
| Quantization library | AutoAWQ 0.2.9 |
| Calibration set | 128 diverse prompts (code/reasoning/chat/research), max_seq_len=512 |
| RNG seed | 1729 (NOESIS reproducibility lock) |

Architecture note: Darwin-9B-Opus is a merge model built with the Darwin V5 methodology (MRI-guided per-tensor diagnostics 70% + evolutionary genome optimization 30%, implemented via direct PyTorch DARE-TIES).

  • Father: Qwen/Qwen3.5-9B (original pre-training + RLHF)
  • Mother: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled (LoRA SFT on Claude 4.6 Opus reasoning chains)

The underlying Qwen3.5 backbone is the VLM branch (Qwen3_5ForConditionalGeneration) with a vision encoder and a hybrid text decoder (qwen3_5_text sub-config): 24 GatedDeltaNet/linear_attention layers + 8 full self-attention layers (full_attn at every 4th layer). AWQ quantization targets the text decoder weights only: GDN layers have their MLP quantized; full_attention layers get self_attn + MLP quantized.
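The layer-type-dependent quantization targeting described above can be sketched as follows. This is a minimal illustration of the published layout (32 layers, full attention at every 4th layer); the module names are conventional Transformer projection names, not the actual Qwen3.5 module paths.

```python
def quantized_modules(layer_idx: int, interval: int = 4) -> list[str]:
    """Return the submodules AWQ quantizes in one text-decoder layer.

    Every `interval`-th layer is full self-attention (self_attn + MLP
    quantized); the remaining GDN/linear-attention layers only have
    their MLP quantized. Module names here are illustrative.
    """
    is_full_attention = (layer_idx + 1) % interval == 0
    modules = ["mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
    if is_full_attention:
        modules = ["self_attn.q_proj", "self_attn.k_proj",
                   "self_attn.v_proj", "self_attn.o_proj"] + modules
    return modules

# Sanity check against the published layout: 24 GDN + 8 full-attention.
layout = ["full" if (i + 1) % 4 == 0 else "gdn" for i in range(32)]
assert layout.count("full") == 8 and layout.count("gdn") == 24
```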


Why this quantization

The original Darwin-9B-Opus weights are BF16 (~18 GB), which does not fit on a 6 GB consumer GPU. This AWQ build:

  1. Fits inside ~5.2 GB of VRAM for text-only inference on an RTX 3060 6 GB
  2. Uses the GEMM kernel — compatible with device_map={"": 0} (no CPU offload)
  3. Is provenance-tracked (noesis_provenance.json ships alongside the model)
  4. Was calibrated on diverse multilingual prompts matching Darwin's broad training domain
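As a back-of-envelope check, the ~4.7 GB disk footprint follows from roughly 9B parameters at 4 bits each, plus per-group scale and zero-point overhead at group_size=128. The parameter count and the 16-bit-scale/4-bit-zero-point storage assumed below are approximations, not exact details of this checkpoint:

```python
params = 9e9          # approximate quantized parameter count
bits = 4              # AWQ INT4
group_size = 128
# assume each group of 128 weights stores one fp16 scale + one 4-bit zero-point
overhead_bits_per_weight = (16 + 4) / group_size

total_bytes = params * (bits + overhead_bits_per_weight) / 8
print(f"~{total_bytes / 1e9:.1f} GB")  # → ~4.7 GB
```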

Quantization methodology

This checkpoint was produced using a proprietary quantization pipeline developed by AMAImedia as part of the NOESIS DHCF-FNO framework (v14.7). The Qwen3.5 hybrid architecture used in Darwin-9B-Opus is not supported by upstream AutoAWQ; quantization required original engineering work developed internally at AMAImedia.
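While the NOESIS pipeline itself is not public, the hyperparameters published in the model summary correspond to the following standard AutoAWQ-style configuration. This only restates the table above in config form; it is not the proprietary pipeline:

```python
# Quantization settings as published in the model summary,
# in AutoAWQ's quant_config format.
quant_config = {
    "zero_point": True,    # asymmetric quantization
    "q_group_size": 128,   # per-group scaling
    "w_bit": 4,            # INT4 weights
    "version": "GEMM",     # GEMM kernel, full-GPU placement
}

# Calibration settings from the summary table.
calib_settings = {"n_samples": 128, "max_seq_len": 512, "seed": 1729}
```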


How to use

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "amaimedia/Qwen3.5-9B-Darwin-Opus-NOESIS-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the INT4 weights entirely on GPU 0 (no CPU offload);
# keep fuse_layers=False for the hybrid GDN/full-attention decoder.
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map={"": 0},
    torch_dtype=torch.float16,
    fuse_layers=False,
)

prompt = "Explain the difference between REST and GraphQL with code examples."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding; adjust max_new_tokens as needed.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note: Vision inputs are not supported through AutoAWQ's text-only path. For multimodal use, load the BF16 base model with trust_remote_code=True.


NOESIS context

In NOESIS this model serves as a multilingual broad-domain teacher for Specialists M4-CHAT, M5-CODE, and M6-RESEARCH during knowledge distillation. It is loaded sequentially (per the NOESIS swapping protocol) onto the RTX 3060, producing top-K=512 logits at temperature=4.0.

⚠️ KD pipeline note: Darwin-9B-Opus has vocab_size=248 320 (Qwen3.5 extended vocab), while NOESIS student models use Qwen3-8B native vocab 151 936. Logit extraction requires vocab head truncation to index 151 936 via purify_logits() before ensemble aggregation in build_ensemble_labels.py. Proposed KD weight: w=0.30.
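The truncation-plus-softening step can be sketched as below. The real purify_logits() in build_ensemble_labels.py is not public, so this is only a minimal illustration of the published parameters (truncate to 151 936, temperature 4.0, top-K 512); the function bodies are assumptions:

```python
import numpy as np

TEACHER_VOCAB = 248320   # Qwen3.5 extended vocab (this model)
STUDENT_VOCAB = 151936   # Qwen3-8B native vocab (NOESIS students)
TEMPERATURE = 4.0
TOP_K = 512

def purify_logits(logits: np.ndarray) -> np.ndarray:
    """Truncate teacher logits to the student vocab (sketch only)."""
    return logits[..., :STUDENT_VOCAB]

def topk_soft_labels(logits: np.ndarray):
    """Temperature-soften truncated logits, keep the top-K entries."""
    z = purify_logits(logits) / TEMPERATURE
    exp_z = np.exp(z - z.max())
    probs = exp_z / exp_z.sum()
    idx = np.argsort(probs)[-TOP_K:][::-1]   # descending by probability
    return idx, probs[idx]

# Smoke test on random logits of the teacher's vocab size.
rng = np.random.default_rng(1729)
idx, p = topk_soft_labels(rng.normal(size=TEACHER_VOCAB))
assert len(idx) == TOP_K and idx.max() < STUDENT_VOCAB
```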

NOESIS specialists overview:

| ID | Role | Size |
|---|---|---|
| M1 | ASR (150+ langs) | 10B/3B |
| M2 | Dubbing LM (30 langs full) | 10B/3B |
| M3 | TTS + voice cloning | 10B/3B |
| M4 | Chat + creative writing | 10B/3B |
| M5 | Code + math | 10B/3B |
| M6 | Deep research (1M ctx) | 10B/3B |
| M7 | Prompt engineering | 4B/0.8B |
| M8 | Quality control (PRM) | 4B/0.8B |
| M9 | Orchestrator + routing | 4B/0.8B |

Acknowledgements & citation

Base model: Darwin-9B-Opus by FINAL-Bench (derived from Qwen3.5-9B + Claude Opus distillation).

@misc{darwin9b_opus,
  title     = {Darwin-9B-Opus},
  author    = {FINAL-Bench},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}
}

Quantization & NOESIS integration:

@misc{noesis_v14,
  title     = {NOESIS v14.7: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}