# Abiray/Huihui-Qwen3.6-35B-A3B-abliterated-GGUF
This repository provides GGUF format quantizations for the huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated model.
Because this model has been fully "abliterated" to bypass alignment and safety refusals, it acts as a highly capable engine for unrestricted creative writing, dynamic storytelling, and immersive roleplay scenarios.
## Available Quantizations

| File | Bit Size | Description |
|---|---|---|
| huihui-35B-Q8_0.gguf | 8-bit | Highest quality quant, virtually indistinguishable from F16. |
| huihui-35B-Q6_K.gguf | 6-bit | Excellent quality with a noticeably reduced memory footprint. |
| huihui-35B-Q5_K_M.gguf | 5-bit | Good balance between reasoning performance and RAM usage. |
| huihui-35B-Q4_K_M.gguf | 4-bit | **Recommended.** The sweet spot between speed and quality. |
| huihui-35B-Q4_K_S.gguf | 4-bit | Slightly smaller than K_M, allowing faster inference on constrained setups. |
| huihui-35B-Q3_K_M.gguf | 3-bit | Lowest resource requirement, though perplexity loss becomes more noticeable. |
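As a rough rule of thumb, a quant's file size scales with the bit width in the table above. The sketch below estimates this from the nominal bits per weight; the 35B parameter count is taken from the model name, and real GGUF files run somewhat larger because K-quants keep some tensors at higher precision:

```python
# Rough size estimate: parameters * bits-per-weight / 8 bytes.
# Actual GGUF files are larger (mixed-precision tensors, metadata).
PARAMS = 35e9  # assumed from the "35B" in the model name

def approx_gib(bits_per_weight: float) -> float:
    """Approximate file size in GiB for a given nominal bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for name, bits in [("Q8_0", 8), ("Q6_K", 6), ("Q5_K_M", 5),
                   ("Q4_K_M", 4), ("Q3_K_M", 3)]:
    print(f"{name}: ~{approx_gib(bits):.1f} GiB minimum")
```

Comparing these lower bounds against your free RAM (or VRAM, if offloading) is a quick way to pick a quant before downloading.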
## Quick Start (llama.cpp)
These models are designed to be run directly via llama.cpp. The following commands are standard for local Linux environments (such as Linux Mint or Ubuntu).
1. Clone and compile via CMake:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
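Once built, a quant can be loaded with the `llama-cli` binary. The command below is a sketch: the model filename matches the Q4_K_M entry in the table above, and the context size and GPU-offload value are assumptions to adjust for your hardware:

```bash
# Sketch: run the recommended Q4_K_M quant interactively.
# -ngl 99 offloads all layers to the GPU; omit it for CPU-only inference.
./build/bin/llama-cli \
  -m huihui-35B-Q4_K_M.gguf \
  -p "Write the opening scene of a heist story." \
  -n 512 -c 8192 -ngl 99
```

If generation stalls or the process is killed, drop to a smaller quant from the table or reduce `-c` to lower memory pressure.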