leonsarmiento/Qwen3.6-27B-3bit-mlx
This model leonsarmiento/Qwen3.6-27B-3bit-mlx was converted to MLX format from Qwen/Qwen3.6-27B using mlx-lm version 0.31.2.
Quantization Details
The model uses mixed quantization:
- Embedding layers: 5-bit with group_size=64
- Prediction layers: 5-bit with group_size=64
- All other layers: 3-bit with group_size=64
This mixed precision approach provides a balance between compression and quality.
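For reference, a conversion with this layout could be reproduced roughly as follows using mlx-lm's Python API. This is a sketch under assumptions: the quant_predicate hook and its per-layer override format may differ between mlx-lm versions, and the layer-name checks (embed_tokens, lm_head) are illustrative rather than taken from this repo's actual conversion script.

from mlx_lm import convert

def quant_predicate(path, module, config):
    # Assumed hook: return per-layer quantization overrides.
    # Keep embedding and prediction (output) layers at 5-bit and
    # quantize everything else at 3-bit, all with group_size=64.
    if "embed_tokens" in path or "lm_head" in path:
        return {"bits": 5, "group_size": 64}
    return {"bits": 3, "group_size": 64}

convert(
    hf_path="Qwen/Qwen3.6-27B",
    mlx_path="Qwen3.6-27B-3bit-mlx",
    quantize=True,
    q_bits=3,
    q_group_size=64,
    quant_predicate=quant_predicate,
)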
Use with mlx
pip install mlx-lm
Recommended Inference Parameters (use the Jinja template in LM Studio)
Thinking preserved ({%- set preserve_thinking = true %}):
- General tasks:
  temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
- Coding tasks:
  temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Instruct Mode ({%- set enable_thinking = false -%}):
- General tasks:
  temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
- Reasoning tasks:
  temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
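Outside LM Studio, the same switch can usually be made when the prompt is built in Python rather than by editing the template. This is a sketch assuming the bundled chat template reads an enable_thinking flag, as other Qwen3-family templates do; if the flag is ignored, edit the Jinja template directly as described above.

from mlx_lm import load, generate

model, tokenizer = load("leonsarmiento/Qwen3.6-27B-3bit-mlx")

messages = [{"role": "user", "content": "hello"}]
# Assumption: the template accepts enable_thinking like other Qwen3 releases.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # instruct mode; set True to keep the thinking block
)

response = generate(model, tokenizer, prompt=prompt, verbose=True)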
Example Usage
from mlx_lm import load, generate

model, tokenizer = load("leonsarmiento/Qwen3.6-27B-3bit-mlx")

prompt = "hello"

# Apply the chat template if the tokenizer ships one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
Example with Custom Parameters
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("leonsarmiento/Qwen3.6-27B-3bit-mlx")

prompt = "hello"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=False)

# In recent mlx-lm releases, sampling settings are passed through a sampler
# and logits processors instead of keyword arguments on generate().
sampler = make_sampler(temp=0.6, top_p=0.95, top_k=20, min_p=0.0)
logits_processors = make_logits_processors(repetition_penalty=1.0)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    logits_processors=logits_processors,
)