Fixed Chat Templates for Qwen 3.5 & 3.6

Drop-in Jinja templates that fix rendering errors, token waste, and missing features in the official Qwen chat templates. Works in LM Studio, llama.cpp, vLLM, MLX, oMLX, and any engine that supports HuggingFace Jinja templates.

Why you need this

The official Qwen templates have bugs that break real usage:

Problem	Impact
Tool calls fail on C++ engines	`\|items` is not supported there; the template raises a rendering error instead of producing output
`developer` role rejected	Modern APIs send it; the official template raises an error
Empty thinking blocks spam context	Every past turn gets wrapped in tags, even with nothing inside
No way to toggle thinking	You're stuck with whatever the model defaults to
Qwen 3.6: `</thinking>` hallucination	Model sometimes generates the wrong closing tag; parser fails
No-user-query exception breaks tool calling	`raise_exception` crashes agentic loops and resets in OpenClaw and similar runtimes

All six are fixed here, plus a clean <|think_on|> / <|think_off|> toggle you can drop into any message.

Quick install

LM Studio

1. Open your Qwen model in the right-side panel
2. Scroll to Prompt Template
3. Replace the template with the contents of qwen3.5/chat_template.jinja or qwen3.6/chat_template.jinja
4. Save

llama.cpp / koboldcpp

--jinja --chat-template-file qwen3.6/chat_template.jinja

vLLM / TextGen

Replace the chat_template string in your tokenizer_config.json with the file contents.
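If you would rather script the replacement than edit the JSON by hand, it can be sketched in Python. This is a minimal sketch, assuming your model directory holds a standard tokenizer_config.json with a top-level chat_template key; the install_chat_template helper and the paths in the usage line are hypothetical.

```python
import json
from pathlib import Path


def install_chat_template(config_path, template_path):
    """Overwrite the chat_template entry of a tokenizer_config.json.

    Hypothetical helper: assumes chat_template is a top-level string key.
    """
    config_path = Path(config_path)
    template = Path(template_path).read_text(encoding="utf-8")

    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["chat_template"] = template  # every other key is left untouched

    # ensure_ascii=False keeps non-ASCII tokens readable in the written file
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

Usage would look like install_chat_template("path/to/model/tokenizer_config.json", "qwen3.6/chat_template.jinja"). Keep a backup of the original file, since the sketch overwrites it in place.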

oMLX

Overwrite chat_template.jinja in your local model directory. Load with --jinja. Remove any chat_template_kwargs overrides — the template handles everything internally.

Which file do I use?

File	For models
`qwen3.5/chat_template.jinja`	Qwen3.5-35B-A3B, Qwen3.5-32B, Qwen3.5-14B, and all Qwen 3.5 variants
`qwen3.6/chat_template.jinja`	Qwen3.6-27B, Qwen3.6-35B-A3B, and all Qwen 3.6 variants

The 3.6 template is a superset — it additionally handles preserve_thinking, </thinking> hallucination recovery, and interrupted thought streams. If you're on 3.6, use the 3.6 file.

Thinking toggle

Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.

Fast answer, no reasoning:

System: You are a coding assistant. <|think_off|>
User: What's 2+2?

Deep reasoning:

System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.

The tag syntax (<|think_on|>, <|think_off|>) uses Qwen's control-token delimiters, so it will never collide with real text. Earlier community templates used /think, which broke legitimate paths like cd /mnt/project/think.
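To make the interception concrete, here is a rough Python sketch of the behaviour described above. The real logic lives in the Jinja templates; the apply_think_toggle helper is hypothetical, and two details are assumptions of this sketch rather than documented behaviour: when both tags appear the later one wins, and surrounding whitespace is trimmed after the tag is removed.

```python
THINK_ON = "<|think_on|>"
THINK_OFF = "<|think_off|>"


def apply_think_toggle(text, default_thinking=True):
    """Strip toggle tags from text and return (clean_text, thinking_enabled).

    Assumption: when both tags appear, the one that occurs last wins.
    """
    thinking = default_thinking
    last_on = text.rfind(THINK_ON)
    last_off = text.rfind(THINK_OFF)
    # rfind returns -1 for a missing tag, so equal values mean neither was found
    # and the default mode is kept.
    if last_on > last_off:
        thinking = True
    elif last_off > last_on:
        thinking = False

    # Remove the tags so the model never sees them.
    clean = text.replace(THINK_ON, "").replace(THINK_OFF, "").strip()
    return clean, thinking
```

A path such as cd /mnt/project/think passes through unchanged, because only the full control-token spelling is matched.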

Pre-installed models

These templates are already bundled with:

If you're using one of those, you already have the template. This repo is for everyone else.

Technical details — what exactly was fixed

Tool calls on C++ engines

The official template iterates tool call arguments with |items:

{%- for key, value in tool_call.arguments|items %}

Python's Jinja supports |items. C++ runtimes (LM Studio, llama.cpp, MLX) do not — the template produces a rendering error instead of output. This template uses direct dictionary key lookups instead:

{%- for args_name in tool_call.arguments %}
    {%- set args_value = tool_call.arguments[args_name] %}

It also replaces is sequence with is iterable (stricter C++ runtimes require it), removes |safe wrappers (also Python-only), and handles arguments returned as raw strings instead of objects.

`developer` role

Clients of OpenAI-compatible APIs may send message.role == "developer" for system-level instructions. The official Qwen template does not recognize that role and raises an error. Both templates here accept "developer" and map it to the system role.
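The mapping itself is small. As an illustration only, the equivalent logic in Python might look like the sketch below; the templates do this in Jinja, and the normalize_role and normalize_messages names are hypothetical.

```python
def normalize_role(role):
    """Map the "developer" role onto "system"; leave other roles unchanged."""
    return "system" if role == "developer" else role


def normalize_messages(messages):
    """Return a copy of messages with developer roles rewritten."""
    return [{**m, "role": normalize_role(m["role"])} for m in messages]
```

The input list is not modified, so the original messages can still be logged or replayed as they were sent.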

Empty thinking blocks

The official template wraps every past assistant turn in thinking tags:

<|im_start|>assistant
<think/>
</think >

Here is the answer...

When there's no reasoning content, those tags are dead weight — they waste context tokens and break prefix caching. The Qwen 3.5 template checks reasoning_content before emitting. The Qwen 3.6 template goes further: it respects the preserve_thinking kwarg, checks reasoning_content|trim|length > 0, and ties history visibility to the <|think_off|> override.

`</thinking>` hallucination (Qwen 3.6 only)

The Qwen 3.6 model sometimes generates </thinking> instead of the expected closing tag. The official parser splits on </think > only and fails. The 3.6 template detects which closing tag was actually used and splits on that:

{%- if '</think >' in content %}
    {%- set think_end_token = '</think >' %}
{%- elif '</thinking>' in content %}
    {%- set think_end_token = '</thinking>' %}

It also handles interrupted generation (max tokens hit mid-thought) by rescuing incomplete streams instead of injecting broken tag pairs.

Arguments serialization

The official template serializes argument values with |tojson unconditionally. That correctly turns Python True into JSON true, but it mishandles a value that is already a string, which gets wrapped in an extra layer of quotes and escapes. The fixed templates check the type first: strings pass through as-is, everything else goes through |tojson.
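The type check can be sketched in Python as follows. This is an approximation, not the template code: json.dumps stands in for |tojson (whose exact spacing can differ between engines), and render_argument_value is a hypothetical name.

```python
import json


def render_argument_value(value):
    """Strings pass through unchanged; everything else is JSON-encoded."""
    if isinstance(value, str):
        return value
    return json.dumps(value, ensure_ascii=False)
```

For comparison, encoding a string unconditionally, as in json.dumps("src/main.rs"), yields the text with an added pair of double quotes, which is the extra wrapping the type check avoids.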

No-user-query exception

The official template scans the message list in reverse to find the last "real" user query (skipping tool-result wrappers). If all user messages are tool results — or there are no user messages at all — it fires raise_exception('No user query found in messages.') and the template hard-crashes.

This breaks real usage:

Agentic tool-calling chains where the conversation ends with tool results and no fresh user query
After /reset or /new in runtimes like OpenClaw, where tool results from a prior session persist without a new user message
System-only contexts with no user messages

The fix replaces the exception with a graceful fallback: {%- set ns.last_query_index = messages|length - 1 %}. The thinking display logic then degrades naturally — assistant turns with reasoning content still show thinking tags when preserve_thinking is enabled.
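The reverse scan with its fallback can be sketched in Python like this. It is a simplified illustration: the helper names are hypothetical, and the way tool results are recognised (user messages whose content is wrapped in <tool_response> tags) is an assumption of this sketch, not something stated above.

```python
def is_tool_result(message):
    """Assumption: tool results are user messages wrapped in <tool_response> tags."""
    content = message.get("content")
    return (
        isinstance(content, str)
        and content.startswith("<tool_response>")
        and content.endswith("</tool_response>")
    )


def find_last_query_index(messages):
    """Index of the last real user query, or the last message index as a fallback."""
    for index in range(len(messages) - 1, -1, -1):
        message = messages[index]
        if message.get("role") == "user" and not is_tool_result(message):
            return index
    # Fallback instead of raising: mirrors `messages|length - 1` from the fix.
    return len(messages) - 1
```

With a conversation that contains only tool results, or only a system message, the function returns the index of the last message rather than raising.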

Comparison — Qwen 3.5 templates

Feature	Official	LuffyTheFox	mod-ellary	Pneuny	This
Tool arguments	Fails	Fixed	Missing	Fixed	Fixed
`\|safe` removed	Fails	Fixed	Missing	Fixed	Fixed
`developer` role	Missing	Missing	Missing	Missing	Added
Thinking toggle	None	None	`/think` (system only)	None	`<\|think_off\|>` anywhere
Empty think in history	Broken	Broken	Tags omitted	Broken	Fixed
Text safety	N/A	N/A	Breaks on `/think` in paths	N/A	Safe
Clean instructions	Yes	Yes	Yes	Injects "I cannot call a tool"	Yes
No-user-query crash	Crashes	Crashes	Crashes	Crashes	Graceful fallback

Comparison — Qwen 3.6 template

Feature	Official	This
Tool arguments	Fails (`\|items`)	Fixed
`\|safe` removed	Fails	Fixed
`developer` role	Missing	Added
Thinking toggle	None	`<\|think_off\|>` anywhere
`preserve_thinking`	Spams empty blocks	Dynamic length checks
`</thinking>` hallucination	Fails	Detected and handled
Interrupted streams	Broken tags	Rescued
No-user-query crash	Crashes	Graceful fallback

Authorship

Role	Author
Original models	Alibaba Cloud (Qwen team)
Template fixes	froggeric

License

Apache-2.0, inherited from Qwen.

froggeric/Qwen-Fixed-Chat-Templates
