LFM2-24B-A2B

LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

  • Best-in-class efficiency: A 24B MoE model with only 2B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops.
  • Fast edge inference: 112 tok/s decode on AMD CPU, 293 tok/s on H100. Fits in 32 GB of RAM with day-one support in llama.cpp, vLLM, and SGLang.
  • Predictable scaling: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.


Find more information about LFM2-24B-A2B in our blog post.

🗒️ Model Details

LFM2-24B-A2B is a general-purpose instruct model (without reasoning traces) with the following features:

| Property | LFM2-8B-A1B | LFM2-24B-A2B |
|---|---|---|
| Total parameters | 8.3B | 24B |
| Active parameters | 1.5B | 2.3B |
| Layers | 24 (18 conv + 6 attn) | 40 (30 conv + 10 attn) |
| Context length | 32,768 tokens | 32,768 tokens |
| Vocabulary size | 65,536 | 65,536 |
| Training precision | Mixed BF16/FP8 | Mixed BF16/FP8 |
| Training budget | 12 trillion tokens | 17 trillion tokens |
| License | LFM Open License v1.0 | LFM Open License v1.0 |

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese

Generation parameters:

  • temperature: 0.1
  • top_k: 50
  • repetition_penalty: 1.05
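
As a rough sketch, these defaults map onto a standard transformers GenerationConfig; the object below is illustrative, and the quick start in the Inference section passes the same values to model.generate directly:

from transformers import GenerationConfig

# Recommended sampling defaults from this model card, bundled so they can be
# reused wherever the serving stack accepts generation settings.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
)
# e.g. model.generate(input_ids, generation_config=generation_config, max_new_tokens=512)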

We recommend the following use cases:

  • Agentic tool use: Native function calling, web search, structured outputs. Ideal as the fast inner-loop model in multi-step agent pipelines.
  • Offline document summarization and Q&A: Run entirely on consumer hardware for privacy-sensitive workflows (legal, medical, corporate).
  • Privacy-preserving customer support agent: Deployed on-premise at a company, handles multi-turn support conversations with tool access (database lookups, ticket creation) without data leaving the network.
  • Local RAG pipelines: Serve as the generation backbone in retrieval-augmented setups on a single machine without GPU servers.

We don't recommend using it for coding, as it wasn't optimized for this purpose.

Chat Template

LFM2-24B-A2B uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant

You can use tokenizer.apply_chat_template() to format your messages automatically.
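
As a minimal sketch, the same conversation can be rendered to a string (so the output can be inspected against the template above) by passing tokenize=False:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-24B-A2B")

messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
]

# tokenize=False returns the formatted string instead of token IDs;
# add_generation_prompt=True appends the opening <|im_start|>assistant turn.
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(formatted)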

Tool Use

LFM2-24B-A2B supports function calling as follows:

  1. Function definition: We recommend providing the list of tools as a JSON object in the system prompt. You can also use the tokenizer.apply_chat_template() function with tools.
  2. Function call: By default, LFM2-24B-A2B writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model to output JSON function calls in the system prompt.
  3. Function execution: The function call is executed, and the result is returned as a "tool" role.
  4. Final answer: LFM2-24B-A2B interprets the outcome of the function call to address the original user prompt in plain text.

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
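
To illustrate steps 1–4 above in code, here is a minimal round-trip sketch with Transformers. The tool result is a hypothetical hard-coded lookup, and the regex only covers this single string-argument call; a real agent loop needs a proper parser and actual tool execution:

import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-24B-A2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", dtype="bfloat16")

# Step 1: tool schema from the example above, injected via the chat template.
tools = [{
    "name": "get_candidate_status",
    "description": "Retrieves the current status of a candidate in the recruitment process",
    "parameters": {
        "type": "object",
        "properties": {
            "candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}
        },
        "required": ["candidate_id"],
    },
}]
messages = [{"role": "user", "content": "What is the current status of candidate ID 12345?"}]

def generate_turn(messages):
    input_ids = tokenizer.apply_chat_template(
        messages, tools=tools, add_generation_prompt=True, return_tensors="pt", tokenize=True
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Keep special tokens so the tool-call markers survive, then drop the end-of-turn token.
    return tokenizer.decode(output[0][input_ids.shape[-1]:]).replace("<|im_end|>", "").strip()

# Step 2: the model answers with a Pythonic call between the tool-call special tokens.
assistant_turn = generate_turn(messages)
call = re.search(r'<\|tool_call_start\|>\[get_candidate_status\(candidate_id="(.*?)"\)\]<\|tool_call_end\|>', assistant_turn)

if call:
    # Step 3: execute the call (hypothetical lookup result) and return it in a "tool" message.
    result = [{"candidate_id": call.group(1), "status": "Interview Scheduled"}]
    messages += [
        {"role": "assistant", "content": assistant_turn},
        {"role": "tool", "content": json.dumps(result)},
    ]
    # Step 4: the model turns the tool result into a plain-text answer.
    print(generate_turn(messages))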

🏃 Inference

LFM2-24B-A2B is supported by many inference frameworks. See the Inference documentation for the full list.

| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | Colab link |
| vLLM | High-throughput production deployments with GPU. | Link | Colab link |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | Colab link |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | |
| LM Studio | Desktop application for running LLMs locally. | Link | |

Here's a quick start example with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2-24B-A2B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
#   attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)
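
vLLM, listed in the table above for high-throughput GPU serving, can be used along the same lines. A minimal offline-inference sketch, assuming a vllm release that supports this checkpoint and reusing the recommended generation parameters:

from vllm import LLM, SamplingParams

# Recommended sampling settings from the "Generation parameters" section above.
sampling_params = SamplingParams(
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_tokens=512,
)

llm = LLM(model="LiquidAI/LFM2-24B-A2B")
outputs = llm.chat(
    [{"role": "user", "content": "What is C. elegans?"}],
    sampling_params,
)
print(outputs[0].outputs[0].text)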

🔧 Fine-Tuning

| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | Colab link |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | Colab link |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | Colab link |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | Colab link |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | Colab link |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | Colab link |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | Colab link |
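
For orientation, the SFT (TRL) row above corresponds roughly to the following sketch. The dataset is a placeholder and the LoRA/trainer settings are illustrative defaults, not tuned values from Liquid AI:

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: any conversational dataset in the standard "messages" format works.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="LiquidAI/LFM2-24B-A2B",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="lfm2-24b-a2b-sft-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
    ),
    # Illustrative LoRA settings; adjust rank, alpha, and target modules to the available memory.
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()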

📊 Performance

CPU Inference

We compared LFM2-24B-A2B against two popular MoE models of similar size: Qwen3-30B-A3B-Instruct-2507 (30.5B total, 3.3B active parameters) and gpt-oss-20b (21B total, 3.6B active parameters). We measured both prefill and decode throughput with Q4_K_M quantizations of these models using llama.cpp on an AMD Ryzen AI Max+ 395.

(Figures: prefill and decode throughput on AMD Ryzen AI Max+ 395, Q4_K_M quantization.)

GPU Inference

We also report throughput (total tokens / wall time) achieved with vLLM on a single H100 SXM5 GPU.

(Figure: throughput with vLLM on a single H100 SXM5 GPU.)

📬 Contact

Citation

@article{liquidAI202624B,
  author  = {Liquid AI},
  title   = {LFM2.5-24B-A2B: Scaling Up the LFM2 Architecture},
  journal = {Liquid AI Blog},
  year    = {2026},
  note    = {www.liquid.ai/blog/}
}

@article{liquidai2025lfm2,
  author  = {Liquid AI},
  title   = {LFM2 Technical Report},
  journal = {arXiv preprint arXiv:2511.23404},
  year    = {2025}
}