flwrlabs/Lizzy-7B-GGUF
Lizzy 7B GGUF Quantized Models

Lizzy 7B header figure (light theme)

Quantized GGUF models for efficient CPU/GPU inference


Overview

This repository contains GGUF-quantized versions of Lizzy 7B, a reasoning-enhanced language model from Flower Labs with British knowledge and behavior enhancements.

Model Variants

| Quantization | File Size | Quality Retention | Recommended Use Case |
| --- | --- | --- | --- |
| Q4_K_M | 4.2 GB | 92% | Resource-constrained environments |
| Q5_K_M ⭐ | 4.8 GB | 95% | Best balance of quality and size |
| Q6_K | 5.6 GB | 97% | Middle ground between Q5_K_M and Q8_0 |
| Q8_0 | 7.2 GB | 99% | Near-lossless compression |
| f16 | 13.6 GB | 100% | Maximum quality, benchmarking |

Quick Start

Using llama.cpp (Recommended)

# Clone llama.cpp with Lizzy support
git clone https://github.com/relogu/llama.cpp.git
cd llama.cpp
git checkout lorenzo-dev

# Build with CUDA support
make LLAMA_CUDA=1

# Run inference with recommended Q5_K_M quantization
./main -m lizzy-7b-Q5_K_M.gguf \
       -p "What is the capital of England?" \
       -n 128 \
       --temp 0.6 \
       --top-p 0.95 \
       -ngl 32  # Offload all layers to GPU

Using Python (llama-cpp-python)

from llama_cpp import Llama

# Load model with GPU offload
llm = Llama(
    model_path="lizzy-7b-Q5_K_M.gguf",
    n_ctx=65536,  # Full context
    n_gpu_layers=32,  # Offload to GPU
    n_threads=8,
)

# Generate with reasoning
response = llm(
    "Explain why British people queue so much.",
    max_tokens=512,
    temperature=0.6,
    top_p=0.95,
)

print(response["choices"][0]["text"])

Using Ollama

# Create Modelfile
cat > Modelfile << EOF
FROM ./lizzy-7b-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER num_ctx 65536
EOF

# Create and run
ollama create lizzy -f Modelfile
ollama run lizzy "What's the best way to make tea?"
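When scripting model setup, the same Modelfile can be generated programmatically instead of via a heredoc. A small sketch (the parameter names and values mirror the Modelfile above; the `render_modelfile` helper is our own, not part of Ollama):

```python
def render_modelfile(gguf_path: str, **params: object) -> str:
    """Render an Ollama Modelfile: a FROM line plus one
    PARAMETER line per keyword argument."""
    lines = [f"FROM {gguf_path}"]
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"

modelfile = render_modelfile(
    "./lizzy-7b-Q5_K_M.gguf",
    temperature=0.6,
    top_p=0.95,
    num_ctx=65536,
)
print(modelfile)
```

Write the result to a file named `Modelfile` and run `ollama create lizzy -f Modelfile` as above.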

Usage with llama.cpp

Basic Inference

./main -m lizzy-7b-Q5_K_M.gguf \
       -p "User: Hello, assistant!\nAssistant:" \
       -n 256 \
       --temp 0.6 \
       --top-p 0.95 \
       -ngl 32

Chat Mode

# Interactive chat via llama.cpp's -i (interactive) flag
./main -m lizzy-7b-Q5_K_M.gguf \
       -i \
       -ngl 32 \
       --temp 0.6 \
       --top-p 0.95

Server Mode (API)

./server -m lizzy-7b-Q5_K_M.gguf \
         -ngl 32 \
         --port 8080 \
         --host 0.0.0.0

Then access at http://localhost:8080 or use the API:

curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Why do British people say sorry so often?",
    "n_predict": 256,
    "temperature": 0.6,
    "top_p": 0.95
  }'
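The same endpoint can be called from Python with only the standard library. A sketch mirroring the curl example above (the `/completion` route and request fields come from that example; the `"content"` response key follows the llama.cpp server's JSON format, and the helper names are our own; the server from the previous section must be running before `complete` is called):

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080/completion"  # llama.cpp server from above

def build_payload(prompt: str, n_predict: int = 256,
                  temperature: float = 0.6, top_p: float = 0.95) -> dict:
    """Build the JSON body expected by the /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict,
            "temperature": temperature, "top_p": top_p}

def complete(prompt: str, **kwargs: object) -> str:
    """POST a completion request to the running server and
    return the generated text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

For example, `complete("Why do British people say sorry so often?")` sends the same request as the curl command above.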

Reasoning Behavior

Lizzy 7B is a reasoning model that uses thinking tokens. You'll see output like:

> Let me think about this question about British culture...
> The user is asking about queuing behavior...
> I should explain the cultural significance...

British people queue because it reflects core cultural values of fairness and order...

This is expected behavior: the > prefix marks the model's reasoning process, which precedes the final answer.
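If you post-process transcripts, the reasoning can be separated from the final answer by that `>` prefix. A minimal sketch (the `split_reasoning` helper is our own, not part of the model tooling):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer): lines starting
    with '>' are reasoning, everything else is the final answer."""
    reasoning, answer = [], []
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            reasoning.append(line.lstrip().lstrip("> "))
        else:
            answer.append(line)
    return "\n".join(reasoning).strip(), "\n".join(answer).strip()

sample = (
    "> Let me think about this question about British culture...\n"
    "> I should explain the cultural significance...\n"
    "\n"
    "British people queue because it reflects fairness and order."
)
thinking, answer = split_reasoning(sample)
```

This keeps user-facing output clean while preserving the reasoning trace for logging or debugging.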

Documentation

The following sections provide comprehensive documentation for using Lizzy 7B GGUF models.

Architecture Details

  • Base: Lizzy 7B
  • Layers: 32 (with post-norm architecture)
  • Hidden size: 4096
  • Attention: Sliding window (4096) + full attention
  • RoPE: YaRN scaling (factor=8.0, original=8192)
  • Vocab: 100,278 tokens
  • Context: 65,536 tokens
  • Tensors: 355 (including attn_post_norm and ffn_post_norm)
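The context figures above are consistent: YaRN extends the original 8,192-token window by the 8.0 scaling factor to 65,536 tokens. A back-of-the-envelope check, plus a naive full-attention KV-cache estimate (which ignores the sliding-window layers and any grouped-query attention, so treat it as an upper bound under those assumptions):

```python
# Figures from the architecture list above.
ORIGINAL_CTX = 8192
YARN_FACTOR = 8.0
N_LAYERS = 32
HIDDEN = 4096
BYTES_F16 = 2  # K and V stored as 16-bit floats

extended_ctx = int(ORIGINAL_CTX * YARN_FACTOR)
print(extended_ctx)  # 65536, matching the Context entry above

# Naive KV cache: 2 tensors (K and V) per layer, hidden-size values
# per token, across the full extended context.
kv_bytes = 2 * N_LAYERS * HIDDEN * extended_ctx * BYTES_F16
print(f"{kv_bytes / 2**30:.1f} GiB")  # 32.0 GiB at full context
```

This is why lowering `n_ctx` (or relying on the sliding-window layers) matters in practice: the weights are only a few GB, but a worst-case full-context KV cache can dwarf them.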

Model Comparison

When to Use GGUF vs. Original Format

Use GGUF when:

  • ✅ You need CPU inference
  • ✅ You want flexible GPU offloading
  • ✅ You need smaller model size
  • ✅ You're using llama.cpp ecosystem
  • ✅ You want fast loading times

Use original Safetensors when:

  • ✅ You need full precision (BF16)
  • ✅ You're using transformers/vLLM
  • ✅ You need tensor parallelism
  • ✅ You're fine-tuning the model

License

These GGUF models are derived from Lizzy 7B. Please refer to the base model license for redistribution terms.

Base Model: flwrlabs/Lizzy-7B

Citation

If you use Lizzy 7B in your research, please cite:

@misc{lizzy-7b-gguf,
  title = {Lizzy 7B},
  author = {Flower Labs},
  year = {2026},
  url = {https://huggingface.co/flwrlabs/Lizzy-7B-GGUF}
}

Support

  • 📚 Documentation: See HuggingFace repository files
  • 🐛 Issues: Report on HuggingFace
  • 💬 Discussions: HuggingFace community forum
