flwrlabs/Lizzy-7B-GGUF
Lizzy 7B GGUF Quantized Models
Quantized GGUF models for efficient CPU/GPU inference
Overview
This repository contains GGUF-quantized versions of Lizzy 7B, a reasoning-enhanced language model from Flower Labs with British knowledge and behavior enhancements.
Model Variants
| Quantization | File Size | Quality Retention | Recommended Use Case |
|---|---|---|---|
| Q4_K_M | 4.2 GB | 92% | Resource-constrained environments |
| Q5_K_M ⭐ | 4.8 GB | 95% | Best balance of quality and size |
| Q6_K | 5.6 GB | 97% | Middle ground between Q5_K_M and Q8_0 |
| Q8_0 | 7.2 GB | 99% | Near-lossless quality |
| f16 | 13.6 GB | 100% | Maximum quality, benchmarking |
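A quick way to sanity-check the table is to convert each file size into effective bits per weight. The sketch below is illustrative arithmetic only, assuming the nominal 7-billion parameter count implied by the model name:

```python
# Effective bits per weight implied by the table above,
# assuming a nominal 7e9 parameters (an assumption, not a spec).
FILE_SIZES_GB = {
    "Q4_K_M": 4.2,
    "Q5_K_M": 4.8,
    "Q6_K": 5.6,
    "Q8_0": 7.2,
    "f16": 13.6,
}
PARAMS = 7e9

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB into bits stored per model weight."""
    return size_gb * 1e9 * 8 / params

for name, size in FILE_SIZES_GB.items():
    print(f"{name}: ~{bits_per_weight(size):.1f} bits/weight")
```

The f16 file works out to roughly 15.5 bits per weight, close to the expected 16, which suggests the table sizes are in the right ballpark.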
Quick Start
Using llama.cpp (Recommended)
# Clone llama.cpp with Lizzy support
git clone https://github.com/relogu/llama.cpp.git
cd llama.cpp
git checkout lorenzo-dev
# Build with CUDA support
make LLAMA_CUDA=1
# Run inference with recommended Q5_K_M quantization
./main -m lizzy-7b-Q5_K_M.gguf \
-p "What is the capital of England?" \
-n 128 \
--temp 0.6 \
--top-p 0.95 \
-ngl 32 # Offload all layers to GPU
Using Python (llama-cpp-python)
from llama_cpp import Llama
# Load model with GPU offload
llm = Llama(
model_path="lizzy-7b-Q5_K_M.gguf",
n_ctx=65536, # Full context
n_gpu_layers=32, # Offload to GPU
n_threads=8,
)
# Generate with reasoning
response = llm(
"Explain why British people queue so much.",
max_tokens=512,
temperature=0.6,
top_p=0.95,
)
print(response["choices"][0]["text"])
Using Ollama
# Create Modelfile
cat > Modelfile << EOF
FROM ./lizzy-7b-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER num_ctx 65536
EOF
# Create and run
ollama create lizzy -f Modelfile
ollama run lizzy "What's the best way to make tea?"
Usage with llama.cpp
Basic Inference
./main -m lizzy-7b-Q5_K_M.gguf \
-e \
-p "User: Hello, assistant!\nAssistant:" \
-n 256 \
--temp 0.6 \
--top-p 0.95 \
-ngl 32
The -e flag tells llama.cpp to expand escape sequences such as \n in the prompt; without it, the literal characters are passed through.
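The command above hard-codes a single exchange. If you script multi-turn conversations in this plain `User:`/`Assistant:` template (an assumption based on the example above, not a documented chat format), a small helper keeps the formatting consistent:

```python
def build_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    """Render past (user, assistant) turns plus a new user message into
    the plain "User: ...\nAssistant: ..." template shown above, ending
    with "Assistant:" so the model continues from there."""
    lines = []
    for past_user, past_assistant in history:
        lines.append(f"User: {past_user}")
        lines.append(f"Assistant: {past_assistant}")
    lines.append(f"User: {user_msg}")
    lines.append("Assistant:")
    return "\n".join(lines)

print(build_prompt([("Hello!", "Hi there.")], "What is the capital of England?"))
```

Pass the result to `-p` (with `-e`, or via a prompt file) instead of a hand-written string.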
Chat Mode
./main -m lizzy-7b-Q5_K_M.gguf \
-i \
--reverse-prompt "User:" \
-ngl 32 \
--temp 0.6 \
--top-p 0.95
The -i flag runs main in interactive mode; --reverse-prompt returns control to you whenever the model emits "User:".
Server Mode (API)
./server -m lizzy-7b-Q5_K_M.gguf \
-ngl 32 \
--port 8080 \
--host 0.0.0.0
Then access at http://localhost:8080 or use the API:
curl http://localhost:8080/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Why do British people say sorry so often?",
"n_predict": 256,
"temperature": 0.6,
"top_p": 0.95
}'
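From Python, the same request can be assembled with only the standard library. The field names mirror the curl example above (`prompt`, `n_predict`, `temperature`, `top_p`); the endpoint URL and the `content` response field are assumptions based on llama.cpp's server conventions:

```python
import json
import urllib.request

def completion_request(prompt: str, n_predict: int = 256,
                       temperature: float = 0.6,
                       top_p: float = 0.95) -> urllib.request.Request:
    """Build a POST request for the /completion endpoint with the same
    fields as the curl example above."""
    payload = json.dumps({
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": temperature,
        "top_p": top_p,
    }).encode("utf-8")
    return urllib.request.Request(
        "http://localhost:8080/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send it like this:
# with urllib.request.urlopen(completion_request("Why do British people say sorry so often?")) as resp:
#     print(json.loads(resp.read())["content"])
```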
Reasoning Behavior
Lizzy 7B is a reasoning model that uses thinking tokens. You'll see output like:
> Let me think about this question about British culture...
> The user is asking about queuing behavior...
> I should explain the cultural significance...
British people queue because it reflects core cultural values of fairness and order...
This is expected behavior: the `>` prefix marks the model's reasoning process, which appears before the final answer.
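If you post-process transcripts, the `>`-prefixed reasoning lines can be separated from the final answer with a few lines of Python. This assumes the exact prefix convention shown above:

```python
def split_reasoning(output: str) -> tuple[list[str], str]:
    """Separate '>'-prefixed reasoning lines from the final answer,
    following the prefix convention shown above."""
    reasoning, answer = [], []
    for line in output.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            reasoning.append(stripped[1:].strip())
        else:
            answer.append(line)
    return reasoning, "\n".join(answer).strip()

text = (
    "> Let me think about this question about British culture...\n"
    "British people queue because it reflects fairness and order..."
)
thoughts, answer = split_reasoning(text)
print(thoughts)  # reasoning lines only
print(answer)    # final answer only
```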
Documentation
The sections below cover the model architecture, when to choose GGUF over the original format, and licensing for the Lizzy 7B GGUF models.
Architecture Details
- Base: Lizzy 7B
- Layers: 32 (with post-norm architecture)
- Hidden size: 4096
- Attention: Sliding window (4096) + full attention
- RoPE: YaRN scaling (factor=8.0, original=8192)
- Vocab: 100,278 tokens
- Context: 65,536 tokens
- Tensors: 355 (including attn_post_norm and ffn_post_norm)
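The context figures above are internally consistent (YaRN factor 8.0 × 8192 original positions = 65,536 tokens), but running at full context has a memory cost beyond the weights. The sketch below gives an upper bound for the KV cache, assuming full multi-head attention over the 4096-wide hidden state with f16 keys and values; if the model uses grouped-query attention (not stated above), the real figure shrinks proportionally:

```python
LAYERS = 32      # from the architecture list above
HIDDEN = 4096
CTX = 65_536
BYTES_F16 = 2

# YaRN scaling check: factor * original positions = advertised context
assert 8.0 * 8192 == CTX

# Upper-bound KV cache: keys + values, per layer, per token (full MHA assumed)
bytes_per_token = 2 * LAYERS * HIDDEN * BYTES_F16
total_gib = bytes_per_token * CTX / 2**30
print(f"{bytes_per_token} bytes/token, ~{total_gib:.0f} GiB at full context")
```

This is why, on constrained hardware, you may want a smaller `n_ctx` than the full 65,536 used in the Python example above.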
Model Comparison
When to Use GGUF vs. Original Format
Use GGUF when:
- ✅ You need CPU inference
- ✅ You want flexible GPU offloading
- ✅ You need smaller model size
- ✅ You're using llama.cpp ecosystem
- ✅ You want fast loading times
Use original Safetensors when:
- ✅ You need full precision (BF16)
- ✅ You're using transformers/vLLM
- ✅ You need tensor parallelism
- ✅ You're fine-tuning the model
License
These GGUF models are derived from Lizzy 7B. Please refer to the base model license for redistribution terms.
Base Model: flwrlabs/Lizzy-7B
Citation
If you use Lizzy 7B in your research, please cite:
@misc{lizzy-7b-gguf,
  title  = {Lizzy 7B},
  author = {Flower Labs},
  year   = {2026},
  url    = {https://huggingface.co/flwrlabs/Lizzy-7B-GGUF}
}
Support
- 📚 Documentation: See HuggingFace repository files
- 🐛 Issues: Report on HuggingFace
- 💬 Discussions: HuggingFace community forum
Developed by Flower Labs