JonaRuthardt/SteerViT

SteerViT: Steerable Visual Representations

Paper | Project Page | GitHub | Demo (HF Spaces)

SteerViT equips pretrained Vision Transformers with steerable visual representations. Given an image and a natural-language prompt, it conditions the visual encoder through lightweight gated cross-attention to produce:

  • prompt-aware global embeddings
  • prompt-aware dense patch features
  • prompt-conditioned heatmaps

This Hugging Face repository hosts the model checkpoints.
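
To make "gated cross-attention" more concrete, here is a minimal PyTorch sketch of one common form of the technique. It is not the SteerViT implementation: the class name `GatedCrossAttention`, the zero-initialized tanh gate, and all dimensions are assumptions chosen for illustration. Patch tokens act as queries and text tokens supply keys and values, and a learned scalar gate controls how much text-conditioned signal is added back to the visual tokens.

```python
import torch
import torch.nn as nn


class GatedCrossAttention(nn.Module):
    """Illustrative tanh-gated cross-attention layer (hypothetical, not SteerViT code)."""

    def __init__(self, dim: int, text_dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Patch tokens are the queries; text tokens supply keys and values.
        self.attn = nn.MultiheadAttention(
            dim, num_heads, kdim=text_dim, vdim=text_dim, batch_first=True
        )
        # Assumption: a zero-initialized gate, so the layer starts as an identity map.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, patches, text=None):
        if text is None:
            # No prompt: pass the visual tokens through unchanged.
            return patches
        attended, _ = self.attn(self.norm(patches), text, text)
        return patches + torch.tanh(self.gate) * attended


torch.manual_seed(0)
layer = GatedCrossAttention(dim=64, text_dim=32)
patches = torch.randn(2, 196, 64)  # (batch, patch tokens, visual dim)
text = torch.randn(2, 5, 32)       # (batch, text tokens, text dim)

with torch.no_grad():
    closed = layer(patches, text)   # gate at 0: output equals the input
    layer.gate.fill_(1.0)           # open the gate by hand for the demo
    steered = layer(patches, text)  # text now modulates the patch tokens
```

With the gate closed, the layer leaves the visual tokens untouched, which is one way a conditioned encoder can fall back to the behavior of its frozen backbone.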

Installation

To use SteerViT, install the library directly from GitHub:

python -m pip install "git+https://github.com/JonaRuthardt/SteerViT.git"

Quick start

import torch
from PIL import Image
from steervit import SteerViT

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model (e.g., SteerDINOv2-Base)
model = SteerViT.from_pretrained("steervit_dinov2_base.pth", device=device)
transform = model.get_transforms()

image = Image.open("path/to/image.jpg").convert("RGB")
image_tensor = transform(image).unsqueeze(0).to(device) # add a batch dimension and move to the model's device

prompt = ["the red car"]

global_features = model.get_global_features(image_tensor, texts=prompt) # pooled image embeddings
dense_features = model.get_dense_features(image_tensor, texts=prompt) # patch-level visual features
heatmaps = model.get_heatmaps(image_tensor, texts=prompt) # prompt-conditioned localization heatmaps
attention_heatmaps = model.get_attention_heatmaps(image_tensor, texts=prompt) # attention-based heatmaps
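
To overlay a heatmap on the input image, it usually has to be resized to the image resolution and rescaled to a fixed range. The sketch below assumes the heatmaps arrive as a `(batch, h, w)` tensor of patch-level scores; the model card does not document the output shape, so check it before reusing this. The helper `upsample_heatmap` is hypothetical and uses only standard PyTorch calls.

```python
import torch
import torch.nn.functional as F


def upsample_heatmap(heatmap: torch.Tensor, size: tuple[int, int]) -> torch.Tensor:
    """Resize (B, h, w) patch-level scores to (B, H, W) and scale each map to [0, 1]."""
    # interpolate expects a channel dimension, so add one and drop it afterwards.
    resized = F.interpolate(
        heatmap.unsqueeze(1), size=size, mode="bilinear", align_corners=False
    ).squeeze(1)
    # Min-max normalize each map independently.
    flat = resized.flatten(1)
    lo = flat.min(dim=1).values[:, None, None]
    hi = flat.max(dim=1).values[:, None, None]
    return (resized - lo) / (hi - lo).clamp_min(1e-8)


# Stand-in for a model output: a 16x16 patch grid for one image (assumed shape).
patch_scores = torch.rand(1, 16, 16)
overlay = upsample_heatmap(patch_scores, size=(224, 224))
print(overlay.shape)
```

The normalized map can then be passed to a plotting library as an alpha mask or a color-mapped overlay.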

If texts=None, SteerViT behaves like the underlying frozen ViT backbone and returns query-agnostic features.
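
One way to see how much a prompt steers the representation is to compare the embedding obtained with `texts=None` against the one obtained with a prompt. The sketch below is a hedged illustration: `prompt_shift` is a hypothetical helper, the `(batch, dim)` embedding shape and the 768-dimensional stand-in tensors are assumptions, and random tensors replace real model outputs so the code runs without a checkpoint.

```python
import torch
import torch.nn.functional as F


def prompt_shift(unsteered: torch.Tensor, steered: torch.Tensor) -> torch.Tensor:
    """Cosine distance per image between (B, D) embeddings; 0 means the prompt changed nothing."""
    return 1.0 - F.cosine_similarity(unsteered, steered, dim=-1)


# With a loaded model, the two inputs would come from the quick-start calls:
#   unsteered = model.get_global_features(image_tensor, texts=None)
#   steered = model.get_global_features(image_tensor, texts=["the red car"])
# Random stand-ins (assumed shape) keep this sketch runnable on its own.
torch.manual_seed(0)
unsteered = torch.randn(1, 768)
steered = unsteered + 0.5 * torch.randn(1, 768)

shift = prompt_shift(unsteered, steered)   # positive: the embedding moved
same = prompt_shift(unsteered, unsteered)  # approximately zero
```

Running the same comparison over several prompts for one image gives a rough sense of which prompts change the global embedding the most.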

Available checkpoints

Checkpoint | from_pretrained(...) identifier | Notes
SteerDINOv2-Base | steervit_dinov2_base.pth | Primary model used for most experiments
SteerMAE-Base | steervit_mae_base.pth | Alternative model based on an MAE backbone

Citation

@misc{ruthardt2026steervit,
      title={Steerable Visual Representations}, 
      author={Jona Ruthardt and Manu Gaur and Deva Ramanan and Makarand Tapaswi and Yuki M. Asano},
      journal={arXiv:2604.02327},
      year={2026}
}