
Simple Retrieval Augmented Classification

A training-free retrieval-augmented classification framework that can enhance any VLM with minimal resources.

Motivation

Vision-Language Models (VLMs) have achieved remarkable results in image understanding, but in radiology their static training data often limits accuracy and adaptability. We explored whether adding retrieval mechanisms — not text-based, but image-based — could improve diagnostic performance without retraining models.

Our goal was to build a training-free retrieval-augmented classification framework that can enhance any VLM with minimal resources.


Methodology Overview

We developed RAD-SRAC (Simple Retrieval-Augmented Classification) — a lightweight, few-shot framework designed to augment classification tasks across radiological modalities (CT, MRI, X-ray).

[Figure: overview of the RAD-SRAC pipeline]

The pipeline works as follows:

  1. Vector Database Construction

    • We encoded all dataset images using the MedImageInsight encoder — a domain-specific medical imaging model.
    • Embeddings were stored in Qdrant, an open-source vector database.
    • Each dataset split formed a searchable repository of image features and associated labels.
  2. Retrieval Process

    • For every query image, we retrieved the top-k (1–10) most similar samples using cosine similarity.
    • Retrieved images were appended to the model’s prompt as few-shot examples, either with labels or unlabeled.
  3. Prompting Setup

    • All models used the same structured protocol:

      • System Context: Defines modality and available classes.
      • Few-Shot Section: Injects retrieved reference images.
      • Classification Request: Asks for a JSON output with the predicted class and a brief explanation (y_pred and explanation in the example below).
    • Example (simplified):

      You are a medical expert. Analyze the CT image and classify it into {labels}.
      Examples:
        class: [clear_cell_RCC]
        [image]
      Now analyze the new image and output JSON:
        {'y_pred': ..., 'explanation': ...}
      
  4. Evaluation Protocol

    • We compared baseline (“raw”) VLM classification with SRAC-augmented versions.
    • Each test image was re-run up to 5 times if the model failed to follow output structure.
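Steps 1–2 above can be sketched end-to-end. The paper stores MedImageInsight embeddings in Qdrant; the sketch below substitutes a plain NumPy matrix for the vector database, and `cosine_top_k` is an illustrative helper, not the repository's API:

```python
import numpy as np

def cosine_top_k(db_embeddings, db_labels, query_embedding, k=3):
    """Return the k most similar database entries by cosine similarity."""
    db = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = db @ q                      # cosine similarity per database row
    top = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return [(db_labels[i], float(scores[i])) for i in top]

# Toy 4-dim vectors standing in for MedImageInsight image embeddings.
db = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.9, 0.1, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]])
labels = ["clear_cell_RCC", "clear_cell_RCC", "papillary_RCC"]
query = np.array([0.95, 0.05, 0.0, 0.0])
neighbors = cosine_top_k(db, labels, query, k=2)
```

In practice the same query runs against Qdrant's cosine index; the NumPy version only makes the retrieval logic explicit.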
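Steps 3–4 can likewise be sketched as a prompt builder plus a retry wrapper. `build_prompt`, `classify`, and the surrounding names follow the simplified example above but are illustrative, not the repository's code; `call_model` stands in for any VLM API:

```python
import json

MAX_RETRIES = 5  # each test image is re-run up to 5 times on malformed output

def build_prompt(modality, labels, retrieved_labels):
    """Assemble system context, few-shot section, and classification request."""
    lines = [
        f"You are a medical expert. Analyze the {modality} image "
        f"and classify it into {labels}.",
        "Examples:",
    ]
    for label in retrieved_labels:  # one entry per retrieved reference image
        lines.append(f"  class: [{label}]")
        lines.append("  [image]")
    lines.append("Now analyze the new image and output JSON:")
    lines.append("  {'y_pred': ..., 'explanation': ...}")
    return "\n".join(lines)

def classify(call_model, prompt, labels):
    """Call the model, retrying when the reply is not valid JSON
    with a known class under y_pred."""
    for _ in range(MAX_RETRIES):
        reply = call_model(prompt)
        try:
            out = json.loads(reply)
        except json.JSONDecodeError:
            continue
        if out.get("y_pred") in labels:
            return out
    return None  # model never followed the required output structure
```

Real prompts interleave the actual reference images with their labels; here only the label lines are shown.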

Datasets

| Dataset | Modality | Classes | Samples |
| --- | --- | --- | --- |
| KITS23 | CT | 5 tumor subtypes | 424 |
| Coronahack | X-ray | Normal / Bacterial / Viral pneumonia | 5908 |
| Brain Tumor Classification | MRI | No tumor / Glioma / Meningioma / Pituitary | 3264 |

Each dataset was split into:

  • Database split for vector storage.
  • Test split (100 stratified samples).

Models Evaluated

We tested the SRAC approach on both large state-of-the-art VLMs and smaller deployable models:

  • Claude 3.5 Sonnet
  • GPT-4o
  • Gemini 1.5 Pro
  • Qwen2-VL 72B
  • Gemini 1.5 Flash-8B
  • Pixtral-12B

All models used temperature = 1.


Results

1. Large-Scale Models

SRAC substantially improved F1 scores across all datasets:

| Dataset | Model | F1 (Raw) | F1 (SRAC) | Δ |
| --- | --- | --- | --- | --- |
| KITS23 | GPT-4o | 57% | 63% | +6% |
| KITS23 | Claude 3.5 | 53% | 61% | +8% |
| Coronahack | GPT-4o | 41% | 76% | +35% |
| Coronahack | Claude 3.5 | 46% | 76% | +30% |
| Brain Tumor | GPT-4o | 59% | 94% | +35% |
| Brain Tumor | Claude 3.5 | 56% | 91% | +35% |

The largest relative gain was 142% on Coronahack (Gemini 1.5 Pro), while the toughest dataset (KITS23) showed smaller but consistent improvements.

2. Small Deployable Models

For smaller, on-premise models:

| Dataset | Model | F1 (Raw) | F1 (SRAC) | Δ |
| --- | --- | --- | --- | --- |
| Coronahack | Pixtral-12B | 17% | 58% | +41% |
| Brain Tumor | Pixtral-12B | 23% | 66% | +43% |
| KITS23 | Gemini Flash-8B | 45% | 63% | +18% |

These results demonstrate that SRAC can bridge the performance gap between small and large models — critical for clinical environments where data must remain local.

3. Optimal Number of Examples

Performance peaked at 3–5 retrieved images, beyond which gains saturated or declined — indicating diminishing returns and supporting practical efficiency for deployment.


Visual Analysis

t-SNE projections of MedImageInsight embeddings showed:

  • Clear separability for Coronahack and Brain Tumor classes.
  • Overlapping clusters for KITS23, explaining its lower SRAC gains due to subtle inter-class differences.
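That kind of projection can be reproduced on toy data, with Gaussian clusters standing in for MedImageInsight embeddings (two well-separated classes and two overlapping ones, mimicking the Coronahack vs. KITS23 picture):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# 16-dim toy "embeddings": two separable classes, two overlapping ones.
embeddings = np.vstack([
    rng.normal(0.0, 0.3, (30, 16)),   # class A (separable)
    rng.normal(3.0, 0.3, (30, 16)),   # class B (separable)
    rng.normal(1.5, 1.0, (30, 16)),   # class C (overlaps D)
    rng.normal(1.6, 1.0, (30, 16)),   # class D (overlaps C)
])
proj = TSNE(n_components=2, perplexity=10,
            random_state=0).fit_transform(embeddings)
# proj is a (120, 2) array ready for a scatter plot colored by class.
```

With real embeddings the same call is made per dataset split; the A/B clusters land far apart while C/D blur together, matching the separability pattern above.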

Discussion

Key insights:

  • Encoder quality dominates: MedImageInsight effectively captured high-level modality features, but fine-grained CT distinctions remain difficult.
  • Retrieval scale is bounded: Excessive examples degrade accuracy, likely due to context dilution.
  • Privacy-friendly deployability: SRAC enables effective use of small, local VLMs under healthcare data governance rules.
  • Training-free generalization: No retraining or finetuning required — only database construction and prompt adaptation.

Conclusion

RAD-SRAC demonstrates that retrieval-augmented classification can deliver relative F1 improvements of up to 250% across radiology datasets with no model training. Peak performance with 3–5 retrieved few-shot cases makes it a lightweight, practical enhancement for both large and small VLMs, especially in clinical settings that demand on-premise, privacy-preserving AI.

Code: github.com/TheLion-ai/RAD-SRAC