
Creating AI Tools for Healthcare: Workshop Recap

Our recent workshop, “Building AI Tools for Healthcare,” offered a technical deep-dive into the methodologies and models shaping the future of medical AI. This post summarizes the key technical concepts, challenges, and solutions discussed across the three main sessions.

Part 1: Medical Imaging Models

The initial session addressed the primary bottleneck in healthcare AI: data. The presentation highlighted systemic issues, including the scarcity of public datasets, the absence of a single storage standard (data arrives as NIfTI, DICOM, XML, CSV, and more), and inconsistent labeling that often reflects clinicians' shorthand or regional terminology rather than a shared vocabulary.

To combat this, we explored the Universal Medical Image Encoding (UMIE) initiative. UMIE is a large-scale, unified dataset containing over 1 million images (CT, MRI, X-ray) from more than 20 different sources. It standardizes diverse data types through a series of modular, reusable pipelines inspired by sklearn.pipeline. These pipelines handle everything from DICOM conversion to mask extraction and label unification, creating a cohesive dataset suitable for robust model training. A crucial component of this standardization is the adoption of a unified ontology, such as RadLex, to map disparate labels to a common vocabulary.
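
To make the pipeline idea concrete, here is a minimal sketch of what an sklearn.pipeline-style preprocessing chain might look like. The step names, the toy records, and the label map standing in for a RadLex-backed ontology are all illustrative; this is not UMIE's actual implementation.

```python
# Illustrative only: step names and the label map are hypothetical, not UMIE's code.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

# Toy mapping from source-specific labels onto a shared (RadLex-style) vocabulary.
LABEL_MAP = {"ca lung": "lung neoplasm", "lung tumour": "lung neoplasm"}

class UnifyLabels(BaseEstimator, TransformerMixin):
    """Map heterogeneous source labels onto one ontology-backed vocabulary."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return [{**rec, "label": LABEL_MAP.get(rec["label"].lower(), rec["label"])} for rec in X]

class NormalizeIntensity(BaseEstimator, TransformerMixin):
    """Rescale pixel intensities to [0, 1] so CT/MRI/X-ray sources are comparable."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        out = []
        for rec in X:
            lo, hi = min(rec["pixels"]), max(rec["pixels"])
            scale = (hi - lo) or 1.0
            out.append({**rec, "pixels": [(p - lo) / scale for p in rec["pixels"]]})
        return out

# Steps compose like any other sklearn pipeline and can be reused per data source.
preprocess = Pipeline([("labels", UnifyLabels()), ("intensity", NormalizeIntensity())])

records = [{"pixels": [0, 512, 1024], "label": "Lung Tumour"}]
print(preprocess.fit_transform(records))
```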

[Image: Medical Imaging Models]

The hands-on portion focused on fine-tuning BioMedCLIP, a powerful multimodal foundation model pre-trained on millions of image-text pairs from biomedical literature. The session demonstrated how to adapt this model for a specific downstream task: tumor classification. We also briefly covered alternative medical vision encoders like MedImageInsights, BioMedParse, and RadDino, each with unique architectures for tasks ranging from report generation to semantic segmentation.
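
A common lightweight adaptation strategy for such a model is to freeze the pretrained towers and train a small classification head on the image embeddings. The sketch below follows that pattern; the loader call and model ID follow BiomedCLIP's public model card, but verify them against your open_clip version, and the head, optimizer, and training step are placeholders rather than the workshop's exact notebook.

```python
# Sketch only: assumes the open_clip_torch package; head, optimizer, and loop are illustrative.
import torch
import torch.nn as nn
import open_clip

model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
)
for p in model.parameters():        # freeze both pretrained towers
    p.requires_grad = False
model.eval()

# Infer the image-embedding width with a dummy forward pass, then attach a trainable head.
with torch.no_grad():
    embed_dim = model.encode_image(torch.zeros(1, 3, 224, 224)).shape[-1]
head = nn.Linear(embed_dim, 2)      # binary tumor vs. no-tumor classifier
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """images: batch preprocessed with `preprocess`; labels: LongTensor of class indices."""
    with torch.no_grad():
        features = model.encode_image(images)   # (batch, embed_dim), frozen encoder
    loss = criterion(head(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```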

Part 2: Language Models

This session transitioned to Large Language Models (LLMs), breaking down their training into three stages:

  1. Pre-training: Teaching the model to predict the next word on a massive corpus.
  2. Supervised Fine-Tuning (SFT): Teaching the model to follow instructions.
  3. Preference Optimization: Teaching the model to respond in a specific style (e.g., RLHF).

A key challenge discussed was catastrophic forgetting, where a model loses general knowledge during fine-tuning on a specialized task. The presentation showed that while fine-tuning can introduce new knowledge, it also increases the risk of hallucinations if the new information is not well-grounded in the model’s pre-trained knowledge.

Continual pre-training, as exemplified by the Me-LLaMA case study (a LLaMA 2 model continually trained on 129 billion medical tokens), is one solution. For more targeted adaptations, Parameter-Efficient Fine-Tuning (PEFT) methods are preferred. We focused on Low-Rank Adaptation (LoRA), a technique that freezes the pre-trained model weights and injects small, trainable low-rank matrices. This significantly reduces the number of trainable parameters, making fine-tuning more efficient while preserving the model’s original knowledge base.
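
Wiring LoRA into a causal LM takes only a few lines with Hugging Face's peft library. The base checkpoint and hyperparameters below are illustrative defaults, not the workshop's exact configuration.

```python
# Sketch: checkpoint name and LoRA hyperparameters are illustrative, not the workshop setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                  # rank of the injected low-rank matrices
    lora_alpha=32,                         # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of the base weights train
```

Because the base weights stay frozen, the trained adapter can either be merged back into the model with `merge_and_unload()` for deployment or kept separate and swapped per task.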

[Image: Language Models]

The practical lab applied these concepts by fine-tuning Gemma-3 for generating radiology reports, showcasing how to adapt a powerful base model for a highly specific clinical task.
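
As a rough picture of that supervised fine-tuning step, the sketch below concatenates prompt/report pairs into single training sequences and hands them to a standard Trainer. The Gemma-3 checkpoint name, the prompt format, and the toy example are assumptions, not the workshop notebook; the LoRA setup from the previous sketch could be applied to the model before training.

```python
# Sketch: checkpoint name, prompt template, and the toy example are hypothetical stand-ins.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "google/gemma-3-1b-it"     # assumed checkpoint; swap in the size you actually use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Toy prompt/report pair; in practice this would be findings -> report data at scale.
examples = [{"prompt": "Findings: small right pleural effusion.",
             "report": "Impression: Small right pleural effusion, likely reactive."}]

def tokenize(example):
    text = f"{example['prompt']}\n{example['report']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=512)

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["prompt", "report"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma3-radiology",
                           per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = input_ids
)
trainer.train()
```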

Part 3: Sentence Encoders & Retrieval Systems

The final session focused on converting text into meaningful numerical representations using sentence encoders. These models, typically built on transformer encoders like BERT, map sentences to high-dimensional vectors whose semantic similarity is measured by cosine similarity. Training is often done via contrastive learning, using a Multiple Negatives Ranking Loss (MNRL) to pull similar sentence pairs (positives) together and push dissimilar ones (negatives) apart in the embedding space.
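
With the sentence-transformers library, that training setup is a few lines: each example is an (anchor, positive) pair, and MNRL treats every other positive in the batch as an in-batch negative. The base checkpoint and the toy pairs below are placeholders.

```python
# Sketch: the base checkpoint and the two toy sentence pairs are illustrative placeholders.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each pair is (anchor, positive); other positives in the batch act as negatives.
train_examples = [
    InputExample(texts=["chest pain radiating to the left arm", "angina-like chest discomfort"]),
    InputExample(texts=["elevated fasting glucose", "hyperglycemia on morning labs"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```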

We discussed advanced techniques like negative mining (selecting “hard” negatives that are superficially similar but semantically different) to improve model robustness; a small mining sketch follows the list below. The applications of these encoders in medicine are vast, including:

  • Knowledge & Evidence Retrieval (RAG): Grounding LLM answers in clinical guidelines or PubMed articles.
  • EHR Mining: Identifying patient cohorts from unstructured clinical notes.
  • Coding & Ontologies: Mapping clinical text to standardized codes like ICD or SNOMED CT.
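The negative-mining idea can be sketched as follows, assuming an already-trained encoder and a corpus of candidate sentences: embed everything, then for each anchor keep the most similar sentences that are not its labeled positive as hard negatives. The checkpoint and the tiny corpus are placeholders for a real mining corpus.

```python
# Sketch: the encoder checkpoint and the tiny corpus are placeholders for real mining data.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

anchors   = ["patient reports shortness of breath on exertion"]
positives = ["dyspnea when climbing stairs"]
corpus    = ["dyspnea when climbing stairs",
             "shortness of temper during consultations",   # superficially similar, semantically different
             "no respiratory complaints at rest"]

corpus_emb = encoder.encode(corpus, normalize_embeddings=True)
anchor_emb = encoder.encode(anchors, normalize_embeddings=True)

for i, anchor in enumerate(anchors):
    scores = corpus_emb @ anchor_emb[i]        # cosine similarity (embeddings are normalized)
    ranked = np.argsort(-scores)
    hard_negatives = [corpus[j] for j in ranked if corpus[j] != positives[i]][:2]
    print(anchor, "->", hard_negatives)
```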

The session detailed the architecture for a Retrieval-Augmented Generation (RAG) system, where documents are indexed in a vector database, and relevant context is retrieved at inference time to augment the LLM’s knowledge.
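
A compact sketch of the retrieval half is shown below, assuming FAISS as the vector index and a sentence encoder for embeddings; the guideline snippets, the index choice, and the prompt template are illustrative rather than the architecture presented in the session.

```python
# Sketch: the snippets, encoder checkpoint, and prompt format are illustrative assumptions.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# 1) Index documents (e.g., clinical guideline paragraphs) in a vector store.
docs = ["Guideline: start metformin as first-line therapy for type 2 diabetes unless contraindicated.",
        "Guideline: annual retinal screening is recommended for diabetic patients."]
doc_emb = encoder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_emb.shape[1])   # inner product == cosine on normalized vectors
index.add(doc_emb)

# 2) At inference time, retrieve the top-k passages and splice them into the LLM prompt.
question = "What is the first-line drug for type 2 diabetes?"
q_emb = encoder.encode([question], normalize_embeddings=True)
_, hits = index.search(q_emb, 1)
context = "\n".join(docs[i] for i in hits[0])

prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# `prompt` would now be passed to the LLM (e.g., the fine-tuned model from Part 2).
```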

[Image: Sentence Encoders]

The workshop concluded with a practical notebook on predicting ICD codes from clinical notes—a direct application of sentence encoders for automated medical coding.
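
In its simplest formulation, that task reduces to embedding each ICD code description and each clinical note with the same encoder, then assigning the code whose description is closest in cosine similarity. The two codes and the note below are made-up examples, and the workshop notebook may frame the problem differently (for instance, as supervised multi-label classification).

```python
# Toy sketch: the code descriptions and note are made-up; real coding uses the full ICD catalogue.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

icd_codes = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10":   "Essential (primary) hypertension",
}
code_emb = encoder.encode(list(icd_codes.values()), normalize_embeddings=True)

note = "Long-standing T2DM, well controlled on metformin, blood pressure normal today."
note_emb = encoder.encode(note, normalize_embeddings=True)

scores = util.cos_sim(note_emb, code_emb)[0]   # similarity of the note to each code description
best = list(icd_codes)[int(scores.argmax())]
print(best, icd_codes[best], float(scores.max()))
```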