
Segment Anything in a CT Scan with NVIDIA VISTA-3D

The Future of Analyzing Medical Imagery Is Here

No time to read the blog? No worries! Here’s a video of me covering what’s in this blog!

Visual AI applications have rapidly expanded to every industry with powerful new foundation models. Healthcare is no different, providing a challenging domain that demands perfection. When patients’ lives are in the balance, doctors and medical professionals depend on state-of-the-art medical imagery and diagnostic tools to deliver the best treatment possible. 

CT scan data was one of the most requested data types I heard about at CVPR 2024. Medical researchers and ML engineers are working to carry medical imaging into its next era with models that automatically detect tumors, segment organs, and help doctors better understand their patients’ scans.

In this short tutorial, I will show you how to leverage one of NVIDIA’s medical foundation models, VISTA-3D, to segment CT scans automatically! First, let’s collect our data and bring it into FiftyOne!

Configuring TotalSegmentator for Inference

For this demo, we will be using the TotalSegmentator dataset, a collection of hundreds of CT scans annotated with 107 different body parts on each scan. The collection is highly robust, with each scan and even the annotations coming in the original CT scan format (nii.gz file format). Keeping our input data in this format will give our model the most accurate information about what is being shown in the scan. However, these scans can be complex to interpret outside of medical imaging software, so let’s make it a little more practical for us! Download the dataset at the link here.

Note! If you want to bypass the preprocessing of CT scans, you can load the dataset straight from Hugging Face:

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub("dgural/Total-Segmentator-50")

session = fo.launch_app(dataset)

Supported Classes in CT Scans

First, let’s collect our scans and show how we can load just one into FiftyOne. By using the open source library, NiBabel, we can slice the CT scan and save each slice as an image. Afterwards, we use `ffmpeg` to stitch all these images together into a short video. We can then add the video as a FiftyOne sample, allowing us to unlock the ability to store metadata, classifications, as well as frame level segmentations!

import os

import fiftyone as fo
import nibabel as nib
import numpy as np
from PIL import Image

def load_ct(scan_name):

    ct_filepath = "TotalSegmentator/" + scan_name + "/ct.nii.gz"
    dir_name = scan_name + "_video"
    # Construct the new directory path
    new_dir_path = os.path.join(os.path.dirname(ct_filepath), dir_name)
    # Create the new directory
    os.makedirs(new_dir_path, exist_ok=True)

    # Read file
    scan = nib.load(ct_filepath)
    # Get raw data
    scan = scan.get_fdata()

    # Save every slice along the third axis as its own PNG image
    for plane in range(scan.shape[2]):
        p = scan[:,:,plane].astype(np.uint8)
        img = Image.fromarray(p)
        img.save(f'{new_dir_path}/plane{plane}.png')

    mov_in = os.path.join(f'{new_dir_path}/plane%d.png')
    mov_out = os.path.join(f'{new_dir_path}/{scan_name}.mp4')
    # Notebook shell escape (IPython/Jupyter only): stitch the PNG slices into an H.264 video
    !ffmpeg -i {mov_in} -vcodec libx264 -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" -r 24 -y -an {mov_out}

    # Wrap the video in a FiftyOne sample and remember where the original scan lives
    sample = fo.Sample(filepath=mov_out)
    sample["ct_filepath"] = os.path.abspath(ct_filepath)
    return sample
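
One thing to keep in mind with the slicing step: CT intensities are usually stored in Hounsfield units, which run from about -1000 for air to well above 1000 for dense bone, so values outside 0 to 255 will not survive a direct cast to `uint8` intact. Below is a minimal sketch of one common alternative, clipping to an intensity window and rescaling before saving each slice. The helper name `window_to_uint8` and the default window of -1000 to 1000 are my own assumptions for illustration, not part of the original tutorial or of NiBabel.

```python
import numpy as np

def window_to_uint8(volume, low=-1000.0, high=1000.0):
    """Clip intensities to [low, high] and rescale linearly to 0..255.

    Hypothetical helper: the window bounds are illustrative defaults,
    not values taken from the tutorial.
    """
    clipped = np.clip(volume, low, high)
    scaled = (clipped - low) / (high - low) * 255.0
    return scaled.round().astype(np.uint8)

# Synthetic stand-in for scan.get_fdata(): values spanning a typical CT range
volume = np.linspace(-1024, 3000, num=4 * 4 * 3).reshape(4, 4, 3)
windowed = window_to_uint8(volume)
print(windowed.dtype, windowed.min(), windowed.max())
```

To try it in `load_ct`, you could call the helper once on the full volume right after `get_fdata()` and then pass each windowed slice to `Image.fromarray`.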

Using our function for loading a scan, let’s load our first 50 scans. Start by grabbing the list of scans and creating a pandas `DataFrame` to access the metadata for each scan.

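import pandas as pd

# Every entry in the dataset folder is a scan directory, plus the meta.csv file
scans = os.listdir("TotalSegmentator")
# The metadata file is semicolon-separated
df = pd.read_csv("TotalSegmentator/meta.csv", sep=';')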
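
The metadata lookup used in the next step filters the table by `image_id` and calls `.item()`, which raises a `ValueError` unless exactly one row matches. Here is a small sketch of a more forgiving lookup on an in-memory table. The sample rows and the helper name `metadata_for` are invented for illustration; only the column names and the semicolon separator come from the tutorial.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for meta.csv (semicolon-separated, values invented)
csv_text = "image_id;age;gender\ns0001;63;m\ns0002;71;f\n"
demo_df = pd.read_csv(io.StringIO(csv_text), sep=";")

def metadata_for(table, image_id):
    """Return the metadata of one scan as a dict, or {} if it is not unique."""
    rows = table[table["image_id"] == image_id]
    if len(rows) != 1:
        return {}
    return rows.iloc[0].to_dict()

print(metadata_for(demo_df, "s0001"))
print(metadata_for(demo_df, "not_a_scan"))
```

Returning an empty dict for a missing or duplicated ID lets the loading loop skip the metadata for that scan instead of stopping halfway through.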

Next, we can loop through our first 50 CT scans and create FiftyOne samples!

dataset = fo.Dataset(name="TotalSegmentator", overwrite=True)

samples = []
for i, scan in enumerate(scans):
    # Stop once we have gone through roughly the first 50 scans
    if i == 51:
        break
    # Skip meta.csv; every other entry is a scan directory
    if scan.find(".csv") == -1:
        sample = load_ct(scan)
        # Copy the metadata row for this scan onto the sample
        row = df[df['image_id'] == scan]
        sample["image_id"] = row["image_id"].item()
        sample["age"] = row["age"].item()
        sample["gender"] = row["gender"].item()
        sample["institute"] = row["institute"].item()
        sample["study_type"] = row["study_type"].item()
        sample["split"] = row["split"].item()
        sample["manufacturer"] = row["manufacturer"].item()
        sample["scanner_model"] = row["scanner_model"].item()
        sample["kvp"] = row["kvp"].item()
        sample["pathology"] = row["pathology"].item()
        sample["pathology_location"] = row["pathology_location"].item()
        samples.append(sample)

dataset.add_samples(samples)

session = fo.launch_app(dataset)

With the app launched, we can browse our scans! As you can see, CT scans come in many shapes and sizes. Use the sidebar to filter on different pathologies or scanner types. Lucky for us, our foundation model is able to handle each of these cases!

To use VISTA-3D, you will need NVIDIA credits. I was able to get 1000 just to test, which was plenty for me to explore the foundation model on our dataset for free! We need to pass our CT scans to the API to use the model, so I made a quick GitHub repo holding the first 50 scans where the model could quickly grab them. The API responds with an archive containing an NRRD file that holds the segmentation map for every slice. We can easily add this to each frame with only a few lines! You can also prompt the model for only specific classes or regions, but we are exploring, so why not try them all? Here is our entire loop for adding the model results to our samples!

import io
import shutil
import tempfile
import zipfile

import nrrd
import requests

invoke_url = ""

headers = {
    "Authorization": "Bearer {Your NVIDIA API KEY}",
}

# re-use connections (named http_session so it does not replace the App session)
http_session = requests.Session()

for sample in dataset:
    nrrd_path = f"/path/to/TotalSegmentator/{sample.image_id}/{sample.image_id}.nrrd"
    payload = {
        "prompts": {
            "classes": None,
            "points": None
        },
        "image": f"{sample.image_id}.ct.nii.gz"
    }
    response = http_session.post(invoke_url, headers=headers, json=payload)
    try:
        # Unpack the returned archive and keep the first file it contains
        with tempfile.TemporaryDirectory() as temp_dir:
            z = zipfile.ZipFile(io.BytesIO(response.content))
            z.extractall(temp_dir)
            shutil.move(os.path.join(temp_dir, os.listdir(temp_dir)[0]), nrrd_path)
    except Exception as e:
        # Not a valid archive: save the raw response text so it can be inspected
        with open(nrrd_path, 'w') as file:
            file.write(response.text)

    # Read the segmentation volume and attach one mask per video frame
    data, header = nrrd.read(nrrd_path)
    for frame_no, frame in sample.frames.items():
        # Frame numbers start at 1, array indices at 0
        mask = data[:,:,frame_no-1]
        frame["seg_mask"] = fo.Segmentation(mask=mask)
    sample.save()
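
The per-frame step in that loop relies on video frame numbers starting at 1 while array indices start at 0. The sketch below shows the same indexing on a small synthetic label volume, so you can check it without calling the API. The volume contents, the class IDs, and the helper name `mask_for_frame` are made up for illustration, and the bounds check is an extra guard I am assuming is useful if a video ever ends up with a different number of frames than the scan has slices.

```python
import numpy as np

# Synthetic label volume standing in for the array read from the NRRD file:
# shape (height, width, num_slices), one integer class ID per voxel.
labels = np.zeros((4, 4, 3), dtype=np.uint8)
labels[1:3, 1:3, 0] = 1  # class 1 appears only in the first slice
labels[0:2, 0:2, 2] = 5  # class 5 appears only in the last slice

def mask_for_frame(volume, frame_no):
    """Return the 2D mask for a 1-based frame number."""
    if not 1 <= frame_no <= volume.shape[2]:
        raise IndexError(f"frame {frame_no} is outside 1..{volume.shape[2]}")
    return volume[:, :, frame_no - 1]

for frame_no in range(1, labels.shape[2] + 1):
    mask = mask_for_frame(labels, frame_no)
    # List which class IDs are present in this slice
    print(frame_no, np.unique(mask).tolist())
```

Printing the unique IDs per slice is also a quick way to sanity-check a real segmentation volume before attaching it to your frames.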

With all of our segmentations loaded in, let’s check out our app again to see our results!

I’m honestly astonished at how well the model performs. Some of the scans have large amounts of noise, come in odd shapes, or contain anomalies like tumors. In each case the model handles it beautifully. Because a CT scan is 3D, the model can see entire anatomical structures at once, which helps it produce sharp and precise segmentations.

Hopefully, this is just the start of powerful new healthcare models that can revolutionize the field! If better models lead to better treatment, the world could change! I will be on the lookout for any other new landmark models that come out in the near future, so stay tuned!