
How Computer Vision Is Changing Healthcare in 2023 

Welcome to the third installment of Voxel51’s computer vision industry spotlight blog series. In this series, we highlight how different industries — from construction to climate tech, from retail to robotics, and more — are using computer vision, machine learning, and artificial intelligence to drive innovation. We’ll dive deep into the main computer vision tasks being put to use, current and future challenges, and companies at the forefront.

In this edition, we’ll focus on healthcare! Read on to learn about computer vision in healthcare and medicine.

Healthcare industry overview

Key facts and figures:

Before we dive into several popular applications of computer vision-based AI technologies in healthcare, here are some of the industry’s key challenges.

Key industry challenges in healthcare:

  • Rising costs of healthcare: in real terms, in the United States healthcare costs have risen by 290% since 1980. Costs are so high that in a recent Kaiser Family Foundation poll, 43% of respondents said that a family member had put off or postponed necessary health care as a result.
  • Shortage of physicians: despite the millions of workers employed in the healthcare industry, 132 countries are experiencing shortages of physicians, with an estimated 12.8 million more doctors needed worldwide to alleviate this problem. The United States is expected to face a shortage of up to 124,000 physicians by 2034.
  • Time-intensive EHR: according to a study involving 155,000 US physicians, on average, physicians spend more than 16 minutes per patient encounter using electronic health records (EHR). This time was split between chart review, documentation, and ordering.

Continue reading for some ways computer vision in healthcare is enabling people to live longer, healthier lives.

Applications of computer vision in healthcare

Computer-aided detection and diagnosis

Computer-aided detection applied to mammograms. Image courtesy of Siemens Healthineers.

In healthcare, computer-aided detection (CADe) and computer-aided diagnosis (CADx) refer to any applications in which a computer assists a doctor in understanding and evaluating medical data. Within the context of computer vision, this typically means images coming from MRI, CT, X-ray, or another form of diagnostic imaging technique. However, CAD can even be (and has been) applied to camera images of faces.

Computers can assist doctors in detection by identifying suspicious regions in images and alerting the physician to these concerning areas. In some cases, CADe amounts to predicting high confidence regions of interest and projecting bounding boxes for these regions onto the original image. In other cases, region of interest identification is followed by instance segmentation. In still other cases, anomaly detection may be applied to catch abnormalities.
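To make the bounding-box case concrete, here is a minimal sketch of the last step of a CADe pipeline: keeping only the high-confidence regions so they can be drawn on the original image. The `Detection` class, the `flag_regions` helper, the example boxes, and the 0.5 threshold are all hypothetical and chosen for illustration; a real system would get its detections from a trained model and pick its threshold from clinical validation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One candidate region from a (hypothetical) detector."""
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    score: float                    # model confidence in [0, 1]


def flag_regions(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep detections at or above the threshold, highest confidence first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up detector output for a single image.
candidates = [
    Detection(box=(120, 80, 180, 140), score=0.91),
    Detection(box=(300, 210, 330, 250), score=0.34),
    Detection(box=(45, 400, 90, 460), score=0.67),
]

flagged = flag_regions(candidates, threshold=0.5)
for d in flagged:
    print(f"Review region {d.box} (confidence {d.score:.2f})")
```

Lowering the threshold surfaces more regions for the physician to review, at the cost of more false positives, which is the trade-off discussed later in this section.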

Computer-aided diagnosis takes images and other patient record information as input, and evaluates the likelihood that the patient has a given disease or condition. In cancer diagnosis, for instance, this often means predicting whether a tumor is benign or malignant.

Over the past few years, computer vision systems integrating detection and diagnosis have achieved remarkable performance. Nevertheless, there are still challenges to be overcome, from high false positive detection rates to adoption of these technologies into existing systems and workflows.

Here are some papers on using computer vision for detection and diagnosis:

And here are a few papers showing the promise transformer models hold for aiding medical professionals in diagnostic processes:

Monitoring disease progression

Computer vision helps doctors monitor the progression of diseases and detect onset earlier. Image courtesy of Accuray on Unsplash.

In addition to detecting abnormalities and diagnosing diseases, computer vision can be used to precisely monitor the progression of a disease over time. Just as a doctor measures patient weight, height, and blood pressure during an exam, and compares these to the patient’s past measurements to assess patient health, computer vision models allow doctors to track various diseases by comparing markers in biomedical images taken at different points in time.

In some applications, progression can be monitored by precisely tracking the size of objects like cavities or lesions. More generally, computer vision applications in disease progression monitoring are characterized by a deep learning model assigning a numerical score to an image or video. These scores allow physicians to quantify the severity and time of onset for a specific patient.
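As a rough illustration of size tracking, the sketch below measures a lesion's area from two binary segmentation masks taken at different visits and reports the relative change. The masks, the 0.5 mm pixel spacing, and the helper names are made up for this example; in practice the masks would come from a segmentation model and the spacing from the image metadata.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = lesion pixel, 0 = background


def lesion_area_mm2(mask: Mask, pixel_spacing_mm: float) -> float:
    """Area covered by the mask, given the side length of one square pixel."""
    pixel_count = sum(sum(row) for row in mask)
    return pixel_count * pixel_spacing_mm ** 2


def percent_change(baseline: float, follow_up: float) -> float:
    """Relative change from baseline to follow-up, in percent."""
    return (follow_up - baseline) / baseline * 100.0


# Toy masks standing in for segmentations from two visits.
baseline_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
follow_up_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

baseline_area = lesion_area_mm2(baseline_mask, pixel_spacing_mm=0.5)
follow_up_area = lesion_area_mm2(follow_up_mask, pixel_spacing_mm=0.5)
growth = percent_change(baseline_area, follow_up_area)
print(f"{baseline_area} mm^2 -> {follow_up_area} mm^2 ({growth:+.0f}%)")
```

The same pattern extends to any per-image score: compute it at each time point, then compare against the patient's own history.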

By precisely monitoring progression, computer vision models can help physicians to detect the onset of a disease more rapidly. For glaucoma, a 2018 study concluded that deep learning accelerated time to detection by more than a year.

While computer vision applications in disease monitoring primarily involve applying machine learning models to images, there are also applications involving video data. In a 2022 paper, researchers found that by evaluating the movement patterns of Parkinson’s disease patients in the act of rising from a chair, the researchers could robustly estimate disease severity. To interpret patient movement, the researchers estimated patient pose in each frame, and then quantified the velocity and smoothness of motion across video frames.
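The idea of quantifying velocity and smoothness from per-frame poses can be sketched in a few lines. This is not the method from the paper: it assumes a single tracked keypoint (say, the hip) whose 2D position is already known in every frame, and it uses mean absolute change in speed as a simple, hypothetical stand-in for a smoothness measure.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def speeds(track: List[Point], fps: float) -> List[float]:
    """Speed between consecutive frames, in position units per second."""
    return [math.dist(a, b) * fps for a, b in zip(track, track[1:])]


def mean_abs_acceleration(track: List[Point], fps: float) -> float:
    """Average absolute change in speed per second; lower means smoother motion."""
    v = speeds(track, fps)
    changes = [abs(b - a) * fps for a, b in zip(v, v[1:])]
    return sum(changes) / len(changes)


# Made-up keypoint tracks (one position per frame) at 10 frames per second.
smooth_track = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
jerky_track = [(0.0, 0.0), (0.0, 2.0), (0.0, 2.0), (0.0, 3.0)]

print(speeds(smooth_track, fps=10.0))
print(mean_abs_acceleration(smooth_track, fps=10.0))
print(mean_abs_acceleration(jerky_track, fps=10.0))
```

Both tracks cover the same distance in the same time, but the second one stops and starts, so it scores as less smooth.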

Some papers to get you started:

Preoperative surgical planning

Preoperative surgical planning (often abbreviated to surgical planning) encompasses all visualization, simulation, and construction of blueprints to be used during a subsequent surgical procedure. Surgical planning is most often used for neurosurgery, or oral or cosmetic surgery, but it can be applied in advance of any surgery. Studies have found that digital templating and other preoperative planning techniques can reduce costs in the operating room and reduce operation time, and for elderly populations, they are even associated with lower 90-day mortality rates.

Computer vision is deeply entrenched in the history of preoperative planning: as early as the 1970s, CT scans and other forms of diagnostic imaging started to make possible the construction of anatomical models. This imaging data is used to generate 2D projections called templates, 3D digital reconstructions, or even 3D printed models. The models then allow surgeons to explore various approaches and entry pathways prior to surgery, minimizing time of surgery and invasiveness.

In recent years, the application of emerging technologies like deep learning and virtual reality in preoperative planning has begun to streamline surgery even further. According to a recent study involving 193 knee surgery cases, AI-based preoperative planning has the potential to reduce the average number of intraoperative corrections a surgeon needs to make by up to 50%. Among the advances contributing to this reduction, precision location of anatomical landmarks (keypoint detection) leads to improved accuracy in selecting the appropriate prosthetic size, and deep learning models can be employed to identify and evaluate potential trajectories for surgical instruments.

Here are a few papers on AI in preoperative planning:

Intraoperative surgical guidance

Computer vision can help surgeons perform procedures more precisely. Image courtesy of the National Cancer Institute.

Artificial intelligence and computer vision are also being used inside the operating room to make surgeries smoother and less invasive. One unifying theme for intraoperative intelligence is computer-assisted navigation. These applications take inspiration from traditional robotics, applying depth estimation and simultaneous localization and mapping (SLAM) to endoscopy images.

In image-guided surgery, a surgeon uses this real-time visual and spatial information to navigate with enhanced precision. This technology is at the heart of minimally invasive surgical (MIS) procedures. Intraoperative images can also be fused with preoperative images in augmented reality environments.

While still far from ubiquitous, we are also beginning to see computer vision-enabled navigation, AI-driven decision making, and robotic maneuverability come together in partially and even fully autonomous robotic surgical systems. Last year, researchers at Johns Hopkins University created the Smart Tissue Autonomous Robot (STAR), which automated 83% of the task of suturing the small bowel by combining tissue motion tracking and anatomical landmark tracking with a motorized suturing tool.

Here are some papers on applications of computer vision during surgery:

For an overview of AI applications in surgery, including preoperative and intraoperative applications, check out these papers:

Assisting people with vision loss

Computer vision can help the visually impaired to navigate, providing an alternative to (or working in conjunction with) guide dogs. Image courtesy of Unsplash.

In many applications of computer vision, AI models are employed to take on tasks in order to free up humans from tedious or hazardous work. Perhaps nowhere is computer vision’s liberating effect on people more pronounced than in assisting people with vision loss. By helping people with blindness or limited vision map their surroundings and navigate indoor and outdoor environments, computer vision makes it easier for them to live on their own and attend work and school. 

Computer vision techniques for assisting the visually impaired span the gamut from object recognition and face recognition, to monetary denomination verification and navigation.

A few papers to pique your interest:

Other AI in healthcare

These highlighted applications only scratch the surface of how artificial intelligence is transforming healthcare and the delivery of medical care in 2023. Computer vision is also being used to automate cell counting, make operating rooms smarter, facilitate medical skill training, and aid in post-traumatic rehabilitation.

Beyond computer vision, artificial intelligence is revolutionizing the way we process, understand, and draw insights from biomedical data. In 2019, following the release of Google’s Bidirectional Encoder Representations from Transformers (BERT) large language model, a team of researchers from Korea University and Clova AI published BioBERT: a pre-trained biomedical language representation model for biomedical text mining. In the intervening years, large language models like BioBERT, including Med-BERT and BEHRT (for electronic health records) as well as Clinical BERT, have been used for hypothesis generation and knowledge discovery, diagnosis, and hospital readmission prediction.

Most recently, Medical ChatGPT, Microsoft’s BioGPT, and BioMedLM have shown that generative pretrained transformer (GPT) models – the same architecture powering ChatGPT – can show impressive results on biomedical tasks. Expect this trend to accelerate in the coming months.

A double dose of caution

While biases inherent in artificially intelligent models can have unforeseen and often unwanted consequences across all industries, in healthcare, if AI models are not deployed with careful consideration, they can exacerbate existing inequities and do more harm than good. For a thorough discussion of these issues, check out the following resources:

Companies at the cutting edge of computer vision in healthcare


.lumen

The .lumen glasses were built to empower the blind.

Founded in 2020, .lumen is on a mission to help the world’s visually impaired — a population that is expected to reach 100 million by 2050. Started by Cornel Amariei, and staffed with more than 50 engineers, scientists, and technologists, Romania’s first deep tech startup is building AI powered “glasses” that give the visually impaired enhanced mobility. This additional freedom could potentially enable millions of people to study, pursue careers, and live unassisted.

Working with 600+ blind individuals, .lumen designed the glasses to replicate the main features of having a guide dog, such as helping you avoid obstacles, reacting to commands, and even gently pulling you in the right direction. The difference: guide dogs can cost up to $50,000 to train, making them inaccessible to most blind individuals.

The company’s glasses consist of six separate cameras, which together record almost 10GB of data per minute. Leveraging information from these onboard cameras, the .lumen glasses interpret the surrounding environment via computer vision tasks including object detection, image classification, semantic segmentation, and optical character recognition, all with centimeter-level precision. The glasses are also equipped with speech-to-text and text-to-speech capabilities. While some tasks like reading are enhanced by internet connectivity, all safety-related inference happens on the edge.

Visual information is imparted to the person wearing the glasses via sound and touch. To make this happen, the company uses advanced AI and robotics technology to integrate visual, auditory, and haptic systems. All told, the glasses have a sub-100 ms latency.

In 2021, .lumen received €9.3M in funding from the European Union Innovation Council, and won the Red Dot: Luminary award for the design of their headset. In 2022, the Romanian startup was honored as a Deep Tech Pioneer.

Future Fertility

Image of oocytes. Image courtesy of Atlas of Human Embryology.

In 2021, 2.4% of all births in the United States were conceived via assisted reproductive technologies (ARTs) like in vitro fertilization (IVF). However, IVF can be quite expensive, with the cost of a single cycle reaching up to $30,000. What’s more, 67% of the time, a mother’s first IVF cycle does not result in pregnancy.

Canadian biotech startup Future Fertility is using AI and computer vision to usher in the next era in personalized fertility medicine to help doctors, embryologists, and patients make more informed decisions along the fertility journey. 

It all starts with the oocyte. Oocytes are female egg cells that are retrieved from the ovaries as part of egg freezing and IVF treatments. Through the IVF process, the oocyte is later fertilized to produce an embryo. As such, oocyte quality is an essential factor in determining the outcome of an IVF cycle. Nevertheless, even the trained eye of an embryologist has difficulty assessing the quality of an oocyte from visual features. This contrasts with embryos, where manual assessment has been possible and reliable using a robust grading system as the standard of care for years.

To bridge this gap, Future Fertility has developed a patented, non-invasive software solution that provides personalized oocyte quality assessments by analyzing single 2D oocyte images captured within fertility labs. The software can generate reports for both egg freezing patients and IVF patients that assess each oocyte’s likelihood of forming a blastocyst (a day 5 or 6 embryo). Not only does this bring much-needed standardization and objectivity to the oocyte assessment process, but the software also predicts blastocyst outcomes over 20% more accurately than embryologists. The resulting reports enable fertility specialists to better manage their patients’ expectations for pregnancy success and help guide their treatment decisions for future cycles by understanding their patient’s personal oocyte quality versus age-based population health statistics.

One of the main challenges Future Fertility overcame when developing their model was constructing a high quality dataset. Their solution based on Deep Neural Networks was trained and tested using 100,000+ oocyte images spanning different countries, outcomes, lab equipment, and varied image quality. To learn more about how Future Fertility framed the problem and built a solution using machine learning, check out this insightful blog post by their team.

In early 2022, the company raised a $6M series A round of funding.


Hologic

Hologic’s 3DQuorum Technology combines multiple slices into SmartSlices with clinically relevant information.

A Nasdaq-traded corporation focused on women’s health, Hologic is paving the way in AI-enabled breast cancer detection. With almost 7,000 employees, thousands of patents, and a presence in 36 countries, Hologic has a hand in countless screening, diagnostic, and laboratory technologies, including a suite of 2D and 3D medical imaging solutions.

Hologic’s 3DQuorum and Genius AI Detection technologies employ deep learning to make breast cancer detection faster, safer, and more accurate. Traditionally, two-dimensional X-rays known as mammograms were an integral part of breast cancer screens. Recently, a 3D imaging technique called tomosynthesis has gained traction, with the added dimensionality of images allowing for improved sensitivity and specificity. 

Conventional tomosynthesis uses slices that are 1 mm in thickness, so in order to review all image slices for a screening, a radiologist must inspect 240 images. Hologic’s 3DQuorum Imaging technology analyzes groups of 1-mm slices using computer-aided detection algorithms, and synthesizes new images for each group by combining clinically relevant regions. The resulting images are synthetic 6-mm slices called SmartSlices, which allow radiologists to save an hour per day without reduction in performance.

Hologic’s computer-aided detection system, Genius AI Detection, was trained on large quantities of clinical data to locate lesions in breast tomosynthesis images. It uses a Faster-RCNN model to identify potentially relevant regions, U-Net models for segmentation, and Inception models for classification. Genius AI Detection has separate modules for regions of interest containing soft tissue lesions and those containing calcification clusters. For each identified lesion, a score is assigned based on the model’s confidence that the lesion is malignant. In clinical studies, radiologists assisted by Genius AI Detection exhibited improved sensitivity, and achieved higher area under the curve (AUC) than those without.
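For readers less familiar with the two metrics mentioned here, the sketch below computes sensitivity and AUC from per-lesion malignancy scores. The labels and scores are invented for illustration and have nothing to do with Hologic's data or evaluation code; the AUC is computed directly from its definition as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one.

```python
from typing import List


def sensitivity(labels: List[int], scores: List[float], threshold: float) -> float:
    """Fraction of malignant cases (label 1) scored at or above the threshold."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up ground truth (1 = malignant, 0 = benign) and model confidence scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

sens = sensitivity(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(f"sensitivity at 0.5: {sens:.3f}, AUC: {area:.3f}")
```

Sensitivity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why studies often report both.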

Iterative Health

Iterative Health’s SKOUT polyp detection in action. Image Courtesy of Iterative Health.

Founded in 2017 after CEO Jonathan Ng’s visit to Cambodia, Iterative Health was built to bring precision medicine to all, regardless of location or socio-economic position. The series B startup, which has raised more than $193M from investors including Insight Ventures and Obvious Ventures, is focused on applying AI and computer vision to gastroenterology (GI).

One of their flagship products, SKOUT, applies object detection techniques to images streaming from colonoscopic video feeds in real time to detect polyps and adenomas. With SKOUT, physicians are able to identify 27% more adenomas per colonoscopy.

In some computer vision applications, small objects can be the most challenging to identify. For computer-aided detection of polyps, however, when devices detect increased numbers of polyps, these polyps are for the most part diminutive (<5mm), and as a result, less likely to indicate health concerns. SKOUT demonstrates enhanced detection capabilities for both small and large polyps.

Additionally, SKOUT is designed to be integrated into physicians’ existing workflows. According to a study published in the journal Gastroenterology, colonoscopies employing SKOUT do not take significantly longer.

In late 2022, Iterative Health received FDA clearance for their artificial intelligence based polyp detection device.

Pixee Medical

Pixee Medical’s Knee+ solution augments the surgeon’s view using computer vision and augmented reality. Image courtesy of Pixee Medical.

Founded in 2017, French medical device manufacturer Pixee Medical is using computer vision, artificial intelligence, and augmented and virtual reality to transform orthopedic surgery. Pixee Medical’s augmented reality (AR) surgical glasses help surgeons perform operations with high precision while minimizing invasiveness. 

At the heart of Pixee’s technology is a suite of state-of-the-art tracking algorithms, which leverage data from a monocular camera embedded in the glasses to locate objects down to less than 1 mm in three dimensional space, even when surgical equipment partially occludes the tracked objects. These location and tracking algorithms make it possible for surgical teams to identify the positions of anatomical landmarks (keypoint markers on the human body) without invasive probes or costly imaging.

In 2021, the company launched Knee+, a knee arthroplasty solution that helps surgeons navigate during an operation using the AR glasses. Surgical instruments are tagged with QR codes, which allow for precise localization in 3D. The glasses integrate this information to calculate alignment angles for the prosthesis, and then project this information in the form of a hologram onto the scene.
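Once markers have been localized in 3D, an alignment angle reduces to basic vector geometry. The sketch below is a generic illustration and not Pixee Medical's algorithm: it assumes two hypothetical axes, each defined by a pair of 3D points (for example, a bone axis and an instrument axis), and computes the angle between them.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def direction(start: Vec3, end: Vec3) -> Vec3:
    """Vector pointing from start to end."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between_deg(u: Vec3, v: Vec3) -> float:
    """Angle between two 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding pushing the cosine just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Made-up marker positions in millimeters.
bone_axis = direction((0.0, 0.0, 0.0), (0.0, 0.0, 400.0))
instrument_axis = direction((10.0, 0.0, 0.0), (10.0, 50.0, 50.0))

angle = angle_between_deg(bone_axis, instrument_axis)
print(f"alignment angle: {angle:.1f} degrees")
```

A navigation system would recompute an angle like this on every frame as the tracked markers move.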

In March 2023, the company was crowned a member of the inaugural French Tech Health20 program.


SafelyYou

San Francisco-based SafelyYou is developing artificial intelligence and computer vision-enabled solutions to create safer environments for the 55+ million individuals worldwide who live with dementia. The company was started in 2016 by CEO George Netscher, spun out of his doctoral research at the Berkeley AI Research (BAIR) lab, and motivated by experience with Alzheimer’s within his family. Since then, the company has raised more than $61 million to use AI to detect and prevent falls.

SafelyYou operates within assisted living communities, where they apply models to every frame from ongoing camera feeds to detect whether or not someone is on the ground. Their models are tuned for high recall, and their world-leading technology is supported with insights from clinical experts who partner with on-site staff, determining the best interventions to prevent future falls.

SafelyYou’s fall detection model detects over 99% of on-the-ground events, and when a fall happens, their systems alert caregivers in real time. To achieve this level of accuracy, they address challenges including detection in cluttered environments, edge cases where people are not on the ground but are in a vulnerable position, and the overarching reality that falls are “long tail” events: there is far more footage of people not on the ground than there is of people on the ground. To date, they have detected more than 100,000 falls, and by getting residents help immediately and improving outcomes, SafelyYou doubles the average length of stay.
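To see what tuning for high recall on a long-tail problem looks like, consider the sketch below. It is a toy example with invented scores, not SafelyYou's system: given per-frame scores where falls are rare, it picks the highest score threshold that still meets a target recall, then reports the precision that results.

```python
from typing import List, Tuple


def recall_precision(labels: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    """Recall and precision when frames scoring at or above the threshold are flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def highest_threshold_for_recall(labels: List[int], scores: List[float], target_recall: float) -> float:
    """Largest score threshold whose recall still meets the target."""
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(labels, scores, t)
        if recall >= target_recall:
            return t
    raise ValueError("no threshold reaches the target recall")


# Made-up frames: 2 on-the-ground events (label 1) among 10 frames.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.55, 0.70, 0.40, 0.30, 0.20, 0.10, 0.10, 0.05, 0.02]

threshold = highest_threshold_for_recall(labels, scores, target_recall=1.0)
recall, precision = recall_precision(labels, scores, threshold)
print(f"threshold={threshold}, recall={recall:.2f}, precision={precision:.2f}")
```

Catching every event here means accepting a false alarm, and with rarer events the false alarms pile up faster, which is one reason a human review step is a natural complement to a high-recall detector.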

Building on the resounding success of their fall detection technology, in January of 2023, SafelyYou announced the launch of SafelyYou Aware, which provides hourly video assessments from SafelyYou team members every night to assess risks to a resident’s safety.

Other companies making strides:

  • Aidence: recently acquired by RadNet, Aidence delivers AI radiology solutions for detection of lung cancer nodules and other disorders. 
  • DePuy Synthes: a subsidiary of Johnson & Johnson, DePuy Synthes offers the VELYS Hip Navigation platform to improve surgical outcomes.
  • Intuitive Surgical: famous for their da Vinci Surgical System, Intuitive Surgical uses advanced robotics, computer vision, and AI to assist in surgical procedures, and even enable remote operations.
  • Qynapse: French neuroimaging startup applying segmentation to white matter lesions in MRI scans.

Healthcare industry datasets and challenges

The National Institutes of Health plays a central role in creating and maintaining publicly available health-related datasets, including machine learning and computer vision datasets such as DeepLesion, OASIS, and ChestX-ray8. Nevertheless, they are far from the only players in town. Here are some of the most popular public datasets and challenges at the intersection of computer vision, machine learning, and healthcare: 


Chest X-Ray (CXR)

  • CheXpert: dataset from the Stanford ML Group containing 224,316 chest radiographs (frontal and lateral views), as well as an associated competition.
  • ChestX-ray14: dataset from the NIH Clinical Center consisting of 112,120 frontal-view X-ray images from 30,805 unique patients. The dataset is also on Kaggle.
  • PadChest: dataset of 160,000+ high resolution chest X-rays, along with associated patient and case metadata. Reports contain 174 distinct radiographic label types.
  • MIMIC-CXR-JPG: chest radiographs with structured labels, from MIT’s Medical Informatics Mart for Intensive Care.


MRI

  • IVDM3Seg: a collection of 24 multi-modal MRI datasets for intervertebral disc (IVD) localization and segmentation.
  • MRNet: Knee MRI dataset from the Stanford ML Group comprising 1,370 knee MRI exams.

Other great sources

  • MedPix: From the National Library of Medicine, MedPix contains integrated textual and image data from 12,000 patient cases. This data is searchable by modality, topic, keyword and more. 
  • Grand Challenge: a platform for algorithms, challenges, and end-to-end ML solutions in biomedical imaging applications.

If you would like to see any of these, or other medical or health related computer vision datasets added to the FiftyOne Dataset Zoo, get in touch and we can work together to make this happen!

Healthcare industry models and frameworks

If you’ve made it this far, then you may be interested in:

  • Med-PaLM: A multimodal large language model from Google that can synthesize information from medical images like X-rays and mammograms.
  • MedSAM: a Segment Anything Model fine-tuned for medical image data, released on Apr 24, 2023.
  • MONAI: The Medical Open Network for Artificial Intelligence – a set of open-source frameworks related to medical imaging

Wait, What’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. It supports everything from point clouds to DICOM!

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!