The past 12 months have seen rapid advances in computer vision, from the enabling infrastructure, to new applications across industries, to algorithmic breakthroughs in research, to the explosion of AI-generated art. It would be impossible to cover all of these developments in full detail in a single blog post. Nevertheless, it is worth taking a look back to highlight some of the biggest and most exciting developments in the field.
This post is broken into five parts:
Computer Vision trends
Transformers take hold of computer vision
Transformer models exploded onto the deep learning scene in 2017 with “Attention Is All You Need”, setting the standard for a variety of NLP tasks and ushering in the era of large language models (LLMs). The Vision Transformer (ViT), introduced in late 2020, marked the first major application of these self-attention-based models in a computer vision context.
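What lets a transformer consume an image at all is that ViT treats the image as a sequence of tokens: it cuts the image into fixed-size, non-overlapping patches (for example 16×16 pixels) and flattens each patch into a vector. The sketch below shows only that patch-splitting step in plain Python on a toy grayscale image stored as a nested list. The function name `patchify` is hypothetical, and the learned linear projection, position embeddings, and transformer layers that follow in a real ViT are deliberately omitted.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image: a list of pixel rows


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch_size x patch_size
    patches and flatten each one (row-major) into a single token vector."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    tokens = []
    # Walk the patch grid left to right, top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Gather the pixels of one patch row by row and flatten them.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            tokens.append(patch)
    return tokens


# A 4 x 4 image split into 2 x 2 patches gives a sequence of 4 tokens of length 4.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, patch_size=2)
print(len(tokens), tokens[0])  # 4 [0.0, 1.0, 4.0, 5.0]
```

From this point on, the model treats the tokens much like word embeddings in a language model, which is why so much NLP machinery carried over to vision.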
Data-centric computer vision gains traction
As computer vision matures, an increasingly large share of machine learning development effort goes into wrangling, cleaning, and augmenting data. Data quality is becoming a bottleneck for performance, and the industry is moving towards data-model co-design. The data-centric ML movement is growing in popularity.
At the helm of this effort is a new wave of startups, including synthetic data generation companies (Gretel, Datagen, Tonic) and evaluation, observability, and experiment tracking tools (Voxel51, Weights & Biases, CleanLab), which join existing labeling and annotation services (Labelbox, Label Studio, CVAT, Scale, V7).
AI-generated artwork gets (too?) good
Between improvements in Generative Adversarial Networks (GANs) and the rapid development and iteration of diffusion models, AI-generated art is having what can only be described as a renaissance. With tools like Stable Diffusion, Nightcafe, Midjourney, and OpenAI’s DALL-E2, it is now possible to generate incredibly nuanced images from user-input text prompts. Artbreeder allows users to “breed” multiple images into new creations, Meta’s Make-A-Video generates videos from text, and RunwayML has changed the game when it comes to creating animations and editing videos. Many of these tools also support inpainting (regenerating a selected region inside an image) and outpainting (extending an image beyond its original borders), which can be used to edit and extend the scope of images.
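In practice, both inpainting and outpainting are usually driven by a mask that tells the model which pixels to generate and which to keep. One common way to outpaint is to place the original image on a larger canvas and mask only the new border. The sketch below prepares such a canvas and mask in plain Python; the function name `prepare_outpainting` and the 1 = generate / 0 = keep convention are assumptions for illustration, and each of the tools named above has its own interface and mask format.

```python
from typing import List, Tuple

Grid = List[List[int]]  # toy single-channel image as a list of pixel rows


def prepare_outpainting(image: Grid, pad: int, fill: int = 0) -> Tuple[Grid, Grid]:
    """Center `image` on a canvas enlarged by `pad` pixels on every side and
    build a mask marking the new border (1 = generate, 0 = keep original)."""
    h, w = len(image), len(image[0])
    new_h, new_w = h + 2 * pad, w + 2 * pad
    # Start with an empty canvas where every pixel is to be generated.
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]
    # Copy the original pixels into the center and protect them in the mask.
    for r in range(h):
        for c in range(w):
            canvas[r + pad][c + pad] = image[r][c]
            mask[r + pad][c + pad] = 0
    return canvas, mask


canvas, mask = prepare_outpainting([[7, 8], [9, 6]], pad=1)
for row in mask:
    print(row)
# [1, 1, 1, 1]
# [1, 0, 0, 1]
# [1, 0, 0, 1]
# [1, 1, 1, 1]
```

Inpainting uses the same idea in reverse: the canvas stays the original size and the mask is set to 1 only over the region to be edited.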
With all of these tools revolutionizing AI art capabilities, controversy was all but inevitable, and there has been plenty of it. In September, an AI-generated image won a fine art competition, igniting heated debate about what counts as art, as well as how ownership, attribution, and copyright will work for this new class of content. Expect this debate to intensify!
Multi-modal AI matures
In addition to AI-generated artwork, 2022 has seen a ton of research and applications at the intersection of multiple modalities. Models and pipelines that deal with multiple types of data, including language, audio, and vision, are becoming increasingly popular. The lines between these disciplines have never been more blurred, and cross-pollination has never been more fruitful.
At the heart of this collision of contexts is contrastive learning, which embeds multiple types of data into the same space, the seminal example being OpenAI’s Contrastive Language-Image Pretraining (CLIP) model.
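The core of CLIP-style contrastive training fits in a few lines. Given a batch of matched (image, text) pairs, both embeddings are normalized, every image is scored against every text by cosine similarity, and a cross-entropy loss pushes each image toward its own caption (and each caption toward its own image) and away from the rest of the batch. The sketch below is a plain-Python illustration of that symmetric loss under stated assumptions: the encoders are omitted and embeddings are given directly, the function names are hypothetical, and the temperature is fixed at 0.07 here, whereas CLIP learns it during training.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(image_emb: List[Vector], text_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: pair i is the positive for image i and
    text i; every other item in the batch acts as a negative."""
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: classify the correct text for each image (rows).
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: classify the correct image for each text (columns).
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                        for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]  # each caption matches its image
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]  # captions paired with the wrong images
print(clip_style_loss(images, texts_aligned) < clip_style_loss(images, texts_swapped))  # True
```

Because the loss only rewards relative similarity within a batch, the two encoders end up sharing one embedding space, which is what makes cross-modal comparisons possible afterwards.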
One consequence of this is the ability to semantically search through sets of images based on input that can be either text or another image. This has spurred a boom in vector search engines, with Qdrant, Pinecone, Weaviate, Milvus, and others leading the way. In a similar vein, the systematic connection between modalities is strengthening visual question answering and zero-shot and few-shot image classification.
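Once images and text share an embedding space, semantic search reduces to a nearest-neighbor lookup: embed the query, then rank stored embeddings by similarity. The sketch below does an exact, brute-force cosine-similarity search in plain Python. The 3-dimensional vectors and file names are made up for illustration and stand in for real CLIP-style embeddings; vector databases such as those named above typically replace this linear scan with approximate nearest-neighbor indexes so that search stays fast over millions of vectors.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query: Vector, index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k indexed items most similar to the query (exact linear scan)."""
    scored = [(name, cosine(query, emb)) for name, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings standing in for image vectors from a shared text-image model.
index = {
    "dog_on_beach.jpg": [0.9, 0.1, 0.0],
    "cat_on_sofa.jpg": [0.1, 0.9, 0.1],
    "puppy_in_park.jpg": [0.8, 0.2, 0.1],
}
# The query could come from a text prompt or from another image; the search is the same.
query = [1.0, 0.1, 0.05]
print([name for name, _ in search(query, index)])  # ['dog_on_beach.jpg', 'puppy_in_park.jpg']
```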
Computer Vision buzz from big tech
As dataset sizes continue to grow, the computational and financial resources required to train large, high-quality models from scratch have risen dramatically. As a result, many of the most broadly applicable advances this year were either led or supported by scientists from big tech research groups. Here are some of the highlights.
Alphabet
Amazon
Microsoft
Meta
Adobe
In 2022, Adobe took the sophisticated machinery of modern computer vision and turned it to artistic manipulation tasks such as editing, restyling, and rearranging.
Third Time’s the Charm? puts Nvidia’s StyleGAN3 to work editing images and videos, introducing a video inversion scheme that reduces texture sticking.
BlobGAN models scenes as collections of mid-level (between pixel-level and image-level) “blobs”, which become associated with objects in the scene without supervision, allowing scenes to be edited at the object level.
ARF: Artistic Radiance Fields accelerates the generation of artistic 3D content by combining style transfer with neural radiance fields (NeRFs).
Nvidia
Nvidia made contributions across the board, including multiple works on performing three-dimensional computer vision tasks with single-view (monocular) images and videos.
CenterPose sets the standard for category-level six-degree-of-freedom (6-DoF) pose estimation using only a single-stage network; GLAMR globally situates humans in 3D space from videos recorded with dynamic (moving) cameras; and by separating the tasks of feature generation and neural rendering, EG3D can produce high-quality 3D geometry from single images.
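A 6-DoF pose is simply a rigid transform: three numbers for rotation and three for translation, which together map points on an object into the camera’s coordinate frame. The plain-Python sketch below builds that transform from yaw, pitch, and roll angles (composed as Rz·Ry·Rx, one of several common conventions, chosen here as an assumption) plus a translation vector, and applies it to a 3D point. The function names are hypothetical and this is not how CenterPose parameterizes or predicts pose; it only illustrates what the six estimated quantities describe.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rotation_zyx(yaw: float, pitch: float, roll: float) -> Matrix:
    """3x3 rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cz, sz = math.cos(yaw), math.sin(yaw)
    cy, sy = math.cos(pitch), math.sin(pitch)
    cx, sx = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(point: Sequence[float], rotation: Matrix,
               translation: Sequence[float]) -> List[float]:
    """Map an object-frame 3D point into the camera frame: p' = R p + t."""
    return [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)]


# Six numbers fully specify the pose: (yaw, pitch, roll) and (tx, ty, tz).
R = rotation_zyx(yaw=math.pi / 2, pitch=0.0, roll=0.0)
t = [0.0, 0.0, 2.0]
print([round(x, 6) for x in apply_pose([1.0, 0.0, 0.0], R, t)])  # [0.0, 1.0, 2.0]
```

Category-level methods have the extra challenge of estimating this transform for object instances they have never seen, which is part of what made a single-stage approach notable.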
Electrifying new applications of Computer Vision
Computer vision now plays a role in everything from sports and entertainment to construction, security, and agriculture, and within each of these industries there are far too many companies employing computer vision to count. This section highlights some of the key developments in several industries where computer vision is becoming deeply embedded.
Sports
Climate and Conservation
Circular economy startup Greyparrot raised an $11M Series A round for its computer vision-driven waste monitoring system. Carbon marketplace NCX, which uses cutting-edge computer vision models with satellite imagery to deliver precise assessments of timber and carbon potential, raised a $50M Series B. And Microsoft announced the Microsoft Climate Research Initiative (MCRI), which will house its computer vision work for climate, including renewable energy mapping, land cover mapping, and glacier mapping.
Autonomous vehicles
2022 was a bit of a mixed bag for the autonomous vehicles industry as a whole, with self-driving car company Argo AI shutting down operations in October, and Ford and Rivian shifting their focus from L4 (highly automated) to L2 (partial) and L3 (conditional) automation. Apple also recently announced that it was scaling back its self-driving efforts, “Project Titan”, and pushing its launch back to 2026.
Health and Medicine
In Australia, engineers devised a promising no-contact, computer vision-based approach to blood pressure detection, which may offer an alternative to traditional inflatable cuffs. Additionally, Google began licensing its computer vision-based breast cancer detection tool to cancer detection and therapy provider iCAD.
Prominent Computer Vision papers you can’t pass up
CV tooling startups grow in size and impact
Conclusion
2022 was extremely lively for machine learning, and especially so for computer vision. The crazy thing is that the pace of research, the growth in the number of practitioners, and adoption in industry all appear to be accelerating. Let’s see what 2023 has in store!
FiftyOne Computer Vision toolset
FiftyOne is an open source machine learning toolset developed by Voxel51 that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.