Recapping the Computer Vision Meetup — April 13, 2023

Yesterday Voxel51 hosted the April 13, 2023 Computer Vision Meetup. In this blog post you’ll find the playback recordings, highlights from the presentations and Q&A, as well as the upcoming Meetup schedule so that you can join us at a future event.

First, Thanks for Voting for Your Favorite Charity!

In lieu of swag, we gave Meetup attendees the opportunity to help guide our monthly donation to charitable causes. The charity that received the highest number of votes this month was Wildlife AI. We were first introduced to Wildlife AI through the FiftyOne community! They are using FiftyOne to enable their users to easily analyze the camera data and create their own models. We are sending this month’s charitable donation of $200 to Wildlife AI on behalf of the computer vision community.

Missed the Meetup? No problem. Here are the playbacks and talk abstracts from the event.

Using Computer Vision to Understand Biological Vision

In the past decade, deep neural networks (DNNs) have become a leading choice among neuroscientists to model the visual brain. While DNNs are often celebrated for their biological inspiration, they are also criticized for their lack of interpretability. In this talk we will discuss how DNNs help us understand biological intelligence, their promises and pitfalls as models of the brain, and what may be in store for the future.

Benjamin Lahner is a PhD student at MIT studying computational neuroscience.

Emergence of Maps in the Memories of Blind Navigation Agents

Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines, specifically artificial intelligence (AI) navigation agents, also build implicit (or ‘mental’) maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. Learn more on arXiv.

Dhruv Batra is an Associate Professor in the School of Interactive Computing at Georgia Tech, where he leads the Machine Learning & Perception Lab, and a Research Director on the Fundamental AI Research (FAIR) team at Meta, where he leads the embodied AI and robotics efforts.

Q&A from the talk included:

  • How does research perform in dynamic environments?
  • What were the common distance metrics that proved to be most useful?
  • Have you considered where language may interact with the memory and mapping as part of these experiments?
  • What is next for blind agents?
  • Where may blind agents be guaranteed to fail?
  • Any advice for people who work with non-traditional robots like those that require micro-nano sized sensors?

You can jump straight to the Q&A here.

Generating Diverse and Natural 3D Human Motions from Texts

Automated generation of 3D human motions from text is an interesting yet challenging problem with a broad range of applications, such as VR/AR and 3D content creation. Specifically, the generated motions are expected to be diverse enough to explore the text-grounded motion space and, more importantly, to accurately depict the content of the prescribed text descriptions. Here we tackle this problem with a two-stage approach: text2length sampling and text2motion generation. Text2length involves sampling from the learned distribution function of motion lengths conditioned on the input text. This is followed by our text2motion module, which uses a temporal variational autoencoder to synthesize a diverse set of human motions of the sampled lengths. Moreover, a large-scale dataset of scripted 3D human motions, HumanML3D, is constructed, consisting of 14,616 motion clips and 44,970 text descriptions. You can get the data, code, and paper, and watch the demo video, on the Text-to-Motion website.
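To make the two-stage idea in the abstract concrete, here is a toy Python sketch of the control flow only: first sample a motion length conditioned on the text, then generate a motion of that length with a fresh random latent at each frame. This is not the authors' implementation. The function names, the word-count heuristic, the 4-frame length unit, the 6-dimensional pose, and the random-walk "decoder" are all hypothetical stand-ins for the learned networks described in the talk.

```python
import random


def sample_length(text, rng, unit=4, max_units=50):
    """Stage 1 (text2length): sample a motion length conditioned on the text.

    Toy assumption: longer descriptions shift probability mass toward longer
    motions. A real system would use a learned conditional distribution.
    """
    n_words = len(text.split())
    center = min(max_units, max(1, 2 * n_words))
    candidates = range(1, max_units + 1)
    weights = [1.0 / (1 + abs(k - center)) for k in candidates]
    units = rng.choices(candidates, weights=weights, k=1)[0]
    return units * unit  # length in frames


def generate_motion(text, length, rng, pose_dim=6):
    """Stage 2 (text2motion): produce `length` frames, one latent per frame.

    Toy assumption: the "decoder" is a random walk whose drift is a
    placeholder for text conditioning. A temporal VAE would instead decode
    each latent with a network that sees the text and the previous poses.
    """
    seed_value = sum(map(ord, text))
    drift = [((seed_value + i) % 5 - 2) * 0.01 for i in range(pose_dim)]
    pose = [0.0] * pose_dim
    frames = []
    for _ in range(length):
        # A fresh Gaussian latent per frame is what makes samples differ.
        z = [rng.gauss(0.0, 1.0) for _ in range(pose_dim)]
        pose = [p + d + 0.1 * zi for p, d, zi in zip(pose, drift, z)]
        frames.append(pose)
    return frames


def text_to_motions(text, num_samples=3, seed=0):
    """Run both stages several times to get motions of varying lengths."""
    rng = random.Random(seed)
    motions = []
    for _ in range(num_samples):
        length = sample_length(text, rng)
        motions.append(generate_motion(text, length, rng))
    return motions


if __name__ == "__main__":
    for motion in text_to_motions("a person walks forward and waves"):
        print(len(motion), "frames")
```

Because the length is sampled before the motion is generated, the same description can yield clips of different durations, which is one source of the diversity the abstract emphasizes.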

Chuan Guo is a fourth-year ECE PhD student at the University of Alberta.

Q&A from the talk included:

  • What was the loss error that was used?
  • Does it make any difference when using a transformer-based language model, such as BERT?

You can jump straight to the Q&A here.

Computer Vision Meetup Locations

Computer Vision Meetups Worldwide, sponsored by Voxel51

Computer Vision Meetup membership has grown to more than 3,700 members in just under 9 months! The goal of the Meetups is to bring together communities of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of computer vision and complementary technologies. 

Join whichever of the 13 Meetup locations is closest to your timezone.

What’s Next?

We have exciting speakers already signed up over the next few months! Become a member of the Computer Vision Meetup closest to you, then register for the Zoom. 

Up next, on April 27 at 10 AM IST (04:30 UTC), is the APAC-friendly Computer Vision Meetup, with talks including:

  • Leveraging Attention for Improved Accuracy and Robustness – Hila Chefer (PhD student and lecturer at Tel-Aviv University)
  • Breaking the Bottleneck of AI Deployment at the Edge with OpenVINO – Zhuo Wu (AI software evangelist at Intel focusing on the OpenVINO toolkit)

You can find a complete schedule of upcoming Meetups on the Voxel51 Events page.

Get Involved!

There are a lot of ways to get involved in the Computer Vision Meetups. Reach out if you identify with any of these:

  • You’d like to speak at an upcoming Meetup
  • You have a physical meeting space in one of the Meetup locations and would like to make it available for a Meetup
  • You’d like to co-organize a Meetup
  • You’d like to co-sponsor a Meetup

Reach out to Meetup co-organizer Jimmy Guerrero, or ping him on LinkedIn, to discuss how to get you plugged in.

The Computer Vision Meetup network is sponsored by Voxel51, the company behind the open source FiftyOne computer vision toolset. FiftyOne enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. It’s easy to get started, in just a few minutes.