We just wrapped up the October ’24 AI, Machine Learning and Computer Vision Meetup, and if you missed it or want to revisit it, here’s a recap! In this blog post you’ll find the playback recordings, highlights from the presentations and Q&A, as well as the upcoming Meetup schedule so that you can join us at a future event.
Accelerating Machine Learning Research and Development for Autonomy
At Oxa (Autonomous Vehicle Software), we designed an automated workflow for building machine vision models at scale, from data collection to in-vehicle deployment. The workflow involves a number of steps, such as intelligent route planning to maximise visual diversity; sampling of the sensor data with respect to visual and semantic uniqueness; language-driven automated annotation tools and a multi-modal search engine; and sensor data expansion using generative methods.
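The data-sampling step lends itself to a concrete illustration. Below is a minimal sketch, not Oxa’s actual pipeline, of selecting visually unique frames by greedy farthest-point sampling over image embeddings; the embedding source (any off-the-shelf image encoder) and the random placeholder data are assumptions for the example.

```python
# Hypothetical illustration of diversity sampling over image embeddings,
# not Oxa's actual tooling.
import numpy as np

def sample_unique_frames(embeddings: np.ndarray, k: int) -> list:
    """Greedily pick k frame indices that are maximally spread out in embedding space."""
    # Normalize so Euclidean distance tracks cosine dissimilarity
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    selected = [0]  # seed with the first frame
    min_dist = np.linalg.norm(emb - emb[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(min_dist))  # farthest from everything chosen so far
        selected.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(emb - emb[idx], axis=1))
    return selected

# Usage: embeddings would come from an image encoder; random data stands in here
frames = np.random.rand(10_000, 512).astype(np.float32)
keep = sample_unique_frames(frames, k=100)
```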
Speaker: Guillaume Rochette is a Staff Engineer at Oxa working on Oxa MetaDriver, a suite of tools that combines generative AI, digital twins, and simulation to accelerate machine learning and testing of self-driving technology before and during real-world use. Prior to that, he completed a PhD in Machine Vision at the University of Surrey on “Pose Estimation and Novel View Synthesis of Humans”. He is currently working on Machine Vision and 3D Geometric Understanding for autonomous driving.
Q&A
- What sort of hardware is running in the car?
- Can you explain how the embeddings are done for these images?
- What tools are you using for labeling?
- Do you have any stats about the overall dataset you have collected that you can share? For example, the percentage collected in cities, with traffic lights, etc.?
- Do you consider human perception metrics for identifying objects?
- How does the model behave when you take a model trained on urban data into more hybrid terrain, like farm fields?
- What kind of homography techniques did you use for locating and detecting objects, especially in occlusion cases?
- While collecting the data and training, are the models prone to data drift?
- What challenges arise when expanding sensors or adding new data streams for OOD scenarios?
Resource Links
Pixels Are All You Need: Utilizing 2D Image Representations in Applied Robotics
Many vision-based robot control applications (like those in manufacturing) require 3D estimates of task-relevant objects, which can be realized by training a direct 3D object detection model. However, obtaining 3D annotation for a specific application is expensive relative to 2D object representations like segmentation masks or bounding boxes.
In this talk, Brent will describe how we achieve mobile robot manipulation using inexpensive pixel-based object representations combined with known 3D environmental constraints and robot kinematics. He will also discuss how recent Visual AI developments show promise to further reduce the cost of 2D training data, thereby increasing the practicality of pixel-based object representations in robotics.
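As a rough illustration of the idea, here is a minimal sketch, using assumed camera intrinsics, pose, and a known table plane rather than the actual system, of lifting a 2D pixel detection to a 3D point by intersecting the pixel’s camera ray with the constraint surface:

```python
# Hypothetical example: recover a 3D target from a 2D detection by intersecting
# the camera ray with a known plane. All numbers below are made-up placeholders.
import numpy as np

def pixel_to_3d_on_plane(u, v, K, R, t, plane_n, plane_d):
    """Back-project pixel (u, v) onto the world-frame plane n.x + d = 0."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction, camera frame
    ray_world = R.T @ ray_cam                           # rotate into world frame
    cam_center = -R.T @ t                               # camera center in world frame
    # Solve n.(c + s*ray) + d = 0 for the ray scale s
    s = -(plane_n @ cam_center + plane_d) / (plane_n @ ray_world)
    return cam_center + s * ray_world

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])                     # example pinhole intrinsics
R, t = np.eye(3), np.zeros(3)                       # camera at the world origin, looking along +z
plane_n, plane_d = np.array([0.0, 0.0, 1.0]), -2.0  # table plane 2 m in front of the camera

target = pixel_to_3d_on_plane(400, 300, K, R, t, plane_n, plane_d)
print(target)  # 3D point on the table at the detected pixel
```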
Speaker: Brent Griffin, PhD is a Principal Machine Learning Scientist at Voxel51. Previously, he was the Perception Lead at Agility Robotics and an assistant research scientist at the University of Michigan conducting research at the intersection of computer vision, control, and robot learning. He is lead author on publications in all of the top IEEE conferences for computer vision, robotics, and control, and his work has been featured in Popular Science, in IEEE Spectrum, and on the Big Ten Network.
Q&A
- Does the model have any weight information (through servo or otherwise)?
- How does the robot manage to keep that bag inside the cup without falling?
- With the camera in the gripper, how well would such a configuration work to grasp objects that are moving?
- In some cases, depth estimation can be achieved using relative motion between objects or the camera. However, when dealing with static objects, is there any mathematical technique that efficiently calculates depth without relying on motion? Could you provide insight into methods that can estimate depth for stationary objects and their relative efficiency?
- The embedding images you’ve presented—are they generated using UMAP? If so, are there any other efficient and consistent techniques for dimensionality reduction or visualization of embeddings that could be considered, particularly in terms of scalability and preserving local/global structure?
- When using an LSTM, do you experience any issues or reduced accuracy/performance as the timesteps progress or get longer?
Resource Links
PostgreSQL for Innovative Vector Search
There is a plethora of datastores that can work with vector embeddings, and you are probably already running one that allows for innovative uses of data alongside your embeddings: PostgreSQL! This talk shows examples of how features already present in the PostgreSQL ecosystem let you leverage it for cutting-edge use cases, with live demos and lively discussion throughout. You will go home with the foundation to do more impressive vector similarity searches.
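To make the idea concrete, here is a minimal sketch of the kind of similarity query covered in the talk, using the pgvector extension from Python via psycopg2. The table, column names, and connection string are illustrative assumptions, not the schema from the live demo.

```python
# Hypothetical pgvector example: store image embeddings in PostgreSQL and
# run an approximate nearest-neighbor search over them.
import numpy as np
import psycopg2

conn = psycopg2.connect("dbname=demo user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS images (
        id bigserial PRIMARY KEY,
        filepath text,
        embedding vector(512)
    );
""")
# Approximate index so similarity search stays fast at millions of rows
cur.execute("""
    CREATE INDEX IF NOT EXISTS images_embedding_idx
    ON images USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
""")
conn.commit()

# Find the 5 images most similar to a query embedding (cosine distance, <=>)
query_vec = "[" + ",".join(str(x) for x in np.random.rand(512)) + "]"
cur.execute(
    """
    SELECT filepath, embedding <=> %s::vector AS distance
    FROM images
    ORDER BY distance
    LIMIT 5;
    """,
    (query_vec,),
)
for filepath, distance in cur.fetchall():
    print(filepath, distance)
```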
Speaker: Steve Pousty is a dad, partner, son, a founder, and a principal developer advocate at Voxel51. He can teach you about Computer Vision, Data Analysis, Java, Python, PostgreSQL, Microservices, and Kubernetes. He has deep expertise in GIS/Spatial, Remote Sensing, Statistics, and Ecology. Steve has a Ph.D. in Ecology and can be bribed with offers of bird watching or fly fishing.
Q&A
- Why are you multiplying by -1 to search for the farthest?
- Are test images loaded into the DB?
- Can you shed more light on wrappers, especially web wrappers, in this data context?
- Would the triggers create a large impact on the database performance?
- Is there any indexing algorithm involved or any cache to reduce the latency when we have millions of images (vectors) to match?
Resource Links
Join the AI, Machine Learning and Computer Vision Meetup!
The goal of the Meetups is to bring together communities of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies.
Join one of the 12 Meetup locations closest to your timezone.
- Athens
- Austin
- Bangalore
- Boston
- Chicago
- London
- New York
- Peninsula
- San Francisco
- Seattle
- Silicon Valley
- Toronto
What’s Next?
Up next on Nov 14, 2024 at 10:00 AM PT / 1:00 PM ET, we have three great speakers lined up!
- Human-in-the-loop: Practical Lessons for Building Comprehensive AI Systems – Adrian Loy, Merantix Momentum
- Curating Excellence: Strategies for Optimizing Visual AI Datasets – Harpreet Sahota, Voxel51
- Deploying ML models on Edge Devices using Qualcomm AI Hub – Bhushan Sonawane, Qualcomm
Register for the Zoom here. You can find a complete schedule of upcoming Meetups on the Voxel51 Events page.
Get Involved!
There are a lot of ways to get involved in the Computer Vision Meetups. Reach out if you identify with any of these:
- You’d like to speak at an upcoming Meetup
- You have a physical meeting space in one of the Meetup locations and would like to make it available for a Meetup
- You’d like to co-organize a Meetup
- You’d like to co-sponsor a Meetup
Reach out to Meetup co-organizer Jimmy Guerrero on Meetup.com or ping me on LinkedIn to discuss how to get you plugged in.
—
These Meetups are sponsored by Voxel51, the company behind the open source FiftyOne computer vision toolset. FiftyOne enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. It’s easy to get started in just a few minutes.
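For a taste of how quick it is to try, here is a minimal sketch using FiftyOne’s built-in quickstart dataset (assuming FiftyOne is already installed, e.g. via pip):

```python
# Load a small sample dataset from the FiftyOne zoo and explore it in the App
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
session = fo.launch_app(dataset)
session.wait()  # keep the App open when running as a script
```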