Berlin AI, ML & Computer Vision Meetup – Nov 22, 2024

Berlin AI, ML & Computer Vision Meetup – Nov 22, 2024

This event is now over.

Register for the next one.

Go to upcoming events
Skip to content

Berlin AI, Machine Learning and Computer Vision Meetup

Nov 22, 2024 | 5:30 to 8:30 PM

Register for the event at MotionLab.Berlin

By submitting you (1) agree to Voxel51’s Terms of Service and Privacy Statement and (2) agree to receive occasional emails.

Date, Time and Location

Date and Time

Nov 22, 2024 from 5:30 PM to 8:30 PM

Location

The Meetup will take place at MotionLab.Berlin, Bouchéstraße 12/Halle 20 in Berlin

Vector Streaming: The Memory Efficient Indexing for Vector Databases

Sonam Pankaj
GenerativeAI Evangelist at Articul8-ai

Vector databases are everywhere, powering LLMs. Indexing vectors, especially multivector embeddings like ColPali and Colbert, at a bulk is memory intensive. Vector streaming solves this problem by parallelizing the tasks of parsing, chunking, and embedding generation and indexing it continuously chunk by chunk instead of bulk. This not only increase the speed but also makes the whole task more optimized and memory efficient. Supports, Weaviate, Elastic and Pinecone.

About the Speaker

Sonam Pankaj is a Generative AI Evangelist at Articul8-ai and the co-creator and maintainer of the open-source library called Embed-Anything, which helps to create local dense, splade, and multimodal embeddings and index them to vector databases; it’s built-in Rust for speed and efficiency . She worked previously at Qdrant Engine and Rasa. Previously, she also worked as an AI researcher at Saama and has worked extensively on clinical trial analytics. She is passionate about topics like metric learning and biases in language models. She has also published a paper in the most reputed journal of computational linguistics, COLING, in ACL Anthology

How to Unlock More Value from Self-Driving Datasets

Dan Gural
Machine Learning Engineer at Voxel51

AV/ADAS is one of the most advanced fields in Visual AI. However, getting your hands on a high quality dataset can be tough, let alone working with them to get a model to production. In this talk, I will show you the leading methods and tools to help visualize as well take these datasets to the next level. I will demonstrate how to clean and curate AV datasets as well as perform state of the art augmentations using diffusion models to create synthetic data that can empower the self driving car models of the future,

About the Speaker

Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.

When the Medium is the Message: Addressing Input Biases in Multimodal/Multilingual Models

Scott Martens
Jina AI

An embedding model is trained to produce outputs that ensure that semantic similarity is preserved as distance in embedding spaces — like is near like and far from unlike. But models trained with diverse kinds of inputs, i.e. different media and different languages, learn to treat those properties as semantic properties. Two pictures are more “semantically alike” than a picture and a descriptive text that matches it. Similar problems arise with multilingual models: Two English sentences are more alike than an English sentence and a Chinese translation. This undermines the general utility of embedding models. This presentation shows evidence of where this comes from and offers approaches to mitigate the problem.

About the Speaker

Scott Martens is a long-term veteran of AI and NLP research, having started working at AI start-ups in 1994, and a KU Leuven graduate with a doctorate in linguistics. His background includes machine translation development and the intersection between linguistics, philology, and modern AI. Dr. Martens is a Senior Content Manager and Evangelist at Jina AI in Berlin.

How Reliable Is an Autonomous Car’s Understanding of Its Surroundings?

Hanieh Shojaei
Leibniz Universität Hannover

Thanks to deep learning, autonomous cars equipped with cameras and LiDAR can accurately recognize common objects such as cars, streets, and pedestrians, significantly enhancing their understanding of the environment. However, these models often display overconfidence, which can result in misidentifications. For example, consider an exaggerated scenario where an elephant on the street might be mistakenly identified as a trunk because the model has not been trained to recognize elephants. This issue stems from the models’ design to make decisions rather than acknowledge uncertainty by saying ‘I don’t know.’ In this talk, we will discuss how models can recognize their limitations and avoid making uncertain decisions, particularly through the lens of an autonomous car.

About the Speaker

Hanieh Shojaei is a PhD researcher at the Institute of Cartography and Geoinformatics (IKG) at Leibniz University Hannover, specializing in uncertainty estimation and reliability of AI models. Her research focuses on using deep learning for LiDAR scene segmentation to enhance environmental perception and assess prediction reliability for autonomous vehicles.