We just wrapped up the Dec 7, 2023 AI, Machine Learning and Data Science Meetup, and if you missed it or want to revisit it, here’s a recap! In this blog post you’ll find the playback recordings, highlights from the presentations and Q&A, as well as the upcoming Meetup schedule so that you can join us at a future event.
First, Thanks for Voting for Your Favorite Charity!
In lieu of swag, we gave Meetup attendees the opportunity to help guide a $200 donation to charitable causes. The charity that received the highest number of votes this month was the Coalition for Rainforest Nations (CfRN), an organization working on the responsible stewardship of the world’s last great rainforests to achieve environmental and social sustainability. We are sending this event’s charitable donation of $200 to CfRN on behalf of the computer vision community!
Missed the Meetup? No problem. Here are playbacks and talk abstracts from the event.
Building Yourself a Private Search System with Retrieval Augmentation with Haystack
In this talk we will have a look at Haystack, an open source LLM framework, and how we can use it to create custom, private search systems on our own data. We will look at how we can build retrieval augmented generative pipelines for our Notion pages, and how Haystack can help you create custom tooling for larger NLP applications.
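To make the idea of retrieval augmentation concrete, here is a minimal sketch in plain Python. It does not use Haystack or any of its APIs: the names `retrieve`, `build_prompt`, and `NOTES` are hypothetical, the word-count cosine score is a stand-in for a real embedding model, and the final prompt would still need to be sent to an LLM of your choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    # Rank documents by similarity to the query and keep the best matches.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, context_docs):
    # Put the retrieved text in front of the question so the model
    # answers from private data rather than from its training set.
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical private notes standing in for exported Notion pages.
NOTES = [
    "Team offsite is planned for March in Lisbon.",
    "The quarterly report is due on the first Friday of April.",
    "Lunch options near the office include a ramen shop.",
]

question = "When is the quarterly report due?"
hits = retrieve(question, NOTES, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

A framework like Haystack replaces each of these hand-written steps with configurable components, but the flow stays the same: retrieve relevant text, then generate an answer grounded in it.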
Speaker: Tuana Çelik is a Lead Developer Advocate at deepset, where she focuses on the open-source NLP community. With a background in engineering, her main focus is to help the Haystack developer community.
Q&A
- When is giving the context different from giving the answer or correct response?
- How does Haystack handle the deployment of models from Hugging Face or other providers into the NLP pipeline?
- What are some useful Python libraries or scripts we can use for building chatbots that can give meaningful responses to questions asked?
- Is the extra information embedded the same way as the training set?
- For beginners, what / where are the best resources to start learning and potential tutorials / small project learning using Haystack?
- Can this be run on, say, a MacBook, or do you need server / cloud resources?
- Is there an integration between Haystack and Ollama?
- How does Haystack assist in fine-tuning a pre-trained language model?
- How is Haystack better than LlamaIndex and LangChain?
- How does Haystack’s REST API help in deploying the final system so that it can be queried with a user-facing interface?
Resource links
- Slides from the talk can be found here.
- Notebook used in the presentation can be found here.
- Check out the Advent of Haystack
Rise of the Intelligent Data Platform
At Databricks, we see organizations looking to leverage the power of AI, not only to deliver intelligent solutions, but to also have intelligent user interfaces. Join us as we delve into how lakehouse architecture forms the backbone of intelligent data platforms, integrating AI to enhance user interaction and self-management. Discover how this evolution is democratizing data and AI access for all data workers in modern organizations, paving the way for the next generation of data and AI enabled solutions.
Speaker: Franco Patano is a Strategic Data and AI Advisor at Databricks, where he collaborates with customers to understand their business needs, challenges, and goals, and help them leverage the power of Databricks to deliver data products that drive value, innovation, and growth.
Q&A
- How does Databricks facilitate running applications on CPUs or GPUs based on analysis requirements?
- Does the Databricks AI give suggestions for feature engineering?
- In Databricks AutoML, can you define the hyperparameter search space and configure the regularization (including dropout)?
- Do you hook into a 3rd party Vector Search engine or do you have your own vector database? Where does it live?
- How does Databricks support real-time and streaming analytics?
Resource links
- Learn more about the Databricks Data Intelligence Platform here.
Scaling Similarity Search with USearch
Vector and similarity search is increasingly critical in 2023, but most libraries struggle to fully utilize modern hardware due to issues rooted in their code architecture. Many rely on object-oriented programming, which reduces memory-efficiency and data-locality. Additionally, dependence on compilers for low-level optimizations fails to properly emit key AVX-512 and SVE Assembly instructions for x86 and Arm. My talk will dissect these and other pitfalls, and demonstrate how USearch innovates in areas like architecture and SIMD utilization to overcome them.
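For readers new to the topic, the sketch below shows the computation that vector search libraries are built to accelerate: an exact, brute-force nearest-neighbor lookup by cosine distance. It is pure Python and does not use USearch or reflect its internals; `cosine_distance`, `exact_search`, and `VECTORS` are hypothetical names chosen for illustration.

```python
import heapq
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; zero-length vectors are treated as maximally distant.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def exact_search(query, vectors, k=2):
    # Compare the query against every stored vector and keep the k closest.
    # Cost grows linearly with collection size, which is why large-scale
    # systems rely on approximate indexes and SIMD-optimized distance kernels.
    scored = ((cosine_distance(query, vec), key) for key, vec in vectors.items())
    return heapq.nsmallest(k, scored)


# Tiny hypothetical collection of 3-dimensional embeddings.
VECTORS = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}

results = exact_search([1.0, 0.05, 0.0], VECTORS, k=2)
nearest_keys = [key for _, key in results]
print(nearest_keys)
```

The inner loop of `cosine_distance` is the kind of code where hand-tuned SIMD instructions pay off, since it runs once per stored vector for every query.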
Speaker: Ash Vardanian is the Founder of Unum Cloud. With a background across astrophysics, high performance computing, and systems design, Ash focuses on bridging theory and real-world AI applications.
Q&A
- Can we use SIMD-optimized tricks to speed up the transformer-based model inference? Basically we are doing matrix-vector multiplication instead of vector-vector multiplication.
- Would a good implementation be UForm for embedding + USearch for search -> RAG to an LLM?
- At a small scale, is USearch inefficient?
- What do you think about perhaps using the Apple Neural Engine in the future?
- How can USearch integrate with the Milvus vector search engine?
Resource links
- Project: USearch – up to 100x faster than FAISS vector search for 100M+ points
- Project: SimSIMD – up to 200x faster than NumPy and SciPy distance functions
- Project: StringZilla – up to 10x faster String class for Python
- How going from Python to Assembly accelerates cosine distances by 2500x
- Extensive distance functions benchmarks against SciPy
- Benchmarking USearch with Intel
- Using USearch for molecule search with AWS
Join the AI, Machine Learning and Data Science Meetup!
The AI, Machine Learning and Data Science Meetup membership has grown to over 11,000 members! The goal of the Meetups is to bring together communities of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies.
Join one of the 12 Meetup locations closest to your timezone.
- Athens
- Austin
- Bangalore
- Boston
- Chicago
- London
- New York
- Peninsula
- San Francisco
- Seattle
- Silicon Valley
- Toronto
What’s Next?
Up next on Jan 25 at 10 AM Pacific we have a great lineup of speakers including:
- SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset – Kimberly Wilber at Google Research
- More speakers to be announced shortly!
Register for the Zoom here. You can find a complete schedule of upcoming Meetups on the Voxel51 Events page.
Get Involved!
There are a lot of ways to get involved in the Computer Vision Meetups. Reach out if you identify with any of these:
- You’d like to speak at an upcoming Meetup
- You have a physical meeting space in one of the Meetup locations and would like to make it available for a Meetup
- You’d like to co-organize a Meetup
- You’d like to co-sponsor a Meetup
Reach out to Meetup co-organizer Jimmy Guerrero on Meetup.com or ping me over LinkedIn to discuss how to get you plugged in.
—
The Computer Vision Meetup network is sponsored by Voxel51, the company behind the open source FiftyOne computer vision toolset. FiftyOne enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. It’s easy to get started, in just a few minutes.