Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset, the largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows, and we will demonstrate how to visualize, filter, evaluate, and augment data at scale. The session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models; a minimal zero-shot sketch follows below. Attendees will gain insight into applying multi-modal foundation models to critical challenges in agriculture, such as ecosystem monitoring on farms.
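
To make the zero-shot workflow concrete, here is a minimal sketch of CLIP-style species classification of the kind explored in the session. It assumes the `open_clip` library; the checkpoint tag, taxon labels, and image path are illustrative assumptions, not the exact models or classes used in the walkthroughs.

```python
# Minimal zero-shot species classification sketch using open_clip.
# Model/pretrained tags, labels, and the image path are illustrative only.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Candidate labels at the species level; taxonomy-aware evaluation can repeat
# this step at genus or family level simply by swapping the label set.
taxa = [
    "a photo of Apis mellifera, a honey bee",
    "a photo of Danaus plexippus, a monarch butterfly",
    "a photo of Leptinotarsa decemlineata, a Colorado potato beetle",
]
text = tokenizer(taxa)

# Hypothetical field observation image.
image = preprocess(Image.open("field_observation.jpg")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine similarities converted to per-label probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(taxa, probs[0].tolist()):
    print(f"{p:.3f}  {label}")
```

The same image and text embeddings can be cached and projected (e.g. with UMAP or t-SNE) for the embedding-visualization and dataset-slicing exercises, and misclassified slices surfaced this way feed directly into the failure-mode analysis and fine-tuning steps described above.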