Imagine telling your computer vision system exactly what you want, in plain language, and having it just… do it. No scripting. No wrestling with file formats. No chasing down mislabeled images.
With FiftyOne Skills and the FiftyOne MCP Server, it’s now possible. Load your dataset, run a model like YOLOv8, and visualize results, all through a natural language interface. Your AI assistant doesn’t just suggest commands; it executes them, handling the messy details so you can focus on the parts that matter: improving data quality and building better models. Claude becomes a data curator that actually executes, rather than one that merely generates code.
What is a natural language interface?
A natural language interface (NLI) lets humans interact with software systems using everyday language instead of rigid commands, scripts, or UI clicks. Instead of learning a domain-specific API or remembering exact function signatures, you describe what you want to accomplish, and the system translates that intent into concrete actions.
In practice, a natural language interface sits on top of existing tools and workflows. It doesn’t replace them; it orchestrates them. The interface parses intent, asks clarifying questions when needed, and executes real operations against real systems. The key difference from chatbots or code generators is execution: a true NLI doesn’t just suggest what to do, it actually does it.
When connected to the right backend, a natural language interface becomes a control plane for complex systems — lowering the barrier to entry without dumbing down what’s possible.
Why natural language interfaces matter for computer vision
Computer vision workflows are powerful, but they’re also fragmented. Loading datasets, converting formats, running inference, inspecting failures, fixing labels — each step lives in a different tool, file, or script. Even experienced engineers spend more time wrangling data than improving models.
Natural language interfaces compress this complexity. They let you express intent instead of implementation: “Find duplicate images,” “Run detection and show false positives,” “Visualize embeddings for this class.”
For CV teams, this matters because:
- Iteration gets faster: You move from idea → execution in one step.
- Expertise is shared: Hard-won workflows become reusable instead of tribal knowledge.
- More people can contribute: Researchers, data scientists, and PMs can explore datasets without writing glue code.
- Focus shifts to data quality: The real bottleneck in model performance.
In short, NLIs turn computer vision from a pipeline you manage into a system you collaborate with.
What are skills and MCP?
Agent skills and MCP solve different parts of the same problem. MCP is about connection: it lets an agent talk to real systems, run real operations, and get real results instead of just generating text. Skills are about guidance: they teach the agent how to use those capabilities correctly for a specific task. MCP exposes what the system can do, while skills explain how and when to do it. On their own, tools are powerful but can be ambiguous. Skills turn those tools into repeatable workflows. They encode experience, decisions, and guardrails. Together, they let agents move from “I can call functions” to “I know how to complete this task end to end.”
Two parts, one natural language interface
FiftyOne MCP Server connects agents to FiftyOne’s 80+ operators, dataset management, model inference, brain computations, and the App. It’s the bridge between natural language and FiftyOne tools.
FiftyOne Skills are expert workflows built on top. Each skill teaches the agent how to complete a specific task: import data, find duplicates, visualize embeddings. Skills handle the nuances so you don’t have to.
Setup: Enable a natural language interface for FiftyOne
Step 1: Install the MCP server
We recommend using a Python 3.11 virtual environment. Before installing, make sure your environment is active and that `pip`, `setuptools`, and `wheel` are up to date to avoid dependency issues.
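A minimal setup along these lines, using standard Python tooling (the environment name is just an example):

```bash
# Create and activate a Python 3.11 virtual environment
python3.11 -m venv fo-mcp-env
source fo-mcp-env/bin/activate  # on Windows: fo-mcp-env\Scripts\activate

# Update packaging tools to avoid dependency resolution issues
python -m pip install --upgrade pip setuptools wheel
```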
Now install the MCP Server:
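The sketch below assumes the server is distributed on PyPI as `fiftyone-mcp`; confirm the exact package name in the FiftyOne MCP Server docs before installing:

```bash
# Install FiftyOne and the MCP server
# (the server's package name here is an assumption -- check the docs)
pip install fiftyone fiftyone-mcp
```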
Step 2: Connect to your AI tool - Claude Code
This step happens inside Claude Code, not your local terminal. To learn how to install Claude Code on your computer, check the setup guide.
- Open Claude Code
- Type `/plugins` to open the plugin manager
- Run:
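The exact identifiers ship with the FiftyOne MCP Server, so treat the names below as placeholders rather than verified values; the commands take roughly this shape:

```
# Placeholders -- substitute the marketplace and plugin names from the docs
/plugin marketplace add voxel51/fiftyone-mcp
/plugin install fiftyone-mcp
```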
Claude is now connected to FiftyOne.
Step 3: Install your first skill
Add the FiftyOne Skills marketplace and your first skill in the same Claude Code environment:
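The skill name `fiftyone-dataset-inference` comes from the prompt used later in this post; the marketplace identifier below is a placeholder, so substitute the one from the FiftyOne Skills docs:

```
# Marketplace name is a placeholder; the skill name matches the tutorial
/plugin marketplace add voxel51/fiftyone-skills
/plugin install fiftyone-dataset-inference
```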
Try it: Dataset inference through natural language
Our first skill handles the most common workflow: loading data and running models.
Download the sample dataset (49 COCO images):
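If you don’t have the linked download handy, one way to assemble an equivalent folder is via the FiftyOne dataset zoo (this substitutes for the original download; the target path matches the tutorial prompt):

```python
import os

import fiftyone as fo
import fiftyone.zoo as foz

# Pull 49 images from the COCO-2017 validation split
samples = foz.load_zoo_dataset("coco-2017", split="validation", max_samples=49)

# Export just the images to the folder the tutorial prompt expects
samples.export(
    export_dir=os.path.expanduser("~/fo-tutorial/skills_49"),
    dataset_type=fo.types.ImageDirectory,
)
```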
Using your own data? Just point to your dataset folder and format; the skill handles the rest.
Ask Claude:
Using the fiftyone-dataset-inference skill, import the image dataset at ~/fo-tutorial/skills_49 as tutorial_skills, run YOLOv8, and open the FiftyOne App.
What happens
- Claude explores your directory structure
- Confirms settings before creating anything
- Creates a persistent FiftyOne dataset
- Imports samples
- Runs YOLOv8 (or any model) and stores predictions
- Opens the FiftyOne App at `localhost:5151`
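For orientation, here is a rough Python equivalent of what the skill drives for you. The zoo model name is one of the YOLOv8 variants in the FiftyOne model zoo (it requires the `ultralytics` package), so treat the specifics as an illustration rather than the skill’s exact internals:

```python
import os

import fiftyone as fo
import fiftyone.zoo as foz

# Create a dataset from a directory of images and make it persistent
dataset = fo.Dataset.from_dir(
    dataset_dir=os.path.expanduser("~/fo-tutorial/skills_49"),
    dataset_type=fo.types.ImageDirectory,
    name="tutorial_skills",
)
dataset.persistent = True  # keep the dataset across sessions

# Run a YOLOv8 detector from the model zoo and store its predictions
model = foz.load_zoo_model("yolov8m-coco-torch")
dataset.apply_model(model, label_field="predictions")

# Launch the FiftyOne App (serves at localhost:5151 by default)
session = fo.launch_app(dataset)
session.wait()
```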
If you’ve read this far, the best way to understand how this works is to try it.
These workflows aren’t meant to stay on paper. They’re designed to be explored, questioned, and pushed by real users working with real datasets. Whether you’re importing mixed media, checking for duplicates, or visually exploring embeddings, the agent becomes most useful when you start interacting with it and seeing how it responds to your data.
We’re actively experimenting with these ideas and learning from how people use them in practice. If you want to test the MCP server, try out agent-driven workflows, or share feedback, join the FiftyOne community Discord and jump into the project-mcp-server channel.
That’s where most of the discussion, questions, and iteration are happening. Every experiment and piece of feedback helps shape what works next.