What We Learned Building Agent Skills for Real-World Agentic Workflows
Jan 23, 2026
7 min read
When Anthropic introduced the idea of agent skills, they described them not as prompts or tools, but as structured knowledge that teaches AI agents how to perform real tasks safely and reliably. That idea became the foundation of how we built FiftyOne Skills and how we now think of agentic workflows in production.
This blog is a reflection on what we learned while building agent skills in practice. It covers what skills actually look like once you start using them, why Anthropic’s model helped shape our thinking, and why skills became a natural part of our agentic workflow. Most importantly, it shares the lessons that emerged as we built, used, and iterated on skills for both internal teams and external users.
It’s not theory. It’s what we learned by building FiftyOne Skills in real workflows.

What is an agent skill in practice?

An agent skill is simply a way to teach an agent how to do something well. Not just what button to press, but how to approach a task from start to finish. It captures the steps, the decisions, the edge cases, and what “done correctly” actually means.
When Anthropic introduced the idea of agent skills, what stood out was that skills weren’t about adding more tools or writing longer prompts; they were about packaging experience in a way an AI agent could reuse. This post focuses on what we learned applying that model in real agentic workflows at FiftyOne, rather than documenting Claude-specific implementations.
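To make that concrete, here’s a minimal sketch of what a skill can look like on disk, loosely following the SKILL.md convention Anthropic describes. The skill name, steps, and checks below are invented for illustration; this isn’t one of our shipped skills:

```markdown
---
name: export-detections
description: Safely export a dataset’s detection labels and verify the result
---

# Export detections

1. Ask which label field to export; never guess if several exist.
2. Check that the output directory is empty before writing anything.
3. Run the export, then verify that what was written matches what the
   user expects.

Done means: the export completed, nothing was overwritten, and the user
has seen a short summary of what was written where.
```

Notice that most of this isn’t tool access at all. It’s the decisions, the edge cases, and the definition of “done correctly.”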
Looking ahead, I believe skills will matter more than MCP alone for most real agentic workflows. MCP gives agents access to powerful tools, but skills are what turn those tools into reliable workflows.
Sometimes skills guide how MCP tools should be used, sometimes they combine MCP with existing CLIs, and sometimes they don’t need MCP at all; they simply define style, format, or consistency. That’s why we shipped FiftyOne Skills for AI Engineering. Skills aren’t hype. They’re a practical way to take what we already know how to do and make it reusable, shareable, and easier for both humans and agents to follow.
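Here’s a hypothetical excerpt showing how a skill can arbitrate between MCP and a CLI. The wording is invented, though FiftyOne does ship a `fiftyone` CLI:

```markdown
## Choosing how to run the task

- If an MCP tool for dataset queries is available, prefer it: it returns
  structured results the agent can verify.
- Otherwise, fall back to the `fiftyone` CLI and parse its output.
- Either way, tell the user which path was taken so the result is auditable.
```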

How we applied skills in real-world agentic workflows

When we started building FiftyOne Skills, we followed the same idea Anthropic’s skills model proposes, but we applied it very practically. We treated each skill as a small piece of experience, not as a generic instruction. Every skill is designed to solve one concrete problem, follow a proven agentic workflow, include basic safety checks, and explain why certain decisions are made. The goal was simple: instead of asking the AI agent to figure things out every time, we wanted to show it how the task is usually done correctly. That shift made a difference immediately. Errors went down, behavior became more consistent, and the AI agent stopped improvising in places where it shouldn’t.
At Voxel51, we think about agent skills in two broad groups: skills for developers and skills for active users. Both live in the same place and follow the same philosophy, but they solve different problems. You can see all of them in the public repository: https://github.com/voxel51/fiftyone-skills
In this blog, I’m focusing on developer-focused skills, because that’s where we learned the most while building and using them day to day. These skills help developers move faster while still following best practices:
  • Develop Plugin — guides developers through creating custom FiftyOne plugins, including operators and panels, using a clear and repeatable structure.
  • Code Style — helps write Python code that follows FiftyOne’s conventions, acting like a consistent reviewer.
  • PR Triage — helps triage GitHub issues and pull requests by validating their status, categorizing them, and generating clear responses.
All of these skills are still works in progress. They improve every time someone uses them, gives feedback, or hits an edge case. That’s exactly how we want them to evolve. Below are the main lessons we learned, grounded in concrete examples from the AI agent skills we shipped at Voxel51.

Lesson 1: Start from the user, then reverse engineer the skill

The first and most important lesson was to start from the user’s perspective, not from the tool.
Instead of asking “what should this skill do?”, we asked:
  • What task do I want to complete using the agent?
  • What questions do I naturally ask?
  • What decisions do I expect the agent to make?
  • Where do I usually correct it?
Once we understood that flow, we reverse-engineered it into a skill.
The Develop Plugin skill is a good example. We didn’t design it upfront as a “plugin skill.” We first used the agent to help us build plugins, noticed where it struggled, and identified the thinking process we were repeating every time. That process (the steps, checks, and decisions) became the skill itself.
Skills work best when they are extracted from real usage, not from imagined, idealized agentic workflows.
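As a hypothetical illustration of that extraction: suppose the agent kept forgetting to register new operators in the plugin’s `fiftyone.yml`. That repeated correction becomes an explicit step and check in the skill (the wording below is invented, not the actual Develop Plugin skill):

```markdown
## Adding an operator

1. Define the operator class and its `execute` method.
2. Register the operator in the plugin’s `fiftyone.yml` (the step that
   was previously corrected by hand every time).
3. Reload the plugin and confirm the operator appears before moving on.
```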

Lesson 2: Agent skills are never finished

One thing we learned very quickly is that a skill is not done when you write the first version.
The first version is just a starting point.
Every time you use a skill, it becomes an opportunity to improve it. Real progress happens when you watch how the agent behaves in practice: where it hesitates, where it makes assumptions, and where you still need to step in and correct it. Skills are a dynamic process, not a static artifact.
Each interaction raises new questions:
  • What confused the agent?
  • What instruction was missing?
  • What edge case wasn’t covered?
Those answers go back into the skill. Over time, the skill becomes more precise, more robust, and more aligned with how the task is actually done. You don’t “finish” a skill; you maintain and evolve it as your agentic workflow changes.
In early versions, we often had to interrupt the agent and correct its assumptions. Most of the time, that wasn’t a model problem; it was a signal that the skill was missing context. Those corrections became inputs. Every fix we made manually was something the skill should eventually know how to do on its own.
This is one of the biggest differences between skills and prompts. Prompts are rewritten. Skills are improved through use.

Lesson 3: Don’t hardcode agent skills—leave room to explore

One early mistake we made was being too specific.
If a skill is too tightly scoped, the agent can’t adapt when the input changes slightly. Instead of hardcoding exact steps, we learned to make skills generic enough to explore, while still guiding behavior.
A good skill:
  • defines the goal clearly,
  • gives structure to the process,
  • but allows the agent to explore details when needed.
This is especially important when working with real datasets, codebases, or repositories where every case is a bit different.
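A before-and-after makes the difference visible; both snippets are invented for illustration:

```markdown
<!-- Too specific: breaks as soon as the field name changes -->
Run `dataset.match(F("label") == "car")` and export the results.

<!-- Room to explore: states the goal, lets the agent inspect first -->
Find the field that holds the class labels (inspect the dataset schema
if unsure), filter to the target class, and confirm the resulting count
with the user before exporting.
```

The second version survives datasets where the field happens to be called `ground_truth` or `predictions`; the first does not.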

Lesson 4: Safety needs human feedback

Security and safety matter, but trying to control every step with rules often hides the bigger picture.
What worked better for us was designing agent skills that expect human feedback, even when the goal is to automate most of the pipeline. No matter how good the agent is, there should always be a moment where a human can review, confirm, or stop the process.
In practice, this means skills should:
  • ask for confirmation before destructive actions,
  • validate assumptions instead of blindly proceeding,
  • explain what will happen next in plain language.
At the same time, we avoid adding dozens of micro-rules. Over-control makes agentic workflows brittle and harder to reason about. The goal is not to automate everything end-to-end without visibility, but to keep humans in the loop where it matters. Good skills don’t remove responsibility from users. They support better decisions by making risks visible and easy to review.
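In skill form, the checks above might read like this (an invented excerpt, not taken from a shipped skill):

```markdown
## Deleting mislabeled samples

Before running any delete:

1. State, in plain language, what will happen next.
2. Print how many samples will be affected and on which dataset.
3. Ask the user to confirm explicitly; stop on anything other than "yes".
4. If the count looks surprising, re-check the filter instead of proceeding.
```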

Lesson 5: Agent skills teach judgment, not just steps

The most powerful idea we borrowed from Anthropic’s agent skills is this:
Skills should teach judgment.
Good skills explain:
  • why a step exists,
  • when to skip it,
  • what to do if it fails.
This turns skills into mental models, not scripts.
Agents stop acting like automations and start acting like assistants.
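As an invented example of the why / when-to-skip / what-if-it-fails pattern:

```markdown
3. Run the test suite before opening the PR.
   - Why: reviewers move faster when CI is already green.
   - Skip when: the change touches only documentation.
   - If it fails: fix it or explain the failure in the PR description;
     never open the PR silently.
```

An agent following this step can defend its choices, which is what separates an assistant from a script.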

What FiftyOne Skills enables going forward

Once we started treating skills as living artifacts, something shifted.
Skills became more than instructions. They became:
  • shared understanding across teams,
  • a place to encode judgment,
  • and a way to make agentic workflows survive scale.
Instead of re-explaining how things should be done, we could point to a skill. Instead of fixing the same mistakes repeatedly, we improved the skill once.
This is why we believe skills will matter more over time, not because they’re new, but because they make experience reusable.

Want to explore or contribute?

All the skills mentioned in this post are open and evolving. You can explore them, use them, or contribute feedback on GitHub: https://github.com/voxel51/fiftyone-skills
If you’re experimenting with agentic workflows, we’d love to hear what works for you, and what doesn’t. Drop your feedback in #project-mcp-server on our Discord—your insights help us make these skills better.
