
How to Implement Agentic Development in Your Engineering Workflow – Lessons from Spotify and Anthropic

Last updated: 2026-05-15 07:26:01 · Lifestyle & Tech

Introduction

Agentic development is reshaping how software teams build, test, and deploy code. Inspired by the collaboration between Spotify and Anthropic, this guide walks you through bringing AI agents—autonomous systems that can plan, write, debug, and even refactor code—into your daily engineering practice. Instead of replacing developers, these agents act as tireless collaborators, handling repetitive tasks and freeing you to focus on complex problem-solving. By following the steps below, you’ll learn how to set up, integrate, and refine agentic workflows that boost productivity without sacrificing control.

Source: engineering.atspotify.com

What You Need

  • An AI model provider (e.g., Anthropic’s Claude API, OpenAI, or a local LLM)
  • A development environment (VS Code, JetBrains, or terminal with shell access)
  • Version control system (Git and a platform like GitHub or GitLab)
  • CI/CD pipeline (GitHub Actions, Jenkins, or similar)
  • Agent orchestration framework (for example, LangChain, CrewAI, or custom scripts)
  • Access to task management (Jira, Linear, or a simple to-do list)
  • Basic understanding of API keys and environment variables
  • A small, non-critical project to test your agent workflow on

Step-by-Step Guide

Step 1: Define Agent Roles and Boundaries

Before writing any code, decide what your agent will (and will not) do. Spotify and Anthropic emphasize that agents shouldn’t have unrestricted access. Start by listing tasks your team finds tedious or time-consuming—like writing unit tests, formatting code, generating documentation, or triaging issues. Assign one role per agent: for example, a Test Agent that creates pytest files, a Refactor Agent that suggests improvements, and a Docs Agent that updates READMEs. Set clear boundaries: agents can modify files only in specific directories, and all changes must be reviewed by a human before merging.
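One lightweight way to make these boundaries enforceable rather than aspirational is to encode them in a small registry that every agent action is checked against. The agent names, tasks, and directory lists below are illustrative, not a prescribed layout:

```python
# Hypothetical role registry; agent names, tasks, and directories are illustrative.
AGENT_ROLES = {
    "test_agent":     {"task": "generate pytest files", "allowed": ["tests/"]},
    "refactor_agent": {"task": "suggest refactorings",  "allowed": ["src/"]},
    "docs_agent":     {"task": "update documentation",  "allowed": ["docs/", "README.md"]},
}

def is_allowed(agent: str, path: str) -> bool:
    """Return True if the agent may modify the given repo-relative path."""
    for entry in AGENT_ROLES.get(agent, {}).get("allowed", []):
        # An entry ending in "/" permits the whole directory; otherwise exact match only.
        if path == entry or (entry.endswith("/") and path.startswith(entry)):
            return True
    return False
```

Calling `is_allowed` before any file write gives you a single choke point where violations can be logged and blocked, which pairs naturally with the human-review requirement.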

Step 2: Configure Your AI Model Access

Sign up for an API key from your chosen provider (e.g., Anthropic). Store the key securely as an environment variable (ANTHROPIC_API_KEY). Install the official SDK in your development environment:

npm install @anthropic-ai/sdk  # for Node.js
pip install anthropic          # for Python

Test connectivity by writing a simple script that sends a prompt and logs the response. Ensure you’ve set a token limit and temperature appropriate for code generation (lower temperature, e.g., 0.2, yields more deterministic outputs).
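A connectivity check might look like the sketch below. The request-building step is separated out so the settings (low temperature, a modest token limit) are easy to inspect and test; the model name and prompt are illustrative:

```python
# Minimal connectivity check; model name and settings are illustrative.
import os

def build_request(prompt: str) -> dict:
    """Request parameters tuned for code generation: low temperature for determinism."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "temperature": 0.2,
        "messages": [{"role": "user", "content": prompt}],
    }

if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic  # pip install anthropic
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(**build_request("Reply with the single word: pong"))
    print(response.content[0].text)
```

If the script prints a response, your key, network access, and SDK installation are all working.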

Step 3: Build a Basic Agent Loop

Create a core loop where the agent receives a task, acts on it, and reports results. A minimal structure in Python might look like:

import anthropic
import subprocess

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def agent_loop(task: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": task}],
    )
    code_output = response.content[0].text
    # Write the generated code to a file and run the test suite against it
    with open("generated_code.py", "w") as f:
        f.write(code_output)
    result = subprocess.run(
        ["python", "-m", "pytest", "generated_code.py"],
        capture_output=True, text=True,  # text=True returns str instead of bytes
    )
    return result.stdout

This is deliberately simple. In production, you’d wrap this in error handling and add a sandboxed execution environment.
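One of the simplest hardening steps is to stop executing agent output in your working tree. The helper below is a hypothetical sketch of that idea: it runs generated code in a throwaway directory with a hard timeout, so a runaway or destructive script can't touch your checkout:

```python
# Sketch of a safer execution step: throwaway directory plus a hard timeout.
import subprocess
import sys
import tempfile
from pathlib import Path

def run_generated(code: str, timeout: int = 30) -> subprocess.CompletedProcess:
    """Write agent output to a scratch directory and run it with a timeout."""
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch) / "generated_code.py"
        target.write_text(code)
        return subprocess.run(
            [sys.executable, str(target)],
            capture_output=True, text=True,
            timeout=timeout,  # raises subprocess.TimeoutExpired on a hung script
            cwd=scratch,      # relative file writes stay inside the scratch dir
        )
```

This is still not a real sandbox (the subprocess inherits your environment and network access); containers, as noted later in the tips, are the stronger option.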

Step 4: Integrate Agents with Version Control

To make agents useful collaboratively, connect them to your Git workflow. Use a webhook (e.g., GitHub App) that triggers an agent when a pull request is opened. The agent can analyze the diff, suggest improvements, or automatically add tests. For example:

  • On PR creation, send the diff to an agent via the API.
  • Ask the agent to generate a code review summary and post it as a comment.
  • Optionally, let the agent create a new branch with suggested changes.

Critical: never let an agent push directly to main. Always require human approval. Use branch protection rules to enforce this.
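As a concrete sketch of the comment-posting step, assuming a GitHub token is available (the function names and comment format are illustrative; GitHub's issue-comments endpoint also accepts pull request comments):

```python
# Sketch of posting an agent review to a PR; names and format are illustrative.
import json
import urllib.request

def review_comment(summary: str, agent_name: str = "review-agent") -> dict:
    """Wrap an agent's review summary in the GitHub comment payload shape."""
    body = (
        f"🤖 **{agent_name}**\n\n{summary}\n\n"
        "_Generated automatically; a human must approve before merge._"
    )
    return {"body": body}

def post_comment(repo: str, pr: int, payload: dict, token: str) -> None:
    """POST the payload as a comment on the given pull request."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr}/comments"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    urllib.request.urlopen(req)
```

Labeling every agent comment as machine-generated, as `review_comment` does, keeps the provenance of review feedback unambiguous for human reviewers.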


Step 5: Add Agents to Your CI/CD Pipeline

Take agentic development further by running agents as part of your continuous integration. For instance, a Security Agent can scan new code for vulnerabilities using an LLM, while a Documentation Agent can regenerate API docs. In GitHub Actions:

jobs:
  agent-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # fetch full history so origin/main exists for the diff
      - name: Run agent review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python agent_review.py --diff "$(git diff origin/main...HEAD)"

Set the agent’s output as a check that can pass or fail. Spotify’s team uses this approach to catch style issues and potential bugs before code reaches human reviewers.
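A sketch of what the `agent_review.py` invoked by the workflow might contain follows. The prompt wording, model name, and the PASS/FAIL first-word convention are all assumptions for illustration, not Spotify's actual script:

```python
# Hypothetical agent_review.py; prompt, model, and PASS/FAIL convention are assumptions.
import argparse
import os
import sys

REVIEW_PROMPT = (
    "Review this diff for style issues and potential bugs. "
    "Start your reply with PASS or FAIL, then list findings:\n\n{diff}"
)

def verdict_exit_code(review: str) -> int:
    """Map the agent's first word onto a CI exit code (0 = pass, 1 = fail)."""
    words = review.strip().split()
    first = words[0].strip(":.").upper() if words else "FAIL"
    return 0 if first == "PASS" else 1

if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic
    parser = argparse.ArgumentParser()
    parser.add_argument("--diff", required=True)
    args = parser.parse_args()
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(diff=args.diff)}],
    )
    review = response.content[0].text
    print(review)
    sys.exit(verdict_exit_code(review))  # nonzero exit fails the CI check
```

Exiting nonzero on a FAIL verdict is what lets the workflow surface the agent's judgment as a passing or failing check.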

Step 6: Implement Human-in-the-Loop Feedback

Agents will sometimes produce incorrect or unsafe code. Build a feedback mechanism where developers can rate agent outputs and provide corrective prompts. Store these interactions (anonymized) to fine-tune or adjust system prompts later. For example, add a simple thumbs-up/thumbs-down button in your PR comments. Use this data to iterate on the agent’s instructions—update the prompt to discourage unsafe patterns or to prefer a specific coding style.
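The storage side of that feedback loop can be as simple as an append-only JSONL file. In this sketch the file path and record shape are assumptions; note that only a hash of the prompt is stored, keeping the data anonymized as suggested above:

```python
# Sketch of feedback capture; the file path and record shape are assumptions.
import json
from datetime import datetime, timezone

def record_feedback(path: str, agent: str, prompt_hash: str,
                    rating: int, note: str = "") -> dict:
    """Append an anonymized rating (+1 / -1) for later prompt tuning."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "prompt_hash": prompt_hash,  # hash, not the raw prompt, for anonymization
        "rating": rating,            # +1 thumbs-up, -1 thumbs-down
        "note": note,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

A weekly pass over this file tells you which agents are earning trust and which system prompts need another iteration.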

Step 7: Monitor, Log, and Iterate

Track the agent’s actions in a dedicated log. Record the prompt, response, file changes, and the final decision (accepted/rejected by human). Review these logs weekly to identify failure modes. Common issues include:

  • Hallucination of nonexistent APIs
  • Infinite loops when fixing errors
  • Security risks like exposing credentials

Adjust your agent’s system prompt to mitigate these. For instance, add “Always verify API calls against official documentation” or “Never output real API keys.”
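The weekly log review can itself be partly automated. A minimal sketch, assuming each log record carries `agent` and `decision` fields as described above:

```python
# Sketch of a weekly log review; assumes records with "agent" and "decision" fields.
from collections import Counter

def failure_modes(log_entries: list[dict]) -> list[tuple[str, int]]:
    """Tally human-rejected actions per agent, worst offenders first."""
    rejected = Counter(
        entry["agent"]
        for entry in log_entries
        if entry.get("decision") == "rejected"
    )
    return rejected.most_common()
```

Sorting by rejection count points you at the agent (and therefore the system prompt) most in need of attention.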

Tips for Success

  • Start small: Don’t deploy a full agent swarm on day one. Pick one task (e.g., auto-generating unit tests) and perfect it before expanding.
  • Maintain human oversight: No matter how good your agent becomes, always require a developer to approve code changes. The goal is augmentation, not full automation.
  • Use versioned prompts: Treat your agent’s system prompt like code—store it in Git and version it. This makes debugging easier.
  • Leverage the Spotify+Anthropic pattern: In their live demo, they used a multi-agent setup where one agent debugs another. Consider pairing agents that check each other’s output.
  • Sandbox executions: Run agent-generated code in isolated containers (Docker) to prevent accidental damage to your production environment.
  • Measure impact: Track metrics like time saved, number of bugs caught early, or developer satisfaction. Use these to justify further investment.
  • Stay updated: AI models improve rapidly. Revisit your agent’s configuration quarterly to take advantage of new capabilities.