AI-assisted coding is having its moment. With autocomplete tools and AI agents like GitHub Copilot and Cursor, the hype is real. But so is the confusion. Are we replacing developers? Can anyone build software just by prompting? Is “vibe coding” the future?
At Modus Create, we wanted to cut through the noise. So we ran a real experiment: two teams, same scope, same product, same timeline. One team used traditional workflows. The other used AI agents to scaffold, implement, and iterate — working in a new paradigm we call Agentic Coding.
Every technique we learned along the way and every insight this approach taught us is collected in our Agentic Coding Handbook. This article distills the lessons from the handbook into the core principles and practices any engineer can start applying today.
From Typing Code to Designing Systems
Agentic coding isn’t about writing code faster. It’s about working differently. Instead of manually authoring every line, engineers become high-level problem solvers. They define the goal, plan the implementation, and collaborate with an AI agent that writes code on their behalf.
Agentic Coding is a structured, AI-assisted workflow where skilled engineers prompt intentionally, validate rigorously, and guide the output within clear architectural boundaries.
This approach is fundamentally different from what many refer to as “vibe coding”, the idea that you can throw a vague prompt at an LLM and see what comes back. That mindset leads to bloated code, fragile architecture, and hallucinations.
Agentic Coding vs. Vibe Coding
To illustrate the difference, here’s how agentic coding compares to the more casual “vibe coding” approach across key dimensions:
| Dimension | Agentic Coding | Vibe Coding |
|---|---|---|
| Planning | Structured implementation plan | None or minimal upfront thinking |
| Prompting | Scoped, intentional, reusable | Loose, improvisational, trial-and-error |
| Context | Deliberately curated via files/MCPs | Often missing or overloaded |
| Validation | Treated as a critical engineering step | Frequently skipped or shallow |
| Output Quality | High, repeatable, aligned to standards | Inconsistent, often needs full rewrite |
| Team Scalability | Enables leaner squads with high output | Prone to technical debt and drift |
Agentic coding provides the structure, discipline, and scalability that large organizations need to standardize success across multiple squads. It aligns AI workflows with existing engineering quality gates, enabling automation without losing control. In contrast, vibe coding may produce short-term wins but fails to scale under the weight of enterprise demands for predictability, maintainability, and shared accountability.
A Note on Our Experiment
We ran a structured experiment with two engineering squads working on the same product. One team (DIY) built the product using traditional methods. The other team (AI) used Cursor and GitHub Copilot Agent to complete the same scope, using agentic workflows. The AI team had 30% fewer engineers and delivered in half the time. More importantly, the code quality — verified by SonarQube and human reviewers — was consistent across both teams.
Core Practices That Make the Difference
Implementation Planning is Non-Negotiable
Before any prompting happens, engineers must do the thinking. Creating an implementation plan isn’t just a formality; it’s the most critical piece in making agentic coding work. It’s where intent becomes design.
A solid implementation plan defines not only what to build, but also why, how, and within what constraints. It includes:
- Functional goals: What should this piece of code do?
- Constraints: Performance expectations, architecture rules, naming conventions, etc.
- Edge cases: Known pitfalls, alternate flows, integration risks.
- Required context: Links to schemas, designs, existing modules, etc.
- Step-by-step plan: Breakdown of the task into scoped units that will become individual prompts.
This plan is usually written in markdown and lives inside the codebase. It acts like a contract between the engineer and the AI agent.
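To make this concrete, here is a minimal sketch of what such a plan might look like. The file name, feature, and headings are invented for illustration; the handbook does not prescribe a single template.

```markdown
# Implementation Plan: Password reset endpoint

## Functional goal
Expose `POST /auth/password-reset` that emails the user a one-time reset link.

## Constraints
- Follow the existing controller/service/repository layering.
- No new dependencies; reuse the shared `mailer` utility.
- Respond in under 200 ms, excluding email delivery.

## Edge cases
- Unknown email addresses still return 200 (no account enumeration).
- Expired or already-used tokens return 410.

## Required context
- `src/auth/` module, `users` table schema, API error-format doc.

## Step-by-step plan (one prompt per step)
1. Add the reset-token model and repository methods.
2. Implement the service logic with expiry handling.
3. Wire up the controller route and input validation.
4. Generate unit tests for the service and controller.
```

Each numbered step later becomes a scoped prompt of its own.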
The more precise and explicit this document is, the easier it is to turn each unit into a high-quality prompt. This is where agentic coding shifts from “throw a prompt and see what happens” to deliberate system design, supported by AI.
In short, prompting is the act. Planning is the discipline. Without it, you’re not doing agentic coding — you’re just taking shots in the dark and hoping something works.
Prompt Engineering is a Real Skill
Prompt engineering is not about being clever. It’s about being precise, scoped, and iterative. We teach engineers to break down tasks into discrete steps, write action-oriented instructions, avoid vague intentions, chain prompts, and use prompting strategies like:
- Three Experts: Use this when you want multiple perspectives on a tough design problem. For example, ask the AI to respond as a senior engineer, a security expert, and a performance-focused architect.
- N-Shot Prompting: Provide the AI with N examples of the desired output format or pattern. Zero-shot uses no examples, one-shot provides a single example, and few-shot (N-shot) includes multiple examples to guide the AI toward the expected structure and style.
- 10 Iteration Self-Refinement: Best used when you want the AI to improve its own output iteratively. Give it a problem, then prompt it to improve its previous response 10 times, evaluating each step with reasoning.
Choosing the right style depends on the type of challenge you’re tackling — architectural design, implementation, refactoring, or debugging.
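To make N-shot prompting concrete, here is a small TypeScript sketch that assembles a few-shot prompt from curated examples. The helper, the example, and the task text are ours for illustration; they are not part of any tool’s API.

```typescript
// A few-shot prompt pairs each example task with the output style we expect,
// so the model can infer our conventions before tackling the real task.
interface Example {
  instruction: string;
  output: string;
}

function buildFewShotPrompt(examples: Example[], task: string): string {
  const shots = examples
    .map((ex, i) => `Example ${i + 1}:\nTask: ${ex.instruction}\nOutput:\n${ex.output}`)
    .join("\n\n");
  return `${shots}\n\nNow complete this task in the same style:\n${task}`;
}

const prompt = buildFewShotPrompt(
  [
    {
      instruction: "Create a typed fetch wrapper for GET /users/:id",
      output: "export async function getUser(id: string): Promise<User> { /* ... */ }",
    },
  ],
  "Create a typed fetch wrapper for GET /orders/:id with the same naming and error handling."
);

console.log(prompt);
```

With one example this is one-shot prompting; adding more examples tends to tighten the match with your conventions.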
Context is a First-Class Citizen
Model Context Providers (MCPs) give GitHub Copilot a second brain. Instead of treating the LLM as an isolated suggester, MCPs stream relevant context — from Figma designs, documentation in Confluence, code changes from GitHub, and decision logs — directly into the Copilot chat session.
This allows engineers to ask Copilot to write code that matches an actual UI layout, or implements some logic described in a design doc, without manually pasting content into the prompt. The results are significantly more relevant and aligned. Some of the MCPs we use are:
- GitHub MCP: Pulls in pull request content and comments to give the model full context for writing review responses, proposing changes, or continuing implementation from feedback.
- Figma MCP: Streams UI layouts into the session, enabling the AI to generate frontend code that accurately reflects the design.
- Database Schema MCP: Injects table structures, column types, and relationships to help the AI write or update queries, migrations, or API models with accurate field-level context.
- Memory Bank MCP: Shares scoped memory across sessions and team members, maintaining continuity of architectural decisions, prompt history, and recent iterations.
- CloudWatch MCP: Supplies log output to the AI for debugging and incident triage — essential during the Debugging workflow.
- SonarQube MCP: Feeds static analysis results so the AI can refactor code to eliminate bugs, smells, or duplication.
- Confluence MCP: Integrates architecture and business documentation to inform decisions around domain logic, constraints, and requirements.
MCPs are just one part of the context curation puzzle. Engineers also need to deliberately craft the model’s working memory for each session. That includes:
- Implementation Plans: Markdown files that define goals, steps, constraints, and trade-offs, acting as an onboarding doc for the AI agent.
- Codebase Files: Selectively attaching relevant parts of the codebase (like entry points, shared utilities, schemas, or config files) so the AI operates with architectural awareness.
- Console Logs or Test Output: Including runtime details helps the AI understand execution behavior and suggest context-aware fixes.
- Instructions or TODO Blocks: GitHub Copilot supports markdown-based instruction files and inline TODO comments to guide its code generation. These instructions act like lightweight tickets embedded directly in the repo. For example, an `INSTRUCTIONS.md` might define architectural rules, file responsibilities, or interface contracts. Within code files, TODOs like `// TODO: replace mock implementation with production-ready logic` act as scoped prompts that Copilot can act on directly. Used consistently, these become in-repo signals that align the agent’s output with team expectations and design intent, directing the model towards a specific change or design pattern (see the sketch below).
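Here is a small sketch of what a TODO-as-scoped-prompt can look like in practice. The file, function, and interface names are invented for the example; only the pattern matters.

```typescript
// src/payments/refund-service.ts (illustrative file)

// TODO: replace mock implementation with production-ready logic.
// Constraints for the agent: call the provider through the shared HttpClient,
// return a typed RefundResult, and surface provider failures as RefundError
// (see INSTRUCTIONS.md for the error-handling rules).
export interface RefundResult {
  refundId: string;
  status: "pending" | "completed";
}

export async function refundPayment(paymentId: string): Promise<RefundResult> {
  // Mock kept deliberately small; the TODO above scopes the change we expect
  // the agent to make when it picks this file up.
  return { refundId: `mock-${paymentId}`, status: "pending" };
}
```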
Effective context curation is an engineering discipline. Give too little, and the agent hallucinates. Give too much, and it loses focus or runs out of space in the LLM context window. The best results come from curating the smallest possible set of high-signal resources. When you treat context as a design artifact, the AI becomes a more reliable collaborator.
The Role of Workflows
We embedded AI in our delivery pipeline using a set of core workflows. You can explore each one in more detail in our handbook, but here is the high-level overview:
| Workflow | Purpose |
|---|---|
| Spec-First | Write a scoped prompt plan before coding |
| Exploratory | Understand unfamiliar codebases with AI help |
| Memory Bank | Maintain continuity across sessions and team members |
| TDD | Test-first with AI-generated test coverage |
| Debugging | Use AI to triage, investigate, and fix bugs |
| Visual Feedback | Align AI output with Figma and screenshots |
| Auto Validations | Run tools like SonarQube and ESLint on generated output |
In our experience, these workflows are not just productivity boosters; they’re the foundation for scaling AI-assisted development across teams. They provide consistency, repeatability, and shared mental models. We believe this approach is especially critical in enterprise environments, where large engineering organizations require predictable output, quality assurance, and alignment with established standards. Agentic workflows bring just enough structure to harness AI’s strengths without sacrificing accountability or control.
Building a Validation Loop
We use validation tools like SonarQube, ESLint, Vitest, and Prettier to provide automatic feedback to the AI. For example, if SonarQube flags duplication, we prompt the AI to refactor accordingly. This creates a tight loop where validation tools become coaching signals.
Some tools, like GitHub Copilot, can even collect log output from the terminal running tests or executing scripts. This allows the AI to observe the outcome of code execution, analyze stack traces or test failures, and automatically attempt fixes. One common approach is asking the AI to run a test suite, interpret the failed test results, make corrections, and repeat this process until all tests pass.
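As a minimal sketch of that loop, assuming a Vitest test suite and a hypothetical askAgentToFix step standing in for whatever agent interface you use (it is not a specific Copilot API), the wiring can be as simple as:

```typescript
import { execSync } from "node:child_process";

// Placeholder for your agent integration (Copilot chat, Cursor, an API call).
// In practice this is where the failure output is handed to the agent so it
// can edit the code.
async function askAgentToFix(testOutput: string): Promise<void> {
  console.log("Prompting agent with test failures:\n", testOutput.slice(0, 500));
}

// Run the suite, hand failures back to the agent, and repeat a bounded number
// of times. A human reviews the resulting diff either way.
async function validationLoop(maxIterations = 5): Promise<boolean> {
  for (let i = 0; i < maxIterations; i++) {
    try {
      execSync("npx vitest run", { stdio: "pipe" });
      return true; // all tests pass; hand off to human review
    } catch (err) {
      const output = (err as { stdout?: Buffer }).stdout?.toString() ?? String(err);
      await askAgentToFix(output);
    }
  }
  return false; // too many attempts; escalate to the engineer
}

validationLoop().then((clean) => console.log(clean ? "Suite green" : "Needs human attention"));
```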
Lizard, a tool that calculates code complexity metrics, is another useful validation tool. Engineers can instruct the AI to execute Lizard against the codebase. When the output indicates that a function exceeds the defined complexity threshold (typically 10), the AI is prompted to refactor that function into smaller, more maintainable blocks. This method forces the AI to act on specific, measurable quality signals and improves overall code readability.
In this setup, engineers can let the AI operate in a closed loop for several iterations. Once the AI produces clean validation results — whether through passing tests, static analysis, or complexity reduction — the human engineer steps back in to review the result. This combination of automation and oversight speeds up bug fixing while maintaining accountability.
But here’s the thing: the team needs to actually understand what the AI built. If you’re just rubber-stamping AI changes without really getting what they do, you’re setting yourself up for trouble. The review step isn’t just a checkbox — it’s where you make sure the code actually makes sense for your system.
Why Human Oversight Still Matters
No AI is accountable for what goes to production. Engineers are. AI doesn’t own architectural tradeoffs, domain-specific reasoning, or security assumptions. Human-in-the-loop is the safety mechanism.
Humans are the only ones who can recognize when business context changes, when a feature should be cut for scope, or when a security concern outweighs performance gains. AI can assist in code generation, validation, and even debugging — but it lacks the experience, judgment, and ownership required to make trade-offs that affect users, stakeholders, or the long-term health of the system.
Human engineers are also responsible for reviewing the AI’s decisions, ensuring they meet legal, ethical, and architectural constraints. This is especially critical in regulated industries, or when dealing with sensitive data. Without a human to enforce these standards, the risk of silent failure increases dramatically.
Agentic coding isn’t about handing off responsibility, it’s about amplifying good engineering judgment.
Where People Fail (And Blame the AI)
Common mistakes include vague prompts, lack of planning, poor context, and not validating output. While LLMs have inherent limitations — they hallucinate, make incorrect assumptions, and produce plausible-sounding but wrong outputs even with good inputs — engineering discipline significantly increases the reliability of results.
A prompt like “make this better” tells the AI nothing about what “better” means — faster? more readable? safer? Without clear constraints and context, LLMs default to producing generic solutions that may not align with your actual needs. The goal isn’t to eliminate all AI errors, but to create workflows that catch and correct them systematically.
Lack of validation is another key failure mode. Trusting the first output, skipping tests, or ignoring code quality tools defeats the point of the feedback loop. AI agents need boundaries and coaching signals; without them, they can drift into plausible nonsense.
Using these tools effectively also means understanding their current limitations. AI models work best with well-represented programming languages like JavaScript, TypeScript, and Python (to name a few examples). However, teams working in specialized domains may see limited results even with popular languages.
A Closer Look at Our Tooling
GitHub Copilot played a key role in our experiment, especially when paired with instruction files, validation scripts, and Model Context Providers (MCPs).
What made GitHub Copilot viable for agentic workflows wasn’t just its autocomplete or inline chat. It was how we surrounded it with structure and feedback mechanisms:
Instruction Files
Instruction files served as the AI’s map. These markdown-based guides detailed the implementation plan, scoped tasks, architectural constraints, naming conventions, and even file-level goals. When placed inside the repo, they gave GitHub Copilot context it otherwise wouldn’t have. Unlike ad-hoc prompts, these files were written with intent and discipline, and became a critical part of the repo’s knowledge layer.
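As a hedged example, an instruction file might contain entries like the excerpt below; the paths, rules, and file names are invented for illustration rather than taken from our repositories.

```markdown
# INSTRUCTIONS.md (illustrative excerpt)

## Architectural rules
- API handlers live in `src/api/` and delegate to services in `src/services/`.
- No direct database access outside `src/repositories/`.

## Naming conventions
- React components use PascalCase, one component per file.
- Test files mirror the source path with a `.test.ts` suffix.

## File-level goals
- `src/services/billing.ts`: keep provider-specific logic behind the `PaymentProvider` interface.
```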
Validation Scripts
We paired Copilot with post-generation validation tools like ESLint, Vitest, Horusec, and SonarQube. These weren’t just guardrails; they closed the loop. When Copilot generated code that violated rules or failed tests, engineers would reframe the prompt with validation results as input. This prompted Copilot to self-correct. It’s how we turned passive AI output into an iterative feedback process.
Copilot + Workflows = Impact
Used this way, GitHub Copilot became more than a helper. It became a participant in our structured workflows:
- In Spec-First, Copilot consumed instruction files to scaffold code.
- In Debugging, it analyzed logs fed via MCP and proposed targeted fixes.
- In TDD, it generated unit tests from requirements, then refactored code until tests passed (see the test sketch after this list).
- In Visual Feedback, it aligned components with Figma via the design MCP.
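For the TDD workflow, the starting point is typically a human-reviewed test that the agent then implements against. A minimal Vitest example might look like the sketch below; the applyDiscount function and module path are invented for illustration.

```typescript
import { describe, expect, it } from "vitest";
import { applyDiscount } from "./pricing"; // module the agent is asked to implement

describe("applyDiscount", () => {
  it("applies a percentage discount to the subtotal", () => {
    expect(applyDiscount({ subtotal: 100, discountPercent: 10 })).toBe(90);
  });

  it("never returns a negative total", () => {
    expect(applyDiscount({ subtotal: 20, discountPercent: 150 })).toBe(0);
  });
});
```

The agent then iterates on `pricing.ts` until the suite passes, with the engineer reviewing the final implementation.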
By aligning Copilot with prompts, plans, validation, and context, we moved from “code completion” to code collaboration.
So no — GitHub Copilot isn’t enough on its own. But when embedded inside a disciplined workflow, with context and feedback flowing in both directions, it’s a capable agent. One that gets better the more structured your engineering practice becomes.
Final Advice: How to Actually Start
The path to agentic coding begins with a single, well-chosen task. Pick something atomic that you understand deeply — a function you need to refactor, a component you need to build, or a bug you need to fix. Before touching any AI tool, write an implementation plan that defines your goals, constraints, and step-by-step approach.
Once you have your plan, start experimenting with the workflows we’ve outlined. Try Spec-First to scaffold your implementation, then use Auto Validations to create feedback loops. If you’re working with UI, explore Visual Feedback with design tools. As you gain confidence, introduce Model Context Providers to give your AI agent richer context about your codebase and requirements. Always keep in mind that the quality of AI output depends on the quality of the task setup and the availability of feedback.
Treat each interaction as both an experiment and a learning opportunity. Validate every output as if it came from a junior developer. Most importantly, remember that this isn’t about replacing your engineering judgment; it’s about amplifying it. The most successful engineers in our experiments were the ones who treated the AI as a collaborator — not a magician.
What we’ve described isn’t just a productivity technique — it’s a fundamental shift in how we think about human creativity and machine capability. When engineers become high-level problem solvers, supported by AI agents within well-defined boundaries, we unlock new possibilities for what software teams can accomplish. Welcome to the next era of software development.
Behind the scenes
Wesley Fuchter is a Senior Principal Engineer at Modus Create with over 13 years of experience building cloud-native applications for web and mobile. As a tech leader, he spends most of his time working closely with engineering, product, and design to solve customers’ business problems. His experience sits at the intersection of hands-on coding, innovation, and people management, spanning AWS, Java, and TypeScript as well as startups, agile, and lean practices.
If you enjoyed this article, you might be interested in joining the Tweag team.