How I AI
Successfully coding with AI in large enterprises: Centralized rules, workflows for tech debt, & more
CHAPTERS
Enterprise AI coding vs. “vibe coding”: why guardrails matter at scale
Claire and Zach set the tone: what works for solo projects doesn’t translate to a 100+ person org shipping to a mission-critical platform. They frame the episode as operationalizing AI for real engineering teams—quality, maintainability, and consistency first.
- •Vibe coding can be fine for side projects but breaks down in enterprise codebases
- •Large teams need repeatable workflows and shared standards, not individual hacks
- •AI adoption must preserve code quality, security, and long-term maintainability
- •Goal: make AI effective inside real constraints (reviews, tests, conventions, scale)
AI tool stack at LaunchDarkly: a multi-tool reality across roles
Zach lists the rapidly expanding set of AI tools being used across design, product, and engineering. The key takeaway is that enterprises won’t standardize on one tool immediately—so workflows must tolerate heterogeneity.
- •Design tools: Lovable, v0, Figma Make (exploration mindset)
- •Product: ChatPRD
- •Engineering: Cursor (heavy), Devin (heavy), plus background agents; some use Windsurf
- •Trials: Augment, Claude Code; PR review via Copilot and Cursor
- •Implication: rules and best practices must work across multiple tools/IDEs
Driving adoption: why a named owner is essential for org-wide change
They discuss why AI enablement needs a responsible driver—someone close to the code who is actively testing what works. Without ownership, adoption becomes fragmented and skepticism hardens after early failures.
- •A dedicated adoption “driver” (not necessarily an authoritarian figure “in charge”) raises the odds of success
- •Being close to the code is critical to diagnose failures and improve workflows
- •Early skepticism came from the mismatch between small-app success and monorepo reality
- •Objective: turn experimentation into operationalized, repeatable practice
First-time success strategy: turning skeptics into believers
Zach emphasizes making engineers successful on their first attempt with AI tools. If the first experience is negative, engineers use it as proof the tools don’t work—so the org must engineer the “aha moment.”
- •Engineers’ first AI attempt often fails without guardrails/context
- •Negative first impressions create long-lived resistance
- •Scaling requires a system, not each engineer “on their own journey”
- •Enablement focus: make success the default experience
Repo as the source of truth: docs moved into the codebase for humans + LLMs
Zach’s foundational move: bring scattered documentation from Confluence/Docs into the repository. This improves human onboarding and gives LLMs direct access to the same authoritative guidance.
- •“What’s good for humans is also good for LLMs”
- •Centralize docs in a /docs directory inside the repo (not scattered across tools)
- •Examples: frontend organization, accessibility, JS style guide
- •Outcome: better default context for agents and developers
Centralized rules architecture: one canonical rule system for many AI tools
They show how to avoid duplicating rules across tool-specific formats (Claude.md, Cursor rules, etc.). Zach creates one centralized rules directory (e.g., .agentsrules / .agents) and points each tool’s config at it.
- •Problem: tool-specific rule files don’t scale across many AI tools
- •Solution: create a single canonical rules directory (e.g., .agentsrules / .agents)
- •Use concise “quick hits” docs that link to deeper standards
- •Each tool’s rules reference the same source of truth to minimize duplication
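One way to picture the "point every tool at one source of truth" idea is a small script that regenerates thin pointer files from the canonical directory. This is only a sketch under assumptions: the `.agents` name comes from the episode, but the pointer file paths in `POINTER_FILES`, the flat `*.md` layout, and the helper names are hypothetical and not taken from Zach's actual setup.

```python
from pathlib import Path

# Canonical rules directory named in the episode; its internal layout is assumed.
CANONICAL_DIR = ".agents"

# Hypothetical mapping of tool -> pointer file. Adjust to each tool's real convention.
POINTER_FILES = {
    "claude": "CLAUDE.md",
    "cursor": ".cursor/rules/shared.mdc",
}


def render_pointer(tool: str, rule_files: list[str]) -> str:
    """Build a short pointer file that only links to the canonical rules."""
    lines = [
        f"# Rules for {tool}",
        "",
        f"All shared rules live in `{CANONICAL_DIR}/`. Read these before making changes:",
        "",
    ]
    lines += [f"- {CANONICAL_DIR}/{name}" for name in sorted(rule_files)]
    return "\n".join(lines) + "\n"


def write_pointers(repo_root: Path) -> list[Path]:
    """Regenerate every tool's pointer file from the canonical directory."""
    rule_files = [p.name for p in (repo_root / CANONICAL_DIR).glob("*.md")]
    written = []
    for tool, rel_path in POINTER_FILES.items():
        target = repo_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_pointer(tool, rule_files), encoding="utf-8")
        written.append(target)
    return written
```

Because the pointer files contain only links, adding a rule means editing one file under `.agents` and rerunning `write_pointers(Path("."))`, rather than updating each tool's config by hand.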
Domain-specific rules that improve output: feature flagging as a case study
Zach explains how specialized rules can eliminate common model confusion—like mixing up “feature flags in LaunchDarkly the product” vs. “feature flags in the code.” Clear rules improved reliability and enabled automation via MCP.
- •Models/tools can misunderstand domain terms without explicit guidance
- •Feature flagging rules clarify expected behavior, outputs, and links
- •Rules materially improved agent success on real tasks
- •Example: PM used Devin to put a change behind a flag via MCP and it worked end-to-end
How to build the first rules: let agents draft, then humans harden
Zach’s practical advice: bootstrap rules by asking agents to propose structure and docs, then review carefully. He also focuses rules on where engineers/agents routinely get stuck (e.g., test frameworks).
- •Start by asking agents to generate initial docs/rules (Devin wiki, Cursor, etc.)
- •Human review matters—verify repo-specific details and standards
- •Prioritize rules around frequent failure points (e.g., Jest vs. Vitest mismatch)
- •Iterate based on real friction observed in the team’s day-to-day work
Demo workflow: using Devin Wiki + Devin agent to create charting docs and rules
They demonstrate querying Devin Wiki for repo facts (charting libraries), then asking Devin to generate both human-readable documentation and an agent-facing rule. They also discuss practicalities like VM boot time and incremental setup in large repos.
- •Devin Wiki answers “what’s in the repo?” (e.g., Recharts, VisX, possible ECharts)
- •Devin agent generates markdown docs + rules files in the repo structure
- •Operational note: Devin boots a VM; slower than local IDE tools but more capable
- •Guideline: keep agent rules concise; link to full docs; consider chunking limits (~200 lines)
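The "keep rules concise" guideline could be enforced mechanically, for example in CI. The sketch below flags rule files that exceed a line budget. The 200-line default mirrors the approximate figure mentioned above; the function name and the assumption that rules are Markdown files under one directory are illustrative only.

```python
from pathlib import Path

# Approximate budget mentioned in the episode; treat it as a tunable default.
MAX_RULE_LINES = 200


def oversized_rules(rules_dir: Path, max_lines: int = MAX_RULE_LINES) -> dict[str, int]:
    """Return {relative path: line count} for rule files over the budget."""
    result = {}
    for path in sorted(rules_dir.rglob("*.md")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count > max_lines:
            result[path.relative_to(rules_dir).as_posix()] = count
    return result
```

A non-empty result is a hint to move detail into the full docs and leave a link in the rule file.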
Centralizing knowledge across tools: reducing duplication between Devin knowledge and repo rules
Zach describes how Devin accumulates shared knowledge across sessions/users, but he deliberately points Devin back to the same repo-based rules/docs to avoid divergent “truths.” The goal is one consistent knowledge system for all tools.
- •Devin builds centralized knowledge over time; teams can accept suggested entries
- •Risk: duplicated or conflicting guidance across tools/memories
- •Strategy: make the repo’s .agents directory the primary source of truth
- •Outcome: consistent behavior across Devin, Cursor, Augment, and other agents
AI for tech debt: turning noisy test output into a prioritized burn-down plan
They shift to Zach’s favorite enterprise AI use case: accelerating tech-debt reduction. He shows a structured “migrations” checklist that quantifies issues (e.g., noisy test logs) and breaks them into prioritized, agent-executable tasks.
- •Problem: ~1,200 lines of noisy test output hide real failures and a11y issues
- •Workflow: capture logs → have AI cluster issues and identify worst offenders
- •Create a checklist file with tiers/priorities so any agent (or human) can execute next steps
- •Run tasks incrementally, review PRs, merge, and check off items—like an AI-ready project board
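The "capture logs, cluster, prioritize, check off" loop can be approximated without an LLM for the first pass. The sketch below groups noisy lines by a normalized message and emits a tiered Markdown checklist. In the episode the clustering is done by an AI agent, so this deterministic version is a swapped-in stand-in, and the regexes, tier size, and function names are all assumptions.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Collapse numbers and quoted strings so similar messages group together."""
    line = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", line)
    line = re.sub(r"(['\"]).*?\1", "<str>", line)
    return line.strip()


def cluster_warnings(log_text: str) -> Counter:
    """Count normalized warning/error lines in captured test output."""
    counts = Counter()
    for line in log_text.splitlines():
        if re.search(r"\b(warn(ing)?|error)\b", line, re.IGNORECASE):
            counts[normalize(line)] += 1
    return counts


def render_checklist(counts: Counter, tier_size: int = 3) -> str:
    """Render clusters as a tiered checklist, worst offenders first."""
    out = ["# Test noise burn-down"]
    for i, (message, n) in enumerate(counts.most_common()):
        if i % tier_size == 0:
            out += ["", f"## Tier {i // tier_size + 1}"]
        out.append(f"- [ ] ({n}x) {message}")
    return "\n".join(out) + "\n"
```

Committing the rendered checklist to the repo gives any agent or human the same ordered list to work from, with items checked off as PRs merge.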
AI-assisted hiring quality: a custom GPT to improve interview scorecards + feedback
Zach explains a hiring workflow: a custom GPT evaluates the quality of interview scorecards against a rubric, highlights strengths/gaps, and drafts a tactful Slack message to coach interviewers. It scales consistent hiring practices without requiring the manager in every interview.
- •Rubrics help, but consistency requires coaching and feedback on scorecards
- •Custom GPT configured with the rubric + examples rates scorecards (excellent/good/fair/poor)
- •Produces detailed improvement notes plus a ready-to-send Slack message
- •Side benefit: Zach improved his own scorecards by iterating with the tool
Wrap-up: the playbook + lightning round (favorite tool, when AI won’t cooperate)
Claire summarizes the overarching approach: experiment broadly, centralize context, and use AI for high-leverage work like docs, tech debt, and hiring operations. In the lightning round, Zach names Windsurf as most transformational and shares how he decides when to push AI vs. switch to manual work.
- •Playbook recap: experimentation + centralized docs/rules + AI-generated plans + human review
- •Windsurf is Zach’s favorite due to UX and agent workflow fit
- •When AI fails: either refine context or recognize it’s not the right task and do it yourself
- •Closing: where to find Zach (LinkedIn), LaunchDarkly hiring, and request for user feedback