How I AI

Successfully coding with AI in large enterprises: Centralized rules, workflows for tech debt, & more

Zach Davis is a product-minded engineering leader and builder at heart, with over 12 years of experience building high-performing teams and crafting developer tools at companies like Atlassian and LaunchDarkly. In this episode, he shares how he’s helping his 100-plus-person engineering team successfully adopt AI tools by creating centralized documentation, using agents to tackle technical debt, and improving hiring processes, all while maintaining high quality standards in a mature codebase.

*What you’ll learn:*
1. How to create a centralized rules system that works across multiple AI tools instead of duplicating documentation
2. A systematic approach to using AI agents like Devin and Cursor to analyze and reduce test noise in large codebases
3. How to leverage AI tools to document your codebase more effectively by extracting knowledge from existing sources
4. Why “what’s good for humans is also good for LLMs” should guide your documentation strategy
5. A custom GPT workflow for improving interview feedback quality and coaching interviewers
6. How to approach tech debt reduction with AI by creating prioritized task lists that both humans and AI agents can work from

*Brought to you by:*
• WorkOS: Make your app enterprise-ready today
• Lenny’s List on Maven: Hands-on AI education curated by Lenny and Claire

*Where to find Zach Davis:*
• LaunchDarkly: https://www.launchdarkly.com
• LinkedIn: https://www.linkedin.com/in/zach-davis-28207195/

*Where to find Claire Vo:*
• ChatPRD: https://www.chatprd.ai/
• Website: https://clairevo.com/
• LinkedIn: https://www.linkedin.com/in/clairevo/
• X: https://x.com/clairevo

*In this episode, we cover:*
(00:00) Introduction to Zach Davis
(02:44) Overview of AI tools used at LaunchDarkly
(04:00) The importance of having someone responsible for driving AI adoption
(05:44) Why vibe coding isn’t acceptable for enterprise development
(06:42) Making engineers successful with AI on their first attempt
(07:55) Creating centralized documentation for both humans and AI agents
(10:19) Using feature flagging rules to improve AI outputs
(12:33) Advice for getting started with rules
(14:28) Demo: Setting up Devin’s environment in a large codebase
(24:33) Devin’s plan overview
(27:55) Demo: Creating a prioritized tech debt reduction plan
(36:40) Demo: Using AI to improve hiring processes and interview feedback
(40:34) Summary of key approaches for integrating AI into engineering workflows
(42:08) Lightning round and final thoughts

*Tools referenced:*
• Cursor: https://www.cursor.com/
• Devin: https://devin.ai/
• ChatGPT: https://chat.openai.com/
• Claude: https://claude.ai/
• Windsurf: https://windsurf.com/
• Lovable: https://lovable.dev/
• v0: https://v0.dev/
• ChatPRD: https://www.chatprd.ai/
• Figma: https://www.figma.com/
• GitHub Copilot: https://github.com/features/copilot

*Other references:*
• Jest: https://jestjs.io/
• Vitest: https://vitest.dev/
• MCP: https://www.anthropic.com/news/model-context-protocol
• Confluence: https://www.atlassian.com/software/confluence

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo (host), Zach Davis (guest)
Jul 21, 2025 · 44m

CHAPTERS

  1. Enterprise AI coding vs. “vibe coding”: why guardrails matter at scale

    Claire and Zach set the tone: what works for solo projects doesn’t translate to a 100+ person org shipping to a mission-critical platform. They frame the episode as operationalizing AI for real engineering teams—quality, maintainability, and consistency first.

    • Vibe coding can be fine for side projects but breaks down in enterprise codebases
    • Large teams need repeatable workflows and shared standards, not individual hacks
    • AI adoption must preserve code quality, security, and long-term maintainability
    • Goal: make AI effective inside real constraints (reviews, tests, conventions, scale)
  2. AI tool stack at LaunchDarkly: a multi-tool reality across roles

    Zach lists the rapidly expanding set of AI tools being used across design, product, and engineering. The key takeaway is that enterprises won’t standardize on one tool immediately—so workflows must tolerate heterogeneity.

    • Design tools: Lovable, v0, Figma Make (exploration mindset)
    • Product: ChatPRD
    • Engineering: Cursor (heavy), Devin (heavy), plus background agents; some use Windsurf
    • Trials: Augment, Claude Code; PR review via Copilot and Cursor
    • Implication: rules and best practices must work across multiple tools/IDEs
  3. Driving adoption: why a named owner is essential for org-wide change

    They discuss why AI enablement needs a responsible driver—someone close to the code who is actively testing what works. Without ownership, adoption becomes fragmented and skepticism hardens after early failures.

    • A dedicated adoption “driver” (not necessarily someone formally “in charge”) increases the odds of success
    • Being close to the code is critical to diagnose failures and improve workflows
    • Early skepticism came from a mismatch between small-app success and monorepo reality
    • Objective: turn experimentation into operationalized, repeatable practice
  4. First-time success strategy: turning skeptics into believers

    Zach emphasizes making engineers successful on their first attempt with AI tools. If the first experience is negative, engineers use it as proof the tools don’t work—so the org must engineer the “aha moment.”

    • Engineers’ first AI attempt often fails without guardrails/context
    • Negative first impressions create long-lived resistance
    • Scaling requires a system, not each engineer “on their own journey”
    • Enablement focus: make success the default experience
  5. Repo as the source of truth: docs moved into the codebase for humans + LLMs

    Zach’s foundational move: bring scattered documentation from Confluence/Docs into the repository. This improves human onboarding and gives LLMs direct access to the same authoritative guidance.

    • “What’s good for humans is also good for LLMs”
    • Centralize docs in a /docs directory inside the repo (not scattered across tools)
    • Examples: frontend organization, accessibility, JS style guide
    • Outcome: better default context for agents and developers
  6. Centralized rules architecture: one canonical rule system for many AI tools

    They show how to avoid duplicating rules across tool-specific formats (CLAUDE.md, Cursor rules, etc.). Zach creates a single centralized rules directory (e.g., .agentsrules or .agents) and points each tool’s config to it.

    • Problem: tool-specific rule files don’t scale across many AI tools
    • Solution: create a single canonical rules directory (e.g., .agentsrules / .agents)
    • Use concise “quick hits” docs that link to deeper standards
    • Each tool’s rules reference the same source of truth to minimize duplication
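
As a rough illustration of the pointer approach, the Python sketch below writes one short pointer file per tool, each listing the same canonical rule files. The directory name, the per-tool file paths, and the function names are assumptions made for this example, not details confirmed in the episode; check each tool's documentation for where it actually reads rules from and whether it follows such pointers.

```python
from pathlib import Path

# Hypothetical layout: canonical rules live in one directory, and each
# tool-specific file only points at it. All paths here are illustrative.
CANONICAL_DIR = ".agents"
TOOL_FILES = {
    "claude": "CLAUDE.md",
    "cursor": ".cursor/rules/shared.mdc",
}


def pointer_text(rule_files):
    """Build the body of a pointer file that lists the canonical rule files."""
    lines = ["Follow the shared rules in these files before making changes:", ""]
    lines += [f"- {CANONICAL_DIR}/{name}" for name in sorted(rule_files)]
    return "\n".join(lines) + "\n"


def sync_pointers(repo_root):
    """Write the same pointer text into every tool-specific file."""
    root = Path(repo_root)
    rules = [p.name for p in (root / CANONICAL_DIR).glob("*.md")]
    text = pointer_text(rules)
    written = []
    for rel in TOOL_FILES.values():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(rel)
    return written
```

Calling `sync_pointers(".")` from the repo root would regenerate every pointer file, so the canonical directory stays the only place where rules are edited.
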
  7. Domain-specific rules that improve output: feature flagging as a case study

    Zach explains how specialized rules can eliminate common model confusion—like mixing up “feature flags in LaunchDarkly the product” vs. “feature flags in the code.” Clear rules improved reliability and enabled automation via MCP.

    • Models/tools can misunderstand domain terms without explicit guidance
    • Feature flagging rules clarify expected behavior, outputs, and links
    • Rules materially improved agent success on real tasks
    • Example: PM used Devin to put a change behind a flag via MCP and it worked end-to-end
  8. How to build the first rules: let agents draft, then humans harden

    Zach’s practical advice: bootstrap rules by asking agents to propose structure and docs, then review carefully. He also focuses rules on where engineers/agents routinely get stuck (e.g., test frameworks).

    • Start by asking agents to generate initial docs/rules (Devin wiki, Cursor, etc.)
    • Human review matters—verify repo-specific details and standards
    • Prioritize rules around frequent failure points (e.g., Jest vs. Vitest mismatch)
    • Iterate based on real friction observed in the team’s day-to-day work
  9. Demo workflow: using Devin Wiki + Devin agent to create charting docs and rules

    They demonstrate querying Devin Wiki for repo facts (charting libraries), then asking Devin to generate both human-readable documentation and an agent-facing rule. They also discuss practicalities like VM boot time and incremental setup in large repos.

    • Devin Wiki answers “what’s in the repo?” (e.g., Recharts, VisX, possibly ECharts)
    • Devin agent generates markdown docs + rules files in the repo structure
    • Operational note: Devin boots a VM; slower than local IDE tools but more capable
    • Guideline: keep agent rules concise; link to full docs; consider chunking limits (~200 lines)
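
One way to keep the conciseness guideline from drifting is a small check that flags rule files longer than the rough limit. This is a minimal sketch: the 200-line default mirrors the approximate figure in the bullet above, while the function name and the assumption that rules are Markdown files in a single directory are invented for illustration.

```python
from pathlib import Path

MAX_LINES = 200  # approximate limit from the guideline; adjust per tool


def oversized_rules(rules_dir, max_lines=MAX_LINES):
    """Return (filename, line_count) for rule files longer than max_lines."""
    results = []
    for path in sorted(Path(rules_dir).glob("*.md")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count > max_lines:
            results.append((path.name, count))
    return results
```

A check like this could run in CI or a pre-commit hook, prompting authors to move detail into the full docs and leave only a link in the rule file.
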
  10. Centralizing knowledge across tools: reducing duplication between Devin knowledge and repo rules

    Zach describes how Devin accumulates shared knowledge across sessions/users, but he deliberately points Devin back to the same repo-based rules/docs to avoid divergent “truths.” The goal is one consistent knowledge system for all tools.

    • Devin builds centralized knowledge over time; teams can accept suggested entries
    • Risk: duplicated or conflicting guidance across tools/memories
    • Strategy: make the repo’s .agents directory the primary source of truth
    • Outcome: consistent behavior across Devin, Cursor, Augment, and other agents
  11. AI for tech debt: turning noisy test output into a prioritized burn-down plan

    They shift to Zach’s favorite enterprise AI use case: accelerating tech-debt reduction. He shows a structured “migrations” checklist that quantifies issues (e.g., noisy test logs) and breaks them into prioritized, agent-executable tasks.

    • Problem: ~1,200 lines of noisy test output hide real failures and a11y issues
    • Workflow: capture logs → have AI cluster issues and identify worst offenders
    • Create a checklist file with tiers/priorities so any agent (or human) can execute next steps
    • Run tasks incrementally, review PRs, merge, and check off items—like an AI-ready project board
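
The capture, cluster, and checklist steps can be approximated with a small script. The sketch below is a deterministic stand-in for the clustering that the episode delegates to an AI agent: it groups log lines that differ only in numbers, counts them, and renders a two-tier checklist. The function names, the tier threshold, and the checklist format are all assumptions for illustration, not the format Zach uses.

```python
import re
from collections import Counter


def normalize(line):
    """Replace digit runs so lines differing only in numbers group together."""
    return re.sub(r"\d+", "<n>", line.strip())


def cluster(log_text):
    """Count occurrences of each normalized, non-empty log line."""
    return Counter(normalize(l) for l in log_text.splitlines() if l.strip())


def checklist(counts, tier1_min=10):
    """Render a checklist, worst offenders first, split into two tiers."""
    ranked = counts.most_common()
    tiers = [
        ("Tier 1 (worst offenders)", [r for r in ranked if r[1] >= tier1_min]),
        ("Tier 2", [r for r in ranked if r[1] < tier1_min]),
    ]
    out = []
    for title, items in tiers:
        if items:
            out.append(f"## {title}")
            out += [f"- [ ] {msg} ({n} lines)" for msg, n in items]
    return "\n".join(out)
```

Committing the rendered checklist to the repo gives humans and agents the same list to pick the next item from and check off after the PR merges.
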
  12. AI-assisted hiring quality: a custom GPT to improve interview scorecards + feedback

    Zach explains a hiring workflow: a custom GPT evaluates the quality of interview scorecards against a rubric, highlights strengths/gaps, and drafts a tactful Slack message to coach interviewers. It scales consistent hiring practices without requiring the manager in every interview.

    • Rubrics help, but consistency requires coaching and feedback on scorecards
    • Custom GPT set up with the rubric + examples rates scorecards (excellent/good/fair/poor)
    • Produces detailed improvement notes plus a ready-to-send Slack message
    • Side benefit: Zach improved his own scorecards by iterating with the tool
  13. Wrap-up: the playbook + lightning round (favorite tool, when AI won’t cooperate)

    Claire summarizes the overarching approach: experiment broadly, centralize context, and use AI for high-leverage work like docs, tech debt, and hiring operations. In the lightning round, Zach names Windsurf as the most transformational tool and shares how he decides when to push AI vs. switch to manual work.

    • Playbook recap: experimentation + centralized docs/rules + AI-generated plans + human review
    • Windsurf is Zach’s favorite due to UX and agent workflow fit
    • When AI fails: either refine context or recognize it’s not the right task and do it yourself
    • Closing: where to find Zach (LinkedIn), LaunchDarkly hiring, and request for user feedback
