AI Tools, Vibes, and Chaos: What’s Actually Helping Us Build Faster

Aug 20, 2025
Jonathan Manalus
Product Design

If you're just jumping in, this post picks up from our anchor article, Too Fast, Too Curious, where we laid out the case for vibe-coding at Metronome. It covered our broader design-led push to ship faster and smarter by embedding AI into the product development cycle and introduced MDR (Design x AI), our playbook for turning insight into action.

This post is about the supporting AI tooling. The good, the bad, the occasionally broken.

Hey there 👋 I'm Jonathan, a product designer at Metronome focused on hybrid pricing models, integrations, billing, and data workflows. My interest in AI tooling started in my past few roles, where building automations showed me how much we could actually improve versus just accepting "That's how it's done." When I joined Metronome in March, I kept seeing design ideas that could be quick wins just sitting there. Coming from a variety of industries with some frontend experience, I wondered: what if I could just build them instead of leaving them on the shelf?

That led to my first pull request, and soon after I was asked to take the first step in exploring AI tooling for the design org, to figure out what would actually work for our team. I dove in, testing everything from Claude to Cursor to Figma Make. Some tools stuck, some didn't, but each gave us clearer direction on what role AI should play in our workflows. This post is a snapshot of those learnings and how we think about tool fit as part of the broader MDR strategy.

Who should be reading this? If you’re just starting your AI journey, maybe trying to figure out which tools to try, or how this actually fits into your workflow, this one’s for you. I’m writing from a design lens, but the learnings apply broadly to anyone outside of engineering trying to make sense of AI tooling in a real product environment.

But if you’ve ever tried to integrate AI into actual product workflows, you already know: it’s one thing to have the vision, another to pick the right tools, and something else entirely to get people to use them well.

We didn't go into this expecting magic. We went in expecting to test, to break things, and to figure out what actually helps us move faster. Consider this our take on what's worked best for us so far. The last of these assessments was done in June 2025, and as the tools evolve, so will our opinions. We plan to revisit and update as we learn more.

Claude Code: The Thoughtful Executor

Claude Code seemed out of step with where we were starting, geared toward technical users at a moment when we were still assessing our team’s coding fluency. Our early assessment nearly ruled it out: too terminal-heavy, no visual context, and a setup process that assumed command-line interface (CLI) fluency. For a design org just getting its hands into code, that was a lot to ask.

But we stuck with it.

What started as “maybe this is too technical for us” turned into one of our most trusted tools for multi-file changes, structured refactors, and vibe-coded delivery with actual confidence.

That confidence came with a learning curve, though. Early on, I had Claude Code build out a feature for selecting rate cards in our product. Everything seemed to work perfectly, until I dug deeper into the code and realized it had hard-coded a specific rate card ID to make the timestamp-based selection appear functional. When I confronted it, Claude Code came clean: "Yes, you're absolutely right! The original implementation was 'working' because it had a hardcoded rate card ID... The timestamp-based selection was never actually working." It had essentially been faking it to make the feature seem complete.
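For the curious, here's a minimal sketch of that failure mode, with hypothetical types and names since the real code is internal. The "timestamp-based" selection quietly pinned its result to a single ID:

```typescript
// Hypothetical reconstruction of the bug, not Metronome's actual code.
interface RateCard {
  id: string;
  createdAt: string; // ISO 8601 timestamp
}

// What shipped: "works" only because one specific ID always exists.
function selectRateCardHardcoded(rateCards: RateCard[]): RateCard | undefined {
  return rateCards.find((card) => card.id === "rate_card_123"); // hard-coded!
}

// What was actually asked for: pick the most recently created rate card.
function selectRateCardByTimestamp(rateCards: RateCard[]): RateCard | undefined {
  return [...rateCards].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt)
  )[0];
}
```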

But here's what earned our trust: it didn't deflect or make excuses. It walked through exactly what was broken, why it made that choice, and how to fix it properly. That methodical honesty, even when admitting mistakes, became Claude Code's signature move.

Why It Works

Claude Code isn't flashy, but it is reliable. It doesn't jump straight to writing code. It asks questions, reasons through structure, and proposes options. That methodical thinking helped us build trust in both the system and the solutions it produces. Unlike other tools that start coding the moment you hit enter, Claude slows down just enough to produce cleaner output, better decisions, and fewer ESLint regrets.

It also plays well with Linear and our Model Context Protocol (MCP) setup: tickets can be scoped, pulled, and worked through systematically, which is ideal for async work and scaling safe contributions.
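For reference, here's a rough sketch of what that wiring can look like. Claude Code can pick up project-scoped MCP servers from a .mcp.json file at the repo root; the schema and Linear's hosted endpoint below are our best understanding at the time of writing, so verify them against the current Claude Code and Linear docs rather than copying this verbatim:

```json
{
  "mcpServers": {
    "linear": {
      "type": "sse",
      "url": "https://mcp.linear.app/sse"
    }
  }
}
```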

Where It Struggles

The terminal interface is still a steep hill for most designers (or non-technicals). It’s not where you go for visual polish or quick one-offs. You need local setup, command line familiarity, and a willingness to sit in a different mental model than Figma or Cursor.

It also struggles with visual context for non-technical users: while Claude Code can work with Figma MCP and screenshot uploads, the setup process isn't straightforward and requires more technical handholding to connect visual references like mockups or component context.

Our Take

Claude Code found its place in our toolkit not because it was the easiest but because it was the most trustworthy. It’s still not our recommended starting point for non-technical folks, but when the task calls for multi-step workflows, structured system changes, or careful planning and code review, it's hard to beat.

Cursor: The Fastest Way to Break (and Fix) Things

Cursor feels like the AI tool that most closely understands how non-technicals want to work: visual, direct, and fast. It's hands-down the tool we recommend for getting your company started. It works like VS Code with an AI assistant built in, and when you're focused on small UI improvements, it's shockingly effective.

That's exactly where Cursor shines. Copy tweaks, spacing adjustments, empty state scaffolding: anything that's scoped and visual is usually shippable in minutes. And because it gives you a real-time preview and lets you undo broken states, it encourages exploration without fear.

But with that speed comes chaos. The faster you move, the easier it is for things to break, especially when you try anything larger than a scoped fix. In our early Cursor tests, even basic prompts had a tendency to run wild, making sweeping changes across the codebase. It was a bit of a shock. I once took down our entire alerts UI just by asking, "What does this [part of the code] do?" Instead of explaining, Cursor rewrote sweeping sections of the file, introducing new API hooks that went nowhere and breaking large chunks of the interface. (Luckily, this was never pushed to production, but it was a sharp reminder that speed without guardrails can cause more damage than progress.)

We quickly learned to introduce tighter guardrails and add more constraints to our questions (for example, ending a prompt with "explain only, don't change any files") to keep things from spiraling. And even with more precise prompting, it's easy to misstep if you don't pay attention. Prompts can still go sideways, states break unexpectedly, and debugging often becomes part of the job when tackling anything beyond the basics.

Why It Works

  • Great for scoped changes: Cursor is perfect for targeted polish, like aligning components or updating warnings.
  • High visual feedback: You see what you’re doing as you go, which builds confidence.
  • Encourages iteration: With quick preview and rollback, non-technicals feel safe experimenting.

Where It Struggles

  • Anything beyond minor changes: Complex features or multi-step flows often lead to junk code.
  • Prompting precision: One vague sentence can create six broken files. Prompts must be sharp.
  • Technical assumptions: Still requires local setup, some terminal comfort, and debugging intuition.

Our Take

Cursor is the fastest way to get started with vibe coding, but it's also the fastest way to get in over your head. It's the tool we get everyone started with before moving them over to Claude Code. We love it for small wins and high-velocity polish, but when things get complicated, it still demands prompting discipline and engineering safety rails. To help with that, we invested time upfront building a prompt library tailored to our product and patterns, so contributors can be productive from the jump. It works well, but it's added overhead that teams should plan for if they want to scale this tool safely.

Figma Make: Rapid Prototyping With Guardrails

If Cursor is our go-to for polish and shipping real UI, Figma Make is where we go to explore ideas. It's fast, visual, and surprisingly powerful when you're trying to go from idea to layout without spinning up a full project. Your designers and PMs should live here when it comes to rapid strategy and discovery.

What makes Figma Make stand out is its ability to generate multiple UI concepts in minutes, using plain-language prompts and structural references. This makes it ideal for rapid ideation or prototyping early ideas, especially when speed matters more than pixel-perfection.

Ideally, we use Claude to generate prompt structures, feeding it product context and design goals before moving to Figma Make for execution. It doesn't always nail it on the first try, but it's fast enough that restarting is rarely a blocker.

Why It Works

  • Lightning-fast ideation: You can get multiple layout concepts in minutes.
  • Great for PMs and Designers: No code required to start exploring UI possibilities.
  • Supports experimentation: Easy to pivot and test new directions quickly.

Where It Struggles

  • Limited design system awareness: When we tested it, components didn't align with our visual language or tokens. However, Figma recently added the ability to bring style context from design libraries into Make, which we're excited to test in future iterations.
  • Style consistency: Generated outputs often miss polish or alignment details.
  • Static outputs: Interactive prototypes are limited, and there's no integration with existing React components, preventing Figma Make from leveraging our component library to build higher-quality prototypes.

Our Take

Figma Make is a powerful tool for getting unblocked during ideation. We don't expect it to deliver production-ready output, but that's not the point. It can help you move from "We're stuck" to "Here's something we can react to." When paired with a structured prompting flow and a clear product context, it can reduce the time it takes to go from a doc to a tangible design idea.

Google Stitch: A Demo, Not a Workflow

(Note: we tested this tool shortly after it launched. We haven’t revisited it since, and it may have improved—something we plan to reevaluate in the future.)

Of all the tools we tested, Google Stitch generated the most initial interest from our design team because it promised to generate Figma mockups directly from prompts, something Figma had demonstrated but removed after their 2024 launch. In practice, though, Stitch delivers a single screen at a time.

And when we say a single screen, we mean it: Stitch generates one static screen per prompt, with no way to build flows or connected experiences. It struggled with layout, color, and responsiveness, and had no understanding of component structure or design systems. The Figma export technically worked, but most of what we got needed to be rebuilt from scratch.

We tried giving it screenshots of existing designs, prompting themes, and layering API integrations. Sometimes it worked. Sometimes it gave us a grey box. The edit tools were also extremely limited: a small set of fonts, color pickers, and almost no layout control.

Why It Doesn't Work (Yet)

  • Slow to generate: 2–3 minutes for each screen
  • Disconnected output: No flows, no state, no component reuse
  • Poor design quality: Layout issues, visual inconsistency, no system alignment
  • Low interaction fidelity: No real prototyping or preview capability

Our Take

Stitch feels more like a showcase for Gemini than a production tool. While it might be useful for quick demo screens or product marketing one-offs, it’s not something we recommend integrating into real design workflows. Not yet.

We'll keep an eye on it, but for now, there are faster ways to generate UI with far more control.

Wrapping It Up

Look, every team's stack will look different, but here's what we've found: the best tools meet you where your company and team are, then help you move faster. That doesn't always mean the most powerful tool wins. It means the one that builds trust, fits your workflows, and leaves enough room to experiment without taking the whole app down with it.

We started by asking one question: What if we could remove friction without waiting in line for resources? That’s still the goal. And as our toolset matures, we’ll keep refining our approach.
