Vibe coding: the good, the almost, and the @#$%**
I’ve been using Github Copilot for a while now and find it to be fairly useful, albeit in a limited way. More recently, I started using a couple of the more powerful AI coding agents: Junie by JetBrains and Cursor.
The results so far have been mixed. I’ve defined the following rating system:
- good - completed a task perfectly
- almost - completed most of the task, but then got lost
- @#$%** - behaved like a hungover intern who was completely lost
This is a work in progress document that attempts to describe my experiences with these coding agents. I will write it in fairly random order based on the development work that I’ve been doing.
TL;DR
- AI coding agents can often make you (feel?) more productive
- Software development jobs are safe
My projects
I’ve used coding agents on a few different projects:
- Eventuate Springwolf (Async API) support
- Customers and Orders Spring Authorization Server
- Customers and Orders UI + Backend for Frontend (BFF)
The last project is the closest to ‘vibe coding’ since I know very little about NextJS, React and TypeScript.
The good
In this section, I’ll describe when the AI coding agents worked well.
Using Junie to create a Gradle plugin
One of the biggest successes was using Junie to create a Gradle plugin for the Eventuate Springwolf project.
The goal was to run `npx @asyncapi/cli validate` to validate the `springwolf.json` files created by various tests.
First version of the plugin and task
Here are the prompts that I gave Junie. The first prompt created the basic plugin:
Craete a plugin called ValidateAsyncAPI that defines a validateAsyncAPI task that runs asyncapi validate on the build/springwolf.json
The second prompt made the task incremental:
Enhance the plugin to support incremental execution - only run it if build/springwolf.json has changed since the last time it was run. define build/springwolf.json as an input and use gradle’s builtin incrementallism
The changes that Junie made are part of this commit.
As far as I remember, Junie created a plugin that worked without any manual changes.
Second version
Later I changed some tests to generate multiple `springwolf*.json` files.
I asked Junie to enhance the plugin to validate all of them.
Here’s the prompt:
The specificationFile is now a collection of files matching this pattern build/springwolf*.json
The changes that Junie made are part of this commit. Here’s the updated plugin. Most notably:
- The task’s input changed from a `File` to a `FileCollection`
- The task’s `exec()` method was changed to iterate over the files in the collection and run `npx @asyncapi/cli validate` on each one
Thoughts
This was one of the few tasks that a coding agent was able to complete without help. I suspect that Junie was successful because the task of creating a Gradle plugin is well documented and fairly stable - there haven’t been radical changes from one Gradle version to the next.
The almost
In this section, I’ll describe when the AI coding agents wrote some useful code but were unable to complete the task.
Creating the Customers and Orders BFF project
The Customers and Orders BFF project is a NextJS/React project that uses TypeScript.
I know very little about NextJS, React and TypeScript, so I thought it would be a good project to use a coding agent for.
I gave Cursor
the following prompt:
You are tasked with creating the foundation for a NextJS project that will serve as both the frontend and BFF layer. Please generate a basic NextJS project using Create Next App that includes:
- A basic pages structure (e.g., index.js, _app.js).
- Jest and React Testing Library configuration with at least one simple test (e.g., verify that the homepage renders correctly).
Ensure that the project builds successfully and that the initial test passes.
Sadly, I’ve lost the Cursor session, but as far as I remember, Cursor completed a large part of this task. However, it got very confused when completing the configuration of the project.
Confused about Swc vs Babel
Initially, Cursor configured the project to use swc - apparently that’s required by NextJS. But then while attempting to fix some issue, Cursor repeatedly tried to configure the project to use babel. I eventually gave up trying to use Cursor and manually fixed the project.
Thoughts
I suspect that Cursor was confused because it was trained on content that primarily described older versions of NextJS that used Babel. My sense is that the constant evolution of the JavaScript ecosystem is a challenge for both humans and AI coding agents. For AI coding agents, it’s probably worse since they don’t actually know anything. Their “next likely token” prediction algorithm struggles.
Displaying the customer’s orders in the NextJS/React UI
Later, I wanted the React UI to display a customer’s orders. I gave Junie the following prompt:
Below is the OpenAPI specification for the Order Service.
Change
to contain a table showing the results of calling the GET /orders endpoint. Add necessary route etc {"openapi":"3.0.1","info":{"title":"OpenAPI
Here’s the commit. On the one hand, I was impressed. Junie generated a lot of code that, given my limited knowledge of NextJS/React, I would have struggled to write. But on the other hand, the code contained errors.
The errors
Some errors were basic things, such as the wrong path for some imports.
The newly added end-to-end tests were wrong because they went through the login flow even though the user was already logged in.
Also, the handler for the REST endpoint `/orders` in the BFF wasn’t quite right for reasons I cannot explain.
Fixing the errors
Here’s the commit containing the manual fixes for that problem.
It made the `GET /orders` route look similar to another API route that was previously generated by Junie.
This commit:
- moved the route from `app/api/orders/route.ts` to `pages/api/orders.ts`
- changed the handler’s signature
- changed the handler to use `getToken({ req, secret: process.env.NEXTAUTH_SECRET })` to obtain the JWT to pass to the Order Service
At the time, I didn’t investigate why these changes worked, but they did.
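In hindsight, the shape of the fix can be sketched without the framework. The following is a minimal, self-contained TypeScript sketch, not the actual BFF code: the `Req`/`Res` types are stand-ins for Next.js’s `NextApiRequest`/`NextApiResponse`, the local `getToken` is a stub standing in for next-auth’s `getToken`, and `fetchOrders` is a hypothetical helper injected so the example runs on its own.

```typescript
// Minimal stand-ins for Next.js's NextApiRequest/NextApiResponse.
interface Req {
  headers: Record<string, string | undefined>;
}
interface Res {
  statusCode: number;
  body?: unknown;
  status(code: number): Res;
  json(data: unknown): Res;
}

// Stub standing in for next-auth's getToken(); the real one decodes the
// session JWT. Here we just read a bearer token off the request headers.
async function getToken(opts: { req: Req }): Promise<{ accessToken: string } | null> {
  const auth = opts.req.headers['authorization'];
  return auth ? { accessToken: auth.replace('Bearer ', '') } : null;
}

// The pages/api-style handler shape: obtain the JWT from the session and
// forward it to the Order Service via the injected fetchOrders function.
export async function ordersHandler(
  req: Req,
  res: Res,
  fetchOrders: (jwt: string) => Promise<unknown>
): Promise<Res> {
  // Real code: getToken({ req, secret: process.env.NEXTAUTH_SECRET })
  const token = await getToken({ req });
  if (!token) {
    return res.status(401).json({ error: 'Unauthenticated' });
  }
  const orders = await fetchOrders(token.accessToken);
  return res.status(200).json(orders);
}
```

The key point is simply that the handler must extract the session’s JWT and relay it downstream; everything else is plumbing.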
New API routes in Next.JS 13+
I subsequently learned from Github Copilot that `pages/api` is the pre-Next.JS 13 way of defining API routes and `app/api` is the new way.
Here’s the prompt I gave Copilot:
What’s the different between these two file paths: app/api/orders/route.ts and pages/api/orders.ts.
Ironically, my fix consisted of reverting the `GET /orders` handler to the pre-Next.JS 13 way of defining API routes.
Refactored the NextJS/React end-to-end tests to use the page object pattern
The handful of tests were getting pretty messy, and so I asked Junie to refactor them to use the page object pattern. Here’s the prompt:
Refactor the tests in tests/e2e to use the page object pattern
It did an excellent job.
See this commit.
But, there were some minor annoyances.
First, it refused to learn how to run the end-to-end tests to validate the changes (`npm run e2e-test`).
It repeatedly ran some other command that never worked.
I later learned that it’s best if the prompt includes explicit instructions:
- Run unit tests: `npm run unit-test`
- Run end-to-end tests: `npm run e2e-test`
Second, it generated code like this:
const signinStatus = await homePage.getSignInStatus();
expect(signinStatus).toBe('Signed in as user1');
What I wanted was:
await homePage.expectSignInStatusToBe('Signed in as user1');
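The style I wanted can be sketched as a page object whose assertion helpers hide both the element lookup and the comparison. This is a framework-free sketch of my own: the `Page` interface is a hypothetical stand-in for whatever the e2e framework provides (e.g. a Playwright-like page API), and the `#signin-status` selector is invented for illustration.

```typescript
// Hypothetical stand-in for the e2e framework's page API.
interface Page {
  textContent(selector: string): Promise<string | null>;
}

// Page object: encapsulates both the selector and the assertion,
// so tests read as intent rather than mechanics.
class HomePage {
  constructor(private readonly page: Page) {}

  async getSignInStatus(): Promise<string> {
    return (await this.page.textContent('#signin-status')) ?? '';
  }

  // The assertion moves into the page object, hiding the selector
  // and the comparison from the test body.
  async expectSignInStatusToBe(expected: string): Promise<void> {
    const actual = await this.getSignInStatus();
    if (actual !== expected) {
      throw new Error(`Expected sign-in status '${expected}' but was '${actual}'`);
    }
  }
}
```

With this shape, a test is a single line per check, and the selector lives in exactly one place.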
But overall, Junie was quite useful.
The @#$%**
In this section, I will describe scenarios when the AI coding agent behaved as if it were a hungover intern who was completely lost.
Customizing Spring Authorization Server to return the user’s name and email in the JWT
I wanted to customize the Spring Authorization Server to return the user’s name and email as claims in the JWT. I gave Github Copilot the following prompt:
can Spring authorization server store the user’s name and email and return that in the JWT?
On the one hand, Github Copilot correctly recommended using an `OAuth2TokenCustomizer`.
But it was very wrong about practically everything else:
- The type parameter. It suggested `OAuth2TokenCustomizer<OAuth2TokenClaimsContext>` rather than `OAuth2TokenCustomizer<JwtEncodingContext>`.
- The API for registering claims. It suggested `claims.put()` rather than `claims.claim()`.
- Since the type parameter was incorrect, the `OAuth2TokenCustomizer` was not used. Copilot recommended numerous incorrect solutions, including imaginary APIs for configuring Spring Security.
In the end, I read the manual.
Killing processes to resolve port conflicts
In the BFF project, I have a mock server that runs on port 3001 and implements the `GET /orders` endpoint.
The end-to-end tests also start and use this mock server.
Consequently, if the mock server is already running, the tests fail to run.
Previously, Junie would fix the test failure by executing `kill -9 $(lsof -t -i:3001)` and rerunning the tests.
Somehow that seemed fine, even amusing, especially since I would have done the same thing.
But today it added the following code to the end-to-end tests:
```typescript
await new Promise<void>((resolve) => {
  exec('lsof -ti:3001 | xargs kill -9', () => {
    // Ignore error as it means no process was running on that port
    resolve();
  });
});
```
This feels potentially more damaging.
Let’s just hope that a coding agent doesn’t resolve some other problem with a truly destructive command, such as `rm -fr .`.
It’s important to carefully review the code that a coding agent generates.
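One way to make the kill-the-port pattern less reckless is to refuse to kill anything whose command line doesn’t match the process you expect to own the port. The sketch below is my own illustration, not code from the project: the `ProcessInfo` shape (roughly what `ps -o pid=,command=` yields) and the mock-server command name are hypothetical.

```typescript
// Hypothetical shape of one process entry, e.g. parsed from
// `ps -o pid=,command= -p <pids>` output.
interface ProcessInfo {
  pid: number;
  command: string;
}

// Return only the PIDs that are safe to kill: those whose command line
// contains the fragment identifying the process we expect on the port.
function killablePids(processes: ProcessInfo[], expectedCommandFragment: string): number[] {
  return processes
    .filter((p) => p.command.includes(expectedCommandFragment))
    .map((p) => p.pid);
}
```

Anything that doesn’t match, say an unrelated service that happens to be listening on port 3001, survives; the agent-generated `xargs kill -9` has no such guard.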
Some thoughts about the user experience
I have a few concerns about the UX.
Disruption of flow
First, coding agents can be slow. They often take a lot longer than 10 seconds to complete a task. The 10 second limit is important because UX research shows that it’s difficult for users to maintain focus for longer than that. The agent can, in other words, interrupt the user’s flow - something that is critical for productivity.
An agent needs its own workspace
Second, a key benefit of an assistant is that you can delegate a task and then let them complete it while you work on something else. The problem is that agents like Junie run in your IDE. While I suppose I could work on some other task within the IDE, I want to keep their changes separate from mine. I want the agent to run on a separate copy of the code base, and produce the equivalent of a pull request (perhaps more lightweight) that I can review and merge.
Why only one agent?
Third, why stop with just one agent? I’d like to be able to run multiple agents in parallel working on different subtasks. Apparently, that’s coming soon according to this fascinating article by Steve Yegge. Having said that, a human has cognitive limits, and so there is a limit to the number of agents a single developer can use.
Final thoughts so far
- These tools are certainly useful.
- Will they replace developers? No.
- Will they make developers more productive? Yes.
- Coding agents need to automatically run tests (and other commands) so they can fix their mistakes. Manually approving every command execution is slow and tedious. Junie calls this Brave mode! I hope it can’t do anything too dangerous.
- I suspect the confusion-in, confusion-out principle applies. If a coding agent’s training data is confusing, there’s a good chance it will be too.
- Using an agent to solve a problem means that you have an additional problem: getting the prompts right.
- You need to understand and verify every single change they make.
- Once the tests pass, run `git commit` and, ideally, `git push`, to save your work.
Need help with accelerating software delivery?
I’m available to help your organization improve agility and competitiveness through better software architecture: training workshops, architecture reviews, etc.