This will be the last post on vibe coding for now, I promise (at least about Google Gemini 2.5 Pro Exp). I did some vibe coding every weekend in April, just to get a change of pace from work (and for science), starting with a ‘what if I could not code’ experiment (not a great success), and finishing with two probably useful tools that I wanted.

Last week Google made the Gemini 2.5 Pro Exp (and Flash) models available commercially, and reduced the free tier’s daily input token limit quite a lot. The new limits are (as of now) a million input tokens and 25 requests per day (no idea about output tokens). The maximum size of a single request is probably still 250k tokens (I hit it a couple of times earlier; I am not sure if it was reduced, as the most recent project was smaller and I didn’t get beyond 100k token requests).

Anyway, this made it too slow for my vibe coding weekends (I hit the rate limits hard last weekend), but as a recap, here is what I got done (LoC counts are what is in git; all had lots of iterations, so the repo churn is a lot higher):

  1. Data format converter (unfinished - see earlier post - it seemed to be underspecified) - 800 LoC
  2. DNCP/SHSP2 protocol implementation (unfinished - see previous post - fingon/go-dncp: DNCP implementation in Go) - 4k LoC
  3. Matrix backup tool (in use by me - https://github.com/fingon/go-matrixbackup) - ~1k LoC
  4. SSS memory vault (hopefully soon in use by me - not quite finished - fingon/sssmemvault: Shamir’s Secret Sharing based in-memory secret vault) - ~2.3k LoC - see the sketch of the underlying primitive after this list
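
As an aside on item 4: the core primitive behind it is Shamir’s Secret Sharing, which splits a secret into N shares such that any K of them can reconstruct it while fewer reveal nothing. Below is a minimal sketch of that primitive only, using the hashicorp/vault/shamir package purely for illustration - this is not code from sssmemvault.

```go
// Illustration of the Shamir's Secret Sharing primitive; NOT code from sssmemvault.
package main

import (
	"fmt"

	"github.com/hashicorp/vault/shamir"
)

func main() {
	secret := []byte("example in-memory secret")

	// Split the secret into 5 shares, any 3 of which can reconstruct it.
	shares, err := shamir.Split(secret, 5, 3)
	if err != nil {
		panic(err)
	}

	// Recombine using only a threshold-sized subset of the shares.
	recovered, err := shamir.Combine(shares[:3])
	if err != nil {
		panic(err)
	}
	fmt.Println(string(recovered)) // "example in-memory secret"
}
```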

Once the input token limit went down, doing more than a couple of iterations per day on a big codebase was no longer feasible (this is why I gave up on finishing DNCP and worked on the other tools I wanted).

Of course, the token costs are still reasonable, and if I feel the urge I will definitely just pay for it (or for something else). Writing some random open source software while being a trendy ‘vibe coder’ felt like a good idea at the time, but once I started hitting the rate limits too much, it was no longer worth it for me.

Lessons learned

Context size (and its recall) matters

A high-functioning model with a large context and reasonably good recall of the whole input is a real game changer. I tossed up to 150k tokens per request into Gemini, and it used them all quite well to produce its outputs (either major rewrites of a codebase, or implementing lots of stuff from scratch based on e.g. IETF RFCs). For example, as I was pretty sure the Go Matrix library I wanted to use was not part of Google’s training corpus, I fed most of its source as input when working on the Matrix backup tool, and told aider to write what I wanted with relatively detailed prompts.

Prompt adherence is hard even for Gemini 2.5

I fought with the model quite a bit to get the kind of code I wanted; while for the most part it seemed to obey happily, some unorthodox choices I prefer were harder for it, as they were not in its training data.

Produced code quality still leaves a lot to be desired out of the box

The model had a nasty habit of producing crazily long functions with insanely deep nesting of if-else-if-else branches. While the code was mostly fine (it worked), it was horrible to review and potentially to maintain manually later on.

Having said that, with further prompts, Gemini did improve the code in subsequent requests.
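
To make that concrete, here is a made-up sketch (not actual Gemini output) of the shape the raw code tended to take, and the flatter early-return style I kept prompting it toward:

```go
// Made-up sketch of the nesting problem; not actual Gemini output.
package example

import "fmt"

// handle stands in for whatever per-item work the real functions did.
func handle(item string) error {
	fmt.Println("handling", item)
	return nil
}

// What the raw output tended to look like: deep if-else-if-else nesting.
func processNested(items []string) error {
	if items != nil {
		if len(items) > 0 {
			for _, item := range items {
				if item != "" {
					if err := handle(item); err != nil {
						return err
					}
				} else {
					return fmt.Errorf("empty item")
				}
			}
			return nil
		} else {
			return fmt.Errorf("no items")
		}
	} else {
		return fmt.Errorf("nil items")
	}
}

// What I prompted it toward: guard clauses and early returns.
func processFlat(items []string) error {
	if items == nil {
		return fmt.Errorf("nil items")
	}
	if len(items) == 0 {
		return fmt.Errorf("no items")
	}
	for _, item := range items {
		if item == "" {
			return fmt.Errorf("empty item")
		}
		if err := handle(item); err != nil {
			return err
		}
	}
	return nil
}
```

The two behave the same; the difference is purely in how reviewable the control flow is.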

Sometimes you should or must get your hands dirty

There were some relatively simple bugs that Gemini introduced and could not fix correctly, despite repeated prompts. Similarly, Go import statement handling was hard for it, as were some other things that caused syntax errors. I could either rerun it (and wait for the fix to happen), or fix the imports/syntax errors manually (or use goimports), which was often faster and, especially given the free tier rate limits, ‘the way’ to get as much as possible done during the weekend.
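
For the import case, ‘use goimports’ just meant running that command-line tool over the offending file; it adds the missing imports and removes unused ones. Purely as an illustration (not code from any of the projects above), the same fix-up can also be applied programmatically via golang.org/x/tools/imports:

```go
// Sketch of the goimports-style fix-up, applied programmatically for illustration.
package main

import (
	"fmt"

	"golang.org/x/tools/imports"
)

func main() {
	// Typical model output: uses fmt and strings but forgot the import block.
	src := []byte(`package demo

func Greet(name string) string {
	return fmt.Sprintf("hello %s", strings.ToUpper(name))
}
`)

	// imports.Process adds the missing imports (and drops unused ones) -
	// the same clean-up that was often faster than another model round-trip.
	fixed, err := imports.Process("demo.go", src, &imports.Options{
		Comments:  true,
		TabIndent: true,
		TabWidth:  8,
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(fixed))
}
```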

Reflections

I have written code for a long time. I think this vibe coding exercise was the first time I felt that my primary role was not actually writing the code, but instead guiding a fast but very stupid intern to write it (I have dabbled with various tools before, too). The outcome was quite awesome in hindsight: typically I perhaps write hundreds to at most a thousand lines of hobby project code in a day, but with the help of Aider + Gemini, I got a lot more done during the weekends allocated to the experiment, and probably with only about one working day’s worth of time used per weekend.

Still, if I did not know how to code, I could not have guided the model (a.k.a. the stupid intern) to produce what I wanted, or have it fix either the implementation or its bugs (correctly). So it remains to be seen how well these ‘coding is dead’ stories pan out; I remain skeptical. For example, at work I have been working on a code generator on and off since December, and for that the models have not been that helpful.