HomeNetFlow and facelift experiment feat. Kimi K2.6, Minimax M2.5, and GPT-5.4

Background: I have been working on a number of vibe-coded apps recently, after I gave up on my most recent startup and before starting to work somewhere else. So I have done lots of projects I never had the time for before. HomeNetFlow: Lixie's first iteration, which I wrote by hand in 2024 (see Observability at home), was and still is useful. It is still only about categorizing log lines by hand, and then having those rules applied to logs at scale using Vector. My home infrastructure still uses rules generated by it, and I quite often look at the logs (which are filtered based on those rules). ...

21.4.2026 · 4 min · 756 words · Markus Stenberg

Custom coding agent sandboxing with nono

I have been ‘enjoying’ some awkward moments with the default sandboxing of Codex CLI, Claude Code, and (the lack of it in) OpenCode. I settled on a single sandboxing solution, and it is nono - Next-Generation Agent Security. The problem: all coding agents use the current shell environment to launch the tools they use. Those with sandboxing have rules set up to prevent ‘invalid’ use, but in practice they break often and are painful (or impossible) to configure properly per tool. ...

14.4.2026 · 3 min · 464 words · Markus Stenberg

ChatGPT Plus - still worth it these days?

Updated 11.4.2026: Added a second-day case too, without using fast mode. There has been a lot of brouhaha on e.g. Reddit about the recent (10.4.2026) changes to plans: OpenAI announced a new 100€ ‘pro lite’ tier, and there was worry that the ‘plus’ tier was being cannibalized for it. So I decided to use up my 5-hour quota using fast mode gpt-5.4 (high for planning, medium for implementation), and then a second time without fast mode, with the same model and thinking parameters. ...

10.4.2026 · 3 min · 492 words · Markus Stenberg

A month of Codex CLI with ChatGPT Plus

I started converting some of my hobby projects to Go using Codex exactly a month ago, and developing two iOS apps once the initial honeymoon period with Codex CLI was over (I felt it was good enough in the trial conversion effort). Here are some notes on my experiences, and a bonus rant near the end. My thoughts about Codex CLI and OpenAI models in general: overall it has been pretty good. While they are clearly iterating on how e.g. sub-agents work, the basic flow is pretty robust. It is annoying that you cannot use a separate model for the planning and build stages, though - just different thinking levels. I have wound up using ‘high’ for planning and ‘medium’ for implementation for the most part, most recently using GPT 5.4. ...

25.3.2026 · 7 min · 1332 words · Markus Stenberg

First impressions: GPT 5.4 + GPT 5.3 codex spark!

After a brew upgrade (and the GPT 5.4 announcement last night) I was excited to see what was up. Interesting. Try 1: I did one minor feature using planning mode (gpt-5.4 high) and implemented it using gpt-5.3 codex spark. Codex spark is backed by super-fast Cerebras megachips, so I was optimistic about the speed (if not the quality). Observations:
- REALLY fast (a couple of seconds to implement something that was a couple of hundred lines long)
- A bit stupid, or literal? It did not run unit tests by default
- The unit tests had an error, which it fixed
- It had used 62% of its context by this time(!) - seems like an order of magnitude less context than 5.3 Codex

Try 2: Without planning mode, on high, I asked it to refactor logging in the app I am working on. It actually did what was asked, but again did not run the tests, as I had not specifically asked it to run them too. The outcome was also pretty ugly, as it did not look at the API surface of the logging library (or know it), so I had to send gpt-5.4 after it to clean up. ...

6.3.2026 · 2 min · 339 words · Markus Stenberg

Another day, another vibe coded tool (ta-export)

I spent some time today finally creating (publicly available) export-handling code for the Time Atlas app. While I enjoyed using (and developing) the app to some extent, it no longer fulfills some of my wishes and I no longer work there, so I deleted the app (and no longer have access to the internal tooling for dealing with GPX exports). What could be better in the app:
- I miss the old comments on journal notes that pre-1.0 versions had (this export tool preserves them)
- The battery usage was and is a bit janky (and could be optimized, but has not been)
- The GPS track (when actually tracking) is suboptimal compared to what is available (I would rather filter the data myself than get filtered data for my long-term storage; storing infrequent points does not really help battery usage much, compared to just keeping the radio off when it is not really needed). Storage is cheap, but if you don't have the (original) data, you don't have the option to refine it further later on if you come up with a better analysis algorithm or more ways to use the raw data.

Switch .. somewhere ..: This week I figured I might as well switch to another GPS tracker app, and move my journal entries back to Bear, where I have 27 years of journal entries and some posts from some of my blogs as well. ...

5.3.2026 · 3 min · 440 words · Markus Stenberg

Developing stuff with LLMs on your own credit card

After a week of agentic coding on my own credit card, here are some notes, mostly to organize my own thoughts. I don't like the term vibe coding, although most of this code is only cursorily reviewed, as the 29k LoC produced this week would be a full-time job just to read through once or twice. OpenCode Zen (free tier): As of today, it has 3 models:
- Big Pickle (unknown, presumably an older GLM?)
- Minimax M2.5 - a 230B MoE model
- Trinity Large Preview from Arcee AI - a 400B MoE model

I am not sure whether they are quantized for economic reasons or not, but at least the earlier free GLM 5 did not perform very well in my tests last week. Trinity did not impress me either (it seemed quite slow and less capable than Minimax), but Minimax I am using as my daily driver when I don't need a more powerful model. ...

3.3.2026 · 7 min · 1351 words · Markus Stenberg

filemirror vibe coding

This is my experiment to see how far you can get using the free models currently available in OpenCode Zen. As they are (mostly) open weights, I could also run them locally if I felt like it, someday. What was available today:
- GLM-5 (with some rate limiting, unfortunately)
- MiniMax M2.5
- Big Pickle (?)
- Trinity Large Preview (?)

Scaffolding: I copy pasta'd some configuration files (e.g. Makefile, .golangci-lint.yml) from another project, wrote a README.md (which is a bit inconsistent with the prompt by design, to see which model honors it or whether it asks questions), and also left one error in place in the Makefile (the lint target doesn't work because it depends on another target which is no longer in the file). ...
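The deliberate Makefile breakage described above can be sketched roughly like this (target and tool names are my assumptions, not the project's actual files): the lint target names a prerequisite that no longer exists in the file, so make fails before the linter ever runs.

```makefile
# Hypothetical sketch, not the actual project Makefile: `lint` still
# depends on a `tools` target that was removed from the file, so
# `make lint` fails with "No rule to make target 'tools'".
.PHONY: lint test

lint: tools
	golangci-lint run ./...

test:
	go test ./...
```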

23.2.2026 · 4 min · 771 words · Markus Stenberg

Beer consumption analysis using LLMs

I have been working on a life-tracking app since last year. To analyze the data I have logged with it, I queried it for ‘beer in 2025’ and analyzed the results. The dataset itself I will not publish here, but there are three types of relevant data in it (in parentheses, how they are encoded in the Markdown output that I pass to the LLMs):
- Place visits involving beer (e.g. * 2 hours spent in <insert pub here>)
- Journal entries mentioning beer (e.g. I had beer and pizza for lunch)
- Explicitly counted beer logging (e.g. - 3 count beer)

Baseline - shell:

egrep 'count beer$' 20250528-beer.md | cut -d ' ' -f 2 | awk '{sum += $1} END {print sum}'
17

So the expectation is that the number should be AT LEAST 17 beers, but ideally more, as there are some journal entries which mention beer. ...
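The shell baseline only sums the explicitly counted lines. A minimal Python equivalent of that baseline (the line formats are from the post; the function name and sample data are mine) might look like:

```python
import re

def count_beers(markdown_text: str) -> int:
    """Sum the counts from explicitly logged lines like '- 3 count beer'."""
    total = 0
    for line in markdown_text.splitlines():
        m = re.match(r"-\s+(\d+)\s+count beer$", line)
        if m:
            total += int(m.group(1))
    return total

# Tiny sample mixing all three encodings; only the counted lines contribute.
sample = "\n".join([
    "* 2 hours spent in <insert pub here>",
    "I had beer and pizza for lunch",
    "- 3 count beer",
    "- 2 count beer",
])
print(count_beers(sample))  # prints 5
```

Like the shell pipeline, this deliberately ignores place visits and free-text journal mentions, which is why the LLM-derived number should come out at least as large.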

28.5.2025 · 4 min · 726 words · Markus Stenberg

April vibe coding summary

This will be the last post on vibe coding for now, I promise.. (at least about Google Gemini 2.5 Pro Exp). I did some vibe coding every weekend in April, just to get a change of pace from work (and for science), starting with a ‘what if I could not code’ experiment (not a great success), and finishing with two probably useful tools that I wanted. Last week Google made Gemini 2.5 Pro Exp flash available commercially, and reduced the free input token rate limit per day quite a lot. The new limits are (as of now) a million input tokens and 25 requests per day (no idea about output tokens). The single-request maximum size is probably still(?) 250k tokens (I hit it a couple of times earlier; not sure if it was reduced, as the most recent project was smaller and I didn't get beyond 100k-token requests). ...

28.4.2025 · 5 min · 862 words · Markus Stenberg