Gemini's 3 line execution mode

I had a multi-hour conversation with a single Gemini session and noticed that the responses dropped in quality and started coming back very fast. I’m familiar with context limits and context poisoning, but the sudden brevity still surprised me. I asked the agent what was going on and learned that, deep into a work session, 3-line responses are a feature, not a bug. Nothing too deep here, just a fun way to stumble onto a technical detail through a random observation.

January 6, 2026 · 1 min · 86 words

TIL: The two phases in LLM inference

This is a TIL of a TIL. Simon Willison wrote about using an NVIDIA DGX Spark with an Apple Mac Studio for faster inference, with all the details here. He talks about the two phases of inference: Prefill: influences Time-To-First-Token (TTFT). Decode: influences Tokens Per Second (TPS). Prefill: reads the prompt and builds a Key-Value cache for each transformer layer in the model. It is bound by compute as it initializes the model’s internal state and does a lot of matrix multiplication. ...
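A minimal sketch of how the two phases show up in streaming latency: TTFT is the wait before the first token (prefill), while TPS is the steady rate afterward (decode). The `fake_stream` stand-in and its timings are made up for illustration, not taken from the post:

```python
import time

def measure_streaming(token_stream):
    """Time a token stream: TTFT = delay to first token (prefill),
    TPS = tokens per second over the remaining tokens (decode)."""
    start = time.perf_counter()
    first = None
    last = start
    count = 0
    for _ in token_stream:
        last = time.perf_counter()
        if first is None:
            first = last
        count += 1
    ttft = first - start
    decode_time = last - first
    tps = (count - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, tps

def fake_stream(n=5, prefill=0.2, per_token=0.05):
    # Stand-in for a real LLM stream: one prefill pause, then steady tokens.
    time.sleep(prefill)
    for i in range(n):
        yield f"tok{i}"
        time.sleep(per_token)

ttft, tps = measure_streaming(fake_stream())
```

With these fake timings, TTFT lands near the 0.2 s prefill pause and TPS near 1 / 0.05 = 20, which is why prefill speed and decode speed can vary independently across hardware.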

November 2, 2025 · 1 min · 107 words