We put Claude Code in Rollercoaster Tycoon

(labs.ramp.com)

526 points by iamwil25 days ago

39 comments

ninkendo20 days ago
Related: I’ve always found it crazy that my LLM has access to such terrible tools compared to mine. It’s left with grepping for function signatures, sending diffs for patching, and running `cat` to read all the code at once. I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc. Is anyone working on making it so LLMs get better tools for actually writing/refactoring code? Or is there some “bitter lesson”-like thing that says effort is always better spent just increasing the context size and slurping up all the code at once?
- nbardy20 days ago
 > Claude Code officially added native support for the Language Server Protocol (LSP) in version 2.0.74, released in December 2025.
 I think from training it's still biased towards simple tooling. But also, there is real power to simple tools: a small set of general purpose tools beats a bunch of narrow specific use case tools. It's easier for humans to use high level tools, but for LLM's they can instantly compose the low level tools for their use case and learn to generalize; it's like writing insane perl one liners is second nature for them compared to us. If you watch the tool calls you'll see they write a ton of one off small python programs to test, validate, explore, etc... If you think about it, any time you use a tool there is probably a 20 line python program that is more fit to your use case; it's just that it would take you too long to write it, but for an LLM that's 0.5 seconds.
 - frumplestlatz20 days ago
 > but for LLM's they can instantly compose the low level tools for their use case and learn to generalize
 Hard disagree; this wastes enormous amounts of tokens, and massively pollutes the context window. In addition to being a waste of resources (compute, money, time), this also significantly decreases their output quality. Manually combining painfully rudimentary tools to achieve simple, obvious things -- over and over and over -- is *not* an effective use of a human mind or an expensive LLM. Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems. I've written[1] custom MCP servers to provide narrowly focused API search and code indexing, build system wrappers that filter all spurious noise and present only the material warnings and errors, "edit file" hooks that speculatively trigger builds before the LLM even has to ask for it, and a litany of other similar tools. Due to LLMs' annoying tendency to fall back on inefficient shell scripting, I also had to write a full bash syntax parser and shell script rewriting ruleset engine to allow me to silently and trivially rewrite their shell invocations to more optimal forms that use the other tools I've written, so that they don't have to do expensive, wasteful things like pipe build output through `head`/`tail`/`grep`/etc, which results in them invariably missing important information, and either wandering off into the weeds, or -- if they notice -- consuming a huge number of turns (and time) re-running the commands to get what they need. Instead, they call build systems directly with arbitrary options, | filters, etc, and magically the command gets rewritten to something that will produce the ideal output they actually need, without eating more context and unnecessary turns. LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster. [1] And by "I've written", I mean I had an LLM do it.
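 For readers wondering what a "build system wrapper that filters spurious noise" might look like, here is a minimal sketch. It is not the commenter's tool: the `DIAGNOSTIC` pattern and the `filter_build_output` name are hypothetical, and the pattern assumes GCC/Clang-style `file:line:col: level: message` diagnostics.

```python
import re

# Hypothetical pattern for GCC/Clang-style diagnostics, e.g.
# "src/main.c:10:5: error: expected ';'". Other toolchains need other patterns.
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?:\d+:)?\s*"
    r"(?P<level>warning|error):\s*(?P<msg>.*)$"
)


def filter_build_output(raw: str) -> list[str]:
    """Keep unique warning/error lines in first-seen order; drop everything else."""
    seen = set()
    kept = []
    for line in raw.splitlines():
        match = DIAGNOSTIC.match(line.strip())
        if match is None:
            continue  # progress output, compiler invocations, etc.
        key = (match["file"], match["line"], match["level"], match["msg"])
        if key in seen:
            continue  # the same diagnostic repeated, e.g. via a shared header
        seen.add(key)
        kept.append(f'{match["file"]}:{match["line"]}: {match["level"]}: {match["msg"]}')
    return kept
```

 The point of the design is that the model sees a handful of deduplicated diagnostics instead of the full build log, so nothing important is lost to a blind `head` or `tail`.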
 - forty20 days ago
 Note that the Claude Code LSP integration was actually broken for a while after it was released, so make sure you have a very recent version if you want to try it out. However, as the parent comment said, it seems to always grep instead, unless explicitly told to use the LSP tool.
 - cududa20 days ago
 Correct. If you try to create a coding agent using the raw Codex or Claude Code API and you build your own “write tool”, and don’t give the model its “native patch tool”, 70%+ of the time its write/patch fails because it tries to do the operation using the write/patch tool it was trained on.
 - htrp19 days ago
 part of the value add of owning both the model and the tooling
 - cm218720 days ago
 We are back to RISC vs CISC!
 - htrp19 days ago
 history doesn't repeat but it definitely rhymes
- KronisLV20 days ago
 > I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc
 I am so surprised that all of the AI tooling mostly revolves around VSC or its forks and that JetBrains seem to not really have done anything revolutionary in the space. With how good their refactoring and code inspection tools are, you’d really think they’d pass that context information to AI models and that they’d be leaps and bounds ahead.
 - harikb20 days ago
 Recently, all these agents can talk LSP (Language Server Protocol), so it should get better soon. That said, yeah, they don't seem to default to using `ripgrep` when that is clearly better than `grep`.
 - virtualritz19 days ago
 What you really want is ast-grep[1]. Ripgrep is much faster than grep. But the result is not more concise and tokens are wasted. I think Codex uses ast-grep by default, if installed; Claude has to be instructed? [1] https://ast-grep.github.io/
 - wahnfrieden20 days ago
 Codex likes to use ripgrep.
 - je4220 days ago
 Claude's search tool uses ripgrep, which is embedded in Claude.
 - eek212120 days ago
 Are you? I'm not surprised at all, considering that the biggest investment juggernaut in AI is also the author of VSC. I wonder what the connection is? ;)
 - eru20 days ago
 Well, Google also has their own AIs and lots of money to throw around.
 - tvink20 days ago
 Unfortunately they have abysmal design sense for TUI and an inability to recognize the good feature requests they are getting.
 - pjmlp20 days ago
 And yet, contrary to Microsoft and Apple, they outsource most of their main development tools. Go and Dart hardly get the love across their SDKs that Objective-C, Swift, C#, and VB get from their owners. Same with IDE tooling, fully dependent on JetBrains and Microsoft.
 - penneyd20 days ago
 Agreed - this seems like a no brainer, surely this is something that is being worked on.
 - htrp19 days ago
 Jetbrains is trying but I feel like they're very very behind in the space
 - epicureanideal19 days ago
 Claude and other LLMs can be used through JetBrains, and the IDE provides a significantly better experience than VS Code in my opinion.
 - PlatoIsADisease19 days ago
 I haven't seen JetBrains as 'great'. I think they have a strong marketing team that gets into universities and potentially astroturfs on the internet, but I have always found better tools for every language. Although, I can't remember what I ended up choosing for PHP.
- mulmboy20 days ago
 LLMs aren't like you or me. They can comprehend large quantities of code quickly and piece things together easily from scattered fragments, so go-to-reference etc. become much less important. Of course things change as the number of usages of a symbol becomes large, but in most cases the LLM can just make perfect sense of things via grep. To provide it access to refactoring as a tool also risks confusing it via too many tools. It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.
- fragmede20 days ago
 Anthropic, for one.
 > Added LSP (Language Server Protocol) tool for code intelligence features like go-to-definition, find references, and hover documentation
 https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md#2074
 - novaleaf20 days ago
 their c# LSP theoretically worked for a week or so (I never saw it in action though), but now it always errors on launch :(
 - forty20 days ago
 There was an issue in Claude Code which is fixed in the latest release.
- hippo2220 days ago
 If you can read fast enough, grepping is probably faster than waiting for a compiler to tell you anything.
 - gf00020 days ago
 Faster for worse results, though. Determining the source of a symbol is not as trivial as finding the same piece of text somewhere else; it should also reliably be able to differentiate among them. What better source for that than the compiler itself?
 - ninkendo20 days ago
 Yeah, especially for languages that make heavy use of type inference. There’s nothing you can really grep for most of the time… to really know “who’s using this code” you need to know what the compiler knows. An LLM can likely approach compiler-level knowledge just from being smart and understanding what it’s reading, but it costs a lot of context to do this. Giving the LLM access to what the compiler knows as an API seems like it’s a huge area for improvement.
 - squirrellous20 days ago
 It depends on the language and codebase. For something very dynamic like Python it may be the case that grepping finds real references to a symbol that won’t be found by a language server. Also language servers may not work with cross-language interfaces or codegen situations as well as grep. OTOH for a giant monorepo, grep probably won’t work very well.
- fancy_pantser20 days ago
 Zed Editor gives the LLM tools that use the LSP as you'd expect as a normal IDE user, like "go to symbol definition" so it greps a lot less.
- selcuka20 days ago
 JetBrains IDEs come with an MCP server that supports some refactoring tools [1]:
 > Starting with version 2025.2, IntelliJ IDEA comes with an integrated MCP server, allowing external clients such as Claude Desktop, Cursor, Codex, VS Code, and others to access tools provided by the IDE. This provides users with the ability to control and interact with JetBrains IDEs without leaving their application of choice.
 [1] https://www.jetbrains.com/help/idea/mcp-server.html#supported-tools
- ricw20 days ago
 Tidewave.ai does exactly that. It’s made Claude Code so much more functional. It provides MCP servers to: search all your code efficiently; search all documentation for libraries; access your database and get real data samples (not just abstract data types); select design components from your Figma project and implement them for you; and let Claude see what is rendered in the browser. It’s basically the IDE for your LLM client. It really closes the loop and has made Claude and myself so much more productive. Highly recommended and cheap at $10/month. PS: my personal opinion. I have zero affiliation with them.
- Wowfunhappy19 days ago
 LLMs operate on text. They can take in text, and they can produce text. Yes, some LLMs can also read and even produce images, but at least as of today, they are clearly much better at using text[1]. So cat, ripgrep, etc are the right tools for them. They need a command line, not a GUI. [1] Maybe you'd argue that Nano Banana is pretty good. But would you say its prompt adherence is good enough to produce, say, a working Scratch program?
 - kelipso19 days ago
 Inputs to functions are text, as in variables, or file names, directory names, symbol names with symbol searching. Outputs you get from these functions for things like symbol searching is text too, or at least easily reformatted to text. Like API calls are all just text input and output.
 - Wowfunhappy19 days ago
 Yes, and I frequently see Claude Code start with tools that retrieve these things when it's doing work. What are you surprised it isn't using?
- JimDabell20 days ago
 Kit looks like a good step in this direction: https://github.com/cased/kit
- girvo20 days ago
 You can give agents the ability to check VSCode Diagnostics, LSP servers and the like. But they constantly ignore them and use their base CLI tools instead, it drives me batty. No matter what I put in AGENTS.md or similar, they always just ignore the more advanced tooling IME.
 - worksonmine20 days ago
 Doesn't have to be a bad thing, not all languages have good LSP support. If the AI can optimize for simple cross-language tools it won't be as dependent on the LSP implementation. I used grep and simple ctags to program in vanilla vim for years. It can be more useful than you'd think. I do like the LSP in Neovim and use it a lot, but I don't need it.
 - girvo20 days ago
 I also lived in ctags land, but gosh I don’t miss it. LSPs are a step change, and most languages do have either an actual implementation or something similar enough that’s still more powerful than bare strings. It’s faster, too, as the model doesn’t need to scan for info, but again it really likes to try not to use it. Of course I still use rg and fd to traverse things, cli tools are powerful. I just wish LLMs could be made to use more powerful tools reliably!
- hahahahhaah20 days ago
 An LSP MCP?
 - ninkendo20 days ago
 Yeah, or something even smarter than that. If you are willing to go language-specific, the tooling can be incredibly rich if you go through the effort. I’ve written some Rust compiler drivers for domain-specific use cases, and you can hook into phases of the compiler where you have amazingly detailed context about every symbol in the code. All manner of type metadata, locations where values are dropped, everything is annotated with spans of source locations too. It seems like a worthy effort to index all of it and make it available behind a standard query interface the LLM can use. You can even write code this way, I think rustfmt hooks into the same pipeline to produce formatted code. I’ve always wished there were richer tools available to do what my IDE already does, but without needing to use the UI. Make it a standard API or even just CLI, and free it from the dependency on my IDE. It’d be very worth looking into I think.
 - quantummagic20 days ago
 If the compiler just dumped all that data out as structured text, you could use current LLMs to swallow it in a single gulp.
 - ninkendo19 days ago
 Well the point is to avoid them needing to swallow it in a single gulp… after all, the source code is already all the information you need to get all this metadata. The use cases I have in mind are for codebases with many millions of lines of code, where just dumping it all into the context is unreasonably expensive. In these scenarios, it’d be beneficial to give the LLM a sort of SQL-like language it can use to prod at the code base in small chunks. In fact I keep thinking of SQL as an example in my head, but maybe it’s best to take it literally: why don’t we have a SQL for source code? Why can’t I do “select function.name from functions where parameters contains …” or similar (with clever subselects, joins, etc) to get back whatever exists in the code? It’s something I always wanted in general, not just for LLMs. But LLMs could make excellent use of it if there’s simply not enough context size to reasonably slurp up all the code.
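 A toy version of the "SQL for source code" idea can be sketched with Python's standard `ast` and `sqlite3` modules. This is only an illustration under loud assumptions: the two-table schema, the `index_functions` helper, and the `SAMPLE` source are all made up here, and it indexes only plain positional parameters of Python functions rather than anything a real compiler knows.

```python
import ast
import sqlite3


def index_functions(source: str) -> sqlite3.Connection:
    """Parse Python source and load its function definitions into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT, lineno INTEGER)")
    conn.execute("CREATE TABLE parameters (function_id INTEGER, name TEXT, position INTEGER)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            cursor = conn.execute(
                "INSERT INTO functions (name, lineno) VALUES (?, ?)",
                (node.name, node.lineno),
            )
            # Only ordinary positional parameters; *args, **kwargs and
            # keyword-only parameters are ignored in this sketch.
            for position, arg in enumerate(node.args.args):
                conn.execute(
                    "INSERT INTO parameters VALUES (?, ?, ?)",
                    (cursor.lastrowid, arg.arg, position),
                )
    return conn


# Hypothetical code base to query.
SAMPLE = '''
def connect(host, port, timeout):
    pass

def send(payload, timeout):
    pass

def close():
    pass
'''

conn = index_functions(SAMPLE)
# "select function.name from functions where parameters contains 'timeout'"
rows = conn.execute(
    "SELECT f.name FROM functions f "
    "JOIN parameters p ON p.function_id = f.id "
    "WHERE p.name = ? ORDER BY f.lineno",
    ("timeout",),
).fetchall()
print(rows)  # [('connect',), ('send',)]
```

 The query returns two short rows instead of the whole file, which is the property the comment is after for code bases too large to fit in context.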
- rudedogg20 days ago
 LSP also kind of sucks. But the problem is all the big companies want big valuations, so they only chase generic solutions. That's why everything is a VS Code clone, etc. https://paulgraham.com/ds.html
 - dexwiz20 days ago
 I've never used an LSP plugin half as good as a JetBrains IDE.
 - immibis20 days ago
 Always wondered what happened to the era of IDEs actually knowing the language you're using.
- ramraj0720 days ago
 Not coding agents but we do a lot of work trying to find the best tools, and the result is always that the simplest possible general tool that can get the job done always beats a suite of complicated tools and rules on how to use them.
 - eru20 days ago
 Well, jump to definition isn't exactly complicated? And you can use whatever interface the language servers already use to expose that functionality to eg vscode?
 - jhasse20 days ago
 It can be: What definition to jump to if there are multiple (e.g. multiple Translation Units)? What if the function is overloaded and none of the types match? With grep it's easy: Always shows everything that matches.
 - eru20 days ago
 Sure, there might be multiple definitions to jump to. With grep you get lots of false positives, and for some languages you need a lot of extra rules to know what to grep for. (Eg in Python you might read `+` in the caller, but you actually need to grep for __add__ to find the definition.)
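 The `+` versus `__add__` point is easy to see in a small example. The `Money` class below is hypothetical and exists only to show that the text at the call site and the text at the definition share nothing a plain grep could match on.

```python
class Money:
    """Hypothetical value type used only to illustrate operator overloading."""

    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        # This is what runs for `a + b`, yet the name `__add__`
        # never appears where the addition is written.
        return Money(self.cents + other.cents)


# Call site: grepping this line for "add" finds nothing.
total = Money(150) + Money(275)
print(total.cents)  # 425
```

 A language server resolves the `+` to `Money.__add__` from the operand types; grep needs the reader to already know the dunder-method rule.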
- elif19 days ago
 Surely there is an embedding for emacs giving it full elisp control
- BryantD20 days ago
 This isn’t completely the answer to what you want but skills do open a lot of doors here. Anything you can do on a command line can turn into a skill, after all.
- karlgkk20 days ago
 I’ve been saying this for a while. CPU demand is about to go through the roof. The way I think about it, to get these tools to be most effective you have to be able to page things in and out of their context windows. What was once a couple of queries is now gonna be dozens or hundreds or even more from the LLM. For code, that means querying the AST, and querying it in a way that allows you to limit the output. I wonder which SAST vendor Anthropic will buy.
- throwawaygo19 days ago
 Workin on it
Jaysobel20 days ago
Author here - some bonus links!
Session transcript using Simon Willison's claude-code-transcripts: https://htmlpreview.github.io/?https://gist.githubusercontent.com/jaysobel/dfeed9a65ce7209274acf9ada0eaa65e/raw/claude_code_rollercoaster_tycoon_transcript.html
Reddit post: https://www.reddit.com/r/ClaudeAI/comments/1q9fen5/claude_code_in_rollercoaster_tycoon/
OpenRCT2!!: https://github.com/jaysobel/OpenRCT2
Project repo: https://github.com/jaysobel/OpenRCT2
- theptip20 days ago
 Did you eval using screenshots or some sort of rendered visualization instead of the CLI? I wonder if Claude has better visual intelligence when viewing images (lots of these in its training set) rather than ascii schematics (probably very few of these in the corpus).
 - cheema3320 days ago
 Computer use and screenshots are context intensive. Text is not. The more context you give to an LLM, the dumber it gets. Some people think at 40% context utilization, the LLM starts to get into the dumb zone. That is where the limitations are as of today. This is why CLI based tools like Claude Code are so good. And any attempt at computer use has fallen by the wayside. There are some potential solutions to this problem that come to mind. Use subagents to isolate the interesting bits about a screenshot and only feed that to the main agent with a summary. This will all still have a significantly higher token usage compared to a text based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.
 - fragmede20 days ago
 > And any attempt at computer use has fallen by the wayside.
 You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :) https://claude.com/blog/cowork-research-preview https://news.ycombinator.com/item?id=46593022
 More to the point though, you should be using Agents in Claude Code to limit context pollution. Agents run with their own context, and then only return salient details. Eg, I have an Agent to run "make" and return the return status and just the first error message if there is one. This means the hundreds/thousands of lines of compilation don't pollute the main Claude Code context, letting me get more builds in before I run out of context there.
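 The reduction step described here (run the build, hand back only the exit status and the first error) can be sketched in a few lines. This is not how Claude Code agents are defined; `first_error_line` and `summarize_build` are hypothetical helpers, and treating any line containing "error" as an error message is a deliberately crude assumption.

```python
import subprocess
from typing import Optional


def first_error_line(output: str) -> Optional[str]:
    """Return the first line that mentions an error, or None if there is none."""
    for line in output.splitlines():
        if "error" in line.lower():  # crude heuristic, assumed for this sketch
            return line.strip()
    return None


def summarize_build(command: list[str]) -> dict:
    """Run a build command and return only its exit status and first error line."""
    result = subprocess.run(command, capture_output=True, text=True)
    combined = result.stdout + "\n" + result.stderr
    return {
        "returncode": result.returncode,
        "first_error": first_error_line(combined),
    }
```

 Calling `summarize_build(["make"])` would then give the main context a two-field summary, however many lines the compilation printed.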
 - cheema3313 days ago
 >> And any attempt at computer use has fallen by the wayside.
 > You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)
 Claude Cowork does not do "computer use" in the traditional sense. e.g. it cannot use your computer to drive the interface of Adobe Premiere. It is not taking screenshots of your computer desktop, like a traditional "Computer use" product does.
 - Jaysobel20 days ago
 I had tried the browser screenshotting feature for agents in Cursor and found it wasn't very reliable - screenshots eat a lot of context, and the agent didn't have a good sense for when to use them. I didn't try it in this project. I bet it would work in some specific cases.
 - nanapipirara20 days ago
 Claude helped me immensely getting an image converter to work. Giving it screenshots of wrong output (lots of layers had unpredictable offsets that were not supposed to be there) and of the output I expected helped Claude understand the problems, and it fixed the bugs immediately.
 - deepl_y19 days ago
 I'm not sure if this proves anything, but I saw this article about Opus playing Pokemon, where it was given actual screenshots, and it still says it navigated visual space pretty poorly despite the advancements: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-into-claude-opus-4-5-from-pokemon
- cheschire20 days ago
 Did you intend the last link to link to your project? It’s a copy of the OpenRCT2 project.
 - philipwhiuk20 days ago
 I think the first one should have been https://github.com/OpenRCT2/OpenRCT2
 The actual change to implement CC is: https://github.com/jaysobel/OpenRCT2/commit/5d49dc960fcfc133a79558b2eb12b1f7125651cf
- fragmede20 days ago
 > Claude is at a pretty steep visuo-spatial disadvantage,
 How hard would it be to use with OpenAI's offerings instead? Particularly, imo, OpenAI's better at "looking" at pictures than Claude.
rashidae20 days ago
> As a mirror to real-world agent design: the limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces. For this reason, we prefer to think of agents as automating diligence, rather than intelligence, for operational challenges.
hk__220 days ago
> The only other notable setback was an accidental use of the word "revert" which Codex took literally, and ran git revert on a file where 1-2 hours of progress had been accumulating.
- qaboutthat20 days ago
 If I tell Claude to "revert that last change, it isn't right, try this instead" and Claude hasn't committed recently it will happily `git checkout ...` and blow away all recent changes instead of reverting the "last change". (Which, it's not wrong or anything -- I did say "revert that change" -- it's just annoying. And telling `CLAUDE.md` to commit more often doesn't work consistently, because Claude is a dummy sometimes).
 - mh-20 days ago
 I haven't tried it, but theoretically one could use Claude Code's hooks facility to enforce committing at some determined thresholds.
 - MrGreenTea20 days ago
 I use it (with jj, but it should be the same with git). It tells Claude to commit after every Write tool use. The steps are a bit too small, but I then usually just squash them afterwards. I haven't yet found a good automatic heuristic for when to tell Claude to commit (or directly auto-commit, but I like that Claude writes the commit message).
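 One candidate heuristic for "when to commit" is a changed-line threshold computed from `git diff --shortstat`. The sketch below covers only the decision, not the hook wiring: `changed_lines`, `should_commit`, and the default threshold of 50 are assumptions made up for illustration, and the parser expects the usual English summary line such as ` 3 files changed, 40 insertions(+), 12 deletions(-)`.

```python
import re


def changed_lines(shortstat: str) -> int:
    """Sum insertions and deletions from one line of `git diff --shortstat` output."""
    counts = re.findall(r"(\d+) (?:insertion|deletion)", shortstat)
    return sum(int(count) for count in counts)


def should_commit(shortstat: str, threshold: int = 50) -> bool:
    """Suggest a commit once the working tree differs by at least `threshold` lines."""
    return changed_lines(shortstat) >= threshold
```

 A hook script could feed the current shortstat into `should_commit` after each edit and only prompt for a commit when it returns `True`, which avoids the one-commit-per-write granularity mentioned above.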
 - mmmattt19 days ago
 [dead]
- _flux20 days ago
 Amazing that these tools don't maintain a replayable log of everything they've done. Although git revert is not a destructive operation, so it's surprising that it caused any loss of data. Maybe they meant git reset --hard or something like that. Wild if Codex would run that.
 - arcanemachiner20 days ago
 I was looking at the insanity known as Gas Town [0] the other day, and it does use Git to store historical work state in something it calls "beads". [0] https://github.com/steveyegge/gastown?tab=readme-ov-file
 - calebkaiser20 days ago
 If anyone is curious, Beads is an agent memory project from the same developer: https://github.com/steveyegge/beads
 - PKop20 days ago
 Bees?
 - brap20 days ago
 BEADS
 - rabf20 days ago
 I have had Codex recover things for me from its history after Claude had done a `git reset --hard`. Codex is one of the more reliable models/harnesses when it comes to performing undo and redo operations, in my experience.
 - theptip20 days ago
 Claude Code has had this feature for a few months now.
 - CPLX20 days ago
 I found this tool to be the solution I was looking for to address this specific problem: https://contextify.sh
 - MattGaiser20 days ago
 Claude Code has /rewind. Not sure if it is foolproof, but this has been tried.
 - saagarjha20 days ago
 Huh, I didn't realize this existed. I feel like it's a miss that Claude doesn't use it when you tell it to undo its work.
 - JimDabell20 days ago
 It doesn’t handle anything Claude uses the shell for, so if it runs `rm -rf .` then rewind won’t help you.
 - xandrius20 days ago
 You can give it approved commands, so you can prevent that.
 - defunct3420 days ago
 Claude (can’t remember if it was 4.1 Opus, 4.5 Sonnet, or 4.5 Opus) once just started playing with git worktrees and royally f-d up the local repo and lost several hours of work. Since then, I watch it like a hawk.
 - stinkbeetle20 days ago
 `git reset --hard` doesn't remove unreferenced commits or rewrite the reflog so I don't think that would do it. Something like `git reset && git gc` would have to be done.
 - eru20 days ago
 And git gc doesn't collect any garbage less than two weeks old by default, either.
 - _flux20 days ago
 But it does remove current uncommitted changes.
 - olvy018 days ago
 Except for new files, you'd have to also run git clean -f
- alt22720 days ago
 I wonder how they accidentally used a word like that.
 - gbear60520 days ago
 “Please revert that last change you did”, referring to like a smaller change that had just been done
 - GardenLetter2720 days ago
 Codex reverted kindly.
- esafak20 days ago
 Does Codex not let you set command permissions?
 - legojoey1720 days ago
 Yeah, it does, so this would likely have had to be a `--yolo` run (I don't care, let me `rm -rf /`). I've found that even with the "workspace write" mode and no additional writable directories I can't do git operations without approval, so it seems to exclude `.git` by default.
- Filligree20 days ago
 Yet another reason to use Jujutsu. And put a `jj status` wrapper in your PS1. ;-)
 - westurner20 days ago
 Start with env args like AGENT_ID for indicating which Merkle hash of which model(s) generated which code with which agent(s) and add those attributes to signed (-S) commit messages. For traceability; to find other faulty code generated by the same model and determine whether an agent or a human introduced the fault. Then, `git notes` is better for signature metadata because it doesn't change the commit hash to add signatures for the commit. And then, you'd need to run a local Rekor log to use Sigstore attestations on every commit. Sigstore.dev is SLSA.dev compliant. Sigstore grants short-lived release attestation signing keys for CI builds on a build farm to sign artifacts with. So, when jujutsu autocommits agent-generated code, what causes there to be an {{AGENT_ID}} in the commit message or git notes? And what stops a user from forging such attestations?
 - westurner20 days ago
 - "Diffwatch – Watch AI agents touch the FS and see diffs live" (2025) https://news.ycombinator.com/item?id=45786382 :
 > you can manually stage against @-: [with jujutsu]
 - diath20 days ago
 > Yet another reason to use Jujutsu
 And what would that reason be? You can git revert a git revert.
 - jsnell20 days ago
 You're correct for an actual git revert, but it seems pretty clear that the original authors have mangled the story and it was actually either a "git checkout" or "git reset". The "file where 1-2 hours of progress had been accumulating" phrasing only makes sense if those were uncommitted changes. And the reason jj helps in that case is that for jj there is no such thing as an uncommitted change.
 - MarkMarine20 days ago
 Also, `jj undo` is there and easy to tell the model to use; I have it in my Claude.md.
 hu320 days ago
 Surely Claude is much better at using git because of the massive training data difference. If it didn't undo with git, it wouldn't do it with JJ either.
 MarkMarine18 days ago
 Actually I find it better with JJ. I have context7 mcp to help with commands and I’ve got an explicit Claude.md to direct it, but it’s more ambitious running stacked PRs and better at resolving conflicts.
 Filligree20 days ago
 It does fine with jj. Sometimes better, because jj is much easier to use non-interactively.
 hu318 days ago
 what do you mean non-interactively? Claude is great with git in command-line.
 - block_dagger20 days ago
 Having no such thing as an uncommitted change seems like it would be a nightmare, but perhaps I'm just too git-oriented.
 eru20 days ago
 > Having no such thing as an uncommitted change seems like it would be a nightmare, but perhaps I'm just too git-oriented.
 Why? What's the problem you see? The only problem I see is when you let these extra commits pollute the history reachable from any branch you care about. Let's look at the following: Internally, 'git stash' consists of two operations: one that makes an 'anonymous' commit of your files, and another that resets those files to whatever they were in HEAD. (That commit is anonymous in the sense that no branch points at it.) The git libraries expose the two operations separately. And you can build something yourself that works similarly. You can use these capabilities to build an undo/redo log in git, but without polluting any of the history you care about. To be honest, I have no clue how Jujutsu does it. They might be using a totally different design.
 fragmede20 days ago
 > perhaps I'm just too git-oriented.
 The problem is git's index lets you write a bunch of unconnected code, then commit it separately. To different branches, even! This works great for stacking diffs but is terribly confusing if you don't know what you're doing.
 eru20 days ago
 Well, git doesn't really commit 'to' a branch. You just build commits, and then later on you muck around with the mutable pointers that are branches.
 fragmede16 days ago
 How "to" do you want to make it? That description's totally disingenuous. "later on" makes it sound to a human like it takes any real amount of time or that it isn't basically instant and wrapped up by porcelain, and "muck around with" implies that there's anything more random or complicated to it than writing the sha to a file in the right place in the .git directory.
 steveklabnik20 days ago
 Things like the index become a workflow pattern, rather than a feature, if that makes any sense.
 - mbb7020 days ago
 Probably it actually ran git checkout or reset. As you say git revert only operates on committed snapshots so it will all be in the reflog
 - ewoodrich20 days ago
 Yes, this exact scenario has happened to me a couple times with both Claude and Codex, and it's usually git checkout, more rarely git reset. They immediately realize they fucked up and spend a few minutes trying to undo by throwing random git commands at it until eventually giving up.
 foobar1000020 days ago
 Yeap - this is why when running it in a dev container, I just use ZFS and set up a 1 minute auto-snapshot - which is set up as root - so it generally cannot blow it away. And cc/codex/gemini know how to deal with zfs snapshots to revert from them.
 Of course if you give an agentic loop root access in yolo mode - then I am not sure how to help...
 - glemion4320 days ago
 It's not going to happen... Stop spamming
 - dwattttt20 days ago
 The feature of "there is no such thing as an uncommitted working directory" is very relevant to the situation.
 - glemion4320 days ago
 It's not. There are so many ways to just solve this non-issue that no one will just switch to just another random tool.
 Especially not away from git.
 dwattttt20 days ago
 > It's not
 Given that other posts solved the problem by scripting this feature on top of git, I guess you're telling them their solution isn't relevant too.
 glemion4320 days ago
 It's about switching your whole ecosystem to a completely different thing...
 - NewsaHackO20 days ago
 This is funny. I tried it once and didn't see what the benefit was. Then, when I tried to reset it back to normal git, I realized that the devs had not (at the time) made any clean way to revert it back, just a one-way conversion to jj. I haven't tried it since.
 - steveklabnik20 days ago
 What were you trying to “revert back”? You should have been able to just stop using jj, there’s nothing to revert back to. It’s also possible that I’m misunderstanding what you mean.
 - JimDabell20 days ago
 Jujutsu doesn’t change your Git repository in incompatible ways. It just tracks extra information in the .jj/ directory. There is zero migration needed to revert back to Git – you just start using Git again.
pocketarc20 days ago
I love the interview at the end of the video. The kubectl-inspired CLI, and the feedback for improvements from Claude, as well as the alerts/segmentation feedback.
You could take those, make the tools better, and repeat the experience, and I'd love to see how much better the run would go.
I keep thinking about that when it comes to things like this - the Pokemon thing as well. The quality of the tooling around the AI is only going to become more and more impactful as time goes on. The more you can deterministically figure out on behalf of the AI to provide it with accurate ways of seeing and doing things, the better.
Ditto for humans, of course, that's the great thing about optimizing for AI. It's really just "if a human was using this, what would they need"? Think about it: The whole thing with the paths not being properly connected, a human would have to sit down and really think about it, draw/sketch the layout to visualize and understand what coordinates to do things in. And if you couldn't do that, you too would probably struggle for a while. But if the tool provided you with enough context to understand that a path wasn't connected properly and why, you'd be fine.
- wonnage20 days ago
 I see this sentiment of using AI to improve itself a lot but it never seems to work well in practice. At best you end up with a very verbose context that covers all the random edge cases encountered during tasks.
 For this to work the way people expect you’d need to somehow feed this info back into fine tuning rather than just appending to context. Otherwise the model never actually “learns”, you’re just applying heavy-handed fudge factors to existing weights through context.
 - pilord31420 days ago
 I've been playing around with an AI generated knowledge base to grok our code base, I think you need good metrics on how the knowledge base is used. A few things are:
 1. Being systematic. Having a system for adding, improving and maintaining the knowledge base
 2. Having feedback for that system
 3. Implementing the feedback into a better system
 I'm pretty happy I have an audit framework and documentation standards. I've refactored the whole knowledge base a few times. In the places where it's overly specific or too narrow in its scope of use for the retained knowledge, you just have to prune it.
 Any garden has weeds when you lay down fertile soil.
 Sometimes they aren't weeds though, and that's where having a person in the driver's seat is a boon.
 - mcintyre199420 days ago
 The features it asked for in this case were better tools, I thought they were really sensible. It said it wanted a --dry-run (like the CLIs the rct one was modelled on), it wanted to be able to segment guest feedback, and it wanted better feedback from its path tools. Those might not be actually possible in rct, but in a different context they’re pretty smart requests and not just verbose edge cases.
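As a rough illustration of the `--dry-run` request, here is a sketch of a command that validates an action and reports what it would do without changing state. The tool name `park-cli`, the `place_path` function, and the set of occupied tiles are all hypothetical; nothing here reflects the actual CLI built for the article.

```python
import argparse


def build_parser():
    """Parser for a hypothetical path-placement command."""
    parser = argparse.ArgumentParser(prog="park-cli")  # hypothetical tool name
    parser.add_argument("x", type=int, help="tile x coordinate")
    parser.add_argument("y", type=int, help="tile y coordinate")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="validate and report the action without applying it",
    )
    return parser


def place_path(x, y, dry_run, occupied):
    """Validate first; mutate `occupied` only when this is not a dry run."""
    if (x, y) in occupied:
        return f"error: tile ({x}, {y}) is already occupied"
    if dry_run:
        return f"dry run: would place path at ({x}, {y})"
    occupied.add((x, y))
    return f"placed path at ({x}, {y})"
```

The design point is that the dry run shares the validation code with the real action, so an agent gets the same error feedback without paying for a mistake.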
lukebechtel20 days ago
> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks. The core pieces of the build are…
what a world!
- AndrewKemendo20 days ago
 I would’ve walked for days to a CompUSA and spent my life savings if there was anything remotely equivalent to this when I was learning C on my Macintosh 4400 in 1997.
 People don’t appreciate what they have
 - lifetimerubyist20 days ago
 It’s worse. They’re proud they don’t know.
 - risyachka20 days ago
 It's like ordering a project from Upwork: someone did it for you, you have no idea what is going on, kinda works though.
 - kmijyiyxfbklao20 days ago
 Since there are no humans involved, it's more like growing a tree. Sure it's good to know how trees grow, but not knowing about cells didn't stop thousands of years of agriculture.
 risyachka20 days ago
 It's not like a tree at all because a tree is one and done.
 Code is a project that has to be updated, fixed, etc.
 So when something breaks - you have to ask the contractor again. It may not find an issue, or mess things up when it tries to fix it, making the project useless, etc.
 It's more like a car. Every time something goes wrong you will pay for it - sometimes it will get back in even worse shape (no refunds though), sometimes it will cost you x100 because there is nothing you can do, you need it and you can't manage it on your own.
 eks39120 days ago
 Trees are not static, unchanging, pop into existence and forget about, things. Trees that don't get regular "updates" of adequate sunlight, water, and nutrients die. In fact, too much light or water could kill it. Or soil that is not the right coarseness or acidity level could hamper or prevent growth. Now add "bugs". Literal bugs, diseases, and even competing plants that could eat, poison, or choke the tree. You might be thinking of trees that are indigenous to an area. Even these compete for the resources and plagues of their area, but are more apt than the trees accustomed to different environments, and even they go through the cycle of life. I think his analogy was perfect, because this is the first time coding could resemble nature. We are just used to the carefully curated human made code, as there has not been such a thing as naturally occurring, no human interaction, code before.
 Jaysobel20 days ago
 The Gas Town piece reminded me of this as well. The author there leaned into role playing, social and culture analogies, and it made a lot more sense than an architecture diagram in which one node is “black box intelligence” with a single line leading out of it…
 kshri2420 days ago
 I wouldn't say it is a tree as such, as at least trees are deterministic where input parameters (seed, environment, sunlight) define the output.
 LLM outputs are akin to a mutant tree that can decide to randomly sprout a giant mushroom instead of a branch. And you won't have any idea why despite your input parameters being deterministic.
 dpc05050520 days ago
 You haven't done a lot of gardening if you don't know plants get 'randomly' (there's a biological explanation, but with the massive amounts of variables it feels random) attacked by parasites all the time. Go look at pot growing subreddits, they spend an enormous chunk of their time fighting mites.
 kshri2420 days ago
 Determinism is not strictly anti-randomness (though I can see why one can confuse them as polar opposites). Rather, we do not even have true randomness (at least not proven), and what we have should actually be called pseudorandom. Determinism just means that if you have the same input parameters (considering all parameters have been accounted for), you will get the same result. In other words, you can start with a particular random seed (pseudorandom seed to be precise) and always end up with the same end result and that would be considered deterministic.
 > You haven't done a lot of gardening if you don't know plants
 I grow "herbs".
 > there's a biological explanation
 Exactly. There is always an explanation for every phenomenon that occurs in this observable, physical World. There is a defined cause and effect. Even if it "feels random". That's not how it is with LLMs. Because in between your deterministic input parameters and the output that is generated, there is a black box: the model itself. You have no access to the billions of parameters within the models which means you are not sure you can always reproduce the output. That black box is what causes non-determinism.
 EDIT: just wanted to add - "attacked by parasites all the time", is why I said if you have control over the environment. Controlling environment encompasses dealing with parasites as well. Think of well-controlled environment like a lab.
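The narrow claim about seeds is easy to check in isolation. A small sketch using Python's standard `random` module; the vocabulary and the `sample_words` helper are made up for illustration, and this says nothing about how any particular LLM service handles seeds:

```python
import random


def sample_words(seed, count=5):
    """Draw `count` pseudorandom words from a fixed list using a private generator."""
    vocab = ["tree", "branch", "mushroom", "leaf", "root", "seed"]
    rng = random.Random(seed)  # generator state depends only on the seed
    return [rng.choice(vocab) for _ in range(count)]


first = sample_words(seed=42)
second = sample_words(seed=42)
print(first == second)  # the same seed reproduces the same sequence
```

The output looks random, yet every value is fixed by the seed, which is the sense of "deterministic" used in the comment above.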
 famouswaffles20 days ago
 Do you think LLMs sidestep cause and effect somehow? There's an explanation there too, we just don't know it. But that's the case for many natural phenomena.
 kshri2420 days ago
 I am not saying LLM sidesteps cause-effect. I am saying it is a black box. So yes "we just don't know it" is basically describing a black box.
 doug_durham20 days ago
 In what world are trees deterministic? There are a set of parameters that you can control that give you a higher probability of success, but uncontrollable variables can wipe you out.
 kshri2420 days ago
 Explained here [1]. We live in a pseudorandom World. So everything is deterministic if you have the same set of input parameters. That includes trees as well.
 I am not talking about controllable/uncontrollable variables. That has no bearing on whether a process is deterministic in theory or not. If you can theoretically control all variables (even if you practically cannot), you have a deterministic process as you can reproduce the entire path: from input to output. LLMs are currently a black box. You have no access to the billions of parameters within the model, making it non-deterministic. The day we have tools where we can control all the billions of parameters within the model, then we can retrace the exact path taken, thereby making it deterministic.
 [1]: https://news.ycombinator.com/item?id=46663052
 ambicapter20 days ago
 Very interesting analogy
 amlib20 days ago
 Except that the tree is so malformed and the core structure so unsound that it can't grow much past its germination and dies of malnourishment because, since you have zero understanding of biology, forestry and related fields, there is no knowledge to save it or help it grow healthy.
 Also out of nowhere an invasive species of spiders that was inside the seed starts replicating geometrically and within seconds wraps the whole forest with webs and asks for a ransom in order to produce the secret enzyme that can dissolve it. Trying to torch it will set the whole forest on fire, brute force is futile. Unfortunately, you assumed the process would only plagiarize the good bits, but seems like it also sometimes plagiarizes the bad bits too, oops.
 - datsci_est_201520 days ago
 Great analogy. “I don’t know any C++ but I hired some people on Upwork and they delivered this software demo.”
 whateveracct20 days ago
 Con fuckign gratys, u can buy compute
 - doug_durham20 days ago
 "They" are? I didn't see that in the article. It sounds like you are projecting your prejudices on to a non-defined out group.
 - imiric20 days ago
 Did you actually learn C? Be thankful nothing like this existed in 1997.
 A machine generating code you don't understand is not the way to learn a programming language. It's a way to create software without programming.
 These tools can be used as learning assistants, but the vast majority of people don't use them as such. This will lead to a collective degradation of knowledge and skills, and the proliferation of shoddily built software with more issues than anyone relying on these tools will know how to fix. At least people who can actually program will be in demand to fix this mess for years to come.
 - AndrewKemendo20 days ago
 It would’ve been nice to have a system that I could just ask questions to teach me how it works instead of having to pour through the few books that existed on C that were actually accessible to a teenager learning on their own.
 Going to arcane websites and forums full of neckbeards who expect you to already understand everything isn’t exactly a great way to learn.
 The early Internet was unbelievably hostile to people genuinely trying to learn
 - hrldcpr20 days ago
 *pore through
 (not a judgment, just mentioning in case the distinction is interesting to anyone)
 - rabf20 days ago
 I had the books (from the library) but never managed to get a compiler for many years! Was quite confusing trying to understand all the unix references when my only experience with a computer was the Atari ST.
 - metaltyphoon20 days ago
 I don't understand how OP thinks that being oblivious to how anything works underneath is a good thing. There is a threshold of abstraction to which you must know how it works to effectively fix it when it breaks.
 - jedberg20 days ago
 You can be a super productive Python coder without any clue how assembly works. Vibe coding is just one more level of abstraction.
 Just like how we still need assembly and C programmers for the most critical use cases, we'll still need Python and Golang programmers for things that need to be more efficient than what was vibe coded.
 But do you really need your $whatever to be super efficient, or is it good enough if it just works?
 kshri2420 days ago
 One is deterministic the other is not. I leave it to you to determine which is which in this scenario.
 afro8820 days ago
 Humans writing code are also non deterministic. When you vibe code you're basically a product owner / manager. Vibe coding isn't a higher level programming language, it's an abstraction over a software engineer / engineering team.
 kshri2420 days ago
 > Humans writing code are also non deterministic
 That's not what determinism means though. A human coding something, irrespective of whether the code is right or wrong, is deterministic. We have a well defined cause and effect pathway. If I write bad code, I will have a bug - deterministic. If I write good code, my code compiles - still deterministic. If the coder is sick, he can't write code - deterministic again. You can determine the cause from the effect.
 Every behavior in the physical World has a cause and effect chain.
 On the other hand, you cannot determine why an LLM hallucinated. There is no way to retrace the path taken from input parameters to generated output. At least as of now. Maybe it will change in the future where we have tools that can retrace the path taken.
 afro8820 days ago
 You misunderstand. A coder will write different code for the same problem each time unless they have the solution 100% memorised. And even then a huge number of factors can influence them not being able to remember 100% of the memorised code, or opt for different variations.
 People are inherently nondeterministic.
 The code they (and AI) writes, once written, executes deterministically.
 bdangubic20 days ago
 > The code they (and AI) writes, once written, executes deterministically.
 very rarely :)
 kshri2420 days ago
 > A coder will write... or opt for different variations.
 Agreed.
 > People are inherently nondeterministic.
 We are getting into the realm of philosophy here. I, for one, believe in the idea of living organisms having no free will (or limited will to be more precise, but one can also go so far as to say "dependent will"). So one can philosophically explain that people are deterministic, via concepts of Karma and rebirth. Of course none of this can be proven. So your argument can be true too.
 > The code they (and AI) writes, once written, executes deterministically.
 Yes. Execution is deterministic. I am however talking only about determinism in terms of being able to know the entire path: input to output. Not just the output's characteristic (which is always going to be deterministic). It is the path from input to output that is not deterministic due to presence of a black box - the model.
 imiric20 days ago
 I mostly agree with you, but I see what afro88 is saying as well.
 If you consider a human programmer as a "black box", in the sense that you feed it a set of inputs (the problem that needs to be solved, vague requirements, etc.) and expect a functioning program as output that solves the problem, then that process is similarly nondeterministic as an LLM. Ensuring that the process is reliable in both scenarios boils down to creating detailed specifications, removing ambiguity, and iterating on the product until the acceptance tests pass.
 Where I think there is a disconnect is that humans are far more capable at producing reliable software given a fuzzy set of inputs. First of all, they have an understanding of human psychology, and can actually reason about semantics in ways that a pattern matching and token generation tool cannot. And in the best case scenario of experienced programmers, they have an intuitive grasp of the problem domain, and know how to resolve ambiguities in meatspace. LLMs at their current stage can at best approximate these capabilities by integrating with other systems and data sources, so their nondeterminism is a much bigger problem. We can hope that the technology will continue to improve, as it clearly has in the past few years, but that progress is not guaranteed.
 kshri2420 days ago
 Agree with most of what you say. The only reason I say humans are different from LLMs when it comes to being a "black box" is because you can probe humans. For instance, I can ask a human to explain how he/she came to the conclusion and retrace the path taken to come to said conclusion from known inputs. And this can also be correlated with say brainwave imaging by mapping thoughts to neurons being triggered in that portion of the brain. So you can have a fairly accurate understanding of the path taken. I cannot probe the LLM however. At least not with the tools we have today.
 > Where I think there is a disconnect is that humans are far more capable at producing reliable software given a fuzzy set of inputs.
 Yes true. Another thought that comes to my mind is I feel it might also have to do with us recognizing other humans as not as alien to us as LLMs are. So there is an inherent trust deficit when it comes to LLMs vs when it comes to humans. Inherent trust in human beings, despite being less capable, is what makes the difference. In everything else we inherently want proper determinism and trust is built on that. I am more forgiving if a child computes 2 + 1 = 4, and will find it in me to correct the child. I won't consider it a defect. But if a calculator computes 2 + 1 = 4 even once, I would immediately discard it and never trust it again.
 > We can hope that the technology will continue to improve, as it clearly has in the past few years, but that progress is not guaranteed.
 Agreed.
 jfreds20 days ago
 This is true. What are the implications of that?
 pqtyw20 days ago
 Perhaps there is no need to actually understand assembly, but if you don't understand certain basic concepts actually deploying any software you wrote to production would be a lottery with some rather poor prizes. Regardless of how "productive" you were.
 ben_w20 days ago
 Somebody needs to understand, to the standard of "well enough".
 The investors who paid for the CEO who hired your project manager to hire you to figure that out, didn't.
 I think in this analogy, vibe coders are project managers, who may indeed still benefit from understanding computers, but when they don't the odds aren't anywhere near as poor as a lottery. Ignorance still blows up in people's faces. I'd say the analogy here with humans would be a stereotypical PHB who can't tell what support the dev needs to do their job and then puts them on a PIP the moment any unclear requirement blows up in anyone's face.
 - hdgvhicv20 days ago
 I’m vaguely aware that transistors are like electronic switches, and if memory serves I could build an and/or/not gate.
 I have no idea how an i386 works, let alone a modern CPU. Sure there are registers and different levels of cache before you get to memory.
 My lack of knowledge of all this doesn’t prevent me from creating useful programs using higher abstraction layers like C.
 - neilwilson20 days ago
 That’s what a C compiler does when generating a binary.
 There was a time when you had to know ‘as’, ‘ld’ and maybe even ‘ar’ to get an executable.
 In the early days of g++, there was no guarantee the object code worked as intended. But it was fun working that out and filing the bug reports.
 This new tool is just a different sort of transpiler and optimiser.
 Treat it as such.
 - wizzwizz420 days ago
 > There was a time when you had to know ‘as’, ‘ld’ and maybe even ‘ar’ to get an executable.
 No, there wasn't: you could just run the shell script, or (a bit later) the makefile. But there were benefits to knowing as, ld and ar, and there still are today.
 jstummbillig20 days ago
 > But there were benefits to knowing as, ld and ar, and there still are today.
 This is trivially true. The constraint for anything you do in your life is the time it takes to know something.
 So the far more interesting question is: At what level do you want to solve problems – and is it likely that you need knowledge of as, ld and ar over anything else that you could learn instead?
 wizzwizz420 days ago
 Knowledge of as, ld, ar, cc, etc is only needed when setting up (or modifying) your build toolchain, and in practice you can just copy-paste the build script from some other, similar project. Knowledge of these tools has never been needed.
 fn-mote20 days ago
 Knowledge of cc has never been needed? What an optimist! You must never have had headers installed in a place where the compiler (or Makefile author) didn’t expect them. Same problems with the libraries. Worse when the routine you needed to link was in a different library (maybe an arch-specific optimized lib).
 That post is only true in the most vacuous sense.
 “A similar project” discovered where, on BITNET?
 wizzwizz420 days ago
 The library problems you described are nothing that can't be solved using symlinks. A bad solution? Sure, but it works, and doesn't require me to understand cc. (Though when I needed to solve this problem, it only took me about 15 minutes and a man page to learn how to do it. `gcc -v --help` is, however, unhelpful.)
 "A similar project" as in: this isn't the first piece of software ever written, and many previous examples can be found on the computer you're currently using. Skim through them until you find one with a source file structure you like, then ruthlessly cannibalise its build script.
 saagarjha20 days ago
 I feel like this really just says your tools are bad and leaky?
 - imiric20 days ago
 If you don't see a difference between a compiler and a probabilistic token generator, I don't know what to tell you.
 And, yes, I'm aware that most compilers are not entirely deterministic either, but LLMs are inherently nondeterministic. And I'm also aware that you can tweak LLMs to be more deterministic, but in practice they're never deployed like that.
 Besides, creating software via natural language is an entirely different exercise than using a structured language purposely built for that.
 We're talking about two entirely different ways of creating software, and any comparison between them is completely absurd.
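The "tweak LLMs to be more deterministic" remark usually refers to sampling settings such as temperature. A toy sketch of that one knob, assuming a plain list of logits for the next token; the `sample_token` function is hypothetical, and real deployments have other sources of variation that this does not model:

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from `logits`; temperature 0 means greedy argmax."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With `temperature=0` the same logits always give the same index, while `temperature=1.0` spreads choices across tokens in proportion to their softmax probability.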
 Closi20 days ago
 They are 100% different and yet kind-of-the-same.
 They can function kind-of-the-same in the sense that they can both change things written in a higher level language into a lower level language.
 100% different in every other way, but for coding in some circumstances if we treat it as a black box, LLMs can turn higher level pseudocode into lower level code (inaccurately), or even transpile.
 Kind of like how email and the postal service can be kind of the same if you look at it from a certain angle.
 imiric20 days ago
 > Kind of like how email and the postal service can be kind of the same if you look at it from a certain angle.
 But they're not the same at all, except somewhat by their end result, in that they are both ways of transmitting information. That similarity is so vague that comparing them doesn't make sense for any practical purpose. You might as well compare them to smoke signals at that point.
 It's the same with LLMs and programming. They're both ways of producing software, but the process of doing that and even the end result is completely different. This entire argument that LLMs are just another level of abstraction is absurd. Low-Code/No-Code tools, traditional code generators, meta programming, etc., are another level of abstraction on top of programming. LLMs generate code via pattern matching and statistics. It couldn't be more different.
 anthk20 days ago
 People voting down your comment are just "engineers" doomed to fail sooner or later.
 Meanwhile, 9front users have read at least the plan9 intro and know about nm, 1-9c, 1-9l and the like. Vibe coders will be put in their place sooner or later. It's just a matter of time.
 - anthk20 days ago
 Competent C programmers know about nm, as, ld and a bunch of other binary sections in order to understand issues and debug properly.
 Everyone else is deluding themselves. Even the 9front intro requires you to at least know the basics of nm and friends.
 - Workaccount220 days ago
 It's just another layer.
 Assembly programmers from years gone by would likely be equally dismissive of the self-aggrandizing code block stitchers of today.
 (on topic, RCT was coded entirely in assembly, quite the achievement)
- yoyohello1320 days ago
 Everyone should read that section. It was really interesting reading about their experiences/challenges getting it all working.
- falloutx20 days ago
 First time I am seeing realistic timelines from a vibe-coded project. Usually everyone who vibe codes just says they did it in a few hours, no matter the project.
 - ben_w20 days ago
 Hmm. My experience with it is that a few hours of that will get you a sprint if you're lucky and the prompt hits the happy path. I had… I think two of those, over 5 weeks? I can believe plenty of random people stumble across happy-path examples.
 Exciting when it works, but I think a much more exciting result for people with less experience who may not know that the "works for me" demo is the dreaded "first 90%", and even fairly small projects aren't done until the fifth-to-tenth 90%.
 (That, and that vibe coding in the sense of "no code review" is prone to balls of mud, so you need to be above average at project management to avoid that after a few sprint-equivalents of output.)
 - Aurornis20 days ago
 It’s possible to vibe code certain generic things in a few hours if you’re basically combining common, thoroughly documented, mature building blocks. It’s not going to be production ready or polished but you can get surprisingly far with some things.
 For real work, that phase is like starting from a template or a boilerplate repo. The real work begins after the basics are wired together.
fnordpiglet20 days ago
Interesting article but it doesn’t actually discuss how well it performs at playing the game. There is in fact a 1.5 hour YouTube video but it woulda been nice for a bit of an outcome postmortem. It’s like “here’s the methods and set up section of a research paper but for the conclusion you need to watch this movie and make your own judgements!”
- Sharlin20 days ago
  It does discuss that? Basically it has good grasp of finances and often knows what "should" be done, but it struggles with actually building anything beyond placing toilets and hotdog stalls. To be fair, its map interface is not exactly optimal, and a multimodal model might fare quite a bit better at understanding the 2D map (verticality would likely still be a problem).
- cyanydeez20 days ago
  I was told the important part of AI is the generation part, not the verification or quality.
nipponese20 days ago
> kept the context above the ~60% remaining level where coding models perform at their absolute best
Maybe this is obvious to Claude users but how do you know your remaining context level? Is there a UI for this?
- adithyareddy20 days ago
 You can also show context in the statusline within claude code: https://code.claude.com/docs/en/statusline#context-window-usage
 - nipponese20 days ago
 Follow up Q: what are you supposed to do when the context becomes too large? Start a new conversation/context window and let Claude start from scratch?
 - d4rkp4ttern20 days ago
 Context filling up is sort of the Achilles heel of CLI agents. The main remedy is to have it output some type of handoff document and then run /compact which leaves you with a summary of the latest task. It sort of works but by definition it loses information, and you often find yourself having to re-explain or re-generate details to continue the work.
 I made a tool[1] that lets you just start a new session and injects the original session file path, so you can extract any arbitrary details of prior work from there using sub-agents.
 [1] aichat tool: https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-aichat--session-search-and-continuation-without-compaction
 - kcoddington20 days ago
 Either have Claude /compact or have it output things to a file it can read in on the next session. That file would be a summary of progress for work on a spec or something similar. Also good to prime it again with the Readme or any other higher level context
 - theptip20 days ago
 It’s a good idea to have Claude write down the execution plan (including todos). Or you can use something like Linear / GH Issues to track the big items. Then small/tactical todos are what you track in session todos.
 This approach means you can just kill the session and restart if you hit limits.
 (If you hit context limits you probably also want to look into sub-agents to help prevent context bloat. For example any time you are running and debugging unit tests, it’s usually best to start with a subagent to handle the easy errors.)
 - pbhjpbhj20 days ago
 It feels like one could produce a digest of the context that works very similarly but fits in the available context window - not just by getting the LLM to use succinct language, but also mathematically; like reducing a sparse matrix.
 There might be an input that would produce that sort of effect, perhaps it looks like nonsense (like reading zipped data) but when the LLM attempts to do interactive in it the outcome is close to consuming the context?
 - docjay20 days ago
``` §CONV_DIGEST§ T1:usr_query@llm-ctx-compression→math-analog(sparse-matrix|zip)?token-seq→nonsense-input→semantic-equiv-output? T2:rsp@asymmetry_problem:compress≠decompress|llm=predict¬decode→no-bijective-map|soft-prompts∈embedding-space¬token-space+require-training|gisting(ICAE)=aux-model-compress→memory-tokens|token-compress-fails:nonlinear-distributed-mapping+syntax-semantic-entanglement|works≈lossy-semantic-distill@task-specific+finetune=collapse-instruction→weights §T3:usr→design-full-python-impl§ T4:arch_blueprint→ DIR:src/context_compressor/{core/(base|result|pipeline)|compressors/(extractive|abstractive|semantic|entity_graph|soft_prompt|gisting|hybrid)|embeddings/(providers|clustering)|evaluation/(metrics|task_performance|benchmark)|models/(base|openai|anthropic|local)|utils/(tokenization|text_processing|config)} CLASSES:CompressionMethod=Enum(EXTRACTIVE|ABSTRACTIVE|SEMANTIC_CLUSTERING|ENTITY_GRAPH|SOFT_PROMPT|GISTING|HYBRID)|CompressionResult@(original_text+compressed_text+original_tokens+compressed_tokens+method+compression_ratio+metadata+soft_vectors?)|TokenCounter=Protocol(count|truncate_to_limit)|EmbeddingProvider=Protocol(embed|embed_single)|LLMBackend=Protocol(generate|get_token_limit)|ContextCompressor=ABC(token_counter+target_ratio=0.25+min_tokens=50+max_tokens?→compress:abstract)|TrainableCompressor(ContextCompressor)+(train+save+load) COMPRESSORS:extractive→(TextRank|MMR|LeadSentence)|abstractive→(LLMSummary|ChainOfDensity|HierarchicalSummary)|semantic→(ClusterCentroid|SemanticChunk|DiversityMaximizer)|entity→(EntityRelation|FactList)|soft→(SoftPrompt|PromptTuning)|gist→(GistToken|Autoencoder)|hybrid→(Cascade|Ensemble|Adaptive) EVAL:EvaluationResult@(compression_ratio+token_reduction+embedding_similarity+entailment_score+entity_recall+fact_recall+keyword_overlap+qa_accuracy?+reconstruction_bleu?)→composite_score(weights)|CompressionEvaluator(embedding_provider+llm?+nli?)→evaluate|compare_methods 
PIPELINE:CompressionPipeline(steps:list[Compressor])→sequential-apply|AdaptiveRouter(compressors:dict+classifier?)→content-based-routing DEPS:numpy|torch|transformers|sentence-transformers|tiktoken|networkx|sklearn|spacy|openai|anthropic|pandas|pydantic+optional(accelerate|peft|datasets|sacrebleu|rouge-score) ```
- AlexMoffat20 days ago
 I ask it to write a markdown file describing how it should go about performing the task. Then have it read the file next time. Works well for things like creating tests for controller methods where there is a procedure it should follow that was probably developed over a session with several prompts and feedback on its output.
- facorreia20 days ago
 Start in plan mode, generating a markdown file with the plan, keep it up to date as it is executed, and after each iteration commit, clear the context and tell it to read the plan and execute the next step.
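 The plan-file loop described above could be sketched roughly as follows. This is only an illustration: the checkbox format (`- [ ]` / `- [x]`) and the helper names `next_step` and `mark_done` are assumptions made for the example, not anything Claude Code prescribes.

```python
import re
from typing import Optional

# Matches an unchecked markdown checkbox item, e.g. "- [ ] write tests".
# The checkbox convention is an assumption made for this sketch.
_TODO = re.compile(r"^\s*[-*] \[ \] (.+?)\s*$")


def next_step(plan_markdown: str) -> Optional[str]:
    """Return the text of the first unchecked step, or None if all are done."""
    for line in plan_markdown.splitlines():
        match = _TODO.match(line)
        if match:
            return match.group(1)
    return None


def mark_done(plan_markdown: str, step: str) -> str:
    """Return the plan with the first unchecked occurrence of `step` checked off."""
    lines = plan_markdown.splitlines()
    for i, line in enumerate(lines):
        match = _TODO.match(line)
        if match and match.group(1) == step:
            lines[i] = line.replace("[ ]", "[x]", 1)
            break
    return "\n".join(lines)
```

 After each iteration you would commit, write the updated plan back to disk, clear the context, and hand the agent only the plan file plus the result of `next_step`.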
- d4rkp4ttern20 days ago
 Yes you can literally just ask Claude Code to create a status line showing context usage. I had it make this colored progress bar of context usage, changing thru green, yellow, orange, red as context fills up. Instructions to install: https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#-status-line
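 For a rough idea of what such a bar involves, here is a minimal sketch. It assumes you already have the used and maximum token counts from somewhere; it does not show how Claude Code passes data to a status line, and the function name `context_bar`, the color thresholds, and the bar characters are all made up for illustration.

```python
# ANSI escape sequences; the orange is a 256-color approximation.
GREEN = "\033[32m"
YELLOW = "\033[33m"
ORANGE = "\033[38;5;208m"
RED = "\033[31m"
RESET = "\033[0m"


def context_bar(used_tokens: int, max_tokens: int, width: int = 20) -> str:
    """Render a colored text progress bar for context-window usage."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # Clamp to [0, 1] so over-budget counts still render a full bar.
    frac = min(max(used_tokens / max_tokens, 0.0), 1.0)
    filled = int(frac * width)
    # Thresholds are arbitrary choices for this sketch.
    if frac < 0.50:
        color = GREEN
    elif frac < 0.70:
        color = YELLOW
    elif frac < 0.85:
        color = ORANGE
    else:
        color = RED
    return f"{color}{'#' * filled}{'-' * (width - filled)}{RESET} {frac:.0%}"
```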
- neilfrndes20 days ago
 Claude code has a /context command.
- MattGaiser20 days ago
 /context
margorczynski20 days ago
I think something like Civilization would be better because: 1) The map is a grid. 2) It's turn based.
maxall420 days ago
> In this article we'll tell you why we decided to put Claude Code into RollerCoaster Tycoon, and what lessons it taught us about B2B SaaS. What is this? A LinkedIn post?
- mcintyre199420 days ago
 > Your outlook above is too self critical. This is the first time an AI has beaten this park much less played a full game of RollerCoaster Tycoon through a TUI. There are important learnings for B2B SaaS. This isn't LinkedIn (it is, in fact, LinkedIn). But seriously. What can we learn here. From the transcript: https://htmlpreview.github.io/?https://gist.githubusercontent.com/jaysobel/dfeed9a65ce7209274acf9ada0eaa65e/raw/claude_code_rollercoaster_tycoon_transcript.html :)
haunter20 days ago
This is what I want but for PoE/PoE2 builds. I always get a headache just looking at the passive tree https://poe.ninja/poe2/passive-skill-tree
TaupeRanger20 days ago
I corroborate that spatial reasoning is a challenge still. In this case, it's the complexity of the game world, but anyone who has used Codex/Claude with complex UIs in CSS or a native UI library will recognize the shortcomings fairly quickly.
khoury20 days ago
Can't wait for someone to let Claude control a runescape character from scratch
- itsgrimetime20 days ago
 I've done this! Given the right interface I was surprised at how well it did. Prompted it "You're controlling a character in Old School RuneScape, come up with a goal for yourself, and don't stop working on it until you've achieved it". It decided to fish for and cook 100 lobsters, and it did it pretty much flawlessly! Biggest downside was its inability to see (literally). Getting lists of interact-able game objects, NPCs, etc. was fine when it decided to do something that didn't require any real-time input. Sailing, or anything that required it to react to what's on screen, was pretty much impossible without more tooling to manage the reacting part for it (e.g. a tool to navigate automatically to some location).
 - runfrook20 days ago
 RuneScape is packet based and there are tools for inspecting packets. I wonder if these tools can give some insight to Claude Code. The only thing is you would need a description of the world map on each tick (i.e. where NPCs are, where objects are, where players are).
- reactordev20 days ago
 https://www.reddit.com/r/2007scape/comments/1qeh3nc/i_added_claude_code_to_runelite/ https://ubos.tech/mcp/runescape-mcp-server-rs-osrs/
- ASpring20 days ago
 People have been botting on Runescape since the early 2000s. Obviously not quite at the Claude level :). The botting forums were a group of very active and welcoming communities. This is actually what led me to Java programming and computer science more broadly--I wrote custom scripts for my characters. I still have some parts of the old Rei-net forum archived on an external somewhere.
- ideashower20 days ago
 Wouldn't that break Jagex's TOS though? Is there a way of getting caught?
 - AstroBen20 days ago
 I imagine Jagex must be up there with having the most sophisticated bot detection out of anyone. It's been a thing for decades
 - dpc05050520 days ago
 They detect bots but let a ton of them run free because any character having membership = revenue and an extremely significant chunk of active characters are bots. They nuked them all in 2011 I think and the game was nearly empty. SirPugger's youtube channel has loads of videos monitoring various bot farms.
phreeza20 days ago
Claude Code in dwarf fortress would be wild
- rsanek20 days ago
 https://www.youtube.com/watch?v=FLmPN03ZQbM
- __turbobrew__20 days ago
 Given dwarf fortress has an ASCII interface it may actually be a lot easier to set up claude to work with it. Also, a lot of the challenge of dwarf fortress is just knowing all the different mechanics and how they work, which is something claude should be good at.
 - vunderba20 days ago
 And it’s (Claude) almost certainly accumulated a fair amount of knowledge about the game itself, given the number of tutorials, guides, and other resources that have been written about DF over the last two decades.
 - wtetzner20 days ago
 Unfortunately it's rendering ASCII characters as sprites using SDL, so it's not really a text interface.
sodafountan20 days ago
This was an interesting application of AI, but I don't really think this is what LLMs excel at. Correct me if I'm wrong. It was interesting that the poster vibe-coded (I'm assuming) the CTL from scratch; Claude was probably pretty good at doing that, and that task could likely have been completed in an afternoon. Pairing the CTL with the CLI makes sense, as that's the only way to gain feedback from the game. Claude can't easily do spatial recognition (yet). A project like this would entirely depend on the game being open source. I've seen some very impressive applications of AI online with closed-source games and entire algorithms dedicated to visual reasoning. I'm still trying to figure out how this guy: https://www.youtube.com/watch?v=Doec5gxhT_U was able to have AI learn to play Mario Kart nearly perfectly. I find his work to be very impressive. I guess because RCT2 is more data-driven than visually challenging, this solution works well, but having an LLM try to play a racing game sounds like it would be disastrous.
- tadfisher20 days ago
 Not sure if you clocked this, but the Mario Kart AI is not an LLM. It's a randomized neural net that was trained with reinforcement learning. Apologies if I misread.
 - sodafountan20 days ago
 Yeah, that was the point of my post. LLMs traditionally aren't used in gaming like this.
deadbabe20 days ago
While this seems cool at first, it does not demonstrate superiority over a true custom built AI for rollercoaster tycoon. It is a curiosity, good for headlines, but the takeaway is if you really need an actual good AI, you are still better off not using an LLM powered solution.
colesantiago20 days ago
> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks. And these are the same people that put countless engineers through gauntlets of bizarre interview questions and exotic puzzles to hire engineers. But when it comes to C++ just vibe it obviously.
- falloutx20 days ago
 Oh, I almost didn't realise this is done by a company. I was like this must have cost a lot, didn't realize it's just an advertisement for Ramp
equinumerous20 days ago
This is a cool idea. I wanted to do something like this by adding a Lua API to OpenRCT2 that allows you to manipulate and inspect the game world. Then, you could either provide an LLM agent the ability to write and run scripts in the game, or program a more classic AI using the Lua API. This AI would probably perform much better than an LLM - but an interesting experiment nonetheless to see how a language model can fare in a task it was not trained to do.
- equinumerous20 days ago
 As far as a scripting API, it looks like the devs beat me to it with a JS/TS plugin system: https://github.com/OpenRCT2/OpenRCT2/blob/develop/distribution/scripting.md
mentos20 days ago
The opening paragraph I thought was the agent prompt haha > The park rating is climbing. Your flagship coaster is printing money. Guests are happy, for now. But you know what's coming: the inevitable cascade of breakdowns, the trash piling up by the exits, the queue times spiraling out of control.
karanveer20 days ago
the beauty of this game was that it was developed in assembly and, on top of that, mostly by one person. I've been trying to locate the dev of this game for a long time, so I can thank them for an amazing experience. If anyone knows their social or anything, please do share, including OP. Also, nice work on CC in this. May actually be interested in Claude Code now.
kinduff20 days ago
I've seen ASCII used as the first approach for these kinds of problems several times now. I think it's because it's counter-intuitive, in the sense that for us humans ASCII is text and we tend to forget it also carries spatial information. I find this a very interesting aspect of how we humans interact with AIs.
js4ever20 days ago
Most interesting phrase: "Keeping all four agents busy took a lot of mental bandwidth."
neom20 days ago
Wonder how it would do with Myst.
- alt22720 days ago
 Surely it must have digested plenty of walkthroughs for any game? A linear puzzle game like that I would just expect the AI to fly through first time, considering it has probably read 30 years of guides and walkthroughs.
 - singpolyma320 days ago
 The real test would be to try it on a new game of the same style and complexity
 - ben_w20 days ago
 Moravec's paradox likely comes into play: what's easy is hard and vice versa. The puzzles would probably be easy. Myst's puzzles are basically IQ tests, and LLMs ace traditional IQ tests: https://trackingai.org/home On the other hand, navigating the environment, I think the models may fail spectacularly. From what we've seen from Claude Plays Pokemon, it would get in weird loops and try to interact with non-interactive elements of the environment.
skybrian20 days ago
Would a way to take screenshots help? It seems to work for browser testing.
- joshribakoff20 days ago
 I’ve been doing game development and it starts to hallucinate more rapidly when it doesn’t understand things like the direction it’s placing things or which way the camera is oriented. Gemini models are a little bit better about spatial reasoning, but we’re still not there yet because these models were not designed to do spatial reasoning; they were designed to process text. In my development, I also use the ASCII matrix technique.
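 As a rough illustration of the ASCII matrix idea (not the commenter's or the article's actual code), a game world can be flattened into a character grid before it is handed to the model. The function name `render_ascii` and the `(x, y, symbol)` tuple format are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def render_ascii(
    width: int,
    height: int,
    objects: Iterable[Tuple[int, int, str]],
    empty: str = ".",
) -> str:
    """Render (x, y, symbol) objects onto a width x height text grid.

    Row 0 is the top line of the output; objects outside the grid are skipped.
    """
    grid = [[empty] * width for _ in range(height)]
    for x, y, symbol in objects:
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] = symbol
    return "\n".join("".join(row) for row in grid)
```

 Each tile becomes exactly one character, so the model can count rows and columns in text rather than infer positions from an image.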
 - kleene_op20 days ago
 Spatial awareness was also a huge limitation to Claude playing Pokemon. It really seems to me that the first AI company to implement "spatial awareness" vector tokens and integrate them neatly with the other conventional text, image and sound tokens will reap huge rewards. Some are already partnering with robot companies; it's only a matter of time before one of those gets there.
 - nszceta20 days ago
 This is also my experience with attempting to use Claude and GLM-4.7 with OpenSCAD. Horrible spatial reasoning abilities.
 - hypercube3320 days ago
 I disagree. With Opus I'll screenshot an app and draw all over it like a child with MS Paint and paste it into the chat - it seems to reasonably understand what I'm asking with my chicken scratch and dimensions. As far as 3D, I don't have experience; however, it could be quite awful at that.
 - vunderba20 days ago
 Yeah at least for 2D, Opus 4.5 seems decent. It can struggle with finer details, so sometimes I’ll grab a highlighter tool in Photoshop and mark the points of interest.
 - miohtama20 days ago
 They would need a spatial reasoning or layout-specific tool to translate to English and back
 - falcor8420 days ago
 I wonder if they could integrate a secondary "world model" trained/fine-tuned on Rollercoaster Tycoon to just do the layout reasoning, and have the main agent offload tasks to it.
neonmagenta19 days ago
But will Claude pick up complaining guests and put them in a tiny isolated section of the park that only has a bathroom that charges $10 to use?
vermilingua20 days ago
I want to get off MR ALTMANS WILD RIDE.
petcat20 days ago
Question: There is still a competitive AoE2 community. Will that be destroyed by AI?
- pbmonster20 days ago
 Dota 2 is a real time strategy game with an arguably more complex micro game (but a far simpler macro game than AoE2, but that's far easier for an AI to master), and OpenAI Five completely destroyed the reigning champions. In 2019. Perfect coordination between units, superhuman mechanical skill, perfect consistency. I see no reason why AoE2 would be any different. Worth noting that OpenAI Five was mostly deep reinforcement learning and massive distributed training; it didn't use image to text and an LLM for reasoning about what it sees to make its "decisions". But that wouldn't be a good way to do an AI like that anyway. Oh, and humans still play Dota. It's still a highly competitive community. So that wasn't destroyed at all; most teams now use AI to study tactics and strategy.
- bawolff20 days ago
 I suspect the fun is playing against real people and the unexpected things they do. Just because the AI can beat you does not necessarily make it fun. People still play chess despite Stockfish existing.
ddtaylor20 days ago
Does this website do anything besides host the article with an animated background?
HelloUsername20 days ago
*OpenRCT2
seu20 days ago
> also completely unfazed by the premise that it has been 'hacked into' a late-90's computer game. This was surprising, but fits with Claude's playful personality and flexible disposition. When I read things like this, I wonder if it's just me not understanding this brave new world, or half of AI developers are delusional and really believe that they are dealing with a sentient being.
- bspammer20 days ago
 It can be non-sentient and still have an observable personality. The same way a character in a novel can have a personality despite not being real.
- vinyl720 days ago
 Delusional
sriram_sun20 days ago
> "Where Claude excels:" Am I reading a Claude generated summary here?
- alt22720 days ago
 I thought it sounded more like an ad for Claude written by Anthropic: > "This was surprising, but fits with Claude's playful personality and flexible disposition."
 - vidarh20 days ago
 This sounds as expected to me as a heavy user of Opus. Claude absolutely has a "personality" that is a lot less formal and more willing to "play along" with more creative tasks than Codex. If you want an agent that's prepared to just jump in, it's a plus. If you want an agent that will be careful, considered and plan things out meticulously, it's not always so great - I feel that when you want Claude to do repetitive, tedious tasks, you need to do more work to prevent it from getting "bored" and trying to take shortcuts or find something else to do, for example.
 - alt22720 days ago
 > when you want Claude to do repetitive, tedious tasks, you need to do more work to prevent it from getting "bored" Is this sentence seriously about a computer? Have we gone so far that computers won't just do what we tell them to anymore?
 - vidarh20 days ago
 Claude has outright told me "this is getting tedious" before proceeding to - directly against instructions - write a script to do the task instead of doing it "manually" (I'd told it not to because I needed more complex assessment than it could do with a script). There are fairly straightforward fixes, such as either using subagents or scripting a loop that feeds the model each item instead of a list of items, as prompt compliance tends to drop the more stuff is in the context, but, yes, they will "get bored" and look for shortcuts. Another frequent one is deciding to sample instead of working through every item.
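 The "loop that feeds the model each item" fix might look something like the sketch below. `call_model` is a hypothetical stand-in for whatever client or subagent invocation you actually use; it is not a real library function, and the prompt layout is just one possible choice.

```python
from typing import Callable, Iterable, List, Tuple


def assess_each(
    items: Iterable[str],
    instructions: str,
    call_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Send one prompt per item so every call starts from a small, fresh context.

    `call_model` is a placeholder: any function that takes a prompt string
    and returns the model's reply as a string.
    """
    results = []
    for item in items:
        # Only the instructions and this single item go into the prompt,
        # so the model never sees the rest of the list and cannot "sample".
        prompt = f"{instructions}\n\nItem:\n{item}"
        results.append((item, call_model(prompt)))
    return results
```

 The trade-off is more calls and repeated instruction tokens in exchange for each item getting the model's full attention.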
 - _s20 days ago
 Yup - most models ignore specific initial instructions once you pass ~50% of usable context window, and revert to their defaults, e.g. generating overly descriptive yet useless docs / summaries
- afro8820 days ago
 Yes I believe so. Also things like forcing a "key insight" summary after the excels vs struggles section. I would take any descriptions like "comprehensive", "sophisticated" etc with a massive grain of salt. But the nuts and bolts of how it was done should be accurate.
rnmmrnm20 days ago
this is cute but i imagined prompting the ai for a loop-di-loop roller coaster. If this could build complex rides it would be a game changer.
- blibble20 days ago
 yeah I was expecting it to... do something in the game? like build a ride, not just make up bullshit about events
azhenley20 days ago
Edit: HN's auto-resubmit in action, ignore.
- Bluescreenbuddy20 days ago
 What
 - eterm20 days ago
 So, this link is actually 5 days old, if you hover the "2 hours ago" you'll see the date 5 days ago. HN second-chance pool shenanigans.
 - alt22720 days ago
 Can you point to any documentation which explains how this works? Genuinely interested.
 - azhenley20 days ago
 Dang gave some explanation here: https://news.ycombinator.com/item?id=26998308
bawolff20 days ago
Honestly I thought the AI would do better than what is described. RCT is pretty simple when it comes to things like what to set ride price to. I think the game has a straightforward formula for how guests respond to prices.
joshcsimmons20 days ago
Interesting this is on the ramp.com domain? I'm surprised in this tech market they can pay devs to hack on Rollercoaster Tycoon. Maybe there's some crossover I'm missing but seems like a sweet gig honestly.
- emeril20 days ago
 yeah really - ramp.com is a credit card/expense platform that surely loses money right now...pretty heavy/slow javascript but pretty functional nonetheless...
 - mock-possum18 days ago
 Why would they be losing money? It’s what we use for tracking expenses and getting comped for travel, meals, software licenses etc - works great in my experience. I can click a few buttons and get a new business expense card spun up in less than a minute, use it to make a purchase, get approval and have the funds transferred. Boom easy. Do you not think they’re charging enough or something?
 - ulf-7772320 days ago
 This is brilliant SEO work, I doubt that they lose money with it. With 40h and some additional for the landing page it might be an expensive link bait, but definitely worth it. Kudos! If not for SEO, it’s building quite a good reputation for this company; they got a lot of open positions. I’m a big fan of Transport Tycoon, used to play it for hours as a kid, and with Open Transport Tycoon it also might have been a good choice, but maybe not B2C?
fuzzy_lumpkins20 days ago
so the janitors will finally stay on their assigned footpaths?
nacozarina25 days ago
next up: Crusader Kings III
- mcphage20 days ago
 > You’re right, I did accidentally slaughter all the residents of Béziers. I won’t do that again. But I think that you’ll find God knows his own.
 - Forgeties7920 days ago
 Paradox future hire right here
- Deukhoofd20 days ago
 Crusader Kings is a franchise where I could really see LLMs shine. One of the current main criticisms of the game is that there's a lack of events, and that they often don't really feel relevant to your character. An LLM could potentially make events far more aimed at your character, and could actually respond to things happening in the world far more than what the game currently does. It could really create some cool emergent gameplay.
 - Braini20 days ago
 In general you are right, I expect something like this to appear in the future and it would be cool. But isn't the criticism rather that there are too many (as you say repetitive, not relevant) events - it's not like there are cool stories emerging from the underlying game mechanics anymore ("grand strategy") but players have to click through these boring predetermined events again and again.
 - Deukhoofd20 days ago
 You get too many events, but there aren't actually that many different events written, so you repeat the same ones over and over again. Eventually it just turns into the player clicking on the 'optimal' choice without actually reading the event.
 - programd20 days ago
 You could mod the game with more varied events, which were of course AI generated to begin with. Bit of an inception scenario where AI plays an AI modded game. The other option is to have an AI play another AI which is working as an antagonist, trying to make the player fail. More global plagues! More scheming underlings! More questionable choices for relaxation! Bit of an arms race there. Honestly I prefer Crusader Kings II if for no other reason than that the UI is just so brilliantly insanely obtuse while also being very good looking.
huflungdung20 days ago
[dead]
Kapura20 days ago
"i vibe coded a thing to play video games for me" i enjoy playing video games my own self. separately, i enjoy writing code for video games. i don't need ai for either of these things.
- gordonhart20 days ago
 Yeah, but can you use your enjoyment of video games as marketing material to justify a $32B valuation?
 - falloutx20 days ago
 If you look at submissions from this website, it's all just self glazing and "We did X with claude code"
 - yawnr20 days ago
 Haha exactly. This screams “we have too many people working here and don’t know what to do with them”.
 - Jaysobel20 days ago
 actually it was all to drive traffic to my 'rollercoaster coasters' Etsy store https://bansostudio.etsy.com
 - TaupeRanger20 days ago
 ^ this guy funds
 - SV_BubbleTime20 days ago
 Not so sure. He said justify.
- bigyabai20 days ago
 That's fine. Tool-assisted speedruns long predate LLMs and they're boring as hell: https://youtu.be/W-MrhVPEqRo It's still a neat perspective on how to optimize for super-specific constraints.
 - ai_20 days ago
 That TAS is spliced. The stairs beyond the door aren't loaded, you need the key to load it. This is a real console 0-star TAS: https://youtu.be/iUt840BUOYA
 - throwaway31415520 days ago
 > Tool-assisted speedruns long predate LLMs and they're boring as hell You and I have _very_ different definitions for the word boring. A lot of effort goes into TAS runs.
- rangestransform20 days ago
 I actually think it would be pretty fun to code something to play video games for me, it has a lot of overlap with robotics. Separately, I learned about assembly from cheat engine when I was a kid.
- markbao20 days ago
 That’s not the point of this. This was an exercise to measure the strengths and weaknesses of current LLMs in operating a company and managing operations, and the video game was just the simulation engine.
- echelon20 days ago
 You do you. I find this exceedingly cool and I think it's a fun new thing to do. It's kind of like how people started watching Let's Plays and that turned into Twitch. One of the coolest things recently is VTubers in mocap suits using AI performers to do single person improv performances with. It's wild and cool as hell. A single performer creating a vast fantasy world full of characters. LLMs and agents playing Pokemon and StarCraft? Also a ton of fun.
 - idioticwurds20 days ago
 [flagged]
 - echelon20 days ago
 AI is one of the best tool categories we've invented. I don't know why people are so pearl-clutchy, fisting-at-clouds about it. Some of the worst human behavior I've experienced outside of grade school is the anti-AI crowd sending me death threats and endless streams of insults. It's surreal how twisted and vile the words that some anti-AI people throw are. This is the fifth technological wave, after the chip, PC, internet, and smartphone. All of human programming cannot do what AI is already showing signs of being capable of automating. Our image and video models can render things even 80 years of optical physics and algorithms cannot do. I am legitimately excited in a way I never have been before. We're lucky to be able to witness this. Sorry for your cancer.
- jsbisviewtiful20 days ago
 AI for the sake of AI. Feels like a lot of the internet right now