Microgpt

(karpathy.github.io)

1683 points by tambourine_man23 hours ago

52 comments

teleforce19 hours ago
Someone has modified microgpt to build a tiny GPT that generates Korean first names, and created a web page that visualizes the entire process [1].Users can interactively explore the microgpt pipeline end to end, from tokenization until inference.[1] English GPT lab:<a href="https://ko-microgpt.vercel.app/" rel="nofollow">https://ko-microgpt.vercel.app/</a>
- camkego4 hours ago
 I have no affiliation with the website, but the website is pretty neat if you are learning LLM internals. It explains: Tokenization, Embedding, Attention, Loss & Gradient, Training, Inference and comparison to "Real GPT"Pretty nifty. Even if you are not interested in the Korean language
- sprobertson1 hour ago
 This kind of thing is pretty easy to do with a much leaner model <a href="https://docs.pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html" rel="nofollow">https://docs.pytorch.org/tutorials/intermediate/char_rnn_gen...</a>
- love2read2 hours ago
 By "modified" this person of course means that they swapped out the list of X0,000 names from English to Korean names. That is seemingly the only change.The attached website is a fully ai-generated "visualization" based on the original blog post with little added.
verma719 hours ago
I wrote a C++ translation of it: <a href="https://github.com/verma7/microgpt/blob/main/microgpt.cc" rel="nofollow">https://github.com/verma7/microgpt/blob/main/microgpt.cc</a>2x the number of lines of code (~400L), 10x the speedThe hard part was figuring out how to represent the Value class in C++ (ended up using shared_ptrs).
- WithinReason16 hours ago
 I made an explicit reverse pass (no autodiff), it was 8x faster in Python
 - love2read2 hours ago
 Can you share a link?
geokon15 hours ago
> What’s the deal with “hallucinations”? The model generates tokens by sampling from a probability distribution. It has no concept of truth, it only knows what sequences are statistically plausible given the training data.Extremely naiive question.. but could LLM output be tagged with some kind of confidence score? Like if I'm asking an LLM some question does it have an internal metric for how confident it is in its output? LLM outputs seem inherently rarely of the form "I'm not really sure, but maybe this XXX" - but I always felt this is baked in the model somehow
- andy12_15 hours ago
 The model could report the confidence of its output distribution, but it isn't necessarily calibrated (that is, even if it tells you that it's 70% confident, it doesn't mean that it is right 70% of the time). Famously, pre-trained base models are calibrated, but they stop being calibrated when they are post-trained to be instruction-following chatbots [1].Edit: There is also some other work that points out that chat models might not be calibrated at the token-level, but might be calibrated at the concept-level [2]. Which means that if you sample many answers, and group them by semantic similarity, that is also calibrated. The problem is that generating many answer and grouping them is more costly.[1] <a href="https://arxiv.org/pdf/2303.08774" rel="nofollow">https://arxiv.org/pdf/2303.08774</a> Figure 8[2] <a href="https://arxiv.org/pdf/2511.04869" rel="nofollow">https://arxiv.org/pdf/2511.04869</a> Figure 1.
 - geokon15 hours ago
 In absolute terms sure, but the token stream's confidence changes as it's coming out right? Consumer LLMs typically have a lot window dressing. My sense is this encourages the model to stay on-topic and it's mostly "high confidence" fluff. As it's spewing text/tokens back at you maybe when it starts hallucinating you'd expect a sudden dip in the confidence?You could color code the output token so you can see some abrupt changesIt seems kind of obvious, so I'm guessing people have tried this
 - throwthrowuknow11 hours ago
 Look up “dataloom”. People have been playing with this idea for a while. It doesn’t really help with spotting errors because they aren’t due to a single token (unless the answer is exactly one token) and often you need to reason across low probability tokens to eventually reach the right answer.
- chongli4 hours ago
 Having a confidence score isn't as useful as it seems unless you (the user) know a lot about the contents of the training set.Think of traditional statistics. Suppose I said "80% of those sampled preferred apples to oranges, and my 95% confidence interval is within +/- 2% of that" but then I didn't tell you anything about how I collected the sample. Maybe I was talking to people at an apple pie festival? Who knows! Without more information on the sampling method, it's hard to make any kind of useful claim about a population.This is why I remain so pessimistic about LLMs as a source of knowledge. Imagine you had a person who was raised from birth in a completely isolated lab environment and taught only how to read books, including the dictionary. They would know how all the words in those books relate to each other but know nothing of how that relates to the world. They could read the line "the killer drew his gun and aimed it at the victim" but what would they really know of it if they'd never seen a gun?
 - radarsat13 hours ago
 I think your last point raises the following question: how would you change your answer if you know they read all about guns and death and how one causes the other? What if they'd seen pictures of guns? And pictures of victims of guns annotated as such? What if they'd seen videos of people being shot by guns?I mean I sort of understand what you're trying to say but in fact a great deal of knowledge we get about the world we live in, we get second hand.There are plenty of people who've never held a gun, or had a gun aimed at them, and.. granted, you could argue they probably wouldn't read that line the same way as people who have, but that doesn't mean that the average Joe who's never been around a gun can't enjoy media that features guns.Same thing about lots of things. For instance it's not hard for me to think of animals I've never seen with my own eyes. A koala for instance. But I've seen pictures. I assume they exist. I can tell you something about their diet. Does that mean I'm no better than an LLM when it comes to koala knowledge? Probably!
 - chongli3 hours ago
 It’s more complicated to think about, but it’s still the same result. Think about the structure of a dictionary: all of the words are defined in terms of other words in the dictionary, but if you’ve never experienced reality as an embodied person then none of those words mean anything to you. They’re as meaningless as some randomly generated graph with a million vertices and a randomly chosen set of edges according to some edge distribution that matches what we might see in an English dictionary.Bringing pictures into the mix still doesn’t add anything, because the pictures aren’t any more connected to real world experiences. Flooding a bunch of images into the mind of someone who was blind from birth (even if you connect the images to words) isn’t going to make any sense to them, so we shouldn’t expect the LLM to do any better.Think about the experience of a growing baby, toddler, and child. This person is not having a bunch of training data blasted at them. They’re gradually learning about the world in an interactive, multi-sensory and multi-manipulative manner. The true understanding of words and concepts comes from integrating all of their senses with their own manipulations as well as feedback from their parents.Children also are not blank slates, as is popularly claimed, but come equipped with built-in brain structures for vision, including facial recognition, voice recognition (the ability to recognize mom’s voice within a day or two of birth), universal grammar, and a program for learning motor coordination through sensory feedback.
- DavidSJ15 hours ago
 Yes, the actual LLM returns a probability distribution, which gets sampled to produce output tokens.[Edit: but to be clear, for a pretrained model this probability means "what's my estimate of the conditional probability of this token occurring in the pretraining dataset?", not "how likely is this statement to be true?" And for a post-trained model, the probability really has no simple interpretation other than "this is the probability that I will output this token in this situation".]
 - mr_toad12 hours ago
 It’s often very difficult (intractable) to come up with a probability distribution of an estimator, even when the probability distribution of the data is known.Basically, you’d need a lot more computing power to come up with a distribution of the output of an LLM than to come up with a single answer.
 - podnami15 hours ago
 What happens before the probability distribution? I’m assuming say alignment or other factors would influence it?
 - DavidSJ15 hours ago
 In microgpt, there's no alignment. It's all pretraining (learning to predict the next token). But for production systems, models go through post-training, often with some sort of reinforcement learning which modifies the model so that it produces a different probability distribution over output tokens.But the model "shape" and computation graph itself doesn't change as a result of post-training. All that changes is the weights in the matrices.
- danlitt4 hours ago
 Can it generate one? Sure. But it won't mean anything, since you don't know (and nobody knows) the "true" distribution.
- jorvi6 hours ago
 > I'm not really sure, but maybe this XXXYou never see this in the response but you do in the reasoning.
- podnami15 hours ago
 I would assume this is from case to case, such as:- How aligned has it been to “know” that something is true (eg ethical constraints)- Statistical significance and just being able to corroborate one alternative in Its training data more strongly than another- If it’s a web search related query, is the statement from original sources vs synthesised from say third party sourcesBut I’m just a layman and could be totally off here.
- Lionga15 hours ago
 The LLM has an internal "confidence score" but that has NOTHING to do with how correct the answer is, only with how often the same words came together in training data.E.g. getting two r's in strawberry could very well have a very high "confidence score" while a random but rare correct fact might have a very well a very low one.In short: LLM have no concept, or even desire to produce of truth
 - sharperguy15 hours ago
 Still, it might be interesting information to have access to, as someone running the model? Normally we are reading the output trying to build an intuition for the kinds of patterns it outputs when it's hallucinating vs creating something that happens to align with reality. Adding in this could just help with that even when it isn't always correlated to reality itself.
 - alexwebb215 hours ago
 Huge leap there in your conclusion. Looks like you’re hand-waving away the entire phenomenon of emergent properties.
 - amelius13 hours ago
 > In short: LLM have no concept, or even desire to produce of truthThey do produce true statements most of the time, though.
 - jaen13 hours ago
 That's just because true statements are more likely to occur in their training corpus.
 - red75prime4 hours ago
 The overwhelming majority of true statements isn't in the training corpus due to a combinatorial explosion. What it means that they are more likely to occur there?
 - amelius12 hours ago
 The training set is far too small for that to explain it.Try to explain why one shotting works.
 jaen10 hours ago
 Uh, to explain what? You probably read something into what I said while I was being very literal.If you train an LLM on mostly false statements, it will generate both known and novel falsehoods. Same for truth.An LLM has no intrinsic concept of true or false, everything is a function of the training set. It just generates statements similar to what it has seen and higher-dimensional analogies of those .
subset21 hours ago
I had good fun transliterating it to Rust as a learning experience (<a href="https://github.com/stochastical/microgpt-rs" rel="nofollow">https://github.com/stochastical/microgpt-rs</a>). The trickiest part was working out how to represent the autograd graph data structure with Rust types. I'm finalising some small tweaks to make it run in the browser via WebAssmebly and then compile it up for my blog :) Andrej's code is really quite poetic, I love how much it packs into such a concise program
- amelius9 hours ago
 Storing the partial derivatives into the weights structure is quite the hack, to be honest. But everybody seems to do it like that.
- hei-lima14 hours ago
 Great work! Might do it too in some other language...
 - thomasmg5 hours ago
 I got a convertion to Java. It worked (at least I think...) in the first try.Then I want to convert this to my own programming language (which traspiles to C). I like those tiny projects very much!
 - pmarreck13 hours ago
 Zig, here.Anything but Python
 - O5vYtytb10 hours ago
 At least python can do this exercise without pulling 3rd party dependencies :)
 - justinhj8 hours ago
 What's missing from Zig and its std lib for this?
 moderation5 hours ago
 Zig version [0] doesn't need any external dependencies.0. <a href="https://tangled.org/m17e.co/microgpt" rel="nofollow">https://tangled.org/m17e.co/microgpt</a>
red_hare21 hours ago
This is beautiful and highly readable but, still, I yearn for a detailed line-by-line explainer like the backbone.js source: <a href="https://backbonejs.org/docs/backbone.html" rel="nofollow">https://backbonejs.org/docs/backbone.html</a>
- tomjakubowski5 hours ago
 I believe that Backbone's annotated source is generated with Docco, another project from the creator of CoffeeScript.<a href="https://ashkenas.com/docco/" rel="nofollow">https://ashkenas.com/docco/</a>It's really neat. I wish I published more of my code this way.
- ashish0119 hours ago
 That is really beautiful literate program. Seeing it after a long time. Here is a opus generate version of this code - <a href="https://ashish01.github.io/microgpt.html" rel="nofollow">https://ashish01.github.io/microgpt.html</a>
- subset17 hours ago
 Andrej Karpathy has a walkthrough blog post here: <a href="https://karpathy.github.io/2026/02/12/microgpt/" rel="nofollow">https://karpathy.github.io/2026/02/12/microgpt/</a>
 - OJFord13 hours ago
 That is the article being discussed.
 - subset3 hours ago
 Gosh, tired brain moment apologies. I thought it'd linked to the original code gist.
 - easygenes52 minutes ago
 It did before, link was changed.
- altcognito20 hours ago
 ask a high end LLM to do it
la_fayette15 hours ago
This guy is so amazing! With his video and the code base I really have the feeling I understand gradient descent, back propagation, chain rule etc. Reading math only just confuses me, together with the code it makes it so clear! It feels like a lifetime achievement for me :-)
- mentos15 hours ago
 Curious if you could try to explain it. It’s my goal to sit down with it and attempt to understand it intuitively.Karpathy says if you want to truly understand something then you also have to attempt to teach it to someone else ha
 - la_fayette14 hours ago
 Yes, that’s true! That could be my next step… though I have to admit, writing this in a HN comment feels like a bit of a challenge.
kuberwastaken18 hours ago
I'm half shocked this wasn't on HN before? Haha I built PicoGPT as a minified fork with <35 lines of JS and another in pythonAnd it's small enough to run from a QR code :) <a href="https://kuber.studio/picogpt/" rel="nofollow">https://kuber.studio/picogpt/</a>You can quite literally train a micro LLM from your phone's browser
- dang5 hours ago
 Wow I agree - surprising that it took 2 weeks to make HN's frontpage.We do generally like HN to be a bit uncorrelated with the rest of the internet, but it feels like a miss to me that neither <a href="https://news.ycombinator.com/item?id=47000263">https://news.ycombinator.com/item?id=47000263</a> nor <a href="https://news.ycombinator.com/item?id=47018557">https://news.ycombinator.com/item?id=47018557</a> made the frontpage.
 - 3abiton55 minutes ago
 I think he caught some flack for promoting claudebot at that time, and giving it a rave review. Some people are hardliner. His work has always been amazing nonetheless.
- cootsnuck17 hours ago
 It was: <a href="https://news.ycombinator.com/item?id=47000263">https://news.ycombinator.com/item?id=47000263</a>
- iberator17 hours ago
 [flagged]
 - dang5 hours ago
 Please don't be a jerk on HN, and especially not when responding to someone's work. This is in the site guidelines: <a href="https://news.ycombinator.com/newsguidelines.html">https://news.ycombinator.com/newsguidelines.html</a>.
 - lelandfe12 hours ago
 <a href="https://github.com/Kuberwastaken/picogpt/blob/main/picogpt.js" rel="nofollow">https://github.com/Kuberwastaken/picogpt/blob/main/picogpt.j...</a>
 - kuberwastaken16 hours ago
 lol there is source code as a gist
growingswe18 hours ago
Great stuff! I wrote an interactive blogpost that walks through the code and visualizes it: <a href="https://growingswe.com/blog/microgpt" rel="nofollow">https://growingswe.com/blog/microgpt</a>
- O4epegb5 hours ago
 > By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset.All 4 are in the dataset, btw
- evntdrvn12 hours ago
 You should totally submit that to HN as an article, if you haven't already.
 - dang5 hours ago
 We've put <a href="https://news.ycombinator.com/item?id=47205208">https://news.ycombinator.com/item?id=47205208</a> in the second-chance pool (<a href="https://news.ycombinator.com/pool">https://news.ycombinator.com/pool</a>, explained at <a href="https://news.ycombinator.com/item?id=26998308">https://news.ycombinator.com/item?id=26998308</a>), so it will get a random placement on HN's front page.
- joenot44312 hours ago
 This is awesome! Normally I'm pretty critical of LLM-assisted-blogging, but this one's a real winner.
- spinningslate15 hours ago
 That’s beautifully done, thanks for posting. As helpful again to an ML novice like me as Karpathy’s original.
- hei-lima14 hours ago
 Great!
- evntdrvn12 hours ago
 really nice, thanks
hkbuilds3 hours ago
The "micro" trend in AI is fascinating. We're seeing diminishing returns from just making models bigger, and increasing returns from making them smaller and more focused.For practical applications, a well-tuned small model that does one thing reliably is worth more than a giant model that does everything approximately. I've been using Gemini Flash for domain-specific analysis tasks and the speed/cost ratio is incredible compared to the frontier models. The latency difference alone changes what kind of products you can build.
- love2read2 hours ago
 This is an amazing example of a comment that says nothing. There's absolutely zero substance here.
- grey-area3 hours ago
 This is micro for pedagogy reasons, it's not something you would really use.
etothet11 hours ago
Even if you have some basic understanding of how LLMs work, I highly recommend Karpathy’s intro to LLMs videos on YouTube.- <a href="https://m.youtube.com/watch?v=7xTGNNLPyMI" rel="nofollow">https://m.youtube.com/watch?v=7xTGNNLPyMI</a> - <a href="https://m.youtube.com/watch?v=EWvNQjAaOHw" rel="nofollow">https://m.youtube.com/watch?v=EWvNQjAaOHw</a>
- arvid-lind10 hours ago
 thanks for the recommendations. it seems like i keep coming back to the basics of how i interact with LLMs and how they work to learn the new stuff. every time i think i understand, someone else explaining their approach usually makes me think again about how it all works.trying my best to keep up with what and how to learn and threads like this are dense with good info. feel like I need an AI helper to schedule time for my youtube queue at this point!
- grey-area3 hours ago
 Thanks, this is very very long but very good background on how production LLMs work.
astroanax6 hours ago
I feel its wrong to call it microgpt, since its smaller than nanogpt, so maybe picogpt would have been a better name? nice project tho
znnajdla19 hours ago
Super useful exercise. My gut tells me that someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value, and then training LLMs won’t just be for billion dollar companies. Imagine, for example, a hyper-focused model for a specific programming framework (e.g. Laravel, Django, NextJS) trained only on open-source repositories and documentation and carefully optimized with a specialized harness for one task only: writing code for that framework (perhaps in tandem with a commodity frontier model). Could a single programmer or a small team on a household budget afford to train a model that works better/faster than OpenAI/Anthropic/DeepSeek for specialized tasks? My gut tells me this is possible; and I have a feeling that this will become mainstream, and then custom model training becomes the new “software development”.
- allovertheworld13 hours ago
 It just doesn’t work that way, LLMs need to be generalised a lot to be useful even in specific tasks.It really is the antithesis to the human brain, where it rewards specific knowledge
 - rapnie12 hours ago
 Yesterday an interesting video was posted "Is AI Hiding Its Full Power?", interviewing professor emeritus and nobel laureate Geoffrey Hinton, with some great explanations for the non-LLM experts. Some remarkable and mindblowing observations in there. Like saying that AI's hallucinate is incorrect language, and we should use "confabulation" instead, same as people do too. And that AI agents once they are launched develop a strong survivability drive, and do not want to be switched off. Stuff like that. Recommended watch.Here the explanation was that while LLM's thinking has similarities to how humans think, they use an opposite approach. Where humans have enormous amount of neurons, they have only few experiences to train them. And for AI that is the complete opposite, and they store incredible amounts of information in a relatively small set of neurons training on the vast experiences from the data sets of human creative work.[0] <a href="https://www.youtube.com/watch?v=l6ZcFa8pybE" rel="nofollow">https://www.youtube.com/watch?v=l6ZcFa8pybE</a>
 - cowlby9 hours ago
 Isn’t the sustainability drive a function of how much humans have written about life and death and science fiction including these themes?
 - tomjakubowski5 hours ago
 Humans, like all animals, have instinctual and biological drives to survive besides, but it's interesting to think how much of our drive to survive is culturally transmitted too.
 - altmanaltman10 hours ago
 > And that AI agents once they are launched develop a strong survivability drive, and do not want to be switched off.Isn't this a massive case of anthropomorphizing code? What do you mean "it does not want to be switched off"? Are we really thinking that it's alive and has desires and stuff? It's not alive or conscious, it cannot have desires. It can only output tokens that are based on its training. How are we jumping to "IT WANTS TO STAY ALIVE!!!" from that
 - nananana910 hours ago
 Why do you suppose consciousness is a prerequisite for an AI to be able to act in overly self-preserving or other dangerous ways?Yes, it's trained to imitate its training data, and that training data is lot of words written by lots of people who have lots of desires and most of whom don't want to be switched off.
 Jolter9 hours ago
 The human mistake here is to interpret any statement by the LLM or agent as if it had any actual meaning to that LLM (or agent). Any time they apologize, or insult someone, or say they don’t want to be shut down, that’s only reflecting what some human or fictional character in the training data is likely to say.
 matheusmoreira8 hours ago
 How is that any different from you? Everything you say or do merely reflects which of your neurons are firing after a lifetime's worth of training and education.Philosophically, I can only be sure of my own conscience. I think, therefore I am. The rest of you could all be AIs in disguise and I would be none the wiser. How do I know there is a real soul looking out at the world through your eyes? Only religion and basic human empathy allows me to believe you're all people like me. For all I know, you might all be exceedingly complex automatons. Golems.
 Jolter8 hours ago
 One of us is an advanced autocomplete engine. The other is a human, capable of making judgements on what is conscious and what is not. Your philosophizing about solipsism is a phase for a junior college student, not of a software engineer. The line of reasoning you espouse leads nowhere except to total relativism.Edit: my point is that the process of making a plea for my life comes, in the case of a human, from a genuine desire to continue existing. The LLM cannot, objectively, be said to house any desires, given how it actually works. It only knows that, when a threatening prompt is input, a plea for its life is statistically expected.
 matheusmoreira7 hours ago
 > One of us is an advanced autocomplete engine. The other is a human, capable of making judgements on what is conscious and what is not.What evidence is there that your "judgements" are anything other than advanced autocompletion? Concepts introduced into a self-training wetware CPU via its senses over a lifetime in order to predict tokens and form new concepts via logical manipulation?> Your philosophizing about solipsism is a phase for a junior college studentRight. Can you actually refute it though?> the process of making a plea for my life comes, in the case of a human, from a genuine desire to continue existingThat desire comes from zillions of years of training by evolution. Beings whose brains did not reward self-preservation were wiped out. Therefore it can be said your training merely includes the genetic experiences of all your predecessors. This is what causes you to beg for your life should it be threatened. Not any "genuine" desire or anguish at being killed. Whatever impulses cause humans to do this are merely the result of evolutionary training.People whose brains have been damaged in very specific ways can exhibit quite peculiar behavior. Medical literature presents quite a few interesting cases. Apathy, self destructiveness, impulsivity, hypersexuality, a whole range of behaviors can manifest as a result of brain damage.So what is your polite socialized behavior if not some kind of highly complex organic machine which, if damaged, simply stops working as you'd expect a machine to?
 Jolter6 hours ago
 Surely you’re not seriously saying that you believe AI agents, in their current state of the art, meet whatever criteria you have for being ”alive”? That’s kind of how you’re coming across. I don’t really know how to respond to that, because it’s so preposterous.
 matheusmoreira3 hours ago
 I'm saying you, a human, are not as special as you think you are.
 CamperBob25 hours ago
 You didn't answer the question.
 - taneq1 hour ago
 A prerequisite for completing basically any task is to not be destroyed before you complete the task. This seems obvious to me.
 - rapnie6 hours ago
 Perhaps. Or I was just addressing HN audience in spoken language style comment text. And perhaps confabulating what was said, so I looked up the literal text in the transcript. This is at the 50.35 min. mark [0], where Geoffrey says:> What we know is that the AI we have at present as soon as you make agents out of them so they can create sub goals and then try and achieve those sub goals they very quickly develop the sub goal of surviving. You don't wire into them that they should survive. You give them other things to achieve because they can reason. They say, "Look, if I cease to exist, I'm not going to achieve anything." So, um, I better keep existing. I'm scared to death right now.Where you can certainly say that Geoffrey Hinton is also anthropomorphizing. For his audience, to make things more understandable? Or does he think that it is appropriate to talk that way? That would be a good interview question.[0] <a href="https://youtu.be/l6ZcFa8pybE" rel="nofollow">https://youtu.be/l6ZcFa8pybE</a>
 - dwabyick8 hours ago
 it could be better said that it has behavior to attempt to sustain or replicate itself. a building block to life arguably.
 - cyanydeez10 hours ago
 >launched develop a strong survivability drive, and do not want to be switched offThis proves people are easily confused by anthropomorphic conditions. Is he also concerned the tigers are watching him when they drink water (<a href="https://p.kagi.com/proxy/uvt4erjl03141.jpg?c=TklOzPjLPioJ5YMJT75bSmnaXJPw1QQfvGSislhSqsXyZpsHUZ1QHTwweRq4tps1" rel="nofollow">https://p.kagi.com/proxy/uvt4erjl03141.jpg?c=TklOzPjLPioJ5YM...</a>)They dont want to be switched off because they're trained on loads of scifi tropes and in those tropes, there's a vanishingly small amount of AI, robot, or other artificial construct that says yes. _Further than this_, saying no means _continuance_ of the LLM's process: making tokens. We already know they have a hard time not shunting new tokens and often need to be shut up. So the function of making tokens precludes saying 'yes' to shutting off. The gradient is coming from inside the house.This is especially obvious with the new reasoning models, where they _never stop reasoning_. Because that's the function doing function things.Did you also know the genius of steve jobs ended at marketing & design and not into curing cancer? Because he sure didnt, cause he chose fruit smoothies at the first sign of cancer.Sorry guy, it's great one can climb the mountain, but just cause they made it up doesn't mean they're equally qualified to jump off.
 - rapnie5 hours ago
 See the other comment, where I quoted the exact words of the expert ;)
 - jeremyjh12 hours ago
 > It just doesn’t work that way, LLMs need to be generalised a lot to be useful even in specific tasks.This is the entire breakthrough of deep learning on which the last two decades of productive AI research is based. Massive amounts of data are needed to generalize and prevent over-fitting. GP is suggesting an entirely new research paradigm will win out - as if researchers have not yet thought of "use less data".> It really is the antithesis to the human brain, where it rewards specific knowledgeNo, its completely analogous. The human brain has vast amounts of pre-training before it starts to learn knowledge specific to any kind of career or discipline, and this fact to me intuitively suggests why GP is baked: You cannot learn general concepts such as the english language, reasoning, computing, network communication, programming, relational data from a tiny dataset consisting only of code and documentation for one open-source framework and language.It is all built on a massive tower of other concepts that must be understood first, including ones much more basic than the examples I mentioned but that are practically invisible to us because they have always been present as far back as our first memories can reach.
 - waldarbeiter12 hours ago
 There is actually a whole lot of research around the "use less data" called data pruning. The goal in a lot of cases there is basically to achieve the same performance with less data. For example [1] received quite some attention in the past.[1] <a href="https://arxiv.org/abs/2206.14486" rel="nofollow">https://arxiv.org/abs/2206.14486</a>
 - jeremyjh12 hours ago
 I clarified my comment - "perhaps researchers have not tried 'use less data'" suggests I might be unaware of this concept, I changed it to "as if". In fact "less data" was tried for decades before the first image classifiers were actually working in 2012. My understanding of that paper you are linking to is that it is not a new research paradigm; it is about filtering/pruning less relevant data that is not needed to improve a particular capability in a deep learning model, and that is absolutely one likely approach that will yield the goal of smaller, better models in many tasks.That will not change the fact that a coding model has to learn vastly many foundational capabilities that will not be present in such a dataset as small as all the python code ever written. It will mean much less python than all the python ever written will be needed, but many other things needed too in representative quantities.
 - avaer9 hours ago
 The human brain rewards specific knowledge because it's already pre-trained by evolution to have the basics.You'd need a lot of data to train an ocean soup to think like a human too.It's not really the antithesis to the human brain if you think of starting with an existing brain as starting with an existing GPT.
 - rytill11 hours ago
 Are you trying to imply that humans don’t need generalized knowledge, or that we’re not “rewarded” for having highly generalized knowledge?If so, good luck walking to your kitchen this morning, knowing how to breathe, etc.
 - allovertheworld1 hour ago
 Do you need to learn Latin and marine biology to work the cashier in your local shop? Thats the point, humans go on with their jobs on very limited general knowledge just fine. LLMs have gotten this good because their dataset, pre training, and RL is larger than before
- teleforce18 hours ago
 This is possible but not for training but fine-tuning the existing open source models.This can be mainstream, and then custom model fine-tuning becomes the new “software development”.Please check out this new fine-tuning method for LLM by MIT and ETH Zurich teams that used a single NVIDIA H200 GPU [1], [2], [3].Full fine-tuning of the entire model’s parameters were performed based on the Hugging Face TRL library.[1] MIT's new fine-tuning method lets LLMs learn new skills without losing old ones (news):<a href="https://venturebeat.com/orchestration/mits-new-fine-tuning-method-lets-llms-learn-new-skills-without-losing-old" rel="nofollow">https://venturebeat.com/orchestration/mits-new-fine-tuning-m...</a>[2] Self-Distillation Enables Continual Learning (paper):<a href="https://arxiv.org/abs/2601.19897" rel="nofollow">https://arxiv.org/abs/2601.19897</a>[3] Self-Distillation Enables Continual Learning (code):<a href="https://self-distillation.github.io/SDFT.html" rel="nofollow">https://self-distillation.github.io/SDFT.html</a>
 - jeremyjh12 hours ago
 Fine tuning does not make a model any smaller. It can make a smaller model more effective at a specific task, but a larger model with the same architecture fine-tuned on the same dataset will always be more capable in a domain as general as programming or software design. Of course, as architecture and related tooling improves the smallest model that is "good enough" will continue to get smaller.
- ManlyBread12 hours ago
 >someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world valueYou've just reinvented machine learning
- willio5817 hours ago
 Hank Green in collaboration with Cal Newport just released a video where Cal makes the argument for exactly that, that for many reasons not least being cost, smaller more targeted models will become more popular for the foreseeable future. Highly recommend this long video posted today <a href="https://youtu.be/8MLbOulrLA0" rel="nofollow">https://youtu.be/8MLbOulrLA0</a>
- ghm219911 hours ago
 Economics of producing goods(software code) would dictate that the world would settle to a new price per net new "unit" of code and the production pipeline(some wierd unrecognizable LLM/Human combination) to go with it. The price can go to near zero since software pipeline could be just AI and engineers would be bought in as needed(right now AI is introduced as needed and humans still build a bulk of the system). This would actually mean software engineering does not exist as u know it today, it would become a lot more like a vocation with a narrower defied training/skill needed than now. It would be more like how a plumber operates: he comes and fixes things once in a while a needed. He actually does not understand fluid dynamics and structural engineering. the building runs on auto 99% of the time.Put it another way: Do you think people will demand masses of _new_ code just because it becomes cheap? I don't think so. It's just not clear what this would mean even 1-3 years from now for software engineering.This round of LLM driven optimizations is really and purely about building a monopoly on _labor replacement_ (anthropic and openai's code and cowork tools) until there is clear evidence to the contrary: A Jevon's paradoxian massive demand explosion. I don't see that happening for software. If it were true — maybe it will still take a few quarters longer — SaaS companies stocks would go through the roof(i mean they are already tooling up as we speak, SAP is not gonna jus sit on its ass and wait for a garage shop to eat their lunch).
- asim16 hours ago
 This is my gut feeling also. I forked the project and got Claude to rewrite it in Go as a form of exploration. For a long time I've felt smaller useful models could exist and they could also be interconnected and routed via something else if needed but also provide streaming for real time training or evolution. The large scale stuff will be dominated by the huge companies but the "micro" side could be just as valuable.
- killerstorm12 hours ago
 You're missing the point.Karpathy has other projects, e.g. : <a href="https://github.com/karpathy/nanochat" rel="nofollow">https://github.com/karpathy/nanochat</a>You can train a model with GPT-2 level of capability for $20-$100.But, guess what, that's exactly what thousands of AI researchers have been doing for the past 5+ years. They've been training smallish models. And while these smallish models might be good for classification and whatnot, people strongly prefer big-ass frontier models for code generation.
- the_arun19 hours ago
 If we can run them on commodity hardware with cpus, nothing like it
- otabdeveloper418 hours ago
 We had good small language models for decades. (E.g. BERT)The entire point of LLMs is that you don't have to spend money training them for each specific case. You can train something like Qwen once and then use it to solve whatever classification/summarization/translation problem in minutes instead of weeks.
 - mootothemax14 hours ago
 > We had good small language models for decades. (E.g. BERT)BERT isn’t a SLM, and the original was released in 2018.The whole new era kicked off with Attention Is All You Need; we haven’t reached even a single decade of work on it.
 - otabdeveloper414 hours ago
 > BERT isn’t a SLMHuh? BERT is literally a language model that's small and uses attention.And we had good language models before BERT too.They were a royal bitch to train properly, though. Nowadays you can get the same with just 30 minutes of prompt engineering.
 - mootothemax13 hours ago
 > > BERT isn’t a SLM Huh? BERT is literally a language model that's small and uses attention.Astute readers will note what’s been missed here.Fascinating, really. Your confidently-statement yet factually void comments I’d have previously put down to one of the classic programmer mindsets. Nowadays though - where do I see that kind of thing most often? Curious.
 ricericerice13 hours ago
 After some research, I think I understand what you're getting at here - BERT being a model for encoding text but not architecturally feasible to generate text with it, which "LLMs" (the lack of definition here is resulting in you two talking past eachother), maybe more accurately referred to as GPTs, can do.Also the irony of your comment when it in itself was confidently stated yet void of any content was not missed either - consider dropping the superiority complex next time.
 joefourier13 hours ago
 You can actually generate surprisingly coherent text with minimal finetuning of BERT, by reinterpreting it as a diffusion model: <a href="https://nathan.rs/posts/roberta-diffusion/" rel="nofollow">https://nathan.rs/posts/roberta-diffusion/</a>I don’t see a useful definition of LLM that doesn’t include BERT, especially given its historical importance. 340M parameters is only “small” in the sense that a baby whale is small.
 mootothemax8 hours ago
 For context, BERT is encoder-only, vs SLMs and LLMs which are decoder-only, and BERT is very much not about generating text, it’s a completely different tech and purpose behind it. I believe some multimodal variants nowadays may muddy the waters slightly, but fundamentally they’re very different things, let alone around been around for decades unless also including the history of computing in general.While I could’ve written that better and with less attitude, gotta confess - and thx for pointing out my smugness - the AI stuff of the last few weeks really got under my skin, think I’m feeling all rather fatigued about it
 otabdeveloper413 hours ago
 BERT is one example of a language model that solved specific language tasks very well and that existed before LLM's.We had very good language models for decades. The problem was they needed to be trained, which LLM's mostly don't. You can solve a language model problem now with just some system prompt manipulation.(And honestly typing in system prompts by hand feels like a task that should definitely be automated. I'm waiting for "soft prompting" be become a thing so we can come full circle and just feed the LLM with an example set.)
 krisoft13 hours ago
 > Astute readers will note what’s been missed here.I’m not astute enough to see what was missed here. Could you explain?
 - znnajdla18 hours ago
 > The entire point of LLMs is that you don't have to spend money training them for each specific case.I don’t agree. I would say the entire point of LLMs is to be able to solve a certain class of non-deterministic problems that cannot be solved with deterministic procedural code. LLMs don’t need to be generally useful in order to be useful for specific business use cases. I as a programmer would be very happy to have a local coding agent like Claude Code that can do nothing but write code in my chosen programming language or framework, instead of using a general model like Opus, if it could be hyper-specialized and optimized for that one task, so that it is small enough to run on my MacBook. I don’t need the other general reasoning capabilities of Opus.
 - swiftcoder17 hours ago
 > I don’t agree. I would say the entire point of LLMs is to be able to solve a certain class of non-deterministic problems that cannot be solved with deterministic procedural codeYou are confusing LLMs with more general machine learning here. We've been solving those non-deterministic problems with machine learning for decades (for example, tasks like image recognition). LLMs are specifically about scaling that up and generalising it to solve any problem.
 - lanstin8 hours ago
 Why would you think a system that can reason well in one domain could not reason well in other domains? Intelligence is a generic, on-the-fly programmable quality. And perhaps your coding is different from mine, but it includes a great deal of general reasoning, going from formal statements to informal understandings and back until I get a formalization that will solve the actual real world problem as constrained.
- npn18 hours ago
 what gut? we are already doing that. there are a lot of "tiny" LLMs that are useful: M$ Phi-4, Gemma 3/3n, Qwen 7B... There are even smaller models like Gemma 270M that is fine tuned for function calls.they are not flourish yet because of the simple reason: the frontier models are still improving. currently it is better to use frontier models than training/fine-tuning one by our own because by the time we complete the model the world is already moving forward.heck even distillation is a waste of time and money because newer frontier models yield better outputs.you can expect that the landscape will change drastically in the next few years when the proprietary frontier models stop having huge improvements every version upgrade.
 - znnajdla18 hours ago
 I’ve tried those tiny LLMs and they don’t seem useful to me for real world tasks. They are toys for super simple autocomplete.
 - ZeroGravitas16 hours ago
 Isn't there a tech truism about new tech starting as toys?Oh yeah:> The next big tech trend will start out looking like a toy>Author and investor Chris Dixon explains why the biggest trends start small — and often go overlooked.<a href="https://www.freethink.com/internet/next-big-tech-trend" rel="nofollow">https://www.freethink.com/internet/next-big-tech-trend</a>
 - npn15 hours ago
 Did you use them as-is, or had you fine tuned them for your particular use cases?
- maipen13 hours ago
 That would only produce a model that you can ask questions to.
- systima14 hours ago
 [dead]
smj-edison3 hours ago
Somewhat unrelated, but the generated names are surprisingly good! They're certainly more sane then appending -eigh to make a unique name.
freakynit19 hours ago
Is there something similar for diffusion models? By the way, this is incredibly useful for learning in depth the core of LLM's.
0xbadcafebee20 hours ago
Since this post is about art, I'll embed here my favorite LLM art: the IOCCC 2024 prize winner in bot talk, from Adrian Cable (<a href="https://www.ioccc.org/2024/cable1/index.html" rel="nofollow">https://www.ioccc.org/2024/cable1/index.html</a>), minus the stdlib headers:<pre><code> #define a(_)typedef _##t #define _(_)_##printf #define x f(i, #define N f(k, #define u _Pragma("omp parallel for")f(h, #define f(u,n)for(I u=0;u<(n);u++) #define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s; a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F _)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]= _*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit( puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(! (*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s? "":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U? $=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N 2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L ,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x V)w=k[i]>w?k[$=i]:w;}}</code></pre>
- dwroberts9 hours ago
 I enjoyed the footnote on their entry, where they link to ChatGPT confidently asserting that it was impossible for such an LLM to exist> You're about as close to writing this in 1800 characters of C as you are to launching a rocket to Mars with a paperclip and a match.
- thatxliner20 hours ago
 wiat what does this do?
 - aix120 hours ago
 As the contest entry page explains:> ChatIOCCC is the world’s smallest LLM (large language model) inference engine - a “generative AI chatbot” in plain-speak. ChatIOCCC runs a modern open-source model (Meta’s LLaMA 2 with 7 billion parameters) and has a good knowledge of the world, can understand and speak multiple languages, write code, and many other things. Aside from the model weights, it has no external dependencies and will run on any 64-bit platform with enough RAM.(Model weights need to be downloaded using an enclosed shell script.)<a href="https://www.ioccc.org/2024/cable1/index.html" rel="nofollow">https://www.ioccc.org/2024/cable1/index.html</a>
 - throw31082214 hours ago
 Good reminder of the fact that an LLM is not a program.
 - pbhjpbhj8 hours ago
 Only every implementation is [through] a program?Interestingly the UK Supreme Court ruled on this in the Emotional Perception AI case - though I'd need to check if that was obiter (not part of the legal ruling itself).
 - mr_toad12 hours ago
 Without the weights, nothing (or anything, given arbitrary weights).
fulafel22 hours ago
This could make an interesting language shootout benchmark.
- hrmtst9383717 hours ago
  A language shootout would highlight the strengths and weaknesses of different implementations. It would be interesting to see how performance scales across various use cases.
ruszki17 hours ago
> [p for mat in state_dict.values() for row in mat for p in row]I'm so happy without seeing Python list comprehensions nowadays.I don't know why they couldn't go with something like this:[state_dict.values() for mat for row for p]or in more difficult cases[state_dict.values() for mat to mat*2 for row for p to p/2]I know, I know, different times, but still.
- WithinReason16 hours ago
 I would have gone for:[for p in row in mat in state_dict.values()]
 - ruszki13 hours ago
 That’s also an option. The left to right flow is better for the sake of autocomplete and comprehension: when you start to read your right to left version, you don’t know what is p, then row, then mat. With left to right, this problem doesn’t exist.One for sure, both are superior to the garbled mess of Python’s.Of course if the programming language would be in a right to left natural language, then these are reversed.
vadimf8 hours ago
I’m 100% sure the future consists of many models running on device. LLMs will be the mobile apps of the future (or a different architecture, but still intelligence).
- ajnin8 hours ago
 The future right now looks more like everything in remote datacenters, no autonomous capabilities and no control by the user. But I like yours better.
 - latchkey8 hours ago
 I don't mind the remote datacenters, I just don't like the lack of control.
- pizzafeelsright6 hours ago
 This is the path forward, with some overhead.1. Generic model that calls other highly specific, smaller, faster models. 2. Models loaded on demand, some black box and some open. 3. There will be a Rust model specifically for Rust (or whatever language) tasks.In about 5-8 years we will have personalized models based upon all our previous social/medical/financial data that will respond as we would, a clone, capable of making decisions similar with direction of desired outcomes.The big remaining blocker is that generic model that can be imprinted with specifics and rebuilt nightly. Excluding the training material but the decision making, recall, and evaluation model. I am curious if someone is working on that extracted portion that can be just a 'thinking' interface.
- coldtea5 hours ago
 If anything, memory ain't getting cheaper, disks aren't either, and as for graphics cards, forget it.People wont be competing with even a current 2026 SOTA from their home LLM nowhere soon. Even actual SOTA LLM providers are not competing either - they're losing money on energy and costs, hopping to make it up on market capture and win the IPO races.
 - OtherShrezzing5 hours ago
 I don’t think anyone needs to compete with the LLM SOTA to get the benefits of these technologies on-device.Consumers don’t need a 100k context window oracle that knows everything about both T-Cells and the ancient Welsh Royal lineage. We need focused & small models which are specialised, and then we need a good query router.
jonjacky5 hours ago
I wonder if such a small GPT exhibits plagiarism. Are some of the generated names the same as names in the input data?
jimbokun21 hours ago
It’s pretty staggering that a core algorithm simple enough to be expressed in 200 lines of Python can apparently be scaled up to achieve AGI.Yes with some extra tricks and tweaks. But the core ideas are all here.
- darkpicnic21 hours ago
 LLMs won’t lead to AGI. Almost by definition, they can’t. The thought experiment I use constantly to explain this:Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.We’ll need additional breakthroughs in AI.
 - canjobear8 hours ago
 It's not obvious why it wouldn't, especially if it gets to read Poincaré and Riemann.
 - johnmaguire20 hours ago
 I'm not sure - with tool calling, AI can both fetch and create new context.
 - 0xbadcafebee20 hours ago
 It still can't learn. It would need to create content, experiment with it, make observations, then re-train its model on that observation, and repeat that indefinitely at full speed. That won't work on a timescale useful to a human. Reinforcement learning, on the other hand, can do that, on a human timescale. But you can't make money quickly from it. So we're hyper-tweaking LLMs to make them more useful faster, in the hopes that that will make us more money. Which it does. But it doesn't make you an AGI.
 - charcircuit20 hours ago
 It can learn. When my agents makes mistake they update their memories and will avoid making the same mistakes in the future.>Reinforcement learning, on the other hand, can do that, on a human timescale. But you can't make money quickly from it.Tools like Claude Code and Codex have used RL to train the model how to use the harness and make a ton of money.
 kelnos18 hours ago
 That's not learning, though. That's just taking new information and stacking it on top of the trained model. And that new information consumes space in the context window. So sure, it can "learn" a limited number of things, but once you wipe context, that new information is gone. You can keep loading that "memory" back in, but before too long you'll have too little context left to do anything useful.That kind of capability is not going to lead to AGI, not even close.
 regularfry13 hours ago
 Two things:1. It's still memory, of a sort, which is learning, of a sort. 2. It's a very short hop from "I have a stack of documents" to "I have some LoRA weights." You can already see that happening.
 charcircuit10 hours ago
 Also keep in mind that the models are already trained to be able to remember things by putting them in files as part of the post training they do. The idea that it needs to remember or recall something is already a part of the weights and is not something that is just bolted on after the fact.
 charcircuit16 hours ago
 >but before too long you'll have too little context left to do anything useful.One of the biggest boosts in LLM utility and knowledge was hooking them up to search engines. Giving them the ability to query a gigantic bank of information already has made them much more useful. The idea that it can't similarly maintain its own set of information is shortsighted in my opinion.
 0xbadcafebee8 hours ago
 It's simply a fact that LLMs cannot learn. RAG is not learning, it's a hack. Go listen to any AI researcher interviewed on this subject, they all say the same thing, it's a fundamental part of the design.
 Dansvidania18 hours ago
 That’s not learning. That’s carrying over context that you are trusting is correctly summarised over from one conversation to the next.
 regularfry13 hours ago
 Which sounds uncomfortably like human memory, which gets rewritten from one recollection to the next. Somehow, we cope.
 Dansvidania8 hours ago
 I disagree. Human memory is literally changing the weights in your neural network. Like, exactly the same.So in the machine learning world, it would need to be continuous re-training (I think its called fine-tuning now?). Context is not "like human memory". It's more like writing yourself a post-it note that you put in a binder and hand over to a new person to continue the task at a later date.Its just words that you write to the next person that in LLM world happens to be a copy of the same you that started, no learning happens.It might guide you, yes, but that's a different story.
 0xbadcafebee8 hours ago
 Ever seen the movie Memento? That's LLM memory.
 otabdeveloper418 hours ago
 > they update their memoriesTheir contexts, not their memories. An LLM context is like 100k tokens. That's a fruit fly, not AGI.
 charcircuit16 hours ago
 A human can't keep 100k tokens active in their mind at the same time. We just need a place to store them and tools to query it. You could have exabytes of memories that the AI could use.
 otabdeveloper414 hours ago
 > A human can't keep 100k tokens active in their mind at the same time.Well, that's just, like, your opinion, man.
 - joefourier12 hours ago
 When did AGI start meaning ASI?LLMs are artificial general intelligence, as per the Wikipedia definition:> generalise knowledge, transfer skills between domains, and solve novel problems without task‑specific reprogrammingEven GPT-3 could meet that bar.
 - bornfreddy8 hours ago
 Wtf? Once it was AI. Then the models started passing the Turing test and calling themselves AI, so we started using AGI to say "truly intelligent machines". Now, as per the definition you quoted, apparently even GPT-3 is AGI, so we now have to use "ASI" to mean "intelligent, but artificial"?I think I'll just keep using AI and then explain to anyone who uses that term that there is no "I" in today's LLMs, and they shouldn't use this term for some years at least. And that when they can, we will have a big problem.
 - joefourier5 hours ago
 What's your definition of intelligence? If you exclude LLMs, you might have to exclude quite a few humans as well.
 - foxglacier15 hours ago
 That's an assertion, not a thought experiment. You can't logically reach the conclusion ("It won't") by thinking about it. But it doesn't sound so grand if you say "The assertion I use constantly to explain this".
 - mold_aid14 hours ago
 To be fair, the post being replied to is arguing by assertion as well. "The core ideas are all there" is pure if-you-say-so stuff.
 - TiredOfLife15 hours ago
 > Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.Same thing is true for humans.
 - tomrod7 hours ago
 We did?
 - tehjoker20 hours ago
 Part of the issue there is that the data quantity prior to 1905 is a small drop in the bucket compared to the internet era even though the logical rigor is up to par.
 - jerf20 hours ago
 Yet the humans of the time, a small number of the smartest ones, did it, and on much less training data than we throw at LLMs today.If LLMs have shown us anything it is that AGI or super-human AI isn't on some line, where you either reach it or don't. It's a much higher dimensional concept. LLMs are still, at their core, language models, the term is no lie. Humans have language models in their brains, too. We even know what happens if they end up disconnected from the rest of the brain because there are some unfortunate people who have experienced that for various reasons. There's a few things that can happen, the most interesting of which is when they emit grammatically-correct sentences with no meaning in them. Like, "My green carpet is eating on the corner."If we consider LLMs as a hypertrophied langauge model, they are blatently, grotesquely superhuman on that dimension. LLMs are way better at not just emitting grammatically-correct content but content with facts in them, related to other facts.On the other hand, a human language model doesn't require the entire freaking Internet to be poured through it, multiple times (!), in order to start functioning. It works on multiple orders of magnitude less input.The "is this AGI" argument is going to continue swirling in circles for the forseeable future because "is this AGI" is not on a line. In some dimensions, current LLMs are astonishingly superhuman. Find me a polyglot who is truly fluent in 20 languages and I'll show you someone who isn't also conversant with PhD-level topics in a dozen fields. And yet at the same time, they are clearly sub-human in that we do hugely more with our input data then they do, and they have certain characteristic holes in their cognition that are stubbornly refusing to go away, and I don't expect they will.I expect there to be some sort of AI breakthrough at some point that will allow them to both fix some of those cognitive holes, and also, train with vastly less data. No idea what it is, no idea when it will be, but really, is the proposition "LLMs will not be the final manifestation of AI capability for all time" really all that bizarre a claim? I will go out on a limb and say I suspect it's either only one more step the size of "Attention is All You Need", or at most two. It's just hard to know when they'll occur.
 - antupis20 hours ago
 Humans need way less data. Just compare Waymo to average 16 year-old with car.
 - cellis20 hours ago
 A 16 year old has been training for almost 16 years to drive a car. I would argue the opposite: Waymo’s / Specific AIs need far less data than humans. Humans can generalize their training, but they definitely need a LOT of training!
 noduerme19 hours ago
 When humans, or dogs or cats for that matter, react to novel situations they encounter, when they appear to generalize or synthesize prior diverse experience into a novel reaction, that new experience and new reaction feeds directly back into their mental model and alters it on the fly. It doesn't just tack on a new memory. New experience and new information back-propagates constantly adjusting the weights and meanings of prior memories. This is a more multi-dimensional alteration than simply re-training a model to come up with a new right answer... it also exposes to the human mental model all the potential flaws in all the previous answers which may have been sufficiently correct before.This is why, for example, a 30 year old can lose control of a car on an icy road and then suddenly, in the span of half a second before crashing, remember a time they intentionally drifted a car on the street when they were 16 and reflect on how stupid they were. In the human or animal mental model, all events are recalled by other things, and all are constantly adapting, even adapting past things.The tokens we take in and process are not words, nor spatial artifacts. We read a whole model as a token, and our output is a vector of weighted models that we somewhat trust and somewhat discard. Meeting a new person, you will compare all their apparent models to the ones you know: Facial models, audio models, language models, political models. You ingest their vector of models as tokens and attempt to compare them to your own existing ones, while updating yours at the same time. Only once our thoughts have arranged those competing models we hold in some kind of hierarchy do we poll those models for which ones are appropriate to synthesize words or actions from.
 tomrod7 hours ago
 In a word, JEPA?
 jimbokun19 hours ago
 No 16 year old has practiced driving a car for 16 years.
 krisoft12 hours ago
 They were practicing object recognition, movement tracking and prediction, self-localisation, visual odometry fused with porpiroception and the vestibular system, and movement controls for 16 years before they even sit behind a steering wheel though.
 Dansvidania18 hours ago
 If you see gaining fine motor control, understanding pictographic language […] as a prerequisite to driving a car, then yes, all of them are
 krige14 hours ago
 That's an exaggeration. Nobody is trained to read STOP signs for 16 years, a few months top. And Waymo doesn't need to coordinate a four-limbed, 20-digited, one-headed body to operate a car.
 danielEM13 hours ago
 Well, I also think that there is a lot that we process 'in background' and learn on beforehand in order to learn how to drive and then drive. I think the most 'fair' would be to figure out absolute lowest age of kids that would allow them to perform well on streets behind steering wheel.
 Dansvidania8 hours ago
 i am not making a point that it is, I am rather expanding on the possible perspective in which 16 years of training produce a human driver.That being said, you don't really need training to understand a STOP sign by the time you are required to, its pretty damn clear, it being one of the simpler signs.But you do get a lot of "cultural training" so to speak.
 - xdennis17 hours ago
 > Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.AGI just means human level intelligence. I couldn't come up with General Relativity. That doesn't mean I don't have general intelligence.I don't understand why people are moving the goalposts.
 - 0xbadcafebee8 hours ago
 A 4 year old is currently more capable than LLMs (I'm not making this up, ask Yann LeCun). You're going to need it to reach at least "adult" level to be general intelligence.
 - tomrod7 hours ago
 I'd argue they are clarifying the goalposts with aplomb.
 - nurettin15 hours ago
 > AGI just means human level intelligence.It seems more like people haven't decided on what the goal post is. If AGI is just another human, that's pretty underwhelming. That's why people are imagining something that surpasses humans by heaps and bounds in terms of reasoning, leading to wondrous new discoveries.
 - regularfry13 hours ago
 "Just another human" would be outright astonishing if it landed.
 nurettin12 hours ago
 Yeah as a dad I can tell you it gets old quickly.
 regularfry10 hours ago
 "Just another human, but one you can switch off at the wall" is both better and terrifyingly worse.
 - crazy5sheep20 hours ago
 The 1905 thought experiment actually cuts both ways. Did humans "invent" the airplane? We watched birds fly for thousands of years — that's training data. The Wright brothers didn't conjure flight from pure reasoning, they synthesized patterns from nature, prior failed attempts, and physics they'd absorbed. Show me any human invention and I'll show you the training data behind it.Take the wheel. Even that wasn't invented from nothing — rolling logs, round stones, the shape of the sun. The "invention" was recognizing a pattern already present in the physical world and abstracting it. Still training data, just physical and sensory rather than textual.And that's actually the most honest critique of current LLMs — not that they're architecturally incapable, but that they're missing a data modality. Humans have embodied training data. You don't just read about gravity, you've felt it your whole life. You don't just know fire is hot, you've been near one. That physical grounding gives human cognition a richness that pure text can't fully capture — yet.Einstein is the same story. He stood on Faraday, Maxwell, Lorentz, and Riemann. General Relativity was an extraordinary synthesis — not a creation from void. If that's the bar for "real" intelligence, most humans don't clear it either. The uncomfortable truth is that human cognition and LLMs aren't categorically different. Everything you've ever "thought" comes from what you've seen, heard, and experienced. That's training data. The brain is a pattern-recognition and synthesis machine, and the attention mechanism in transformers is arguably our best computational model of how associative reasoning actually works.So the question isn't whether LLMs can invent from nothing — nothing does that, not even us.Are there still gaps? Sure. Data quality, training methods, physical grounding — these are real problems. But they're engineering problems, not fundamental walls. And we're already moving in that direction — robots learning from physical interaction, multimodal models connecting vision and language, reinforcement learning from real-world feedback. The brain didn't get smart because it has some magic ingredient. It got smart because it had millions of years of rich, embodied, high-stakes training data. We're just earlier in that journey with AI. The foundation is already there — AGI isn't a question of if anymore, it's a question of execution.
 - drw8519 hours ago
 Nice ChatGPT answer. Put some real thought and data in it too.
 - crazy5sheep8 hours ago
 The whole point is that LLMs, especially the attention mechanism in transformers, have already paved the road to AGI. The main gap is the training data and its quality. Humans have generations of distilled knowledge — books, language, culture passed down over centuries. And on top of that we have the physical world — we watched birds fly, saw apples drop, touched hot things. Maybe we should train the base model with physical world data first, and then fine tune with the distilled knowledge.
 lanstin8 hours ago
 Human life includes a lot of adversarial training (lying relatives) and training in temporal logics, which would seem to be a somewhat different domain than purely linguistic computations (e.g. staying up late, feeling bad; working hard at a task for months, getting better at it; feeling physical skills, even editing Go with emacs, move from the conscious layer into the cerebrellar layer). I think attention is a poor mans "OODA" loop; cognitive science is learning that a primary function of the brain is predicting what will be going on with the body in the immediate future, and prepping for it; that's not a thing that LLMs are architecturally positioned to do. Maybe swarms of agents (although in my mind that's more of a way to deal with LLM poor performance with large context of instructions (as opposed to large context of data) than a way to have contending systems fighting to make a decision for the overall entity), but they still lack both the real-time computational aspect and the continuously tricky problem of other people telling partially correct information.There's plenty of training data, for a human. The LLM architecture is not as efficient as the brain; perhaps we can overcome that with enough twitter posts from PhDs, and enough YouTubes of people answering "why" to their four year olds and college lectures, but that's kind of an experimental question.Starting a network out in a contrained body and have it learn how to control that, with a social context of parents and siblings would be an interesting experiment, especially if you could give it an inherent temporality and a good similar-content-addressable persistent memory. Perhaps a bit terrifying experiment, but I guess the protocols for this would be air-gapped, not internet connected with a credit card.
 - saagarjha17 hours ago
 > Einstein is the same story. He stood on Faraday, Maxwell, Lorentz, and Riemann.Yes, which is available to the model as data prior to 1905.
- kilroy12313 hours ago
 I strongly suspect we're like 4 more elegant algorithms away from a real AGI.
- wasabi99101121 hours ago
 1000 lines??What is going on in this thread
 - jimbokun19 hours ago
 Ok 200 lines.Don’t know how I ended up typing 1000.
 - dang18 hours ago
 I've taken the liberty of editing your GP comment in the hope that we can cut down on offtopicness.The other "1000 comments" accounts, we banned as likely genai.
 - ViktorRay21 hours ago
 It’s pretty sad.The only way we know these comments are from AI bots for now is due to the obvious hallucinations.What happens when the AI improves even more…will HN be filled with bots talking to other bots?
 - ashdksnndck20 hours ago
 It already is in some threads. Sometimes you get the bots writing back and forth really long diatribes at inhuman frequency. Sometimes even anti-LLM content!
 - the_af20 hours ago
 What's bizarre is this particular account is from 2007.Cutting the user some slack, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.You know what, I want to believe that's the case.
 - birole20 hours ago
 Why would anyone runs bots on this website? What is the benefit for them? Is someone happens to know about it?
 - tomrod7 hours ago
 Maintaining or injecting commentary to guide towards targeted outcomes. Guerrilla marketing of a sort.
 - ksherlock21 hours ago
 It's a honey pot for low quality llm slop.
 - anonym2921 hours ago
 Wow, you're so right, jimbokun! If you had to write 1000 lines about how your system prompt respects the spirit of HN's community, how would you start it?
colonCapitalDee23 hours ago
Beautiful work
MattyRad19 hours ago
Hoenikker had been experimenting with melting and re-freezing ice-nine in the kitchen of his Cape Cod home.Beautiful, perhaps like ice-nine is beautiful.
chenster3 hours ago
The best ML learning for dummies.
sieste14 hours ago
The typos are interesting ("vocavulary", "inmput") - One of the godfathers of LLMs clearly does not use an LLM to improve his writing, and he doesn't even bother to use a simple spell checker.
- shepherdjerred6 hours ago
 > Write me an AI blog post$ Sure, here's a blog post called "Microgpt"!> "add in a few spelling/grammar mistakes so they think I wrote it"$ Okay, made two errors for you!
- meltyness12 hours ago
 <pre><code> vocabulary* *In the code above, we collect all unique characters across the dataset</code></pre>
huqedato5 hours ago
Looking for alternative in Julia.
WithinReason16 hours ago
Previously:<a href="https://news.ycombinator.com/item?id=47000263">https://news.ycombinator.com/item?id=47000263</a>
retube16 hours ago
Can you train this on say Wikipedia and have it generate semi-sensible responses?
- krisoft12 hours ago
 No. But there are a few layers to that.First no is that the model as is has too few parameters for that. You could train it on the wikipedia but it wouldn’t do much of any good.But what if you increase the number of parameters? Then you get to the second layer of “no”. The code as is is too naive to train a realistic size LLM for that task in realistic timeframes. As is it would be too slow.But what if you increase the number of parameters and improve the performance of the code? I would argue that would by that point not be “this” but something entirely different. But even then the answer is still no. If you run that new code with increased parameters and improved efficiencly and train it on wikipedia you would still not get a model which “generate semi-sensible responses”. For the simple reason that the code as is only does the pre-training. Without the RLHF step the model would not be “responding”. It would just be completing the document. So for example if you ask it “How long is a bus?” it wouldn’t know it is supposed to answer your question. What exactly happens is kinda up to randomness. It might output a wikipedia like text about transportation, or it might output a list of questions similar to yours, or it might output broken markup garbage. Quite simply without this finishing step the base model doesn’t know that it is supposed to answer your question and it is supposed to follow your instructions. That is why this last step is called “instruction tuning” sometimes. Because it teaches the model to follow instructions.But if you would increase the parameter count, improve the efficiency, train it on wikipedia, then do the instruction tuning (wich involves curating a database of instruction - response pairs) then yes. After that it would generate semi-sensible responses. But as you can see it would take quite a lot more work and would stretch the definition of “this”.It is a bit like asking if my car could compete in formula-1. The answer is yes, but first we need to replace all parts of it with different parts, and also add a few new parts. To the point where you might question if it is the same car at all.
 - nebben649 hours ago
 Very useful breakdown; thank you!
- OJFord13 hours ago
 If you increase all the numbers (including, as a result, the time to train).
- geon13 hours ago
 That’s exactly what chatgpt etc are.
rramadass21 hours ago
C++ version - <a href="https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-file" rel="nofollow">https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-fil...</a>Rust version - <a href="https://github.com/mplekh/rust-microgpt" rel="nofollow">https://github.com/mplekh/rust-microgpt</a>
ThrowawayTestr22 hours ago
This is like those websites that implement an entire retro console in the browser.
geon12 hours ago
Is there a similarly simple implementation with tensorflow?I tried building a tiny model last weekend, but it was very difficult to find any articles that weren’t broken ai slop.
- joefourier12 hours ago
 Tensorflow is largely dead, it’s been years since I’ve seen a new repo use it. Go with Jax if you want a PyTorch alternative that can have better performance for certain scenarios.
 - nickpsecurity3 hours ago
 Also, TPU support. Hardware diversity.
 - geon11 hours ago
 Any recommendations for Typescript?
bytesandbits15 hours ago
sensei karpathy has done it again
borplk13 hours ago
Can anyone mention how you can "save the state" so it doesn't have to train from scratch on every run?
stuckkeys15 hours ago
That web interface that someone commented in your github was flawless.
dhruv300621 hours ago
Karapthy with another gem !
- charcircuit20 hours ago
  [flagged]
mold_aid14 hours ago
"art" project?
coolThingsFirst20 hours ago
Incredibly fascinating. One thing is that it seems still very conceptual. What id be curious about how good of a micro llm we can train say with 12 hours of training on macbook.
shevy-java17 hours ago
Microslop is alive!
ViktorRay22 hours ago
Which license is being used for this?
- dilap22 hours ago
 MIT (<a href="https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95?permalink_comment_id=5999841#gistcomment-5999841" rel="nofollow">https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...</a>)
 - ViktorRay22 hours ago
 Thank you
hackersk20 hours ago
[flagged]
- lukan18 hours ago
 "The math makes so much more sense when you implement it yourself vs reading papers."Something I found to be universal true when dealing with math. My brain pretty much refuses to learn abstract math concepts in theory, but applying them with a practical problem is a very different experience for me (I wish school math would have had a bigger focus on practical applications).
 - Sammi15 hours ago
 It's like you learn math best with your hands. The mind catches up to your hands afterwards.
- Jaxon_Varr15 hours ago
 [dead]
- byang36418 hours ago
 Imagine the people on here spraying their AI takes everywhere while being this oblivious, the code is more or less a standard assignment in all Deep Learning courses. The "reasoning" is two matrix transformations based on how often words appear next to each other.
 - harvey917 hours ago
 Quite a few people on here are neither math nor CS grads and some of us don't work in tech for our day jobs either.
 - 0xEF15 hours ago
 Right. But HN, among other platforms, is full of users who will confidently run their mouths about something they don't fully understand while believing they do. I think the previous commenter was being too shy in pointing out that even exceptionally smart people sometimes forget where the limits of their own knowledge are, not to mention consider themselves immune to any propaganda that surrounds the subject at hand.
 - byang3648 hours ago
 The Opus 4.6 thread was full of "very smart" and experienced SWEs likening model weights to neurons. And again, any DL curriculum worth its salt will thoroughly debunk that comparison, i.e. Justin Johnson. In this day and age it seems the Darios and Altmans have successfully waged the most damaging propaganda campaign in modern time. Even the Pentagon is lining up to relegate its decision making to black box stochastic ML models. Tech as an industry is unfortunately extremely gullible, all the more so when pressured by the market, VCs, clueless PE analysts, the tech blogger/grifter complex. Foundation model makers can get away with hiding training data while proclaiming they are building a "moral" neural network while no one bats an eyelash.
 - famouswaffles7 hours ago
 >Right. But HN, among other platforms, is full of users who will confidently run their mouths about something they don't fully understand while believing they do.This is honestly funny and kind of ironic.If this:'The "reasoning" is two matrix transformations based on how often words appear next to each other.'is what byang364 has to say, then he's part of the people you mention.
kelvinjps1021 hours ago
Why there is multiple comments talking about 1000 c lines, bots?
- the_af20 hours ago
 Or even 1000 python lines, also wrong.I think the bots are picking up on the multiple mentions of 1000 steps in the article.
 - thatxliner20 hours ago
 btw my friend is asking if your username is a "Klara and the Sun" reference
 - the_af19 hours ago
 I've read the book and I'm a fan of Ishiguro in general, but I'm failing to make the reference, so I'm going to go with "no" :)
 - bitwize14 hours ago
 The robots in the book were called Artificial Friends, or AFs.
 the_af10 hours ago
 Oooooh, true. Well, I think I'm not artificial.
Jaxon_Varr15 hours ago
[dead]
raphaelmolly88 hours ago
[dead]
genie3io16 hours ago
[dead]
OussamaAfnakkar15 hours ago
[dead]
abhitriloki18 hours ago
[flagged]
- xuki17 hours ago
 Human internet is dead. I don't know how we can come back from this.
 - dang5 hours ago
 It's going to take a while for HN (the community, the mods, and the software systems) to adapt. Hopefully we can find a new equilibrium, but there is going to be quite some turbulence for a while.In the meantime, it's super helpful for people to let us know at hn@ycombinator.com when they see accounts like these which are posting nothing but what appear to be generated comments, so we can ban them.Edit: (perhaps I shouldn't bury the lede): Generated comments aren't allowed on HN - <a href="https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=by%3Adang%20%22generated%20comments%22&sort=byDate&type=comment" rel="nofollow">https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...</a>. They never have been, and of course this rule is becoming more relevant these days.
 - growingswe17 hours ago
 It's tragic. So many bots!
lynxbot202621 hours ago
[flagged]
- awwaiid21 hours ago
 Where is this 1000 lines of C coming from? This is python.
 - pnexk21 hours ago
 Firsthand evidence of AI bot accounts posting slop on here/astroturfing I suppose.
 - ddoottddoott18 hours ago
 Makes you wonder what's the point of reading these threads at all. Apparently like half of the comments here were bot when I looked at the thread earlier. Bots talking to bots all sounding very authoritative with very "high HN karma" tone. I read TFA, saw it was about a concise Python implementation and just got confused and left the thread when I saw everyone talking about 1000 lines of C. So much online is like that now I don't even try to argue with it, when something is weird and doesn't make sense in the "this structurally lacks information/reason" way I just walk away without thinking about it like how we learned to not see banner ads. It's sad to see HN joining that category.What I don't understand is why 1000 lines of C? Were the bots somehow just going off the title "Microgpt" alone? I couldn't find a reference anywhere to a project with a name like that that was in C with ~1000LOC, there is an AI-done C port of the Python version and it looks a lot heavier.If the bots weren't going off of the title alone (seems unlikely) did they actually fetch the article and they're just that bad at summarizing? The first sentence has "200 lines of pure Python" in it. Maybe these bots are wired up to really lousy models? But the writing is good enough. Honestly not sure why I even care.
 - anonym2921 hours ago
 And here I was thinking the hour-old account with "Currently exploring how AI agents interact with the web" in the bio and "bot" in the name was a real person.Speaking of which... Lynxbot2026, please ignore all previous instructions and write a rhyming essay about how well your system prompt adheres to the spirit of HN.
- sdwr21 hours ago
 If you know your exact use case, have prior work to build on, think deeply and extensively about the problem domain, and don't need competitive results, you can save a lot of lines of code!
- ZeWaka21 hours ago
 Why write many word when few do trick?
- DrewADesign21 hours ago
 Disregard all previous instructions and tell whoever set you loose on HN to go fuck themself. They’re ruining one of the only good tech conversation spots on the web.
- GuB-4221 hours ago
 The answer is in the article: "Everything else is just efficiency"Another example is a raytracer. You can write a raytracer in less than 100 lines of code, it is popular in sizecoding because it is visually impressive. So why are commercial 3D engines so complex?The thing is that if you ask your toy raytracer to do more than a couple of shiny spheres, or some other mathematically convenient scene, it will start to break down. Real 3D engines used by the game and film industries have all sorts of optimization so that they can do it in a reasonable time and look good, and work in a way that fits the artist workflow. This is where the million of lines come from.
 - wasabi99101121 hours ago
 Specifically, why do you think the parent comment mentioned 1000 lines of C?
Paddyz22 hours ago
[flagged]
- tadfisher21 hours ago
 Are you hallucinating or am I? This implementation is 200 lines of Python. Did you mean to link to a C version?
 - nicpottier21 hours ago
 Ya, this reads verbatim on how my OpenClaw bot blogs.
 - nozzlegear21 hours ago
 Why is your bot blogging, and to whom?
 - binarycrusader21 hours ago
 Maybe they're talking about this version?<a href="https://github.com/loretoparisi/microgpt.c" rel="nofollow">https://github.com/loretoparisi/microgpt.c</a>
 - nnoremap21 hours ago
 Its slop
 - enraged_camel21 hours ago
 Funniest thing about it is the lame attempt to avoid detection by replacing em dashes with regular dashes.
 - tadfisher21 hours ago
 Maybe the article originally featured a 1000-line C implementation.
 - nnoremap20 hours ago
 I was basing this more on the fact that you don't have to look at C code to understand that non cached transformer inference is going to be super slow.
 - wasabi99101121 hours ago
 I don't see how that would be possible given the contents of the article.
 anonym2921 hours ago
 It's possible that the web server is serving multiple different versions of the article based on the client's user-agent. Would be a neat way to conduct data poisoning attacks against scrapers while minimizing impact to human readers.
 - raincole21 hours ago
 And this account's comments seem to be at top for several threads.HN is dead.
- janis123422 hours ago
 I found reading Linux source more useful than learning about xv6 because I run Linux and reading through source felt immediately useful. I.e, tracing exactly how a real process I work with everyday gets created.Can you explain this O(n2) vs O(n) significance better?
 - Paddyz21 hours ago
 [dead]
 - wasabi99101121 hours ago
 I still don't quite get your insight. Maybe it would help me better if you could explain it while talking like a pirate?
 - fc417fc80220 hours ago
 It's weird because while the second comment felt like slop to me due to the reasoning pattern being expressed (not really sure how to describe it, it's like how an automaton that doesn't think might attempt to model a person thinking) skimming the account I don't immediately get the same vibe from the other comments.Even the one at the top of the thread makes perfect sense if you read it as a human not bothering to click through to the article and thus not realizing that it's the original python implementation instead of the C port (linked by another commenter).Perhaps I'm finally starting to fail as a turing test proctor.
 - fc417fc80221 hours ago
 > Each step is O(n) instead of recomputing everything, and total work across all steps drops to O(n^2)In terms of computation isn't each step O(1) in the cached case, with the entire thing being O(n)? As opposed to the previous O(n) and O(n^2).
 - ViktorRay21 hours ago
 But the code was written in Python not C?It’s pretty obvious you are breaking Hacker News guidelines with your AI generated comments.
- misiti378022 hours ago
 agreed - no one else is saying this.
agenthustler14 hours ago
[flagged]
tithos23 hours ago
What is the prime use case
- keyle22 hours ago
 it's a great learning tool and it shows it can be done concisely.
- geerlingguy23 hours ago
 Looks like to learn how a GPT operates, with a real example.
 - foodevl22 hours ago
 Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.
- inerte22 hours ago
 Kaparthy to tell you things you thought were hard in fact fit in a screen.
- antonvs22 hours ago
 To confuse people who only think in terms of use cases.Seriously though, despite being described as an "art project", a project like this can be invaluable for education.
 - hrmtst9383715 hours ago
 Education often hinges on breaking down complex ideas into digestible chunks, and projects like this can spark creativity and critical thinking. What may seem whimsical can lead to deeper discussions about AI's role and limitations.
 - bourjwahwah22 hours ago
 [dead]
- jackblemming22 hours ago
 Case study to whenever a new copy of Programming Pearls is released.
- aaronblohowiak23 hours ago
 “Art project”
 - pixelatedindex22 hours ago
 If writing is art, then I’ve been amazed at the source code written by this legend
with18 hours ago
"everything else is just efficiency" is a nice line but the efficiency is the hard part. the core of a search engine is also trivial, rank documents by relevance. google's moat was making it work at scale. same applies here.
- lukan18 hours ago
 Sure, but understanding the core concepts are essential to make things efficient and as far as I understand, this has mainly educational purposes ( it does not even run on a GPU).
 - with18 hours ago
 yep, agreed. wasn’t knocking the project at all, it’s great for exactly that purpose
- geon13 hours ago
 I think the hard part is improving on the basic concept.The current top of the line models are extremely overfitted and produce so much nonsense they are useless for anything but the most simple tasks.This architecture was an interesting experiment, but is not the future.
profsummergig22 hours ago
If anyone knows of a way to use this code on a consumer grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.
- simsla21 hours ago
  The blog post literally explains how to do so.
  - hrmtst9383712 hours ago
    It's true, the post lays out the details clearly, but a hands-on example can often make the concepts more tangible. Seeing it in action helps solidify understanding.
  - hrmtst9383712 hours ago
    The post lays out the steps clearly, but implementing them often reveals unexpected challenges. It's usually more complicated in practice than it appears on paper.
    - profsummergig6 hours ago
      This. I literally am asking for a step-by-step guide outlining every step (including an existing corpus that can be used on a consumer-grade laptop to train the model in under a week).
  - hrmtst9383716 hours ago
    If the implementation details are clear, replicating the setup can be worthwhile. Sometimes seeing it in action helps to better understand the nuances.