9 comments

  • minimaxir 41 days ago
    As someone who has spent an embarrassing amount of time researching Hacker News title trends over the years, I was excited to look at the methodology (https://hn-ph.vercel.app/analysis), but after looking at it, I am calling shenanigans.

    That's not a methodology paper, and it doesn't explain how the model being advertised works in the spirit of open machine learning research; given that the startup is an AI startup, I assume that the actual model is more sophisticated. As Section 8 notes: "This analysis is descriptive and intended to summarize empirical patterns."

    It's an exploratory data analysis that not only fails to explain how the model is constructed, but also makes a number of assumptions that imply the people making it lack proper context on how Hacker News works:

    1. The extreme right-skewed nature of the data should have raised a very large number of flags in the statistical methodology and calculations, but the analysis mostly ignores them. The mean values are effectively useless, the p-values even *more* useless. It doesn't point out that the negative performing terms are likely spam.

    2. It does not question why there are so few questions with a title >80 characters (answer: 80 characters is the max for a HN submission).

    3. The analysis separates day of the week and hour: you can't do that. They're intrinsically linked, and weekend behavior with respect to activity is far different than on weekdays.

    4. "Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k)". No statistician would call that a weak correlation; those values are effectively no correlation.

    There is also no person tied to this paper, just the "Memvid Research Team", which raises further questions.
    • Breza 34 days ago
      You absolutely nailed it. I'm the Director of Advanced Analytics at a company called Cision. I've spent a lot of time analyzing content quantitatively. It's easy to find spurious correlations in this kind of data and present them confidently.
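A minimal sketch of points 1 and 4 in minimaxir's comment above, using synthetic data rather than the actual HN dataset. The Pareto-distributed scores, the uniformly random title lengths, the sample size, and the helper names (`ranks`, `pearson`, `spearman`) are all assumptions made for illustration. It shows how a heavy right tail pulls the mean far above the median, and what correlation coefficients look like when two variables are independent by construction.

```python
import random
import statistics


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Synthetic stand-ins (assumptions, not HN data): title lengths are uniform
# between 10 and 80 characters, scores are heavy-tailed and independent of length.
rng = random.Random(0)
title_lengths = [rng.randint(10, 80) for _ in range(10_000)]
scores = [rng.paretovariate(1.2) for _ in range(10_000)]

mean_score = statistics.fmean(scores)
median_score = statistics.median(scores)
r_pearson = pearson(title_lengths, scores)
r_spearman = spearman(title_lengths, scores)

print(f"mean={mean_score:.2f} median={median_score:.2f}")
print(f"pearson={r_pearson:.3f} spearman={r_spearman:.3f}")
```

Because the two synthetic variables are independent, both coefficients should land close to zero, which is a useful baseline when reading values like the ones quoted from the paper. On skewed data, the median (or a rank-based statistic) is usually the more informative summary.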
  • delichon 41 days ago
    Here are the results for this username, this title, and this description:

    https://hn-ph.vercel.app/results/ZT06GF

    It got a 62, a C+, predicting that this won't be very viral. So you either didn't test this submission on your own product, or you did but didn't feel that the low score was a handicap? You don't seem to be dogfooding. If this post does well, it would be evidence against its own accuracy. If it fizzles out, congratulations on being correct.
    • baobun 41 days ago
      Uncharitable and presumptuous about the goals. I prefer submissions to not be hyper-optimized for virality.
  • tverbeure 41 days ago
    Current nr 3 in the leaderboard: "Show HN: I built a Rust compiler in Rust with Rust"

    Could use some more Rust to boost it to nr 1.
    • baobun 41 days ago
      I'm calling it: Some AI controversy in Rust core will be in the top 5 of 2026.
    • Frotag 41 days ago
      Show HN: I built a Rust compiler in Python with JavaScript using Java on Android
  • andr3wV 41 days ago
    The analysis they ran in their research paper found most surface features don’t meaningfully separate viral from non‑viral outcomes. So the tool isn't actually predicting whether your launch title will go viral; it's more like checking for heuristics and descriptive patterns.

    Cool idea though! And they're on the front page lol
  • higginsniggins 41 days ago
    According to your research paper you should have made this post a "Tell HN:" rather than a "Show HN:", lol
  • amitav1 41 days ago
    This tool: "Avoid keyword stuffing; make the title read naturally."

    Also this tool: "Show HN (AI): I built GPT 6 in Rust Using Claude Gemini Grok OpenAI NVIDIA Google" - #1

    (No hate to the creators obviously. Just really funny.)
  • codybontecou 41 days ago
    Well, he made it to the front page so there’s that.
  • simonw 41 days ago
    (Replaced my original comment here which was a little unkind.)

    Question for OP, who created Memvid (the .mv2 file format that's used to distribute this data). Are you still taking text, chunking it and then storing those chunks as QR codes in a video file? That seems like an inherently inefficient storage mechanism to me compared with something like SQLite or Parquet - do you have concrete numbers or a demo that shows that your file format really is more effective for storing data for "AI agents" than those existing solutions?
    • minimaxir 41 days ago
      As a side note: the dataset is referenced in the paper as being from Hugging Face (https://huggingface.co/datasets/julien040/hacker-news-posts), which does host it as a 426 MB Parquet, while the .mv2 being distributed is 847 MB, for some reason.
    • simonw 41 days ago
      Found a relevant comment here: https://github.com/memvid/memvid/issues/86#issuecomment-3560234272

      > We’ve rebuilt everything from the ground up for Memvid v2, new format, new core, new benchmarks, no QR hacks. It’s a real storage engine now, crash-safe, deterministic, and fully verified with a proper TOC, WAL, Merkle tree, and time index.

      So I guess the QR code hack isn't a thing any more.
    • tossit444 41 days ago
      Look at memvid's closed issues. The entire thing is a farce.

      https://github.com/memvid/memvid/issues?q=is%3Aissue%20state%3Aclosed
  • mitexleo 41 days ago
    Let's see if this goes viral
    • asciii 41 days ago
      o7 see you in the 1% someday