Show HN: Viral Potential Predictor

(hn-ph.vercel.app)

34 points by salebanolow 41 days ago

9 comments

minimaxir 41 days ago
As someone who has spent an embarrassing amount of time researching Hacker News title trends over the years, I was excited to look at the methodology (https://hn-ph.vercel.app/analysis), but after looking at it, I am calling shenanigans.

That's not a methodology paper, and it doesn't explain how the model being advertised works in the spirit of open machine learning research; given that the startup is an AI startup, I assume that the actual model is more sophisticated. As Section 8 notes: "This analysis is descriptive and intended to summarize empirical patterns."

It's an exploratory data analysis that not only fails to explain how the model is constructed, but also makes a number of assumptions that imply the people who made it lack proper context on how Hacker News works:

1. The extreme right-skewed nature of the data should have raised a very large number of flags in the statistical methodology and calculations, but the analysis mostly ignores them. The mean values are effectively useless, the p-values even more so. It doesn't point out that the negative-performing terms are likely spam.
2. It does not question why there are so few submissions with a title >80 characters (answer: 80 characters is the max for a HN submission).
3. The analysis separates day of the week and hour: you can't do that. They're intrinsically linked, and weekend behavior with respect to activity is far different from weekday behavior.
4. "Title length has a weak relationship with score (Pearson r = -0.017, Spearman r = 0.048, n = 100k)". No statistician would call that a weak correlation; those values are effectively no correlation.

There is also no person tied to this paper, just the "Memvid Research Team", which raises further questions.
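Points 1 and 4 above can be checked with a little arithmetic. The following is a minimal Python sketch, not the analysis's own code: the `scores` list is made-up illustrative data (not drawn from the Hacker News dataset), the helper names are hypothetical, and the t statistic is the standard large-sample test for a correlation coefficient, applied to the Spearman value only as an approximation.

```python
import math
import statistics

# Point 1: on right-skewed data, a handful of outliers dominate the mean.
# Hypothetical scores: most posts get 1-2 points, one post goes viral.
scores = [1] * 90 + [2] * 5 + [50] * 4 + [2000]
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Point 4: the correlations quoted from the analysis.
n = 100_000
pearson_r = -0.017
spearman_r = 0.048


def variance_explained(r: float) -> float:
    """Squared correlation: the share of variance a linear fit accounts for.

    For Spearman's coefficient this refers to the ranks, not the raw values.
    """
    return r * r


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r == 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    print("mean:", mean_score, "median:", median_score)
    for name, r in [("Pearson", pearson_r), ("Spearman", spearman_r)]:
        print(name, "r^2 =", variance_explained(r), "t =", t_statistic(r, n))
```

With n = 100k, even these tiny coefficients produce large t statistics, so a small p-value says little here; the squared correlations show that title length accounts for well under 1% of the variation in score.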
- Breza 35 days ago
 You absolutely nailed it. I'm the Director of Advanced Analytics in a company called Cision. I've spent a lot of time analyzing content quantitatively. It's easy to find spurious correlations in this kind of data and present them confidently.
delichon 41 days ago
Here are the results for this username, this title, and this description: https://hn-ph.vercel.app/results/ZT06GF

It got a 62, a C+, predicting that this won't be very viral. So you either didn't test this submission on your own product, or you did, but didn't feel that the low score was a handicap? You don't seem to be dogfooding. If this post does well, that would be evidence against its own accuracy. If it fizzles out, congratulations on being correct.
- baobun 41 days ago
 Uncharitable and presumptuous about the goals. I prefer submissions to not be hyper-optimized for virality.
tverbeure 41 days ago
Current nr 3 in the leaderboard: "Show HN: I built a Rust compiler in Rust with Rust"

Could use some more Rust to boost it to nr 1.
- baobun 41 days ago
 I'm calling it: Some AI controversy in Rust core will be in the top 5 of 2026.
- Frotag 41 days ago
 Show HN: I built a Rust compiler in Python with JavaScript using Java on Android
andr3wV 41 days ago
The analysis they ran in their research paper found most surface features don’t meaningfully separate viral from non‑viral outcomes. So the tool isn't actually predicting if your launch title will go viral, it's more like checking for heuristics and descriptive patterns.

Cool idea though! And they're on the front page lol
higginsniggins 41 days ago
According to your research paper you should have made this post a "Tell HN:" rather than a "Show HN:", lol
amitav1 41 days ago
This tool: "Avoid keyword stuffing; make the title read naturally."

Also this tool: "Show HN (AI): I built GPT 6 in Rust Using Claude Gemini Grok OpenAI NVIDIA Google" - #1

(No hate to the creators obviously. Just really funny.)
codybontecou 41 days ago
Well, he made it to the front page so there’s that.
simonw 41 days ago
(Replaced my original comment here which was a little unkind.)

Question for OP, who created Memvid (the .mv2 file format that's used to distribute this data). Are you still taking text, chunking it and then storing those chunks as QR codes in a video file? That seems like an inherently inefficient storage mechanism to me compared with something like SQLite or Parquet - do you have concrete numbers or a demo that shows that your file format really is more effective for storing data for "AI agents" than those existing solutions?
- minimaxir 41 days ago
 As a side note: the dataset is referenced in the paper as being from Hugging Face (https://huggingface.co/datasets/julien040/hacker-news-posts), which does host it as a 426 MB Parquet, while the .mv2 being distributed is 847 MB, for some reason.
- simonw 41 days ago
 Found a relevant comment here: https://github.com/memvid/memvid/issues/86#issuecomment-3560234272
 > We’ve rebuilt everything from the ground up for Memvid v2, new format, new core, new benchmarks, no QR hacks. It’s a real storage engine now, crash-safe, deterministic, and fully verified with a proper TOC, WAL, Merkle tree, and time index.
 So I guess the QR code hack isn't a thing any more.
- tossit444 41 days ago
 Look at memvid's closed issues. The entire thing is a farce. https://github.com/memvid/memvid/issues?q=is%3Aissue%20state%3Aclosed
mitexleo 41 days ago
Let's see if this goes viral
- asciii 41 days ago
  o7 see you in the 1% someday