There are many patches of almost-identical sites.<p>Some of them are due to many people using the same theme.<p>Some of them are expired or parked domains, which I reckon should be detected and excluded.
That's a lot of fun to explore. I'm not entirely convinced by the "you can judge a book by its cover" thing, though: there are so many "Hi, I'm _____" pages that might have real content or might just be portfolio stubs.
Maybe you could add a timeline and a clock?<p>Timeline: view older versions.<p>Clock: show the light or dark theme according to the user's time zone (or let the user switch modes manually).<p>I'm also a bit curious: since most web pages are predominantly white, how many of them have been adapted for dark mode?
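One rough way to estimate the dark-mode number would be to scan each page's markup and stylesheets for the standard signals: a `prefers-color-scheme: dark` media query, or a `<meta name="color-scheme">` tag that lists `dark`. The sketch below is only a heuristic under those assumptions; the function name `declares_dark_mode` is made up for illustration, and it would miss sites that switch themes purely from JavaScript.

```python
import re

# Heuristic patterns (assumptions, not an exhaustive dark-mode check).
_DARK_MEDIA = re.compile(r"prefers-color-scheme\s*:\s*dark", re.IGNORECASE)
_META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
_COLOR_SCHEME_NAME = re.compile(
    r"name\s*=\s*[\"']?color-scheme[\"']?", re.IGNORECASE
)


def declares_dark_mode(html: str, css: str = "") -> bool:
    """Return True if the page appears to declare dark-mode support.

    `html` is the page source; `css` is any external stylesheet text
    the caller has already fetched (hypothetical inputs).
    """
    # Signal 1: a dark media query in inline or external CSS.
    if _DARK_MEDIA.search(html) or _DARK_MEDIA.search(css):
        return True
    # Signal 2: <meta name="color-scheme" content="... dark ...">.
    for tag in _META_TAG.findall(html):
        if _COLOR_SCHEME_NAME.search(tag) and "dark" in tag.lower():
            return True
    return False
```

Running this over the saved HTML and CSS of every site and counting the `True` results would give a lower bound rather than an exact figure.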