Wikipedia: WikiProject AI Cleanup

(en.wikipedia.org)

235 points by thinkingemote19 days ago

22 comments

Antibabelic19 days ago
I found the page Wikipedia:Signs of AI Writing[1] very interesting and informative. It goes into a lot more detail than the typical "em-dashes" heuristic.[1]: <a href="https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing" rel="nofollow">https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing</a>
- jcattle19 days ago
 An interesting observation from that page:"Thus the highly specific "inventor of the first train-coupling device" might become "a revolutionary titan of industry." It is like shouting louder and louder that a portrait shows a uniquely important person, while the portrait itself is fading from a sharp photograph into a blurry, generic sketch. The subject becomes simultaneously less specific and more exaggerated."
 - embedding-shape19 days ago
 I think that's a general guideline to identify "propaganda", regardless of the source. I've seen people in person write such statements with their own hands/fingers, and I know many people who speak like that (shockingly, most of them are in management).Lots of those points seems to get into the same idea which seems like a good balance. It's the language itself that is problematic, not how the text itself came to be, so makes sense to 100% target what language the text is.Hopefully those guidelines make all text on Wikipedia better, not just LLM produced ones, because they seem like generally good guidelines even outside the context of LLMs.
 - Antibabelic19 days ago
 Wikipedia already has very detailed guidelines on how text on Wikipedia should look, which address many of these problems.[1] For example, take a look at its advice on "puffery"[2]:"Peacock example:Bob Dylan is the defining figure of the 1960s counterculture and a brilliant songwriter.Just the facts:Dylan was included in Time's 100: The Most Important People of the Century, in which he was called "master poet, caustic social critic and intrepid, guiding spirit of the counterculture generation". By the mid-1970s, his songs had been covered by hundreds of other artists."[1]: <a href="https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style" rel="nofollow">https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style</a>[2]: <a href="https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Words_to_watch#Puffery" rel="nofollow">https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Word...</a>
 - embedding-shape19 days ago
 Right, but unless you have a specific page about "This is how to treat AI texts", people will (if they haven't already) bombard you with "This text is so obviously AI written, do something" and by having a specific page to answer to those, you can just link that instead of general "Here's how text on Wikipedia should be" guidelines. Being more specific sometimes helps people understand better :)
 - foobarchu18 days ago
 A good place to see this pre-2022 (the ai epoch) is articles on less known bands from the late 2000s when Wikipedia was becoming more popular. Quite a few of them turn out to be copy/paste promo text. I know this because I did webdev work for that industry, and when I look up those bands on wikipedia I will recognize the text as text that I personally had to paste into a bio page 20 years ago. Since the bands are well known, nobody reports it (I admit I'm too lazy)The real tell on those tends to be weirdly time-specific claims that tend to be wildly outdated ("currently touring with XYZ")
 - mrweasel19 days ago
 To me that seems like we're mistaken in mixing fiction and non-fiction in AI training data. The "a revolutionary titan of industry" makes sense if you where reading a novel where something like 90% of a book is describing the people, locations, objects and circumstances. The author of a novel would want to use exaggeration and more colourful words to underscore a uniquely important person, but "this week in trains" would probably de-emphasize the person and focus on the train-coupler.
 - lacunary19 days ago
 fiction is part of our shared language and culture. we communicate by making analogies, and our stories, especially our old ones, provide a rich basis to draw upon. neither a person nor an llm can be fluent users of human language without spending time learning from both fiction and non-fiction.
 - andrepd19 days ago
 Outstanding. Praise wikipedia, despite any shortcomings wow, isn't it such a breath of fresh air in the world of 2026.
 - robertjwebb19 days ago
 The funny thing about this is that this also appears in bad human writing. We would be better off if vague statements like this were eliminated altogether, or replaced with less fantastical but verifiable statements. If this means that nothing of the article is left then we have killed two birds with one stone.
 - nottorp19 days ago
 What do you think the LLMs were trained on? 90% of everything is crap, and they trained on everything.
 - eurekin19 days ago
 That's actually putting into words, what I couldn't, but felt similar. Spectacular quote
 - jcattle19 days ago
 I'm thinking quite a bit about this at the moment in the context of foundational models and their inherent (?) regression to the mean.Recently there has been a big push into geospatial foundation models (e.g. Google AlphaEarth, IBM Terramind, Clay).These take in vast amounts of satellite data and with the usual Autoencoder architecture try and build embedding spaces which contain meaningful semantic features.The issue at the moment is that in the benchmark suites (<a href="https://github.com/VMarsocci/pangaea-bench" rel="nofollow">https://github.com/VMarsocci/pangaea-bench</a>), only a few of these foundation models have recently started to surpass the basic U-Net in some of the tasks.There's also an observation by one of the authors of the Major-TOM model, which also provides satellite input data to train models, that the scale rule does not seem to hold for geospatial foundation models, in that more data does not seem to result in better models.My (completely unsupported) theory on why that is, is that unlike writing or coding, in satellite data you are often looking for the needle in the haystack. You do not want what has been done thousands of times before and was proven to work. Segmenting out forests and water? Sure, easy. These models have seen millions of examples of forests and water. But most often we are interested in things that are much, much rarer. Flooding, Wildfire, Earthquakes, Landslides, Destroyed buildings, new Airstrips in the Amazon, etc. etc.. But as I see it, the currently used frameworks do not support that very well.But I'd be curious how others see this, who might be more knowledgeable in the area.
 - bspammer19 days ago
 That sounds like Flanderization to me <a href="https://en.wikipedia.org/wiki/Flanderization" rel="nofollow">https://en.wikipedia.org/wiki/Flanderization</a>From my experience with LLMs that's a great observation.
 - inquirerGeneral18 days ago
 [dead]
 - Amorymeltzer19 days ago
 I particularly like (what I assume is) the subtle paean to Ted Chiang's "Blurry Jpeg of the Web" in there.<<a href="https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web" rel="nofollow">https://www.newyorker.com/tech/annals-of-technology/chatgpt-...</a>>
- smusamashah19 days ago
 This is so much detailed and everyone who is sick of reading generated text should read this.I had a bad experience at a shitty airport, went to google maps to leave a bad review, and found that its rating was 4.7 by many thousand people. Knowing that airport is run by corrupt government, I started reading those super positive reviews and the other older reviews by them. People who could barely manage few coherent sentences of English are now writing multiple paragraphs about history and vital importance of that airport in that region.Reading first section "Undue emphasis on significance" those fake reviews is all I can think of.
- cjlm19 days ago
 Turned this into a ruleset[0] for vale.sh[1][0]: <a href="https://ammil.industries/signs-of-ai-writing-a-vale-ruleset/" rel="nofollow">https://ammil.industries/signs-of-ai-writing-a-vale-ruleset/</a> [1]: <a href="https://vale.sh/" rel="nofollow">https://vale.sh/</a>
- eddyg19 days ago
 It’s also very useful in writing skills to help avoid these kinds of issues.<a href="https://github.com/blader/humanizer" rel="nofollow">https://github.com/blader/humanizer</a>
- harrisoned19 days ago
 This is very good, but I'm surprised the term "game-changer" is not mentioned there. From my observations this is used a lot in LLM texts.
 - tasuki19 days ago
 Great point! That would be a game-changer!
 - danielbln19 days ago
 "This is the smoking gun!"_sigh_ Is it though, Claude, is it really?
- paradite19 days ago
 Ironically this is a goldmine for AI labs and AI writer startups to do RL and fine-tuning.
 - zipy12419 days ago
 That's not quite how that works though. It can for example be possible that fine-tuning a model to avoid the styles described in the article cause the LLM to stop functionaing as well as it can. It might just be an artefact of the architecture itself that to be effective it has to follow these rules. If it was as easy as just providing data and the LLM would then 'encode' that as a rule, we would advance much quicker than we currently are.
 - einrealist19 days ago
 In the case of those big 'foundation models': Fine-tune for whom and how? I doubt it is possible to fine-tune things like this in a way that satisfies all audiences and training set instances. Much of this is probably due to the training set itself containing a lot of propaganda (advertising) or just bad style.
 - paradite19 days ago
 I'm pretty sure Mistral is doing fine tuning for their enterprise clients. OpenAI and Anthropic are probably not?I'm more thinking about startups for fine-tuning.
 - kingstnap19 days ago
 Seems more like the kind of thing you would make prompts using.I can totally see someone taking that page and throwing it into whatever bot and going "Make up a comprehensive style guide that does the opposite of whatever is mentioned here".
 - eddyg19 days ago
 <a href="https://github.com/blader/humanizer" rel="nofollow">https://github.com/blader/humanizer</a>
vintermann19 days ago
There was a paper recently about using LLMs to find contradictions in Wikipedia, i.e. claims on the same page or between pages which appear to be mutually incompatible.<a href="https://arxiv.org/abs/2509.23233" rel="nofollow">https://arxiv.org/abs/2509.23233</a>I wonder if something more came out of that.Either way, I think that generation of article text is the least useful and interesting way to use AI on Wikipedia. It's much better to do things like this paper did.
- JimDabell19 days ago
 That’s super interesting. I had a similar idea about 18 months ago.I think the biggest opportunity is building a knowledge graph based on Wikipedia and then checking against the graph when new edits are made. Detect any new assertions in the edit, check for conflicts against the graph, and bring up a warning along with a link to all the pages on Wikipedia that the new edit is contradicting. If the new edit is bad, it shows the editor why with citations, and if the new edit is correcting something that Wikipedia currently gets incorrect, then it shows all the other places that also need to be corrected.<a href="https://www.reddit.com/r/LocalLLaMA/comments/1eqohpm/if_someone_gave_you_a_free_dedicated_16x_a100/lhuikjg/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/1eqohpm/if_some...</a>
 - Levitz19 days ago
 Hopefully you know of Abstract Wikipedia? <a href="https://en.wikipedia.org/wiki/Abstract_Wikipedia" rel="nofollow">https://en.wikipedia.org/wiki/Abstract_Wikipedia</a>Sounds pretty relevant
 - JimDabell19 days ago
 I didn’t know about that, thank you!
- Tiberium19 days ago
 You can easily do this with normal GPT 5.2 in ChatGPT, just turn on thinking (better if extended) and web search, point a Wikipedia page to the model and tell it to check the claims for errors. I've tried it before and surprisingly it finds errors very often, sometimes small, sometimes medium. The less popular the page you linked is, the more likely it'll have errors.This works because GPT 5.x actually properly use web search.
 - nottorp19 days ago
 Have you verified those errors?
 - Tiberium18 days ago
 Yes, I did.
 - mort9618 days ago
 Can you describe some of the errors you have found this way?
 - Tiberium18 days ago
 Usually they are smaller details in the pages, not the core claims, but that doesn't really refute my point of GPT being easily able to point them. Here are some examples, I'm not including all errors per page that GPT 5.1 found back then (this reply is already too long), but just a few examples.<a href="https://en.wikipedia.org/wiki/Large_Hadron_Collider" rel="nofollow">https://en.wikipedia.org/wiki/Large_Hadron_Collider</a>> (infobox) Maximum luminosity 1×10^34/(cm2⋅s)This is from the original design, LHC has been upgraded several times, e.g. if you check <a href="https://home.web.cern.ch/news/news/accelerators/lhc-report-record-luminosity-well-done-lhc" rel="nofollow">https://home.web.cern.ch/news/news/accelerators/lhc-report-r...</a>, you see "Thanks to these improvements, the instantaneous luminosity record was smashed, reaching 2.06 x 10^34 cm^(-2) s^(-1), twice the nominal value." and that was in 2017.> The first collisions were achieved in 2010 at an energy of 3.5 tera-electronvolts (TeV) per beamThis is wrong, if you check <a href="https://home.web.cern.ch/resources/faqs/facts-and-figures-about-lhc" rel="nofollow">https://home.web.cern.ch/resources/faqs/facts-and-figures-ab...</a> it says "23 November 2009: LHC first collisions (see press release)" - <a href="https://home.web.cern.ch/news/press-release/cern/two-circulating-beams-bring-first-collisions-lhc" rel="nofollow">https://home.web.cern.ch/news/press-release/cern/two-circula...</a> and the energy was 450 GeVAnother random example, I was reading <a href="https://en.wikipedia.org/wiki/Camponotus_japonicus" rel="nofollow">https://en.wikipedia.org/wiki/Camponotus_japonicus</a> (a very small article) and decided to ask GPT about it. It checked a lot of other sources and found out that no other source claims that this species of ant inhabits Iran.Another one: <a href="https://en.wikipedia.org/wiki/Java_(software_platform)" rel="nofollow">https://en.wikipedia.org/wiki/Java_(software_platform)</a>> and—until its discontinuation in JDK 9—a browser plug-inIn reality it was deprecated in JDK 9 and removed in JDK 11 - most people would think "discontinuation" means that it was already removed in JDK 9<a href="https://en.wikipedia.org/wiki/Nekopara" rel="nofollow">https://en.wikipedia.org/wiki/Nekopara</a>> The Opening theme for After, "Contrail" was composed by "Motokyio" and Sung by "Ceul".Just two misspellings, it should be Motokiyo and Ceui> A manga adaptation illustrated by Tam-U is currently being publishedThis section hasn't been updated, but the manga has already finished a long time ago.===Here's a direct part of GPT 5.1's response (I tried this back in November, so there was no GPT 5.2 yet) regarding luminosity, and it did also have a citation in the 2nd paragraph to the exact link I used above for the luminosity claim.– The infobox lists “Maximum luminosity 1×10^34/(cm²·s)” without qualification.– That number is the original design (nominal) peak luminosity for the LHC, but the machine has substantially exceeded it in routine operation: CERN operations reports show peak instantaneous luminosities of about 1.6×10^34 cm⁻²·s⁻¹ in 2016 and ≈2.0–2.1×10^34 cm⁻²·s⁻¹ in 2017–2018, roughly a factor of two above the nominal design.– Since the same infobox uses the current maximum beam energy (6.8 TeV per beam) rather than the 7 TeV design value, presenting 1×10^34 cm⁻²·s⁻¹ as “Maximum luminosity” is misleading/outdated if read as the machine’s achieved maximum. It should either be labelled explicitly as “design luminosity” (with a note that higher values have been reached) or the numerical value should be updated to reflect the achieved peak.
 - sgc19 days ago
 I am sure that could be useful with proper post-request research.As a technique though, never ask an LLM to find errors. Ask it to either find errors or verify that there are no errors. That way it can answer without hallucinating more easily.
 - 171862744019 days ago
 > As a technique though, never ask an LLM to find errors.What I do is both ask it to explain why there are no errors at all and why there tons of errors. Then I use my natural intelligence to reason about the different claims.
 - multjoy19 days ago
 It says it finds errors.
 - Tiberium18 days ago
 It gives references that you can then verify manually. I wasn't advocating for a 100% automated process.
jMyles19 days ago
Signed up to help.On PickiPedia (bluegrass wiki - pickipedia.xyz), we've developed a mediawiki extension / middleware that works as an MCP server, and causes all of the contributions from the AI in question to appear as partially grayed out, with a "verify" button. A human can then verify and either confirm the provided source or supply their own.It started as a fork of a mediawiki MCP server.It works pretty nicely.Of course it's only viable in situations where the operator of the LLM is willing to comply / be transparent about that use. So it doesn't address the bulk of the problem on WikiPedia.But still might be interesting to some:<a href="https://github.com/magent-cryptograss/pickipedia-mcp" rel="nofollow">https://github.com/magent-cryptograss/pickipedia-mcp</a>
ChrisMarshallNY19 days ago
> Unfortunately, these models virtually always fail to properly source claims and often introduce errors.A quote for the times.May be a bit of a sisyphean task, though...
maxbaines19 days ago
This is hardly surprising given - New partnerships with tech companies support Wikipedia’s sustainability. Which relies on Human content.<a href="https://wikimediafoundation.org/news/2026/01/15/wikipedia-celebrates-25years/" rel="nofollow">https://wikimediafoundation.org/news/2026/01/15/wikipedia-ce...</a>
- jraph19 days ago
 I agree with the dig, although it's worth mentioning that this AI Cleanup page's first version was written on the 4th of December 2023.
tonymet19 days ago
Although Wikipedia has no firm rules (WP:PILLARS), the admins reference the policies (that aren’t rules) when reverting content and banning. So here’s what I gathered* no new Articles from LLM content (WP:NEWLLM)* Most images wholly generated by AI should not be used." (WP:AILLM)* “it is within admins' and closers' discretion to discount, strike, or collapse obvious use of generative LLMs" (WP:AITALK)There doesn’t seem to be an outright ban on LLM content as long as it’s high quality .Just an amateur summary for those less familiar with Wikipedia policy. I encourage people to open an account, edit some pages and engage in the community. It’s the single most influential piece of media that’s syndicated into billions of views daily, often without attribution.
crtasm19 days ago
I enjoyed the recent talk looking at the reasons people add generated content: <a href="https://media.ccc.de/v/39c3-ai-generated-content-in-wikipedia-a-tale-of-caution" rel="nofollow">https://media.ccc.de/v/39c3-ai-generated-content-in-wikipedi...</a>
progbits19 days ago
The Sanderson wiki [1] has a time-travel feature where you read a snapshot just before a publication of a book, ensuring no spoilers.I would like a similar pre-LLM Wikipedia snapshot. Sometimes I would prefer potentially stale or incomplete info rather than have to wade through slop.1: <a href="https://coppermind.net/wiki/Coppermind:Welcome" rel="nofollow">https://coppermind.net/wiki/Coppermind:Welcome</a>
- csande1719 days ago
 The easiest way to get this is probably Kiwix. You can download a ~100GB file containing all of English Wikipedia as of a particular date, then browse it locally offline.I'm not sure if it's real or not, but the Internet Archive has a listing claiming to be the dump from May 2022: <a href="https://archive.org/details/wikipedia_en_all_maxi_2022-05" rel="nofollow">https://archive.org/details/wikipedia_en_all_maxi_2022-05</a>
 - embedding-shape19 days ago
 Alternatively, straight from Wikimedia, those are the dumps I'm using, trivial to parse concurrently and easy format to parse too, multistream-xml in bz2. Latest dump (text only) is from 2026-01-01 and weights 24.1 GB. <a href="https://dumps.wikimedia.org/enwiki/20260101/" rel="nofollow">https://dumps.wikimedia.org/enwiki/20260101/</a> Also have splits together with indexes, so you can grab few sections you want, if 24GB is too large.
 - JKCalhoun19 days ago
 There's a torrent at the linked URL. Trying that right now. (I have a couple of Kiwix dumps of Wikipedia offline already.)
- Antibabelic19 days ago
 But you can already view the past version of any page on Wikipedia. Go to the page you want to read, click "View history" and select any revision before 2023.
 - progbits19 days ago
 I know but it's not as convenient if you have to keep scrolling through revisions.
- kace9119 days ago
 Have you personally encountered slop there? I tend to use Wikipedia rabbit holes as a pastime and haven’t really felt a difference.
KolmogorovComp19 days ago
I wish they also spent on the reverse: automatic rephrasing of the (many) obscure and very poorly worded and/or with no neutral tone whatsoever.And I say that as a general Wikipedia fan.
- philipwhiuk19 days ago
 WP:BOLD and start your own project to do it.
 - vintermann19 days ago
 Or be extra bold, and have an AI bot handle the forum politics associated with being allowed to make nontrivial changes.
 - embedding-shape19 days ago
 Great way to get banned :)I've made a bunch of nontrivial changes (+- 1000s of characters), none of them seems to have been reverted, never asked for permission, I just went ahead and did it. Maybe the topics I care about are so non-controversial no one actually seen it?
- tonymet19 days ago
 there are many copy editing projects that do this.If you mean the left leaning tone / bias, that will be a bit more spicy. But general grammar, tone, ambiguity , superlatives – that’s the goal of copy editing.I copy edit typesetting , for example.
 - KolmogorovComp18 days ago
 > If you mean the left leaning tone / bias, that will be a bit more spicy. But general grammar, tone, ambiguity , superlatives – that’s the goal of copy editingNo, no I mainly mean non-neutral phrasing and/or too personal. Especially for people’s articles. (“And they released that greeeat album! But unfortunately the critics did not understand them… Booh!)
 - tonymet18 days ago
 I agree. Wikipedia Cleanup is a good starting point. Or look for a Wiki Project to join.I've found the best way to learn and contribute is to jump into an existing project. Usually direction is the hardest thing .You can of course dive into an article and make changes, but you'll often get pushback (warranted or unwarranted) and that can be discouraging. It's a somewhat natural feedback loop.
 - KolmogorovComp17 days ago
 Pure styling work is boring and not worth human time anymore (for "good enough" result, starting from bad) IMO, considering how "good" AI has become to that (while still being assisted).
 tonymet17 days ago
 it can be a good way to practice automation and AI workflows.
- alt22719 days ago
 I would hate it so much if all the articles on Wikipedia were suddenly all rewritten to have a smiliar tone and style. Its beauty is its diversity.
 - KolmogorovComp18 days ago
 It’s explicitly in Wikipedia’s goal to keep a neutral tone. See WP:neutral
merelysounds19 days ago
I opened a random page with the label: <a href="https://en.wikipedia.org/wiki/Ain%27t_in_It_for_My_Health" rel="nofollow">https://en.wikipedia.org/wiki/Ain%27t_in_It_for_My_Health</a>Curious, what are the signs that this particular page has been written by an AI?I’m not saying it wasn’t, I’m probably not seeing something and wondering what to look for.
- malfist18 days ago
 Likely this passage:>Upon release, the album received generally positive reviews from critics, with praise for Top's traditionalist approach and vocal authenticity, though some noted its adherence to familiar country frameworks.Generic and uncited.
guywithahat18 days ago
Humorous that Wikipedia is attempting to force out AI while Grokipedia (<a href="https://grokipedia.com/" rel="nofollow">https://grokipedia.com/</a>) is trying to replace wikipedia with AI. I'll be fun to see how each methodology pans out
dfajgljsldkjag19 days ago
It is really good that they are taking steps to remove this stuff. You can usually tell right away when something was not written by a human.
mnming18 days ago
Isn't this a losing battle? Since the goal post keeps moving as new LLM comes out each month.
shevy-java19 days ago
It may be that AI made Wikipedia worse (I have no idea), but Wikipedia itself made several changes in the last 5 years which I hate. The "temporary account" annoys me; the strange side bars that are now the new default also annoy me. Yes, they can be hidden, but why are they shown by default? I never want them; I don't want to use them either. And some discussion pages can not be modified either - I understand that main articles can not so easily be changed, but now discussion pages as well? This happened to me on a few pages, in particular for "ongoing events". Well, I don't even visit ongoing events at a later time usually, so I give feedback or I WANT to give feedback, then I move on. With that changed policy, I can now skip bothering giving any feedback, so Wikipedia becomes less interesting as I give feedback on the QUALITY - what to improve. And so forth. It is really sad how different people can worsen the quality of a project such as Wikipedia. Wikipedia is still good, but it was better, say, 6 years ago.
bluebarbet19 days ago
Contrarian take: Wikipedia could use more AI, as well as less.A major flaw of Wikipedia is that much of it is simply poorly written. Repetition and redundancy, ambiguity, illogical ordering of content, rambling sentences, opaque grammar. That should not be surprising. Writing clear prose is a skill that most people do not have, and Wikipedia articles are generally the fruit of collaboration without copy editors.AI is perfectly suited to fixing this problem. I recently spent several hours rewriting a somewhat important article. I did not add or subtract information from the article, I simply made it clearer and more concise. I came away convinced that AI could have done as good a job - with supervision, of course - in a fraction of the time. AI-assisted copy-editing is not against Wikipedia rules. Yet as things stand, there are no built-in tools to facilitate it, doubtless because of the ambient suspicion of AI as a technology. We need to take a smarter approach.
- oasisbob19 days ago
 > AI is perfectly suited to fixing this problem. I recently spent several hours rewriting a somewhat important article. I did not add or subtract information from the article, I simply made it clearer and more concise.I'm confused by this. Is this written by an AI?> Repetition and redundancy, ambiguity, illogical ordering of content, rambling sentences, opaque grammar.This pile of words is missing a verb."You" (whoever that is, human or not) edited a single article, and that experience convinced you that "AI is perfectly suited to fixing this problem"?Ironically, the lack of evidence to support such a strong assertion is one of the key problems with AI writing in general.The idea that you could edit an article extensively without adding or subtracting information is facile. I would love to see this edit.
 - bluebarbet18 days ago
 Hard to explain the hostility here. I simply outlined my opinion ("take") and backed it up with reasons. I have been a Wikipedia editor for well over 20 years. That should not be relevant to my argument.
 - oasisbob18 days ago
 Why shouldn't it be relevant?The example you put forth supporting your claim that AI is perfect for Wikipedia editing is that you (ambiguously) edited an article, perhaps using AI.The post also reads like it was partially written with AI.I'm sorry you see my response as hostile, but I hope you can see how this example isn't accomplishing your intended rhetorical goals.
 - bluebarbet17 days ago
 It shouldn't be relevant because it's an argument from authority, and one that I can't even prove.Meanwhile, my actual argument (which, like my Wikipedia contributions, involved no help from AI) was reasoned.The real issue is that you (alone in this thread, I might add) are not taking my argument at face value. Indeed you seem to be accusing me of dishonesty. I must admit that I've never understood this kind of cynicism. I personally find it very easy to assume good faith on the part of others (which, incidentally, is a community rule here.). Anyway, that's all I have to say.
 oasisbob15 days ago
 I'm not accusing you of dishonesty, I just don't understand the evidence you're putting forward to support your claim.Have you, as an experienced Wikipedia editor ever used AI to revise articles? What was your experience?Telling a story about editing an article once and thinking that AI could do it isn't as compelling.Establishing ethos isn't the same thing as an appeal to authority.
 - ragazzina18 days ago
 > This pile of words is missing a verb.And yet is completely understandable.
 - oasisbob18 days ago
 And also entirely ironic.> Repetition and redundancy, ambiguity, illogical ordering of content, rambling sentences, opaque grammar.Repetition AND redundancy?Illogical ordering in an unordered list of things.Rambling (could be more items or less).What's the grammar of a sentence like this? Diagramming it would be a challenge. I'd call that opaque.
- OsrsNeedsf2P19 days ago
 Strongly disagree. At least on English Wikipedia, articles tend to be high quality, and it's AI - not Wikipedia - that has repetition and redundancy.
 - tonymet19 days ago
 AI isn’t one thing. The consumer AI bots (e.g. ChatGPT) are heavily dumbed down. Asking it a Wikipedia topic without prompting will of course produce low quality content.With a good model , fine tuning & supervision AI can produce stellar content.AI is at least a thousand tools. It’s a mistake to write it off so trivially.
singinishi19 days ago
[dead]
juicytip19 days ago
[dead]
feverzsj19 days ago
Didn't they just sells access to all the AI giants?
- input_sh19 days ago
 They sold AI giants enterprise downloads in order for them not to hammer Wikimedia's infrastructure by downloading bulk data the usual way available to everyone else. You really have to twist the truth to turn it into something bad for any of the sides.
- jcattle19 days ago
 You mean convenient read-access?
russnes19 days ago
Inb4 wikipedia is lost to the same narrative control as MSM
weli19 days ago
I don't see how this is going to work. 'It sounds like AI' is not a good metric whatsoever to remove content.
- csande1719 days ago
 Wikipedia agrees: <a href="https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing#Your_detection_ability" rel="nofollow">https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing#...</a>That's why they're cataloging specific traits that are common in AI-generated text, and only deleting if it either contains very obvious indicators that could never legitimately appear in a real article ("Absolutely! Here is an article written in the style of Wikipedia:") or violates other policies (like missing or incorrect citations).
- embedding-shape19 days ago
 If that's your takeaway, you need to read the submission again, because that's not what they're suggesting or doing.
- ramon15619 days ago
 This is about wiping unsourced and fake AI generated content, which can be confirmed by checking if the sources are valid
PlatoIsADisease19 days ago
Isn't having a source the only thing that should be required. Why is AI speak bad?I'm a embarrassed to be associated with US Millennials who are anti AI.No one cares if you tie your legs together and finish a marathon in 12 hours. Just finish it in 3. Its more impressive.EDIT:I suppose people missed the first sentence:>Isn't having a source the only thing that should be required.>Isn't having a source the only thing that should be required.>Isn't having a source the only thing that should be required.
- PurpleRamen19 days ago
 There is usually no quality-control on AI-output, because people are lacking time and/or competence doing it, which are also the reasons why they are using AI.And AI still can make up things, which might be fine in some random internet-comment, or some irrelevant article about something irrelevant happening somewhere in the world, but not with a knowledge-vault like Wikipedia.And, we are talking here about Wikipedia. They are not just checking for AI, they are checking everything from everyone and have many many rules to ensure a certain level of quality. They can't check everything at once and fetch all problems immediately, but they are working step by step and over time.> I'm a embarrassed to be associated with US Millennials who are anti AI.You should be embarrassed for making such a statement.
- IshKebab19 days ago
 > Why is AI speak badIt's not inherently bad, but if something was written with AI the chances that it is low effort crap are much much much higher than if someone actually spent time and effort on it.
 - simulator5g18 days ago
 It is inherently bad, because it encourages people to skim over its output and miss hallucinations. The point of AI is “saving time” so obviously you’re not meant to look too close at the output, but you have to anyway. So it’s just a big waste of time.
- alt22719 days ago
 >Isn't having a source the only thing that should be required.No, referencing and discussing it properly whilst retaining the tone and inferred meaning are equally as important. I can cite anything as a source that I want, but if I use it incorrectly or my analysis misses the point of the source then the reference source itself is pointless.