Image: Nihal Avşar via Pexels

The Web is forgetting itself, and some newspapers are helping it along

Newspapers are restricting access to their content to save themselves from AI. Marie Boran says the wisdom of the move cannot be judged by history

5 June 2026

There is a particular irony in a publication deleting its own past. For nearly 30 years, the Internet Archive’s Wayback Machine has done the quiet, unglamorous work that newsroom archives and library basements once did: keeping a copy. It has preserved newspapers since the mid-1990s, and now holds more than a trillion pages, used daily by journalists, researchers and courts. And now some of the very outlets that depend on it are pulling up the drawbridge.

What’s triggering this? Unsurprisingly, AI. At the end of 2025, The New York Times added the archive’s crawler to its robots.txt file, blocking access; The Guardian and others have since done the same. Their reasoning, to be fair, is logical. They worry the Wayback Machine offers unfettered access to their content, including to AI companies hungrily hoovering it up for free to train their models. When you are already fighting AI firms in court, I suppose an open archive looks like an unlocked back door.
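For readers who have never looked inside a robots.txt file, here is a minimal sketch of how such a block works, using Python's standard urllib.robotparser module. The user-agent token example-archive-bot, the news.example domain and the rules themselves are hypothetical placeholders for illustration, not the actual entries any newspaper has published.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the user-agent token and rules are illustrative only,
# not copied from any real publisher's file.
ROBOTS_TXT = """\
User-agent: example-archive-bot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return True if the rules above permit `agent` to crawl `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    story = "https://news.example/2025/story"
    # The named archive crawler is shut out of the whole site...
    print(can_fetch("example-archive-bot", story))
    # ...while any other crawler falls through to the permissive default.
    print(can_fetch("some-other-bot", story))
```

Two lines of text are all it takes. It is also worth noting that robots.txt is a request rather than a lock: it only stops crawlers that choose to honour it.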

The thing is, The Guardian, for example, has admitted it hasn’t actually documented AI companies scraping its content through the Wayback Machine; the measure is purely precautionary. So a pre-emptive defence against a threat nobody can evidence is quietly erasing the record instead. Between May and October 2025, news captures in the archive fell by 87%. There is a growing hole in the Wayback Machine where journalism used to be.

 


Ancient history

We should care about this in Ireland because so much of our recent history now lives only online. How will historians make sense of the Repeal referendum, life during Covid or Brexit’s effect on the border without the articles, posts, tweets and campaign sites that were edited or deleted, whether deliberately or through neglect?

The National Library of Ireland runs its own Web archive because websites are part of Ireland’s documentary record, and we know posts, webpages and entire sites can vanish in an instant. This archive runs on the same infrastructure as the Wayback Machine and is therefore also under threat. The further sting is that, unlike in most EU countries, Irish law does not let the national library systematically collect and preserve all Irish websites, leaving a significant gap in our online record. We’re already starting from a disadvantage.

Anyone who has done research knows the dread of the dead link, or ‘link rot’. Pew Research Center found that 38% of webpages from a decade ago are simply gone; the Wayback Machine has rescued roughly 15% of them. Strip that net away and accountability goes with it: a statement quietly amended here, a policy page scrubbed there. If we don’t preserve independent copies, history can be rewritten or disappeared.

The campaign group Fight for the Future launched an open letter earlier this week urging news outlets to commit to keeping their journalism in the archive. This year’s World Press Freedom Day, it notes, is the first in 30 years on which work at outlets including the New York Times and USA Today is not being preserved by the Internet Archive. Its line is the one worth remembering: “the freedom of journalists isn’t only the freedom to write, but the freedom to have your work read and remembered for generations to come”. You can add your name at savethearchive.com.

A newspaper that blocks its own archive is, in the end, betting that nobody will ever need to check. History suggests that is a losing bet.
