Internet Archive’s legal fights are over, but its founder mourns what was lost - Ars Technica


arstechnica.com/tech-policy/2025/11/the-interne…

Last month, the Internet Archive’s Wayback Machine archived its trillionth webpage, and the nonprofit invited its more than 1,200 library partners and 800,000 daily users to join a celebration of the moment. To honor “three decades of safeguarding the world’s online heritage,” the city of San Francisco declared October 22 to be “Internet Archive Day.” The Archive was also recently designated a federal depository library by Sen. Alex Padilla (D-Calif.), who proclaimed the organization a “perfect fit” to expand “access to federal government publications amid an increasingly digital landscape.”

The Internet Archive might sound like a thriving organization, but it only recently emerged from years of bruising copyright battles that threatened to bankrupt the beloved library project. In the end, the fight led to more than 500,000 books being removed from the Archive’s “Open Library.”

“We survived,” Internet Archive founder Brewster Kahle told Ars. “But it wiped out the Library.”

An Internet Archive spokesperson confirmed to Ars that the archive currently faces no major lawsuits and no active threats to its collections. Kahle thinks “the world became stupider” when the Open Library was gutted—but he’s moving forward with new ideas.



17 Comments

I’m glad the lawsuits didn’t kill them but what Kahle tried to do with “Open Library” and “Project 78” was truly insane. Admirable but insane. They absolutely had to know right from the outset that the Media Companies weren’t going to allow it to continue post-COVID.

I mean, large corps like Meta get away with straight up piracy these days.

Laws only matter if you’re not part of the ruling class.

They didn’t even want it during COVID.

Calling Project 78 insane is a bit much.

If Amazon can download books, I can too, to fill in what’s missing from the archive.

I’ve been donating monthly for about a year now, and yall should too!

Books are less valuable than a record of the state of the digital space, through time.

Indeed, I consider this to be an okay outcome. It’s the Internet Archive, not the All Information Ever Archive. It archives the Internet. There are other projects archiving books.

And it’s the Internet Archive, not the Internet Barely Disguised Pirate Bay. I’m okay if the data they’re archiving isn’t super easy to access by everyone all the time, as long as it’s being preserved. Someday eventually copyright law might become sane again, at which point these archives can come out of their bunkers. Until then those bunkers are important for keeping them safe.

I really think the Internet Archive did a downright stupid thing poking this bear with a stick. I’m relieved they survived and I hope they learned from the experience.

Copyright should not apply to a historic record, but things don’t become historic record upon creation, but sometime after. However, you can’t have a historic record if it isn’t recording history as it happens so…once enough time passes to make a historic record case versus copyright how do you add back the stuff that wasn’t recorded at the time?

The removal of this content is itself now historic record so tag the missing information and why there is a black hole where the record should be. Digital history, and thus history, as swiss cheese because the value of copyright matters more than accuracy of the history of the digital age itself. It is a tragedy to the future that we can’t record reality because someone claims they own it…

We are a stupid, stupid species.

You archive it but don’t publish it.

That might be a viable option.

It’s the approach I’ve been advocating for for years now, throughout this whole lawsuit circus. I got a lot of downvotes for it over the years too, people couldn’t separate my position from capitulation.

Really, it’s just a matter of fighting the battles you can win and not fighting the battles that will annihilate you simply on the basis of principle. The analogy I kept using was a man carrying a precious and fragile treasure going up to a bear and whacking it with a stick, and then acting like we should be sympathetic to them as they desperately scream about how the precious treasure was at risk now that the bear was eating their leg.

They should be focusing on protecting that treasure. Let the EFF take the bear on, that’s what they are for.

It does call into question the motive of the archive and its financial viability to pivot to doing that.

Yes, the archiving and republishing would be illegal in most countries, but not in the US. Fair Use

They didn’t face trouble over archiving the net, but over digitally lending e-books and audio.

Had a similar conversation over in Mastodon recently and yeah, this is a very fair point. The indiscriminate scanning and publication of copyrighted books shouldn’t have happened in the first place, especially when there are existing ecosystems for ethical lending/leasing/borrowing of books already in place, which benefit and are working with authors/publishers already.

You mean in terms of the Internet Archive’s mission and where it can do the most good? I would agree.

Good.

Stick to CC media, next pandemic.

Comments from other communities

I have mixed feelings. I’m glad they survived the lawsuits, and now they can spend their funding on their actual goals rather than it going towards lawyers.

On the other hand, it’s really sad that they had to delete so much of their archive - over half a million books, and a bunch of recordings from their Great 78 Project (which was archiving 300k+ music albums released between ~1900 and 1950). A lot of the things that can’t be archived are eventually going to become lost media.

I really hope that they didn’t actually delete anything, and only just removed public access.

And open themselves up to massive penalties? That would be beyond stupid.

I wouldn’t think a library/archive retaining data in an offline form would incur penalties, and I feel like preserving books for the future is the opposite of stupid.

Preserving is important, sure. But if the settlement required them to delete it and they keep an offline backup and this ever gets out, the settlement is voided and it opens up a world of hurt for them.

This is not a debate about the merits of preservation but about legal repercussions for the Internet Archive.

I didn’t know if it did or didn’t. But since you say that’s the case, that sucks and I hate the publishers even more.

I’m 95% sure the settlement with the publishers would have included a clause requiring the Internet Archive to delete all “infringing” material in their possession.

what’s your methodology for that 95% figure? because Internet Archive themselves mention no such clause:

The lawsuit only concerns our book lending program. The injunction clarifies that the Publisher Plaintiffs will notify us of their commercially available books, and the Internet Archive will expeditiously remove them from lending. Additionally, Judge Koeltl also signed an order in favor of the Internet Archive, agreeing with our request that the injunction should only cover books available in electronic format, and not the publishers’ full catalog of books in print.

Because this case was limited to our book lending program, the injunction does not significantly impact our other library services.  The Internet Archive may still digitize books for preservation purposes, and may still provide access to our digital collections in a number of ways, including through interlibrary loan and by making accessible formats available to people with qualified print disabilities. We may continue to display “short portions” of books as is consistent with fair use—for example, Wikipedia references (as shown in the image above). The injunction does not affect lending of out-of-print books. And of course, the Internet Archive will still make millions of public domain texts available to the public without restriction.

the judgement did not require they delete the books from their archives, only that they stop lending out digital copies of books fitting specific criteria. which should be obvious because possession is not copyright infringement, reproduction/distribution is.

in fact, the judgement specifically allows Internet Archive to continue to use those books “for the purpose of accessibility for ‘eligible persons’”

Distributed archives seem to be the way forward. It’s much harder to take something down if it’s spread across the globe and not controlled by a single entity

It’s also much harder to guarantee preservation with distributed archive. Example: torrents with 0 seeders.

That’s why you need more people and to spread the word. The more people and devices are dedicated to the archival process, the safer it is.

So 5 times more overhead to guarantee the safety of data, that is x5 more cost cause it’s not like regular people have servers with lots of memory just sitting at their homes.

That’s the price you pay to ensure archival in the face of adversity

In the end, the fight led to more than 500,000 books being removed from the Archive’s “Open Library.”

In case you wanted to know what was lost.

I’m kind of amazed that only that and the 78’s archive was lost. At least there’s Project Gutenberg et al to help with the books, and meanwhile the IA does still archive a vast load of video material, software, and no doubt, other stuff.

What criteria decided those books? It must be a relatively small number of all books

How many of them are not backed up somewhere else?

Most of them are still “on” the Internet Archive.

Reminds me of a certain emulation site that was hit by Nintendo’s lawyers and removed the download links for all of their games.

Except that’s all they did. The files are still there, the game pages are still up, all that’s missing is the big shiny download button. A simple userscript can add them back and let you download the “removed” games.

Got any info on that user script? Y’know, just for educational purposes

I’m sure Anna has an archive or something

Someone should check to see how much of their library is accessible

Is that Steve Wozniak holding one of those zeroes?

Not unless he became Asian. You can click on the picture in the article to make it full screen. Not Woz.

Alright, but even full-size it does look a bit like him.

A bit, but he’s quite a bit older and heavier at this point in his life. Looks closer to what he did 25 years ago.

https://annas-archive.org/

Copyright and patent laws need to die.

They don’t need to die, they need to go back to what they used to be. The first copyright law was called the Statute of Anne and it covered a work for 14 years.

That’s a totally reasonable amount of time for an author/publisher to make their money. And it’s reasonable for creators to want to get paid for their work.

And then it should be public domain.

No, they need to die.

If you can’t protect your own ideas, you shouldn’t get to rely on the government to do it for you.

How are you proposing that people protect their own ideas?

Say you write a book. You self-publish. A big publisher CTRL-C/CTRL-Vs your book and publishes it themselves with their access to distribution networks and advertising budgets. Now you sell 0 copies of your book while the publishing house makes millions.

What should you have done differently?

Copyright laws were invented to protect creative people against publishing monopolies.

How is the publisher making money if everyone can copy and redistribute it for free themselves?

Edit: Loving the downvotes from useful idiots. Keep getting taken for a ride 👍

I can’t believe people disagree with this point I am unable to explain. My utter lack of self awareness and critical thinking skills inform me that they’re all idiots, not I!

Yuuup.

You didn’t answer my questions

To answer yours, beyond what I already laid out in the question itself, the original Night Of The Living Dead has been out of copyright for decades, and yet corporations still make money off it.

How do they make money off of it?

Do they create their own derivative work and then make money off of that because copyright laws prevent people from copying and redistributing it for free?

Edit: They didn’t have an answer because they know I’m right. They respond with insults rather than admitting they’re wrong.

This is why businesses that profit off of copyright and patent laws make so much profit, because they have no shortage of suckers and saps who don’t know any better proud to throw money at them.

But hey, at least they fit in with each other, right? 😉

People can literally do that right now and yet the music, book, etc. industries still exist

People can and will do that. Big publishing houses cannot, because of the threat of litigation.

While I don’t uncritically support one side or the other, there are provisions for protecting the small and large alike, and I think there’s no easy answer.

Everyone thinks the problem is easy to solve until a specific instance lands in their lap.

on today’s bozo braindead takes.