If you are not familiar with archive.org, their goal is to take multiple snapshots of every website on the internet and to provide tools for viewing those sites as they appeared at various points in time. To do this they run a web-crawler called ia_archiver, which tends to crawl websites very aggressively so that the ‘snapshot’ of a site is taken as quickly as possible and all of its content stays in sync. I know that a lot of people like archive.org and support their project.
I despise ia_archiver and archive.org; not for what they are doing but for the mentality that underlies their methodology. Their actions are all predicated on a false assumption: that copyright doesn’t apply to content published on a web page. They believe it is perfectly legal, and socially acceptable, to make as many snapshot copies of your entire website as they want and to make those copies available on their own website. Fair Use does not apply when you copy the entirety of someone’s work and re-publish it somewhere else.
Their defense is that they have a page on their site informing people of the method they support for opting out of being crawled by ia_archiver. This defense is about as weak as it gets. It would be the equivalent of a music pirate saying, “Well, Guns N’ Roses never called me up and told me not to dupe their CDs.”
Imagine this scenario: You are an author of books. You are at the library one day. You see someone take one of your books off the shelf and walk over to the photocopier where they proceed to photocopy every single page of your book.
The scenario above is illegal. It is a violation of your copyright unless you have explicitly stated in your book that duplication and/or redistribution are allowed. You control the use of your work, and this person needed your written permission to make that copy.
According to the logic of archive.org, your book being available in the library gave them permission to make a copy. Their premise is that if you make your website available to the public, then they can make copies of it. In fact, they expect you to figure out that they exist and have been copying your site, find the page on their website about being excluded from their copy process, and then set up the robots.txt entry they require before they will honor your copyright and stop making copies. In digital terms, rather than requesting written permission to copy your website, they require you to write to them asking them to stop. That is the direct opposite of how copyright protection is supposed to work.
I first encountered archive.org in the late ’90s, before The Wayback Machine existed. (The Wayback Machine is the user interface for viewing the copies they make of websites.) Back then they were not yet making their pilfered content available; they were just making their copies. I found ia_archiver in my web server logs and had to use a search engine to track down who it was. Unlike other, more honorable web-crawlers, ia_archiver does not include a URL in its user-agent string that would let webmasters quickly locate information about the spider. When I found them, and read their mission, I was greatly annoyed. Without my permission they were copying my site, and despite copyright law they felt it was my responsibility to take action to stop them from violating my rights. I took that action and banned ia_archiver, but it galled me that I had to act at all when the Berne Convention copyright protections dictated that they needed my permission in the first place.
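For anyone who wants to take the same action, the opt-out that archive.org honors is the standard robots.txt exclusion mechanism. A minimal sketch of the entry, assuming your site serves robots.txt from its root and that ‘ia_archiver’ is the user-agent name their crawler announces:

    User-agent: ia_archiver
    Disallow: /

Those two lines tell their crawler to stay out of the entire site; any rules for other crawlers in the same file are unaffected.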
It is my belief that webmasters and content authors should have to ‘opt in’ to be included in archive.org. If you sign up to use the webmaster tools at Google or Yahoo, you have to take action to prove the website is yours, and so it should be with archive.org. Someone should have to register their website for archiving and then prove they own the site and its content. Making people opt out is what spammers do, not legitimate businesses.
On the one hand I hope that Suzanne Shell wins her dispute. On the other, if she wins on the basis of the one claim the courts are allowing her, it will still mean that we, as authors and copyright holders, have to take action in order to create an electronic contract banning archive.org. We shouldn’t have to do that. It should be an accepted fact that, barring permission, nobody can copy all of your content wholesale and reproduce it elsewhere.
Is it any wonder that there is widespread piracy of music and video on the internet when sites like archive.org operate under the premise that if it is available online it is legal to make copies? With archive.org as an example of ‘internet mentality’ it is no wonder that the RIAA and MPAA are so fearful of digital content distribution.
Colorado Woman Sues To Hold Web Crawlers To Contracts – Technology News by InformationWeek
By Thomas Claburn
InformationWeek
March 16, 2007 05:00 PM

Computers can enter into contracts on behalf of people. The Uniform Electronic Transactions Act (UETA) says that a “contract may be formed by the interaction of electronic agents of the parties, even if no individual was aware of or reviewed the electronic agents’ actions or the resulting terms and agreements.”
This presumes a prior agreement to do business electronically.
So what constitutes such an agreement? The Internet Archive, which spiders the Internet to copy Web sites for posterity unless site owners opt out, is being sued by Colorado resident and Web site owner Suzanne Shell for conversion, civil theft, breach of contract, and violations of the Racketeer Influenced and Corrupt Organizations Act and the Colorado Organized Crime Control Act.
Shell’s site states, “IF YOU COPY OR DISTRIBUTE ANYTHING ON THIS WEB SITE, YOU ARE ENTERING INTO A CONTRACT,” at the bottom of the main page, and refers readers to a more detailed copyright notice and agreement. Her suit asserts that the Internet Archive’s programmatic visitation of her site constitutes acceptance of her terms, despite the obvious inability of a Web crawler to understand those terms and the absence of a robots.txt file to warn crawlers away.
A court ruling last month granted the Internet Archive’s motion to dismiss the charges, except for the breach of contract claim.
In a post on law professor Eric Goldman’s Technology & Marketing Law blog, attorney John Ottaviani, a partner at Edwards & Angell in Providence, R.I., says the issue is “whether there was an adequate notice of the existence of the terms and a meaningful opportunity to review the terms.”
If a notice such as Shell’s is ultimately construed to represent just such a “meaningful opportunity” to an illiterate computer, the opt-out era on the Net may have to change. Sites that rely on automated content gathering like the Internet Archive, not to mention Google, will have to convince publishers to opt in before indexing or otherwise capturing their content. Either that or they’ll have to teach their Web spiders how to read contracts.
[View the complete article by following the link at the top of this block quote section.]