Court Confirms: Robots.txt Is a Sign, Not a Gate
In a recent lawsuit challenging OpenAI’s web data scraping, a federal court in the Southern District of New York rejected the argument that failing to follow a robots.txt file violates the Digital Millennium Copyright Act (DMCA). In “OpenAI, Inc. Copyright Infringement Litigation,” the court held that a robots.txt instruction is not a technological measure that effectively controls access to a copyrighted work, likening robots.txt to a “keep off the grass” sign. The ruling aligns with multiple other federal court decisions under the Computer Fraud and Abuse Act (CFAA), which consistently distinguish between signs and technological “gates” and hold that robots.txt files are not gates giving rise to CFAA liability. This outcome is not just legally correct; it’s important for the open internet.
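For context, a robots.txt file is nothing more than a few lines of plain text served at a site’s /robots.txt path. A short illustrative example (the crawler name “ExampleBot” here is hypothetical):

    User-agent: ExampleBot
    Disallow: /

    User-agent: *
    Disallow: /private/

Nothing in the file blocks a request. The server returns the same pages to anyone who asks for them; whether these lines are honored is entirely up to the software reading them. That is exactly why the court’s “keep off the grass” analogy fits.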
While robots.txt began as a useful tool for conveying information to crawlers building a site index, over time some website operators have attempted to use it to restrict access to otherwise publicly available information. Treating robots.txt as a mandate rather than a request is both oversimplified and dangerous:
🔸 robots.txt is a blunt instrument: It doesn’t just touch copyrighted content; it also sweeps in facts and data that copyright law has never treated as protectable – like basic biographical details. Allowing robots.txt to wall off this kind of information would undermine longstanding copyright principles.
🔸 robots.txt, unchecked, concentrates power and harms competition: treated as a gate, it would let website operators decide who can efficiently access public information and who cannot, for any reason at all. Companies could restrain competitors from accessing their publicly available information, or override the preferences of their own users who want their data visible without restriction. That’s not a healthy model for an open web.
🔸 robots.txt chills lawful uses of public web data: researchers, educators, people with disabilities, archivists, and others rely on automated access to public data for knowledge and essential public services. AI tools for common tasks like translating documents, exploring content, and analyzing information are increasingly essential to people everywhere.
🔸 Public information should be readable without permission. Access to knowledge is a fundamental human right. Access to public web data should not be based on whether the reader is a human using a browser or a tool acting on someone’s behalf. Behind every scraper, crawler, or AI agent is a person seeking knowledge.
ARDC’s position has been consistent: robots.txt can be a useful way to convey information to crawlers or downstream data users, but it is not a legal gate or a technological measure that selectively controls who may access public information.
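To make the technical point concrete: honoring robots.txt happens, if at all, on the crawler’s side. A minimal sketch in Python, using the standard library’s urllib.robotparser (the crawler name and URLs are placeholders):

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's advisory robots.txt file.
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # The client decides whether to honor the request. Nothing on the
    # server side enforces it; a crawler that skips this check can
    # still fetch the page successfully.
    if rp.can_fetch("ExampleBot", "https://example.com/some/page"):
        print("robots.txt permits this fetch; proceeding.")
    else:
        print("robots.txt asks crawlers not to fetch this page.")

Compliance is voluntary by design: the file states a preference, and well-behaved crawlers choose to respect it. That is the behavior of a sign, not a gate.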