The Internet Archive’s Wayback Machine is the latest victim of Reddit’s crackdown on data access. The company has begun to place new restrictions on what the archive site will be able to access, a move that will significantly limit the Wayback Machine’s ability to preserve information from Reddit.
With the change, the Wayback Machine, a project run by the nonprofit Internet Archive, will only be able to crawl Reddit’s homepage. It will no longer be able to access comments, subreddit pages, post details, profiles and other data.
The move is the latest step Reddit has taken in its quest to limit AI companies’ ability to use its data to train large language models without paying licensing fees. It is also a notably different stance from the one the company took last year, when it explicitly said that it would not limit “good faith actors,” including the Internet Archive. It is not clear what exactly has changed since then. Reddit seems to believe that AI companies are circumventing its rules by scraping data via the Wayback Machine. We have reached out to the Internet Archive for comment.
Data licensing has become a significant business for Reddit. The company has struck multimillion-dollar deals with OpenAI and Google that allow them to use Reddit posts to help train their AI models. At the same time, Reddit has taken an increasingly hardline stance against companies that attempt to use its data without such arrangements. Earlier this year, the company sued Anthropic, alleging it scraped Reddit for years without permission.