# A Little Bundle of Gross AI Stories

[Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal](https://www.404media.co/google-is-the-only-search-engine-that-works-on-reddit-now-thanks-to-ai-deal/) by Emanuel Maiberg for 404 Media

[Anthropic AI Scraper Hits iFixit’s Website a Million Times in a Day](https://www.404media.co/anthropic-ai-scraper-hits-ifixits-website-a-million-times-in-a-day/) by Jason Koebler for 404 Media

[AI crawlers need to be more respectful](https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/) by Eric Holscher for Read the Docs

All via Andy Baio at waxy.org: [Reddit blocks all crawlers in robots.txt, gives exclusive access to Google with AI deal](https://waxy.org/2024/07/reddit-blocks-all-crawlers-in-robots-txt-gives-exclusive-access-to-google-with-ai-deal/) and [aggressive AI crawlers lead to surprise bandwidth bills](https://waxy.org/2024/07/aggressive-ai-crawlers-leads-to-surprise-bandwidth-bills/)

All of these articles showcase the gross, bullish nature of AI scrapers and the power of a .txt file. These AI companies show a flagrant disregard for policies, terms, and copyright. It's one of the more brash displays of asking for forgiveness instead of permission I've witnessed from big tech in a long time.

> Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user generated content exclusive to the internet’s already dominant search engine.
>
> If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn’t rely on Google’s indexing and search Reddit by using “site:reddit.com,” you will not see any results from the last week.
> The web scraper bot for Anthropic’s AI chatbot Claude hit iFixit’s website nearly a million times in a single day, despite the repair database having terms of service provisions that state “reproducing, copying or distributing any Content, materials or design elements on the Site for any other purpose, including training a machine learning or AI model, is strictly prohibited without the express prior written permission of iFixit.”

> One crawler downloaded 73 TB of zipped HTML files in May 2024, with almost 10 TB in a single day. This cost us over $5,000 in bandwidth charges, and we had to block the crawler.
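The "power of a .txt file" here is the Robots Exclusion Protocol: robots.txt is a voluntary request, not an enforcement mechanism, which is exactly why it only works when crawlers choose to honor it. A minimal sketch of how a well-behaved crawler consults such a policy, using Python's standard-library `urllib.robotparser` and a hypothetical robots.txt resembling Reddit's post-deal setup (Googlebot allowed, everyone else disallowed):

```python
import urllib.robotparser

# Hypothetical policy modeled on Reddit's: only Googlebot may crawl.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A respectful crawler checks before fetching; nothing stops a rude one.
print(rp.can_fetch("Googlebot", "/r/programming"))  # True
print(rp.can_fetch("ClaudeBot", "/r/programming"))  # False
```

The asymmetry in the articles is that the check above is entirely opt-in: a crawler that skips it, like the ones hammering iFixit and Read the Docs, faces no technical barrier, only terms-of-service language after the fact.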