palewire – Ben Welsh: “In total, 546 of 1,149 news publishers surveyed by the homepages.news archive have instructed OpenAI, Google AI or the non-profit Common Crawl to stop scanning their sites, which amounts to 47.5% of the sample. The three organizations systematically crawl web sites to gather the information that fuels generative chatbots like OpenAI’s ChatGPT and Google’s Bard. Publishers can request that their content be excluded by opting out via the robots.txt convention. The open-source system run by homepages.news gathers each news site’s robots.txt file twice per day. This page continually updates with the latest results. Here are the current totals for each crawler…”
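
As a rough illustration of the robots.txt opt-out described in the quote, here is a minimal sketch that checks which of the three crawlers a given robots.txt file blocks. It is not the homepages.news code. The user-agent tokens (`GPTBot`, `Google-Extended`, `CCBot`) are the publicly documented names for these crawlers but are not given in the quote, and the `blocked_crawlers` helper, the `SAMPLE` file and the `example.com` URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for the three organizations (not named in the quote).
AI_CRAWLERS = {
    "OpenAI": "GPTBot",
    "Google AI": "Google-Extended",
    "Common Crawl": "CCBot",
}


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return, per organization, whether its crawler is disallowed from `url`."""
    parser = RobotFileParser()
    # Parse the file contents directly so the sketch needs no network access.
    parser.parse(robots_txt.splitlines())
    return {org: not parser.can_fetch(agent, url) for org, agent in AI_CRAWLERS.items()}


# Hypothetical robots.txt that opts out of two of the three crawlers.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

result = blocked_crawlers(SAMPLE)
print(result)  # {'OpenAI': True, 'Google AI': False, 'Common Crawl': True}

# The share quoted above: 546 of 1,149 publishers.
share = 546 / 1149
print(f"{share:.1%}")  # 47.5%
```

A survey along the lines described would presumably fetch each publisher's live robots.txt on a schedule and run a check like this against it, though how the archive actually does so is not stated in the quote.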