EFF: “Both OpenAI and Google have released guidance for website owners who do not want the two companies using the content of their sites to train the company’s large language models (LLMs). We’ve long been supporters of the right to scrape websites—the process of using a computer to load and read pages of a website for later analysis—as a tool for research, journalism, and archivers. We believe this practice is still lawful when collecting training data for generative AI, but the question of whether something should be illegal is different from whether it may be considered rude, gauche, or unpleasant. As norms continue to develop around what kinds of scraping and what uses of scraped data are considered acceptable, it is useful to have a tool for website operators to automatically signal their preference to crawlers. Asking OpenAI and Google (and anyone else who chooses to honor the preference) to not include scrapes of your site in its models is an easy process as long as you can access your site’s file structure…”
Sorry, comments are closed for this post.