Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Daily Archives: May 31, 2023

Why won’t Google give a straight answer on whether Bard was trained on Gmail data?

Skiff Blog: “… Google’s Smart Compose feature was trained on Gmail users’ private emails. Bard is not Google’s only language-focused machine learning model. Anyone who’s used Gmail in the past few years knows about the Smart Compose and Smart Reply features, which auto-complete sentences for you as you go. According to Google’s 2019 paper introducing Smart Compose, the feature was trained on “user-composed emails.” Along with the email’s contents, the model also made use of these emails’ subjects, dates and locations. So it’s plainly true that some of Google’s language models have been trained on Gmail users’ emails. Google has not confirmed whether any training data is shared between these earlier models and Bard, but the idea that a new model would build on the strengths of another doesn’t seem far-fetched… the fact that both Smart Compose and Smart Reply were unambiguously trained on Gmail users’ data seems to be an underappreciated topic of public interest in its own right, which brings us to point 3…

3. Google researchers have extensively documented the risk of leaking private data from their own machine-learning models, some of which are acknowledged to be trained on “private text communications between users.” In a 2021 paper, Google researchers laid out the privacy risks presented by large language models. They wrote: “The most direct form of privacy leakage occurs when data is extracted from a model that was trained on confidential or private data. For example, GMail’s autocomplete model [10] is trained on private text communications between users, so the extraction of unique snippets of training data would break data secrecy.” As part of this research, Google’s scientists demonstrated their ability to extract “memorized” data, meaning raw training data that reveals its source, from OpenAI’s GPT-2. They emphasized that, although they had chosen to probe GPT-2 because it posed fewer ethical risks since it was trained on publicly available data, the attacks and techniques they laid out in their research “directly apply to any language model, including those trained on sensitive and non-public data”, of which they cite Smart Compose as an example.

4. Google has never denied that Bard was trained on data from Gmail. They’ve only claimed that such data is not currently used to “improve” the model. This point is subtle but significant. Following the controversy around AI researcher Kate Crawford’s tweet, Google crafted an official response to questions about Bard’s use of Gmail data (after having deleted a more immediate response discussed in point 1 above). That statement, which they added to Bard’s FAQ page, is: “Bard responses may also occasionally claim that it uses personal information from Gmail or other private apps and services. That’s not accurate, and as an LLM interface, Bard does not have the ability to determine these facts. We do not use personal data from your Gmail or other private apps and services to improve Bard.”

There are two important details in this statement. One is the use of the adjective “personal”. Google has not said that it’s inaccurate that Bard uses information from Gmail, only that it’s inaccurate that it uses personal information from Gmail. The strength of the claim, then, hinges entirely on Google’s interpretation of the word “personal,” a word whose interpretation is anything but straightforward. The other, possibly more significant, detail is that Google has conspicuously never used the past tense in its denials of Bard’s use of Gmail data. In their first tweet on the subject, Google said Bard “is not trained on Gmail data” and in the official FAQ, they write that they do not “use personal data from your Gmail or other private apps and services to improve Bard.” Neither of these statements is inconsistent with Bard having been trained on Gmail data in the past…”

FTC Finds Amazon Ring Cameras Responsible for “Egregious Violations of Users’ Privacy,” Requires Data Deletion

EPIC: “In a proposed consent order released today, the Federal Trade Commission will require Amazon to “delete data products such as data, models, and algorithms derived from videos it unlawfully reviewed,” implement new privacy and security measures, and pay a fine of $5.8 million. The proposed order was published alongside a complaint finding that Amazon… Continue Reading

AI machines aren’t ‘hallucinating’. But their makers are

The Guardian: “Inside the many debates swirling around the rapid rollout of so-called artificial intelligence, there is a relatively obscure skirmish focused on the choice of the word “hallucinate”. This is the term that architects and boosters of generative AI have settled on to characterize responses served up by chatbots that are wholly manufactured, or… Continue Reading

Our model suggests that global deaths remain 5% above pre-covid forecasts

The Economist [free to read at this link] – Attributing this increase to covid would make it the fourth-leading cause of death: “On May 5th the World Health Organisation declared an end to the covid-19 public-health emergency. Based on official mortality counts, this looked tardy. By April 2022, average weekly death tolls had already fallen… Continue Reading

Test driving Google’s Search Generative Experience

Search Engine Land, Eric Enge: “I’ve had access to Google’s new Search Generative Experience (SGE) for about a week now. I decided to “formally” put it to the test using the same 30 queries from my March mini-study comparing the top generative AI solutions. Those queries were designed to push the limits of each platform.… Continue Reading

10 ways to speed up your internet connection today

ZDNET: “Are you suffering from slow internet speeds at home? Connectivity drops, bottlenecks, lagged content streaming and downloads, and slow speeds are all common problems with home internet services — and it may not be the fault of your internet service provider (ISP). True, the routers typically provided by ISPs are basic and may not… Continue Reading

Safe and just Earth system boundaries

Rockström, J., Gupta, J., Qin, D. et al. Safe and just Earth system boundaries. Nature (2023). https://doi.org/10.1038/s41586-023-06083-8 [free PDF download] “The stability and resilience of the Earth system and human well-being are inseparably linked yet their interdependencies are generally under-recognized; consequently, they are often treated independently. Here, we use modelling and literature assessment to quantify… Continue Reading