Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Privacy

Why won’t Google give a straight answer on whether Bard was trained on Gmail data?

Skiff Blog: “… Google’s Smart Compose feature was trained on Gmail users’ private emails. Bard is not Google’s only language-focused machine learning model. Anyone who’s used Gmail in the past few years knows about the Smart Compose and Smart Reply features, which auto-complete sentences for you as you go. According to Google’s 2019 paper introducing Smart Compose, the feature was trained on “user-composed emails.” Along with the email’s contents, the model also made use of these emails’ subjects, dates and locations. So it’s plainly true that some of Google’s language models have been trained on Gmail users’ emails. Google has not confirmed whether any training data is shared between these earlier models and Bard, but the idea that a new model would build on the strengths of another doesn’t seem far-fetched… the fact that both Smart Compose and Smart Reply were unambiguously trained on Gmail users’ data seems to be an underappreciated topic of public interest in its own right, which brings us to point 3…

3. Google researchers have extensively documented the risk of leaking private data from their own machine-learning models, some of which are acknowledged to be trained on “private text communications between users.” In a 2021 paper, Google researchers laid out the privacy risks presented by large language models. They wrote: “The most direct form of privacy leakage occurs when data is extracted from a model that was trained on confidential or private data. For example, GMail’s autocomplete model [10] is trained on private text communications between users, so the extraction of unique snippets of training data would break data secrecy.” As part of this research, Google’s scientists demonstrated their ability to extract “memorized” data — meaning raw training data that reveals its source — from OpenAI’s GPT-2. They emphasized that — although they had chosen to probe GPT-2 because it posed fewer ethical risks since it was trained on publicly available data — the attacks and techniques they laid out in their research “directly apply to any language model, including those trained on sensitive and non-public data”, of which they cite Smart Compose as an example.

4. Google has never denied that Bard was trained on data from Gmail. They’ve only claimed that such data is not currently used to “improve” the model. This point is subtle but significant. Following the controversy around AI researcher Kate Crawford’s tweet, Google crafted an official response to questions about Bard’s use of Gmail data (after having deleted a more immediate response discussed in point 1 above). That statement, which they added to Bard’s FAQ page, is: “Bard responses may also occasionally claim that it uses personal information from Gmail or other private apps and services. That’s not accurate, and as an LLM interface, Bard does not have the ability to determine these facts. We do not use personal data from your Gmail or other private apps and services to improve Bard.” There are two important details in this statement. One is the use of the adjective “personal”. Google has not said that it’s inaccurate that Bard uses information from Gmail, only that it’s inaccurate that it uses personal information from Gmail. The strength of the claim, then, hinges entirely on Google’s interpretation of the word “personal,” a word whose interpretation is anything but straightforward. The other, possibly more significant, detail is that Google has conspicuously never used the past tense in its denials of Bard’s use of Gmail data. In their first tweet on the subject, Google said Bard “is not trained on Gmail data” and in the official FAQ, they write that they do not “use personal data from your Gmail or other private apps and services to improve Bard.” Neither of these statements is inconsistent with Bard having been trained on Gmail data in the past…”

FTC Finds Amazon Ring Cameras Responsible for “Egregious Violations of Users’ Privacy,” Requires Data Deletion

EPIC: “In a proposed consent order released today, the Federal Trade Commission will require Amazon to “delete data products such as data, models, and algorithms derived from videos it unlawfully reviewed,” implement new privacy and security measures, and pay a fine of $5.8 million. The proposed order was published alongside a complaint finding that Amazon… Continue Reading

LLRX May 2023 Issue

Is using Generative AI just another form of outsourcing?– Is the implementation of generative AI simply a new flavor of outsourcing? How does this digital revolution reflect on our interpretation of the American Bar Association’s (ABA) ethical guidelines? How can we ensure that we maintain the sacrosanct standards of our profession as we step into… Continue Reading

Pete Recommends – Weekly highlights on cyber security issues, May 27, 2023

Via LLRX – Pete Recommends – Weekly highlights on cyber security issues, May 27, 2023 – Privacy and cybersecurity issues impact every aspect of our lives – home, work, travel, education, health and medical records – to name but a few. On a weekly basis Pete Weiss highlights articles and information that focus on the increasingly… Continue Reading

Driver’s Licenses, Addresses, Photos: Inside How TikTok Shares User Data

The New York Times [alternate free link]: “Employees of the Chinese-owned video app have regularly posted user information on a messaging and collaboration tool called Lark, according to internal documents… In August 2021, TikTok received a complaint from a British user, who flagged that a man had been “exposing himself and playing with himself” on… Continue Reading

Pete Recommends – Weekly highlights on cyber security issues, May 20, 2023

Via LLRX – Pete Recommends – Weekly highlights on cyber security issues, May 20, 2023. Privacy and cybersecurity issues impact every aspect of our lives – home, work, travel, education, health and medical records – to name but a few. On a weekly basis Pete Weiss highlights articles and information that focus on the increasingly complex… Continue Reading

Digital Privacy Legislation is Civil Rights Legislation

EFF: “Our personal data and the ways private companies harvest and monetize it plays an increasingly powerful role in modern life. Corporate databases are vast, interconnected, and opaque. The movement and use of our data is difficult to understand, let alone trace. Yet companies use it to reach inferences about us, leading to lost employment,… Continue Reading

Position Paper: Escaping Academic Cloudification to Preserve Academic Freedom

Fiebig, T., Gürses, S., & Lindorfer, M. (2022). Position Paper: Escaping Academic Cloudification to Preserve Academic Freedom. Privacy Studies Journal, 1, 51–68. https://doi.org/10.7146/psj.vi.132713 “Especially since the onset of the COVID-19 pandemic, the use of cloud-based tools and solutions – led by the ‘Zoomification’ of education – has picked up attention in the EdTech and privacy communities.… Continue Reading

Beyond the Safeguards: Exploring the Security Risks of ChatGPT

6 major risks of using ChatGPT, according to a new study – Beyond the Safeguards: Exploring the Security Risks of ChatGPT. Erik Derner and Kristina Batistič, 13 May 2023. arXiv:2305.08005 “The increasing popularity of large language models (LLMs) such as ChatGPT has led to growing concerns about their safety, security risks, and ethical implications. This… Continue Reading

Your DNA Can Now Be Pulled From Thin Air. Privacy Experts Are Worried

The New York Times: “Environmental DNA research has aided conservation, but scientists say its ability to glean information about human populations and individuals poses dangers. David Duffy, a wildlife geneticist at the University of Florida, just wanted a better way to track disease in sea turtles. Then he started finding human DNA everywhere he looked.… Continue Reading