Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: E-Records

Why won’t Google give a straight answer on whether Bard was trained on Gmail data?

Skiff Blog: “… Google’s Smart Compose feature was trained on Gmail users’ private emails. Bard is not Google’s only language-focused machine learning model. Anyone who’s used Gmail in the past few years knows about the Smart Compose and Smart Reply features, which auto-complete sentences for you as you go. According to Google’s 2019 paper introducing Smart Compose, the feature was trained on “user-composed emails.” Along with the email’s contents, the model also made use of these emails’ subjects, dates and locations. So it’s plainly true that some of Google’s language models have been trained on Gmail users’ emails. Google has not confirmed whether any training data is shared between these earlier models and Bard, but the idea that a new model would build on the strengths of another doesn’t seem far-fetched… the fact that both Smart Compose and Smart Reply were unambiguously trained on Gmail users’ data seems to be an underappreciated topic of public interest in its own right, which brings us to point 3… 3. Google researchers have extensively documented the risk of leaking private data from their own machine-learning models, some of which are acknowledged to be trained on “private text communications between users.” In a 2021 paper, Google researchers laid out the privacy risks presented by large language models. They wrote: “The most direct form of privacy leakage occurs when data is extracted from a model that was trained on confidential or private data. For example, GMail’s autocomplete model [10] is trained on private text communications between users, so the extraction of unique snippets of training data would break data secrecy.” As part of this research, Google’s scientists demonstrated their ability to extract “memorized” data (meaning raw training data that reveals its source) from OpenAI’s GPT-2.
They emphasized that, although they had chosen to probe GPT-2 because it posed fewer ethical risks since it was trained on publicly available data, the attacks and techniques they laid out in their research “directly apply to any language model, including those trained on sensitive and non-public data”, of which they cite Smart Compose as an example. 4. Google has never denied that Bard was trained on data from Gmail. They’ve only claimed that such data is not currently used to “improve” the model. This point is subtle but significant. Following the controversy around AI researcher Kate Crawford’s tweet, Google crafted an official response to questions about Bard’s use of Gmail data (after having deleted a more immediate response discussed in point 1 above). That statement, which they added to Bard’s FAQ page, is: “Bard responses may also occasionally claim that it uses personal information from Gmail or other private apps and services. That’s not accurate, and as an LLM interface, Bard does not have the ability to determine these facts. We do not use personal data from your Gmail or other private apps and services to improve Bard.” There are two important details in this statement. One is the use of the adjective “personal”. Google has not said that it’s inaccurate that Bard uses information from Gmail, only that it’s inaccurate that it uses personal information from Gmail. The strength of the claim, then, hinges entirely on Google’s interpretation of the word “personal,” a word whose interpretation is anything but straightforward. The other, possibly more significant, detail is that Google has conspicuously never used the past tense in its denials of Bard’s use of Gmail data.
In their first tweet on the subject, Google said Bard “is not trained on Gmail data” and in the official FAQ, they write that they do not “use personal data from your Gmail or other private apps and services to improve Bard.” Neither of these statements is inconsistent with Bard having been trained on Gmail data in the past…”

Driver’s Licenses, Addresses, Photos: Inside How TikTok Shares User Data

The New York Times [alternate free link]: “Employees of the Chinese-owned video app have regularly posted user information on a messaging and collaboration tool called Lark, according to internal documents… In August 2021, TikTok received a complaint from a British user, who flagged that a man had been “exposing himself and playing with himself” on… Continue Reading

CISA, FBI, NSA, MS-ISAC Publish Updated #StopRansomware Guide 

“Updated guide developed through the Joint Ransomware Task Force provides best practices and resources to help organizations reduce the risk of ransomware incidents. The Cybersecurity and Infrastructure Security Agency (CISA), Federal Bureau of Investigation (FBI), National Security Agency (NSA), and Multi-State Information Sharing and Analysis Center (MS-ISAC) today published the #StopRansomware Guide—an updated version of… Continue Reading

Thomson Reuters brings forward vision to redefine the future of professionals with content-driven AI technology

“Thomson Reuters plugin with Microsoft 365 Copilot helps unlock the value of generative AI for legal professionals. Thomson Reuters, a global content and technology company, today brings forward its vision to redefine the future of professionals through generative artificial intelligence (AI). At a time of rapid global innovation, Thomson Reuters is at the forefront, helping its customers… Continue Reading

Google Builds on Tech’s Latest Craze With Its Own A.I. Products

Washington Post: “Google is changing the way we search with AI. It could upend the web. Google Search will start answering some queries directly by generating its own results, a move dreaded by publishers and bloggers.” The New York Times: “On Wednesday [May 10, 2023], at its annual conference in Mountain View, Calif., the… Continue Reading

Pete Recommends – Weekly highlights on cyber security issues, May 7, 2023

Via LLRX – Pete Recommends – Weekly highlights on cyber security issues, May 7, 2023 – Privacy and cybersecurity issues impact every aspect of our lives – home, work, travel, education, health and medical records – to name but a few. On a weekly basis Pete Weiss highlights articles and information that focus on the… Continue Reading

New Tool Shows if Your Car Might Be Tracking You, Selling Your Data

Vice: “A new tool that is free to use for consumers aims to better inform people about the types of data their particular car manufacturer might be collecting and sharing about their identity and driving patterns. The Vehicle Privacy Report tool, made by automotive privacy company Privacy4Cars, is based on a manual and automatic analysis… Continue Reading

Whistleblowers Are the Conscience of Society, Yet Suffer Gravely For Trying to Hold the Rich and Powerful Accountable For Their Sins

Via LLRX –  Whistleblowers Are the Conscience of Society, Yet Suffer Gravely For Trying to Hold the Rich and Powerful Accountable For Their Sins – Lawyer, activist, author, and whistleblower Ashley Gjovik states: “I blew the whistle and was met with an experience so destructive that I did not have the words to describe what… Continue Reading

Chatbots Sound Like They’re Posting on LinkedIn

The Atlantic – “Large language models make things up, but the worse problem may be in how they present those falsehoods…If you spend any time on the internet, you’re likely now familiar with the gray-and-teal screenshots of AI-generated text. At first they were meant to illustrate ChatGPT’s surprising competence at generating human-sounding prose, and then… Continue Reading