Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks. Vaidehi Patil, Peter Hase, Mohit Bansal: "Pretrained language models sometimes possess knowledge that we do not wish them to, including memorized personal information and knowledge that could be used to harm people. They can also output toxic or harmful text. To mitigate…