MIT Technology Review: “This new tool helps you figure out which ones to trust. In many high-stakes situations, large language models are not worth the risk. Knowing which outputs to throw out might fix that. Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk. A new tool created by Cleanlab, an AI startup spun out of a quantum computing lab at MIT, is designed to give high-stakes users a clearer sense of how trustworthy these models really are. Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. In other words: a BS-o-meter for chatbots tests. With hopes and fears about the technology running wild, it’s time to agree on what it can and can’t do. Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. “I think people know LLMs will change the world, but they’ve just got hung up on the damn hallucinations,” says Cleanlab CEO Curtis Northcutt. Chatbots are quickly becoming the dominant way people look up information on a computer. Search engines are being redesigned around the technology. Office software used by billions of people every day to create everything from school assignments to marketing copy to financial reports now comes with chatbots built in. And yet a study put out in November by Vectara, a startup founded by former Google employees, found that chatbots invent information at least 3% of the time. It might not sound like much, but it’s a potential for error most businesses won’t stomach…”
Sorry, comments are closed for this post.