Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Daily Archives: August 15, 2024

Exploring Goodreads Data: An Analysis of 10 Million Books

Ammar Alyousfi’s Blog: “Goodreads is one of the largest book websites on the internet. It has data about millions and millions of books from different genres and in many languages. It’s hard not to find a book on Goodreads whether it’s published hundreds of years ago or just a few days ago. Today, I present the analysis results of more than 10 million books on Goodreads. In fact, the original dataset that I used had 50+ million books but I excluded 40 million of them for data quality reasons mentioned later in this article. Goodreads allows you to search for any book and view its info, but there is no way to see all the available books and interact with them. Using the data in this analysis, however, I was able to do just that with millions of titles. Below, I’ll share some interesting findings and provide a method for further exploration at the end. Continue reading to know more about the analysis and the data or you can jump directly to the results section. But don’t also forget to read about how to get the most out of this analysis.”

NationalPublicData.com Hack Exposes a Nation’s Data

Krebs on Security: “A great many readers this month reported receiving alerts that their Social Security Number, name, address and other personal information were exposed in a breach at a little-known but aptly-named consumer data broker called NationalPublicData.com. This post examines what we know about a breach that has exposed hundreds of millions of consumer…

Etsy for Guns?

Court Watch: A 3D Etsy print shop sells ghost gun parts. “When we think about products typically sold on Etsy, it conjures up images of lovely espresso martini scented candles or maybe even drink coasters you can customize as mini vinyl records with your favorite album covers. After all, Etsy describes itself as “the global…

Microsoft Tweaks Fine Print To Warn Everyone Not To Take Its AI Seriously

The Register – “Microsoft is notifying users that its AI services should not be taken too seriously, echoing prior service-specific disclaimers. In an update to the IT giant’s Service Agreement, which takes effect on September 30, 2024, Redmond has declared that its Assistive AI isn’t suitable for matters of consequence. “AI services are not designed,…

Google, Amazon and Meta are proposing changes to climate laws that would allow them to hide their actual emission numbers

FT.com: “By its own account, Amazon is a green business leader. The world’s most visited online marketplace and leading cloud services provider says it hit its 100 per cent renewable energy goal seven years ahead of a self-imposed target. But by another, Amazon is a heavy polluter, emitting much more climate-warming greenhouse gases through its…

The new Google AI Overview layout is a small win for publishers

Mashable: “Google’s AI Overviews got off to a rocky start, but it hasn’t deterred the tech giant from charging ahead with foisting AI-generated summaries upon your search results, like it or not. On Thursday Google announced new updates to AI Overviews, some of which might make publishers a little happier. As of today, Google is…

Most Adults are Not Confident They Can Tell Whether Information from AI Chatbots Is True or False

KFF: “…Most U.S. adults are not confident that they can tell what is true versus what is false when it comes to information from AI chatbots, such as ChatGPT and Microsoft Copilot. Fewer than half say they are either “very confident” (9%) or “somewhat confident” (33%) that they can tell the difference between true and…

Google’s AI Search Gives Sites Dire Choice: Share Data or Die

Bloomberg [unpaywalled] – Publishers say blocking the company’s AI bot could also prevent their sites from showing up in search. Google now displays convenient artificial intelligence-based answers at the top of its search pages, meaning users may never click through to the websites whose data is being used to power those results. But many…