Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Search Engines

Developer Creates Infinite Maze That Traps AI Training Bots

404 Media – “A pseudonymous coder has created and released an open source “tar pit” to indefinitely trap AI training web crawlers in an infinitely, randomly-generating series of pages to waste their time and computing power. The program, called Nepenthes after the genus of carnivorous pitcher plants which trap and consume their prey, can be deployed by webpage owners to protect their own content from being scraped or can be deployed “offensively” as a honeypot trap to waste AI companies’ resources. “It’s less like flypaper and more an infinite maze holding a minotaur, except the crawler is the minotaur that cannot get out. The typical web crawler doesn’t appear to have a lot of logic. It downloads a URL, and if it sees links to other URLs, it downloads those too. Nepenthes generates random links that always point back to itself – the crawler downloads those new links. Nepenthes happily just returns more and more lists of links pointing back to itself,” Aaron B, the creator of Nepenthes, told 404 Media. “Of course, these crawlers are massively scaled, and are downloading links from large swathes of the internet at any given time,” they added. “But they are still consuming resources, spinning around doing nothing helpful, unless they find a way to detect that they are stuck in this loop.” Human users can see how Nepenthes works by clicking here, though I must warn that the page loads incredibly slowly (on purpose) and links endlessly to pages that load the same way…”
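The crawler behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Nepenthes' code: the host `tarpit.example`, the helpers `tarpit_page` and `naive_crawl`, the five-links-per-page layout and the per-host budget are all hypothetical. Each tarpit page derives its links from a hash of its own path, so a given URL is stable, yet every link leads to a new page on the same host, and a crawler that simply follows links never runs out of work.

```python
import hashlib
import random
from collections import deque
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical host name for the illustration; not Nepenthes' real URL scheme.
TARPIT_HOST = "tarpit.example"


def tarpit_page(path: str, links_per_page: int = 5) -> List[str]:
    """Return the links a tarpit page at `path` would contain.

    The links are derived from a hash of the path, so the same URL always
    yields the same page, but every link points to a new path on the same
    host: the maze never runs out of pages. (A real tarpit would also
    deliberately delay its responses; that is omitted here.)
    """
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [
        f"https://{TARPIT_HOST}/{rng.getrandbits(64):016x}"
        for _ in range(links_per_page)
    ]


def naive_crawl(
    start: str, max_pages: int, per_host_budget: Optional[int] = None
) -> Dict[str, int]:
    """Breadth-first crawl that downloads every link it has not seen yet.

    `max_pages` stands in for the crawler's overall resources. The optional
    `per_host_budget` is a crude stuck-in-a-loop heuristic: once a host has
    served that many pages, its remaining URLs are dropped.
    """
    frontier = deque([start])
    seen = {start}
    pages_per_host: Dict[str, int] = {}
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        parsed = urlparse(url)
        host_count = pages_per_host.get(parsed.netloc, 0)
        if per_host_budget is not None and host_count >= per_host_budget:
            continue  # skip: this host has used up its budget
        pages_per_host[parsed.netloc] = host_count + 1
        fetched += 1
        # "Download" the page; every page in this toy web is a tarpit page.
        for link in tarpit_page(parsed.path):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return {"fetched": fetched, "frontier": len(frontier)}


if __name__ == "__main__":
    start = f"https://{TARPIT_HOST}/"
    # Without a budget, the crawl stops only because max_pages ran out,
    # and the frontier of unvisited links keeps growing.
    print(naive_crawl(start, max_pages=1000))
    # With a per-host budget, the crawler abandons the maze and terminates.
    print(naive_crawl(start, max_pages=1000, per_host_budget=50))
```

Without a budget, every fetched page adds five unseen links while removing only one, so the frontier grows with each request and the crawl ends only when the crawler's own resources do. The per-host budget is just one simple way a crawler might "detect that [it is] stuck in this loop"; production crawlers are assumed to use their own, more elaborate heuristics.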

Perplexity acquires Read.cv, a social media platform for professionals

TechCrunch: “Read.cv, a social media platform for professionals that competed with LinkedIn, has been acquired by AI-powered search engine Perplexity. As part of the deal, Read.cv will begin to wind down operations Friday. Users will be able to export their data, including their profiles, posts, and messages, until May 16. “We’ve long admired Perplexity and…

AI Is Like Tinkerbell: It Only Works If We Believe in It. Keep Clapping. Louder.

Futurism – “There’s no limit to the promise of artificial intelligence. Or at least, there’s no limit to the promises that the powerful make about AI. We’re told by tech companies and their investors that AI has the capacity to transform everything, making us more productive workers and more efficient learners – before eventually making…

Search the BIRLS Database

“… Department of Veterans Affairs (the VA). It provides an index to basic biographical information on more than 18 million deceased American veterans who received some sort of veterans benefits in their lifetime, including health care, disability or life insurance policies, educational benefits (the GI Bill), mortgage assistance (VA loans), and more. The BIRLS database includes…

Pete Recommends – Weekly highlights on cyber security issues, January 11, 2025

Pete Recommends – Weekly highlights on cyber security issues, January 11, 2025 – Privacy and cybersecurity issues impact every aspect of our lives – home, work, travel, education, finance, health and medical records – to name but a few. On a weekly basis, Pete Weiss highlights articles and information that focus on the increasingly complex…

The Depths of Wikipedians

Asterisk Interview with Annie Rauwerda – The Depths of Wikipedians – Asterisk: You’re famous for the Depths of Wikipedia account, where you share factoids from some of the most arcane, interesting, and surprising pages on Wikipedia. But you’re also now a part of the broader Wikipedia community. How did you first get interested in the…

Power of Community Posts: Conversation is the New Influencer

Research by Reddit, about Reddit – At CES 2025, Reddit released research outlining how conversation is becoming an influencer throughout the purchase journey. “In the last year, our research has shown that 47% of social media users say ‘irrelevant search terms’ are the most frustrating aspects of their product research. That is in part why…

Announcing the Public Domain Image Archive

“After the hundreds (thousands?) of hours trawling through online image collections since the PDR’s inception, we’ve decided it was time to create one of our own! We are really excited to share with you the launch of our new sister-project, the Public Domain Image Archive (PDIA), a curated collection of more than 10,000 out-of-copyright historical…

analytics.usa.gov

U.S. Federal Government Website and App Analytics: “This website provides a window into how people are interacting with the government online. The data come from a unified Google Analytics account for U.S. federal government agencies known as the Digital Analytics Program. This program helps government agencies understand how people find, access, and use government…

Science paper piracy site Sci-Hub shares lots of retracted papers

Ars Technica: “85 percent of invalid papers continue to be shared after they’ve been retracted. Keeping track of when a paper has been retracted can be a challenge. Most scientific literature is published in for-profit journals that rely on subscriptions and paywalls to turn a profit. But that trend has been shifting as various governments…
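Keeping track of retractions largely comes down to matching identifiers: a collection still circulates a retracted paper if its DOI appears in both lists. The Python sketch below is not the method used in the study Ars Technica reports on. It assumes you already have a list of retracted DOIs and a list of DOIs a collection hosts, and the helpers `normalize_doi` and `retracted_still_shared`, as well as the sample DOIs, are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Common resolver prefixes that may appear in front of a bare DOI.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Reduce a DOI to a canonical bare, lower-case form.

    DOIs are case-insensitive, so "10.1000/ABC" and "10.1000/abc" name the
    same object; resolver URLs and "doi:" labels are stripped.
    """
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def retracted_still_shared(
    retracted: Iterable[str], hosted: Iterable[str]
) -> Tuple[Set[str], float]:
    """Return the retracted DOIs a collection still hosts, and their share."""
    retracted_set = {normalize_doi(d) for d in retracted}
    hosted_set = {normalize_doi(d) for d in hosted}
    still_shared = retracted_set & hosted_set
    share = len(still_shared) / len(retracted_set) if retracted_set else 0.0
    return still_shared, share


if __name__ == "__main__":
    # Made-up sample DOIs, for illustration only.
    retracted = ["10.1000/Example.1", "https://doi.org/10.1000/example.2",
                 "doi:10.1000/example.3", "10.1000/example.4"]
    hosted = ["10.1000/example.1", "10.1000/EXAMPLE.2",
              "10.1000/example.3", "10.1000/unrelated"]
    shared, share = retracted_still_shared(retracted, hosted)
    print(sorted(shared), f"{share:.0%}")
```

Normalizing before comparing matters because the same DOI can appear in different letter cases or wrapped in a resolver URL; a plain string comparison would undercount matches.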