Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Search Engines

We’re about to enter the Digital Dark Ages

The Business Insider – Online archives are vanishing — and they’re taking our history with them. “The long-promised digital apocalypse has finally arrived, and it was heralded by a blog post. Published on July 18, the post’s headline sounded pretty arcane. “Google URL Shortener links will no longer be available,” it declared. I know, I know — not exactly an attack of alien zombies from the death dimension. But the news nevertheless freaked me out. It means another swath of the web is about to disappear. Here’s the gist: Google used to have an online service that generated pithy, user-friendly versions of long, commercially unwieldy uniform resource locators — the key addresses that identify everything on the web. Shorter URLs are easier to track and better for online commerce. Google stopped shortening addresses back in 2019, but the concise URLs it had already created kept right on doing their job. Click on one and it would take you to the right webpage, the way it’s supposed to. No more. In the blog post, Google announced that as of next year, all of the existing shortened URLs are getting turned off. Poof. And on the web, if your URL doesn’t work, you might as well not exist. You are unreachable. Without laborious renaming, everything behind those links — billions of them, a decade of digital content — will become inaccessible. Gone. Ask not for whom the 404 message tolls. Now, rendering a bunch of web content invisible isn’t the end of days. Not by itself. The problem is, this kind of thing keeps happening. And it’s getting worse. Social networks go bankrupt. Digital journalism sites close up shop. Companies pull their online products. Links rot. Files get not found. The cloud, as wags have noted, is really just “someone else’s computers.” And when clouds get turned off, not even the silver lining is left to tell the tale. Maybe none of this matters much right now. But it will. The internet has become the default archive of our history and culture. 
And the whole thing is burning down before our eyes, like the Library of Alexandria — only worse. For the first time since people started carving letters into rocks, we’re making a time with no history. We’re about to enter the Digital Dark Ages.

Attempts to quantify the scope of the problem are heartbreaking. Half of links in US Supreme Court decisions no longer lead to the information being cited. A report in 2021 found that a full quarter of the more than 2.2 million hyperlinks on The New York Times website were broken. Even worse, the Pew Research Center estimates that a quarter of everything put on the web from 2013 to 2023 is inaccessible — meaning almost 40% of the web as it existed in 2013 is simply not there today, a decade later…”

Washington Post Leverages ‘AI’ To Undermine History And Make Search Less Useful

TechDirt: “While “AI” (language learning models) certainly could help journalism, the fail upward brunchlords in charge of most modern media outlets instead see the technology as a way to cut corners, undermine labor, badly automate low-quality, ultra-low effort, SEO-chasing clickbait, and rush undercooked solutions to nonexistent problems to market under the pretense of progress. For example,… Continue Reading

Location data firm helps police find out when suspects visited their doctor

Ars Technica: “A location-tracking company that sells its services to police departments is apparently using addresses and coordinates of doctors’ and lawyers’ offices and other types of locations to help cops compile lists of places visited by suspects, according to a 404 Media report published today. Fog Data Science, which says it “harness[es] the power… Continue Reading

Dow Jones negotiates AI usage agreements with nearly 4,000 news publishers

NiemanLab: “…Last month, Factiva announced it had signed generative AI usage agreements with nearly 4,000 publishers around the world. The agreements are for the business intelligence platform and news database, which houses articles by online outlets, newspapers, magazines, and transcripts of radio shows. Among the thousands of publishers who signed the agreements are The Associated… Continue Reading

CREAT: Census Research Exploration and Analysis Tool

The Census Research Exploration and Analysis Tool (CREAT) is a data tool from the Center for Economic Studies (CES) at the US Census Bureau that uses natural language processing and artificial intelligence tools to analyze, categorize, and sort the economic research contained in the CES working paper series. The goal of this project is to… Continue Reading

Searchable archive of DOJ Civil Rights Division reports and findings letters

Tyler McBrien. DOJ Police Department Pattern or Practice Reports and Findings Letters. A searchable archive of the Department of Justice’s Civil Rights Division reports from investigations into patterns and practices of excessive force, biased policing, and other unconstitutional practices by law enforcement. Continue Reading

CFPB Orders Federal Supervision of Google Following Contested Designation

The Consumer Financial Protection Bureau (CFPB) today published an order establishing supervisory authority over Google Payment Corp. The CFPB is responsible for supervising a wide range of financial firms to ensure they are complying with federal consumer financial protection laws. The CFPB has supervised nonbank entities in certain industries like mortgage and payday lending, service… Continue Reading

New EDGAR advanced search gives you access to the full text of electronic filings since 2001

What is Full-Text Search? Full-Text Search will allow you to search the full text of all EDGAR filings submitted electronically since 2001. The full text of a filing includes all data in the filing itself as well as all attachments (such as exhibits) to the filing. What kinds of searches can I do on Full-Text… Continue Reading

100 million places

Data is Plural: “Foursquare has released an open dataset describing more than 100 million points of interest across 200+ countries. For each place, the dataset includes its name, address, latitude/longitude, date entered, date updated, date marked closed, telephone number, website, email address, and relevant categories. Among the many possible labels: casino, comedy club, 300+ kinds… Continue Reading

How ChatGPT Search (Mis)represents Publisher Content

Columbia Journalism Review – “ChatGPT search—which is positioned as a competitor to search engines like Google and Bing—launched with a press release from OpenAI touting claims that the company had “collaborated extensively with the news industry” and “carefully listened to feedback” from certain news organizations that have signed content licensing agreements with the company. In… Continue Reading