Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Knowledge Management

CancerDB: Datasets about Cancer

“CancerDB is a public domain blog assembling searchable key datasets on cancer for dependable models. The focus is on cancer, cancer preventions and treatments, but the database also includes datasets on closely connected things–from cancer types to organizations and more. CancerDB is for two groups of people: Cancer researchers. CancerDB is organized big data to…

Our World in Data

Research and data to make progress against the world’s largest problems – “Poverty, disease, hunger, climate change, war, existential risks, and inequality: The world faces many great and terrifying problems. It is these large problems that our work at Our World in Data focuses on. Thanks to the work of thousands of researchers around the…

What is The Scale of Life?

Everyday Life in Real Time – “Our site is a ‘real-time’ visualization of the relative scale of different life events and natural phenomena (details on what real-time means below). You can select from various categories, time periods, and some unique units of measure that we created in the dropdowns to modify the counter lists. Each…

A new tool for copyright holders can show if their work is in AI training data

MIT Technology Review [unpaywalled]: “Since the beginning of the generative AI boom, content creators have argued that their work has been scraped into AI models without their consent. But until now, it has been difficult to know whether specific text has actually been used in a training data set. Now they have a new way…

When scientific citations go rogue: Uncovering ‘sneaked references’

Via LLRX – When scientific citations go rogue: Uncovering ‘sneaked references’ – Reading and writing articles published in academic journals and presented at conferences is a central part of being a researcher. When researchers write a scholarly article, they must cite the work of peers to provide context, detail sources of inspiration and explain differences in…

Microsoft researchers are teaching AI to read spreadsheets

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models – “Spreadsheets are characterized by their extensive two-dimensional grids, flexible layouts, and varied formatting options, which pose significant challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs’ powerful understanding and reasoning capability on spreadsheets.…

News homepages, archived

Data is Plural: “Since launching in March 2022, homepages.news has archived millions of screenshots, performance audits, robots.txt files, accessibility trees, and hyperlink lists from the homepages of 1,100+ news sites. The open-source project, run by journalist Ben Welsh, provides bulk data for each of those assets. The screenshots themselves are stored on the Internet Archive;…

Woefully Insufficient Publisher Policies on Author AI Use Put Research Integrity at Risk

The Scholarly Kitchen: “There is broad consensus in scholarly publishing that AI tools will make the task of ensuring the integrity of the scientific record a Herculean task. However, it seems that many publishers are still struggling to figure out how to address the new issues and challenges that these AI tools present. Current publisher…

webXray

Wired [unpaywalled] – This Machine Exposes Privacy Violations. A former Google engineer has built a search engine, WebXray, that aims to find illicit online data collection and tracking—with the goal of becoming “the Henry Ford of tech lawsuits.”…It’s a search engine for rooting out specific privacy violations anywhere on the web. By searching for a specific…