Rowan Philp, GIJN senior reporter: “…In a recent article for the Columbia Journalism Review, Schellmann, Kapoor, and Dallas Morning News reporter Ari Sen explained that AI “machine learning” systems are neither sentient nor independent. Instead, these systems differ from past computer models because, rather than following a set of digital rules, they can “recognize patterns in data.” “While details vary, supervised learning tools are essentially all just computers learning patterns from labeled data,” they wrote. They warned that futuristic-sounding processes like “self-supervised learning” — a technique used by ChatGPT — do not denote independent thinking, but merely automated labeling.

“Performance of AI systems is systematically exaggerated… there are conflicts of interest, bias, and accountability issues to watch.” — Sayash Kapoor, Princeton University computer science Ph.D. candidate

So the data labels and annotations that train algorithms — a largely human-driven process that coaches the computer to find similar things — are a major source of questions for investigative reporters on this beat. Do the labels represent the whole population affected by the algorithm? Who entered those labels? Were they audited? Do the training labels embed historic discrimination? For instance, if you simply asked a basic hiring algorithm to evaluate job applicants for a long-standing engineering company, it would likely discriminate against female candidates, because its training data on prior hires would overwhelmingly carry “male” labels.”
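To make that hiring example concrete, here is a minimal sketch in Python. The data, features, and firm are entirely hypothetical, and it assumes scikit-learn is available; it simply shows how a classifier trained on historically skewed “hired” labels reproduces the skew:

```python
# A toy illustration of bias inherited from training labels.
# All data and feature choices here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [years_of_experience, gender_flag] (1 = male, 0 = female).
# At this imagined firm, past hires were overwhelmingly men, so the
# "hired" label correlates with the gender flag, not with experience.
X_train = np.array([
    [5, 1], [7, 1], [4, 1], [6, 1], [8, 1],  # past applicants who were hired (all men)
    [6, 0], [7, 0],                          # equally experienced women who were not
])
y_train = np.array([1, 1, 1, 1, 1, 0, 0])    # 1 = hired, 0 = rejected

model = LogisticRegression().fit(X_train, y_train)

# Two candidates identical in every feature except the gender flag:
candidates = np.array([[6, 1], [6, 0]])
print(model.predict_proba(candidates)[:, 1])
# The male candidate scores far higher: the model has simply learned
# the discriminatory pattern embedded in its labels.
```

Nothing in this code “decides” to discriminate; the pattern-matching the CJR authors describe just reproduces whatever the labels encode, which is why the questions above about who labeled the data, and whether those labels were audited, matter.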