Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

AI System Sorts News Articles By Whether or Not They Contain Actual Information

Motherboard: “…In a recent paper published in the Journal of Artificial Intelligence Research, computer scientists Ani Nenkova and Yinfei Yang, of Google and the University of Pennsylvania, respectively, describe a new machine learning approach to classifying written journalism according to a formalized idea of “content density.” With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles. At a high level this works like most any other machine learning system. Start with a big batch of data—news articles, in this case—and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers…”

Sorry, comments are closed for this post.