Computational Analysis of Official Secrecy – a project by historians, data scientists, legal scholars, and transparency advocates from Columbia University:

“The enormous growth in the number of official documents – many of them withheld from scholars and journalists even decades later – has raised serious concerns about whether traditional research methods are adequate for ensuring government accountability. But the millions of documents that have been released, often in digital form, also create opportunities to use Natural Language Processing (NLP) and statistical/machine learning to explore the historical record in very new ways. Historians, journalists, legal scholars, statisticians, and computer scientists are joining together to determine whether novel statistical/machine learning methodologies can accelerate the declassification process, or at least help illuminate the broad patterns of official secrecy. Challenges we will consider include:
- Attributing authorship to anonymous documents
- Characterizing attributes of redacted text
- Modeling spatial and temporal patterns of diplomatic communications
The featured projects indicate some of the preliminary work we have done. More fully-developed versions will be made available to the public as they become ready. The long-range goal is to create a cloud-based virtual archive. It would aggregate the digitized documents now scattered across dozens of different repositories, offer a place for scholars and journalists to upload their own archival finds, and provide a range of visualization and attribution tools to advance research on the history, and future, of world politics.” [Secrecy News]
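The first of the listed challenges, attributing authorship to anonymous documents, is a classic stylometry problem. As a rough illustration only (not the project's own method or data), here is a minimal sketch using scikit-learn, with placeholder document texts and author labels:

```python
# Minimal stylometric authorship-attribution sketch.
# The documents and author labels below are hypothetical placeholders;
# the announcement does not describe the project's actual corpus or model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: texts of known authorship.
known_docs = [
    "The embassy reports continued unrest in the capital ...",
    "Per your instruction, the delegation will proceed as planned ...",
    "Further to my earlier cable, the minister declined comment ...",
    "The attached memorandum summarizes the negotiations to date ...",
]
known_authors = ["author_A", "author_B", "author_A", "author_B"]

# Character n-grams are a common stylometric feature choice: they capture
# spelling, punctuation, and function-word habits that tend to mark style.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(known_docs, known_authors)

# Score an "anonymous" document against the known candidate authors.
anonymous_doc = ["Further to my note, the embassy reports the talks have stalled ..."]
probabilities = model.predict_proba(anonymous_doc)[0]
for author, p in zip(model.classes_, probabilities):
    print(f"{author}: {p:.2f}")
```

With a real archive, the feature set and candidate pool would of course be far larger, but the basic setup, known-author training texts versus an unattributed document, is the same.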