Library of Congress Blog – The Signal: “The Digital Content Management section has been working to extract and make available sets of files from the Library’s significant Web Archives holdings. The outcome of the project is a series of web archive file datasets, each containing 1,000 files of related media types selected from .gov domains. You can read more about this series here. PowerPoint presentations have become a nearly ubiquitous form of communication document in the digital era. At the most basic level, PowerPoint files present a sequence of slides containing text, images and multimedia. Today, we are excited to share out a dataset of 1,000 random slide decks from U.S. government websites, collected via the Library of Congress Web Archive, such as the presentation on transporting hazardous materials in Figure 1. You can download a CSV file of data about the files, you can learn more about the dataset from this README, and you can also download the entire 3.7 GB dataset of the actual files…”
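The post offers a CSV file of data about the 1,000 slide decks. Below is a minimal sketch for a first look at that file after downloading it. It assumes only that the CSV is ordinary comma-separated text with a header row. The excerpt does not give the column names, so the code reports whatever header it finds rather than guessing. If each row describes one file (also an assumption), the record count should match the 1,000 files in the dataset. The function name `summarize_metadata` is hypothetical.

```python
import csv
import sys
from typing import Iterable


def summarize_metadata(lines: Iterable[str]) -> tuple[list[str], int]:
    """Return the header row and the number of non-blank data rows in a CSV.

    Hypothetical helper: it assumes a header row and makes no assumptions
    about which columns the Library's CSV actually contains.
    """
    reader = csv.reader(lines)
    header = next(reader, [])                 # first row, or [] for an empty file
    count = sum(1 for row in reader if row)   # csv.reader yields [] for blank lines
    return header, count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the downloaded CSV on the command line.
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        header, count = summarize_metadata(f)
    print(f"{count} records; columns: {header}")
```

Run it as `python summarize.py <path-to-downloaded.csv>`. Opening the file with `newline=""` follows the `csv` module's recommendation, so quoted fields that contain line breaks are parsed correctly.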