The Declassification Engine

Q: What is the declassification engine?

A: The Declassification Engine is a set of applications that will provide new forms of access to official documents as well as tools to help interpret them. This toolkit will expand over time. Currently its main projects are the (de)classifier, a tool that displays cable activity over time for different embassies and topics and contrasts the number of documents declassified with the number still withheld; the (de)sanitizer, an app that illuminates previously redacted text to show what kinds of information are considered particularly sensitive; and the sphere of influence, a visualization of diplomatic activity around the globe based on the volume of classified and declassified cable traffic. We hope to combine these and other tools in a platform that will accept user-submitted documents to further improve its capacity to predict what still-classified documents might reveal.

Q: What is the intent of the project?

A: We are using statistical and machine-learning methods to illuminate the broad patterns of official secrecy. Some secrets need to be protected, and no technology now in view will reveal every operational detail. But we want to help scholars and citizens better understand the basic nature of American foreign policy.

Q: What inspired this effort?

A: The exponential growth in the number of classified documents has created a massive backlog in the declassification process, delaying access to the historical record and raising questions about government accountability. The proportion of documents withheld from researchers in critical collections like the State Department Central Foreign Policy Files has been growing year by year. At the same time, short-staffed archivists have had to destroy many millions of electronic records that could prove crucial for data-mining research.

Q: Where do you get your data?

A: All of our data comes from declassified documents, most directly from the National Archives. For the (De)Classifier and Sphere of Influence, we are using data from the State Department's Central Foreign Policy Files. These files consist of declassified electronic telegrams, electronic withdrawal cards for telegrams that remain classified, and reference data for P-reel documents (documents created on paper and microfilmed beginning in January 1974). The collection includes the text of over a million telegrams, as well as metadata for a total of 1.7 million documents. We are adding new types of declassified documents to our projects all the time. A recent addition is the contents of the Declassified Documents Reference System (DDRS), courtesy of Gale/Cengage Learning. DDRS includes image scans of over a hundred thousand declassified documents. Further work will use data from online reading rooms that host documents released under the Freedom of Information Act.
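
As a rough illustration of how declassified telegrams and withdrawal cards could be tallied against each other, here is a minimal Python sketch. It is not the project's actual code: the record layout (a year paired with a status string), the status labels, and the function name withheld_share_by_year are all assumptions made for this example, and the sample rows are invented.

```python
from collections import Counter

def withheld_share_by_year(records):
    """Return {year: fraction of that year's records still withheld}.

    `records` is an iterable of (year, status) pairs. The layout and the
    status labels "declassified" / "withdrawn" are hypothetical, chosen
    only for illustration; they are not the project's real schema.
    """
    totals = Counter()
    withheld = Counter()
    for year, status in records:
        totals[year] += 1
        if status == "withdrawn":
            withheld[year] += 1
    # Every year in `totals` has at least one record, so no division by zero.
    return {year: withheld[year] / totals[year] for year in sorted(totals)}

# Invented sample data, not drawn from any real collection.
sample = [
    (1974, "declassified"),
    (1974, "declassified"),
    (1974, "withdrawn"),
    (1974, "declassified"),
    (1975, "declassified"),
    (1975, "withdrawn"),
]

shares = withheld_share_by_year(sample)
for year, share in shares.items():
    print(year, f"{share:.0%} withheld")
```

Grouping by year is only one possible cut; the same counting could in principle be keyed on embassy or topic instead.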

Q: Are you working with the U.S. government?

A: This project is a collaborative effort of students and faculty at Columbia University. We have sought information from the State Department Office of the Historian and the National Archives to better understand the declassification process as well as the features of declassified documents. But the government has provided no financial support and no special access to declassified documents.

Q: How can this help scholars?

A: The Declassification Engine offers several benefits to scholars across a wide range of fields, from history to international relations. The Engine includes a variety of dynamic infographics that collect relevant information from the database and present it in a concise, digestible form. The Sphere of Influence, for instance, breaks down the totality of the cables in the database into geospatial and time components, allowing the user to see approximations of the frequency and directional relationships of diplomatic cables. While we offer these graphic tools to provide meaningful interaction with the data, the end goal of the Declassification Engine is to learn, through Natural Language Processing (NLP) techniques, more about what exactly goes into classifying documents and data as national secrets. These results will be useful for both scholars and organizations that work closely with documents released by the government.

Q: How can I submit documents of my own?

A: We know that countless scholars and journalists have accumulated their own archives of declassified documents. We aim to create a platform where they will be able to submit their finds and use the tools to make new discoveries. If you have documents you would consider submitting, please contact us with a statement of interest describing the size and nature of your collection: the period and subjects it covers, the repositories where you collected it, whether you have text files or just document images, and whether you have associated metadata and would be willing to enter it in our database. This will help us plan our website. If you do plan to contribute, make sure to collect citations as rigorously as you can, noting authorship, collection location, and similar details. For materials located in digital archives, please note file formats and dates uploaded.