Q: What is the Declassification Engine?
A: The Declassification Engine is a set of applications that will provide new
forms of access to official documents, as well as tools to help interpret them. This
toolkit will expand over time. Currently its main projects are the (De)Classifier, a
tool that displays cable activity over time for different embassies and topics,
contrasting the number of documents declassified with the number still withheld;
the (De)Sanitizer, an app that illuminates previously redacted text to show what
kinds of information are considered particularly sensitive; and the Sphere of
Influence, a visualization of diplomatic activity around the globe based on the
volume of classified and declassified cable traffic. We hope to combine these and
other tools in a platform that will accept user-submitted documents to further
improve its capacity to predict what still-classified documents might reveal.
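To illustrate the kind of tally the (De)Classifier displays, here is a minimal sketch that counts, per year, how many documents were released and how many are still withheld, and reports the withheld share. This is not the project's code. It assumes each released telegram and each withdrawal card can be reduced to a (year, status) pair, and the record layout, the "declassified"/"withheld" labels, and the names yearly_counts and withheld_share are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (year, status). "declassified" stands for a released
# telegram and "withheld" for an electronic withdrawal card (assumed labels).
records = [
    (1973, "declassified"), (1973, "declassified"), (1973, "withheld"),
    (1974, "declassified"), (1974, "withheld"), (1974, "withheld"),
]

def yearly_counts(records):
    """Return {year: (declassified, withheld)} with years in ascending order."""
    counts = defaultdict(lambda: [0, 0])
    for year, status in records:
        if status == "declassified":
            counts[year][0] += 1
        elif status == "withheld":
            counts[year][1] += 1
        else:
            raise ValueError(f"unknown status: {status!r}")
    return {year: tuple(counts[year]) for year in sorted(counts)}

def withheld_share(declassified, withheld):
    """Fraction of a year's documents that remain withheld (0.0 if none)."""
    total = declassified + withheld
    return withheld / total if total else 0.0

for year, (released, held) in yearly_counts(records).items():
    print(year, released, held, f"{withheld_share(released, held):.0%}")
```

Plotting these per-year pairs side by side gives the declassified-versus-withheld contrast described above.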
Q: What is the intent of the project?
A: We are using statistical and machine-learning methods to illuminate the
broad patterns of official secrecy. Some secrets need to be protected, and no
technology now in view will reveal every operational detail. But we want to help
scholars and citizens better understand the basic nature of American foreign
Q: What inspired this effort?
A: The exponential growth in the number of classified documents has created a
massive backlog in the declassification process, delaying access to the historical
record and raising questions about government accountability. The proportion of
documents withheld from researchers in critical collections like the State
Department Central Foreign Policy Files has been growing year by year. At the
same time, short-staffed archivists have had to destroy many millions of electronic
records that could prove crucial for data-mining research.
Q: Where do you get your data?
A: All of our data comes from declassified documents, most directly from the
National Archives. For the (De)Classifier and Sphere of Influence, we are using
data from the State Department's Central Foreign Policy Files. This collection consists of
declassified electronic telegrams, electronic withdrawal cards for telegrams that
remain classified, and reference data for P-reel documents (documents created
on paper and microfilmed beginning in January 1974). It includes the text of over a
million telegrams, as well as metadata for a total of 1.7 million documents.
We are adding new types of declassified documents to our projects all the time. A
recent addition is the contents of the Declassified Documents Reference System (DDRS),
courtesy of Gale/Cengage Learning. DDRS includes image scans of over a
hundred thousand declassified documents. Further work will use data from online
reading rooms that host documents released under the Freedom of Information Act.
Q: Are you working with the U.S. government?
A: This project is a collaborative effort of students and faculty at Columbia
University. We have sought information from the State Department Office of the
Historian and the National Archives to better understand the declassification
process as well as the features of declassified documents. But the government
has provided no financial support and no special access to declassified documents.
Q: How can this help scholars?
A: The Declassification Engine offers several distinct benefits to scholars in a wide
range of fields, from history to international relations. The Engine includes a
variety of dynamic infographics that collect relevant information from the
database and present it in a concise, digestible form.
The Sphere of Influence, for instance, breaks down all of the cables in the
database into geospatial and time components, allowing the user to see the approximate
frequency and directional relationships of diplomatic cables.
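As a rough illustration of what "geospatial and time components" and "directional relationships" can mean in practice, the sketch below counts cables per (origin, destination) pair for a given year, so traffic in each direction is tallied separately, and also totals each post's volume. It is a minimal sketch under stated assumptions, not the Sphere of Influence implementation: the (origin, destination, year) tuple format, the sample cables, and the names directed_traffic and post_volume are hypothetical.

```python
from collections import Counter

# Hypothetical cable metadata: (origin post, destination post, year).
cables = [
    ("Moscow", "Washington", 1975),
    ("Washington", "Moscow", 1975),
    ("Moscow", "Washington", 1975),
    ("Paris", "Washington", 1976),
]

def directed_traffic(cables, year):
    """Count cables per (origin, destination) pair for one year.

    Direction matters: Moscow -> Washington and Washington -> Moscow
    are counted as separate pairs.
    """
    return Counter((src, dst) for src, dst, y in cables if y == year)

def post_volume(cables, year):
    """Total cables sent or received by each post in one year."""
    volume = Counter()
    for src, dst, y in cables:
        if y == year:
            volume[src] += 1
            volume[dst] += 1
    return volume

print(directed_traffic(cables, 1975).most_common())
print(post_volume(cables, 1975))
```

On a map, post_volume could size each location and directed_traffic could weight arrows between posts; repeating the calculation year by year adds the time component.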
While we offer these graphic tools to provide meaningful interaction with the data,
the end goal of the Declassification Engine is to learn, through Natural Language Processing (NLP)
techniques, more about what exactly goes into classifying documents
and data as national secrets. These results will be useful for both scholars and
organizations that work closely with documents released by the government.
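One simple statistical way to begin asking what goes into classification is to compare word frequencies in withheld text against released text. The sketch below illustrates that idea and is not the project's NLP pipeline. It scores each word by the smoothed log-ratio of its relative frequency in withheld versus released documents. The function names, the add-alpha smoothing, the whitespace tokenization, and the toy documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def word_counts(docs):
    """Count lowercase, whitespace-separated tokens across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

def log_ratio_scores(withheld_docs, released_docs, alpha=1.0):
    """Score each word by log P(word | withheld) - log P(word | released).

    Probabilities use add-alpha (Laplace) smoothing over the shared vocabulary,
    so a word absent from one side still gets a finite score. Positive scores
    mark words that are relatively more common in withheld text.
    """
    withheld, released = word_counts(withheld_docs), word_counts(released_docs)
    vocab = set(withheld) | set(released)
    withheld_total = sum(withheld.values()) + alpha * len(vocab)
    released_total = sum(released.values()) + alpha * len(vocab)
    return {
        word: math.log((withheld[word] + alpha) / withheld_total)
        - math.log((released[word] + alpha) / released_total)
        for word in vocab
    }

# Toy, made-up examples standing in for withheld and released text.
withheld_docs = ["source reports nuclear program", "source names agent"]
released_docs = ["embassy reports trade talks", "trade delegation visit"]

scores = log_ratio_scores(withheld_docs, released_docs)
for word in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(word, round(scores[word], 2))
```

Scores like these are noisy for rare words, and a real analysis would need far more text and stronger models. The sketch only shows how withheld and released material can be contrasted quantitatively.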
Q: How can I submit documents of my own?
A: We know that countless scholars and journalists have accumulated their own
archives of declassified documents. We aim to create a platform where they will
be able to submit their finds and use the tools to make new discoveries. If you
have documents you would consider submitting, please contact us with a
statement of interest describing the size and nature of your collection, such as
the period and subjects it covers, the repositories where you collected it, whether
you have text files or just document images, and whether you have associated
metadata and would be willing to enter it in our database. This will help us plan
our website. If you do plan to contribute, make sure to collect citations as
rigorously as you can, noting authorship, collection location, etc. For materials
located in digital archives, please note file formats and upload dates.