A note on data and databases

DHS’ databases (what information they contain, how they sort and share it, and how they operate) are shrouded in secrecy. In recent years, the most powerful and harmful contractors behind many of DHS’ newest tech systems and databases have faced increased scrutiny and exposure; Palantir, Northrop Grumman, and Amazon are among them. In this report, we focus less on individual corporations and their specific tech contributions (recognizing that crucial work on that front is already happening) and more on data criminalization as a violent legacy technical process with information bottlenecks and weak links.

It is easy to get deep in the weeds when trying to understand a database or technological feature. It is more challenging still to piece together how these work when one is unable to see an interface or use a technology. Most law enforcement and DHS databases are connected to each other in multiple ways, and different iterations of the same system of records may be connected to dozens of contract vendors, some of which come and go. Old “legacy” data systems sometimes change names or functions, or gain “components” as technologies advance. A “System of Records” may actually include records stored in multiple databases, while a single database may include data classified for Privacy Act purposes as part of several “Systems of Records.” While there are a couple dozen databases and analytic systems that DHS clearly relies on, the agency owns more than 900 databases.Footnote 1

Our sources

The processes we detail are based on government reports and audits, internal documents and correspondence made public through FOIA requests, court testimonies, old user manuals, Privacy Impact Assessments (PIAs), and descriptions from law enforcement websites. Publicly available information about law enforcement technology tends to be old, and many of the descriptions and screen captures date back to 2011 or earlier. Federal agency self-descriptions are not always accurate or complete. When possible, we cite sources that are more complete (even if they are older) over those that are newer but sparse on details and references. Given the limited number of sources on these systems, it is not always clear (or perhaps knowable) exactly which parts of the automated data criminalization process are strictly machine-to-machine, which require human intervention for technical or bureaucratic reasons, where that intervention is merely a formality or rubber-stamp approval of de facto automated decisions, and where it is optional. We include these descriptions nonetheless because, although some elements are clearly outdated, we think there is value in understanding how these interagency automated systems were built to function in the data criminalization feedback loop.

Often, the main sources of information about the government’s criminalizing databases are government-written Privacy Impact Assessments (PIAs) and System of Records Notices (SORNs). We cite them but also take them with a grain of salt.

PIAs are supposed to be conducted before a government agency develops or procures IT systems or projects that collect, disseminate, maintain, or dispose of personally identifiable information (PII) about members of the public, in order to assess the associated privacy risks.Footnote 2 They are written by the “Project Manager/System Owner,” “in consultation with the department’s Chief Privacy Officer.”Footnote 3 Because the objective of PIAs is to fulfill the E-Government Act of 2002, which requires all federal agencies to provide “sufficient protections for privacy of personal information,” they are not a flawless source for our purposes.Footnote 4 PIAs are legally required to comply with current privacy law and “provide basic documentation on the flow of personal information” within IT systems across government staff and contractors, but they do not require a detailed description of database sharing. Additionally, technologies, databases, and systems are often so interlinked and enmeshed that breaking them down into separate PIAs can introduce inaccuracies and misunderstandings. There is no evidence that an agency or its staff is penalized for failing to issue a PIA, or for issuing an incomplete or inaccurate one.

SORNs are legally binding public notifications that identify and document the purpose of a “system of records,” the individuals profiled in the system, the types of records in the system, and how the information is shared. They are required by the Privacy Act of 1974 and are published in the Federal Register for public comment.Footnote 5 SORNs are supposed to explain how information is used, retained, and may be accessed or corrected, and whether certain portions of the system are subject to Privacy Act exemptions. Like PIAs, SORNs are written by the program manager, who works with a Component Privacy Office and its legal counsel before submitting the notice to the DHS Privacy Office for review and approval by the Chief Privacy Officer. Operating a “system of records” without first publishing a SORN in the Federal Register is a criminal offense on the part of the responsible agency officials, but this criminal law is never enforced.

In general, the requirement to conduct a PIA is broader than the requirement to publish a SORN, because a PIA is required whenever a system collects any PII. A SORN, by contrast, is triggered only when the PII is “retrieved by a personal identifier,” which might be a person’s name, address, phone number, or biometric data.Footnote 6
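The difference in scope between the two triggers can be sketched as a simple decision rule. The Python below is a deliberately simplified illustration, not a legal test: it ignores statutory exemptions, thresholds, and agency-specific guidance, and the names `SystemProfile` and `required_notices` are hypothetical, not drawn from any DHS tool or document.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical, simplified profile of a government IT system (illustration only)."""
    collects_pii: bool  # does the system collect any personally identifiable information?
    retrieved_by_personal_identifier: bool  # are records looked up by name, phone number, biometrics, etc.?


def required_notices(system: SystemProfile) -> list[str]:
    """Return the documents this simplified rule would call for.

    Assumption: exemptions and other legal nuances are ignored; only the
    broad PIA trigger versus the narrower SORN trigger is modeled.
    """
    notices: list[str] = []
    if system.collects_pii:
        # PIA trigger: the system collects any PII at all.
        notices.append("PIA")
        # SORN trigger: the PII is also retrieved by a personal identifier.
        # Nested because retrieval by identifier presupposes that PII exists.
        if system.retrieved_by_personal_identifier:
            notices.append("SORN")
    return notices


# A system holding PII that is never looked up by a personal identifier
print(required_notices(SystemProfile(collects_pii=True, retrieved_by_personal_identifier=False)))  # prints ['PIA']
# A system whose records are retrieved by name or biometric data
print(required_notices(SystemProfile(collects_pii=True, retrieved_by_personal_identifier=True)))  # prints ['PIA', 'SORN']
```

Because retrieval by a personal identifier presupposes that a system holds PII, every system that triggers a SORN in this sketch also triggers a PIA, but not the reverse; that asymmetry is the sense in which the PIA requirement is broader.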

Government statistics

Furthermore, the way DHS parses its data can be misleading. DHS and the federal government more broadly are notorious for concealing and misrepresenting data. For example, in 2019, then-Vice President Mike Pence asserted on CNN that more than 90 percent of migrants don’t show up in immigration court. The Washington Post fact-checked this statement and determined that:

  1. Pence was referencing a statistic describing the results of the controversial “rocket docket” pilot program, which fast-tracked 7,000 cases through immigration courts in ten cities; 
  2. the Justice Department’s figure for how many migrants did not show up in court during that period was 44 percent (less than half of Pence’s 90 percent), but that figure counted only final decisions, not pending cases;
  3. when Syracuse University’s Transactional Records Access Clearinghouse (TRAC) counted pending cases as well as final decisions, it calculated that 81 percent of migrant families actually attended all their court hearings during the period in question;Footnote 7
  4. finally, the number of no-shows was artificially inflated by Trump-era policies, such as Remain in Mexico, that made it effectively impossible for many people to attend court when scheduled.Footnote 8

TRAC files FOIA requests and follows them up with lawsuits in order to obtain individual case data on every removal from the Executive Office for Immigration Review (EOIR). Using the government’s own records, TRAC has debunked some of the agency’s claims. TRAC is currently suing DHS for withholding, since 2017, information it had previously published: whether deportations actually result from its use of detainers under the Secure Communities program, how and when ICE took individuals into custody, and the full details of any criminal history of those who were deported.Footnote 9

For this report, we lean heavily on TRAC’s work but also cite government data where it exists, noting its limits here.

A note on language

In general, we use the words “deportable” and “deportation” rather than the euphemistic terms “removable” and “removal,” but we do use “removal” when the term carries a specific legal or technical meaning. We also use both the terms “migrant” and “immigrant” throughout the report. This is because US laws generally refer to “aliens” and “immigrants,” but the term “migrant” better describes the hegemonic oppression that forces the displacement of people across the world.

Likewise, we use the word “train” to describe the technical process of “teaching” machines to discern useful information, identify patterns, and make predictions, even though we resist conferring legitimacy on the process, which is very much shaped and developed by humans and human biases.
