Case Study: A Chronicle of Differential Privacy in the 2020 U.S. Census

The Census Bureau’s Dual Mandate
The U.S. Census Bureau operates with a dual mission:
- To conduct and disseminate the decennial census as mandated by Article I, Section 2 of the U.S. Constitution.
- To ensure confidentiality of respondent data, as reinforced by Title 13 of the U.S. Code.
This balance is critical, as census data determines:
- Representation in the House of Representatives.
- Electoral district boundaries.
- Distribution of $1.5 trillion annually in federal funds.
However, rising privacy concerns and advances in computing power have undermined traditional disclosure avoidance systems (DAS), prompting a shift to differential privacy (DP) for the 2020 Census.
Historical Context: From Suppression to Data Swapping
20th Century Confidentiality Measures
Early census confidentiality measures, including the 1929 Census Act and Title 13, sought to protect individual responses. However, wartime policies, such as the release of data for Japanese-American internment during WWII, eroded public trust.
By the mid-20th century, the Bureau had begun suppressing sensitive tabulations to protect respondents, a practice that evolved into more sophisticated techniques such as data swapping in the 1990s. Even so, growing demand for detailed small-area data, combined with increasingly powerful computational tools, raised the risk of re-identification.
The Move Toward Differential Privacy
Why Differential Privacy?
Differential privacy provides mathematical guarantees of confidentiality by adding noise to data. This ensures that the inclusion or exclusion of any individual in a dataset has minimal impact on aggregate results. The Census Bureau adopted DP due to:
- Increasing risks from data reconstruction attacks.
- Legal obligations under Title 13 to protect respondent identities.
Key Features of DP
- ε (Epsilon): The privacy-loss parameter, where smaller values indicate stronger privacy but less accurate data.
- Noise Infusion: Randomized noise is added to statistics, ensuring privacy while preserving utility.
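For a single counting query (which has sensitivity 1), the classic Laplace mechanism shows how ε controls the noise scale. The sketch below is illustrative only: the function names are ours, and the Bureau's production system used discrete noise mechanisms rather than this continuous Laplace version.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    """ε-DP release of a counting query (sensitivity 1): noise scale = 1/ε.
    Smaller ε means a larger scale: stronger privacy, noisier answers."""
    return true_count + laplace_noise(1.0 / epsilon)
```

Because averaging many releases of the same count recovers the truth, privacy loss accumulates across queries; this is why a total privacy budget must be tracked and allocated.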

Implementation: The TopDown Algorithm (TDA)
The Census Bureau developed the TopDown Algorithm to enforce DP while maintaining data usability. Key features included:
- Holding state and national population counts invariant, so published totals exactly matched the enumerated counts.
- Engaging with the scientific community to refine the algorithm.
To evaluate TDA, the Bureau applied it to 2010 Census data, generating "demonstration products" for user feedback.
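The mechanics of a top-down consistency pass can be sketched on a hypothetical two-level geography (state → counties). The real TDA solves constrained optimization problems over the full geographic hierarchy; the uniform residual adjustment below is only a stand-in for that step, and all names here are ours.

```python
import math
import random

def _laplace(scale: float) -> float:
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def topdown_sketch(county_counts: list, epsilon: float) -> list:
    """Noise county counts, then force them to sum to the exact state total."""
    state_total = sum(county_counts)           # invariant: published without noise
    noisy = [c + _laplace(1.0 / epsilon) for c in county_counts]
    residual = state_total - sum(noisy)        # discrepancy introduced by noise
    # Consistency pass: spread the residual uniformly across counties
    # (the production TDA uses constrained optimization instead).
    return [x + residual / len(noisy) for x in noisy]
```

The key property is that noisy child geographies are reconciled to their parent's invariant total, so published tables remain internally consistent.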
Challenges and Tradeoffs
Accuracy vs. Privacy
- Small Geographic Areas: Data for small census blocks showed high variability due to noise, limiting usability.
- Political and Administrative Areas: Because privacy budget was not allocated directly to many political and administrative units, counts for small populations were inaccurate in ways that matter for policy and legal decisions.
- Temporal Consistency: DP introduced inconsistencies when comparing data across censuses.
Postprocessing Concerns
Postprocessing, used to ensure consistency (e.g., non-negative counts), introduced biases. For example:
- Non-integer (and negative) values produced by the DP mechanism were adjusted to non-negative integers to meet census publication requirements, distorting the results.
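The bias from enforcing non-negativity can be seen directly: for a block whose true count is zero, negative noise draws are clamped to zero while positive draws survive, so published counts are biased upward on average. A sketch under the same illustrative Laplace assumption as above (the mechanism and parameters are ours, not the production system's):

```python
import math
import random

def _laplace(scale: float) -> float:
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def published(true_count: int, scale: float) -> int:
    """Noisy count postprocessed to a non-negative integer."""
    return max(0, round(true_count + _laplace(scale)))

random.seed(0)
draws = [published(0, 2.0) for _ in range(20_000)]
bias = sum(draws) / len(draws)   # well above 0 even though the true count is 0
```

Aggregated over many sparsely populated blocks, this per-block upward bias shifts totals systematically, which is the distortion the demonstration products surfaced.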
Implications for Redistricting and Policy
The introduction of DP had significant implications for redistricting:
- Minority Representation: Studies indicated DP reduced the ability to create majority-minority districts, potentially violating the Voting Rights Act.
- Legal Challenges: Lawsuits, such as State of Alabama v. Department of Commerce, questioned the legality of DP under Title 13.
Looking Ahead: Lessons and Recommendations
- Transparency and Communication: The Census Bureau must improve public understanding of DP's impact on data accuracy and privacy.
- Optimizing Privacy Budgets: Strategic allocation of privacy budgets across geographies and data types can mitigate usability issues.
- Future Research: Ongoing evaluation of DP's effects on critical applications, like public health and disaster preparedness, is essential.
Conclusion
The Census Bureau's adoption of differential privacy for the 2020 Census represents a landmark shift in protecting respondent confidentiality. While it addresses modern privacy challenges, the tradeoffs in data accuracy highlight the need for continuous refinement. The lessons learned from this transition will shape the future of data privacy in public statistics.
References
- Hotz, V. J., & Salvo, J. (2022). A Chronicle of Differential Privacy in the 2020 U.S. Census. Harvard Data Science Review.