There is a common misconception that data is neutral, an objective truth. However, the data that we use to build computer programs, to conduct research, and to inform policy cannot exist outside of the systems of oppression that permeate our society. For example, studies show that facial recognition software is least reliable for people of color, women, and nonbinary individuals (Buolamwini & Gebru, 2018; Costanza-Chock, 2020); risk assessment algorithms are more likely to falsely flag Black defendants as future criminals than white defendants (Angwin et al., 2016); and racialized data creates real barriers to minority groups’ access to housing and employment (Williams & Rucker, 2000). Given this, we—as researchers, practitioners, and educators—have a responsibility to consider how systemic inequities impact our data practices.
Optimization & Standardization
White, male heteronormativity is often technologically privileged by ideas of optimization and standardization. The ‘database revolution’ of the 1960s, characterized by the need to create more streamlined processes for how large amounts of data are organized, arranged, and managed, emphasized the importance of optimizing databases for usability and for providing “natural” representations of data. However, Stevens, Hoffmann, and Florini (2021) argue that “database optimization efforts [help] reproduce and sustain white racial dominance, in part, by making it easier for dominant actors in government and business to both conceive of and organize the social world in ways that served white interests” (p. 114). Some of the most prominent works to emerge from the database revolution take up whiteness as a kind of implicit optimum, the norm from which anything outside becomes a deviation.
Therefore, it is critical for us to assess how data is collected and constructed in ways that reinforce the matrix of domination (Collins, 2002). The matrix of domination is a “conceptual model that helps us think about how power, oppression, resistance, privilege, penalties, benefits, and harms are systematically distributed,” and, when we think about data as a reflection of existing power dynamics, it is imperative that we consider the ways in which databases can serve to enshrine inequities (Costanza-Chock, 2020, p. 20). The most vulnerable populations should have both access to and control over their data, and they are entitled to informed consent and transparency about how their data is being used.
Data Collection
“The decisions people make about which data matter, what means and methods to use to collect them, and how to analyze and share them are important but silent factors that reflect the interests, assumptions, and biases of the people involved” (Gaddy & Scott, 2020, p. 1). Racial and gender equity need to be considered during the entire data life cycle, including in planning, collection, access, use of statistical tools, analysis, and dissemination. Oftentimes, disparities are overlooked or ignored due to a simple lack of data on certain populations. For example, Boston University’s Center for Antiracist Research assisted race and ethnicity data collection efforts during the COVID-19 pandemic. They found that state-reported data suffered from deficiencies that led to errors and underestimations of racial and ethnic inequalities, including incomplete datasets, failures to account for the ways that race and ethnicity can intersect, and overly broad definitions of race and ethnicity that obscure experiences of racism and subordination (Khoshkhoo et al., 2022). These limitations hindered evidence-based responses to the pandemic for already-marginalized groups. Remember, when collecting data, it is important to consider not only what data is available, but also what data is missing and why. Data collection practices that fail to consider race as a critical factor pose tangible harms to individuals of color.
Data Use
The University of Pennsylvania’s Actionable Intelligence for Social Policy created A Toolkit for Centering Racial Equity Throughout Data Integration. Here, they echo the work of BU’s Center by encouraging researchers to practice ethical data use with a racial equity lens “that supports power sharing and building across agencies and community members” (AISP, 2022, p. 1). They shine a light on the risks and benefits of civic data use and suggest that, while cross-sector data can often give us a more holistic view of the individuals who are ‘datafied,’ it can also reinforce legacies of racist policies and promote problematic practices. The toolkit states that:
Incorporating a racial equity lens during data analysis includes incorporating individual, community, political, and historical contexts of race to inform analysis, conclusions and recommendations. Solely relying on statistical outputs will not necessarily lead to insights without careful consideration during the analytic process, such as ensuring data quality is sufficient and determining appropriate statistical power. (AISP, 2022, p. 28)
Data should be used in ways that benefit the communities from which it comes.
Conclusion
Data reflects our social world, meaning that race—as well as gender, class, and sexuality—is a powerful mediator for how we use and interpret it. I am hopeful that readers will use some of the insights and resources from this blog to consider how they might incorporate antiracist methodologies into their data work. Remember that all data have limits, and it’s both incorrect and harmful to assume that something technological is automatically objective and neutral. Nuanced identities and circumstances exist as much in the digital world as they do in the physical world, and they require our attention.
References & Resources:
Actionable Intelligence for Social Policy. (2022). A Toolkit for Centering Racial Equity Throughout Data Integration. https://aisp.upenn.edu/wp-content/uploads/2022/07/AISP-Toolkit_5.27.20.pdf
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research, 81, 1–15.
Collins, P. H. (2002). Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment. New York: Routledge.
Costanza-Chock, S. (2020). Design Justice: Community-Led Practices to Build the Worlds We Need. MIT Press.
Gaddy, M., & Scott, K. (2020). Principles for Advancing Equitable Data Practice. Urban Institute. https://www.urban.org/sites/default/files/publication/102346/principles-for-advancing-equitable-data-practice.pdf
Khoshkhoo, N. A., Schwarz, A. G., Puig, L. G., Glass, C., Holtzman, G. S., Nsoesie, E. O., & Rose, J. B. G. (2022). Toward Evidence-Based Antiracist Policymaking: Problems and Proposals for Better Racial Data Collection and Reporting. BU Center for Antiracist Research. https://www.bu.edu/antiracism-center/files/2022/06/Toward-Evidence-Based-Antiracist-Policymaking.pdf
Stevens, N., Hoffmann, A. L., & Florini, S. (2021). The unremarked optimum: Whiteness, optimization, and control in the database revolution. Review of Communication, 21(2), 113–128.
Williams, D. R., & Rucker, T. D. (2000). Understanding and addressing racial disparities in health care. Health Care Financing Review, 21(4), 75–90.