Datasets are eggs

This excerpt is premised on the differences between eggs at American and European grocery stores. Eggs in the US are pasteurized (cleaned) before they can be sold, resulting in a bleached-looking shell and an egg that must be refrigerated. Eggs sold in Europe (and in some US farm-to-table settings) are more commonly unpasteurized and therefore retain the dirt and debris of the hen house, a sight that commonly surprises Americans abroad.

As Bowker reminds us, “data is never raw,” [1] but when it arrives in a spreadsheet, not yet cleaned or standardized, it can give the appearance of an unpasteurized egg purchased from a commercial supermarket. On the surface, the egg's aura of distinct origin (feathers, bits of debris clinging to its shell) can mask the complex sociotechnical process by which it arrived on the grocery store shelf. Recall here the vast and potentially global transportation networks, the trade agreements between store and farm, and the hundreds of years of transformation from subsistence to commercial farming that made possible this egg’s residence, debris-coated though it is, on the grocery store shelf. The example parallels an uncleaned or unstandardized dataset, where the visible messiness allows us to know something about the object’s origins while simultaneously leading us to think we know more about those origins than we actually do, until we examine further the sociotechnical process that made possible the object’s proximity to us (within arm’s reach on a grocery store shelf, or arriving in our email inbox).

[1] Geoffrey C. Bowker. 2005. Memory practices in the sciences. MIT Press, Cambridge, Mass.

From my thesis proposal (in progress).





