As I have repeated many times to my classmates in Digital Humanities: the data doesn’t speak for itself. Part of understanding that comes from an insight of the philosopher Karl Popper, who reminded a group of physics students that the first step in observation is choosing what to observe.
This is exactly what we were asked to do for our lab this week: choose what to observe and thus create data. Every student evaluated the same data source, the Seventh-day Adventist Yearbook, but we each chose different information to turn into our own datasets.
This semester I am taking a Digital Humanities course designed and taught by Dr. Jeri Wieringa. Part of this class includes writing blog posts about various topics discussed in class. I have already crafted a few posts (one on accessibility in DH and another assessing and critiquing a DH project) and there will be several more to follow.
Last class, we read and discussed Hadley Wickham’s “Tidy Data” as a way to re-evaluate the options for organizing and presenting data. For homework, we were tasked with tidying a table from the Pew Research Center on the frequency of prayer. Below is the original table:
According to Wickham’s argument, a table should be made of columns and rows. Each column should hold a single variable, and each row should hold a single observation. The cells are filled with the values recorded for that observation. Based on these criteria, the Pew table is a bit untidy. What it describes is the percentage of people in various religious traditions who pray at a given frequency. The frequency of prayer is divided into categories (‘At least daily’, ‘weekly’, ‘monthly’, ‘seldom/never’, ‘don’t know’), and these appear as column headers. These categories are values of a single variable, the frequency of prayer, rather than variables in their own right, so they belong in rows, not columns: one row for each combination of tradition and frequency. The column headers should instead name the variables being measured: the religious tradition, the frequency of prayer, and the percentage.
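To make the reshaping concrete, here is a minimal sketch in plain Python. The tradition names and numbers are placeholders invented for illustration, not the Pew figures, and the function name `tidy` and the column names `tradition`, `frequency`, and `percent` are my own assumptions rather than anything from Wickham’s paper or the Pew table.

```python
# Placeholder data for illustration only; these are NOT the Pew figures.
# Each dict is one row of a "wide" table whose column headers are
# frequency categories (values), not variable names.
wide_rows = [
    {"tradition": "Tradition A", "At least daily": 60, "Weekly": 20,
     "Monthly": 10, "Seldom/never": 9, "Don't know": 1},
    {"tradition": "Tradition B", "At least daily": 40, "Weekly": 25,
     "Monthly": 15, "Seldom/never": 18, "Don't know": 2},
]


def tidy(rows, id_column, variable_name, value_name):
    """Turn every (row, non-id column) pair into its own observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each new row instead
            long_rows.append({
                id_column: row[id_column],
                variable_name: column,  # the old header becomes a value
                value_name: value,
            })
    return long_rows


tidy_rows = tidy(wide_rows, "tradition", "frequency", "percent")

# Show the first few observations of the tidy table.
for observation in tidy_rows[:3]:
    print(observation)
```

Each row of the result is one observation (one tradition at one frequency), and the three columns are the three variables. With pandas installed, `DataFrame.melt` performs the same wide-to-long reshaping.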