Data doesn’t speak for itself

This semester I am taking a Digital Humanities course designed and taught by Dr. Jeri Wieringa. Part of this class includes writing blog posts about various topics discussed in class. I have already crafted a few posts (one on accessibility in DH and another assessing and critiquing a DH project) and there will be several more to follow.

Last class, we read and discussed Hadley Wickham’s “Tidy Data” as a way to re-evaluate the options for organizing and presenting data. For homework, we were tasked with tidying a table from the Pew Research Center on the frequency of prayer. Below is the original table:

A table designed and presented by the Pew Research Center that demonstrates an ‘untidy’ organization of data.

According to Wickham’s argument, a table should be made of columns and rows. Each column should hold a single variable, and each row should hold a single observation. The cells are filled with the values that represent the recorded data. Based on Hadley Wickham’s criteria, this Pew research presentation is a bit untidy. What is being described is the percentage of people in various religious traditions who pray at a given frequency. The frequency of prayer is divided into categories (‘At least daily’, ‘weekly’, ‘monthly’, ‘seldom/never’, ‘don’t know’). These categories are values of a single variable, the frequency of prayer, and as such should exist in rows, not in column headers. The column headers should name the variables being measured.
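For readers who like to see the reshaping spelled out, here is a minimal sketch of what moving the frequency categories from column headers into rows might look like. The tradition names and percentages are placeholders I made up for illustration, not Pew's figures, and the `melt` helper is a hypothetical function written for this post, not part of any library.

```python
# Hypothetical 'untidy' rows: one row per tradition, one column per frequency.
# The names and numbers are illustrative only, not the Pew data.
wide_rows = [
    {"tradition": "Tradition A", "At least daily": 60, "Weekly": 20,
     "Monthly": 10, "Seldom/never": 9, "Don't know": 1},
    {"tradition": "Tradition B", "At least daily": 40, "Weekly": 25,
     "Monthly": 15, "Seldom/never": 18, "Don't know": 2},
]


def melt(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own row (wide to long)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the id column is repeated on each new row instead
            long_rows.append(
                {id_key: row[id_key], var_name: column, value_name: value}
            )
    return long_rows


tidy_rows = melt(wide_rows, "tradition", "frequency", "percent")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Each tradition now repeats once per frequency category, which is the vertical, repetitive layout that a computer handles well and a human reader may find tiring.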

Below is my attempt at tidying the Pew table. The ‘Frequency of Prayer’ categories have been copied into repeating rows, organized according to religious tradition. The percentages of individuals who report each frequency of prayer are listed to the right of each religious tradition. I expanded on the Pew data to help organize my thoughts. I converted the percentages to decimals and then multiplied them by the sample size of each religion to estimate the number of individuals in that group's sample who identified with a particular category.
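The arithmetic behind that extra column is simple enough to sketch. The function name and the example numbers below are my own placeholders, not values from the Pew table.

```python
def estimated_count(percent, sample_size):
    """Convert a percentage to a decimal, then scale it by the group's sample size."""
    return (percent / 100) * sample_size


# Hypothetical inputs for illustration only (not Pew's figures):
# 45% of a group whose sample size is 411.
print(estimated_count(45, 411))
```

The result is rarely a whole number, which is where the rounding question discussed later in this post comes from.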

In tidying this data, I may have made it more complex for viewers. Wickham’s tidying functions best for computational statistics; it’s not so easy on the eyes for non-computers.

Re-sorting the data in this way led me to several conclusions. First, the tidy table is a lot. Wickham’s recommendations seem to work best for computational statistics, not so much the human mind. I found myself creating another table just to sort out the basics of the information. But this table, just like any method of presenting data, was also flawed, though still useful for my needs. The horizontal layout satisfied me, likely because English readers move their eyes from left to right when deciphering texts. While I cannot explain the technicalities of it, it turns out that vertical, repetitive data works better for computer brains (as I learned in class last week).

The second conclusion I drew from tidying the Pew table was that data is not self-evident. As a Master’s student in Religious Studies, I know that nothing explains itself just by existing, but this exercise really solidified that concept. If you look at the tidy table, you can see that I calculated the number of individuals for each frequency in each different religious group, rounded them to whole numbers, and then added those together.

I realized in this process that I had to choose which numbers to round up and which to round down. At first, I made an active effort to match the rounded sample total to the original sample total given in the Pew data, but I noticed that matching the two totals, with or without rounding, was difficult. I also realized that in my choice to round one number up but not the others I was actively changing the data. People obviously don’t exist in halves or quarters, but the way the Pew data adds up makes it appear that way. My decision to add an extra ‘person’ to the “at least daily” category of Buddhists could sway the conclusions of the data.

I ended up rounding every number up or down based on the traditional method: 0.5 and higher gets rounded up, and anything below 0.5 gets rounded down. The results, naturally, did not add up, but they made me feel a smidge more honest.
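Here is a small sketch of that rounding rule and of why the totals drift. Everything in it is an assumption for illustration: the sample size of 250 and the percentages are invented, not taken from the Pew table, and `round_half_up` is a helper I named myself. One wrinkle worth knowing is that Python's built-in `round()` sends exact halves to the nearest even number, so the "traditional" rule needs the `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value):
    """Round to a whole number, always sending .5 upward (the 'traditional' rule)."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical group: the sample size and percentages are made up, not Pew's.
sample_size = 250
percents = {
    "At least daily": 45,
    "Weekly": 23,
    "Monthly": 12,
    "Seldom/never": 19,
    "Don't know": 1,
}

# Estimated people per category, rounded to whole persons.
counts = {
    label: round_half_up(Decimal(percent) * sample_size / 100)
    for label, percent in percents.items()
}
rounded_total = sum(counts.values())

print(counts)
print("rounded total:", rounded_total, "original sample size:", sample_size)
```

In this invented example the percentages sum to exactly 100, yet the rounded counts add up to 252 people rather than 250, because four of the five categories land on a half and all get rounded up. No single rounding choice is wrong, but together they change the total.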

In the end, data is never about data. It can be clean and tidy or messy and untidy, but that all depends on who is calling the shots. Wickham made his decision based on computational statistics; I made my organizational decisions to clarify the thoughts in my head. Neither is right or wrong, but more or less useful for each of us in that particular moment. The question to ask, as always, is what is accomplished in presenting data in one way rather than another?

Less digital, more humanities, please: The Viral Texts Project

The Viral Texts Project is a digital humanities project that aims to help scholars understand the themes and decisions that helped newspaper content ‘go viral’ before going viral was the hip thing to do. The project created an algorithm that ‘reads’ newspapers and traces the reprinting of their content in other areas. By following the reprints, they visualize how certain newspaper trends went ‘viral’.

Most newspapers at the time did not have intellectual property rights, so editors and publishers of papers in smaller cities would literally cut and paste sections from larger newspapers into their local papers. This created a sort of hodgepodge of ‘viral’ material that publishers thought their readers might be interested in.

Below is a presentation I gave for a Digital Humanities course which asked students to constructively critique and assess a digital humanities website. The Viral Texts Project was the focus of my presentation.


Assessing Accessibility in the Digital Humanities

It is easy to forget the things we take for granted. That’s sort of why we take them for granted in the first place. When tasks don’t require much planning or strain, our brains don’t seem to work as hard, and so those little things slip through the cracks as our synapses prune and make more room for other ‘more relevant’ information. But what seems to be easiest to forget is that we still get a say in what counts as relevant. Ask any tutor the best way to study material and they’ll tell you to involve multiple senses, to try different techniques; basically to make your brain do new kinds of work. Reading the textbook isn’t enough. You have to quiz yourself, make flashcards, study while you exercise, pace yourself . . . there is a lot of thought that goes into making those tidbits of information memorable, of making them more relevant.

Original caption: “Disabled veteran, ca. 1943” from the US National Archives

One area where this effort to make overlooked information relevant matters is accessibility. Too frequently we design buildings, create technological devices, or program software to enhance the quality of life for able-bodied individuals. There is such focus on traditional, idealized progress that other individuals get left behind. In the clamber to make life easy, we sometimes make tasks more difficult for those with cognitive, motor, visual, or auditory disabilities.
