Lab 9 – Graphing Data

For Lab 9 of my Digital Humanities course, I evaluated various ways to organize and then visualize data. These graphics were made using Tableau Prep and Tableau Desktop and are far from comprehensive. The dataset behind them comes from Gallup (2019) and is titled “Self-described religious identification of Americans.” This dataset is similar to the Longitudinal Religious Congregations and Membership File discussed in previous posts, as it also looks at self-identified religious groups over time. Although both evaluate similar categories, they draw the categorical lines differently and, beyond that, count category members differently (but this is an idea I’ll explore later).

For now, it is important to understand the process of visualizing data. Once you’re the one in charge, the choices of inclusion and exclusion become quite obvious. Consider my first attempt at cleaning and visualizing the Gallup data:

The graph seems nice enough, showing the change in constituent numbers over time. The color-coding distinguishes each denomination, and there are notable differences between particular groups (e.g., Catholic and Protestant numbers appear much higher than the other categories). But this visualization misconstrues one key part of the data: the way the religious identifications are counted. The data overview provided with the dataset offers some insight. It explains that once every five or so years, 1000 different Americans were asked to self-identify their religious affiliation. This means the data is a sample of the larger population. Of those 1000 individuals, a particular number identified as Protestant, for example, and from that number a percentage was calculated (so in 1948, about 690 of the 1000 individuals sampled identified themselves as Protestant, or 69%).

The actual number of individuals who identified themselves as Protestant does not appear in the data, though (see the original dataset Excel sheet as downloaded, below). The dataset provides only the percentage of the 1000 collected survey responses. These percentages are represented as whole numbers (69 rather than 69%), which in itself can be misleading; at the very least, this format requires more work from the interpreter, who has to check the guide to understand the table.

Original Gallup dataset as downloaded from Statista. The whole numbers represent percentages of the 1000 individuals who were surveyed.

My first graph is flawed because it does not visually communicate that the growth or decline of a religious identity over time is measured in percentages. It should really be adjusted to show the actual number of individuals who self-identified with each religion per year. I made this change from percentages to numbers by multiplying each percentage, expressed as a decimal, by 1000 (the number of individuals surveyed in each year listed). For example, 69% becomes 0.69 × 1000 = 690.
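The conversion above is a single scaling step, and the percentage calculation described earlier is simply its inverse. Here is a minimal sketch in Python, assuming the dataset’s whole-number percentages (69 meaning 69%) and a fixed sample of 1000 respondents per year. The function names are hypothetical, and this is not how Tableau Prep performs the calculation internally.

```python
SAMPLE_SIZE = 1000  # respondents per survey year, per the dataset overview


def percent_to_count(percent: float, sample_size: int = SAMPLE_SIZE) -> float:
    """Whole-number percentage (e.g. 69 for 69%) -> estimated respondents.

    Multiplying before dividing keeps whole-number inputs exact in floating point.
    """
    return percent * sample_size / 100


def count_to_percent(count: float, sample_size: int = SAMPLE_SIZE) -> float:
    """Estimated respondents -> whole-number percentage (the inverse step)."""
    return count * 100 / sample_size


print(percent_to_count(69))   # 690.0
print(count_to_percent(690))  # 69.0
```

Because both directions depend on `sample_size`, a count produced this way is only as meaningful as the assumption that exactly 1000 people answered that year.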

Religious Denominations table after some data cleaning. Religious identification and year have swapped axes, and percentages have been converted into whole numbers representing the estimated number of respondents who chose each denomination.

The table above shows the number of individuals per 1000 who identified with a particular religious denomination. Interestingly, some quick addition shows that not every column adds up to exactly 1000. This could be because more or fewer individuals were truly surveyed, or it could be a rounding error introduced after the actual data collection. Either way, this shows again that data does not speak for itself.
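The “quick adding” can be automated so that every year’s column is checked at once. This is a minimal Python sketch. The years and counts below are hypothetical placeholders made up for illustration, not the Gallup figures, and the helper names are also hypothetical.

```python
SAMPLE_SIZE = 1000

# HYPOTHETICAL values for illustration only; these are NOT the Gallup figures.
estimated_counts = {
    "year_1": {"Protestant": 680, "Catholic": 220, "None": 20, "Other": 70},
    "year_2": {"Protestant": 500, "Catholic": 240, "None": 150, "Other": 110},
}


def column_totals(table):
    """Sum the estimated respondents in each year's column."""
    return {year: sum(groups.values()) for year, groups in table.items()}


def off_by(table, expected=SAMPLE_SIZE):
    """Return {year: total - expected} for columns that don't sum to expected."""
    return {
        year: total - expected
        for year, total in column_totals(table).items()
        if total != expected
    }


print(column_totals(estimated_counts))  # {'year_1': 990, 'year_2': 1000}
print(off_by(estimated_counts))         # {'year_1': -10}
```

A shortfall or surplus of a few respondents is consistent with the rounding explanation. For instance, three groups at 33.3% each, rounded to 33, sum to 99% (990 of 1000) instead of 100%.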

Graph of Religious Denomination adjusted to represent number of individuals instead of percent of the sample. This visualization is useful for making assumptions about that particular sample group, but not so much for generalizing about the wider population of Americans.

After converting percentages to whole numbers (representing whole people), I recreated the same line graph. The lines keep the same shape, but the y-axis has changed from decimals (representing a percentage of a whole) to whole numbers. This version of the graph is easier to read. For example, it could be assumed that the number of Protestants in America has decreased by about 350 individuals since 1948 . . . but even that is not right. There are many more Protestants in the US than this table represents, whether in 1948 or 2019. This is why percentages are more logical in this situation: the 1000 individuals surveyed are a sample of the wider population, so their counts describe the sample, not the country.
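One way to see why the “350 fewer Protestants” reading describes only the sample is to repeat the conversion with a different sample size. In this Python sketch, the 69% figure for 1948 comes from the dataset overview. The later value of 34% is an assumption chosen only to match the “about 350” drop mentioned above, and the 2000-person sample is hypothetical.

```python
def percent_to_count(percent, sample_size):
    """Whole-number percentage -> estimated respondents in a sample."""
    return percent * sample_size / 100


def count_drop(start_pct, end_pct, sample_size):
    """Change in estimated respondents between two survey years."""
    return percent_to_count(start_pct, sample_size) - percent_to_count(end_pct, sample_size)


START_PCT = 69  # 1948 Protestant share, from the dataset overview
END_PCT = 34    # ASSUMED later share, picked to match "about 350 fewer"

for n in (1000, 2000):  # 2000 is a HYPOTHETICAL alternative sample size
    print(f"sample of {n}: {count_drop(START_PCT, END_PCT, n):.0f} fewer respondents, "
          f"{START_PCT - END_PCT} percentage points")
```

The count drop doubles when the sample doubles (350 vs. 700), while the 35-point change in percentage stays the same. The percentage is the figure that can speak, cautiously, to the wider population.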

Before changing back to percentages, though, I tried a new graph technique, one that I think better shows how each denomination’s numbers change over time:

This layout makes it easier to see detail in the groups that are jumbled together at the bottom of the line graph. You can see a steady increase in the ‘none’ category. (Did surveys stimulate the growth of this new category? You get to decide, and you will have the visualization to back up your claim, no matter which argument you make.) The gaps in data collection are also more obvious. Mormons don’t seem to be counted until 1980, Christian (non-specific) until 1995, and the Undesignated group (which is somehow different from the ‘none’ group . . . perhaps an “I have no religion” versus an “I’d rather not answer” situation) went uncollected in 1965, 1970, and 1980.
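Gaps like these can also be listed explicitly rather than spotted by eye. This minimal Python sketch records only *which* years have a value for each category, so no counts are invented. The list of survey years is hypothetical, chosen to mirror the Mormon and Undesignated gaps described above.

```python
# HYPOTHETICAL survey years, shaped to mirror the gaps described in the text.
survey_years = [1965, 1970, 1975, 1980, 1985]

# Years in which each category has a recorded value (no counts needed).
recorded = {
    "Mormon": {1980, 1985},
    "Undesignated": {1975, 1985},
}


def missing_years(recorded, survey_years):
    """For each category, list the survey years with no recorded value."""
    return {
        category: [year for year in survey_years if year not in years]
        for category, years in recorded.items()
    }


print(missing_years(recorded, survey_years))
# {'Mormon': [1965, 1970, 1975], 'Undesignated': [1965, 1970, 1980]}
```

Keeping “not collected” separate from zero matters for the graph. A missing year plotted as 0 would suggest that nobody identified with that group, which the data never claims.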

This shift in the data presentation is useful for looking at that particular group of 1000 individuals, but not if one wants to make generalizations about the wider public. That is where percentages come in handy. We could go back to percentages and try out a new graph technique. I had a pie chart in mind, but each pie can represent only one year, so it does little for comparison across years. Angles are also harder for the human eye to compare than the lengths of bars in a bar graph.

The moral of this short story? Visualizing data is not as easy as throwing numbers on an axis. It requires many changes, and with each change a decision is made about what information to include and how to include it. Once again, and I cannot emphasize this enough, data does not speak for itself.
