Excel is a powerful tool for transforming datasets for use in more powerful visualization engines. Tableau and Excel together make a dangerous team when leveraged correctly, allowing for clear and concise communication of actionable data efficiently.
To begin, I used the dataset published on Scouting Magazine’s post about Merit Badge popularity. This post included some summary metrics but also included the raw data of how many merit badges were earned for the years 2013 through 2017. The data is presented in an HTML table, which we can copy and paste directly into an Excel workbook without too much hassle.
Once the data is in an Excel sheet, there are two additional dimensions we can pull from this webpage as well as some cleanup to do to the Merit Badge names.
The page lists for different asterisk levels denoting some classification about the requirements for Eagle:
To handle this, we can create a new “Eagle Required” column that leverages a simple formula to generate a dummy variable that allows us to differentiate between Eagle-Required and non-Eagle-Required merit badges:
=ISNUMBER(SEARCH("~*",A2))
This formula does two things:
SEARCH("~*",A2)
part looks for the character string *
inside of cell A2
. Since by default Excel considers *
to be a wildcard condition, we need to use ~*
to escape it. This will return the position of *
as a number if it exists in the string. If not, it returns #VALUE!
ISNUMBER()
part checks if the result of the SEARCH
is a number. If SEARCH
returned a number, i.e., the position of an asterisk, the cell will display TRUE
. If SEARCH
does not return a number, i.e., there is no asterisk, the cell will display FALSE
.This means that We can now filter on TRUE
/FALSE
to see whether a merit badge is Eagle Required or not.
Since we do not want to see the Merit Badge names with asterisks next to them, we can take the column and paste it as values. Next, we can initiate a find and replace on the asterisk character on the range of Merit Badge names. Remember that we need to make the criteria to replace ~*
instead of *
due to the wildcard condition.
The Scouting Magazine page uses colors to denote whether a merit badge was released in 2013 or later and considers these to be new badges. Excel cannot use a formula to check the color of a cell.1 Since there are only a handful, I tagged these manually in a new column.
Because the data has multiple value columns, it is not a simple task to pivot this in a useful way:
Badge | Eagle Required | New | 2013 | 2014 | 2015 | 2016 | 2017 |
---|---|---|---|---|---|---|---|
Personal Fitness | True | False | 56,295 | 50,693 | 52,499 | 52,079 | 50,428 |
Programming | False | True | 480 | 2,970 | 3,577 | 4,085 | 4,138 |
To pivot this data, we need to manipulate it such that each year column turns into a single column:
Badge | Eagle Required | New | Year | Value |
---|---|---|---|---|
Art | False | False | 2013 | 28031 |
Art | False | False | 2014 | 25438 |
Art | False | False | 2015 | 24374 |
Art | False | False | 2016 | 22990 |
Art | False | False | 2017 | 21567 |
We can do this by using Get and Transform (previously called Power Query) in Excel. First, we need to make the source data we want to transform into a table, so select any cell with our Merit Badge data and hit CTRL+T
to do so. Make sure “My table has headers” remains checked.
Next, use “From Table” in the “Get and Transform” section under “Data” in the ribbon. The key chord to do this is ALT+N → PT
. This will open up the Query Editor in a new window.
Once this window is open, navigate to the “Transform” section in the Query Editor Ribbon. Select the five year columns by clicking on 2013 and then shift-clicking on 2017. Once the years are selected, we can select “Unpivot” in the ribbon under the “Any Column” section.
This will transform the data the way we need it, where there is only one value column with one row for each year.
To load this data into Excel where it is usable, in the Get and Transform ribbon, navigate to “Home” and select “Close and Load.” This will display the transformed data in a new sheet.
Now that the data is pivotable we can go ahead and run some analysis using a PivotTable or a PivotChart where the value for the years are separated longitudinally instead of categorically. However, this is also a perfect time to leverage Tableau’s visualization prowess.
Tableau can connect to Excel documents to extract data from them. This extraction also supports queries like the one we just made to transform the data. In Tableau, select “Microsoft Excel” from the Connect menu and select the file we created above. We should see three entries here: two sheets (one for each sheet we created) and one Query (the query we created in Power Query). Drag the Query in and load that as the source.
Tableau does not immediately understand that the “Year” column is a date and not a number, so we need to specify the data type. In the Data sidebar, right-click on the Year dimension and navigate to Change Data Type > Date.
To get these dates to display as just a year (2017) instead of MM/DD/YYYY, we need to drag it in as a variable. I put it in the column section so that it gets sorted left-to-right. Once the variable has been added, right-click and select “Year” to suppress the display of the interpolated month and day.
To compare Eagle-Required badge volume to normal badge volume, we can use a column chart. For columns, use Year
and Eagle Required
in that order. For Rows, put Value
from the measures section.
To color code the columns, also drag the Eagle Required
dimension onto the “Color” section of the Marks box.
This generates a nice bar graph of the volume earned over time for normal and Eagle-Required Merit Badges:
This evinces an interesting trend: While Eagle-Required badges seem to be more or less steady, non-Eagle required badges are declining year-over-year. This leads me to believe that more people are only doing the "bare minimum" instead of continuing to work after they make Eagle.
We can also chart the changes over time as a line graph. This allows us to more easily discern the differences in popularity of merit badges over time. To do this, we only need to add the Dimension Year
as a Column and the Measure Value
as a Row.
To color code the lines, also drag the Merit Badge
dimension onto the “Color” section of the Marks box.
From here, we need to create some filters to make the chart more readable. Drag the Dimension Eagle Required
and the Measure Value
into the filters section. Right-click on each and select “Show filter” in the menu.
This will create two filters. One where we can specify if we want to include or exclude normal or Eagle-Required badges and one where we can restrict the number earned.
This generates a nice line chart. When filtered only on Eagle Required badges, we see the following trends:
This is interesting because it shows that most of the eagle required badges are relatively constant. Interestingly, Cooking spikes in 2014 to the highest point of any Merit Badge. This is because in 2014 it was added as a new requirement to make Eagle.
Also interesting is that First Aid is the most popular merit badge overall. This means that scouting leaders must push for this to be the first Eagle Required badge earned by scouts. Since other Eagle Required badges are all lower than this one, it makes sense to conclude that the added immediate importance of first aid over a badge like Citizenship in the Community influences the popularity.
The data comes from Scouting Magazine’s blog post entitled Presenting the 2017 Merit Badge Rankings: Which were the most and least popular?
You can access the Tableau file I generated here. I also uploaded the Excel file with the Query construction here. Note that since Get and Transform is a Windows-only feature, you cannot edit the query on a Mac2.
Discussion: r/DataIsBeautiful