MLG Columbus 2016: An Econometric Analysis of Counter-Strike

eSports — a form of sports where the primary aspects are facilitated by electronic systems — provide unique opportunities to mine and analyze data. Every aspect of eSports matches can be recorded digitally and analyzed with software, vastly reducing the energy required to collect and interpret patterns. The recent MLG Major Counter-Strike Global Offensive competition in Columbus, Ohio is one of these opportunities.

Counter-Strike: Global Offensive (CS:GO) is an internationally acclaimed competitive skill-based First Person Shooter developed by Valve LLC. Released in 2000, it has a long history in competitive gaming (eSports), attracting over a million viewers at the 2016 Major League Gaming competition in Columbus, Ohio.

Game Overview

Research Question(s)

Originally, this project’s aim was to ascertain the impact winning a pistol round in Counter-Strike: Global Offensive has on winning the match. As the data grew, the scope of this project increased beyond this single question to a general analysis of the effectiveness of teams’ economic management.

Data Processing

HLTV.org keeps a brilliant searchable index of historical CS:GO matches. To focus on the Columbus 2016 competition, set Events to MLG Columbus 2016 and download the demo files for each match. Demo files contain highly compressed information about matches. They allow users to accurately recreate matches inside the game engine. Demo files also provide a unique window for analyzing play without manually recording a single piece of data.

Parsing the Demo

CSGO-Demos-Manager is an invaluable open-source tool for analyzing CS:GO demo files. It will “play” through a demo, tracking many different variables. Because the project is open source, it is easily extensible. However, it only runs on Windows.

Once CSGO-Demos-Manager has analyzed the relevant demo files, users can choose to export the data to Excel as a group or individually. Both of the exports look more or less the same: a 6-sheet Excel table:

This provides a strong basis to start analyzing various aspects of the event as well as ascertain important levels of performance.

Massaging the Data

The most important column to add to the table is also the most tedious. Before the CSGO-Demos-Manager created column Number¹ it is best to a new column, Demo ID. This creates a faux primary key² to compare individual matches with ending match results. Without this, there is no way to pull data related to the overall match from the General sheet to the Rounds sheet. To do this, manually copy the relevant Demo ID from column B on the General sheet.

General Sheet

Sadly, CSGO-Demos-Manager improperly symbolicates the Winner column. To resolve this, replace that column with the following formula³: =IF(N2>O2,L2,M2).

This checks if Team 1’s score is larger than Team 2’s score. If this case is met, the formula outputs Team 1’s name; if the case is not met, it outputs Team 2’s name.

Rounds Sheet

The Rounds tab can now be related to the General sheet thanks to the Demo ID column added earlier.

Match Winner

This column references the General sheet and tells us what the winner of the entire match is. This is important when looking for variables that affect the outcome of entire matches and not individual rounds. To do this, use =VLOOKUP(A2,General!B:AT,19,FALSE).⁴

As a sanity check, the Match Winner column should remain the same for a given set of rounds. It does, so this calculation appears accurate.

Round Winner = Match Winner?

Now that the data shows who won the overall match, it is trivial to see how often a match winner was a round winner. The simple formula =IF(AJ2=E2,1,0) checks if the winner of the round is the same as the winner of the match. It returns a dummy variable, 0, if the teams do not match and 1 if they do.

As a sanity check, this column should have a larger sum than the number of rows since winners naturally have more round wins. The spreadsheet has 1,112 rows, 663 of which read 1, so this calculation appears accurate.

Analysis

Now that the data has been adequately prepared, the real work can begin. Both Excel and Stata prove to be integral tools for analyzing and visualizing the data sets.

Excel Analysis

For simpler analysis and visualization, Excel’s tools are easy to use and works well.

Pistol Round Impact

The original research question posited that the pistol round positively influenced a team’s chance to win the match. This is easily ascertained with a basic Excel analysis.

First, the relevant data must be extracted from the Rounds sheet. We can use the Match Winner column created earlier to make a simple dummy variable to simplify counting. Insert a new column to the right of the dataset called Round Winner = Match Winner? and use =IF(AI2=E2,1,0) to check if the Match Winner is the same as the Round Winner. If they are the same, the equation returns 1. If not it returns 0.⁵

Once the Round Winner = Match Winner? column has been created, it is simple to count the data with a simple =COUNTIF formula. Create two cells far off to the right of the data and use the following formula: =COUNTIFS(H:H,"Pistol round", AO:AO,1). This checks the Type column and restricts it to pistol rounds and then counts the frequency of 1 in the Round Winner = Match Winner? column. Directly this we use `=COUNTIFS(H:H,"Pistol round", AO:AO,0) to get the frequency of 0 in the same column.

Simple division with =BB2/COUNTIF(H:H, "Pistol round") and =BB3/COUNTIF(H:H, "Pistol round”) allow us to calculate the proportion of wins and losses. According to the data, this results in:

The data shows a 67% match rate, which indicates that the Pistol Round winner is the Match Winner 67% of the time. This is to say that winning the pistol round achieves a 2/3 probability of winning the entire match. To clarify, the word “Match” in the chart indicates rounds where the pistol round winner matches the match winner.

Economic Resource Impact

Before calculating the impact resources has on a match, the data must describe the frequency of economic impact on individual rounds. This can partially be calculated in Excel, however more advanced analysis will require Stata.

Type	Value	Proportion
Match	55	67%
Doesn’t Match	27	33%

First, it is necessary to create a new formula that outputs a dummy variable. 1 should define when the winner has more resources, 0 should define when the winner has less. The simplest way to accomplish this is to create a new column called Winner Spend More On Equipment? and reference the General sheet and Match Winner column:

This formula uses a nested =IF. The interior statement checks if Team 1 starts with more equipment than Team 2. If Team one starts with more, the formula pulls Team 1’s name from the General sheet. If Team 2 starts with more, it pulls that name instead. Next, the exterior statement compares the string result from the interior statement to the string in the Match Winner column. If the strings match, Team 1 is the winner, and the formula outputs a 1. If Team 2 is the winner, the formula outputs 0. This column now calculates which team has better equipment and if having better equipment corresponds with a win.

As before, use =COUNTIF(AR:AR,1) and =COUNTIF(AR:AR,0) to count the frequency at which this specific test occurs. This formula is far simpler than before because we are not using other columns to restrict counting.

Type	Value	Proportion
Winner has More Equipment	633	60%
Winner has Less Equipment	449	40%

Further, =BB8/SUM(BB8:BB9) and =BB9/SUM(BB8:BB9) can be used to calculate the relevant proportion.

The winner has better equipment 60% of the time — barely more than half. This demonstrates that, at the professional level, the game’s economy has little effect on various outcomes.

Prepping for Stata

Sadly, Excel hits a wall when it comes to more advanced econometric plotting and analysis. However, Stata’s data entry system can be slow and arduous; it is often a better choice to use Excel to create the data and then simply copy it into Stata. This section will describe several new columns that will not be used in the Excel analysis but will come into play for the Stata section.

Stata Analysis

Charting Kill Frequency

The first visualization in Stata is relatively simple. First, we must adjust the kill data on the Players sheet of MLG2016Full.xlsx. This adjustment will account for the amount of rounds players participated in. If the data is not adjusted, players that played more rounds that have higher overall totals will end up above players that have a higher ratio of kills per round.

To accomplish this, add 5 columns, one for each kill quantity.⁹ Because the number of rounds played is static, the formula =L2/$J2 can be used to calculate the correct ratios. Locking the denominator to the J column ensures the Rounds column is always used while also allowing the formula to be filled relatively.

Next, the kill totals must also be adjusted. Adding a column next to kills with =D2/C2¹⁰ will adjust the kill count to a per-match ratio. If we do not adjust the kill totals, we get skewed output like this:

Once the data is imported to Stata, we use the scatter command to make the scatterplot. To scatter Total Kills to the Adjusted 1K Ratio, enter scatter killspermatch adj1, mlabel(name). To scatter the other adjusted kill ratios, use adj2, adj3, and so on. Adding ,mlabel(name) tags each point with the corresponding player name.

Vertical height denotes more overall kills per round. Horizontal extension denotes the propensity of a player to hit that certain amount of kills: the further right a player is, the more often they hit that number of kills per round.

I omitted adj5 from this visualization because only 7 players achieved 5k.¹¹ There is not enough data to draw a reasonable conclusion.

Equipment Value

Each round rewards players with various income based on individual and group performance. Regression models can describe how the gameplay economy affects match outcomes.

The Rounds sheet holds all of the data used for the regression analysis. Thanks to the columns added prior, copying and pasting directly into Stata allows us to dove right into the data.¹²

These imported variable names are quite long. Luckily, Stata has an autocomplete feature: press the Tab key to bring up a list of variables that complete the text entered in the command line.

Regressing variables onto isteam1thewinner will find the impact said variables have on influencing Team 1 to beat Team 2. For example, reg isteam1thewinner equipementvalueteam1 equipementvalueteam2 does a multiple regression to ascertain the effects variables equipementvalueteam1 and equipementvalueteam2 have on the variable isteam1thewinner. This command results in:

This tells is that β₂ and β₃ — the respecive effects of equipementvalueteam1 and equipementvalueteam2 — have inverse impacts on Team 1 winning a round (y). β₁, the constant, shows that if both teams are equal, there is a 50% chance that team 1 wins.

The full equation reads y = 0.0000164x - 0.0000186z + 52.5 where x is the dollar value of Team 1’s gear and z is the dollar value of Team 2’s gear. Each dollar of Team 1 gear raises their chance of winning by 0.00164% and each dollar of Team 2 Gear lowers Team 1’s chance of winning by 0.00186%. All things considered, this results in very little economic impact on professional gameplay unless there are large differences in the buying power of each team.

The variable doesteam1havemoreequipment allows us to perform another simple regression onto isteam1thewinner, this time to determine the overall effect of having more equipment.

The constant here is approximately 0.31 while β₂ is about 0.32. This means the regression line would be y=0.32x + 0.31 where y is the percent chance of winning. This tells us that there is a 31% chance of winning when Team 1 has less equipment and a 63% chance of winning when Team 1 has more equipment. This confirms the earlier Excel analysis that posited the winner has more equipment 60% of the time:

Eco and Semi-Eco

Type	Value	Proportion
Winner has More Equipment	633	60%
Winner has Less Equipment	449	40%

Value of team inventories occasionally forces teams to save and purposely not buy inventory — a strategy dubbed Eco or Semi-eco to try and push back later. CSGO-Demos-Manager is able to automatically analyze the type of round and conveniently creates a Type column that contains the information:

To study the effects of this strategy the data needs to be restricted to only show Eco and Semi-eco rounds: simply filter the Type column on the Round sheet.¹³

Once the data has been restricted and imported to Stata, regressing doesteam1havemoremoney onto isteam1thewinner elucidates the effects of better equipment:

The regression line is y = 0.79x + 0.126. This tells us that having more equipment — β₂ — raises the odds of winning an eco or semi-eco round by 79%. While this is high, it means that variables other than equipment affect the outcome of eco rounds – but only 9% of the time.

Raw Data

This data comes in 3 major formats: Demo (.dem), Excel (.xlsx), and Stata (.dta). HLTV.org hosts the demo files. All of the files used are located in the /data directory of this post. These are the individual files with respective descriptions:

Disclosure and Notes

Software Used

Proprietary

Open Source

Originally “A” ↩︎
Although it is not technically unique, this value serves to relate separate pieces of the data. ↩︎
This assumes you have not modified the existing order of the columns. If you have, use an =IF formula to check if Team 1’s score is larger than Team 2’s score. If true, output Team 1’s name; if false, output Team 2’s name. ↩︎
Again, if you have moved cells around, you will need to use different references. ↩︎
One could also accomplish this without creating a new column by using =COUNTIFS or =SUMIFS and focussing on all three relevant columns, however because later analysis uses the round winner = match winner? column it makes sense to create it now. ↩︎
Actually “Equipment Value” and not “Spend” per se. ↩︎
Actually “Has Better Equipment” and not “Spend More On” per se. ↩︎
Actually “Equipment Values” and not “Spends” per se. ↩︎
Kill quantity refers to how many kills a single player gets in a given round, i.e. 1k is one kill, 2k is 2 kills, 3k is three kills, and so on. ↩︎
Inserting a column here will preserve the references to the right. ↩︎
As a result, everyone is bunched up above the X axis at 0. ↩︎
MLGFull.dta located below contains this data in a pre-made Stata file. ↩︎
Another solution is to use gen or tabulate in Stata, however it is often simpler to adjust the data using Excel filters and create a new Stata file. ↩︎

Christopher Sardegna's Blog

Thoughts on technology, design, data analysis, and data visualization.