MLG Columbus 2016: An Econometric Analysis of CounterStrike
Introduction
eSports — a form of sports where the primary aspects are facilitated by electronic systems — provide unique opportunities to mine and analyze data. Every aspect of eSports matches can be recorded digitally and analyzed with software, vastly reducing the energy required to collect and interpret patterns. The recent MLG Major CounterStrike Global Offensive competition in Columbus, Ohio is one of these opportunities.
CounterStrike: Global Offensive (CS:GO) is an internationally acclaimed competitive skillbased First Person Shooter developed by Valve LLC. Released in 2000, it has a long history in competitive gaming (eSports), attracting over a million viewers at the 2016 Major League Gaming competition in Columbus, Ohio.
Game Overview
In competitive mode, matches are broken down as follows:
 Two 15round halves (30 possible rounds without overtime)
 16 round wins required to win (more if overtime has been reached)
 The first round of each half restricts weapon purchases to pistols
 Depending on the outcome of the round, each team is rewarded with ingame money to purchase better gear
 Winning the pistol round guarantees a minimum $3000 reward while losing nets $1400
Research Question(s)
Originally, this project’s aim was to ascertain the impact winning a pistol round in CounterStrike: Global Offensive has on winning the match. As the data grew, the scope of this project increased beyond this single question to a general analysis of the effectiveness of teams’ economic management.
Data Processing
HLTV.org keeps a brilliant searchable index of historical CS:GO matches. To focus on the Columbus 2016 competition, set Events to MLG Columbus 2016 and download the demo files for each match. Demo files contain highly compressed information about matches. They allow users to accurately recreate matches inside the game engine. Demo files also provide a unique window for analyzing play without manually recording a single piece of data.
Parsing the Demo
CSGODemosManager is an invaluable opensource tool for analyzing CS:GO demo files. It will “play” through a demo, tracking many different variables. Because the project is open source, it is easily extensible. However, it only runs on Windows.
Once CSGODemosManager has analyzed the relevant demo files, users can choose to export the data to Excel as a group or individually. Both of the exports look more or less the same: a 6sheet Excel table:
This provides a strong basis to start analyzing various aspects of the event as well as ascertain important levels of performance.
Massaging the Data
The most important column to add to the table is also the most tedious. Before the CSGODemosManager created column Number
^{1} it is best to a new column, Demo ID
. This creates a faux primary key^{2} to compare individual matches with ending match results. Without this, there is no way to pull data related to the overall match from the General
sheet to the Rounds
sheet. To do this, manually copy the relevant Demo ID from column B
on the General
sheet.
General Sheet
Sadly, CSGODemosManager improperly symbolicates the Winner
column. To resolve this, replace that column with the following formula^{3}: =IF(N2>O2,L2,M2)
.
This checks if Team 1’s score is larger than Team 2’s score. If this case is met, the formula outputs Team 1’s name; if the case is not met, it outputs Team 2’s name.
Rounds Sheet
The Rounds
tab can now be related to the General
sheet thanks to the Demo ID
column added earlier.
Match Winner
This column references the General
sheet and tells us what the winner of the entire match is. This is important when looking for variables that affect the outcome of entire matches and not individual rounds. To do this, use =VLOOKUP(A2,General!B:AT,19,FALSE)
.^{4}
As a sanity check, the Match Winner
column should remain the same for a given set of rounds. It does, so this calculation appears accurate.
Round Winner = Match Winner?
Now that the data shows who won the overall match, it is trivial to see how often a match winner was a round winner. The simple formula =IF(AJ2=E2,1,0)
checks if the winner of the round is the same as the winner of the match. It returns a dummy variable, 0, if the teams do not match and 1 if they do.
As a sanity check, this column should have a larger sum than the number of rows since winners naturally have more round wins. The spreadsheet has 1,112 rows, 663 of which read 1, so this calculation appears accurate.
Analysis
Now that the data has been adequately prepared, the real work can begin. Both Excel and Stata prove to be integral tools for analyzing and visualizing the data sets.
Excel Analysis
For simpler analysis and visualization, Excel’s tools are easy to use and works well.
Pistol Round Impact
The original research question posited that the pistol round positively influenced a team’s chance to win the match. This is easily ascertained with a basic Excel analysis.
First, the relevant data must be extracted from the Rounds
sheet. We can use the Match Winner
column created earlier to make a simple dummy variable to simplify counting. Insert a new column to the right of the dataset called Round Winner = Match Winner?
and use =IF(AI2=E2,1,0)
to check if the Match Winner is the same as the Round Winner. If they are the same, the equation returns 1. If not it returns 0.^{5}
Once the Round Winner = Match Winner?
column has been created, it is simple to count the data with a simple =COUNTIF
formula. Create two cells far off to the right of the data and use the following formula: =COUNTIFS(H:H,"Pistol round", AO:AO,1)
. This checks the Type
column and restricts it to pistol rounds and then counts the frequency of 1 in the Round Winner = Match Winner?
column. Directly this we use `=COUNTIFS(H:H,"Pistol round", AO:AO,0)
to get the frequency of 0 in the same column.
Simple division with =BB2/COUNTIF(H:H, "Pistol round")
and =BB3/COUNTIF(H:H, "Pistol round”)
allow us to calculate the proportion of wins and losses. According to the data, this results in:
Type  Value  Proportion 

Match  55  67% 
Doesn’t Match  27  33% 
The data shows a 67% match rate, which indicates that the Pistol Round winner is the Match Winner 67% of the time. This is to say that winning the pistol round achieves a 2/3 probability of winning the entire match. To clarify, the word “Match” in the chart indicates rounds where the pistol round winner matches the match winner.
Economic Resource Impact
Before calculating the impact resources has on a match, the data must describe the frequency of economic impact on individual rounds. This can partially be calculated in Excel, however more advanced analysis will require Stata.
First, it is necessary to create a new formula that outputs a dummy variable. 1 should define when the winner has more resources, 0 should define when the winner has less. The simplest way to accomplish this is to create a new column called Winner Spend More On Equipment?
and reference the General
sheet and Match Winner
column:
=IF(IF(Z2>AB2,VLOOKUP(A2,General!B:AT,11,FALSE),VLOOKUP(A2,General!B:AT,12,FALSE))=AI2,1,0)
This formula uses a nested =IF
. The interior statement checks if Team 1 starts with more equipment than Team 2. If Team one starts with more, the formula pulls Team 1’s name from the General
sheet. If Team 2 starts with more, it pulls that name instead. Next, the exterior statement compares the string result from the interior statement to the string in the Match Winner
column. If the strings match, Team 1 is the winner, and the formula outputs a 1. If Team 2 is the winner, the formula outputs 0. This column now calculates which team has better equipment and if having better equipment corresponds with a win.
As before, use =COUNTIF(AR:AR,1)
and =COUNTIF(AR:AR,0)
to count the frequency at which this specific test occurs. This formula is far simpler than before because we are not using other columns to restrict counting.
Type  Value  Proportion 

Winner has More Equipment  633  60% 
Winner has Less Equipment  449  40% 
Further, =BB8/SUM(BB8:BB9)
and =BB9/SUM(BB8:BB9)
can be used to calculate the relevant proportion.
The winner has better equipment 60% of the time — barely more than half. This demonstrates that, at the professional level, the game’s economy has little effect on various outcomes.
Prepping for Stata
Sadly, Excel hits a wall when it comes to more advanced econometric plotting and analysis. However, Stata’s data entry system can be slow and arduous; it is often a better choice to use Excel to create the data and then simply copy it into Stata. This section will describe several new columns that will not be used in the Excel analysis but will come into play for the Stata section.
These columns in the Rounds
sheet are as follows:
 Match Winner
 Outlined above
=VLOOKUP(A2,General!B:AT,19,FALSE)
 Checks if Team 1 is the winner based on the Team 1 in the
general
sheet.  Dummy variable that returns 1 if Team 1 is the winner, 0 if not
=IF(VLOOKUP(A2,General!B:AT,11,FALSE)=E2,1,0)
 Checks to see if Team 1 has more money than Team 2.
 Dummy variable that returns 1 if Team 1 has more money, 0 if not
=IF(Y2>AA2,1,0)
 Dummy variable that returns 1 if Team 1 has better equipment, 0so this calculation appears accurate if not
=IF(Z2>AB2,1,0)
 Determines the difference between Team 1’s economy and Team 2’s economy
=Y2AA2
 Determines the difference between Team 1’s buy and Team 2’s buy
=Z2AB2
 Outlined above
=IF(AI2=E2,1,0)
 Checks if the winning team is counterterrorist or terrorist
 Returns a dummy variable 1 if the winning team is counterterrorist and 0 if not
=IF(F2="CT",1,0)
 Uses a nested
=IF
to create a dummy variable 1 if the Winner starts with more money and 0 if not  Similar to
winner spend more on equipment
outlined above
=IF(IF(Y2>AA2,VLOOKUP(A2,General!B:AT,11,FALSE),VLOOKUP(A2,General!B:AT,12,FALSE))=AI2,1,0)
 Outlined above
=IF(IF(Z2>AB2,VLOOKUP(A2,General!B:AT,11,FALSE),VLOOKUP(A2,General!B:AT,12,FALSE))=AI2,1,0)
 Calculates the difference in resources between Team 1 and Team 2.
=IF(VLOOKUP(A2,General!B:AT,11,FALSE)=E2,Y2AA2,AA2Y2)
 Calculates the difference in equipment value between Team 1 and Team 2.
=IF(VLOOKUP(A2,General!B:AT,11,FALSE)=E2,Z2AB2,AB2Z2)
Stata Analysis
State provides a robust platform for advanced econometric analysis.
Charting Kill Frequency
The first visualization in Stata is relatively simple. First, we must adjust the kill data on the Players
sheet of MLG2016Full.xlsx. This adjustment will account for the amount of rounds players participated in. If the data is not adjusted, players that played more rounds that have higher overall totals will end up above players that have a higher ratio of kills per round.
To accomplish this, add 5 columns, one for each kill quantity.^{9} Because the number of rounds played is static, the formula =L2/$J2
can be used to calculate the correct ratios. Locking the denominator to the J
column ensures the Rounds
column is always used while also allowing the formula to be filled relatively.
Next, the kill totals must also be adjusted. Adding a column next to kills with =D2/C2
^{10} will adjust the kill count to a permatch ratio. If we do not adjust the kill totals, we get skewed output like this:
The file MLGPlayerData.dta attached below already has these columns inserted.
Once the data is imported to Stata, we use the scatter
command to make the scatterplot. To scatter Total Kills to the Adjusted 1K Ratio, enter scatter killspermatch adj1, mlabel(name)
. To scatter the other adjusted kill ratios, use adj2
, adj3
, and so on. Adding ,mlabel(name)
tags each point with the corresponding player name.
Vertical height denotes more overall kills per round. Horizontal extension denotes the propensity of a player to hit that certain amount of kills: the further right a player is, the more often they hit that number of kills per round.
I omitted adj5
from this visualization because only 7 players achieved 5k.^{11} There is not enough data to draw a reasonable conclusion.
Equipment Value
Each round rewards players with various income based on individual and group performance. Regression models can describe how the gameplay economy affects match outcomes.
The Rounds
sheet holds all of the data used for the regression analysis. Thanks to the columns added prior, copying and pasting directly into Stata allows us to dove right into the data.^{12}
These imported variable names are quite long. Luckily, Stata has an autocomplete feature: press the Tab key to bring up a list of variables that complete the text entered in the command line.
Regressing variables onto isteam1thewinner
will find the impact said variables have on influencing Team 1 to beat Team 2. For example, reg isteam1thewinner equipementvalueteam1 equipementvalueteam2
does a multiple regression to ascertain the effects variables equipementvalueteam1
and equipementvalueteam2
have on the variable isteam1thewinner
. This command results in:
This tells is that β_{2} and β_{3} — the respecive effects of equipementvalueteam1
and equipementvalueteam2
— have inverse impacts on Team 1 winning a round (y). β_{1}, the constant, shows that if both teams are equal, there is a 50% chance that team 1 wins.
The full equation reads y = 0.0000164x  0.0000186z + 52.5
where x
is the dollar value of Team 1’s gear and z
is the dollar value of Team 2’s gear. Each dollar of Team 1 gear raises their chance of winning by 0.00164% and each dollar of Team 2 Gear lowers Team 1’s chance of winning by 0.00186%. All things considered, this results in very little economic impact on professional gameplay unless there are large differences in the buying power of each team.
The variable doesteam1havemoreequipment
allows us to perform another simple regression onto isteam1thewinner
, this time to determine the overall effect of having more equipment.
The constant here is approximately 0.31
while β_{2} is about 0.32
. This means the regression line would be y=0.32x + 0.31
where y
is the percent chance of winning. This tells us that there is a 31% chance of winning when Team 1 has less equipment and a 63% chance of winning when Team 1 has more equipment. This confirms the earlier Excel analysis that posited the winner has more equipment 60% of the time:
Type  Value  Proportion 

Winner has More Equipment  633  60% 
Winner has Less Equipment  449  40% 
Eco and SemiEco
Value of team inventories occasionally forces teams to save and purposely not buy inventory — a strategy dubbed Eco or Semieco to try and push back later. CSGODemosManager is able to automatically analyze the type of round and conveniently creates a Type
column that contains the information:
To study the effects of this strategy the data needs to be restricted to only show Eco and Semieco rounds: simply filter the Type
column on the Round
sheet.^{13}
Once the data has been restricted and imported to Stata, regressing doesteam1havemoremoney
onto isteam1thewinner
elucidates the effects of better equipment:
The regression line is y = 0.79x + 0.126
. This tells us that having more equipment — β_{2} — raises the odds of winning an eco or semieco round by 79%. While this is high, it means that variables other than equipment affect the outcome of eco rounds – but only 9% of the time.
Raw Data
This data comes in 3 major formats: Demo (.dem), Excel (.xlsx), and Stata (.dta). HLTV.org hosts the demo files. All of the files used are located in the /data directory of this post. These are the individual files with respective descriptions:
 MLG2016Full.xlsx (461 kb)  Direct Download  View Google Sheet
 Annotated and formatted like this article describes
 Excel file with 6 sheets of data summarizing the entire event:
General
contains a summary for each matchWinner
column augmented with formula=IF(N2>O2,L2,M2)
Players
contains statistics for each playerK/D
column augmented with formula=D2/F2
Maps
contains map statisticsTeams
contains team statisticsWeapons
contains various weapon statistics Contains additional calculated columns
Damage per Shot
calculated with=D2/E2
Damage per Hit
calculated with=C2/F2
 Contains additional calculated columns
Rounds
contains round statistics Contains several formulaic columns separated from static columns by a vertical black line; index included beside the data.
 MLGTeamData.dta (26 kb)  Direct Download
 Stata file with variables pulled from the
Teams
sheet of MLG2016Full.xlsx
 Stata file with variables pulled from the
 MLGPlayerData.dta (45 kb)  Direct Download
 Stata file with variables pulled from the
Players
sheet of MLG2016Full.xlsx
 Stata file with variables pulled from the
 MLGFull.dta (256 kb)  Direct Download
 Stata file with variables pulled from the
Rounds
sheet of MLG2016Full.xlsx
 Stata file with variables pulled from the
 rawData.zip (1.1 mb)  Direct Download
 Contains an unannotated copy of MLG2016Full.xlsx as well as individual data exports for each match in the Columbus 2016 competition folder according to level of bracket.
 This is the raw data that CSGODemosManager spits out after analyzing demo files and has not been organized.
Disclosure and Notes
 All data was sourced from HLTVhosted match demos
 Findings, calculations, and all data are published under the CCBY–4.0 license
 EnVyUs v. CLG match demo was corrupted and thus the data was not accessible
Software Used
Proprietary
 Microsoft Excel
 Data organization, calculation, and visualization
 StataCorp Stata
 Regression analysis and visualization
Open Source
 CSGODemosManager
 Demo parsing
Discussion on r/GlobalOffensive  View as: PDF / Markdown

Originally “A” ↩︎

Although it is not technically unique, this value serves to relate separate pieces of the data. ↩︎

This assumes you have not modified the existing order of the columns. If you have, use an
=IF
formula to check if Team 1’s score is larger than Team 2’s score. If true, output Team 1’s name; if false, output Team 2’s name. ↩︎ 
Again, if you have moved cells around, you will need to use different references. ↩︎

One could also accomplish this without creating a new column by using
=COUNTIFS
or=SUMIFS
and focussing on all three relevant columns, however because later analysis uses theround winner = match winner?
column it makes sense to create it now. ↩︎ 
Actually “Equipment Value” and not “Spend” per se. ↩︎

Actually “Has Better Equipment” and not “Spend More On” per se. ↩︎

Actually “Equipment Values” and not “Spends” per se. ↩︎

Kill quantity refers to how many kills a single player gets in a given round, i.e. 1k is one kill, 2k is 2 kills, 3k is three kills, and so on. ↩︎

Inserting a column here will preserve the references to the right. ↩︎

As a result, everyone is bunched up above the X axis at 0. ↩︎

MLGFull.dta located below contains this data in a premade Stata file. ↩︎

Another solution is to use
gen
ortabulate
in Stata, however it is often simpler to adjust the data using Excel filters and create a new Stata file. ↩︎