I never understood the AWS billing very well and happily left it to my CTO.
A couple of days ago, however, my CTO secretly told me he's mainly interested in learning 2 things from those bills:
- How much are we paying for every component of the application?
- How much are we paying for every type of AWS service?
With that knowledge came the task of helping him extract that information from the billing statements.
Being a strong believer in self-service, I crafted some scripts to generate a CSV to answer both questions (and possible intersections, which are also interesting).
- Make sure your different AWS resources (EC2 VMs, S3 buckets, CloudFront distributions...) are tagged with the same tag, describing their "applicative" component (e.g.
Component: FrontEnd-PHP). This tag will help us aggregate the lines in the detailed AWS billing reports (see below)
- Include this tag under "Cost Allocation Tags"
- Using the preferences page, set up a bucket and enable "Cost allocation report" (contains billing with our selected tags) and "Detailed billing report" (contains billing per resource). We'll use these reports as our data source
If you take a look at the "Cost allocation report" (
aaaaaaaaaaaa-aws-cost-allocation-yyyy-mm.csv), you'll find these columns that are of interest:
- "user:TAGNAME", correlating with the tag we chose earlier (e.g.
- "UsageType", succinctly describing the type of usage (e.g. "EU-SAE1-AWS-In-Bytes")
- "TotalCost", containing the cost (in USD for me) of that usage of that resource
Pivoting these columns should give me a matrix I can pass the executives. I chose this form:
| \ | UsageType | | user:Component | sum:TotalCost |
So the columns will be the different usage types, the rows will be the different components, and the values will be the cost of this specific combination (the "sum" is just a formality, since every combination should appear only once).
One can do that pivot manually (using LibreOffice Calc / Microsoft Excel, for instance), but I like scripts.
I chose to use the Pandas library in Python because it's built for stuff like this (I believe a vanilla Python solution is possible but will take more time).
This is a basic script for pivoting our data:
import pandas import numpy tag_name='user:Component' raw = pandas.read_csv('COST_ALOCATION.csv', skiprows=1) # First row is some warning about tags not included by default raw['UsageType'].fillna('???',inplace=True) raw[tag_name].fillna('???',inplace=True) pivot = raw.pivot_table(rows=[tag_name], cols=['UsageType'], values=['TotalCost'], aggfunc=numpy.sum, margins=True) # for newer versions of pandas: raw.pivot_table(index=[tag_name], columns=['UsageType'], values=['TotalCost'], aggfunc=numpy.sum, margins=True) pivot.to_csv('OUT_FILE.csv')
Things to note
- I'm using
fillnato replace "empty" items in selected columns (e.g. lines with the component tag empty) with "???". This is important as Pandas discards such rows, but still includes them for its subtotals
numpy.sumis somehow better than using
sumas an aggregator. If I don't use it, I don't get a grand total
Say you have a cell that costs you more than you think is proper (for instance, "queue-manager"/"EU-SAE1-AWS-In-Bytes") and you want to analyze it.
The "Detailed billing report with resources and tags" (
aaaaaaaaaaaa-aws-billing-detailed-line-items-with-resources-and-tags-yyyy-mm.csv.zip), which is also located in our billing bucket (although zipped) contains the billing at an item-level detail, along with our tags.
You can try analyzing this report manually (e.g. using Excel), but since it's a huge file it will take some time.
Using the same library, this is how we extract the values for a specific pivot cell (Component/Usage combination):
import pandas tag_name='user:Component' tag_value='ExpensiveComonent' usage_value='ExpensiveUsage' raw = pandas.read_csv('DETAILED.csv') filtered = raw.loc[raw[tag_name]==tag_value][[tag_name,'ResourceId','UsageType','Cost']] # Optionally filter by non-zero costs filtered = filtered.loc[filtered['Cost'] > 0] filtered.to_csv('OUT_FILE.csv')
These are some other reports I did that might be interesting
Pivot by resource id / usage type in a specific component:
import pandas import numpy tag_name='user:Component' tag_value='ExpensiveComonent' raw = pandas.read_csv('DETAILED.csv') filtered = raw.loc[raw[tag_name]==tag_value][['ResourceId','UsageType','Cost']] filtered['ResourceId'].fillna('???',inplace=True) pivot = filtered.pivot_table(rows=['ResourceId'], cols=['UsageType'], values=['Cost'], aggfunc=numpy.sum, margins=True) # Newer pandas - see above pivot.to_csv('OUT_FILE.csv')
Pivot component/usage type, use regex to select usage type, only show cells that cost money, and show Usage amount instead of cost
import pandas import numpy tag_name='user:Component' usage_regex='.*-Out-Bytes$' raw = pandas.read_csv('COST_ALOCATION.csv', skiprows=1) raw_nonna = raw.dropna(subset=['UsageType']) # Drop lines without usage type filtered = raw_nonna.loc[raw_nonna['UsageType'].str.match(usage_regex)] filtered[tag_name].fillna('???',inplace=True) filtered = filtered.loc[filtered['Cost'] > 0] pivot = filtered.pivot_table(rows=[tag_name], cols=['UsageType','Rate'], values=['UsageQuantity'], aggfunc=numpy.sum, margins=True) #TODO complete pivot.to_csv('OUT_FILE.csv')