Power Query 101: Get NHL Standings Data with Power Query

Last week at the PASS Summit 2014, I attended Matt Masson and Tessa Palmer’s session on Power Query Deep Dive. One of Tessa Palmer’s demo was a shout out to the Vancouver Canucks and how she has kept track of the standings of the Canucks by using Power Query.

PASS Summit 2014 - Power Query Deep Dive

That demo really piqued my interest (I like the Canucks too, but they keep on breaking my heart), so I wanted to try it out myself. This is my first stab at it, so I will probably look back at this post a few months from now and see where I could improve the process.

1.First thing to do, after installing and setting up Power Query, is to get the URL for the NHL standings site. The division standings can be found in: http://www.nhl.com/ice/standings.htm?type=DIV#&navid=nav-stn-div

The data is deeply embedded and formatted in HTML tables, as can be seen in the screenshot below. Getting this data pre-Power Query would have been possible, but would have required some nifty programming Smile

nhl html tables

2.Open Excel and go to the Power Query tab. Since we have the URL, we can go ahead and use the “From Web” ribbon and paste in the URL http://www.nhl.com/ice/standings.htm?type=DIV#&navid=nav-stn-div

nhl from web

Once you click on OK, you will notice in your Navigator pane on the left a number of tables retrieved by Power Query. There’s Table 0, which corresponds to the first table in the web page for Eastern Conference – Atlantic. Table 1 is also Eastern Conference – Metropolitan. Table 2 is Western Conference – Central, and Table 3 is Western Conference – Pacific.

nhl tables preview

3.Check the checkbox to select multiple items, and select Table 0 to 3. Click on Load.

nhl load

4.Once the tables are loaded, click on Append. Set Table 0 as primary, and append the other tables to it, starting with Table 1.

nhl append

When you click OK, a query editor pops up. Just click on Close & Load to load this to the spreadsheet.

5.To load the next tables, you will need to set the primary table to the final result set that was returned. You should see this result set under the Workbook Queries. In the screenshot below, it’s the “Append1” query.

nhl append1 table2

Once all the tables are loaded, you will see that some of the fields are formatted weird, so we need to do some cleanup.

nhl need cleanup

6.Double click on Append3 from Workbook Queries pane to edit the data set.
6a. Rename the leftmost column. These are the current standings, so we will just call this Standing.
6b. Add two new columns called Conference and Division.

Since in the Query Editor we cannot manually change values, we can temporarily assign a value first for the new columns. After we Load & Close, we can adjust the values from the spreadsheet.

nhl add column

After the columns are added, Load & Close and adjust the values from the spreadsheet.

Recall:
Table 0 Eastern Conference – Atlantic
Table 1 Eastern Conference – Metropolitan
Table 2 Western Conference – Central
Table 3 Western Conference – Pacific

For the Conference column, the value of the first two sets of 8 would be “Eastern” and the value of the next two sets (one with 8, another with 7) would be “Western”
For the Division column, these have been imported as column headers. We need to use these as values for the rows.

(I am sure there is a better way to automate this, but  I will leave that to a later post, once I get more familiar with Power Query and M)

Your final worksheet should look like this:

nhl excel

It is now ready to be visualized!

Side note

Matt Masson just posted a Notepad++ language file for M (Power Query’s formula language). Will come in quite handy when working with M.

Advertisements

One comment

  • I think below code that will be short than:

    let
    Source = Web.Page(Web.Contents(” http://www.nhl.com/ice/standings.htm?type=DIV#&navid=nav-stn-div“)),
    FilterRows = Table.SelectRows(Source, each ([ClassName] = “data standings Division”)),
    AddDivision = Table.AddColumn(FilterRows, “Division”, each Table.ColumnNames( [Data] ) {1}),
    RemoveCol = Table.RemoveColumns(AddDivision,{“Caption”, “Source”, “ClassName”, “Id”}),
    Rename = Table.TransformColumns( RemoveCol, { “Data”, each Table.RenameColumns( _, {Table.ColumnNames(_) {1}, “Team”} ) } ),
    ExpandData = Table.ExpandTableColumn(Rename, “Data”, {“Team”, “GP”, “W”, “L”, “OT”, “P”, “ROW”, “GF”, “GA”, “Diff”, “Home”, “Away”, “S/O”, “L10”, “Streak”}, {“Team”, “GP”, “W”, “L”, “OT”, “P”, “ROW”, “GF”, “GA”, “Diff”, “Home”, “Away”, “S/O”, “L10”, “Streak”}),
    ReorderedCol = Table.ReorderColumns(ExpandData,{“Division”, “Team”, “GP”, “W”, “L”, “OT”, “P”, “ROW”, “GF”, “GA”, “Diff”, “Home”, “Away”, “S/O”, “L10”, “Streak”})
    in
    ReorderedCol

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s