Metacritic is a wonderful website: we can all agree on that. However, it isn’t very practical when it comes to looking up the scores of multiple titles. I therefore decided to tackle this problem a few days ago.
My language of choice was obviously Python, and after a bit of poking around, I determined that I should use the urllib2 module already present in Python 2.7.
The first revision of the script was around 50 lines long. It was fairly straightforward: get the source of a game title’s page, find a certain string in the page, and print out the Metascore of the given title. Unfortunately, I didn’t implement any error handling, so any anomaly rendered the script useless. It also wasn’t very practical: titles had to be entered one by one, and if a mistake occurred, the whole thing broke.
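Here’s a rough sketch of that fetch-and-search idea, not the original code. It uses Python 3’s urllib.request rather than the urllib2 of Python 2.7, and the names fetch_page, find_metascore, and SCORE_PATTERN, as well as the HTML marker being searched for, are assumptions made up for illustration; the real page markup isn’t shown in this post.

```python
import re
from urllib.request import urlopen

# Hypothetical marker: the actual Metacritic markup is not shown in this post,
# so this pattern is only a placeholder for "a certain string in the page".
SCORE_PATTERN = re.compile(r'<span class="metascore">(\d{1,3})</span>')


def fetch_page(url):
    """Download a page and return its source as text (not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_metascore(page_source):
    """Return the first score matched by SCORE_PATTERN, or None if absent."""
    match = SCORE_PATTERN.search(page_source)
    return int(match.group(1)) if match else None


# Offline demonstration with a made-up snippet of HTML.
sample_page = '<html><body><span class="metascore">87</span></body></html>'
print(find_metascore(sample_page))
```

In a real run you would pass the result of fetch_page(url) to find_metascore, with the URL built from the game title and console.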
A bigger problem was slowness. The main cause was the script’s use of the default urllib2. I changed that in a later revision.
The script needed further improvement. The first step was to let the script read a file that contained titles in game:console format, one per line, and then write the titles and their respective scores to another text file. I still hadn’t added any error handling, so the script was still pretty flimsy. I also tweaked the formatting of the output to make it look a bit more professional.
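To make the game:console format concrete, here’s a small sketch of how such lines could be parsed and turned into output rows. The function names, the output layout, and the sample titles and scores are all invented for the example and aren’t taken from the actual script.

```python
def parse_titles(lines):
    """Turn 'game:console' lines into a list of (game, console) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and lines without a separator
        # rpartition splits on the last colon, so a colon inside the
        # game title itself stays part of the title.
        game, _, console = line.rpartition(":")
        pairs.append((game.strip(), console.strip()))
    return pairs


def format_results(pairs, scores):
    """Build one output line per title, in the same order as the input."""
    return ["{} ({}): {}".format(game, console, score)
            for (game, console), score in zip(pairs, scores)]


# Made-up input lines and placeholder scores.
sample_lines = ["Example Game:pc\n", "\n", "Example Game: Part Two:ps3\n"]
pairs = parse_titles(sample_lines)
for row in format_results(pairs, ["85", "Not available"]):
    print(row)
```

Since an open file can be iterated line by line, parse_titles can take a file object directly, and the formatted rows can be written to a second file with a simple loop.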
Next came error handling. I added a few try-except blocks in the code so that, when a user score was missing (as with games not yet released to the public), the script printed “Not available”. Good, but still not complete.
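A try-except fallback of this kind might look roughly like the sketch below. The pattern and the function name find_userscore are assumptions for illustration; only the “Not available” text comes from the script itself.

```python
import re

# Hypothetical marker for the user score; the real markup is not shown here.
USERSCORE_PATTERN = re.compile(r'<span class="userscore">(\d+(?:\.\d+)?)</span>')


def find_userscore(page_source):
    """Return the user score as text, or 'Not available' if it is missing."""
    try:
        # search() returns None when nothing matches, so calling .group()
        # on the result raises AttributeError for pages without a score.
        return USERSCORE_PATTERN.search(page_source).group(1)
    except AttributeError:
        return "Not available"


print(find_userscore('<span class="userscore">8.4</span>'))
print(find_userscore("<html><body>No score yet</body></html>"))
```

Catching only AttributeError keeps other problems, such as a typo elsewhere in the function, from being silently swallowed.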
The aforementioned slowness had to be fixed. A quick post on r/learnpython pointed me to one solution rather quickly: multi-threading. I didn’t go that route, though, and instead replaced urllib2 with the well-known httplib2. Loading became a bit faster.
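For the curious, this is roughly what the multi-threading route could look like. To be clear, it is not what the script does: it’s a Python 3 sketch using the standard concurrent.futures module, and fetch_score is a stand-in that returns a placeholder instead of downloading anything.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_score(title):
    """Stand-in for a real page download; returns a placeholder string."""
    return "score for " + title


def fetch_all(titles, workers=4):
    """Look up several titles concurrently using a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, even though the
        # individual calls may finish in any order.
        return list(pool.map(fetch_score, titles))


print(fetch_all(["game-a", "game-b", "game-c"]))
```

Threads help here because most of the time is spent waiting on the network, so several requests can be in flight at once.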
I also tried using a single dictionary instead of two lists, but that prevented the output from being in the same order as the input (a plain dictionary in Python 2.7 does not keep insertion order), so I went back to using lists.
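The ordering issue can be illustrated with a short sketch. It compares two parallel lists with collections.OrderedDict, which is available in Python 2.7 and keeps insertion order; the script itself stuck with lists, so the dictionary variant is shown only as an alternative, and the titles and scores are placeholders.

```python
from collections import OrderedDict

# Placeholder data, in the order it would be read from the input file.
titles = ["game-c", "game-a", "game-b"]
scores = ["70", "Not available", "85"]

# Two parallel lists: zip() pairs them up in input order.
rows_from_lists = list(zip(titles, scores))

# An OrderedDict remembers the order in which keys were inserted.
results = OrderedDict()
for title, score in zip(titles, scores):
    results[title] = score
rows_from_dict = list(results.items())

print(rows_from_lists)
print(rows_from_dict)
```

One difference to keep in mind: a dictionary keeps a single entry per key, so a title listed twice in the input would appear only once, while the lists keep both.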
The script could now output to either a file or the console itself (two separate scripts). However, I wanted a more polished and all-in-one solution.
I divided the script into four methods: file, console, csv, and scrape. File wrote the results to a file, console printed them to the console, csv wrote them to a nicely formatted CSV (which opens as a table in Excel), and scrape took care of all the dirty work, i.e. the scraping.
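As an illustration of the csv step, here’s a minimal sketch built on Python’s standard csv module. The function name write_csv, the column headers, and the sample row are assumptions for the example; the actual layout used by the script may differ.

```python
import csv
import io


def write_csv(rows, handle):
    """Write a header line plus one line per (title, console, score) row."""
    writer = csv.writer(handle)
    writer.writerow(["Title", "Console", "Metascore"])  # hypothetical headers
    writer.writerows(rows)


# Demonstration with an in-memory buffer and a placeholder row.
buffer = io.StringIO()
write_csv([("Example Game", "pc", "85")], buffer)
print(buffer.getvalue())
```

When writing to a real file in Python 3, open it with newline="" so the csv module controls the line endings itself.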
To choose which kind of output you want, you’d simply pass an argument in the console:
python main.py -argument <location of output>
In the case of outputting to the console, no argument or output location is necessary.
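One way to wire up that kind of argument handling is with the standard argparse module, sketched below. The flag names --file and --csv are invented for this example, since the post only shows a generic -argument placeholder, and choose_output is a hypothetical helper.

```python
import argparse


def build_parser():
    """Create a parser with two optional, mutually exclusive output flags."""
    parser = argparse.ArgumentParser(description="Look up scores for several titles.")
    group = parser.add_mutually_exclusive_group()
    # Hypothetical flag names; the real script's arguments may differ.
    group.add_argument("--file", metavar="PATH", help="write results to a text file")
    group.add_argument("--csv", metavar="PATH", help="write results to a CSV file")
    return parser


def choose_output(argv):
    """Return (mode, path); with no flags, fall back to console output."""
    args = build_parser().parse_args(argv)
    if args.file:
        return ("file", args.file)
    if args.csv:
        return ("csv", args.csv)
    return ("console", None)


print(choose_output(["--csv", "scores.csv"]))
print(choose_output([]))
```

In a real script you would call choose_output(sys.argv[1:]) and then hand the scraped results to the matching output method.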
That’s enough typing. Here’s the project’s repo on GitHub:
I’ll be making a few modifications now and then, so please follow me there.
Anyhow, I hope this post was helpful in some way or another. If you have any questions or suggestions, feel free to reply here or contact me on Twitter.