Note: This seems to be partially broken at the moment. :(

Have you ever wondered which games are available for both Linux and the Dreamcast? No? Well, now you don’t have to!

To keep myself sane over the Christmas holidays, I wrote a little script to scrape Wikipedia category pages and find the pages that two categories have in common. You can specify more than two categories, and it will compute the two-way intersection for each pair. For instance: http://csclub.uwaterloo.ca/~rcfox/wiki-intersect.pl?Linux_games&Dreamcast_games&Quake
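To make the "each pair" part concrete, here is a minimal Python sketch of the pairwise intersection step. It is not the actual script (that one is Perl and scrapes HTML). The function name `pairwise_intersections` and the sample titles are made up for illustration, and the category contents are hard-coded instead of fetched.

```python
from itertools import combinations


def pairwise_intersections(categories):
    """Map each pair of category names to the sorted titles they share.

    `categories` maps a category name to an iterable of page titles.
    """
    return {
        (a, b): sorted(set(categories[a]) & set(categories[b]))
        for a, b in combinations(categories, 2)
    }


# Made-up sample data; a real run would fill this in from Wikipedia.
sample = {
    "Linux_games": ["Game A", "Game B", "Game C"],
    "Dreamcast_games": ["Game B", "Game C", "Game D"],
    "Quake": ["Game C"],
}

for pair, titles in pairwise_intersections(sample).items():
    print(pair, titles)
```

With three categories you get three pairs, and in general n categories give n(n-1)/2 pairs, so the output grows quickly if you pass a long list.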

This script is by no means optimal. At the time of writing, I didn’t realize that you could tell Wikipedia to just give you raw data without the HTML. Also, it seems that MediaWiki has an API for getting data, which means that you could potentially write a script to work with any MediaWiki wiki.
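For the curious, here is a rough Python sketch of what the API route could look like, using the `list=categorymembers` query module. This is not what the script does. The helper names (`build_params`, `http_get_json`, `category_members`) and the User-Agent string are my own placeholders, and the endpoint shown is English Wikipedia's; another MediaWiki wiki would presumably just need a different `api_url`.

```python
import json
import urllib.parse
import urllib.request

# English Wikipedia's endpoint; other MediaWiki wikis expose their own api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(category, cont=None):
    """Build the query parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        # Pass back the continuation values from the previous response.
        params.update(cont)
    return params


def http_get_json(params, api_url=API_URL):
    """Perform one GET request against the API and decode the JSON body."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; use something that identifies your own tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wiki-intersect-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category, fetch=http_get_json):
    """Collect every page title in a category, following continuation."""
    titles = set()
    cont = None
    while True:
        data = fetch(build_params(category, cont))
        titles.update(m["title"] for m in data["query"]["categorymembers"])
        cont = data.get("continue")
        if not cont:
            return titles
```

Calling `category_members("Linux_games") & category_members("Dreamcast_games")` would then give the shared titles. The `fetch` argument is only there so the function can be tried out with canned responses instead of hitting the network.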

Source: https://github.com/rcfox/Wikipedia-Category-Intersections




Published

06 February 2011

Tags