Wednesday, July 15, 2009

Musical Fun With YQL

Vernian Process have made a large part of their back catalog available for free download from last.fm (63 MP3 files' worth) [1]. As someone who never says no to free dark ambient, I thought I would quickly screen-scrape the free download URLs from the 6-month chart page and feed them into wget instead of downloading them manually [2].

For the screen scraping, I turned to Yahoo's YQL service. In effect, YQL makes the web "programmable" with a SQL-like language. One of YQL's features is that it can use any webpage as a data source (it also supports various web APIs, like Flickr's, but all I need for this is its HTML ability). Here's the YQL query:


select * from html
where url="http://www.last.fm/music/Vernian+Process/+charts?rangetype=6month&subtype=tracks"
and xpath="//a[starts-with(@href, 'http://freedownloads.last.fm')]"


That will return all the free download anchors contained on the page.
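
To make it clearer what that XPath expression selects, here is a rough local equivalent in Python. It is only a sketch using the standard library's HTMLParser rather than YQL or a real XPath engine, and the sample markup, the class name FreeDownloadLinks and the helper free_download_links are made up for illustration; the real chart page will look different.

```python
from html.parser import HTMLParser

# Same prefix as in the xpath starts-with() test above.
PREFIX = "http://freedownloads.last.fm"


class FreeDownloadLinks(HTMLParser):
    """Collect the href of every <a> tag whose href starts with PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Anchors without an href, or pointing elsewhere, are skipped.
        if href and href.startswith(PREFIX):
            self.links.append(href)


def free_download_links(page_html):
    """Hypothetical helper: return matching hrefs in document order."""
    parser = FreeDownloadLinks()
    parser.feed(page_html)
    parser.close()
    return parser.links


# Hypothetical sample markup, not the real last.fm chart page.
SAMPLE = """
<ul>
  <li><a href="http://freedownloads.last.fm/download/1/Track+One.mp3">Free MP3</a></li>
  <li><a href="http://www.last.fm/music/Vernian+Process">Artist page</a></li>
  <li><a name="top">No href here</a></li>
</ul>
"""

print(free_download_links(SAMPLE))
```

Only the first anchor in the sample survives the filter, which is the same effect the starts-with() predicate has in the query.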

It was then a single command to download all the tracks (thanks to curl and wget):

curl "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.last.fm%2Fmusic%2FVernian%2BProcess%2F%2Bcharts%3Frangetype%3D6month%26subtype%3Dtracks%22%20and%20xpath%3D%22%2F%2Fa%5Bstarts-with(%40href%2C%20'http%3A%2F%2Ffreedownloads.last.fm')%5D%22&format=xml" | wget --force-html -i -
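
The long URL in that command is just the YQL query from earlier, percent-encoded and passed as the q parameter. As a sketch of how it could be assembled (the function name build_yql_url is made up, and the choice of which characters to leave unescaped is my assumption, picked so the result looks like the URL above):

```python
from urllib.parse import quote

# Public YQL endpoint, as used in the curl command.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

# The query from the post, joined onto one line with single spaces.
QUERY = (
    'select * from html '
    'where url="http://www.last.fm/music/Vernian+Process/+charts'
    '?rangetype=6month&subtype=tracks" '
    'and xpath="//a[starts-with(@href, \'http://freedownloads.last.fm\')]"'
)


def build_yql_url(query, fmt="xml"):
    """Hypothetical helper: percent-encode the query and append the format."""
    # Assumption: leave * ( ) ' unescaped, as in the hand-written URL;
    # everything else outside letters, digits and _.-~ is escaped.
    encoded = quote(query, safe="*()'")
    return f"{YQL_ENDPOINT}?q={encoded}&format={fmt}"


print(build_yql_url(QUERY))
```

Note that the + signs in the last.fm address have to become %2B, otherwise they would be read as spaces once the query string is decoded.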


1. They've done it because they've changed styles and have gone more Industrial.

2. In case you're wondering, I used the Chart and not the Free Tracks page as the Free Tracks page only had 4 of the freely downloadable tracks.
