Speaking my (programming) language?: ruby

Friday, June 27, 2008

Ruby for mp3 file organizing

So there we were, my friend Oliver and I, stuck on a business trip. We were already bored of the only decent pub in the village and our families were a long way away, so what's a developer to do? Code in a programming language that isn't allowed on the job, of course! Oliver seemed interested in Ruby and I had already written a couple of small scripts in it, so we were curious to see what the fuss was about.

It goes without saying that if you want to learn to program (in a particular language) you should not rely too much on books. The only way is to find a task you want to have automated, and then code it using your language of choice. Surely, you must pick the task (or the language) carefully, since not all languages are suitable for all tasks.

One of the things Oliver had struggled with was organizing all of the podcasts on his player, sorted neatly into directories by author and title. Having found both a hammer and a nail, we were ready to start pounding.

After a bit of research we found the mp3info and id3tag Ruby libraries. id3tag had different fields for ID3v1 and ID3v2 data and didn't have write support (not that we needed it). mp3info didn't have ID3v2.2 support, but I found an interesting link about ID3 internals; the format of the fields was something that could be useful.

After a while our pair programming session reached a milestone: our script worked. It didn't seem very modular though, so we spent some time making classes and discussing the responsibility of each class. Should there be a manager class? Or should the objects manage themselves? I went with the second approach, and here's the result:


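require 'fileutils'  # FileUtils.mkdir_p and FileUtils.cp, used in transfer
require 'mp3info'    # Mp3Info, from the mp3info library mentioned above

# Class for handling information of the mp3 file
class Mp3File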
  attr_reader :title, :artist, :album

  def initialize filename
    @artist = @album = "unknown"
    @filename = filename
    @title = File.basename(filename, ".mp3")

    read_attributes
  end

  def title
    sanitize(@title)
    if @title == "unknown" then @title = File.basename(@filename, ".mp3") end
    @title
  end

  def read_attributes
    begin
      Mp3Info.open(@filename) do |mp3info|
        (@title, @artist, @album) = %w{title artist album}.collect { |attrib|
          begin
            (result = mp3info.tag.send(attrib)).empty? ? "unknown" : result
          rescue
            "unknown"
          end
        }
      end
    rescue
    end
  end

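  # Replaces runs of "?" or "'" with a single "_", modifying str in place.
  # tr_s! returns nil when nothing changed, so callers should keep using str
  # itself rather than the return value.
  def sanitize str
    str.tr_s!("?'","_")
  end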

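  # newPath is a template such as '/tmp/music/#@artist/#@title.mp3'. eval
  # interpolates the instance variables into it, so pass trusted templates only.
  def transfer(newPath)
    title  # make sure @title is sanitized and has its fallback before it is interpolated
    newPath = eval('"' + newPath + '"')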
    FileUtils.mkdir_p File.dirname(newPath)
    FileUtils.cp @filename, newPath
  end
end

This is the class which is initialized with the location of the file and then extracts information about the artist, album and title. The read_attributes method is meant to show off our new knowledge about the dynamic nature of Ruby: we build a list of methods to invoke on the Mp3Info tag, and if one gives no meaningful result, we fall back to "unknown". Finally, as the class knows about the current location and the mp3 meta-info, it has a method for copying the file to a new location. The new path is passed as a template, in which #@artist, #@album and #@title are substituted with the values of these fields.
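To play with these two ideas in isolation, here is a minimal sketch. FakeTag, read_fields and expand_path_template are names made up for this illustration and are not part of mp3info. It also swaps the eval from the script for Kernel#format with named references, which is a different technique from the one we used.

```ruby
# Stand-in for the tag object a tag library would return. The field names
# mirror the ones used above, but the Struct itself is hypothetical.
FakeTag = Struct.new(:title, :artist, :album)

# Call each named reader via send and fall back to "unknown" when the value
# is nil or empty, or when the reader raises.
def read_fields(tag, names = %w[title artist album])
  names.map do |name|
    begin
      value = tag.send(name)
      (value.nil? || value.empty?) ? "unknown" : value
    rescue StandardError
      "unknown"
    end
  end
end

# Fill a template such as "/tmp/music/%{artist}/%{title}.mp3" using
# Kernel#format with named references (an alternative to eval).
def expand_path_template(template, fields)
  format(template, fields)
end

title, artist, album = read_fields(FakeTag.new("Episode 12", "", nil))
path = expand_path_template("/tmp/music/%{artist}/%{album}/%{title}.mp3",
                            title: title, artist: artist, album: album)
puts path  # prints /tmp/music/unknown/unknown/Episode 12.mp3
```

With format the template is never executed as Ruby code, so it could safely come from a configuration file or the command line.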


class Mp3List

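  # Wraps each matching path in an Mp3File on every call. A separate
  # attr_reader for :files is not needed, since this method replaces it.
  def files
    @files.map {|file| Mp3File.new(file) }
  end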

  def initialize(sourcePath, days = 7)
    @sourcePath = sourcePath
    @days = days
    @files = read_new
  end

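  # Collects every .mp3 below @sourcePath that was modified within the last
  # @days days. test(?M, path) returns the file's modification time.
  def read_new
    Dir["#@sourcePath/**/*.mp3"].find_all do |path|
      test(?M, path) > (Time.now - (@days * 60 * 60 * 24))
    end
  end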

  def to_s
    @files.inspect
  end
end

Here comes the class which represents a list of mp3 files in a certain directory (and its subdirectories) that satisfy some criterion, in this case how long ago the files were modified. Could it be made more general? Certainly, but in an 80-line script? Maybe next time.
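One way the generalization could look, as a rough sketch only: let the caller pass the selection rule as a block. select_mp3s and modified_within are hypothetical helpers invented for this example, and the temporary directory exists only to make it self-contained.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical generalisation of Mp3List#read_new: the caller supplies the
# selection rule as a block instead of a fixed "modified in the last N days".
def select_mp3s(source_path, &criteria)
  Dir[File.join(source_path, "**", "*.mp3")].select(&criteria)
end

# Rebuilds the original rule as a reusable predicate.
def modified_within(days)
  cutoff = Time.now - days * 24 * 60 * 60
  ->(path) { File.mtime(path) > cutoff }
end

Dir.mktmpdir do |dir|
  fresh = File.join(dir, "fresh.mp3")
  stale = File.join(dir, "old", "stale.mp3")
  FileUtils.mkdir_p(File.dirname(stale))
  FileUtils.touch([fresh, stale])
  # Backdate one file by 30 days so that the filter has something to reject.
  month_ago = Time.now - 30 * 24 * 60 * 60
  File.utime(month_ago, month_ago, stale)

  recent = select_mp3s(dir, &modified_within(7))
  puts recent.map { |path| File.basename(path) }.join(", ")  # prints fresh.mp3
end
```

Any other rule, such as file size or a name pattern, would then be just another block, and the list class would not need to change.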


  list = Mp3List.new("/home/whoami/Music", 730)
  list.files.each do |mp3|
    #~ puts "Processing #{filename}"
    mp3.transfer('/tmp/music/#@artist/#@album/#@title.mp3')
  end

What was left was an example of how to use these classes. Seems good to me, and best of all, it works.

The only thing left was to prepare a patch adding ID3v2.2 support to the mp3info library. I actually implemented one (still not incorporated upstream), and it also initializes the common fields with either the v2 or the v1 data, whichever is present (v2 takes precedence if both are present).
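The precedence rule itself is simple enough to sketch. This is not the patch, which isn't shown here; common_fields and the plain hashes standing in for parsed ID3v1 and ID3v2 data are assumptions made for the illustration.

```ruby
# Hypothetical tag hashes stand in for parsed ID3v1 and ID3v2 data.
# For each common field, prefer the v2 value and fall back to v1;
# fields that are missing or empty in both versions are left out.
def common_fields(v1, v2, names = %w[title artist album])
  names.each_with_object({}) do |name, merged|
    value = [v2[name], v1[name]].find { |v| v && !v.empty? }
    merged[name] = value if value
  end
end

v1 = { "title" => "Old title", "artist" => "Someone" }
v2 = { "title" => "New title", "album" => "" }
p common_fields(v1, v2)
```

Here the title comes from v2, the artist from v1, and the album is dropped because neither version has a usable value.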

Conclusions from our short session:

Ruby is neat for quick hack jobs

mp3info does not provide exhaustive ID3 support, but it is good enough and workable

Pair programming might not be smooth from the start, but you will learn a lot about yourself

Organizing your music can sometimes take longer than the total time you would spend looking for your tracks

You should choose your business trip accommodation carefully if you can

Friday, April 18, 2008

Twitter: do you follow me?

This week's hacking task was to implement a "follow all" function for Twitter.

Even for Twitter users, this needs some explanation: the follow functionality now means "enable notifications". However, the command interface in IM/SMS wasn't changed, so the command name remains "follow". For brevity, I will use the word "follow" instead of "enable notifications".

The reason for having this command is that there used to be a function "follow all" in Twitter. It used to instantly turn on notifications for all your friends (users you're following, in the new terminology). Now there's a user called "all" and the function doesn't work (ok, maybe that's not the real reason). This put an end to a very useful feature for users who often rely on the Twitter IM integration.

Having had a quick look at the Twitter API, it seemed pretty straightforward to fetch all users and enable notifications for them one by one. It would be fairly slow, because the user list carries no information about whether notifications are already enabled for a user. Having that would have eliminated the need to send requests for users for whom notifications are already on. Ah well...

The first tool I reach for in my toolbox is Ruby. I tried using JSON, but had to give up because I simply couldn't get past the Unicode issues:

/usr/lib/ruby/1.8/json.rb:288:in `chr': 1090 out of char range (RangeError)
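I didn't dig into json.rb, so this is only a guess at what the message means: it looks like chr being called on a code point above 255. A small sketch of that failure and of two standard ways to build the character (1090 is U+0442, a Cyrillic letter):

```ruby
code_point = 1090 # the value from the error message (U+0442)

begin
  # With no encoding argument and no default internal encoding set,
  # chr only accepts single-byte values.
  code_point.chr
rescue RangeError => e
  puts "chr failed: #{e.message}"
end

# Either of these builds the UTF-8 character instead.
from_pack = [code_point].pack("U")
from_chr  = code_point.chr(Encoding::UTF_8)
puts from_pack == from_chr
```

The encoding argument to chr needs a Ruby newer than the 1.8 in the error message, while pack("U") should work on both.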

It turned out to be much smoother with REXML, which really is a superior library for XML processing (Python's are either easy or full-featured; REXML seems to be both).

I initially took the path of using 'open-uri' for fetching the data over HTTP. After all, it handles even HTTP basic authentication and abstracts away the nitty-gritty details, so it was easy to use.

But it isn't meant to be used for more fine-grained control, and I soon ran into performance problems which required special treatment. I found that I quickly exhausted the rate limit of the Twitter API: it's only 70 requests per hour, and with one request per user... you get the picture. The web interface wasn't actually subject to such restrictions, so I wanted to check how it does it. It uses a slightly different URL, which worked like a charm, and rate limits seemed to be no problem now!
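To put a number on "you get the picture", here is a back-of-the-envelope sketch. The friend count of 250 is made up, and the page count follows the paging loop of the script at the end of this post (100 users per page, stopping at the first page that isn't full).

```ruby
RATE_LIMIT_PER_HOUR = 70  # figure quoted above
PAGE_USERS = 100          # page size used by the script at the end of the post

# One request per friend, plus the list pages fetched until a page comes
# back with fewer than PAGE_USERS entries.
def api_requests_needed(friend_count)
  list_pages = friend_count / PAGE_USERS + 1
  list_pages + friend_count
end

# Number of one-hour rate-limit windows needed to send all requests.
def rate_limit_windows(friend_count)
  (api_requests_needed(friend_count) / RATE_LIMIT_PER_HOUR.to_f).ceil
end

friends = 250 # hypothetical
puts "#{api_requests_needed(friends)} requests, #{rate_limit_windows(friends)} hourly windows"
# prints 253 requests, 4 hourly windows
```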

This time, though, the script ran much longer: 80 seconds compared to about 30 before the change. I analyzed the requests and found out that each one received a 302 response, redirecting back to the home page. That meant that open-uri was downloading the whole home page for each user!

At that point open-uri had to go and make way for Net::HTTP. The rewrite took more lines, but now I had the choice not to follow redirect responses. I only needed to toggle notifications and didn't care what I got back (as long as it wasn't an error code). In addition, I could reuse the same Net::HTTP object, meaning the same HTTP keep-alive connection (not sure if open-uri can do this).
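The "as long as it's not an error code" part can be made explicit. A minimal sketch, in which toggle_succeeded? and toggle_all are hypothetical helpers and the host and paths are whatever the caller passes in:

```ruby
require "net/http"

# Treat any 2xx or 3xx answer as success. Net::HTTP does not follow a
# redirect on its own, so a 302 costs one small response, not a whole page.
def toggle_succeeded?(response)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
end

# Send every path over the single connection opened by Net::HTTP.start and
# return [path, succeeded] pairs. host and paths are supplied by the caller.
def toggle_all(host, paths, user, pass)
  Net::HTTP.start(host) do |http|
    paths.map do |path|
      request = Net::HTTP::Get.new(path)
      request.basic_auth(user, pass)
      [path, toggle_succeeded?(http.request(request))]
    end
  end
end
```

Nothing here is called at load time, so it makes no network requests until toggle_all is invoked with a real host.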

And here's the result: dirty, but still quick. You can set the action to "follow" or "leave" (to disable all notifications), and you need to fill in the user and password. Turning the configuration options into command-line arguments is left as an exercise to the reader.

#!/usr/bin/env ruby

require 'uri'
require 'net/http'
require 'rexml/document'
include REXML

user = "lazyuser"
pass = "notmypassword"
action = "follow"
PAGE_USERS = 100

Net::HTTP.start("twitter.com") do |http|
    page = 0
    begin
        page += 1
        req = Net::HTTP::Get.new("/statuses/friends.xml?lite=true&page=#{page}")
        req.basic_auth(user, pass)

        doc = Document.new(http.request(req).body)
        ids = doc.elements.to_a("/users/user/id")
        ids.each do |entry|
            req_follow = Net::HTTP::Get.new("/friends/#{action}/" + entry.text)
            req_follow.basic_auth(user, pass)
            http.request(req_follow)
        end
    end while ids.size == PAGE_USERS
end
