Extracting e-mails from a vCard file with Python

Let’s say that you have a vCard file. You can export it from your Mac OS X AddressBook.app, or from any other similar application. Now you need to extract some information from it, namely the e-mails, for spamming your friends with some boring news. Typical.

Enter vobject. This Python library is part of the Chandler effort (which seems to be somewhat ill-fated since Mitch Kapor announced he was leaving the project). Anyway, you can download this library from here and then install it using the common sequence:

python setup.py build
python setup.py install

Finally, here’s a bit of code to quickly extract the names and e-mail addresses from a vCard file called “vCards.vcf” containing lots of vCard instances, one after the other (AddressBook.app exports data this way, instead of creating one file per contact):

import vobject

f = open("vCards.vcf")
s = f.read()
f.close()

a = vobject.readComponents(s)

counter = 0

while True:
  try:
    # "next()" throws a StopIteration exception
    # when there aren't any more "Components"
    # in the stream...
    # Talk about nice flow control!
    b = next(a)
  except StopIteration:
    break
  if 'email' in b.contents:
    # count only the contacts that actually have an e-mail
    counter += 1
    # "%r" (that is, repr()) below avoids
    # unicode --> ascii exceptions
    print("%r %r" % (b.fn.value, b.email.value))

print("%d e-mails found." % counter)

The library documentation, to put it simply, isn’t as good as the library itself (see? I can be politically correct sometimes :) Maybe I missed a better way to iterate over the contents of the whole stream of vCard instances inside the file (using exceptions for that is yuck!), but then again, feel free to add your comments below as usual.
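
For what it's worth, the exception in question is StopIteration, which suggests that readComponents() hands back an ordinary Python generator, so a plain for loop should do the job without any explicit exception handling. Here is a minimal sketch under that assumption. The names extract_emails and read_cards are made up for this example and are not part of vobject. It also assumes that each component's "contents" dictionary maps lowercase property names to lists of lines, which lets it pick up every e-mail address of a contact instead of only the first one.

```python
def extract_emails(cards):
    """Return (name, [addresses]) pairs for every card that has an e-mail.

    `cards` is any iterable of vobject-style components: objects with a
    `contents` dict mapping lowercase property names to lists of lines,
    each line carrying a `value` attribute.
    """
    results = []
    for card in cards:  # the for loop deals with StopIteration for us
        lines = card.contents.get('email', [])
        if not lines:
            continue  # no e-mail on this card, skip it
        names = card.contents.get('fn', [])
        name = names[0].value if names else ''
        results.append((name, [line.value for line in lines]))
    return results


def read_cards(path):
    """Yield the components of a vCard file (hypothetical helper)."""
    import vobject  # imported here so extract_emails() works without it
    with open(path) as f:
        # assumption: readComponents() is a generator and accepts a string
        for card in vobject.readComponents(f.read()):
            yield card
```

With these two in place, something like extract_emails(read_cards("vCards.vcf")) should return the list of pairs, and its length is the number of contacts with at least one e-mail address.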