MySQL and CSV to Python Objects (Made Easy)

May 17, 2010

So, I like nice, clean code as much as the next guy, and probably even more. Good code is elegant like poetry. Bad code is gross and offensive. I’ll be honest, I’ve written my share of gross code (I’m certain there’s a special place in software development hell for people like me). Unfortunately, sometimes for one reason or another, we all get put into a position where we just need to hack something together to get a job done. For this blog post, I figured I’d share a couple of the scripts I bust out when it’s go-time.

The first class I’ll introduce is my wrapper around key-value maps (or dictionaries in python parlance) which is at the core of my object-mapping “strategy”. Maybe I’m a little lazy, but I like to avoid typing “[‘somekey’]” as much as possible, especially when “.somekey” is an option. Python allows you to intercept the call to access fields using the “__getattr__” function. In this case, I do just delegate to a dictionary lookup. If it the dictionary doesn’t contain the key, I’ll just use the __getattr__ function of the dictionary. Pretty straightforward. Also as a note, key names that cannot be used as field names won’t be accessible using the ‘.’ syntax, but you can still get them from the “[]” syntax (i.e. obj.2 doesn’t work, but obj[‘2’] does).

class DictObj ( dict ):
  "A wrapper for dictionaries that feel like py objects"
  def __getattr__(self, key):
    if key in self:
      return self[key]
        return super(DictObj, self).__getattr__(key)
    return None

And here are some examples:

d = DictObj({'ka':'Ching!'})
print d.__class__ # shows the DictObj class
print dir(d) # same functions as dict
print d # prints like a dict
print d.ka # looks up fields like an object
print d['ka'] #looks up fields like a map, too
Comma-separated value Files to Objects

The next function I like to use is a few helper for reading/writing comma-separated value files (CSV). The idea here is that if you have a CSV file with a header, we can create objects for each row with minimal coding. Each row in the file is an object, with the field names corresponding to the header. There’s also the inverse operation, taking our DictObj’s (or regular dict’s too) and generating a csv file from them. You can pass in the header you’d like to write out, or else the function will use the keys of the first dict object in the list you’ve passed in as the header. A row is written for each object in the list, with the column values supplied by essentially calling row[‘column_header’] on the row object. Don’t worry, we’ll do an example later.

import csv

def fix_name(n):
  'Fixes a string to be nicer (usable) variable names.'
  return n.replace(' ','_').replace('.','_')

def read_csv(inf, close_file=True):
  'Reads a csv file in as a list of objects'
  def from_csv_line(l, h):
    return DictObj(dict(zip(h, l)))
  iter = csv.reader(inf).__iter__()
  header = map(fix_name,
  for i in iter:
    yield from_csv_line(i, header) 
  if close_file:

def write_csv(out, objs, cols=None, close_file=True):
  'Writes a list of dicts out as a csv file.'
  def to_csv_line(obj, header):
    def lookup_or_empty(o, k, default=''):
      if k in o:
        return o[k]
      return default
    return map(lambda h : lookup_or_empty(obj,h), header)
  iter = objs.__iter__()
  output = csv.writer(out)
  first =
  if cols is None:
    cols = sorted(first.keys())
  output.writerow(to_csv_line(first, cols))
  output.writerows(map(lambda x: to_csv_line(x, cols), iter))
  if close_file:
MySQL Query to Objects

I write a lot of SQL select statements. Tons. It’s also pretty often that what I extract from the DB needs to be combined with more data from another external source. In these cases, I like to use another function that takes the results of a SQL query and generate objects for me to work with. In general, the concept is extremely similar to the CSV reader. Grab each row of the result as an object and use the header to to define the fields. Here, most of the heavy lifting is passed off to the mysql client application. You could easily use the mysql python wrapper, but in this case it was just an arbitrary choice not to. Again, don’t worry, there’s an example at the end that will pull this all together.

def db_query(host, db, query, user, pw):
  "A mysql database query to python objects"
  cmd = 'mysql -h%s -u%s -p%s -e"%s" %s' 
  f = os.popen(cmd % (host, user, pw, query, db))
  header = map(fix_name, f.readline().strip().split('t'))
  for line in f:
    yield DictObj(dict(zip(header, line.strip().split('t'))))
Pulling it all together

Ok, we’ve gone long enough. It’s time for the example. Let’s say I have 5 stocks I own, and I wanted to track the value of my portfolio over the month of April. My portfolio is stored in a database and we can use Yahoo! Finance to get the quotes (we can get the quote data as CSV’s). Ready… GO!


First, we’ll get my holdings. Since you don’t have access to the database, I’ll fake the output.

#holdings = db_query("portfolio_machine", "portfolios", "john", "********", "select symbol, quantity from holdings where userid = 12345;")
holdings = map(lambda x:DictObj({'symbol':x[0],'quantity':x[1]}), 
  [('GOOG','10'), ('AAPL', '11'), ('IBM', '20'), ('NFLX', '5'), ('F', '8')])

Then we need function to download the quotes from Yahoo! Finance. They provide a great resource to get stock quote data as CSV files that we can great objects from.

import urllib2
def get_quotes(symbol, start_y, start_m, start_d, end_y, end_m, end_d):
  url = '' 
      % (symbol, start_m-1, start_d, start_y, end_m-1, end_d, end_y)
  return urllib2.urlopen(url)

So we’ll grab the prices for the quotes and multiply by the quantity. We’d also like to reverse the list, because the quotes are ordered by descending date. I also hard-coded the date, but you obviously don’t need to.

def values(q):
  qs = reversed(list(read_csv(get_quotes(q.symbol, 2010,4,1, 2010,4,30))))
  return map(lambda x:(x.Date, float(x.Close) * int(q.quantity)), qs)

Finally, we’ll sum the values of all the stocks for every day.

def get_perf(hs):
  _1 = lambda x: x[1]
  _0 = lambda x: x[0]
  # Get the performance for each holding
  perf = map(values, holdings)
  # Sum the values together
  total_perf = map(lambda x: (_0(_0(x)), sum(map(_1, x))), zip(*perf))
  return total_perf

print "-- Holdings --"
for h in holdings:
  print '%s: %s' % (h.symbol,h.quantity)
print "-- Performance --"
for x in get_perf(holdings):
  print '%s: %s' % (x[0], x[1])

And we get:

-- Holdings --
GOOG: 10
AAPL: 11
IBM: 20
F: 8
-- Performance --
2010-04-01: 11324.71
2010-04-05: 11422.6
2010-04-06: 11414.19
2010-04-07: 11350.89
2010-04-08: 11376.74
2010-04-09: 11410.95
2010-04-12: 11477.75
2010-04-13: 11649.25
2010-04-14: 11762.24
2010-04-15: 11853.15
2010-04-16: 11369.41
2010-04-19: 11396.17
2010-04-20: 11381.32
2010-04-21: 11522.16
2010-04-22: 11599.22
2010-04-23: 11641.16
2010-04-26: 11552.03
2010-04-27: 11368.35
2010-04-28: 11374.45
2010-04-29: 11508.73
2010-04-30: 11307.65

Ship it. Long live the lambda.


Get the code here: