Tuesday, February 26, 2008

Twining: Iron Python DSL for DB Update III

{

My thoughts as exhibited in a little Python script now have a name: Twining. 

Unplanned illness slowed me down over the weekend, but the beauty of lambda expressions make transformations a breeze. I'll let the code document what I added.

cn = "Data Source=.\\sqlexpress;Initial Catalog=MVCBaby;Integrated Security=SSPI;"
mytrans = {"Field1":"upper","Field1": lambda x: x.upper(),"Field3": lambda x: x[::-1]}
database(cn).table("Photos").transform(mytrans).copyto.csv("c:\\foo.csv")

As you can see, "mytrans" is a dictionary containing a matching of column names to both a keyword string, "upper", and also lambda expressions that model functions to operate on the constituent data (the [::-1] idiom reverses a string).  I'd add a few keywords for simple things people often do to data but the beauty of lambda expressions is that it keeps my code lazy and empowers whoever is using the script to make their own judgement about what to do with data.


Here's what goes on behind the scenes - the copyto class has a checkAndTransform method that matches up a field value with it's "transform":


	def checkAndTransform(self, field, value):
if registrar.transforms.has_key(field):
if repr(type(registrar.transforms[field])) == "<type 'function'>":
value = registrar.transforms[field](value)
if registrar.transforms[field] == "upper":
value = value.upper()
return value


I'm not Pythonic enough yet to know if I need to call repr and do a string comparison when testing if a type is a function, but it works just fine.


So a few thoughts:


1. What are the usual changes we make to data when exporting or importing?
2. Is it stupid to have shortcut strings or should I just fall back on writing lambdas?
3. Is there a better way of testing for a function than
repr(type(registrar.transforms[field])) == "<type 'function'>"


The to-do list from my previous post is still intact, I'll be working through it in the near future.  Have a look at the complete script and if you have thoughts or opinions, please let me know. A big goal here is not only to make something useful, but to learn Python (and be Pythonic, the two don't necessarily coincide).


Here's a full list of what can be done at this point:


mytrans = {"Field1":"upper","Field2": lambda x: x.upper(),"Field3": lambda x: x[::-1]}

database(cn).table("MyTable").transform(mytrans).copyto.csv("c:\\foo.csv") #with transform
database(cn).table("MyTable").copyto.csv("c:\\foo.csv") #without transform
database(cn).table("MyTable").copyto.database(cn2)
database(cn).table("MyTable").copyto.xmlFieldsAsAttributes("c:\\here.xml")
database(cn).table("MyTable").copyto.xmlFieldsAsElements("c:\\there.xml")
database(cn).backupto("C:\\temp\\my_backup.bak")



}

6 comments:

Michael Foord said...

Your technique for checking if objects are functions is 'cute'. :-)

The best way is to use the types module from the Python standard library.

from types import FunctionType

isinstance(something, FunctionType)

If you want to avoid the import you can do:

def nothing: pass
FunctionType = type(nothing)

Hope that helps.

Michael Foord said...

You can also replace your lambdas with builtin functions:

str.upper
reversed

Michael Foord said...

Third and final thought (I promise...) - even better than checking for objects that are functions you can simply check if the object is callable:

if callable(something):

This is more Pythonic as it doesn't restrict you to using functions, but allows any callable object.

Unknown said...

David, you're writing code faster than I can read it and comment. :)

What fuzzyman said about using callable - good advice. Writing Pythonically, you try not to think about what an object is, but what you can do with it, because then you don't break polymorphism - if I write some funky object that doesn't look like a function to your test, but can be called, then you still want to accept it.

You can go a step further - not test at all and apply EAFP - Easier to Ask Forgiveness than Permission -

try:
value = registrar.transforms[field](value)
except TypeError:
pass

Also, "if registrar.transforms.has_key(field):" is 100% legitimate, but I like "if field in registrar.transforms" better, because it looks prettier.

Anyway, I like your approach! I'm looking at some ETL code I wrote years ago and thinking that it would be better using your dict-of-transforms idea.

Michael Foord said...

Regarding this approach:


try:
value = registrar.transforms[field](value)
except TypeError:
pass

It has a minor problem that TypeErrors raised inside the transform will then be swallowed. *Personally* I think using callable is slightly better for this situation.

David Seruyange said...

Great feedback!

I changed my function checking 'cuteness' for callable - what you say makes total sense and I ran into it while studying Mark Pilgrim's Dive Into Python.

@fuzzyman - I started with the idea of built in types but the thing that I'm loving about the lambda is that it's a generic, lazy approach that defers exactly which types of "transforms" are needed until the time of use if that makes sense.

@fuzzyman/catherine - I tried to merge both of your comments into the following compromise: I use the try/except from your comment but rather than let it pass I put a list in registrar (which is a goofy way of having a "global" space for things) called "warnings" in which I store the exceptional occurance. After something occurs I'll check the length of warnings and display anything I find in there. Something like:
except TypeError:
registrar.warnings.append("Your transform failed on field %s" % field)

I'll post the updates I've made within the next day or so.