Monday, March 03, 2008

Twining: Iron Python DSL for DB Update IV

{

long entry because it was written at an airport, just back from celebrating 3 years with my beautiful wife in Monterey, CA...

A few interesting things, but first just a summary of what can be done with what's new in red:

#connection strings
cn = "Data Source=.\\sqlexpress;Initial Catalog=DB_SOURCE;Integrated Security=SSPI;"
cn2 = "Data Source=.\\sqlexpress;Initial Catalog=DB_DESTINATION;Integrated Security=SSPI;"

#back up the database to a file
database(cn).backupto("C:\\temp\\loungin.bak")

#copy/export basic
database(cn).table("MY_TABLE").copyto.csv("c:\\spam.csv")
database(cn).table("MY_TABLE").copyto.xmlFieldsAsAttributes("c:\\spam.xml")
database(cn).table("MY_TABLE").copyto.xmlFieldsAsElements("c:\\spam.xml")

#pick your delimiter
database(cn).table("Photos").copyto.delimited("\t", "c:\\spam.tsv")

#copy to database
database(cn).table("MY_TABLE").copyto.database(cn2)

#all copy operations can leverage transform dictionary
mytrans = {"FIELD1":"upper","FIELD2": lambda x: x.upper(),"FIELD3": lambda x: x[::-1]}

database(cn).table("MY_TABLE").transform(mytrans).copyto.xmlFieldsAsAttributes("c:\\spam.xml")
database(cn).table("MY_TABLE").transform(mytrans).copyto.xmlFieldsAsElements("c:\\spam.xml")
database(cn).table("MY_TABLE").transform(mytrans).copyto.delimited("\t", "c:\\spam.tsv")
database(cn).table("MY_TABLE").transform(mytrans).copyto.database(cn2)

... and so on, see above

Some internal changes:
1. Error Handling, quite lazy for now simply stores a combination of a friendly message along with full text of the exception thrown:

class errorhandle:
    def print_error(self, text, error):
        """ Handle an exception by just printing a friendly message and then the error's guts"""
        print text
        print error

try:
    sqlhelper().run_sql(registrar.databaseRegister, sqlBackup)
except Exception, inst:
    errorhandle().print_error("Error occured during backup", inst)

2. Some Pythonic prettiness (via Catherine):

#if registrar.transforms.has_key(field): -- es muy fao!
if field in registrar.transforms:       

I do like the more idiomatic form myself.  This a perfect example of where, as a .NET developer I look around for methods and miss a beautiful idiom.

3. Right under my nose, back to idiom, were a few simple pythonic forms that took away a lot of the line noise and clutter I was producing during my initial rabid sketches of the idea. Doing multiple assignment out of a SqlDataReader is something I only thought of now, though it makes such perfect sense:

column_name, column_type, column_length, column_identity = \
    r["name"], r["typename"], str(r["max_length"]), r["is_identity"]

4. Returning tuples rather than lists.  Tuples are more effecient and as effective.  A question that's begun burgeoning now is whether excessive use of tuples is lazy when a specific type is more clear.  For example, when I read column data from a table, I need to know two things: the column name and whether it's an identity (autonumbering). Right now I'm doing the following lazy gesture:

colTypeInfo.append((column_name, column_identity)) #(column_name, column_identity) is the tuple added to the list.

But if I start to add other useful bits of info, like the datatype of the underlying column, I wonder if a class would be better since you could have something like:

colTypeInfo.append(ColumnData(column_name, column_identity, column_type)) #Where ColumnData is a type somewhere

Right now the tuple approach works fine and a class feels like overkill. For now, I'm just afraid of it bloating as needs present themselves...

5. Speaking of identity columns, I added better support for them in tables being copied.  I also now use brackets for column names since certain people like to use reserved words for columns (or don't have a choice).

6. I mentioned in a comment that I tried to reconcile feedback from both Catherine and Michael "Fuzzyman" Foord.  When I match columns for transformations I allow any TypeError to be suppressed, but also store it in a "warnings" list that I'll incorporate into the completed process.

def checkAndTransform(self, field, value):
    """ Perform transformation on fields based on either:
        a) keyword entries ("upper", "lower", etc... )
        b) lambda expressions
    """
    if field in registrar.transforms:       
        if callable(registrar.transforms[field]):
            try:
                value = registrar.transforms[field](value)
            except TypeError:
                registrar.warnings.append("Your transform failed on field %s" % field)
        elif registrar.transforms[field] == "upper": #more of this kind of stuff _maybe later.
            value = value.upper()
    return value   

As always you can have a look at the entire thing here.  It's been great getting guidance, any comment is welcome and will be studied.

What is on the list?
1. I've been using a cheap, unstudied approach to delimited files. I'll need to look at the "official" way of escaping entries.
2. Importing data
3. A few more variants of operating on databases, like "create."

}

1 comment:

Muthu SEO Expert India said...

Hi,
Nice Blog!
We provides exclusive and highly accurate Data, data coding at least of 99.995% accuracy. With complete confidentiality and high level of accuracy the customers prefer to outsource their jobs to us. We Offer services for all kinds of textual data capturing from printed matter, manuscripts, scanned images, web research etc. with high accuracy levels.