Tuesday, February 26, 2008

Twining: Iron Python DSL for DB Update III


My thoughts as exhibited in a little Python script now have a name: Twining. 

Unplanned illness slowed me down over the weekend, but the beauty of lambda expressions make transformations a breeze. I'll let the code document what I added.

cn = "Data Source=.\\sqlexpress;Initial Catalog=MVCBaby;Integrated Security=SSPI;"
mytrans = {"Field1":"upper","Field1": lambda x: x.upper(),"Field3": lambda x: x[::-1]}

As you can see, "mytrans" is a dictionary containing a matching of column names to both a keyword string, "upper", and also lambda expressions that model functions to operate on the constituent data (the [::-1] idiom reverses a string).  I'd add a few keywords for simple things people often do to data but the beauty of lambda expressions is that it keeps my code lazy and empowers whoever is using the script to make their own judgement about what to do with data.

Here's what goes on behind the scenes - the copyto class has a checkAndTransform method that matches up a field value with it's "transform":

	def checkAndTransform(self, field, value):
if registrar.transforms.has_key(field):
if repr(type(registrar.transforms[field])) == "<type 'function'>":
value = registrar.transforms[field](value)
if registrar.transforms[field] == "upper":
value = value.upper()
return value

I'm not Pythonic enough yet to know if I need to call repr and do a string comparison when testing if a type is a function, but it works just fine.

So a few thoughts:

1. What are the usual changes we make to data when exporting or importing?
2. Is it stupid to have shortcut strings or should I just fall back on writing lambdas?
3. Is there a better way of testing for a function than
repr(type(registrar.transforms[field])) == "<type 'function'>"

The to-do list from my previous post is still intact, I'll be working through it in the near future.  Have a look at the complete script and if you have thoughts or opinions, please let me know. A big goal here is not only to make something useful, but to learn Python (and be Pythonic, the two don't necessarily coincide).

Here's a full list of what can be done at this point:

mytrans = {"Field1":"upper","Field2": lambda x: x.upper(),"Field3": lambda x: x[::-1]}

database(cn).table("MyTable").transform(mytrans).copyto.csv("c:\\foo.csv") #with transform
database(cn).table("MyTable").copyto.csv("c:\\foo.csv") #without transform


Thursday, February 21, 2008

Iron Python DSL for DB Update II


Attepting releasing early & often - here are a few new features to the ideas I'm putting together for the Database specific DSL.  I'll summarize all that's possible with it now with the new stuff highlighted in red.

#some connection strings
cn = "Data Source=.\\sqlexpress;Initial Catalog=MVCBaby;Integrated Security=SSPI;"
cn2 = "Data Source=.\\sqlexpress;Initial Catalog=Irvine;Integrated Security=SSPI;"

#some things to do

The overall idea is to build a language centric, developer oriented implementation of what is done in a tool like SSIS or its predecessor DTS - I'm using SQL Server for a start but there's no reason this can't include other database platforms. It's quite exciting that even without knowing much Python I can get that much done in the language.  I'll spend time refactoring and reorganizing the script I've started with, but please feel free to download and modify it. Please post any improvements or ideas for how I can improve the Python and added any other features to the dsl. (Point on using __getattr__ well taken).

My todo list?
1. Aforementioned "transforms" for data in migration
2. Exception handling :^)
3. Copying tables is a rough sketch but needs a lot of work
4. Export formats besides CSV and XML: arbitrary delimiters and fixed field length
5. __getattr__ :^)


Tuesday, February 19, 2008

Iron Python DSL for ETL


Hanselman wrote about "fluent" interfaces recently which confirmed something I'd been thinking for a while. What if rather than massive tools we just had a language to express ourselves when we wanted to move data around?  What if we could say something like:

database("connection string").table("table").copyto.csv("C:\\spam\\eggs.csv")
database("connetion string").table("table").copyto.database("connection string 2")

In the spirit of releasing early and often, here's an implementation of the above language for sql server with Python. I'll be extending the flexibility quite a bit, and my Python skills are still those of a learner, but here it is for comment.  Before I post the whole script, here are a few goals:
1. XML data:
database("connection string").table("table").copyto.xml("C:\\spam\\eggs.xml")
2. Transforms:
transformDictionary = {'column1':'upper', 'column2':'some regex'} #these could become a fluent interface of their own.
database("connection string').table("table").withTransform(transformDictionary).copyto.csv("C:\\spam\\eggs.csv")

So here's that rather unrefined code download the latest implementation

Below was just some scratchwork from a Friday afternoon when I started sketching:

# 02/08/2008 - David Seruyange - @FRIDAY BURNOUT
import clr
from System import *
from System.Data import *
from System.Data.SqlClient import *
from System.IO import *

class registrar:
tableRegister = ""
databaseRegister = ""

class database:
"Model of a database somewhere"
def __init__(self, connectString):
"Initialize the rectangle"
self.connect = connectString
registrar.databaseRegister = connectString
self.copyto = copyto()

def table(self, t):
registrar.tableRegister = t
return self

class copyto:
"Model of a copy of some object"
def csv(self, path):
""" Save the result to a CSV file """
cn = SqlConnection(registrar.databaseRegister)
cmd = SqlCommand("select * from " + registrar.tableRegister, cn)
r = cmd.ExecuteReader()
fileOut = open(path, 'w')
while r.Read():
sOut = ""
for i in range(r.FieldCount):
sOut += str(r[i]) + ","
sOut += "\n"

def database(self, connect):
(ddl, colInfo) = buildDDL(registrar.tableRegister)
cnTarget = SqlConnection(connect)
cmdBuildTargetTable = SqlCommand(ddl, cnTarget)
cmdBuildTargetTable.ExecuteNonQuery() #builds the target table
cnSource = SqlConnection(registrar.databaseRegister)
cmdSourceIter = SqlCommand("select * from " + registrar.tableRegister, cnSource)
sourceReader = cmdSourceIter.ExecuteReader()
insertCode = ""
insertTemplate = "INSERT INTO " + registrar.tableRegister + " VALUES("
for i in range(sourceReader.FieldCount):
insertTemplate += "'" + str(sourceReader[i]) + "',"
insertCode += insertTemplate[:-1] + ")\n"
cmdExecuteInsert = SqlCommand(insertCode, cnTarget)
# "save: %s\n table: %s\n to path: %s" % (registrar.databaseRegister, registrar.tableRegister, connect)

def buildDDL(tName):
query = """
c.[name], c.max_length, c.[precision], c.scale,
ty.[name] as typename
sys.columns c
join sys.tables t on c.object_id = t.object_id
join sys.types ty on c.system_type_id = ty.system_type_id
t.[name] = '__table__'
query = query.replace('__table__', tName)
cn = SqlConnection(registrar.databaseRegister)
cmd = SqlCommand(query, cn)
r = cmd.ExecuteReader()

ddl = """

columnsDefinition = ""
colTypeInfo = []
while r.Read():
#todo: support for other column types such as numeric or decimal
columnTemplate = columnTemplate.replace('COLUMN_NAME', str(r[0]))
columnTemplate = columnTemplate.replace('DATATYPE', str(r[4]))
if str(r[4]) == "varchar":
columnTemplate = columnTemplate.replace('LENGTH', "(" + str(r[1]) + ")")
elif str(r[4]) == "char":
columnTemplate = columnTemplate.replace('LENGTH', "(" + str(r[1]) + ")")
columnTemplate = columnTemplate.replace('LENGTH', "")
columnsDefinition += columnTemplate + "\n"

ddl = ddl.replace('__xyz__', tName).replace('__columns__', columnsDefinition).replace(",\n)", "\n\t)")
return [ddl, colTypeInfo]

cn = "Data Source=.\\sqlexpress;Initial Catalog=MVCBaby;Integrated Security=SSPI;"
cn2 = "Data Source=.\\sqlexpress;Initial Catalog=Irvine;Integrated Security=SSPI;"
# jepp
#print "---"


Monday, February 18, 2008

Release Early & Often


Paul Buchheit (who we can thank for Gmail) comments on a post by Marc Andreessen:

For web based products at least, there's another very powerful technique: release early and iterate. The sooner you can start testing your ideas, the sooner you can start fixing them.

I wrote the first version of Gmail in one day. It was not very impressive. All I did was stuff my own email into the Google Groups (Usenet) indexing engine. I sent it out to a few people for feedback, and they said that it was somewhat useful, but it would be better if it searched over their email instead of mine. That was version two. After I released that people started wanting the ability to respond to email as well. That was version three. That process went on for a couple of years inside of Google before we released to the world.

Interesting and it goes along with something Guy Kawasaki was saying in a recent podcast I commented on hearing.  So the next question comes down to technique - when you're working on something by yourself, how do you schedule builds?  I'm thinking I may start to take deliberate "coding vacations" - days I take off* from work in order to push through releases on my various little software projects.  Hm... other thoughts?

*My employer offers "comp days" so it's not at the expense of my day job.


Saturday, February 16, 2008

Check File Existence, Javascript (Prototype)


If you're like me, you spend a lot of time using System.IO and can easily check for the existence of a file on the filesystem with a File.Exists(@"C:\spam\eggs.file")

It's a little surprising that I hadn't run into it before but a few weeks ago I needed that same functionality with javascript. I found a few things online and what works best is using AJAX to make a request and check the HTTP status for whether or not the request failed.  My approach was to use Prototype and something like this:

function checkForFile(imgName, displayHolder){
new Ajax.Request('Repo/' + imgName + '.jpg',
onSuccess: function(transport){
var response = transport.status || "no response text";
if(parseInt(response) == 200){
displayHolder.src = 'Repo/' + imgName + '.jpg';
onFailure: function(){
displayHolder.src = 'Repo/NoImage.gif';

Works like a charm.




Some good things to repeat to yourself from Dormando. I've a long way to go on so many of these and was geeky enough to give myself a scorecard.

I'll spare you that (after just deleting a lot of babble that would only be interesting to me) but there are two areas that I can do with some serious improvement:

1. Relentless automation
It's shameful how often I'm doing things I could easily automate (repeated FTP sessions?)
2. Monitoring
I wonder how out of line it would be to programmatically enable "breadcrumbs" for user's activity?  Finding out what they really used and what they insisted upon for nothing.


Saturday, February 09, 2008

Spitting Code Like Lam


I started changing my Visual Studio settings after seeing Hanselman's post and thought I'd be able to spit fire like John Lam once I'd downloaded his settings (it's gotta be the shoes!). The contrast was a bit much (though working in low light helped) and after a night's worth of being like JLam I wanted to go back to (dare I admit?) VS default settings. That didn't work out too well (even using the "default settings" option in VS, go figure) so I opted for Brad Wilson's settings which I've grown to like.

When the IDE "hot or not" thing first cropped up I changed settings but what makes me boring, preferring the default, is that I'm on a lot of machines at different times; my laptop, desktop, remote on a couple of servers, and when I help others.


Monday, February 04, 2008

Grabbing SQL 2005 Blobs with IronPython


On Saturday I needed to grab some blob data and create files out of it. The following was some Python I used, inspired very much by this code.

The biggest point of interest is the strongly typed array, which in IronPython is simple once you know how: using Array.CreateInstance.



# 02/02/2008 - David - @THE COMMAND CENTRE

import clr
from System import *
from System.Data import *
from System.Data.SqlClient import *
from System.IO import *

CONNECT = "Data Source=(local)\sqlexpress;Initial Catalog=DATABASE;Integrated Security=SSPI"

sql = "select ID_FIELD, BLOB_FIELD from product_set"
cn = SqlConnection(CONNECT)
cmd = SqlCommand(sql, cn)
reader = cmd.ExecuteReader(CommandBehavior.CloseConnection)
buffSize = 100
buff = Array.CreateInstance(Byte, buffSize)
recCounter = 0
while reader.Read():
print "Creating " + str(reader[0])
fs = FileStream(str(reader[0]) + str(recCounter) + "_diagram.png", FileMode.OpenOrCreate, FileAccess.Write)
bw = BinaryWriter(fs)
if(reader[1] != DBNull.Value):
posit = 0
b = reader.GetBytes(1, posit, buff, 0, buffSize)
while(b == buffSize):
posit = posit + buffSize
b = reader.GetBytes(1, posit, buff, 0, buffSize)
bw.Write(buff, 0, b)
recCounter = recCounter + 1


I'd like to clean it up a bit, but I've written about 5,000,000 things that I "intend" to clean up before posting and they're never posted.


What Comes First; Rocks


Oliver Steele has a new essay entitled Adding the Easy Piece; or, The Metaphor of the Rock.  Insightful as usual, and I love the visualizations.

Personally, I'm experiencing the effects of pushing a rather large rock around a room.