Saturday, March 29, 2008

Data Access Strategy


What data access strategy do you use in your .NET work? The question has been on the mind of Scott Hanselman it seems because the last few podcasts have covered two different strategies - LINQ and CLSA.  Tonight I noticed that the ADO guy has a poll of his own with some interesting results:

Web Poll Powered By MicroPoll


I will admit that I'm pretty consistently using ADO.NET + Datasets but more so because the project I'm working on was well under way before the release of VS 2008. I wonder of those responding how many are working on new versus older, longer projects?


Wednesday, March 26, 2008

In Defense Of Lurkers


Hanselman's 7 Blogging Statistics Rules... post was good medicine for a haphazard blogger like me. There was one portion to which I had comment:

I feel like we've (that means me and you, Dear Reader) have a little community here. When you comment, I am happy because I feel more connected to the conversation as this blog is my 3rd place. I blog to be social, not to have a soapbox. I'm even happier when the comments are better and more substantive than the post itself. I would take half the traffic and twice the comments any day. If you're a "lurker," why not join the conversation?

I've spent most of my time on the web lurking.  It started with a mailing list for people who like electronic music ("IDM") I joined while in college - in the 10 years it's been I've probably posted less than 20 messages.  But I have spent a lot of time as a curator for posts I've enjoyed as well as making notes of what music I would investigate the next time I was in a record store. 

In a similar fashion I tend to hang onto posts I enjoy and go back to them.  To out myself further as a geek I even print out the ones I'd like to focus on so that I can read them offline at a coffee house or bookstore. I file them away and when I do my occasional cleaning it's hard not to smile and pause to re-read something that was good on the first effort. It's hard to measure us lurkers because we don't vocalize our responses right away and when we do finally get them the post may have tens or hundreds of comments at which point all seems moot.  But I think as a lurker I can be a better blog consumer in the sense that I hang onto what's said a little more - it's not just a random post I skimmed in the aggregator. 

Forgive the lurker as they make sputtering or failed attempts at immediate conversation, but maybe that conversation will be like the one you have with a great dead tree publication being read and reread.




simplejson is a JSON library for Python 2.3+, courtesy of here.  A while back I goofed off with IronPython and Prototype writing a web based "shell" inerface.  I may have to return to that project although my problem wasn't really with encoding, it was with keeping context while using multiple commands (like cd .. and then dir with the new directory as your location) in a session with cmd.exe. I've been meaning to write the PoshConsole guy to ask specifically about this...


Chris Wilson interviewed on Pixel8


The interview is here, it's good stuff. Imagine a half billion users. That's a humbling thought...


Tuesday, March 25, 2008

Jeff Zeldman, Pixel8 Interview


Joel's got a big enough voice that I'm sure most people didn't miss his Martian Headphones essay which explained the futility of standards.  Jeff Zeldman, who I've always been aware of in my time lurking on the web chimes in on web standards on Craig Shoemaker's new gig, "Pixel 8."  The podcast sound is very rough but it's a good context builder for the notion of standards and what they mean.  If you've ever had to wrangle with a different look in different browsers, you appreciate standards even if they are complicated and never achieve perfection.

Just noticed that Pixel8 has a podcast with Chris Wilson. That will have to wait until tomorrow -


Unit Tests with Iron Python


"Testing is the engineering rigor of software development" - Neal Ford

Got a little tip from fuzzyman on using unittest for Unit Testing with Iron Python. Because I'm a slouch I had an old IronPython 1.1 release and kept getting a BaseException error when my tests failed.  The class has since been implemented and should work with the latest version of IronPython that you can download (I'm using a beta release of 2.0).

Here is the code from my unit tests, of course it's destined to change but I was in the mode of getting it to work with IronPython:

from Twining import *
import exceptions
import random
import unittest
import os

class TestSequenceFunctions(unittest.TestCase):

def setUp(self): = "Data Source=.\\sqlexpress;Initial Catalog=MVCBaby;Integrated Security=SSPI;"

def test_exportcsv(self):
""" Export a CSV style data """
outPath = "c:\\foo.csv"
self.assert_(os.path.exists(outPath), "File was not created")

def test_exportxmlattribstyle(self):
""" Export xml data attribute style """
outPath = "c:\\xml_as_attrib.xml"
self.assert_(os.path.exists(outPath), "File was not created")

def test_exportxmlelementstyle(self):
""" Export xml data attribute style """
outPath = "c:\\xml_as_elem.xml"
self.assert_(os.path.exists(outPath), "File was not created")

def tearDown(self):
if os.path.exists("c:\\foo.csv"): os.remove("c:\\foo.csv")
if os.path.exists("c:\\xml_as_attrib.xml"): os.remove("c:\\xml_as_attrib.xml")
if os.path.exists("c:\\xml_as_elem.xml"): os.remove("c:\\xml_as_elem.xml")

if __name__ == '__main__':

You can download this and the latest version of things from the new Twining page


Monday, March 24, 2008

Twining Updates


What's the big picture and why are you doing it?

A comment from my last post got me thinking I need to get back to the big picture. Robert wrote:

Wow. That looks like a hard way to solve your problem.
If you absolutely have to mangle Excel data through Python and into a database... can I recommend Resolver? It won't cost you anything and it might just solve your problem.

I'll post about Resolver in a second but I want to go back to my original thoughts on developing Twining. I spend a lot of time doing quick and dirty types of data import and export.  Sometimes it's grabbing stuff off a production system so that I can develop off of it. Other times we get a simple file from someone and we need to pull it into the database.  Still other times we move data from one location to the next because we've done staging or cleanup. 

Traditionally the Microsoft tools for this are quite heavy.  DTS, which I used a lot, and SSIS which I haven't used much but have heard and read about are both GUI/IDE approaches towards the problem.  SSIS in particular isn't an out of the box solution for a developer like myself who doesn't always install every iota of SQL Server tool there is to install.  Usually I can express my problems in a simple sentence:

"Copy the data from this table to that table."
"Build an insert script for this table's data"
"Import this CSV file to this database"

I wondered why language couldn't be used as a tool for this type of work rather than launching a big GUI tool or writing throwaway C# console applications for the job.  That's the origin of the thought, cultivated when I heard Neal Ford give a talk at CodeMash on DSLs.  A single "sentence" lodged itself in my head and I thought, wow, that would be a nice solution:


I have to admit to being more of a gardener than an architect when it comes to these things so I started a notepad file with some "forms" I thought would work and one friday afternoon, after being burned out on regular work I started what I call "sketching"; seeing how things might work. Since then I've been gnawing on it in a public way since learning Python and thinking in a Pythonic way are probably even more important than my little module which may end up just being fabulous meware.

I had posted that I had a little database import project of about 110,000 records distributed over two excel files.  The story I got was exactly the one I envisioned:

cn = "Data Source=.\\sqlexpress;Initial Catalog=MyDB;Integrated Security=SSPI;"
colmap = {"Field1":0, "Field2":5, "Field3":7} #destination:source - you may want to use the same source multiple times...

I will admit it wasn't perfect the first time - my lazy use of string concatenation rather than the .NET StringBuilder was a huge penalty (~10 minutes). I rewrote those pieces and the whole process took less than 30 seconds. Cheers for ADO.NET on that one - the updates were _massive.

So, back to Resolver; I read a glowing review from Larry O'Brien (his column is the first thing I read in SD Times when it crosses my desk) and the comment propelled me to watch a few screencasts and dig around but the goal of the product wasn't quite in line with the story I recount above.  Perhaps it's possible but it seems more like an interface in which one dwells on providing complex functionality in the spreadsheet rather than just quick and dirty import/export business.

More updates forthcoming but a short note: I spent a good part of the weekend trying to read through the code of ConfigObj 4 courtesy of Michael Foord (aka "Fuzzyman" aka "Voidspace").  I'll dwell on it further soon but one thing I realized I'd done that was a big mistake is to break the project into lots of files. I put them back together again and am in the process of figuring out how and where to host them. CodePlex/Google Code? Not sure if it's ready for that yet.  My own abode t3rse might work but I'd like to figure out how to generate documentation before I do that...

Enough said, here is what's latest, details and unit tests tomorrow.


Wednesday, March 19, 2008

Twining: Iron Python DSL for DB Update V


I'm still at it with Twining. A few projects and events have distracted me temporarily but I had a few moments last night to do a few things that I've intended to for some time.  First a summary of functionality with what's new in red.

cn = "Data Source=.\\sqlexpress;Initial Catalog=MyDB1;Integrated Security=SSPI;"
cn2 = "Data Source=.\\sqlexpress;Initial Catalog=MyDB2;Integrated Security=SSPI;"

# dest, source (you can use the same source column over and over)
# dest, source can be field names, example below uses ordinal position
colmap = {"DestinationField1":1, "DestinationField2":2, "DestinationField3":1}

#old stuff
mytrans = {"Field1":"upper","Field2": lambda x: x.upper(),"Field3": lambda x: x[::-1]}
database(cn).table("MyTable").transform(mytrans).copyto.delimited("\t", "c:\\foaf.tsv")
database(cn).table("MyTable").copyto.delimited("\t", "c:\\foaf_no_trans.tsv")

Footnotes: I'm working on changing my implementation of all things CSV/delimiter to using ADO.NET providers - the manually concatenated strings were just proof of concept. Back in 2004 in a thread that none other than Joel Spolsky responded to I learned not to try to reinvent a very hard CSV wheel.

The other bigger change I made was breaking the project apart. What was originally in multiple files as a "sketch" was becoming a nuisance besides poorly structured. I've deconstructed into the following files: - holds global, static data. - runs dynamic sql statements - exception handling - reference to databases, constituent tables, and holds the copyto and importfrom objects - object for exporting as CSV, XML, and database to database - object for importing data (currently on rough sketch for CSV import)

Most remains unchanged from the last upload but here is the code for the importfrom functionality:

import clr
from System import *
from System.Data import *
from System.Data.SqlClient import *
from System.Data.OleDb import *
from System.IO import *

from registrar import *
from sqlhelper import *

class importfrom:
"""Model data import process"""

def __init__(self):
self.columnmap = {}

def build_destination(self):
""" If destination does not exist, construct table"""

def csv(self, path):
""" Import data from a csv file """
fileName = path.split("\\")[-1]
fileDir = path.replace(fileName, "")
connect = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='text;HDR=No;FMT=Delimited'"
csvConnection = OleDbConnection(String.Format(connect, fileDir))
csvCommand = OleDbCommand("select * from " + fileName, csvConnection)
sourceReader = csvCommand.ExecuteReader(CommandBehavior.CloseConnection)
masterInsert = ""

while sourceReader.Read():
cols, vals = "", ""
for k in self.column_map.keys():
cols, vals = cols + "[" + k + "],", vals + "'" + str(sourceReader[self.column_map[k]]) + "',"
masterInsert += "insert into " + registrar.tableRegister + "(" + cols[:-1] + ")VALUES(" + vals[:-1] + ")\n"

sqlhelper().run_sql(registrar.databaseRegister, masterInsert)
except Exception, inst:
errorhandle().print_error("Error occured during import", inst)

Download everything here as a zip file. I've hopefully carved a little space in the schedule amongst various projects to spend more time going back and polishing things. I've just been tasked with migrating about 120,000 records out of Excel so this should provide a good opportunity to add functionality and test it's breakability for the type of thing I often have to do.


Wednesday, March 12, 2008

Brash Boys


When I read the article on 37 Signals in Wired, I admit I alternated between shaking my head, laughing, and sighing.

When Hanson commented upon the term arrogance, I laughed:

"Arrogant is usually something you hurl at somebody as an insult," Hansson said. "But when I actually looked it up — having an aggravated sense of one's own importance or abilities' — I thought, sure."

When the article describes him as a "philosopher king" I shook my head.  And his response to naysayers, "F--- you" made me sigh; not a convincing piece of rhetoric at all. 

Despite all this, despite Atwood's recent swipe at what he calls "douchebaggery," despite the disagreement of Don Norman, there is something there. 

It's not that I'm one of the fanboys of the Ruby or Apple world, or that I agree with Hansson's attitude towards windows developers - it's that I think it's a trait of people I call successful to be able to concentrate and get things done without regard for outside opinions.  It's less of a "stay the course" mentality than an understanding of self and drawing clear lines with work rather than spending a lot of time second guessing and self doubting.

One area in which I can look at hard data and side with 37signals is in the realm of additional or complex features; I really do question whether they are worth it.  I just recently quantified this with an application we wrote for a customer that they are using successfully.  I compared the baseline functionality of a particular feature to some exceptional scenarios I had to write code for - it was easy because one can tell the usage by counting records in the database.  Anyhow, for the baseline functionality I counted some 6,100 instances of its use.  For the exceptional scenarios, I counted 9 entries.  Here's the clincher: it took about 3 or 4 times as long to implement support for the exceptional scenarios (and it added a lot of complexity to the code for the baseline feature).  So the ability to handle about one tenth of a percent of scenarios, cost and complexity was inflated significantly.

Here's where I disagree with them: it's not my decision, it's the customer's. My goal should be to communicate that trade off as well as I can.


Sunday, March 09, 2008

VB.NET Design


Paul Vick posts on some design considerations for future versions of VB.NET and hits upon something I was complaining about just this week - the fact that one has to speckle their code with line continuation characters in places where they should be implicit.  This seems to happen for me in two spots, assigning parameters and string concatenation. 

It's reassuring that not only does the VB.NET team listen, but they are looking for solutions as well.


Wednesday, March 05, 2008

Mix 08 Keynote


Just watched it while working on some stuff tonight.  Good things seem to be happening although much of it is in CTP or Beta form right now.  I'll need to resurrect my VPC images to install and play when I have the time. What's baking that's cool?

IE 8 - the integrated Firebug like debugging is what's got me excited for now.
Silverlight 2.0 - Dino Esposito joked about how Silverlight 1.0 was a declarative way to produce an animated gif (unless you just want a video player).  Now that a real control library is being developed, people like me will be able to seriously think of ways to use it.  I don't stream video but I do write a lot of code :)
Expression 2.0 / WPF - Ditto on Silverlight 2.0, WPF needs controls - and not just the kind you have to pay $$$ for.  In my best Steve Balmer voice I'd say "controls! controls! controls!" The blank canvas thing is cool but limited when you need to build something on a real world timeline with real world resources.

I didn't see ASP.NET MVC stuff in the keynote under demonstration but it was mentioned. Via Hanselman, Preview 2 is released today.

John Lam seems to be writing a multi-part article on DLR + IronRuby - a cool way to experiment with Silverlight although now I'm officially in love with Python and may not have room for a while. 

I'm excited to see what happens but I'm skeptical to put too much effort into this stuff until it's properly released.  Since I'm no longer a trainer and the apps I write have to be installable on client machines, I have less flexibility and time to get full fledged into beta stuff. It will also be interesting to see who wins the Silverlight / Flash war (and a war it will be). Microsoft is building its installed base pretty fast and with a better developer toolkit I will be interested to see what Adobe does with Flash.

Lots more lurking to do - even though all the cool people are at MIX, so much is online and so many people are twittering (I now have a "Twit" folder in my reader), one can almost imagine themselves there.


A Gem in the Morning Coffee


Harry "DevHawk" Pierson comments (emphasis mine):

My opinion, since you asked Nick, is that EA (Enterprise Architecture) fails to deliver value because it tries to control the uncontrollable. Trying to gain efficiency thru establishing standards and eliminating overlap via reuse are pipe dreams, though literally millions of $$$ have been poured into those sink-holes. There are a few areas where centrally funded infrastructure projects can solve big problems that individual projects can't effectively tackle on their own. EA should focus their time there, they can actually make a difference. Otherwise, they should stay out of project's way.

I concur and there's follow up if you read on in his blog, but this was quite succinct.

A funny little aside: I was called for a marketing survey and agreed to it after I found the reward was a $50 coupon for Amazon. The woman caller asked my title and I said "Software Architect" (my given title, though I really fit the mould of a lead developer and plan to have it changed).  She paused on "architect" and asked: "But do you write code?"  I answered in the affirmative and had to do so to variants of the question (e.g. "But do you implement specifications in code? But do you really know C#?") about 4 or 5 times.  Moral of the story, don't use architect in your title, use developer. Unless you're Don Box.


Just Do It Yourself


I go back and forth on the very thing Larry calls "Lead-Developer Compression Ratios." Two things happen over the long haul that affect my thinking: first, even though it's time consuming communicating needs to a junior developer, it seems to pay off when it comes to fine tuning and bug fixing - it's not something you have to worry about since it's delegated. "Finished" in my world means it's been through a few cycles of testing, cycles that take a lot of time.  The second thing though is when something is delegated and I think is done fairly well only to find at the aforementioned testing/bug fix cycle is horribly broken and either has to be hacked into a technical debt ridden working order or needs to be done again - from scratch.  It seems to come down to the complexity of the tasks - if it's simple you can take the time, if it's complex just do it yourself.


Monday, March 03, 2008

Twining: Iron Python DSL for DB Update IV


long entry because it was written at an airport, just back from celebrating 3 years with my beautiful wife in Monterey, CA...

A few interesting things, but first just a summary of what can be done with what's new in red:

#connection strings
cn = "Data Source=.\\sqlexpress;Initial Catalog=DB_SOURCE;Integrated Security=SSPI;"
cn2 = "Data Source=.\\sqlexpress;Initial Catalog=DB_DESTINATION;Integrated Security=SSPI;"

#back up the database to a file

#copy/export basic

#pick your delimiter
database(cn).table("Photos").copyto.delimited("\t", "c:\\spam.tsv")

#copy to database

#all copy operations can leverage transform dictionary
mytrans = {"FIELD1":"upper","FIELD2": lambda x: x.upper(),"FIELD3": lambda x: x[::-1]}

database(cn).table("MY_TABLE").transform(mytrans).copyto.delimited("\t", "c:\\spam.tsv")

... and so on, see above

Some internal changes:
1. Error Handling, quite lazy for now simply stores a combination of a friendly message along with full text of the exception thrown:

class errorhandle:
    def print_error(self, text, error):
        """ Handle an exception by just printing a friendly message and then the error's guts"""
        print text
        print error

    sqlhelper().run_sql(registrar.databaseRegister, sqlBackup)
except Exception, inst:
    errorhandle().print_error("Error occured during backup", inst)

2. Some Pythonic prettiness (via Catherine):

#if registrar.transforms.has_key(field): -- es muy fao!
if field in registrar.transforms:       

I do like the more idiomatic form myself.  This a perfect example of where, as a .NET developer I look around for methods and miss a beautiful idiom.

3. Right under my nose, back to idiom, were a few simple pythonic forms that took away a lot of the line noise and clutter I was producing during my initial rabid sketches of the idea. Doing multiple assignment out of a SqlDataReader is something I only thought of now, though it makes such perfect sense:

column_name, column_type, column_length, column_identity = \
    r["name"], r["typename"], str(r["max_length"]), r["is_identity"]

4. Returning tuples rather than lists.  Tuples are more effecient and as effective.  A question that's begun burgeoning now is whether excessive use of tuples is lazy when a specific type is more clear.  For example, when I read column data from a table, I need to know two things: the column name and whether it's an identity (autonumbering). Right now I'm doing the following lazy gesture:

colTypeInfo.append((column_name, column_identity)) #(column_name, column_identity) is the tuple added to the list.

But if I start to add other useful bits of info, like the datatype of the underlying column, I wonder if a class would be better since you could have something like:

colTypeInfo.append(ColumnData(column_name, column_identity, column_type)) #Where ColumnData is a type somewhere

Right now the tuple approach works fine and a class feels like overkill. For now, I'm just afraid of it bloating as needs present themselves...

5. Speaking of identity columns, I added better support for them in tables being copied.  I also now use brackets for column names since certain people like to use reserved words for columns (or don't have a choice).

6. I mentioned in a comment that I tried to reconcile feedback from both Catherine and Michael "Fuzzyman" Foord.  When I match columns for transformations I allow any TypeError to be suppressed, but also store it in a "warnings" list that I'll incorporate into the completed process.

def checkAndTransform(self, field, value):
    """ Perform transformation on fields based on either:
        a) keyword entries ("upper", "lower", etc... )
        b) lambda expressions
    if field in registrar.transforms:       
        if callable(registrar.transforms[field]):
                value = registrar.transforms[field](value)
            except TypeError:
                registrar.warnings.append("Your transform failed on field %s" % field)
        elif registrar.transforms[field] == "upper": #more of this kind of stuff _maybe later.
            value = value.upper()
    return value   

As always you can have a look at the entire thing here.  It's been great getting guidance, any comment is welcome and will be studied.

What is on the list?
1. I've been using a cheap, unstudied approach to delimited files. I'll need to look at the "official" way of escaping entries.
2. Importing data
3. A few more variants of operating on databases, like "create."