Monday, March 24, 2008

Twining Updates

{

What's the big picture and why are you doing it?

A comment from my last post got me thinking I need to get back to the big picture. Robert wrote:

Wow. That looks like a hard way to solve your problem.
If you absolutely have to mangle Excel data through Python and into a database... can I recommend Resolver? It won't cost you anything and it might just solve your problem.

I'll post about Resolver in a second but I want to go back to my original thoughts on developing Twining. I spend a lot of time doing quick and dirty types of data import and export.  Sometimes it's grabbing stuff off a production system so that I can develop off of it. Other times we get a simple file from someone and we need to pull it into the database.  Still other times we move data from one location to the next because we've done staging or cleanup. 

Traditionally the Microsoft tools for this are quite heavy.  DTS, which I used a lot, and SSIS which I haven't used much but have heard and read about are both GUI/IDE approaches towards the problem.  SSIS in particular isn't an out of the box solution for a developer like myself who doesn't always install every iota of SQL Server tool there is to install.  Usually I can express my problems in a simple sentence:

"Copy the data from this table to that table."
"Build an insert script for this table's data"
"Import this CSV file to this database"

I wondered why language couldn't be used as a tool for this type of work rather than launching a big GUI tool or writing throwaway C# console applications for the job.  That's the origin of the thought, cultivated when I heard Neal Ford give a talk at CodeMash on DSLs.  A single "sentence" lodged itself in my head and I thought, wow, that would be a nice solution:

database("foo").tables("bar").copyto.database("biff")

I have to admit to being more of a gardener than an architect when it comes to these things so I started a notepad file with some "forms" I thought would work and one friday afternoon, after being burned out on regular work I started what I call "sketching"; seeing how things might work. Since then I've been gnawing on it in a public way since learning Python and thinking in a Pythonic way are probably even more important than my little module which may end up just being fabulous meware.

I had posted that I had a little database import project of about 110,000 records distributed over two excel files.  The story I got was exactly the one I envisioned:

cn = "Data Source=.\\sqlexpress;Initial Catalog=MyDB;Integrated Security=SSPI;"
colmap = {"Field1":0, "Field2":5, "Field3":7} #destination:source - you may want to use the same source multiple times...
database(cn).table("DestinationTable").mapcolumns(colmap).importfrom.csv("C:\\temp\\widgetCo\\update1.csv")
database(cn).table("DestinationTable").mapcolumns(colmap).importfrom.csv("C:\\temp\\widgetCo\\update2.csv")

I will admit it wasn't perfect the first time - my lazy use of string concatenation rather than the .NET StringBuilder was a huge penalty (~10 minutes). I rewrote those pieces and the whole process took less than 30 seconds. Cheers for ADO.NET on that one - the updates were _massive.

So, back to Resolver; I read a glowing review from Larry O'Brien (his column is the first thing I read in SD Times when it crosses my desk) and the comment propelled me to watch a few screencasts and dig around but the goal of the product wasn't quite in line with the story I recount above.  Perhaps it's possible but it seems more like an interface in which one dwells on providing complex functionality in the spreadsheet rather than just quick and dirty import/export business.

More updates forthcoming but a short note: I spent a good part of the weekend trying to read through the code of ConfigObj 4 courtesy of Michael Foord (aka "Fuzzyman" aka "Voidspace").  I'll dwell on it further soon but one thing I realized I'd done that was a big mistake is to break the project into lots of files. I put them back together again and am in the process of figuring out how and where to host them. CodePlex/Google Code? Not sure if it's ready for that yet.  My own abode t3rse might work but I'd like to figure out how to generate documentation before I do that...

Enough said, here is what's latest, details and unit tests tomorrow.

}

1 comment:

Michael Foord said...

"trying to read through the code of ConfigObj 4" - what are you mad! That code wasn't meant to be read. :-)

The parser at the heart of ConfigObj is pretty nasty, but the rest of the code should be ok. A lot of the methods are too long though, the code should definitely be more modular.