30 Years in Tech: from >30 minutes to 3 milliseconds

Jos van Dongen
Nerd For Tech
Published in
5 min readFeb 24, 2021

--

Image source: Warp Speed Screen Saver

This somewhat cryptic title doesn’t refer to the progress made in hardware and software over the past 30 years, although it’s probably close as well. It refers to why (potential) customers reached out to me. In my years as an independent consultant there were many projects that started because of one, or both, of the following reasons:

  1. something takes too long
  2. something crashes while running

the ‘something’ in reason one could be a query, an ETL proces, a report opening; basically anything that takes an arbitrary amount of time that was considered being ‘too long’. Mostly it was business users complaining about this, and an IT or BI department not able to fix it in a satisfactory way. The ‘something’ in reason two was mostly an ETL process that crashed mid way.

And literally every single time, the problem was never in the used technology.

Never, ever.

But still, people insisted that it was the server capacity, the database version, the database brand, the operating system, the BI tool, whatever piece of technology considered to be the root cause of the problem. So that’s what they always pointed out to me: can you fix/upgrade/replace <fill in tech here>? Which wasn’t the problem in the first place! But of course, there WAS a problem, otherwise they wouldn’t have consulted me.

So what caused most, if not all, of these problems? Simple: bad design choices. Which were often caused by lack of knowledge, not having proper processes in place, lack of access to the right resources, or any combination of these. Of course, the technology itself can always be improved by upgrading software and/or extending/modernizing hardware, but that was usually a ‘nice to have’, the ‘icing on the cake’; never the solution in itself. To give you an idea, here are three actual cases I worked on, and I’ll start with the one that caused the most disbelief:

30 minutes to 3 milliseconds

We’re talking fall 2004 and a large telco operator running one of the top web sites of the country. Daily data volume processed was 3–5 GB of log and transaction files, database size > 300GB. You can run this on your laptop today, but 17 years ago these were serious volumes. The key queries (and thus: opening the main management report) took at least 30 minutes. Their own DBA’s couldn’t figure it out and explained that considering the size of the database and the type of queries, it was ‘obvious’ it took so long. I went back to the requirements and the original design, and had a look at their infrastructure and software stack. It turned out they were running on a full version of Oracle EE, which looked promising. So I asked them whether they had tried building materialized views to get the required result set, and got puzzled looks. They had literately no idea what materialized views were, or how to create them, so I showed them.

Jaws dropped. And so did the jaws of the management team when we showed them the results. No upgrades, no new shiny tools, just using and optimizing what’s already there. You won’t believe how many problems that will solve.

Crashing ETL process

This one was a summer project in 2007 at one of my oldest customers. I hadn’t been there for a couple of years but in my absence their own staff developed a pretty sophisticated ETL process using Business Objects Data Integrator loading a SQL Server 2005 database. There were daily loads (which ran fine most of the time), plus weekly and monthly aggregates. Especially the latter two took quite long and crashed at unexpected points during the process. Never at the same point and it always required a complete re-run. Again, no replacement of technology (that was fine), but redesigning the process did the trick. We divided the single, integrated flow into separate tasks and developed an overall control job. Whenever a (sub-)task completed it was checked as such and in the (now rare) occasion the process crashed, it could be restarted and pick up where it left off.

ETL processing taking forever, and crashing

The last case in this post took place in 2006 and was my very first healthcare project. Loved it so much that I wanted to get back into healthcare (which I eventually did), but that’s a story for another post. This one started out as a short assignment; whether I could figure out why the bi-weekly ETL process crashed more often than not, requiring manual intervention during the weekend. The job couldn’t even run weekly because it took too long to complete, usually caused by constant crashes and restarts. Tools at hand were Cognos Data Manager (ok, I’ll admit: that was a serious piece of crap…) and again Oracle EE.

Analyzing the ETL design took me several weeks. The first observation was pretty shocking to me; no incremental loads! They reloaded the entire data warehouse from scratch every single time. Then I asked how they preserved history. “oh that’s easy, history is maintained in the source system”. I then asked what happened when a record was deleted and how they would know about it afterwards. Silence…

Long story short: I concluded the thing was a total mess. For instance: staging tables that needed lookups from aggregate tables built much further downstream, creating a spaghetti of illogical dependencies. Oh, and of course there was no development/test environment. All development took place directly on the production environment. What possibly can go wrong, right?

Throwing everything away and starting over was the best advice I could give them, and that’s what happened. With proper initial and incremental loads, history preservation, a daily update, seperate development/test and production, and, of course, no more crashes. And no, we didn’t even replace Cognos Data Manager :-)

Wrapping up

The lesson here is pretty simple and straightforward: when you think some new shiny tool can solve your data problems, please think again. The technology itself is rarely the problem; more often than not it’s the way the technology is implemented and being used.

--

--

Jos van Dongen
Nerd For Tech

Analytics Advisor at SAS | Helping organizations derive value from (big) data and analytics | Every day is a school day!