Normalization — It’s not Your Friend…or Your Enemy: Dataversity Webinar

As she does on a regular basis, my friend Karen Lopez ( blog | @datachick) is presenting a new webinar this week, hosted by Dataversity. The topic, as the title of this post suggests, is the good, the bad, and the craziness of normalization. The event is this Thursday at 2:00 PM Eastern Daylight Time (GMT-4).

Why am I sharing this? Well, I’m going to be there, too, playing the role of sidekick, because Karen’s the one who actually knows what she’s talking about 😉 . These webinars are always a good time, and you usually learn something, to boot.

It is free to everyone, but registration is required. More information and a link to register are available on this page.

If you join, stop in early while we do audio checks; hang out and chat a bit beforehand. It’s a fun, informal time before the webinar proper starts. Stay tuned in via Twitter, as well: monitor the #heartdata hashtag to participate in the conversation.

Hope to see you there!

Big Challenges in Data Modeling: Ethics & Data Modeling

From the “There’s a first time for everything” file, I can announce that I’m going to be joining an online panel discussion this Thursday (i.e., tomorrow), April 24 at 2:00 PM EDT (11:00 AM Pacific). I know!

Topic

This discussion will be about Ethics and Data Modeling. It’s part of a monthly series put on by Dataversity covering Big Challenges in Data Modeling.

We’ll cover questions like what to do when asked to do something “wrong” (and maybe what the definition of “wrong” is in the first place) and whether there are any items in particular that a data modeler, or anyone doing that task, needs to be especially aware of. Although these questions apply to anyone in the data field—or anyone in IT or business at all, for that matter—this conversation will focus on how they apply to data modeling specifically.

Details

Participating will be Len Silverston, Denny Cherry, and Tamera Clark, with the whole apparatus MC’d/hosted by Karen Lopez (the one and only DataChick).

The broadcast is free, but you do have to register to get the sign-in information. You can do that at the webinar’s main announcement page (look for the round “Click to Register” graphic), where you can also read full bios for all of us.

In addition to the Q&A and participant chat that will be going on during the discussion, you can follow the #BCDModeling hashtag on the tweeter. We’ll all be watching that as well.

Sign up, come out, ask some questions, and generally have a good time. Oh, and probably learn something, too. Can’t forget that.

T-SQL Tuesday #21: “This Ugly Hack is Only Temporary”

…Unless “nothing is temporary around here” also applies to your shop. Then…well…good luck with that (I feel your pain).

Wednesday is better than Tuesday anyway; everyone knows that

It’s been a while since I’ve participated in a T-SQL Tuesday; haven’t been writing at all, really, as you can see. Summer apparently does that to me. If you’re reading this, then it means I got it shoehorned in.

T-SQL Tuesday is the brainchild of Adam Machanic (blog | @AdamMachanic); it started, apparently, 21 months ago. Its purpose is for bloggers to get together and cover a particular topic every month. This is good for readers, and it’s also an easy way to get a blog topic if one has a hard time coming up with them (like me). This month’s topic is hosted by Adam, and has been billed as “revealing your crap to the world.” There will be lots of good stuff to read this month, calling it now.

RI Enforced by a Trigger? Check.

I’m going to preface this one by saying that it never actually saw the light of day. Sometime between Design & Development, the project that it was part of got canned, so the Development instance was the only place this was ever deployed to. But, it was still crap, and it was crap that I came up with, at that, so here we go.

There once was a big, ugly “Location” table. This table was way wider than it should have been, mainly because “normalization” was a term missing from the historical evolution of the system it was a part of. As you can imagine, this table contained a lot of character columns of varying widths (up to and including TEXT). To make matters worse, in lots of cases, the UI allowed free-text entry, which made much of the contents suspect, at best.

A project came along that basically would require normalizing out a couple of columns related to a certain location attribute. For the sake of discussion, this could be just about anything—we’ll use “Preferred Shipping Carrier” as an example.

At the beginning of design, this “PreferredShipCarrier” column is full of shady entries. There are some “UPS”es, “FedEx”es, and “None”s, but there are also a lot of NULLs (which shouldn’t really happen), “N/A”s, and “UPPS”es, all of which are incorrect for one reason or another. Obviously, as part of normalizing this column out, there would need to be some corresponding data cleanup, but that’s beside the point.
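Just for illustration, that cleanup would amount to something like the following. The values shown are only the examples above; the real list of offenders was longer, and the exact rules (what NULL should become, etc.) are just assumptions here.

    -- Illustrative cleanup pass over the hypothetical bad values
    UPDATE dbo.Location
    SET    PreferredShipCarrier = 'UPS'
    WHERE  PreferredShipCarrier = 'UPPS';

    UPDATE dbo.Location
    SET    PreferredShipCarrier = 'None'
    WHERE  PreferredShipCarrier = 'N/A'
       OR  PreferredShipCarrier IS NULL;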

Where things go south is how this was to be implemented. Since the column in the table is VARCHAR, a real Foreign Key relationship can’t be set up to the INT Surrogate Key in the new “PreferredShipper” table. That’s the first sign of trouble. The second sign is that we can’t change the UI—it has to stay in its free-text entry state. This leaves the database exposed to bad data, as there can’t be RI to enforce valid entries.

Enter a Trigger. Upon an attempted row insert into the Location table, it can look up the value entered for Location.PreferredShipCarrier and throw an error if it doesn’t find a matching row. I hated it, but it got the job done.
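To give a rough idea, here’s a minimal sketch of that kind of trigger. Only Location, PreferredShipCarrier, and the PreferredShipper table come from the actual design; the lookup table’s columns, the trigger name, and the error message are all made up for illustration.

    -- Illustrative lookup table (the real one had an INT surrogate key, per above)
    CREATE TABLE dbo.PreferredShipper
    (
        PreferredShipperID INT IDENTITY(1,1) PRIMARY KEY,
        CarrierName        VARCHAR(50) NOT NULL UNIQUE
    );
    GO

    -- "RI" enforced by trigger: reject inserts whose carrier isn't in the lookup table
    -- (an UPDATE version would look much the same)
    CREATE TRIGGER dbo.trg_Location_PreferredShipCarrier
    ON dbo.Location
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        IF EXISTS (
            SELECT 1
            FROM   inserted i
            WHERE  i.PreferredShipCarrier IS NOT NULL
              AND  NOT EXISTS (
                       SELECT 1
                       FROM   dbo.PreferredShipper ps
                       WHERE  ps.CarrierName = i.PreferredShipCarrier
                   )
        )
        BEGIN
            RAISERROR('Invalid Preferred Shipping Carrier.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;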

Auditing with Triggers, too? What is up with you and Triggers?

SQL 2000 was terrible for auditing—it didn’t have all of the fancy new stuff that we have now for this sort of thing. That means, when you’re dealing with 2000 and the business wants an audit trail of who changed what when, your options are/were pretty limited.

At one point, a requirement came along to implement audit trails on some tables. This meant I needed to duplicate the tables, adding some metadata columns to them (audit timestamp, who dunnit, what the action was, etc.), and then put a trigger on each of the source tables. Some of these particular tables were busy, so the triggers & log tables got a lot of action, and I hated having to do this to get what was needed.

If the trigger fired for a DELETE, it would log the row that was deleted. If an UPDATE was happening, it would first check to see if any column contents were actually changing, and if so, log the previous state of the row to the log table. These triggers grow relative to the width of the table (because they contain three explicit lists of the table’s columns), so if the tables being audited are wide, the trigger will get pretty long. Additionally, since triggers in SQL Server fire once per statement and operate on all rows being DELETEd or UPDATEd, special care needs to be taken so they behave correctly when more than one row is affected. This can make them a bit ticklish to write.
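Here’s a rough sketch of the pattern, using a made-up dbo.Widget table (WidgetID, WidgetName, Price) and a matching audit table. None of these names are the real ones, and the real triggers had far more columns, but the shape is the same.

    -- Hypothetical audit log for dbo.Widget (WidgetID, WidgetName, Price assumed to exist)
    CREATE TABLE dbo.Widget_Audit
    (
        WidgetID    INT          NOT NULL,
        WidgetName  VARCHAR(100) NULL,
        Price       MONEY        NULL,
        AuditAction CHAR(1)      NOT NULL,                     -- 'U' or 'D'
        AuditDate   DATETIME     NOT NULL DEFAULT (GETDATE()), -- when
        AuditUser   SYSNAME      NOT NULL DEFAULT (SUSER_SNAME()) -- who dunnit
    );
    GO

    CREATE TRIGGER dbo.trg_Widget_Audit
    ON dbo.Widget
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- DELETE: rows in deleted with no match in inserted; log them as removed
        INSERT INTO dbo.Widget_Audit (WidgetID, WidgetName, Price, AuditAction)
        SELECT d.WidgetID, d.WidgetName, d.Price, 'D'
        FROM   deleted d
        WHERE  NOT EXISTS (SELECT 1 FROM inserted i WHERE i.WidgetID = d.WidgetID);

        -- UPDATE: log the prior state, but only for rows where something actually changed
        INSERT INTO dbo.Widget_Audit (WidgetID, WidgetName, Price, AuditAction)
        SELECT d.WidgetID, d.WidgetName, d.Price, 'U'
        FROM   deleted d
        JOIN   inserted i ON i.WidgetID = d.WidgetID
        WHERE  ISNULL(d.WidgetName, '') <> ISNULL(i.WidgetName, '')
           OR  ISNULL(d.Price, -1)      <> ISNULL(i.Price, -1);
    END;

Note how both statements work off the whole inserted/deleted pseudo-tables rather than assuming a single row; that’s the “special care” for multi-row operations mentioned above.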

I know this isn’t necessarily crap, as when you’ve got to audit, you’ve got to audit. I don’t like it, if for no other reason than the extra clutter it puts into the DB and the general idea of triggers all over the place. All manner of things can happen in triggers, and if you are operating in an unfamiliar DB or troubleshooting a goofy problem, there could be things going on behind the curtain that make the real issue harder to pin down.

In short: Friends don’t let friends mash F5 on CREATE TRIGGER.