
Own your Data 11/29/2010

Posted by TBoehm30 in Data.

“There is too much data at my company to be useful.”  “The data we have is old and out-of-date.”  “Users around here don’t enter good data.”  How often are these statements said or thought?

Data is important and should be controlled properly.  The only way to ensure that you have good data is to make someone responsible for it, and have them own the data.  They need to use the data as well as have authority over the information put into the database.  Only when someone relies on data will they take an active part in making sure that the data is valid.

I once had a meeting with users who didn’t like the process of putting a caller’s first name and last name in the text boxes provided.  They wanted to put all of the caller’s information into the notes section.  I was supposed to figure out how to make that work.  You can’t report on notes, you can’t sort on them, and you can’t aggregate them; they are practically useless as a management tool.  I had to tell the users to get in line with the processes of the day and fill in the caller’s information in the proper location on the screen.

Do you have documentation on how data flows through your company?  Do you know where your data comes from?  How good is your data?  How often is it refreshed?  Do you have duplicate information?  What about duplicate sources of information?  If somebody comes to me and tells me that a report is wrong, I can trace it back to the source to figure out the problem.  Maybe he is looking at an old report, maybe the data is coming from the wrong columns, or maybe a data point or two is bad.  Whatever the problem, I know how to find the original data, determine the refresh rate, evaluate the quality of the data and explain any variance.

Once data gets old, it becomes difficult to validate.  You will need to de-dup your data; that is, find and remove any duplicates.  This is impossible if you don’t have good data.  How will you know if two entries with the same name are the same person or different people who just happen to have similar names?  I don’t know many people who can go through thousands of records to find duplicates and finish without going crazy.  If you find someone like that, keep them around.

Software exists that can do a lot of the validation for you.  You might have to give up control of your data to use it, or it might cost a lot of money.  The last time I looked at that kind of software, the vendors gave us percentage rates on the validity of the results.  I wouldn’t use the cheap stuff that had less than an 80% chance of being right.

I once worked with a guy who had to de-dup a huge database.  He explained that very few people with the same last name were born on the same day.  So, if you had their last name and birth date, you could be reasonably certain of duplicates.  Of course, twins and multiples are a problem.  In that case you add in the first name and it is extremely rare to find duplicates that were in fact more than one person.
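His rule of thumb is simple enough to sketch in a few lines of Python.  This is only an illustration, not the code he used: the field names (last_name, first_name, birth_date), the sample records, and the find_duplicates function are all made up for the example, and real data would need far more cleanup than trimming and lowercasing.

```python
from collections import defaultdict


def normalize(text):
    """Trim and lowercase so 'LEE' and 'Lee ' compare as equal."""
    return text.strip().lower()


def find_duplicates(records, use_first_name=True):
    """Group records that share a last name and birth date.

    When use_first_name is True, the first name is added to the key,
    which keeps twins and other multiples from being merged.
    Returns only the groups that contain two or more records.
    """
    groups = defaultdict(list)
    for rec in records:
        key = [normalize(rec["last_name"]), rec["birth_date"]]
        if use_first_name:
            key.append(normalize(rec["first_name"]))
        groups[tuple(key)].append(rec)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: records 1 and 2 are the same person entered
# twice, record 3 is a twin, and record 4 is unrelated.
records = [
    {"id": 1, "first_name": "Ann", "last_name": "Lee", "birth_date": "1970-03-14"},
    {"id": 2, "first_name": "ann ", "last_name": "LEE", "birth_date": "1970-03-14"},
    {"id": 3, "first_name": "Amy", "last_name": "Lee", "birth_date": "1970-03-14"},
    {"id": 4, "first_name": "Bob", "last_name": "Ray", "birth_date": "1982-11-02"},
]

for group in find_duplicates(records):
    print("Probable duplicates:", [rec["id"] for rec in group])
```

With the first name in the key, only records 1 and 2 are grouped.  Call find_duplicates(records, use_first_name=False) and the twin in record 3 gets pulled into the same group, which is exactly the problem he described.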

He told me about twins with the same first name, but different middle names.  Why would a parent do that?  He told me about a time when he had a name with two addresses and two Social Security Numbers that were one digit off.  He just knew that the SSN was a typo for one record, but he couldn’t prove it.  He had to list the two records as two people and it was bugging him.  So, he drove by the guy’s house to see if the addresses were correct (he wasn’t allowed to contact anyone because of privacy).  Sure enough the addresses were on a corner and looked to be the same house.  What could he do about it?  Nothing – he was not allowed to fix the data.
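A program can’t prove a typo any better than he could, but it can at least flag pairs like that for a person to review.  Here is a rough Python sketch; the one_digit_off function and the sample numbers are invented for illustration, and it only flags candidates, it does not merge anything.

```python
def digits_only(value):
    """Drop dashes, spaces and anything else that is not a digit."""
    return "".join(ch for ch in value if ch.isdigit())


def one_digit_off(a, b):
    """Return True when two ID numbers differ in exactly one position.

    Numbers of different lengths are never treated as a match, so a
    dropped or extra digit is not caught by this check.
    """
    a, b = digits_only(a), digits_only(b)
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


# Made-up numbers, not real SSNs.
print(one_digit_off("000-12-3456", "000-12-3450"))  # differ only in the last digit
print(one_digit_off("000-12-3456", "000-12-3456"))  # identical
```

A match here is a reason to look closer, not permission to change the record.  Someone who owns the data still has to make the call.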

What sort of data issues do you wish you could fix at your company?  Do the right people own the data and take care of it?  Do you have governance to control the quality of the information?

Companies need to value their information, validate it often, and use it to their advantage.  After all, it’s a global world out there, and Technology makes it happen.