Data Topics: Dirty, Clean, Messy and Actionable
It is so easy for any business to accumulate years’ worth of data, and piles of it. Some of it is obvious, stacked visibly in the corners of office floors, sometimes all the way up to the ceiling. Other data is less obvious because it quietly accumulates in the folders, documents, lists and hidden shared drives on our systems, or in the dreaded Outlook folders. Either way, our best intention is that “we may need ‘it’ for a rainy day.” In most cases, that rainy day never arrives, and we keep doing what we have always done. Our motions become second nature, almost subconscious: with the folders already defined, we simply file the next batch of daily information accordingly and go about our routine. When will that ‘one day’ finally arrive, when we step off the treadmill of mindless filing and really see what we have accumulated?
My garage is a case in point: I have limited space for the scrap wood left over from past projects. I like to keep scraps because I may need them some day for a similar project. The point is that the rainy day seldom arrives, or it arrives in a form where I cannot use the saved wood stock. Good intentions. As the saying goes, the road to crazy is often paved with good intentions.
So, am I saying that one of the keys to reining in the data storage challenge is to limit the storage available? Although that sounds good, it’s probably not a long-term strategy. Storing and managing wood scraps is a different animal from storing and managing data. The main message here is that we need a strategy and must be intentional about managing to it.
To understand what dirty data looks like, we first need to understand what clean data looks and feels like. A quote I often use is “… people don’t know different until they see different.”
Another quote that may apply is “… Perspective is not everything, it’s the ONLY thing.” We often need to force ourselves to change position, to change direction, to use light in a new way or move ourselves outside of our comfort zone to get a new perspective. We don’t improve anything by sitting still.
We need to see our data from a new perspective to appreciate its value or potential. How can we ‘clean’ up our data to the point where we know what to throw out?
The Marketing perspective might be to augment the existing data with additional demographics from third-party vendors, which opens up new ways to segment it. It might also be worthwhile to cleanse any address data for consistency and correctness. The results can then be associated with various business entities (e.g., groups, sectors, lines) and objectives to determine whether there are any relationships or statistically significant identifiers.
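As a rough illustration of what cleansing address data “for consistency” can mean, here is a minimal Python sketch. It assumes US-style one-line street addresses and uses a tiny, hypothetical abbreviation table; real address cleansing is normally done against postal reference data or a vendor service, which this sketch does not attempt.

```python
import re

# Hypothetical, deliberately tiny suffix table; a real one would be much larger.
SUFFIXES = {
    "street": "St", "st": "St",
    "avenue": "Ave", "ave": "Ave",
    "road": "Rd", "rd": "Rd",
}

def normalize_address(raw: str) -> str:
    """Return a consistently formatted version of a one-line street address."""
    # Collapse runs of whitespace into single spaces and trim the ends.
    text = re.sub(r"\s+", " ", raw).strip()
    words = []
    for word in text.split(" "):
        bare = word.strip(".,")
        if not bare:
            continue  # skip tokens that were only punctuation
        if bare.lower() in SUFFIXES:
            words.append(SUFFIXES[bare.lower()])  # standard suffix abbreviation
        else:
            words.append(bare.capitalize())  # "MAIN" -> "Main", "3rd" stays "3rd"
    return " ".join(words)

# Three spellings of the same address collapse to one consistent value.
records = ["12  main street", "12 Main St.", "12 MAIN ST"]
print({normalize_address(r) for r in records})
```

Once variants like these collapse to a single value, duplicates become visible and any later matching against demographic data has a far better chance of lining up.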
The Legal perspective might be to retain only the data required for the success of the business. Retaining data just for the sake of retaining it may expose the business to fines or other unfavorable costs in the future. For example, the more data retained, the more legal expense may be incurred to sift through the mounds of data in the event of a future lawsuit. Review the legal requirements of your business engagements and adjust your data retention parameters and filters accordingly.
The HR perspective may be to retain only those employee-related communications (e.g., emails, IMs, conference recordings) that relate to critical business projects, engagements, contracts, etc. All other communications of a personal nature should be scrubbed off the corporate servers as soon as possible. There is no need to retain data that has no business value. And as with the Legal perspective, it’s often better to aim for less data than more, especially when it concerns personal matters.
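The Legal and HR perspectives both come down to a written retention rule that is applied the same way every time. Below is a minimal Python sketch of such a filter. The record fields (`category`, `created`, `business_critical`) and the retention periods are hypothetical placeholders; the actual periods have to come out of your own legal review.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in days; placeholders only, not legal guidance.
RETENTION_DAYS = {
    "contract": 7 * 365,
    "project_email": 3 * 365,
    "personal": 0,  # personal communications are not retained
}

@dataclass
class Record:
    record_id: str
    category: str
    created: date
    business_critical: bool = False

def should_retain(record: Record, today: date) -> bool:
    """Keep a record if it is business-critical or still inside its retention window."""
    if record.business_critical:
        return True
    # Unknown categories get a zero-day window, so they are not retained by default.
    days = RETENTION_DAYS.get(record.category, 0)
    return today - record.created < timedelta(days=days)

today = date(2024, 1, 1)
records = [
    Record("a", "contract", date(2020, 6, 1)),
    Record("b", "personal", date(2023, 12, 31)),
    Record("c", "project_email", date(2019, 1, 1), business_critical=True),
    Record("d", "project_email", date(2019, 1, 1)),
]
print([r.record_id for r in records if should_retain(r, today)])  # ['a', 'c']
```

Defaulting unknown categories to “not retained” matches the “less data rather than more” stance; a more cautious shop might instead route them to a human for review.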
The topic of messy data is open to interpretation. One person’s definition of messy is another’s definition of organized. Defining messy requires a set of criteria that everyone in the organization knows and follows. Without such criteria, everything may end up classified as ‘messy’. The important thing here is to just get started. Write something down in black and white that can be reviewed, tried, tested and updated over time. Make these criteria or rules visible throughout the organization and encourage feedback and improvement from all business entities.
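One way to put such criteria “in black and white” is to write each one as a small, named check that anyone can read, run and propose changes to. The Python sketch below does that; the field names and the three rules (including the assumption of 5-digit US ZIP codes) are hypothetical examples, not a recommended standard.

```python
import re
from typing import Callable

# Each rule pairs a plain-English name with a check; failing any rule marks a record as messy.
Rule = tuple[str, Callable[[dict], bool]]

EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

RULES: list[Rule] = [
    ("customer name is present",
     lambda r: bool(r.get("name", "").strip())),
    ("email looks like an address",
     lambda r: re.fullmatch(EMAIL_PATTERN, r.get("email", "")) is not None),
    ("postal code is 5 digits",
     lambda r: re.fullmatch(r"\d{5}", r.get("zip", "")) is not None),
]

def failed_rules(record: dict) -> list[str]:
    """Return the names of every rule the record breaks; an empty list means clean."""
    return [name for name, check in RULES if not check(record)]

print(failed_rules({"name": "Ada", "email": "ada@example.com", "zip": "12345"}))
print(failed_rules({"name": " ", "email": "ada-at-example", "zip": "1234"}))
```

Because the rule names read as plain English, the same list can be published to the organization as the criteria themselves, and adding or changing a rule is a one-line edit.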
And last but not least, with the prior three (3) facets covered, we hope to arrive at the all-important facet of making our data actionable. We have clean data with well-defined criteria for managing it over time. We now need the means to further segment the data by business need and to build the processes for marketing to, engaging and supporting our current and future clients. Whether that is through traditional means (e.g., direct marketing, emails, mailings) or through online social engagement, the all-important step is the first one. Start today to make use of the data you already have in your business systems, and keep being intentional about managing future data so that it stays useful and supports the business objectives well into the next quarter and year.
As many in the Big Data industry now define it, there are two main ways to classify data:
– structured (databases, spreadsheets, forms, lists)
– unstructured (documents, pages, emails, Facebook, IM, Twitter, online forums)
There are many third-party tools available to assist with data augmentation, segmentation and execution. Whether you use Hadoop, Microsoft’s suite of BI tools, Oracle tools or something else doesn’t really matter to me; what matters is getting started on the road to squeezing more value from the data you already have. Augment existing data as needed, establish your CSFs (critical success factors) and KPIs (key performance indicators), and track them for improvement over time.
This topic is so huge that I have only scratched the surface with this article. Hopefully, I have piqued your interest enough to keep your eyes open to other perspectives.