There is much folklore surrounding data mining. As data mining expert Bob Small pointed out in an article entitled Debunking Data Mining Myths(1), "much of what is attributed to data mining is incomplete, exaggerated or wrong". One early myth he does not mention was the assertion that, with so much data available, most market research surveys would become redundant. Another is that data mining is somehow always connected with using a data warehouse and not with the kind of survey data we work with. Data warehousing involves a large-scale investment in creating a replicated and cleaned set of data from different sources within a company's systems, which is then "mined". The analogy is of sifting through large amounts of data for the few nuggets of gold that you want.
Data mining is a battery of techniques for carrying out transformations on data as an aid to analysis and interpretation - and it only requires that you have a lot of data, not a data warehouse. The Parallel Computer Centre at Queen's University Belfast lists the principal approaches in data mining as clustering, data summarisation, learning classification rules, finding dependency networks, analysing changes and detecting anomalies. As market researchers, we are more familiar with these as factor analysis, cluster analysis, tabulation, regression, correlation and correspondence analysis. Is this just a case of old wine in new bottles? Have we been mining our data for years without realising it? The answer is probably not, because data mining actually takes analysis a stage further.
Doug Dow is VP in charge of data mining at SPSS. He has been applying data mining techniques to market research, and has encouraged several large clients in the USA to make the philosophical leap to this way of working. He told Research: "From traditional data analysis, you pull data out of its source, transform it slightly, and produce your tables. The first shift most researchers took was going for output you could cut and paste directly into your report. Data mining takes you a whole new step in that direction. You can apply the results of your analysis to creating a decision tree, write the rules for how people will behave and score this in your database, visually edit and clean your data. It challenges the traditional way of doing research - and it does create a nervousness with some researchers."
Doug Dow's claim is that researchers could use SPSS Data Mining solutions to transform not only their data but the whole way they approach their data. The core SPSS product and their CHAID analysis product are widely used by researchers; others, such as the AnswerTree decision making and grouping tool, are likely to make greater inroads into the market research community once users realise their power in sifting through a mountain of results to come out with the answers - whatever they might be. He spoke excitedly of the potential offered by the "bi-directionality" of these tools, whereby you can aggregate data or follow it through to the individual. He talked of the ability to perform "data brushing" to work on outliers and even modify their values. For instance, SPSS offer sophisticated tools for performing missing value analysis and ascribing answers to non-responses.
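To make the ideas of working on outliers and ascribing answers to non-responses concrete, here is a minimal sketch in Python. It is not how the SPSS tools work: it simply fills each missing answer with the median of the answers that were given, and flags outliers with the common interquartile-range rule. The function name, the figures and the 1.5 multiplier are all assumptions chosen for illustration.

```python
from statistics import median, quantiles


def impute_and_flag(answers, k=1.5):
    """Fill missing answers (None) with the median and flag outliers.

    Illustrative only: median imputation and the IQR rule are stand-ins,
    not the methods used by any particular package.
    """
    observed = [a for a in answers if a is not None]
    mid = median(observed)

    # Quartiles of the answers actually given.
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread

    # Ascribe the median to every non-response.
    filled = [mid if a is None else a for a in answers]

    # Positions of the values that fall outside the fences.
    outliers = [i for i, a in enumerate(filled) if a < low or a > high]
    return filled, outliers


# Hypothetical "visits per month" answers; None marks a non-response.
visits = [3, None, 4, 2, 40, 5, None, 3, 4]
filled, outliers = impute_and_flag(visits)
```

With these made-up figures, the two non-responses are filled with the median and the answer of 40 is the only one flagged. What to do with a flagged value (keep it, cap it or go back to the questionnaire) remains a judgement for the researcher.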
Doug Dow continued: "Some may argue that the traditional cross-tabular tool is passé and that newer technology is superseding it. But the staying power of the survey is the intentional data which you just can't get from any other source. Our clients in the US have been challenged to develop techniques to collect behavioural and intentional data and merge this in with data from other sources in the organization such as sales, marketing and customer service data."
Martin Callingham, Group Market Research Director for Whitbread, finds data mining tools save him a lot of time and effort. He said: "It requires an entire philosophical change because you are able to work at an individual not an aggregate level. What you are doing is you are trying to get at individual relationships in your data. I use the analogy of the sculptor who does not know the lie of the rock in advance, but as he chisels away, the form, the thing the sculptor is creating emerges from beneath all the layers. What is strange is that you don't seem to produce very much in the way of reams of output, because you only produce what you need." Martin Callingham uses the tools to steer him in the direction of what is interesting in the data. This is true data mining. He explained how he will often perform a one-dimensional cluster analysis on a key variable just as a simple way of breaking it out into groups, in the same way that traditionally you would divide it by percentage groupings or quartiles. He does a lot of work on derived variables. In the end, he is able to produce charts and some simple tables, so that superficially the end product of his analysis looks similar to that of a traditional researcher - but the way he got there is completely different.
He continued: "For the most part, market researchers use computers in a cosmetic way to make their tabs look more pretty. With SPSS you can really do something different. I am not a statistician, I am just an ordinary researcher who has bothered to get his head stuck into these things. The hardest thing is to get your brain into a completely new way of working."
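The one-dimensional cluster analysis Martin Callingham describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not his actual procedure: it uses plain k-means on a single variable, and the spend figures, the function names and the choice of three groups are all hypothetical. A rank-based quartile split is included for comparison.

```python
def cluster_1d(values, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a single variable.

    Returns (labels, centres). Centres start in ascending order,
    so label 0 is the lowest group.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: k centres spread evenly through the sorted data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c]))
                  for v in values]
        # Move each centre to the mean of its members (keep it if empty).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels, centres


def quartile_groups(values):
    """Traditional alternative: four equal-sized groups by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // n
    return groups


# Hypothetical weekly spend for nine respondents.
spend = [30, 5, 90, 7, 32, 6, 95, 8, 35]
labels, centres = cluster_1d(spend, k=3)
quartiles = quartile_groups(spend)
```

On these made-up figures the clustering finds three natural groups (light, medium and heavy spenders), whereas the quartile split puts the respondents spending 8 and 30 in the same band simply because each band must hold a similar number of people. That is the practical difference between letting the data define the breaks and imposing them in advance.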
But before you jump to the conclusion that data transformation, decision trees and data brushing are simply the latest and faddiest ways of fiddling the data, think back to your first reaction when the technique of weighting was explained to you. These tools also have a role to play in bringing the meaning out of data, and deserve our attention.
(1) Information Week, 20 January 1997
Tim Macer provides independent advice and training in market research software applications. Tel. 0700 4MACER (462237). Web: www.meaning.uk.com
SPSS Market Research 0171 625 7222 or www.spss.com
Two Crows Corp. www.twocrows.com
© Copyright Tim Macer 1998. All rights reserved. Reproduced with permission.