Online research fraud

Netting the Online Fraudsters

From Research Sector Analysis: Online Research, June 2008

By Tim Macer

Rising concerns over internet survey fraud have led to the development of new software that promises to catch rotten respondents red-handed

Interview fraud is hardly new to the industry, as any fieldwork supervisor will attest. Yet the revelations from conferences, published papers and the press over the last four years of respondents who are on the take – and on a massive scale – have worked the industry up into a froth that ranges from moral indignation to schadenfreude, depending on who you are talking to. Increase the ease with which it can be perpetrated in the anonymous context of the Internet, combine this with temptation in the form of panel incentives, do little to monitor or prevent it, and is it any wonder that fraud blossoms? Nevertheless, the estimates Comscore revealed at the 2005 MRA conference are still shocking: that 0.25% of online respondents accounted for 30% of all online interviews collected.

Not that some multiple panel participation is necessarily a bad thing – despite some long-held views about ‘over exposure’ of respondents to research. As with all crime, it is the intention that matters, and when panel members become hyperactive, as a number of studies have shown, they replace the truth with whatever answer earns them the reward for the least effort. Bear in mind that some ‘respondents’ may be no more than computer programs written specially to complete surveys automatically by generating junk responses that fit the form on screen. Unchecked, this feeds straight through to the reliability of the responses. Knowledge Networks has concluded the mean error in online surveys can be as high as 21%. Just as there are countless anecdotes of individual respondents who have been uncovered with 100 or 200 different panel memberships, so too have examples been circulating of online surveys with different access panels providing absolutely contradictory results.

Ali Moiz, COO at Peanut Labs, one of the first firms to develop commercially available software specifically aimed at combating internet survey fraud, remarks: “I do not think the sky is falling in, but I do think there is a data quality problem that the industry would be wise to address.” Realising that a very large problem becomes manageable once the solution is seen to lie with the quarter per cent who are making up answers, his firm has developed its Optimus software for research companies and panel operators, which will identify both fraudulent responses and respondents. There are other, parallel initiatives taking place. Markettools announced its TrueSample initiative in April. This is a software-based approach that combines duplicate panel member identification technology with other fraud detection measures, which it is applying to all its own panels. The German panel specialist Mo’Web has developed several lines of defence of its own, and is taking out a patent on its own ‘fingerprinting’ technology to identify individual PCs that are being used repeatedly.

Common to each initiative is a three-stage approach to fraud detection and prevention.

The first stage is to fingerprint individuals by recognising the PCs they use to complete surveys. This is not as simple as it first seems. The IP address, the identifier used by the Internet and the web equivalent of a phone number, suffers many of the same problems: it can be masked, and the same one may be shared among many users in a company. Worse, a domestic broadband user will often be given a ‘dynamic’ IP address by the ISP, in order to save cost, which means the number will vary from day to day. Cookies are also virtually useless, as many users delete these on a frequent basis, and a fraudster certainly will. Instead, fingerprinting relies on finding other ways to pinpoint what might be unique about any particular PC.

Herbert Höckel, joint Managing Director of Mo’Web, explains: “A web browser has a talkative discussion with any PC as to what resources are available. Based on this information, we can profile a browser and identify it. In itself that is not quite reliable enough to identify individual PCs, but we also obtain information about the connection from German telecom and by using that, we can work out which node has been used to access the internet. This is physical information which cannot be manipulated easily.”
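The browser-profiling idea can be illustrated with a minimal sketch. It is not Mo’Web’s patented method, which the article does not describe in detail; it simply hashes a handful of browser-reported attributes into one identifier and groups panel IDs that share it. The attribute names in FINGERPRINT_FIELDS and both function names are assumptions made for illustration, and the network-node information Höckel mentions is not modelled.

```python
import hashlib

# Hypothetical attribute names: the article does not say which browser
# properties are actually profiled.
FINGERPRINT_FIELDS = ("user_agent", "screen", "timezone", "language", "plugins")


def browser_fingerprint(attributes: dict) -> str:
    """Hash a fixed set of browser-reported attributes into one identifier."""
    # Missing attributes become empty strings so the field order stays stable.
    parts = [str(attributes.get(field, "")) for field in FINGERPRINT_FIELDS]
    joined = "\x1f".join(parts)  # unit separator keeps field boundaries distinct
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_duplicates(panellists: dict) -> dict:
    """Group panel IDs by fingerprint and keep groups with more than one ID."""
    groups = {}
    for panel_id, attributes in panellists.items():
        groups.setdefault(browser_fingerprint(attributes), []).append(panel_id)
    return {fp: ids for fp, ids in groups.items() if len(ids) > 1}
```

A panel operator could run find_duplicates over existing members, or compute browser_fingerprint when a new panellist applies and compare it with the fingerprints already on file. On its own, such a hash will also match people who legitimately share a machine.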

However, there are limitations in this approach. Two or three family members may legitimately share one PC. Students at college using shared machines or users of internet cafés would certainly be using a PC that had already been fingerprinted, and are likely to be excluded.

Moiz comments: “You may choose to tolerate a small amount of duplication. It is always a delicate balance.”

The second tier of defence is to look for fraudulent behaviour when participants are completing a survey. Key traits to detect are the speeders, who rush through the survey suspiciously quickly; straightliners, who always pick option 1 or click the same option in grids and scale batteries; inattentives, who don’t really seem to care what they say and provide illogical answers; and satisficers, who provide answers that merely satisfy your logic checks but are otherwise meaningless. Any of these behaviours may be displayed by both a fraudulent respondent and an automated script that completes surveys and provides random answers.
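Of these traits, straightlining is the easiest to test for mechanically. The sketch below is one possible approach, not the method used by any of the products named here: it measures what share of a grid is covered by the longest run of identical consecutive answers. The function names, the minimum grid size of five items and the default threshold are all assumptions.

```python
from itertools import groupby


def longest_run_share(answers: list) -> float:
    """Share of the grid covered by the longest run of identical answers."""
    if not answers:
        return 0.0
    longest = max(len(list(run)) for _, run in groupby(answers))
    return longest / len(answers)


def is_straightliner(answers: list, min_items: int = 5, threshold: float = 1.0) -> bool:
    """Flag a grid when the longest identical run reaches the threshold share.

    Grids shorter than min_items are never flagged, because identical
    answers across a few statements can easily be genuine.
    """
    return len(answers) >= min_items and longest_run_share(answers) >= threshold
```

With the default threshold of 1.0, only a grid answered identically throughout is flagged; lowering it catches respondents who break the pattern once or twice, at the price of more false alarms.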

Herbert Höckel’s advice is to “clean your panel on the basis of what is not very likely.”

For him, the first stage of identifying duplicates in their existing panel was a revelation. “We immediately kicked out 10% of our panel on duplicated IDs,” he recalls. Mo’Web now checks for duplicates at the time when a panellist applies to join. In parallel, it monitors carefully the behaviour of respondents on any surveys where it is engaged to do the online fieldwork. As a result, Höckel reports an observable improvement in data quality.

“The experiences with our clients have been very positive,” says Höckel. “Before, we would come in, in the morning, and see that a survey had gone well and had reached all its targets, but a week or two later, a client might come back and say to us that responses were inconsistent. Now, this kind of negative feedback from clients has been greatly reduced.”

The third line of defence is perhaps the most problematic. Insurance companies now routinely share information on claims as an effective means to combat the crime of individuals making multiple fraudulent insurance claims. Pooling information on hyperactive survey takers could bring similar benefits to the research industry, but it relies on research companies sharing information with one another.

Offered as a part of Peanut Labs’ Optimus solution is the wittily named “Respondent Rehab Database” – a sin-bin for multiple offenders. This allows users to share data on respondents that are hyperactive, and even for respondents to redeem themselves eventually, once better behaviour has been detected. No actual personally identifying data is involved – it only works at the respondent fingerprint level. However, it is the most powerful defence, if put into play, as it allows the researcher to identify hyperactive respondents who have signed up to all the panels.
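How such a shared sin-bin might behave can be sketched in a few lines. The article does not describe the internals of the Respondent Rehab Database, so everything here is an assumption: the class name, the strike counter, the blocking limit of three and the rule that one clean interview removes one strike. The only details taken from the description above are that records are keyed on a fingerprint rather than personal data, and that respondents can redeem themselves.

```python
class SharedOffenderRegistry:
    """Pools strike counts per respondent fingerprint; holds no personal data."""

    def __init__(self, block_at: int = 3):
        self.block_at = block_at  # hypothetical limit before a block applies
        self._strikes = {}        # fingerprint -> current strike count

    def report_suspect(self, fingerprint: str) -> None:
        """Any participating company adds a strike for a suspect interview."""
        self._strikes[fingerprint] = self._strikes.get(fingerprint, 0) + 1

    def report_clean(self, fingerprint: str) -> None:
        """A clean interview removes one strike, never going below zero."""
        current = self._strikes.get(fingerprint, 0)
        if current > 0:
            self._strikes[fingerprint] = current - 1

    def is_blocked(self, fingerprint: str) -> bool:
        """True while the pooled strike count is at or above the limit."""
        return self._strikes.get(fingerprint, 0) >= self.block_at
```

Because strikes from every contributing company land in the same counter, a respondent who misbehaves once on each of three panels is blocked everywhere, which is the benefit that depends on companies agreeing to share.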

“It is optional if our clients wish to share and contribute information,” says Moiz at Peanut Labs. Clearly, not all do. Similarly, Mo’Web encourages clients who purchase sample for fieldwork they conduct themselves to build in tests for fraudulent behaviour and, at the client’s option, to flag up respondents exhibiting suspicious behaviour when reporting on sample usage at the end of the fieldwork. Disappointingly, Höckel reports that customers have been slow to adopt this security measure. It does, after all, involve a little effort. Furthermore, having another reason to exclude respondents is likely to add slightly to fieldwork times and costs.

There is only so much that technology can do to prevent professional respondents from invading panels and adulterating survey results. Ultimately, making respondent fraud a crime that doesn’t pay depends on the will of the industry to bear the cost of better security.

10 simple techniques for combating online interview fraud

  • Include timestamps at the beginning and end of each interview; compare the length of the interview with the expected length or, once there is sufficient data, the average length. Suspect those that complete in less than 50% of the ‘normal’ time and exclude those who complete in less than 10% of the time.
  • Introduce a few redundant questions at opposite ends of the interview, such as age group and year of birth, or re-ask selected verification data already held in the panel database. Look for discrepancies.
  • Look out for those who repeatedly select option one, or who select the same answer every time in a battery of questions (“straightlining”).
  • Trap speeders and inattentive respondents in grids or statement batteries by adding in a trick question which simply requests “For verification purposes, please answer this statement by selecting the answer slightly agree”.
  • Look for other examples of inconsistent or illogical answers and for unlikely values in numeric questions.
  • Look for open-ended answers that contain rubbish – it could be only a single character (or maybe as few as 3 letters) or made-up words.
  • Exclude the respondent from your survey if there have been two or three violations of any of the above.
  • If you are maintaining a panel, introduce a new interview outcome code of ‘suspected fraud’ to sit alongside complete, incomplete and no participation outcomes. Suspend panellists once the count exceeds your acceptable limit.
  • If you are purchasing sample from a panel supplier, feed back to them the panel IDs of suspect interviews and the reasons why, and request free-of-charge replacements.
  • Compare the performance of different panel providers across your different surveys, using common fraud indicators such as interview length variance and obvious straightlining.
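Several of the checks in this list can be combined into a single screening routine. The following is a rough sketch only: the 50% and 10% timing thresholds and the ‘suspected fraud’ outcome code come from the list above, while the function names, the dictionary keys describing an interview, the limit of two violations and the four-letter minimum for open-ended answers are assumptions. It makes no attempt to detect made-up words.

```python
from datetime import datetime


def speed_flag(started: datetime, finished: datetime, normal_seconds: float):
    """Return 'exclude', 'suspect' or None using the 10% and 50% thresholds."""
    elapsed = (finished - started).total_seconds()
    if elapsed < 0.10 * normal_seconds:
        return "exclude"
    if elapsed < 0.50 * normal_seconds:
        return "suspect"
    return None


def age_mismatch(year_of_birth: int, age_group: tuple, survey_year: int) -> bool:
    """True when the year of birth cannot fit the stated (low, high) age group."""
    low, high = age_group
    oldest_possible = survey_year - year_of_birth  # birthday already passed
    youngest_possible = oldest_possible - 1        # birthday still to come
    return oldest_possible < low or youngest_possible > high


def rubbish_open_end(text: str, min_letters: int = 4) -> bool:
    """True when an open-ended answer contains fewer than min_letters letters."""
    return sum(ch.isalpha() for ch in text) < min_letters


def screen_interview(interview: dict, normal_seconds: float, max_violations: int = 2):
    """Return (outcome, violations) for one interview (hypothetical dict keys)."""
    violations = []
    speed = speed_flag(interview["started"], interview["finished"], normal_seconds)
    if speed is not None:
        violations.append("speeding")
    if age_mismatch(interview["year_of_birth"], interview["age_group"],
                    interview["started"].year):
        violations.append("age mismatch")
    if interview["trap_answer"] != interview["trap_expected"]:
        violations.append("failed trap question")
    if any(rubbish_open_end(text) for text in interview["open_ends"]):
        violations.append("rubbish open end")
    # Extreme speeding excludes outright; otherwise count the violations.
    fraud = speed == "exclude" or len(violations) >= max_violations
    return ("suspected fraud" if fraud else "complete"), violations
```

The returned outcome can be stored against the panel ID, and the list of violations gives the reasons to feed back to a panel supplier when requesting replacements.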

Article published in research, June 2008. Copyright © 2008, MRS/meaning ltd. Reproduced with permission. Further reproduction without permission is a violation of copyright.