Data waves

Big Data, big problems


8 January 2016

Paul Hearns

We tend to think of Big Data and analytics as tools with which to discern intelligence from data, often where that data has not been utilised for such insights before.

This is generally thought to be a good thing, as too often data is stored only to be left to go ‘cold’, its insights unexplored because we lacked the tools, time or compute power to do anything with it.

Now, with a surplus of nearly all of these things (well, not time), analytics reveals insights of all kinds. But what if some of these insights turned out to be less, eh, insightful than we at first thought?

The US Federal Trade Commission (FTC) has released a report entitled “Big Data: A Tool for Inclusion or Exclusion?” which examines scenarios under which big data could become a tool for exclusion, leading to consequences from poor business outcomes to more serious situations with legal ramifications.

The report discusses whether there are “ethical” or “fairness” concerns when businesses rely on big data analysis, or make major decisions based on it.

It says that “Companies should assess the factors that go into an analytics model and balance the predictive value of the model with fairness considerations.” By way of example, it highlights one company that “determined that employees who live closer to their jobs stay at these jobs longer than those who live farther away. However, another company decided to exclude this factor from its hiring algorithm because of concerns about racial discrimination, particularly since different neighbourhoods can have different racial compositions.”

The report also warns against bias or inaccuracies introduced by the way data was gathered, and the problems these can cause if they make it through to the analysis phase. It gives the example of credit card customers who had their limits reduced based on “analysis of other customers with a poor repayment history that had shopped at the same establishments where the customer had shopped,” instead of on each customer’s own credit and purchasing history.

The report highlights concerns that big data can not only fall prey to old problems with data analysis, but can also spread their effects more widely than was previously possible.

“Although the use of inaccurate or biased data and analysis to justify decisions that have harmed certain populations is not new, some commenters worry that big data analytics may lead to wider propagation of the problem and make it more difficult for the company using such data to identify the source of discriminatory effects and address it,” says the report.

It goes on to talk about the danger of “meaningless correlations” and the old chestnut about the difference between correlation and causation.

It says “while big data may be highly effective in showing correlations, it is axiomatic that correlation is not causation. Indeed, with large enough data sets, one can generally find some meaningless correlations.”

It supports the argument with the example that in eighteen out of the past twenty US presidential elections, if the Washington, DC professional football team won its last home game before the election, the incumbent’s party continued to hold the presidency. “Other examples of spurious correlations abound,” it states.
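The arithmetic behind that kind of coincidence is easy to sketch. The Python below is only an illustration, not anything taken from the report: it assumes, purely for the sake of argument, that each candidate indicator is no better than a coin flip, and the function names `tail_probability` and `best_spurious_match` are invented for this example. The first function gives the chance that a single coin-flip indicator agrees with at least 18 of 20 outcomes; the second generates many random indicators and reports the best agreement found among them.

```python
import math
import random


def tail_probability(k, n):
    """Chance that a fair coin-flip indicator matches at least k of n outcomes."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def best_spurious_match(n_indicators, n_events=20, seed=0):
    """Best agreement count between random outcomes and random indicators.

    Everything here is random noise, so any strong match is meaningless.
    """
    rng = random.Random(seed)
    outcomes = [rng.randint(0, 1) for _ in range(n_events)]
    best = 0
    for _ in range(n_indicators):
        indicator = [rng.randint(0, 1) for _ in range(n_events)]
        agreement = sum(a == b for a, b in zip(indicator, outcomes))
        best = max(best, agreement)
    return best


if __name__ == "__main__":
    print("Chance for one indicator:", tail_probability(18, 20))
    for count in (10, 1_000, 100_000):
        print(count, "indicators, best match:", best_spurious_match(count), "of 20")
```

Under that coin-flip assumption, a single indicator has roughly a 1-in-5,000 chance of matching 18 of 20 outcomes, yet the best match can only improve as more unrelated indicators are searched, which is the report's point about large data sets.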

“If companies use correlations to make decisions about people without understanding the underlying reasons for the correlations, those decisions might be faulty and could lead to unintended consequences or harm for consumers and companies.”

Once again, the report very effectively highlights the need for proper analytics and human insight to be applied to any results, so that spurious correlations are not mistaken for causes and then acted upon. While big data has yielded incredible insights that might previously have taken so long to produce as to be no longer relevant, the old cautions and safeguards are just as important as ever.

The fact that the FTC has taken the time to produce the 50-page report shows the importance of the issue, and the examples show that some organisations are already falling prey to bad practices.

While there is no doubt about the value of analytics, old lessons, hard won, cannot be ignored when putting it to use.


TechCentral.ie