Bigger collections, bigger threats

The recent discovery of vast collections of stolen identities is perplexing. Just what are the blackhats up to?

14 February 2019

Whenever a major security breach occurs, especially one described as a ‘sophisticated attack’, there tends to be something simple that opened the way.

Whether it is an unsecured access point allied to little or no internal segmentation, hacked CCTV that picked up administrator credentials on a Post-It note, or a carefully crafted phishing attack to which someone who should know better fell prey, there is usually something relatively minor behind it.

However, that could change if certain concerns about hacking developments come to fruition, and just recently, they moved inexorably in that direction.

This hack has blogged on TechCentral.ie regarding the startling, though somewhat perplexing, discovery by HaveIBeenPwned.com owner and operator Troy Hunt of a massive tranche of hacked emails and identities known simply as Collection#1.

This haul of nearly three quarters of a billion items was on sale for a paltry $60 or so, and was made up of email addresses and identities gathered from more than 2,000 previous incidents.

While many raced to offer advice on what to do if your email address, or several of them, appeared in the haul, others began to speculate as to what the purpose of such a collection might be.

The sheer quantity of data set this apart from previous lists and pushed the speculation in new directions.

Training data
Personally, I was reminded of a particular comment from a panel discussion. Trend Micro’s senior threat researcher, Bob McArdle, speaking at the Cyber Security Skills conference, was commenting on whether hackers were using artificial intelligence (AI) in cyber attacks. He said that, from experience, AI needs masses of data to be trained properly, and that this is not something the blackhats generally have either the time or the inclination to amass. Hence, in his opinion, we were still some way off seeing AI being used in such circumstances.

Suddenly, the Collection#1 haul started to look a little less perplexing. Then, a mere week or so later, it emerged that researchers at the Hasso Plattner Institute had revealed Collections#2–5, representing up to 2.19 billion email addresses and passwords.

Initial analysis has shown that some 611 million of the credentials in Collections#2–5 were not included in the Collection#1 database.

Once again, one has to ask why the blackhats would be going to such trouble to amass such large quantities of data.

General intelligence
Looking at this pragmatically, we have seen that fears of artificial intelligence taking over the world are largely unfounded, and that a general artificial intelligence is still some way off.

However, what AI and machine learning (ML) are really good at is spotting patterns, repeated patterns and evolving patterns. They then get really good at predicting patterns.

So let’s speculate for a moment. In a world where social media, carefully manipulated, has turned out to be a powerful tool for influencing public opinion, being able to spot patterns in masses of data becomes even more powerful.

For example, if you could look at a set of names, email addresses, passwords, join dates and password change dates, and then match those to geographical data, cross-referenced with platform details and current affairs, you might be able to discern some very interesting information.

Say, for example, you were able to discern that people from a certain area, social stratum, educational level or household income bracket reacted in a certain way to news of a potential data breach involving their identity. That would be valuable.

Let’s say, on average, only 33% of people actually changed their password after a major breach notification, or that only 22% of people enabled two-factor authentication, or that more than half of people took no action at all. All of this could materially inform any strategy that might involve nefarious use of stolen or spoofed identities in a mass campaign to influence public opinion in an election, a referendum or a popular political movement.

Furthermore, if you were to learn the basic patterns of people’s password use, you could use that to build far more intelligent password-cracking tools that could succeed in, say, three guesses, the average number of attempts that most systems allow.

Value driven
An AI system attuned to common password use, meaning not the common passwords themselves but the patterns by which people make them up, would be of great value on the black market, particularly if those patterns could be correlated with the importance of the service being used. If an AI were able to work out that, for a simple social media service, people composed passwords within the permitted restrictions in certain ways, and that this escalated with the importance of the service right up to something like a financial service, such information could be used to profile users, reducing the amount of time it would take to crack a password at any level.
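To make the idea of a password "pattern" a little more concrete, here is a minimal sketch of the kind of structural summary that security auditors already use when reviewing password policies. It is an illustration only, not a description of what any attacker is known to be doing. The function names and the sample passwords are made up for this example; each password is reduced to a mask of character classes (U for upper case, l for lower case, d for digit, s for anything else) and the masks are then counted.

```python
from collections import Counter


def password_mask(password: str) -> str:
    """Reduce a password to its structure (hypothetical helper).

    U = upper-case letter, l = lower-case letter, d = digit, s = anything else.
    """
    mask = []
    for ch in password:
        if ch.isupper():
            mask.append("U")
        elif ch.islower():
            mask.append("l")
        elif ch.isdigit():
            mask.append("d")
        else:
            mask.append("s")
    return "".join(mask)


def mask_frequencies(passwords):
    """Count how often each structural mask occurs in a list of passwords."""
    return Counter(password_mask(p) for p in passwords)


# Made-up sample values for illustration only, not real leaked data.
sample = ["Summer19", "Winter18", "dublin123", "galway456", "Pa$$w0rd"]

for mask, count in mask_frequencies(sample).most_common():
    print(mask, count)
```

Even in this toy sample, four of the five passwords collapse into just two shapes: a capitalised word followed by two digits, and a lower-case word followed by three digits. That is the sort of regularity the paragraph above has in mind, and it is equally useful to a defender deciding which password shapes a sign-up form should discourage.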

As we have seen from other examples, the blackhat world has become increasingly professionalised, offering all sorts of services, with the assurances of escrow, user testimonials and 24 x 7 support. Now it would seem that AI platforms to craft tailored attacks as part of mass campaigns might not be far off either.

The old adage of going for the weakest point seems to be fully at play here.

Instead of trying to use AI to crack the security systems themselves, the attackers seem to be amassing personal data, data about the human — the slowest evolving element of the equation — to begin to harness the power of these emerging technologies for criminal gain.

Towards an end
While it is early days yet, and there is no direct evidence of these large caches of data being used in this manner, the speculation is still valid. If security professionals are of the opinion that this is what the materials could be used for, one can be fairly certain that somewhere, there is a hacker working toward that end.

But one important point to make here is that we must not start thinking of this as yet another ‘humans are the weakest link’ issue. First, the task is to inform, educate and encourage good password and identity management practice. Second, the onus is on the industry to respond with usable, accessible and available tools and services to protect people, not blame them.

The other thing is to be ever vigilant for signs of influence, hijack and theft of identity.