Time to drain some data lakes
I was interested in this report on the recent TechFire webinar on sustainability. Fittingly, the discussion centred on storage because that is, to my mind, one of the biggest offenders when it comes to waste.
Mike Roan, EMEA senior director of systems engineering for Pure Storage, suggested that the shift from hard disk drives to solid-state drives would lead to savings in power consumption and emissions. More significantly, he added that the move by many companies towards data reduction would help reduce the environmental impact of storage arrays.
Ah yes, data reduction. I wonder why companies are looking at that now, after years of relentlessly spewing out ever-increasing amounts of data? Some of them have generated so much data over the years that they are now the proud owners of ‘data lakes’. As far as I recall, there were quite a few years when no one felt that having so much data was a cause for concern or condemnation. Far from it. Organisations that generated enormous amounts of data were lauded for gaining a massive information advantage over others.
Of course, the storage vendors and the wider ecosystem were only too happy to keep supplying hard drives to store all that data. For some organisations, simply having enough storage in place to keep pace with the data they were generating was work enough.
The problem with having so much data is working out how much of it is useful and then getting access to it. It stands to reason that an awful lot of the data held in those lakes is more or less useless, or of so little value that trying to extract insights from it probably costs more than simply deleting it. Some argued it was still possible to mine value from that data, but for most people the economically and environmentally sound response should probably have been: “Who cares?”
A lot of people like to equate data with gold, but quite a bit of it is just rubbish. Generating and storing that data has an environmental cost, and until recently no one cared much about it.
After all, it’s not that long ago that people would reply “throw more storage at it” when confronted with the problem of data proliferation. From a sustainability perspective, they were failing at both ends: data generation and data storage. To be fair, you could legitimately call it a circular economy. Just not a particularly virtuous one.
Perhaps we shouldn’t be surprised. The industry’s response to data storage and the huge rise in data volumes has been to reuse and recycle first, working through the familiar ‘reduce, reuse, recycle’ mantra backwards. Only now is it looking to reduce.
I accept that eliminating valueless data at the point of generation takes a lot of time and effort, but there’s something to be said for doing so if it limits the effects of data pollution on our world. It might sound a bit extreme, but I use the word ‘pollution’ because it helps to concentrate our minds on the environmental costs of storing so much useless data.
It could also help take the discussion on storage sustainability beyond carbon emissions and the environmental cost of the materials used in storage products. Maybe the next step is to consider how we deal with data emissions?