Wondering where your data is? It’s probably going cold on a hard drive


Sponsored

Robert Anderson’s tongue is only partly in his cheek when he says, “The biggest thing for the hard drive industry was the smartphone.”

Extending the lifespan of technology that has been around since the 1950s was probably not the primary motivation when companies put a device for creating content in the hands, potentially, of every person on the planet.

But all of those cat selfies and videos made a big contribution to the 64.2 ZB of data created or replicated last year, according to IDC’s DataSphere and StorageSphere reports, which also forecast a 23% compound annual growth rate (CAGR) for data through 2025.

Surprisingly, less than 2% of this data was actually stored and retained into 2021, but the StorageSphere of installed storage still reached 6.72 ZB of capacity last year and is expected to grow at a 19.2% CAGR through 2025. And, warns IDC, organizations – and individuals – should consider retaining even more data as they embark on digital transformation, increase resilience, and accelerate data analytics initiatives.
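As a rough illustration of what such a growth rate implies, the sketch below compounds the quoted 6.72 ZB installed base at 19.2% a year. The five-year horizon and the function name `project_capacity` are assumptions made for illustration, not figures or methods taken from IDC.

```python
def project_capacity(base_zb: float, cagr: float, years: int) -> float:
    """Compound a starting capacity at a fixed annual growth rate."""
    return base_zb * (1.0 + cagr) ** years


if __name__ == "__main__":
    # 6.72 ZB and 19.2% are the figures quoted above; the 5-year horizon is assumed.
    for year in range(0, 6):
        print(f"year +{year}: {project_capacity(6.72, 0.192, year):.2f} ZB")
```

Because the growth compounds, each year's increase is larger in absolute terms than the one before it, which is why the question of where the data lands matters.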

So data creation is increasing, and a greater percentage of data will be stored permanently. But where? If you take a look inside end-user devices and the high-performance data arrays that grab the headlines, you’d be forgiven for thinking that flash will absorb this tsunami of data.

But you’d be wrong, argues Western Digital Marketing Director Robert Anderson. In fact, almost two thirds of the data is stored on hard drives. This is because most of the data created and maintained is heading to data centers and the cloud, and things are getting colder there. The data lifecycle is exemplified by what is happening to the vast oceans of consumer-generated content that have been created over the past decade and a half by smartphones, Anderson says. “When you get it ‘for free in the cloud’, why should I delete anything?”

But of course, cloud providers monetize that content, either through advertising or by developing insights that can be resold. The problem is, over time, the data is rarely, if ever, accessed. As Anderson points out, consider the amount of cell phone footage from concerts “that nobody ever looks at”.

The same cycle applies to corporate data. The deluge of information created in the normal course of business is further inflated by the flood of data from sensors, smart manufacturing and video. Organizations assume that at some point the right analytics software will come along and allow them to understand it and create value. So they shrug their shoulders and say, “Why delete it? Maybe I’ll use it later.” In the meantime, Anderson continues, they think, “If it’s cheap enough to store it, I don’t want to spend time figuring out what to keep and what to delete. It’s a waste of someone’s time.” The cloud is where your data goes to cool. Likewise, the cloud provider says, “I’ll take it all. And I’ll keep it for you. Because I can do it inexpensively, and I can now monitor what you access, how often you access it, and then I can be more optimized around the structure of how I store your data … but if you want to get it back then I’m going to charge you [because] then I lose my monetization abilities.”

The result, Anderson explains, is “that ever-growing anchor of legacy data residing in these large public clouds.”

Anchor can have negative connotations, but in reality it just means that this data moves from a profit center to a cost center, as vendors face the challenge of reducing the cost of keeping data which, perhaps, one day will have to be accessed again.

Traditionally, data that needed to be kept in perpetuity had a secure and inexpensive home: tape. “There is still a strong market for tape for archival purposes, especially for businesses,” says Anderson. “The government uses a lot of tape.”

The problem for cloud services – and business and research organizations – is that there is always a chance that someone wants to access this old data and wants it quickly. Tape does not lend itself to this, even if it is not stored in a salt mine. And for this reason, says Anderson, “it breaks the implied service level agreement.”

The long tail of tape

Instead, it’s the hard drives that absorb the cooling data. The evolution of hard drives to ever larger capacities further cements them in this role, says Anderson.

As hard drives get bigger, “your absolute performance on a spec sheet is faster. But your access density is lower… Before, I had five 4TB drives running simultaneously to serve 20TB of data. Now I have one 20TB drive.”
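The access-density point can be made concrete with a small sketch. The capacities match Anderson's example, but the per-drive IOPS values and the `DriveFleet` class are hypothetical, chosen only to show how the ratio moves when a faster single drive replaces several slower ones.

```python
from dataclasses import dataclass


@dataclass
class DriveFleet:
    drives: int            # number of drives working in parallel
    capacity_tb: float     # capacity of each drive, in TB
    iops_per_drive: float  # assumed random IOPS per drive (hypothetical)

    def total_capacity_tb(self) -> float:
        return self.drives * self.capacity_tb

    def access_density(self) -> float:
        """IOPS available per TB stored."""
        return (self.drives * self.iops_per_drive) / self.total_capacity_tb()


# Hypothetical per-drive IOPS: the newer drive is somewhat faster on paper.
old_fleet = DriveFleet(drives=5, capacity_tb=4.0, iops_per_drive=80.0)
new_fleet = DriveFleet(drives=1, capacity_tb=20.0, iops_per_drive=100.0)

for name, fleet in (("5 x 4TB", old_fleet), ("1 x 20TB", new_fleet)):
    print(f"{name}: {fleet.total_capacity_tb():.0f} TB, "
          f"{fleet.access_density():.1f} IOPS/TB")
```

Under these assumed numbers the single drive wins on the spec sheet yet offers a quarter of the IOPS per terabyte, which suits data that is rarely read.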

If that means data retrieval might take a few seconds or a few minutes, that’s still a far cry from the hours or days it takes to retrieve a tape from a remote location, let alone recover the data it contains. “So the disk starts to fill that gap, closer and closer to tape,” says Anderson. “Does it eat tape’s lunch? Not really. Because the cost-effectiveness of tape is on another level.”

Does that mean businesses will no longer think in terms of traditional storage archives? “I would say the exact opposite,” Anderson says. And while tape is the cheapest medium for cold data, the large-capacity hard drive also offers total cost of ownership benefits at scale in the data center, whether in the enterprise or in a hyperscaler cloud. “Assuming the operator needs 100 units of storage – terabytes, petabytes, whatever – well, if I can fit it in half the racks, I have half the metal,” says Anderson. Then, by extension, “I have fewer node systems in the rack, I have half the memory dedicated to each of them. I have less network because I have fewer connected devices and less cabling. Even as the power per device increases, my power per terabyte decreases, so my total power decreases and the space I occupy decreases.”

Perhaps counterintuitively, he says, “the higher your non-storage costs, the greater the savings in total cost of ownership.”
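A toy cost model shows why that follows. Everything here is hypothetical: 100 units of storage, the same drive cost per unit in both cases, denser drives that fit the capacity in half the racks, and a single `overhead_per_rack` figure standing in for servers, memory, network, power and space.

```python
def total_cost(capacity_units: float, units_per_rack: float,
               drive_cost_per_unit: float, overhead_per_rack: float) -> float:
    """Drive spend plus per-rack overhead (servers, memory, network, power, space)."""
    racks = capacity_units / units_per_rack
    return capacity_units * drive_cost_per_unit + racks * overhead_per_rack


def savings_from_denser_drives(overhead_per_rack: float) -> float:
    # All numbers are hypothetical: 100 units of storage, equal drive cost per
    # unit, and denser drives that fit the same capacity in half the racks.
    sparse = total_cost(100, units_per_rack=10, drive_cost_per_unit=1.0,
                        overhead_per_rack=overhead_per_rack)
    dense = total_cost(100, units_per_rack=20, drive_cost_per_unit=1.0,
                       overhead_per_rack=overhead_per_rack)
    return sparse - dense


for overhead in (2.0, 5.0, 10.0):
    print(f"overhead per rack {overhead:>4}: "
          f"saving {savings_from_denser_drives(overhead):.1f}")
```

In this simplified model the drive spend is identical either way, so the whole saving comes from the racks that are no longer needed, and it scales directly with the overhead each rack carries.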

What does performance mean now?

It also explains why the traditional high-end tier of enterprise hard drives – the 10,000 RPM and 15,000 RPM devices – is indeed “an endangered species.” It’s harder to increase capacity with these devices, Anderson says. At the same time, their performance advantage is being eroded by SSDs.

While SSDs can absorb more of the performance end of the market, they will not replace HDD technology overall, even in the medium term. The performance demands of AI and HPC do indeed favor SSDs, but only while they are running simulations, training, or inference. When it comes to this rapidly cooling data, SSDs are still a hugely expensive option, Anderson explains.

“The idea that I’m using flash for consumer, high-capacity, long-term storage? It’s simply not true. I mean, can you, and do some people? Yes. But it is very, very expensive. Anything that needs a large set of data that you don’t want to pay a fortune to store will end up on hard drives. You’ll store your large AI and ML datasets on hard drives, then migrate them while you do your computation and move them back when you’re done.”

And while SSD shipments may grow faster than hard drive shipments, they start from a much smaller base, and the easy “binary” switches, such as in client devices, have already been made, he says.

“So in absolute terms the gap keeps widening for a long, long time,” says Anderson. “And it takes an incredibly long time for it to close.”
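The arithmetic behind that claim can be sketched with made-up numbers. The starting volumes and growth rates below are hypothetical, not market data; they only show that a faster-growing but smaller base can fall further behind in absolute terms for years.

```python
def shipment_gap(hdd_base: float, ssd_base: float,
                 hdd_growth: float, ssd_growth: float, years: int) -> list[float]:
    """Absolute HDD-minus-SSD shipments for each year, compounding both."""
    return [
        hdd_base * (1 + hdd_growth) ** y - ssd_base * (1 + ssd_growth) ** y
        for y in range(years + 1)
    ]


# Hypothetical inputs: HDD starts 5x larger, SSD grows faster (25% vs 20%).
gaps = shipment_gap(hdd_base=1000.0, ssd_base=200.0,
                    hdd_growth=0.20, ssd_growth=0.25, years=10)
for year, gap in enumerate(gaps):
    print(f"year {year}: gap {gap:.0f}")
```

With these assumed inputs the gap grows in every one of the ten years shown, even though the smaller series has the higher growth rate throughout.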

And during this period, both flash and the hard drive are likely to evolve. Shingled magnetic recording (SMR) drives, for example, are still not common. But with the hard drive absorbing more and more cooler, immutable, and infrequently accessed data, the benefits of a sequential-oriented format are obvious. The advantages are even more evident when it comes to object data formats such as video, which, almost by definition, are sequential.

So flash could be the hot storage format that makes the headlines, while tape, like a diamond, is forever.

But whether it’s selfies, corporate data, video feeds, or data lakes that enable recommendation engines or climate simulations, it’s the hard drive that will absorb most of our data for years and maybe decades to come.

Sponsored by Western Digital
