Unstructured Data: A Deep Dive into its Value and Growth using High-Performance Object Storage
December 27, 2023
Data is the lifeblood of any business, and increasingly, that data is unstructured. Generated from social media, Internet of Things (IoT) devices, text, images and multimedia, the amount of unstructured data is growing significantly.
As big data analytics, artificial intelligence (AI) and machine learning (ML) continue to transform how businesses operate, this amount of data will continue to expand. All this unstructured data will need to be stored, managed, secured, and made accessible.
While hard disk drives are the king of capacity, recent advancements in the performance and scalability of Non-Volatile Memory Express (NVMe™) solid state drives (SSDs) are making high-performance object-based storage (OBS) a viable option for storing the unstructured data that powers today’s data-driven applications and workloads like AI and ML.
The Explosion of Unstructured Data
Organizations use unstructured data to extract valuable insights that help create operational efficiencies, enhance decision making, and uncover competitive advantages. For example, business intelligence efforts around market research, customer insights, sentiment analysis, and predictive analytics rely heavily on unstructured data. The same goes for AI/ML initiatives powered by natural language processing (NLP) and computer vision, as well as personalization and recommendation engines. Unstructured data is also used to power personalized marketing, automated customer support (such as chatbots), and new product development and innovation.
According to IDC, 180 zettabytes of data will be generated globally in 2025 (up from 64 zettabytes in 2020) – much of it unstructured. Another IDC report estimates that 139 exabytes of unstructured data will reside in private cloud and traditional object storage by 2025, with another 1 zettabyte in public cloud object storage. These exploding storage requirements are putting pressure on organizations, which will succeed or struggle depending on how well they use unstructured data to drive business outcomes.
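To put the IDC figures in perspective, the short Python sketch below works out the yearly growth rate implied by going from 64 zettabytes in 2020 to 180 zettabytes in 2025. It assumes smooth compound growth over the five years, which is a simplification of ours rather than something IDC states, and the function name is purely illustrative.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that takes `start` to `end` in `years`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1 / years) - 1


# Figures quoted above: 64 ZB generated in 2020, 180 ZB projected for 2025.
# Assumption: growth compounds evenly across the five-year span.
rate = compound_annual_growth_rate(start=64, end=180, years=5)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the volume of data generated each year grows by roughly 23 percent annually.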
Managing Unstructured Data with Object-Based Storage
Managing these large volumes of unstructured data in a viable, efficient manner is challenging. First of all, as we’ve just established, there’s a lot of unstructured data out there, and finding enough capacity in the data center or in the cloud can be difficult and expensive. Secondly, unstructured data comes in many different formats, and whether it’s email, social media posts or smart video, each has different storage requirements. Thirdly, how the data is used varies widely. Some unstructured data may sit unused for years, while other data types are fed into AI engines constantly and in real time.
While traditionally used for backup and archive, OBS has emerged as a recommended option when storing large volumes of unstructured data. In fact, OBS offers several advantages over block and file-based alternatives – especially when it comes to cost, scalability, and deployment flexibility. As a result, OBS is now supporting modern workloads such as data analytics and AI/ML, and is commonly used today by cloud storage services, content delivery networks (CDNs), and many organizations spread across media and entertainment, healthcare, and other industries that rely on unstructured data.
OBS, File, and Block Storage
The main benefit of OBS is scalability. Given the massive amounts of unstructured data being created and accessed today and the speed at which capacities are growing, it makes sense to rely on a storage format that can scale to the petabyte and exabyte level. OBS is also highly redundant because it is often distributed across multiple locations. This improves performance and makes data loss due to errors or reliability issues less likely. Object storage can also provide cost savings in both upfront procurement and ongoing operational costs. The ability to scale as needs arise eliminates the need to overprovision resources, which helps control costs.
But where OBS really shines is with unstructured data with high performance needs. While hard disk drive (HDD) storage is the most common media due to its huge capacity and ability to scale efficiently, NVMe SSD-based OBS is gaining traction to help deliver performance at scale – including the use of flash drives to help speed up data storage and metadata operations for AI/ML.
OBS doesn't require NVMe SSDs for every use case, but there are many scenarios where the use of NVMe SSDs can speed up the performance of an OBS system, reduce access time, and provide high IOPS for higher performance workloads. It's important to note that NVMe SSDs can be more expensive than HDDs on a cost/TB basis. Therefore, their adoption should be based on a careful assessment of the specific performance needs of the object storage use case. In many OBS deployments, a tiered storage approach that combines NVMe SSDs for high-performance tiers and HDDs for less frequently accessed data can strike a balance between performance and cost effectiveness.
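The tiering trade-off described above can be sketched in a few lines of Python. This is a minimal illustration, not a real placement engine: the `TierPolicy` class, the read-frequency threshold, and the cost-per-TB figures are all hypothetical placeholders chosen for the example, not vendor pricing or a recommended policy.

```python
from dataclasses import dataclass


@dataclass
class TierPolicy:
    # All values below are hypothetical placeholders, not real prices.
    hot_reads_per_day: float = 10.0   # at or above this, data counts as "hot"
    nvme_cost_per_tb: float = 80.0    # assumed cost of the NVMe SSD tier
    hdd_cost_per_tb: float = 20.0     # assumed cost of the HDD tier


def choose_tier(reads_per_day: float, policy: TierPolicy) -> str:
    """Place frequently read objects on NVMe SSD and everything else on HDD."""
    return "nvme" if reads_per_day >= policy.hot_reads_per_day else "hdd"


def blended_cost_per_tb(hot_fraction: float, policy: TierPolicy) -> float:
    """Average cost per TB when `hot_fraction` of capacity sits on NVMe SSD."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return (hot_fraction * policy.nvme_cost_per_tb
            + (1.0 - hot_fraction) * policy.hdd_cost_per_tb)


policy = TierPolicy()
print(choose_tier(500, policy))          # a dataset read constantly by an AI pipeline
print(choose_tier(0.01, policy))         # an archive touched a few times a year
print(blended_cost_per_tb(0.2, policy))  # 20% of capacity on NVMe, 80% on HDD
```

With these made-up numbers, keeping only the hot fifth of the data on NVMe SSDs brings the blended cost much closer to the HDD figure than to the all-flash figure. Running the same calculation with real quotes and measured access patterns is one way to frame the assessment for each use case.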
Finding the Right Mix of Cost Per Performance
Unstructured data is helping organizations extract valuable insights that help create operational efficiencies, enhance decision making, and uncover competitive advantages. The growth of unstructured data and the expanding role of AI/ML and advanced analytics will only increase the need for OBS. To meet these data needs, organizations need to invest wisely and consider the right HDD and NVMe SSD-based solutions based on the specific capacity and performance needs of each application. Given these choices, it’s important to keep your options open and choose the media that makes the most sense for each use case.
Praveen Midha is director, segment marketing & technical marketing, data center flash, at Western Digital. Praveen is focused on expanding Western Digital’s enterprise flash storage business across data center customers. He’s focused on product strategy, portfolio expansion and key GTM initiatives. Praveen earned a master’s in business administration from Santa Clara University and a bachelor’s degree from Indian Institute of Technology (BHU) Varanasi, India. He is based at the company’s San Jose, CA headquarters location.