
Storage (Data storage system)

A data storage system (DSS) is a combination of hardware and software designed for storing and processing information, usually in large volumes. This information includes files (media among them), structured data (DBMS), unstructured data (big data), backups, and archives. The storage media are drives: mainly SSDs (All Flash Array systems), as well as hybrid solutions that combine SSD and HDD drives in one storage system.

Storage systems differ from an ordinary hard drive in their complex architecture, the ability to combine storage devices into a storage network, dedicated software for managing the system, and advanced backup, compression, and virtualization technologies.

Data storage systems differ in several parameters, and the choice of each determines how the storage can be used.


Storage levels

Block storage

The storage is used like a regular disk: it can be formatted, an OS can be installed on it, and it can be divided into logical disks. Data is stored not in files but in blocks, which speeds up I/O operations. Block storage is most often used in SANs (Storage Area Networks). It is suitable for high-performance computing, DBMS, storage of large amounts of data, and development environments (Dev/Test). The disadvantages are a) the complexity of setup and maintenance, which require appropriate qualifications, and b) high cost.
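To illustrate what "stored in blocks" means, here is a minimal Python sketch of a block device held in memory. The `BlockDevice` class, its method names, and the 512-byte block size are assumptions made for this illustration, not the interface of any real storage product.

```python
class BlockDevice:
    """Toy block device: a flat array of fixed-size blocks addressed by number."""

    def __init__(self, block_count: int, block_size: int = 512):
        self.block_size = block_size
        self.block_count = block_count
        self._data = bytearray(block_count * block_size)

    def write_block(self, index: int, payload: bytes) -> None:
        if not 0 <= index < self.block_count:
            raise IndexError("block index out of range")
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        start = index * self.block_size
        # Pad short payloads with zero bytes so the block stays fixed-size.
        self._data[start:start + self.block_size] = payload.ljust(self.block_size, b"\0")

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < self.block_count:
            raise IndexError("block index out of range")
        start = index * self.block_size
        return bytes(self._data[start:start + self.block_size])


disk = BlockDevice(block_count=8)
disk.write_block(3, b"hello")
```

The device knows nothing about files or directories: any block can be read or written directly by its number, and a file system or DBMS on top decides what the blocks mean.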

File storage

The data is stored as files placed in directories. Such storage is used for “cold” information that is not required for operational calculations. NAS (Network Attached Storage) systems are usually built on file storage. Disadvantages: as large amounts of data accumulate, the folder hierarchy becomes more complicated and the storage gradually slows down. File storage is not suitable for workloads that require low response times.

Object storage

A type of storage designed for large volumes of unstructured data, up to petabytes in size. Information is stored not as files but as “objects”, each with a unique identifier and metadata, so object storage resembles a database in structure. It is used in analytics, big data, and machine learning, for storing “heavy” media files and backups, for developing and operating applications in the cloud, and for hosting websites. In terms of speed, it is inferior to block storage for transactional workloads.
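The object model can be sketched in a few lines of Python. This is an in-memory illustration only: the `ObjectStore` class and its `put`, `get`, and `find` methods are hypothetical names and do not correspond to the API of any particular object storage service.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy flat-namespace store: objects are addressed by ID, not by path."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        object_id = str(uuid.uuid4())  # unique identifier for the object
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
video_id = store.put(b"...video bytes...", content_type="video/mp4", project="demo")
backup_id = store.put(b"...archive bytes...", content_type="application/zip", project="demo")
```

There is no folder hierarchy here: an object is retrieved by its identifier or located through its metadata, which is why the structure resembles a database.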

Network access

NAS (network-attached storage)

A file server connected to the local network. Access to the disk storage is provided via the NFS protocol (on UNIX/Linux systems) or SMB/CIFS (on Windows). A NAS is used for file-type data that needs shared simultaneous access, for example shared Word and Excel documents. A NAS works “on top” of an existing LAN, through shared switches and routers.

SAN (storage area network)

A network that can combine different types of storage devices (disks, optical drives, tape arrays), which the operating system perceives as a single logical data store or as a network logical disk. Protocols: iSCSI (IP-SAN) and Fibre Channel (FC). Computers are connected through HBAs (Host Bus Adapters). A SAN mainly uses block storage.

The SAN/NAS separation is no longer as strict as it was in the early 2000s: with the advent of the iSCSI protocol, manufacturers began to produce hybrid solutions.

Fault tolerance

To assess the ability of a storage system to recover from failures, two indicators are used: RPO and RTO.

RPO (recovery point objective)

The period between the last backup and the moment of failure, that is, the maximum span of data that can be lost. If the RPO is 12 hours and the storage fails, data accumulated over the last 12 hours may be lost. The RPO affects the choice of disaster recovery technology and depends on the cost of losing a specific amount of data.
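As a rough illustration, the relationship between backup schedule and RPO can be checked with a short Python sketch. The function names and timestamps are hypothetical, and the sketch assumes that the worst case is a failure occurring just before the next scheduled backup.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Time span of data written after the last backup and lost in the failure."""
    return failure - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """In the worst case the failure happens just before the next backup,
    so the loss window equals the whole backup interval."""
    return backup_interval <= rpo


last_backup = datetime(2024, 1, 10, 2, 0)   # hypothetical timestamps
failure = datetime(2024, 1, 10, 11, 30)
lost = data_loss_window(last_backup, failure)
```

Under this assumption, a daily backup cannot satisfy a 12-hour RPO, while a backup every 12 hours (or more often) can.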

RTO (recovery time objective)

The time it takes to restore access to the storage after a failure. The RTO value is important for estimating the cost of system downtime.
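A simple way to see how the RTO translates into money is to multiply it by the hourly cost of downtime. The sketch below does this in Python; the recovery options, their RTO values, and the hourly cost are invented figures for illustration, not data from any vendor.

```python
def downtime_cost(rto_hours: float, cost_per_hour: float) -> float:
    """Estimated cost of an outage that lasts the full RTO."""
    return rto_hours * cost_per_hour


# Hypothetical figures: compare two recovery options for the same service.
options = {
    "restore from tape": 8.0,       # RTO in hours
    "failover to replica": 0.5,
}
cost_per_hour = 2000.0  # assumed loss per hour of downtime, any currency

costs = {name: downtime_cost(rto, cost_per_hour) for name, rto in options.items()}
for name, cost in costs.items():
    print(f"{name}: {cost:.0f}")
```

Comparing these estimates with the price of each recovery option shows whether a shorter RTO is worth paying for.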

Backup

The frequency of backups is chosen based on specific tasks and the required level of protection. The same applies to placement: production data and its backups can be kept in geographically distributed storage (for example, in data centers located in different countries or even on different continents).

In addition to backups, snapshots are taken: point-in-time images that are used to roll back to the latest working version of the system.

Deduplication reduces the space that backups occupy: only data that is not already stored, in practice the data that has changed since the previous backup, is written to the new copy. The difference between consecutive backups does not exceed 2% on average, so deduplication helps to save disk space.
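The idea can be shown with a minimal block-level sketch in Python. It assumes fixed-size blocks identified by their SHA-256 hash; the `DedupStore` class is a hypothetical name, the 4-byte block size is chosen only to keep the example readable, and real systems may use different chunking and indexing schemes.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block contents, stored once

    def backup(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of block digests."""
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # write only unseen blocks
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list) -> bytes:
        return b"".join(self.blocks[d] for d in recipe)


store = DedupStore()
monday = store.backup(b"AAAABBBBCCCCDDDD")
tuesday = store.backup(b"AAAABBBBXXXXDDDD")  # one block changed
```

The two backups contain eight blocks in total, but the store keeps only five unique ones, and both backups can still be restored in full from their recipes.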

How to choose a storage system

First of all, you need to understand what tasks it will solve. Before contacting the supplier (or integrator), you should determine several basic parameters.

Data type

Different types of data require different access speeds, processing technologies, compression, and so on. For example, a storage system for working with large media files differs from one that is suitable for working with a transactional DBMS, or from a system that will work with unstructured data for a neural network.

The amount of data

The choice of drives depends on this. Sometimes a consumer-grade SSD is enough, if you know that the stored volume will not exceed 300 GB even in the worst case and access speed is not critical.

Fault tolerance

You need to estimate what losing data over a given period of time would cost. This will help you calculate the RPO and RTO and avoid unnecessary backup costs.

Efficiency

If the storage is being purchased for a new project (service) whose load is difficult to predict, it is better to consult colleagues who have already solved a similar problem, or to contact an experienced supplier who has launched similar projects. The ideal option is to test the storage.

Vendor

Sometimes even a low-cost or mid-range solution (StarWind, Huawei, Fujitsu) is suitable for a resource-intensive service. However, the top manufacturers (NetApp, HPE, Dell EMC) have fairly wide product lines that also include relatively inexpensive storage systems. In any case, it is advisable not to greatly expand the number of vendors within the same infrastructure.

