Server RAM Modules – Where Reliability and Stability Are Crucial
January 12, 2023
Wiesław Wilk, CEO of Wilk Elektronik – the manufacturer and owner of the Goodram and IRDM brands – shares insights into server RAM modules and how to choose the right one.
Server RAM, or more precisely, server RAM modules, are products comprising many components. As with any DRAM module, the basic component is a PCB onto which integrated circuits and passive components, like resistors and capacitors, are soldered during the production process. The main difference between server RAM and consumer RAM is the quality and quantity of the chips. Server RAM needs to run 24 hours a day, so the chips used in the product must be high quality and carefully selected before manufacturing. There are three types of server RAM modules: ECC UDIMM (unbuffered ECC), RDIMM (registered), and LRDIMM (Load Reduced DIMM).
ECC UDIMM RAM modules feature an additional 8-bit chip for every 64 bits of data width. In practical terms, each single rank RAM module has nine memory chips, not eight. For a dual rank module, it is eighteen chips instead of sixteen. These extra ICs are not used for storing the data itself. Their only function is to support the ECC (Error Correction Code) algorithm that detects and corrects errors in the RAM module.
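The chip counts above follow from simple arithmetic, which can be sketched as below. This is only an illustration: the function name `chips_per_module` and its parameters are hypothetical, and it assumes every rank is 64 data bits wide plus 8 check bits, as the paragraph describes.

```python
DATA_BUS_BITS = 64  # data width of one rank, as described in the article
ECC_BITS = 8        # extra check bits stored alongside every 64 data bits


def chips_per_module(ranks: int, chip_width_bits: int = 8, ecc: bool = True) -> int:
    """Count the DRAM chips needed to fill every rank of one module."""
    if ranks < 1 or chip_width_bits < 1:
        raise ValueError("ranks and chip width must be positive")
    rank_width = DATA_BUS_BITS + (ECC_BITS if ecc else 0)
    if rank_width % chip_width_bits != 0:
        raise ValueError("chips of this width do not fill the rank evenly")
    return ranks * (rank_width // chip_width_bits)


print(chips_per_module(ranks=1, ecc=False))  # consumer single rank: 8
print(chips_per_module(ranks=1))             # ECC single rank: 9
print(chips_per_module(ranks=2))             # ECC dual rank: 18
```

With 4-bit-wide chips (an assumption, since the article only mentions 8-bit chips), the same arithmetic gives 36 chips for a dual rank module.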
RDIMM modules feature ECC and a component called the Register, which handles communication between the RAM module and the memory controller. The controller in the CPU communicates with the Register on the memory module, and the Register then manages how the RAM chips operate. As a result, RDIMM modules remain stable in multi-module systems and allow more modules to run on a single motherboard, as the Register takes the load off the memory controller.
The last type of RAM module, LRDIMM, has data buffers to complement the Register. The CPU memory controller communicates with the Register and with separate buffers dedicated to each individual RAM chip. The Register manages control of the RAM module while the buffers transfer data.
Server RAM modules contain some components that consumer-grade RAM modules do not. While a consumer RAM module can have 8 or 16 chips on the PCB, an ECC UDIMM can have 9 or 18 chips, and RDIMM and LRDIMM modules can have 9, 18, or 36 chips. RDIMMs also have a Register, and LRDIMMs add data buffers, which further differentiate them from consumer solutions.
Server RAM modules have some specific characteristics. By definition, any RAM type is volatile and requires electric power; this means it is inherently not resistant to data loss. RAM modules store the temporary data necessary for applications to work. During data exchange, RAM is exposed to errors from disturbances in the operation of the hardware that can lead to data loss or system instability. One of the advantages of server-grade RAM modules over consumer RAM modules is therefore support for error correction codes, a function known by the acronym ECC.
ECC, or Error Correction Code, is a special algorithm that detects and corrects errors that may occur in the RAM module during operation. The data errors that the ECC algorithm corrects are most often caused by incorrect electrical and dynamic parameters of the RAM module interface or damage to the memory cells. ECC uses an additional 8 bits for every 64 bits of data to store a checksum. The checksum is generated each time data is written to the RAM chips. When the RAM chips are read, the data is verified against its stored checksum. If the data matches the checksum, it is sent to the CPU. If there is a mismatch, the data is corrected using the checksum and then returned to the CPU.
In simple terms, the process can be shown as a flowchart:
[Figure: flowchart of the ECC write, verify, and correct process]
Therefore, by detecting and removing errors, ECC increases the stability of the entire system.
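The write, verify, and correct cycle described above can be sketched in a few lines of code. The article does not say which code the checksum uses, so this sketch assumes an extended Hamming (72,64) code, a common textbook choice that corrects any single flipped bit and detects two flipped bits. The names `encode` and `decode` are hypothetical, and real memory controllers do this in hardware and may use a different code.

```python
from typing import Optional, Tuple

PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)  # Hamming check bits (7 of the 8)
# The remaining positions 1..71 hold the 64 data bits; bit 0 is overall parity.
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]


def _syndrome(word: int) -> int:
    """XOR together the positions (1..71) of all set bits."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data: int) -> int:
    """On write: turn 64 data bits into a 72-bit word with 8 check bits."""
    if not 0 <= data < (1 << 64):
        raise ValueError("data must fit in 64 bits")
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    s = _syndrome(word)
    for p in PARITY_POSITIONS:      # set check bits so the syndrome becomes 0
        if s & p:
            word |= 1 << p
    if bin(word).count("1") % 2:    # bit 0 makes the total number of ones even
        word |= 1
    return word


def decode(word: int) -> Tuple[Optional[int], str]:
    """On read: verify the word, fix one flipped bit, flag two flipped bits."""
    s = _syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < 72:
        word ^= 1 << s              # the syndrome is the flipped bit's position
        status = "corrected"
    else:
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


stored = encode(0xDEADBEEFCAFEF00D)
print(decode(stored)[1])              # ok
print(decode(stored ^ (1 << 37))[1])  # corrected (one flipped bit)
print(decode(stored ^ 0b110)[1])      # uncorrectable (two flipped bits)
```

The 72-bit word matches the nine 8-bit chips of an ECC module: 64 bits of data plus 8 check bits.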
Unbuffered consumer-grade RAM modules do not have this built-in algorithm, which makes their stability and data integrity poorer than those of ECC-enabled RAM modules. On the other hand, however, ECC finds no application in consumer PCs. ECC-enabled RAM modules operate in a specific way and are mainly used in servers, advanced workstations, and industrial computers, that is, anywhere even momentary system instability is unacceptable.
A PC must meet certain requirements to support ECC RAM modules, and not every PC configuration can use their potential. A compatible CPU and motherboard are required to run ECC modules properly. Before purchasing ECC RAM modules, make sure that the CPU, motherboard, and DRAM modules are compatible with one another. Currently, the most popular standard among ECC RAM modules is the fourth generation, DDR4, which consumes as much as 20% less power than DDR3 modules. Note that DDR4 RAM modules are not backwards compatible and are not supported by motherboards designed to work with DDR3 RAM modules. Keep this in mind when choosing RAM modules for your hardware configuration.
Upgrading a server and expanding the RAM is not complicated, but a few basic rules apply. When expanding the RAM of a server or a PC workstation, avoid combining different types of RAM modules. Unbuffered and registered RAM modules cannot be used at the same time; such a configuration on a single motherboard will not work. To ensure system stability, the safest bet is to use RAM modules of the same type, which means the same RAM chip layout, number of memory banks (ranks), memory size, and clock speed. For example, DDR4 16GB ECC RAM modules are currently available in dual rank configurations with eighteen chips in the 1024Mx8 layout and single rank configurations with nine chips in the 2048Mx8 layout. When expanding a server that has 16GB 1024Mx8 dual rank ECC RAM, the safest choice is to add dual rank RAM modules with the same layout. This approach will ensure the stability and reliability of the entire system.
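The 16GB example above can be checked with a little arithmetic, and the "same type" rule can be expressed as a field-by-field comparison. The sketch below is illustrative only: the `Module` class, its field names, and the `mismatches` helper are hypothetical, and the 3200 MHz rating is just a sample value.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Module:
    kind: str          # "ECC UDIMM", "RDIMM", or "LRDIMM"
    ranks: int         # 1 = single rank, 2 = dual rank
    chip_depth_m: int  # the "1024M" in a 1024Mx8 layout
    chip_width: int    # the "x8" in a 1024Mx8 layout, in bits
    speed_mhz: int     # rated speed as printed on the label

    def capacity_gb(self) -> int:
        """Usable capacity: only the 64 data bits of each rank count, not ECC."""
        data_chips_per_rank = 64 // self.chip_width
        chip_megabytes = self.chip_depth_m * self.chip_width // 8
        return self.ranks * data_chips_per_rank * chip_megabytes // 1024


def mismatches(installed: Module, candidate: Module) -> List[str]:
    """List the fields in which a candidate differs from the installed module."""
    return [f.name for f in fields(Module)
            if getattr(installed, f.name) != getattr(candidate, f.name)]


dual = Module("ECC UDIMM", ranks=2, chip_depth_m=1024, chip_width=8, speed_mhz=3200)
single = Module("ECC UDIMM", ranks=1, chip_depth_m=2048, chip_width=8, speed_mhz=3200)

print(dual.capacity_gb(), single.capacity_gb())  # 16 16
print(mismatches(dual, single))                  # ['ranks', 'chip_depth_m']
```

Both modules are 16GB, yet they differ in rank count and chip layout, which is why matching on capacity alone is not enough when expanding a server.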
When choosing RAM modules for professional hardware, like servers, workstations, or industrial PCs, it makes sense to look at the components used to build the RAM modules and at the manufacturing process itself, provided it can be understood in depth. By components, I mean the integrated circuits, or simply, “chips.” For professional applications, RAM modules that use components from one of the world’s three largest manufacturers, Samsung, Hynix, or Micron, are recommended. These chip brands are used in the production of Goodram-branded server RAM modules. The RAM integrated circuit is the most important component of any RAM module, and it determines whether a RAM module in continuous operation (24/7) will last one year or many years. These manufacturers do not release their ICs without a prior testing and validation process. If a chip from Samsung, Hynix, or Micron passes the testing and validation process, it is always marked with the manufacturer’s part number.
There are ways for the buyer or server owner to verify the quality of the selected RAM modules. Verifying whether an IC is of high quality is possible for the end user and, despite appearances, does not require diagnostic equipment. Genuine ICs are always marked with the manufacturers’ serial numbers. Chip manufacturers do not permit the selling of “black chips,” or chips without the markings that explicitly identify their origin and manufacturer. Many RAM module manufacturers, wishing to offer a product at a lower price, use chips of unknown origin (which are black or sometimes carry the RAM module brand name or logo). For professional applications, it is safer to avoid these RAM modules, as they suffer a much higher failure rate and a shorter life cycle. When purchasing modules, it is best to ask the RAM module manufacturer to specify the origin and OEM of the chips.
There are specific types of RAM modules that customers are most often looking for. Given the current market trends, the most sought-after RAM modules are the DDR3 and DDR4 generations. Wilk Elektronik still receives enquiries for older types of RAM modules, DDR2 and DDR1. The Polish manufacturer still produces these generations, which is a differentiator for the company. Its manufacturing line can output more than one million units in one month. We have DDR3 ECC RAM modules in regular production, rated at 1600 MHz and 1866 MHz, and DDR4 modules rated at 3200 MHz, 2933 MHz, 2666 MHz, and 2400 MHz. ECC RAM modules made by Goodram are available in 4 GB and 8 GB sizes for DDR3 and up to 32 GB for DDR4, the RAM module generation that is perfectly compatible with Intel and AMD server CPUs.
It is still possible to produce older generations of RAM modules. Goodram, the Polish brand, will continue to make generation 1 and 2 RAM modules as long as our customers need them. Such an approach, however, is not the market standard: many PC DRAM manufacturers have long since phased out older solutions, even though the demand for them is still there. Because Wilk Elektronik has its own manufacturing line and long-standing contracts with global chip manufacturers, it can make any RAM module, even in small quantities. This also sets the company apart from global manufacturers who only fulfill large orders. Goodram focuses on an individual approach to customers. If the RAM modules are to operate at elevated temperatures or in the Arctic cold, the manufacturer will be able to make the right product. It is for these applications that we have equipped our manufacturing facilities with a climate testing chamber and RAM module diagnostic testing facilities. Thanks to the climate chamber, the lab’s engineers can test any RAM module at temperatures from -40°C to 85°C.
In summary, many factors contribute to the quality and reliability of server-grade RAM modules. Extremely important are the quality and origin of the ICs, the quality of the passive components, the manufacturing process, and the stringency of the verification processes. The RAM module manufacturer is responsible for these contributing factors. Another factor is the correct process of RAM module validation, or approval for use in the intended system.