This article describes a fix for a problem you may run into when putting new SAS hard disks into SAS hard disk enclosures or hot-swap server drive bays: the disk stays stopped and is never detected. The principle is very simple: masking pin 3 of the hard disk's power connector solves the problem.
1>Is the hard drive broken?
I recently set up a Ceph cluster at home and added a new batch of Seagate enterprise-grade Exos SAS disks. I have run a mix of SAS and SATA disks over the years, and my impression is that SAS disks are the more reliable of the two. The interface supports two data channels, which gives a certain degree of redundancy. Enterprise-class disks are also better than NAS-specific disks in materials and workmanship. For example, Seagate's enterprise SATA or SAS disks can come with a five-year manufacturer warranty under 24x7 working conditions, so if something goes wrong during the warranty period you can still go to an agent for an exchange.
When I got the new hard drives home, I mounted one in a hot-swap tray and plugged it into the server. After powering on, the hard disk power indicator on the front panel of the chassis worked normally, but the read/write light never flashed during startup. (The motor sound of a SAS disk is normally a little more obvious than that of a SATA disk.)
My first suspicion was a misconfigured adapter card, a faulty backplane, or a bad SAS cable connection. However, I happened to have a 10k RPM hard drive limited to SAS2 on hand, and it was recognized as soon as I plugged it in, so the problem was clearly not as simple as I had first thought. I began to wonder whether this particular drive was faulty, or whether I had run into a dishonest seller. The drive was second-hand, after all, although it looked very new and still carried a warranty. If the drive really was faulty, the worst case was finding a local agent to replace it.
So I tried a second hard drive, and it was not recognized either. Given what I expect from the quality control of enterprise-level products, two faulty drives in a row is very unlikely. I began to suspect that this was not a hardware failure but a protocol compatibility problem, so I searched Stack Exchange for similar reports and finally found the cause.
Most SAS hard disks you come across today implement the SAS-3 standard. A revision of the specification added the Power Disable (PWDIS) feature, which lets the host power a drive down through its power connector; the SATA specification gained the same feature. In 2017, some manufacturers began to support it on SAS hard disks. PWDIS uses the third pin (P3) of the 15-pin power connector. Older power cables and backplanes supply 3.3 V on this pin, and while a PWDIS-capable drive sees that voltage on P3 it stays powered down, so the platters never spin and the drive cannot work. The pin definitions in the manufacturer's documentation describe this behavior.
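The behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch: the `Drive` class, the `drive_spins_up` function, and the 2.0 V threshold are hypothetical names and values chosen for the example, not taken from any specification, and a masked pin is simply modeled as 0 V.

```python
from dataclasses import dataclass

# Assumption for illustration only: any level above this counts as "high" on P3.
# The real threshold is defined by the SAS/SATA specifications, not by this sketch.
ASSUMED_HIGH_THRESHOLD_V = 2.0


@dataclass
class Drive:
    name: str
    supports_pwdis: bool  # True for newer drives that treat P3 as Power Disable


def drive_spins_up(drive: Drive, p3_voltage: float) -> bool:
    """Return True if the drive would power up given the voltage seen on P3."""
    if not drive.supports_pwdis:
        # Older drives treat P3 as an ordinary 3.3 V supply pin and ignore its level.
        return True
    # PWDIS-capable drives stay powered down while P3 is held high.
    return p3_voltage < ASSUMED_HIGH_THRESHOLD_V


legacy = Drive("10k SAS2 disk", supports_pwdis=False)
exos = Drive("new Exos SAS disk", supports_pwdis=True)

for drive in (legacy, exos):
    for volts in (3.3, 0.0):  # 3.3 V from an old backplane, 0.0 V for a masked pin
        result = drive_spins_up(drive, volts)
        print(f"{drive.name}: P3 = {volts} V -> spins up: {result}")
```

The model shows why the older SAS2 disk worked in the same bay: it never looks at P3, while the newer disk only starts once the 3.3 V on that pin is gone.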
The physical SAS/SATA connector follows the SFF-8482 specification. The only difference is that SAS has an additional set of 7 signal pins, which carry the second data channel. A typical server hot-swap drive bay can therefore accept both kinds of device, provided that there is a SAS controller behind it to handle the SAS protocol.
The newer U.2 interface uses a physical form similar to SAS, but it redesigns the redundant signal pins so that PCIe can be connected directly to the drive bay.
The fix depends on your setup. If you connect the hard disk directly with a SAS cable, the 15-pin power cable from the power supply causes the same problem, because it carries a 3.3 V output. In that case, the easiest fix is to power the disk from a 4-pin power connector (commonly known as the large D-type connector) through a 4-pin to SATA power adapter cable.
The 4-pin connector only defines 12 V and 5 V, and 3.3 V from this part of the power supply is rarely used in a computer anyway. A 4-pin to SATA power adapter is a purely passive connection, so no 3.3 V signal ever reaches the drive, which avoids the problem entirely.
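To make the wiring argument concrete, here is a small Python sketch that compares the two connectors. The pin tables follow the common layout of the 4-pin peripheral connector and the 15-pin SATA/SAS power connector, simplified for illustration (P11 is treated as plain ground here, although it also carries staggered spin-up and activity signalling). The function name `adapter_wiring` is hypothetical.

```python
# 4-pin peripheral connector (the "large D-type" plug): only 12 V, 5 V and ground.
MOLEX_PINS = {1: "12V", 2: "GND", 3: "GND", 4: "5V"}

# 15-pin SATA/SAS power connector as wired on an older power-supply cable.
# Simplification: P11 is listed as ground.
SATA_POWER_PINS = {
    **{pin: "3.3V" for pin in (1, 2, 3)},   # P3 doubles as PWDIS on newer drives
    **{pin: "GND" for pin in (4, 5, 6)},
    **{pin: "5V" for pin in (7, 8, 9)},
    **{pin: "GND" for pin in (10, 11, 12)},
    **{pin: "12V" for pin in (13, 14, 15)},
}


def adapter_wiring(source_rails):
    """Map each SATA power pin to the rail a passive adapter can feed it, or None."""
    return {
        pin: (rail if rail in source_rails else None)
        for pin, rail in SATA_POWER_PINS.items()
    }


wiring = adapter_wiring(set(MOLEX_PINS.values()))
unconnected = [pin for pin, rail in wiring.items() if rail is None]
print("Pins left unconnected by a 4-pin adapter:", unconnected)
```

Because the 4-pin side has no 3.3 V rail to offer, pins 1 to 3 of the drive's connector end up with nothing attached, and P3 is among them.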
However, for people like me who connect their disks through a SAS backplane, it is not quite so easy, and a small hardware hack is needed. It is not complicated: cover the pin that controls PWDIS with a piece of insulating tape (clear tape or any single-sided tape will do). With this control signal masked, the problem goes away. This is what I did in the picture below.
If the pins are too small for you to tape just one of them accurately, you can simply cover pins 1 to 3 together. The first two pins are reserved, so covering them as well has no effect.
Readers who prefer a more drastic approach can also cut this part of the copper contact away with a utility knife, but doing so will affect the warranty and any future second-hand sale of the hard disk, so it is generally not recommended.
After this simple modification, I plugged the hard disk back into the server's drive bay and turned on the power. The read/write indicator now works normally, and the hard disk is recognized by both UEFI and the operating system.
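One way to confirm that the operating system sees the disk is to read the kernel's block device list. The sketch below assumes a Linux host, where detected disks appear under `/sys/block` and report their model in `device/model`. The function name `list_block_devices` is hypothetical, and the path is a parameter so the function can be pointed at another directory.

```python
from pathlib import Path


def list_block_devices(sys_block="/sys/block"):
    """Return (name, model) pairs for disks that the Linux kernel has detected."""
    root = Path(sys_block)
    if not root.is_dir():
        return []  # not a Linux host, or sysfs is not mounted
    found = []
    for dev in sorted(root.iterdir()):
        model_file = dev / "device" / "model"
        # Virtual devices such as loop devices have no model file and are skipped.
        if model_file.is_file():
            found.append((dev.name, model_file.read_text().strip()))
    return found


if __name__ == "__main__":
    for name, model in list_block_devices():
        print(name, model)
```

If the modified disk shows up in this list with its model string, the kernel has detected it and the backplane is no longer holding it powered down.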
In my Ceph setup, the logs of each microservice are written continuously, so the disks never need to be stopped. Blocking this function therefore costs me nothing. For server use in general, PWDIS is of little practical value.
Post time: Aug-12-2022