A Technical Overview of Smartmontools

Introduction

Smartmontools is a well-known toolset for monitoring and querying storage health. The data it reads comes from Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.), a standard implemented by many modern hard drives. In this post, we are going to take a look at the code of Smartmontools and see what we can learn from it. We will also take a look at the Linux storage system hierarchy and how communication takes place from a software's perspective. Let's begin!

Checking out the Codebase

We will be checking out the code from the official SVN repository, although a GitHub mirror exists as well. Revision 4934 will be used throughout this review.

Checking out the codebase is as easy as:

svn co https://svn.code.sf.net/p/smartmontools/code/trunk/smartmontools smartmontools

Building the Code

We will be using Linux for building and reviewing the code. Smartmontools uses the GNU Autotools (Autoconf and Automake) as its build system. From a clean checkout, we can build and install the binaries with the following commands:

./autogen.sh
./configure
make
sudo make install

The INSTALL file documents the default settings and the overrides that are possible during compilation.

Reviewing the Interfaces and Data Structures

The code is written in C++ and uses a few base classes and mixins to abstract over the supported platforms.

Figure 1: Platform Inheritance Hierarchy (Not a Complete Depiction)

The smart_device and smart_interface classes are the basic building blocks of the code. In addition, generic ATA, SCSI, and NVMe device classes are provided for extension. These base classes are defined in dev_interface.h.

The smart_device class provides accessors that downcast a device object to one of the protocol-specific device classes:

  • smart_device::to_ata() returns an ata_device pointer
  • smart_device::to_scsi() returns a scsi_device pointer
  • smart_device::to_nvme() returns an nvme_device pointer
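
To make the accessor idea concrete, here is a minimal Python sketch. It is not the project's C++ code: the class names echo those in dev_interface.h, but the method bodies and the describe() helper are our own assumptions, including the choice to return None when a device is of a different type.

```python
class SmartDevice:
    """Base class: each accessor returns None unless a subclass overrides it."""

    def to_ata(self):
        return None

    def to_scsi(self):
        return None

    def to_nvme(self):
        return None


class AtaDevice(SmartDevice):
    def to_ata(self):
        return self


class ScsiDevice(SmartDevice):
    def to_scsi(self):
        return self


class NvmeDevice(SmartDevice):
    def to_nvme(self):
        return self


def describe(dev):
    """Hypothetical helper: report which protocol view a device offers."""
    if dev.to_ata() is not None:
        return "ata"
    if dev.to_scsi() is not None:
        return "scsi"
    if dev.to_nvme() is not None:
        return "nvme"
    return "unknown"
```

Callers can hold a generic device and ask for the protocol-specific view only when they need it, for example describe(ScsiDevice()) yields "scsi".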

There are ten platforms that are supported by Smartmontools. Each of these platforms has its own structures and driver interfaces for querying the underlying storage. Smartmontools uses inheritance to provide a separate implementation of the smart device classes and the smart interface for each platform. These are defined in the os_<platform> .cpp and header files.

In addition, the code uses a global singleton to hold the smart interface for the platform selected at compile time. Each platform-specific source file ends with an implementation of smart_interface::init, which sets the global interface to the platform-specific implementation.

For example, in os_linux.cpp, the following code registers the Linux smart interface as the source of truth:

void smart_interface::init()
{
  static os_linux::linux_smart_interface the_interface;
  smart_interface::set(&the_interface);
}
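
The same registration pattern can be sketched in Python. This is only an illustration of the idea: the set() and init() names follow the C++ snippet above, while get(), platform_name(), and the error handling are assumptions of ours rather than the project's API.

```python
class SmartInterface:
    """Base interface that also stores the single process-wide instance."""

    _the_interface = None

    @staticmethod
    def set(interface):
        SmartInterface._the_interface = interface

    @staticmethod
    def get():
        # Hypothetical accessor: fail loudly if init() was never called.
        if SmartInterface._the_interface is None:
            raise RuntimeError("init() has not registered an interface")
        return SmartInterface._the_interface

    def platform_name(self):  # hypothetical method, for illustration only
        raise NotImplementedError


class LinuxSmartInterface(SmartInterface):
    def platform_name(self):
        return "linux"


def init():
    """Stand-in for the per-platform smart_interface::init()."""
    SmartInterface.set(LinuxSmartInterface())
```

Because only one platform file is compiled into the binary, only one init() exists, and the rest of the code can work against the base interface without knowing which platform is behind it.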

In summary, the codebase uses inheritance to separate the different methods of pulling S.M.A.R.T. data from the underlying storage. This provides a clean interface for extending the code and adding future devices for whichever platform runs the executable.

Getting to the Meat of the Pie

As we go deeper into the code, we find some more interesting things to learn. One is the logic used to scan for devices and detect a device type. Another is how Smartmontools gathers information about storage behind RAID controllers and USB bridges.

Understanding Storage Device Types

One aspect we glossed over is the supported device types. From a software perspective, there are three standards for communicating with mass storage: ATA, SCSI, and the newer NVMe. Each defines its own command structures for sending requests to the storage and receiving responses from it.

The ATA standards are commonly found on desktop computers. They first appeared as Parallel ATA (PATA), which used the IDE "ribbon" cables. The more modern SATA devices, connectors, and host controllers continue to use the ATA command set. There is also the AHCI standard, a host controller interface for SATA that enables features such as Native Command Queuing (NCQ). TRIM support for SSDs is part of the ATA command set itself.

SCSI is an older standard which, unlike ATA, was written for more than just hard disks. Serial Attached SCSI (SAS) is a replacement for Parallel SCSI (SPI) and is often found in servers and enterprise storage hardware. One thing to note is that SATA drives are compatible with SAS controllers, but SAS drives cannot be used with SATA controllers.

Finally, there is the NVM Express (NVMe) standard which is used with modern SSD devices. This standard was written from the ground up with the low latency and high parallelism of SSDs in mind. NVMe is gaining more traction in desktop and enterprise environments and will continue to gain market share as SSD devices become more economical and powerful.

Scanning Storage Devices

With smartctl, one can scan for all devices connected to the system. As an example:

$ smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device

We can see here that two SCSI devices are detected on the system. Let's take a look at how Smartmontools detects and reports on these devices.

Figure 2: Linux Glob Patterns Used for Searching for Devices

On Linux, storage devices are exposed as device nodes on the udev or devtmpfs filesystem mounted at /dev/. The codebase relies on the Linux naming conventions for these nodes in order to scan for the storage devices attached to the system.
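
A name-based scan of this kind can be sketched as follows. The patterns below are illustrative assumptions and not the list used in os_linux.cpp; scan_devices() and DEVICE_PATTERNS are hypothetical names introduced for this example.

```python
import glob
import os

# Illustrative patterns only; the real list lives in the project source
# and may differ. Each pattern maps to the device type that is reported.
DEVICE_PATTERNS = {
    "sd[a-z]": "scsi",
    "sd[a-z][a-z]": "scsi",
    "nvme[0-9]": "nvme",
    "nvme[0-9][0-9]": "nvme",
}


def scan_devices(dev_root="/dev"):
    """Return (path, type) pairs for nodes whose names match a pattern."""
    found = []
    for pattern, dev_type in DEVICE_PATTERNS.items():
        # Escape the root so only the name pattern is treated as a glob.
        full_pattern = os.path.join(glob.escape(dev_root), pattern)
        for path in sorted(glob.glob(full_pattern)):
            found.append((path, dev_type))
    return found


if __name__ == "__main__":
    for path, dev_type in scan_devices():
        print(f"{path} -d {dev_type}")
```

Note that a scan like this assigns the type purely from the node name, so a partition such as sda1 is skipped and any disk named sdX is reported as SCSI, whatever hardware sits behind it.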

These device nodes can be deceiving and may not accurately depict whether a device is ATA, SCSI, or NVMe. For example, let's look at the device located at /dev/sdb:

$ udevadm info -a -n /dev/sdb | grep -E 'looking|DRIVER'
  looking at device '/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb':
    DRIVER==""
  looking at parent device '/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0':
    DRIVERS=="sd"
  looking at parent device '/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:1f.2/ata2/host1':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:1f.2/ata2':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:1f.2':
    DRIVERS=="ahci"
  looking at parent device '/devices/pci0000:00':
    DRIVERS==""

As we can see, the /dev/sdb block device uses the sd kernel driver which handles SCSI devices. In the parent device hierarchy for /dev/sdb, we can see that the ahci kernel driver is used for the ata2 host adapter. AHCI is a standard used with SATA, and thus this is an indicator that the underlying device may be SATA.
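
The parent walk that udevadm performs can be approximated by following the sysfs tree directly. The sketch below assumes the usual sysfs layout, where a device directory may contain a driver symlink; driver_chain() is a hypothetical helper of ours, and unlike udevadm it reports only the directories that have a driver bound.

```python
import os


def driver_chain(sys_path, stop_at="/sys/devices"):
    """Walk from a sysfs node up to stop_at, collecting bound drivers.

    Returns a list of (device_directory, driver_name) pairs, ordered from
    the node itself towards the root of the device tree.
    """
    chain = []
    current = os.path.realpath(sys_path)
    stop_at = os.path.realpath(stop_at)
    while current.startswith(stop_at + os.sep):
        link = os.path.join(current, "driver")
        if os.path.islink(link):
            # The symlink target's last component is the driver name.
            chain.append((current, os.path.basename(os.readlink(link))))
        current = os.path.dirname(current)
    return chain


if __name__ == "__main__":
    for device_dir, driver in driver_chain("/sys/block/sdb"):
        print(f"{driver}: {device_dir}")
```

On the system shown above, we would expect this to report sd for the SCSI disk entry and ahci for the PCI controller.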

$ lsscsi
[0:0:0:0]    disk    ATA      Samsung SSD 850  2B6Q  /dev/sda
[1:0:0:0]    disk    ATA      HGST HTS725050A7 B550  /dev/sdb

Probing the system further with the lsscsi tool, we see that /dev/sdb is reported as ATA. So why is Smartmontools not detecting the proper device type?

Auto-Detecting Device Types

As we discussed earlier, the SCSI command set is often used for more than hard drives. USB and RAID devices are commonly presented as SCSI devices regardless of the storage hardware they connect to.

Linux Storage Stack Diagram

Within the Linux kernel itself, we can see above that the SCSI mid layer is responsible for many device types and handles the translation to low-level drivers including libata (for ATA hardware).

Smartmontools uses a few techniques to detect the underlying storage type behind these generic block storage devices. One is to issue a SCSI INQUIRY command to the device and parse the result. According to the T10 specification, an ATA device behind a translation layer should respond with a vendor identification of 'ATA     ' (the string ATA padded with spaces to fill the 8-byte field).

Using our previous /dev/sdb and the sg_inq tool, we can see that this is indeed the case:

$ sg_inq /dev/sdb | grep 'Vendor identification'
 Vendor identification: ATA
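
The vendor check itself is simple to sketch. In a standard INQUIRY response, the vendor identification occupies bytes 8 to 15, the product identification bytes 16 to 31, and the revision bytes 32 to 35. The function names below are hypothetical, and the real detection in Smartmontools may apply additional checks that this sketch leaves out.

```python
def parse_standard_inquiry(data):
    """Extract the identification strings from standard INQUIRY data."""
    if len(data) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")
    return {
        "vendor": data[8:16].decode("ascii", errors="replace"),
        "product": data[16:32].decode("ascii", errors="replace"),
        "revision": data[32:36].decode("ascii", errors="replace"),
    }


def looks_like_sat_ata(data):
    """True when the vendor field is 'ATA' padded with spaces to 8 bytes."""
    return parse_standard_inquiry(data)["vendor"] == "ATA     "


if __name__ == "__main__":
    # Hand-built response using the strings lsscsi reported for /dev/sdb.
    response = bytes(8) + b"ATA     " + b"HGST HTS725050A7" + b"B550"
    print(parse_standard_inquiry(response))
    print(looks_like_sat_ata(response))
```

The comparison is made against the full padded field rather than a stripped string, since the padding is part of the identifier described above.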

Smartmontools will also detect this and use SCSI/ATA Translation (SAT) to communicate through the SCSI application layer directly with the ATA device. Under the hood, this is done by the autodetect_open methods. Unfortunately, smartctl --scan does not go down this code path, so it will not always report the proper device type. Running smartctl -a /dev/sdb does auto-detect the device type and gives a more accurate picture of the actual hardware configuration.

Understanding Passthrough Devices

There are other cases in which a storage device is not directly accessible to the system. For instance, a hard drive may be attached through a USB bridge, as with an external hard drive or USB drive. We have seen previously that these USB devices can be presented as generic SCSI devices by the kernel drivers. In these cases, all commands for querying S.M.A.R.T. data must be passed through the controller on the USB bridge to the drive, so the request must go through SAT or a SCSI-to-NVMe translation.

RAID controllers are another case that requires custom passthrough commands. For example, one class of RAID controllers supported by Smartmontools is MegaRAID. Again, MegaRAID devices are exposed as block devices on the Linux system through the sd kernel module, similar to the following:

$ udevadm info -a -n /dev/sda | grep -E 'looking|DRIVER'
  looking at device '/devices/pci0000:00/0000:00:07.0/0000:06:00.0/host0/target0:2:0/0:2:0:0/block/sda':
    DRIVER==""
  looking at parent device '/devices/pci0000:00/0000:00:07.0/0000:06:00.0/host0/target0:2:0/0:2:0:0':
    DRIVERS=="sd"
  looking at parent device '/devices/pci0000:00/0000:00:07.0/0000:06:00.0/host0/target0:2:0':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:07.0/0000:06:00.0/host0':
    DRIVERS==""
  looking at parent device '/devices/pci0000:00/0000:00:07.0/0000:06:00.0':
    DRIVERS=="megaraid_sas"
  looking at parent device '/devices/pci0000:00/0000:00:07.0':
    DRIVERS=="pcieport"
  looking at parent device '/devices/pci0000:00':
    DRIVERS==""

In order to gain information about the underlying storage behind a RAID device, Smartmontools must issue passthrough commands to the adapter. This is done using special ioctl system calls, which are interpreted by the kernel driver and passed along to the firmware. Each ioctl carries a specially constructed packet for the kernel driver to process. This packet contains the necessary passthrough command fields, such as:

  • SCSI Command
  • SCSI Command Data
  • Target device ID
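
To give a feel for what such a packet looks like, here is a sketch that packs those three kinds of fields into a byte string. The layout is entirely hypothetical and is not the structure expected by megaraid_sas or any other driver; only the 6-byte INQUIRY command block follows the standard SCSI format. The sketch stops at building the bytes and does not issue an ioctl.

```python
import struct

# Hypothetical layout, for illustration only (not a real driver ABI):
#   uint16 target_id | uint8 cdb_len | uint8 reserved | 16-byte CDB | uint32 data_len
PACKET_FORMAT = "<HBB16sI"


def build_inquiry_cdb(alloc_len=36):
    """Standard 6-byte INQUIRY CDB: opcode 0x12, allocation length in bytes 3-4."""
    return bytes([0x12, 0x00, 0x00, (alloc_len >> 8) & 0xFF, alloc_len & 0xFF, 0x00])


def build_passthrough_packet(target_id, cdb, data_len):
    """Pack the target ID, the SCSI command, and the data buffer length."""
    if len(cdb) > 16:
        raise ValueError("CDB longer than 16 bytes")
    # struct pads the CDB with NUL bytes up to the 16-byte field width.
    return struct.pack(PACKET_FORMAT, target_id, len(cdb), 0, cdb, data_len)


if __name__ == "__main__":
    packet = build_passthrough_packet(3, build_inquiry_cdb(), 36)
    print(len(packet), packet.hex())
```

A real implementation would hand a structure like this, along with a data buffer, to the driver through its own ioctl request code.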

Since ioctls are not standardized across device drivers or RAID controllers, Smartmontools must provide custom device classes to support all of the different ways of passing commands through to the storage hardware.

Conclusion

We've taken a look at how Smartmontools operates and how it is able to query the different device types. We have also taken a deep dive into those device types and how they differ from one another. In addition, we explored the Linux storage subsystem to get an idea of how devices are exposed through the operating system. Finally, we took a look at how Smartmontools issues translation and passthrough commands to communicate with storage behind many different bridges and controllers.