ngflushd - A Smart Disk Spin-Down Daemon

NAME

SYNOPSIS

ngflushd [-d] [-jfeature] [-pfile] [-svalue] [-v] [-wlimit] [-zmask] [-xdisk]... [ [-llimit] [-mmax] [-ttime] [-idisk | -a]... ]...
ngflushd -c command
ngflushd -h

DESCRIPTION

This is a smart disk spin-down daemon that can handle IDE, SATA, USB and SCSI disks. All disk types other than IDE are controlled by the SCSI subsystem of the kernel. Automatic SCSI spin up (which you will need) has worked since at least kernel 2.6.15. For older 2.6 kernels you may need a patch; 2.4 kernels are not supported. ngflushd automatically disables SCSI spin down on older 2.6 kernels, but this can be overridden (using -z0).

The basic operating principle is that the kernel caches most data read from disk, and while a disk is spun down writes are cached too. As long as no non-cached data needs to be read and there is enough memory to cache writes, the disk does not need to spin.

Unfortunately there is a tradeoff concerning possible data loss. When write data is cached and the computer suffers a power loss the cached data would get lost. The kernel and journalling file systems usually try to write modified data back to disk early (after 5 .. 30 seconds) to reduce the risk of data loss.
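On 2.6 and later kernels the write-back window mentioned above is typically governed by two VM settings, dirty_writeback_centisecs and dirty_expire_centisecs. The following is a minimal sketch for inspecting them. The helper names are made up for this example, and whether ngflushd changes exactly these two files is an assumption, not something this page states.

```sh
# Convert centiseconds (the unit used by the kernel files below) to seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

# Print one write-back setting in seconds, or a note if the file is missing.
show_vm_setting() {
    file="/proc/sys/vm/$1"
    if [ -r "$file" ]; then
        echo "$1: $(centisecs_to_secs "$(cat "$file")") s"
    else
        echo "$1: not available on this system"
    fi
}

show_vm_setting dirty_writeback_centisecs   # how often the flusher wakes up
show_vm_setting dirty_expire_centisecs      # how old dirty data may get before write-back
```

Larger values keep dirty data in memory longer, which is the tradeoff between spin-down time and possible data loss described above.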

Even if you decided that spinning down disks is more important than the risk of data loss the paranoid attitude of the kernel and journalling file systems would normally hinder your disks from spinning down for reasonable periods of time. This is why the daemon has to use a couple of sophisticated algorithms to keep the disks spun down:

- The bdflush daemon is stopped as required: The bdflush daemons are a kernel mechanism to write cached but dirty disk blocks back to the disk. To understand this it is important to know that there is a big difference between the real data (e.g. the content of a file) and meta data (e.g. the file name, attributes or the modification date). ngflushd can instruct the kernel to stop flushing file data until it really becomes necessary (the latter in consequence makes the disk spin up). But bdflush usually has no control over the file system's meta data.
- Optional: VM gets dynamically reconfigured (lazy mode): Here VM stands for Virtual Memory. ngflushd can adjust the VM configuration to keep more disk data cached while some disks are spun down. When all disks are spinning (e.g. in a busy system) the VM settings are restored to default.
- Optional: Unmount autofs or subfs partitions: When partitions are not used the ultimate way to free memory (and to prevent corruption after power loss) is to unmount the partitions. This can be done before spin down for partitions that are mounted via the auto mounter or submount. When an attempt is made to access such a partition again the kernel would remount it, which in turn would make the disk spin up.
- Optional: Remount Ext4/Ext3/ReiserFS partitions with "commit=<xxx>": Journalling file systems do not handle meta data via bdflush. They usually have their own flush daemon so that they can be sure that the journals (and the meta data) are flushed early. This would typically make your disk spin up after 30s or less. A trick of ngflushd is to use the commit mount option to increase this time (by default to 1 hour!).
- Optional: Remount XFS partitions via /proc/sys/fs/xfs/...: This is a variant of remounting implemented by the journalling XFS that does not use the mount() system call but rather a file-based kernel interface.
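
For Ext3/Ext4, the effect of the commit remount described in the list above can be reproduced by hand with the standard mount command. The sketch below only prints the command instead of running it. The function name and the mount point /archive are placeholders, and the exact options ngflushd passes internally are an assumption.

```sh
# Print the remount command for a mount point and a commit time in minutes.
# The commit= mount option takes seconds, so the value is converted.
commit_remount_cmd() {
    mountpoint=$1
    minutes=$2
    echo "mount -o remount,commit=$(( minutes * 60 )) $mountpoint"
}

commit_remount_cmd /archive 60
```

Running the printed command as root applies the longer commit interval to that partition; remounting with the previous value restores the earlier behaviour.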

Good scenarios for ngflushd are workstations where the root partition is Ext3 or a server with a lot of archive disks that you want to spin down. Although it is possible to make a server's root disk spin down too (at night time for example) this requires a careful system configuration. See below.

RISKS AND BENEFITS

The risk of data loss due to power failure or software and hardware problems has been mentioned above. If you feel that this is not a big concern for you, then what are the benefits of using ngflushd?

- Less noise
- Reduced heat production
- Lower power consumption (5 to 8 watts saved per disk)

Another problem that should be mentioned is that a spun-down disk takes one to five seconds to spin up again. If the file system was automatically unmounted, remounting takes additional time (depending on the partition size, some file systems can be quite slow). During this period your computer might appear to hang.

Finally it should be stated that a typical desktop disk will survive only about 40,000 spin up cycles, so you should prevent disks from spinning down too often. The option -l48 could be used to limit the number of spin-downs per day to 48 (or any number you want; the default is 24).
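
To put the cycle budget quoted above into perspective, the following sketch computes how long it lasts if a disk reaches its -l limit every single day. This is a worst-case estimate using integer arithmetic; the function name is made up for the example.

```sh
# Days until the rated spin-up cycles are used up at a given daily limit.
spinup_budget_days() {
    cycles=$1      # rated spin-up cycles of the disk
    per_day=$2     # value given to -l
    echo $(( cycles / per_day ))
}

for limit in 24 48; do
    days=$(spinup_budget_days 40000 "$limit")
    echo "-l$limit: about $days days ($(( days / 365 )) full years)"
done
```

At 48 spin-downs per day the budget lasts 833 days, at 24 per day 1666 days, so doubling the limit halves the expected lifetime in this worst case.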

SOME EXAMPLES

Usually the default configuration (used at boot time) will be good enough:

   (1)   ngflushd -v -a

In example (1) the program will run as a daemon, will auto scan for disks and will only write important messages to syslog. Every four hours a short status report will be sent to syslog.

When you want to learn how it works try to start with something simple:

   (2)   ngflushd -vv -d -t1 -a

Example (2) runs in the foreground (on a console), makes verbose reports to stderr and uses a very short spin down interval of 60s. Other examples are:

   (3)   ngflushd -v -s193 -a
   (4)   ngflushd -v -s8 -a
   (5)   ngflushd -vv -a -x hda
   (6)   ngflushd -v -m90 -i hdc -i hdd
   (7)   ngflushd -v -z2 -a

Example (3) gives one detailed status report per day, (4) one short status report per hour. Number (5) is very verbose and excludes the system disk from spin down. (6) includes only hdc and hdd and sets the maximum commit time to 90 min. Finally (7) does not include auto scanned SCSI disks in spin down.

GENERAL OPTIONS

-c command

Send a command to the running daemon or check if the daemon is running. The commands that the -c option can send are:

check

If an ngflushd daemon is running, its PID is printed; otherwise the current instance exits with an error. In bash you could write:

   NG_PID=$(ngflushd -c check)
   # No PID printed: no daemon is running (inside a function use "return 1")
   [ -z "$NG_PID" ] && exit 1
   echo "The daemon has PID $NG_PID"

suspend

(Sent to the daemon via SIGUSR1) Restore VM to normal state, re-enable bdflush, but disable spin down. The daemon enters the PAUSED state, see below. Using this command together with a later resume is more efficient than stopping and restarting the daemon (statistics are retained and after resume disks may spin down earlier).

resume

(Sent to the daemon via SIGUSR2) Resume normal operation after a suspend. The command can also be used to check if the daemon is running (with the side effect that a PAUSED state would end). See the description of daemon states below.

logstat

(Sent to the daemon via SIGHUP) Write current status and statistics to syslog. The command also restarts the timer that controls periodic reporting.

status

(Sent to the daemon via SIGHUP) Write current status and statistics to stdout. Like logstat but nothing gets written to syslog and the reporting timer is not restarted.

terminate

(Sent to the daemon via SIGTERM) Terminate a running daemon. The daemon will re-enable the normal VM and bdflush behaviour but will not explicitly spin up disks. Please do not use SIGKILL to terminate the daemon, as this might leave bdflush and the VM in an unpleasant state.
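
The suspend and resume commands described above can be combined to keep the disks spinning for the duration of a job. The wrapper below is a sketch only: the function name and the NGFLUSHD variable are invented for this example, and it assumes that -c suspend exits non-zero when no daemon is running, which this page states only for check.

```sh
# Command used to talk to the daemon; override NGFLUSHD to try this without one.
NGFLUSHD=${NGFLUSHD:-ngflushd}

# Run a command while the daemon is PAUSED, then resume normal operation.
# The exit status of the wrapped command is passed on to the caller.
run_while_suspended() {
    "$NGFLUSHD" -c suspend || return 1
    "$@"
    status=$?
    "$NGFLUSHD" -c resume
    return $status
}

# Example, not executed here (the paths are placeholders):
#   run_while_suspended tar -cf /backup/home.tar /home
```

Because the daemon keeps its statistics while PAUSED, this is cheaper than stopping and restarting it around the job.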

-d

Debug; run in foreground and write messages to stderr instead of syslog. Trace messages (-vvv option) will only be generated in this mode. Tracing requires a debug build of the daemon.

-h

Print a short help summary and quit.

-p file

Use a specific PID file. Without this option the default is /var/run/ngflushd.pid. The PID file is used to identify the ngflushd daemon instance.

-s value

Enable or disable the writing of periodic status reports to syslog. Without this option the default is 32. A value of 0 can be used to disable periodic status reports.

When the lowest bit is set (i.e. for odd numbers) a more detailed report including the disks is created. Otherwise only a summary gets written.

The other bits (i.e. the result of an integer division by 2) times 15 are taken as the number of minutes to wait between status reports. Some examples:

value = 0: no periodic reporting
value = 8: a short report every hour
value = 9: a detailed report every hour
value = 32: a short report every 4 hours
value = 193: a detailed report once a day
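
The encoding rule can be checked with a few lines of shell arithmetic. This sketch follows the description above and reproduces the listed examples; the function name is made up, and the page does not say what an odd value below 2 (detailed flag with a zero interval) means, so that case is not treated specially.

```sh
# Decode a -s value: the lowest bit selects detailed reports,
# the remaining bits times 15 give the interval in minutes.
decode_status_value() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "no periodic reporting"
        return
    fi
    if [ $(( value % 2 )) -eq 1 ]; then kind=detailed; else kind=short; fi
    echo "$kind report every $(( value / 2 * 15 )) minutes"
}

for v in 0 8 9 32 193; do
    echo "value = $v: $(decode_status_value "$v")"
done
```

Going the other way, a detailed report every N minutes corresponds to the value (N / 15) * 2 + 1.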

-v

Be verbose; use twice to get more output. In a debug build of the program -vvv would enable trace messages. Trace messages would never be written to syslog.

-w limit

Sets a write rate limit in kByte/s that a disk must not exceed to still be considered a candidate for spin down. Usually writes would not prevent disk spin downs, but they can be taken as an indicator that the disk is not completely idle. The default setting is 20 (i.e. 20 kByte/s). Use 0 to disable this feature. The -v option can be helpful to find a better value for your system; check for messages like:

ngflushd: Above write limit (23 blocks): /dev/sda
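
To get a feeling for a suitable -w value, the write rate of a disk can be estimated independently from /proc/diskstats, where the tenth field holds the number of 512-byte sectors written. This is a rough sketch with invented function names; how ngflushd measures the rate internally, and the block unit in its log message, are not documented on this page.

```sh
# Sectors written so far for one device (field 10 of /proc/diskstats).
sectors_written() {
    awk -v dev="$1" '$3 == dev { print $10 }' /proc/diskstats
}

# Write rate in kByte/s between two sector counts taken $3 seconds apart.
# Two 512-byte sectors make one kByte.
write_rate_kb() {
    echo $(( ($2 - $1) / 2 / $3 ))
}

# Example, not executed here (assumes a disk named sda exists):
#   a=$(sectors_written sda); sleep 10; b=$(sectors_written sda)
#   write_rate_kb "$a" "$b" 10
```

A disk whose idle-time rate stays well below the result is a reasonable candidate for the default limit.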

DISK SELECTION OPTIONS

For ngflushd the order of some of the options is important. All arguments are options; non-option arguments must not be given.

There are two classes of disks: those declared via -x or -i and those that get automatically detected by scanning hardware descriptions (see DISK SPIN DOWN OPTIONS below). Disks that are excluded via -x will not be spun down. The options described here define shared properties for all disks.

-a

Enable spin down for auto detected disks (this also includes hot plugged drives). The option also saves the current values of -l, -m and -t to be shared by all auto detected disks. Usually -a is implied as the last daemon option, but this is not the case when -x or -i are used too.

-i disk

Include; spin down this disk (examples: -i /dev/hda -i sda). Symbolic link names can be used (and are relative to /dev not to current folder). Make sure that you write udev rules for generating specific device nodes when using a symbolic link name (like -x camera).

-j feature

(The default for feature is 7) Support for journalling file systems. The flags for the -j option can be or-ed and have the following functions:

feature = 1: remount journalling file systems; supports Ext4/Ext3/XFS and ReiserFS
feature = 2: tells the bdflush daemon to become lazy after spin-down
feature = 4: release subfs/autofs mounts instead of remounting them
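
Since the flags are or-ed, a value such as 5 combines features 1 and 4. The following sketch spells out which features a given value enables; the function name is invented and the descriptions are shortened from the list above.

```sh
# List the -j features enabled by a value (bits 1, 2 and 4).
describe_j() {
    value=$1
    [ $(( value & 1 )) -ne 0 ] && echo "remount journalling file systems"
    [ $(( value & 2 )) -ne 0 ] && echo "make bdflush lazy after spin-down"
    [ $(( value & 4 )) -ne 0 ] && echo "release subfs/autofs mounts"
    return 0
}

describe_j 5
```

The default of 7 enables all three features, and 0 prints nothing.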

-x disk

Exclude; this disk will not be spun down. See the description of -i. Please note that the use of -x or -i disables the automatic appending of -a to the list of options.

-z flags

(The default for flags is 0) The auto scan will ignore (exclude) certain types of disks. The flags for the -z option can be or-ed and have the following functions:

flags = 1: Ignore IDE disks
flags = 2: Ignore SCSI (and USB or SATA) disks

This can be used to base the exclusion of disks on type rather than having to specify disk names. Another use is to disable the spin down of SCSI drives for older kernels.

DISK SPIN DOWN OPTIONS

The following options can be used prior to -a or -i to override the default values for single disks or groups of disks. The effect of these options is cumulative; -a saves the current settings to be used for auto scanned drives (example: -m90 -i /dev/hda -m60 -a):

-l limit: (The default for limit is 20) Limits the spin downs per disk and per day to protect the disk. Desktop drives are designed for about 40,000 spin up cycles. Similar limits apply to Flash memory drives (like cards and USB sticks).
-m max: (The default for max is 60) Maximum commit-to-disk time in minutes. This value is used to compute the time argument for the commit option when remounting journalling file systems. A value of 0 disables the remounting.
-t time: (The default for time is 15) Timeout for spin down in minutes. Whenever an IO to the disk is detected the timeout value gets reset so that a disk cannot spin down before being idle for time minutes. A value of 0 disables the spin down for this disk or group of disks.

DAEMON STATES

It is important to know that disks that are excluded (via -x) and that are present in hardware are always counted as spinning. The same happens if a disk has previously caused an error in some operation (the disk is said to be ignored).

While running the daemon is always in one of the following states:

PAUSED: A suspend command was received, the VM and bdflush configurations were set to default and the daemon will not spin down disks. However, it continues to monitor disks and to collect statistics. When remounting of journalling file systems is enabled all remounts are reverted to a commit time of 30s.
SYSTEM: All disks are spinning and bdflush is in default mode (i.e. it is bdflush that controls the write back of dirty disk buffers). When remounting of journalling file systems is enabled all remounts are reverted to a commit time of 30s. In this state ngflushd is inactive and waits until it can spin down a disk and thereby enter the ACTIVE state.
ACTIVE: At least one disk has been spun down but some disks are still spinning. The VM and bdflush are configured to cache data instead of writing it. When remounting of journalling file systems is enabled all selected partitions are remounted with the configured commit time. The write back of dirty disk buffers is controlled by ngflushd. Every 5 seconds a sync() operation on all mounted non-read-only partitions is issued. The sync() also gets caught by the journalling file systems and makes them flush their meta data.
IDLE: When all disks are spun down there is no reason to sync() partitions any longer. In this state ngflushd mostly collects statistics and waits for a state change.
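
The four states can be summarised as a function of a paused flag and two disk counts. This is a sketch of the rules as described above, not the daemon's actual code. The function name is invented, and treating a system without any disks as SYSTEM is an assumption.

```sh
# Derive the daemon state.
# Arguments: paused (0 or 1), number of spinning disks, number of spun-down disks.
# Excluded (-x) and ignored disks count as spinning.
daemon_state() {
    paused=$1; spinning=$2; spun_down=$3
    if [ "$paused" -eq 1 ]; then
        echo PAUSED
    elif [ "$spun_down" -eq 0 ]; then
        echo SYSTEM
    elif [ "$spinning" -eq 0 ]; then
        echo IDLE
    else
        echo ACTIVE
    fi
}

daemon_state 0 3 0   # all three disks spinning
daemon_state 0 1 2   # one excluded disk plus two spun-down disks
daemon_state 0 0 3   # everything spun down
daemon_state 1 0 3   # suspend received
```

The second call shows a consequence of the counting rule: as long as one excluded disk is present the daemon stays in ACTIVE, never reaches IDLE and keeps issuing its periodic sync().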

HOW TO MAKE IT WORK

Some programs try to do synchronous disk IO to prevent data loss (by using the sync system call or by using a deprecated kernel feature for synchronous IO). If you can, please reconfigure them (syslog is a good example of this). A few programs simply behave badly (like kalarmd, samba's nmbd or squid). Experiment from run-level 1 or so (with almost no programs or daemons running) and start ngflushd using -t1 for an extremely short spin down time. Then start the test candidates one after another to see which one makes your disk spin up too often. Consider sending bug reports. Programs that spin up your disk too often are also likely to kill media like Flash cards or USB sticks.

Another kind of problem is caused by disk monitoring software. smartmon for example is aware of spun down disks but if you cause the tool to launch a short disk test once per hour it will still spin up the disk too often. Consider running disk tests only once per day and starting them from a cron job.

Example: Samba's nmbd periodically writes /var/lib/samba/wins.dat: This is not very harmful and a good candidate for echoing a value of 1 into /proc/sys/vm/block_dump. The kernel will then send IO trace messages to syslog. Make sure that your system is otherwise idle before doing this. The trace can be disabled again by writing 0.
Example: The Squid HTTP proxy does periodic synchronous reads: This is another example of synchronous IO (like O_DIRECT), but here it is misused and would spin up a disk. The IO goes to a database-like file which cannot easily be "linked" to a RAM disk. A solution for a server that has long up-times is to put squid's database completely on a RAM disk. This is no problem for Linux, the RAM is backed up by the swap file (if you configured it correctly!). But there are two drawbacks: (1) you have to patch /etc/init.d/squid to create a squid directory on the RAM disk and (2) on reboots the cache is lost.
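
When hunting for such programs with the block_dump trace mentioned above, a per-process summary is easier to read than the raw log. The sketch below assumes kernel messages of the form "name(pid): WRITE block N on device"; the function name and the sample lines are made up for illustration, and "dirtied inode" messages are ignored.

```sh
# Count READ/WRITE trace lines per process and device, busiest first.
summarize_block_dump() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if (($(i+1) == "READ" || $(i+1) == "WRITE") && $i ~ /\([0-9]+\):$/) {
                    proc = $i
                    sub(/\([0-9]+\):$/, "", proc)   # strip "(pid):"
                    count[proc " " $(i+1) " " $NF]++
                }
            }
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Demonstration with made-up trace lines; on a real system feed it the kernel log,
# for example: dmesg | summarize_block_dump
printf '%s\n' \
    'kjournald(1045): WRITE block 1234 on sda1' \
    'kjournald(1045): WRITE block 1240 on sda1' \
    'nmbd(2210): WRITE block 99 on sda1' | summarize_block_dump
```

The processes at the top of the list are the first candidates for reconfiguration or for moving their files to a RAM disk.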

BUGS AND LIMITS

Murphy's Law: You will often notice that ngflushd spins down a disk just before your fingers hit the keyboard to launch a program that in turn spins the disk up again.
Do not expect too much: ngflushd works best for a document server with a couple of archive disks that are infrequently used. It is also fine for a workstation with a single disk. It is not recommended for notebooks.
Incompatible hardware: Some hardware (like external notebook disks) does its own power management. In the best case such hardware ignores ngflushd's activities. Consider using the -x option to exclude such devices.
Broken software: Some completely broken or ill-designed programs may try to poll devices. In some early versions the Linux Hardware Abstraction Layer (HAL) polled all disks every few seconds to see if it should display some "new device found" dialog on the desktop. Other candidates are squid and nmbd, see above.

FILES

/etc/init.d/ngflushd: This init script can be used to launch ngflushd at boot time. On a Debian system use update-rc.d to configure this.
/etc/default/ngflushd: This is used by the Debian init script to configure whether a daemon should be started at boot time and, if so, which parameters it receives.
/var/run/ngflushd.pid /var/run/ngflushd.requ /var/run/ngflushd.done: These files are used for communicating between multiple ngflushd instances (the -c option does this).

COPYRIGHT

This software is published under a BSD style license and has been written for educational purposes only, no warranties! Try it at your own risk.