Weird ZFS misbehaviors

(spoiler: on one of my machines, I run ZFS on top of a LUKS-encrypted device. I know this is sub-optimal, but for the time being it’s working okay for me).

I try to host as many of my core services as is reasonable (without it becoming a second part-time job).

Part of this means that I also take care of backup, replication and (hopefully as far in the future as possible) recovery.

Today I was notified about one of my machines being rebooted. That’s fine, it happens from time to time.

What doesn’t usually happen from time to time is LUKS bringing the machine to its knees when opening an encrypted device. That was weird.

After some testing and probing, it seemed that opening the LUKS device would also trigger an import of the zpool sitting on top of it, and that import would hit system resources really hard. Memory (a mere 3GB) would run out, and so would swap (8GB on SSD, because I prefer the system slowing down over processes crashing). 11GB of memory and swap combined is apparently not enough to import this 90+ GB dataset, and cryptsetup would crash. Uh, weird.

If you are in a similar situation, here is how I (apparently) fixed it:

First, we have to decouple LUKS opening the device from ZFS importing the zpool.

In order to do so, I found out that the zfs kernel module has an option called zfs_autoimport_disable which, when set to 1, disables auto-importing zpools upon kernel module insertion (duh!). Also, to (try to) prevent very high memory usage, I set zfs_arc_max to 2GB (the value has to be given in bytes: 2 × 1024 × 1024 × 1024 = 2147483648).

All this can then be put into an appropriate modprobe configuration file (I chose /etc/modprobe.d/zfs.conf):

options zfs zfs_arc_max=2147483648 zfs_autoimport_disable=1
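In case you want a different ARC cap, here is a minimal sketch of how the byte value can be computed. The helper name gib_to_bytes is made up for this example, and it assumes "2GB" means 2 GiB (which is what the value above corresponds to):

```shell
#!/bin/sh
# Convert a size in GiB to bytes, the unit zfs_arc_max expects.
# gib_to_bytes is a hypothetical helper, not part of any ZFS tooling.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the modprobe line for a 2 GiB ARC cap with auto-import disabled.
echo "options zfs zfs_arc_max=$(gib_to_bytes 2) zfs_autoimport_disable=1"
```

The printed line can be pasted into the modprobe configuration file as-is.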

(Brief note: this machine’s processor is old enough not to have AES implemented in hardware.)

So at this point I was able to luksOpen the device without ZFS importing the zpool.
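If you want to double check that the module actually picked up the options, the loaded zfs module exposes its parameters under /sys/module/zfs/parameters on Linux. A small sketch follows; the function names and the ZFS_PARAM_DIR override are my own additions for illustration, and the expected values are the ones from the configuration above:

```shell
#!/bin/sh
# Directory where the loaded zfs module exposes its parameters on Linux.
# ZFS_PARAM_DIR is a hypothetical override, only there so the helpers
# can be tried on a machine without ZFS loaded.
ZFS_PARAM_DIR="${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}"

# Print the current value of one module parameter.
zfs_param() {
    cat "$ZFS_PARAM_DIR/$1"
}

# Succeed only if both settings from zfs.conf are in effect.
zfs_conf_applied() {
    [ "$(zfs_param zfs_autoimport_disable)" = "1" ] &&
    [ "$(zfs_param zfs_arc_max)" = "2147483648" ]
}
```

Running "zfs_conf_applied && echo ok" after the module is loaded should print ok only when both values match. Keep in mind that modprobe options are read when the module is loaded, so a reboot or a module reload is needed after editing the file.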

I was then able to import the zpool by hand the usual way (zpool import -a).

I then decided to run a zpool scrub against the zpool. The zpool was fine, so I am still not really sure what happened.
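For reference, the whole manual sequence (open, import, scrub) looks roughly like the sketch below. The device path, mapper name and pool name are placeholders rather than the ones from my machine, and the run/DRY_RUN wrapper is just a convenience I added so the steps can be rehearsed without touching anything:

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (for a rehearsal).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: LUKS block device, $2: mapper name, $3: pool name (all placeholders).
open_import_scrub() {
    run cryptsetup luksOpen "$1" "$2" || return 1   # unlock only, no auto-import
    run zpool import -a || return 1                 # import by hand
    run zpool scrub "$3" || return 1                # start a scrub of the pool
    run zpool status "$3"                           # show scrub progress and health
}
```

A rehearsal would be "DRY_RUN=1 open_import_scrub /dev/sdX1 cryptpool tank", which only prints the four commands. Without DRY_RUN it needs root and will prompt for the LUKS passphrase.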

Either way, that’s it.
