Stuck btrfs-cleaner on 4.7 and 4.6

Message ID npds64$nu3$1@blaine.gmane.org (mailing list archive)
State New, archived

Commit Message

Jean-Denis Girard Aug. 22, 2016, 3:39 a.m. UTC
Hi list,

After upgrading my Fedora 23 system from 4.4.12 to 4.7.2, I'm seeing one
btrfs-cleaner process stuck at 100% CPU. The problem disappears when
going back to a 4.4 kernel (4.4.17), but is also present with the Fedora
kernel 4.6.6-200.fc23.

4.4.12 and 4.4.17 are built from source, with 2 patches (see attached).
4.7.2 is built from source without any patch.

The main Btrfs filesystem is RAID1 on 2 disks behind bcache, with 13
sub-volumes and fewer than 300 snapshots (more details below). There are
2 other Btrfs filesystems, used for backup and therefore not mounted when
the problem appears.

btrfs-cleaner jumps to 100% CPU after about 15 min of uptime. I let it
run for about 18 hours and it stayed at 100%. Unmounting all the
sub-volumes clears the problem. There are no errors in the logs, all the
sub-volumes mount OK, and the system remains usable. I also ran a scrub
and a balance, both of which finished without errors.
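
A rough way to put a number on this kind of symptom is to read the CPU
time counters of the kernel thread from /proc. The sketch below is only
an illustration, not how the figures above were obtained; the function
names cpu_ticks and watch_thread are made up, and it assumes a Linux
/proc and that pgrep is available.

```shell
#!/bin/sh
# Total CPU ticks (user + system) used so far by a PID, taken from
# fields 14 and 15 of /proc/<pid>/stat. Assumes the process name
# contains no spaces, which holds for "btrfs-cleaner".
cpu_ticks() {
    awk '{ print $14 + $15 }' "/proc/$1/stat"
}

# For each thread named exactly $1, print how many ticks it used
# during an interval of $2 seconds (default 10).
watch_thread() {
    name=$1
    interval=${2:-10}
    for pid in $(pgrep -x "$name"); do
        before=$(cpu_ticks "$pid")
        sleep "$interval"
        after=$(cpu_ticks "$pid")
        echo "$name pid=$pid ticks_in_${interval}s=$((after - before))"
    done
}
```

Calling `watch_thread btrfs-cleaner 10` prints one line per matching
thread. The tick rate is reported by `getconf CLK_TCK` (commonly 100), so
a thread pinned at 100% of one CPU would show roughly that many ticks
per second of interval.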

I'm back on 4.4.17 now, but what can I do to debug this problem?


[jdg@tiare ~]$ sudo btrfs fi sh
Label: none  uuid: c5b8386b-b81d-4473-9340-7b8a74fc3a3c
        Total devices 2 FS bytes used 1.04TiB
        devid    1 size 1.82TiB used 1.08TiB path /dev/bcache0
        devid    2 size 1.82TiB used 1.08TiB path /dev/bcache1

Label: none  uuid: e86cf0f5-ae16-408c-a4f8-19727aa2a3d4
        Total devices 1 FS bytes used 191.20GiB
        devid    1 size 279.46GiB used 240.06GiB path /dev/sdd

Label: none  uuid: d0d09c79-42d7-4958-bccb-480eb27aec38
        Total devices 1 FS bytes used 611.38GiB
        devid    1 size 931.51GiB used 620.07GiB path /dev/sde

[jdg@tiare ~]$ sudo btrfs fi usage /home/jdg/
Overall:
    Device size:                   3.64TiB
    Device allocated:              2.16TiB
    Device unallocated:            1.48TiB
    Device missing:                  0.00B
    Used:                          2.08TiB
    Free (estimated):            798.35GiB      (min: 798.35GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:1.08TiB, Used:1.04TiB
   /dev/bcache0    1.08TiB
   /dev/bcache1    1.08TiB

Metadata,RAID1: Size:4.00GiB, Used:2.74GiB
   /dev/bcache0    4.00GiB
   /dev/bcache1    4.00GiB

System,RAID1: Size:32.00MiB, Used:256.00KiB
   /dev/bcache0   32.00MiB
   /dev/bcache1   32.00MiB

Unallocated:
   /dev/bcache0  757.99GiB
   /dev/bcache1  757.99GiB

[jdg@tiare ~]$ mount -t btrfs
/dev/bcache0 on /var/lib/pgsql type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1131,subvol=/pgsql)
/dev/bcache0 on /home/SysNux type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1062,subvol=/SysNux)
/dev/bcache0 on /home/Vidéos type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1281,subvol=/Vidéos)
/dev/bcache0 on /var/lib/libvirt/images type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1136,subvol=/images-vm)
/dev/bcache0 on /mnt/snapshots type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1292,subvol=/Snapshots)
/dev/bcache0 on /home/Photos type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=676,subvol=/Photos)
/dev/bcache0 on /home/vaiana type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1076,subvol=/vaiana)
/dev/bcache0 on /home/Films type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=258,subvol=/Films)
/dev/bcache0 on /home/Partage type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1059,subvol=/Partage)
/dev/bcache0 on /home/jdg type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1073,subvol=/jdg)
/dev/bcache0 on /home/michael type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1075,subvol=/michael)
/dev/bcache0 on /home/cathy type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=1074,subvol=/cathy)
/dev/bcache0 on /home/Musique type btrfs
(rw,noatime,nodiratime,seclabel,compress=zlib,ssd,space_cache,autodefrag,skip_balance,subvolid=961,subvol=/Musique)



Thanks,

Comments

Jean-Denis Girard Aug. 23, 2016, 5:17 p.m. UTC | #1
On 21/08/2016 at 17:39, Jean-Denis Girard wrote:
> Hi list,
> 
> After upgrading my Fedora 23 system from 4.4.12 to 4.7.2, I'm seeing one
> btrfs-cleaner process stuck at 100% CPU. The problem disappears when
> going back to a 4.4 kernel (4.4.17), but is also present with the Fedora
> kernel 4.6.6-200.fc23.

Just for the archives: I found that the problem was related to
fragmentation. Mounting without autodefrag on 4.7.2 kept btrfs-cleaner
from getting stuck at 100% CPU. I then manually defragmented all
Btrfs volumes and could remount with autodefrag. Everything is OK
after ~12 hours of uptime.
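
For anyone hitting the same thing, the sequence above could look roughly
like the sketch below. It is an approximation, not the exact commands
used: doing the switch with a live remount (noautodefrag / autodefrag)
rather than by editing fstab is an assumption, and the helper names run
and defrag_workaround are made up. By default it only prints the
commands; set DRY_RUN=0 and run as root to actually execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (default): print the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: mount point of an affected Btrfs sub-volume.
defrag_workaround() {
    mnt=$1
    # 1. Switch autodefrag off on the mounted filesystem.
    run mount -o remount,noautodefrag "$mnt"
    # 2. Defragment the files below the mount point, recursively.
    run btrfs filesystem defragment -r "$mnt"
    # 3. Switch autodefrag back on.
    run mount -o remount,autodefrag "$mnt"
}
```

For example, `defrag_workaround /home/jdg` prints the three commands for
that mount point; the defragment step would be repeated for each mounted
sub-volume. Be aware that defragmenting files shared with snapshots can
unshare their extents and so increase space usage.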


Thanks,

Patch

diff -Naur linux-4.4.6.ORIG/fs/btrfs/ctree.c linux-4.4.6/fs/btrfs/ctree.c
--- linux-4.4.6.ORIG/fs/btrfs/ctree.c	2016-01-10 13:01:32.000000000 -1000
+++ linux-4.4.6/fs/btrfs/ctree.c	2016-03-30 06:19:16.397973820 -1000
@@ -20,6 +20,7 @@ 
 #include <linux/slab.h>
 #include <linux/rbtree.h>
 #include "ctree.h"
+#include <linux/vmalloc.h>
 #include "disk-io.h"
 #include "transaction.h"
 #include "print-tree.h"
@@ -5362,10 +5363,13 @@ 
 		goto out;
 	}
 
-	tmp_buf = kmalloc(left_root->nodesize, GFP_NOFS);
+	tmp_buf = kmalloc(left_root->nodesize, GFP_KERNEL | __GFP_NOWARN);
 	if (!tmp_buf) {
-		ret = -ENOMEM;
-		goto out;
+		tmp_buf = vmalloc(left_root->nodesize);
+		if (!tmp_buf) {
+			ret = -ENOMEM;
+			goto out;
+		}
 	}
 
 	left_path->search_commit_root = 1;
@@ -5566,7 +5570,7 @@ 
 out:
 	btrfs_free_path(left_path);
 	btrfs_free_path(right_path);
-	kfree(tmp_buf);
+	kvfree(tmp_buf);
 	return ret;
 }