diff mbox

Notify caching_thread()s to give up on extent_commit_sem when needed.

Message ID CAOcd+r3zDtf3uNs5YOk=9LhGannN2WK-CpF0PQMdrnu58phBiw@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Lyakas Aug. 29, 2013, 10:31 a.m. UTC
caching_thread()s do all their work under read access to extent_commit_sem.
They give up on this read access only when need_resched() tells them, or
when they exit. As a result, somebody that wants a WRITE access to this sem,
might wait for a long time. Especially this is problematic in
cache_block_group(),
which can be called on critical paths like find_free_extent() and in commit
path via commit_cowonly_roots().

This patch is an RFC, that attempts to fix this problem, by notifying the
caching threads to give up on extent_commit_sem.

On a system with a lot of metadata (~20Gb total metadata, ~10Gb extent tree),
with increased number of caching_threads, commits were very slow,
stuck in commit_cowonly_roots, due to this issue.
With this patch, commits no longer get stuck in commit_cowonly_roots.

This patch is not indented to be applied, just a request to comment on whether
you agree this problem happens, and whether the fix goes in the right direction.

Signed-off-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
---
 fs/btrfs/ctree.h       |    7 +++++++
 fs/btrfs/disk-io.c     |    1 +
 fs/btrfs/extent-tree.c |    9 +++++----
 fs/btrfs/transaction.c |    2 +-
 4 files changed, 14 insertions(+), 5 deletions(-)

Comments

Josef Bacik Aug. 29, 2013, 2:38 p.m. UTC | #1
On Thu, Aug 29, 2013 at 01:31:05PM +0300, Alex Lyakas wrote:
> caching_thread()s do all their work under read access to extent_commit_sem.
> They give up on this read access only when need_resched() tells them, or
> when they exit. As a result, somebody that wants a WRITE access to this sem,
> might wait for a long time. Especially this is problematic in
> cache_block_group(),
> which can be called on critical paths like find_free_extent() and in commit
> path via commit_cowonly_roots().
> 
> This patch is an RFC, that attempts to fix this problem, by notifying the
> caching threads to give up on extent_commit_sem.
> 
> On a system with a lot of metadata (~20Gb total metadata, ~10Gb extent tree),
> with increased number of caching_threads, commits were very slow,
> stuck in commit_cowonly_roots, due to this issue.
> With this patch, commits no longer get stuck in commit_cowonly_roots.
> 

But what kind of effect do you see on overall performance/runtime?  Honestly I'd
expect we'd spend more of our time waiting for the caching kthread to fill in
free space so we can make allocations than waiting on this lock contention.  I'd
like to see real numbers here to see what kind of effect this patch has on your
workload.  (I don't doubt it makes a difference, I'm just curious to see how big
of a difference it makes.)

> This patch is not indented to be applied, just a request to comment on whether
> you agree this problem happens, and whether the fix goes in the right direction.
> 

So I think we should do 2 things here

1) Make a spin_lock for the caching ctl list.  This is independant of the
purpose of the extent_commit_sem, so we should lock it independantly.

2) Your idea for triggering the caching kthreads to stop what they are doing is
good, but it seems like a waste of effort when we could easily check the
semaphore to see if anybody is waiting on this lock.  So I'm going to rig up a
function in the rwsem library to do this for us, and that way we can do
something like

if (need_resched() || rwsem_is_contended(extent_commit_sem)) {
	drop and resched();
}

and that way we only have to add yet another spin lock to the fs_info for the
caching ctl list and we can avoid this issue.  How does that sound to you?
Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Lyakas Aug. 29, 2013, 7:09 p.m. UTC | #2
Hi Josef,

On Thu, Aug 29, 2013 at 5:38 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> On Thu, Aug 29, 2013 at 01:31:05PM +0300, Alex Lyakas wrote:
>> caching_thread()s do all their work under read access to extent_commit_sem.
>> They give up on this read access only when need_resched() tells them, or
>> when they exit. As a result, somebody that wants a WRITE access to this sem,
>> might wait for a long time. Especially this is problematic in
>> cache_block_group(),
>> which can be called on critical paths like find_free_extent() and in commit
>> path via commit_cowonly_roots().
>>
>> This patch is an RFC, that attempts to fix this problem, by notifying the
>> caching threads to give up on extent_commit_sem.
>>
>> On a system with a lot of metadata (~20Gb total metadata, ~10Gb extent tree),
>> with increased number of caching_threads, commits were very slow,
>> stuck in commit_cowonly_roots, due to this issue.
>> With this patch, commits no longer get stuck in commit_cowonly_roots.
>>
>
> But what kind of effect do you see on overall performance/runtime?  Honestly I'd
> expect we'd spend more of our time waiting for the caching kthread to fill in
> free space so we can make allocations than waiting on this lock contention.  I'd
> like to see real numbers here to see what kind of effect this patch has on your
> workload.  (I don't doubt it makes a difference, I'm just curious to see how big
> of a difference it makes.)

Primarily for me it affects the commit thread right after mounting,
when it spends time in the "critical part" of the commit, in which
trans_no_join is set, i.e., it is not possible to start a new
transaction. So all the new writers that want a transaction are
delayed at this point.

Here are some numbers (and some more logs are in the attached file).

Filesystem has a good amount of metadata (btrfs-progs modified
slightly to print exact byte values):
root@dc:/home/zadara# btrfs fi df /btrfs/pool-00000002/
Data: total=846116945920(788.01GB), used=842106667008(784.27GB)
System: total=4194304(4.00MB), used=94208(92.00KB)
Metadata: total=31146901504(29.01GB), used=25248698368(23.51GB)

original code, 2 caching workers, try 1
Aug 29 13:41:22 dc kernel: [28381.203745] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6627] COMMIT
extwr:0 wr:1
Aug 29 13:41:25 dc kernel: [28384.624838] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6627] COMMIT took
3421 ms committers=1 open=0ms blocked=3188ms
Aug 29 13:41:25 dc kernel: [28384.624846] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6627] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 13:41:25 dc kernel: [28384.624850] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6627] wc:0 wpc:0
wew:0 fps:0
Aug 29 13:41:25 dc kernel: [28384.624854] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6627] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 13:41:25 dc kernel: [28384.624858] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6627] cfr:0
ccr:2088 pec:1099
Aug 29 13:41:25 dc kernel: [28384.624862] [17617][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6627] wrw:230 wrs:1

I have a breakdown of commit times here, to identify bottlenecks of
the commit. Times are in ms.
Names of phases are:

roo - btrfs_run_ordered_operations
rdr1 - btrfs_run_delayed_refs (call 1)
cbg - btrfs_create_pending_block_groups
rdr2 - btrfs_run_delayed_refs (call 2)
wc - wait_for_commit (if was needed)
wpc - wair for previous commit (if was needed)
wew - wait for "external writers to detach"
fps - flush_all_pending_stuffs
ww - wait for all the other writers to detach
cs - create_pending_snapshots
rdi - btrfs_run_delayed_items
rdr3 - btrfs_run_delayed_refs (call 3)
cfr - commit_fs_roots
ccr - commit_cowonly_roots
pec - btrfs_prepare_extent_commit
wrw - btrfs_write_and_wait_transaction
wrs - write_ctree_super

Two lines marked as "-" are the "critical part" of the commit.


original code, 2 caching workers, try 2
Aug 29 13:43:30 dc kernel: [28508.683625] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6630] COMMIT
extwr:0 wr:1
Aug 29 13:43:31 dc kernel: [28510.569269] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6630] COMMIT took
1885 ms committers=1 open=0ms blocked=1550ms
Aug 29 13:43:31 dc kernel: [28510.569276] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6630] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 13:43:31 dc kernel: [28510.569281] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6630] wc:0 wpc:0
wew:0 fps:0
Aug 29 13:43:31 dc kernel: [28510.569285] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6630] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 13:43:31 dc kernel: [28510.569288] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6630] cfr:0
ccr:1550 pec:0
Aug 29 13:43:31 dc kernel: [28510.569292] [22490][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6630] wrw:333 wrs:1

So you see that 1-2 secs are spent in "commit cowonly roots". Now the
patched code, and here, I admit, difference is not so dramatic:

patched code, 2 caching workers, try 1
Aug 29 14:08:19 dc kernel: [29997.819307] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6642] COMMIT
extwr:0 wr:1
Aug 29 14:08:20 dc kernel: [29998.800342] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6642] COMMIT took
981 ms committers=1 open=0ms blocked=881ms
Aug 29 14:08:20 dc kernel: [29998.800350] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6642] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 14:08:20 dc kernel: [29998.800354] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6642] wc:0 wpc:0
wew:0 fps:0
Aug 29 14:08:20 dc kernel: [29998.800358] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6642] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 14:08:20 dc kernel: [29998.800362] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6642] cfr:0 ccr:880
pec:1
Aug 29 14:08:20 dc kernel: [29998.800365] [24783][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6642] wrw:98 wrs:1

patched code, 2 caching workers, try 2
Aug 29 14:09:18 dc kernel: [30057.375432] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6645] COMMIT
extwr:0 wr:1
Aug 29 14:09:19 dc kernel: [30058.079811] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6645] COMMIT took
704 ms committers=1 open=0ms blocked=643ms
Aug 29 14:09:19 dc kernel: [30058.079820] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6645] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 14:09:19 dc kernel: [30058.079824] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6645] wc:0 wpc:0
wew:0 fps:0
Aug 29 14:09:19 dc kernel: [30058.079828] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6645] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 14:09:19 dc kernel: [30058.079832] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6645] cfr:0 ccr:642
pec:1
Aug 29 14:09:19 dc kernel: [30058.079836] [24781][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6645] wrw:59 wrs:0

but still there is some improvement of commit time.

Now I changed the number of caching workers to 32, to improve the time
to load the metadata, otherwise it takes a lot of time for the FS to
become responsive. Also I modified the code to start caching workers
like this:
btrfs_init_workers(&fs_info->caching_workers, "cache",
               32, NULL/*async_helper*/);
/* use low thresh to quickly spawn needed new threads */
fs_info->caching_workers.idle_thresh = 2;

As I explained in my other email named "btrfs:async-thread:
atomic_start_pending=1 is set, but it's too late", even with two
caching threads, only one does all the job. So I don't pass the async
helper (don't know if that's a correct thing to do).

So:

original code, 32 caching workers, try 1
Aug 29 13:53:56 dc kernel: [29135.456301] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6636] COMMIT
extwr:0 wr:1
Aug 29 13:54:56 dc kernel: [29195.561173] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6636] COMMIT took
60104 ms committers=1 open=0ms blocked=60049ms
Aug 29 13:54:56 dc kernel: [29195.561187] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6636] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 13:54:56 dc kernel: [29195.561201] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6636] wc:0 wpc:0
wew:0 fps:0
Aug 29 13:54:56 dc kernel: [29195.561216] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6636] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 13:54:56 dc kernel: [29195.561220] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6636] cfr:0
ccr:59163 pec:885
Aug 29 13:54:56 dc kernel: [29195.561224] [539][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6636] wrw:54 wrs:1

60 seconds to commit, out of which 59 are in commit_cowonly_roots.

original code, 32 caching workers, try 2
Aug 29 13:56:54 dc kernel: [29312.747760] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6639] COMMIT
extwr:0 wr:1
Aug 29 13:58:15 dc kernel: [29394.289640] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6639] COMMIT took
81541 ms committers=1 open=0ms blocked=81396ms
Aug 29 13:58:15 dc kernel: [29394.289649] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6639] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 13:58:15 dc kernel: [29394.289688] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6639] wc:0 wpc:0
wew:0 fps:0
Aug 29 13:58:15 dc kernel: [29394.289694] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6639] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 13:58:15 dc kernel: [29394.289700] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6639] cfr:0
ccr:80309 pec:1086
Aug 29 13:58:15 dc kernel: [29394.289705] [6121][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6639] wrw:140 wrs:5

81 seconds to commit, out of which 80 seconds in commt_cowonly_roots!

Now the patched code with 32 threads:

patched code, 32 caching workers - try 1
Aug 29 14:12:29 dc kernel: [30248.074275] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6648] COMMIT
extwr:0 wr:1
Aug 29 14:12:31 dc kernel: [30249.974844] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6648] COMMIT took
1900 ms committers=1 open=0ms blocked=1725ms
Aug 29 14:12:31 dc kernel: [30249.974851] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6648] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 14:12:31 dc kernel: [30249.974855] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6648] wc:0 wpc:0
wew:0 fps:0
Aug 29 14:12:31 dc kernel: [30249.974859] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6648] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 14:12:31 dc kernel: [30249.974863] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6648] cfr:0
ccr:1720 pec:5
Aug 29 14:12:31 dc kernel: [30249.974867] [1696][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6648] wrw:169 wrs:4

patched code, 32 caching workers, try 2
Aug 29 14:13:35 dc kernel: [30314.378026] [1698][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6651] COMMIT took
1117 ms committers=1 open=0ms blocked=999ms
Aug 29 14:13:35 dc kernel: [30314.378033] [1698][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6651] roo:0 rdr1:0
cbg:0 rdr2:0
Aug 29 14:13:35 dc kernel: [30314.378041] [1698][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6651] wc:0 wpc:0
wew:0 fps:0
Aug 29 14:13:35 dc kernel: [30314.378132] [1698][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6651] ww:0 cs:0
rdi:0 rdr3:0
Aug 29 14:13:35 dc kernel: [30314.378136] [1698][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6651] cfr:0 ccr:994
pec:4
Aug 29 14:13:35 dc kernel: [30314.378140] [1698][tx]btrfs
[ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6651] wrw:115 wrs:2

much better.
(And I promise these prints are real).
In the attached log, there are some prints that let you see how much
time it takes to queue a caching request with the old-vs-new code.

>
>> This patch is not indented to be applied, just a request to comment on whether
>> you agree this problem happens, and whether the fix goes in the right direction.
>>
>
> So I think we should do 2 things here
>
> 1) Make a spin_lock for the caching ctl list.  This is independant of the
> purpose of the extent_commit_sem, so we should lock it independantly.
Yes, I also thought that the caching ctl list should be protected by
some other quicker lock,
but in the code it is always accessed under this rwsem.

>
> 2) Your idea for triggering the caching kthreads to stop what they are doing is
> good, but it seems like a waste of effort when we could easily check the
> semaphore to see if anybody is waiting on this lock.  So I'm going to rig up a
> function in the rwsem library to do this for us, and that way we can do
> something like
>
> if (need_resched() || rwsem_is_contended(extent_commit_sem)) {
>         drop and resched();
> }
>
Perfect! Will it be accepted to mainline just like that?:) Cool!

> and that way we only have to add yet another spin lock to the fs_info for the
> caching ctl list and we can avoid this issue.  How does that sound to you?
Awesome!

Thanks,
Alex.


> Thanks,
>
> Josef
Filesystem has a good amount of metadata (btrfs-progs modified slightly to print exact byte values):
root@dc:/home/zadara# btrfs fi df /btrfs/pool-00000002/
Data: total=846116945920(788.01GB), used=842106667008(784.27GB)
System: total=4194304(4.00MB), used=94208(92.00KB)
Metadata: total=31146901504(29.01GB), used=25248698368(23.51GB)

root@dc:/home/zadara# btrfs su list /btrfs/pool-00000002/ | wc -l
595

original code, 2 caching workers, try 1
Aug 29 13:41:19 dc kernel: [28378.306952] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6626 /dev/dm-119
Aug 29 13:41:19 dc kernel: [28378.312494] btrfs: setting nodatasum
Aug 29 13:41:19 dc kernel: [28378.312498] btrfs: setting nodatacow, compression disabled
Aug 29 13:41:19 dc kernel: [28378.312501] btrfs: disabling tree log
Aug 29 13:41:19 dc kernel: [28378.312505] btrfs: disabling disk space caching
Aug 29 13:41:19 dc kernel: [28378.312508] btrfs: force clearing of disk cache
Aug 29 13:41:22 dc kernel: [28381.203745] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6627] COMMIT extwr:0 wr:1
Aug 29 13:41:22 dc kernel: [28381.203772] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 13:41:22 dc kernel: [28381.203781] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203785] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203790] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203795] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203800] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203804] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203808] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203812] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203816] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203821] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203826] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203831] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203837] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203842] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203846] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203851] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203855] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203859] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203863] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203867] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203872] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203877] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203881] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203886] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203918] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203924] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203928] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203932] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203936] [17617][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 13:41:22 dc kernel: [28381.203965] [20494][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 13:41:22 dc kernel: [28381.250769] [20494][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=46ms ret=0 [FINISHED]
Aug 29 13:41:22 dc kernel: [28381.250784] [20494][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 13:41:25 dc kernel: [28384.624838] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6627] COMMIT took 3421 ms committers=1 open=0ms blocked=3188ms
Aug 29 13:41:25 dc kernel: [28384.624846] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6627] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 13:41:25 dc kernel: [28384.624850] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6627] wc:0 wpc:0 wew:0 fps:0
Aug 29 13:41:25 dc kernel: [28384.624854] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6627] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 13:41:25 dc kernel: [28384.624858] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6627] cfr:0 ccr:2088 pec:1099
Aug 29 13:41:25 dc kernel: [28384.624862] [17617][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6627] wrw:230 wrs:1
...

original code, 2 caching workers, try 2
Aug 29 13:43:27 dc kernel: [28505.698851] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6629 /dev/dm-119
Aug 29 13:43:27 dc kernel: [28505.701433] btrfs: setting nodatasum
Aug 29 13:43:27 dc kernel: [28505.701437] btrfs: setting nodatacow, compression disabled
Aug 29 13:43:27 dc kernel: [28505.701440] btrfs: disabling tree log
Aug 29 13:43:27 dc kernel: [28505.701444] btrfs: disabling disk space caching
Aug 29 13:43:27 dc kernel: [28505.701448] btrfs: force clearing of disk cache
Aug 29 13:43:30 dc kernel: [28508.683625] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6630] COMMIT extwr:0 wr:1
Aug 29 13:43:30 dc kernel: [28508.683649] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 13:43:30 dc kernel: [28508.683679] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683684] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683689] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683693] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683697] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683702] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683706] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683710] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683714] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683719] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683724] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 13:43:30 dc kernel: [28508.683730] [25384][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 13:43:30 dc kernel: [28508.724775] [25384][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=41ms ret=0 [FINISHED]
Aug 29 13:43:30 dc kernel: [28508.724783] [25384][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 13:43:30 dc kernel: [28508.724792] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404090] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404106] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404114] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404120] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404127] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404133] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404139] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404144] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404149] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404154] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404159] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404164] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404169] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404174] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404178] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404182] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 13:43:30 dc kernel: [28509.404187] [22490][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 13:43:31 dc kernel: [28510.569269] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6630] COMMIT took 1885 ms committers=1 open=0ms blocked=1550ms
Aug 29 13:43:31 dc kernel: [28510.569276] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6630] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 13:43:31 dc kernel: [28510.569281] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6630] wc:0 wpc:0 wew:0 fps:0
Aug 29 13:43:31 dc kernel: [28510.569285] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6630] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 13:43:31 dc kernel: [28510.569288] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6630] cfr:0 ccr:1550 pec:0
Aug 29 13:43:31 dc kernel: [28510.569292] [22490][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6630] wrw:333 wrs:1

patched code, 2 caching workers, try 1
Aug 29 14:08:16 dc kernel: [29995.451925] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6641 /dev/dm-119
Aug 29 14:08:16 dc kernel: [29995.454361] btrfs: setting nodatasum
Aug 29 14:08:16 dc kernel: [29995.454366] btrfs: setting nodatacow, compression disabled
Aug 29 14:08:16 dc kernel: [29995.454369] btrfs: disabling tree log
Aug 29 14:08:16 dc kernel: [29995.454373] btrfs: disabling disk space caching
Aug 29 14:08:16 dc kernel: [29995.454377] btrfs: force clearing of disk cache
Aug 29 14:08:19 dc kernel: [29997.819307] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6642] COMMIT extwr:0 wr:1
Aug 29 14:08:19 dc kernel: [29997.819333] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 14:08:19 dc kernel: [29997.819361] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 14:08:19 dc kernel: [29997.819366] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 14:08:19 dc kernel: [29997.819370] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 14:08:19 dc kernel: [29997.819370] [27688][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 14:08:19 dc kernel: [29997.819454] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 14:08:19 dc kernel: [29997.820866] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 14:08:19 dc kernel: [29997.822165] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 14:08:19 dc kernel: [29997.823103] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 14:08:19 dc kernel: [29997.824503] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 14:08:19 dc kernel: [29997.825591] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 14:08:19 dc kernel: [29997.826557] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 14:08:19 dc kernel: [29997.827628] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 14:08:19 dc kernel: [29997.828748] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 14:08:19 dc kernel: [29997.829828] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 14:08:19 dc kernel: [29997.830804] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 14:08:19 dc kernel: [29997.831119] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 14:08:19 dc kernel: [29997.831245] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 14:08:19 dc kernel: [29997.832419] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 14:08:19 dc kernel: [29997.833480] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 14:08:19 dc kernel: [29997.835320] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 14:08:19 dc kernel: [29997.835459] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 14:08:19 dc kernel: [29997.836348] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 14:08:19 dc kernel: [29997.837173] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 14:08:19 dc kernel: [29997.837895] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 14:08:19 dc kernel: [29997.839047] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 14:08:19 dc kernel: [29997.841042] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 14:08:19 dc kernel: [29997.841932] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 14:08:19 dc kernel: [29997.842753] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 14:08:19 dc kernel: [29997.845764] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 14:08:19 dc kernel: [29997.845837] [24783][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 14:08:19 dc kernel: [29997.864361] [27688][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=44ms ret=0 [FINISHED]
Aug 29 14:08:19 dc kernel: [29997.864385] [27688][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 14:08:20 dc kernel: [29998.800342] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6642] COMMIT took 981 ms committers=1 open=0ms blocked=881ms
Aug 29 14:08:20 dc kernel: [29998.800350] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6642] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 14:08:20 dc kernel: [29998.800354] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6642] wc:0 wpc:0 wew:0 fps:0
Aug 29 14:08:20 dc kernel: [29998.800358] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6642] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 14:08:20 dc kernel: [29998.800362] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6642] cfr:0 ccr:880 pec:1
Aug 29 14:08:20 dc kernel: [29998.800365] [24783][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6642] wrw:98 wrs:1

patched code, 2 caching workers, try 2
Aug 29 14:09:15 dc kernel: [30054.620699] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6644 /dev/dm-119
Aug 29 14:09:15 dc kernel: [30054.622582] btrfs: setting nodatasum
Aug 29 14:09:15 dc kernel: [30054.622587] btrfs: setting nodatacow, compression disabled
Aug 29 14:09:15 dc kernel: [30054.622592] btrfs: disabling tree log
Aug 29 14:09:15 dc kernel: [30054.622597] btrfs: disabling disk space caching
Aug 29 14:09:15 dc kernel: [30054.622601] btrfs: force clearing of disk cache
Aug 29 14:09:18 dc kernel: [30057.375432] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6645] COMMIT extwr:0 wr:1
Aug 29 14:09:18 dc kernel: [30057.375452] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 14:09:18 dc kernel: [30057.375476] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 14:09:18 dc kernel: [30057.375482] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 14:09:18 dc kernel: [30057.375486] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 14:09:18 dc kernel: [30057.375490] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 14:09:18 dc kernel: [30057.375494] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 14:09:18 dc kernel: [30057.375497] [32221][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 14:09:18 dc kernel: [30057.375570] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 14:09:18 dc kernel: [30057.376902] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 14:09:18 dc kernel: [30057.378739] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 14:09:18 dc kernel: [30057.383334] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 14:09:18 dc kernel: [30057.384658] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 14:09:18 dc kernel: [30057.385579] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 14:09:18 dc kernel: [30057.386620] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 14:09:18 dc kernel: [30057.387719] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 14:09:18 dc kernel: [30057.388681] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 14:09:18 dc kernel: [30057.390975] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 14:09:18 dc kernel: [30057.392150] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 14:09:18 dc kernel: [30057.392322] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 14:09:18 dc kernel: [30057.392397] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 14:09:18 dc kernel: [30057.393550] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 14:09:18 dc kernel: [30057.394217] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 14:09:18 dc kernel: [30057.395318] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 14:09:18 dc kernel: [30057.395434] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 14:09:18 dc kernel: [30057.397230] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 14:09:18 dc kernel: [30057.397933] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 14:09:18 dc kernel: [30057.398883] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 14:09:18 dc kernel: [30057.400445] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 14:09:18 dc kernel: [30057.401397] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 14:09:18 dc kernel: [30057.405748] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 14:09:18 dc kernel: [30057.406566] [24781][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 14:09:18 dc kernel: [30057.430982] [32221][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=55ms ret=0 [FINISHED]
Aug 29 14:09:18 dc kernel: [30057.431007] [32221][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 14:09:19 dc kernel: [30058.079811] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6645] COMMIT took 704 ms committers=1 open=0ms blocked=643ms
Aug 29 14:09:19 dc kernel: [30058.079820] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6645] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 14:09:19 dc kernel: [30058.079824] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6645] wc:0 wpc:0 wew:0 fps:0
Aug 29 14:09:19 dc kernel: [30058.079828] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6645] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 14:09:19 dc kernel: [30058.079832] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6645] cfr:0 ccr:642 pec:1
Aug 29 14:09:19 dc kernel: [30058.079836] [24781][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6645] wrw:59 wrs:0

original code, 32 caching workers, try 1
Aug 29 13:53:54 dc kernel: [29132.695525] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6635 /dev/dm-119
Aug 29 13:53:54 dc kernel: [29132.701816] btrfs: setting nodatasum
Aug 29 13:53:54 dc kernel: [29132.701821] btrfs: setting nodatacow, compression disabled
Aug 29 13:53:54 dc kernel: [29132.701824] btrfs: disabling tree log
Aug 29 13:53:54 dc kernel: [29132.701828] btrfs: disabling disk space caching
Aug 29 13:53:54 dc kernel: [29132.701831] btrfs: force clearing of disk cache
Aug 29 13:53:56 dc kernel: [29135.456301] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6636] COMMIT extwr:0 wr:1
Aug 29 13:53:56 dc kernel: [29135.456323] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 13:53:56 dc kernel: [29135.456332] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 13:53:56 dc kernel: [29135.456336] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 13:53:56 dc kernel: [29135.456367] [3456][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 13:53:56 dc kernel: [29135.456544] [3516][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:111690121216:1073741824]
Aug 29 13:53:56 dc kernel: [29135.466573] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 13:53:56 dc kernel: [29135.499915] [3456][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=43ms ret=0 [FINISHED]
Aug 29 13:53:56 dc kernel: [29135.499922] [3456][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 13:54:01 dc kernel: [29139.812183] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 13:54:01 dc kernel: [29139.812471] [3570][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:212621852672:1073741824]
Aug 29 13:54:02 dc kernel: [29140.828108] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 13:54:03 dc kernel: [29141.956198] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 13:54:03 dc kernel: [29141.960108] [3593][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:286710038528:1073741824]
Aug 29 13:54:03 dc kernel: [29142.470157] [3516][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:111690121216:1073741824] took=7013ms ret=0 [FINISHED]
Aug 29 13:54:03 dc kernel: [29142.470169] [3516][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:166450954240:1073741824]
Aug 29 13:54:03 dc kernel: [29142.470304] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 13:54:04 dc kernel: [29142.956215] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 13:54:04 dc kernel: [29142.956324] [3601][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:357576998912:1073741824]
Aug 29 13:54:05 dc kernel: [29144.512331] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 13:54:07 dc kernel: [29145.906759] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 13:54:07 dc kernel: [29145.907408] [3678][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:419854024704:1073741824]
Aug 29 13:54:08 dc kernel: [29146.724138] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 13:54:12 dc kernel: [29151.298360] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 13:54:12 dc kernel: [29151.298710] [3782][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:468172406784:1073741824]
Aug 29 13:54:16 dc kernel: [29155.292258] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 13:54:16 dc kernel: [29155.292506] [3456][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:3242196992:1073741824] took=19792ms ret=0 [FINISHED]
Aug 29 13:54:17 dc kernel: [29156.344821] [3516][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:166450954240:1073741824] took=13874ms ret=0 [FINISHED]
Aug 29 13:54:17 dc kernel: [29156.344985] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 13:54:17 dc kernel: [29156.345221] [3516][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:524006981632:1073741824]
Aug 29 13:54:19 dc kernel: [29157.856139] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 13:54:19 dc kernel: [29158.360207] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 13:54:19 dc kernel: [29158.360330] [3456][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:566956654592:1073741824]
Aug 29 13:54:21 dc kernel: [29160.535794] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 13:54:22 dc kernel: [29161.316847] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 13:54:22 dc kernel: [29161.316971] [3911][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:606685102080:1073741824]
Aug 29 13:54:24 dc kernel: [29163.351650] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 13:54:24 dc kernel: [29163.351967] [3570][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:212621852672:1073741824] took=23539ms ret=0 [FINISHED]
Aug 29 13:54:24 dc kernel: [29163.351976] [3570][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:250202816512:1073741824]
Aug 29 13:54:27 dc kernel: [29165.724212] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 13:54:27 dc kernel: [29165.724365] [3966][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:668962127872:1073741824]
Aug 29 13:54:28 dc kernel: [29166.884347] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 13:54:29 dc kernel: [29168.388156] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 13:54:29 dc kernel: [29168.388415] [3971][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:730165411840:1073741824]
Aug 29 13:54:30 dc kernel: [29169.564154] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 13:54:37 dc kernel: [29176.280377] [3966][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:668962127872:1073741824] took=10556ms ret=0 [FINISHED]
Aug 29 13:54:37 dc kernel: [29176.280386] [3966][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:704395608064:1073741824]
Aug 29 13:54:37 dc kernel: [29176.280489] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 13:54:37 dc kernel: [29176.280728] [4079][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:785999986688:1073741824]
Aug 29 13:54:40 dc kernel: [29178.932405] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 13:54:47 dc kernel: [29186.232296] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 13:54:47 dc kernel: [29186.232434] [4207][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:828949659648:1073741824]
Aug 29 13:54:47 dc kernel: [29186.232610] [3966][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:704395608064:1073741824] took=9952ms ret=0 [FINISHED]
Aug 29 13:54:48 dc kernel: [29187.516412] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 13:54:48 dc kernel: [29187.516619] [3966][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:864383139840:1073741824]
Aug 29 13:54:51 dc kernel: [29189.740218] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 13:54:52 dc kernel: [29191.562619] [539][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 13:54:52 dc kernel: [29191.667871] [3516][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:524006981632:1073741824] took=35322ms ret=0 [FINISHED]
Aug 29 13:54:52 dc kernel: [29191.667892] [3516][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:544408076288:1073741824]
Aug 29 13:54:56 dc kernel: [29195.505451] [3971][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:730165411840:1073741824] took=27117ms ret=0 [FINISHED]
Aug 29 13:54:56 dc kernel: [29195.505460] [3971][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:769893859328:1073741824]
Aug 29 13:54:56 dc kernel: [29195.561173] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6636] COMMIT took 60104 ms committers=1 open=0ms blocked=60049ms
Aug 29 13:54:56 dc kernel: [29195.561187] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6636] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 13:54:56 dc kernel: [29195.561201] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6636] wc:0 wpc:0 wew:0 fps:0
Aug 29 13:54:56 dc kernel: [29195.561216] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6636] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 13:54:56 dc kernel: [29195.561220] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6636] cfr:0 ccr:59163 pec:885
Aug 29 13:54:56 dc kernel: [29195.561224] [539][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6636] wrw:54 wrs:1

original code, 32 caching workers, try 2
Aug 29 13:56:51 dc kernel: [29310.342690] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6638 /dev/dm-119
Aug 29 13:56:51 dc kernel: [29310.345344] btrfs: setting nodatasum
Aug 29 13:56:51 dc kernel: [29310.345348] btrfs: setting nodatacow, compression disabled
Aug 29 13:56:51 dc kernel: [29310.345351] btrfs: disabling tree log
Aug 29 13:56:51 dc kernel: [29310.345355] btrfs: disabling disk space caching
Aug 29 13:56:51 dc kernel: [29310.345358] btrfs: force clearing of disk cache
Aug 29 13:56:54 dc kernel: [29312.747760] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6639] COMMIT extwr:0 wr:1
Aug 29 13:56:54 dc kernel: [29312.747782] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 13:56:54 dc kernel: [29312.747813] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 13:56:54 dc kernel: [29312.747818] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 13:56:54 dc kernel: [29312.747934] [8990][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 13:56:54 dc kernel: [29312.748114] [9036][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:111690121216:1073741824]
Aug 29 13:56:54 dc kernel: [29312.791701] [8990][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=43ms ret=0 [FINISHED]
Aug 29 13:56:54 dc kernel: [29312.791708] [8990][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 13:56:54 dc kernel: [29312.791717] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 13:56:54 dc kernel: [29313.184165] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 13:56:54 dc kernel: [29313.184426] [9050][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:212621852672:1073741824]
Aug 29 13:56:55 dc kernel: [29313.803131] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 13:56:55 dc kernel: [29313.836602] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 13:56:55 dc kernel: [29313.836788] [9051][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:286710038528:1073741824]
Aug 29 13:56:55 dc kernel: [29313.940121] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 13:56:55 dc kernel: [29314.228769] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 13:56:55 dc kernel: [29314.229007] [9052][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:357576998912:1073741824]
Aug 29 13:56:56 dc kernel: [29314.728170] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 13:56:56 dc kernel: [29314.940170] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 13:56:56 dc kernel: [29314.940500] [9053][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:419854024704:1073741824]
Aug 29 13:57:00 dc kernel: [29319.180235] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 13:57:01 dc kernel: [29319.803697] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 13:57:01 dc kernel: [29319.803975] [9036][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:111690121216:1073741824] took=7055ms ret=0 [FINISHED]
Aug 29 13:57:01 dc kernel: [29319.803982] [9036][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:166450954240:1073741824]
Aug 29 13:57:01 dc kernel: [29319.804205] [9124][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:468172406784:1073741824]
Aug 29 13:57:06 dc kernel: [29325.540181] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 13:57:08 dc kernel: [29327.304177] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 13:57:08 dc kernel: [29327.304363] [9259][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:524006981632:1073741824]
Aug 29 13:57:09 dc kernel: [29328.084699] [9050][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:212621852672:1073741824] took=14900ms ret=0 [FINISHED]
Aug 29 13:57:09 dc kernel: [29328.084711] [9050][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:250202816512:1073741824]
Aug 29 13:57:13 dc kernel: [29332.341449] [9036][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:166450954240:1073741824] took=12537ms ret=0 [FINISHED]
Aug 29 13:57:13 dc kernel: [29332.341531] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 13:57:13 dc kernel: [29332.341775] [9036][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:544408076288:1073741824]
Aug 29 13:57:14 dc kernel: [29333.468251] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 13:57:16 dc kernel: [29334.989144] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 13:57:19 dc kernel: [29338.600130] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 13:57:19 dc kernel: [29338.603976] [9409][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:606685102080:1073741824]
Aug 29 13:57:21 dc kernel: [29340.637329] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 13:57:23 dc kernel: [29341.944259] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 13:57:23 dc kernel: [29341.944657] [9464][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:668962127872:1073741824]
Aug 29 13:57:29 dc kernel: [29348.036155] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 13:57:32 dc kernel: [29350.832273] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 13:57:32 dc kernel: [29350.832426] [9532][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:730165411840:1073741824]
Aug 29 13:57:32 dc kernel: [29351.476475] [9409][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:606685102080:1073741824] took=12872ms ret=0 [FINISHED]
Aug 29 13:57:32 dc kernel: [29351.476484] [9409][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:624938713088:1073741824]
Aug 29 13:57:37 dc kernel: [29356.167563] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 13:57:37 dc kernel: [29356.167808] [9464][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:668962127872:1073741824] took=14223ms ret=0 [FINISHED]
Aug 29 13:57:37 dc kernel: [29356.167815] [9464][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:704395608064:1073741824]
Aug 29 13:57:44 dc kernel: [29363.390949] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 13:57:44 dc kernel: [29363.391364] [9700][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:785999986688:1073741824]
Aug 29 13:57:44 dc kernel: [29363.391374] [9532][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:730165411840:1073741824] took=12558ms ret=0 [FINISHED]
Aug 29 13:57:44 dc kernel: [29363.391381] [9532][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:769893859328:1073741824]
Aug 29 13:57:47 dc kernel: [29365.824280] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 13:57:52 dc kernel: [29370.680175] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 13:57:52 dc kernel: [29370.680389] [9764][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:828949659648:1073741824]
Aug 29 13:57:53 dc kernel: [29371.826360] [9052][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:357576998912:1073741824] took=57597ms ret=0 [FINISHED]
Aug 29 13:57:53 dc kernel: [29371.826369] [9052][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:385494286336:1073741824]
Aug 29 13:57:56 dc kernel: [29375.264382] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 13:57:57 dc kernel: [29375.700883] [9259][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:524006981632:1073741824] took=48396ms ret=0 [FINISHED]
Aug 29 13:57:57 dc kernel: [29375.700986] [9259][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:584136523776:1073741824]
Aug 29 13:58:00 dc kernel: [29379.168976] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 13:58:00 dc kernel: [29379.169313] [9881][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:875120558080:1073741824]
Aug 29 13:58:07 dc kernel: [29385.979541] [9764][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:828949659648:1073741824] took=15299ms ret=0 [FINISHED]
Aug 29 13:58:07 dc kernel: [29385.979554] [9764][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:864383139840:1073741824]
Aug 29 13:58:07 dc kernel: [29385.979730] [6121][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 13:58:08 dc kernel: [29386.825468] [9464][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:704395608064:1073741824] took=30657ms ret=0 [FINISHED]
Aug 29 13:58:14 dc kernel: [29393.057303] [9764][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:864383139840:1073741824] took=7077ms ret=0 [FINISHED]
Aug 29 13:58:14 dc kernel: [29393.126372] [9036][sc]btrfs*[caching_thread:556] FS[dm-119]: done caching BG[M:544408076288:1073741824] took=60784ms ret=0 [FINISHED]
Aug 29 13:58:14 dc kernel: [29393.126384] [9036][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:566956654592:1073741824]
Aug 29 13:58:15 dc kernel: [29394.289640] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6639] COMMIT took 81541 ms committers=1 open=0ms blocked=81396ms
Aug 29 13:58:15 dc kernel: [29394.289649] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6639] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 13:58:15 dc kernel: [29394.289688] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6639] wc:0 wpc:0 wew:0 fps:0
Aug 29 13:58:15 dc kernel: [29394.289694] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6639] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 13:58:15 dc kernel: [29394.289700] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6639] cfr:0 ccr:80309 pec:1086
Aug 29 13:58:15 dc kernel: [29394.289705] [6121][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6639] wrw:140 wrs:5

patched code, 32 caching workers - try 1
Aug 29 14:12:26 dc kernel: [30245.531236] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6647 /dev/dm-119
Aug 29 14:12:26 dc kernel: [30245.533887] btrfs: setting nodatasum
Aug 29 14:12:26 dc kernel: [30245.533891] btrfs: setting nodatacow, compression disabled
Aug 29 14:12:26 dc kernel: [30245.533894] btrfs: disabling tree log
Aug 29 14:12:26 dc kernel: [30245.533898] btrfs: disabling disk space caching
Aug 29 14:12:26 dc kernel: [30245.533901] btrfs: force clearing of disk cache
Aug 29 14:12:29 dc kernel: [30248.074275] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6648] COMMIT extwr:0 wr:1
Aug 29 14:12:29 dc kernel: [30248.074301] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 14:12:29 dc kernel: [30248.074335] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 14:12:29 dc kernel: [30248.074340] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 14:12:29 dc kernel: [30248.074425] [4722][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 14:12:29 dc kernel: [30248.074498] [4733][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:111690121216:1073741824]
Aug 29 14:12:29 dc kernel: [30248.076305] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 14:12:29 dc kernel: [30248.076373] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 14:12:29 dc kernel: [30248.076515] [4734][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:212621852672:1073741824]
Aug 29 14:12:29 dc kernel: [30248.079908] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 14:12:29 dc kernel: [30248.081510] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 14:12:29 dc kernel: [30248.081646] [4735][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:286710038528:1073741824]
Aug 29 14:12:29 dc kernel: [30248.083312] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 14:12:29 dc kernel: [30248.089596] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 14:12:29 dc kernel: [30248.090012] [4736][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:357576998912:1073741824]
Aug 29 14:12:29 dc kernel: [30248.091476] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 14:12:29 dc kernel: [30248.098188] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 14:12:29 dc kernel: [30248.098344] [4737][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:419854024704:1073741824]
Aug 29 14:12:29 dc kernel: [30248.104105] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 14:12:29 dc kernel: [30248.109513] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 14:12:29 dc kernel: [30248.109678] [4738][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:468172406784:1073741824]
Aug 29 14:12:29 dc kernel: [30248.112489] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 14:12:29 dc kernel: [30248.114549] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 14:12:29 dc kernel: [30248.114780] [4740][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:524006981632:1073741824]
Aug 29 14:12:29 dc kernel: [30248.117005] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 14:12:29 dc kernel: [30248.119353] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 14:12:29 dc kernel: [30248.119509] [4741][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:566956654592:1073741824]
Aug 29 14:12:29 dc kernel: [30248.121382] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 14:12:29 dc kernel: [30248.124236] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 14:12:29 dc kernel: [30248.124484] [4742][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:606685102080:1073741824]
Aug 29 14:12:29 dc kernel: [30248.126677] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 14:12:29 dc kernel: [30248.128823] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 14:12:29 dc kernel: [30248.129308] [4743][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:668962127872:1073741824]
Aug 29 14:12:29 dc kernel: [30248.133638] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 14:12:29 dc kernel: [30248.135738] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 14:12:29 dc kernel: [30248.136885] [4744][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:730165411840:1073741824]
Aug 29 14:12:29 dc kernel: [30248.138653] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 14:12:29 dc kernel: [30248.142555] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 14:12:29 dc kernel: [30248.142656] [4746][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:785999986688:1073741824]
Aug 29 14:12:29 dc kernel: [30248.146061] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 14:12:29 dc kernel: [30248.150414] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 14:12:29 dc kernel: [30248.157685] [4750][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:828949659648:1073741824]
Aug 29 14:12:29 dc kernel: [30248.163294] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 14:12:29 dc kernel: [30248.173472] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 14:12:29 dc kernel: [30248.173763] [4751][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:875120558080:1073741824]
Aug 29 14:12:29 dc kernel: [30248.177897] [1696][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 14:12:29 dc kernel: [30248.258486] [4722][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=184ms ret=0 [FINISHED]
Aug 29 14:12:29 dc kernel: [30248.258503] [4722][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 14:12:31 dc kernel: [30249.974844] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6648] COMMIT took 1900 ms committers=1 open=0ms blocked=1725ms
Aug 29 14:12:31 dc kernel: [30249.974851] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6648] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 14:12:31 dc kernel: [30249.974855] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6648] wc:0 wpc:0 wew:0 fps:0
Aug 29 14:12:31 dc kernel: [30249.974859] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6648] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 14:12:31 dc kernel: [30249.974863] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6648] cfr:0 ccr:1720 pec:5
Aug 29 14:12:31 dc kernel: [30249.974867] [1696][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6648] wrw:169 wrs:4

patched code, 32 caching workers, try 2
Aug 29 14:13:31 dc kernel: [30310.285564] device fsid 580c0613-bfc9-49ba-a166-3da7d0cb6e92 devid 1 transid 6650 /dev/dm-119
Aug 29 14:13:31 dc kernel: [30310.288139] btrfs: setting nodatasum
Aug 29 14:13:31 dc kernel: [30310.288144] btrfs: setting nodatacow, compression disabled
Aug 29 14:13:31 dc kernel: [30310.288147] btrfs: disabling tree log
Aug 29 14:13:31 dc kernel: [30310.288151] btrfs: disabling disk space caching
Aug 29 14:13:31 dc kernel: [30310.288154] btrfs: force clearing of disk cache
Aug 29 14:13:34 dc kernel: [30313.260482] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6651] COMMIT extwr:0 wr:1
Aug 29 14:13:34 dc kernel: [30313.260504] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:4194304:8388608]
Aug 29 14:13:34 dc kernel: [30313.260535] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:3242196992:1073741824]
Aug 29 14:13:34 dc kernel: [30313.260540] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:111690121216:1073741824]
Aug 29 14:13:34 dc kernel: [30313.260649] [9290][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:4194304:8388608]
Aug 29 14:13:34 dc kernel: [30313.260916] [9340][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:111690121216:1073741824]
Aug 29 14:13:34 dc kernel: [30313.262002] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:166450954240:1073741824]
Aug 29 14:13:34 dc kernel: [30313.262164] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:212621852672:1073741824]
Aug 29 14:13:34 dc kernel: [30313.262518] [9341][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:212621852672:1073741824]
Aug 29 14:13:34 dc kernel: [30313.263734] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:250202816512:1073741824]
Aug 29 14:13:34 dc kernel: [30313.265375] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:286710038528:1073741824]
Aug 29 14:13:34 dc kernel: [30313.265864] [9342][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:286710038528:1073741824]
Aug 29 14:13:34 dc kernel: [30313.266930] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:323217260544:1073741824]
Aug 29 14:13:34 dc kernel: [30313.268448] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:357576998912:1073741824]
Aug 29 14:13:34 dc kernel: [30313.268877] [9343][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:357576998912:1073741824]
Aug 29 14:13:34 dc kernel: [30313.270097] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:385494286336:1073741824]
Aug 29 14:13:34 dc kernel: [30313.272873] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:419854024704:1073741824]
Aug 29 14:13:34 dc kernel: [30313.273225] [9344][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:419854024704:1073741824]
Aug 29 14:13:34 dc kernel: [30313.274520] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:445623828480:1073741824]
Aug 29 14:13:34 dc kernel: [30313.276489] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:468172406784:1073741824]
Aug 29 14:13:34 dc kernel: [30313.276687] [9346][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:468172406784:1073741824]
Aug 29 14:13:34 dc kernel: [30313.278096] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:500384661504:1073741824]
Aug 29 14:13:34 dc kernel: [30313.280208] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:524006981632:1073741824]
Aug 29 14:13:34 dc kernel: [30313.280336] [9347][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:524006981632:1073741824]
Aug 29 14:13:34 dc kernel: [30313.282019] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:544408076288:1073741824]
Aug 29 14:13:34 dc kernel: [30313.283963] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:566956654592:1073741824]
Aug 29 14:13:34 dc kernel: [30313.284508] [9348][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:566956654592:1073741824]
Aug 29 14:13:34 dc kernel: [30313.288092] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:584136523776:1073741824]
Aug 29 14:13:34 dc kernel: [30313.289796] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:606685102080:1073741824]
Aug 29 14:13:34 dc kernel: [30313.289981] [9349][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:606685102080:1073741824]
Aug 29 14:13:34 dc kernel: [30313.294937] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:624938713088:1073741824]
Aug 29 14:13:34 dc kernel: [30313.297358] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:668962127872:1073741824]
Aug 29 14:13:34 dc kernel: [30313.297446] [9350][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:668962127872:1073741824]
Aug 29 14:13:34 dc kernel: [30313.299118] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:704395608064:1073741824]
Aug 29 14:13:34 dc kernel: [30313.301007] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:730165411840:1073741824]
Aug 29 14:13:34 dc kernel: [30313.301182] [9353][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:730165411840:1073741824]
Aug 29 14:13:34 dc kernel: [30313.302726] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:769893859328:1073741824]
Aug 29 14:13:34 dc kernel: [30313.306093] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:785999986688:1073741824]
Aug 29 14:13:34 dc kernel: [30313.306387] [9354][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:785999986688:1073741824]
Aug 29 14:13:34 dc kernel: [30313.308106] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:803179855872:1073741824]
Aug 29 14:13:34 dc kernel: [30313.311301] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:828949659648:1073741824]
Aug 29 14:13:34 dc kernel: [30313.311419] [9355][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:828949659648:1073741824]
Aug 29 14:13:34 dc kernel: [30313.316166] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:864383139840:1073741824]
Aug 29 14:13:34 dc kernel: [30313.318037] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:875120558080:1073741824]
Aug 29 14:13:34 dc kernel: [30313.318184] [9357][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:875120558080:1073741824]
Aug 29 14:13:34 dc kernel: [30313.321307] [1698][sc]btrfs [cache_block_group:685] FS[dm-119]: trigger caching BG[M:876194299904:1073741824]
Aug 29 14:13:34 dc kernel: [30313.340526] [9290][sc]btrfs [caching_thread:556] FS[dm-119]: done caching BG[M:4194304:8388608] took=79ms ret=0 [FINISHED]
Aug 29 14:13:34 dc kernel: [30313.340541] [9290][sc]btrfs [caching_thread:380] FS[dm-119]: start caching BG[M:3242196992:1073741824]
Aug 29 14:13:35 dc kernel: [30314.378026] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6651] COMMIT took 1117 ms committers=1 open=0ms blocked=999ms
Aug 29 14:13:35 dc kernel: [30314.378033] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6651] roo:0 rdr1:0 cbg:0 rdr2:0
Aug 29 14:13:35 dc kernel: [30314.378041] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6651] wc:0 wpc:0 wew:0 fps:0
Aug 29 14:13:35 dc kernel: [30314.378132] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6651] ww:0 cs:0 rdi:0 rdr3:0
Aug 29 14:13:35 dc kernel: [30314.378136] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6651] cfr:0 ccr:994 pec:4
Aug 29 14:13:35 dc kernel: [30314.378140] [1698][tx]btrfs [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6651] wrw:115 wrs:2
Josef Bacik Aug. 29, 2013, 7:55 p.m. UTC | #3
On Thu, Aug 29, 2013 at 10:09:29PM +0300, Alex Lyakas wrote:
> Hi Josef,
> 
> On Thu, Aug 29, 2013 at 5:38 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> > On Thu, Aug 29, 2013 at 01:31:05PM +0300, Alex Lyakas wrote:
> >> caching_thread()s do all their work under read access to extent_commit_sem.
> >> They give up on this read access only when need_resched() tells them, or
> >> when they exit. As a result, somebody that wants a WRITE access to this sem,
> >> might wait for a long time. Especially this is problematic in
> >> cache_block_group(),
> >> which can be called on critical paths like find_free_extent() and in commit
> >> path via commit_cowonly_roots().
> >>
> >> This patch is an RFC, that attempts to fix this problem, by notifying the
> >> caching threads to give up on extent_commit_sem.
> >>
> >> On a system with a lot of metadata (~20Gb total metadata, ~10Gb extent tree),
> >> with increased number of caching_threads, commits were very slow,
> >> stuck in commit_cowonly_roots, due to this issue.
> >> With this patch, commits no longer get stuck in commit_cowonly_roots.
> >>
> >
> > But what kind of effect do you see on overall performance/runtime?  Honestly I'd
> > expect we'd spend more of our time waiting for the caching kthread to fill in
> > free space so we can make allocations than waiting on this lock contention.  I'd
> > like to see real numbers here to see what kind of effect this patch has on your
> > workload.  (I don't doubt it makes a difference, I'm just curious to see how big
> > of a difference it makes.)
> 
> Primarily for me it affects the commit thread right after mounting,
> when it spends time in the "critical part" of the commit, in which
> trans_no_join is set, i.e., it is not possible to start a new
> transaction. So all the new writers that want a transaction are
> delayed at this point.
> 
> Here are some numbers (and some more logs are in the attached file).
> 
> Filesystem has a good amount of metadata (btrfs-progs modified
> slightly to print exact byte values):
> root@dc:/home/zadara# btrfs fi df /btrfs/pool-00000002/
> Data: total=846116945920(788.01GB), used=842106667008(784.27GB)
> System: total=4194304(4.00MB), used=94208(92.00KB)
> Metadata: total=31146901504(29.01GB), used=25248698368(23.51GB)
> 
> original code, 2 caching workers, try 1
> Aug 29 13:41:22 dc kernel: [28381.203745] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6627] COMMIT
> extwr:0 wr:1
> Aug 29 13:41:25 dc kernel: [28384.624838] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6627] COMMIT took
> 3421 ms committers=1 open=0ms blocked=3188ms
> Aug 29 13:41:25 dc kernel: [28384.624846] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6627] roo:0 rdr1:0
> cbg:0 rdr2:0
> Aug 29 13:41:25 dc kernel: [28384.624850] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6627] wc:0 wpc:0
> wew:0 fps:0
> Aug 29 13:41:25 dc kernel: [28384.624854] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6627] ww:0 cs:0
> rdi:0 rdr3:0
> Aug 29 13:41:25 dc kernel: [28384.624858] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6627] cfr:0
> ccr:2088 pec:1099
> Aug 29 13:41:25 dc kernel: [28384.624862] [17617][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6627] wrw:230 wrs:1
> 
> I have a breakdown of commit times here, to identify bottlenecks of
> the commit. Times are in ms.
> Names of phases are:
> 
> roo - btrfs_run_ordered_operations
> rdr1 - btrfs_run_delayed_refs (call 1)
> cbg - btrfs_create_pending_block_groups
> rdr2 - btrfs_run_delayed_refs (call 2)
> wc - wait_for_commit (if was needed)
> wpc - wair for previous commit (if was needed)
> wew - wait for "external writers to detach"
> fps - flush_all_pending_stuffs
> ww - wait for all the other writers to detach
> cs - create_pending_snapshots
> rdi - btrfs_run_delayed_items
> rdr3 - btrfs_run_delayed_refs (call 3)
> cfr - commit_fs_roots
> ccr - commit_cowonly_roots
> pec - btrfs_prepare_extent_commit
> wrw - btrfs_write_and_wait_transaction
> wrs - write_ctree_super
> 
> Two lines marked as "-" are the "critical part" of the commit.
> 
> 
> original code, 2 caching workers, try 2
> Aug 29 13:43:30 dc kernel: [28508.683625] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6630] COMMIT
> extwr:0 wr:1
> Aug 29 13:43:31 dc kernel: [28510.569269] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6630] COMMIT took
> 1885 ms committers=1 open=0ms blocked=1550ms
> Aug 29 13:43:31 dc kernel: [28510.569276] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6630] roo:0 rdr1:0
> cbg:0 rdr2:0
> Aug 29 13:43:31 dc kernel: [28510.569281] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6630] wc:0 wpc:0
> wew:0 fps:0
> Aug 29 13:43:31 dc kernel: [28510.569285] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6630] ww:0 cs:0
> rdi:0 rdr3:0
> Aug 29 13:43:31 dc kernel: [28510.569288] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6630] cfr:0
> ccr:1550 pec:0
> Aug 29 13:43:31 dc kernel: [28510.569292] [22490][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6630] wrw:333 wrs:1
> 
> So you see that 1-2 secs are spent in "commit cowonly roots". Now the
> patched code, and here, I admit, difference is not so dramatic:
> 
> patched code, 2 caching workers, try 1
> Aug 29 14:08:19 dc kernel: [29997.819307] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6642] COMMIT
> extwr:0 wr:1
> Aug 29 14:08:20 dc kernel: [29998.800342] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6642] COMMIT took
> 981 ms committers=1 open=0ms blocked=881ms
> Aug 29 14:08:20 dc kernel: [29998.800350] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6642] roo:0 rdr1:0
> cbg:0 rdr2:0
> Aug 29 14:08:20 dc kernel: [29998.800354] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6642] wc:0 wpc:0
> wew:0 fps:0
> Aug 29 14:08:20 dc kernel: [29998.800358] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6642] ww:0 cs:0
> rdi:0 rdr3:0
> Aug 29 14:08:20 dc kernel: [29998.800362] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6642] cfr:0 ccr:880
> pec:1
> Aug 29 14:08:20 dc kernel: [29998.800365] [24783][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6642] wrw:98 wrs:1
> 
> patched code, 2 caching workers, try 2
> Aug 29 14:09:18 dc kernel: [30057.375432] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6645] COMMIT
> extwr:0 wr:1
> Aug 29 14:09:19 dc kernel: [30058.079811] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6645] COMMIT took
> 704 ms committers=1 open=0ms blocked=643ms
> Aug 29 14:09:19 dc kernel: [30058.079820] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6645] roo:0 rdr1:0
> cbg:0 rdr2:0
> Aug 29 14:09:19 dc kernel: [30058.079824] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6645] wc:0 wpc:0
> wew:0 fps:0
> Aug 29 14:09:19 dc kernel: [30058.079828] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6645] ww:0 cs:0
> rdi:0 rdr3:0
> Aug 29 14:09:19 dc kernel: [30058.079832] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6645] cfr:0 ccr:642
> pec:1
> Aug 29 14:09:19 dc kernel: [30058.079836] [24781][tx]btrfs
> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6645] wrw:59 wrs:0
> 
> but still there is some improvement of commit time.
> 

Ok so that's what I expected, a marginal improvement under normal conditions.

> Now I changed the number of caching workers to 32, to improve the time
> to load the metadata, otherwise it takes a lot of time for the FS to
> become responsive. Also I modified the code to start caching workers
> like this:
> btrfs_init_workers(&fs_info->caching_workers, "cache",
>                32, NULL/*async_helper*/);
> /* use low thresh to quickly spawn needed new threads */
> fs_info->caching_workers.idle_thresh = 2;
> 
> As I explained in my other email named "btrfs:async-thread:
> atomic_start_pending=1 is set, but it's too late", even with two
> caching threads, only one does all the job. So I don't pass the async
> helper (don't know if that's a correct thing to do).
> 

So does jacking the limit up to 32 help the load?  The old "only let 2 cachers
go at once" stuff was because we used to do the caching on the normal extent
tree, so we took locks and it killed performance.  But now that we use the
commit root there is really no reason to limit our caching threads, other than
the fact we'll be thrashing the disk by all of the threads reading so much.
I'll look at the other email you sent, I've not read it.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Lyakas Aug. 29, 2013, 10:18 p.m. UTC | #4
On Thu, Aug 29, 2013 at 10:55 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> On Thu, Aug 29, 2013 at 10:09:29PM +0300, Alex Lyakas wrote:
>> Hi Josef,
>>
>> On Thu, Aug 29, 2013 at 5:38 PM, Josef Bacik <jbacik@fusionio.com> wrote:
>> > On Thu, Aug 29, 2013 at 01:31:05PM +0300, Alex Lyakas wrote:
>> >> caching_thread()s do all their work under read access to extent_commit_sem.
>> >> They give up on this read access only when need_resched() tells them, or
>> >> when they exit. As a result, somebody that wants a WRITE access to this sem,
>> >> might wait for a long time. Especially this is problematic in
>> >> cache_block_group(),
>> >> which can be called on critical paths like find_free_extent() and in commit
>> >> path via commit_cowonly_roots().
>> >>
>> >> This patch is an RFC, that attempts to fix this problem, by notifying the
>> >> caching threads to give up on extent_commit_sem.
>> >>
>> >> On a system with a lot of metadata (~20Gb total metadata, ~10Gb extent tree),
>> >> with increased number of caching_threads, commits were very slow,
>> >> stuck in commit_cowonly_roots, due to this issue.
>> >> With this patch, commits no longer get stuck in commit_cowonly_roots.
>> >>
>> >
>> > But what kind of effect do you see on overall performance/runtime?  Honestly I'd
>> > expect we'd spend more of our time waiting for the caching kthread to fill in
>> > free space so we can make allocations than waiting on this lock contention.  I'd
>> > like to see real numbers here to see what kind of effect this patch has on your
>> > workload.  (I don't doubt it makes a difference, I'm just curious to see how big
>> > of a difference it makes.)
>>
>> Primarily for me it affects the commit thread right after mounting,
>> when it spends time in the "critical part" of the commit, in which
>> trans_no_join is set, i.e., it is not possible to start a new
>> transaction. So all the new writers that want a transaction are
>> delayed at this point.
>>
>> Here are some numbers (and some more logs are in the attached file).
>>
>> Filesystem has a good amount of metadata (btrfs-progs modified
>> slightly to print exact byte values):
>> root@dc:/home/zadara# btrfs fi df /btrfs/pool-00000002/
>> Data: total=846116945920(788.01GB), used=842106667008(784.27GB)
>> System: total=4194304(4.00MB), used=94208(92.00KB)
>> Metadata: total=31146901504(29.01GB), used=25248698368(23.51GB)
>>
>> original code, 2 caching workers, try 1
>> Aug 29 13:41:22 dc kernel: [28381.203745] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6627] COMMIT
>> extwr:0 wr:1
>> Aug 29 13:41:25 dc kernel: [28384.624838] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6627] COMMIT took
>> 3421 ms committers=1 open=0ms blocked=3188ms
>> Aug 29 13:41:25 dc kernel: [28384.624846] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6627] roo:0 rdr1:0
>> cbg:0 rdr2:0
>> Aug 29 13:41:25 dc kernel: [28384.624850] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6627] wc:0 wpc:0
>> wew:0 fps:0
>> Aug 29 13:41:25 dc kernel: [28384.624854] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6627] ww:0 cs:0
>> rdi:0 rdr3:0
>> Aug 29 13:41:25 dc kernel: [28384.624858] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6627] cfr:0
>> ccr:2088 pec:1099
>> Aug 29 13:41:25 dc kernel: [28384.624862] [17617][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6627] wrw:230 wrs:1
>>
>> I have a breakdown of commit times here, to identify bottlenecks of
>> the commit. Times are in ms.
>> Names of phases are:
>>
>> roo - btrfs_run_ordered_operations
>> rdr1 - btrfs_run_delayed_refs (call 1)
>> cbg - btrfs_create_pending_block_groups
>> rdr2 - btrfs_run_delayed_refs (call 2)
>> wc - wait_for_commit (if was needed)
>> wpc - wair for previous commit (if was needed)
>> wew - wait for "external writers to detach"
>> fps - flush_all_pending_stuffs
>> ww - wait for all the other writers to detach
>> cs - create_pending_snapshots
>> rdi - btrfs_run_delayed_items
>> rdr3 - btrfs_run_delayed_refs (call 3)
>> cfr - commit_fs_roots
>> ccr - commit_cowonly_roots
>> pec - btrfs_prepare_extent_commit
>> wrw - btrfs_write_and_wait_transaction
>> wrs - write_ctree_super
>>
>> Two lines marked as "-" are the "critical part" of the commit.
>>
>>
>> original code, 2 caching workers, try 2
>> Aug 29 13:43:30 dc kernel: [28508.683625] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6630] COMMIT
>> extwr:0 wr:1
>> Aug 29 13:43:31 dc kernel: [28510.569269] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6630] COMMIT took
>> 1885 ms committers=1 open=0ms blocked=1550ms
>> Aug 29 13:43:31 dc kernel: [28510.569276] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6630] roo:0 rdr1:0
>> cbg:0 rdr2:0
>> Aug 29 13:43:31 dc kernel: [28510.569281] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6630] wc:0 wpc:0
>> wew:0 fps:0
>> Aug 29 13:43:31 dc kernel: [28510.569285] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6630] ww:0 cs:0
>> rdi:0 rdr3:0
>> Aug 29 13:43:31 dc kernel: [28510.569288] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6630] cfr:0
>> ccr:1550 pec:0
>> Aug 29 13:43:31 dc kernel: [28510.569292] [22490][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6630] wrw:333 wrs:1
>>
>> So you see that 1-2 secs are spent in "commit cowonly roots". Now the
>> patched code, and here, I admit, difference is not so dramatic:
>>
>> patched code, 2 caching workers, try 1
>> Aug 29 14:08:19 dc kernel: [29997.819307] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6642] COMMIT
>> extwr:0 wr:1
>> Aug 29 14:08:20 dc kernel: [29998.800342] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6642] COMMIT took
>> 981 ms committers=1 open=0ms blocked=881ms
>> Aug 29 14:08:20 dc kernel: [29998.800350] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6642] roo:0 rdr1:0
>> cbg:0 rdr2:0
>> Aug 29 14:08:20 dc kernel: [29998.800354] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6642] wc:0 wpc:0
>> wew:0 fps:0
>> Aug 29 14:08:20 dc kernel: [29998.800358] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6642] ww:0 cs:0
>> rdi:0 rdr3:0
>> Aug 29 14:08:20 dc kernel: [29998.800362] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6642] cfr:0 ccr:880
>> pec:1
>> Aug 29 14:08:20 dc kernel: [29998.800365] [24783][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6642] wrw:98 wrs:1
>>
>> patched code, 2 caching workers, try 2
>> Aug 29 14:09:18 dc kernel: [30057.375432] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_STARTED:439] FS[dm-119] txn[6645] COMMIT
>> extwr:0 wr:1
>> Aug 29 14:09:19 dc kernel: [30058.079811] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:519] FS[dm-119] txn[6645] COMMIT took
>> 704 ms committers=1 open=0ms blocked=643ms
>> Aug 29 14:09:19 dc kernel: [30058.079820] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:524] FS[dm-119] txn[6645] roo:0 rdr1:0
>> cbg:0 rdr2:0
>> Aug 29 14:09:19 dc kernel: [30058.079824] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:529] FS[dm-119] txn[6645] wc:0 wpc:0
>> wew:0 fps:0
>> Aug 29 14:09:19 dc kernel: [30058.079828] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:534] -FS[dm-119] txn[6645] ww:0 cs:0
>> rdi:0 rdr3:0
>> Aug 29 14:09:19 dc kernel: [30058.079832] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:538] -FS[dm-119] txn[6645] cfr:0 ccr:642
>> pec:1
>> Aug 29 14:09:19 dc kernel: [30058.079836] [24781][tx]btrfs
>> [ZBTRFS_TXN_COMMIT_PHASE_DONE:541] FS[dm-119] txn[6645] wrw:59 wrs:0
>>
>> but still there is some improvement of commit time.
>>
>
> Ok so that's what I expected, a marginal improvement under normal conditions.
>
>> Now I changed the number of caching workers to 32, to improve the time
>> to load the metadata, otherwise it takes a lot of time for the FS to
>> become responsive. Also I modified the code to start caching workers
>> like this:
>> btrfs_init_workers(&fs_info->caching_workers, "cache",
>>                32, NULL/*async_helper*/);
>> /* use low thresh to quickly spawn needed new threads */
>> fs_info->caching_workers.idle_thresh = 2;
>>
>> As I explained in my other email named "btrfs:async-thread:
>> atomic_start_pending=1 is set, but it's too late", even with two
>> caching threads, only one does all the job. So I don't pass the async
>> helper (don't know if that's a correct thing to do).
>>
>
> So does jacking the limit up to 32 help the load?  The old "only let 2 cachers
> go at once" stuff was because we used to do the caching on the normal extent
> tree, so we took locks and it killed performance.  But now that we use the
> commit root there is really no reason to limit our caching threads, other than
> the fact we'll be thrashing the disk by all of the threads reading so much.
> I'll look at the other email you sent, I've not read it.  Thanks,
What we see is that a single caching_thread, with an underlying SSD,
can generate only up to 4Mb/s of reading-in the pages, because it
reads them synchronously, one-by-one as it scans its part of the
extent tree. Whereas with 32 threads, we are able to see 40Mb/s of
caching-in the block groups. So yes, the system becomes more
responsive on SOD (start-of-day), where it needs to "warm-up" the free
space cache, which is exactly a critical period of time, when host is
waiting to resume the IO.

What you did not ask me (and thanks for that!) is why not to use the
free-space-cache, which is supposed to solve this exact problem. Apart
from one stack overrun we hit with it (that was fixed in later
kernels), we see that free space EXTENT_DATAs are allocated out of
DATA block groups. For us, this is somewhat an issue, because we are
routing METADATA IOs to a faster SSD storage. I haven't yet looked
deeper into whether it is straightforward to alloc free-space-entries
out of DATA block groups, but do you think this would make sense?
After all, free-space entries are metadata...

Thanks!
Alex.



>
> Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c90be01..b602611 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1427,6 +1427,13 @@  struct btrfs_fs_info {
     struct mutex ordered_extent_flush_mutex;

     struct rw_semaphore extent_commit_sem;
+    /* notifies the readers to give up on the sem ASAP */
+    atomic_t extent_commit_sem_give_up_read;
+#define BTRFS_DOWN_WRITE_EXTENT_COMMIT_SEM(fs_info)                  \
+    do { atomic_inc(&(fs_info)->extent_commit_sem_give_up_read);     \
+         down_write(&(fs_info)->extent_commit_sem);                  \
+         atomic_dec(&(fs_info)->extent_commit_sem_give_up_read);     \
+    } while (0)

     struct rw_semaphore cleanup_work_sem;

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 69e9afb..b88e688 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2291,6 +2291,7 @@  int open_ctree(struct super_block *sb,
     mutex_init(&fs_info->cleaner_mutex);
     mutex_init(&fs_info->volume_mutex);
     init_rwsem(&fs_info->extent_commit_sem);
+    atomic_set(&fs_info->extent_commit_sem_give_up_read, 0);
     init_rwsem(&fs_info->cleanup_work_sem);
     init_rwsem(&fs_info->subvol_sem);
     sema_init(&fs_info->uuid_tree_rescan_sem, 1);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 95c6539..28fee78 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -442,7 +442,8 @@  next:
             if (ret)
                 break;

-            if (need_resched()) {
+            if (need_resched() ||
+                atomic_read(&fs_info->extent_commit_sem_give_up_read) > 0) {
                 caching_ctl->progress = last;
                 btrfs_release_path(path);
                 up_read(&fs_info->extent_commit_sem);
@@ -632,7 +633,7 @@  static int cache_block_group(struct
btrfs_block_group_cache *cache,
         return 0;
     }

-    down_write(&fs_info->extent_commit_sem);
+    BTRFS_DOWN_WRITE_EXTENT_COMMIT_SEM(fs_info);
     atomic_inc(&caching_ctl->count);
     list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
     up_write(&fs_info->extent_commit_sem);
@@ -5462,7 +5463,7 @@  void btrfs_prepare_extent_commit(struct
btrfs_trans_handle *trans,
     struct btrfs_block_group_cache *cache;
     struct btrfs_space_info *space_info;

-    down_write(&fs_info->extent_commit_sem);
+    BTRFS_DOWN_WRITE_EXTENT_COMMIT_SEM(fs_info);

     list_for_each_entry_safe(caching_ctl, next,
                  &fs_info->caching_block_groups, list) {
@@ -8219,7 +8220,7 @@  int btrfs_free_block_groups(struct btrfs_fs_info *info)
     struct btrfs_caching_control *caching_ctl;
     struct rb_node *n;

-    down_write(&info->extent_commit_sem);
+    BTRFS_DOWN_WRITE_EXTENT_COMMIT_SEM(fs_info);
     while (!list_empty(&info->caching_block_groups)) {
         caching_ctl = list_entry(info->caching_block_groups.next,
                      struct btrfs_caching_control, list);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index cac4a3f..976d20a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -969,7 +969,7 @@  static noinline int commit_cowonly_roots(struct
btrfs_trans_handle *trans,
             return ret;
     }

-    down_write(&fs_info->extent_commit_sem);
+    BTRFS_DOWN_WRITE_EXTENT_COMMIT_SEM(fs_info);
     switch_commit_root(fs_info->extent_root);
     up_write(&fs_info->extent_commit_sem);