[RFC,v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect

This is a second attempt at the fix for the list_del corruption
issue.  This patch adds a similar refcount approach in 
clean_demultiplex_info() since that was noticed.  However, for
some reason I now get the list_del corruption come back fairly
quickly (after a few minutes) and eventually a softlockup.
I have not tracked down the problem yet.

There's a race between the demultiplexer thread and the request
issuing thread similar to the race described in
commit 696e420bb2a6 ("cifs: Fix use after free of a mid_q_entry")
where both threads may obtain and attempt to call list_del_init
on the same mid and a list_del corruption similar to the
following will result:

[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
[  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
[  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
[  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
[  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
[  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
[  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
[  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
[  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40

To fix the above, inside cifs_reconnect unconditionally set the
state to MID_RETRY_NEEDED, and then take a reference before we
move any mid_q_entry on server->pending_mid_q to the temporary
retry_list.  Then while processing retry_list make sure we check
the state is still MID_RETRY_NEEDED while holding GlobalMid_Lock
before calling list_del_init.  Then after mid_q_entry callback
has been completed, drop the reference.  In the code paths for
request issuing thread, avoid calling list_del_init if we
notice mid->mid_state != MID_RETRY_NEEDED, avoiding the
race and duplicate call to list_del_init.  In addition to
the above MID_RETRY_NEEDED case, handle the MID_SHUTDOWN case
in a similar fashion to avoid the possibility of a similar
crash.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
---
 fs/cifs/connect.c   | 30 ++++++++++++++++++++++--------
 fs/cifs/transport.c |  8 ++++++--
 2 files changed, 28 insertions(+), 10 deletions(-)

Message ID	20191019233505.14191-1-dwysocha@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=C9ZR=YM=vger.kernel.org=linux-cifs-owner@kernel.org> From: Dave Wysochanski <dwysocha@redhat.com> To: Pavel Shilovskiy <pshilov@microsoft.com>, Ronnie Sahlberg <lsahlber@redhat.com> Cc: linux-cifs@vger.kernel.org, Frank Sorenson <sorenson@redhat.com> Subject: [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Date: Sat, 19 Oct 2019 19:35:05 -0400 Message-Id: <20191019233505.14191-1-dwysocha@redhat.com> In-Reply-To: <DM5PR21MB0185857761946DBE12AEB3ADB66C0@DM5PR21MB0185.namprd21.prod.outlook.com> References: <DM5PR21MB0185857761946DBE12AEB3ADB66C0@DM5PR21MB0185.namprd21.prod.outlook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable Sender: linux-cifs-owner@vger.kernel.org Precedence: bulk
Series	[RFC,v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect \| expand [RFC,v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect

[RFC,v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect

Commit Message

Comments

Patch