Message ID | 20191114061646.22122-2-lsahlber@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cifs: fix race between compound_send_recv() and the | expand |
ср, 13 нояб. 2019 г. в 22:17, Ronnie Sahlberg <lsahlber@redhat.com>: > > There is a race where the open() may be interrupted between when we receive the reply > but before we have invoked the callback in which case we never end up calling > handle_cancelled_mid() and thus leak an open handle on the server. > > Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> > --- > fs/cifs/connect.c | 1 - > fs/cifs/transport.c | 2 +- > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > index ccaa8bad336f..802604a7e692 100644 > --- a/fs/cifs/connect.c > +++ b/fs/cifs/connect.c > @@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p) > if (mids[i] != NULL) { > mids[i]->resp_buf_size = server->pdu_size; > if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) && > - mids[i]->mid_state == MID_RESPONSE_RECEIVED && > server->ops->handle_cancelled_mid) > server->ops->handle_cancelled_mid( > mids[i]->resp_buf, > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > index ca3de62688d6..0f219f7653f3 100644 > --- a/fs/cifs/transport.c > +++ b/fs/cifs/transport.c > @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, > midQ[i]->mid, le16_to_cpu(midQ[i]->command)); > send_cancel(server, &rqst[i], midQ[i]); > spin_lock(&GlobalMid_Lock); > - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { > + if (is_interrupt_error(rc)) { > midQ[i]->mid_flags |= MID_WAIT_CANCELLED; > midQ[i]->callback = cifs_cancelled_callback; > cancelled_mid[i] = true; > -- > 2.13.6 > It doesn't seem that RC may be anything other than -ERESTARTSYS but is_interrupt_error() should work. Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> -- Best regards, Pavel Shilovsky
пт, 15 нояб. 2019 г. в 10:07, Pavel Shilovsky <piastryyy@gmail.com>: > > ср, 13 нояб. 2019 г. в 22:17, Ronnie Sahlberg <lsahlber@redhat.com>: > > > > There is a race where the open() may be interrupted between when we receive the reply > > but before we have invoked the callback in which case we never end up calling > > handle_cancelled_mid() and thus leak an open handle on the server. > > > > Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> > > --- > > fs/cifs/connect.c | 1 - > > fs/cifs/transport.c | 2 +- > > 2 files changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > > index ccaa8bad336f..802604a7e692 100644 > > --- a/fs/cifs/connect.c > > +++ b/fs/cifs/connect.c > > @@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p) > > if (mids[i] != NULL) { > > mids[i]->resp_buf_size = server->pdu_size; > > if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) && > > - mids[i]->mid_state == MID_RESPONSE_RECEIVED && > > server->ops->handle_cancelled_mid) > > server->ops->handle_cancelled_mid( > > mids[i]->resp_buf, > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > > index ca3de62688d6..0f219f7653f3 100644 > > --- a/fs/cifs/transport.c > > +++ b/fs/cifs/transport.c > > @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, > > midQ[i]->mid, le16_to_cpu(midQ[i]->command)); > > send_cancel(server, &rqst[i], midQ[i]); > > spin_lock(&GlobalMid_Lock); > > - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { > > + if (is_interrupt_error(rc)) { > > midQ[i]->mid_flags |= MID_WAIT_CANCELLED; > > midQ[i]->callback = cifs_cancelled_callback; > > cancelled_mid[i] = true; > > -- > > 2.13.6 > > > > It doesn't seem that RC may be anything other than -ERESTARTSYS but > is_interrupt_error() should work. > > Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> > > -- > Best regards, > Pavel Shilovsky Tested it out. The following part of this change diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index ca3de62688d6..0f219f7653f3 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, midQ[i]->mid, le16_to_cpu(midQ[i]->command)); send_cancel(server, &rqst[i], midQ[i]); spin_lock(&GlobalMid_Lock); - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { + if (is_interrupt_error(rc)) { midQ[i]->mid_flags |= MID_WAIT_CANCELLED; midQ[i]->callback = cifs_cancelled_callback; cancelled_mid[i] = true; is causing NULL-pointer dereference on my system: [681000.970523] BUG: kernel NULL pointer dereference, address: 000000000000000e [681000.970526] #PF: supervisor read access in kernel mode [681000.970527] #PF: error_code(0x0000) - not-present page [681000.970528] PGD 0 P4D 0 [681000.970531] Oops: 0000 [#1] SMP PTI [681000.970533] CPU: 3 PID: 15946 Comm: cifsd Tainted: G OE 5.3.7-050307-generic #201910180652 [681000.970534] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [681000.970554] RIP: 0010:smb2_get_credits+0x1f/0x30 [cifs] [681000.970556] Code: 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 57 6c 55 48 89 e5 83 fa 04 74 09 31 c0 83 fa 10 74 02 5d c3 48 8b 47 60 5d <0f> b7 40 0e c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 [681000.970558] RSP: 0018:ffffa9d680433df0 EFLAGS: 00010246 [681000.970559] RAX: 0000000000000000 RBX: ffff9a0ef1775000 RCX: 0000000000000000 [681000.970560] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff9a0ef15c0780 [681000.970561] RBP: ffffa9d680433e18 R08: 0000000000000218 R09: 00000000001bfc3a [681000.970562] R10: 00000000000dfe1d R11: ffff9a0ef1b7cae0 R12: ffff9a0ef15c0780 [681000.970563] R13: ffffa9d680433e80 R14: ffffa9d680433ea8 R15: 0000000000000001 [681000.970565] FS: 0000000000000000(0000) GS:ffff9a0ef7b80000(0000) knlGS:0000000000000000 [681000.970566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [681000.970566] CR2: 000000000000000e CR3: 00000000e7e0a001 CR4: 00000000003606e0 [681000.970569] Call Trace: [681000.970585] ? cifs_compound_callback+0x33/0x80 [cifs] [681000.970599] cifs_compound_last_callback+0x12/0x20 [cifs] [681000.970611] cifs_demultiplex_thread+0x6d4/0xc40 [cifs] [681000.970615] kthread+0x104/0x140 [681000.970627] ? cifs_handle_standard+0x190/0x190 [cifs] [681000.970629] ? kthread_park+0x80/0x80 [681000.970631] ret_from_fork+0x35/0x40 Still haven't figured out why this is happening. Looking. -- Best regards, Pavel Shilovsky
temporarily removed this patch from cifs-2.6.git for-next to give time to respin the patch On Wed, Nov 20, 2019 at 5:57 PM Pavel Shilovsky <piastryyy@gmail.com> wrote: > > пт, 15 нояб. 2019 г. в 10:07, Pavel Shilovsky <piastryyy@gmail.com>: > > > > ср, 13 нояб. 2019 г. в 22:17, Ronnie Sahlberg <lsahlber@redhat.com>: > > > > > > There is a race where the open() may be interrupted between when we receive the reply > > > but before we have invoked the callback in which case we never end up calling > > > handle_cancelled_mid() and thus leak an open handle on the server. > > > > > > Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> > > > --- > > > fs/cifs/connect.c | 1 - > > > fs/cifs/transport.c | 2 +- > > > 2 files changed, 1 insertion(+), 2 deletions(-) > > > > > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > > > index ccaa8bad336f..802604a7e692 100644 > > > --- a/fs/cifs/connect.c > > > +++ b/fs/cifs/connect.c > > > @@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p) > > > if (mids[i] != NULL) { > > > mids[i]->resp_buf_size = server->pdu_size; > > > if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) && > > > - mids[i]->mid_state == MID_RESPONSE_RECEIVED && > > > server->ops->handle_cancelled_mid) > > > server->ops->handle_cancelled_mid( > > > mids[i]->resp_buf, > > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > > > index ca3de62688d6..0f219f7653f3 100644 > > > --- a/fs/cifs/transport.c > > > +++ b/fs/cifs/transport.c > > > @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, > > > midQ[i]->mid, le16_to_cpu(midQ[i]->command)); > > > send_cancel(server, &rqst[i], midQ[i]); > > > spin_lock(&GlobalMid_Lock); > > > - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { > > > + if (is_interrupt_error(rc)) { > > > midQ[i]->mid_flags |= MID_WAIT_CANCELLED; > > > midQ[i]->callback = cifs_cancelled_callback; > > > cancelled_mid[i] = true; > > > -- > > > 2.13.6 > > > > > > > It doesn't seem that RC may be anything other than -ERESTARTSYS but > > is_interrupt_error() should work. > > > > Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> > > > > -- > > Best regards, > > Pavel Shilovsky > > Tested it out. The following part of this change > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > index ca3de62688d6..0f219f7653f3 100644 > --- a/fs/cifs/transport.c > +++ b/fs/cifs/transport.c > @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, > struct cifs_ses *ses, > midQ[i]->mid, le16_to_cpu(midQ[i]->command)); > send_cancel(server, &rqst[i], midQ[i]); > spin_lock(&GlobalMid_Lock); > - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { > + if (is_interrupt_error(rc)) { > midQ[i]->mid_flags |= MID_WAIT_CANCELLED; > midQ[i]->callback = cifs_cancelled_callback; > cancelled_mid[i] = true; > > is causing NULL-pointer dereference on my system: > > [681000.970523] BUG: kernel NULL pointer dereference, address: 000000000000000e > [681000.970526] #PF: supervisor read access in kernel mode > [681000.970527] #PF: error_code(0x0000) - not-present page > [681000.970528] PGD 0 P4D 0 > [681000.970531] Oops: 0000 [#1] SMP PTI > [681000.970533] CPU: 3 PID: 15946 Comm: cifsd Tainted: G OE > 5.3.7-050307-generic #201910180652 > [681000.970534] Hardware name: Microsoft Corporation Virtual > Machine/Virtual Machine, BIOS 090008 12/07/2018 > [681000.970554] RIP: 0010:smb2_get_credits+0x1f/0x30 [cifs] > [681000.970556] Code: 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 > 8b 57 6c 55 48 89 e5 83 fa 04 74 09 31 c0 83 fa 10 74 02 5d c3 48 8b > 47 60 5d <0f> b7 40 0e c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 > 00 00 > [681000.970558] RSP: 0018:ffffa9d680433df0 EFLAGS: 00010246 > [681000.970559] RAX: 0000000000000000 RBX: ffff9a0ef1775000 RCX: > 0000000000000000 > [681000.970560] RDX: 0000000000000004 RSI: 0000000000000000 RDI: > ffff9a0ef15c0780 > [681000.970561] RBP: ffffa9d680433e18 R08: 0000000000000218 R09: > 00000000001bfc3a > [681000.970562] R10: 00000000000dfe1d R11: ffff9a0ef1b7cae0 R12: > ffff9a0ef15c0780 > [681000.970563] R13: ffffa9d680433e80 R14: ffffa9d680433ea8 R15: > 0000000000000001 > [681000.970565] FS: 0000000000000000(0000) GS:ffff9a0ef7b80000(0000) > knlGS:0000000000000000 > [681000.970566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [681000.970566] CR2: 000000000000000e CR3: 00000000e7e0a001 CR4: > 00000000003606e0 > [681000.970569] Call Trace: > [681000.970585] ? cifs_compound_callback+0x33/0x80 [cifs] > [681000.970599] cifs_compound_last_callback+0x12/0x20 [cifs] > [681000.970611] cifs_demultiplex_thread+0x6d4/0xc40 [cifs] > [681000.970615] kthread+0x104/0x140 > [681000.970627] ? cifs_handle_standard+0x190/0x190 [cifs] > [681000.970629] ? kthread_park+0x80/0x80 > [681000.970631] ret_from_fork+0x35/0x40 > > Still haven't figured out why this is happening. Looking. > > -- > Best regards, > Pavel Shilovsky
I figured out the problem: there is another race between the demultiplex thread and a system call thread which leads to mid->resp_buf being NULL when it is being accessed to get credits. This is unrelated to this patch but it probably just increases probability of this to happen. Diving deeply into the code, I think this patch doesn't solve the problem completely - there is still a possibility that the demultiplex thread finished processing of the mid by the time the mid is set as cancelled. Also we can't set cancelled callback in the case when a response has been already processed because the cancelled callback won't be called and the mid will be leaked. I created a patch to fix the race another way: by moving handling of cancelled mids into the mid destructor. By that time there should be only one thread referencing the mid, so, a race should occur there. I will post the patch to the list together with other patches. -- Best regards, Pavel Shilovsky ср, 20 нояб. 2019 г. в 16:10, Steve French <smfrench@gmail.com>: > > temporarily removed this patch from cifs-2.6.git for-next to give time > to respin the patch > > On Wed, Nov 20, 2019 at 5:57 PM Pavel Shilovsky <piastryyy@gmail.com> wrote: > > > > пт, 15 нояб. 2019 г. в 10:07, Pavel Shilovsky <piastryyy@gmail.com>: > > > > > > ср, 13 нояб. 2019 г. в 22:17, Ronnie Sahlberg <lsahlber@redhat.com>: > > > > > > > > There is a race where the open() may be interrupted between when we receive the reply > > > > but before we have invoked the callback in which case we never end up calling > > > > handle_cancelled_mid() and thus leak an open handle on the server. > > > > > > > > Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> > > > > --- > > > > fs/cifs/connect.c | 1 - > > > > fs/cifs/transport.c | 2 +- > > > > 2 files changed, 1 insertion(+), 2 deletions(-) > > > > > > > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > > > > index ccaa8bad336f..802604a7e692 100644 > > > > --- a/fs/cifs/connect.c > > > > +++ b/fs/cifs/connect.c > > > > @@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p) > > > > if (mids[i] != NULL) { > > > > mids[i]->resp_buf_size = server->pdu_size; > > > > if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) && > > > > - mids[i]->mid_state == MID_RESPONSE_RECEIVED && > > > > server->ops->handle_cancelled_mid) > > > > server->ops->handle_cancelled_mid( > > > > mids[i]->resp_buf, > > > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > > > > index ca3de62688d6..0f219f7653f3 100644 > > > > --- a/fs/cifs/transport.c > > > > +++ b/fs/cifs/transport.c > > > > @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, > > > > midQ[i]->mid, le16_to_cpu(midQ[i]->command)); > > > > send_cancel(server, &rqst[i], midQ[i]); > > > > spin_lock(&GlobalMid_Lock); > > > > - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { > > > > + if (is_interrupt_error(rc)) { > > > > midQ[i]->mid_flags |= MID_WAIT_CANCELLED; > > > > midQ[i]->callback = cifs_cancelled_callback; > > > > cancelled_mid[i] = true; > > > > -- > > > > 2.13.6 > > > > > > > > > > It doesn't seem that RC may be anything other than -ERESTARTSYS but > > > is_interrupt_error() should work. > > > > > > Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> > > > > > > -- > > > Best regards, > > > Pavel Shilovsky > > > > Tested it out. The following part of this change > > > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > > index ca3de62688d6..0f219f7653f3 100644 > > --- a/fs/cifs/transport.c > > +++ b/fs/cifs/transport.c > > @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, > > struct cifs_ses *ses, > > midQ[i]->mid, le16_to_cpu(midQ[i]->command)); > > send_cancel(server, &rqst[i], midQ[i]); > > spin_lock(&GlobalMid_Lock); > > - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { > > + if (is_interrupt_error(rc)) { > > midQ[i]->mid_flags |= MID_WAIT_CANCELLED; > > midQ[i]->callback = cifs_cancelled_callback; > > cancelled_mid[i] = true; > > > > is causing NULL-pointer dereference on my system: > > > > [681000.970523] BUG: kernel NULL pointer dereference, address: 000000000000000e > > [681000.970526] #PF: supervisor read access in kernel mode > > [681000.970527] #PF: error_code(0x0000) - not-present page > > [681000.970528] PGD 0 P4D 0 > > [681000.970531] Oops: 0000 [#1] SMP PTI > > [681000.970533] CPU: 3 PID: 15946 Comm: cifsd Tainted: G OE > > 5.3.7-050307-generic #201910180652 > > [681000.970534] Hardware name: Microsoft Corporation Virtual > > Machine/Virtual Machine, BIOS 090008 12/07/2018 > > [681000.970554] RIP: 0010:smb2_get_credits+0x1f/0x30 [cifs] > > [681000.970556] Code: 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 > > 8b 57 6c 55 48 89 e5 83 fa 04 74 09 31 c0 83 fa 10 74 02 5d c3 48 8b > > 47 60 5d <0f> b7 40 0e c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 > > 00 00 > > [681000.970558] RSP: 0018:ffffa9d680433df0 EFLAGS: 00010246 > > [681000.970559] RAX: 0000000000000000 RBX: ffff9a0ef1775000 RCX: > > 0000000000000000 > > [681000.970560] RDX: 0000000000000004 RSI: 0000000000000000 RDI: > > ffff9a0ef15c0780 > > [681000.970561] RBP: ffffa9d680433e18 R08: 0000000000000218 R09: > > 00000000001bfc3a > > [681000.970562] R10: 00000000000dfe1d R11: ffff9a0ef1b7cae0 R12: > > ffff9a0ef15c0780 > > [681000.970563] R13: ffffa9d680433e80 R14: ffffa9d680433ea8 R15: > > 0000000000000001 > > [681000.970565] FS: 0000000000000000(0000) GS:ffff9a0ef7b80000(0000) > > knlGS:0000000000000000 > > [681000.970566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [681000.970566] CR2: 000000000000000e CR3: 00000000e7e0a001 CR4: > > 00000000003606e0 > > [681000.970569] Call Trace: > > [681000.970585] ? cifs_compound_callback+0x33/0x80 [cifs] > > [681000.970599] cifs_compound_last_callback+0x12/0x20 [cifs] > > [681000.970611] cifs_demultiplex_thread+0x6d4/0xc40 [cifs] > > [681000.970615] kthread+0x104/0x140 > > [681000.970627] ? cifs_handle_standard+0x190/0x190 [cifs] > > [681000.970629] ? kthread_park+0x80/0x80 > > [681000.970631] ret_from_fork+0x35/0x40 > > > > Still haven't figured out why this is happening. Looking. > > > > -- > > Best regards, > > Pavel Shilovsky > > > > -- > Thanks, > > Steve
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index ccaa8bad336f..802604a7e692 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p) if (mids[i] != NULL) { mids[i]->resp_buf_size = server->pdu_size; if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) && - mids[i]->mid_state == MID_RESPONSE_RECEIVED && server->ops->handle_cancelled_mid) server->ops->handle_cancelled_mid( mids[i]->resp_buf, diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index ca3de62688d6..0f219f7653f3 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -1119,7 +1119,7 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, midQ[i]->mid, le16_to_cpu(midQ[i]->command)); send_cancel(server, &rqst[i], midQ[i]); spin_lock(&GlobalMid_Lock); - if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) { + if (is_interrupt_error(rc)) { midQ[i]->mid_flags |= MID_WAIT_CANCELLED; midQ[i]->callback = cifs_cancelled_callback; cancelled_mid[i] = true;
There is a race where the open() may be interrupted between when we receive the reply but before we have invoked the callback in which case we never end up calling handle_cancelled_mid() and thus leak an open handle on the server. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> --- fs/cifs/connect.c | 1 - fs/cifs/transport.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-)