Message ID | CAN-5tyHD9NspwAzL2YwoQGwcUk52ak0OWZ-dBfwpW6nynAf1Og@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: > Hi folks, > > Looking for suggestions on how to fix a kernel oops. > > It's possible that there is a ctrl-c when the COMMIT is send. In case > of the COPY, it calls > nfs_commit_file() which calls wait_on_commit() that is interrupted by > the crtl-c and frees the nfs_page request. So when asynchronous > COMMIT > rpc comes back it tried to use the nfs_page request and gets the > oops. > Is that call to nfs_free_request() in nfs_commit_file() correct? It looks to me as if the same request will be freed in nfs_commit_release_pages(). Anna? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com
On Thu, Apr 13, 2017 at 2:51 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: >> Hi folks, >> >> Looking for suggestions on how to fix a kernel oops. >> >> It's possible that there is a ctrl-c when the COMMIT is send. In case >> of the COPY, it calls >> nfs_commit_file() which calls wait_on_commit() that is interrupted by >> the crtl-c and frees the nfs_page request. So when asynchronous >> COMMIT >> rpc comes back it tried to use the nfs_page request and gets the >> oops. >> > > Is that call to nfs_free_request() in nfs_commit_file() correct? yes, nfs_commit_file() creates a new request via nfs_create_request() and in the end if calls nfs_free_request(); > It looks to me as if the same request will be freed in > nfs_commit_release_pages(). so nfs_commit_release_pages() thru the nfs_unlock_and_release_request() is going to call nfs_release_request() from req->wb_kref.. I'm not sure if this is setup(?) for the copy commit path? Otherwise, it would have seem that we'd be doing a double free and I haven't seen that in testing (not that it can't be true)... > > Anna? > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/13/2017 03:07 PM, Olga Kornievskaia wrote: > On Thu, Apr 13, 2017 at 2:51 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: >>> Hi folks, >>> >>> Looking for suggestions on how to fix a kernel oops. >>> >>> It's possible that there is a ctrl-c when the COMMIT is send. In case >>> of the COPY, it calls >>> nfs_commit_file() which calls wait_on_commit() that is interrupted by >>> the crtl-c and frees the nfs_page request. So when asynchronous >>> COMMIT >>> rpc comes back it tried to use the nfs_page request and gets the >>> oops. >>> >> >> Is that call to nfs_free_request() in nfs_commit_file() correct? > > yes, nfs_commit_file() creates a new request via nfs_create_request() > and in the end if calls nfs_free_request(); > >> It looks to me as if the same request will be freed in >> nfs_commit_release_pages(). > > so nfs_commit_release_pages() thru the > nfs_unlock_and_release_request() is going to call > nfs_release_request() from req->wb_kref.. I'm not sure if this is > setup(?) for the copy commit path? > > Otherwise, it would have seem that we'd be doing a double free and I > haven't seen that in testing (not that it can't be true)... I haven't seen any double-free messages during my testing either, so I thought it was okay. It's possible I'm wrong, though. I wonder if this is something that memory poisoning can help figure out? Anna > > > >> >> Anna? >> >> -- >> Trond Myklebust >> Linux NFS client maintainer, PrimaryData >> trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/13/2017 03:16 PM, Anna Schumaker wrote: > > > On 04/13/2017 03:07 PM, Olga Kornievskaia wrote: >> On Thu, Apr 13, 2017 at 2:51 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: >>>> Hi folks, >>>> >>>> Looking for suggestions on how to fix a kernel oops. >>>> >>>> It's possible that there is a ctrl-c when the COMMIT is send. In case >>>> of the COPY, it calls >>>> nfs_commit_file() which calls wait_on_commit() that is interrupted by >>>> the crtl-c and frees the nfs_page request. So when asynchronous >>>> COMMIT >>>> rpc comes back it tried to use the nfs_page request and gets the >>>> oops. >>>> >>> >>> Is that call to nfs_free_request() in nfs_commit_file() correct? >> >> yes, nfs_commit_file() creates a new request via nfs_create_request() >> and in the end if calls nfs_free_request(); >> >>> It looks to me as if the same request will be freed in >>> nfs_commit_release_pages(). >> >> so nfs_commit_release_pages() thru the >> nfs_unlock_and_release_request() is going to call >> nfs_release_request() from req->wb_kref.. I'm not sure if this is >> setup(?) for the copy commit path? >> >> Otherwise, it would have seem that we'd be doing a double free and I >> haven't seen that in testing (not that it can't be true)... > > I haven't seen any double-free messages during my testing either, so I thought it was okay. It's possible I'm wrong, though. I wonder if this is something that memory poisoning can help figure out? After some experimenting, I can still use the nfs_page after nfs_commit_inode() has returned withotu any memory issues. > > Anna > >> >> >> >>> >>> Anna? >>> >>> -- >>> Trond Myklebust >>> Linux NFS client maintainer, PrimaryData >>> trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 13, 2017 at 4:13 PM, Anna Schumaker <Anna.Schumaker@netapp.com> wrote: > > > On 04/13/2017 03:16 PM, Anna Schumaker wrote: >> >> >> On 04/13/2017 03:07 PM, Olga Kornievskaia wrote: >>> On Thu, Apr 13, 2017 at 2:51 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>>> On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: >>>>> Hi folks, >>>>> >>>>> Looking for suggestions on how to fix a kernel oops. >>>>> >>>>> It's possible that there is a ctrl-c when the COMMIT is send. In case >>>>> of the COPY, it calls >>>>> nfs_commit_file() which calls wait_on_commit() that is interrupted by >>>>> the crtl-c and frees the nfs_page request. So when asynchronous >>>>> COMMIT >>>>> rpc comes back it tried to use the nfs_page request and gets the >>>>> oops. >>>>> >>>> >>>> Is that call to nfs_free_request() in nfs_commit_file() correct? >>> >>> yes, nfs_commit_file() creates a new request via nfs_create_request() >>> and in the end if calls nfs_free_request(); >>> >>>> It looks to me as if the same request will be freed in >>>> nfs_commit_release_pages(). >>> >>> so nfs_commit_release_pages() thru the >>> nfs_unlock_and_release_request() is going to call >>> nfs_release_request() from req->wb_kref.. I'm not sure if this is >>> setup(?) for the copy commit path? >>> >>> Otherwise, it would have seem that we'd be doing a double free and I >>> haven't seen that in testing (not that it can't be true)... >> >> I haven't seen any double-free messages during my testing either, so I thought it was okay. It's possible I'm wrong, though. I wonder if this is something that memory poisoning can help figure out? > > After some experimenting, I can still use the nfs_page after nfs_commit_inode() has returned withotu any memory issues. I think perhaps nfs_commit_file() needs to call nfs_release_request() instead of directly calling nfs_free_request(). nfs_release_request does a put on wb_kref and once it's 0 it'll call release. So I think with that change my ctrl-c no longer produces the oops either. I'll test a bit more and I'll send another patch. > >> >> Anna >> >>> >>> >>> >>>> >>>> Anna? >>>> >>>> -- >>>> Trond Myklebust >>>> Linux NFS client maintainer, PrimaryData >>>> trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2017-04-13 at 16:13 -0400, Anna Schumaker wrote: > > On 04/13/2017 03:16 PM, Anna Schumaker wrote: > > > > > > On 04/13/2017 03:07 PM, Olga Kornievskaia wrote: > > > On Thu, Apr 13, 2017 at 2:51 PM, Trond Myklebust > > > <trondmy@primarydata.com> wrote: > > > > On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: > > > > > Hi folks, > > > > > > > > > > Looking for suggestions on how to fix a kernel oops. > > > > > > > > > > It's possible that there is a ctrl-c when the COMMIT is send. > > > > > In case > > > > > of the COPY, it calls > > > > > nfs_commit_file() which calls wait_on_commit() that is > > > > > interrupted by > > > > > the crtl-c and frees the nfs_page request. So when > > > > > asynchronous > > > > > COMMIT > > > > > rpc comes back it tried to use the nfs_page request and gets > > > > > the > > > > > oops. > > > > > > > > > > > > > Is that call to nfs_free_request() in nfs_commit_file() > > > > correct? > > > > > > yes, nfs_commit_file() creates a new request via > > > nfs_create_request() > > > and in the end if calls nfs_free_request(); > > > > > > > It looks to me as if the same request will be freed in > > > > nfs_commit_release_pages(). > > > > > > so nfs_commit_release_pages() thru the > > > nfs_unlock_and_release_request() is going to call > > > nfs_release_request() from req->wb_kref.. I'm not sure if this is > > > setup(?) for the copy commit path? > > > > > > Otherwise, it would have seem that we'd be doing a double free > > > and I > > > haven't seen that in testing (not that it can't be true)... > > > > I haven't seen any double-free messages during my testing either, > > so I thought it was okay. It's possible I'm wrong, though. I > > wonder if this is something that memory poisoning can help figure > > out? > > After some experimenting, I can still use the nfs_page after > nfs_commit_inode() has returned withotu any memory issues. > Wait... OK, so nfs_scan_commit_list() does indeed take a reference to the req- >wb_kref, so that balances the call to nfs_release_request() in nfs_commit_release_pages(), however it also means that you should not be calling nfs_free_request(), since doing so circumvents the refcounting mechanism. Secondly, if you want to release the request and you are not sure whether or not it got cleared off the inode's cinfo commit list yet, you may also need to lock that request and call nfs_clear_request_commit(). -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com
On Thu, Apr 13, 2017 at 4:50 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Thu, 2017-04-13 at 16:13 -0400, Anna Schumaker wrote: >> >> On 04/13/2017 03:16 PM, Anna Schumaker wrote: >> > >> > >> > On 04/13/2017 03:07 PM, Olga Kornievskaia wrote: >> > > On Thu, Apr 13, 2017 at 2:51 PM, Trond Myklebust >> > > <trondmy@primarydata.com> wrote: >> > > > On Thu, 2017-04-13 at 14:00 -0400, Olga Kornievskaia wrote: >> > > > > Hi folks, >> > > > > >> > > > > Looking for suggestions on how to fix a kernel oops. >> > > > > >> > > > > It's possible that there is a ctrl-c when the COMMIT is send. >> > > > > In case >> > > > > of the COPY, it calls >> > > > > nfs_commit_file() which calls wait_on_commit() that is >> > > > > interrupted by >> > > > > the crtl-c and frees the nfs_page request. So when >> > > > > asynchronous >> > > > > COMMIT >> > > > > rpc comes back it tried to use the nfs_page request and gets >> > > > > the >> > > > > oops. >> > > > > >> > > > >> > > > Is that call to nfs_free_request() in nfs_commit_file() >> > > > correct? >> > > >> > > yes, nfs_commit_file() creates a new request via >> > > nfs_create_request() >> > > and in the end if calls nfs_free_request(); >> > > >> > > > It looks to me as if the same request will be freed in >> > > > nfs_commit_release_pages(). >> > > >> > > so nfs_commit_release_pages() thru the >> > > nfs_unlock_and_release_request() is going to call >> > > nfs_release_request() from req->wb_kref.. I'm not sure if this is >> > > setup(?) for the copy commit path? >> > > >> > > Otherwise, it would have seem that we'd be doing a double free >> > > and I >> > > haven't seen that in testing (not that it can't be true)... >> > >> > I haven't seen any double-free messages during my testing either, >> > so I thought it was okay. It's possible I'm wrong, though. I >> > wonder if this is something that memory poisoning can help figure >> > out? >> >> After some experimenting, I can still use the nfs_page after >> nfs_commit_inode() has returned withotu any memory issues. >> > > Wait... > > OK, so nfs_scan_commit_list() does indeed take a reference to the req- >>wb_kref, so that balances the call to nfs_release_request() in > nfs_commit_release_pages(), however it also means that you should not > be calling nfs_free_request(), since doing so circumvents the > refcounting mechanism. > Right now when req is created in nfs_commit_file() it starts out with wb_kref=1. After calling, nfs_scan_commit_list() wb_kref=2 Now if in normal operations nfs_commit_release_pages will drop that to 1 And then calling nfs_release_commit in nfs_commit_file() will finally drop to 0 and release. If ctrl-c happened, then nfs_commit_file will drop the ref first but will not free it and that will allow the commit to proceed and finish the rpc? > Secondly, if you want to release the request and you are not sure > whether or not it got cleared off the inode's cinfo commit list yet, > you may also need to lock that request and call > nfs_clear_request_commit(). Looking at what the function does, I don't see why this is needed. "wb_page" is NULL for this type of commit and there is no pnfs in this case. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
T24gRnJpLCAyMDE3LTA0LTE0IGF0IDExOjU2IC0wNDAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90 ZToNCj4gT24gVGh1LCBBcHIgMTMsIDIwMTcgYXQgNDo1MCBQTSwgVHJvbmQgTXlrbGVidXN0DQo+ IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gDQo+ID4gU2Vjb25kbHksIGlm IHlvdSB3YW50IHRvIHJlbGVhc2UgdGhlIHJlcXVlc3QgYW5kIHlvdSBhcmUgbm90IHN1cmUNCj4g PiB3aGV0aGVyIG9yIG5vdCBpdCBnb3QgY2xlYXJlZCBvZmYgdGhlIGlub2RlJ3MgY2luZm8gY29t bWl0IGxpc3QNCj4gPiB5ZXQsDQo+ID4geW91IG1heSBhbHNvIG5lZWQgdG8gbG9jayB0aGF0IHJl cXVlc3QgYW5kIGNhbGwNCj4gPiBuZnNfY2xlYXJfcmVxdWVzdF9jb21taXQoKS4NCj4gDQo+IExv b2tpbmcgYXQgd2hhdCB0aGUgZnVuY3Rpb24gZG9lcywgSSBkb24ndCBzZWUgd2h5IHRoaXMgaXMg bmVlZGVkLg0KPiAid2JfcGFnZSIgaXMgTlVMTCBmb3IgdGhpcyB0eXBlIG9mIGNvbW1pdCBhbmQg dGhlcmUgaXMgbm8gcG5mcyBpbg0KPiB0aGlzDQo+IGNhc2UuDQoNClNvbWV0aGluZyBuZWVkcyB0 byBlbnN1cmUgdGhhdCB0aGUgcmVxdWVzdCBpcyBub3Qgc2l0dGluZyBvbiBhIGNvbW1pdA0KbGlz dC4gVGhhdCBjYW4gaGFwcGVuIGlmIHRoZSBjb21taXQgc3VjY2VlZGVkLCBidXQgcmV0dXJuZWQg YSBkaWZmZXJlbnQNCnZlcmlmaWVyLCBvciBpdCBjYW4gaGFwcGVuIGlmIG5mc19zY2FuX2NvbW1p dCgpIGV4aXRzIGVhcmx5Lg0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVu dCBtYWludGFpbmVyLCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNv bQ0K -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 14, 2017 at 3:16 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Fri, 2017-04-14 at 11:56 -0400, Olga Kornievskaia wrote: >> On Thu, Apr 13, 2017 at 4:50 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >> > >> > Secondly, if you want to release the request and you are not sure >> > whether or not it got cleared off the inode's cinfo commit list >> > yet, >> > you may also need to lock that request and call >> > nfs_clear_request_commit(). >> >> Looking at what the function does, I don't see why this is needed. >> "wb_page" is NULL for this type of commit and there is no pnfs in >> this >> case. > > Something needs to ensure that the request is not sitting on a commit > list. That can happen if the commit succeeded, but returned a different > verifier, or it can happen if nfs_scan_commit() exits early. First point I'd like you to consider is that your last statement as a separate problem that needs to be solved with a different patch. Secondly, I don't understand how to determine "if nfs_commit_file() isn't sure whether or not request got cleared off the inode's cinfo commit list". There are two cases you pointed to 1) verifier mismatch and 2) nfs_scan_commit didn't even get to the request (caz INT_MAX commits were queued before that)? For the verifier mismatch: "NFS_CONTEXT_RESEND_WRITES" is set. So nfs_commit_file() can check for that. But I guess I'm not sure what is suppose to be done? Resend the commit or resend the copy? I don't really understand how to handle #2. nfs_commit_inode() return from doing INT_MAX commits but not doing the commit that it was suppose to do nfs_commit_file() asked and I'm at a loss how to determine that. Shouldn't there have been a loop somewhere that handles *all* commit requests and not just INT_MAX requests. Is this problem not a problem from the nfs_commit_file() but rather a generic problem? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 14, 2017 at 5:17 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Fri, Apr 14, 2017 at 3:16 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> On Fri, 2017-04-14 at 11:56 -0400, Olga Kornievskaia wrote: >>> On Thu, Apr 13, 2017 at 4:50 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>> > >>> > Secondly, if you want to release the request and you are not sure >>> > whether or not it got cleared off the inode's cinfo commit list >>> > yet, >>> > you may also need to lock that request and call >>> > nfs_clear_request_commit(). >>> >>> Looking at what the function does, I don't see why this is needed. >>> "wb_page" is NULL for this type of commit and there is no pnfs in >>> this >>> case. >> >> Something needs to ensure that the request is not sitting on a commit >> list. That can happen if the commit succeeded, but returned a different >> verifier, or it can happen if nfs_scan_commit() exits early. > > First point I'd like you to consider is that your last statement as a > separate problem that needs to be solved with a different patch. > > Secondly, I don't understand how to determine "if nfs_commit_file() > isn't sure whether or not request got cleared off the inode's cinfo > commit list". There are two cases you pointed to 1) verifier mismatch > and 2) nfs_scan_commit didn't even get to the request (caz INT_MAX > commits were queued before that)? > > For the verifier mismatch: "NFS_CONTEXT_RESEND_WRITES" is set. So > nfs_commit_file() can check for that. But I guess I'm not sure what is > suppose to be done? Resend the commit or resend the copy? > > I don't really understand how to handle #2. nfs_commit_inode() return > from doing INT_MAX commits but not doing the commit that it was > suppose to do nfs_commit_file() asked and I'm at a loss how to > determine that. Shouldn't there have been a loop somewhere that > handles *all* commit requests and not just INT_MAX requests. Is this > problem not a problem from the nfs_commit_file() but rather a generic > problem? Taking a different approach to fixing it by getting rid of nfs_commit_file() and for the synchronous COPY appending COMMIT to the compound. For the asynchronous COPY, I propose to just add a function nfs4_proc_copy() that uses existing callback functions and just sends the commit without messing with the commit path. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/write.c b/fs/nfs/write.c index abb2c8a..aefff49 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1557,7 +1557,7 @@ static void nfs_writeback_result(struct rpc_task *task, static int wait_on_commit(struct nfs_mds_commit_info *cinfo) { return wait_on_atomic_t(&cinfo->rpcs_out, - nfs_wait_atomic_killable, TASK_KILLABLE); + nfs_wait_atomic_killable, TASK_UNINTERRUPTIBLE); } static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)