Message ID | 1446563847-14005-46-git-send-email-agruenba@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, Nov 3, 2015 at 10:17 AM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> When encoding large, variable-length objects such as acls into xdr_bufs,
> it is easier to allocate buffer pages on demand rather than precomputing
> the required buffer size.
>

NACK. We're not doing allocations from inside the XDR encoders. This
can and should be done before calling into the SUNRPC layer.

Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Trond,

On Tue, Nov 3, 2015 at 5:25 PM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Tue, Nov 3, 2015 at 10:17 AM, Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
>> When encoding large, variable-length objects such as acls into xdr_bufs,
>> it is easier to allocate buffer pages on demand rather than precomputing
>> the required buffer size.
>
> NACK. We're not doing allocations from inside the XDR encoders. This
> can and should be done before calling into the SUNRPC layer.

an XDR-encoded ACL can be up to 64k (16 pages) in size. In practice,
large ACLs like that will almost never occur and almost all ACLs will
fit into a single page though.

The XDR-encoded ACL contains strings for the user and group names
which need to be looked up when the idmapper is used. Those lookups
are somewhat expensive; in addition, the lookup results can change
over time. When precomputing the size, allocating space, and then
encoding the ACL, we could run out of space when encoding.

So we could always allocate the maximum 16 pages, encode the acl, and
free the unused pages. This would be rather wasteful though.

Given how simple it is to allocate pages as we go, this seems the
better choice here. This doesn't break any existing code either; NULL
page pointers would have oopsed in xdr_get_next_encode_buffer before.

From the memory management point of view, there is no difference in
preallocating GFP_NOFS pages and allocating them on demand; the pages
are allocated in the same task and locking context in both cases.

So could you please explain why you object to this change?

Thanks,
Andreas
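For readers following the sizing argument, here is a minimal userspace sketch of the "allocate the maximum 16 pages, encode, free the unused pages" alternative described above. It is not kernel code: the 4 KiB page size is an assumption, malloc()/free() stand in for the page allocator, and all `sketch_*` names are hypothetical and do not exist in the SUNRPC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define SKETCH_MAX_ACL   65536u  /* the 64k upper bound quoted in the thread */
#define SKETCH_MAX_PAGES (SKETCH_MAX_ACL / SKETCH_PAGE_SIZE) /* 16 pages */

/* Number of pages needed to hold len bytes, rounding up. */
static size_t sketch_pages_for(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Preallocate the worst case before encoding.
 * Returns 0 on success, -1 on failure (pages obtained so far are freed).
 */
static int sketch_prealloc(void *pages[SKETCH_MAX_PAGES])
{
	for (size_t i = 0; i < SKETCH_MAX_PAGES; i++) {
		pages[i] = malloc(SKETCH_PAGE_SIZE);
		if (!pages[i]) {
			while (i--) {
				free(pages[i]);
				pages[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}

/*
 * After encoding encoded_len bytes, give back the pages that were never
 * touched. Returns how many pages were freed.
 */
static size_t sketch_trim(void *pages[SKETCH_MAX_PAGES], size_t encoded_len)
{
	size_t used = sketch_pages_for(encoded_len);
	size_t freed = 0;

	assert(used <= SKETCH_MAX_PAGES);
	for (size_t i = used; i < SKETCH_MAX_PAGES; i++) {
		if (pages[i]) {
			free(pages[i]);
			pages[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```

With these assumptions, an ACL that encodes to a few hundred bytes uses one page and hands the other fifteen straight back, which is the waste the message refers to.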
On Thu, Nov 5, 2015 at 6:07 AM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> Trond,
>
> On Tue, Nov 3, 2015 at 5:25 PM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>> On Tue, Nov 3, 2015 at 10:17 AM, Andreas Gruenbacher
>> <agruenba@redhat.com> wrote:
>>> When encoding large, variable-length objects such as acls into xdr_bufs,
>>> it is easier to allocate buffer pages on demand rather than precomputing
>>> the required buffer size.
>>
>> NACK. We're not doing allocations from inside the XDR encoders. This
>> can and should be done before calling into the SUNRPC layer.
>
> an XDR-encoded ACL can be up to 64k (16 pages) in size. In practice,
> large ACLs like that will almost never occur and almost all ACLs will
> fit into a single page though.
>
> The XDR-encoded ACL contains strings for the user and group names
> which need to be looked up when the idmapper is used. Those lookups
> are somewhat expensive; in addition, the lookup results can change
> over time. When precomputing the size, allocating space, and then
> encoding the ACL, we could run out of space when encoding.
>
> So we could always allocate the maximum 16 pages, encode the acl, and
> free the unused pages. This would be rather wasteful though.
>
> Given how simple it is to allocate pages as we go, this seems the
> better choice here. This doesn't break any existing code either; NULL
> page pointers would have oopsed in xdr_get_next_encode_buffer before.
>
> From the memory management point of view, there is no difference in
> preallocating GFP_NOFS pages and allocating them on demand; the pages
> are allocated in the same task and locking context in both cases.
>
> So could you please explain why you object to this change?
>

Allocating memory deep in the bowels of the RPC code with the
expectation that it will be freed by the caller of the RPC request is
a layering violation of the ugliest sort. How is anyone who is
unfamiliar with the code going to be able to understand what is going
on without tracing through 1000 lines of code to spot where the
allocation is happening?

Aside from that, we do not want any non-critical blocking while
holding the RPC socket lock. Your allocation request will block all
further traffic to the server until it is satisfied. That includes
blocking page writeback, which might actually free up memory to
satisfy the allocation.

As I said above, there is no reason whatsoever to have to do all this
inside encode_setacl(). The entire ACL encoding into pages can be done
before even calling into the RPC layer, just like we do today.
On Thu, Nov 5, 2015 at 4:57 PM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Thu, Nov 5, 2015 at 6:07 AM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
>> Trond,
>>
>> On Tue, Nov 3, 2015 at 5:25 PM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>>> On Tue, Nov 3, 2015 at 10:17 AM, Andreas Gruenbacher
>>> <agruenba@redhat.com> wrote:
>>>> When encoding large, variable-length objects such as acls into xdr_bufs,
>>>> it is easier to allocate buffer pages on demand rather than precomputing
>>>> the required buffer size.
>>>
>>> NACK. We're not doing allocations from inside the XDR encoders. This
>>> can and should be done before calling into the SUNRPC layer.
>>
>> an XDR-encoded ACL can be up to 64k (16 pages) in size. In practice,
>> large ACLs like that will almost never occur and almost all ACLs will
>> fit into a single page though.
>>
>> The XDR-encoded ACL contains strings for the user and group names
>> which need to be looked up when the idmapper is used. Those lookups
>> are somewhat expensive; in addition, the lookup results can change
>> over time. When precomputing the size, allocating space, and then
>> encoding the ACL, we could run out of space when encoding.
>>
>> So we could always allocate the maximum 16 pages, encode the acl, and
>> free the unused pages. This would be rather wasteful though.
>>
>> Given how simple it is to allocate pages as we go, this seems the
>> better choice here. This doesn't break any existing code either; NULL
>> page pointers would have oopsed in xdr_get_next_encode_buffer before.
>>
>> From the memory management point of view, there is no difference in
>> preallocating GFP_NOFS pages and allocating them on demand; the pages
>> are allocated in the same task and locking context in both cases.
>>
>> So could you please explain why you object to this change?
>
> Allocating memory deep in the bowels of the RPC code with the
> expectation that it will be freed by the caller of the RPC request is
> a layering violation of the ugliest sort.

Ah, there we have it, Godwin's Law for software discussions.

What happens here is the following: the caller sets up an xdr_stream
that contains an array of NULL page pointers (__nfs4_proc_set_acl ->
nfs4_encode_acl -> xdr_init_encode_pages). It does so on purpose to
tell the XDR layer to allocate pages for it as needed, and it knows it
is responsible for later freeing those pages. Without this patch, the
XDR layer would immediately Oops when hitting a NULL page pointer.
This tells us that the XDR layer is so far not prepared to handle NULL
page pointers, and that we can assign a meaning to NULL page pointers
without affecting existing callers.

Using the existing XDR layer for XDR-encoding ACLs makes sense because
the XDR layer already knows how to align things, encode into multiple
pages, handle strings that wrap across pages, etc. I really don't want
to duplicate all that for ACLs.

> How is anyone who is
> unfamiliar with the code going to be able to understand what is going
> on without tracing through 1000 lines of code to spot where the
> allocation is happening?

Somebody unfamiliar with the existing code will not understand it
without putting some effort in, either. What do you expect? Besides,
the caller doesn't care where the allocation is happening, it just
cares that it is happening when it should.

> Aside from that, we do not want any non-critical blocking while
> holding the RPC socket lock. Your allocation request will block all
> further traffic to the server until it is satisfied. That includes
> blocking page writeback, which might actually free up memory to
> satisfy the allocation.
>
> As I said above, there is no reason whatsoever to have to do all this
> inside encode_setacl(). The entire ACL encoding into pages can be done
> before even calling into the RPC layer, just like we do today.

You have pointed that out before, and as a consequence, it was fixed
in the July 22 snapshot (https://lwn.net/Articles/652058/):

  * Changes to the nfs patches: acls are now encoded above the sunrpc
    layer. This means we can no longer encode small acls directly into
    the scratch area of an xdr_buf, we always have to allocate extra
    memory. But we also don't need to touch the nfs sunrpc code, which
    Trond objected to.

Since then, the ACL encoding happens in __nfs4_proc_set_acl, before
calling into the RPC layer.

Thanks,
Andreas
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 4439ac4..63c1c36 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -537,6 +537,15 @@ static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr,
 	 */
 	xdr->scratch.iov_base = xdr->p;
 	xdr->scratch.iov_len = frag1bytes;
+
+	if (!*xdr->page_ptr) {
+		struct page *page = alloc_page(GFP_NOFS);
+
+		if (!page)
+			return NULL;
+		*xdr->page_ptr = page;
+	}
+
 	p = page_address(*xdr->page_ptr);
 	/*
 	 * Note this is where the next encode will start after we've
When encoding large, variable-length objects such as acls into xdr_bufs,
it is easier to allocate buffer pages on demand rather than precomputing
the required buffer size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 net/sunrpc/xdr.c | 9 +++++++++
 1 file changed, 9 insertions(+)
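As an illustration of the convention this patch proposes, the following userspace sketch models a caller handing the encoder an array of NULL page slots, the encoder filling a slot only when encoding reaches it, and the caller freeing whatever was allocated. It is a simplified model under stated assumptions and not the kernel implementation: `struct model_stream` and the `model_*` functions are hypothetical, calloc() stands in for alloc_page(GFP_NOFS), the page size is assumed to be 4 KiB, and unlike the real xdr_get_next_encode_buffer() it never splits an item across two pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical stand-in for an xdr_stream walking a page array. */
struct model_stream {
	void **page_ptr;  /* current slot in the caller-owned page array */
	void **page_end;  /* one past the last slot */
	size_t offset;    /* bytes already used in the current page */
};

static void model_init(struct model_stream *s, void **pages, size_t npages)
{
	assert(pages != NULL);
	s->page_ptr = pages;
	s->page_end = pages + npages;
	s->offset = 0;
}

/*
 * Reserve nbytes of contiguous space. A NULL slot means "allocate on
 * demand", mirroring the check the patch adds. Returns NULL if the item
 * does not fit in a page, the array is exhausted, or allocation fails.
 */
static void *model_reserve(struct model_stream *s, size_t nbytes)
{
	void *p;

	if (nbytes > MODEL_PAGE_SIZE || s->page_ptr == s->page_end)
		return NULL;

	/* Simplification: move to the next page instead of splitting. */
	if (s->offset + nbytes > MODEL_PAGE_SIZE) {
		if (s->page_ptr + 1 == s->page_end)
			return NULL;
		s->page_ptr++;
		s->offset = 0;
	}

	if (!*s->page_ptr) {
		void *page = calloc(1, MODEL_PAGE_SIZE);

		if (!page)
			return NULL;
		*s->page_ptr = page;
	}

	p = (char *)*s->page_ptr + s->offset;
	s->offset += nbytes;
	return p;
}

/* Caller-side cleanup: free every slot the encoder filled in. */
static size_t model_free_pages(void **pages, size_t npages)
{
	size_t freed = 0;

	for (size_t i = 0; i < npages; i++) {
		if (pages[i]) {
			free(pages[i]);
			pages[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```

In this model, ownership stays with the caller throughout: the encoder only fills slots the caller supplied, which is the point of contention in the thread (who allocates, and in which context), not who frees.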