Message ID | 4DDBCBBA.5040700@panasas.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2011-05-24 18:34, Myklebust, Trond wrote: > On Tue, 2011-05-24 at 18:16 +0300, Benny Halevy wrote: >> On 2011-05-23 21:50, Boaz Harrosh wrote: >> > On 05/23/2011 07:33 PM, Benny Halevy wrote: >> > Benny Hi >> > >> > I have a problem that the default wsize is very small 64K and >> > I get small IOs. I found that the governing member right now >> > is NFS_SERVER()->wsize >> > >> > I did the below hack on My current code, but you took that away >> > from me. >> > >> > diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c >> > index ec40408..f7b09e1 100644 >> > --- a/fs/nfs/objlayout/objlayout.c >> > +++ b/fs/nfs/objlayout/objlayout.c >> > + server->wsize = ((PAGE_SIZE - sizeof(struct bio)) / > sizeof(struct bio_vec)) >> > + * PAGE_SIZE * 2; >> > >> > - dprintk("%s: Return data=%p\n", __func__, data); >> > + dprintk("%s: Return data=%p wsize=0x%x\n", __func__, data, > server->wsize); >> > return 0; >> > } >> > >> > What do you want that we do to replace this. The default 64K is to > small. >> > I don't mind that for pnfs it will be ~0 and the pg_test() will test >> > for maxc_size as well. But then we'll also need the current size or the >> > start_index >> > >> > Boaz >> >> How about this approach? >> >> git diff --stat -p -M >> fs/nfs/pagelist.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c >> index c80add6..3f5508b 100644 >> --- a/fs/nfs/pagelist.c >> +++ b/fs/nfs/pagelist.c >> @@ -293,7 +293,7 @@ static int nfs_pageio_do_add_request(struct >> nfs_pageio_descriptor *desc, >> if (desc->pg_bsize < PAGE_SIZE) >> return 0; >> newlen += desc->pg_count; >> - if (newlen > desc->pg_bsize) >> + if (newlen > desc->pg_bsize && !desc->pg_test) >> return 0; >> prev = nfs_list_entry(desc->pg_list.prev); >> if (!nfs_can_coalesce_requests(prev, req, desc)) > > Alternatively, clean the above up by putting the newlen > desc->pg_bsize > test into a default nfs_generic_test_coalesce() and require ordinary NFS > reads and writes to set that as their desc->pg_test(). Good idea! I'll send a RFC patch including the generic pnfs pg_test for the layout drivers. Fred - I hope you haven't started working on pg_test, have you? Please let me know. Benny > > Cheers > Trond > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/24/2011 06:34 PM, Myklebust, Trond wrote: > On Tue, 2011-05-24 at 18:16 +0300, Benny Halevy wrote: >> On 2011-05-23 21:50, Boaz Harrosh wrote: >> > On 05/23/2011 07:33 PM, Benny Halevy wrote: >> > Benny Hi >> > >> > I have a problem that the default wsize is very small 64K and >> > I get small IOs. I found that the governing member right now >> > is NFS_SERVER()->wsize >> > >> > I did the below hack on My current code, but you took that away >> > from me. >> > >> > diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c >> > index ec40408..f7b09e1 100644 >> > --- a/fs/nfs/objlayout/objlayout.c >> > +++ b/fs/nfs/objlayout/objlayout.c >> > + server->wsize = ((PAGE_SIZE - sizeof(struct bio)) / sizeof(struct bio_vec)) >> > + * PAGE_SIZE * 2; >> > >> > - dprintk("%s: Return data=%p\n", __func__, data); >> > + dprintk("%s: Return data=%p wsize=0x%x\n", __func__, data, server->wsize); >> > return 0; >> > } >> > >> > What do you want that we do to replace this. The default 64K is to small. >> > I don't mind that for pnfs it will be ~0 and the pg_test() will test >> > for maxc_size as well. But then we'll also need the current size or the >> > start_index >> > >> > Boaz >> >> How about this approach? >> >> git diff --stat -p -M >> fs/nfs/pagelist.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c >> index c80add6..3f5508b 100644 >> --- a/fs/nfs/pagelist.c >> +++ b/fs/nfs/pagelist.c >> @@ -293,7 +293,7 @@ static int nfs_pageio_do_add_request(struct >> nfs_pageio_descriptor *desc, >> if (desc->pg_bsize < PAGE_SIZE) >> return 0; >> newlen += desc->pg_count; >> - if (newlen > desc->pg_bsize) >> + if (newlen > desc->pg_bsize && !desc->pg_test) >> return 0; >> prev = nfs_list_entry(desc->pg_list.prev); >> if (!nfs_can_coalesce_requests(prev, req, desc)) > > Alternatively, clean the above up by putting the newlen > desc->pg_bsize > test into a default nfs_generic_test_coalesce() and require ordinary NFS > reads and writes to set that as their desc->pg_test(). > > Cheers > Trond This all approach sounds very good to me. But please I have some questions? In the nfs_pageio_descriptor passed to pg_test() together with the two pages: Which member means the current byte_size (or page_count?) and what is the meaning of some of these fields struct nfs_pageio_descriptor { .... unsigned long pg_bytes_written; Is this for result after read/write? size_t pg_count; Is this the number of pages added up to now? Do we also have the start of the first page? size_t pg_bsize; So I understand this is the max allowed pages. Does that mean also the allocated size or Just the negotiated size with the server? (Really bad name if you ask me) unsigned int pg_base; Is that the index of the first page? That cannot be, the page->index needs to be 64bit. So what is this then? ... }; Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2011-05-24 at 18:56 +0300, Boaz Harrosh wrote: > In the nfs_pageio_descriptor passed to pg_test() together with the two pages: > Which member means the current byte_size (or page_count?) and what is the > meaning of some of these fields > > struct nfs_pageio_descriptor { > .... > unsigned long pg_bytes_written; > Is this for result after read/write? This is the total number of bytes we successfully called nfs_pageio_doio() for. In other words, it should represent the total number of bytes we put on the wire. > size_t pg_count; > Is this the number of pages added up to now? > Do we also have the start of the first page? This is the number of bytes we have successfully coalesced into the current i/o. > size_t pg_bsize; > So I understand this is the max allowed pages. Does > that mean also the allocated size or Just the negotiated > size with the server? (Really bad name if you ask me) It means the 'block size'. In ordinary NFS parlance that will be the 'rsize' or the 'wsize'. > unsigned int pg_base; > Is that the index of the first page? That cannot be, the page->index > needs to be 64bit. So what is this then? It is used when dealing with I/O requests that are not page aligned. If you consider the pages that we are to write out in the current I/O as a single buffer, then the pg_base is the offset of the first byte to write out/read in.
On 05/24/2011 07:21 PM, Trond Myklebust wrote: > On Tue, 2011-05-24 at 18:56 +0300, Boaz Harrosh wrote: >> In the nfs_pageio_descriptor passed to pg_test() together with the two pages: >> Which member means the current byte_size (or page_count?) and what is the >> meaning of some of these fields >> >> struct nfs_pageio_descriptor { >> .... >> unsigned long pg_bytes_written; >> Is this for result after read/write? > > This is the total number of bytes we successfully called > nfs_pageio_doio() for. In other words, it should represent the total > number of bytes we put on the wire. > >> size_t pg_count; >> Is this the number of pages added up to now? >> Do we also have the start of the first page? > > This is the number of bytes we have successfully coalesced into the > current i/o. > >> size_t pg_bsize; >> So I understand this is the max allowed pages. Does >> that mean also the allocated size or Just the negotiated >> size with the server? (Really bad name if you ask me) > > It means the 'block size'. In ordinary NFS parlance that will be the > 'rsize' or the 'wsize'. > >> unsigned int pg_base; >> Is that the index of the first page? That cannot be, the page->index >> needs to be 64bit. So what is this then? > > It is used when dealing with I/O requests that are not page aligned. > > If you consider the pages that we are to write out in the current I/O as > a single buffer, then the pg_base is the offset of the first byte to > write out/read in. > Thanks Trond, I think I understand and can write the proper implementation for objects->pg_test now. One more question. I need the file offset of the beginning of the write would that then be: int objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev, struct nfs_page *req) } struct nfs_page *first_pg = list_entry(pgio->pg_list.next, struct nfs_page, wb_list); u64 io_offset = (first_pg->wb_index << PAGE_SHIFT) + first_pg->wb_offset; ... } Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2011-05-24 at 19:58 +0300, Boaz Harrosh wrote: > Thanks Trond, I think I understand and can write the proper implementation > for objects->pg_test now. One more question. I need the file offset > of the beginning of the write would that then be: > > int objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev, > struct nfs_page *req) > } > struct nfs_page *first_pg = list_entry(pgio->pg_list.next, struct nfs_page, wb_list); > > u64 io_offset = (first_pg->wb_index << PAGE_SHIFT) + first_pg->wb_offset; > ... > } Yes. That looks correct.
On Tue, May 24, 2011 at 11:49 AM, Benny Halevy <bhalevy@panasas.com> wrote: > I'll send a RFC patch including the generic pnfs pg_test for > the layout drivers. > > Fred - I hope you haven't started working on pg_test, have you? > Please let me know. > > Benny > > It is all yours. Fred -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/24/2011 08:05 PM, Trond Myklebust wrote: > On Tue, 2011-05-24 at 19:58 +0300, Boaz Harrosh wrote: >> Thanks Trond, I think I understand and can write the proper implementation >> for objects->pg_test now. One more question. I need the file offset >> of the beginning of the write would that then be: >> >> int objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev, >> struct nfs_page *req) >> } >> struct nfs_page *first_pg = list_entry(pgio->pg_list.next, struct nfs_page, wb_list); >> >> u64 io_offset = (first_pg->wb_index << PAGE_SHIFT) + first_pg->wb_offset; >> ... >> } > > Yes. That looks correct. > Grate the new skim with pg_test looks perfect. We have all we need and it will be really nice with the raid5/6 code. I think I will use a similar system in exofs now. Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index c80add6..3f5508b 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -293,7 +293,7 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc, if (desc->pg_bsize < PAGE_SIZE) return 0; newlen += desc->pg_count; - if (newlen > desc->pg_bsize) + if (newlen > desc->pg_bsize && !desc->pg_test) return 0; prev = nfs_list_entry(desc->pg_list.prev);