Message ID | 20200211010348.6872-2-willy@infradead.org (mailing list archive) |
---|---
State | New, archived |
Series | Change readahead API |
On 11/02/2020 02:05, Matthew Wilcox wrote:
> even though I'm pretty sure we're not going to readahead more than 2^32
> pages ever.

And 640K is more memory than anyone will ever need on a computer *scnr*
On Tue, Feb 11, 2020 at 08:19:14AM +0000, Johannes Thumshirn wrote:
> On 11/02/2020 02:05, Matthew Wilcox wrote:
> > even though I'm pretty sure we're not going to readahead more than 2^32
> > pages ever.
>
> And 640K is more memory than anyone will ever need on a computer *scnr*

Sure, but bandwidth just isn't increasing quickly enough to have this
make sense. 2^32 pages even on our smallest page size machines is 16GB.
Right now, we cap readahead at just 256kB. If we did try to readahead
16GB, we'd be occupying a PCIe gen4 x4 drive for two seconds, just
satisfying this one readahead. PCIe has historically doubled in bandwidth
every three years or so, so to get this down to something reasonable like
a hundredth of a second, we're looking at PCIe gen12 in twenty years or
so. And I bet we still won't do it (also, I doubt PCIe will continue
doubling bandwidth every three years).

And Linus has forbidden individual IOs over 2GB anyway, so not happening
until he's forced to see the error of his ways ;-)
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
On 2/10/20 5:03 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> ra_submit() which is a wrapper around __do_page_cache_readahead() already
> returns an unsigned long, and the 'nr_to_read' parameter is an unsigned
> long, so fix __do_page_cache_readahead() to return an unsigned long,
> even though I'm pretty sure we're not going to readahead more than 2^32
> pages ever.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/internal.h  | 2 +-
>  mm/readahead.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 3cf20ab3ca01..41b93c4b3ab7 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -49,7 +49,7 @@ void unmap_page_range(struct mmu_gather *tlb,
>  			     unsigned long addr, unsigned long end,
>  			     struct zap_details *details);
>
> -extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
> +extern unsigned long __do_page_cache_readahead(struct address_space *mapping,
>  		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
>  		unsigned long lookahead_size);
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 2fe72cd29b47..6bf73ef33b7e 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -152,7 +152,7 @@ static int read_pages(struct address_space *mapping, struct file *filp,
>   *
>   * Returns the number of pages requested, or the maximum amount of I/O allowed.
>   */
> -unsigned int __do_page_cache_readahead(struct address_space *mapping,
> +unsigned long __do_page_cache_readahead(struct address_space *mapping,
>  		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
>  		unsigned long lookahead_size)
>  {
> @@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
>  	unsigned long end_index;	/* The last page we want to read */
>  	LIST_HEAD(page_pool);
>  	int page_idx;

What about page_idx, too? It should also have the same data type as
nr_pages, as long as we're trying to be consistent on this point.

Just want to ensure we're ready to handle those 2^33+ page
readaheads... :)

thanks,
On Thu, Feb 13, 2020 at 07:19:53PM -0800, John Hubbard wrote:
> On 2/10/20 5:03 PM, Matthew Wilcox wrote:
> > @@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
> >  	unsigned long end_index;	/* The last page we want to read */
> >  	LIST_HEAD(page_pool);
> >  	int page_idx;
>
> What about page_idx, too? It should also have the same data type as nr_pages, as long as
> we're trying to be consistent on this point.
>
> Just want to ensure we're ready to handle those 2^33+ page readaheads... :)

Nah, this is just a type used internally to the function. Getting the
API right for the callers is the important part.
On 2/13/20 8:21 PM, Matthew Wilcox wrote:
> On Thu, Feb 13, 2020 at 07:19:53PM -0800, John Hubbard wrote:
>> On 2/10/20 5:03 PM, Matthew Wilcox wrote:
>>> @@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
>>>  	unsigned long end_index;	/* The last page we want to read */
>>>  	LIST_HEAD(page_pool);
>>>  	int page_idx;
>>
>> What about page_idx, too? It should also have the same data type as nr_pages, as long as
>> we're trying to be consistent on this point.
>>
>> Just want to ensure we're ready to handle those 2^33+ page readaheads... :)
>
> Nah, this is just a type used internally to the function. Getting the
> API right for the callers is the important part.

Agreed that the real point of this is to match up the API, but why not
finish the job by going all the way through? It's certainly not something
we need to lose sleep over, but it does seem like you don't want to have
code like this:

	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {

...with the ability, technically, to overflow page_idx due to it being
an int, while nr_to_read is an unsigned long. (And the new sanitizers
and checkers are apt to complain about it, btw.)

(Apologies if there is some kernel coding idiom that I still haven't
learned, about this sort of thing.)

thanks,
On Mon, Feb 10, 2020 at 05:03:36PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> ra_submit() which is a wrapper around __do_page_cache_readahead() already
> returns an unsigned long, and the 'nr_to_read' parameter is an unsigned
> long, so fix __do_page_cache_readahead() to return an unsigned long,
> even though I'm pretty sure we're not going to readahead more than 2^32
> pages ever.

I was going through this and realised it's completely pointless -- the
returned value from ra_submit() and __do_page_cache_readahead() is
eventually ignored through all paths. So I'm replacing this patch with
one that makes everything return void.
diff --git a/mm/internal.h b/mm/internal.h
index 3cf20ab3ca01..41b93c4b3ab7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,7 +49,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
-extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
+extern unsigned long __do_page_cache_readahead(struct address_space *mapping,
 		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
 		unsigned long lookahead_size);
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..6bf73ef33b7e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -152,7 +152,7 @@ static int read_pages(struct address_space *mapping, struct file *filp,
  *
  * Returns the number of pages requested, or the maximum amount of I/O allowed.
  */
-unsigned int __do_page_cache_readahead(struct address_space *mapping,
+unsigned long __do_page_cache_readahead(struct address_space *mapping,
 		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
 		unsigned long lookahead_size)
 {
@@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
 	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
 	int page_idx;
-	unsigned int nr_pages = 0;
+	unsigned long nr_pages = 0;
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);