Message ID | 20230517160948.811355-3-jiaqiyan@google.com
---|---
State | New
Series | Improve hugetlbfs read on HWPOISON hugepages
On 05/17/23 16:09, Jiaqi Yan wrote:
> When a hugepage contains HWPOISON pages, read() fails to read any byte
> of the hugepage and returns -EIO, although many bytes in the HWPOISON
> hugepage are readable.
>
> Improve this by allowing hugetlbfs_read_iter to return as many bytes as
> possible. For a requested range [offset, offset + len) that contains a
> HWPOISON page, return [offset, first HWPOISON page addr); the next read
> attempt will fail and return -EIO.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
>  fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 56 insertions(+), 6 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index ecfdfb2529a3..1baa08ec679f 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -282,6 +282,46 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  }
>  #endif
>
> +/*
> + * Someone wants to read @bytes from a HWPOISON hugetlb @page from @offset.
> + * Returns the maximum number of bytes one can read without touching the 1st raw
> + * HWPOISON subpage.
> + *
> + * The implementation borrows the iteration logic from copy_page_to_iter*.
> + */
> +static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t bytes)
> +{
> +	size_t n = 0;
> +	size_t res = 0;
> +	struct folio *folio = page_folio(page);
> +
> +	folio_lock(folio);

What is the reason for taking folio_lock?

> +
> +	/* First subpage to start the loop. */
> +	page += offset / PAGE_SIZE;
> +	offset %= PAGE_SIZE;
> +	while (1) {
> +		if (find_raw_hwp_page(folio, page) != NULL)
> +			break;
> +
> +		/* Safe to read n bytes without touching HWPOISON subpage. */
> +		n = min(bytes, (size_t)PAGE_SIZE - offset);
> +		res += n;
> +		bytes -= n;
> +		if (!bytes || !n)
> +			break;
> +		offset += n;
> +		if (offset == PAGE_SIZE) {
> +			page++;
> +			offset = 0;
> +		}
> +	}
> +
> +	folio_unlock(folio);
> +
> +	return res;
> +}
> +
>  /*
>   * Support for read() - Find the page attached to f_mapping and copy out the
>   * data. This provides functionality similar to filemap_read().
> @@ -300,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>
>  	while (iov_iter_count(to)) {
>  		struct page *page;
> -		size_t nr, copied;
> +		size_t nr, copied, want;
>
>  		/* nr is the maximum number of bytes to copy from this page */
>  		nr = huge_page_size(h);
> @@ -328,16 +368,26 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  		} else {
>  			unlock_page(page);
>
> -			if (PageHWPoison(page)) {
> -				put_page(page);
> -				retval = -EIO;
> -				break;
> +			if (!PageHWPoison(page))
> +				want = nr;
> +			else {
> +				/*
> +				 * Adjust how many bytes safe to read without
> +				 * touching the 1st raw HWPOISON subpage after
> +				 * offset.
> +				 */
> +				want = adjust_range_hwpoison(page, offset, nr);
> +				if (want == 0) {
> +					put_page(page);
> +					retval = -EIO;
> +					break;
> +				}
>  			}
>
>  			/*
>  			 * We have the page, copy it to user space buffer.
>  			 */
> -			copied = copy_page_to_iter(page, offset, nr, to);
> +			copied = copy_page_to_iter(page, offset, want, to);
>  			put_page(page);
>  		}
>  		offset += copied;
> --
> 2.40.1.606.ga4b1b128d6-goog

Code looks fine, just wondering about that folio_lock.
--
Mike Kravetz
On Thu, May 18, 2023 at 3:18 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 05/17/23 16:09, Jiaqi Yan wrote:
> > When a hugepage contains HWPOISON pages, read() fails to read any byte
> > of the hugepage and returns -EIO, although many bytes in the HWPOISON
> > hugepage are readable.
> >
> > Improve this by allowing hugetlbfs_read_iter to return as many bytes as
> > possible. For a requested range [offset, offset + len) that contains a
> > HWPOISON page, return [offset, first HWPOISON page addr); the next read
> > attempt will fail and return -EIO.
> >
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > ---
> > [...]
> > +static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t bytes)
> > +{
> > +	size_t n = 0;
> > +	size_t res = 0;
> > +	struct folio *folio = page_folio(page);
> > +
> > +	folio_lock(folio);
>
> What is the reason for taking folio_lock?

I intended this routine (mostly find_raw_hwp_page) to be serialized
with folio_clear_hugetlb_hwpoison() and hwpoison_user_mappings() in
try_memory_failure_hugetlb(). They don't directly affect the
raw_hwp_list. I can remove the lock in v2.

> > [... rest of the quoted patch snipped ...]
>
> Code looks fine, just wondering about that folio_lock.
> --
> Mike Kravetz
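For reference, find_raw_hwp_page() is added earlier in this series. A minimal sketch of the lookup it performs is shown below, assuming the raw_hwp_list bookkeeping in mm/memory-failure.c (raw HWPOISON subpages recorded as struct raw_hwp_page entries chained off folio->_hugetlb_hwpoison, reachable via the existing raw_hwp_list_head() helper); the helper actually posted in the series may differ:

/*
 * Sketch only: look up @subpage in @folio's raw HWPOISON list.
 * struct raw_hwp_page and raw_hwp_list_head() are the existing
 * bookkeeping in mm/memory-failure.c:
 *
 *	struct raw_hwp_page {
 *		struct llist_node node;
 *		struct page *page;	/\* the raw 4K subpage with the error *\/
 *	};
 */
static struct raw_hwp_page *find_raw_hwp_page(struct folio *folio,
					      struct page *subpage)
{
	struct raw_hwp_page *p;
	struct llist_head *head = raw_hwp_list_head(folio);

	llist_for_each_entry(p, head->first, node)
		if (p->page == subpage)
			return p;	/* subpage has a recorded raw error */

	return NULL;			/* subpage is not in the list */
}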
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ecfdfb2529a3..1baa08ec679f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -282,6 +282,46 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 #endif
 
+/*
+ * Someone wants to read @bytes from a HWPOISON hugetlb @page from @offset.
+ * Returns the maximum number of bytes one can read without touching the 1st raw
+ * HWPOISON subpage.
+ *
+ * The implementation borrows the iteration logic from copy_page_to_iter*.
+ */
+static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t bytes)
+{
+	size_t n = 0;
+	size_t res = 0;
+	struct folio *folio = page_folio(page);
+
+	folio_lock(folio);
+
+	/* First subpage to start the loop. */
+	page += offset / PAGE_SIZE;
+	offset %= PAGE_SIZE;
+	while (1) {
+		if (find_raw_hwp_page(folio, page) != NULL)
+			break;
+
+		/* Safe to read n bytes without touching HWPOISON subpage. */
+		n = min(bytes, (size_t)PAGE_SIZE - offset);
+		res += n;
+		bytes -= n;
+		if (!bytes || !n)
+			break;
+		offset += n;
+		if (offset == PAGE_SIZE) {
+			page++;
+			offset = 0;
+		}
+	}
+
+	folio_unlock(folio);
+
+	return res;
+}
+
 /*
  * Support for read() - Find the page attached to f_mapping and copy out the
  * data. This provides functionality similar to filemap_read().
@@ -300,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 	while (iov_iter_count(to)) {
 		struct page *page;
-		size_t nr, copied;
+		size_t nr, copied, want;
 
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = huge_page_size(h);
@@ -328,16 +368,26 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		} else {
 			unlock_page(page);
 
-			if (PageHWPoison(page)) {
-				put_page(page);
-				retval = -EIO;
-				break;
+			if (!PageHWPoison(page))
+				want = nr;
+			else {
+				/*
+				 * Adjust how many bytes safe to read without
+				 * touching the 1st raw HWPOISON subpage after
+				 * offset.
+				 */
+				want = adjust_range_hwpoison(page, offset, nr);
+				if (want == 0) {
+					put_page(page);
+					retval = -EIO;
+					break;
+				}
 			}
 
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
-			copied = copy_page_to_iter(page, offset, nr, to);
+			copied = copy_page_to_iter(page, offset, want, to);
 			put_page(page);
 		}
 		offset += copied;
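To make the loop's arithmetic concrete, here is a small standalone userspace mirror of adjust_range_hwpoison() (a sketch only: the hypothetical poisoned_subpage() stands in for find_raw_hwp_page(), and subpages are tracked by index rather than by struct page *):

#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-in for find_raw_hwp_page(): subpage 5 is poisoned. */
static int poisoned_subpage(size_t idx)
{
	return idx == 5;
}

/* Same arithmetic as adjust_range_hwpoison(), indexed instead of page*. */
static size_t adjust_range(size_t idx, size_t offset, size_t bytes)
{
	size_t res = 0;

	/* First subpage to start the loop. */
	idx += offset / PAGE_SIZE;
	offset %= PAGE_SIZE;
	while (!poisoned_subpage(idx)) {
		/* Safe to read n bytes without touching the poisoned subpage. */
		size_t n = bytes < PAGE_SIZE - offset ? bytes : PAGE_SIZE - offset;

		res += n;
		bytes -= n;
		if (!bytes || !n)
			break;
		offset += n;
		if (offset == PAGE_SIZE) {
			idx++;
			offset = 0;
		}
	}
	return res;
}

int main(void)
{
	/* Read a whole 2 MiB hugepage from offset 0: stops before subpage 5. */
	printf("%zu\n", adjust_range(0, 0, 512 * PAGE_SIZE));		/* 20480 */
	/* Start inside the poisoned subpage: nothing is readable. */
	printf("%zu\n", adjust_range(0, 5 * PAGE_SIZE + 100, PAGE_SIZE)); /* 0 */
	return 0;
}

With the 6th subpage (index 5) poisoned, a full-page read from offset 0 yields 5 * 4096 = 20480 bytes, and a read starting inside the poisoned subpage yields 0, which hugetlbfs_read_iter() turns into -EIO.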
When a hugepage contains HWPOISON pages, read() fails to read any byte
of the hugepage and returns -EIO, although many bytes in the HWPOISON
hugepage are readable.

Improve this by allowing hugetlbfs_read_iter to return as many bytes as
possible. For a requested range [offset, offset + len) that contains a
HWPOISON page, return [offset, first HWPOISON page addr); the next read
attempt will fail and return -EIO.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 56 insertions(+), 6 deletions(-)
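As a usage illustration of the new semantics (a sketch, not part of the patch; the hugetlbfs path is a made-up example), a reader can now salvage everything before the first poisoned subpage and detect the poison as a short read followed by -EIO:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical file; assumes a hugetlbfs mount at /mnt/huge. */
	int fd = open("/mnt/huge/data", O_RDONLY);
	char buf[65536];
	ssize_t n;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * With this patch, read() returns every byte up to the first raw
	 * HWPOISON subpage; the next read(), which would start on the
	 * poisoned subpage, fails with EIO. Previously the very first
	 * read() of the hugepage failed and nothing was recoverable.
	 */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;	/* consume n bytes */

	if (n < 0 && errno == EIO)
		fprintf(stderr, "hit HWPOISON subpage; earlier bytes salvaged\n");

	close(fd);
	return n < 0 ? 1 : 0;
}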