Message ID | 20191126153605.27564-1-sjpark@amazon.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen/blkback: Avoid unmapping unmapped grant pages | expand |
On Tue, Nov 26, 2019 at 04:36:05PM +0100, SeongJae Park wrote: > From: SeongJae Park <sjpark@amazon.de> > > For each I/O request, blkback first maps the foreign pages for the > request to its local pages. If an allocation of a local page for the > mapping fails, it should unmap every mapping already made for the > request. > > However, blkback's handling mechanism for the allocation failure does > not mark the remaining foreign pages as unmapped. Therefore, the unmap > function merely tries to unmap every valid grant page for the request, > including the pages not mapped due to the allocation failure. On a > system that fails the allocation frequently, this problem leads to > following kernel crash. > > [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 > [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 > [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 > [ 372.012562] Oops: 0002 [#1] SMP > [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 > ... > [ 372.012746] Call Trace: > [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 > [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] > ... > [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] > ... > Decompressing Linux... Parsing ELF... done. > Booting the kernel. > [ 0.000000] Initializing cgroup subsys cpuset > > This commit fixes this problem by marking the grant pages of the given > request that didn't mapped due to the allocation failure as invalid. > > Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") > > Signed-off-by: SeongJae Park <sjpark@amazon.de> > Reviewed-by: David Woodhouse <dwmw@amazon.de> > Reviewed-by: Maximilian Heyne <mheyne@amazon.de> > Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Thanks, Roger.
On 27.11.19 10:13, Roger Pau Monné wrote: > On Tue, Nov 26, 2019 at 04:36:05PM +0100, SeongJae Park wrote: >> From: SeongJae Park <sjpark@amazon.de> >> >> For each I/O request, blkback first maps the foreign pages for the >> request to its local pages. If an allocation of a local page for the >> mapping fails, it should unmap every mapping already made for the >> request. >> >> However, blkback's handling mechanism for the allocation failure does >> not mark the remaining foreign pages as unmapped. Therefore, the unmap >> function merely tries to unmap every valid grant page for the request, >> including the pages not mapped due to the allocation failure. On a >> system that fails the allocation frequently, this problem leads to >> following kernel crash. >> >> [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 >> [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 >> [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 >> [ 372.012562] Oops: 0002 [#1] SMP >> [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 >> ... >> [ 372.012746] Call Trace: >> [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 >> [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] >> ... >> [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] >> ... >> Decompressing Linux... Parsing ELF... done. >> Booting the kernel. >> [ 0.000000] Initializing cgroup subsys cpuset >> >> This commit fixes this problem by marking the grant pages of the given >> request that didn't mapped due to the allocation failure as invalid. >> >> Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") >> >> Signed-off-by: SeongJae Park <sjpark@amazon.de> >> Reviewed-by: David Woodhouse <dwmw@amazon.de> >> Reviewed-by: Maximilian Heyne <mheyne@amazon.de> >> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> > > Thanks, Roger. May I ask some more comments? Thanks, SeongJae Park
On 11/26/19 8:36 AM, SeongJae Park wrote: > From: SeongJae Park <sjpark@amazon.de> > > For each I/O request, blkback first maps the foreign pages for the > request to its local pages. If an allocation of a local page for the > mapping fails, it should unmap every mapping already made for the > request. > > However, blkback's handling mechanism for the allocation failure does > not mark the remaining foreign pages as unmapped. Therefore, the unmap > function merely tries to unmap every valid grant page for the request, > including the pages not mapped due to the allocation failure. On a > system that fails the allocation frequently, this problem leads to > following kernel crash. > > [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 > [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 > [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 > [ 372.012562] Oops: 0002 [#1] SMP > [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 > ... > [ 372.012746] Call Trace: > [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 > [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] > ... > [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] > ... > Decompressing Linux... Parsing ELF... done. > Booting the kernel. > [ 0.000000] Initializing cgroup subsys cpuset > > This commit fixes this problem by marking the grant pages of the given > request that didn't mapped due to the allocation failure as invalid. > > Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") Queued up with Roger's reviewed-by.
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..3666afa639d1 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -936,6 +936,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, out_of_memory: pr_alert("%s: out of memory\n", __func__); put_free_pages(ring, pages_to_gnt, segs_to_map); + for (i = last_map; i < num; i++) + pages[i]->handle = BLKBACK_INVALID_HANDLE; return -ENOMEM; }