Message ID | 20221015211025.16781-1-chrisfriedt@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] hw: misc: edu: fix 2 off-by-one errors | expand |
On 15. 10. 22, 23:10, Chris Friedt wrote: > From: Christopher Friedt <cfriedt@meta.com> > > In the case that size1 was zero, because of the explicit > 'end1 > addr' check, the range check would fail and the error > message would read as shown below. The correct comparison > is 'end1 >= addr' (or 'addr <= end1'). > > EDU: DMA range 0x40000-0x3ffff out of bounds (0x40000-0x40fff)! > > At the opposite end, in the case that size1 was 4096, within() > would fail because of the non-inclusive check 'end1 < end2', > which should have been 'end1 <= end2'. The error message would > previously say > > EDU: DMA range 0x40000-0x40fff out of bounds (0x40000-0x40fff)! > > This change > 1. renames local variables to be more less ambiguous > 2. fixes the two off-by-one errors described above. This should be split into two patches. This way, it's hard to review. > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1254 thanks,
On 221015 1710, Chris Friedt wrote: > From: Christopher Friedt <cfriedt@meta.com> > > In the case that size1 was zero, because of the explicit > 'end1 > addr' check, the range check would fail and the error > message would read as shown below. The correct comparison > is 'end1 >= addr' (or 'addr <= end1'). > > EDU: DMA range 0x40000-0x3ffff out of bounds (0x40000-0x40fff)! > > At the opposite end, in the case that size1 was 4096, within() > would fail because of the non-inclusive check 'end1 < end2', > which should have been 'end1 <= end2'. The error message would > previously say > > EDU: DMA range 0x40000-0x40fff out of bounds (0x40000-0x40fff)! > > This change > 1. renames local variables to be more less ambiguous > 2. fixes the two off-by-one errors described above. > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1254 > > Signed-off-by: Christopher Friedt <cfriedt@meta.com> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> As a side-note, seems strange that edu_check_range will abort the entire VM if the check fails, rather than handling the error more elegantly. Maybe that's useful for students developing kernel drivers against the device. > --- > hw/misc/edu.c | 25 ++++++++++++------------- > 1 file changed, 12 insertions(+), 13 deletions(-) > > diff --git a/hw/misc/edu.c b/hw/misc/edu.c > index e935c418d4..52afbd792a 100644 > --- a/hw/misc/edu.c > +++ b/hw/misc/edu.c > @@ -103,25 +103,24 @@ static void edu_lower_irq(EduState *edu, uint32_t val) > } > } > > -static bool within(uint64_t addr, uint64_t start, uint64_t end) > +static void edu_check_range(uint64_t xfer_start, uint64_t xfer_size, > + uint64_t dma_start, uint64_t dma_size) > { > - return start <= addr && addr < end; > -} > - > -static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start, > - uint64_t size2) > -{ > - uint64_t end1 = addr + size1; > - uint64_t end2 = start + size2; > - > - if (within(addr, start, end2) && > - end1 > addr && within(end1, start, end2)) { > + uint64_t xfer_end = xfer_start + xfer_size; > + uint64_t dma_end = dma_start + dma_size; > + > + /* > + * 1. ensure we aren't overflowing > + * 2. ensure that xfer is within dma address range > + */ > + if (dma_end >= dma_start && xfer_end >= xfer_start && > + xfer_start >= dma_start && xfer_end <= dma_end) { > return; > } > > hw_error("EDU: DMA range 0x%016"PRIx64"-0x%016"PRIx64 > " out of bounds (0x%016"PRIx64"-0x%016"PRIx64")!", > - addr, end1 - 1, start, end2 - 1); > + xfer_start, xfer_end - 1, dma_start, dma_end - 1); > } > > static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr) > -- > 2.36.1 > >
On Mon, 17 Oct 2022 at 14:50, Alexander Bulekov <alxndr@bu.edu> wrote: > > On 221015 1710, Chris Friedt wrote: > > From: Christopher Friedt <cfriedt@meta.com> > > > > In the case that size1 was zero, because of the explicit > > 'end1 > addr' check, the range check would fail and the error > > message would read as shown below. The correct comparison > > is 'end1 >= addr' (or 'addr <= end1'). > > > > EDU: DMA range 0x40000-0x3ffff out of bounds (0x40000-0x40fff)! > > > > At the opposite end, in the case that size1 was 4096, within() > > would fail because of the non-inclusive check 'end1 < end2', > > which should have been 'end1 <= end2'. The error message would > > previously say > > > > EDU: DMA range 0x40000-0x40fff out of bounds (0x40000-0x40fff)! > > > > This change > > 1. renames local variables to be more less ambiguous > > 2. fixes the two off-by-one errors described above. > > > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1254 > > > > Signed-off-by: Christopher Friedt <cfriedt@meta.com> > > Reviewed-by: Alexander Bulekov <alxndr@bu.edu> > > As a side-note, seems strange that edu_check_range will abort the entire > VM if the check fails, rather than handling the error more elegantly. Yes, this is bad for a device model, though we have a lot of older device models that still do it. The preferred pattern is: * for situations which are "if this happens there is a bug in QEMU itself", use assert. hw_error() is a kind of assert that prints a bunch of guest register state: sometimes you want that, but more often you just want normal assert() * for situations where the guest has misprogrammed the device, log that with qemu_log_mask(LOG_GUEST_ERROR, ...) and continue with whatever the real hardware would do, or some reasonable choice if the h/w spec is vague * for situations where the guest is correct but is trying to get the device to do something our model doesn't implement yet, same as above but with LOG_UNIMP. Probably most hw_error() uses in the codebase should be replaced with one of the above options. thanks -- PMM
On Mon, Oct 17, 2022 at 2:23 AM Jiri Slaby <jirislaby@kernel.org> wrote: > On 15. 10. 22, 23:10, Chris Friedt wrote: > > From: Christopher Friedt <cfriedt@meta.com> > This should be split into two patches. This way, it's hard to review. I can do that :-) Thanks for the review
Peter Maydell <peter.maydell@linaro.org> writes: > On Mon, 17 Oct 2022 at 14:50, Alexander Bulekov <alxndr@bu.edu> wrote: >> >> On 221015 1710, Chris Friedt wrote: >> > From: Christopher Friedt <cfriedt@meta.com> >> > >> > In the case that size1 was zero, because of the explicit >> > 'end1 > addr' check, the range check would fail and the error >> > message would read as shown below. The correct comparison >> > is 'end1 >= addr' (or 'addr <= end1'). >> > >> > EDU: DMA range 0x40000-0x3ffff out of bounds (0x40000-0x40fff)! >> > >> > At the opposite end, in the case that size1 was 4096, within() >> > would fail because of the non-inclusive check 'end1 < end2', >> > which should have been 'end1 <= end2'. The error message would >> > previously say >> > >> > EDU: DMA range 0x40000-0x40fff out of bounds (0x40000-0x40fff)! >> > >> > This change >> > 1. renames local variables to be more less ambiguous >> > 2. fixes the two off-by-one errors described above. >> > >> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1254 >> > >> > Signed-off-by: Christopher Friedt <cfriedt@meta.com> >> >> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> >> >> As a side-note, seems strange that edu_check_range will abort the entire >> VM if the check fails, rather than handling the error more elegantly. > > Yes, this is bad for a device model, though we have a lot of > older device models that still do it. The preferred pattern is: > * for situations which are "if this happens there is a > bug in QEMU itself", use assert. hw_error() is a kind of > assert that prints a bunch of guest register state: sometimes > you want that, but more often you just want normal assert() > * for situations where the guest has misprogrammed the device, > log that with qemu_log_mask(LOG_GUEST_ERROR, ...) > and continue with whatever the real hardware would do, or > some reasonable choice if the h/w spec is vague > * for situations where the guest is correct but is trying to > get the device to do something our model doesn't implement > yet, same as above but with LOG_UNIMP. > > Probably most hw_error() uses in the codebase should be > replaced with one of the above options. We should probably document this best practice somewhere in docs/devel but I guess we really need a "guide to writing device emulation" section. > > thanks > -- PMM
> On Oct 17, 2022, at 1:22 PM, Alex Bennée <alex.bennee@linaro.org> wrote: > > > > > Peter Maydell <peter.maydell@linaro.org> writes: > >>> On Mon, 17 Oct 2022 at 14:50, Alexander Bulekov <alxndr@bu.edu> wrote: >>> >>> On 221015 1710, Chris Friedt wrote: >>>> From: Christopher Friedt <cfriedt@meta.com> >>>> >>> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> >>> >>> As a side-note, seems strange that edu_check_range will abort the entire >>> VM if the check fails, rather than handling the error more elegantly. >> >> Yes, this is bad for a device model, though we have a lot of >> older device models that still do it. The preferred pattern is: >> * for situations which are "if this happens there is a >> bug in QEMU itself", use assert. hw_error() is a kind of >> assert that prints a bunch of guest register state: sometimes >> you want that, but more often you just want normal assert() >> * for situations where the guest has misprogrammed the device, >> log that with qemu_log_mask(LOG_GUEST_ERROR, ...) >> and continue with whatever the real hardware would do, or >> some reasonable choice if the h/w spec is vague >> * for situations where the guest is correct but is trying to >> get the device to do something our model doesn't implement >> yet, same as above but with LOG_UNIMP. >> >> Probably most hw_error() uses in the codebase should be >> replaced with one of the above options. > > We should probably document this best practice somewhere in docs/devel > but I guess we really need a "guide to writing device emulation" > section. Should I make a separate PR for that or attach it to the existing series (at v3 now)? Thanks, C
Chris Friedt <cfriedt@meta.com> writes: >> On Oct 17, 2022, at 1:22 PM, Alex Bennée <alex.bennee@linaro.org> wrote: >> >> > >> >> Peter Maydell <peter.maydell@linaro.org> writes: >> >>>> On Mon, 17 Oct 2022 at 14:50, Alexander Bulekov <alxndr@bu.edu> wrote: >>>> >>>> On 221015 1710, Chris Friedt wrote: >>>>> From: Christopher Friedt <cfriedt@meta.com> >>>>> >>>> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> >>>> >>>> As a side-note, seems strange that edu_check_range will abort the entire >>>> VM if the check fails, rather than handling the error more elegantly. >>> >>> Yes, this is bad for a device model, though we have a lot of >>> older device models that still do it. The preferred pattern is: >>> * for situations which are "if this happens there is a >>> bug in QEMU itself", use assert. hw_error() is a kind of >>> assert that prints a bunch of guest register state: sometimes >>> you want that, but more often you just want normal assert() >>> * for situations where the guest has misprogrammed the device, >>> log that with qemu_log_mask(LOG_GUEST_ERROR, ...) >>> and continue with whatever the real hardware would do, or >>> some reasonable choice if the h/w spec is vague >>> * for situations where the guest is correct but is trying to >>> get the device to do something our model doesn't implement >>> yet, same as above but with LOG_UNIMP. >>> >>> Probably most hw_error() uses in the codebase should be >>> replaced with one of the above options. >> >> We should probably document this best practice somewhere in docs/devel >> but I guess we really need a "guide to writing device emulation" >> section. > > Should I make a separate PR for that or attach it to the existing > series (at v3 now)? I don't think improving our developer documentation needs to be part of this fix. However if you want to take a run at improving it in a new series be my guest ;-) > > Thanks, > > C
On 17. 10. 22, 15:44, Alexander Bulekov wrote: > On 221015 1710, Chris Friedt wrote: >> From: Christopher Friedt <cfriedt@meta.com> >> >> In the case that size1 was zero, because of the explicit >> 'end1 > addr' check, the range check would fail and the error >> message would read as shown below. The correct comparison >> is 'end1 >= addr' (or 'addr <= end1'). >> >> EDU: DMA range 0x40000-0x3ffff out of bounds (0x40000-0x40fff)! >> >> At the opposite end, in the case that size1 was 4096, within() >> would fail because of the non-inclusive check 'end1 < end2', >> which should have been 'end1 <= end2'. The error message would >> previously say >> >> EDU: DMA range 0x40000-0x40fff out of bounds (0x40000-0x40fff)! >> >> This change >> 1. renames local variables to be more less ambiguous >> 2. fixes the two off-by-one errors described above. >> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1254 >> >> Signed-off-by: Christopher Friedt <cfriedt@meta.com> > > Reviewed-by: Alexander Bulekov <alxndr@bu.edu> > > As a side-note, seems strange that edu_check_range will abort the entire > VM if the check fails, rather than handling the error more elegantly. > Maybe that's useful for students developing kernel drivers against the > device. Yes, that was exactly the intention. First, as a punishment as they do something really wrong. Second, so they notice -- writing something wrong to a register of a real HW often freezes a machine too. Especially when misprogramming a DMA controller. OTOH, this sucks too. Ext4 (and other FS too) is fine, they don't lose data. However they need to freshly boot, repair FS and investigate/think a lot. This trial and run (and crash) takes several loops for some. So I am for softening it a bit. But they still should be noticed in some obvious way. thanks,
On 17. 10. 22, 16:13, Peter Maydell wrote: > * for situations where the guest has misprogrammed the device, > log that with qemu_log_mask(LOG_GUEST_ERROR, ...) > and continue with whatever the real hardware would do, or > some reasonable choice if the h/w spec is vague As I wrote in the previous mail, can we stop the machine after the print somehow, for example? So that the students have to "cont" in the qemu console as an acknowledgment when this happens. thanks,
Jiri Slaby <jirislaby@kernel.org> writes: > On 17. 10. 22, 16:13, Peter Maydell wrote: >> * for situations where the guest has misprogrammed the device, >> log that with qemu_log_mask(LOG_GUEST_ERROR, ...) >> and continue with whatever the real hardware would do, or >> some reasonable choice if the h/w spec is vague > > As I wrote in the previous mail, can we stop the machine after the > print somehow, for example? So that the students have to "cont" in the > qemu console as an acknowledgment when this happens. You can bring the system to a halt with vm_stop(RUN_STATE_PAUSED) or possible RUN_STATE_DEBUG? I don't know how obnoxious the message should be at this point (i.e. should it be masked by LOG_GUEST_ERROR) because if the system halts the user might not notice. However I guess with this device they would be expecting to check this sort of thing. > > thanks,
On Tue, 18 Oct 2022 at 10:21, Alex Bennée <alex.bennee@linaro.org> wrote: > > > Jiri Slaby <jirislaby@kernel.org> writes: > > > On 17. 10. 22, 16:13, Peter Maydell wrote: > >> * for situations where the guest has misprogrammed the device, > >> log that with qemu_log_mask(LOG_GUEST_ERROR, ...) > >> and continue with whatever the real hardware would do, or > >> some reasonable choice if the h/w spec is vague > > > > As I wrote in the previous mail, can we stop the machine after the > > print somehow, for example? So that the students have to "cont" in the > > qemu console as an acknowledgment when this happens. > > You can bring the system to a halt with vm_stop(RUN_STATE_PAUSED) or > possible RUN_STATE_DEBUG? No, please don't do anything like that. This should not be special. Just log a message if the guest does something bad. There are an absolute ton of things that the guest can do wrong, and in general QEMU does not attempt to be an "identify all the ways the guest does something wrong in a friendly way" system. thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On Tue, 18 Oct 2022 at 10:21, Alex Bennée <alex.bennee@linaro.org> wrote: >> >> >> Jiri Slaby <jirislaby@kernel.org> writes: >> >> > On 17. 10. 22, 16:13, Peter Maydell wrote: >> >> * for situations where the guest has misprogrammed the device, >> >> log that with qemu_log_mask(LOG_GUEST_ERROR, ...) >> >> and continue with whatever the real hardware would do, or >> >> some reasonable choice if the h/w spec is vague >> > >> > As I wrote in the previous mail, can we stop the machine after the >> > print somehow, for example? So that the students have to "cont" in the >> > qemu console as an acknowledgment when this happens. >> >> You can bring the system to a halt with vm_stop(RUN_STATE_PAUSED) or >> possible RUN_STATE_DEBUG? > > No, please don't do anything like that. This should not be special. > Just log a message if the guest does something bad. There are > an absolute ton of things that the guest can do wrong, and > in general QEMU does not attempt to be an "identify all the > ways the guest does something wrong in a friendly way" system. We should clean-up the other uses of vm_stop in hw/ then: ./hw/ppc/prep_systemio.c:78: vm_stop(RUN_STATE_PAUSED); ./hw/ppc/vof.c:921: vm_stop(RUN_STATE_PAUSED); ./hw/vfio/pci.c:2725: vm_stop(RUN_STATE_INTERNAL_ERROR); > > thanks > -- PMM
Hi Jiri, Peter, Are you able to review the more recent version of this change? Look for the subject "[PATCH v4 x/3] hw: misc: edu: ..." I believe I addressed all concerns. Cheers, C On Mon, Oct 17, 2022 at 12:36 PM Christopher Friedt <chrisfriedt@gmail.com> wrote: > > On Mon, Oct 17, 2022 at 2:23 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > On 15. 10. 22, 23:10, Chris Friedt wrote: > > > From: Christopher Friedt <cfriedt@meta.com> > > This should be split into two patches. This way, it's hard to review. > > I can do that :-) > > Thanks for the review
diff --git a/hw/misc/edu.c b/hw/misc/edu.c index e935c418d4..52afbd792a 100644 --- a/hw/misc/edu.c +++ b/hw/misc/edu.c @@ -103,25 +103,24 @@ static void edu_lower_irq(EduState *edu, uint32_t val) } } -static bool within(uint64_t addr, uint64_t start, uint64_t end) +static void edu_check_range(uint64_t xfer_start, uint64_t xfer_size, + uint64_t dma_start, uint64_t dma_size) { - return start <= addr && addr < end; -} - -static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start, - uint64_t size2) -{ - uint64_t end1 = addr + size1; - uint64_t end2 = start + size2; - - if (within(addr, start, end2) && - end1 > addr && within(end1, start, end2)) { + uint64_t xfer_end = xfer_start + xfer_size; + uint64_t dma_end = dma_start + dma_size; + + /* + * 1. ensure we aren't overflowing + * 2. ensure that xfer is within dma address range + */ + if (dma_end >= dma_start && xfer_end >= xfer_start && + xfer_start >= dma_start && xfer_end <= dma_end) { return; } hw_error("EDU: DMA range 0x%016"PRIx64"-0x%016"PRIx64 " out of bounds (0x%016"PRIx64"-0x%016"PRIx64")!", - addr, end1 - 1, start, end2 - 1); + xfer_start, xfer_end - 1, dma_start, dma_end - 1); } static dma_addr_t edu_clamp_addr(const EduState *edu, dma_addr_t addr)