Message ID | 20210820111509.172500-1-yangx.jy@fujitsu.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | RDMA/rxe: Zero out index member of struct rxe_queue | expand |
Hi Olga Kornievskaia, Could you check if this patch can fix your issues? https://www.spinics.net/lists/linux-rdma/msg104358.html https://www.spinics.net/lists/linux-rdma/msg104359.html https://www.spinics.net/lists/linux-rdma/msg104360.html By the way, this patch can fix my panic. Best Regards, Xiao Yang On 2021/8/20 19:15, Xiao Yang wrote: > 1) New index member of struct rxe_queue is introduced but not zeroed > so the initial value of index may be random. > 2) Current index is not masked off to index_mask. > In such case, producer_addr() and consumer_addr() will get an invalid > address by the random index and then accessing the invalid address > triggers the following panic: > "BUG: unable to handle page fault for address: ffff9ae2c07a1414" > > Fix the issue by using kzalloc() to zero out index member. > > Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") > Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c > index 85b812586ed4..72d95398e604 100644 > --- a/drivers/infiniband/sw/rxe/rxe_queue.c > +++ b/drivers/infiniband/sw/rxe/rxe_queue.c > @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, > if (*num_elem < 0) > goto err1; > > - q = kmalloc(sizeof(*q), GFP_KERNEL); > + q = kzalloc(sizeof(*q), GFP_KERNEL); > if (!q) > goto err1; >
On 8/20/21 6:15 AM, Xiao Yang wrote: > 1) New index member of struct rxe_queue is introduced but not zeroed > so the initial value of index may be random. > 2) Current index is not masked off to index_mask. > In such case, producer_addr() and consumer_addr() will get an invalid > address by the random index and then accessing the invalid address > triggers the following panic: > "BUG: unable to handle page fault for address: ffff9ae2c07a1414" > > Fix the issue by using kzalloc() to zero out index member. > > Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") > Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c > index 85b812586ed4..72d95398e604 100644 > --- a/drivers/infiniband/sw/rxe/rxe_queue.c > +++ b/drivers/infiniband/sw/rxe/rxe_queue.c > @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, > if (*num_elem < 0) > goto err1; > > - q = kmalloc(sizeof(*q), GFP_KERNEL); > + q = kzalloc(sizeof(*q), GFP_KERNEL); > if (!q) > goto err1; > > Thanks for this!! I am happy to take the blame but this has been there from the original 2016 rxe commit. Its a good catch. Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
On Fri, Aug 20, 2021 at 07:15:09PM +0800, Xiao Yang wrote: > 1) New index member of struct rxe_queue is introduced but not zeroed > so the initial value of index may be random. > 2) Current index is not masked off to index_mask. > In such case, producer_addr() and consumer_addr() will get an invalid > address by the random index and then accessing the invalid address > triggers the following panic: > "BUG: unable to handle page fault for address: ffff9ae2c07a1414" > > Fix the issue by using kzalloc() to zero out index member. > > Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") > Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied to for-rc, thanks Jason
On Fri, Aug 20, 2021 at 6:44 PM Xiao Yang <yangx.jy@fujitsu.com> wrote: > > 1) New index member of struct rxe_queue is introduced but not zeroed > so the initial value of index may be random. > 2) Current index is not masked off to index_mask. > In such case, producer_addr() and consumer_addr() will get an invalid > address by the random index and then accessing the invalid address > triggers the following panic: > "BUG: unable to handle page fault for address: ffff9ae2c07a1414" > > Fix the issue by using kzalloc() to zero out index member. > > Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") > Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c > index 85b812586ed4..72d95398e604 100644 > --- a/drivers/infiniband/sw/rxe/rxe_queue.c > +++ b/drivers/infiniband/sw/rxe/rxe_queue.c > @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, > if (*num_elem < 0) > goto err1; > > - q = kmalloc(sizeof(*q), GFP_KERNEL); > + q = kzalloc(sizeof(*q), GFP_KERNEL); Perhaps this is why I can not reproduce this problem in the local host. Zhu Yanjun > if (!q) > goto err1; > > -- > 2.25.1 > > >
On 2021/8/21 2:44, Bob Pearson wrote: > On 8/20/21 6:15 AM, Xiao Yang wrote: >> 1) New index member of struct rxe_queue is introduced but not zeroed >> so the initial value of index may be random. >> 2) Current index is not masked off to index_mask. >> In such case, producer_addr() and consumer_addr() will get an invalid >> address by the random index and then accessing the invalid address >> triggers the following panic: >> "BUG: unable to handle page fault for address: ffff9ae2c07a1414" >> >> Fix the issue by using kzalloc() to zero out index member. >> >> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") >> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com> >> --- >> drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c >> index 85b812586ed4..72d95398e604 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_queue.c >> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c >> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, >> if (*num_elem< 0) >> goto err1; >> >> - q = kmalloc(sizeof(*q), GFP_KERNEL); >> + q = kzalloc(sizeof(*q), GFP_KERNEL); >> if (!q) >> goto err1; >> >> > Thanks for this!! I am happy to take the blame but this has been there from the original 2016 rxe commit. Its a good catch. Hi Bob, The original 2016 rxe commit actually introduced kmalloc() but it initialized all members of struct rxe_queue at subsequent steps. When the new index member of struct rxe_queue was added, it didn't initialized at subsequent steps. So I think the issue was caused by your patch. I use kzalloc() to fix the issue because I want to avoid the same issue when another new member will be added in future. Best Regards, Xiao Yang > Reviewed-by: Bob Pearson<rpearsonhpe@gmail.com>
On 2021/8/21 15:21, Zhu Yanjun wrote: > On Fri, Aug 20, 2021 at 6:44 PM Xiao Yang<yangx.jy@fujitsu.com> wrote: >> 1) New index member of struct rxe_queue is introduced but not zeroed >> so the initial value of index may be random. >> 2) Current index is not masked off to index_mask. >> In such case, producer_addr() and consumer_addr() will get an invalid >> address by the random index and then accessing the invalid address >> triggers the following panic: >> "BUG: unable to handle page fault for address: ffff9ae2c07a1414" >> >> Fix the issue by using kzalloc() to zero out index member. >> >> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") >> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com> >> --- >> drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c >> index 85b812586ed4..72d95398e604 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_queue.c >> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c >> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, >> if (*num_elem< 0) >> goto err1; >> >> - q = kmalloc(sizeof(*q), GFP_KERNEL); >> + q = kzalloc(sizeof(*q), GFP_KERNEL); > Perhaps this is why I can not reproduce this problem in the local host. Hi Yanjun, I forgot to say that I reproduced the issue on my local vm. Best Regards, Xiao Yang > Zhu Yanjun > >> if (!q) >> goto err1; >> >> -- >> 2.25.1 >> >> >>
On Mon, Aug 23, 2021 at 12:37 PM yangx.jy@fujitsu.com <yangx.jy@fujitsu.com> wrote: > > On 2021/8/21 15:21, Zhu Yanjun wrote: > > On Fri, Aug 20, 2021 at 6:44 PM Xiao Yang<yangx.jy@fujitsu.com> wrote: > >> 1) New index member of struct rxe_queue is introduced but not zeroed > >> so the initial value of index may be random. > >> 2) Current index is not masked off to index_mask. > >> In such case, producer_addr() and consumer_addr() will get an invalid > >> address by the random index and then accessing the invalid address > >> triggers the following panic: > >> "BUG: unable to handle page fault for address: ffff9ae2c07a1414" > >> > >> Fix the issue by using kzalloc() to zero out index member. > >> > >> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") > >> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com> > >> --- > >> drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c > >> index 85b812586ed4..72d95398e604 100644 > >> --- a/drivers/infiniband/sw/rxe/rxe_queue.c > >> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c > >> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, > >> if (*num_elem< 0) > >> goto err1; > >> > >> - q = kmalloc(sizeof(*q), GFP_KERNEL); > >> + q = kzalloc(sizeof(*q), GFP_KERNEL); > > Perhaps this is why I can not reproduce this problem in the local host. > Hi Yanjun, > > I forgot to say that I reproduced the issue on my local vm. Which OS are you using to reproduce this problem? Zhu Yanjun > > Best Regards, > Xiao Yang > > Zhu Yanjun > > > >> if (!q) > >> goto err1; > >> > >> -- > >> 2.25.1 > >> > >> > >>
On 2021/8/23 13:42, Zhu Yanjun wrote: > On Mon, Aug 23, 2021 at 12:37 PM yangx.jy@fujitsu.com > <yangx.jy@fujitsu.com> wrote: >> On 2021/8/21 15:21, Zhu Yanjun wrote: >>> On Fri, Aug 20, 2021 at 6:44 PM Xiao Yang<yangx.jy@fujitsu.com> wrote: >>>> 1) New index member of struct rxe_queue is introduced but not zeroed >>>> so the initial value of index may be random. >>>> 2) Current index is not masked off to index_mask. >>>> In such case, producer_addr() and consumer_addr() will get an invalid >>>> address by the random index and then accessing the invalid address >>>> triggers the following panic: >>>> "BUG: unable to handle page fault for address: ffff9ae2c07a1414" >>>> >>>> Fix the issue by using kzalloc() to zero out index member. >>>> >>>> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") >>>> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com> >>>> --- >>>> drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c >>>> index 85b812586ed4..72d95398e604 100644 >>>> --- a/drivers/infiniband/sw/rxe/rxe_queue.c >>>> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c >>>> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, >>>> if (*num_elem< 0) >>>> goto err1; >>>> >>>> - q = kmalloc(sizeof(*q), GFP_KERNEL); >>>> + q = kzalloc(sizeof(*q), GFP_KERNEL); >>> Perhaps this is why I can not reproduce this problem in the local host. >> Hi Yanjun, >> >> I forgot to say that I reproduced the issue on my local vm. > Which OS are you using to reproduce this problem? OS is fedora31. > Zhu Yanjun > >> Best Regards, >> Xiao Yang >>> Zhu Yanjun >>> >>>> if (!q) >>>> goto err1; >>>> >>>> -- >>>> 2.25.1 >>>> >>>> >>>>
On Mon, Aug 23, 2021 at 2:18 PM yangx.jy@fujitsu.com <yangx.jy@fujitsu.com> wrote: > > On 2021/8/23 13:42, Zhu Yanjun wrote: > > On Mon, Aug 23, 2021 at 12:37 PM yangx.jy@fujitsu.com > > <yangx.jy@fujitsu.com> wrote: > >> On 2021/8/21 15:21, Zhu Yanjun wrote: > >>> On Fri, Aug 20, 2021 at 6:44 PM Xiao Yang<yangx.jy@fujitsu.com> wrote: > >>>> 1) New index member of struct rxe_queue is introduced but not zeroed > >>>> so the initial value of index may be random. > >>>> 2) Current index is not masked off to index_mask. > >>>> In such case, producer_addr() and consumer_addr() will get an invalid > >>>> address by the random index and then accessing the invalid address > >>>> triggers the following panic: > >>>> "BUG: unable to handle page fault for address: ffff9ae2c07a1414" > >>>> > >>>> Fix the issue by using kzalloc() to zero out index member. > >>>> > >>>> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") > >>>> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com> > >>>> --- > >>>> drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- > >>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c > >>>> index 85b812586ed4..72d95398e604 100644 > >>>> --- a/drivers/infiniband/sw/rxe/rxe_queue.c > >>>> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c > >>>> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, > >>>> if (*num_elem< 0) > >>>> goto err1; > >>>> > >>>> - q = kmalloc(sizeof(*q), GFP_KERNEL); > >>>> + q = kzalloc(sizeof(*q), GFP_KERNEL); > >>> Perhaps this is why I can not reproduce this problem in the local host. > >> Hi Yanjun, > >> > >> I forgot to say that I reproduced the issue on my local vm. > > Which OS are you using to reproduce this problem? > > OS is fedora31. Can you reproduce this problem on Ubuntu 20.04? Thanks, Zhu Yanjun > > > Zhu Yanjun > > > >> Best Regards, > >> Xiao Yang > >>> Zhu Yanjun > >>> > >>>> if (!q) > >>>> goto err1; > >>>> > >>>> -- > >>>> 2.25.1 > >>>> > >>>> > >>>>
On 8/21/21 8:00 AM, yangx.jy@fujitsu.com wrote: > On 2021/8/21 2:44, Bob Pearson wrote: >> On 8/20/21 6:15 AM, Xiao Yang wrote: >>> 1) New index member of struct rxe_queue is introduced but not zeroed >>> so the initial value of index may be random. >>> 2) Current index is not masked off to index_mask. >>> In such case, producer_addr() and consumer_addr() will get an invalid >>> address by the random index and then accessing the invalid address >>> triggers the following panic: >>> "BUG: unable to handle page fault for address: ffff9ae2c07a1414" >>> >>> Fix the issue by using kzalloc() to zero out index member. >>> >>> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") >>> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com> >>> --- >>> drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c >>> index 85b812586ed4..72d95398e604 100644 >>> --- a/drivers/infiniband/sw/rxe/rxe_queue.c >>> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c >>> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, >>> if (*num_elem< 0) >>> goto err1; >>> >>> - q = kmalloc(sizeof(*q), GFP_KERNEL); >>> + q = kzalloc(sizeof(*q), GFP_KERNEL); >>> if (!q) >>> goto err1; >>> >>> >> Thanks for this!! I am happy to take the blame but this has been there from the original 2016 rxe commit. Its a good catch. > Hi Bob, > > The original 2016 rxe commit actually introduced kmalloc() but it > initialized all members of struct rxe_queue at subsequent steps. > When the new index member of struct rxe_queue was added, it didn't > initialized at subsequent steps. So I think the issue was caused by > your patch. Yup. My comment was really that if it was an old one I was guilty either way most likely. But this is a good catch. > I use kzalloc() to fix the issue because I want to avoid the same issue > when another new member will be added in future. > > Best Regards, > Xiao Yang >> Reviewed-by: Bob Pearson<rpearsonhpe@gmail.com>
On 2021/8/23 14:48, Zhu Yanjun wrote: > Can you reproduce this problem on Ubuntu 20.04? Hi Yanjun, I cannot reproduce this issue on Ubuntu 20.04 vm for now. I think I didn't hit the condition where q->index gets the random value after kmalloc(). Perhaps only when allocating the memory which has been used and freed before. Best Regards, Xiao Yang > Thanks, > Zhu Yanjun
On Wed, Sep 1, 2021 at 1:42 PM yangx.jy@fujitsu.com <yangx.jy@fujitsu.com> wrote: > > On 2021/8/23 14:48, Zhu Yanjun wrote: > > Can you reproduce this problem on Ubuntu 20.04? > Hi Yanjun, > > I cannot reproduce this issue on Ubuntu 20.04 vm for now. > I think I didn't hit the condition where q->index gets the random value > after kmalloc(). Sure. Thanks a lot. I can not reproduce this problem on Ubuntu 20.04. Zhu Yanjun > Perhaps only when allocating the memory which has been used and freed > before. > > Best Regards, > Xiao Yang > > Thanks, > > Zhu Yanjun
diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c index 85b812586ed4..72d95398e604 100644 --- a/drivers/infiniband/sw/rxe/rxe_queue.c +++ b/drivers/infiniband/sw/rxe/rxe_queue.c @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem, if (*num_elem < 0) goto err1; - q = kmalloc(sizeof(*q), GFP_KERNEL); + q = kzalloc(sizeof(*q), GFP_KERNEL); if (!q) goto err1;
1) New index member of struct rxe_queue is introduced but not zeroed so the initial value of index may be random. 2) Current index is not masked off to index_mask. In such case, producer_addr() and consumer_addr() will get an invalid address by the random index and then accessing the invalid address triggers the following panic: "BUG: unable to handle page fault for address: ffff9ae2c07a1414" Fix the issue by using kzalloc() to zero out index member. Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> --- drivers/infiniband/sw/rxe/rxe_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)