Message ID | cover.1699503619.git.matsuda-daisuke@fujitsu.com |
---|---|
Series | On-Demand Paging on SoftRoCE |

On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>
> Daisuke Matsuda (7):
>   RDMA/rxe: Always defer tasks on responder and completer to workqueue
>   RDMA/rxe: Make MR functions accessible from other rxe source code
>   RDMA/rxe: Move resp_states definition to rxe_verbs.h
>   RDMA/rxe: Add page invalidation support
>   RDMA/rxe: Allow registering MRs for On-Demand Paging
>   RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>   RDMA/rxe: Add support for the traditional Atomic operations with ODP

What is the current situation with rxe? I don't recall seeing the bugs
that were reported get fixed?

I'm reluctant to dig a deeper hole until it is done.

Thanks,
Jason

On 2023/12/5 8:11, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>
>> Daisuke Matsuda (7):
>> [...]
>
> What is the current situation with rxe? I don't recall seeing the bugs
> that were reported get fixed?

Exactly. A problem is reported at the link
https://www.spinics.net/lists/linux-rdma/msg120947.html

It seems that a variable 'entry' is set but not used
[-Wunused-but-set-variable].

Also, ODP is an important feature. Should we suggest adding a test case
for ODP in rdma-core to verify this feature?

Zhu Yanjun

On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>
> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>
> >> Daisuke Matsuda (7):
> >> [...]
> >
> > What is the current situation with rxe? I don't recall seeing the bugs
> > that were reported get fixed?

Well, I suppose Jason is referring to the "blktests srp/002 hang".
cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
so the hang does not look specific to rxe.
cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
I think we need to decide whether to continue blocking patches to rxe, since nobody has successfully fixed the issue.

There is another issue that causes a kernel panic:
[bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
Zhijian has submitted patches to fix this and received some comments.
It looks like he is heavily involved in the CXL driver these days.
I guess he is still working on it.

> Exactly. A problem is reported at the link
> https://www.spinics.net/lists/linux-rdma/msg120947.html
>
> It seems that a variable 'entry' is set but not used
> [-Wunused-but-set-variable].

Yeah, I can revise the patch anytime.

> Also, ODP is an important feature. Should we suggest adding a test case
> for ODP in rdma-core to verify this feature?

Rxe can share the same tests with mlx5.
I added test cases for Write, Read and Atomic operations with ODP,
and we can add more tests if there are any suggestions.
cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks,
Daisuke Matsuda

On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> [...]
>> Also, ODP is an important feature. Should we suggest adding a test case
>> for ODP in rdma-core to verify this feature?
>
> Rxe can share the same tests with mlx5.
> I added test cases for Write, Read and Atomic operations with ODP,
> and we can add more tests if there are any suggestions.
> cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks a lot.
Have you run blktests with your patches applied on top of the
latest kernel?

Zhu Yanjun

On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> [...]
> Thanks a lot.
> Have you run blktests with your patches applied on top of the
> latest kernel?

I have not done that yet, but I agree I should.
I will try to make time for the test before submitting v8.

Thanks,
Daisuke Matsuda

On 2023/12/14 13:55, Daisuke Matsuda (Fujitsu) wrote:
> On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
> [...]
> I have not done that yet, but I agree I should.
> I will try to make time for the test before submitting v8.

Thanks. I hope blktests works well with your commits.

Zhu Yanjun

On Thu, Dec 07, 2023 at 06:37:13AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> [...]
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not look specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> I think we need to decide whether to continue blocking patches to rxe, since nobody has successfully fixed the issue.

Bob? Is that what we think?

> There is another issue that causes a kernel panic:
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

This is more understandable, and the fix of matching the MTT size to
the PAGE_SIZE seems reasonable to me.

Jason