[for-rc,v2,0/6] Add CM packets missing and harden the proxying

Message ID 20200803061941.1139994-1-haakon.bugge@oracle.com (mailing list archive)

Message

Haakon Bugge Aug. 3, 2020, 6:19 a.m. UTC
A high number of MAD packet drops are observed in the mlx4 MAD proxy
system. These are fixed by separating the parameters for the tunnel
vs. wire QPs and by introducing a separate worker-thread for the wire
QPs.

Support for MRA and REJ with its reason being timeout is also added.

Dynamic debug prints adjusted and amended.

    v1->v2:
	* Added commit ("Adjust delayed work when a dup is observed")
	* Minor adjustments in some of the commits

Håkon Bugge (6):
  IB/mlx4: Add and improve logging
  IB/mlx4: Add support for MRA
  IB/mlx4: Separate tunnel and wire bufs parameters
  IB/mlx4: Fix starvation in paravirt mux/demux
  IB/mlx4: Add support for REJ due to timeout
  IB/mlx4: Adjust delayed work when a dup is observed

 drivers/infiniband/hw/mlx4/cm.c      | 148 ++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx4/mad.c     | 158 +++++++++++++++------------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   8 +-
 3 files changed, 241 insertions(+), 73 deletions(-)

--
2.20.1

Comments

Haakon Bugge Aug. 10, 2020, 11:20 a.m. UTC | #1
A friendly reminder.


Thxs, Håkon


Leon Romanovsky Aug. 10, 2020, 11:46 a.m. UTC | #2
On Mon, Aug 10, 2020 at 01:20:43PM +0200, Håkon Bugge wrote:
> A friendly reminder.

We are in merge window.

BTW, the patches have been in our regression testing all this time and
everything works as expected.

Thanks

Gal Pressman Aug. 10, 2020, 2:10 p.m. UTC | #3
On 10/08/2020 14:46, Leon Romanovsky wrote:
> On Mon, Aug 10, 2020 at 01:20:43PM +0200, Håkon Bugge wrote:
>> A friendly reminder.
> 
> We are in merge window.

The merge window shouldn't affect bug fixes submissions, no?
Leon Romanovsky Aug. 10, 2020, 2:23 p.m. UTC | #4
On Mon, Aug 10, 2020 at 05:10:44PM +0300, Gal Pressman wrote:
> On 10/08/2020 14:46, Leon Romanovsky wrote:
> > On Mon, Aug 10, 2020 at 01:20:43PM +0200, Håkon Bugge wrote:
> >> A friendly reminder.
> >
> > We are in merge window.
>
> The merge window shouldn't affect bug fixes submissions, no?

It is hard to call these bug fixes; according to the Fixes line and
description, the code has been broken from day one. There is no urgency
to merge it now.

Thanks
Jason Gunthorpe Aug. 24, 2020, 4:36 p.m. UTC | #5
On Mon, Aug 03, 2020 at 08:19:35AM +0200, Håkon Bugge wrote:

Applied to for-next, this does not look like -rc material

Thanks,
Jason