Message ID | 20110512161443.GC22389@calypso.voltaire.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Alex Netes |
Hi Alex,

On 5/12/2011 12:14 PM, Alex Netes wrote:
> MFTSubnSet failed MADs may leave temporary MC loops in the fabric.
> In order to eliminate this faulty state as quick as possible it's a good
> thing to initiate a heavy sweep immediately and to wait for the next light
> sweep.
>
> Signed-off-by: Alex Netes <alexne@mellanox.com>
> ---
>  opensm/osm_state_mgr.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
> index dd308f2..aa71b03 100644
> --- a/opensm/osm_state_mgr.c
> +++ b/opensm/osm_state_mgr.c
> @@ -1434,6 +1434,13 @@ static void do_process_mgrp_queue(osm_sm_t * sm)
>  		osm_mcast_mgr_process(sm);
>  		wait_for_pending_transactions(&sm->p_subn->p_osm->stats);
>  	}
> +
> +	/* if one or more MFTSubnSet MADs fails
> +	 * during idle process time initiate heavy sweep */
> +	if (sm->p_subn->force_heavy_sweep
> +	    || sm->p_subn->subnet_initialization_error)
> +		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);

subnet_initialization_error is more than just set MFT failures. Should
it be narrowed down to just those failures ?

Also, while this looks like it would fix the scenario you mention,
couldn't this change cause a continual heavy sweep ?

-- Hal

> +
>  }
>
>  void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
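Hal's first question (whether the trigger should be narrowed to MFT set failures only) can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `struct mock_subn` is a hypothetical stand-in for the few `osm_subn_t` fields named in the patch, and `mft_set_failed` is an invented flag that the thread does not say exists in OpenSM. The sketch only contrasts the condition as posted with one possible narrowed form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the osm_subn_t fields discussed in the thread.
 * This is NOT the real OpenSM structure. */
struct mock_subn {
	bool force_heavy_sweep;
	bool subnet_initialization_error; /* per the thread: covers more than MFT failures */
	bool mft_set_failed;              /* invented flag: set only when an MFT set fails */
};

/* Condition as written in the posted patch. */
bool needs_sweep_as_patched(const struct mock_subn *s)
{
	assert(s != NULL);
	return s->force_heavy_sweep || s->subnet_initialization_error;
}

/* Possible narrowed variant: react only to MFT set failures
 * (plus an explicit force_heavy_sweep request). */
bool needs_sweep_narrowed(const struct mock_subn *s)
{
	assert(s != NULL);
	return s->force_heavy_sweep || s->mft_set_failed;
}
```

With an initialization error that is unrelated to multicast, the posted condition signals a sweep while the narrowed one does not; with an MFT set failure, both do. Whether a dedicated flag is worth the extra bookkeeping is exactly the open question in the thread.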
Hi Hal,

On 10:43 Sat 14 May, Hal Rosenstock wrote:
> Hi Alex,
>
> On 5/12/2011 12:14 PM, Alex Netes wrote:
> > MFTSubnSet failed MADs may leave temporary MC loops in the fabric.
> > In order to eliminate this faulty state as quick as possible it's a good
> > thing to initiate a heavy sweep immediately and to wait for the next light
> > sweep.
> >
> > Signed-off-by: Alex Netes <alexne@mellanox.com>
> > ---
> >  opensm/osm_state_mgr.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> >
> > diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
> > index dd308f2..aa71b03 100644
> > --- a/opensm/osm_state_mgr.c
> > +++ b/opensm/osm_state_mgr.c
> > @@ -1434,6 +1434,13 @@ static void do_process_mgrp_queue(osm_sm_t * sm)
> >  		osm_mcast_mgr_process(sm);
> >  		wait_for_pending_transactions(&sm->p_subn->p_osm->stats);
> >  	}
> > +
> > +	/* if one or more MFTSubnSet MADs fails
> > +	 * during idle process time initiate heavy sweep */
> > +	if (sm->p_subn->force_heavy_sweep
> > +	    || sm->p_subn->subnet_initialization_error)
> > +		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
>
> subnet_initialization_error is more than just set MFT failures. Should
> it be narrowed down to just those failures ?
>

Do you mean just resending the MFTs, without causing a heavy sweep?

> Also, while this looks like it would fix the scenario you mention,
> couldn't this change cause a continual heavy sweep ?
>

Yes, this can cause a continual heavy sweep, but that would happen anyway.
This patch initiates the heavy sweep immediately; without it, the heavy
sweep would start on the next light sweep. In both cases you can end up in
a heavy sweep loop.

> -- Hal
>
> > +
> >  }
> >
> >  void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
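Alex's point that the loop exists with or without the patch, and that only the latency of the first heavy sweep changes, can be shown with a toy timeline. This is a deliberately simplified model and not OpenSM's scheduler: time advances in abstract ticks, a light sweep runs every `interval` ticks, the failure is assumed to persist forever, and a pending error is assumed to escalate every light sweep to a heavy one. The names `simulate`, `sweep_stats`, and all parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of the toy simulation. */
struct sweep_stats {
	int heavy_sweeps;     /* heavy sweeps started before the horizon */
	int first_heavy_tick; /* tick of the first heavy sweep, -1 if none */
};

/* Toy model (assumption, not OpenSM code): an MFT set failure happens at
 * error_tick and never clears. If `immediate` is true, a heavy sweep is
 * signaled at once (the patched behavior); otherwise the error waits for
 * the next light sweep. Either way, every later light sweep finds the error
 * still pending and becomes a heavy sweep again. */
struct sweep_stats simulate(bool immediate, int interval, int horizon,
			    int error_tick)
{
	struct sweep_stats st = { 0, -1 };
	bool error_pending = false;

	assert(interval > 0);

	for (int t = 0; t < horizon; t++) {
		bool heavy = false;

		if (t == error_tick) {
			error_pending = true;
			heavy = immediate; /* patched: do not wait */
		}
		/* Periodic light sweep escalates while the error persists. */
		if (t > 0 && t % interval == 0 && error_pending)
			heavy = true;

		if (heavy) {
			st.heavy_sweeps++;
			if (st.first_heavy_tick < 0)
				st.first_heavy_tick = t;
		}
	}
	return st;
}
```

For example, with `interval = 10`, `horizon = 35`, and `error_tick = 3`, the immediate variant starts its first heavy sweep at tick 3 and the deferred variant at tick 10; both then repeat at ticks 10, 20, and 30 for as long as the failure persists. Under these assumptions the patch shortens the window in which MC loops can exist but does not by itself create or remove the repeated-sweep behavior.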
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index dd308f2..aa71b03 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1434,6 +1434,13 @@ static void do_process_mgrp_queue(osm_sm_t * sm)
 		osm_mcast_mgr_process(sm);
 		wait_for_pending_transactions(&sm->p_subn->p_osm->stats);
 	}
+
+	/* if one or more MFTSubnSet MADs fails
+	 * during idle process time initiate heavy sweep */
+	if (sm->p_subn->force_heavy_sweep
+	    || sm->p_subn->subnet_initialization_error)
+		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
+
 }
 
 void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
MFTSubnSet failed MADs may leave temporary MC loops in the fabric.
In order to eliminate this faulty state as quickly as possible, it's a good
thing to initiate a heavy sweep immediately rather than wait for the next
light sweep.

Signed-off-by: Alex Netes <alexne@mellanox.com>
---
 opensm/osm_state_mgr.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)