From patchwork Fri Jun 27 17:56:14 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Al Chu X-Patchwork-Id: 4436781 X-Patchwork-Delegate: hal@mellanox.com Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 8172EBEEAA for ; Fri, 27 Jun 2014 18:05:42 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 92ACA203B5 for ; Fri, 27 Jun 2014 18:05:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 888F1203B1 for ; Fri, 27 Jun 2014 18:05:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751437AbaF0SFi (ORCPT ); Fri, 27 Jun 2014 14:05:38 -0400 Received: from prdiron-2.llnl.gov ([128.15.143.172]:5370 "EHLO prdiron-2.llnl.gov" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751425AbaF0SFg (ORCPT ); Fri, 27 Jun 2014 14:05:36 -0400 X-Attachments: Received: from auk59.llnl.gov (HELO [134.9.93.24]) ([134.9.93.24]) by prdiron-2.llnl.gov with ESMTP; 27 Jun 2014 10:56:15 -0700 Subject: [PATCH 3/3] opensm/osm_perfmgr.c: Fix perfmgr sweep_state race From: Albert Chu To: linux-rdma@vger.kernel.org Date: Fri, 27 Jun 2014 10:56:14 -0700 Message-Id: <1403891774.30332.421.camel@auk59.llnl.gov> Mime-Version: 1.0 X-Mailer: Evolution 2.12.3 (2.12.3-19.el5) Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce new sweep state PERFMGR_SWEEP_POST_PROCESSING to fix race in perfmgr. Race occurs as follows: Under typical conditions, osm_perfmgr_process() is entered with sweep_state set to PERFMGR_SWEEP_SLEEP. osm_perfmgr_process() sets sweep_state to PERFMGR_SWEEP_ACTIVE when it begins to sweep. osm_perfmgr_process() will eventually call perfmgr_send_mad() by way of perfmgr_query_counters() and several other functions. Responses to performance counter MADs may initiate the sending of more MADs via perfmgr_send_mad(), such as through redirection or the desire to clear counters. If too many MADs have been put on the wire, perfmgr_send_mad() will throttle sending out MADS and temporarily change sweep_state between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE as it throttles. The sweep_state is set to PERFMGR_SWEEP_ACTIVE when all performance counter MADs have been sent out by the sweeper. osm_perfmgr_process() eventually completes its sweep and puts sweep_state back into PERFMGR_SWEEP_SLEEP. At this point, some MADs may still be on the wire. New MADs may be put back on the wire if responses necessitate it (redirection or clearing counters). If enough MADs are put back onto the wire, perfmgr_send_mad() will throttle as normal, temporarily moving between PERFMGR_SWEEP_SUSPENDED and PERFMGR_SWEEP_ACTIVE. After the throttling is complete, sweep_state is put into PERFMGR_SWEEP_ACTIVE state. This is the key problem, the sweep_state is changed from PERFMGR_SWEEP_SLEEP to PERFMGR_SWEEP_ACTIVE outside of osm_perfmgr_process(). Now that the perfmgr is in ACTIVE state, any future sweep call to osm_perfmgr_process() will not sweep b/c the sweep_state is set to PERFMGR_SWEEP_ACTIVE. The introduction of a new sweep_state PERFMGR_SWEEP_POST_PROCESSING fixes this problem. If perfmgr_send_mad() throttles mads while in PERFMGR_SWEEP_SLEEP. sweep_state will be moved into the PERFMGR_SWEEP_POST_PROCESSING state instead of PERFMGR_SWEEP_SUSPENDED/PERFMGR_SWEEP_ACTIVE. When all post-SLEEP state MAD processing is complete, the sweep_state will move from PERFMGR_SWEEP_POST_PROCESSING back to PERFMGR_SWEEP_SLEEP, so that future sweeps can operate as normal. Signed-off-by: Albert L. Chu --- include/opensm/osm_perfmgr.h | 6 +++++- opensm/osm_perfmgr.c | 21 +++++++++++++++++++-- 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/include/opensm/osm_perfmgr.h b/include/opensm/osm_perfmgr.h index ab02c26..44a278d 100644 --- a/include/opensm/osm_perfmgr.h +++ b/include/opensm/osm_perfmgr.h @@ -88,7 +88,8 @@ typedef enum { typedef enum { PERFMGR_SWEEP_SLEEP, PERFMGR_SWEEP_ACTIVE, - PERFMGR_SWEEP_SUSPENDED + PERFMGR_SWEEP_SUSPENDED, + PERFMGR_SWEEP_POST_PROCESSING } osm_perfmgr_sweep_state_t; typedef struct monitored_port { @@ -239,6 +240,9 @@ inline static const char *osm_perfmgr_get_sweep_state_str(osm_perfmgr_t * perfmg case PERFMGR_SWEEP_SUSPENDED: return "Suspended"; break; + case PERFMGR_SWEEP_POST_PROCESSING: + return "PostProcessing"; + break; } return "UNKNOWN"; } diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c index bfa63dc..863e9bf 100644 --- a/opensm/osm_perfmgr.c +++ b/opensm/osm_perfmgr.c @@ -164,6 +164,16 @@ static inline void decrement_outstanding_queries(osm_perfmgr_t * pm) { cl_atomic_dec(&pm->outstanding_queries); cl_event_signal(&pm->sig_query); + + if (!pm->outstanding_queries) { + cl_spinlock_acquire(&pm->lock); + if (pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) { + pm->sweep_state = PERFMGR_SWEEP_SLEEP; + OSM_LOG(pm->log, OSM_LOG_INFO, + "PM sweep state exiting Post Processing\n"); + } + cl_spinlock_release(&pm->lock); + } } /********************************************************************** @@ -439,7 +449,13 @@ static ib_api_status_t perfmgr_send_mad(osm_perfmgr_t *perfmgr, while (perfmgr->outstanding_queries > (int32_t)perfmgr->max_outstanding_queries) { cl_spinlock_acquire(&perfmgr->lock); - perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED; + if (perfmgr->sweep_state == PERFMGR_SWEEP_SLEEP) { + perfmgr->sweep_state = PERFMGR_SWEEP_POST_PROCESSING; + OSM_LOG(perfmgr->log, OSM_LOG_INFO, + "PM sweep state going into Post Processing\n"); + } + else if (perfmgr->sweep_state == PERFMGR_SWEEP_ACTIVE) + perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED; cl_spinlock_release(&perfmgr->lock); wait: sts = cl_event_wait_on(&perfmgr->sig_query, @@ -1015,7 +1031,8 @@ void osm_perfmgr_process(osm_perfmgr_t * pm) cl_spinlock_acquire(&pm->lock); if (pm->sweep_state == PERFMGR_SWEEP_ACTIVE || - pm->sweep_state == PERFMGR_SWEEP_SUSPENDED) { + pm->sweep_state == PERFMGR_SWEEP_SUSPENDED || + pm->sweep_state == PERFMGR_SWEEP_POST_PROCESSING) { cl_spinlock_release(&pm->lock); OSM_LOG(pm->log, OSM_LOG_INFO, "PM sweep state %d, skipping sweep\n",