
osm_sm_state_mgr.c Fix handling of polling retry number

Message ID 3b0cb098-1e00-417d-8a5b-0aa766926867@default (mailing list archive)
State Superseded
Delegated to: Hal Rosenstock

Commit Message

Line Holen Nov. 15, 2013, 12:15 p.m. UTC
The retry counter is now only updated if a packet is actually sent.
(But as before the initial request is also counted.)

Prior to this change the actual maximum number of packets sent was
the polling retry number minus one.

Signed-off-by: Line Holen <line.holen@oracle.com>

---

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hal Rosenstock Nov. 27, 2013, 12:16 p.m. UTC | #1
On 11/15/2013 7:15 AM, Line Holen wrote:
> The retry counter is now only updated if a packet is actually sent.
> (But as before the initial request is also counted.)
> 
> Prior to this change the actual maximum number of packets sent were
> polling retry number minus one.
> 
> Signed-off-by: Line Holen <line.holen@oracle.com>
> 
> ---
> 
> diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
> index 596ad8f..6eff9ee 100644
> --- a/opensm/osm_sm_state_mgr.c
> +++ b/opensm/osm_sm_state_mgr.c
> @@ -197,16 +197,14 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>  	}
>  
>  	/*
> -	 * Incr the retry number.
> -	 * If it reached the max_retry_number in the subnet opt - call
> +	 * If retry number reached the max_retry_number in the subnet opt - call
>  	 * osm_sm_state_mgr_process with signal OSM_SM_SIGNAL_POLLING_TIMEOUT
>  	 */
> -	sm->retry_number++;
>  	OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "SM State %d (%s), Retry number:%d\n",
>  		sm->p_subn->sm_state,  osm_get_sm_mgr_state_str(sm->p_subn->sm_state),
>  		sm->retry_number);
>  
> -	if (sm->retry_number >= sm->p_subn->opt.polling_retry_number) {
> +	if (sm->retry_number > sm->p_subn->opt.polling_retry_number) {
>  		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>  			"Reached polling_retry_number value in retry_number. "
>  			"Go to DISCOVERY state\n");
> @@ -214,6 +212,9 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>  		goto Exit;
>  	}
>  
> +	/* Increment the retry number */
> +	sm->retry_number++;

Would it be better to increment the retry number only if the
sm_state_mgr_send_master_sm_info_req call just below this succeeds?

-- Hal

> +
>  	/* Send a SubnGet(SMInfo) request to the remote sm (depends on our state) */
>  	sm_state_mgr_send_master_sm_info_req(sm);
>  
Line Holen Nov. 27, 2013, 2:42 p.m. UTC | #2
On 11/27/13 13:16, Hal Rosenstock wrote:
> On 11/15/2013 7:15 AM, Line Holen wrote:
>> The retry counter is now only updated if a packet is actually sent.
>> (But as before the initial request is also counted.)
>>
>> Prior to this change the actual maximum number of packets sent were
>> polling retry number minus one.
>>
>> Signed-off-by: Line Holen<line.holen@oracle.com>
>>
>> ---
>>
>> diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
>> index 596ad8f..6eff9ee 100644
>> --- a/opensm/osm_sm_state_mgr.c
>> +++ b/opensm/osm_sm_state_mgr.c
>> @@ -197,16 +197,14 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>>   	}
>>
>>   	/*
>> -	 * Incr the retry number.
>> -	 * If it reached the max_retry_number in the subnet opt - call
>> +	 * If retry number reached the max_retry_number in the subnet opt - call
>>   	 * osm_sm_state_mgr_process with signal OSM_SM_SIGNAL_POLLING_TIMEOUT
>>   	 */
>> -	sm->retry_number++;
>>   	OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "SM State %d (%s), Retry number:%d\n",
>>   		sm->p_subn->sm_state,  osm_get_sm_mgr_state_str(sm->p_subn->sm_state),
>>   		sm->retry_number);
>>
>> -	if (sm->retry_number >= sm->p_subn->opt.polling_retry_number) {
>> +	if (sm->retry_number > sm->p_subn->opt.polling_retry_number) {
>>   		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>>   			"Reached polling_retry_number value in retry_number. "
>>   			"Go to DISCOVERY state\n");
>> @@ -214,6 +212,9 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>>   		goto Exit;
>>   	}
>>
>> +	/* Increment the retry number */
>> +	sm->retry_number++;
> Would it be better to increment retry number if
> sm_state_mgr_send_master_sm_info_req call just below this succeeds ?
>
> -- Hal
I'm not really sure. The current placement was meant to avoid a potential
race with response handling, which clears the counter (that is, incrementing
after the response was received). It seemed to me that this could happen
with the current locking.

Line
>
>> +
>>   	/* Send a SubnGet(SMInfo) request to the remote sm (depends on our state) */
>>   	sm_state_mgr_send_master_sm_info_req(sm);
>>

Hal Rosenstock Dec. 3, 2013, 1:17 p.m. UTC | #3
On 11/27/2013 9:42 AM, Line Holen wrote:
> On 11/27/13 13:16, Hal Rosenstock wrote:
>> On 11/15/2013 7:15 AM, Line Holen wrote:
>>> The retry counter is now only updated if a packet is actually sent.
>>> (But as before the initial request is also counted.)
>>>
>>> Prior to this change the actual maximum number of packets sent were
>>> polling retry number minus one.
>>>
>>> Signed-off-by: Line Holen<line.holen@oracle.com>
>>>
>>> ---
>>>
>>> diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
>>> index 596ad8f..6eff9ee 100644
>>> --- a/opensm/osm_sm_state_mgr.c
>>> +++ b/opensm/osm_sm_state_mgr.c
>>> @@ -197,16 +197,14 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>>>       }
>>>
>>>       /*
>>> -     * Incr the retry number.
>>> -     * If it reached the max_retry_number in the subnet opt - call
>>> +     * If retry number reached the max_retry_number in the subnet opt - call
>>>        * osm_sm_state_mgr_process with signal OSM_SM_SIGNAL_POLLING_TIMEOUT
>>>        */
>>> -    sm->retry_number++;
>>>       OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "SM State %d (%s), Retry number:%d\n",
>>>           sm->p_subn->sm_state, osm_get_sm_mgr_state_str(sm->p_subn->sm_state),
>>>           sm->retry_number);
>>>
>>> -    if (sm->retry_number >= sm->p_subn->opt.polling_retry_number) {
>>> +    if (sm->retry_number > sm->p_subn->opt.polling_retry_number) {
>>>           OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>>>               "Reached polling_retry_number value in retry_number. "
>>>               "Go to DISCOVERY state\n");
>>> @@ -214,6 +212,9 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>>>           goto Exit;
>>>       }
>>>
>>> +    /* Increment the retry number */
>>> +    sm->retry_number++;
>> Would it be better to increment retry number if
>> sm_state_mgr_send_master_sm_info_req call just below this succeeds ?
>>
>> -- Hal
> I'm not sure really.

All I was proposing was a minor variation to what you proposed:
to add a status return to sm_state_mgr_send_master_sm_info_req and only
increment the retry_number if that call was "successful".

> The current placement was to avoid potential race
> with response handling
> and the clearing of the counter there (incrementing after the response
> were received). 

Maybe I'm missing something but I don't see how this changes any
potential race condition other than perhaps a smaller time window.

> Seemed to me that this could happen with the current locking.

Yes, it looks to me like the locking here needs fixing. I'll send a
patch for this shortly...

-- Hal

> 
> Line
>>
>>> +
>>>       /* Send a SubnGet(SMInfo) request to the remote sm (depends on our state) */
>>>       sm_state_mgr_send_master_sm_info_req(sm);
>>>
> 
> 

Line Holen Dec. 3, 2013, 2:18 p.m. UTC | #4
On 12/03/13 14:17, Hal Rosenstock wrote:
> On 11/27/2013 9:42 AM, Line Holen wrote:
>> On 11/27/13 13:16, Hal Rosenstock wrote:
>>> On 11/15/2013 7:15 AM, Line Holen wrote:
>>>> The retry counter is now only updated if a packet is actually sent.
>>>> (But as before the initial request is also counted.)
>>>>
>>>> Prior to this change the actual maximum number of packets sent were
>>>> polling retry number minus one.
>>>>
>>>> Signed-off-by: Line Holen<line.holen@oracle.com>
>>>>
>>>> ---
>>>>
>>>> diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
>>>> index 596ad8f..6eff9ee 100644
>>>> --- a/opensm/osm_sm_state_mgr.c
>>>> +++ b/opensm/osm_sm_state_mgr.c
>>>> @@ -197,16 +197,14 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>>>>        }
>>>>
>>>>        /*
>>>> -     * Incr the retry number.
>>>> -     * If it reached the max_retry_number in the subnet opt - call
>>>> +     * If retry number reached the max_retry_number in the subnet opt - call
>>>>         * osm_sm_state_mgr_process with signal OSM_SM_SIGNAL_POLLING_TIMEOUT
>>>>         */
>>>> -    sm->retry_number++;
>>>>        OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "SM State %d (%s), Retry number:%d\n",
>>>>            sm->p_subn->sm_state, osm_get_sm_mgr_state_str(sm->p_subn->sm_state),
>>>>            sm->retry_number);
>>>>
>>>> -    if (sm->retry_number >= sm->p_subn->opt.polling_retry_number) {
>>>> +    if (sm->retry_number > sm->p_subn->opt.polling_retry_number) {
>>>>            OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>>>>                "Reached polling_retry_number value in retry_number. "
>>>>                "Go to DISCOVERY state\n");
>>>> @@ -214,6 +212,9 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
>>>>            goto Exit;
>>>>        }
>>>>
>>>> +    /* Increment the retry number */
>>>> +    sm->retry_number++;
>>> Would it be better to increment retry number if
>>> sm_state_mgr_send_master_sm_info_req call just below this succeeds ?
>>>
>>> -- Hal
>> I'm not sure really.
> All I was proposing was a minor variation to what you proposed:
> to add a status return to sm_state_mgr_send_master_sm_info_req and only
> increment the retry_number if that call was "successful".
Understood.
>
>> The current placement was to avoid potential race
>> with response handling
>> and the clearing of the counter there (incrementing after the response
>> were received).
> Maybe I'm missing something but I don't see how this changes any
> potential race condition other than perhaps a smaller time window.
With the current locking, moving the increment later means you could end up
processing the response (and clearing the counter) and then incrementing the
counter afterwards in this function.
With the suggested patch you'd increment every time you attempt to send a
packet. Not perfect, but at least better than it used to be, and it did
not introduce any race.
>
>> Seemed to me that this could happen with the current locking.
> Yes, it looks to me like the locking here needs fixing. I'll send a
> patch for this shortly...
OK, good. Do you want me to send a v2, rebased on top of your locking patch,
that incorporates your initial comment?

Line

>
> -- Hal
>
>> Line
>>>> +
>>>>        /* Send a SubnGet(SMInfo) request to the remote sm (depends on our state) */
>>>>        sm_state_mgr_send_master_sm_info_req(sm);
>>>>
>>

Hal Rosenstock Dec. 3, 2013, 2:30 p.m. UTC | #5
On 12/3/2013 9:18 AM, Line Holen wrote:
> On 12/03/13 14:17, Hal Rosenstock wrote:
>> Yes, it looks to me like the locking here needs fixing. I'll send a
>> patch for this shortly...
> OK, good. Do you want me to send a v2 rebased on top of this patch that
> incorporate
> your initial comment ?

Yes, please.

-- Hal

> 
> Line
> 

Patch

diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
index 596ad8f..6eff9ee 100644
--- a/opensm/osm_sm_state_mgr.c
+++ b/opensm/osm_sm_state_mgr.c
@@ -197,16 +197,14 @@  void osm_sm_state_mgr_polling_callback(IN void *context)
 	}
 
 	/*
-	 * Incr the retry number.
-	 * If it reached the max_retry_number in the subnet opt - call
+	 * If retry number reached the max_retry_number in the subnet opt - call
 	 * osm_sm_state_mgr_process with signal OSM_SM_SIGNAL_POLLING_TIMEOUT
 	 */
-	sm->retry_number++;
 	OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "SM State %d (%s), Retry number:%d\n",
 		sm->p_subn->sm_state,  osm_get_sm_mgr_state_str(sm->p_subn->sm_state),
 		sm->retry_number);
 
-	if (sm->retry_number >= sm->p_subn->opt.polling_retry_number) {
+	if (sm->retry_number > sm->p_subn->opt.polling_retry_number) {
 		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
 			"Reached polling_retry_number value in retry_number. "
 			"Go to DISCOVERY state\n");
@@ -214,6 +212,9 @@  void osm_sm_state_mgr_polling_callback(IN void *context)
 		goto Exit;
 	}
 
+	/* Increment the retry number */
+	sm->retry_number++;
+
 	/* Send a SubnGet(SMInfo) request to the remote sm (depends on our state) */
 	sm_state_mgr_send_master_sm_info_req(sm);