Message ID | 1699436909-22767-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [net,v1] net/smc: avoid data corruption caused by decline | expand |
On 08.11.23 10:48, D. Wythe wrote: > From: "D. Wythe" <alibuda@linux.alibaba.com> > > We found a data corruption issue during testing of SMC-R on Redis > applications. > > The benchmark has a low probability of reporting a strange error as > shown below. > > "Error: Protocol error, got "\xe2" as reply type byte" > > Finally, we found that the retrieved error data was as follows: > > 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C > 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 > 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 > > It is quite obvious that this is a SMC DECLINE message, which means that > the applications received SMC protocol message. > We found that this was caused by the following situations: > > client server > proposal > -------------> > accept > <------------- > confirm > -------------> > wait confirm > > failed llc confirm > x------ > (after 2s)timeout > wait rsp > > wait decline > > (after 1s) timeout > (after 2s) timeout > decline > --------------> > decline > <-------------- > > As a result, a decline message was sent in the implementation, and this > message was read from TCP by the already-fallback connection. > > This patch double the client timeout as 2x of the server value, > With this simple change, the Decline messages should never cross or > collide (during Confirm link timeout). > > This issue requires an immediate solution, since the protocol updates > involve a more long-term solution. > > Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> > --- > net/smc/af_smc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c > index abd2667..5b91f55 100644 > --- a/net/smc/af_smc.c > +++ b/net/smc/af_smc.c > @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) > int rc; > > /* receive CONFIRM LINK request from server over RoCE fabric */ > - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, > + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, > SMC_LLC_CONFIRM_LINK); > if (!qentry) { > struct smc_clc_msg_decline dclc; I'm wondering if the double time (if sufficient) of timeout could be for waiting for CLC_DECLINE on the client's side. i.e. diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 35ddebae8894..9b1feef1013d 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) struct smc_clc_msg_decline dclc; rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), - SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT); + SMC_CLC_DECLINE, 2 * CLC_WAIT_TIME_SHORT); return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; } smc_llc_save_peer_uid(qentry); Because the purpose is to let the server have the control to deline.
On Wed, 8 Nov 2023 17:48:29 +0800 D. Wythe wrote: > From: "D. Wythe" <alibuda@linux.alibaba.com> > > We found a data corruption issue during testing of SMC-R on Redis > applications. Please make sure you CC all relevant people pointed out by get_maintainers. Make sure you run get_maintainers on the generated patch, not just on file paths. You seem to have missed the following people: pabeni@redhat.com guwen@linux.alibaba.com tonylu@linux.alibaba.com edumazet@google.com
On 11/10/23 10:51 AM, D. Wythe wrote: > > > On 11/8/23 9:00 PM, Wenjia Zhang wrote: >> >> >> On 08.11.23 10:48, D. Wythe wrote: >>> From: "D. Wythe" <alibuda@linux.alibaba.com> >>> >>> We found a data corruption issue during testing of SMC-R on Redis >>> applications. >>> >>> The benchmark has a low probability of reporting a strange error as >>> shown below. >>> >>> "Error: Protocol error, got "\xe2" as reply type byte" >>> >>> Finally, we found that the retrieved error data was as follows: >>> >>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 >>> >>> It is quite obvious that this is a SMC DECLINE message, which means >>> that >>> the applications received SMC protocol message. >>> We found that this was caused by the following situations: >>> >>> client server >>> proposal >>> -------------> >>> accept >>> <------------- >>> confirm >>> -------------> >>> wait confirm >>> >>> failed llc confirm >>> x------ >>> (after 2s)timeout >>> wait rsp >>> >>> wait decline >>> >>> (after 1s) timeout >>> (after 2s) timeout >>> decline >>> --------------> >>> decline >>> <-------------- >>> >>> As a result, a decline message was sent in the implementation, and this >>> message was read from TCP by the already-fallback connection. >>> >>> This patch double the client timeout as 2x of the server value, >>> With this simple change, the Decline messages should never cross or >>> collide (during Confirm link timeout). >>> >>> This issue requires an immediate solution, since the protocol updates >>> involve a more long-term solution. >>> >>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC >>> flow") >>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >>> --- >>> net/smc/af_smc.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>> index abd2667..5b91f55 100644 >>> --- a/net/smc/af_smc.c >>> +++ b/net/smc/af_smc.c >>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct >>> smc_sock *smc) >>> int rc; >>> /* receive CONFIRM LINK request from server over RoCE fabric */ >>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, >>> SMC_LLC_CONFIRM_LINK); >>> if (!qentry) { >>> struct smc_clc_msg_decline dclc; >> I'm wondering if the double time (if sufficient) of timeout could be >> for waiting for CLC_DECLINE on the client's side. i.e. >> > > It depends. We can indeed introduce a sysctl to allow server to > manager their Confirm Link timeout, > but if there will be protocol updates, this introduction will no > longer be necessary, and we will > have to maintain it continuously. > > I believe the core of the solution is to ensure that decline messages > never cross or collide. Increasing > the client's timeout by twice as much as the server's timeout can > temporarily solve this problem. > If Jerry's proposed protocol updates are too complex or if there won't > be any future protocol updates, > it's still not late to let server manager their Confirm Link timeout then. > > Best wishes, > D. Wythe > FYI: It seems that my email was not successfully delivered due to some reasons. Sorry for that. D. Wythe >> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >> index 35ddebae8894..9b1feef1013d 100644 >> --- a/net/smc/af_smc.c >> +++ b/net/smc/af_smc.c >> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct >> smc_sock *smc) >> struct smc_clc_msg_decline dclc; >> >> rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), >> - SMC_CLC_DECLINE, >> CLC_WAIT_TIME_SHORT); >> + SMC_CLC_DECLINE, 2 * >> CLC_WAIT_TIME_SHORT); >> return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; >> } >> smc_llc_save_peer_uid(qentry); >> >> Because the purpose is to let the server have the control to deline. >
On Wed, Nov 08, 2023 at 05:48:29PM +0800, D. Wythe wrote: >From: "D. Wythe" <alibuda@linux.alibaba.com> > >We found a data corruption issue during testing of SMC-R on Redis >applications. > >The benchmark has a low probability of reporting a strange error as >shown below. > >"Error: Protocol error, got "\xe2" as reply type byte" > >Finally, we found that the retrieved error data was as follows: > >0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 > >It is quite obvious that this is a SMC DECLINE message, which means that >the applications received SMC protocol message. >We found that this was caused by the following situations: > >client server > proposal > -------------> > accept > <------------- > confirm > -------------> >wait confirm > > failed llc confirm > x------ >(after 2s)timeout > wait rsp > >wait decline > >(after 1s) timeout > (after 2s) timeout > decline > --------------> > decline > <-------------- > >As a result, a decline message was sent in the implementation, and this >message was read from TCP by the already-fallback connection. > >This patch double the client timeout as 2x of the server value, >With this simple change, the Decline messages should never cross or >collide (during Confirm link timeout). > >This issue requires an immediate solution, since the protocol updates >involve a more long-term solution. > >Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") >Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >--- > net/smc/af_smc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >index abd2667..5b91f55 100644 >--- a/net/smc/af_smc.c >+++ b/net/smc/af_smc.c >@@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) > int rc; > > /* receive CONFIRM LINK request from server over RoCE fabric */ >- qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >+ qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, > SMC_LLC_CONFIRM_LINK); It may be difficult for people to understand why LLC_WAIT_TIME is different, especially without any comments explaining its purpose. People are required to use git to find the reason, which I believe is not conducive to easy maintenance. Best regards, Dust > if (!qentry) { > struct smc_clc_msg_decline dclc; >-- >1.8.3.1
On 13.11.23 03:50, D. Wythe wrote: > > > On 11/10/23 10:51 AM, D. Wythe wrote: >> >> >> On 11/8/23 9:00 PM, Wenjia Zhang wrote: >>> >>> >>> On 08.11.23 10:48, D. Wythe wrote: >>>> From: "D. Wythe" <alibuda@linux.alibaba.com> >>>> >>>> We found a data corruption issue during testing of SMC-R on Redis >>>> applications. >>>> >>>> The benchmark has a low probability of reporting a strange error as >>>> shown below. >>>> >>>> "Error: Protocol error, got "\xe2" as reply type byte" >>>> >>>> Finally, we found that the retrieved error data was as follows: >>>> >>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 >>>> >>>> It is quite obvious that this is a SMC DECLINE message, which means >>>> that >>>> the applications received SMC protocol message. >>>> We found that this was caused by the following situations: >>>> >>>> client server >>>> proposal >>>> -------------> >>>> accept >>>> <------------- >>>> confirm >>>> -------------> >>>> wait confirm >>>> >>>> failed llc confirm >>>> x------ >>>> (after 2s)timeout >>>> wait rsp >>>> >>>> wait decline >>>> >>>> (after 1s) timeout >>>> (after 2s) timeout >>>> decline >>>> --------------> >>>> decline >>>> <-------------- >>>> >>>> As a result, a decline message was sent in the implementation, and this >>>> message was read from TCP by the already-fallback connection. >>>> >>>> This patch double the client timeout as 2x of the server value, >>>> With this simple change, the Decline messages should never cross or >>>> collide (during Confirm link timeout). >>>> >>>> This issue requires an immediate solution, since the protocol updates >>>> involve a more long-term solution. >>>> >>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC >>>> flow") >>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >>>> --- >>>> net/smc/af_smc.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>>> index abd2667..5b91f55 100644 >>>> --- a/net/smc/af_smc.c >>>> +++ b/net/smc/af_smc.c >>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct >>>> smc_sock *smc) >>>> int rc; >>>> /* receive CONFIRM LINK request from server over RoCE fabric */ >>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, >>>> SMC_LLC_CONFIRM_LINK); >>>> if (!qentry) { >>>> struct smc_clc_msg_decline dclc; >>> I'm wondering if the double time (if sufficient) of timeout could be >>> for waiting for CLC_DECLINE on the client's side. i.e. >>> >> >> It depends. We can indeed introduce a sysctl to allow server to >> manager their Confirm Link timeout, >> but if there will be protocol updates, this introduction will no >> longer be necessary, and we will >> have to maintain it continuously. >> no, I don't think, either, that we need a sysctl for that. >> I believe the core of the solution is to ensure that decline messages >> never cross or collide. Increasing >> the client's timeout by twice as much as the server's timeout can >> temporarily solve this problem. I have no objection with that, but my question is why you don't increase the timeout waiting for CLC_DECLINE instead of waiting LLC_Confirm_Link? Shouldn't they have the same effect? >> If Jerry's proposed protocol updates are too complex or if there won't >> be any future protocol updates, >> it's still not late to let server manager their Confirm Link timeout >> then. >> >> Best wishes, >> D. Wythe >> > > FYI: > > It seems that my email was not successfully delivered due to some > reasons. Sorry > for that. > > D. Wythe > > >>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>> index 35ddebae8894..9b1feef1013d 100644 >>> --- a/net/smc/af_smc.c >>> +++ b/net/smc/af_smc.c >>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct >>> smc_sock *smc) >>> struct smc_clc_msg_decline dclc; >>> >>> rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), >>> - SMC_CLC_DECLINE, >>> CLC_WAIT_TIME_SHORT); >>> + SMC_CLC_DECLINE, 2 * >>> CLC_WAIT_TIME_SHORT); >>> return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; >>> } >>> smc_llc_save_peer_uid(qentry); >>> >>> Because the purpose is to let the server have the control to deline. >> >
On 11/13/23 6:57 PM, Wenjia Zhang wrote: > > > On 13.11.23 03:50, D. Wythe wrote: >> >> >> On 11/10/23 10:51 AM, D. Wythe wrote: >>> >>> >>> On 11/8/23 9:00 PM, Wenjia Zhang wrote: >>>> >>>> >>>> On 08.11.23 10:48, D. Wythe wrote: >>>>> From: "D. Wythe" <alibuda@linux.alibaba.com> >>>>> >>>>> We found a data corruption issue during testing of SMC-R on Redis >>>>> applications. >>>>> >>>>> The benchmark has a low probability of reporting a strange error as >>>>> shown below. >>>>> >>>>> "Error: Protocol error, got "\xe2" as reply type byte" >>>>> >>>>> Finally, we found that the retrieved error data was as follows: >>>>> >>>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >>>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >>>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 >>>>> >>>>> It is quite obvious that this is a SMC DECLINE message, which >>>>> means that >>>>> the applications received SMC protocol message. >>>>> We found that this was caused by the following situations: >>>>> >>>>> client server >>>>> proposal >>>>> -------------> >>>>> accept >>>>> <------------- >>>>> confirm >>>>> -------------> >>>>> wait confirm >>>>> >>>>> failed llc confirm >>>>> x------ >>>>> (after 2s)timeout >>>>> wait rsp >>>>> >>>>> wait decline >>>>> >>>>> (after 1s) timeout >>>>> (after 2s) timeout >>>>> decline >>>>> --------------> >>>>> decline >>>>> <-------------- >>>>> >>>>> As a result, a decline message was sent in the implementation, and >>>>> this >>>>> message was read from TCP by the already-fallback connection. >>>>> >>>>> This patch double the client timeout as 2x of the server value, >>>>> With this simple change, the Decline messages should never cross or >>>>> collide (during Confirm link timeout). >>>>> >>>>> This issue requires an immediate solution, since the protocol updates >>>>> involve a more long-term solution. >>>>> >>>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the >>>>> LLC flow") >>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >>>>> --- >>>>> net/smc/af_smc.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>>>> index abd2667..5b91f55 100644 >>>>> --- a/net/smc/af_smc.c >>>>> +++ b/net/smc/af_smc.c >>>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct >>>>> smc_sock *smc) >>>>> int rc; >>>>> /* receive CONFIRM LINK request from server over RoCE >>>>> fabric */ >>>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >>>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, >>>>> SMC_LLC_CONFIRM_LINK); >>>>> if (!qentry) { >>>>> struct smc_clc_msg_decline dclc; >>>> I'm wondering if the double time (if sufficient) of timeout could >>>> be for waiting for CLC_DECLINE on the client's side. i.e. >>>> >>> >>> It depends. We can indeed introduce a sysctl to allow server to >>> manager their Confirm Link timeout, >>> but if there will be protocol updates, this introduction will no >>> longer be necessary, and we will >>> have to maintain it continuously. >>> > no, I don't think, either, that we need a sysctl for that. I am okay about that. >>> I believe the core of the solution is to ensure that decline >>> messages never cross or collide. Increasing >>> the client's timeout by twice as much as the server's timeout can >>> temporarily solve this problem. > > I have no objection with that, but my question is why you don't > increase the timeout waiting for CLC_DECLINE instead of waiting > LLC_Confirm_Link? Shouldn't they have the same effect? > Logically speaking, of course, they have the same effect, but there are two reasons that i choose to increase LLC timeout here: 1. to avoid DECLINE cross or collide, we need a bigger time gap, a simple math is 2 ( LLC_Confirm_Link) + 1 (CLC_DECLINE) = 3 2 (LLC_Confirm_Link) + 1 * 2 (CLC_DECLINE) = 4 2 * 2(LLC_Confirm_Link) + 1 (CLC_DECLINE) = 5 Obviously, double the LLC_Confirm_Link will result in more time gaps. 2. increase LLC timeout to allow as many RDMA link as possible to succeed, rather than fallback. D. Wythe >>> If Jerry's proposed protocol updates are too complex or if there >>> won't be any future protocol updates, >>> it's still not late to let server manager their Confirm Link timeout >>> then. >>> >>> Best wishes, >>> D. Wythe >>> >> >> FYI: >> >> It seems that my email was not successfully delivered due to some >> reasons. Sorry >> for that. >> >> D. Wythe >> >> > >>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>>> index 35ddebae8894..9b1feef1013d 100644 >>>> --- a/net/smc/af_smc.c >>>> +++ b/net/smc/af_smc.c >>>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct >>>> smc_sock *smc) >>>> struct smc_clc_msg_decline dclc; >>>> >>>> rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), >>>> - SMC_CLC_DECLINE, >>>> CLC_WAIT_TIME_SHORT); >>>> + SMC_CLC_DECLINE, 2 * >>>> CLC_WAIT_TIME_SHORT); >>>> return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; >>>> } >>>> smc_llc_save_peer_uid(qentry); >>>> >>>> Because the purpose is to let the server have the control to deline. >>> >>
On 14.11.23 10:52, D. Wythe wrote: > > > On 11/13/23 6:57 PM, Wenjia Zhang wrote: >> >> >> On 13.11.23 03:50, D. Wythe wrote: >>> >>> >>> On 11/10/23 10:51 AM, D. Wythe wrote: >>>> >>>> >>>> On 11/8/23 9:00 PM, Wenjia Zhang wrote: >>>>> >>>>> >>>>> On 08.11.23 10:48, D. Wythe wrote: >>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com> >>>>>> >>>>>> We found a data corruption issue during testing of SMC-R on Redis >>>>>> applications. >>>>>> >>>>>> The benchmark has a low probability of reporting a strange error as >>>>>> shown below. >>>>>> >>>>>> "Error: Protocol error, got "\xe2" as reply type byte" >>>>>> >>>>>> Finally, we found that the retrieved error data was as follows: >>>>>> >>>>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >>>>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >>>>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 >>>>>> >>>>>> It is quite obvious that this is a SMC DECLINE message, which >>>>>> means that >>>>>> the applications received SMC protocol message. >>>>>> We found that this was caused by the following situations: >>>>>> >>>>>> client server >>>>>> proposal >>>>>> -------------> >>>>>> accept >>>>>> <------------- >>>>>> confirm >>>>>> -------------> >>>>>> wait confirm >>>>>> >>>>>> failed llc confirm >>>>>> x------ >>>>>> (after 2s)timeout >>>>>> wait rsp >>>>>> >>>>>> wait decline >>>>>> >>>>>> (after 1s) timeout >>>>>> (after 2s) timeout >>>>>> decline >>>>>> --------------> >>>>>> decline >>>>>> <-------------- >>>>>> >>>>>> As a result, a decline message was sent in the implementation, and >>>>>> this >>>>>> message was read from TCP by the already-fallback connection. >>>>>> >>>>>> This patch double the client timeout as 2x of the server value, >>>>>> With this simple change, the Decline messages should never cross or >>>>>> collide (during Confirm link timeout). >>>>>> >>>>>> This issue requires an immediate solution, since the protocol updates >>>>>> involve a more long-term solution. >>>>>> >>>>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the >>>>>> LLC flow") >>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >>>>>> --- >>>>>> net/smc/af_smc.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>>>>> index abd2667..5b91f55 100644 >>>>>> --- a/net/smc/af_smc.c >>>>>> +++ b/net/smc/af_smc.c >>>>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct >>>>>> smc_sock *smc) >>>>>> int rc; >>>>>> /* receive CONFIRM LINK request from server over RoCE >>>>>> fabric */ >>>>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >>>>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, >>>>>> SMC_LLC_CONFIRM_LINK); >>>>>> if (!qentry) { >>>>>> struct smc_clc_msg_decline dclc; >>>>> I'm wondering if the double time (if sufficient) of timeout could >>>>> be for waiting for CLC_DECLINE on the client's side. i.e. >>>>> >>>> >>>> It depends. We can indeed introduce a sysctl to allow server to >>>> manager their Confirm Link timeout, >>>> but if there will be protocol updates, this introduction will no >>>> longer be necessary, and we will >>>> have to maintain it continuously. >>>> >> no, I don't think, either, that we need a sysctl for that. > > I am okay about that. > >>>> I believe the core of the solution is to ensure that decline >>>> messages never cross or collide. Increasing >>>> the client's timeout by twice as much as the server's timeout can >>>> temporarily solve this problem. >> >> I have no objection with that, but my question is why you don't >> increase the timeout waiting for CLC_DECLINE instead of waiting >> LLC_Confirm_Link? Shouldn't they have the same effect? >> > > Logically speaking, of course, they have the same effect, but there are > two reasons that i choose to increase LLC timeout here: > > 1. to avoid DECLINE cross or collide, we need a bigger time gap, a > simple math is > > 2 ( LLC_Confirm_Link) + 1 (CLC_DECLINE) = 3 > 2 (LLC_Confirm_Link) + 1 * 2 (CLC_DECLINE) = 4 > 2 * 2(LLC_Confirm_Link) + 1 (CLC_DECLINE) = 5 > > Obviously, double the LLC_Confirm_Link will result in more time gaps. > That's already clear to me. That's why I stressed "(if sufficient)". > 2. increase LLC timeout to allow as many RDMA link as possible to > succeed, rather than fallback. > ok, that sounds reasonable. And I think that's the answer which persuaded me. Thank you! > D. Wythe > >>>> If Jerry's proposed protocol updates are too complex or if there >>>> won't be any future protocol updates, >>>> it's still not late to let server manager their Confirm Link timeout >>>> then. >>>> >>>> Best wishes, >>>> D. Wythe >>>> >>> >>> FYI: >>> >>> It seems that my email was not successfully delivered due to some >>> reasons. Sorry >>> for that. >>> >>> D. Wythe >>> >>> >> >>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>>>> index 35ddebae8894..9b1feef1013d 100644 >>>>> --- a/net/smc/af_smc.c >>>>> +++ b/net/smc/af_smc.c >>>>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct >>>>> smc_sock *smc) >>>>> struct smc_clc_msg_decline dclc; >>>>> >>>>> rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), >>>>> - SMC_CLC_DECLINE, >>>>> CLC_WAIT_TIME_SHORT); >>>>> + SMC_CLC_DECLINE, 2 * >>>>> CLC_WAIT_TIME_SHORT); >>>>> return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; >>>>> } >>>>> smc_llc_save_peer_uid(qentry); >>>>> >>>>> Because the purpose is to let the server have the control to deline. >>>> >>> >
On 13.11.23 04:44, Dust Li wrote: > On Wed, Nov 08, 2023 at 05:48:29PM +0800, D. Wythe wrote: >> From: "D. Wythe" <alibuda@linux.alibaba.com> >> >> We found a data corruption issue during testing of SMC-R on Redis >> applications. >> >> The benchmark has a low probability of reporting a strange error as >> shown below. >> >> "Error: Protocol error, got "\xe2" as reply type byte" >> >> Finally, we found that the retrieved error data was as follows: >> >> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 >> >> It is quite obvious that this is a SMC DECLINE message, which means that >> the applications received SMC protocol message. >> We found that this was caused by the following situations: >> >> client server >> proposal >> -------------> >> accept >> <------------- >> confirm >> -------------> >> wait confirm >> >> failed llc confirm >> x------ >> (after 2s)timeout >> wait rsp >> >> wait decline >> >> (after 1s) timeout >> (after 2s) timeout >> decline >> --------------> >> decline >> <-------------- >> >> As a result, a decline message was sent in the implementation, and this >> message was read from TCP by the already-fallback connection. >> >> This patch double the client timeout as 2x of the server value, >> With this simple change, the Decline messages should never cross or >> collide (during Confirm link timeout). >> >> This issue requires an immediate solution, since the protocol updates >> involve a more long-term solution. >> >> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") >> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >> --- >> net/smc/af_smc.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >> index abd2667..5b91f55 100644 >> --- a/net/smc/af_smc.c >> +++ b/net/smc/af_smc.c >> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) >> int rc; >> >> /* receive CONFIRM LINK request from server over RoCE fabric */ >> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, >> SMC_LLC_CONFIRM_LINK); > > It may be difficult for people to understand why LLC_WAIT_TIME is > different, especially without any comments explaining its purpose. > People are required to use git to find the reason, which I believe is > not conducive to easy maintenance. > > Best regards, > Dust > > Good point! @D.Wythe, could you please try to add a simple commet to explain it? Thanks, Wenjia > >> if (!qentry) { >> struct smc_clc_msg_decline dclc; >> -- >> 1.8.3.1
On 11/15/23 10:06 PM, Wenjia Zhang wrote: > > > On 13.11.23 04:44, Dust Li wrote: >> On Wed, Nov 08, 2023 at 05:48:29PM +0800, D. Wythe wrote: >>> From: "D. Wythe" <alibuda@linux.alibaba.com> >>> >>> We found a data corruption issue during testing of SMC-R on Redis >>> applications. >>> >>> The benchmark has a low probability of reporting a strange error as >>> shown below. >>> >>> "Error: Protocol error, got "\xe2" as reply type byte" >>> >>> Finally, we found that the retrieved error data was as follows: >>> >>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C >>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 >>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 >>> >>> It is quite obvious that this is a SMC DECLINE message, which means >>> that >>> the applications received SMC protocol message. >>> We found that this was caused by the following situations: >>> >>> client server >>> proposal >>> -------------> >>> accept >>> <------------- >>> confirm >>> -------------> >>> wait confirm >>> >>> failed llc confirm >>> x------ >>> (after 2s)timeout >>> wait rsp >>> >>> wait decline >>> >>> (after 1s) timeout >>> (after 2s) timeout >>> decline >>> --------------> >>> decline >>> <-------------- >>> >>> As a result, a decline message was sent in the implementation, and this >>> message was read from TCP by the already-fallback connection. >>> >>> This patch double the client timeout as 2x of the server value, >>> With this simple change, the Decline messages should never cross or >>> collide (during Confirm link timeout). >>> >>> This issue requires an immediate solution, since the protocol updates >>> involve a more long-term solution. >>> >>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC >>> flow") >>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> >>> --- >>> net/smc/af_smc.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >>> index abd2667..5b91f55 100644 >>> --- a/net/smc/af_smc.c >>> +++ b/net/smc/af_smc.c >>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct >>> smc_sock *smc) >>> int rc; >>> >>> /* receive CONFIRM LINK request from server over RoCE fabric */ >>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, >>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, >>> SMC_LLC_CONFIRM_LINK); >> >> It may be difficult for people to understand why LLC_WAIT_TIME is >> different, especially without any comments explaining its purpose. >> People are required to use git to find the reason, which I believe is >> not conducive to easy maintenance. >> >> Best regards, >> Dust >> >> > Good point! @D.Wythe, could you please try to add a simple commet to > explain it? > Also good to me, i will add comment to explain it. D. Wythe > Thanks, > Wenjia >> >>> if (!qentry) { >>> struct smc_clc_msg_decline dclc; >>> -- >>> 1.8.3.1
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index abd2667..5b91f55 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) int rc; /* receive CONFIRM LINK request from server over RoCE fabric */ - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, SMC_LLC_CONFIRM_LINK); if (!qentry) { struct smc_clc_msg_decline dclc;