Message ID | 20210305142407.23652-1-aaptel@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cifs: try to pick channel with a minimum of credits | expand |
The comment caught my attention - is that accurate? When a channel has fewer than 3 credits (assuming we had two reserved for oplock and echo), that doesn't mean that there are fewer than 3 credits in flight - just that current credits available is lower right? So the channel could still recover as long as current credits + credits in flight is at least 3. +/* + * Min number of credits for a channel to be picked. + * + * Note that once a channel reaches this threshold it will never be + * picked again as no credits can be requested from it. + */ +#define CIFS_CHANNEL_MIN_CREDITS 3 On Fri, Mar 5, 2021 at 8:24 AM Aurélien Aptel <aaptel@suse.com> wrote: > > From: Aurelien Aptel <aaptel@suse.com> > > Check channel credits to prevent the client from using a starved > channel that cannot send anything. > > Special care must be taken in selecting the minimum value: when > channels are created they start off with a small amount that slowly > ramps up as the channel gets used. Thus a new channel might never be > picked if the min value is too small. > > Signed-off-by: Aurelien Aptel <aaptel@suse.com> > --- > fs/cifs/transport.c | 57 ++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 49 insertions(+), 8 deletions(-) > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > index e90a1d1380b0..7bb1584b3724 100644 > --- a/fs/cifs/transport.c > +++ b/fs/cifs/transport.c > @@ -44,6 +44,14 @@ > /* Max number of iovectors we can use off the stack when sending requests. */ > #define CIFS_MAX_IOV_SIZE 8 > > +/* > + * Min number of credits for a channel to be picked. > + * > + * Note that once a channel reaches this threshold it will never be > + * picked again as no credits can be requested from it. > + */ > +#define CIFS_CHANNEL_MIN_CREDITS 3 > + > void > cifs_wake_up_task(struct mid_q_entry *mid) > { > @@ -1051,20 +1059,53 @@ cifs_cancelled_callback(struct mid_q_entry *mid) > struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) > { > uint index = 0; > + struct TCP_Server_Info *s; > > if (!ses) > return NULL; > > - if (!ses->binding) { > - /* round robin */ > - if (ses->chan_count > 1) { > - index = (uint)atomic_inc_return(&ses->chan_seq); > - index %= ses->chan_count; > - } > - return ses->chans[index].server; > - } else { > + if (ses->binding) > return cifs_ses_server(ses); > + > + /* > + * Channels are created right after the session is made. The > + * count cannot change after that so it is not racy to check. > + */ > + if (ses->chan_count == 1) > + return ses->chans[index].server; > + > + /* round robin */ > + index = (uint)atomic_inc_return(&ses->chan_seq); > + index %= ses->chan_count; > + s = ses->chans[index].server; > + > + /* > + * Checking server credits is racy, but getting a slightly > + * stale value should not be an issue here > + */ > + if (s->credits <= CIFS_CHANNEL_MIN_CREDITS) { > + uint i; > + > + cifs_dbg(VFS, "cannot pick conn_id=0x%llx not enough credits (%u)", > + s->conn_id, > + s->credits); > + > + /* > + * Look at all other channels starting from the next > + * one and pick first possible channel. > + */ > + for (i = 1; i < ses->chan_count; i++) { > + s = ses->chans[(index+i) % ses->chan_count].server; > + if (s->credits > CIFS_CHANNEL_MIN_CREDITS) > + return s; > + } > } > + > + /* > + * If none are possible, keep the initially picked one, but > + * later on it will block to wait for credits or fail. > + */ > + return ses->chans[index].server; > } > > int > -- > 2.30.0 >
Spent some time in this code path. Seems like this is more complicated than I initially thought. @Aurélien Aptel A few things to consider: 1. What if the credits that will be needed by the request is more than 3 (or any constant). We may still end up returning -EDEADLK to the user when we don't find enough credits, and there are not enough in_flight to satisfy the request. Ideally, we should still try other channels. 2. Echo and oplock use 1 reserved credit each, which the regular operations can steal, in case of shortage. 3. Reading server->credits without a lock can result in wildly different values, since some CPU architectures may not update it atomically. At worse, we may select a channel which may not have enough credits when we get to it. What if we do this... wait_for_compound_request() can return -EDEADLK when it doesn't get enough credits, and there are no requests in_flight. In compound_send_recv(), after we call wait_for_compound_request(), we can check it's return value. If it's -EDEADLK, we keep calling cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for channels already selected and rejected) and wait_for_compound_request() in a loop till we find a channel which has enough credits, or we run out of channels; in which case we can return -EDEADLK. Makes sense? Do you see a problem with that approach? Regards, Shyam On Fri, Mar 5, 2021 at 9:40 PM Steve French <smfrench@gmail.com> wrote: > > The comment caught my attention - is that accurate? When a channel > has fewer than 3 credits (assuming we had two reserved for oplock and > echo), that doesn't mean that there are fewer than 3 credits in flight > - just that current credits available is lower right? So the channel > could still recover as long as current credits + credits in flight is > at least 3. > > +/* > + * Min number of credits for a channel to be picked. > + * > + * Note that once a channel reaches this threshold it will never be > + * picked again as no credits can be requested from it. > + */ > +#define CIFS_CHANNEL_MIN_CREDITS 3 > > On Fri, Mar 5, 2021 at 8:24 AM Aurélien Aptel <aaptel@suse.com> wrote: > > > > From: Aurelien Aptel <aaptel@suse.com> > > > > Check channel credits to prevent the client from using a starved > > channel that cannot send anything. > > > > Special care must be taken in selecting the minimum value: when > > channels are created they start off with a small amount that slowly > > ramps up as the channel gets used. Thus a new channel might never be > > picked if the min value is too small. > > > > Signed-off-by: Aurelien Aptel <aaptel@suse.com> > > --- > > fs/cifs/transport.c | 57 ++++++++++++++++++++++++++++++++++++++------- > > 1 file changed, 49 insertions(+), 8 deletions(-) > > > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c > > index e90a1d1380b0..7bb1584b3724 100644 > > --- a/fs/cifs/transport.c > > +++ b/fs/cifs/transport.c > > @@ -44,6 +44,14 @@ > > /* Max number of iovectors we can use off the stack when sending requests. */ > > #define CIFS_MAX_IOV_SIZE 8 > > > > +/* > > + * Min number of credits for a channel to be picked. > > + * > > + * Note that once a channel reaches this threshold it will never be > > + * picked again as no credits can be requested from it. > > + */ > > +#define CIFS_CHANNEL_MIN_CREDITS 3 > > + > > void > > cifs_wake_up_task(struct mid_q_entry *mid) > > { > > @@ -1051,20 +1059,53 @@ cifs_cancelled_callback(struct mid_q_entry *mid) > > struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) > > { > > uint index = 0; > > + struct TCP_Server_Info *s; > > > > if (!ses) > > return NULL; > > > > - if (!ses->binding) { > > - /* round robin */ > > - if (ses->chan_count > 1) { > > - index = (uint)atomic_inc_return(&ses->chan_seq); > > - index %= ses->chan_count; > > - } > > - return ses->chans[index].server; > > - } else { > > + if (ses->binding) > > return cifs_ses_server(ses); > > + > > + /* > > + * Channels are created right after the session is made. The > > + * count cannot change after that so it is not racy to check. > > + */ > > + if (ses->chan_count == 1) > > + return ses->chans[index].server; > > + > > + /* round robin */ > > + index = (uint)atomic_inc_return(&ses->chan_seq); > > + index %= ses->chan_count; > > + s = ses->chans[index].server; > > + > > + /* > > + * Checking server credits is racy, but getting a slightly > > + * stale value should not be an issue here > > + */ > > + if (s->credits <= CIFS_CHANNEL_MIN_CREDITS) { > > + uint i; > > + > > + cifs_dbg(VFS, "cannot pick conn_id=0x%llx not enough credits (%u)", > > + s->conn_id, > > + s->credits); > > + > > + /* > > + * Look at all other channels starting from the next > > + * one and pick first possible channel. > > + */ > > + for (i = 1; i < ses->chan_count; i++) { > > + s = ses->chans[(index+i) % ses->chan_count].server; > > + if (s->credits > CIFS_CHANNEL_MIN_CREDITS) > > + return s; > > + } > > } > > + > > + /* > > + * If none are possible, keep the initially picked one, but > > + * later on it will block to wait for credits or fail. > > + */ > > + return ses->chans[index].server; > > } > > > > int > > -- > > 2.30.0 > > > > > -- > Thanks, > > Steve
Hi Shyam, Thanks for the review. I also realized we cannot make this failproof so I went in with the idea to just avoid picking easy, non-confusing cases of unusable channels. If that's not good we can drop the patch all together. Shyam Prasad N <nspmangalore@gmail.com> writes: > Spent some time in this code path. Seems like this is more complicated > than I initially thought. > @Aurélien Aptel A few things to consider: > 1. What if the credits that will be needed by the request is more than > 3 (or any constant). We may still end up returning -EDEADLK to the > user when we don't find enough credits, and there are not enough > in_flight to satisfy the request. Ideally, we should still try other > channels. Yes you're right, it won't prevent failing if more credits are needed. The patch wasn't meant to be failproof, just to avoid most occurences of the problem. It's a simple sanity check with some false-positives and false-negatives. > 2. Echo and oplock use 1 reserved credit each, which the regular > operations can steal, in case of shortage. There's a dedicated server->echo_credit I think. > 3. Reading server->credits without a lock can result in wildly > different values, since some CPU architectures may not update it > atomically. At worse, we may select a channel which may not have > enough credits when we get to it If we are just doing a read on an aligned int, at least on x86 you will get either a stale value or the new value, you cannot get a garbage mix of both. Which CPU architecture doesn't provide cache coherency at that level? This is a complex question I couldn't find an easy answer to. In any case cifs.ko is already assuming it's valid in various places. We are doing it for some usage of the server->tcpStatus, ses->status, tcon->tidStatus at least. The problems of atomic read in pick_channel() aside, the credits might change from another process between the moment the channel is picked and the moment the credits needed are spent (server->credit -= XYZ). In which case it will also EDEADLK later on. > What if we do this... > wait_for_compound_request() can return -EDEADLK when it doesn't get > enough credits, and there are no requests in_flight. > In compound_send_recv(), after we call wait_for_compound_request(), we > can check it's return value. If it's -EDEADLK, we keep calling > cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for > channels already selected and rejected) and > wait_for_compound_request() in a loop till we find a channel which has > enough credits, or we run out of channels; in which case we can return > -EDEADLK. > Makes sense? Do you see a problem with that approach? Some code relies on server->* values so if you swap the server pointer it at the last moment I'm not sure those values will match the new server ptr. Cheers,
Hm there's a typo in my commit msg: Special care must be taken in selecting the minimum value: when channels are created they start off with a small amount that slowly ramps up as the channel gets used. Thus a new channel might never be picked if the min value is too small. ^^^^^^^ should be "too big" Cheers,
> Some code relies on server->* values so if you swap the server pointer > it at the last moment I'm not sure those values will match the new > server ptr. I'm not sure that I understand this. I'm not suggesting that we swap. I'm only saying that on getting a EDEADLK error from compound_send_recv(), try another channel instead of returning that to userspace. Please let me know if I'm missing something here. Regards, Shyam On Mon, Mar 8, 2021 at 5:22 PM Aurélien Aptel <aaptel@suse.com> wrote: > > Hi Shyam, > > Thanks for the review. I also realized we cannot make this failproof so > I went in with the idea to just avoid picking easy, non-confusing cases > of unusable channels. If that's not good we can drop the patch all > together. > > Shyam Prasad N <nspmangalore@gmail.com> writes: > > Spent some time in this code path. Seems like this is more complicated > > than I initially thought. > > @Aurélien Aptel A few things to consider: > > 1. What if the credits that will be needed by the request is more than > > 3 (or any constant). We may still end up returning -EDEADLK to the > > user when we don't find enough credits, and there are not enough > > in_flight to satisfy the request. Ideally, we should still try other > > channels. > > Yes you're right, it won't prevent failing if more credits are > needed. The patch wasn't meant to be failproof, just to avoid most > occurences of the problem. It's a simple sanity check with some > false-positives and false-negatives. > > > 2. Echo and oplock use 1 reserved credit each, which the regular > > operations can steal, in case of shortage. > > There's a dedicated server->echo_credit I think. > > > 3. Reading server->credits without a lock can result in wildly > > different values, since some CPU architectures may not update it > > atomically. At worse, we may select a channel which may not have > > enough credits when we get to it > > If we are just doing a read on an aligned int, at least on x86 you will > get either a stale value or the new value, you cannot get a garbage mix > of both. Which CPU architecture doesn't provide cache coherency at that > level? This is a complex question I couldn't find an easy answer to. > > In any case cifs.ko is already assuming it's valid in various places. We > are doing it for some usage of the server->tcpStatus, ses->status, > tcon->tidStatus at least. > > The problems of atomic read in pick_channel() aside, the credits might > change from another process between the moment the channel is picked and > the moment the credits needed are spent (server->credit -= XYZ). In > which case it will also EDEADLK later on. > > > What if we do this... > > wait_for_compound_request() can return -EDEADLK when it doesn't get > > enough credits, and there are no requests in_flight. > > In compound_send_recv(), after we call wait_for_compound_request(), we > > can check it's return value. If it's -EDEADLK, we keep calling > > cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for > > channels already selected and rejected) and > > wait_for_compound_request() in a loop till we find a channel which has > > enough credits, or we run out of channels; in which case we can return > > -EDEADLK. > > Makes sense? Do you see a problem with that approach? > > Some code relies on server->* values so if you swap the server pointer > it at the last moment I'm not sure those values will match the new > server ptr. > > Cheers, > -- > Aurélien Aptel / SUSE Labs Samba Team > GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3 > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München) >
Discussed this with Aurelien. > I'm only saying that on getting a EDEADLK error from > compound_send_recv(), try another channel instead of returning that to > userspace. We both agreed that this will be a cleaner way to deal with the issue. However, he pointed to me that the code changes will be more than what I initially thought. That needs more investigation. The most likely case to hit the problem of EDEADLK is when the first request on the channel is bigger than what we need. The proposed fix should avoid that for basic requests by switching channels (pending testing). However, a large read/write could still result in EDEADLK error being returned. Regards, Shyam On Thu, Mar 11, 2021 at 4:19 PM Shyam Prasad N <nspmangalore@gmail.com> wrote: > > > Some code relies on server->* values so if you swap the server pointer > > it at the last moment I'm not sure those values will match the new > > server ptr. > > I'm not sure that I understand this. I'm not suggesting that we swap. > I'm only saying that on getting a EDEADLK error from > compound_send_recv(), try another channel instead of returning that to > userspace. > Please let me know if I'm missing something here. > > Regards, > Shyam > > On Mon, Mar 8, 2021 at 5:22 PM Aurélien Aptel <aaptel@suse.com> wrote: > > > > Hi Shyam, > > > > Thanks for the review. I also realized we cannot make this failproof so > > I went in with the idea to just avoid picking easy, non-confusing cases > > of unusable channels. If that's not good we can drop the patch all > > together. > > > > Shyam Prasad N <nspmangalore@gmail.com> writes: > > > Spent some time in this code path. Seems like this is more complicated > > > than I initially thought. > > > @Aurélien Aptel A few things to consider: > > > 1. What if the credits that will be needed by the request is more than > > > 3 (or any constant). We may still end up returning -EDEADLK to the > > > user when we don't find enough credits, and there are not enough > > > in_flight to satisfy the request. Ideally, we should still try other > > > channels. > > > > Yes you're right, it won't prevent failing if more credits are > > needed. The patch wasn't meant to be failproof, just to avoid most > > occurences of the problem. It's a simple sanity check with some > > false-positives and false-negatives. > > > > > 2. Echo and oplock use 1 reserved credit each, which the regular > > > operations can steal, in case of shortage. > > > > There's a dedicated server->echo_credit I think. > > > > > 3. Reading server->credits without a lock can result in wildly > > > different values, since some CPU architectures may not update it > > > atomically. At worse, we may select a channel which may not have > > > enough credits when we get to it > > > > If we are just doing a read on an aligned int, at least on x86 you will > > get either a stale value or the new value, you cannot get a garbage mix > > of both. Which CPU architecture doesn't provide cache coherency at that > > level? This is a complex question I couldn't find an easy answer to. > > > > In any case cifs.ko is already assuming it's valid in various places. We > > are doing it for some usage of the server->tcpStatus, ses->status, > > tcon->tidStatus at least. > > > > The problems of atomic read in pick_channel() aside, the credits might > > change from another process between the moment the channel is picked and > > the moment the credits needed are spent (server->credit -= XYZ). In > > which case it will also EDEADLK later on. > > > > > What if we do this... > > > wait_for_compound_request() can return -EDEADLK when it doesn't get > > > enough credits, and there are no requests in_flight. > > > In compound_send_recv(), after we call wait_for_compound_request(), we > > > can check it's return value. If it's -EDEADLK, we keep calling > > > cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for > > > channels already selected and rejected) and > > > wait_for_compound_request() in a loop till we find a channel which has > > > enough credits, or we run out of channels; in which case we can return > > > -EDEADLK. > > > Makes sense? Do you see a problem with that approach? > > > > Some code relies on server->* values so if you swap the server pointer > > it at the last moment I'm not sure those values will match the new > > server ptr. > > > > Cheers, > > -- > > Aurélien Aptel / SUSE Labs Samba Team > > GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3 > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München) > > > > > -- > Regards, > Shyam
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index e90a1d1380b0..7bb1584b3724 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -44,6 +44,14 @@ /* Max number of iovectors we can use off the stack when sending requests. */ #define CIFS_MAX_IOV_SIZE 8 +/* + * Min number of credits for a channel to be picked. + * + * Note that once a channel reaches this threshold it will never be + * picked again as no credits can be requested from it. + */ +#define CIFS_CHANNEL_MIN_CREDITS 3 + void cifs_wake_up_task(struct mid_q_entry *mid) { @@ -1051,20 +1059,53 @@ cifs_cancelled_callback(struct mid_q_entry *mid) struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) { uint index = 0; + struct TCP_Server_Info *s; if (!ses) return NULL; - if (!ses->binding) { - /* round robin */ - if (ses->chan_count > 1) { - index = (uint)atomic_inc_return(&ses->chan_seq); - index %= ses->chan_count; - } - return ses->chans[index].server; - } else { + if (ses->binding) return cifs_ses_server(ses); + + /* + * Channels are created right after the session is made. The + * count cannot change after that so it is not racy to check. + */ + if (ses->chan_count == 1) + return ses->chans[index].server; + + /* round robin */ + index = (uint)atomic_inc_return(&ses->chan_seq); + index %= ses->chan_count; + s = ses->chans[index].server; + + /* + * Checking server credits is racy, but getting a slightly + * stale value should not be an issue here + */ + if (s->credits <= CIFS_CHANNEL_MIN_CREDITS) { + uint i; + + cifs_dbg(VFS, "cannot pick conn_id=0x%llx not enough credits (%u)", + s->conn_id, + s->credits); + + /* + * Look at all other channels starting from the next + * one and pick first possible channel. + */ + for (i = 1; i < ses->chan_count; i++) { + s = ses->chans[(index+i) % ses->chan_count].server; + if (s->credits > CIFS_CHANNEL_MIN_CREDITS) + return s; + } } + + /* + * If none are possible, keep the initially picked one, but + * later on it will block to wait for credits or fail. + */ + return ses->chans[index].server; } int