Message ID | 1ee5ce7818f9d45c9713ce99e810cb84f50dcf03.1552907276.git.lorenzo@kernel.org (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Felix Fietkau |
Series | [RFC] mt76: usb: reduce locking in mt76u_tx_tasklet |

On Mon, Mar 18, 2019 at 12:09:32PM +0100, Lorenzo Bianconi wrote:
> Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
> q->head is managed just in mt76u_tx_tasklet and q->queued is updated
> holding q->lock
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> index ac03acdae279..8cd70c32d77a 100644
> --- a/drivers/net/wireless/mediatek/mt76/usb.c
> +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> @@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
>  	int i;
>
>  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> +		u32 n_queued = 0, n_sw_queued = 0;
> +
>  		sq = &dev->q_tx[i];
>  		q = sq->q;
>
> -		spin_lock_bh(&q->lock);
> -		while (true) {
> +		while (q->queued > n_queued) {
>  			buf = &q->entry[q->head].ubuf;
> -			if (!buf->done || !q->queued)
> +			if (!buf->done)
>  				break;

I'm still thinking if this is safe or not. Is somewhat tricky to
read variable outside the lock because in such case there is no time
guarantee when variable written on one CPU gets updated value on
different CPU. And for USB is not only q->queued but also buf->done.

Stanislaw

> I'm still thinking if this is safe or not. Is somewhat tricky to
> read variable outside the lock because in such case there is no time
> guarantee when variable written on one CPU gets updated value on
> different CPU. And for USB is not only q->queued but also buf->done.

Hi Stanislaw,

I was wondering if this is safe as well, but q->queued is updated holding q->lock
and I guess it will ensure to not overlap tx and status code path.
Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx

Regards,
Lorenzo

>
> Stanislaw

On Tue, Mar 19, 2019 at 01:58:13PM +0100, Lorenzo Bianconi wrote:
> > I'm still thinking if this is safe or not. Is somewhat tricky to
> > read variable outside the lock because in such case there is no time
> > guarantee when variable written on one CPU gets updated value on
> > different CPU. And for USB is not only q->queued but also buf->done.
>
> Hi Stanislaw,
>
> I was wondering if this is safe as well, but q->queued is updated holding q->lock
> and I guess it will ensure to not overlap tx and status code path.

Overlap will not happen, at worst what can happen is q->queued will be
smaller on tx_tasklet than on tx_queue_skb.

> Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx

That's actually a bug, but it's not important, if tx_tasklet will not
see updated buf->done <- true value by mt76u_complete_tx on different
cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
Worse thing would be opposite situation.

Stanislaw

> > I was wondering if this is safe as well, but q->queued is updated holding q->lock
> > and I guess it will ensure to not overlap tx and status code path.
>
> Overlap will not happen, at worst what can happen is q->queued will be
> smaller on tx_tasklet than on tx_queue_skb.

Yes, that is the point :)

> > Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx
>
> That's actually a bug, but it's not important, if tx_tasklet will not
> see updated buf->done <- true value by mt76u_complete_tx on different
> cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
> Worse thing would be opposite situation.

Can this really occur? (since queued is update holding the lock)

>
> Stanislaw

On Tue, Mar 19, 2019 at 05:23:25PM +0100, Lorenzo Bianconi wrote:
> > Overlap will not happen, at worst what can happen is q->queued will be
> > smaller on tx_tasklet than on tx_queue_skb.
>
> Yes, that is the point :)
>
> > > Regarding buf->done, it is already updated without holding the lock in mt76u_complete_tx
> >
> > That's actually a bug, but it's not important, if tx_tasklet will not
> > see updated buf->done <- true value by mt76u_complete_tx on different
> > cpu, it will not complete skb. It will be done on next tx_tasklet iteration.
> > Worse thing would be opposite situation.
>
> Can this really occur?

I was thinking about that and yes it can occur. If q->queued and
buf->done writes/read will be reordered by CPUs. To prevent that you
will need to use smp_wmb/smp_rmb pair, but it's just simpler and more
convenient to use lock.

> (since queued is update holding the lock)

Holding the lock on one thread without holding it on concurrent thread
is irrelevant, it's the same as not holding any lock at all.

Stanislaw
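
The write/read pairing mentioned above can be illustrated outside the kernel. What follows is a minimal userspace sketch, assuming C11 atomics as a stand-in: a release store plays roughly the role of smp_wmb() followed by the flag write, and an acquire load that of the flag read followed by smp_rmb(). The names demo_entry, demo_complete and demo_try_reap are hypothetical and do not come from mt76.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one TX entry; not the mt76 layout. */
struct demo_entry {
	int status;        /* payload written by the completion side */
	atomic_bool done;  /* flag published only after the payload */
};

/* Completion side: write the payload first, then publish the flag.
 * The release store keeps the payload write ordered before the flag write. */
static void demo_complete(struct demo_entry *e, int status)
{
	e->status = status;
	atomic_store_explicit(&e->done, true, memory_order_release);
}

/* Reaping side: read the payload only after the flag has been observed.
 * The acquire load keeps the payload read ordered after the flag read. */
static bool demo_try_reap(struct demo_entry *e, int *status)
{
	assert(status != NULL);

	if (!atomic_load_explicit(&e->done, memory_order_acquire))
		return false;

	*status = e->status;
	return true;
}
```

If the reaping side does not yet observe the flag, it simply returns false and can retry later, which matches the harmless case described in the thread. The ordering only guarantees that a reader who does see the flag also sees the payload written before it.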

> > Can this really occur?
> I was thinking about that and yes it can occur. If q->queued and
> buf->done writes/read will be reordered by CPUs. To prevent that you
> will need to use smp_wmb/smp_rmb pair, but it's just simpler and more
> convenient to use lock.

good point, I will go through it.

Regards,
Lorenzo

> > (since queued is update holding the lock)
> Holding the lock on one thread without holding it on concurrent thread
> is irrelevant, it's the same as not holding any lock at all.
>
> Stanislaw

On 2019-03-21 10:02, Lorenzo Bianconi wrote:
>> I was thinking about that and yes it can occur. If q->queued and
>> buf->done writes/read will be reordered by CPUs. To prevent that you
>> will need to use smp_wmb/smp_rmb pair, but it's just simpler and more
>> convenient to use lock.
>
> good point, I will go through it.

Another simple solution would be to set buf->done = false in
mt76u_tx_tasklet after tx_complete_skb instead of doing it at enqueue time.

- Felix
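
One possible reading of this suggestion is that the done flag would then be reset only by the side that reaps entries, so a reused slot cannot still carry a flag left over from its previous use. The sketch below is a single-threaded userspace illustration of that ownership rule only. It omits locking and memory barriers entirely, and demo_ring, demo_enqueue, demo_complete and demo_reap are hypothetical names, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NDESC 4

/* Hypothetical ring; field names loosely follow the discussion. */
struct demo_ring {
	bool done[DEMO_NDESC];
	unsigned int head, tail, queued;
};

/* Enqueue side: claims the next slot and does not touch the done flag. */
static bool demo_enqueue(struct demo_ring *r)
{
	if (r->queued == DEMO_NDESC)
		return false;
	r->tail = (r->tail + 1) % DEMO_NDESC;
	r->queued++;
	return true;
}

/* Completion side: the only place that sets done to true. */
static void demo_complete(struct demo_ring *r, unsigned int idx)
{
	r->done[idx] = true;
}

/* Reaping side: the only place that resets done, right after completion. */
static unsigned int demo_reap(struct demo_ring *r)
{
	unsigned int n = 0;

	while (r->queued && r->done[r->head]) {
		/* The driver would call tx_complete_skb() at this point. */
		r->done[r->head] = false;
		r->head = (r->head + 1) % DEMO_NDESC;
		r->queued--;
		n++;
	}
	return n;
}
```

Because the reaping side clears the flag itself, it never depends on seeing a reset performed by another CPU at enqueue time.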

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index ac03acdae279..8cd70c32d77a 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -634,29 +634,33 @@ static void mt76u_tx_tasklet(unsigned long data)
 	int i;
 
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
+		u32 n_queued = 0, n_sw_queued = 0;
+
 		sq = &dev->q_tx[i];
 		q = sq->q;
 
-		spin_lock_bh(&q->lock);
-		while (true) {
+		while (q->queued > n_queued) {
 			buf = &q->entry[q->head].ubuf;
-			if (!buf->done || !q->queued)
+			if (!buf->done)
 				break;
 
 			if (q->entry[q->head].schedule) {
 				q->entry[q->head].schedule = false;
-				sq->swq_queued--;
+				n_sw_queued++;
 			}
 
 			entry = q->entry[q->head];
 			q->head = (q->head + 1) % q->ndesc;
-			q->queued--;
+			n_queued++;
 
-			spin_unlock_bh(&q->lock);
 			dev->drv->tx_complete_skb(dev, i, &entry);
-			spin_lock_bh(&q->lock);
 		}
 
+		spin_lock_bh(&q->lock);
+
+		sq->swq_queued -= n_sw_queued;
+		q->queued -= n_queued;
+
 		wake = q->stopped && q->queued < q->ndesc - 8;
 		if (wake)
 			q->stopped = false;

Similar to pci counterpart, reduce locking in mt76u_tx_tasklet since
q->head is managed just in mt76u_tx_tasklet and q->queued is updated
holding q->lock

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/usb.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
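
The accounting change in the patch (count reaped entries in a local variable, then subtract from the shared counter in one locked step) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver code: demo_queue, demo_lock, demo_unlock and demo_reap are hypothetical, and a C11 atomic_flag spinlock stands in for spin_lock_bh(). The loop condition reads queued without the lock, which is exactly the access questioned in the review, so the sketch is only meaningful single-threaded as written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NDESC 8

/* Hypothetical ring, loosely modelled on the fields named in the diff. */
struct demo_queue {
	atomic_flag lock;       /* stand-in for q->lock */
	bool done[DEMO_NDESC];  /* stand-in for q->entry[i].ubuf.done */
	unsigned int head;      /* touched only by the reaping side */
	unsigned int queued;    /* shared with the enqueue side */
};

static void demo_lock(struct demo_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the flag is released */
}

static void demo_unlock(struct demo_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Walk finished entries, then fix up the shared counter under the lock. */
static unsigned int demo_reap(struct demo_queue *q)
{
	unsigned int n_queued = 0;

	/* Unlocked read of q->queued: the access the review is about. */
	while (q->queued > n_queued) {
		if (!q->done[q->head])
			break;
		q->head = (q->head + 1) % DEMO_NDESC;
		n_queued++;
	}

	demo_lock(q);
	q->queued -= n_queued;
	demo_unlock(q);

	return n_queued;
}
```

The lock is taken once per pass instead of being dropped and retaken around every completed entry, which is the saving the patch aims for. Whether the unlocked reads are safe with concurrent writers is the open question of the thread, and this sketch does not settle it.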