| Message ID | 20200317160510.85914.22202.stgit@awfm-01.aw.intel.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to: | Jason Gunthorpe |
| Series | [for-rc] IB/hfi1: Insure pq is not left on waitlist |
On Tue, Mar 17, 2020 at 12:05:10PM -0400, Dennis Dalessandro wrote:

> +static void flush_pq_iowait(struct hfi1_user_sdma_pkt_q *pq)
> +{
> +        unsigned long flags;
> +        seqlock_t *lock = pq->busy.lock;
> +
> +        if (!lock)
> +                return;
> +        write_seqlock_irqsave(lock, flags);
> +        if (!list_empty(&pq->busy.list)) {
> +                list_del_init(&pq->busy.list);
> +                pq->busy.lock = NULL;
> +        }
> +        write_sequnlock_irqrestore(lock, flags);

I'm trying to grasp how a seqlock is protecting a list_empty and
list_del_init, and this seems.. uh.. insane?

The only place that uses seqlock in infiniband is in hfi1

It only calls seqlock_init and write_seqlock

Never read_seqlock

So, this isn't a seqlock, it is a normal spinlock obfuscated as a
seqlock.

Please clean this mess too.

I don't know what to do with this patch, it might well be functionally
right, but everything about reading it screams wrong wrong wrong. I
don't want to send it to Linus in -rc like this.

At least add something to the commit message asking temporary
forgiveness for this madness.

Also s/insure/ensure/, right?

Jason
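For context, a minimal sketch of what the seqlock API is conventionally for: lockless readers of a multi-word value retry if they raced with a writer. The stats example below is hypothetical, not hfi1 code; it only contrasts normal seqlock usage with the list protection being objected to here.

#include <linux/seqlock.h>
#include <linux/types.h>

/* Hypothetical example: a writer updates two related counters under
 * write_seqlock(); readers never block, they simply retry if
 * read_seqretry() reports that a writer ran concurrently.
 */
static DEFINE_SEQLOCK(stats_lock);
static u64 stats_bytes;
static u64 stats_packets;

static void stats_update(u64 bytes)
{
        write_seqlock(&stats_lock);
        stats_bytes += bytes;
        stats_packets++;
        write_sequnlock(&stats_lock);
}

static void stats_read(u64 *bytes, u64 *packets)
{
        unsigned int seq;

        do {
                seq = read_seqbegin(&stats_lock);
                *bytes = stats_bytes;
                *packets = stats_packets;
        } while (read_seqretry(&stats_lock, seq));
}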
> Subject: Re: [PATCH for-rc] IB/hfi1: Insure pq is not left on waitlist
>
> The only place that uses seqlock in infiniband is in hfi1
>
> It only calls seqlock_init and write_seqlock
>
> Never read_seqlock

The sdma code uses read_seqbegin() and read_seqretry() to avoid the
spin that is in read_seqlock().

The two calls together allow for detecting a race where the interrupt
handler detects if the base level submit routines have enqueued to a
waiter list due to a descriptor shortage concurrently with this
interrupt handler.

The full write_seqlock() is gotten when the list is not empty and the
read_seqretry() detects when a list entry might have been added.

SDMA interrupts frequently encounter no waiters, so the lock only
slows down the interrupt handler.

See sdma_desc_avail() for details.

> Please clean this mess too.

The APIs associated with SDMA and iowait are pretty loose and we will
clean them up in a subsequent patch series. The nature of the locking
should not bleed out to the client code of SDMA. We will adjust the
commit message to indicate this.

> Also s/insure/ensure/, right?

Will fix.

Mike
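A rough sketch of the fast path described above, with hypothetical names (the real implementation is sdma_desc_avail(), mentioned in the email): the interrupt handler peeks at the waiter list inside a read_seqbegin()/read_seqretry() pass and only takes the write lock when a waiter may be present.

#include <linux/seqlock.h>
#include <linux/list.h>

/* Hypothetical waiter queue; wait_lock must be set up with seqlock_init()
 * and wait_list with INIT_LIST_HEAD() before use.
 */
struct sdma_waitq_sketch {
        seqlock_t wait_lock;
        struct list_head wait_list;
};

static void desc_avail_sketch(struct sdma_waitq_sketch *wq)
{
        unsigned int seq;
        bool empty;

        /* Lockless peek: retry if a writer touched the list meanwhile. */
        do {
                seq = read_seqbegin(&wq->wait_lock);
                empty = list_empty(&wq->wait_list);
        } while (read_seqretry(&wq->wait_lock, seq));

        if (empty)
                return;         /* common case: no waiters, no lock taken */

        write_seqlock(&wq->wait_lock);
        /* re-check and dequeue/wake waiters here, under the lock */
        write_sequnlock(&wq->wait_lock);
}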
On Thu, Mar 19, 2020 at 09:46:54PM +0000, Marciniszyn, Mike wrote:
> > Subject: Re: [PATCH for-rc] IB/hfi1: Insure pq is not left on waitlist
> >
> > The only place that uses seqlock in infiniband is in hfi1
> >
> > It only calls seqlock_init and write_seqlock
> >
> > Never read_seqlock
>
> The sdma code uses read_seqbegin() and read_seqretry() to avoid the
> spin that is in read_seqlock().

Hm, I see.. I did not find these uses when I was grepping, but now I'm
even less happy with this :(

> The two calls together allow for detecting a race where the interrupt
> handler detects if the base level submit routines have enqueued to a
> waiter list due to a descriptor shortage concurrently with this
> interrupt handler.

You can't use read seqlock to protect a linked list when the write
side is doing list_del. It is just wrong.

> The full write_seqlock() is gotten when the list is not empty and the
> read_seqretry() detects when a list entry might have been added.

A write side inside a read side? It is madness.

> SDMA interrupts frequently encounter no waiters, so the lock only
> slows down the interrupt handler.

So, if you don't care about the race with adding then just use
list_empty with no lock and then a normal spin lock

All this readlock stuff doesn't remove any races.

> > Please clean this mess too.
>
> The APIs associated with SDMA and iowait are pretty loose and we will
> clean them up in a subsequent patch series. The nature of the locking
> should not bleed out to the client code of SDMA. We will adjust the
> commit message to indicate this.

So what is the explanation here? This uses a write seqlock for a
linked list but it is OK because nothing uses the read side except to
do list_empty, which is unnecessary, and will be fixed later?

Jason
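A minimal sketch of the alternative suggested above, again with hypothetical names: an unlocked list_empty() check as the fast path, and an ordinary spinlock (irqsave here, since the real caller runs from interrupt context) around the actual list manipulation.

#include <linux/spinlock.h>
#include <linux/list.h>

struct sdma_waitq_plain_sketch {
        spinlock_t wait_lock;           /* plain spinlock, no seqlock */
        struct list_head wait_list;
};

static void desc_avail_plain_sketch(struct sdma_waitq_plain_sketch *wq)
{
        unsigned long flags;

        /* Unlocked peek; a racing add is caught on the next interrupt. */
        if (list_empty(&wq->wait_list))
                return;

        spin_lock_irqsave(&wq->wait_lock, flags);
        /* dequeue and wake waiters here, under the lock */
        spin_unlock_irqrestore(&wq->wait_lock, flags);
}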
> Subject: Re: [PATCH for-rc] IB/hfi1: Insure pq is not left on waitlist
>
> On Thu, Mar 19, 2020 at 09:46:54PM +0000, Marciniszyn, Mike wrote:
> > > Subject: Re: [PATCH for-rc] IB/hfi1: Insure pq is not left on waitlist
> > >
> > > The only place that uses seqlock in infiniband is in hfi1
> > >
> > > It only calls seqlock_init and write_seqlock
> > >
> > > Never read_seqlock
> >
> > The sdma code uses read_seqbegin() and read_seqretry() to avoid the
> > spin that is in read_seqlock().
>
> Hm, I see.. I did not find these uses when I was grepping, but now I'm
> even less happy with this :(
>
> > The two calls together allow for detecting a race where the interrupt
> > handler detects if the base level submit routines have enqueued to a
> > waiter list due to a descriptor shortage concurrently with this
> > interrupt handler.
>
> You can't use read seqlock to protect a linked list when the write
> side is doing list_del. It is just wrong.
>

It is not actually doing that. The lock only protects the list_empty().

> > The full write_seqlock() is gotten when the list is not empty and the
> > read_seqretry() detects when a list entry might have been added.
>
> A write side inside a read side? It is madness.
>
> > SDMA interrupts frequently encounter no waiters, so the lock only
> > slows down the interrupt handler.
>
> So, if you don't care about the race with adding then just use
> list_empty with no lock and then a normal spin lock
>

So are you suggesting the list_empty() can be uncontrolled?
Perhaps list_empty_careful() is a better choice?

> All this readlock stuff doesn't remove any races.
>
> > > Please clean this mess too.
> >
> > The APIs associated with SDMA and iowait are pretty loose and we will
> > clean them up in a subsequent patch series. The nature of the locking
> > should not bleed out to the client code of SDMA. We will adjust the
> > commit message to indicate this.
>
> So what is the explanation here? This uses a write seqlock for a
> linked list but it is OK because nothing uses the read side except to
> do list_empty, which is unnecessary, and will be fixed later?
>

I suggest we fix the bug and submit a follow-up to clean the locking up
and the open coding. The patch footprint would probably be too large for
stable as a single patch.

Mike
On Fri, Mar 20, 2020 at 12:26:32AM +0000, Marciniszyn, Mike wrote:
> > Subject: Re: [PATCH for-rc] IB/hfi1: Insure pq is not left on waitlist
> >
> > On Thu, Mar 19, 2020 at 09:46:54PM +0000, Marciniszyn, Mike wrote:
> > > > Subject: Re: [PATCH for-rc] IB/hfi1: Insure pq is not left on waitlist
> > > >
> > > > The only place that uses seqlock in infiniband is in hfi1
> > > >
> > > > It only calls seqlock_init and write_seqlock
> > > >
> > > > Never read_seqlock
> > >
> > > The sdma code uses read_seqbegin() and read_seqretry() to avoid the
> > > spin that is in read_seqlock().
> >
> > Hm, I see.. I did not find these uses when I was grepping, but now I'm
> > even less happy with this :(
> >
> > > The two calls together allow for detecting a race where the interrupt
> > > handler detects if the base level submit routines have enqueued to a
> > > waiter list due to a descriptor shortage concurrently with this
> > > interrupt handler.
> >
> > You can't use read seqlock to protect a linked list when the write
> > side is doing list_del. It is just wrong.
>
> It is not actually doing that. The lock only protects the list_empty().

Which is now running concurrently with list_del - fortunately
list_empty() is safe to run unlocked.

> > > The full write_seqlock() is gotten when the list is not empty and the
> > > read_seqretry() detects when a list entry might have been added.
> >
> > A write side inside a read side? It is madness.
> >
> > > SDMA interrupts frequently encounter no waiters, so the lock only
> > > slows down the interrupt handler.
> >
> > So, if you don't care about the race with adding then just use
> > list_empty with no lock and then a normal spin lock
>
> So are you suggesting the list_empty() can be uncontrolled?

Yes. list_empty() is defined to work for RCU readers, so it is safe to
call unlocked.

> Perhaps list_empty_careful() is a better choice?

The comments for this say it is not safe if list_add is happening
concurrently. list_empty has a single concurrent safe READ_ONCE.

> > > > Please clean this mess too.
> > >
> > > The APIs associated with SDMA and iowait are pretty loose and we will
> > > clean them up in a subsequent patch series. The nature of the locking
> > > should not bleed out to the client code of SDMA. We will adjust the
> > > commit message to indicate this.
> >
> > So what is the explanation here? This uses a write seqlock for a
> > linked list but it is OK because nothing uses the read side except to
> > do list_empty, which is unnecessary, and will be fixed later?
>
> I suggest we fix the bug and submit a follow-up to clean the locking up
> and the open coding.

Yes, but I still can't send this to Linus without a commit message
explaining why it is like this. Like I say, protecting list calls with
seqlock does not make any sense.

> The patch footprint would probably be too large for stable as a single patch.

Yes

Jason
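For reference, the list_empty() behavior relied on above, paraphrased from include/linux/list.h (the exact wording may differ between kernel versions): the single READ_ONCE() is what makes the unlocked check safe against a concurrent list_del_init() on another CPU.

/* Paraphrase of include/linux/list.h (around v5.6); not a new helper. */
static inline int list_empty(const struct list_head *head)
{
        return READ_ONCE(head->next) == head;
}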
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index c2f0d9b..13e4203 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -141,6 +141,7 @@ static int defer_packet_queue(
 	 */
 	xchg(&pq->state, SDMA_PKT_Q_DEFERRED);
 	if (list_empty(&pq->busy.list)) {
+		pq->busy.lock = &sde->waitlock;
 		iowait_get_priority(&pq->busy);
 		iowait_queue(pkts_sent, &pq->busy, &sde->dmawait);
 	}
@@ -155,6 +156,7 @@ static void activate_packet_queue(struct iowait *wait, int reason)
 {
 	struct hfi1_user_sdma_pkt_q *pq =
 		container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
+	pq->busy.lock = NULL;
 	xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
 	wake_up(&wait->wait_dma);
 };
@@ -256,6 +258,21 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
 	return ret;
 }
 
+static void flush_pq_iowait(struct hfi1_user_sdma_pkt_q *pq)
+{
+	unsigned long flags;
+	seqlock_t *lock = pq->busy.lock;
+
+	if (!lock)
+		return;
+	write_seqlock_irqsave(lock, flags);
+	if (!list_empty(&pq->busy.list)) {
+		list_del_init(&pq->busy.list);
+		pq->busy.lock = NULL;
+	}
+	write_sequnlock_irqrestore(lock, flags);
+}
+
 int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 			       struct hfi1_ctxtdata *uctxt)
 {
@@ -281,6 +298,7 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 		kfree(pq->reqs);
 		kfree(pq->req_in_use);
 		kmem_cache_destroy(pq->txreq_cache);
+		flush_pq_iowait(pq);
 		kfree(pq);
 	} else {
 		spin_unlock(&fd->pq_rcu_lock);
@@ -587,11 +605,12 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
 		if (ret < 0) {
 			if (ret != -EBUSY)
 				goto free_req;
-			wait_event_interruptible_timeout(
+			if (wait_event_interruptible_timeout(
 				pq->busy.wait_dma,
-				(pq->state == SDMA_PKT_Q_ACTIVE),
+				pq->state == SDMA_PKT_Q_ACTIVE,
 				msecs_to_jiffies(
-					SDMA_IOWAIT_TIMEOUT));
+					SDMA_IOWAIT_TIMEOUT)) <= 0)
+				flush_pq_iowait(pq);
 		}
 	}
 	*count += idx;