Message ID | 20230106082136.68501-1-zouyipeng@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] irqchip/gic-v3: wait irq done to set affinity |
On Fri, 06 Jan 2023 08:21:36 +0000,
Yipeng Zou <zouyipeng@huawei.com> wrote:
>
> Recently we have some problem about gic set affinity in our test.
>
> This patch just aim to make some discuss about this problem.
>
> For now, the implementation of gic set affinity going to take effects
> immediately, and without check if any irq are being processed.
>
> So, This leads to some problem, think about this scenario:
>
> 1. First, we have an irq was generated by an device.
>
> 2. In the processing of this irq(after handle event, before clear
> IRQD_IRQ_INPROGRESS flag), we modify the route and the gic takes effect
> immediately,at the same time the new one was generated again.

How is that possible?

If it is affected by GICD_IROUTERn (as your patch suggests), then it
is a SPI. If it is a SPI, it has an active state. Which means it
cannot fire again without a deactivation (EOI if EOImode=0, EOI+DIR if
EOImode=1) having taken place.

So either something has deactivated the interrupt without masking it
beforehand, or the active state is not honoured. Either way, this is
wrong.

>
> 3. The new irq will be processing in other cpu which different form the
> old one.
>
> 4. The new irq going to be discarded because of the flag IRQD_IRQ_INPROGRESS
> has been set.
>
> I notice that if we set IRQF_ONESHOT when register the irq, this problem
> will gone.
>
> But I'm also thinking about change the gic_set_affinity function, to wait
> current irq done on all cpus before gic_write_irouter.
> I'm not sure if that's appropriate.

The base architecture should guarantee that this is not a problem,
thanks to the active state. If that was a LPI (which do not have an
active state), that'd be a different problem. But this doesn't seem to
be the case here.

I'm afraid to say that what you describe seem like a bug of some sort,
either HW or SW.

Thanks,

	M.
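The active-state argument above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct irq_model` and the `model_*` functions are hypothetical names invented for illustration, not kernel or GIC identifiers, and the two deactivation variants (EOI, or EOI+DIR) are collapsed into a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-interrupt state (not a kernel structure). */
struct irq_model {
	bool has_active_state;	/* true for an SPI, false for an LPI */
	bool active;		/* acknowledged but not yet deactivated */
	bool pending;		/* asserted and waiting to be delivered */
};

/* The device asserts the interrupt (or generates the message for an LPI). */
void model_assert_irq(struct irq_model *irq)
{
	assert(irq != NULL);
	irq->pending = true;
}

/* Returns true if the interrupt can be delivered to a CPU right now. */
bool model_try_deliver(struct irq_model *irq)
{
	assert(irq != NULL);
	if (!irq->pending)
		return false;
	/* An active SPI cannot be delivered again until it is deactivated. */
	if (irq->has_active_state && irq->active)
		return false;
	irq->pending = false;
	if (irq->has_active_state)
		irq->active = true;
	return true;
}

/* Deactivation: EOI (EOImode=0) or EOI+DIR (EOImode=1), merged in this model. */
void model_deactivate(struct irq_model *irq)
{
	assert(irq != NULL);
	irq->active = false;
}
```

In this model, a second `model_assert_irq()` on an SPI that is still active is held back until `model_deactivate()` runs, whereas with `has_active_state` set to false (the LPI case) the second instance is delivered immediately, whatever CPU the route now points at.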
On 2023/1/6 19:55, Marc Zyngier wrote:
> On Fri, 06 Jan 2023 08:21:36 +0000,
> Yipeng Zou <zouyipeng@huawei.com> wrote:
>> Recently we have some problem about gic set affinity in our test.
>>
>> This patch just aim to make some discuss about this problem.
>>
>> For now, the implementation of gic set affinity going to take effects
>> immediately, and without check if any irq are being processed.
>>
>> So, This leads to some problem, think about this scenario:
>>
>> 1. First, we have an irq was generated by an device.
>>
>> 2. In the processing of this irq(after handle event, before clear
>> IRQD_IRQ_INPROGRESS flag), we modify the route and the gic takes effect
>> immediately,at the same time the new one was generated again.
> How is that possible?
>
> If it is affected by GICD_IROUTERn (as your patch suggests), then it
> is a SPI. If it is a SPI, it has an active state. Which means it
> cannot fire again without a deactivation (EOI if EOImode=0, EOI+DIR if
> EOImode=1) having taken place.
>
> So either something has deactivated the interrupt without masking it
> beforehand, or the active state is not honoured. Either way, this is
> wrong.

Yes, agreed. This is not possible in the SPI case.

>> 3. The new irq will be processing in other cpu which different form the
>> old one.
>>
>> 4. The new irq going to be discarded because of the flag IRQD_IRQ_INPROGRESS
>> has been set.
>>
>> I notice that if we set IRQF_ONESHOT when register the irq, this problem
>> will gone.
>>
>> But I'm also thinking about change the gic_set_affinity function, to wait
>> current irq done on all cpus before gic_write_irouter.
>> I'm not sure if that's appropriate.
> The base architecture should guarantee that this is not a problem,
> thanks to the active state. If that was a LPI (which do not have an
> active state), that'd be a different problem. But this doesn't seem to
> be the case here.

Hi,

Thanks very much for the reply.

I have rechecked our test. Actually, that was an LPI in our test case.

The problem is caused by the its_send_movi command.

I made a mistake when I modified the code. It should be as follows.

Sorry for misleading you.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 973ede0197e3..fad08ccb7fd9 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1667,6 +1667,9 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,

 	/* don't set the affinity when the target cpu is same as current one */
 	if (cpu != prev_cpu) {
+
+		// wait irq done on all cpus
+
 		target_col = &its_dev->its->collections[cpu];
 		its_send_movi(its_dev, target_col, id);
 		its_dev->event_map.col_map[id] = cpu

> I'm afraid to say that what you describe seem like a bug of some sort,
> either HW or SW.
>
> Thanks,
>
> M.
On Mon, Jan 09 2023 at 20:26, Yipeng Zou wrote:
> On 2023/1/6 19:55, Marc Zyngier wrote:
> index 973ede0197e3..fad08ccb7fd9 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1667,6 +1667,9 @@ static int its_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val,
>
>          /* don't set the affinity when the target cpu is same as
> current one */
>          if (cpu != prev_cpu) {
> +
> +               // wait irq done on all cpus
> +

There is no way to wait here. The caller holds the interrupt descriptor
lock.

If this is really an issue for LPI, then the only way to deal with that
is CONFIG_GENERIC_PENDING_IRQ, which delays the affinity change to
interrupt context.

Why on earth must all the known hardware mistakes be repeated over and
over?

Thanks,

        tglx
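The idea of delaying the affinity change to interrupt context can be sketched as a user-space model. This is not the kernel's implementation: `struct irq_route`, `route_request_move()` and `route_handle_irq()` are hypothetical names, and the exact point inside the interrupt path where the pending move is applied is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical routing state for one interrupt (not a kernel structure). */
struct irq_route {
	int  cur_cpu;		/* CPU the interrupt is currently routed to */
	int  pending_cpu;	/* requested target, not yet applied */
	bool move_pending;	/* a retarget request is waiting */
};

/* Process context: only record the request, do not touch the route. */
void route_request_move(struct irq_route *r, int cpu)
{
	assert(r != NULL);
	r->pending_cpu = cpu;
	r->move_pending = true;
}

/*
 * Interrupt context on the currently targeted CPU: handle the interrupt,
 * then apply any pending move. Returns the CPU that handled this instance.
 */
int route_handle_irq(struct irq_route *r)
{
	assert(r != NULL);
	int handled_on = r->cur_cpu;

	if (r->move_pending) {
		r->cur_cpu = r->pending_cpu;
		r->move_pending = false;
	}
	return handled_on;
}
```

Because the route only changes from inside the interrupt path on the old CPU, the model never has one instance in flight on the old CPU while a new one is delivered to the new CPU: after `route_request_move(&r, 2)` the next interrupt is still handled on the old CPU, and only the one after that arrives on CPU 2.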
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 997104d4338e..e9b9f15f07f8 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1348,6 +1348,8 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	reg = gic_dist_base(d) + offset + (index * 8);
 	val = gic_mpidr_to_affinity(cpu_logical_map(cpu));

+	// wait irq done on all cpus
+
 	gic_write_irouter(val, reg);

 	/*
Recently we hit a problem with GIC set-affinity in our tests.

This patch only aims to start a discussion about the problem.

Currently, a GIC affinity change takes effect immediately, without
checking whether any irq is being processed.

This leads to a problem. Consider this scenario:

1. An irq is generated by a device.

2. While this irq is being processed (after the event is handled, before
   the IRQD_IRQ_INPROGRESS flag is cleared), we modify the route and the
   GIC applies it immediately. At the same time, the irq fires again.

3. The new irq is processed on a different CPU from the old one.

4. The new irq is discarded because the IRQD_IRQ_INPROGRESS flag
   is still set.

I notice that if we set IRQF_ONESHOT when registering the irq, this problem
goes away.

But I'm also thinking about changing the gic_set_affinity function to wait
for the current irq to finish on all CPUs before gic_write_irouter.
I'm not sure if that's appropriate.

Is the best workaround to use IRQF_ONESHOT to prevent reentrancy?

Please let me know if you have any other suggestions on this issue.

Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
---
 drivers/irqchip/irq-gic-v3.c | 2 ++
 1 file changed, 2 insertions(+)
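The four-step scenario in the description can be replayed in a small user-space model. It is a sketch only: `struct irq_desc_model`, `flow_enter()` and `flow_exit()` are hypothetical names, the `in_progress` field stands in for the IRQD_IRQ_INPROGRESS flag, and the two CPUs are represented by a fixed sequential interleaving rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-interrupt descriptor state. */
struct irq_desc_model {
	bool in_progress;	/* models IRQD_IRQ_INPROGRESS */
	int  handled;		/* instances that ran to completion */
	int  dropped;		/* instances discarded on entry */
};

/*
 * Entry into the flow handler on some CPU. If another CPU is still
 * handling the same interrupt, this instance is discarded.
 */
bool flow_enter(struct irq_desc_model *d)
{
	assert(d != NULL);
	if (d->in_progress) {
		d->dropped++;
		return false;
	}
	d->in_progress = true;
	return true;
}

/* The handler finishes and the in-progress flag is cleared. */
void flow_exit(struct irq_desc_model *d)
{
	assert(d != NULL);
	d->in_progress = false;
	d->handled++;
}
```

Replaying the scenario: the first CPU calls `flow_enter()` and succeeds (steps 1 and 2), the route changes and a second CPU calls `flow_enter()` before the first has called `flow_exit()` (step 3), so the second call returns false and `dropped` becomes 1 (step 4).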