[5/9] x86/IRQ: fix locking around vector management

Message ID 5CC6DF0F0200007800229EBC@prv1-mh.provo.novell.com
State Superseded
Series x86: IRQ management adjustments

Commit Message

Jan Beulich April 29, 2019, 11:25 a.m. UTC
All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
fields, and hence ought to be called with the descriptor lock held in
addition to vector_lock. This is currently the case for only
set_desc_affinity() and destroy_irq(), which also clarifies what the
nesting behavior between the locks has to be. Reflect the new
expectation by having these functions all take a descriptor as
parameter instead of an interrupt number.
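
The resulting nesting, excerpted from the reworked clear_irq_vector()
in the patch below:

    spin_lock_irqsave(&desc->lock, flags);
    spin_lock(&vector_lock);
    _clear_irq_vector(desc);
    spin_unlock(&vector_lock);
    spin_unlock_irqrestore(&desc->lock, flags);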

Drop one of the two leading underscores from all three functions at
the same time.

There's one case left where descriptors get manipulated with just
vector_lock held: setup_vector_irq() assumes its caller to acquire
vector_lock, and hence can't itself acquire the descriptor locks (wrong
lock order). I don't currently see how to address this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Comments

Roger Pau Monné May 6, 2019, 11:48 a.m. UTC | #1
On Mon, Apr 29, 2019 at 05:25:03AM -0600, Jan Beulich wrote:
> All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
> fields, and hence ought to be called with the descriptor lock held in
> addition to vector_lock. This is currently the case for only
> set_desc_affinity() and destroy_irq(), which also clarifies what the

AFAICT set_desc_affinity is called from set_ioapic_affinity_irq, which in
turn is called from setup_ioapic_dest without holding the desc lock.
Is this fine because that's only used at boot time?

> nesting behavior between the locks has to be. Reflect the new
> expectation by having these functions all take a descriptor as
> parameter instead of an interrupt number.
> 
> Drop one of the two leading underscores from all three functions at
> the same time.
> 
> There's one case left where descriptors get manipulated with just
> vector_lock held: setup_vector_irq() assumes its caller to acquire
> vector_lock, and hence can't itself acquire the descriptor locks (wrong
> lock order). I don't currently see how to address this.

Can you take the desc lock and vector lock for each irq in the second
loop of setup_vector_irq and remove the vector locking from the caller?

That might be inefficient, but it's just done for CPU initialization.

AFAICT the first loop of setup_vector_irq doesn't require any locking
since it's per-cpu initialization.
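
Something along these lines (an untested sketch - I'm assuming the
existing per-IRQ body of the second loop can move inside the critical
section unchanged):

    unsigned int irq;

    for ( irq = 0; irq < nr_irqs; irq++ )
    {
        struct irq_desc *desc = irq_to_desc(irq);
        unsigned long flags;

        spin_lock_irqsave(&desc->lock, flags);
        spin_lock(&vector_lock);
        /* ... existing per-IRQ body propagating the vector -> IRQ mapping ... */
        spin_unlock(&vector_lock);
        spin_unlock_irqrestore(&desc->lock, flags);
    }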

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Change looks good to me:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> 
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -27,6 +27,7 @@
>  #include <public/physdev.h>
>  
>  static int parse_irq_vector_map_param(const char *s);
> +static void _clear_irq_vector(struct irq_desc *desc);
>  
>  /* opt_noirqbalance: If true, software IRQ balancing/affinity is disabled. */
>  bool __read_mostly opt_noirqbalance;
> @@ -112,13 +113,12 @@ static void trace_irq_mask(u32 event, in
>      trace_var(event, 1, sizeof(d), &d);
>  }
>  
> -static int __init __bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
> +static int __init _bind_irq_vector(struct irq_desc *desc, int vector,

I wouldn't be opposed to adding ASSERTs here (and in
_{assign,bind,clear}_irq_vector, set_desc_affinity and destroy_irq)
to check for lock correctness.
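
E.g. at the top of each of them (untested sketch):

    ASSERT(spin_is_locked(&desc->lock));
    ASSERT(spin_is_locked(&vector_lock));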

Thanks, Roger.
Jan Beulich May 6, 2019, 1:06 p.m. UTC | #2
>>> On 06.05.19 at 13:48, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:25:03AM -0600, Jan Beulich wrote:
>> All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
>> fields, and hence ought to be called with the descriptor lock held in
>> addition to vector_lock. This is currently the case for only
>> set_desc_affinity() and destroy_irq(), which also clarifies what the
> 
> AFAICT set_desc_affinity is called from set_ioapic_affinity_irq, which in
> turn is called from setup_ioapic_dest without holding the desc lock.
> Is this fine because that's only used at boot time?

No, this isn't fine, and it's also not only called at boot time. I
simply didn't spot this case of function re-use - I had come to
the conclusion that all calls to set_desc_affinity() would come
through the .set_affinity hook pointers (or happen sufficiently
early).

VT-d's adjust_irq_affinity() has a similar issue.

Being used at boot time alone would be insufficient anyway. Not taking
locks can only be safe prior to bringing up APs; any later
skipping of locking would at least require additional justification.

>> nesting behavior between the locks has to be. Reflect the new
>> expectation by having these functions all take a descriptor as
>> parameter instead of an interrupt number.
>> 
>> Drop one of the two leading underscores from all three functions at
>> the same time.
>> 
>> There's one case left where descriptors get manipulated with just
>> vector_lock held: setup_vector_irq() assumes its caller to acquire
>> vector_lock, and hence can't itself acquire the descriptor locks (wrong
>> lock order). I don't currently see how to address this.
> 
> Can you take the desc lock and vector lock for each irq in the second
> loop of setup_vector_irq and remove the vector locking from the caller?
> 
> That might be inefficient, but it's just done for CPU initialization.
> 
> AFAICT the first loop of setup_vector_irq doesn't require any locking
> since it's per-cpu initialization.

It's not so much the first loop afaict. It's the combined action of
calling this function and setting the online bit which needs the
lock held around it. I.e. the function setting bits in various
descriptors' CPU masks (and the tracking of the vector -> IRQ
relationships) has to be atomic (to the outside world) with the
setting of the CPU's bit in cpu_online_map.
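
Roughly (paraphrasing the caller in smpboot.c from memory):

    lock_vector_lock();
    setup_vector_irq(cpu);
    cpumask_set_cpu(cpu, &cpu_online_map);
    unlock_vector_lock();

Nothing inside that sequence can take descriptor locks without
inverting the intended lock order.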

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Change looks good to me:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, but I'll not add this for now, given the further locking to
be added as per above.

Jan

Patch

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -27,6 +27,7 @@ 
 #include <public/physdev.h>
 
 static int parse_irq_vector_map_param(const char *s);
+static void _clear_irq_vector(struct irq_desc *desc);
 
 /* opt_noirqbalance: If true, software IRQ balancing/affinity is disabled. */
 bool __read_mostly opt_noirqbalance;
@@ -112,13 +113,12 @@  static void trace_irq_mask(u32 event, in
     trace_var(event, 1, sizeof(d), &d);
 }
 
-static int __init __bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
+static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
+                                   const cpumask_t *cpu_mask)
 {
     cpumask_t online_mask;
     int cpu;
-    struct irq_desc *desc = irq_to_desc(irq);
 
-    BUG_ON((unsigned)irq >= nr_irqs);
     BUG_ON((unsigned)vector >= NR_VECTORS);
 
     cpumask_and(&online_mask, cpu_mask, &cpu_online_map);
@@ -129,9 +129,9 @@  static int __init __bind_irq_vector(int
         return 0;
     if ( desc->arch.vector != IRQ_VECTOR_UNASSIGNED )
         return -EBUSY;
-    trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, irq, vector, &online_mask);
+    trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, desc->irq, vector, &online_mask);
     for_each_cpu(cpu, &online_mask)
-        per_cpu(vector_irq, cpu)[vector] = irq;
+        per_cpu(vector_irq, cpu)[vector] = desc->irq;
     desc->arch.vector = vector;
     cpumask_copy(desc->arch.cpu_mask, &online_mask);
     if ( desc->arch.used_vectors )
@@ -145,12 +145,18 @@  static int __init __bind_irq_vector(int
 
 int __init bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
 {
+    struct irq_desc *desc = irq_to_desc(irq);
     unsigned long flags;
     int ret;
 
-    spin_lock_irqsave(&vector_lock, flags);
-    ret = __bind_irq_vector(irq, vector, cpu_mask);
-    spin_unlock_irqrestore(&vector_lock, flags);
+    BUG_ON((unsigned)irq >= nr_irqs);
+
+    spin_lock_irqsave(&desc->lock, flags);
+    spin_lock(&vector_lock);
+    ret = _bind_irq_vector(desc, vector, cpu_mask);
+    spin_unlock(&vector_lock);
+    spin_unlock_irqrestore(&desc->lock, flags);
+
     return ret;
 }
 
@@ -235,7 +241,9 @@  void destroy_irq(unsigned int irq)
 
     spin_lock_irqsave(&desc->lock, flags);
     desc->handler = &no_irq_type;
-    clear_irq_vector(irq);
+    spin_lock(&vector_lock);
+    _clear_irq_vector(desc);
+    spin_unlock(&vector_lock);
     desc->arch.used_vectors = NULL;
     spin_unlock_irqrestore(&desc->lock, flags);
 
@@ -256,11 +264,11 @@  static void release_old_vec(struct irq_d
     }
 }
 
-static void __clear_irq_vector(int irq)
+static void _clear_irq_vector(struct irq_desc *desc)
 {
-    int cpu, vector, old_vector;
+    unsigned int cpu;
+    int vector, old_vector, irq = desc->irq;
     cpumask_t tmp_mask;
-    struct irq_desc *desc = irq_to_desc(irq);
 
     BUG_ON(!desc->arch.vector);
 
@@ -306,11 +314,14 @@  static void __clear_irq_vector(int irq)
 
 void clear_irq_vector(int irq)
 {
+    struct irq_desc *desc = irq_to_desc(irq);
     unsigned long flags;
 
-    spin_lock_irqsave(&vector_lock, flags);
-    __clear_irq_vector(irq);
-    spin_unlock_irqrestore(&vector_lock, flags);
+    spin_lock_irqsave(&desc->lock, flags);
+    spin_lock(&vector_lock);
+    _clear_irq_vector(desc);
+    spin_unlock(&vector_lock);
+    spin_unlock_irqrestore(&desc->lock, flags);
 }
 
 int irq_to_vector(int irq)
@@ -445,8 +456,7 @@  static vmask_t *irq_get_used_vector_mask
     return ret;
 }
 
-static int __assign_irq_vector(
-    int irq, struct irq_desc *desc, const cpumask_t *mask)
+static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
 {
     /*
      * NOTE! The local APIC isn't very good at handling
@@ -460,7 +470,8 @@  static int __assign_irq_vector(
      * 0x80, because int 0x80 is hm, kind of importantish. ;)
      */
     static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
-    int cpu, err, old_vector;
+    unsigned int cpu;
+    int err, old_vector, irq = desc->irq;
     cpumask_t tmp_mask;
     vmask_t *irq_used_vectors = NULL;
 
@@ -570,8 +581,12 @@  int assign_irq_vector(int irq, const cpu
     
     BUG_ON(irq >= nr_irqs || irq <0);
 
-    spin_lock_irqsave(&vector_lock, flags);
-    ret = __assign_irq_vector(irq, desc, mask ?: TARGET_CPUS);
+    spin_lock_irqsave(&desc->lock, flags);
+
+    spin_lock(&vector_lock);
+    ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
+    spin_unlock(&vector_lock);
+
     if ( !ret )
     {
         ret = desc->arch.vector;
@@ -580,7 +595,8 @@  int assign_irq_vector(int irq, const cpu
         else
             cpumask_setall(desc->affinity);
     }
-    spin_unlock_irqrestore(&vector_lock, flags);
+
+    spin_unlock_irqrestore(&desc->lock, flags);
 
     return ret;
 }
@@ -754,7 +770,6 @@  void irq_complete_move(struct irq_desc *
 
 unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
-    unsigned int irq;
     int ret;
     unsigned long flags;
     cpumask_t dest_mask;
@@ -762,10 +777,8 @@  unsigned int set_desc_affinity(struct ir
     if (!cpumask_intersects(mask, &cpu_online_map))
         return BAD_APICID;
 
-    irq = desc->irq;
-
     spin_lock_irqsave(&vector_lock, flags);
-    ret = __assign_irq_vector(irq, desc, mask);
+    ret = _assign_irq_vector(desc, mask);
     spin_unlock_irqrestore(&vector_lock, flags);
 
     if (ret < 0)