Message ID | 1558945891-3015-10-git-send-email-chao.gao@intel.com (mailing list archive)
---|---
State | Superseded
Series | improve late microcode loading
On Mon, May 27, 2019 at 04:31:30PM +0800, Chao Gao wrote:
> microcode_update_lock is to prevent logic threads of a same core from
> updating microcode at the same time. But due to using a global lock, it
> also prevented parallel microcode updating on different cores.

Oh, OK, so that's what I was missing from patch 8 and what serializes
the updating.

> Remove this lock in order to update microcode in parallel. It is safe
> because we have already ensured serialization of sibling threads at the
> caller side.

Then you certainly need to fix the wait loop in do_microcode_update to
only wait for MICROCODE_UPDATE_TIMEOUT_US regardless of the number of
CPUs in the system?

> 1. For late microcode update, do_microcode_update() ensures that only one
>    sibling thread of a core can update microcode.
> 2. For microcode update during system startup or CPU-hotplug,
>    microcode_mutex() guarantees update serialization of logical threads.
> 3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
>    late microcode update.
>
> Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD
> only) are still processed sequentially.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Patch LGTM, but it needs to fix the wait loop in do_microcode_update.

Thanks, Roger.
>>> On 27.05.19 at 10:31, <chao.gao@intel.com> wrote:
> microcode_update_lock is to prevent logic threads of a same core from
> updating microcode at the same time. But due to using a global lock, it
> also prevented parallel microcode updating on different cores.
>
> Remove this lock in order to update microcode in parallel. It is safe
> because we have already ensured serialization of sibling threads at the
> caller side.
> 1. For late microcode update, do_microcode_update() ensures that only one
>    sibling thread of a core can update microcode.
> 2. For microcode update during system startup or CPU-hotplug,
>    microcode_mutex() guarantees update serialization of logical threads.
> 3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
>    late microcode update.
>
> Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD
> only) are still processed sequentially.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
> Changes in v7:
>  - reworked. Remove the complex locking logic introduced in v5 and v6.
>    The microcode patch to be applied is passed as an argument without
>    any global variable. Thus no lock is added to serialize potential
>    readers/writers. Callers of apply_microcode() will guarantee the
>    correctness: the patch pointed by the arguments won't be changed by
>    others.

Much better this way indeed.

> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>
>      mc_intel = patch->mc_intel;
>
> -    /* serialize access to the physical write to MSR 0x79 */
> -    spin_lock_irqsave(&microcode_update_lock, flags);
> +    BUG_ON(local_irq_is_enabled());
>
>      /*
>       * Writeback and invalidate caches before updating microcode to avoid

Thinking about it - what happens if we hit an NMI or #MC here?
watchdog_disable(), a call to which you add in an earlier patch,
doesn't really suppress the generation of NMIs, it only tells the
handler not to look at the accumulated statistics.

Jan
>>> On 05.06.19 at 16:52, <roger.pau@citrix.com> wrote:
> On Mon, May 27, 2019 at 04:31:30PM +0800, Chao Gao wrote:
>> microcode_update_lock is to prevent logic threads of a same core from
>> updating microcode at the same time. But due to using a global lock, it
>> also prevented parallel microcode updating on different cores.
>
> Oh, OK, so that's what I was missing from patch 8 and what serializes
> the updating.
>
>> Remove this lock in order to update microcode in parallel. It is safe
>> because we have already ensured serialization of sibling threads at the
>> caller side.
>
> Then you certainly need to fix the wait loop in do_microcode_update to
> only wait for MICROCODE_UPDATE_TIMEOUT_US regardless of the number of
> CPUs in the system?

Well, no, not exactly. On huge systems it may indeed still take longer
than on smaller ones. The way the waiting is coded now (expecting
forward progress in every MICROCODE_UPDATE_TIMEOUT_US period) is, I
think, acceptable. Plus leaving that logic alone will avoid touching it
yet again when introducing serial application policy as an option.

Jan
On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>> On 27.05.19 at 10:31, <chao.gao@intel.com> wrote:
>> microcode_update_lock is to prevent logic threads of a same core from
>> updating microcode at the same time. But due to using a global lock, it
>> also prevented parallel microcode updating on different cores.
>>
>> Remove this lock in order to update microcode in parallel. It is safe
>> because we have already ensured serialization of sibling threads at the
>> caller side.
>> 1. For late microcode update, do_microcode_update() ensures that only one
>>    sibling thread of a core can update microcode.
>> 2. For microcode update during system startup or CPU-hotplug,
>>    microcode_mutex() guarantees update serialization of logical threads.
>> 3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
>>    late microcode update.
>>
>> Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD
>> only) are still processed sequentially.
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>> ---
>> Changes in v7:
>>  - reworked. Remove the complex locking logic introduced in v5 and v6.
>>    The microcode patch to be applied is passed as an argument without
>>    any global variable. Thus no lock is added to serialize potential
>>    readers/writers. Callers of apply_microcode() will guarantee the
>>    correctness: the patch pointed by the arguments won't be changed by
>>    others.
>
> Much better this way indeed.
>
>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>
>>      mc_intel = patch->mc_intel;
>>
>> -    /* serialize access to the physical write to MSR 0x79 */
>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>> +    BUG_ON(local_irq_is_enabled());
>>
>>      /*
>>       * Writeback and invalidate caches before updating microcode to avoid
>
> Thinking about it - what happens if we hit an NMI or #MC here?
> watchdog_disable(), a call to which you add in an earlier patch,
> doesn't really suppress the generation of NMIs, it only tells the
> handler not to look at the accumulated statistics.

I think they should be suppressed. Ashok, could you confirm it?

I will figure out how to suppress them in Xen.

Thanks
Chao
>>> On 11.06.19 at 14:46, <chao.gao@intel.com> wrote:
> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>>> On 27.05.19 at 10:31, <chao.gao@intel.com> wrote:
>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>
>>>      mc_intel = patch->mc_intel;
>>>
>>> -    /* serialize access to the physical write to MSR 0x79 */
>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>> +    BUG_ON(local_irq_is_enabled());
>>>
>>>      /*
>>>       * Writeback and invalidate caches before updating microcode to avoid
>>
>> Thinking about it - what happens if we hit an NMI or #MC here?
>> watchdog_disable(), a call to which you add in an earlier patch,
>> doesn't really suppress the generation of NMIs, it only tells the
>> handler not to look at the accumulated statistics.
>
> I think they should be suppressed. Ashok, could you confirm it?
>
> I will figure out how to suppress them in Xen.

Well, afaik suppressing #MC is impossible. CR4.MCE clear leads to
immediate shutdown in case of a machine check, unless I'm mistaken.
The watchdog NMI, otoh, could of course be "properly" disabled.

Jan
On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>
>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>
>>>      mc_intel = patch->mc_intel;
>>>
>>> -    /* serialize access to the physical write to MSR 0x79 */
>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>> +    BUG_ON(local_irq_is_enabled());
>>>
>>>      /*
>>>       * Writeback and invalidate caches before updating microcode to avoid
>>
>> Thinking about it - what happens if we hit an NMI or #MC here?
>> watchdog_disable(), a call to which you add in an earlier patch,
>> doesn't really suppress the generation of NMIs, it only tells the
>> handler not to look at the accumulated statistics.
>
> I think they should be suppressed. Ashok, could you confirm it?

I think the only sources would be the watchdog as you pointed out
which we already touch to keep it from expiring. The perf counters
i'm not an expert in, but i'll check. When we are in stop_machine() type
flow, its not clear if any of those would fire. (I might be wrong, but let
me check).

#MC shouldn't fire once you entered the rendezvous, if it does its usually
fatal like a 3strike or something. Machine is going down at that time
so nothing we could do to handle.

Cheers,
Ashok
>>> On 11.06.19 at 18:04, <ashok.raj@intel.com> wrote:
> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>
>>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>>
>>>>      mc_intel = patch->mc_intel;
>>>>
>>>> -    /* serialize access to the physical write to MSR 0x79 */
>>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>>> +    BUG_ON(local_irq_is_enabled());
>>>>
>>>>      /*
>>>>       * Writeback and invalidate caches before updating microcode to avoid
>>>
>>> Thinking about it - what happens if we hit an NMI or #MC here?
>>> watchdog_disable(), a call to which you add in an earlier patch,
>>> doesn't really suppress the generation of NMIs, it only tells the
>>> handler not to look at the accumulated statistics.
>>
>> I think they should be suppressed. Ashok, could you confirm it?
>
> I think the only sources would be the watchdog as you pointed out
> which we already touch to keep it from expiring. The perf counters
> i'm not an expert in, but i'll check. When we are in stop_machine() type
> flow, its not clear if any of those would fire. (I might be wrong, but let
> me check).

Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
how would it _not_ potentially fire?

> #MC shouldn't fire once you entered the rendezvous, if it does its usually
> fatal like a 3strike or something. Machine is going down at that time
> so nothing we could do to handle.

Right - as long as we assume that #MC would be fatal anyway, there's
no point in thinking about ways to suppress it. I guess it is fatal
(almost) always right now, but I don't think it ought to be. It's just
that no-one has the time and environment to make it actually behave
better.

Jan
On Wed, Jun 12, 2019 at 01:38:31AM -0600, Jan Beulich wrote:
>>>> On 11.06.19 at 18:04, <ashok.raj@intel.com> wrote:
>> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>>
>>>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>>>
>>>>>      mc_intel = patch->mc_intel;
>>>>>
>>>>> -    /* serialize access to the physical write to MSR 0x79 */
>>>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>>>> +    BUG_ON(local_irq_is_enabled());
>>>>>
>>>>>      /*
>>>>>       * Writeback and invalidate caches before updating microcode to avoid
>>>>
>>>> Thinking about it - what happens if we hit an NMI or #MC here?
>>>> watchdog_disable(), a call to which you add in an earlier patch,
>>>> doesn't really suppress the generation of NMIs, it only tells the
>>>> handler not to look at the accumulated statistics.
>>>
>>> I think they should be suppressed. Ashok, could you confirm it?
>>
>> I think the only sources would be the watchdog as you pointed out
>> which we already touch to keep it from expiring. The perf counters
>> i'm not an expert in, but i'll check. When we are in stop_machine() type
>> flow, its not clear if any of those would fire. (I might be wrong, but let
>> me check).
>
> Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
> how would it _not_ potentially fire?

We plan not to prevent NMI being fired. Instead, if one thread of a core
is updating microcode, other threads of this core would stop in the
handler of NMI until the update completion. Is this approach acceptable?

Thanks
Chao
>>> On 13.06.19 at 16:05, <chao.gao@intel.com> wrote:
> On Wed, Jun 12, 2019 at 01:38:31AM -0600, Jan Beulich wrote:
>>>>> On 11.06.19 at 18:04, <ashok.raj@intel.com> wrote:
>>> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>>>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>>>
>>>>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>>>>
>>>>>>      mc_intel = patch->mc_intel;
>>>>>>
>>>>>> -    /* serialize access to the physical write to MSR 0x79 */
>>>>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>>>>> +    BUG_ON(local_irq_is_enabled());
>>>>>>
>>>>>>      /*
>>>>>>       * Writeback and invalidate caches before updating microcode to avoid
>>>>>
>>>>> Thinking about it - what happens if we hit an NMI or #MC here?
>>>>> watchdog_disable(), a call to which you add in an earlier patch,
>>>>> doesn't really suppress the generation of NMIs, it only tells the
>>>>> handler not to look at the accumulated statistics.
>>>>
>>>> I think they should be suppressed. Ashok, could you confirm it?
>>>
>>> I think the only sources would be the watchdog as you pointed out
>>> which we already touch to keep it from expiring. The perf counters
>>> i'm not an expert in, but i'll check. When we are in stop_machine() type
>>> flow, its not clear if any of those would fire. (I might be wrong, but let
>>> me check).
>>
>> Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
>> how would it _not_ potentially fire?
>
> We plan not to prevent NMI being fired. Instead, if one thread of a core
> is updating microcode, other threads of this core would stop in the
> handler of NMI until the update completion. Is this approach acceptable?

Well, I have to return the question: It is you who knows what is or
is not acceptable while an ucode update is in progress. In particular
it obviously matters how much ucode is involved in the delivery of
an NMI (and in allowing the handler to get to the point where you'd
"stop" it).

If the approach you suggest is fine for the NMI case, I'd then wonder
if it couldn't also be used for the #MC one.

Jan
On Thu, Jun 13, 2019 at 08:08:46AM -0600, Jan Beulich wrote:
>>>> On 13.06.19 at 16:05, <chao.gao@intel.com> wrote:
>> On Wed, Jun 12, 2019 at 01:38:31AM -0600, Jan Beulich wrote:
>>>>>> On 11.06.19 at 18:04, <ashok.raj@intel.com> wrote:
>>>> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>>>>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>>>>
>>>>>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>>>>>
>>>>>>>      mc_intel = patch->mc_intel;
>>>>>>>
>>>>>>> -    /* serialize access to the physical write to MSR 0x79 */
>>>>>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>>>>>> +    BUG_ON(local_irq_is_enabled());
>>>>>>>
>>>>>>>      /*
>>>>>>>       * Writeback and invalidate caches before updating microcode to avoid
>>>>>>
>>>>>> Thinking about it - what happens if we hit an NMI or #MC here?
>>>>>> watchdog_disable(), a call to which you add in an earlier patch,
>>>>>> doesn't really suppress the generation of NMIs, it only tells the
>>>>>> handler not to look at the accumulated statistics.
>>>>>
>>>>> I think they should be suppressed. Ashok, could you confirm it?
>>>>
>>>> I think the only sources would be the watchdog as you pointed out
>>>> which we already touch to keep it from expiring. The perf counters
>>>> i'm not an expert in, but i'll check. When we are in stop_machine() type
>>>> flow, its not clear if any of those would fire. (I might be wrong, but let
>>>> me check).
>>>
>>> Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
>>> how would it _not_ potentially fire?
>>
>> We plan not to prevent NMI being fired. Instead, if one thread of a core
>> is updating microcode, other threads of this core would stop in the
>> handler of NMI until the update completion. Is this approach acceptable?
>
> Well, I have to return the question: It is you who knows what is or
> is not acceptable while an ucode update is in progress. In particular
> it obviously matters how much ucode is involved in the delivery of
> an NMI (and in allowing the handler to get to the point where you'd
> "stop" it).
>
> If the approach you suggest is fine for the NMI case,

Yes. It is fine. It is a suggestion from Ashok and what he is working on
in the linux kernel. I just wanted to make sure you didn't oppose this
approach in Xen (considering disarming the watchdog NMI might be an
alternative).

> I'd then wonder if it couldn't also be used for the #MC one.

I think there is not much practical value for #MC, because we still need
to wait for the call-in of all threads. But as you and Ashok said, #MC
is usually fatal and the machine goes down anyway.

Thanks
Chao
On Thu, Jun 13, 2019 at 08:08:46AM -0600, Jan Beulich wrote:
>>>> On 13.06.19 at 16:05, <chao.gao@intel.com> wrote:
>> On Wed, Jun 12, 2019 at 01:38:31AM -0600, Jan Beulich wrote:
>>>>>> On 11.06.19 at 18:04, <ashok.raj@intel.com> wrote:
>>>> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>>>>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>>>>
>>>>>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>>>>>
>>>>>>>      mc_intel = patch->mc_intel;
>>>>>>>
>>>>>>> -    /* serialize access to the physical write to MSR 0x79 */
>>>>>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>>>>>> +    BUG_ON(local_irq_is_enabled());
>>>>>>>
>>>>>>>      /*
>>>>>>>       * Writeback and invalidate caches before updating microcode to avoid
>>>>>>
>>>>>> Thinking about it - what happens if we hit an NMI or #MC here?
>>>>>> watchdog_disable(), a call to which you add in an earlier patch,
>>>>>> doesn't really suppress the generation of NMIs, it only tells the
>>>>>> handler not to look at the accumulated statistics.
>>>>>
>>>>> I think they should be suppressed. Ashok, could you confirm it?
>>>>
>>>> I think the only sources would be the watchdog as you pointed out
>>>> which we already touch to keep it from expiring. The perf counters
>>>> i'm not an expert in, but i'll check. When we are in stop_machine() type
>>>> flow, its not clear if any of those would fire. (I might be wrong, but let
>>>> me check).
>>>
>>> Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
>>> how would it _not_ potentially fire?
>>
>> We plan not to prevent NMI being fired. Instead, if one thread of a core
>> is updating microcode, other threads of this core would stop in the
>> handler of NMI until the update completion. Is this approach acceptable?
>
> Well, I have to return the question: It is you who knows what is or
> is not acceptable while an ucode update is in progress. In particular
> it obviously matters how much ucode is involved in the delivery of
> an NMI (and in allowing the handler to get to the point where you'd
> "stop" it).
>
> If the approach you suggest is fine for the NMI case, I'd then wonder
> if it couldn't also be used for the #MC one.

Architecturally only one #MC can be active in the system. If a new #MC
condition happens when MCG_STATUS.MCIP is already set, that would cause
a spontaneous shutdown.

If another NMI arrives on the CPU doing the wrmsr, it will be pended
in the lapic and delivered after the wrmsr returns. The wrmsr flow
can't be interrupted.
>>> On 13.06.19 at 19:47, <ashok.raj@intel.com> wrote:
> On Thu, Jun 13, 2019 at 08:08:46AM -0600, Jan Beulich wrote:
>>>>> On 13.06.19 at 16:05, <chao.gao@intel.com> wrote:
>>> On Wed, Jun 12, 2019 at 01:38:31AM -0600, Jan Beulich wrote:
>>>>>>> On 11.06.19 at 18:04, <ashok.raj@intel.com> wrote:
>>>>> On Tue, Jun 11, 2019 at 08:46:04PM +0800, Chao Gao wrote:
>>>>>> On Wed, Jun 05, 2019 at 08:53:46AM -0600, Jan Beulich wrote:
>>>>>>>
>>>>>>>> @@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
>>>>>>>>
>>>>>>>>      mc_intel = patch->mc_intel;
>>>>>>>>
>>>>>>>> -    /* serialize access to the physical write to MSR 0x79 */
>>>>>>>> -    spin_lock_irqsave(&microcode_update_lock, flags);
>>>>>>>> +    BUG_ON(local_irq_is_enabled());
>>>>>>>>
>>>>>>>>      /*
>>>>>>>>       * Writeback and invalidate caches before updating microcode to avoid
>>>>>>>
>>>>>>> Thinking about it - what happens if we hit an NMI or #MC here?
>>>>>>> watchdog_disable(), a call to which you add in an earlier patch,
>>>>>>> doesn't really suppress the generation of NMIs, it only tells the
>>>>>>> handler not to look at the accumulated statistics.
>>>>>>
>>>>>> I think they should be suppressed. Ashok, could you confirm it?
>>>>>
>>>>> I think the only sources would be the watchdog as you pointed out
>>>>> which we already touch to keep it from expiring. The perf counters
>>>>> i'm not an expert in, but i'll check. When we are in stop_machine() type
>>>>> flow, its not clear if any of those would fire. (I might be wrong, but let
>>>>> me check).
>>>>
>>>> Well, without disarming the watchdog NMI at the LAPIC / IO-APIC,
>>>> how would it _not_ potentially fire?
>>>
>>> We plan not to prevent NMI being fired. Instead, if one thread of a core
>>> is updating microcode, other threads of this core would stop in the
>>> handler of NMI until the update completion. Is this approach acceptable?
>>
>> Well, I have to return the question: It is you who knows what is or
>> is not acceptable while an ucode update is in progress. In particular
>> it obviously matters how much ucode is involved in the delivery of
>> an NMI (and in allowing the handler to get to the point where you'd
>> "stop" it).
>>
>> If the approach you suggest is fine for the NMI case, I'd then wonder
>> if it couldn't also be used for the #MC one.
>
> Architecturally only one #MC can be active in the system. If a new #MC
> condition happens when MCG_STATUS.MCIP is already set, that would cause
> spontaneous shutdown.

That's understood.

> If another NMI arrives on the CPU doing the wrmsr, it will be pended
> in the lapic and delivered after the wrmsr returns. wrmsr flow
> can't be interrupted.

Of course. Neither part of your response is an argument against adding
the same "defense" to the #MC handler, though. While likely #MC will be
fatal to the system anyway, we should try to avoid making things worse
when we can.

Jan
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index c819028..b64a58d 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -74,9 +74,6 @@ struct mpbhdr {
     uint8_t data[];
 };
 
-/* serialize access to the physical write */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 /* See comment in start_update() for cases when this routine fails */
 static int collect_cpu_info(struct cpu_signature *csig)
 {
@@ -251,7 +248,6 @@ static enum microcode_match_result compare_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-    unsigned long flags;
     uint32_t rev;
     int hw_err;
     unsigned int cpu = smp_processor_id();
@@ -263,15 +259,13 @@ static int apply_microcode(const struct microcode_patch *patch)
 
     hdr = patch->mc_amd->mpb;
 
-    spin_lock_irqsave(&microcode_update_lock, flags);
+    BUG_ON(local_irq_is_enabled());
 
     hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr);
 
     /* get patch id after patching */
     rdmsrl(MSR_AMD_PATCHLEVEL, rev);
 
-    spin_unlock_irqrestore(&microcode_update_lock, flags);
-
     /*
      * Some processors leave the ucode blob mapping as UC after the update.
      * Flush the mapping to regain normal cacheability.
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index bfb48ce..94a1561 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -93,9 +93,6 @@ struct extended_sigtable {
 
 #define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
 
-/* serialize access to the physical write to MSR 0x79 */
-static DEFINE_SPINLOCK(microcode_update_lock);
-
 static int collect_cpu_info(struct cpu_signature *csig)
 {
     unsigned int cpu_num = smp_processor_id();
@@ -295,7 +292,6 @@ static struct microcode_patch *allow_microcode_patch(
 
 static int apply_microcode(const struct microcode_patch *patch)
 {
-    unsigned long flags;
     uint64_t msr_content;
     unsigned int val[2];
     unsigned int cpu_num = raw_smp_processor_id();
@@ -307,8 +303,7 @@ static int apply_microcode(const struct microcode_patch *patch)
 
     mc_intel = patch->mc_intel;
 
-    /* serialize access to the physical write to MSR 0x79 */
-    spin_lock_irqsave(&microcode_update_lock, flags);
+    BUG_ON(local_irq_is_enabled());
 
     /*
      * Writeback and invalidate caches before updating microcode to avoid
@@ -327,7 +322,6 @@ static int apply_microcode(const struct microcode_patch *patch)
     rdmsrl(MSR_IA32_UCODE_REV, msr_content);
     val[1] = (uint32_t)(msr_content >> 32);
-    spin_unlock_irqrestore(&microcode_update_lock, flags);
 
     if ( val[1] != mc_intel->hdr.rev )
     {
         printk(KERN_ERR "microcode: CPU%d update from revision "
microcode_update_lock is to prevent logic threads of a same core from
updating microcode at the same time. But due to using a global lock, it
also prevented parallel microcode updating on different cores.

Remove this lock in order to update microcode in parallel. It is safe
because we have already ensured serialization of sibling threads at the
caller side.
1. For late microcode update, do_microcode_update() ensures that only one
   sibling thread of a core can update microcode.
2. For microcode update during system startup or CPU-hotplug,
   microcode_mutex() guarantees update serialization of logical threads.
3. get/put_cpu_bitmaps() prevents the concurrency of CPU-hotplug and
   late microcode update.

Note that printk in apply_microcode() and svm_host_osvw_init() (for AMD
only) are still processed sequentially.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v7:
 - reworked. Remove the complex locking logic introduced in v5 and v6.
   The microcode patch to be applied is passed as an argument without
   any global variable. Thus no lock is added to serialize potential
   readers/writers. Callers of apply_microcode() will guarantee the
   correctness: the patch pointed by the arguments won't be changed by
   others.

Changes in v6:
 - introduce early_ucode_update_lock to serialize early ucode update.

Changes in v5:
 - newly add
---
 xen/arch/x86/microcode_amd.c   | 8 +-------
 xen/arch/x86/microcode_intel.c | 8 +-------
 2 files changed, 2 insertions(+), 14 deletions(-)