Message ID | 1422311118-11320-1-git-send-email-poh@qca.qualcomm.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On Mon, Jan 26, 2015 at 02:25:18PM -0800, Peter Oh wrote: > Using ioread() to perform data sync is excessive. > Use compact API, wmb(), that intended to be used for the case. > It reduces total 14 CPU clocks per interrupt. Hi, > ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS + PCIE_INTR_CLR_ADDRESS, > PCIE_INTR_FIRMWARE_MASK | PCIE_INTR_CE_MASK_ALL); > > - /* IMPORTANT: this extra read transaction is required to > - * flush the posted write buffer. */ > - (void)ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS + > - PCIE_INTR_ENABLE_ADDRESS); > + /* invoke data sync barrier */ > + wmb(); > } I am no expert in arcane PCI matters, but that looks suspicious to me. I seem to recall wmb() only enforced ordering, and maybe not even memory-IO ordering on all platforms. If you want to disable an irq, it really seems like you would want to flush posted writes so you know the hardware has seen it.
On 01/27/2015 01:33 PM, Bob Copeland wrote: > On Mon, Jan 26, 2015 at 02:25:18PM -0800, Peter Oh wrote: >> Using ioread() to perform data sync is excessive. >> Use compact API, wmb(), that intended to be used for the case. >> It reduces total 14 CPU clocks per interrupt. > Hi, > >> ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS + PCIE_INTR_CLR_ADDRESS, >> PCIE_INTR_FIRMWARE_MASK | PCIE_INTR_CE_MASK_ALL); >> >> - /* IMPORTANT: this extra read transaction is required to >> - * flush the posted write buffer. */ >> - (void)ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS + >> - PCIE_INTR_ENABLE_ADDRESS); >> + /* invoke data sync barrier */ >> + wmb(); >> } > I am no expert in arcane PCI matters, but that looks suspicious to me. I seem > to recall wmb() only enforced ordering, and maybe not even memory-IO ordering > on all platforms. If you want to disable an irq, it really seems like you > would want to flush posted writes so you know the hardware has seen it. enforced ordering is happened by flush write buffer and wmb is commonly used to flush write buffer. so that wmb guarantees ordering by flush write buffer. That's why it's called a memory barrier. when wmb is used, it guarantees all memory accesses complete before wmb command completes. for instance dsb is the corresponding command for wmb in arm and arm instruction guide says "The DSB instruction completes when all explicit memory accesses before it complete." Which means DSB instruction will hold the bus (usually AXI bus) and never returns until any memory or memory mapped I/O access complete (dsb is for ARM and other platforms have the equivalent instruction such as sfence, sync, mf, and dcs). Thanks, Peter
On Tue, Jan 27, 2015 at 03:53:00PM -0800, Peter Oh wrote: > >>- /* IMPORTANT: this extra read transaction is required to > >>- * flush the posted write buffer. */ > >>- (void)ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS + > >>- PCIE_INTR_ENABLE_ADDRESS); > >>+ /* invoke data sync barrier */ > >>+ wmb(); > >> } > >I am no expert in arcane PCI matters, but that looks suspicious to me. I seem > >to recall wmb() only enforced ordering, and maybe not even memory-IO ordering > >on all platforms. If you want to disable an irq, it really seems like you > >would want to flush posted writes so you know the hardware has seen it. > enforced ordering is happened by flush write buffer and wmb is > commonly used to flush write buffer. > so that wmb guarantees ordering by flush write buffer. That's why > it's called a memory barrier. Ok, sure, but I/O ordering two different writes, and ensuring device has seen a posted write, are related but different things, no? So if you are disabling an interrupt and want to be really sure interrupts are off when function returns, I think you still want the read. Anyway, it's your driver, just pointing out that the "IMPORTANT" read might still be important to someone.
On 01/27/2015 08:30 PM, Bob Copeland wrote: > On Tue, Jan 27, 2015 at 03:53:00PM -0800, Peter Oh wrote: >>>> - /* IMPORTANT: this extra read transaction is required to >>>> - * flush the posted write buffer. */ >>>> - (void)ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS + >>>> - PCIE_INTR_ENABLE_ADDRESS); >>>> + /* invoke data sync barrier */ >>>> + wmb(); >>>> } >>> I am no expert in arcane PCI matters, but that looks suspicious to me. I seem >>> to recall wmb() only enforced ordering, and maybe not even memory-IO ordering >>> on all platforms. If you want to disable an irq, it really seems like you >>> would want to flush posted writes so you know the hardware has seen it. >> enforced ordering is happened by flush write buffer and wmb is >> commonly used to flush write buffer. >> so that wmb guarantees ordering by flush write buffer. That's why >> it's called a memory barrier. > Ok, sure, but I/O ordering two different writes, and ensuring device > has seen a posted write, are related but different things, no? yes, they are different and wmb guarantees both. > So if > you are disabling an interrupt and want to be really sure interrupts > are off when function returns, I think you still want the read. the mechanism that read is using to make sure write work done is the same mechanism that wmb is using. > Anyway, it's your driver, just pointing out that the "IMPORTANT" > read might still be important to someone. I really appreciate your opinion. > Regards, Peter
On Tue, 2015-01-27 at 21:39 -0800, Peter Oh wrote: > > Ok, sure, but I/O ordering two different writes, and ensuring device > > has seen a posted write, are related but different things, no? > yes, they are different and wmb guarantees both. No, wmb() doesn't. I'd be very surprised if it had any side effect on the PCI bus even on ARM. Read Documentation/memory-barriers.txt. And to understand (PCI) posted writes, wikipedia helps: http://en.wikipedia.org/wiki/Posted_write johannes
Hi, On 01/27/2015 11:37 PM, Johannes Berg wrote: > On Tue, 2015-01-27 at 21:39 -0800, Peter Oh wrote: > >>> Ok, sure, but I/O ordering two different writes, and ensuring device >>> has seen a posted write, are related but different things, no? >> yes, they are different and wmb guarantees both. > No, wmb() doesn't. I'd be very surprised if it had any side effect on > the PCI bus even on ARM. Read Documentation/memory-barriers.txt. > > And to understand (PCI) posted writes, wikipedia helps: > http://en.wikipedia.org/wiki/Posted_write I admit that I/O ordering and posted write are looked different in theory and at glance since posted write could be related cache invalidate. But wmb are still related to both. As I addressed wmb uses dsb (in arm arch) and here is the description of arm architecture. * DSB drains write buffer. * DSB is architecturally defined to include all cache, TLB and branch prediction maintenance operations as well as explicit memory operations These are the reasons why I mentioned wmb does both. * captured from ARMv7 Architecture Manual --- Notes --- Historically, this operation was referred to as Drain Write Buffer or Data Write Barrier (DWB). From ARMv6, these names and the use of DWB were deprecated in favor of the new Data Synchronization Barrier name and DSB abbreviation. DSB better reflects the functionality provided from ARMv6, because DSB is architecturally defined to include all cache, TLB and branch prediction maintenance operations as well as explicit memory operations --- A DSB completes when: --- ? all explicit memory accesses that are observed by Pe before the DSB is executed, are of the required access types, and are from observers in the same required shareability domain as Pe, are complete for the set of observers in the required shareability domain. ? if the required accesses types of the DSB is reads and writes, all cache and branch predictor maintenance operations issued by Pe before the DSB are complete for the required shareability domain. ? if the required accesses types of the DSB is reads and writes, all TLB maintenance operations issued by Pe before the DSB are complete for the required shareability domain. -------------- Furthermore this is the comparison of the compiled assembly code between ath10k_pci_read32 and wmb. ath10k_pci_read32() bac: e5932008 ldr r2, [r3, #8] bb0: f57ff04f dsb sy bb4: e2883d52 add r3, r8, #5248 ; 0x1480 bb8: e283303c add r3, r3, #60 ; 0x3c bbc: e593300c ldr r3, [r3, #12] bc0: e2833a09 add r3, r3, #36864 ; 0x9000 wmb(); b9c: f57ff04e dsb st ath10k_pci_read32 does register operation except dsb and there is no cache invalidate related commands. So that if wmb is not enough for the purpose then ath10k_pci_read32 is also not enough for that. Also refer the section "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt. It gives an example with PCI bridge and introduces readl as an alternative method to mmiowb which weaker form of wmb. Please give your opinion. > johannes > > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k Regards, Peter
Peter Oh wrote: > As I addressed wmb uses dsb (in arm arch) and here is the description of > arm architecture. What about other architectures ? Sujith
On 01/30/2015 05:16 PM, Sujith Manoharan wrote: > Peter Oh wrote: >> As I addressed wmb uses dsb (in arm arch) and here is the description of >> arm architecture. > What about other architectures ? Please refer the email thread that I mentioned about other architectures. (dsb is for ARM and other platforms have the equivalent instruction such as sfence, sync, mf, and dcs). Also the patch is updated with 2nd patch set replacing wmb to mb. > Sujith > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k
Peter Oh wrote: > Please refer the email thread that I mentioned about other architectures. > (dsb is for ARM and other platforms have the equivalent instruction such > as sfence, sync, mf, and dcs). Ok. > Also the patch is updated with 2nd patch set replacing wmb to mb. Would be good to test this on a MIPS platform... Sujith
On Fri, 2015-01-30 at 14:53 -0800, Peter Oh wrote: > I admit that I/O ordering and posted write are looked different in > theory and at glance since posted write could be related cache invalidate. > But wmb are still related to both. > As I addressed wmb uses dsb (in arm arch) and here is the description of > arm architecture. > > * DSB drains write buffer. > * DSB is architecturally defined to include all cache, TLB and branch > prediction maintenance operations as well as explicit memory operations > > These are the reasons why I mentioned wmb does both. > > * captured from ARMv7 Architecture Manual > --- Notes --- > Historically, this operation was referred to as Drain Write Buffer or > Data Write Barrier (DWB). From ARMv6, these > names and the use of DWB were deprecated in favor of the new Data > Synchronization Barrier name and DSB > abbreviation. DSB better reflects the functionality provided from ARMv6, > because DSB is architecturally defined > to include all cache, TLB and branch prediction maintenance operations > as well as explicit memory operations > > --- A DSB completes when: --- > ? all explicit memory accesses that are observed by Pe before the DSB is > executed, are of the required access > types, and are from observers in the same required shareability domain > as Pe, are complete for the set of > observers in the required shareability domain. > ? if the required accesses types of the DSB is reads and writes, all > cache and branch predictor maintenance > operations issued by Pe before the DSB are complete for the required > shareability domain. > ? if the required accesses types of the DSB is reads and writes, all TLB > maintenance operations issued by Pe > before the DSB are complete for the required shareability domain. > -------------- I cannot read from this in any way that it can post writes to the PCIe bus. In fact, architecturally, I cannot think of any reason how it even could do that from the CPU. > Furthermore this is the comparison of the compiled assembly code between > ath10k_pci_read32 and wmb. > > ath10k_pci_read32() > bac: e5932008 ldr r2, [r3, #8] > bb0: f57ff04f dsb sy > bb4: e2883d52 add r3, r8, #5248 ; 0x1480 > bb8: e283303c add r3, r3, #60 ; 0x3c > bbc: e593300c ldr r3, [r3, #12] > bc0: e2833a09 add r3, r3, #36864 ; 0x9000 > > wmb(); > b9c: f57ff04e dsb st > > ath10k_pci_read32 does register operation except dsb and there is no > cache invalidate related commands. I don't think this is relevant. The question is "what are you trying to achieve". > So that if wmb is not enough for the purpose then ath10k_pci_read32 is > also not enough for that. > > Also refer the section "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt. > > It gives an example with PCI bridge and introduces readl as an > alternative method to mmiowb which weaker form of wmb. > > Please give your opinion. Again - the question is - what are you trying to achieve? The code (as it is before your patch) implies that it's trying to make sure that before it continues, any previous writes to the PCIe device's registers are posted. The only way to ensure that is to do a read to the registers, as the code does now. What you're describing is something else entirely - you're describing a way to make sure that some data was flushed out to DRAM from the CPU caches. These two things are not related in any way. In an interrupt routine, it would make sense to ensure that the write was posted (e.g. to mask interrupts, or to acknowledge them, or similar, before the routine can be re-invoked.) To me, flushing memory writes to DRAM makes less sense in an interrupt handlers unless the device was somehow using DMA to coordinate interrupts [1], which seems unlikely but I haven't checked. Anyway - I have no particular interest in this discussion, I was merely trying to help you out with this :) You can make whatever change you want, of course :P johannes [1] incidentally, our device [iwlwifi] does in fact do something like that, but it's read-only for the driver so no need for such a thing either
On 01/30/2015 06:06 PM, Sujith Manoharan wrote: > Peter Oh wrote: >> Please refer the email thread that I mentioned about other > architectures. >> (dsb is for ARM and other platforms have the equivalent instruction such >> as sfence, sync, mf, and dcs). > Ok. > >> Also the patch is updated with 2nd patch set replacing wmb to mb. > Would be good to test this on a MIPS platform... Agree. I'll do the investigation and test. > Sujith > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k Thanks, Peter
On 02/02/2015 05:02 AM, Johannes Berg wrote: > On Fri, 2015-01-30 at 14:53 -0800, Peter Oh wrote: > >> I admit that I/O ordering and posted write are looked different in >> theory and at glance since posted write could be related cache > invalidate. >> But wmb are still related to both. >> As I addressed wmb uses dsb (in arm arch) and here is the description of >> arm architecture. >> >> * DSB drains write buffer. >> * DSB is architecturally defined to include all cache, TLB and branch >> prediction maintenance operations as well as explicit memory operations >> >> These are the reasons why I mentioned wmb does both. >> >> * captured from ARMv7 Architecture Manual >> --- Notes --- >> Historically, this operation was referred to as Drain Write Buffer or >> Data Write Barrier (DWB). From ARMv6, these >> names and the use of DWB were deprecated in favor of the new Data >> Synchronization Barrier name and DSB >> abbreviation. DSB better reflects the functionality provided from ARMv6, >> because DSB is architecturally defined >> to include all cache, TLB and branch prediction maintenance operations >> as well as explicit memory operations >> >> --- A DSB completes when: --- >> ? all explicit memory accesses that are observed by Pe before the DSB is >> executed, are of the required access >> types, and are from observers in the same required shareability domain >> as Pe, are complete for the set of >> observers in the required shareability domain. >> ? if the required accesses types of the DSB is reads and writes, all >> cache and branch predictor maintenance >> operations issued by Pe before the DSB are complete for the required >> shareability domain. >> ? if the required accesses types of the DSB is reads and writes, all TLB >> maintenance operations issued by Pe >> before the DSB are complete for the required shareability domain. >> -------------- > I cannot read from this in any way that it can post writes to the PCIe > bus. In fact, architecturally, I cannot think of any reason how it even > could do that from the CPU. > >> Furthermore this is the comparison of the compiled assembly code between >> ath10k_pci_read32 and wmb. >> >> ath10k_pci_read32() >> bac: e5932008 ldr r2, [r3, #8] >> bb0: f57ff04f dsb sy >> bb4: e2883d52 add r3, r8, #5248 ; 0x1480 >> bb8: e283303c add r3, r3, #60 ; 0x3c >> bbc: e593300c ldr r3, [r3, #12] >> bc0: e2833a09 add r3, r3, #36864 ; 0x9000 >> >> wmb(); >> b9c: f57ff04e dsb st >> >> ath10k_pci_read32 does register operation except dsb and there is no >> cache invalidate related commands. > I don't think this is relevant. The question is "what are you trying to > achieve". > >> So that if wmb is not enough for the purpose then ath10k_pci_read32 is >> also not enough for that. >> >> Also refer the section "ACQUIRES VS I/O ACCESSES" in > memory-barriers.txt. >> It gives an example with PCI bridge and introduces readl as an >> alternative method to mmiowb which weaker form of wmb. >> >> Please give your opinion. > Again - the question is - what are you trying to achieve? > > The code (as it is before your patch) implies that it's trying to make > sure that before it continues, any previous writes to the PCIe device's > registers are posted. The only way to ensure that is to do a read to the > registers, as the code does now. Do you know how the read ensure that although the read code does not check the return value? Can you explain how a read ensures that posted write reaches PCIe device? > What you're describing is something else entirely - you're describing a > way to make sure that some data was flushed out to DRAM from the CPU > caches. > > These two things are not related in any way. > > In an interrupt routine, it would make sense to ensure that the write > was posted (e.g. to mask interrupts, or to acknowledge them, or similar, > before the routine can be re-invoked.) > > To me, flushing memory writes to DRAM makes less sense in an interrupt > handlers unless the device was somehow using DMA to coordinate > interrupts [1], which seems unlikely but I haven't checked. > > Anyway - I have no particular interest in this discussion, I was merely > trying to help you out with this :) You can make whatever change you > want, of course :P > > johannes > > [1] incidentally, our device [iwlwifi] does in fact do something like > that, but it's read-only for the driver so no need for such a thing > either > > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k Thanks, Peter
On Mon, 2015-02-02 at 09:33 -0800, Peter Oh wrote: > > The code (as it is before your patch) implies that it's trying to make > > sure that before it continues, any previous writes to the PCIe device's > > registers are posted. The only way to ensure that is to do a read to the > > registers, as the code does now. > Do you know how the read ensure that although the read code does not > check the return value? > Can you explain how a read ensures that posted write reaches PCIe device? You basically have the following sequence: iowrite() ioread() If you look, you'll see that iowrite() is actually done (or should be, or perhaps with appropriate syncs) on an uncached mapping. As a result, the only thing you care about here is the PCIe bus, not the CPU cache flush. And from there on that's just a question of PCIe bus semantics. johannes
On 02/02/2015 10:54 AM, Johannes Berg wrote: > On Mon, 2015-02-02 at 09:33 -0800, Peter Oh wrote: > >>> The code (as it is before your patch) implies that it's trying to make >>> sure that before it continues, any previous writes to the PCIe > device's >>> registers are posted. The only way to ensure that is to do a read to > the >>> registers, as the code does now. >> Do you know how the read ensure that although the read code does not >> check the return value? >> Can you explain how a read ensures that posted write reaches PCIe > device? > > You basically have the following sequence: > > iowrite() > ioread() > > If you look, you'll see that iowrite() is actually done (or should be, > or perhaps with appropriate syncs) on an uncached mapping. since it's mmio, iowrite will be map to write, not out which is cached mapping. That's why we address "posted write" here. If it's un-cached mapping which is volatile, we don't even need ioread. > As a result, > the only thing you care about here is the PCIe bus, not the CPU cache > flush. And from there on that's just a question of PCIe bus semantics. So how does ioread guarantee PCIe bus transaction done? > > johannes > > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k Regards, Peter
> > You basically have the following sequence: > > > > iowrite() > > ioread() > > > > If you look, you'll see that iowrite() is actually done (or should be, > > or perhaps with appropriate syncs) on an uncached mapping. > since it's mmio, iowrite will be map to write, not out which is cached > mapping. > That's why we address "posted write" here. > If it's un-cached mapping which is volatile, we don't even need ioread. No, this isn't true - "posted write" in the context of this discussion is about the PCIe bus. Memory writes that go through cache aren't referred to as "posted writes", those are just (cached) memory writes. > > As a result, > > the only thing you care about here is the PCIe bus, not the CPU cache > > flush. And from there on that's just a question of PCIe bus semantics. > So how does ioread guarantee PCIe bus transaction done? That's how PCIe works, operations are serialized, and read() has to wait for a response from the device (but write doesn't - which is "posted write") johannes
On 02/02/2015 11:22 AM, Johannes Berg wrote: >>> You basically have the following sequence: >>> >>> iowrite() >>> ioread() >>> >>> If you look, you'll see that iowrite() is actually done (or should be, >>> or perhaps with appropriate syncs) on an uncached mapping. >> since it's mmio, iowrite will be map to write, not out which is cached >> mapping. >> That's why we address "posted write" here. >> If it's un-cached mapping which is volatile, we don't even need ioread. > No, this isn't true - "posted write" in the context of this discussion > is about the PCIe bus. Memory writes that go through cache aren't > referred to as "posted writes", those are just (cached) memory writes. > >>> As a result, >>> the only thing you care about here is the PCIe bus, not the CPU cache >>> flush. And from there on that's just a question of PCIe bus semantics. >> So how does ioread guarantee PCIe bus transaction done? > That's how PCIe works, operations are serialized, and read() has to wait > for a response from the device do you know which mechanism or which instruction set makes read() wait for a response from the device? > (but write doesn't - which is "posted > write") > > johannes >
On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote: > On 02/02/2015 11:22 AM, Johannes Berg wrote: > >>> You basically have the following sequence: > >>> > >>> iowrite() > >>> ioread() > >>> > >>> If you look, you'll see that iowrite() is actually done (or should be, > >>> or perhaps with appropriate syncs) on an uncached mapping. > >> since it's mmio, iowrite will be map to write, not out which is cached > >> mapping. > >> That's why we address "posted write" here. > >> If it's un-cached mapping which is volatile, we don't even need ioread. > > No, this isn't true - "posted write" in the context of this discussion > > is about the PCIe bus. Memory writes that go through cache aren't > > referred to as "posted writes", those are just (cached) memory writes. > > > >>> As a result, > >>> the only thing you care about here is the PCIe bus, not the CPU cache > >>> flush. And from there on that's just a question of PCIe bus semantics. > >> So how does ioread guarantee PCIe bus transaction done? > > That's how PCIe works, operations are serialized, and read() has to wait > > for a response from the device > do you know which mechanism or which instruction set makes read() wait > for a response from the device? I have no idea. I assume it's just like a DRAM read, the CPU stalls while there's no response. johannes
On 02/02/2015 11:47 AM, Johannes Berg wrote: > On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote: >> On 02/02/2015 11:22 AM, Johannes Berg wrote: >>>>> You basically have the following sequence: >>>>> >>>>> iowrite() >>>>> ioread() >>>>> >>>>> If you look, you'll see that iowrite() is actually done (or should > be, >>>>> or perhaps with appropriate syncs) on an uncached mapping. >>>> since it's mmio, iowrite will be map to write, not out which is > cached >>>> mapping. >>>> That's why we address "posted write" here. >>>> If it's un-cached mapping which is volatile, we don't even need > ioread. >>> No, this isn't true - "posted write" in the context of this discussion >>> is about the PCIe bus. Memory writes that go through cache aren't >>> referred to as "posted writes", those are just (cached) memory writes. >>> >>>>> As a result, >>>>> the only thing you care about here is the PCIe bus, not the CPU > cache >>>>> flush. And from there on that's just a question of PCIe bus > semantics. >>>> So how does ioread guarantee PCIe bus transaction done? >>> That's how PCIe works, operations are serialized, and read() has to > wait >>> for a response from the device >> do you know which mechanism or which instruction set makes read() wait >> for a response from the device? > I have no idea. I assume it's just like a DRAM read, the CPU stalls > while there's no response. My explanation in this thread is all about how read() guarantees the wait for a response from the device, therefore why mb() - replace from wmb at patch set 2 - is compatible to read(). Briefly speaking, read() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus -> cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe device) signals write completion when write transactions completed in write response channel -> cpu release axi bus -> cpu program counter (pc) proceeds the next to read. the exact same routines happen with mb(). mb() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus -> cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe device) signals write completion when write transactions completed in write response channel -> cpu release axi bus -> cpu program counter (pc) proceeds the next to read. Since axi bus master is waiting (blocking) for write completion signal from axi slave (PCIe device), this is how read() and mb() guarantee write command reaches to the device. > johannes > > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k Regards, Peter
On 02/02/2015 02:06 PM, Peter Oh wrote: > > On 02/02/2015 11:47 AM, Johannes Berg wrote: >> On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote: >>> On 02/02/2015 11:22 AM, Johannes Berg wrote: >>>>>> You basically have the following sequence: >>>>>> >>>>>> iowrite() >>>>>> ioread() >>>>>> >>>>>> If you look, you'll see that iowrite() is actually done (or should >> be, >>>>>> or perhaps with appropriate syncs) on an uncached mapping. >>>>> since it's mmio, iowrite will be map to write, not out which is >> cached >>>>> mapping. >>>>> That's why we address "posted write" here. >>>>> If it's un-cached mapping which is volatile, we don't even need >> ioread. >>>> No, this isn't true - "posted write" in the context of this discussion >>>> is about the PCIe bus. Memory writes that go through cache aren't >>>> referred to as "posted writes", those are just (cached) memory writes. >>>> >>>>>> As a result, >>>>>> the only thing you care about here is the PCIe bus, not the CPU >> cache >>>>>> flush. And from there on that's just a question of PCIe bus >> semantics. >>>>> So how does ioread guarantee PCIe bus transaction done? >>>> That's how PCIe works, operations are serialized, and read() has to >> wait >>>> for a response from the device >>> do you know which mechanism or which instruction set makes read() wait >>> for a response from the device? >> I have no idea. I assume it's just like a DRAM read, the CPU stalls >> while there's no response. > My explanation in this thread is all about how read() guarantees the > wait for a response from the device, therefore why mb() - replace from > wmb at patch set 2 - is compatible to read(). > Briefly speaking, > read() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus > -> cpu post write buffer on axi bus -> axi bus (axi slave which is > PCIe device) signals write completion when write transactions > completed in write response channel -> cpu release axi bus -> cpu > program counter (pc) proceeds the next to read. > > the exact same routines happen with mb(). > mb() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus > -> cpu post write buffer on axi bus -> axi bus (axi slave which is > PCIe device) signals write completion when write transactions > completed in write response channel -> cpu release axi bus -> cpu > program counter (pc) proceeds the next to read. read -> mb (erratum) > > Since axi bus master is waiting (blocking) for write completion signal > from axi slave (PCIe device), this is how read() and mb() guarantee > write command reaches to the device. >> johannes >> >> >> _______________________________________________ >> ath10k mailing list >> ath10k@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/ath10k > Regards, > Peter > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k
On 30 January 2015 at 18:06, Sujith Manoharan <sujith@msujith.org> wrote: > Peter Oh wrote: >> Please refer the email thread that I mentioned about other architectures. >> (dsb is for ARM and other platforms have the equivalent instruction such >> as sfence, sync, mf, and dcs). > > Ok. > >> Also the patch is updated with 2nd patch set replacing wmb to mb. > > Would be good to test this on a MIPS platform... > The Atheros mips74k stuff I have here does /not/ flush all the writes out to the device and guarantee the device has seen everything with a memory barrier. Just saying. Various drivers ended up needing ioread()s in my experiments; mips sync operations weren't enough. So I'd suggest abstracting it out like the linux dri i915 code has - they define a "posting read" macro which they use whenever they need to ensure it's definitely made it all the way out to the hardware and through internal FIFOs so internal hardware has seen the state change. Then you can redefine that to your hearts content based on platform. -adrian
On 02/02/2015 02:26 PM, Adrian Chadd wrote: > On 30 January 2015 at 18:06, Sujith Manoharan <sujith@msujith.org> wrote: >> Peter Oh wrote: >>> Please refer the email thread that I mentioned about other architectures. >>> (dsb is for ARM and other platforms have the equivalent instruction such >>> as sfence, sync, mf, and dcs). >> Ok. >> >>> Also the patch is updated with 2nd patch set replacing wmb to mb. >> Would be good to test this on a MIPS platform... >> > The Atheros mips74k stuff I have here does /not/ flush all the writes > out to the device and guarantee the device has seen everything with a > memory barrier. Just saying. Various drivers ended up needing > ioread()s in my experiments; mips sync operations weren't enough. > > So I'd suggest abstracting it out like the linux dri i915 code has - > they define a "posting read" macro which they use whenever they need > to ensure it's definitely made it all the way out to the hardware and > through internal FIFOs so internal hardware has seen the state change. > Then you can redefine that to your hearts content based on platform. Thank you Adrian to head up the concern and suggestion. The other people also concerned about other architectures like mips, so I was going to analysis it. But since you've already experienced the defects, let me hold the change back until I find better for all. > > > -adrian Regards, Peter
On 02/02/15 14:06, Peter Oh wrote: > > On 02/02/2015 11:47 AM, Johannes Berg wrote: >> On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote: >>> On 02/02/2015 11:22 AM, Johannes Berg wrote: >>>>>> You basically have the following sequence: >>>>>> >>>>>> iowrite() >>>>>> ioread() >>>>>> >>>>>> If you look, you'll see that iowrite() is actually done (or should >> be, >>>>>> or perhaps with appropriate syncs) on an uncached mapping. >>>>> since it's mmio, iowrite will be map to write, not out which is >> cached >>>>> mapping. >>>>> That's why we address "posted write" here. >>>>> If it's un-cached mapping which is volatile, we don't even need >> ioread. >>>> No, this isn't true - "posted write" in the context of this discussion >>>> is about the PCIe bus. Memory writes that go through cache aren't >>>> referred to as "posted writes", those are just (cached) memory writes. >>>> >>>>>> As a result, >>>>>> the only thing you care about here is the PCIe bus, not the CPU >> cache >>>>>> flush. And from there on that's just a question of PCIe bus >> semantics. >>>>> So how does ioread guarantee PCIe bus transaction done? >>>> That's how PCIe works, operations are serialized, and read() has to >> wait >>>> for a response from the device >>> do you know which mechanism or which instruction set makes read() wait >>> for a response from the device? >> I have no idea. I assume it's just like a DRAM read, the CPU stalls >> while there's no response. > My explanation in this thread is all about how read() guarantees the > wait for a response from the device, therefore why mb() - replace from > wmb at patch set 2 - is compatible to read(). > Briefly speaking, > read() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus > -> cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe > device) signals write completion when write transactions completed in > write response channel -> cpu release axi bus -> cpu program counter > (pc) proceeds the next to read. > > the exact same routines happen with mb(). > mb() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus -> > cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe > device) signals write completion when write transactions completed in > write response channel -> cpu release axi bus -> cpu program counter > (pc) proceeds the next to read. > > Since axi bus master is waiting (blocking) for write completion signal > from axi slave (PCIe device), this is how read() and mb() guarantee > write command reaches to the device. PCIe writes are posted, so the only guarantee you can have by inserting such barriers is that writes from CPU to the PCIe RC (targeting PCIe device) is non-posted (as far as the busing between CPU and the PCIe RC is concerned), but past the PCIe RC, there is no such guarantee, because the PCIe specification allows for that and there is flow control, PCIe switches or other things that can alter the way your PCIe device ends-up being written to. The only way to make a "portable" synchronization barrier is to do a PCIe read from the same register you just wrote to, because then, the PCIe RC needs to guarantee the transaction ordering on the PCIe bus itself. You might just be lucky and/or have very good HW which ensures that the ARM synchronization barriers are propagated to the memory region where your PCIe device BARs are mapped from the CPU perspective, but you definitively cannot rely on such assumptions, as there will be bogus HW there, for which only a subsequent ioread32() will work.
Hi Florian, Very appreciate your explanation in detail. Regards, Peter On 02/02/2015 03:25 PM, Florian Fainelli wrote: > On 02/02/15 14:06, Peter Oh wrote: >> On 02/02/2015 11:47 AM, Johannes Berg wrote: >>> On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote: >>>> On 02/02/2015 11:22 AM, Johannes Berg wrote: >>>>>>> You basically have the following sequence: >>>>>>> >>>>>>> iowrite() >>>>>>> ioread() >>>>>>> >>>>>>> If you look, you'll see that iowrite() is actually done (or should >>> be, >>>>>>> or perhaps with appropriate syncs) on an uncached mapping. >>>>>> since it's mmio, iowrite will be map to write, not out which is >>> cached >>>>>> mapping. >>>>>> That's why we address "posted write" here. >>>>>> If it's un-cached mapping which is volatile, we don't even need >>> ioread. >>>>> No, this isn't true - "posted write" in the context of this discussion >>>>> is about the PCIe bus. Memory writes that go through cache aren't >>>>> referred to as "posted writes", those are just (cached) memory writes. >>>>> >>>>>>> As a result, >>>>>>> the only thing you care about here is the PCIe bus, not the CPU >>> cache >>>>>>> flush. And from there on that's just a question of PCIe bus >>> semantics. >>>>>> So how does ioread guarantee PCIe bus transaction done? >>>>> That's how PCIe works, operations are serialized, and read() has to >>> wait >>>>> for a response from the device >>>> do you know which mechanism or which instruction set makes read() wait >>>> for a response from the device? >>> I have no idea. I assume it's just like a DRAM read, the CPU stalls >>> while there's no response. >> My explanation in this thread is all about how read() guarantees the >> wait for a response from the device, therefore why mb() - replace from >> wmb at patch set 2 - is compatible to read(). >> Briefly speaking, >> read() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus >> -> cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe >> device) signals write completion when write transactions completed in >> write response channel -> cpu release axi bus -> cpu program counter >> (pc) proceeds the next to read. >> >> the exact same routines happen with mb(). >> mb() -> dsb 'st' -> cpu (actually axi master in cpu) holding axi bus -> >> cpu post write buffer on axi bus -> axi bus (axi slave which is PCIe >> device) signals write completion when write transactions completed in >> write response channel -> cpu release axi bus -> cpu program counter >> (pc) proceeds the next to read. >> >> Since axi bus master is waiting (blocking) for write completion signal >> from axi slave (PCIe device), this is how read() and mb() guarantee >> write command reaches to the device. > PCIe writes are posted, so the only guarantee you can have by inserting > such barriers is that writes from CPU to the PCIe RC (targeting PCIe > device) is non-posted (as far as the busing between CPU and the PCIe RC > is concerned), but past the PCIe RC, there is no such guarantee, because > the PCIe specification allows for that and there is flow control, PCIe > switches or other things that can alter the way your PCIe device ends-up > being written to. > > The only way to make a "portable" synchronization barrier is to do a > PCIe read from the same register you just wrote to, because then, the > PCIe RC needs to guarantee the transaction ordering on the PCIe bus itself. > > You might just be lucky and/or have very good HW which ensures that the > ARM synchronization barriers are propagated to the memory region where > your PCIe device BARs are mapped from the CPU perspective, but you > definitively cannot rely on such assumptions, as there will be bogus HW > there, for which only a subsequent ioread32() will work.
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c index 3b40a86..c353a2c 100644 --- a/drivers/net/wireless/ath/ath10k/pci.c +++ b/drivers/net/wireless/ath/ath10k/pci.c @@ -346,10 +346,8 @@ static void ath10k_pci_disable_and_clear_legacy_irq(struct ath10k *ar) ath10k_pci_write32(ar, SOC_CORE_BASE_ADDRESS + PCIE_INTR_CLR_ADDRESS, PCIE_INTR_FIRMWARE_MASK | PCIE_INTR_CE_MASK_ALL); - /* IMPORTANT: this extra read transaction is required to - * flush the posted write buffer. */ - (void)ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS + - PCIE_INTR_ENABLE_ADDRESS); + /* invoke data sync barrier */ + wmb(); } static void ath10k_pci_enable_legacy_irq(struct ath10k *ar) @@ -358,10 +356,8 @@ static void ath10k_pci_enable_legacy_irq(struct ath10k *ar) PCIE_INTR_ENABLE_ADDRESS, PCIE_INTR_FIRMWARE_MASK | PCIE_INTR_CE_MASK_ALL); - /* IMPORTANT: this extra read transaction is required to - * flush the posted write buffer. */ - (void)ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS + - PCIE_INTR_ENABLE_ADDRESS); + /* invoke data sync barrier */ + wmb(); } static inline const char *ath10k_pci_get_irq_method(struct ath10k *ar)
Using ioread() to perform data sync is excessive. Use compact API, wmb(), that intended to be used for the case. It reduces total 14 CPU clocks per interrupt. Signed-off-by: Peter Oh <poh@qca.qualcomm.com> --- drivers/net/wireless/ath/ath10k/pci.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)