Message ID | 1536770554-107314-2-git-send-email-phil@raspberrypi.org (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | Improve VCHIQ cache line size handling | expand |
Hi Phil, Am 12.09.2018 um 18:42 schrieb Phil Elwell: > The ARM coprocessor registers include dcache line size, but there is no > function to expose this value. Rather than create a new one, use the > read_cpuid_id function to derive the correct value, which is 32 for > BCM2835 and 64 for BCM2836/7. > > Signed-off-by: Phil Elwell <phil@raspberrypi.org> > --- > .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c > index e767209..3537f60 100644 > --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c > +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c > @@ -42,6 +42,7 @@ > #include <linux/uaccess.h> > #include <linux/mm.h> > #include <linux/of.h> > +#include <asm/cputype.h> > #include <soc/bcm2835/raspberrypi-firmware.h> > > #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) > @@ -81,13 +82,15 @@ static void __iomem *g_regs; > * VPU firmware, which determines the required alignment of the > * offsets/sizes in pagelists. > * > - * Modern VPU firmware looks for a DT "cache-line-size" property in > - * the VCHIQ node and will overwrite it with the actual L2 cache size, > + * Previous VPU firmware looked for a DT "cache-line-size" property in > + * the VCHIQ node and would overwrite it with the actual L2 cache size, > * which the kernel must then respect. That property was rejected > - * upstream, so we have to use the VPU firmware's compatibility value > - * of 32. > + * upstream, so we now rely on both sides to "do the right thing" independently > + * of the other. To improve backwards compatibility, this new behaviour is > + * signalled to the firmware by the use of a corrected "reg" property on the > + * relevant Device Tree node. > */ > -static unsigned int g_cache_line_size = 32; > +static unsigned int g_cache_line_size; > static unsigned int g_fragments_size; > static char *g_fragments_base; > static char *g_free_fragments; > @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) > if (err < 0) > return err; > > + /* > + * The tempting L1_CACHE_BYTES macro doesn't work in the case of > + * a kernel built with bcm2835_defconfig running on a BCM2836/7 > + * processor, hence the need for a runtime check. The dcache line size > + * is encoded in one of the coprocessor registers, but there is no > + * convenient way to access it short of embedded assembler, hence > + * the use of read_cpuid_id(). The following test evaluates to true > + * on a BCM2835 showing that it is ARMv6-ish, whereas > + * cpu_architecture() will indicate that it is an ARMv7. > + */ > + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; is there a chance to use ARM_CPU_PART_ARM1176 or something else instead of this magic numbers? Sorry, i cannot remember the whole discussion but why we can use different compatible here instead of using ARM specific stuff. compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 compatible = "brcm,bcm2835"; // only for BCM2835 > g_fragments_size = 2 * g_cache_line_size; > > /* Allocate space for the channels in coherent memory */
Hi Stefan, On 14/09/2018 11:11, Stefan Wahren wrote: > Hi Phil, > > Am 12.09.2018 um 18:42 schrieb Phil Elwell: >> The ARM coprocessor registers include dcache line size, but there is no >> function to expose this value. Rather than create a new one, use the >> read_cpuid_id function to derive the correct value, which is 32 for >> BCM2835 and 64 for BCM2836/7. >> >> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >> --- >> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >> 1 file changed, 19 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >> index e767209..3537f60 100644 >> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >> @@ -42,6 +42,7 @@ >> #include <linux/uaccess.h> >> #include <linux/mm.h> >> #include <linux/of.h> >> +#include <asm/cputype.h> >> #include <soc/bcm2835/raspberrypi-firmware.h> >> >> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >> * VPU firmware, which determines the required alignment of the >> * offsets/sizes in pagelists. >> * >> - * Modern VPU firmware looks for a DT "cache-line-size" property in >> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >> + * Previous VPU firmware looked for a DT "cache-line-size" property in >> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >> * which the kernel must then respect. That property was rejected >> - * upstream, so we have to use the VPU firmware's compatibility value >> - * of 32. >> + * upstream, so we now rely on both sides to "do the right thing" independently >> + * of the other. To improve backwards compatibility, this new behaviour is >> + * signalled to the firmware by the use of a corrected "reg" property on the >> + * relevant Device Tree node. >> */ >> -static unsigned int g_cache_line_size = 32; >> +static unsigned int g_cache_line_size; >> static unsigned int g_fragments_size; >> static char *g_fragments_base; >> static char *g_free_fragments; >> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >> if (err < 0) >> return err; >> >> + /* >> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >> + * processor, hence the need for a runtime check. The dcache line size >> + * is encoded in one of the coprocessor registers, but there is no >> + * convenient way to access it short of embedded assembler, hence >> + * the use of read_cpuid_id(). The following test evaluates to true >> + * on a BCM2835 showing that it is ARMv6-ish, whereas >> + * cpu_architecture() will indicate that it is an ARMv7. >> + */ >> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; > > is there a chance to use ARM_CPU_PART_ARM1176 or something else instead > of this magic numbers? One could. It's a trade-off between specific and human-readable vs. generic and less-so, and possibly moot given the next question. > Sorry, i cannot remember the whole discussion but why we can use > different compatible here instead of using ARM specific stuff. > > compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 > > compatible = "brcm,bcm2835"; // only for BCM2835 It wasn't an option you raised in the offline discussion. Assuming you meant "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type of device if that is more acceptable. >> g_fragments_size = 2 * g_cache_line_size; >> >> /* Allocate space for the channels in coherent memory */
On Fri, Sep 14, 2018 at 12:11:54PM +0200, Stefan Wahren wrote: > Hi Phil, > > Am 12.09.2018 um 18:42 schrieb Phil Elwell: > > + /* > > + * The tempting L1_CACHE_BYTES macro doesn't work in the case of > > + * a kernel built with bcm2835_defconfig running on a BCM2836/7 > > + * processor, hence the need for a runtime check. The dcache line size > > + * is encoded in one of the coprocessor registers, but there is no > > + * convenient way to access it short of embedded assembler, hence > > + * the use of read_cpuid_id(). The following test evaluates to true > > + * on a BCM2835 showing that it is ARMv6-ish, whereas > > + * cpu_architecture() will indicate that it is an ARMv7. > > + */ > > + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; > > is there a chance to use ARM_CPU_PART_ARM1176 or something else instead > of this magic numbers? In any case, reading the CPU ID and masking it in this way is really not on. The above checks that the ID _may_ be a "new ID" format, and that the top four bits of the CPU part number are 11. However, there are three formats of the CPU ID register which can conflict if you do this kind of trivial mask-and-test. The relevant cases for are: bits 19:16 inclusive being all ones define whether the ID is "new ID" format with the additional CPU ID registers. This is used for all modern CPUs. Otherwise, if bit 19 is clear, it is an older ID format, and bits 18:16 may contain '7'. The value of bits 15:4 depend on the implementer - "1011" in 15:12 are used by Intel for StrongARM1110 (though they won't have '7' in 18:16), ARM Ltd for ARM1136, ARM1156, ARM1176 and ARM11MPCore (which will have '7' in 18:16) - and hence will match this test. So, the above code does not match just BCM2835, but also matches other CPUs. There is a reason why the old macros that allow the part number to be fetched in asm/cputype.h are marked as deprecated, and that is to encourage correct behaviour - to use both the part number and implementer fields. The comment is also wrong. It claims that in order to read the cache line size, one needs to use assembler. asm/cputype.h has macros and inline functions (of which read_cpuid_id() is one of them) to allow access without having assembler encoded in drivers. However, there is a gotcha - the format of the cache type register also depends on the CPU architecture, so it's not a simple case of grabbing the value of a bitfield.
Hi, Am 14.09.2018 um 12:26 schrieb Phil Elwell: > Hi Stefan, > > On 14/09/2018 11:11, Stefan Wahren wrote: >> Hi Phil, >> >> Am 12.09.2018 um 18:42 schrieb Phil Elwell: >>> The ARM coprocessor registers include dcache line size, but there is no >>> function to expose this value. Rather than create a new one, use the >>> read_cpuid_id function to derive the correct value, which is 32 for >>> BCM2835 and 64 for BCM2836/7. >>> >>> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >>> --- >>> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >>> 1 file changed, 19 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>> index e767209..3537f60 100644 >>> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>> @@ -42,6 +42,7 @@ >>> #include <linux/uaccess.h> >>> #include <linux/mm.h> >>> #include <linux/of.h> >>> +#include <asm/cputype.h> >>> #include <soc/bcm2835/raspberrypi-firmware.h> >>> >>> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >>> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >>> * VPU firmware, which determines the required alignment of the >>> * offsets/sizes in pagelists. >>> * >>> - * Modern VPU firmware looks for a DT "cache-line-size" property in >>> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >>> + * Previous VPU firmware looked for a DT "cache-line-size" property in >>> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >>> * which the kernel must then respect. That property was rejected >>> - * upstream, so we have to use the VPU firmware's compatibility value >>> - * of 32. >>> + * upstream, so we now rely on both sides to "do the right thing" independently >>> + * of the other. To improve backwards compatibility, this new behaviour is >>> + * signalled to the firmware by the use of a corrected "reg" property on the >>> + * relevant Device Tree node. >>> */ >>> -static unsigned int g_cache_line_size = 32; >>> +static unsigned int g_cache_line_size; >>> static unsigned int g_fragments_size; >>> static char *g_fragments_base; >>> static char *g_free_fragments; >>> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >>> if (err < 0) >>> return err; >>> >>> + /* >>> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >>> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >>> + * processor, hence the need for a runtime check. The dcache line size >>> + * is encoded in one of the coprocessor registers, but there is no >>> + * convenient way to access it short of embedded assembler, hence >>> + * the use of read_cpuid_id(). The following test evaluates to true >>> + * on a BCM2835 showing that it is ARMv6-ish, whereas >>> + * cpu_architecture() will indicate that it is an ARMv7. >>> + */ >>> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; >> is there a chance to use ARM_CPU_PART_ARM1176 or something else instead >> of this magic numbers? > One could. It's a trade-off between specific and human-readable vs. generic and less-so, > and possibly moot given the next question. > >> Sorry, i cannot remember the whole discussion but why we can use >> different compatible here instead of using ARM specific stuff. >> >> compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 >> >> compatible = "brcm,bcm2835"; // only for BCM2835 > It wasn't an option you raised in the offline discussion. maybe this was too obvious :-( > Assuming you meant > "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type > of device if that is more acceptable. Yes, it's a different SoC. Do you see any problems with the VC4 firmware according to this approach? > >>> g_fragments_size = 2 * g_cache_line_size; >>> >>> /* Allocate space for the channels in coherent memory */ > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 14/09/2018 12:03, Stefan Wahren wrote: > Hi, > > Am 14.09.2018 um 12:26 schrieb Phil Elwell: >> Hi Stefan, >> >> On 14/09/2018 11:11, Stefan Wahren wrote: >>> Hi Phil, >>> >>> Am 12.09.2018 um 18:42 schrieb Phil Elwell: >>>> The ARM coprocessor registers include dcache line size, but there is no >>>> function to expose this value. Rather than create a new one, use the >>>> read_cpuid_id function to derive the correct value, which is 32 for >>>> BCM2835 and 64 for BCM2836/7. >>>> >>>> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >>>> --- >>>> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >>>> 1 file changed, 19 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>> index e767209..3537f60 100644 >>>> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>> @@ -42,6 +42,7 @@ >>>> #include <linux/uaccess.h> >>>> #include <linux/mm.h> >>>> #include <linux/of.h> >>>> +#include <asm/cputype.h> >>>> #include <soc/bcm2835/raspberrypi-firmware.h> >>>> >>>> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >>>> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >>>> * VPU firmware, which determines the required alignment of the >>>> * offsets/sizes in pagelists. >>>> * >>>> - * Modern VPU firmware looks for a DT "cache-line-size" property in >>>> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >>>> + * Previous VPU firmware looked for a DT "cache-line-size" property in >>>> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >>>> * which the kernel must then respect. That property was rejected >>>> - * upstream, so we have to use the VPU firmware's compatibility value >>>> - * of 32. >>>> + * upstream, so we now rely on both sides to "do the right thing" independently >>>> + * of the other. To improve backwards compatibility, this new behaviour is >>>> + * signalled to the firmware by the use of a corrected "reg" property on the >>>> + * relevant Device Tree node. >>>> */ >>>> -static unsigned int g_cache_line_size = 32; >>>> +static unsigned int g_cache_line_size; >>>> static unsigned int g_fragments_size; >>>> static char *g_fragments_base; >>>> static char *g_free_fragments; >>>> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >>>> if (err < 0) >>>> return err; >>>> >>>> + /* >>>> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >>>> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >>>> + * processor, hence the need for a runtime check. The dcache line size >>>> + * is encoded in one of the coprocessor registers, but there is no >>>> + * convenient way to access it short of embedded assembler, hence >>>> + * the use of read_cpuid_id(). The following test evaluates to true >>>> + * on a BCM2835 showing that it is ARMv6-ish, whereas >>>> + * cpu_architecture() will indicate that it is an ARMv7. >>>> + */ >>>> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; >>> is there a chance to use ARM_CPU_PART_ARM1176 or something else instead >>> of this magic numbers? >> One could. It's a trade-off between specific and human-readable vs. generic and less-so, >> and possibly moot given the next question. >> >>> Sorry, i cannot remember the whole discussion but why we can use >>> different compatible here instead of using ARM specific stuff. >>> >>> compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 >>> >>> compatible = "brcm,bcm2835"; // only for BCM2835 >> It wasn't an option you raised in the offline discussion. > > maybe this was too obvious :-( > >> Assuming you meant >> "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type >> of device if that is more acceptable. > > Yes, it's a different SoC. Do you see any problems with the VC4 firmware > according to this approach? No, it can be made to work. Expect V2 shortly. > >> >>>> g_fragments_size = 2 * g_cache_line_size; >>>> >>>> /* Allocate space for the channels in coherent memory */ >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > >
Am 14.09.2018 um 13:09 schrieb Phil Elwell: > On 14/09/2018 12:03, Stefan Wahren wrote: >> Hi, >> >> Am 14.09.2018 um 12:26 schrieb Phil Elwell: >>> Hi Stefan, >>> >>> On 14/09/2018 11:11, Stefan Wahren wrote: >>>> Hi Phil, >>>> >>>> Am 12.09.2018 um 18:42 schrieb Phil Elwell: >>>>> The ARM coprocessor registers include dcache line size, but there is no >>>>> function to expose this value. Rather than create a new one, use the >>>>> read_cpuid_id function to derive the correct value, which is 32 for >>>>> BCM2835 and 64 for BCM2836/7. >>>>> >>>>> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >>>>> --- >>>>> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >>>>> 1 file changed, 19 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>>> index e767209..3537f60 100644 >>>>> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>>> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>>> @@ -42,6 +42,7 @@ >>>>> #include <linux/uaccess.h> >>>>> #include <linux/mm.h> >>>>> #include <linux/of.h> >>>>> +#include <asm/cputype.h> >>>>> #include <soc/bcm2835/raspberrypi-firmware.h> >>>>> >>>>> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >>>>> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >>>>> * VPU firmware, which determines the required alignment of the >>>>> * offsets/sizes in pagelists. >>>>> * >>>>> - * Modern VPU firmware looks for a DT "cache-line-size" property in >>>>> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >>>>> + * Previous VPU firmware looked for a DT "cache-line-size" property in >>>>> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >>>>> * which the kernel must then respect. That property was rejected >>>>> - * upstream, so we have to use the VPU firmware's compatibility value >>>>> - * of 32. >>>>> + * upstream, so we now rely on both sides to "do the right thing" independently >>>>> + * of the other. To improve backwards compatibility, this new behaviour is >>>>> + * signalled to the firmware by the use of a corrected "reg" property on the >>>>> + * relevant Device Tree node. >>>>> */ >>>>> -static unsigned int g_cache_line_size = 32; >>>>> +static unsigned int g_cache_line_size; >>>>> static unsigned int g_fragments_size; >>>>> static char *g_fragments_base; >>>>> static char *g_free_fragments; >>>>> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >>>>> if (err < 0) >>>>> return err; >>>>> >>>>> + /* >>>>> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >>>>> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >>>>> + * processor, hence the need for a runtime check. The dcache line size >>>>> + * is encoded in one of the coprocessor registers, but there is no >>>>> + * convenient way to access it short of embedded assembler, hence >>>>> + * the use of read_cpuid_id(). The following test evaluates to true >>>>> + * on a BCM2835 showing that it is ARMv6-ish, whereas >>>>> + * cpu_architecture() will indicate that it is an ARMv7. >>>>> + */ >>>>> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; >>>> is there a chance to use ARM_CPU_PART_ARM1176 or something else instead >>>> of this magic numbers? >>> One could. It's a trade-off between specific and human-readable vs. generic and less-so, >>> and possibly moot given the next question. >>> >>>> Sorry, i cannot remember the whole discussion but why we can use >>>> different compatible here instead of using ARM specific stuff. >>>> >>>> compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 >>>> >>>> compatible = "brcm,bcm2835"; // only for BCM2835 >>> It wasn't an option you raised in the offline discussion. >> maybe this was too obvious :-( >> >>> Assuming you meant >>> "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type >>> of device if that is more acceptable. >> Yes, it's a different SoC. Do you see any problems with the VC4 firmware >> according to this approach? > No, it can be made to work. Expect V2 shortly. Great. I forgot a mailing list in my first comment. Please also add devel@driverdev.osuosl.org in CC. > >>>>> g_fragments_size = 2 * g_cache_line_size; >>>>> >>>>> /* Allocate space for the channels in coherent memory */ >>> _______________________________________________ >>> linux-arm-kernel mailing list >>> linux-arm-kernel@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >>
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c index e767209..3537f60 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c @@ -42,6 +42,7 @@ #include <linux/uaccess.h> #include <linux/mm.h> #include <linux/of.h> +#include <asm/cputype.h> #include <soc/bcm2835/raspberrypi-firmware.h> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) @@ -81,13 +82,15 @@ static void __iomem *g_regs; * VPU firmware, which determines the required alignment of the * offsets/sizes in pagelists. * - * Modern VPU firmware looks for a DT "cache-line-size" property in - * the VCHIQ node and will overwrite it with the actual L2 cache size, + * Previous VPU firmware looked for a DT "cache-line-size" property in + * the VCHIQ node and would overwrite it with the actual L2 cache size, * which the kernel must then respect. That property was rejected - * upstream, so we have to use the VPU firmware's compatibility value - * of 32. + * upstream, so we now rely on both sides to "do the right thing" independently + * of the other. To improve backwards compatibility, this new behaviour is + * signalled to the firmware by the use of a corrected "reg" property on the + * relevant Device Tree node. */ -static unsigned int g_cache_line_size = 32; +static unsigned int g_cache_line_size; static unsigned int g_fragments_size; static char *g_fragments_base; static char *g_free_fragments; @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) if (err < 0) return err; + /* + * The tempting L1_CACHE_BYTES macro doesn't work in the case of + * a kernel built with bcm2835_defconfig running on a BCM2836/7 + * processor, hence the need for a runtime check. The dcache line size + * is encoded in one of the coprocessor registers, but there is no + * convenient way to access it short of embedded assembler, hence + * the use of read_cpuid_id(). The following test evaluates to true + * on a BCM2835 showing that it is ARMv6-ish, whereas + * cpu_architecture() will indicate that it is an ARMv7. + */ + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; g_fragments_size = 2 * g_cache_line_size; /* Allocate space for the channels in coherent memory */
The ARM coprocessor registers include dcache line size, but there is no function to expose this value. Rather than create a new one, use the read_cpuid_id function to derive the correct value, which is 32 for BCM2835 and 64 for BCM2836/7. Signed-off-by: Phil Elwell <phil@raspberrypi.org> --- .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-)