Message ID | 20130218134707.17303.48589.sendpatchset@w520 (mailing list archive) |
---|---|
State | New, archived |
On Monday 18 February 2013, Magnus Damm wrote:
> #define PSTR_SHUTDOWN_MODE 3
>
> -#define SH73A0_SCU_BASE IOMEM(0xf0000000)
> +#define SH73A0_SCU_BASE 0xf0000000
>
> #ifdef CONFIG_HAVE_ARM_TWD
> static DEFINE_TWD_LOCAL_TIMER(twd_local_timer, SH73A0_SCU_BASE + 0x600, 29);
> @@ -81,7 +81,7 @@ static void __init sh73a0_smp_prepare_cp
> static void __init sh73a0_smp_init_cpus(void)
> {
>     /* setup sh73a0 specific SCU base */
> -    shmobile_scu_base = SH73A0_SCU_BASE;
> +    shmobile_scu_base = IOMEM(SH73A0_SCU_BASE);
>
>     shmobile_smp_init_cpus(scu_get_core_count(shmobile_scu_base));
> }

Ok, this gets rid of the warning, but I'm a bit worried about
how it is hardwiring the fact that the SCU physical address has
the same bit pattern as the __iomem token.

While I realize that you already rely on this in a lot of places
in the shmobile code, I see a red light going off every time I read
code like this, and it is not any more logical than the previous
version.

It would be nice to keep these address spaces separate at least
in new code, mostly in order to not confuse reviewers with code
that is based on assumptions which are not generally true, but also
to be more flexible with the virtual memory layout. On a related
topic, you are using an entire 256 MB section of your virtual
address space for sh73a0 and sh7372 and 160 MB for r8a7740. Putting
less of that into the identity mapped area would free up space for
vmalloc, but it's hard to prove that doing this is correct when
you have all sorts of code using a hardcoded virtual MMIO address
token.

Arnd
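The concern about the physical address and the __iomem token sharing one bit pattern can be made concrete with a small userspace model. This is only a sketch, not kernel code: `phys_t`, `virt_t`, `struct window` and `window_translate()` are hypothetical names, and the window length and any relocated virtual base are made-up values. It suggests that reusing the physical SCU address as a virtual one only gives the right answer while the window happens to be identity mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrappers so a physical address cannot be passed where a
 * virtual one is expected (the kernel uses phys_addr_t and __iomem). */
typedef struct { uint32_t pa; } phys_t;
typedef struct { uint32_t va; } virt_t;

/* One static mapping: [phys, phys + length) is visible at virt. */
struct window {
    uint32_t phys;
    uint32_t virt;
    uint32_t length;
};

/* Translate p through w. Returns 0 and fills *out on success, or -1 if
 * p lies outside the window. */
static int window_translate(const struct window *w, phys_t p, virt_t *out)
{
    assert(w != NULL && out != NULL);
    if (p.pa < w->phys || p.pa - w->phys >= w->length)
        return -1;
    out->va = w->virt + (p.pa - w->phys);
    return 0;
}
```

With an identity window the translated value equals the physical address, which is the situation the patch relies on. If the same window were moved to a different virtual base, every caller that goes through the translation would keep working, while code that casts the physical constant directly would not.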
On Monday 18 February 2013, Arnd Bergmann wrote:
> Ok, this gets rid of the warning, but I'm a bit worried about
> how it is hardwiring the fact that the SCU physical address has
> the same bit pattern as the __iomem token.
>
> While I realize that you already rely on this in a lot of places
> in the shmobile code, I see a red light going off every time I read
> code like this, and it is not any more logical than the previous
> version.
>
> It would be nice to keep these address spaces separate at least
> in new code, mostly in order to not confuse reviewers with code
> that is based on assumptions which are not generally true, but also
> to be more flexible with the virtual memory layout. On a related
> topic, you are using an entire 256 MB section of your virtual
> address space for sh73a0 and sh7372 and 160 MB for r8a7740. Putting
> less of that into the identity mapped area would free up space for
> vmalloc, but it's hard to prove that doing this is correct when
> you have all sorts of code using a hardcoded virtual MMIO address
> token.

To clarify my rant: I'm absolutely fine with this code going in
for now, but I'd like to see a long-term plan about what to do
with the hardcoded virtual address hacks.

Arnd
On Mon, Feb 18, 2013 at 11:44 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 18 February 2013, Arnd Bergmann wrote:
>> Ok, this gets rid of the warning, but I'm a bit worried about
>> how it is hardwiring the fact that the SCU physical address has
>> the same bit pattern as the __iomem token.
>>
>> While I realize that you already rely on this in a lot of places
>> in the shmobile code, I see a red light going off every time I read
>> code like this, and it is not any more logical than the previous
>> version.
>>
>> It would be nice to keep these address spaces separate at least
>> in new code, mostly in order to not confuse reviewers with code
>> that is based on assumptions which are not generally true, but also
>> to be more flexible with the virtual memory layout. On a related
>> topic, you are using an entire 256 MB section of your virtual
>> address space for sh73a0 and sh7372 and 160 MB for r8a7740. Putting
>> less of that into the identity mapped area would free up space for
>> vmalloc, but it's hard to prove that doing this is correct when
>> you have all sorts of code using a hardcoded virtual MMIO address
>> token.
>
> To clarify my rant: I'm absolutely fine with this code going in
> for now, but I'd like to see a long-term plan about what to do
> with the hardcoded virtual address hacks.

Thanks for the clarification.

For mach-shmobile the three major components that rely on identity
mapped memory maps are SMP, clocks and power domains. The clocks
should really be moved in the common direction and I intend to get
people to focus on that in the not too distant future (next 6 months).
Power domains should be rather easy to convert. SMP tends to be a bit
of a headache because last time I checked I couldn't use ioremap() at
->smp_init_cpus() time. What I recall is that ioremap() hung instead
of returning something.

Anyway, if I track down the ioremap() issue, would it be possible for
you to check if it can be reproduced on some other sub-architecture?

Thanks,

/ magnus
On Monday 25 February 2013, Magnus Damm wrote:
> For mach-shmobile the three major components that rely on identity
> mapped memory maps are SMP, clocks and power domains. The clocks
> should really be moved in the common direction and I intend to get
> people to focus on that in the not too distant future (next 6 months).
> Power domains should be rather easy to convert. SMP tends to be a bit
> of a headache because last time I checked I couldn't use ioremap() at
> ->smp_init_cpus() time. What I recall is that ioremap() hung instead
> of returning something.
>
> Anyway, if I track down the ioremap() issue, would it be possible for
> you to check if it can be reproduced on some other sub-architecture?

You are right that ioremap cannot be used from ->smp_init_cpus() and any
code called from there needs to use a static mapping for accessing
MMIO registers. There is nothing wrong with that. There are in fact
three distinct reasons why people use static MMIO mappings with
iotable_init():

1. For MMIO registers that need to be accessed before ioremap works.
   This usually means the SMP startup and the early printk (which I
   believe shmobile is not using).

2. For getting hugetlb mappings of MMIO registers into the kernel
   address space. If you have a lot of registers in the same area,
   using a single TLB entry to map them is more efficient, even when
   accessing the registers through ioremap from a device driver.

3. For hardcoding the virtual address to a location that is passed
   to device drivers as compile-time constants.

The first two are absolutely fine, there are no objections to those.

The third one is traditionally used on a lot of the older platforms,
but with the multiplatform work, we are moving away from it, towards
passing resources in the platform device (ideally from DT, but that
is an orthogonal question here). AFAICT, shmobile is the only "modern"
platform that still relies on fixed virtual addresses, and it is the
only one I know that uses a mapping where the virtual address equals
the physical address.

Arnd
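The second reason in the list above (fewer, larger mappings) can be illustrated with some arithmetic. The following is a rough sketch rather than anything taken from the kernel: `mappings_needed()` is a hypothetical helper, and the two granule sizes are the 4 KB small page and 1 MB section of the ARM short-descriptor page table format. It counts how many mappings of a given size are needed to cover an MMIO region, which bounds how many TLB entries that region can occupy.

```c
#include <assert.h>
#include <stdint.h>

/* Two mapping granules of the ARM short-descriptor page table format. */
enum {
    GRANULE_SMALL_PAGE = 4u * 1024u,    /* 4 KB small page */
    GRANULE_SECTION    = 1024u * 1024u  /* 1 MB section */
};

/* Number of granule-sized mappings needed to cover length bytes starting
 * at start. The region is rounded outward to granule boundaries.
 * Precondition: start + length - 1 must not overflow 32 bits. */
static uint32_t mappings_needed(uint32_t start, uint32_t length,
                                uint32_t granule)
{
    assert(granule != 0 && (granule & (granule - 1u)) == 0); /* power of two */
    assert(length != 0);
    uint32_t first = start / granule;
    uint32_t last = (start + (length - 1u)) / granule;
    return last - first + 1u;
}
```

For a 256 MB window such as the one mentioned earlier in the thread, the difference between page-sized and section-sized mappings is a factor of 256 in the number of entries, which is the efficiency argument being made.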
On Tue, Feb 26, 2013 at 7:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 25 February 2013, Magnus Damm wrote:
>> For mach-shmobile the three major components that rely on identity
>> mapped memory maps are SMP, clocks and power domains. The clocks
>> should really be moved in the common direction and I intend to get
>> people to focus on that in the not too distant future (next 6 months).
>> Power domains should be rather easy to convert. SMP tends to be a bit
>> of a headache because last time I checked I couldn't use ioremap() at
>> ->smp_init_cpus() time. What I recall is that ioremap() hung instead
>> of returning something.
>>
>> Anyway, if I track down the ioremap() issue, would it be possible for
>> you to check if it can be reproduced on some other sub-architecture?
>
> You are right that ioremap cannot be used from ->smp_init_cpus() and any
> code called from there needs to use a static mapping for accessing
> MMIO registers. There is nothing wrong with that. There are in fact
> three distinct reasons why people use static MMIO mappings with
> iotable_init():
>
> 1. For MMIO registers that need to be accessed before ioremap works.
> This usually means the SMP startup and the early printk (which I
> believe shmobile is not using).

Thanks for describing these.

Is there any particular reason why SMP startup needs to happen earlier
than ioremap() is available?

From a hardware point of view on Cortex-A9 the SCU needs to be enabled
and the number of available cores needs to be determined. The SCU
enabling can probably happen later and the number of cores is already
limited to the kernel configuration maximum number of cores setting,
so it should be possible to use that to size any early per-cpu
variables if needed. So I wonder why we're not enabling SMP later than
we actually do? Using maxcpus=1 and late CPU hotplug from user space
is certainly working fine.

Regarding early printk, you are correct that we're not using that ARM
specific debug output. Instead we are relying on earlyprintk via early
platform devices. This way we are not only multi-soc and multi-subarch
already, we are also multi-arch. For really early console output we
rely on the clocks and pin function being initialized by the boot
loader and we also require 1:1 identity mappings so we can use printouts
before ioremap() is functional. So yes, we like using 1:1 virt-phys
memory maps for early printouts.

We do not use early printk with DT at this point. If we were able
to move the SMP init later then perhaps we could debug SMP issues with
serial ports described by DT in the future?

> 2. For getting hugetlb mappings of MMIO registers into the kernel
> address space. If you have a lot of registers in the same area,
> using a single TLB entry to map them is more efficient, even when
> accessing the registers through ioremap from a device driver.

Sure.

> 3. For hardcoding the virtual address to a location that is passed
> to device drivers as compile-time constants.
>
> The first two are absolutely fine, there are no objections to those.

Ok. As you probably can tell by now - I would like to get rid of the
SMP case if possible.

> The third one is traditionally used on a lot of the older platforms,
> but with the multiplatform work, we are moving away from it, towards
> passing resources in the platform device (ideally from DT, but that
> is an orthogonal question here). AFAICT, shmobile is the only "modern"
> platform that still relies on fixed virtual addresses, and it is the
> only one I know that uses a mapping where the virtual address equals
> the physical address.

The 1:1 mapping is deliberately chosen to be simple. So in the case
when people do register I/O without ioremap() then at least we can
look up the address in the data sheet. I've seen too many examples of
people not using ioremap and instead inventing their own magic mapping
table with undocumented hard coded addresses that map to something even
more unknown. Of course we should be aiming at using ioremap(). If we
for some reason can't then we should use 1:1 mappings.

While I agree to move more towards using ioremap(), I can't really see
how this affects our multiplatform situation. Our device drivers have
always been using the driver model and we never export any virtual
addresses in any header files. If you have any particular area that
you think needs work related to ioremap() then perhaps we can get
together at the next conference and talk it through?

As I mentioned before, from my point of view the main limiting factor
for mach-shmobile multiplatform at this point is the clock framework.
The SH clock framework does already support ioremap() though, so it is
just a matter of making the clock code actually use it. And while
we're doing that we may as well solve the multiplatform issue too and
move towards common clocks.

Thanks,

/ magnus
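The point about the core count being bounded by the kernel configuration can be sketched in a few lines. This is a userspace model under stated assumptions, not the kernel's implementation: `scu_cores_from_config()` and `usable_cores()` are hypothetical names, and the decoding assumes the Cortex-A9 SCU configuration register (offset 0x04 from the SCU base) reports "number of CPUs minus one" in its low two bits, which is what scu_get_core_count() is understood to read.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the CPU count from a Cortex-A9 SCU configuration register
 * value. Assumption: bits [1:0] hold the number of CPUs minus one. */
static unsigned int scu_cores_from_config(uint32_t config)
{
    return (config & 0x03u) + 1u;
}

/* Clamp the hardware core count to a configured maximum, standing in
 * for the kernel configuration limit mentioned in the mail above. */
static unsigned int usable_cores(uint32_t config, unsigned int max_cpus)
{
    assert(max_cpus >= 1u);
    unsigned int n = scu_cores_from_config(config);
    return n > max_cpus ? max_cpus : n;
}
```

Since the result can never exceed the configured maximum, anything sized early could in principle use that maximum and leave the actual register read for later, which seems to be the idea behind moving the SMP init.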
On Tuesday 26 February 2013, Magnus Damm wrote:
> On Tue, Feb 26, 2013 at 7:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Monday 25 February 2013, Magnus Damm wrote:
> > You are right that ioremap cannot be used from ->smp_init_cpus() and any
> > code called from there needs to use a static mapping for accessing
> > MMIO registers. There is nothing wrong with that. There are in fact
> > three distinct reasons why people use static MMIO mappings with
> > iotable_init():
> >
> > 1. For MMIO registers that need to be accessed before ioremap works.
> > This usually means the SMP startup and the early printk (which I
> > believe shmobile is not using).
>
> Thanks for describing these.
>
> Is there any particular reason why SMP startup needs to happen earlier
> than ioremap() is available?

I think it's mostly for traditional reasons.

> From a hardware point of view on Cortex-A9 the SCU needs to be enabled
> and the number of available cores needs to be determined. The SCU
> enabling can probably happen later and the number of cores is already
> limited to the kernel configuration maximum number of cores setting,
> so it should be possible to use that to size any early per-cpu
> variables if needed. So I wonder why we're not enabling SMP later than
> we actually do? Using maxcpus=1 and late CPU hotplug from user space
> is certainly working fine.

AFAIK, on Cortex-A15 we already rely on getting the number of cores from
the device tree, which is also available at the right time, without
the need for an early mapping. It would not be hard to do the same
on Cortex-A9. Then again, the static mapping there does not do harm
as I said.

> Regarding early printk, you are correct that we're not using that ARM
> specific debug output. Instead we are relying on earlyprintk via early
> platform devices. This way we are not only multi-soc and multi-subarch
> already, we are also multi-arch. For really early console output we
> rely on the clocks and pin function being initialized by the boot
> loader and we also require 1:1 identity mappings so we can use printouts
> before ioremap() is functional. So yes, we like using 1:1 virt-phys
> memory maps for early printouts.

Ok.

> We do not use early printk with DT at this point. If we were able
> to move the SMP init later then perhaps we could debug SMP issues with
> serial ports described by DT in the future?
>
> > 3. For hardcoding the virtual address to a location that is passed
> > to device drivers as compile-time constants.
> >
> > The first two are absolutely fine, there are no objections to those.
>
> Ok. As you probably can tell by now - I would like to get rid of the
> SMP case if possible.

I would certainly welcome a patch that moves the SMP initialization
to a later point. I'm not sure if it requires changes to architecture
independent code, but it does sound like a good idea.

> > The third one is traditionally used on a lot of the older platforms,
> > but with the multiplatform work, we are moving away from it, towards
> > passing resources in the platform device (ideally from DT, but that
> > is an orthogonal question here). AFAICT, shmobile is the only "modern"
> > platform that still relies on fixed virtual addresses, and it is the
> > only one I know that uses a mapping where the virtual address equals
> > the physical address.
>
> The 1:1 mapping is deliberately chosen to be simple. So in the case
> when people do register I/O without ioremap() then at least we can
> look up the address in the data sheet. I've seen too many examples of
> people not using ioremap and instead inventing their own magic mapping
> table with undocumented hard coded addresses that map to something even
> more unknown. Of course we should be aiming at using ioremap(). If we
> for some reason can't then we should use 1:1 mappings.

Well, I would argue that when someone doesn't understand the basic
interfaces we expose to device drivers, they probably shouldn't
be writing kernel code. ;)

> While I agree to move more towards using ioremap(), I can't really see
> how this affects our multiplatform situation. Our device drivers have
> always been using the driver model and we never export any virtual
> addresses in any header files. If you have any particular area that
> you think needs work related to ioremap() then perhaps we can get
> together at the next conference and talk it through?

It may not be as bad as I thought. I know that at least the intc
controller is fundamentally built around this assumption (I tried changing
it, and that didn't end well), but that may be the only one, following
the recent cleanup of the pfc driver.

The main worry is probably that people will take the platform code
as example when writing device drivers, and that uses hardcoded
IOMEM() macros. There are probably a couple of instances where that
is the best solution, but for those, I would suggest using offsets
from a base register that gets passed into iotable_init() rather
than literal numbers.

> As I mentioned before, from my point of view the main limiting factor
> for mach-shmobile multiplatform at this point is the clock framework.
> The SH clock framework does already support ioremap() though, so it is
> just a matter of making the clock code actually use it. And while
> we're doing that we may as well solve the multiplatform issue too and
> move towards common clocks.

Ok, good to hear.

Arnd
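The suggestion to use offsets from a base rather than literal addresses might look roughly like this. It is a userspace sketch against a simulated register bank, not platform code: `reg_read()` and `reg_write()` are hypothetical helpers standing in for the kernel's MMIO accessors, and the offset names are made up, with 0x600 taken from the SH73A0_SCU_BASE + 0x600 expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets relative to the block's base, as a data sheet would list them.
 * The names are illustrative only. */
enum {
    SCU_CTRL_OFFSET   = 0x000u,
    SCU_CONFIG_OFFSET = 0x004u,
    TWD_OFFSET        = 0x600u  /* as in SH73A0_SCU_BASE + 0x600 */
};

/* Read a 32-bit register at a byte offset from base. */
static uint32_t reg_read(const volatile uint32_t *base, uint32_t offset)
{
    assert((offset & 3u) == 0); /* registers are word aligned */
    return base[offset / 4u];
}

/* Write a 32-bit register at a byte offset from base. */
static void reg_write(volatile uint32_t *base, uint32_t offset,
                      uint32_t value)
{
    assert((offset & 3u) == 0);
    base[offset / 4u] = value;
}
```

Because only the base pointer knows where the block lives, the same accessors work whether that base came from a static mapping or from ioremap(), and the literal address appears in exactly one place.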
Hi Arnd,

On Wed, Feb 27, 2013 at 1:12 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 26 February 2013, Magnus Damm wrote:
>> On Tue, Feb 26, 2013 at 7:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Monday 25 February 2013, Magnus Damm wrote:
>> > You are right that ioremap cannot be used from ->smp_init_cpus() and any
>> > code called from there needs to use a static mapping for accessing
>> > MMIO registers. There is nothing wrong with that. There are in fact
>> > three distinct reasons why people use static MMIO mappings with
>> > iotable_init():
>> >
>> > 1. For MMIO registers that need to be accessed before ioremap works.
>> > This usually means the SMP startup and the early printk (which I
>> > believe shmobile is not using).
>>
>> Thanks for describing these.
>>
>> Is there any particular reason why SMP startup needs to happen earlier
>> than ioremap() is available?
>
> I think it's mostly for traditional reasons.

I think so too.

>> From a hardware point of view on Cortex-A9 the SCU needs to be enabled
>> and the number of available cores needs to be determined. The SCU
>> enabling can probably happen later and the number of cores is already
>> limited to the kernel configuration maximum number of cores setting,
>> so it should be possible to use that to size any early per-cpu
>> variables if needed. So I wonder why we're not enabling SMP later than
>> we actually do? Using maxcpus=1 and late CPU hotplug from user space
>> is certainly working fine.
>
> AFAIK, on Cortex-A15 we already rely on getting the number of cores from
> the device tree, which is also available at the right time, without
> the need for an early mapping. It would not be hard to do the same
> on Cortex-A9. Then again, the static mapping there does not do harm
> as I said.

I understand that you feel that static mapping in the case of SMP is
acceptable.

>> Regarding early printk, you are correct that we're not using that ARM
>> specific debug output. Instead we are relying on earlyprintk via early
>> platform devices. This way we are not only multi-soc and multi-subarch
>> already, we are also multi-arch. For really early console output we
>> rely on the clocks and pin function being initialized by the boot
>> loader and we also require 1:1 identity mappings so we can use printouts
>> before ioremap() is functional. So yes, we like using 1:1 virt-phys
>> memory maps for early printouts.
>
> Ok.
>
>> We do not use early printk with DT at this point. If we were able
>> to move the SMP init later then perhaps we could debug SMP issues with
>> serial ports described by DT in the future?
>>
>> > 3. For hardcoding the virtual address to a location that is passed
>> > to device drivers as compile-time constants.
>> >
>> > The first two are absolutely fine, there are no objections to those.
>>
>> Ok. As you probably can tell by now - I would like to get rid of the
>> SMP case if possible.
>
> I would certainly welcome a patch that moves the SMP initialization
> to a later point. I'm not sure if it requires changes to architecture
> independent code, but it does sound like a good idea.

Good to hear that this may be a move in the right direction!

>> > The third one is traditionally used on a lot of the older platforms,
>> > but with the multiplatform work, we are moving away from it, towards
>> > passing resources in the platform device (ideally from DT, but that
>> > is an orthogonal question here). AFAICT, shmobile is the only "modern"
>> > platform that still relies on fixed virtual addresses, and it is the
>> > only one I know that uses a mapping where the virtual address equals
>> > the physical address.
>>
>> The 1:1 mapping is deliberately chosen to be simple. So in the case
>> when people do register I/O without ioremap() then at least we can
>> look up the address in the data sheet. I've seen too many examples of
>> people not using ioremap and instead inventing their own magic mapping
>> table with undocumented hard coded addresses that map to something even
>> more unknown. Of course we should be aiming at using ioremap(). If we
>> for some reason can't then we should use 1:1 mappings.
>
> Well, I would argue that when someone doesn't understand the basic
> interfaces we expose to device drivers, they probably shouldn't
> be writing kernel code. ;)

I am not sure how this is related to device drivers actually. The
example I was thinking about was snapshot-style development on some
ancient kernel version. In that case the developers simply seemed to
follow the at-that-point common coding style in the ARM architecture.
I am happy to see that the ARM architecture code is getting cleaner
bit by bit.

>> While I agree to move more towards using ioremap(), I can't really see
>> how this affects our multiplatform situation. Our device drivers have
>> always been using the driver model and we never export any virtual
>> addresses in any header files. If you have any particular area that
>> you think needs work related to ioremap() then perhaps we can get
>> together at the next conference and talk it through?
>
> It may not be as bad as I thought. I know that at least the intc
> controller is fundamentally built around this assumption (I tried changing
> it, and that didn't end well), but that may be the only one, following
> the recent cleanup of the pfc driver.

Uhm, I am not sure where you got that idea about the INTC driver.
Allow me to clarify.

Regarding the shared INTC code base I recall implementing ioremap()
support there in 2010, feel free to search the archives for "[PATCH] sh:
INTC ioremap support V2".

As for actual SoC support, this varies with interrupt controller and
SoC. It is basically a matter of whether I/O memory windows are passed to
the INTC driver or not. A typical example would be sh7372 that has two
interrupt controllers: INTCA and INTCS. In intc-sh7372.c you have the
following:

INTCA (no resources - using the 0xe6xxxxxx 1:1 mapping):

static DECLARE_INTC_DESC(intca_desc, "sh7372-intca",
                         intca_vectors, intca_groups,
                         intca_mask_registers, intca_prio_registers,
                         NULL);

INTCS (resources - relies on ioremap()):

static struct resource intcs_resources[] __initdata = {
        [0] = {
                .start  = 0xffd20000,
                .end    = 0xffd201ff,
                .flags  = IORESOURCE_MEM,
        },
        [1] = {
                .start  = 0xffd50000,
                .end    = 0xffd501ff,
                .flags  = IORESOURCE_MEM,
        }
};

static struct intc_desc intcs_desc __initdata = {
        .name = "sh7372-intcs",
        .force_enable = ENABLED_INTCS,
        .skip_syscore_suspend = true,
        .resource = intcs_resources,
        .num_resources = ARRAY_SIZE(intcs_resources),
        .hw = INTC_HW_DESC(intcs_vectors, intcs_groups, intcs_mask_registers,
                           intcs_prio_registers, NULL, NULL),
};

Adding I/O memory resources to the already existing INTC controllers
is not a particularly difficult task. Would you like us to perform
such a change?

> The main worry is probably that people will take the platform code
> as example when writing device drivers, and that uses hardcoded
> IOMEM() macros. There are probably a couple of instances where that
> is the best solution, but for those, I would suggest using offsets
> from a base register that gets passed into iotable_init() rather
> than literal numbers.

If you look at all our regular device drivers we use a base register +
offset. In such cases that kind of design makes a lot of sense.

In the case of INTC and PFC we do not follow this style. This is because
each version of the hardware block varies quite a bit and there often
are a couple of I/O memory windows associated with each hardware block
instance. So the design has been to use the same physical addresses as
are described in the data sheet, and for bit fields and register width
follow the same type of representation as the data sheet to allow easy
development and validation. Keep in mind that in arch/sh and arch/arm
we have over 30 different variants of INTC hardware blocks.

So to summarize, INTC, PFC and CPG (clocks) have ioremap support
included in the actual driver code. Whether the SoC makes use of it or
not is a different question. =)

>> As I mentioned before, from my point of view the main limiting factor
>> for mach-shmobile multiplatform at this point is the clock framework.
>> The SH clock framework does already support ioremap() though, so it is
>> just a matter of making the clock code actually use it. And while
>> we're doing that we may as well solve the multiplatform issue too and
>> move towards common clocks.
>
> Ok, good to hear.

So how would you like to proceed with this matter?

Thanks,

/ magnus
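The idea of keeping data-sheet physical addresses in the tables while still going through ioremap() can be sketched as a window lookup. This is a guess at the general shape, not the shared INTC code: `struct mem_window` and `window_lookup()` are hypothetical, and only the two address ranges are taken from the intcs_resources example in the mail above. A register address is matched against the windows handed to the driver; a hit yields a window index and an offset that could be applied to that window's mapped base, while a miss would correspond to the no-resources case that relies on the 1:1 mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One I/O memory window, with an inclusive end like struct resource. */
struct mem_window {
    uint32_t start; /* first physical address */
    uint32_t end;   /* last physical address, inclusive */
};

/* Find the window containing addr. Returns its index and stores the
 * offset from the window start in *offset (if non-NULL), or returns -1
 * when no window covers addr. */
static int window_lookup(const struct mem_window *w, size_t n,
                         uint32_t addr, uint32_t *offset)
{
    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= w[i].start && addr <= w[i].end) {
            if (offset != NULL)
                *offset = addr - w[i].start;
            return (int)i;
        }
    }
    return -1;
}
```

With a lookup of this kind the per-SoC tables can keep listing registers exactly as the data sheet does, and the choice between a static mapping and ioremap() reduces to whether any windows are passed in.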
--- 0001/arch/arm/mach-shmobile/smp-sh73a0.c
+++ work/arch/arm/mach-shmobile/smp-sh73a0.c 2013-02-18 16:24:05.000000000 +0900
@@ -39,7 +39,7 @@
 
 #define PSTR_SHUTDOWN_MODE 3
 
-#define SH73A0_SCU_BASE IOMEM(0xf0000000)
+#define SH73A0_SCU_BASE 0xf0000000
 
 #ifdef CONFIG_HAVE_ARM_TWD
 static DEFINE_TWD_LOCAL_TIMER(twd_local_timer, SH73A0_SCU_BASE + 0x600, 29);
@@ -81,7 +81,7 @@ static void __init sh73a0_smp_prepare_cp
 static void __init sh73a0_smp_init_cpus(void)
 {
     /* setup sh73a0 specific SCU base */
-    shmobile_scu_base = SH73A0_SCU_BASE;
+    shmobile_scu_base = IOMEM(SH73A0_SCU_BASE);
 
     shmobile_smp_init_cpus(scu_get_core_count(shmobile_scu_base));
 }