Message ID | 39ad5c69533ef537e6ab0426efc057f9064ee581.1464148904.git.dalias@libc.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, May 25, 2016 at 05:43:03AM +0000, Rich Felker wrote: > Signed-off-by: Rich Felker <dalias@libc.org> > --- > Documentation/devicetree/bindings/jcore/cpus.txt | 92 ++++++++++++++++++++++++ > 1 file changed, 92 insertions(+) > create mode 100644 Documentation/devicetree/bindings/jcore/cpus.txt > > diff --git a/Documentation/devicetree/bindings/jcore/cpus.txt b/Documentation/devicetree/bindings/jcore/cpus.txt > new file mode 100644 > index 0000000..9d77ec1 > --- /dev/null > +++ b/Documentation/devicetree/bindings/jcore/cpus.txt > @@ -0,0 +1,92 @@ > +=================== > +J-Core cpu bindings > +=================== > + > +The J-Core processors are open source CPU cores that can be built as FPGA > +soft cores or ASICs. The device tree is also responsible for describing the > +cache controls and, for SMP configurations, all details of the SMP method, > +as documented below. > + > + > +--------------------- > +Top-level "cpus" node > +--------------------- > + > +Required properties: > + > +- #address-cells: Must be 1. > + > +- #size-cells: Must be 0. > + > +Optional properties: > + > +- enable-method: Required only for SMP systems. If present, must be > + "jcore,spin-table". > + > + > +-------------------- > +Individual cpu nodes > +-------------------- > + > +Required properties: > + > +- device_type: Must be "cpu". > + > +- compatible: Must be "jcore,j2". > + > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based > + hardware cpu id on SMP systems. > + > +Optional properties: > + > +- clock-frequency: Clock frequency of the cpu in Hz. > + > +- cpu-release-addr: Necessary only for secondary processors on SMP systems > + using the "jcore,spin-table" enable method. If present, must consist of > + two cells containing physical addresses. The first cell contains an > + address which, when written, unblocks the secondary cpu. The second cell > + contains an address from which the cpu will read its initial program > + counter when unblocked. I take it this follows the example of the arm64 spin-table rather than the ePAPR spin-table, given the lack of an ePAPR reference or struct definition. From my experience with the arm64 spin-table, I would *strongly* recommend that you tighten this up, and define things more thoroughly, before a variety of FW/bootloader implementations appear. e.g. * What initial/invalid value should the location contain? Is zero a valid address that you might want to jump a secondary CPU to? * How must the value be written? - Which endianness? - With a single store? Or is there a more involved sequence to prevent the secondary CPU from seeing a torn value? * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend that they do. - Is any minimal padding required around cpu-release-addrs? e.g. can FW or bootlaoder put data in the same cacheline-aligned region and exepct the OS not to corrupt this? * How should the OS map the region (i.e. which MMU/cache attributes)? - Is the address permitted to be a device register? * Where can this memory live? - Should it be abesnt from any memory node? To which padding? - Should the memory be described with a memreserve? If so, what are your architecture-specific memreserve semantics (i.e. which MMU/cache attributes are permitted for a memreserve'd region)? * What state should the CPU be in when it branches to the provided address? - Must the MMU be off? - What state must any cache be in? Should FW perform any implementation defined coherency and cache management prior to branching? - Must the CPU be in a particular endianness? - Which exceptions must be masked? - Are interrupts permitted to be pending? - Should debug logic (e.g. breakpoint and watchpoints) be placed into a quiescent state? - Should any registers be programmed with specific values? At some point, you are likely to want CPU hotplug and/or cpuidle. We didn't provision the arm64 spin-table for either of these and never extended it, but you may want to put in place some discoverability now to allow future OSs to use that new support while allowing current OSs to retain functional (e.g. not requiring a new enable-method string). > +--------------------- > +Cache controller node > +--------------------- > + > +Required properties: > + > +- compatible: Must be "jcore,cache". > + > +- reg: A memory range for the cache controller registers. There is a well-defined memory map for the cache controller? If so, please refer to documentation for it here (either in this section, or the top of this document if common with other elements described herein). > +-------- > +IPI node > +-------- > + > +Device trees for SMP systems must have an IPI node representing the mechanism > +used for inter-processor interrupt generation. > + > +Required properties: > + > +- compatible: Must be "jcore,ipi-controller". > + > +- reg: A memory range used to IPI generation. > + > +- interrupts: An irq on which IPI will be received. Likewise. > +---------- > +CPUID node > +---------- > + > +Device trees for SMP systems must have a CPUID node representing the mechanism > +used to identify the current processor on which execution is taking place. > + > +Required properties: > + > +- compatible: Must be "jcore,cpuid-mmio". > + > +- reg: A memory range containing a single 32-bit mmio register which produces > + the current cpu id (matching the "reg" property of the cpu performing the > + read) when read. Likewise. Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote: > > +Optional properties: > > + > > +- enable-method: Required only for SMP systems. If present, must be > > + "jcore,spin-table". > > + > > + > > +-------------------- > > +Individual cpu nodes > > +-------------------- > > + > > +Required properties: > > + > > +- device_type: Must be "cpu". > > + > > +- compatible: Must be "jcore,j2". > > + > > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based > > + hardware cpu id on SMP systems. > > + > > +Optional properties: > > + > > +- clock-frequency: Clock frequency of the cpu in Hz. > > + > > +- cpu-release-addr: Necessary only for secondary processors on SMP systems > > + using the "jcore,spin-table" enable method. If present, must consist of > > + two cells containing physical addresses. The first cell contains an > > + address which, when written, unblocks the secondary cpu. The second cell > > + contains an address from which the cpu will read its initial program > > + counter when unblocked. > > I take it this follows the example of the arm64 spin-table rather than > the ePAPR spin-table, given the lack of an ePAPR reference or struct > definition. Indeed, I wasn't aware of the ePAPR spec for it. Would you prefer that we use a different name or something? > From my experience with the arm64 spin-table, I would *strongly* > recommend that you tighten this up, and define things more thoroughly, > before a variety of FW/bootloader implementations appear. e.g. > > * What initial/invalid value should the location contain? Is zero a > valid address that you might want to jump a secondary CPU to? I believe the hardware is implemented such that just the appearance of the address for write on the bus unblocks the secondary cpu waiting on it. As for the address to jump to, this is provided by the kernel and, for Linux purposes, is always _stext. As I understand the mechanism, the actual initial PC for secondary cpus is in rom, and the code there is responsible for loading the application-desired (i.e. kernel-desired) initial PC and jumping to it. > * How must the value be written? > - Which endianness? CPU native. > - With a single store? Or is there a more involved sequence to prevent > the secondary CPU from seeing a torn value? The start address is just a physical ram address (internal sram) and how it's written does not matter, because it's only read once the release write occurs. > * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend > that they do. There is currently no spec or implementation with more than one secondary cpu. > - Is any minimal padding required around cpu-release-addrs? e.g. can > FW or bootlaoder put data in the same cacheline-aligned region and > exepct the OS not to corrupt this? The word-sized memory (for J2, 32-bit) at the address is what's being addressed. There is no implicit license for the kernel to clobber other nearby data. > * How should the OS map the region (i.e. which MMU/cache attributes)? For J2, there is no mmu. All these specs need extension for future models with mmu, because the properties of the mmu should be described in the DT. > - Is the address permitted to be a device register? I don't see a strong reason to disallow it. Do you? > * Where can this memory live? > - Should it be abesnt from any memory node? To which padding? > - Should the memory be described with a memreserve? If so, what are > your architecture-specific memreserve semantics (i.e. which > MMU/cache attributes are permitted for a memreserve'd region)? If it's in the memory regions exposed by the DT to the kernel, it should be reserved, I think. In practice it's not. > * What state should the CPU be in when it branches to the provided > address? > - Must the MMU be off? Current models are nommu. > - What state must any cache be in? > Should FW perform any implementation defined coherency and cache > management prior to branching? The current dcache implementation is fully coherent, but we want to relax that If this changes the hw/fw should ensure that the secondary cpu being started does not see stale dcache. The icache requires explicit flush so secondary should start with it off (as it does now) or ensure that it's flushed before jumping to kernel-provided start address. > - Must the CPU be in a particular endianness? There are not any switchable-endian J-Core cpus. If such a thing is added it would make sense to describe this. > - Which exceptions must be masked? In practice I think it's the same as how cpu0 starts. I suspect that's with everything masked, but I'm not sure right off. > - Are interrupts permitted to be pending? In practice they won't be for current implementations, but I don't have an answer that can forsee the future. > - Should debug logic (e.g. breakpoint and watchpoints) be placed into > a quiescent state? This depends on having a model that has these features. > - Should any registers be programmed with specific values? No, beyond perhaps things like control registers (IMASK). > At some point, you are likely to want CPU hotplug and/or cpuidle. We > didn't provision the arm64 spin-table for either of these and never > extended it, but you may want to put in place some discoverability now > to allow future OSs to use that new support while allowing current OSs > to retain functional (e.g. not requiring a new enable-method string). > > > +--------------------- > > +Cache controller node > > +--------------------- > > + > > +Required properties: > > + > > +- compatible: Must be "jcore,cache". > > + > > +- reg: A memory range for the cache controller registers. > > There is a well-defined memory map for the cache controller? > > If so, please refer to documentation for it here (either in this > section, or the top of this document if common with other elements > described herein). The current version "jcore,cache" has a single 32-bit control register per cpu that can be used to enable/disable/flush icache and/or dcache. There is no finer-grained control. If/when we do larger caches in the future where it makes sense, there will be a new binding for it. (For example it may make sense to do one that matches the original SH memory-mapped cache interface.) > > +-------- > > +IPI node > > +-------- > > + > > +Device trees for SMP systems must have an IPI node representing the mechanism > > +used for inter-processor interrupt generation. > > + > > +Required properties: > > + > > +- compatible: Must be "jcore,ipi-controller". > > + > > +- reg: A memory range used to IPI generation. > > + > > +- interrupts: An irq on which IPI will be received. > > Likewise. It's the same (actually even the same memory range, though I didn't see a sense in requiring that; there's also an IPI-generate bit). > > +---------- > > +CPUID node > > +---------- > > + > > +Device trees for SMP systems must have a CPUID node representing the mechanism > > +used to identify the current processor on which execution is taking place. > > + > > +Required properties: > > + > > +- compatible: Must be "jcore,cpuid-mmio". > > + > > +- reg: A memory range containing a single 32-bit mmio register which produces > > + the current cpu id (matching the "reg" property of the cpu performing the > > + read) when read. > > Likewise. One general question I have about all of your comments -- is the DT binding file really supposed to amount to a hardware programming manual, fully specifying all of the programming interfaces? I don't see that in other binding files, and it feels like what you're asking for is beyond the scope of just specifying the bindings. If you really do want a lot more detail for SMP-related bindings, I could consider submitting a version with SMP omitted for now (since the kernel patches submitted at this point don't include SMP) and do the addition of SMP as a separate patch later. But with the launch of open-hardware boards capable of running SMP J2 systems (see https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like to be getting bindings we can use stabilized so that we're properly including DTB in the boot rom and not relying on external DTB files or linking DTB in kernel. Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Rich, On Thu, May 26, 2016 at 1:04 AM, Rich Felker <dalias@libc.org> wrote: > If you really do want a lot more detail for SMP-related bindings, I > could consider submitting a version with SMP omitted for now (since > the kernel patches submitted at this point don't include SMP) and do > the addition of SMP as a separate patch later. But with the launch of > open-hardware boards capable of running SMP J2 systems (see > https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like > to be getting bindings we can use stabilized so that we're properly > including DTB in the boot rom and not relying on external DTB files or > linking DTB in kernel. Submitting a version now without SMP is indeed a good idea, and allows to move forward. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 25, 2016 at 07:04:08PM -0400, Rich Felker wrote: > On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote: > > > +Optional properties: > > > + > > > +- enable-method: Required only for SMP systems. If present, must be > > > + "jcore,spin-table". > > > + > > > + > > > +-------------------- > > > +Individual cpu nodes > > > +-------------------- > > > + > > > +Required properties: > > > + > > > +- device_type: Must be "cpu". > > > + > > > +- compatible: Must be "jcore,j2". > > > + > > > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based > > > + hardware cpu id on SMP systems. > > > + > > > +Optional properties: > > > + > > > +- clock-frequency: Clock frequency of the cpu in Hz. > > > + > > > +- cpu-release-addr: Necessary only for secondary processors on SMP systems > > > + using the "jcore,spin-table" enable method. If present, must consist of > > > + two cells containing physical addresses. The first cell contains an > > > + address which, when written, unblocks the secondary cpu. The second cell > > > + contains an address from which the cpu will read its initial program > > > + counter when unblocked. > > > > I take it this follows the example of the arm64 spin-table rather than > > the ePAPR spin-table, given the lack of an ePAPR reference or struct > > definition. > > Indeed, I wasn't aware of the ePAPR spec for it. Would you prefer that > we use a different name or something? No, the "jcore,spin-table" name is fine. > > From my experience with the arm64 spin-table, I would *strongly* > > recommend that you tighten this up, and define things more thoroughly, > > before a variety of FW/bootloader implementations appear. e.g. > > > > * What initial/invalid value should the location contain? Is zero a > > valid address that you might want to jump a secondary CPU to? > > I believe the hardware is implemented such that just the appearance of > the address for write on the bus unblocks the secondary cpu waiting on > it. Ok, so this is effectively a device register, rather than a location in "real" memory. > As for the address to jump to, this is provided by the kernel and, > for Linux purposes, is always _stext. As I understand the mechanism, > the actual initial PC for secondary cpus is in rom, and the code there > is responsible for loading the application-desired (i.e. > kernel-desired) initial PC and jumping to it. Ok. Is this second address also a device register, or might this be in "real" memory? > > * How must the value be written? > > - Which endianness? > > CPU native. Ok. I take it that a CPU's endianness cannot be switched onthe fly, then? Or does the hardware backing the release-addr register handle arbitrary endianness dynamically? If you want to reuse the same HW block across configurations where CPU endianness differs, it may make sense to define an endianness regardless, to simplify integration concerns. > > - With a single store? Or is there a more involved sequence to prevent > > the secondary CPU from seeing a torn value? > > The start address is just a physical ram address (internal sram) and > how it's written does not matter, because it's only read once the > release write occurs. Sure. I had initially mis-read the documentation and applied my understanding of the arm64 spin-table sequence (which only has a single write for both purposes). For the actual release write are there any constraints? e.g. value, size of access? > > * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend > > that they do. > > There is currently no spec or implementation with more than one > secondary cpu. Ok. Please bear the above in mind if/when implementations with more than two secondary CPUs are conceivable. > > - Is any minimal padding required around cpu-release-addrs? e.g. can > > FW or bootlaoder put data in the same cacheline-aligned region and > > exepct the OS not to corrupt this? > > The word-sized memory (for J2, 32-bit) at the address is what's being > addressed. There is no implicit license for the kernel to clobber > other nearby data. My concern was that if your memory system is not fully coherent, and the CPU has a cacheable mapping of the initial program counter field, it would need to perform a cache clean to ensure visibility of that field. If the cache line for that region were stale, that would clobber data in the same cache line (e.g. something owned/used by FW). Per your comments below that doesn't matter now but may in future. > > * How should the OS map the region (i.e. which MMU/cache attributes)? > > For J2, there is no mmu. All these specs need extension for future > models with mmu, because the properties of the mmu should be described > in the DT. I was going by the fact you had a binding for a cache, which I assumed was SW configurable. If that's not the case, then my questions about caches and MMU attributes do not apply for the timebeing. > > - Is the address permitted to be a device register? > > I don't see a strong reason to disallow it. Do you? So long as you can guarantee that OS does not have a cacheable mapping of this region, and the size of the access wis well-defined, I do not see a reason to disallow it. > > * Where can this memory live? > > - Should it be abesnt from any memory node? To which padding? > > - Should the memory be described with a memreserve? If so, what are > > your architecture-specific memreserve semantics (i.e. which > > MMU/cache attributes are permitted for a memreserve'd region)? > > If it's in the memory regions exposed by the DT to the kernel, it > should be reserved, I think. In practice it's not. Ok. This should be documented (as we do for the arm64 spin-table). Perhaps that is not a major problem if the OS never pokes the release register. If you do /memreserve/ the region rather than carving it out of memory nodes, you will also need to define semantics for memreserve. Typically memreserve meaans that the OS should not perform any stores to the region, but is permitted to map it with some architecture-specific cacheable attributes. > > * What state should the CPU be in when it branches to the provided > > address? > > - Must the MMU be off? > > Current models are nommu. > > > - What state must any cache be in? > > Should FW perform any implementation defined coherency and cache > > management prior to branching? > > The current dcache implementation is fully coherent, but we want to > relax that If this changes the hw/fw should ensure that the secondary > cpu being started does not see stale dcache. Ok. > The icache requires explicit flush so secondary should start with it > off (as it does now) or ensure that it's flushed before jumping to > kernel-provided start address. > > > - Must the CPU be in a particular endianness? > > There are not any switchable-endian J-Core cpus. If such a thing is > added it would make sense to describe this. > > > - Which exceptions must be masked? > > In practice I think it's the same as how cpu0 starts. I suspect that's > with everything masked, but I'm not sure right off. > > > - Are interrupts permitted to be pending? > > In practice they won't be for current implementations, but I don't > have an answer that can forsee the future. > > > - Should debug logic (e.g. breakpoint and watchpoints) be placed into > > a quiescent state? > > This depends on having a model that has these features. > > > - Should any registers be programmed with specific values? > > No, beyond perhaps things like control registers (IMASK). > > > At some point, you are likely to want CPU hotplug and/or cpuidle. We > > didn't provision the arm64 spin-table for either of these and never > > extended it, but you may want to put in place some discoverability now > > to allow future OSs to use that new support while allowing current OSs > > to retain functional (e.g. not requiring a new enable-method string). > > > > > +--------------------- > > > +Cache controller node > > > +--------------------- > > > + > > > +Required properties: > > > + > > > +- compatible: Must be "jcore,cache". > > > + > > > +- reg: A memory range for the cache controller registers. > > > > There is a well-defined memory map for the cache controller? > > > > If so, please refer to documentation for it here (either in this > > section, or the top of this document if common with other elements > > described herein). > > The current version "jcore,cache" has a single 32-bit control register > per cpu that can be used to enable/disable/flush icache and/or dcache. > There is no finer-grained control. If/when we do larger caches in the > future where it makes sense, there will be a new binding for it. (For > example it may make sense to do one that matches the original SH > memory-mapped cache interface.) Ok, this is simpler than I had anticipated. > > > +-------- > > > +IPI node > > > +-------- > > > + > > > +Device trees for SMP systems must have an IPI node representing the mechanism > > > +used for inter-processor interrupt generation. > > > + > > > +Required properties: > > > + > > > +- compatible: Must be "jcore,ipi-controller". > > > + > > > +- reg: A memory range used to IPI generation. > > > + > > > +- interrupts: An irq on which IPI will be received. > > > > Likewise. > > It's the same (actually even the same memory range, though I didn't > see a sense in requiring that; there's also an IPI-generate bit). > > > > +---------- > > > +CPUID node > > > +---------- > > > + > > > +Device trees for SMP systems must have a CPUID node representing the mechanism > > > +used to identify the current processor on which execution is taking place. > > > + > > > +Required properties: > > > + > > > +- compatible: Must be "jcore,cpuid-mmio". > > > + > > > +- reg: A memory range containing a single 32-bit mmio register which produces > > > + the current cpu id (matching the "reg" property of the cpu performing the > > > + read) when read. > > > > Likewise. > > One general question I have about all of your comments -- is the DT > binding file really supposed to amount to a hardware programming > manual, fully specifying all of the programming interfaces? I don't > see that in other binding files, and it feels like what you're asking > for is beyond the scope of just specifying the bindings. The binding file is not intended to be a full HW description, but where possible relevant documentation should be referred to, in case there is some ambiguity. My questions about SMP are largely orthogonal to DT; I simply have experience in dealing with that for arm64, and am aware of some of the pain points that were not immediately obvious. > If you really do want a lot more detail for SMP-related bindings, I > could consider submitting a version with SMP omitted for now (since > the kernel patches submitted at this point don't include SMP) and do > the addition of SMP as a separate patch later. But with the launch of > open-hardware boards capable of running SMP J2 systems (see > https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like > to be getting bindings we can use stabilized so that we're properly > including DTB in the boot rom and not relying on external DTB files or > linking DTB in kernel. I would argue that the SMP bindings can be added at the same point as the code. If there's any chance that something may change, having the bindings in the kernel early gives a potentially misleading impression of stability. I assume that you have the facility to upgrade the boot ROM? Thanks, Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 26, 2016 at 11:38:29AM +0100, Mark Rutland wrote: > On Wed, May 25, 2016 at 07:04:08PM -0400, Rich Felker wrote: > > On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote: > > > > +Optional properties: > > > > + > > > > +- enable-method: Required only for SMP systems. If present, must be > > > > + "jcore,spin-table". > > > > + > > > > + > > > > +-------------------- > > > > +Individual cpu nodes > > > > +-------------------- > > > > + > > > > +Required properties: > > > > + > > > > +- device_type: Must be "cpu". > > > > + > > > > +- compatible: Must be "jcore,j2". > > > > + > > > > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based > > > > + hardware cpu id on SMP systems. > > > > + > > > > +Optional properties: > > > > + > > > > +- clock-frequency: Clock frequency of the cpu in Hz. > > > > + > > > > +- cpu-release-addr: Necessary only for secondary processors on SMP systems > > > > + using the "jcore,spin-table" enable method. If present, must consist of > > > > + two cells containing physical addresses. The first cell contains an > > > > + address which, when written, unblocks the secondary cpu. The second cell > > > > + contains an address from which the cpu will read its initial program > > > > + counter when unblocked. > > > > > > I take it this follows the example of the arm64 spin-table rather than > > > the ePAPR spin-table, given the lack of an ePAPR reference or struct > > > definition. > > > > Indeed, I wasn't aware of the ePAPR spec for it. Would you prefer that > > we use a different name or something? > > No, the "jcore,spin-table" name is fine. > > > > From my experience with the arm64 spin-table, I would *strongly* > > > recommend that you tighten this up, and define things more thoroughly, > > > before a variety of FW/bootloader implementations appear. e.g. > > > > > > * What initial/invalid value should the location contain? Is zero a > > > valid address that you might want to jump a secondary CPU to? > > > > I believe the hardware is implemented such that just the appearance of > > the address for write on the bus unblocks the secondary cpu waiting on > > it. > > Ok, so this is effectively a device register, rather than a location in > "real" memory. Yes, I re-checked and it actually works as a device register. Sorry for the confusion. I think what happened is that I modeled the binding/kernel with the intent that it could just as well be in normal memory with the cpu spinning on it. > > As for the address to jump to, this is provided by the kernel and, > > for Linux purposes, is always _stext. As I understand the mechanism, > > the actual initial PC for secondary cpus is in rom, and the code there > > is responsible for loading the application-desired (i.e. > > kernel-desired) initial PC and jumping to it. > > Ok. Is this second address also a device register, or might this be in > "real" memory? In practice it's real memory, but aside from possible constraints on how it's written I don't think it matters a lot. > > > * How must the value be written? > > > - Which endianness? > > > > CPU native. > > Ok. I take it that a CPU's endianness cannot be switched onthe fly, > then? Or does the hardware backing the release-addr register handle > arbitrary endianness dynamically? No, it's not switched on the fly. > If you want to reuse the same HW block across configurations where CPU > endianness differs, it may make sense to define an endianness > regardless, to simplify integration concerns. The existing cpus are all big-endian, but I believe at one point there was talk that it's easy to get a little-endian version if you want. In any case the value is to be read by the cpu, so cpu endianness (i.e. no endianness, just a value) is the only thing that makes sense to specify. Adding a fixed endian spec independent of cpu endianness just complicates both hardware and kernel implementation and its only benefit seems to be supporting runtime-switchable chips where the entry-point code has to select the endianness to match the rest of the kernel. > > > - With a single store? Or is there a more involved sequence to prevent > > > the secondary CPU from seeing a torn value? > > > > The start address is just a physical ram address (internal sram) and > > how it's written does not matter, because it's only read once the > > release write occurs. > > Sure. I had initially mis-read the documentation and applied my > understanding of the arm64 spin-table sequence (which only has a single > write for both purposes). > > For the actual release write are there any constraints? e.g. value, size > of access? I'm not sure. If so, native word (32-bit) would be the right size to specify, but it's possible that any size works. > > > * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend > > > that they do. > > > > There is currently no spec or implementation with more than one > > secondary cpu. > > Ok. Please bear the above in mind if/when implementations with more than > two secondary CPUs are conceivable. Yes. > > > - Is any minimal padding required around cpu-release-addrs? e.g. can > > > FW or bootlaoder put data in the same cacheline-aligned region and > > > exepct the OS not to corrupt this? > > > > The word-sized memory (for J2, 32-bit) at the address is what's being > > addressed. There is no implicit license for the kernel to clobber > > other nearby data. > > My concern was that if your memory system is not fully coherent, and > the CPU has a cacheable mapping of the initial program counter field, it > would need to perform a cache clean to ensure visibility of that field. > If the cache line for that region were stale, that would clobber data in > the same cache line (e.g. something owned/used by FW). > > Per your comments below that doesn't matter now but may in future. I agree, but from a practical standpoint I think it's irrelevant. The secondary cpu(s) should start with either empty cache or just some boot rom code in their caches (and current architecture does not even cache sram/"rom"). Flushing would be a good idea just to be safe but it's going to be an effective no-op. > > > * How should the OS map the region (i.e. which MMU/cache attributes)? > > > > For J2, there is no mmu. All these specs need extension for future > > models with mmu, because the properties of the mmu should be described > > in the DT. > > I was going by the fact you had a binding for a cache, which I assumed > was SW configurable. If that's not the case, then my questions about > caches and MMU attributes do not apply for the timebeing. The only configuration possible via the current binding is enabling/disabling the cache. If it's enabled, it needs to be used for icache flush when new code is written, and for dcache flush before/after DMA. (Note that there is no DMA binding yet and no DMA driver; these are coming later.) > > > - Is the address permitted to be a device register? > > > > I don't see a strong reason to disallow it. Do you? > > So long as you can guarantee that OS does not have a cacheable mapping > of this region, and the size of the access wis well-defined, I do not > see a reason to disallow it. > > > > * Where can this memory live? > > > - Should it be abesnt from any memory node? To which padding? > > > - Should the memory be described with a memreserve? If so, what are > > > your architecture-specific memreserve semantics (i.e. which > > > MMU/cache attributes are permitted for a memreserve'd region)? > > > > If it's in the memory regions exposed by the DT to the kernel, it > > should be reserved, I think. In practice it's not. > > Ok. This should be documented (as we do for the arm64 spin-table). > > Perhaps that is not a major problem if the OS never pokes the release > register. > > If you do /memreserve/ the region rather than carving it out of memory > nodes, you will also need to define semantics for memreserve. Typically > memreserve meaans that the OS should not perform any stores to the > region, but is permitted to map it with some architecture-specific > cacheable attributes. My interpretation of memreserve is just that it marks memory ranges that the kernel cannot use for allocatable memory for its own purposes, despite otherwise possibly lying in the range of a "memory" node. I would not interpret it as excluding accesses by drivers that were told to use specific addresses in the "reserved" range as part of their DT bindings. > > One general question I have about all of your comments -- is the DT > > binding file really supposed to amount to a hardware programming > > manual, fully specifying all of the programming interfaces? I don't > > see that in other binding files, and it feels like what you're asking > > for is beyond the scope of just specifying the bindings. > > The binding file is not intended to be a full HW description, but where > possible relevant documentation should be referred to, in case there is > some ambiguity. > > My questions about SMP are largely orthogonal to DT; I simply have > experience in dealing with that for arm64, and am aware of some of the > pain points that were not immediately obvious. OK, thanks for clarifying that. This is actually really helpful feedback to have but I wasn't sure if you wanted me to consider it part of what needs to be done for DT binding or for consideration in designing and documenting the SMP architecture. > > If you really do want a lot more detail for SMP-related bindings, I > > could consider submitting a version with SMP omitted for now (since > > the kernel patches submitted at this point don't include SMP) and do > > the addition of SMP as a separate patch later. But with the launch of > > open-hardware boards capable of running SMP J2 systems (see > > https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like > > to be getting bindings we can use stabilized so that we're properly > > including DTB in the boot rom and not relying on external DTB files or > > linking DTB in kernel. > > I would argue that the SMP bindings can be added at the same point as > the code. If there's any chance that something may change, having the > bindings in the kernel early gives a potentially misleading impression > of stability. OK. I'll strip it down to just the parts that are relevant for non-SMP and submit the patch adding SMP bindings along with the SMP kernel patches. > I assume that you have the facility to upgrade the boot ROM? Yes. For FPGA implementations it's just part of the FPGA bitstream and you upgrade it the same way you load a new bitstream onto the FPGA. Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/25/2016 06:04 PM, Rich Felker wrote: > On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote: >> * What state should the CPU be in when it branches to the provided >> address? >> - Must the MMU be off? > > Current models are nommu. As far as I know, we're the first nommu SMP implementation in Linux. >> At some point, you are likely to want CPU hotplug and/or cpuidle. There are hundreds of todo items. At the moment we're just trying to get basic board support in for more or less the design we demoed a year ago at https://lwn.net/Articles/647636/ The roadmap page we've posted is a summary of a summary, it doesn't even mention the DSP designs (plural) we've talked about at some length internally. There's a DMA engine we're not using yet. We've got various other things like ethernet support that the cheap $50 introductory board we're targeting as an entry point hasn't got connectors for. And we didn't do anything for the vga or sound connectors on that board because "boot to a shell prompt on serial prompt" was our first target. (It runs out of initramfs, but since you need the sdcard to load linux from, and thus the bootloader had to have an sdcard driver, it seemed a shame NOT to have an sdcard driver in linux too.) Other than that? Intentionally minimalist first pass. >> We >> didn't provision the arm64 spin-table for either of these and never >> extended it, but you may want to put in place some discoverability now >> to allow future OSs to use that new support while allowing current OSs >> to retain functional (e.g. not requiring a new enable-method string). >> >>> +--------------------- >>> +Cache controller node >>> +--------------------- >>> + >>> +Required properties: >>> + >>> +- compatible: Must be "jcore,cache". >>> + >>> +- reg: A memory range for the cache controller registers. >> >> There is a well-defined memory map for the cache controller? The icache and dcache are 8k each, have 32 byte cache lines, and they cache (addr>>5)&255 so reading from + or - 8192 bytes will evict that cache line due to aliasing. The icache and dcache each have an enable and flush bit in the processor's control register. (So 4 bits total controlling the cache, I think? I'd have to dig up Niishi-san's slides to check.) Each processor has its own set of control registers. Back when SMP was first implemented there was a lot of talk among the engineers about separating the register stuff out, because right now processor stuff and SOC I/O devices are kinda mashed together. But "release early, release often" won out over endlessly polishing the same stuff in-house before showing it to anybody else. We WANT other people to suggest fixes, but we also want basic support for what's already been working for a year and a half to go into the vanilla kernel at some point. >> If so, please refer to documentation for it here (either in this >> section, or the top of this document if common with other elements >> described herein). During my most recent trip to Japan I sat down with the engineer who wrote the dcache and dram controller and passed on his explanation of them to somebody on the j-core mailing list, about halfway through this message: http://lists.j-core.org/pipermail/j-core/2016-April/000038.html I've been meaning to cut that out and put it on a web page on j-core.org, but have been busy with other things. That post also points at comments in the VHDL source that implement the features. That is, alas, the level of documentation we're talking about at the moment. Better is on the todo list. In the meantime you can RTFS if you understand vhdl, or ask the engineers on the j-core list or #j-core freenode channel. The instruction set is based on an existing architecture and the other SOC features in the initial release are as minimal as we could get them and still be useful. (We've got a lot more peripherals implemented than this release includes.) We thought getting working code into the kernel should be a high priority, but apparently everything has to be done before anything can be done? > The current version "jcore,cache" has a single 32-bit control register > per cpu that can be used to enable/disable/flush icache and/or dcache. > There is no finer-grained control. If/when we do larger caches in the > future where it makes sense, there will be a new binding for it. (For > example it may make sense to do one that matches the original SH > memory-mapped cache interface.) The first dache-only Linux support I did last year worked by reading an aliased cache line from sram to evict individual cache lines, and it turned out to have no detectable speed advantage over just blanking the whole thing. Then doing the same "flush by aliasing" trick with icache would have required an 8k jump table and we just went "no" and implemented the flush bits instead, so almost all that code went away. When we implement L2 cache for j-core, we can start caring about granularity again, but http://j-core.org/roadmap.html has scaling _down_ into arduino country scheduled before scheduling up into MMU and 64-bit territory so... not this year. :) >>> +- reg: A memory range containing a single 32-bit mmio register which produces >>> + the current cpu id (matching the "reg" property of the cpu performing the >>> + read) when read. >> >> Likewise. > > One general question I have about all of your comments -- is the DT > binding file really supposed to amount to a hardware programming > manual, fully specifying all of the programming interfaces? I don't > see that in other binding files, and it feels like what you're asking > for is beyond the scope of just specifying the bindings. In general if they haven't needed it yet, they haven't done it yet. Doesn't mean we haven't thought about it, or even sketched out multiple possible paths forward and decided on our favorite and our preferred fallback approach. (I have more photographs of whiteboards on my phone than anyone really needs. Some of them I could even tell you what the subject was.) But for the initial "here's how people other than us can try this out on a $50 FPGA board from india" board support, going down the rathole of every possible direction of future expansion as a prerequisite for "one release of one processor on one brand of board" introductory support seems counterproductive. Or is it just me? If you want to critique the hardware design on the j-core mailing list, I'll happily point the appropriate in-house engineers at the thread. In the meantime, Rich submitted a patch to support the SOC we've had for a year now. Rob -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 26, 2016 at 11:33:23AM -0400, Rich Felker wrote: > On Thu, May 26, 2016 at 11:38:29AM +0100, Mark Rutland wrote: > > On Wed, May 25, 2016 at 07:04:08PM -0400, Rich Felker wrote: > > > On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote: > > > > * How must the value be written? > > > > - Which endianness? > > > > > > CPU native. > > > > Ok. I take it that a CPU's endianness cannot be switched onthe fly, > > then? Or does the hardware backing the release-addr register handle > > arbitrary endianness dynamically? > > No, it's not switched on the fly. > > > If you want to reuse the same HW block across configurations where CPU > > endianness differs, it may make sense to define an endianness > > regardless, to simplify integration concerns. > > The existing cpus are all big-endian, but I believe at one point there > was talk that it's easy to get a little-endian version if you want. In > any case the value is to be read by the cpu, so cpu endianness (i.e. > no endianness, just a value) is the only thing that makes sense to > specify. Adding a fixed endian spec independent of cpu endianness just > complicates both hardware and kernel implementation and its only > benefit seems to be supporting runtime-switchable chips where the > entry-point code has to select the endianness to match the rest of the > kernel. Sure. If endianness isn't runtime switchable, and there is no near-term plan to add that, then my concerns do not apply. [...] > > If you do /memreserve/ the region rather than carving it out of memory > > nodes, you will also need to define semantics for memreserve. Typically > > memreserve meaans that the OS should not perform any stores to the > > region, but is permitted to map it with some architecture-specific > > cacheable attributes. > > My interpretation of memreserve is just that it marks memory ranges > that the kernel cannot use for allocatable memory for its own > purposes, despite otherwise possibly lying in the range of a "memory" > node. I would not interpret it as excluding accesses by drivers that > were told to use specific addresses in the "reserved" range as part of > their DT bindings. Yes, this is strictly more correct than my wording. Given the lack of MMU or cache configration beynd on/off, it doesn't sound like there are any arch-specific memory attribute rules to document. [...] > > My questions about SMP are largely orthogonal to DT; I simply have > > experience in dealing with that for arm64, and am aware of some of the > > pain points that were not immediately obvious. > > OK, thanks for clarifying that. This is actually really helpful > feedback to have but I wasn't sure if you wanted me to consider it > part of what needs to be done for DT binding or for consideration in > designing and documenting the SMP architecture. Sorry for that; in retrospect I probably should have sent the boot protocol comments as a separate reply. [...] > OK. I'll strip it down to just the parts that are relevant for non-SMP > and submit the patch adding SMP bindings along with the SMP kernel > patches. That sounds good to me. Thanks, Mark, -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi,
On Thu, May 26, 2016 at 04:44:02PM -0500, Rob Landley wrote:
> As far as I know, we're the first nommu SMP implementation in Linux.
According to hearsay, thou shall be called Buzz Aldrin, Blackfin is
Neil Armstrong.
Regards
afzal
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/devicetree/bindings/jcore/cpus.txt b/Documentation/devicetree/bindings/jcore/cpus.txt new file mode 100644 index 0000000..9d77ec1 --- /dev/null +++ b/Documentation/devicetree/bindings/jcore/cpus.txt @@ -0,0 +1,92 @@ +=================== +J-Core cpu bindings +=================== + +The J-Core processors are open source CPU cores that can be built as FPGA +soft cores or ASICs. The device tree is also responsible for describing the +cache controls and, for SMP configurations, all details of the SMP method, +as documented below. + + +--------------------- +Top-level "cpus" node +--------------------- + +Required properties: + +- #address-cells: Must be 1. + +- #size-cells: Must be 0. + +Optional properties: + +- enable-method: Required only for SMP systems. If present, must be + "jcore,spin-table". + + +-------------------- +Individual cpu nodes +-------------------- + +Required properties: + +- device_type: Must be "cpu". + +- compatible: Must be "jcore,j2". + +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based + hardware cpu id on SMP systems. + +Optional properties: + +- clock-frequency: Clock frequency of the cpu in Hz. + +- cpu-release-addr: Necessary only for secondary processors on SMP systems + using the "jcore,spin-table" enable method. If present, must consist of + two cells containing physical addresses. The first cell contains an + address which, when written, unblocks the secondary cpu. The second cell + contains an address from which the cpu will read its initial program + counter when unblocked. + + +--------------------- +Cache controller node +--------------------- + +Required properties: + +- compatible: Must be "jcore,cache". + +- reg: A memory range for the cache controller registers. + + +-------- +IPI node +-------- + +Device trees for SMP systems must have an IPI node representing the mechanism +used for inter-processor interrupt generation. + +Required properties: + +- compatible: Must be "jcore,ipi-controller". + +- reg: A memory range used to IPI generation. + +- interrupts: An irq on which IPI will be received. + + +---------- +CPUID node +---------- + +Device trees for SMP systems must have a CPUID node representing the mechanism +used to identify the current processor on which execution is taking place. + +Required properties: + +- compatible: Must be "jcore,cpuid-mmio". + +- reg: A memory range containing a single 32-bit mmio register which produces + the current cpu id (matching the "reg" property of the cpu performing the + read) when read.
Signed-off-by: Rich Felker <dalias@libc.org> --- Documentation/devicetree/bindings/jcore/cpus.txt | 92 ++++++++++++++++++++++++ 1 file changed, 92 insertions(+) create mode 100644 Documentation/devicetree/bindings/jcore/cpus.txt