[RFC,0/5] target/ppc: initial SMT support in TCG

Message ID	20230531012313.19891-1-npiggin@gmail.com (mailing list archive)

Headers

From: Nicholas Piggin <npiggin@gmail.com>
To: qemu-ppc@nongnu.org
Cc: Nicholas Piggin <npiggin@gmail.com>, qemu-devel@nongnu.org,
 Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Subject: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
Date: Wed, 31 May 2023 11:23:08 +1000
Message-Id: <20230531012313.19891-1-npiggin@gmail.com>

Series

target/ppc: initial SMT support in TCG

Message

Nicholas Piggin May 31, 2023, 1:23 a.m. UTC

Hi,

I'm posting this now just to get some first thoughts. I wouldn't say
it's ready but it does actually work with some basic tests including
pseries booting a Linux distro. I have powernv booting too, it just
requires some more SPRs converted, nothing fundamentally different so
for the purpose of this RFC I leave it out.

A couple of things, I don't know the object model well enough to do
something nice with topology. Iterating siblings I would have thought
should be going to parent core then iterating its children CPUs. Should
that be done with the object model, or is it better to add direct
pointers in CPUs to core and core to CPUs? It is (semi) important for
performance so maybe that is better than object iterators. If we go that
way, the PnvCore and SpaprCore have pointers to the SMT threads already,
should those be abstracted and moved into the CPUCore?
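
To make the two options concrete, here is a minimal standalone sketch.
It is not QEMU code: struct model_cpu, struct model_core and both
functions are hypothetical stand-ins, and the "pending" flag is just an
example of per-thread state that a core-wide register might gather.
Option A filters the whole CPU list, option B follows direct pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_THREADS 8

/* Hypothetical model types; not QEMU's CPUCore or PowerPCCPU. */
struct model_core;

struct model_cpu {
    int core_id;               /* shared by all threads of one core */
    int thread_id;             /* index within the core */
    bool pending;              /* example per-thread flag */
    struct model_core *core;   /* direct back-pointer to the parent core */
};

struct model_core {
    int nr_threads;
    struct model_cpu *threads[MODEL_MAX_THREADS];
};

/* Option A: walk every CPU and keep the ones on the same core. */
static uint32_t pending_mask_by_scan(const struct model_cpu *cpus,
                                     size_t nr_cpus,
                                     const struct model_cpu *self)
{
    uint32_t mask = 0;

    assert(self != NULL);
    for (size_t i = 0; i < nr_cpus; i++) {
        if (cpus[i].core_id == self->core_id && cpus[i].pending) {
            mask |= UINT32_C(1) << cpus[i].thread_id;
        }
    }
    return mask;
}

/* Option B: follow the back-pointer and walk only this core's threads. */
static uint32_t pending_mask_by_core(const struct model_cpu *self)
{
    const struct model_core *core = self->core;
    uint32_t mask = 0;

    assert(core != NULL && core->nr_threads <= MODEL_MAX_THREADS);
    for (int t = 0; t < core->nr_threads; t++) {
        if (core->threads[t]->pending) {
            mask |= UINT32_C(1) << t;
        }
    }
    return mask;
}
```

Option A costs one pass over all CPUs per access, option B one pass
over the threads of a single core, which is the performance difference
in question.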

The other thing is the serialisation of access. It's using the atomic
single stepping for this which... I guess should be sufficient? Is it
the best way to do it though? Can a lock be used somehow instead?
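
For comparison, this is roughly what a lock-based alternative could
look like in a standalone model. It is only a sketch under assumptions:
struct model_core_state and its helpers are hypothetical, the lock is a
plain pthread mutex rather than anything QEMU provides, and it says
nothing about how the lock would interact with translated code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-core state shared by all SMT threads of that core. */
struct model_core_state {
    pthread_mutex_t lock;   /* serialises access to shared_reg */
    uint64_t shared_reg;    /* example core-wide register, one bit per thread */
};

static void model_core_init(struct model_core_state *core)
{
    pthread_mutex_init(&core->lock, NULL);
    core->shared_reg = 0;
}

/* Read-modify-write of the shared register under the per-core lock. */
static void model_set_thread_bit(struct model_core_state *core, int thread_id)
{
    assert(thread_id >= 0 && thread_id < 64);
    pthread_mutex_lock(&core->lock);
    core->shared_reg |= UINT64_C(1) << thread_id;
    pthread_mutex_unlock(&core->lock);
}

/* Reads take the same lock so they never observe a half-done update. */
static uint64_t model_read_shared(struct model_core_state *core)
{
    uint64_t val;

    pthread_mutex_lock(&core->lock);
    val = core->shared_reg;
    pthread_mutex_unlock(&core->lock);
    return val;
}
```

A per-core lock like this only stalls siblings that touch the same
register at the same time, instead of serialising the whole access
against every other CPU.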

Thanks,
Nick

Nicholas Piggin (5):
  target/ppc: gdbstub init spr gdb_id for all CPUs
  target/ppc: Add initial flags and helpers for SMT support
  target/ppc: Add support for SMT CTRL register
  target/ppc: Add msgsnd/p and DPDES SMT support
  spapr: Allow up to 8 threads SMT configuration

 hw/ppc/ppc.c                                  |  6 ++
 hw/ppc/spapr.c                                |  4 +-
 hw/ppc/spapr_cpu_core.c                       |  7 +-
 include/hw/ppc/ppc.h                          |  1 +
 target/ppc/cpu.h                              | 16 +++-
 target/ppc/cpu_init.c                         |  5 +
 target/ppc/excp_helper.c                      | 86 ++++++++++++-----
 target/ppc/gdbstub.c                          | 32 ++++---
 target/ppc/helper.h                           |  4 +-
 target/ppc/misc_helper.c                      | 93 +++++++++++++++++--
 target/ppc/translate.c                        | 46 ++++++++-
 .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
 12 files changed, 252 insertions(+), 50 deletions(-)

Comments

Cédric Le Goater June 1, 2023, 7:56 a.m. UTC | #1

Hello Nick,

On 5/31/23 03:23, Nicholas Piggin wrote:
> Hi,
> 
> I'm posting this now just to get some first thoughts. I wouldn't say
> it's ready but it does actually work with some basic tests including
> pseries booting a Linux distro. I have powernv booting too, it just
> requires some more SPRs converted, nothing fundamentally different so
> for the purpose of this RFC I leave it out.
> 
> A couple of things, I don't know the object model well enough to do
> something nice with topology. Iterating siblings I would have thought
> should be going to parent core then iterating its children CPUs. Should
> that be done with the object model, or is it better to add direct
> pointers in CPUs to core and core to CPUs? It is (semi) important for
> performance so maybe that is better than object iterators. If we go that
> way, the PnvCore and SpaprCore have pointers to the SMT threads already,
> should those be abstracted and moved into the CPUCore?

You should be able to move the thread array into the CPUCore. If you do
that, please check that migration compat is not impacted by the state
change. However, I am not sure you can use the CPUCore model under the
insn modeling. Something to check.

Anyhow, the way you implemented the loop on the siblings is fast enough
for a small number of CPUs and safe w.r.t. CPU hotplug. So I would leave
that part for now; if it runs decently with 4*4 vCPUs in TCG, it should
be fine.

Thanks,

C.


  

Nicholas Piggin June 2, 2023, 7:01 a.m. UTC | #2

On Thu Jun 1, 2023 at 5:56 PM AEST, Cédric Le Goater wrote:
> Hello Nick,
>
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > Hi,
> > 
> > I'm posting this now just to get some first thoughts. I wouldn't say
> > it's ready but it does actually work with some basic tests including
> > pseries booting a Linux distro. I have powernv booting too, it just
> > requires some more SPRs converted, nothing fundamentally different so
> > for the purpose of this RFC I leave it out.
> > 
> > A couple of things, I don't know the object model well enough to do
> > something nice with topology. Iterating siblings I would have thought
> > should be going to parent core then iterating its children CPUs. Should
> > that be done with the object model, or is it better to add direct
> > pointers in CPUs to core and core to CPUs? It is (semi) important for
> > performance so maybe that is better than object iterators. If we go that
> > way, the PnvCore and SpaprCore have pointers to the SMT threads already,
> > should those be abstracted and moved into the CPUCore?
>
> You should be able to move the thread array into the CPUCore. If you do
> that, please check that migration compat is not impacted by the state
> change. However, I am not sure you can use the CPUCore model under the
> insn modeling. Something to check.

Okay.

> Anyhow, the way you implemented the loop on the siblings is fast enough
> for a small number of CPUs and safe w.r.t. CPU hotplug. So I would leave
> that part for now; if it runs decently with 4*4 vCPUs in TCG, it should
> be fine.

Yeah, you're right. I'm overly paranoid about it, but we don't do
hundreds of CPUs in TCG so it should be fine. Maybe I will defer it for
now then and just do the CPU iteration.

Thanks,
Nick

Cédric Le Goater June 2, 2023, 7:21 a.m. UTC | #3

On 6/2/23 09:01, Nicholas Piggin wrote:
> On Thu Jun 1, 2023 at 5:56 PM AEST, Cédric Le Goater wrote:
>> Hello Nick,
>>
>> On 5/31/23 03:23, Nicholas Piggin wrote:
>>> Hi,
>>>
>>> I'm posting this now just to get some first thoughts. I wouldn't say
>>> it's ready but it does actually work with some basic tests including
>>> pseries booting a Linux distro. I have powernv booting too, it just
>>> requires some more SPRs converted, nothing fundamentally different so
>>> for the purpose of this RFC I leave it out.
>>>
>>> A couple of things, I don't know the object model well enough to do
>>> something nice with topology. Iterating siblings I would have thought
>>> should be going to parent core then iterating its children CPUs. Should
>>> that be done with the object model, or is it better to add direct
>>> pointers in CPUs to core and core to CPUs? It is (semi) important for
>>> performance so maybe that is better than object iterators. If we go that
>>> way, the PnvCore and SpaprCore have pointers to the SMT threads already,
>>> should those be abstracted and moved into the CPUCore?
>>
>> You should be able to move the thread array into the CPUCore. If you do
>> that, please check that migration compat is not impacted by the state
>> change. However, I am not sure you can use the CPUCore model under the
>> insn modeling. Something to check.
> 
> Okay.
> 
>> Anyhow, the way you implemented the loop on the siblings is fast enough
>> for a small number of CPUs and safe w.r.t. CPU hotplug. So I would leave
>> that part for now; if it runs decently with 4*4 vCPUs in TCG, it should
>> be fine.
> 
> Yeah, you're right. I'm overly paranoid about it, but we don't do
> hundreds of CPUs in TCG so it should be fine.

The PowerNV machine did run with 64 CPUs at some point. Boot was slow
because of contention in some areas when starting the secondaries. Once
stabilized, perf was decent.

I think that a realistic goal for book3s is to support 2 sockets * 2 cores
* 4 threads on PowerNV and 16 vCPUs on pseries, in order to exercise the
various ways to send IPIs on the different HV implementations.
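
As a quick check of the arithmetic, 2 sockets * 2 cores * 4 threads
gives 16 hardware threads, the same count as the 16 vCPUs on pseries.
The sketch below uses a dense linear numbering that is purely an
assumption for illustration (real PIR encodings are chip specific), and
struct model_topo and the helpers are hypothetical names.

```c
#include <assert.h>

/* Hypothetical topology description and position within it. */
struct model_topo {
    int sockets;
    int cores_per_socket;
    int threads_per_core;
};

struct model_pos {
    int socket;
    int core;     /* core index within its socket */
    int thread;   /* thread index within its core */
};

static int model_total_threads(struct model_topo t)
{
    return t.sockets * t.cores_per_socket * t.threads_per_core;
}

/* Split a dense linear index into socket, core and thread. */
static struct model_pos model_pos_from_index(struct model_topo t, int index)
{
    struct model_pos p;

    assert(index >= 0 && index < model_total_threads(t));
    p.thread = index % t.threads_per_core;
    p.core = (index / t.threads_per_core) % t.cores_per_socket;
    p.socket = index / (t.threads_per_core * t.cores_per_socket);
    return p;
}

/* Inverse mapping: rebuild the linear index from a position. */
static int model_index_from_pos(struct model_topo t, struct model_pos p)
{
    return (p.socket * t.cores_per_socket + p.core) * t.threads_per_core
           + p.thread;
}
```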

> Maybe I will defer it for now then and just do the CPU iteration.

Maybe there is some value in storing a CPU's siblings under the PowerPC
CPU descriptor. It could be useful for instructions that apply to the
current CPU, but for others requiring a PIR value you will have to
find a starting point in the CPU list, so it won't make much difference,
I think.
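
A rough standalone illustration of that point, with hypothetical names
throughout (struct model_cpu_desc and both helpers are made up for this
sketch, not taken from QEMU): cached sibling information helps once you
hold a CPU, but a PIR-addressed operation still starts with a search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU descriptor with cached sibling information. */
struct model_cpu_desc {
    int pir;                /* processor ID of this hardware thread */
    size_t first_sibling;   /* array index of thread 0 on the same core */
    size_t nr_siblings;     /* number of threads on the same core */
};

/* PIR-addressed case: a search over the CPU list is still needed. */
static const struct model_cpu_desc *
model_find_by_pir(const struct model_cpu_desc *cpus, size_t nr_cpus, int pir)
{
    for (size_t i = 0; i < nr_cpus; i++) {
        if (cpus[i].pir == pir) {
            return &cpus[i];
        }
    }
    return NULL;
}

/* Once a CPU is known, its siblings come straight from the cached fields. */
static const struct model_cpu_desc *
model_siblings_of(const struct model_cpu_desc *cpus,
                  const struct model_cpu_desc *cpu, size_t *count)
{
    assert(cpu != NULL && count != NULL);
    *count = cpu->nr_siblings;
    return &cpus[cpu->first_sibling];
}
```

For the current CPU only model_siblings_of() is needed; for a PIR
target the cost is dominated by model_find_by_pir() either way.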

Thanks,

C.


