Message ID | 20210715055923.43126-1-xiyou.wangcong@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 65875073eddd24d7b3968c1501ef29277398dc7b |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,resend,v2] net: use %px to print skb address in trace_netif_receive_skb | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: mingo@redhat.com rostedt@goodmis.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 10 this patch: 10 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: Alignment should match open parenthesis WARNING: 'adress' may be misspelled - perhaps 'address'? |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 10 this patch: 10 |
netdev/header_inline | success | Link |
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Wed, 14 Jul 2021 22:59:23 -0700 you wrote: > From: Qitao Xu <qitao.xu@bytedance.com> > > The print format of skb adress in tracepoint class net_dev_template > is changed to %px from %p, because we want to use skb address > as a quick way to identify a packet. > > Note, trace ring buffer is only accessible to privileged users, > it is safe to use a real kernel address here. > > [...] Here is the summary with links: - [net-next,resend,v2] net: use %px to print skb address in trace_netif_receive_skb https://git.kernel.org/netdev/net/c/65875073eddd You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
On Wed, Jul 14, 2021 at 10:59:23PM -0700, Cong Wang wrote: > From: Qitao Xu <qitao.xu@bytedance.com> > > The print format of skb adress in tracepoint class net_dev_template > is changed to %px from %p, because we want to use skb address > as a quick way to identify a packet. No; %p was already hashed to uniquely identify unique addresses. This is needlessly exposing kernel addresses with no change in utility. See [1] for full details on when %px is justified (almost never). > Note, trace ring buffer is only accessible to privileged users, > it is safe to use a real kernel address here. That's not accurate either; there is a difference between uid 0 and kernel mode privilege levels. Please revert these: 851f36e40962408309ad2665bf0056c19a97881c 65875073eddd24d7b3968c1501ef29277398dc7b And adjust this to replace %px with %p: 70713dddf3d25a02d1952f8c5d2688c986d2f2fb Thanks! -Kees [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#p-format-specifier > > Reviewed-by: Cong Wang <cong.wang@bytedance.com> > Signed-off-by: Qitao Xu <qitao.xu@bytedance.com> > --- > include/trace/events/net.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/trace/events/net.h b/include/trace/events/net.h > index 2399073c3afc..78c448c6ab4c 100644 > --- a/include/trace/events/net.h > +++ b/include/trace/events/net.h > @@ -136,7 +136,7 @@ DECLARE_EVENT_CLASS(net_dev_template, > __assign_str(name, skb->dev->name); > ), > > - TP_printk("dev=%s skbaddr=%p len=%u", > + TP_printk("dev=%s skbaddr=%px len=%u", > __get_str(name), __entry->skbaddr, __entry->len) > ) > > -- > 2.27.0 >
On Fri, Jul 23, 2021 at 12:09 AM Kees Cook <keescook@chromium.org> wrote: > > On Wed, Jul 14, 2021 at 10:59:23PM -0700, Cong Wang wrote: > > From: Qitao Xu <qitao.xu@bytedance.com> > > > > The print format of skb adress in tracepoint class net_dev_template > > is changed to %px from %p, because we want to use skb address > > as a quick way to identify a packet. > > No; %p was already hashed to uniquely identify unique addresses. This > is needlessly exposing kernel addresses with no change in utility. See > [1] for full details on when %px is justified (almost never). This is clearly not true. Trust us, we _did_ use %p in the beginning, it does not work at all. See the trace below captured with %p: <idle>-0 [012] ..s. 385.248024: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=56 <idle>-0 [012] ..s. 385.261602: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 387.245617: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 389.293621: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 391.277637: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 392.939159: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=60 <idle>-0 [012] .Ns. 392.939197: ip_rcv: skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 392.939286: ip_local_deliver_finish: skbaddr=000000003fa575fb kworker/12:1-122 [012] ..s. 392.939369: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=52 kworker/12:1-122 [012] ..s. 392.939371: ip_rcv: skbaddr=000000003fa575fb kworker/12:1-122 [012] ..s. 392.939409: ip_local_deliver_finish: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 393.261605: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 395.174268: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=56 <idle>-0 [012] ..s. 395.174275: ip_rcv: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 395.174284: sk_data_ready: skaddr=0000000049f7d0d9, skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 395.174337: tcp_v4_rcv: sport=8801 dport=15884 saddr=192.168.122.139 daddr=192.168.122.1 saddrv6=::ffff:192.168.122.139 daddrv6=::ffff:192.168.122.1 state=TCP_ESTABLISHED skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 395.174338: ip_local_deliver_finish: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 395.245605: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 397.293632: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 397.352766: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=56 <idle>-0 [012] ..s. 397.352771: ip_rcv: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 397.352813: sk_data_ready: skaddr=0000000049f7d0d9, skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 397.352819: tcp_v4_rcv: sport=8801 dport=15884 saddr=192.168.122.139 daddr=192.168.122.1 saddrv6=::ffff:192.168.122.139 daddrv6=::ffff:192.168.122.1 state=TCP_ESTABLISHED skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 397.352819: ip_local_deliver_finish: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 399.277623: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 401.261640: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 402.671285: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=56 <idle>-0 [012] ..s. 402.671294: ip_rcv: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 402.671344: sk_data_ready: skaddr=0000000049f7d0d9, skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 402.671355: tcp_v4_rcv: sport=8801 dport=15884 saddr=192.168.122.139 daddr=192.168.122.1 saddrv6=::ffff:192.168.122.139 daddrv6=::ffff:192.168.122.1 state=TCP_ESTABLISHED skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 402.671356: ip_local_deliver_finish: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 403.245617: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 403.688075: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=52 <idle>-0 [012] ..s. 403.688080: ip_rcv: skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 403.688092: sk_data_ready: skaddr=0000000049f7d0d9, skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 403.688094: tcp_v4_rcv: sport=8801 dport=15884 saddr=192.168.122.139 daddr=192.168.122.1 saddrv6=::ffff:192.168.122.139 daddrv6=::ffff:192.168.122.1 state=TCP_CLOSE_WAIT skbaddr=000000003fa575fb <idle>-0 [012] .Ns. 403.688095: ip_local_deliver_finish: skbaddr=000000003fa575fb nc-1044 [012] ..s. 403.688222: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=52 nc-1044 [012] ..s. 403.688222: ip_rcv: skbaddr=000000003fa575fb nc-1044 [012] ..s. 403.688237: tcp_v4_rcv: sport=8801 dport=15884 saddr=192.168.122.139 daddr=192.168.122.1 saddrv6=::ffff:192.168.122.139 daddrv6=::ffff:192.168.122.1 state=TCP_CLOSE skbaddr=000000003fa575fb nc-1044 [012] ..s. 403.688246: ip_local_deliver_finish: skbaddr=000000003fa575fb <idle>-0 [012] ..s. 405.293625: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=38 <idle>-0 [012] ..s. 407.137723: netif_receive_skb: dev=eth0 skbaddr=000000003fa575fb len=222 > > > Note, trace ring buffer is only accessible to privileged users, > > it is safe to use a real kernel address here. > > That's not accurate either; there is a difference between uid 0 and > kernel mode privilege levels. Not sure how accurate we can be just to make you happy. Please give us a pointer on the accuracy. Thanks.
+tracing maintainers On Fri, Jul 23, 2021 at 9:09 AM Kees Cook <keescook@chromium.org> wrote: > On Wed, Jul 14, 2021 at 10:59:23PM -0700, Cong Wang wrote: > > From: Qitao Xu <qitao.xu@bytedance.com> > > > > The print format of skb adress in tracepoint class net_dev_template > > is changed to %px from %p, because we want to use skb address > > as a quick way to identify a packet. > > No; %p was already hashed to uniquely identify unique addresses. This > is needlessly exposing kernel addresses with no change in utility. See > [1] for full details on when %px is justified (almost never). > > > Note, trace ring buffer is only accessible to privileged users, > > it is safe to use a real kernel address here. > > That's not accurate either; there is a difference between uid 0 and > kernel mode privilege levels. > > Please revert these: > > 851f36e40962408309ad2665bf0056c19a97881c > 65875073eddd24d7b3968c1501ef29277398dc7b > > And adjust this to replace %px with %p: > > 70713dddf3d25a02d1952f8c5d2688c986d2f2fb > > Thanks! > > -Kees Hi Kees, as far as I understand, the printf format strings for tracepoints don't matter for exposing what data is exposed to userspace - the raw data, not the formatted data, is stored in the ring buffer that userspace can access via e.g. trace_pipe_raw (see https://www.kernel.org/doc/Documentation/trace/ftrace.txt), and the data can then be formatted **by userspace tooling** (e.g. libtraceevent). As far as I understand, the stuff that root can read via debugfs is the data stored by TP_fast_assign() (although root _can_ also let the kernel do the printing and read it in text form). Maybe Steven Rostedt can help with whether that's true and provide more detail on this. In my view, the ftrace subsystem, just like the BPF subsystem, is root-only debug tracing infrastructure that can and should log detailed information about kernel internals, no matter whether that information might be helpful to attackers, because if an attacker is sufficiently privileged to access this level of debug information, that's beyond the point where it makes sense to worry about exposing kernel pointers. But even if you disagree, I don't think that ftrace format strings are relevant here. > [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#p-format-specifier > > > > > Reviewed-by: Cong Wang <cong.wang@bytedance.com> > > Signed-off-by: Qitao Xu <qitao.xu@bytedance.com> > > --- > > include/trace/events/net.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/trace/events/net.h b/include/trace/events/net.h > > index 2399073c3afc..78c448c6ab4c 100644 > > --- a/include/trace/events/net.h > > +++ b/include/trace/events/net.h > > @@ -136,7 +136,7 @@ DECLARE_EVENT_CLASS(net_dev_template, > > __assign_str(name, skb->dev->name); > > ), > > > > - TP_printk("dev=%s skbaddr=%p len=%u", > > + TP_printk("dev=%s skbaddr=%px len=%u", > > __get_str(name), __entry->skbaddr, __entry->len) > > ) > > > > -- > > 2.27.0 > > > > -- > Kees Cook
On Wed, 28 Jul 2021 17:13:12 +0200 Jann Horn <jannh@google.com> wrote: > +tracing maintainers > > On Fri, Jul 23, 2021 at 9:09 AM Kees Cook <keescook@chromium.org> wrote: > > On Wed, Jul 14, 2021 at 10:59:23PM -0700, Cong Wang wrote: > > > From: Qitao Xu <qitao.xu@bytedance.com> > > > > > > The print format of skb adress in tracepoint class net_dev_template > > > is changed to %px from %p, because we want to use skb address > > > as a quick way to identify a packet. > > > > No; %p was already hashed to uniquely identify unique addresses. This > > is needlessly exposing kernel addresses with no change in utility. See > > [1] for full details on when %px is justified (almost never). > > > > > Note, trace ring buffer is only accessible to privileged users, > > > it is safe to use a real kernel address here. > > > > That's not accurate either; there is a difference between uid 0 and > > kernel mode privilege levels. > > > > Please revert these: > > > > 851f36e40962408309ad2665bf0056c19a97881c > > 65875073eddd24d7b3968c1501ef29277398dc7b > > > > And adjust this to replace %px with %p: > > > > 70713dddf3d25a02d1952f8c5d2688c986d2f2fb > > > > Thanks! > > > > -Kees > > Hi Kees, > > as far as I understand, the printf format strings for tracepoints > don't matter for exposing what data is exposed to userspace - the raw > data, not the formatted data, is stored in the ring buffer that > userspace can access via e.g. trace_pipe_raw (see > https://www.kernel.org/doc/Documentation/trace/ftrace.txt), and the > data can then be formatted **by userspace tooling** (e.g. > libtraceevent). As far as I understand, the stuff that root can read > via debugfs is the data stored by TP_fast_assign() (although root > _can_ also let the kernel do the printing and read it in text form). > Maybe Steven Rostedt can help with whether that's true and provide > more detail on this. That is exactly what is happening. I wrote the following to the replied text up at the top, then noticed you basically stated the same thing here ;-) "You can get the raw data from the trace buffers directly via the trace_pipe_raw. The data is copied directly without any processing. The TP_fast_assign() adds the data into the buffer, and the printf() is only reading what's in that buffer. The hashing happens later. If you read the buffers directly, you get all the data you want." > > In my view, the ftrace subsystem, just like the BPF subsystem, is > root-only debug tracing infrastructure that can and should log > detailed information about kernel internals, no matter whether that > information might be helpful to attackers, because if an attacker is > sufficiently privileged to access this level of debug information, > that's beyond the point where it makes sense to worry about exposing > kernel pointers. But even if you disagree, I don't think that ftrace > format strings are relevant here. Anyway, those patches are not needed. (Kees is going to hate me). Since a345a6718bd56 added in 5.12, you can just do: # trace-cmd start -e net_dev_start_xmit # trace-cmd show [..] sshd-1853 [007] ...1 1995.000611: net_dev_start_xmit: dev=em1 queue_mapping=0 skbaddr=00000000f8c47ebd vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=3 len=150 data_len=84 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=1 gso_type=0x1 Notice the value of skbaddr=00000000f8c47ebd ? Now I do: # trace-cmd start -O nohash-ptr -e net_dev_start_xmit # trace-cmd show [..] sshd-1853 [007] ...1 2089.462722: net_dev_start_xmit: dev= queue_mapping=0 skbaddr=ffff8cfbc3ffd0e0 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=3 len=150 data_len=84 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=1 gso_type=0x1 And now we have: skbaddr=ffff8cfbc3ffd0e0 -- Steve
On Wed, Jul 28, 2021 at 11:56:33AM -0400, Steven Rostedt wrote: > On Wed, 28 Jul 2021 17:13:12 +0200 > Jann Horn <jannh@google.com> wrote: > > > +tracing maintainers > > > > On Fri, Jul 23, 2021 at 9:09 AM Kees Cook <keescook@chromium.org> wrote: > > > On Wed, Jul 14, 2021 at 10:59:23PM -0700, Cong Wang wrote: > > > > From: Qitao Xu <qitao.xu@bytedance.com> > > > > > > > > The print format of skb adress in tracepoint class net_dev_template > > > > is changed to %px from %p, because we want to use skb address > > > > as a quick way to identify a packet. > > > > > > No; %p was already hashed to uniquely identify unique addresses. This > > > is needlessly exposing kernel addresses with no change in utility. See > > > [1] for full details on when %px is justified (almost never). > > > > > > > Note, trace ring buffer is only accessible to privileged users, > > > > it is safe to use a real kernel address here. > > > > > > That's not accurate either; there is a difference between uid 0 and > > > kernel mode privilege levels. > > > > > > Please revert these: > > > > > > 851f36e40962408309ad2665bf0056c19a97881c > > > 65875073eddd24d7b3968c1501ef29277398dc7b > > > > > > And adjust this to replace %px with %p: > > > > > > 70713dddf3d25a02d1952f8c5d2688c986d2f2fb > > > > > > Thanks! > > > > > > -Kees > > > > Hi Kees, > > > > as far as I understand, the printf format strings for tracepoints > > don't matter for exposing what data is exposed to userspace - the raw > > data, not the formatted data, is stored in the ring buffer that > > userspace can access via e.g. trace_pipe_raw (see > > https://www.kernel.org/doc/Documentation/trace/ftrace.txt), and the > > data can then be formatted **by userspace tooling** (e.g. > > libtraceevent). As far as I understand, the stuff that root can read > > via debugfs is the data stored by TP_fast_assign() (although root > > _can_ also let the kernel do the printing and read it in text form). > > Maybe Steven Rostedt can help with whether that's true and provide > > more detail on this. > > That is exactly what is happening. I wrote the following to the replied > text up at the top, then noticed you basically stated the same thing > here ;-) Where is the %px being formatted then? If it's the kernel itself (which is the only thing that does %px), then it doesn't need to be %px, since the raw data is separate. i.e. leave it %p for whatever logs will get spilled out to who knows where. > "You can get the raw data from the trace buffers directly via the > trace_pipe_raw. The data is copied directly without any processing. The > TP_fast_assign() adds the data into the buffer, and the printf() is > only reading what's in that buffer. The hashing happens later. If you > read the buffers directly, you get all the data you want." > > > > > In my view, the ftrace subsystem, just like the BPF subsystem, is > > root-only debug tracing infrastructure that can and should log > > detailed information about kernel internals, no matter whether that > > information might be helpful to attackers, because if an attacker is > > sufficiently privileged to access this level of debug information, > > that's beyond the point where it makes sense to worry about exposing > > kernel pointers. But even if you disagree, I don't think that ftrace > > format strings are relevant here. > > Anyway, those patches are not needed. (Kees is going to hate me). > > Since a345a6718bd56 added in 5.12, you can just do: > > # trace-cmd start -e net_dev_start_xmit > # trace-cmd show > [..] > sshd-1853 [007] ...1 1995.000611: net_dev_start_xmit: dev=em1 queue_mapping=0 skbaddr=00000000f8c47ebd vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=3 len=150 data_len=84 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=1 gso_type=0x1 > > Notice the value of skbaddr=00000000f8c47ebd ? > > Now I do: > > # trace-cmd start -O nohash-ptr -e net_dev_start_xmit > # trace-cmd show > [..] > sshd-1853 [007] ...1 2089.462722: net_dev_start_xmit: dev= queue_mapping=0 skbaddr=ffff8cfbc3ffd0e0 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=3 len=150 data_len=84 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=1 gso_type=0x1 > > And now we have: > > skbaddr=ffff8cfbc3ffd0e0 How does ftrace interact with lockdown's confidentiality mode?
On Wed, Jul 28, 2021 at 8:48 PM Kees Cook <keescook@chromium.org> wrote:
> How does ftrace interact with lockdown's confidentiality mode?
Should be LOCKDOWN_TRACEFS ?
On Wed, 28 Jul 2021 11:48:11 -0700 Kees Cook <keescook@chromium.org> wrote: > > That is exactly what is happening. I wrote the following to the replied > > text up at the top, then noticed you basically stated the same thing > > here ;-) > > Where is the %px being formatted then? If it's the kernel itself (which > is the only thing that does %px), then it doesn't need to be %px, since > the raw data is separate. i.e. leave it %p for whatever logs will get > spilled out to who knows where. The trace events shown by tracefs/trace are formatted via the kernel sprintf() and friends, which will hash "%p". But the raw data read to trace-cmd (and other tools), uses libtraceevent, that parses the printf-fmt of a trace event just like the kernel does. But since we are already in user space, it is pretty pointless to implement "%p" with hashing. But it does understand what "%px" is. > > How does ftrace interact with lockdown's confidentiality mode? > as Jann replied. If you enable "LOCKDOWN_TRACEFS" you lose all access to the tracing interface. -- Steve
diff --git a/include/trace/events/net.h b/include/trace/events/net.h index 2399073c3afc..78c448c6ab4c 100644 --- a/include/trace/events/net.h +++ b/include/trace/events/net.h @@ -136,7 +136,7 @@ DECLARE_EVENT_CLASS(net_dev_template, __assign_str(name, skb->dev->name); ), - TP_printk("dev=%s skbaddr=%p len=%u", + TP_printk("dev=%s skbaddr=%px len=%u", __get_str(name), __entry->skbaddr, __entry->len) )