[v2,bpf-next,0/4] introduce bpf_iter for task_vma

Message ID 20201215233702.3301881-1-songliubraving@fb.com (mailing list archive)

Song Liu Dec. 15, 2020, 11:36 p.m. UTC
This set introduces bpf_iter for task_vma, which can be used to generate
information similar to /proc/pid/maps or /proc/pid/smaps. Patch 4/4 adds
an example that mimics /proc/pid/maps.
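
For readers unfamiliar with the target output, the following userspace sketch shows the line layout of /proc/pid/maps (address range, permissions, offset, device, inode, path). It is not the code from patch 4/4: the struct sketch_vma type, the SK_* flag names, and format_maps_line() are hypothetical names made up for illustration, and the real file pads the path column where this sketch uses a single space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical permission bits for this sketch only. */
enum {
	SK_READ   = 0x1,
	SK_WRITE  = 0x2,
	SK_EXEC   = 0x4,
	SK_SHARED = 0x8,
};

/* Hypothetical, simplified stand-in for the fields a maps line needs. */
struct sketch_vma {
	unsigned long start;       /* first address of the mapping */
	unsigned long end;         /* first address past the mapping */
	unsigned long flags;       /* SK_* bits */
	unsigned long long offset; /* file offset in bytes */
	unsigned int major;        /* device major number */
	unsigned int minor;        /* device minor number */
	unsigned long ino;         /* inode number, 0 for anonymous memory */
	const char *path;          /* backing file, "" if none */
};

/* Write one maps-style line into buf; returns what snprintf() returns. */
int format_maps_line(char *buf, size_t len, const struct sketch_vma *v)
{
	char perms[5];

	assert(buf != NULL && v != NULL);
	perms[0] = (v->flags & SK_READ) ? 'r' : '-';
	perms[1] = (v->flags & SK_WRITE) ? 'w' : '-';
	perms[2] = (v->flags & SK_EXEC) ? 'x' : '-';
	perms[3] = (v->flags & SK_SHARED) ? 's' : 'p';
	perms[4] = '\0';

	return snprintf(buf, len, "%08lx-%08lx %s %08llx %02x:%02x %lu %s\n",
			v->start, v->end, perms, v->offset,
			v->major, v->minor, v->ino, v->path);
}
```

A caller fills in one struct sketch_vma per mapping and prints the resulting buffer; a BPF iterator program would instead read the equivalent fields from the vma it is handed and write them to the iterator's seq_file.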

Changes v1 => v2:
  1. Small fixes in task_iter.c and the selftests. (Yonghong)

Song Liu (4):
  bpf: introduce task_vma bpf_iter
  bpf: allow bpf_d_path in sleepable bpf_iter program
  libbpf: introduce section "iter.s/" for sleepable bpf_iter program
  selftests/bpf: add test for bpf_iter_task_vma

 include/linux/bpf.h                           |   2 +-
 kernel/bpf/task_iter.c                        | 205 +++++++++++++++++-
 kernel/trace/bpf_trace.c                      |   5 +
 tools/lib/bpf/libbpf.c                        |   5 +
 .../selftests/bpf/prog_tests/bpf_iter.c       | 106 ++++++++-
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   9 +
 .../selftests/bpf/progs/bpf_iter_task_vma.c   |  55 +++++
 7 files changed, 375 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c

--
2.24.1

Comments

Yonghong Song Dec. 16, 2020, 5 p.m. UTC | #1
On 12/15/20 3:36 PM, Song Liu wrote:
> This set introduces bpf_iter for task_vma, which can be used to generate
> information similar to /proc/pid/maps or /proc/pid/smaps. Patch 4/4 adds

I did not see an example for /proc/pid/smaps. It would be good if you 
can cover smaps as well since it is used by a lot of people.

> an example that mimics /proc/pid/maps.
Song Liu Dec. 16, 2020, 5:35 p.m. UTC | #2
> On Dec 16, 2020, at 9:00 AM, Yonghong Song <yhs@fb.com> wrote:
> 
> 
> 
> On 12/15/20 3:36 PM, Song Liu wrote:
>> This set introduces bpf_iter for task_vma, which can be used to generate
>> information similar to /proc/pid/maps or /proc/pid/smaps. Patch 4/4 adds
> 
> I did not see an example for /proc/pid/smaps. It would be good if you can cover smaps as well since it is used by a lot of people.

smaps is tricky: it contains a lot of information, and some of that information
requires architecture- and configuration-specific logic, e.g., the page table structure.
To really mimic smaps, we would probably need a helper for smap_gather_stats().
However, I don't think that's really necessary. I think the task_vma iterator is most
useful for gathering information that is not presented in smaps. For example, if a
vma covers a mix of 2MB pages and 4kB pages, smaps won't show which address ranges are
backed by 2MB pages.

I have a test BPF program that parses the 4-level x86 page table for huge pages. Since
we need a bounded loop to parse the page table, the program won't work well for very
large vmas. We could probably add this program to samples/bpf/, but I don't think it is
a good fit for selftests.
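
As a rough illustration of the two ideas above (this is not the test program itself), the sketch below splits an x86-64 virtual address into its four 9-bit table indices and steps through a vma in 2MB units with a fixed iteration cap. All names and the cap of 512 are assumptions made for this example. It only does address arithmetic and never reads a real page table.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT     12  /* 4kB base pages */
#define SK_BITS_PER_LEVEL 9   /* 512 entries per table */
#define SK_LEVELS         4   /* 4-level paging */
#define SK_MAX_STEPS      512 /* hypothetical loop bound */

/*
 * Index into the table at the given level for vaddr.
 * Level 0 is the lowest table (4kB entries), level 1 covers 2MB
 * per entry, level 2 covers 1GB, level 3 is the top-level table.
 */
unsigned int sk_table_index(uint64_t vaddr, int level)
{
	assert(level >= 0 && level < SK_LEVELS);
	return (vaddr >> (SK_PAGE_SHIFT + SK_BITS_PER_LEVEL * level)) & 0x1ff;
}

/*
 * Count the 2MB-aligned, 2MB-sized slots that fit entirely inside
 * [start, end). The loop runs at most SK_MAX_STEPS times; *truncated
 * is set to 1 if more slots remained when the bound was reached.
 */
unsigned int sk_count_2mb_slots(uint64_t start, uint64_t end, int *truncated)
{
	const uint64_t huge = 1ULL << 21;
	uint64_t addr = (start + huge - 1) & ~(huge - 1); /* round up */
	unsigned int n = 0;

	*truncated = 0;
	for (unsigned int i = 0; i < SK_MAX_STEPS; i++) {
		if (addr + huge > end)
			return n;
		n++;
		addr += huge;
	}
	*truncated = (addr + huge <= end);
	return n;
}
```

With a cap of 512 steps, a single pass covers at most 1GB of a vma, which is the kind of limit that makes a bounded-loop walker awkward for very large mappings.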

Thanks,
Song
Yonghong Song Dec. 16, 2020, 6:31 p.m. UTC | #3
On 12/16/20 9:35 AM, Song Liu wrote:
> 
> 
>> On Dec 16, 2020, at 9:00 AM, Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 12/15/20 3:36 PM, Song Liu wrote:
>>> This set introduces bpf_iter for task_vma, which can be used to generate
>>> information similar to /proc/pid/maps or /proc/pid/smaps. Patch 4/4 adds
>>
>> I did not see an example for /proc/pid/smaps. It would be good if you can cover smaps as well since it is used by a lot of people.
> 
> smaps is tricky: it contains a lot of information, and some of that information
> requires architecture- and configuration-specific logic, e.g., the page table structure.
> To really mimic smaps, we would probably need a helper for smap_gather_stats().
> However, I don't think that's really necessary. I think the task_vma iterator is most
> useful for gathering information that is not presented in smaps. For example, if a
> vma covers a mix of 2MB pages and 4kB pages, smaps won't show which address ranges are
> backed by 2MB pages.

Let us remove "/proc/pid/smaps" from the cover letter description then.
Could you add the above information to patch #1 (and possibly the cover
letter as well) of the series? It can serve as one of the reasons why we
introduce the task_vma iter. You may also want to extend the bpf programs to
cover this use case.

> 
> I have a test BPF program that parses the 4-level x86 page table for huge pages. Since
> we need a bounded loop to parse the page table, the program won't work well for very
> large vmas. We could probably add this program to samples/bpf/, but I don't think it is
> a good fit for selftests.

This can be done later.
