[v9,0/5] ceph: add perf metrics support

Message ID 1583739430-4928-1-git-send-email-xiubli@redhat.com (mailing list archive)

Message

Xiubo Li March 9, 2020, 7:37 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

Changed in V9:
- add an r_ended field to the mds request struct and use that to calculate the metric
- fix some commit comments

We can get the metrics from the debugfs:

$ cat /sys/kernel/debug/ceph/0c93a60d-5645-4c46-8568-4c8f63db4c7f.client4267/metrics 
item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
read          13          417000          32076
write         42          131205000       3123928
metadata      104         493000          4740

item          total           miss            hit
-------------------------------------------------
d_lease       204             0               918
caps          204             213             368218
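
For readers of the latency table above: avg_lat(us) appears to be sum_lat(us) divided by total, truncated to an integer (417000 / 13 = 32076 for reads). A minimal sketch of that arithmetic follows. The helper name avg_lat_us is hypothetical and is not taken from the patches; returning 0 for an empty counter is an assumption as well.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch series: derive the
 * avg_lat(us) column from sum_lat(us) and total. Integer division
 * truncates toward zero; an empty counter is assumed to report 0.
 */
static uint64_t avg_lat_us(uint64_t sum_lat_us, uint64_t total)
{
	return total ? sum_lat_us / total : 0;
}
```

With the write row, avg_lat_us(131205000, 42) gives 3123928, which matches the table.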


Xiubo Li (5):
  ceph: add global dentry lease metric support
  ceph: add caps perf metric for each session
  ceph: add global read latency metric support
  ceph: add global write latency metric support
  ceph: add global metadata perf metric support

 fs/ceph/acl.c                   |   2 +-
 fs/ceph/addr.c                  |  18 +++++++
 fs/ceph/caps.c                  |  19 ++++++++
 fs/ceph/debugfs.c               |  71 ++++++++++++++++++++++++++--
 fs/ceph/dir.c                   |  17 ++++++-
 fs/ceph/file.c                  |  26 ++++++++++
 fs/ceph/inode.c                 |   4 +-
 fs/ceph/mds_client.c            | 102 +++++++++++++++++++++++++++++++++++++++-
 fs/ceph/mds_client.h            |   7 ++-
 fs/ceph/metric.h                |  69 +++++++++++++++++++++++++++
 fs/ceph/super.h                 |   9 ++--
 fs/ceph/xattr.c                 |   4 +-
 include/linux/ceph/osd_client.h |   1 +
 net/ceph/osd_client.c           |   2 +
 14 files changed, 334 insertions(+), 17 deletions(-)
 create mode 100644 fs/ceph/metric.h

Comments

Jeff Layton March 9, 2020, 12:09 p.m. UTC | #1
On Mon, 2020-03-09 at 03:37 -0400, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Changed in V9:
> - add an r_ended field to the mds request struct and use that to calculate the metric
> - fix some commit comments
> 
> We can get the metrics from the debugfs:
> 
> $ cat /sys/kernel/debug/ceph/0c93a60d-5645-4c46-8568-4c8f63db4c7f.client4267/metrics 
> item          total       sum_lat(us)     avg_lat(us)
> -----------------------------------------------------
> read          13          417000          32076
> write         42          131205000       3123928
> metadata      104         493000          4740
> 
> item          total           miss            hit
> -------------------------------------------------
> d_lease       204             0               918
> caps          204             213             368218
> 

Thanks Xiubo! This looks good. One minor issue with the cap patch, but I
can just fix that up before merging if you're ok with my proposed
change.

Beyond this...while average latency is a good metric, it's often not
enough to help diagnose problems. I wonder if we ought to be at least
tracking min/max latency for all calls too. I wonder if there's a way to
track standard deviation too? That would be really nice to have.
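
One possible way to track these without storing every sample is Welford's online algorithm, which updates a running mean and a running sum of squared deviations per sample. The sketch below is userspace C under stated assumptions: struct lat_stats and both function names are hypothetical, the struct must start zero-initialized, and it uses double for clarity, so a kernel version would need an integer or fixed-point variant. It returns the sample variance; the standard deviation is its square root.

```c
#include <stdint.h>

/* Hypothetical accumulator; none of these names come from the series. */
struct lat_stats {
	uint64_t count;		/* number of samples seen */
	uint64_t min_us;	/* smallest latency seen */
	uint64_t max_us;	/* largest latency seen */
	double mean_us;		/* running mean */
	double m2;		/* running sum of squared deviations from the mean */
};

/* Fold one latency sample into the stats (Welford's update step). */
static void lat_stats_update(struct lat_stats *s, uint64_t lat_us)
{
	double delta;

	if (s->count == 0 || lat_us < s->min_us)
		s->min_us = lat_us;
	if (s->count == 0 || lat_us > s->max_us)
		s->max_us = lat_us;

	s->count++;
	delta = (double)lat_us - s->mean_us;
	s->mean_us += delta / (double)s->count;
	s->m2 += delta * ((double)lat_us - s->mean_us);
}

/* Sample variance in us^2; defined as 0 with fewer than two samples. */
static double lat_stats_variance(const struct lat_stats *s)
{
	return s->count > 1 ? s->m2 / (double)(s->count - 1) : 0.0;
}
```

The cost is a few arithmetic operations per request and constant memory, so it could sit next to the existing total and sum counters.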

Cheers,
Xiubo Li March 9, 2020, 12:35 p.m. UTC | #2
On 2020/3/9 20:09, Jeff Layton wrote:
> On Mon, 2020-03-09 at 03:37 -0400, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Changed in V9:
>> - add an r_ended field to the mds request struct and use that to calculate the metric
>> - fix some commit comments
>>
>> We can get the metrics from the debugfs:
>>
>> $ cat /sys/kernel/debug/ceph/0c93a60d-5645-4c46-8568-4c8f63db4c7f.client4267/metrics
>> item          total       sum_lat(us)     avg_lat(us)
>> -----------------------------------------------------
>> read          13          417000          32076
>> write         42          131205000       3123928
>> metadata      104         493000          4740
>>
>> item          total           miss            hit
>> -------------------------------------------------
>> d_lease       204             0               918
>> caps          204             213             368218
>>
> Thanks Xiubo! This looks good. One minor issue with the cap patch, but I
> can just fix that up before merging if you're ok with my proposed
> change.
>
> Beyond this...while average latency is a good metric, it's often not
> enough to help diagnose problems. I wonder if we ought to be at least
> tracking min/max latency for all calls too. I wonder if there's a way to
> track standard deviation too? That would be really nice to have.

Yeah, tracking the min/max latencies makes sense. It is on my todo list
and I will do it after this patch series.

As for the standard deviation, I will look into it.

Thanks

> Cheers,