Message ID | 20190501042831.5313-3-chaitanya.kulkarni@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blktrace: add blktrace extension support | expand |
On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote: > @@ -104,7 +120,12 @@ struct blk_io_trace { > __u64 time; /* in nanoseconds */ > __u64 sector; /* disk offset */ > __u32 bytes; /* transfer length */ > + > +#ifdef CONFIG_BLKTRACE_EXT > + __u64 action; /* what happened */ > +#else > __u32 action; /* what happened */ > +#endif /* CONFIG_BLKTRACE_EXT */ You can't use CONFIG_ symbols in UAPI headers, as userspace applications won't set it. You also can't ever change the layout of an existing structure in UAPI headers in not backward compatible way.
Christoph Hellwig <hch@infradead.org> writes: > On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote: >> @@ -104,7 +120,12 @@ struct blk_io_trace { >> __u64 time; /* in nanoseconds */ >> __u64 sector; /* disk offset */ >> __u32 bytes; /* transfer length */ >> + >> +#ifdef CONFIG_BLKTRACE_EXT >> + __u64 action; /* what happened */ >> +#else >> __u32 action; /* what happened */ >> +#endif /* CONFIG_BLKTRACE_EXT */ > > You can't use CONFIG_ symbols in UAPI headers, as userspace > applications won't set it. You also can't ever change the layout of an > existing structure in UAPI headers in not backward compatible way. Right. The blk_io_trace->magic has the lower 8 bits reserved for a version number which is checked by userspace. There's no way to negotiate a supported version between userspace and the kernel, unfortunately. The version number is checked for each trace event. What you *could* do is to add another trace event with a higher version number that includes only the extra data. So each event would be split into two: the original event with original content and the new event that only contains the new fields. That way the old userspace would continue to work, as it would discard the trace events it doesn't recognize. Newer userspace could handle both types of events, and merge them back together. There would be a ton of warnings spewed on stderr, unfortunately, but it would at least work. I don't see a lot of value in the kernel config option, no matter which way we go with this. -Jeff
Thanks for the reply Jeff. On 5/1/19 5:56 AM, Jeff Moyer wrote: > Christoph Hellwig <hch@infradead.org> writes: > >> On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote: >>> @@ -104,7 +120,12 @@ struct blk_io_trace { >>> __u64 time; /* in nanoseconds */ >>> __u64 sector; /* disk offset */ >>> __u32 bytes; /* transfer length */ >>> + >>> +#ifdef CONFIG_BLKTRACE_EXT >>> + __u64 action; /* what happened */ >>> +#else >>> __u32 action; /* what happened */ >>> +#endif /* CONFIG_BLKTRACE_EXT */ >> >> You can't use CONFIG_ symbols in UAPI headers, as userspace >> applications won't set it. You also can't ever change the layout of an >> existing structure in UAPI headers in not backward compatible way. > > Right. The blk_io_trace->magic has the lower 8 bits reserved for a > version number which is checked by userspace. There's no way to > negotiate a supported version between userspace and the kernel, > unfortunately. The version number is checked for each trace event. > > What you *could* do is to add another trace event with a higher version > number that includes only the extra data. So each event would be split > into two: the original event with original content and the new event > that only contains the new fields. That way the old userspace would > continue to work, as it would discard the trace events it doesn't > recognize. Newer userspace could handle both types of events, and merge > them back together. > > There would be a ton of warnings spewed on stderr, unfortunately, but it > would at least work. I don't see a lot of value in the kernel config > option, no matter which way we go with this. > As you have mentioned this approach will have a lot of stderr, I was trying to avoid this scenario. If everyone is okay with this will make this change and resend the series. > -Jeff >
Thanks for looking into this. On 5/1/19 5:31 AM, Christoph Hellwig wrote: > On Tue, Apr 30, 2019 at 09:28:15PM -0700, Chaitanya Kulkarni wrote: >> @@ -104,7 +120,12 @@ struct blk_io_trace { >> __u64 time; /* in nanoseconds */ >> __u64 sector; /* disk offset */ >> __u32 bytes; /* transfer length */ >> + >> +#ifdef CONFIG_BLKTRACE_EXT >> + __u64 action; /* what happened */ >> +#else >> __u32 action; /* what happened */ >> +#endif /* CONFIG_BLKTRACE_EXT */ > > You can't use CONFIG_ symbols in UAPI headers, as userspace > applications won't set it. You also can't ever change the layout of an > existing structure in UAPI headers in not backward compatible way. > Jeff has suggested another approach, if everyone is okay with that approach will send out the series with that change. Please let me know if you have more comments.
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h index 690621b610e5..c34cf752a9a1 100644 --- a/include/uapi/linux/blktrace_api.h +++ b/include/uapi/linux/blktrace_api.h @@ -8,30 +8,42 @@ * Trace categories */ enum blktrace_cat { - BLK_TC_READ = 1 << 0, /* reads */ - BLK_TC_WRITE = 1 << 1, /* writes */ - BLK_TC_FLUSH = 1 << 2, /* flush */ - BLK_TC_SYNC = 1 << 3, /* sync IO */ - BLK_TC_SYNCIO = BLK_TC_SYNC, - BLK_TC_QUEUE = 1 << 4, /* queueing/merging */ - BLK_TC_REQUEUE = 1 << 5, /* requeueing */ - BLK_TC_ISSUE = 1 << 6, /* issue */ - BLK_TC_COMPLETE = 1 << 7, /* completions */ - BLK_TC_FS = 1 << 8, /* fs requests */ - BLK_TC_PC = 1 << 9, /* pc requests */ - BLK_TC_NOTIFY = 1 << 10, /* special message */ - BLK_TC_AHEAD = 1 << 11, /* readahead */ - BLK_TC_META = 1 << 12, /* metadata */ - BLK_TC_DISCARD = 1 << 13, /* discard requests */ - BLK_TC_DRV_DATA = 1 << 14, /* binary per-driver data */ - BLK_TC_FUA = 1 << 15, /* fua requests */ - - BLK_TC_END = 1 << 15, /* we've run out of bits! */ + BLK_TC_READ = 1 << 0, /* reads */ + BLK_TC_WRITE = 1 << 1, /* writes */ + BLK_TC_FLUSH = 1 << 2, /* flush */ + BLK_TC_SYNC = 1 << 3, /* sync IO */ + BLK_TC_SYNCIO = BLK_TC_SYNC, + BLK_TC_QUEUE = 1 << 4, /* queueing/merging */ + BLK_TC_REQUEUE = 1 << 5, /* requeueing */ + BLK_TC_ISSUE = 1 << 6, /* issue */ + BLK_TC_COMPLETE = 1 << 7, /* completions */ + BLK_TC_FS = 1 << 8, /* fs requests */ + BLK_TC_PC = 1 << 9, /* pc requests */ + BLK_TC_NOTIFY = 1 << 10, /* special message */ + BLK_TC_AHEAD = 1 << 11, /* readahead */ + BLK_TC_META = 1 << 12, /* metadata */ + BLK_TC_DISCARD = 1 << 13, /* discard requests */ + BLK_TC_DRV_DATA = 1 << 14, /* binary per-driver data */ + BLK_TC_FUA = 1 << 15, /* fua requests */ + +#ifdef CONFIG_BLKTRACE_EXT + BLK_TC_WRITE_ZEROES = 1 << 16, /* write-zeores */ + BLK_TC_ZONE_RESET = 1 << 17, /* zone-reset */ + + BLK_TC_END = 1 << 31, /* we've run out of bits! */ +#else + BLK_TC_END = 1 << 16, /* we've run out of bits! */ +#endif /* CONFIG_BLKTRACE_EXT */ }; +#ifdef CONFIG_BLKTRACE_EXT +#define BLK_TC_SHIFT (32) +#define BLK_TC_ACT(act) (((u64)act) << BLK_TC_SHIFT) +#else #define BLK_TC_SHIFT (16) #define BLK_TC_ACT(act) ((act) << BLK_TC_SHIFT) +#endif /* CONFIG_BLKTRACE_EXT */ /* * Basic trace actions */ @@ -93,7 +105,11 @@ enum blktrace_notify { #define BLK_TN_MESSAGE (__BLK_TN_MESSAGE | BLK_TC_ACT(BLK_TC_NOTIFY)) #define BLK_IO_TRACE_MAGIC 0x65617400 +#ifdef CONFIG_BLKTRACE_EXT +#define BLK_IO_TRACE_VERSION 0x08 +#else #define BLK_IO_TRACE_VERSION 0x07 +#endif /* CONFIG_BLKTRACE_EXT */ /* * The trace itself @@ -104,7 +120,12 @@ struct blk_io_trace { __u64 time; /* in nanoseconds */ __u64 sector; /* disk offset */ __u32 bytes; /* transfer length */ + +#ifdef CONFIG_BLKTRACE_EXT + __u64 action; /* what happened */ +#else __u32 action; /* what happened */ +#endif /* CONFIG_BLKTRACE_EXT */ __u32 pid; /* who did it */ __u32 device; /* device number */ __u32 cpu; /* on what cpu did it happen */ @@ -135,7 +156,11 @@ enum { */ struct blk_user_trace_setup { char name[BLKTRACE_BDEV_SIZE]; /* output */ +#ifdef CONFIG_BLKTRACE_EXT + __u64 act_mask; /* input */ +#else __u16 act_mask; /* input */ +#endif /* CONFIG_BLKTRACE_EXT */ __u32 buf_size; /* input */ __u32 buf_nr; /* input */ __u64 start_lba;
Now that we have increase the size of action mask we can now safely add new blktrace actions and extend the code to track more block layer request flags. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- include/uapi/linux/blktrace_api.h | 63 +++++++++++++++++++++---------- 1 file changed, 44 insertions(+), 19 deletions(-)