From patchwork Wed Sep 7 08:10:27 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: akash.goel@intel.com X-Patchwork-Id: 9318499 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id DC5E8607D3 for ; Wed, 7 Sep 2016 08:10:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA8D029136 for ; Wed, 7 Sep 2016 08:10:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9EF5A29139; Wed, 7 Sep 2016 08:10:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 03AC829136 for ; Wed, 7 Sep 2016 08:10:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 91F0A6E8F0; Wed, 7 Sep 2016 08:10:32 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id A59026E8EE for ; Wed, 7 Sep 2016 08:10:30 +0000 (UTC) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP; 07 Sep 2016 01:10:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos; i="5.30,295,1470726000"; d="scan'208"; a="1046685024" Received: from akashgoe-mobl2.gar.corp.intel.com (HELO [10.223.184.149]) ([10.223.184.149]) by orsmga002.jf.intel.com with ESMTP; 07 Sep 2016 01:10:28 -0700 To: Tvrtko Ursulin , intel-gfx@lists.freedesktop.org References: <1473158613-12006-1-git-send-email-akash.goel@intel.com> <57CEC1DA.6090009@linux.intel.com> <57CEE620.3090600@linux.intel.com> From: "Goel, Akash" Message-ID: <2f260f3b-bf49-45e7-0a05-a1fa0b75461a@intel.com> Date: Wed, 7 Sep 2016 13:40:27 +0530 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <57CEE620.3090600@linux.intel.com> Cc: akash.goel@intel.com Subject: Re: [Intel-gfx] [PATCH] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP On 9/6/2016 9:22 PM, Tvrtko Ursulin wrote: > > On 06/09/16 16:33, Goel, Akash wrote: >> On 9/6/2016 6:47 PM, Tvrtko Ursulin wrote: >>> Hi, >>> >>> On 06/09/16 11:43, akash.goel@intel.com wrote: >>>> From: Akash Goel >>>> >>>> This patch provides a test utility which helps capture GuC firmware >>>> logs and >>>> then dump them to file. >>>> The logs are pulled from a debugfs file >>>> '/sys/kernel/debug/dri/guc_log' and >>>> stored into a file '/tmp/guc_log_dump.dat', the name of the output >>>> file can >>>> be changed through a command line argument. >>>> >>>> The utility goes into an infinite loop where it waits for the arrival >>>> of new >>>> logs and as soon as new set of logs are produced it captures them in >>>> its local >>>> buffer which is then flushed out to the file on disk. >>>> Any time when logging needs to be ended, User can stop this utility >>>> (CTRL+C). >>>> >>>> Before entering into a loop, it first discards whatever logs are >>>> present in >>>> the debugfs file. >>>> This way User can first launch this utility and then start a >>>> workload/activity >>>> for which GuC firmware logs are to be actually captured and keep >>>> running the >>>> utility for as long as its needed, like once the workload is over this >>>> utility >>>> can be forcefully stopped. >>>> >>>> If the logging wasn't enabled on GuC side by the Driver at boot time, >>>> utility >>>> will first enable the logging and later on when it is stopped (CTRL+C) >>>> it will >>>> also pause the logging on GuC side. >>>> >>>> Signed-off-by: Akash Goel >>>> --- >>>> tools/Makefile.sources | 1 + >>>> tools/intel_guc_logger.c | 441 >>>> +++++++++++++++++++++++++++++++++++++++++++++++ >>>> 2 files changed, 442 insertions(+) >>>> create mode 100644 tools/intel_guc_logger.c >>>> >>>> diff --git a/tools/Makefile.sources b/tools/Makefile.sources >>>> index 2bb6c8e..be58871 100644 >>>> --- a/tools/Makefile.sources >>>> +++ b/tools/Makefile.sources >>>> @@ -19,6 +19,7 @@ tools_prog_lists = \ >>>> intel_gpu_time \ >>>> intel_gpu_top \ >>>> intel_gtt \ >>>> + intel_guc_logger \ >>>> intel_infoframes \ >>>> intel_l3_parity \ >>>> intel_lid \ >>>> diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c >>>> new file mode 100644 >>>> index 0000000..92172fa >>>> --- /dev/null >>>> +++ b/tools/intel_guc_logger.c >>>> @@ -0,0 +1,441 @@ >>>> + >>>> +#define _GNU_SOURCE /* For using O_DIRECT */ >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> + >>>> +#include "igt.h" >>>> + >>>> +#define MB(x) ((uint64_t)(x) * 1024 * 1024) >>>> +#ifndef PAGE_SIZE >>>> + #define PAGE_SIZE 4096 >>>> +#endif >>>> +#define SUBBUF_SIZE (19*PAGE_SIZE) >>>> +/* Need large buffering from logger side to hide the DISK IO latency, >>>> Driver >>>> + * can only store 8 snapshots of GuC log buffer in relay. >>>> + */ >>>> +#define NUM_SUBBUFS 100 >>>> + >>>> +#define RELAY_FILE_NAME "guc_log" >>>> +#define CONTROL_FILE_NAME "i915_guc_log_control" >>>> + >>>> +char *read_buffer; >>>> +char *out_filename; >>>> +int poll_timeout = 2; /* by default 2ms timeout */ >>>> +pthread_mutex_t mutex; >>>> +pthread_t flush_thread; >>>> +int verbosity_level = 3; /* by default capture logs at max >>>> verbosity */ >>>> +uint32_t produced, consumed; >>>> +uint64_t total_bytes_written; >>>> +int num_buffers = NUM_SUBBUFS; >>>> +int relay_fd, outfile_fd = -1; >>>> +bool stop_logging, discard_oldlogs; >>>> +uint32_t test_duration, max_filesize; >>>> +pthread_cond_t underflow_cond, overflow_cond; >>>> + >>>> +static void guc_log_control(bool enable_logging) >>>> +{ >>>> + int control_fd; >>>> + char data[19]; >>>> + uint64_t val; >>>> + int ret; >>>> + >>>> + control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY); >>>> + if (control_fd < 0) >>>> + igt_assert_f(0, "Couldn't open the guc log control file"); >>>> + >>>> + val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0; >>>> + >>>> + snprintf(data, sizeof(data), "0x%" PRIx64, val); >>>> + ret = write(control_fd, data, strlen(data) + 1); >>> >>> Minor: It looks safe like it is but something like below would maybe be >>> more robust? >>> >>> ret = snprintf(data, sizeof(data), "0x%" PRIx64, val); >>> igt_assert(ret > 2 && ret < sizeof(data)); >> >> ok will add, but possibility of failure will be really remote here. >> but igt_assert(ret > 0) should suffice. > > Yes there is no possibility for failure as it stands, just more robust > implementation should someone change something in the future. That's why > I said you could also decide to keep it as is. My version also avoided > the strlen since snprintf already tells you that. > fine, will use your version then. >>> ret = write(control_fd, data, ret); >>> igt_assert(ret > 0); // assuming short writes can't happen >>> >>> Up to you. >>> >>>> + if (ret < 0) >>>> + igt_assert_f(0, "Couldn't write to the log control file"); >>>> + >>>> + close(control_fd); >>>> +} >>>> + >>>> +static void int_sig_handler(int sig) >>>> +{ >>>> + igt_info("Received signal %d\n", sig); >>>> + >>>> + stop_logging = true; >>>> +} >>>> + >>>> +static void pull_leftover_data(void) >>>> +{ >>>> + unsigned int bytes_read = 0; >>>> + int ret; >>>> + >>>> + while (1) { >>>> + /* Read the logs from relay buffer */ >>>> + ret = read(relay_fd, read_buffer, SUBBUF_SIZE); >>>> + if (!ret) >>>> + break; >>>> + else if (ret < 0) >>>> + igt_assert_f(0, "Failed to read from the guc log file"); >>>> + else if (ret < SUBBUF_SIZE) >>>> + igt_assert_f(0, "invalid read from relay file"); >>>> + >>>> + bytes_read += ret; >>>> + >>>> + if (outfile_fd > 0) { >>> >>>> = 0 I think. Or is it even needed since open_output_file asserts if it >>> fails to open? >>> >> Actually pull_leftover_data() will be called twice, once before opening >> the outfile. This is for purging the old logs, if enabled. > > Oh right, OK then. > >> >>>> + ret = write(outfile_fd, read_buffer, SUBBUF_SIZE); >>>> + if (ret < 0) >>>> + igt_assert_f(0, "Couldn't dump the logs in a file"); >>> >>> Do you need to check for short writes as well here? Man page for write >>> says it can happen with out of disk space or signals. >>> >> To keep things simple can just check 'ret < SUBBUF_SIZE' considering >> out of disk space as a serious failure or add a loop here which will run >> until SUBBUF_SIZE bytes have been written. > > Just assert, no point in complicating things. > Fine will assert on (ret < SUBBUF_SIZE) >>>> + total_bytes_written += ret; >>>> + } >>>> + }; >>>> + >>>> + igt_debug("%u bytes flushed\n", bytes_read); >>>> +} >>>> + >>>> +static int num_filled_bufs(void) >>>> +{ >>>> + return (produced - consumed); >>>> +} >>>> + >>>> +static void pull_data(void) >>>> +{ >>>> + char *ptr; >>>> + int ret; >>>> + >>>> + pthread_mutex_lock(&mutex); >>>> + while (num_filled_bufs() >= num_buffers) { >>>> + igt_debug("overflow, will wait, produced %u, consumed %u\n", >>>> produced, consumed); >>>> + /* Stall the main thread in case of overflow, as there are no >>>> + * buffers available to store the new logs, otherwise there >>>> + * could be corruption if both threads work on the same >>>> buffer. >>>> + */ >>>> + pthread_cond_wait(&overflow_cond, &mutex); >>>> + }; >>>> + pthread_mutex_unlock(&mutex); >>>> + >>>> + ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE; >>>> + >>>> + /* Read the logs from relay buffer */ >>>> + ret = read(relay_fd, ptr, SUBBUF_SIZE); >>>> + if (ret < 0) >>>> + igt_assert_f(0, "Failed to read from the guc log file"); >>>> + else if (ret && (ret < SUBBUF_SIZE)) >>>> + igt_assert_f(0, "invalid read from relay file"); >>>> + >>>> + pthread_mutex_lock(&mutex); >>>> + if (ret) { >>>> + produced++; >>>> + pthread_cond_signal(&underflow_cond); >>>> + } >>>> + pthread_mutex_unlock(&mutex); >>> >>> Very minor: can move the mutex under the if. >>> >> Fine will move under the 'if'. >> >>>> +} >>>> + >>>> +static void *flusher(void *arg) >>>> +{ >>>> + char *ptr; >>>> + int ret; >>>> + >>>> + igt_debug("Execution started of flusher thread\n"); >>>> + >>>> + do { >>>> + pthread_mutex_lock(&mutex); >>>> + while (!num_filled_bufs()) { >>>> + /* Exit only after completing the flush of all the filled >>>> + * buffers as User would expect that all logs captured up >>>> + * till the point of interruption/exit are written out to >>>> + * the disk file. >>>> + */ >>>> + if (stop_logging) { >>>> + igt_debug("flusher to exit now\n"); >>>> + pthread_mutex_unlock(&mutex); >>>> + return NULL; >>>> + } >>>> + pthread_cond_wait(&underflow_cond, &mutex); >>>> + }; >>>> + pthread_mutex_unlock(&mutex); >>>> + >>>> + ptr = read_buffer + (consumed % num_buffers) * SUBBUF_SIZE; >>>> + >>>> + ret = write(outfile_fd, ptr, SUBBUF_SIZE); >>>> + if (ret < 0) >>>> + igt_assert_f(0, "Couldn't dump the logs in a file"); >>> >>> Question about short writes applies here as well. >>> >> Fine will do the same here whatever is decided for above. >> >>>> + >>>> + total_bytes_written += ret; >>>> + if (max_filesize && (total_bytes_written > >>>> MB(max_filesize))) { >>>> + igt_debug("Reached the target of %" PRIu64 " bytes\n", >>>> MB(max_filesize)); >>>> + stop_logging = true; >>>> + } >>>> + >>>> + pthread_mutex_lock(&mutex); >>>> + consumed++; >>>> + pthread_cond_signal(&overflow_cond); >>>> + pthread_mutex_unlock(&mutex); >>>> + } while(1); >>> >>> Minor again: You use both "while(1) {}" and "do { } while(1)" in here. >>> Maybe stick with one? :D (I think "for (;;)" is preferred, but maybe I >>> am misremembering things.) >>> >> Sorry for the inconsistency, will clean this up. >> >>>> + >>>> + return NULL; >>>> +} >>>> + >>>> +static void init_flusher_thread(void) >>>> +{ >>>> + struct sched_param thread_sched; >>>> + pthread_attr_t p_attr; >>>> + >>>> + pthread_cond_init(&underflow_cond, NULL); >>>> + pthread_cond_init(&overflow_cond, NULL); >>>> + pthread_mutex_init(&mutex, NULL); >>>> + >>>> + if (pthread_attr_init(&p_attr)) >>>> + igt_assert_f(0, "Error obtaining default thread attributes"); >>>> + >>>> + if (pthread_attr_setinheritsched(&p_attr, PTHREAD_EXPLICIT_SCHED)) >>>> + igt_assert_f(0, "Couldn't set inheritsched"); >>>> + >>>> + if (pthread_attr_setschedpolicy(&p_attr, SCHED_RR)) >>>> + igt_assert_f(0, "Couldn't set thread scheduling policy"); >>>> + >>>> + /* Keep the flusher task also at rt priority, so that it doesn't >>>> get >>>> + * too late in flushing the collected logs in local buffers to >>>> the disk, >>>> + * and so main thread always have spare buffers to collect the >>>> logs. >>>> + */ >>>> + thread_sched.sched_priority = 5; >>>> + if (pthread_attr_setschedparam(&p_attr, &thread_sched)) >>>> + igt_assert_f(0, "Couldn't set thread priority"); >>>> + >>>> + if (pthread_create(&flush_thread, &p_attr, flusher, NULL)) >>>> + igt_assert_f(0, "thread creation failed"); >>> >>> More nitpicks: You mix starting error messages with upper case and lower >>> case. >> >> Will use the lower case for all error messages. fine ? > > Fine with me. > Thanks >>> >>>> + >>>> + if (pthread_attr_destroy(&p_attr)) >>>> + igt_assert_f(0, "Error destroying thread attributes"); >>>> +} >>>> + >>>> +static void open_relay_file(void) >>>> +{ >>>> + relay_fd = igt_debugfs_open(RELAY_FILE_NAME, O_RDONLY); >>>> + if (relay_fd < 0) >>>> + igt_assert_f(0, "Couldn't open the guc log file"); >>>> + >>>> + /* Purge the old/boot-time logs from the relay buffer. >>>> + * This is more for Val team's requirement, where they have to >>>> first >>>> + * purge the existing logs before starting the tests for which >>>> the logs >>>> + * are actually needed. After this logger will enter into a loop >>>> and >>>> + * wait for the new data, at that point benchmark can be launched >>>> from >>>> + * a different shell. >>>> + */ >>>> + if (discard_oldlogs) >>>> + pull_leftover_data(); >>>> +} >>>> + >>>> +static void open_output_file(void) >>>> +{ >>>> + char *filename; >>>> + >>>> + if (asprintf(&filename, "/tmp/%s", out_filename ? : >>>> "guc_log_dump.dat") < 0) >>>> + igt_assert_f(0, "Couldn't allocate the filename"); >>> >>> I think you need to take the out_filename when specified by the user >>> as-is and not stick it in /tmp. It is perfectly reasonable that someone >>> would want to put it somewhere else. >>> >> Ok so User should provide the full path. > > It can be relative, absolute, or just a filename. My suggestion was to > just use what the user provided for the open(2) call. > Yes the path provided can be absolute/relative, will do the open() call with whatever string is provided as a command line. >> >>> Also better to move the default name of "guc_log_dump.dat" to where >>> defaults like RELAY_FILE_NAME and such are defined. >>> >> fine. >> >>> And I would not default to /tmp but the current directory. >>> >> Chris had suggested to use /tmp. Fine will use the current directory. >> >>>> + >>>> + /* Use Direct IO mode for the output file, as the data written is >>>> not >>>> + * supposed to be accessed again, this saves a copy of data from >>>> App's >>>> + * buffer to kernel buffer (Page cache). Due to no buffering on >>>> kernel >>>> + * side, data is flushed out to disk faster and more buffering >>>> can be >>>> + * done on the logger side to hide the disk IO latency. >>>> + */ >>>> + outfile_fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC | >>>> O_DIRECT, 0444); >>> >>> 0400 or 0440 would be better I think. >> >> fine, this might make the file inaccessible to Root User in case the >> logger wasn't running as a Root ? > > I assumed logger will run as root and it would be preferable not to have > log files accessible by anyone. Either way root can read anything > regardless of permissions, no? > Sorry after working on selinux enabled setups (in Android, where there are certain restrictions on Root also), forgot that Root can do anything. >>> >>>> + if (outfile_fd < 0) >>>> + igt_assert_f(0, "Couldn't open the output file"); >>>> + >>>> + free(out_filename); >>>> + free(filename); >>>> +} >>>> + >>>> +static void init_main_thread(void) >>>> +{ >>>> + struct sched_param thread_sched; >>>> + /* Run the main thread at highest priority to ensure that it >>>> always >>>> + * gets woken-up at earliest on arrival of new data and so is >>>> always >>>> + * ready to pull the logs, otherwise there could be loss logs if >>>> + * GuC firmware is generating logs at a very high rate. >>>> + */ >>>> + thread_sched.sched_priority = 1; >>>> + if (sched_setscheduler(getpid(), SCHED_FIFO, &thread_sched)) >>>> + igt_assert_f(0, "Couldn't set the priority"); >>>> + >>>> + if (signal(SIGINT, int_sig_handler) == SIG_ERR) >>>> + igt_assert_f(0, "SIGINT handler registration failed"); >>>> + >>>> + /* Need an aligned pointer for direct IO */ >>>> + if (posix_memalign((void **)&read_buffer, PAGE_SIZE, num_buffers >>>> * SUBBUF_SIZE)) >>>> + igt_assert_f(0, "Couldn't allocate the read buffer"); >>>> + >>>> + /* Keep the pages locked in RAM, avoid page fault overhead */ >>>> + if (mlock(read_buffer, num_buffers * SUBBUF_SIZE)) >>>> + igt_assert_f(0, "Failed to lock memory"); >>>> + >>>> + /* Enable the logging, it may not have been enabled from boot >>>> and so >>>> + * the relay file also wouldn't have been created. >>>> + */ >>>> + guc_log_control(true); >>>> + >>>> + open_relay_file(); >>>> + open_output_file(); >>>> +} >>>> + >>>> +static int parse_options(int opt, int opt_index, void *data) >>>> +{ >>>> + igt_debug("opt %c optarg %s\n", opt, optarg); >>>> + >>>> + switch(opt) { >>>> + case 'v': >>>> + verbosity_level = atoi(optarg); >>>> + if (verbosity_level < 0) >>>> + verbosity_level = 0; >>>> + else if (verbosity_level > 3) >>>> + verbosity_level = 3; >>> >>> Notify user if invalid value was provided? >>> >> Just notify but don't abort and continue by using a valid value ? > > No strong opinion. Either is much better than silently using a different > value or ignoring the fact user has typed in garbage on the command line. > Ok will report/notify and then abort. Will do the same for other arguments. Is it fine ? >> >>>> + >>>> + igt_debug("verbosity level to be used is %d\n", >>>> verbosity_level); >>>> + break; >>>> + case 'o': >>>> + if (asprintf(&out_filename, "%s", optarg) < 0) >>> >>> strdup? >> ok will try to use strdup. >>> >>>> + igt_assert_f(0, "Couldn't allocate the o/p filename"); >>>> + igt_debug("logs to be stored in file /tmp/%s\n", >>>> out_filename); >>>> + break; >>>> + case 'b': >>>> + num_buffers = atoi(optarg); >>>> + if (num_buffers == 0) >>>> + num_buffers = NUM_SUBBUFS; >>> >>> Again notify if invalid number/value. >>> >> same doubt as above. >> >>>> + igt_debug("number of buffers to be used is %d\n", >>>> num_buffers); >>>> + break; >>>> + case 't': >>>> + test_duration = atoi(optarg); >>>> + if (test_duration > 0) >>> >>> If switch was specified it should have a valid optarg or error reported. >>> >>>> + igt_debug("logger to run for %d seconds\n", >>>> test_duration); >>>> + break; >>>> + case 'p': >>>> + poll_timeout = atoi(optarg); >>>> + if (poll_timeout == 0) >>>> + poll_timeout = 2; /* at least use 2ms poll timeout */ >>> >>> Hm I would prefer if you allowed no poll period. >> >> by no poll period, do you mean indefinite wait ? >> >> That can be done by specifying -1 as a timeout value in the command line, >> >> "-p --polltimeout=ms polling timeout in ms, -1 == indefinite wait for >> the new data\n" > > My bad, I missed that. All good then. > >> >>> I even think it should be the default if we think relayfs will be >>> definitely fixed to not >>> require it. >>> >> Can keep the indefinite wait as a default. >> >>> In any case, commend is wrong since user can specify 1ms and get away >>> with it. :) >>> >>>> + if (poll_timeout > 0) >>>> + igt_debug("polling to be done with %d milli seconds >>>> timeout\n", poll_timeout); >>> >>> milliseconds >>> >>>> + break; >>>> + case 's': >>>> + max_filesize = atoi(optarg); >>>> + if (max_filesize > 0) >>> >>> Same as for -t. >>> >>>> + igt_debug("max allowed size of the output file is %d >>>> MBs\n", max_filesize); >>>> + break; >>>> + case 'd': >>>> + discard_oldlogs = true; >>>> + igt_debug("old/boot-time logs will be discarded\n"); >>>> + break; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static void process_command_line(int argc, char **argv) >>>> +{ >>>> + static struct option long_options[] = { >>>> + {"verbosity", required_argument, 0, 'v'}, >>>> + {"outputfilename", required_argument, 0, 'o'}, >>>> + {"buffers", required_argument, 0, 'b'}, >>>> + {"testduration", required_argument, 0, 't'}, >>>> + {"polltimeout", required_argument, 0, 'p'}, >>>> + {"size", required_argument, 0, 's'}, >>>> + {"discard", no_argument, 0, 'd'}, >>>> + { 0, 0, 0, 0 } >>>> + }; >>>> + >>>> + const char *help = >>>> + " -v --verbosity=level verbosity level of GuC logging >>>> (0-3)\n" >>>> + " -o --outfile=name name of the output file in /tmp >>>> directory, where logs will be stored\n" >>>> + " -b --buffers=num number of buffers to be maintained >>>> on logger side for storing logs\n" >>>> + " -t --testduration=sec max duration in seconds for which >>>> the logger should run, 0 == forever\n" >>>> + " -p --polltimeout=ms polling timeout in ms, -1 == >>>> indefinite wait for the new data\n" >>>> + " -s --size=MB max size of output file in MBs >>>> after which logging will be stopped, 0 == no limit\n" >>>> + " -d --discard discard the old/boot-time logs >>>> before entering into the capture loop\n"; >>>> + >>>> + igt_simple_init_parse_opts(&argc, argv, "v:o:b:t:p:s:d", >>>> long_options, >>>> + help, parse_options, NULL); >>>> +} >>>> + >>>> +int main(int argc, char **argv) >>>> +{ >>>> + struct pollfd relay_poll_fd; >>>> + struct timespec start={}; >>>> + int nfds; >>>> + int ret; >>>> + >>>> + process_command_line(argc, argv); >>>> + >>>> + init_main_thread(); >>>> + >>>> + /* Use a separate thread for flushing the logs to a file on disk. >>>> + * Main thread will buffer the data from relay file in its pool of >>>> + * buffers and other thread will flush the data to disk in >>>> background. >>>> + * This is needed, albeit by default data is written out to >>>> disk in >>>> + * async mode, as when there are too many dirty pages in the RAM, >>>> + * (/proc/sys/vm/dirty_ratio), kernel starts blocking the >>>> processes >>>> + * doing the file writes. >>>> + */ >>>> + init_flusher_thread(); >>>> + >>>> + relay_poll_fd.fd = relay_fd; >>>> + relay_poll_fd.events = POLLIN; >>>> + relay_poll_fd.revents = 0; >>>> + >>>> + nfds = 1; /* only one fd to poll */ >>>> + >>>> + while (!stop_logging) >>>> + { >>>> + if (test_duration && (igt_seconds_elapsed(&start) > >>>> test_duration)) { >>> >>> If you agree to allow no poll period the this would not work right? In >>> that case you would need to use alarm(2) or something. >>> >> >> Can calculate the timeout value for poll call as, >> if (poll_timeout < 0) { >> timeout = test_duration - igt_seconds_elapsed(&start)) >> } > > My point was that with indefinite poll loop will not run if there is not > log data so timeout will not work implemented like this. > I understood your concern in first place but probably didn't put forth my point clearly. For more clarity, this is how think it can be addressed. * than a jiffy gap between 2 flush interrupts) and relay runs * out of sub buffers to store the new logs. */ - ret = poll(&relay_poll_fd, nfds, poll_timeout); + ret = poll(&relay_poll_fd, nfds, timeout); if (ret < 0) { if (errno == EINTR) break; So will not do polling with indefinite timeout and adjust the timeout value as per test's duration. Does it look ok ? Best regards Akash >> >> >>>> + igt_debug("Ran for stipulated %d seconds, exit now\n", >>>> test_duration); >>>> + stop_logging = true; >>>> + break; >>>> + } >>>> + >>>> + /* Wait/poll for the new data to be available, relay doesn't >>>> + * provide a blocking read. >>>> + * Need to do polling with a timeout instead of indefinite >>>> wait, >>>> + * to avoid relying on relay for the wakeup, as relay does >>>> wakeup >>>> + * in a deferred manner on jiffies granularity by scheduling a >>>> + * timer and moreover that timer is re-scheduled on every >>>> newly >>>> + * produced buffer so timer keeps getting pushed out if there >>>> + * are multiple flush interrupts in a very quick succession >>>> (less >>>> + * than a jiffy gap between 2 flush interrupts) and relay runs >>>> + * out of sub buffers to store the new logs. >>>> + */ >>>> + ret = poll(&relay_poll_fd, nfds, poll_timeout); >>>> + if (ret < 0) { >>>> + if (errno == EINTR) >>>> + break; >>>> + else >>>> + igt_assert_f(0, "poll call failed"); >>>> + } >>>> + >>>> + /* No data available yet, poll again, hopefully new data is >>>> round the corner */ >>>> + if (!relay_poll_fd.revents) >>>> + continue; >>>> + >>>> + pull_data(); >>>> + } >>>> + >>>> + /* Pause logging on the GuC side*/ >>>> + guc_log_control(false); >>>> + pthread_cond_signal(&underflow_cond); >>>> + pthread_join(flush_thread, NULL); >>>> + pull_leftover_data(); >>>> + igt_info("total_bytes_written %" PRIu64 "\n", >>>> total_bytes_written); >>>> + >>>> + free(read_buffer); >>>> + close(relay_fd); >>>> + close(outfile_fd); >>>> + igt_exit(); >>>> +} >>>> >>> >>> So only minor comments. The rest looks reasonable to me and I trust you >>> tested it well and it works fine. >>> >> Thanks much for the thorough review. > > Np. > > Regards, > > Tvrtko --- a/tools/intel_guc_logger.c +++ b/tools/intel_guc_logger.c @@ -370,6 +370,8 @@ int main(int argc, char **argv) { struct pollfd relay_poll_fd; struct timespec start={}; + uint32_t time_elapsed; + int timeout; int nfds; int ret; @@ -395,10 +397,17 @@ int main(int argc, char **argv) while (!stop_logging) { - if (test_duration && (igt_seconds_elapsed(&start) > test_duration)) { - igt_debug("Ran for stipulated %d seconds, exit now\n", test_duration); - stop_logging = true; - break; + timeout = poll_timeout; + if (test_duration) { + time_elapsed = igt_seconds_elapsed(&start); + if (time_elapsed >= test_duration) { + igt_debug("Ran for stipulated %d seconds, exit now\n", test_duration); + stop_logging = true; + break; + } + if (poll_timeout < 0) + timeout = (test_duration - time_elapsed) * 1000; } /* Wait/poll for the new data to be available, relay doesn't @@ -412,7 +421,7 @@ int main(int argc, char **argv)