From patchwork Wed Sep  7 08:10:27 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: akash.goel@intel.com
X-Patchwork-Id: 9318499
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	DC5E8607D3 for <patchwork-intel-gfx@patchwork.kernel.org>;
	Wed,  7 Sep 2016 08:10:36 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA8D029136
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Wed,  7 Sep 2016 08:10:36 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 9EF5A29139; Wed,  7 Sep 2016 08:10:36 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
	autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 03AC829136
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Wed,  7 Sep 2016 08:10:33 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 91F0A6E8F0;
	Wed,  7 Sep 2016 08:10:32 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
	by gabe.freedesktop.org (Postfix) with ESMTPS id A59026E8EE
	for <intel-gfx@lists.freedesktop.org>;
	Wed,  7 Sep 2016 08:10:30 +0000 (UTC)
Received: from orsmga002.jf.intel.com ([10.7.209.21])
	by orsmga105.jf.intel.com with ESMTP; 07 Sep 2016 01:10:30 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos; i="5.30,295,1470726000"; d="scan'208";
	a="1046685024"
Received: from akashgoe-mobl2.gar.corp.intel.com (HELO [10.223.184.149])
	([10.223.184.149])
	by orsmga002.jf.intel.com with ESMTP; 07 Sep 2016 01:10:28 -0700
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
References: <1473158613-12006-1-git-send-email-akash.goel@intel.com>
	<57CEC1DA.6090009@linux.intel.com>
	<e50a8af6-b6a0-4796-7b6f-603cb0365ef7@intel.com>
	<57CEE620.3090600@linux.intel.com>
From: "Goel, Akash" <akash.goel@intel.com>
Message-ID: <2f260f3b-bf49-45e7-0a05-a1fa0b75461a@intel.com>
Date: Wed, 7 Sep 2016 13:40:27 +0530
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
	Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <57CEE620.3090600@linux.intel.com>
Cc: akash.goel@intel.com
Subject: Re: [Intel-gfx] [PATCH] tools/intel_guc_logger: Utility for
	capturing GuC firmware logs in a file
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

On 9/6/2016 9:22 PM, Tvrtko Ursulin wrote:
>
> On 06/09/16 16:33, Goel, Akash wrote:
>> On 9/6/2016 6:47 PM, Tvrtko Ursulin wrote:
>>> Hi,
>>>
>>> On 06/09/16 11:43, akash.goel@intel.com wrote:
>>>> From: Akash Goel <akash.goel@intel.com>
>>>>
>>>> This patch provides a test utility which helps capture GuC firmware
>>>> logs and
>>>> then dump them to file.
>>>> The logs are pulled from a debugfs file
>>>> '/sys/kernel/debug/dri/guc_log' and
>>>> stored into a file '/tmp/guc_log_dump.dat', the name of the output
>>>> file can
>>>> be changed through a command line argument.
>>>>
>>>> The utility goes into an infinite loop where it waits for the arrival
>>>> of new
>>>> logs and as soon as new set of logs are produced it captures them in
>>>> its local
>>>> buffer which is then flushed out to the file on disk.
>>>> Any time when logging needs to be ended, User can stop this utility
>>>> (CTRL+C).
>>>>
>>>> Before entering into a loop, it first discards whatever logs are
>>>> present in
>>>> the debugfs file.
>>>> This way User can first launch this utility and then start a
>>>> workload/activity
>>>> for which GuC firmware logs are to be actually captured and keep
>>>> running the
>>>> utility for as long as its needed, like once the workload is over this
>>>> utility
>>>> can be forcefully stopped.
>>>>
>>>> If the logging wasn't enabled on GuC side by the Driver at boot time,
>>>> utility
>>>> will first enable the logging and later on when it is stopped (CTRL+C)
>>>> it will
>>>> also pause the logging on GuC side.
>>>>
>>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>>> ---
>>>>   tools/Makefile.sources   |   1 +
>>>>   tools/intel_guc_logger.c | 441
>>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 442 insertions(+)
>>>>   create mode 100644 tools/intel_guc_logger.c
>>>>
>>>> diff --git a/tools/Makefile.sources b/tools/Makefile.sources
>>>> index 2bb6c8e..be58871 100644
>>>> --- a/tools/Makefile.sources
>>>> +++ b/tools/Makefile.sources
>>>> @@ -19,6 +19,7 @@ tools_prog_lists =        \
>>>>       intel_gpu_time        \
>>>>       intel_gpu_top        \
>>>>       intel_gtt        \
>>>> +    intel_guc_logger        \
>>>>       intel_infoframes    \
>>>>       intel_l3_parity        \
>>>>       intel_lid        \
>>>> diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c
>>>> new file mode 100644
>>>> index 0000000..92172fa
>>>> --- /dev/null
>>>> +++ b/tools/intel_guc_logger.c
>>>> @@ -0,0 +1,441 @@
>>>> +
>>>> +#define _GNU_SOURCE  /* For using O_DIRECT */
>>>> +#include <inttypes.h>
>>>> +#include <stdio.h>
>>>> +#include <fcntl.h>
>>>> +#include <sched.h>
>>>> +#include <sys/ioctl.h>
>>>> +#include <string.h>
>>>> +#include <errno.h>
>>>> +#include <signal.h>
>>>> +#include <stdlib.h>
>>>> +#include <unistd.h>
>>>> +#include <sys/poll.h>
>>>> +#include <sys/wait.h>
>>>> +#include <sys/types.h>
>>>> +#include <sys/syscall.h>
>>>> +#include <sys/mman.h>
>>>> +#include <assert.h>
>>>> +#include <pthread.h>
>>>> +
>>>> +#include "igt.h"
>>>> +
>>>> +#define MB(x) ((uint64_t)(x) * 1024 * 1024)
>>>> +#ifndef PAGE_SIZE
>>>> +  #define PAGE_SIZE 4096
>>>> +#endif
>>>> +#define SUBBUF_SIZE (19*PAGE_SIZE)
>>>> +/* Need large buffering from logger side to hide the DISK IO latency,
>>>> Driver
>>>> + * can only store 8 snapshots of GuC log buffer in relay.
>>>> + */
>>>> +#define NUM_SUBBUFS 100
>>>> +
>>>> +#define RELAY_FILE_NAME  "guc_log"
>>>> +#define CONTROL_FILE_NAME "i915_guc_log_control"
>>>> +
>>>> +char *read_buffer;
>>>> +char *out_filename;
>>>> +int poll_timeout = 2; /* by default 2ms timeout */
>>>> +pthread_mutex_t mutex;
>>>> +pthread_t flush_thread;
>>>> +int verbosity_level = 3; /* by default capture logs at max
>>>> verbosity */
>>>> +uint32_t produced, consumed;
>>>> +uint64_t total_bytes_written;
>>>> +int num_buffers = NUM_SUBBUFS;
>>>> +int relay_fd, outfile_fd = -1;
>>>> +bool stop_logging, discard_oldlogs;
>>>> +uint32_t test_duration, max_filesize;
>>>> +pthread_cond_t underflow_cond, overflow_cond;
>>>> +
>>>> +static void guc_log_control(bool enable_logging)
>>>> +{
>>>> +    int control_fd;
>>>> +    char data[19];
>>>> +    uint64_t val;
>>>> +    int ret;
>>>> +
>>>> +    control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY);
>>>> +    if (control_fd < 0)
>>>> +        igt_assert_f(0, "Couldn't open the guc log control file");
>>>> +
>>>> +    val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0;
>>>> +
>>>> +    snprintf(data, sizeof(data), "0x%" PRIx64, val);
>>>> +    ret = write(control_fd, data, strlen(data) + 1);
>>>
>>> Minor: It looks safe like it is but something like below would maybe be
>>> more robust?
>>>
>>> ret = snprintf(data, sizeof(data), "0x%" PRIx64, val);
>>> igt_assert(ret > 2 && ret < sizeof(data));
>>
>> ok will add, but possibility of failure will be really remote here.
>> but igt_assert(ret > 0) should suffice.
>
> Yes there is no possibility for failure as it stands, just more robust
> implementation should someone change something in the future. That's why
> I said you could also decide to keep it as is. My version also avoided
> the strlen since snprintf already tells you that.
>

fine, will use your version then.

>>> ret = write(control_fd, data, ret);
>>> igt_assert(ret > 0); // assuming short writes can't happen
>>>
>>> Up to you.
>>>
>>>> +    if (ret < 0)
>>>> +        igt_assert_f(0, "Couldn't write to the log control file");
>>>> +
>>>> +    close(control_fd);
>>>> +}
>>>> +
>>>> +static void int_sig_handler(int sig)
>>>> +{
>>>> +    igt_info("Received signal %d\n", sig);
>>>> +
>>>> +    stop_logging = true;
>>>> +}
>>>> +
>>>> +static void pull_leftover_data(void)
>>>> +{
>>>> +    unsigned int bytes_read = 0;
>>>> +    int ret;
>>>> +
>>>> +    while (1) {
>>>> +        /* Read the logs from relay buffer */
>>>> +        ret = read(relay_fd, read_buffer, SUBBUF_SIZE);
>>>> +        if (!ret)
>>>> +            break;
>>>> +        else if (ret < 0)
>>>> +            igt_assert_f(0, "Failed to read from the guc log file");
>>>> +        else if (ret < SUBBUF_SIZE)
>>>> +            igt_assert_f(0, "invalid read from relay file");
>>>> +
>>>> +        bytes_read += ret;
>>>> +
>>>> +        if (outfile_fd > 0) {
>>>
>>>> = 0 I think. Or is it even needed since open_output_file asserts if it
>>> fails to open?
>>>
>> Actually pull_leftover_data() will be called twice, once before opening
>> the outfile. This is for purging the old logs, if enabled.
>
> Oh right, OK then.
>
>>
>>>> +            ret = write(outfile_fd, read_buffer, SUBBUF_SIZE);
>>>> +            if (ret < 0)
>>>> +                igt_assert_f(0, "Couldn't dump the logs in a file");
>>>
>>> Do you need to check for short writes as well here? Man page for write
>>> says it can happen with out of disk space or signals.
>>>
>> To keep things simple can just check 'ret < SUBBUF_SIZE' considering
>> out of disk space as a serious failure or add a loop here which will run
>> until SUBBUF_SIZE bytes have been written.
>
> Just assert, no point in complicating things.
>
Fine will assert on (ret < SUBBUF_SIZE)

>>>> +            total_bytes_written += ret;
>>>> +        }
>>>> +    };
>>>> +
>>>> +    igt_debug("%u bytes flushed\n", bytes_read);
>>>> +}
>>>> +
>>>> +static int num_filled_bufs(void)
>>>> +{
>>>> +    return (produced - consumed);
>>>> +}
>>>> +
>>>> +static void pull_data(void)
>>>> +{
>>>> +    char *ptr;
>>>> +    int ret;
>>>> +
>>>> +    pthread_mutex_lock(&mutex);
>>>> +    while (num_filled_bufs() >= num_buffers) {
>>>> +        igt_debug("overflow, will wait, produced %u, consumed %u\n",
>>>> produced, consumed);
>>>> +        /* Stall the main thread in case of overflow, as there are no
>>>> +         * buffers available to store the new logs, otherwise there
>>>> +         * could be corruption if both threads work on the same
>>>> buffer.
>>>> +         */
>>>> +        pthread_cond_wait(&overflow_cond, &mutex);
>>>> +    };
>>>> +    pthread_mutex_unlock(&mutex);
>>>> +
>>>> +    ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE;
>>>> +
>>>> +    /* Read the logs from relay buffer */
>>>> +    ret = read(relay_fd, ptr, SUBBUF_SIZE);
>>>> +    if (ret < 0)
>>>> +        igt_assert_f(0, "Failed to read from the guc log file");
>>>> +    else if (ret && (ret < SUBBUF_SIZE))
>>>> +        igt_assert_f(0, "invalid read from relay file");
>>>> +
>>>> +    pthread_mutex_lock(&mutex);
>>>> +    if (ret) {
>>>> +        produced++;
>>>> +        pthread_cond_signal(&underflow_cond);
>>>> +    }
>>>> +    pthread_mutex_unlock(&mutex);
>>>
>>> Very minor: can move the mutex under the if.
>>>
>> Fine will move under the 'if'.
>>
>>>> +}
>>>> +
>>>> +static void *flusher(void *arg)
>>>> +{
>>>> +    char *ptr;
>>>> +    int ret;
>>>> +
>>>> +    igt_debug("Execution started of flusher thread\n");
>>>> +
>>>> +    do {
>>>> +        pthread_mutex_lock(&mutex);
>>>> +        while (!num_filled_bufs()) {
>>>> +            /* Exit only after completing the flush of all the filled
>>>> +             * buffers as User would expect that all logs captured up
>>>> +             * till the point of interruption/exit are written out to
>>>> +             * the disk file.
>>>> +             */
>>>> +            if (stop_logging) {
>>>> +                igt_debug("flusher to exit now\n");
>>>> +                pthread_mutex_unlock(&mutex);
>>>> +                return NULL;
>>>> +            }
>>>> +            pthread_cond_wait(&underflow_cond, &mutex);
>>>> +        };
>>>> +        pthread_mutex_unlock(&mutex);
>>>> +
>>>> +        ptr = read_buffer + (consumed % num_buffers) * SUBBUF_SIZE;
>>>> +
>>>> +        ret = write(outfile_fd, ptr, SUBBUF_SIZE);
>>>> +        if (ret < 0)
>>>> +            igt_assert_f(0, "Couldn't dump the logs in a file");
>>>
>>> Question about short writes applies here as well.
>>>
>> Fine will do the same here whatever is decided for above.
>>
>>>> +
>>>> +        total_bytes_written += ret;
>>>> +        if (max_filesize && (total_bytes_written >
>>>> MB(max_filesize))) {
>>>> +            igt_debug("Reached the target of %" PRIu64 " bytes\n",
>>>> MB(max_filesize));
>>>> +            stop_logging = true;
>>>> +        }
>>>> +
>>>> +        pthread_mutex_lock(&mutex);
>>>> +        consumed++;
>>>> +        pthread_cond_signal(&overflow_cond);
>>>> +        pthread_mutex_unlock(&mutex);
>>>> +    } while(1);
>>>
>>> Minor again: You use both "while(1) {}" and "do { } while(1)" in here.
>>> Maybe stick with one? :D (I think "for (;;)" is preferred, but maybe I
>>> am misremembering things.)
>>>
>> Sorry for the inconsistency, will clean this up.
>>
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +static void init_flusher_thread(void)
>>>> +{
>>>> +    struct sched_param    thread_sched;
>>>> +    pthread_attr_t        p_attr;
>>>> +
>>>> +    pthread_cond_init(&underflow_cond, NULL);
>>>> +    pthread_cond_init(&overflow_cond, NULL);
>>>> +    pthread_mutex_init(&mutex, NULL);
>>>> +
>>>> +    if (pthread_attr_init(&p_attr))
>>>> +        igt_assert_f(0, "Error obtaining default thread attributes");
>>>> +
>>>> +    if (pthread_attr_setinheritsched(&p_attr, PTHREAD_EXPLICIT_SCHED))
>>>> +        igt_assert_f(0, "Couldn't set inheritsched");
>>>> +
>>>> +    if (pthread_attr_setschedpolicy(&p_attr, SCHED_RR))
>>>> +        igt_assert_f(0, "Couldn't set thread scheduling policy");
>>>> +
>>>> +    /* Keep the flusher task also at rt priority, so that it doesn't
>>>> get
>>>> +     * too late in flushing the collected logs in local buffers to
>>>> the disk,
>>>> +     * and so main thread always have spare buffers to collect the
>>>> logs.
>>>> +     */
>>>> +    thread_sched.sched_priority = 5;
>>>> +    if (pthread_attr_setschedparam(&p_attr, &thread_sched))
>>>> +        igt_assert_f(0, "Couldn't set thread priority");
>>>> +
>>>> +    if (pthread_create(&flush_thread, &p_attr, flusher, NULL))
>>>> +        igt_assert_f(0, "thread creation failed");
>>>
>>> More nitpicks: You mix starting error messages with upper case and lower
>>> case.
>>
>> Will use the lower case for all error messages. fine ?
>
> Fine with me.
>

Thanks
>>>
>>>> +
>>>> +    if (pthread_attr_destroy(&p_attr))
>>>> +        igt_assert_f(0, "Error destroying thread attributes");
>>>> +}
>>>> +
>>>> +static void open_relay_file(void)
>>>> +{
>>>> +    relay_fd = igt_debugfs_open(RELAY_FILE_NAME, O_RDONLY);
>>>> +    if (relay_fd < 0)
>>>> +        igt_assert_f(0, "Couldn't open the guc log file");
>>>> +
>>>> +    /* Purge the old/boot-time logs from the relay buffer.
>>>> +     * This is more for Val team's requirement, where they have to
>>>> first
>>>> +     * purge the existing logs before starting the tests for which
>>>> the logs
>>>> +     * are actually needed. After this logger will enter into a loop
>>>> and
>>>> +     * wait for the new data, at that point benchmark can be launched
>>>> from
>>>> +     * a different shell.
>>>> +     */
>>>> +    if (discard_oldlogs)
>>>> +        pull_leftover_data();
>>>> +}
>>>> +
>>>> +static void open_output_file(void)
>>>> +{
>>>> +    char *filename;
>>>> +
>>>> +    if (asprintf(&filename, "/tmp/%s", out_filename ? :
>>>> "guc_log_dump.dat") < 0)
>>>> +        igt_assert_f(0, "Couldn't allocate the filename");
>>>
>>> I think you need to take the out_filename when specified by the user
>>> as-is and not stick it in /tmp. It is perfectly reasonable that someone
>>> would want to put it somewhere else.
>>>
>> Ok so User should provide the full path.
>
> It can be relative, absolute, or just a filename. My suggestion was to
> just use what the user provided for the open(2) call.
>
Yes the path provided can be absolute/relative, will do the open() call 
with whatever string is provided as a command line.

>>
>>> Also better to move the default name of "guc_log_dump.dat" to where
>>> defaults like RELAY_FILE_NAME and such are defined.
>>>
>> fine.
>>
>>> And I would not default to /tmp but the current directory.
>>>
>> Chris had suggested to use /tmp. Fine will use the current directory.
>>
>>>> +
>>>> +    /* Use Direct IO mode for the output file, as the data written is
>>>> not
>>>> +     * supposed to be accessed again, this saves a copy of data from
>>>> App's
>>>> +     * buffer to kernel buffer (Page cache). Due to no buffering on
>>>> kernel
>>>> +     * side, data is flushed out to disk faster and more buffering
>>>> can be
>>>> +     * done on the logger side to hide the disk IO latency.
>>>> +     */
>>>> +    outfile_fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC |
>>>> O_DIRECT, 0444);
>>>
>>> 0400 or 0440 would be better I think.
>>
>> fine, this might make the file inaccessible to Root User in case the
>> logger wasn't running as a Root ?
>
> I assumed logger will run as root and it would be preferable not to have
> log files accessible by anyone. Either way root can read anything
> regardless of permissions, no?
>
Sorry after working on selinux enabled setups (in Android, where there 
are certain restrictions on Root also), forgot that Root can do anything.

>>>
>>>> +    if (outfile_fd < 0)
>>>> +        igt_assert_f(0, "Couldn't open the output file");
>>>> +
>>>> +    free(out_filename);
>>>> +    free(filename);
>>>> +}
>>>> +
>>>> +static void init_main_thread(void)
>>>> +{
>>>> +    struct sched_param    thread_sched;
>>>> +    /* Run the main thread at highest priority to ensure that it
>>>> always
>>>> +     * gets woken-up at earliest on arrival of new data and so is
>>>> always
>>>> +     * ready to pull the logs, otherwise there could be loss logs if
>>>> +     * GuC firmware is generating logs at a very high rate.
>>>> +     */
>>>> +    thread_sched.sched_priority = 1;
>>>> +    if (sched_setscheduler(getpid(), SCHED_FIFO, &thread_sched))
>>>> +        igt_assert_f(0, "Couldn't set the priority");
>>>> +
>>>> +    if (signal(SIGINT, int_sig_handler) == SIG_ERR)
>>>> +        igt_assert_f(0, "SIGINT handler registration failed");
>>>> +
>>>> +    /* Need an aligned pointer for direct IO */
>>>> +    if (posix_memalign((void **)&read_buffer, PAGE_SIZE, num_buffers
>>>> * SUBBUF_SIZE))
>>>> +        igt_assert_f(0, "Couldn't allocate the read buffer");
>>>> +
>>>> +    /* Keep the pages locked in RAM, avoid page fault overhead */
>>>> +    if (mlock(read_buffer, num_buffers * SUBBUF_SIZE))
>>>> +        igt_assert_f(0, "Failed to lock memory");
>>>> +
>>>> +    /* Enable the logging, it may not have been enabled from boot
>>>> and so
>>>> +     * the relay file also wouldn't have been created.
>>>> +     */
>>>> +    guc_log_control(true);
>>>> +
>>>> +    open_relay_file();
>>>> +    open_output_file();
>>>> +}
>>>> +
>>>> +static int parse_options(int opt, int opt_index, void *data)
>>>> +{
>>>> +    igt_debug("opt %c optarg %s\n", opt, optarg);
>>>> +
>>>> +    switch(opt) {
>>>> +    case 'v':
>>>> +        verbosity_level = atoi(optarg);
>>>> +        if (verbosity_level < 0)
>>>> +            verbosity_level = 0;
>>>> +        else if (verbosity_level > 3)
>>>> +            verbosity_level = 3;
>>>
>>> Notify user if invalid value was provided?
>>>
>> Just notify but don't abort and continue by using a valid value ?
>
> No strong opinion. Either is much better than silently using a different
> value or ignoring the fact user has typed in garbage on the command line.
>
Ok will report/notify and then abort.
Will do the same for other arguments.

Is it fine ?

>>
>>>> +
>>>> +        igt_debug("verbosity level to be used is %d\n",
>>>> verbosity_level);
>>>> +        break;
>>>> +    case 'o':
>>>> +        if (asprintf(&out_filename, "%s", optarg) < 0)
>>>
>>> strdup?
>> ok will try to use strdup.
>>>
>>>> +            igt_assert_f(0, "Couldn't allocate the o/p filename");
>>>> +        igt_debug("logs to be stored in file /tmp/%s\n",
>>>> out_filename);
>>>> +        break;
>>>> +    case 'b':
>>>> +        num_buffers = atoi(optarg);
>>>> +        if (num_buffers == 0)
>>>> +            num_buffers = NUM_SUBBUFS;
>>>
>>> Again notify if invalid number/value.
>>>
>> same doubt as above.
>>
>>>> +        igt_debug("number of buffers to be used is %d\n",
>>>> num_buffers);
>>>> +        break;
>>>> +    case 't':
>>>> +        test_duration = atoi(optarg);
>>>> +        if (test_duration > 0)
>>>
>>> If switch was specified it should have a valid optarg or error reported.
>>>
>>>> +            igt_debug("logger to run for %d seconds\n",
>>>> test_duration);
>>>> +        break;
>>>> +    case 'p':
>>>> +        poll_timeout = atoi(optarg);
>>>> +        if (poll_timeout == 0)
>>>> +            poll_timeout = 2; /* at least use 2ms poll timeout */
>>>
>>> Hm I would prefer if you allowed no poll period.
>>
>> by no poll period, do you mean indefinite wait ?
>>
>> That can be done by specifying -1 as a timeout value in the command line,
>>
>> "-p --polltimeout=ms polling timeout in ms, -1 == indefinite wait for
>> the new data\n"
>
> My bad, I missed that. All good then.
>
>>
>>> I even think it should be the default if we think relayfs will be
>>> definitely fixed to not
>>> require it.
>>>
>> Can keep the indefinite wait as a default.
>>
>>> In any case, commend is wrong since user can specify 1ms and get away
>>> with it. :)
>>>
>>>> +        if (poll_timeout > 0)
>>>> +            igt_debug("polling to be done with %d milli seconds
>>>> timeout\n", poll_timeout);
>>>
>>> milliseconds
>>>
>>>> +        break;
>>>> +    case 's':
>>>> +        max_filesize = atoi(optarg);
>>>> +        if (max_filesize > 0)
>>>
>>> Same as for -t.
>>>
>>>> +            igt_debug("max allowed size of the output file is %d
>>>> MBs\n", max_filesize);
>>>> +        break;
>>>> +    case 'd':
>>>> +        discard_oldlogs = true;
>>>> +        igt_debug("old/boot-time logs will be discarded\n");
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void process_command_line(int argc, char **argv)
>>>> +{
>>>> +    static struct option long_options[] = {
>>>> +        {"verbosity", required_argument, 0, 'v'},
>>>> +        {"outputfilename", required_argument, 0, 'o'},
>>>> +        {"buffers", required_argument, 0, 'b'},
>>>> +        {"testduration", required_argument, 0, 't'},
>>>> +        {"polltimeout", required_argument, 0, 'p'},
>>>> +        {"size", required_argument, 0, 's'},
>>>> +        {"discard", no_argument, 0, 'd'},
>>>> +        { 0, 0, 0, 0 }
>>>> +    };
>>>> +
>>>> +    const char *help =
>>>> +        "  -v --verbosity=level   verbosity level of GuC logging
>>>> (0-3)\n"
>>>> +        "  -o --outfile=name      name of the output file in /tmp
>>>> directory, where logs will be stored\n"
>>>> +        "  -b --buffers=num       number of buffers to be maintained
>>>> on logger side for storing logs\n"
>>>> +        "  -t --testduration=sec  max duration in seconds for which
>>>> the logger should run, 0 == forever\n"
>>>> +        "  -p --polltimeout=ms    polling timeout in ms, -1 ==
>>>> indefinite wait for the new data\n"
>>>> +        "  -s --size=MB           max size of output file in MBs
>>>> after which logging will be stopped, 0 == no limit\n"
>>>> +        "  -d --discard           discard the old/boot-time logs
>>>> before entering into the capture loop\n";
>>>> +
>>>> +    igt_simple_init_parse_opts(&argc, argv, "v:o:b:t:p:s:d",
>>>> long_options,
>>>> +                   help, parse_options, NULL);
>>>> +}
>>>> +
>>>> +int main(int argc, char **argv)
>>>> +{
>>>> +    struct pollfd relay_poll_fd;
>>>> +    struct timespec start={};
>>>> +    int nfds;
>>>> +    int ret;
>>>> +
>>>> +    process_command_line(argc, argv);
>>>> +
>>>> +    init_main_thread();
>>>> +
>>>> +    /* Use a separate thread for flushing the logs to a file on disk.
>>>> +     * Main thread will buffer the data from relay file in its pool of
>>>> +     * buffers and other thread will flush the data to disk in
>>>> background.
>>>> +     * This is needed, albeit by default data is written out to
>>>> disk in
>>>> +     * async mode, as when there are too many dirty pages in the RAM,
>>>> +     * (/proc/sys/vm/dirty_ratio), kernel starts blocking the
>>>> processes
>>>> +     * doing the file writes.
>>>> +     */
>>>> +    init_flusher_thread();
>>>> +
>>>> +    relay_poll_fd.fd = relay_fd;
>>>> +    relay_poll_fd.events = POLLIN;
>>>> +    relay_poll_fd.revents = 0;
>>>> +
>>>> +    nfds = 1; /* only one fd to poll */
>>>> +
>>>> +    while (!stop_logging)
>>>> +    {
>>>> +        if (test_duration && (igt_seconds_elapsed(&start) >
>>>> test_duration)) {
>>>
>>> If you agree to allow no poll period the this would not work right? In
>>> that case you would need to use alarm(2) or something.
>>>
>>
>> Can calculate the timeout value for poll call as,
>>      if (poll_timeout < 0) {
>>          timeout = test_duration - igt_seconds_elapsed(&start))
>>      }
>
> My point was that with indefinite poll loop will not run if there is not
> log data so timeout will not work implemented like this.
>
I understood your concern in first place but probably didn't put forth 
my point clearly.

For more clarity, this is how think it can be addressed.

  		 * than a jiffy gap between 2 flush interrupts) and relay runs
  		 * out of sub buffers to store the new logs.
  		 */
-		ret = poll(&relay_poll_fd, nfds, poll_timeout);
+		ret = poll(&relay_poll_fd, nfds, timeout);
  		if (ret < 0) {
  			if (errno == EINTR)
  				break;

So will not do polling with indefinite timeout and adjust the timeout 
value as per test's duration.
Does it look ok ?

Best regards
Akash
>>
>>
>>>> +            igt_debug("Ran for stipulated %d seconds, exit now\n",
>>>> test_duration);
>>>> +            stop_logging = true;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        /* Wait/poll for the new data to be available, relay doesn't
>>>> +         * provide a blocking read.
>>>> +         * Need to do polling with a timeout instead of indefinite
>>>> wait,
>>>> +         * to avoid relying on relay for the wakeup, as relay does
>>>> wakeup
>>>> +         * in a deferred manner on jiffies granularity by scheduling a
>>>> +         * timer and moreover that timer is re-scheduled on every
>>>> newly
>>>> +         * produced buffer so timer keeps getting pushed out if there
>>>> +         * are multiple flush interrupts in a very quick succession
>>>> (less
>>>> +         * than a jiffy gap between 2 flush interrupts) and relay runs
>>>> +         * out of sub buffers to store the new logs.
>>>> +         */
>>>> +        ret = poll(&relay_poll_fd, nfds, poll_timeout);
>>>> +        if (ret < 0) {
>>>> +            if (errno == EINTR)
>>>> +                break;
>>>> +            else
>>>> +                igt_assert_f(0, "poll call failed");
>>>> +        }
>>>> +
>>>> +        /* No data available yet, poll again, hopefully new data is
>>>> round the corner */
>>>> +        if (!relay_poll_fd.revents)
>>>> +            continue;
>>>> +
>>>> +        pull_data();
>>>> +    }
>>>> +
>>>> +    /* Pause logging on the GuC side*/
>>>> +    guc_log_control(false);
>>>> +    pthread_cond_signal(&underflow_cond);
>>>> +    pthread_join(flush_thread, NULL);
>>>> +    pull_leftover_data();
>>>> +    igt_info("total_bytes_written %" PRIu64 "\n",
>>>> total_bytes_written);
>>>> +
>>>> +    free(read_buffer);
>>>> +    close(relay_fd);
>>>> +    close(outfile_fd);
>>>> +    igt_exit();
>>>> +}
>>>>
>>>
>>> So only minor comments. The rest looks reasonable to me and I trust you
>>> tested it well and it works fine.
>>>
>> Thanks much for the thorough review.
>
> Np.
>
> Regards,
>
> Tvrtko

--- a/tools/intel_guc_logger.c
+++ b/tools/intel_guc_logger.c
@@ -370,6 +370,8 @@ int main(int argc, char **argv)
  {
  	struct pollfd relay_poll_fd;
  	struct timespec start={};
+	uint32_t time_elapsed;
+	int timeout;
  	int nfds;
  	int ret;

@@ -395,10 +397,17 @@ int main(int argc, char **argv)

  	while (!stop_logging)
  	{
-		if (test_duration && (igt_seconds_elapsed(&start) > test_duration)) {
-			igt_debug("Ran for stipulated %d seconds, exit now\n", test_duration);
-			stop_logging = true;
-			break;
+		timeout = poll_timeout;
+		if (test_duration) {
+			time_elapsed = igt_seconds_elapsed(&start);
+			if (time_elapsed >= test_duration) {
+				igt_debug("Ran for stipulated %d seconds, exit now\n", test_duration);
+				stop_logging = true;
+				break;
+			}
+			if (poll_timeout < 0)
+				timeout = (test_duration - time_elapsed) * 1000;
		}

  		/* Wait/poll for the new data to be available, relay doesn't
@@ -412,7 +421,7 @@ int main(int argc, char **argv)