
[libdrm] drm: Fix multi GPU drmGetDevice return wrong device

Message ID 1468214245-4716-1-git-send-email-Qiang.Yu@amd.com (mailing list archive)
State New, archived

Commit Message

Qiang Yu July 11, 2016, 5:17 a.m. UTC
drmGetDevice will always return the first device it finds
under /dev/dri/. This is not correct in a multi GPU situation.

Plus fix the memory leak in error handling path of
drmGetDevices.

Change-Id: I2a85a8a4feba8a5cc517ad75c6afb532fa07c53d
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
---
 xf86drm.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)
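
To illustrate the matching the commit message calls for, here is a minimal sketch of how an open file descriptor can be tied to one specific node by comparing `st_rdev` values. The helper names `fd_matches_node` and `paths_match` are hypothetical and are not part of libdrm; the sketch only shows the comparison, not the enumeration of /dev/dri/.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `path` names the same character device
 * that `fd` refers to, 0 if it names a different one, -1 on error. */
static int fd_matches_node(int fd, const char *path)
{
    struct stat fd_st, node_st;

    assert(path != NULL);
    if (fstat(fd, &fd_st) != 0 || !S_ISCHR(fd_st.st_mode))
        return -1;
    if (stat(path, &node_st) != 0 || !S_ISCHR(node_st.st_mode))
        return -1;

    /* st_rdev encodes the major/minor pair of a device node. */
    return fd_st.st_rdev == node_st.st_rdev;
}

/* Hypothetical wrapper for trying the helper out: opens `opened`,
 * compares it against `candidate`, and closes the descriptor again. */
static int paths_match(const char *opened, const char *candidate)
{
    int fd = open(opened, O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;
    ret = fd_matches_node(fd, candidate);
    close(fd);
    return ret;
}
```

On a typical Linux system, `paths_match("/dev/null", "/dev/null")` should report a match and `paths_match("/dev/null", "/dev/zero")` should not. A lookup that skips this comparison and takes the first enumerated node is the behaviour the patch sets out to fix.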

Comments

Emil Velikov July 13, 2016, 9:47 a.m. UTC | #1
Hi Qiang Yu,

Thanks for fixing my buggy code (yet again) :-)

On 11 July 2016 at 06:17, Qiang Yu <Qiang.Yu@amd.com> wrote:
> drmGetDevice will always return the first device it finds
> under /dev/dri/. This is not correct in a multi GPU situation.
>
How does the following alternative solution sound:
 - keep drmFoldDuplicatedDevices as is
 - after the drmFoldDuplicatedDevices call use the find_rdev to find
the correct device in local_devices.
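
The alternative above could look roughly like the following sketch: leave the folded array as it is, NULL holes included, and search it for the device that owns the target node. The `folded_dev` struct is a hypothetical model that stores device numbers directly, which is an assumption made for brevity; it is not the layout of libdrm's `drmDevice`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_NODES 3 /* assumed upper bound on nodes per folded device */

/* Hypothetical model of a folded device: the device numbers of its nodes. */
struct folded_dev {
    dev_t node_rdev[MAX_NODES];
    int   node_count;
};

/* Return the index of the device that owns `target`, or -1 if none does.
 * NULL entries (holes left behind by folding) are skipped, not treated
 * as the end of the array. */
static int find_folded(struct folded_dev *devs[], int count, dev_t target)
{
    assert(devs != NULL);
    for (int i = 0; i < count; i++) {
        if (!devs[i])
            continue;
        for (int n = 0; n < devs[i]->node_count; n++)
            if (devs[i]->node_rdev[n] == target)
                return i;
    }
    return -1;
}
```

Checking every node of each device matters here because, after folding, the target node may have been merged into an entry that was created for a different node of the same device.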

> Plus fix the memory leak in error handling path of
> drmGetDevices.
>
Unless I'm missing something, there is no memory leak fix below ?
Alternatively please keep it as a separate patch.

> Change-Id: I2a85a8a4feba8a5cc517ad75c6afb532fa07c53d
Please drop this line.

Regards,
Emil
Qiang Yu July 13, 2016, 10:17 a.m. UTC | #2
Hi Emil,


Nice to hear from you.

On 11 July 2016 at 06:17, Qiang Yu <Qiang.Yu@amd.com> wrote:
> drmGetDevice will always return the first device it finds
> under /dev/dri/. This is not correct in a multi GPU situation.
>
How does the following alternative solution sound:
 - keep drmFoldDuplicatedDevices as is
 - after the drmFoldDuplicatedDevices call use the find_rdev to find
the correct device in local_devices.

[yuq] This is also OK. But drmFoldDuplicatedDevices() has to be changed for the sake of
drmFreeDevices() in the error handling path: it also exits after seeing a NULL in the array.
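
The change to drmFoldDuplicatedDevices() described here amounts to compacting the array after folding. A minimal standalone sketch of that step, using a generic pointer array and a hypothetical helper name `compact_ptrs` (not a libdrm function):

```c
#include <assert.h>
#include <stddef.h>

/* Move all non-NULL entries to the front of `items`, preserving their
 * order, and return how many there are. Vacated slots are set to NULL. */
static int compact_ptrs(void *items[], int count)
{
    int j = 0; /* next free slot at the front */

    assert(items != NULL || count == 0);
    for (int i = 0; i < count; i++) {
        if (!items[i])
            continue;
        if (i != j) {
            items[j] = items[i];
            items[i] = NULL;
        }
        j++;
    }
    return j;
}
```

Once the array has no holes, a consumer that stops at the first NULL still visits every remaining entry, which is why this one change also covers the error path.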

> Plus fix the memory leak in error handling path of
> drmGetDevices.
>
Unless I'm missing something, there is no memory leak fix below ?
Alternatively please keep it as a separate patch.

[yuq] This is fixed at the same time by changing drmFoldDuplicatedDevices().

> Change-Id: I2a85a8a4feba8a5cc517ad75c6afb532fa07c53d
Please drop this line.

[yuq] OK.

Regards,
Qiang
Emil Velikov July 13, 2016, 11:15 a.m. UTC | #3
On 13 July 2016 at 11:17, Yu, Qiang <Qiang.Yu@amd.com> wrote:
> Hi Emil,
>
>
> Nice to hear from you.
>
>
> On 11 July 2016 at 06:17, Qiang Yu <Qiang.Yu@amd.com> wrote:
>> drmGetDevice will always return the first device it finds
>> under /dev/dri/. This is not correct in a multi GPU situation.
>>
> How does the following alternative solution sound:
>  - keep drmFoldDuplicatedDevices as is
>  - after the drmFoldDuplicatedDevices call use the find_rdev to find
> the correct device in local_devices.
>
> [yuq] This is also OK. But drmFoldDuplicatedDevices() has to be changed for
> the sake of drmFreeDevices() in the error handling path: it also exits after
> seeing a NULL in the array.
>
>> Plus fix the memory leak in error handling path of
>> drmGetDevices.
>>
> Unless I'm missing something, there is no memory leak fix below ?
> Alternatively please keep it as a separate patch.
>
> [yuq] This is fixed at the same time by changing drmFoldDuplicatedDevices().
>
Heh, silly me assumed that your earlier patch fixed all the
codepaths to handle the "holes" made by drmFoldDuplicatedDevices.
Seems like the ones in drmFreeDevices and drmGetDevice are still
outstanding, thus the predicament.

In this case we could do either:
 - go with the above making sure drmFoldDuplicatedDevices doesn't create 'holes'
Note: we still want to fix drmFreeDevices to continue (as opposed to
break) when it sees one.
 - or, fix drmGetDevice/drmFreeDevices
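
As a sketch of the "continue as opposed to break" note, a free loop that tolerates holes might look like this. The helper name `free_all_skipping_holes` is hypothetical, and plain `free()` stands in for whatever per-device cleanup the real code performs.

```c
#include <assert.h>
#include <stdlib.h>

/* Free every non-NULL entry in `items`, skipping holes, and return the
 * number of entries freed. Freed slots are reset to NULL. */
static int free_all_skipping_holes(void *items[], int count)
{
    int freed = 0;

    assert(items != NULL || count == 0);
    for (int i = 0; i < count; i++) {
        if (!items[i])
            continue; /* a `break` here would leak every entry after the hole */
        free(items[i]);
        items[i] = NULL;
        freed++;
    }
    return freed;
}
```

With this shape the loop stays correct whether or not the array has been compacted beforehand, which is the reason for wanting the fix in either case.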

In either case we want that as a separate patch, bonus points for adding
an inline comment about the behaviour of drmFoldDuplicatedDevices.

About the core issue, a trivial suggestion: s/move target to the first
of local_devices/store target at local_devices[0] for ease of use
below/
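
For reference, the "store target at local_devices[0]" step can be sketched in isolation as below. The helper name `append_target_first` and the `is_target` flag are hypothetical; in the patch the decision comes from comparing `st_rdev` values.

```c
#include <assert.h>
#include <stddef.h>

/* Append `item` to `items`, which currently holds `n` entries, and return
 * the new count. If `item` is the target, it is stored at index 0 and the
 * previous occupant of index 0 is moved to the end instead. */
static int append_target_first(void *items[], int n, void *item, int is_target)
{
    assert(items != NULL && n >= 0);
    if (is_target && n > 0) {
        items[n] = items[0];
        items[0] = item;
    } else {
        items[n] = item;
    }
    return n + 1;
}
```

Keeping the target at index 0 means later code can refer to it without a second search, as long as nothing in between reorders the front of the array.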

Thanks
Emil
P.S. When working with mailing lists please use plain text emails.
Qiang Yu July 14, 2016, 3:02 a.m. UTC | #4
Thanks Emil, I'll submit v2 to address your comments.


I'm using Office 365 and am not sure whether this mail is formatted OK; if not, I'll switch to a mail client.


Regards,

Qiang
Emil Velikov July 14, 2016, 4:14 p.m. UTC | #5
On 14 July 2016 at 04:02, Yu, Qiang <Qiang.Yu@amd.com> wrote:
> Thanks Emil, I'll submit v2 to address your comments.
>
I believe you covered them all. Thanks!

Small suggestion for the future - many devs appreciate it when
patches have a brief shortlog before or after the --- line.

>
> I'm using Office 365 and am not sure whether this mail is formatted OK; if not,
> I'll switch to a mail client.
>
Don't think you need to switch email client(s). See http://bfy.tw/6kFa

Thanks
Emil
Qiang Yu July 15, 2016, 1:22 a.m. UTC | #6
On 14 July 2016 at 04:02, Yu, Qiang <Qiang.Yu@amd.com> wrote:
> Thanks Emil, I'll submit v2 to address your comments.
>
I believe you covered them all. Thanks!

Small suggestion for the future - many devs appreciate it when
patches have a brief shortlog before or after the --- line.

[yuq] You mean the v1|v2 change log? OK, I'll remember that.

>
> I'm using Office 365 and am not sure whether this mail is formatted OK; if not,
> I'll switch to a mail client.
>
Don't think you need to switch email client(s). See http://bfy.tw/6kFa

[yuq] Thanks, I changed my settings, so this time it should be fine.

Regards,
Qiang

Patch

diff --git a/xf86drm.c b/xf86drm.c
index 6689f7c..e90e8e5 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -3064,6 +3064,17 @@  static void drmFoldDuplicatedDevices(drmDevicePtr local_devices[], int count)
             }
         }
     }
+
+    /* Compact: move all devices to the front so no NULL holes remain. */
+    for (i = 0, j = 0; i < count; i++) {
+        if (local_devices[i]) {
+            if (i != j) {
+                local_devices[j] = local_devices[i];
+                local_devices[i] = NULL;
+            }
+            j++;
+        }
+    }
 }
 
 /**
@@ -3087,6 +3098,7 @@  int drmGetDevice(int fd, drmDevicePtr *device)
     int maj, min;
     int ret, i, node_count;
     int max_count = 16;
+    dev_t find_rdev;
 
     if (fd == -1 || device == NULL)
         return -EINVAL;
@@ -3094,6 +3106,7 @@  int drmGetDevice(int fd, drmDevicePtr *device)
     if (fstat(fd, &sbuf))
         return -errno;
 
+    find_rdev = sbuf.st_rdev;
     maj = major(sbuf.st_rdev);
     min = minor(sbuf.st_rdev);
 
@@ -3154,7 +3167,13 @@  int drmGetDevice(int fd, drmDevicePtr *device)
             local_devices = temp;
         }
 
-        local_devices[i] = d;
+        /* move target to the first of local_devices */
+        if (find_rdev == sbuf.st_rdev && i) {
+            local_devices[i] = local_devices[0];
+            local_devices[0] = d;
+        }
+        else
+            local_devices[i] = d;
         i++;
     }
     node_count = i;
@@ -3267,10 +3286,7 @@  int drmGetDevices(drmDevicePtr devices[], int max_devices)
     drmFoldDuplicatedDevices(local_devices, node_count);
 
     device_count = 0;
-    for (i = 0; i < node_count; i++) {
-        if (!local_devices[i])
-            continue;
-
+    for (i = 0; i < node_count && local_devices[i]; i++) {
         if ((devices != NULL) && (device_count < max_devices))
             devices[device_count] = local_devices[i];
         else