
[drm/ttm] Memory corruption problem when ttm_tt_init() fails.

Message ID 201501212056.ACF39099.FLVMFOHOSQtFOJ@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived

Commit Message

Tetsuo Handa Jan. 21, 2015, 11:56 a.m. UTC
I'm running memory allocation failure injection tests on 3.19-rc5, and
it seems to me that there is a memory corruption bug in the ttm or vmwgfx code.

---------- Crash pattern 1 start ----------
[   80.751971] [TTM] Failed allocating page table
[   83.000393] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   83.004392] IP: [<ffffffff811b65a9>] __fput+0x39/0x1e0
[   83.006944] PGD 7acd2067 PUD 7b0c7067 PMD 0
[   83.009240] Oops: 0000 [#1] SMP
[   83.010940] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel dm_mirror ghash_clmulni_intel dm_region_hash aesni_intel dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev
vmw_balloon microcode serio_raw pcspkr parport_pc shpchp parport vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput sd_mod ata_generic pata_acpi mptspi scsi_transport_spi mptscsih ata_piix e1000 mptbase libata floppy
[   83.038033] CPU: 2 PID: 8795 Comm: sh Tainted: G        W  OE  3.19.0-rc5+ #28
[   83.039666] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   83.042110] task: ffff88007a220000 ti: ffff880052048000 task.ti: ffff880052048000
[   83.043865] RIP: 0010:[<ffffffff811b65a9>]  [<ffffffff811b65a9>] __fput+0x39/0x1e0
[   83.045665] RSP: 0018:ffff88005204bea8  EFLAGS: 00010297
[   83.046895] RAX: 0000000000000000 RBX: ffff88007aff3500 RCX: 0000000000000a0a
[   83.048595] RDX: 000000000002801d RSI: 000000000000000a RDI: ffff88007aff3500
[   83.050254] RBP: ffff88005204bee8 R08: ffff88007cbfd000 R09: 0000000180080006
[   83.051848] R10: 0000000000000000 R11: ffffea0001f2fe00 R12: ffffffff81e6c040
[   83.053515] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   83.055156] FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[   83.057000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.058328] CR2: 0000000000000000 CR3: 000000007b0bc000 CR4: 00000000000407e0
[   83.060004] Stack:
[   83.060482]  ffff88007af0de48 ffff88007af0dc00 ffff88007af0de48 0000000000000000
[   83.062285]  ffffffff81e6c040 ffff88007a220610 ffff88007a220000 0000000000000000
[   83.064115]  ffff88005204bef8 ffffffff811b679e ffff88005204bf28 ffffffff81088f6f
[   83.065956] Call Trace:
[   83.066544]  [<ffffffff811b679e>] ____fput+0xe/0x10
[   83.067738]  [<ffffffff81088f6f>] task_work_run+0xaf/0xf0
[   83.068971]  [<ffffffff81013c5a>] do_notify_resume+0x7a/0x90
[   83.070307]  [<ffffffff816a6d87>] int_signal+0x12/0x17
[   83.071464] Code: 55 41 54 53 48 89 fb 48 83 ec 18 4c 8b 7f 18 4c 8b 77 10 4c 8b 6f 20 e8 06 c7 4e 00 8b 53 44 4c 8b 53 20 89 d0 83 e0 02 83 f8 01 <41> 0f b7 02 45 19 e4 41 83 e4 08 41 83 c4 08 44 89 e1 66 25 00
[   83.077450] RIP  [<ffffffff811b65a9>] __fput+0x39/0x1e0
[   83.078729]  RSP <ffff88005204bea8>
[   83.079522] CR2: 0000000000000000

crash> bt -l
PID: 8795   TASK: ffff88007a220000  CPU: 2   COMMAND: "sh"
 #0 [ffff88005204ba70] machine_kexec at ffffffff8104ef62
    /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320
 #1 [ffff88005204bac0] crash_kexec at ffffffff810ed983
    /usr/src/linux/kernel/kexec.c: 1482
 #2 [ffff88005204bb90] oops_end at ffffffff810176e8
    /usr/src/linux/arch/x86/kernel/dumpstack.c: 231
 #3 [ffff88005204bbc0] no_context at ffffffff8169af1f
    /usr/src/linux/arch/x86/mm/fault.c: 724
 #4 [ffff88005204bc20] __bad_area_nosemaphore at ffffffff8169aff6
    /usr/src/linux/arch/x86/mm/fault.c: 804
 #5 [ffff88005204bc70] bad_area at ffffffff8169b31f
    /usr/src/linux/arch/x86/mm/fault.c: 833
 #6 [ffff88005204bca0] __do_page_fault at ffffffff81059b37
    /usr/src/linux/arch/x86/mm/fault.c: 1213
 #7 [ffff88005204bdc0] do_page_fault at ffffffff81059c11
    /usr/src/linux/arch/x86/mm/fault.c: 1295
 #8 [ffff88005204bdf0] page_fault at ffffffff816a8a28
    /usr/src/linux/arch/x86/kernel/entry_64.S: 1283
    [exception RIP: __fput+57]
    RIP: ffffffff811b65a9  RSP: ffff88005204bea8  RFLAGS: 00010297
    RAX: 0000000000000000  RBX: ffff88007aff3500  RCX: 0000000000000a0a
    RDX: 000000000002801d  RSI: 000000000000000a  RDI: ffff88007aff3500
    RBP: ffff88005204bee8   R8: ffff88007cbfd000   R9: 0000000180080006
    R10: 0000000000000000  R11: ffffea0001f2fe00  R12: ffffffff81e6c040
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88005204bef0] ____fput at ffffffff811b679e
    /usr/src/linux/fs/file_table.c: 245
#10 [ffff88005204bf00] task_work_run at ffffffff81088f6f
    /usr/src/linux/kernel/task_work.c: 125
#11 [ffff88005204bf30] do_notify_resume at ffffffff81013c5a
    /usr/src/linux/include/linux/tracehook.h: 190
#12 [ffff88005204bf50] int_signal at ffffffff816a6d87
    /usr/src/linux/arch/x86/kernel/entry_64.S: 587
    RIP: 00007f1361d5f420  RSP: 00007fff77be5740  RFLAGS: 00000200
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
WARNING: possibly bogus exception frame
---------- Crash pattern 1 end ----------

---------- Crash pattern 2 start ----------
[  227.647021] [TTM] Failed allocating page table
[  227.875795] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  227.877714] IP: [<ffffffff81594c57>] skb_queue_tail+0x37/0x60
[  227.879107] PGD 78adc067 PUD 78ada067 PMD 0
[  227.880186] Oops: 0002 [#1] SMP
[  227.881017] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mirror aesni_intel dm_region_hash dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev
vmw_balloon microcode parport_pc serio_raw pcspkr parport vmw_vmci shpchp i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput ata_generic pata_acpi sd_mod ata_piix libata mptspi scsi_transport_spi e1000 mptscsih mptbase floppy
[  227.898988] CPU: 2 PID: 610 Comm: Xorg Tainted: G        W  OE  3.19.0-rc5+ #28
[  227.900691] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[  227.903162] task: ffff8800788c6040 ti: ffff8800792d8000 task.ti: ffff8800792d8000
[  227.904884] RIP: 0010:[<ffffffff81594c57>]  [<ffffffff81594c57>] skb_queue_tail+0x37/0x60
[  227.906816] RSP: 0018:ffff8800792dbbc8  EFLAGS: 00010046
[  227.908056] RAX: 0000000000000292 RBX: ffff88007cbc6d10 RCX: 0000000000000000
[  227.909718] RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffff88007cbc6d24
[  227.911376] RBP: ffff8800792dbbe8 R08: 0000000000000292 R09: 0180000002800000
[  227.913027] R10: 0000000700020008 R11: 0000000000000000 R12: ffff88007b65aa00
[  227.914690] R13: ffff88007cbc6d24 R14: 0000000000000000 R15: ffff88007cbc6c80
[  227.916356] FS:  00007f3d07740980(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[  227.918232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  227.919559] CR2: 0000000000000000 CR3: 0000000078add000 CR4: 00000000000407e0
[  227.921261] Stack:
[  227.921744]  0000000000000078 ffff88007b65aa00 0000000000000078 0000000000000000
[  227.923618]  ffff8800792dbca8 ffffffff816491bd ffff88007cbc6d10 ffff8800792dbd10
[  227.925427]  0000007800000000 ffff8800792dbcc8 0000000000000078 ffff88007cbc6f78
[  227.927271] Call Trace:
[  227.927872]  [<ffffffff816491bd>] unix_stream_sendmsg+0x1dd/0x430
[  227.929301]  [<ffffffff8158c0c3>] sock_aio_write+0x103/0x140
[  227.930638]  [<ffffffff811b42ec>] do_sync_readv_writev+0x4c/0x80
[  227.932047]  [<ffffffff811b5c95>] do_readv_writev+0x1e5/0x280
[  227.933406]  [<ffffffff8101fe4b>] ? __restore_xstate_sig+0x8b/0x680
[  227.934865]  [<ffffffff81104424>] ? __audit_syscall_entry+0xb4/0x110
[  227.936371]  [<ffffffff811b5db9>] vfs_writev+0x39/0x50
[  227.937565]  [<ffffffff811b5eea>] SyS_writev+0x4a/0xd0
[  227.938777]  [<ffffffff816a6d6c>] ? int_check_syscall_exit_work+0x34/0x3d
[  227.940364]  [<ffffffff816a6ae9>] system_call_fastpath+0x12/0x17
[  227.941775] Code: 8d 6f 14 41 54 49 89 f4 53 48 89 fb 4c 89 ef 48 83 ec 08 e8 dc 1a 11 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 09 17 11 00 48 83 c4 08 5b
[  227.947880] RIP  [<ffffffff81594c57>] skb_queue_tail+0x37/0x60
[  227.949297]  RSP <ffff8800792dbbc8>
[  227.950112] CR2: 0000000000000000

crash> bt -l
PID: 610    TASK: ffff8800788c6040  CPU: 2   COMMAND: "Xorg"
 #0 [ffff8800792db790] machine_kexec at ffffffff8104ef62
    /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320
 #1 [ffff8800792db7e0] crash_kexec at ffffffff810ed983
    /usr/src/linux/kernel/kexec.c: 1482
 #2 [ffff8800792db8b0] oops_end at ffffffff810176e8
    /usr/src/linux/arch/x86/kernel/dumpstack.c: 231
 #3 [ffff8800792db8e0] no_context at ffffffff8169af1f
    /usr/src/linux/arch/x86/mm/fault.c: 724
 #4 [ffff8800792db940] __bad_area_nosemaphore at ffffffff8169aff6
    /usr/src/linux/arch/x86/mm/fault.c: 804
 #5 [ffff8800792db990] bad_area at ffffffff8169b31f
    /usr/src/linux/arch/x86/mm/fault.c: 833
 #6 [ffff8800792db9c0] __do_page_fault at ffffffff81059b37
    /usr/src/linux/arch/x86/mm/fault.c: 1213
 #7 [ffff8800792dbae0] do_page_fault at ffffffff81059c11
    /usr/src/linux/arch/x86/mm/fault.c: 1295
 #8 [ffff8800792dbb10] page_fault at ffffffff816a8a28
    /usr/src/linux/arch/x86/kernel/entry_64.S: 1283
    [exception RIP: skb_queue_tail+55]
    RIP: ffffffff81594c57  RSP: ffff8800792dbbc8  RFLAGS: 00010046
    RAX: 0000000000000292  RBX: ffff88007cbc6d10  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000292  RDI: ffff88007cbc6d24
    RBP: ffff8800792dbbe8   R8: 0000000000000292   R9: 0180000002800000
    R10: 0000000700020008  R11: 0000000000000000  R12: ffff88007b65aa00
    R13: ffff88007cbc6d24  R14: 0000000000000000  R15: ffff88007cbc6c80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8800792dbbf0] unix_stream_sendmsg at ffffffff816491bd
    /usr/src/linux/net/unix/af_unix.c: 1711
#10 [ffff8800792dbcb0] sock_aio_write at ffffffff8158c0c3
    /usr/src/linux/net/socket.c: 955
#11 [ffff8800792dbd90] do_sync_readv_writev at ffffffff811b42ec
    /usr/src/linux/fs/read_write.c: 697
#12 [ffff8800792dbe20] do_readv_writev at ffffffff811b5c95
    /usr/src/linux/fs/read_write.c: 851
#13 [ffff8800792dbf20] vfs_writev at ffffffff811b5db9
    /usr/src/linux/fs/read_write.c: 893
#14 [ffff8800792dbf30] sys_writev at ffffffff811b5eea
    /usr/src/linux/fs/read_write.c: 926
#15 [ffff8800792dbf80] system_call_fastpath at ffffffff816a6ae9
    /usr/src/linux/arch/x86/kernel/entry_64.S: 423
    RIP: 00007f3d056223c0  RSP: 00007ffff316be40  RFLAGS: 00003293
    RAX: ffffffffffffffda  RBX: ffffffff816a6ae9  RCX: ffffffffffffffff
    RDX: 0000000000000001  RSI: 00007ffff316af90  RDI: 0000000000000014
    RBP: 0000000001d59be0   R8: 0000000000000000   R9: 0000000000000004
    R10: 00000000ffffffff  R11: 0000000000003293  R12: 00007f3d077406a0
    R13: 0000000000000001  R14: 00007ffff316af90  R15: 0000000000000000
    ORIG_RAX: 0000000000000014  CS: 0033  SS: 002b
---------- Crash pattern 2 end ----------

---------- Crash pattern 3 start ----------
[   88.675004] [TTM] Failed allocating page table
[   88.678152] BUG: unable to handle kernel paging request at ffff8801531d77c0
[   88.679845] IP: [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0
[   88.681221] PGD 1f2b067 PUD 0
[   88.682000] Oops: 0002 [#1] SMP
[   88.682838] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mirror dm_region_hash aesni_intel dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev
vmw_balloon microcode serio_raw pcspkr parport_pc shpchp vmw_vmci parport i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput sd_mod ata_generic pata_acpi e1000 ata_piix libata mptspi scsi_transport_spi mptscsih mptbase floppy
[   88.701377] CPU: 0 PID: 3904 Comm: gnome-shell Tainted: G        W  OE  3.19.0-rc5+ #31
[   88.703292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   88.705840] task: ffff880079e05780 ti: ffff88007918c000 task.ti: ffff88007918c000
[   88.707575] RIP: 0010:[<ffffffff815964b5>]  [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0
[   88.709601] RSP: 0018:ffff88007918faa8  EFLAGS: 00010246
[   88.710884] RAX: 00000000ffffffff RBX: ffff8800531d7700 RCX: 00000000ffffffff
[   88.712584] RDX: ffff8801531d77c0 RSI: 0000000000000000 RDI: ffff8800531d77c8
[   88.714260] RBP: ffff88007918faf8 R08: 00000000ffffffc0 R09: 0000000000000200
[   88.715927] R10: ffffffff8159639e R11: ffff88007f803700 R12: ffff8800531d7800
[   88.717648] R13: 00000000ffffffff R14: ffff88007f803700 R15: 0000000000000100
[   88.719327] FS:  00007fcafd8aaa00(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[   88.721216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.722548] CR2: ffff8801531d77c0 CR3: 00000000790ae000 CR4: 00000000000407f0
[   88.724257] Stack:
[   88.724761]  ffff88007a2cdf00 000000007a01b800 0000000000000246 00ff88007918fcd8
[   88.726741]  ffff88007918fae8 0000000000000003 0000000000000000 ffff88007918fba8
[   88.728578]  ffff88007a01b800 0000000000000000 ffff88007918fb58 ffffffff81596d5c
[   88.730582] Call Trace:
[   88.731243]  [<ffffffff81596d5c>] alloc_skb_with_frags+0x5c/0x1e0
[   88.732725]  [<ffffffff811c99bc>] ? do_sys_poll+0x12c/0x5b0
[   88.734208]  [<ffffffff815910b6>] sock_alloc_send_pskb+0x196/0x250
[   88.735710]  [<ffffffff8159b887>] ? skb_copy_datagram_from_iter+0xe7/0x200
[   88.737361]  [<ffffffff8164ba07>] ? wait_for_unix_gc+0x27/0xa0
[   88.738784]  [<ffffffff8164928a>] unix_stream_sendmsg+0x2aa/0x430
[   88.740213]  [<ffffffff8158c0c3>] sock_aio_write+0x103/0x140
[   88.741610]  [<ffffffff811c8860>] ? poll_select_copy_remaining+0x130/0x130
[   88.743278]  [<ffffffff811b42ec>] do_sync_readv_writev+0x4c/0x80
[   88.744721]  [<ffffffff811b5c95>] do_readv_writev+0x1e5/0x280
[   88.746109]  [<ffffffff8158bf9d>] ? SYSC_recvfrom+0x13d/0x160
[   88.747452]  [<ffffffff81104424>] ? __audit_syscall_entry+0xb4/0x110
[   88.748992]  [<ffffffff811b5db9>] vfs_writev+0x39/0x50
[   88.750192]  [<ffffffff811b5eea>] SyS_writev+0x4a/0xd0
[   88.751423]  [<ffffffff811046b6>] ? __audit_syscall_exit+0x236/0x2e0
[   88.753121]  [<ffffffff816a6ae9>] system_call_fastpath+0x12/0x17
[   88.754650] Code: b6 83 90 00 00 00 83 e0 f7 09 c8 b9 ff ff ff ff 85 f6 88 83 90 00 00 00 b8 ff ff ff ff 66 89 8b c2 00 00 00 66 89 83 c6 00 00 00 <48> c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 c7 42 10 00 00
[   88.761554] RIP  [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0
[   88.763077]  RSP <ffff88007918faa8>
[   88.763978] CR2: ffff8801531d77c0

crash> bt -l
PID: 3904   TASK: ffff880079e05780  CPU: 0   COMMAND: "gnome-shell"
 #0 [ffff88007918f690] machine_kexec at ffffffff8104ef62
    /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320
 #1 [ffff88007918f6e0] crash_kexec at ffffffff810ed983
    /usr/src/linux/kernel/kexec.c: 1482
 #2 [ffff88007918f7b0] oops_end at ffffffff810176e8
    /usr/src/linux/arch/x86/kernel/dumpstack.c: 231
 #3 [ffff88007918f7e0] no_context at ffffffff8169af1f
    /usr/src/linux/arch/x86/mm/fault.c: 724
 #4 [ffff88007918f840] __bad_area_nosemaphore at ffffffff8169aff6
    /usr/src/linux/arch/x86/mm/fault.c: 804
 #5 [ffff88007918f890] bad_area_nosemaphore at ffffffff8169b162
    /usr/src/linux/arch/x86/mm/fault.c: 812
 #6 [ffff88007918f8a0] __do_page_fault at ffffffff810596f8
    /usr/src/linux/arch/x86/mm/fault.c: 1277
 #7 [ffff88007918f9c0] do_page_fault at ffffffff81059c11
    /usr/src/linux/arch/x86/mm/fault.c: 1295
 #8 [ffff88007918f9f0] page_fault at ffffffff816a8a28
    /usr/src/linux/arch/x86/kernel/entry_64.S: 1283
    [exception RIP: __alloc_skb+357]
    RIP: ffffffff815964b5  RSP: ffff88007918faa8  RFLAGS: 00010246
    RAX: 00000000ffffffff  RBX: ffff8800531d7700  RCX: 00000000ffffffff
    RDX: ffff8801531d77c0  RSI: 0000000000000000  RDI: ffff8800531d77c8
    RBP: ffff88007918faf8   R8: 00000000ffffffc0   R9: 0000000000000200
    R10: ffffffff8159639e  R11: ffff88007f803700  R12: ffff8800531d7800
    R13: 00000000ffffffff  R14: ffff88007f803700  R15: 0000000000000100
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88007918fb00] alloc_skb_with_frags at ffffffff81596d5c
    /usr/src/linux/net/core/skbuff.c: 4386
#10 [ffff88007918fb60] sock_alloc_send_pskb at ffffffff815910b6
    /usr/src/linux/net/core/sock.c: 1826
#11 [ffff88007918fbf0] unix_stream_sendmsg at ffffffff8164928a
    /usr/src/linux/net/unix/af_unix.c: 1682
#12 [ffff88007918fcb0] sock_aio_write at ffffffff8158c0c3
    /usr/src/linux/net/socket.c: 955
#13 [ffff88007918fd90] do_sync_readv_writev at ffffffff811b42ec
    /usr/src/linux/fs/read_write.c: 697
#14 [ffff88007918fe20] do_readv_writev at ffffffff811b5c95
    /usr/src/linux/fs/read_write.c: 851
#15 [ffff88007918ff20] vfs_writev at ffffffff811b5db9
    /usr/src/linux/fs/read_write.c: 893
#16 [ffff88007918ff30] sys_writev at ffffffff811b5eea
    /usr/src/linux/fs/read_write.c: 926
#17 [ffff88007918ff80] system_call_fastpath at ffffffff816a6ae9
    /usr/src/linux/arch/x86/kernel/entry_64.S: 423
    RIP: 00007fcaf3c273c0  RSP: 00007fffadd91330  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff816a6ae9  RCX: 00007fffadd91360
    RDX: 0000000000000002  RSI: 00007fffadd914b0  RDI: 0000000000000006
    RBP: 0000000000b5c230   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000293  R12: 00007fffadd91428
    R13: 00007fffadd91424  R14: 0000000000b5c248  R15: 0000000000000001
    ORIG_RAX: 0000000000000014  CS: 0033  SS: 002b
---------- Crash pattern 3 end ----------

---------- Failed memory allocation start ----------
0xffffffff81199850 : __kmalloc+0x0/0x280 [kernel]
    /usr/src/linux/mm/slub.c:3247
0xffffffff814676fa : ttm_tt_init+0x8a/0xb0 [kernel]
    /usr/src/linux/include/linux/slab.h:524
    /usr/src/linux/include/linux/slab.h:535
    /usr/src/linux/include/drm/drm_mem_util.h:38
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_tt.c:53
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_tt.c:200
0xffffffff8147caa6 : vmw_ttm_tt_create+0x76/0xb0 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c:700
0xffffffff81467b8d : ttm_bo_add_ttm+0x9d/0xe0 [kernel]
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:238
0xffffffff8146a2ff : ttm_bo_validate+0x14f/0x1f0 [kernel]
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:1067
0xffffffff8146a5d4 : ttm_bo_init+0x234/0x470 [kernel]
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:1167
0xffffffff8147ae9e : vmw_dmabuf_init+0x13e/0x240 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:435
0xffffffff8147b0cb : vmw_user_dmabuf_alloc+0x8b/0x120 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:503
0xffffffff8147b202 : vmw_dmabuf_alloc_ioctl+0x52/0xb0 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:698
0xffffffff814497a4 : drm_ioctl+0x1a4/0x630 [kernel]
    /usr/src/linux/drivers/gpu/drm/drm_ioctl.c:727
0xffffffff814773c9 : vmw_generic_ioctl+0x169/0x260 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1073
0xffffffff814774f5 : vmw_unlocked_ioctl+0x15/0x20 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1084
0xffffffff811c7c18 : do_vfs_ioctl+0x2f8/0x510 [kernel]
    /usr/src/linux/fs/ioctl.c:44
    /usr/src/linux/fs/ioctl.c:602
0xffffffff811c7e71 : sys_ioctl+0x41/0x80 [kernel]
    /usr/src/linux/include/linux/file.h:38
    /usr/src/linux/fs/ioctl.c:618
    /usr/src/linux/fs/ioctl.c:608
0xffffffff816a6ae9 : system_call_fastpath+0x12/0x17 [kernel]
    /usr/src/linux/arch/x86/kernel/entry_64.S:423
---------- Failed memory allocation end ----------

If I skip the ttm_tt_destroy() call, this bug no longer occurs. Therefore,
I guess that this memory corruption is caused by the destroy function
being called on a partially initialized ttm object.
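
One way a destroy call on a partially initialized object could corrupt memory
is a double release: the init routine tears the object down on failure, and
the caller's error path then releases it again. This is only an assumption
for illustration and is not confirmed by the traces above. The userspace
sketch below uses hypothetical names (struct tt, struct driver_tt,
tt_init_destroy_on_fail() and so on), none of which are taken from the
kernel, and it counts releases instead of really freeing twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct tt;

struct tt_ops {
	void (*destroy)(struct tt *tt);	/* releases the object embedding tt */
};

struct tt {
	const struct tt_ops *ops;
	void **pages;			/* NULL until the page table exists */
};

struct driver_tt {
	struct tt base;			/* embedded in a driver-private wrapper */
	int driver_state;
};

static int release_count;		/* releases seen for the current object */

void driver_tt_destroy(struct tt *tt)
{
	/* A real driver would free the containing object here. */
	(void)tt;
	release_count++;
}

static const struct tt_ops driver_ops = { .destroy = driver_tt_destroy };

/* Suspected pattern: init destroys the object itself when allocation fails. */
int tt_init_destroy_on_fail(struct tt *tt, int fail_alloc)
{
	tt->pages = fail_alloc ? NULL : calloc(4, sizeof(*tt->pages));
	if (!tt->pages) {
		tt->ops->destroy(tt);	/* first release */
		return -1;
	}
	return 0;
}

/* Alternative: init only reports the error and leaves cleanup to the caller. */
int tt_init_report_only(struct tt *tt, int fail_alloc)
{
	tt->pages = fail_alloc ? NULL : calloc(4, sizeof(*tt->pages));
	return tt->pages ? 0 : -1;
}

/*
 * Caller that releases its wrapper when init fails.
 * Returns how many times the object was released for one failed creation.
 */
int create_and_count_releases(int (*init)(struct tt *, int))
{
	struct driver_tt *d = calloc(1, sizeof(*d));

	assert(d != NULL);		/* demo only; real code must handle this */
	release_count = 0;
	d->base.ops = &driver_ops;
	if (init(&d->base, 1) != 0)
		release_count++;	/* caller's own error-path release */

	free(d->base.pages);		/* NULL here, so this is a no-op */
	free(d);			/* the single real free keeps the demo safe */
	return release_count;
}
```

With an injected allocation failure, create_and_count_releases(tt_init_destroy_on_fail)
counts two releases of the same object, while
create_and_count_releases(tt_init_report_only) counts one. In this sketch the
second variant gives the object exactly one owner on the error path.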

 

I can reproduce this problem on kernels at least as old as 3.13.0. I don't know
whether this problem is specific to the vmwgfx code, because I tested only
CentOS 7 with a GUI environment on VMware Player 6.

I think you can reproduce this problem by starting the SystemTap script shown
below and then switching screens using Ctrl-Alt-F1 to Ctrl-Alt-F7.

---------- Reproducer start ----------
# stap -g -e 'global is_target%;
probe begin { printf("Probe start!\n"); }
probe module("ttm").function("ttm_tt_init") { is_target[tid()] = 1; }
probe module("ttm").function("ttm_tt_init").return { is_target[tid()] = 0; }
probe kernel.function("__kmalloc") {
  if (($flags & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} && is_target[tid()]) {
    print_backtrace();
    $size = 1 << 30;
    exit();
  }
}
probe end { delete is_target; }'
---------- Reproducer end ----------

I can also reproduce the problem below using 3.10.0-123.9.3.el7.x86_64,
though it might be a different problem from the one above.

---------- Crash pattern 4 start ----------
[TTM] Failed allocating page table
------------[ cut here ]------------
WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
list_add corruption. prev->next should be next (ffff88007af4cd98), but was           (null). (prev=ffff88007ac881f0).
Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper vmw_balloon ablk_helper cryptd serio_raw parport_pc i2c_piix4 parport
vmw_vmci pcspkr dm_mirror shpchp dm_region_hash dm_log mperf dm_mod nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi crc32c_intel vmwgfx mptspi ttm scsi_transport_spi mptscsih ahci ata_piix libahci drm mptbase libata e1000 i2c_core floppy [last unloaded: stap_bad36894e80d53e8ee72ce3ee48a27ac_3394]
CPU: 0 PID: 849 Comm: Xorg Tainted: GF       W  O--------------   3.10.0-123.9.3.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
 ffff88007984da10 00000000da42f7a4 ffff88007984d9c8 ffffffff815e239b
 ffff88007984da00 ffffffff8105dee1 ffff88007ac881f0 ffff88007af4cd98
 ffff88007ac881f0 0000000000000282 ffff88007984db98 ffff88007984da68
Call Trace:
 [<ffffffff815e239b>] dump_stack+0x19/0x1b
 [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
 [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff812cfeec>] __list_add+0xac/0xc0
 [<ffffffffa01a56e9>] vmw_fence_create+0xd9/0x130 [vmwgfx]
 [<ffffffffa0197ef8>] vmw_execbuf_fence_commands+0xc8/0x120 [vmwgfx]
 [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx]
 [<ffffffff81194585>] ? __kmalloc+0x55/0x230
 [<ffffffffa0199af8>] do_dmabuf_dirty_sou.isra.9+0x328/0x3c0 [vmwgfx]
 [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm]
 [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm]
 [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx]
 [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230
 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm]
 [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm]
 [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540
 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70
 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0
 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx]
 [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0
 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0
 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b
---[ end trace a993c155f4775b96 ]---
------------[ cut here ]------------
WARNING: at lib/list_debug.c:36 __list_add+0x8a/0xc0()
list_add double add: new=ffff88007ac881f0, prev=ffff88007ac881f0, next=ffff88007af4cd98.
Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper vmw_balloon ablk_helper cryptd serio_raw parport_pc i2c_piix4 parport
vmw_vmci pcspkr dm_mirror shpchp dm_region_hash dm_log mperf dm_mod nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi crc32c_intel vmwgfx mptspi ttm scsi_transport_spi mptscsih ahci ata_piix libahci drm mptbase libata e1000 i2c_core floppy [last unloaded: stap_bad36894e80d53e8ee72ce3ee48a27ac_3394]
CPU: 0 PID: 849 Comm: Xorg Tainted: GF       W  O--------------   3.10.0-123.9.3.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
 ffff88007984da10 00000000da42f7a4 ffff88007984d9c8 ffffffff815e239b
 ffff88007984da00 ffffffff8105dee1 ffff88007ac881f0 ffff88007af4cd98
 ffff88007ac881f0 0000000000000282 ffff88007984db98 ffff88007984da68
Call Trace:
 [<ffffffff815e239b>] dump_stack+0x19/0x1b
 [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
 [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff812cfeca>] __list_add+0x8a/0xc0
 [<ffffffffa01a56e9>] vmw_fence_create+0xd9/0x130 [vmwgfx]
 [<ffffffffa0197ef8>] vmw_execbuf_fence_commands+0xc8/0x120 [vmwgfx]
 [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx]
 [<ffffffff81194585>] ? __kmalloc+0x55/0x230
 [<ffffffffa0199af8>] do_dmabuf_dirty_sou.isra.9+0x328/0x3c0 [vmwgfx]
 [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm]
 [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm]
 [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx]
 [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230
 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm]
 [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm]
 [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540
 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70
 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0
 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx]
 [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0
 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0
 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b
---[ end trace a993c155f4775b97 ]---
INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=60019 jiffies, g=6722, c=6721, q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 849 Comm: Xorg Tainted: GF       W  O--------------   3.10.0-123.9.3.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff880077655b00 ti: ffff88007984c000 task.ti: ffff88007984c000
RIP: 0010:[<ffffffff8108ece5>]  [<ffffffff8108ece5>] __wake_up_common+0x5/0x90
RSP: 0018:ffff88007984d9d0  EFLAGS: 00000046
RAX: 0000000000000046 RBX: ffff88007ac88220 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007ac88220
RBP: ffff88007984da00 R08: 0000000000000000 R09: ffff88007f617320
R10: ffffea000173f700 R11: ffffffffa01a462d R12: 0000000000000046
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
FS:  00007faaaca78980(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007faaa4b3c000 CR3: 000000007baaf000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffff81090af9 ffff88007af4cd80 ffff88007ac881e0 ffff88007ac881f0
 ffff88007984da48 ffff88007ac881e0 ffff88007984da88 ffffffffa01a524b
 ffffc90008680018 ffff88007af4cda8 ffff88007af4cdd8 0000000000000292
Call Trace:
 [<ffffffff81090af9>] ? __wake_up+0x39/0x50
 [<ffffffffa01a524b>] vmw_fences_update+0x11b/0x220 [vmwgfx]
 [<ffffffffa01a2568>] vmw_update_seqno+0x48/0x50 [vmwgfx]
 [<ffffffffa01a2073>] vmw_fifo_send_fence+0x93/0xe0 [vmwgfx]
 [<ffffffffa0197e85>] vmw_execbuf_fence_commands+0x55/0x120 [vmwgfx]
 [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx]
 [<ffffffffa01998d0>] do_dmabuf_dirty_sou.isra.9+0x100/0x3c0 [vmwgfx]
 [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm]
 [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm]
 [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx]
 [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230
 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm]
 [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm]
 [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540
 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70
 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0
 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx]
 [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0
 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0
 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b
Code: 49 0f af c0 e9 64 ff ff ff 0f 1f 44 00 00 44 8d 4a ff 31 c0 45 31 c0 4d 63 c9 e9 4e ff ff ff 0f 1f 80 00 00 00 00 66 66 66 66 90 <55> 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67
---------- Crash pattern 4 end ----------

Comments

Gu Jinxiang July 14, 2020, 9:13 a.m. UTC | #1
Hi,

I've encountered a [BUG: unable to handle kernel NULL pointer dereference at] oops whose call stack looks like your pattern 2.
Before it happened, I got a lot of memory allocation failure warnings.
My kernel is 3.10.0-327.62.1.el7.x86_64.

Since you mentioned it may be a bug in drm/ttm, I checked drm/ttm for a possible patch that fixes this problem, but found nothing.
Could you please tell me whether there has been any progress on the problem you detected?

Best wishes!

Jinxiang, Gu
Tetsuo Handa July 14, 2020, 10:29 a.m. UTC | #2
On 2020/07/14 18:13, Gu Jinxiang wrote:
> I've encountered a [BUG: unable to handle kernel NULL pointer dereference at] oops whose call stack looks like your pattern 2.
> Before it happened, I got a lot of memory allocation failure warnings.
> My kernel is 3.10.0-327.62.1.el7.x86_64.
>
> Since you mentioned it may be a bug in drm/ttm, I checked drm/ttm for a possible patch that fixes this problem, but found nothing.
> Could you please tell me whether there has been any progress on the problem you detected?

I'm not aware of any progress on https://patchwork.kernel.org/patch/5681611/ .
Dave Airlie July 29, 2020, 6:20 a.m. UTC | #3
On Wed, 15 Jul 2020 at 17:00, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/07/14 18:13, Gu Jinxiang wrote:
> > I've encountered a [BUG: unable to handle kernel NULL pointer dereference at] oops whose call stack looks like your pattern 2.
> > Before it happened, I got a lot of memory allocation failure warnings.
> > My kernel is 3.10.0-327.62.1.el7.x86_64.
> >
> > Since you mentioned it may be a bug in drm/ttm, I checked drm/ttm for a possible patch that fixes this problem, but found nothing.
> > Could you please tell me whether there has been any progress on the problem you detected?
>
> I'm not aware of any progress on https://patchwork.kernel.org/patch/5681611/ .

Just found this email; I've hopefully fixed this issue in my drm-next tree with

https://patchwork.freedesktop.org/patch/380782/

Dave.

Patch

--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -199,8 +199,8 @@  int ttm_tt_init(struct ttm_tt *ttm, struct ttm_bo_device *bdev,
 
 	ttm_tt_alloc_page_directory(ttm);
 	if (!ttm->pages) {
-		ttm_tt_destroy(ttm);
-		pr_err("Failed allocating page table\n");
+		//ttm_tt_destroy(ttm);
+		pr_err("Failed allocating page table, but skip ttm_tt_destroy()\n");
 		return -ENOMEM;
 	}
 	return 0;