diff mbox

Kernel Crash on iSCSI volume create

Message ID 1496211990.27407.144.camel@haakon3.risingtidesystems.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nicholas A. Bellinger May 31, 2017, 6:26 a.m. UTC
On Fri, 2017-05-26 at 09:51 -0500, Bryant G. Ly wrote:
> On 5/26/17 1:51 AM, Nicholas A. Bellinger wrote:
> 
> > Hey Bryant,
> >
> > On Wed, 2017-05-24 at 13:34 -0500, Bryant G. Ly wrote:
> >> Hi Nick,
> >>
> >> We are seeing the below kernel crash with a block backstore.
> >>
> >> The steps to reproduce are: Discovered a new iSCSI volume, created a new ibmvscsis instance, mapped it.
> >>
> > Mmmm, based upon the INQUIRY string this looks like a PSCSI backstore,
> > yes..?
> 
> No, its a block backstore, we dont support PSCSI via our management console (Novalink).
> I don't know why the inquiry string would show it as a PSCSI backstore.
> 

Ah, was getting confused by the initiator INQUIRY strings..

> >
> > So a iscsi initiator provided struct scsi_device is registered
> > with /sys/kernel/config/target/core/pscsi_*, and then mapped to a
> > ibmvscsis lun, right..?
> >
> > Btw looking at 4.8.17, nothing has changed since Jan
> > in /drivers/target/, and certainly nothing related to configfs operation
> > well before that..
> >
> > I assume this not a new regression, right..?
> 
> We just started testing with iSCSI, so we did not see it before.
> 
> >> [ 7872.359650] scsi host6: iSCSI Initiator over TCP/IP
> >> [ 7872.621777] scsi 6:0:0:0: Direct-Access     IBM      2145             0000 PQ: 0 ANSI: 6
> >> [ 7872.665105] sd 6:0:0:0: Attached scsi generic sg14 type 0
> >> [ 7872.665600] sd 6:0:0:0: [sdg] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
> >> [ 7872.666986] sd 6:0:0:0: [sdg] Write Protect is off
> >> [ 7872.666990] sd 6:0:0:0: [sdg] Mode Sense: 97 00 10 08
> >> [ 7872.667691] sd 6:0:0:0: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
> >> [ 7872.680928] sd 6:0:0:0: [sdg] Attached SCSI disk
> >> [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
> >> [ 7879.893760] ------------[ cut here ]------------
> >> [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
> >> [ 7879.893768] Modules linked in: target_core_pscsi target_core_file target_core_iblock rpadlpar_io rpaphp ip6table_raw xt_CT xt_mac xt_tcpudp xt_comment xt_physdev xt_set ip_set_hash_net ip_set iscsi_target_mod iptable_raw dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag openvswitch nf_nat_ipv6 libcrc32c target_core_user uio binfmt_misc pseries_rng ibmvmc(O) vmx_crypto rtc_generic xt_conntrack nf_nat_ftp nf_conntrack_ftp nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv6 nf_defrag_ipv6 nbd ipt_REJECT nf_reject_ipv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables ip6table_filter ip6_tables ebtables x_tables br_netfilter
> >> [ 7879.893797]  bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 ses enclosure scsi_transport_sas ibmvscsis target_core_mod configfs ibmvscsi ibmveth(O) qla2xxx scsi_transport_fc ipr
> >> [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O    4.8.17-customv2.22 #12
> >> [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
> >> [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
> >> [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G           O     (4.8.17-customv2.22)
> >> [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
> >> [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
> >>                  GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
> >>                  GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
> >>                  GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
> >>                  GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
> >>                  GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
> >>                  GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
> >>                  GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
> >>                  GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
> >> [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
> >> [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
> >> [ 7879.893842] Call Trace:
> >> [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
> >> [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
> >> [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
> >> [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
> >> [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
> >> [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
> >> [ 7879.893856] Instruction dump:
> >> [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
> >> [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
> >> [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---
> >> [ 7912.992583] sd 6:0:0:0: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> >> [ 7912.992587] sd 6:0:0:0: [sdg] tag#0 Sense Key : Illegal Request [current]
> >> [ 7912.992589] sd 6:0:0:0: [sdg] tag#0 Add. Sense: Logical unit not supported
> >> [ 7912.992591] sd 6:0:0:0: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 80 00
> >> [ 7912.992593] blk_update_request: I/O error, dev sdg, sector 0
> >>
> > The kref_get() on struct config_item->ci_kref in config_item_get() is
> > triggering a sanity check warning struct kref has reached zero, while a
> > new kref_get() is attempting to take a new reference.
> >
> > Normally it means kref_get() should be using kref_get_unless_zero(), but
> > since this has never been reported before, it makes me think something
> > else outside of configfs proper is going on..
> >
> > What configfs operations are triggering this warning, and what do the
> > configfs operations preceding it look like..?
> >
> > Also, is this specific to PSCSI export or does it trigger independent of
> > target-core backend driver..?
> 
> They were attaching a iSCSI volume, so basically we do a discover/login and then targetcli
> is called to do the mappings. Also between the attach there were a few ls queries.
> All this is done via Novalink API(Virtualization management).
> 
> So the only configfs operations are through the mapping (backstores/block create) and the ls queries.
> 

Ok, after doing further investigation in fs/configfs, it looks like the
config_item_get() triggering WARN_ON() here is most likely coming from
configfs_symlink() -> create_link().

So assuming this is the code-path in question, it looks like a backend
device is being released via:

   rmdir /sys/kernel/config/target/core/iblock_XXXX/$FOO

at the same time another process is attempting to symlink the same
configfs location via target_fabric_port_link() to a ibmvscsis port via:

   ln -s /sys/kernel/config/target/core/iblock_XXX/$FOO \
         /sys/kernel/config/target/ibmvscsis/$WWPN/$TPGT/lun/lun_XXX/$LINK

So let's first confirm this is in fact what's triggering the kref_get()
warning.  With CONFIG_DYNAMIC_DEBUG=y, and enabling full debug with:

   echo 'module target_core_mod +p' > /debug/dynamic_debug/control

should do the trick.

There is enough debug in target-core-mod already to
verify configfs_symlink() -> type->ct_item_ops->allow_link()
target_fabric_port_link() -> core_dev_add_lun() is what's getting
invoked, which happens immediately before create_link() ->
config_item_get() -> kref_get() gets called and hits the WARN_ON.

That said, I do see a race in fs/configfs/symlink.c:create_link() that
might explain what you've observed..

Looking at create_link(), the config_item_get() -> kref_get() for the
struct config_item (eg: se_dev_ci from target_fabric_port_link) symlink
target is attempted before the check for 
target_sd->s_type & CONFIGFS_USET_DROPPING occurs, eg:

        sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
        if (sl) {
                sl->sl_target = config_item_get(item);
                spin_lock(&configfs_dirent_lock);
                if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
                        spin_unlock(&configfs_dirent_lock);
                        config_item_put(item);
                        kfree(sl);
                        return -ENOENT;
                }
                list_add(&sl->sl_list, &target_sd->s_links);
                spin_unlock(&configfs_dirent_lock);
                ret = configfs_create_link(sl, parent_item->ci_dentry,
                                           dentry);

which means configfs_rmdir() -> configfs_detach_prep() setting
CONFIGFS_USET_DROPPING and subsequently doing the final:

        /* Drop our reference from above */
        config_item_put(item);

could happen while config_item_get() is called from create_link(),
before the CONFIGFS_USET_DROPPING check occurs..

Here's a quick (untested) patch to close that potential race.

Please confirm with and without the following with debug enabled, and
let's see what we find.


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Nicholas A. Bellinger June 1, 2017, 5 a.m. UTC | #1
Hey Bryant,

After spending more time reviewing fs/configfs code today, I've
convenienced myself this is a long standing configfs race between
config_group shutdown and symlink creation.

Have you been able to verify the particular scenario is in fact what's
happening..?

On Tue, 2017-05-30 at 23:26 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2017-05-26 at 09:51 -0500, Bryant G. Ly wrote:
> > On 5/26/17 1:51 AM, Nicholas A. Bellinger wrote:
> > 
> > > Hey Bryant,
> > >
> > > On Wed, 2017-05-24 at 13:34 -0500, Bryant G. Ly wrote:
> > >> Hi Nick,
> > >>
> > >> We are seeing the below kernel crash with a block backstore.
> > >>
> > >> The steps to reproduce are: Discovered a new iSCSI volume, created a new ibmvscsis instance, mapped it.
> > >>
> > > Mmmm, based upon the INQUIRY string this looks like a PSCSI backstore,
> > > yes..?
> > 
> > No, its a block backstore, we dont support PSCSI via our management console (Novalink).
> > I don't know why the inquiry string would show it as a PSCSI backstore.
> > 
> 
> Ah, was getting confused by the initiator INQUIRY strings..
> 
> > >
> > > So a iscsi initiator provided struct scsi_device is registered
> > > with /sys/kernel/config/target/core/pscsi_*, and then mapped to a
> > > ibmvscsis lun, right..?
> > >
> > > Btw looking at 4.8.17, nothing has changed since Jan
> > > in /drivers/target/, and certainly nothing related to configfs operation
> > > well before that..
> > >
> > > I assume this not a new regression, right..?
> > 
> > We just started testing with iSCSI, so we did not see it before.
> > 
> > >> [ 7872.359650] scsi host6: iSCSI Initiator over TCP/IP
> > >> [ 7872.621777] scsi 6:0:0:0: Direct-Access     IBM      2145             0000 PQ: 0 ANSI: 6
> > >> [ 7872.665105] sd 6:0:0:0: Attached scsi generic sg14 type 0
> > >> [ 7872.665600] sd 6:0:0:0: [sdg] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
> > >> [ 7872.666986] sd 6:0:0:0: [sdg] Write Protect is off
> > >> [ 7872.666990] sd 6:0:0:0: [sdg] Mode Sense: 97 00 10 08
> > >> [ 7872.667691] sd 6:0:0:0: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
> > >> [ 7872.680928] sd 6:0:0:0: [sdg] Attached SCSI disk
> > >> [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
> > >> [ 7879.893760] ------------[ cut here ]------------
> > >> [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
> > >> [ 7879.893768] Modules linked in: target_core_pscsi target_core_file target_core_iblock rpadlpar_io rpaphp ip6table_raw xt_CT xt_mac xt_tcpudp xt_comment xt_physdev xt_set ip_set_hash_net ip_set iscsi_target_mod iptable_raw dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag openvswitch nf_nat_ipv6 libcrc32c target_core_user uio binfmt_misc pseries_rng ibmvmc(O) vmx_crypto rtc_generic xt_conntrack nf_nat_ftp nf_conntrack_ftp nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv6 nf_defrag_ipv6 nbd ipt_REJECT nf_reject_ipv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables ip6table_filter ip6_tables ebtables x_tables br_netfilter
> > >> [ 7879.893797]  bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 ses enclosure scsi_transport_sas ibmvscsis target_core_mod configfs ibmvscsi ibmveth(O) qla2xxx scsi_transport_fc ipr
> > >> [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O    4.8.17-customv2.22 #12
> > >> [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
> > >> [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
> > >> [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G           O     (4.8.17-customv2.22)
> > >> [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
> > >> [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
> > >>                  GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
> > >>                  GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
> > >>                  GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
> > >>                  GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
> > >>                  GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
> > >>                  GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
> > >>                  GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
> > >>                  GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
> > >> [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
> > >> [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
> > >> [ 7879.893842] Call Trace:
> > >> [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
> > >> [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
> > >> [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
> > >> [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
> > >> [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
> > >> [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
> > >> [ 7879.893856] Instruction dump:
> > >> [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
> > >> [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
> > >> [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---
> > >> [ 7912.992583] sd 6:0:0:0: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > >> [ 7912.992587] sd 6:0:0:0: [sdg] tag#0 Sense Key : Illegal Request [current]
> > >> [ 7912.992589] sd 6:0:0:0: [sdg] tag#0 Add. Sense: Logical unit not supported
> > >> [ 7912.992591] sd 6:0:0:0: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 80 00
> > >> [ 7912.992593] blk_update_request: I/O error, dev sdg, sector 0
> > >>
> > > The kref_get() on struct config_item->ci_kref in config_item_get() is
> > > triggering a sanity check warning struct kref has reached zero, while a
> > > new kref_get() is attempting to take a new reference.
> > >
> > > Normally it means kref_get() should be using kref_get_unless_zero(), but
> > > since this has never been reported before, it makes me think something
> > > else outside of configfs proper is going on..
> > >
> > > What configfs operations are triggering this warning, and what do the
> > > configfs operations preceding it look like..?
> > >
> > > Also, is this specific to PSCSI export or does it trigger independent of
> > > target-core backend driver..?
> > 
> > They were attaching a iSCSI volume, so basically we do a discover/login and then targetcli
> > is called to do the mappings. Also between the attach there were a few ls queries.
> > All this is done via Novalink API(Virtualization management).
> > 
> > So the only configfs operations are through the mapping (backstores/block create) and the ls queries.
> > 
> 
> Ok, after doing further investigation in fs/configfs, it looks like the
> config_item_get() triggering WARN_ON() here is most likely coming from
> configfs_symlink() -> create_link().
> 
> So assuming this is the code-path in question, it looks like a backend
> device is being released via:
> 
>    rmdir /sys/kernel/config/target/core/iblock_XXXX/$FOO
> 
> at the same time another process is attempting to symlink the same
> configfs location via target_fabric_port_link() to a ibmvscsis port via:
> 
>    ln -s /sys/kernel/config/target/core/iblock_XXX/$FOO \
>          /sys/kernel/config/target/ibmvscsis/$WWPN/$TPGT/lun/lun_XXX/$LINK
> 
> So let's first confirm this is in fact what's triggering the kref_get()
> warning.  With CONFIG_DYNAMIC_DEBUG=y, and enabling full debug with:
> 
>    echo 'module target_core_mod +p' > /debug/dynamic_debug/control
> 
> should do the trick.
> 
> There is enough debug in target-core-mod already to
> verify configfs_symlink() -> type->ct_item_ops->allow_link()
> target_fabric_port_link() -> core_dev_add_lun() is what's getting
> invoked, which happens immediately before create_link() ->
> config_item_get() -> kref_get() gets called and hits the WARN_ON.
> 
> That said, I do see a race in fs/configfs/symlink.c:create_link() that
> might explain what you've observed..
> 
> Looking at create_link(), the config_item_get() -> kref_get() for the
> struct config_item (eg: se_dev_ci from target_fabric_port_link) symlink
> target is attempted before the check for 
> target_sd->s_type & CONFIGFS_USET_DROPPING occurs, eg:
> 
>         sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
>         if (sl) {
>                 sl->sl_target = config_item_get(item);
>                 spin_lock(&configfs_dirent_lock);
>                 if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
>                         spin_unlock(&configfs_dirent_lock);
>                         config_item_put(item);
>                         kfree(sl);
>                         return -ENOENT;
>                 }
>                 list_add(&sl->sl_list, &target_sd->s_links);
>                 spin_unlock(&configfs_dirent_lock);
>                 ret = configfs_create_link(sl, parent_item->ci_dentry,
>                                            dentry);
> 
> which means configfs_rmdir() -> configfs_detach_prep() setting
> CONFIGFS_USET_DROPPING and subsequently doing the final:
> 
>         /* Drop our reference from above */
>         config_item_put(item);
> 
> could happen while config_item_get() is called from create_link(),
> before the CONFIGFS_USET_DROPPING check occurs..
> 
> Here's a quick (untested) patch to close that potential race.
> 
> Please confirm with and without the following with debug enabled, and
> let's see what we find.
> 
> diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
> index a6ab012..c8aabba 100644
> --- a/fs/configfs/symlink.c
> +++ b/fs/configfs/symlink.c
> @@ -83,14 +83,13 @@ static int create_link(struct config_item *parent_item,
>         ret = -ENOMEM;
>         sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
>         if (sl) {
> -               sl->sl_target = config_item_get(item);
>                 spin_lock(&configfs_dirent_lock);
>                 if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
>                         spin_unlock(&configfs_dirent_lock);
> -                       config_item_put(item);
>                         kfree(sl);
>                         return -ENOENT;
>                 }
> +               sl->sl_target = config_item_get(item);
>                 list_add(&sl->sl_list, &target_sd->s_links);
>                 spin_unlock(&configfs_dirent_lock);
>                 ret = configfs_create_link(sl, parent_item->ci_dentry,
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bryant G. Ly June 1, 2017, 4:49 p.m. UTC | #2
> Hey Bryant,
>
> After spending more time reviewing fs/configfs code today, I've
> convenienced myself this is a long standing configfs race between
> config_group shutdown and symlink creation.
>
> Have you been able to verify the particular scenario is in fact what's
> happening..?

Hi Nick,

I have been testing, and trying to free up resources.

We only ran into the issue when we had 10 hosts + 1k + vms.

SO far no issue, ill let you know if we see anything.

-Bryant

> On Tue, 2017-05-30 at 23:26 -0700, Nicholas A. Bellinger wrote:
>> On Fri, 2017-05-26 at 09:51 -0500, Bryant G. Ly wrote:
>>> On 5/26/17 1:51 AM, Nicholas A. Bellinger wrote:
>>>
>>>> Hey Bryant,
>>>>
>>>> On Wed, 2017-05-24 at 13:34 -0500, Bryant G. Ly wrote:
>>>>> Hi Nick,
>>>>>
>>>>> We are seeing the below kernel crash with a block backstore.
>>>>>
>>>>> The steps to reproduce are: Discovered a new iSCSI volume, created a new ibmvscsis instance, mapped it.
>>>>>
>>>> Mmmm, based upon the INQUIRY string this looks like a PSCSI backstore,
>>>> yes..?
>>> No, its a block backstore, we dont support PSCSI via our management console (Novalink).
>>> I don't know why the inquiry string would show it as a PSCSI backstore.
>>>
>> Ah, was getting confused by the initiator INQUIRY strings..
>>
>>>> So a iscsi initiator provided struct scsi_device is registered
>>>> with /sys/kernel/config/target/core/pscsi_*, and then mapped to a
>>>> ibmvscsis lun, right..?
>>>>
>>>> Btw looking at 4.8.17, nothing has changed since Jan
>>>> in /drivers/target/, and certainly nothing related to configfs operation
>>>> well before that..
>>>>
>>>> I assume this not a new regression, right..?
>>> We just started testing with iSCSI, so we did not see it before.
>>>
>>>>> [ 7872.359650] scsi host6: iSCSI Initiator over TCP/IP
>>>>> [ 7872.621777] scsi 6:0:0:0: Direct-Access     IBM      2145             0000 PQ: 0 ANSI: 6
>>>>> [ 7872.665105] sd 6:0:0:0: Attached scsi generic sg14 type 0
>>>>> [ 7872.665600] sd 6:0:0:0: [sdg] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
>>>>> [ 7872.666986] sd 6:0:0:0: [sdg] Write Protect is off
>>>>> [ 7872.666990] sd 6:0:0:0: [sdg] Mode Sense: 97 00 10 08
>>>>> [ 7872.667691] sd 6:0:0:0: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
>>>>> [ 7872.680928] sd 6:0:0:0: [sdg] Attached SCSI disk
>>>>> [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
>>>>> [ 7879.893760] ------------[ cut here ]------------
>>>>> [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
>>>>> [ 7879.893768] Modules linked in: target_core_pscsi target_core_file target_core_iblock rpadlpar_io rpaphp ip6table_raw xt_CT xt_mac xt_tcpudp xt_comment xt_physdev xt_set ip_set_hash_net ip_set iscsi_target_mod iptable_raw dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag openvswitch nf_nat_ipv6 libcrc32c target_core_user uio binfmt_misc pseries_rng ibmvmc(O) vmx_crypto rtc_generic xt_conntrack nf_nat_ftp nf_conntrack_ftp nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv6 nf_defrag_ipv6 nbd ipt_REJECT nf_reject_ipv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables ip6table_filter ip6_tables ebtables x_tables br_netfilter
>>>>> [ 7879.893797]  bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 ses enclosure scsi_transport_sas ibmvscsis target_core_mod configfs ibmvscsi ibmveth(O) qla2xxx scsi_transport_fc ipr
>>>>> [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O    4.8.17-customv2.22 #12
>>>>> [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
>>>>> [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
>>>>> [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G           O     (4.8.17-customv2.22)
>>>>> [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
>>>>> [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
>>>>>                   GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
>>>>>                   GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
>>>>>                   GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
>>>>>                   GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
>>>>>                   GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
>>>>>                   GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
>>>>>                   GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
>>>>>                   GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
>>>>> [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
>>>>> [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
>>>>> [ 7879.893842] Call Trace:
>>>>> [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
>>>>> [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
>>>>> [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
>>>>> [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
>>>>> [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
>>>>> [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
>>>>> [ 7879.893856] Instruction dump:
>>>>> [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
>>>>> [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
>>>>> [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---
>>>>> [ 7912.992583] sd 6:0:0:0: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>>>>> [ 7912.992587] sd 6:0:0:0: [sdg] tag#0 Sense Key : Illegal Request [current]
>>>>> [ 7912.992589] sd 6:0:0:0: [sdg] tag#0 Add. Sense: Logical unit not supported
>>>>> [ 7912.992591] sd 6:0:0:0: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 80 00
>>>>> [ 7912.992593] blk_update_request: I/O error, dev sdg, sector 0
>>>>>
>>>> The kref_get() on struct config_item->ci_kref in config_item_get() is
>>>> triggering a sanity check warning struct kref has reached zero, while a
>>>> new kref_get() is attempting to take a new reference.
>>>>
>>>> Normally it means kref_get() should be using kref_get_unless_zero(), but
>>>> since this has never been reported before, it makes me think something
>>>> else outside of configfs proper is going on..
>>>>
>>>> What configfs operations are triggering this warning, and what do the
>>>> configfs operations preceding it look like..?
>>>>
>>>> Also, is this specific to PSCSI export or does it trigger independent of
>>>> target-core backend driver..?
>>> They were attaching a iSCSI volume, so basically we do a discover/login and then targetcli
>>> is called to do the mappings. Also between the attach there were a few ls queries.
>>> All this is done via Novalink API(Virtualization management).
>>>
>>> So the only configfs operations are through the mapping (backstores/block create) and the ls queries.
>>>
>> Ok, after doing further investigation in fs/configfs, it looks like the
>> config_item_get() triggering WARN_ON() here is most likely coming from
>> configfs_symlink() -> create_link().
>>
>> So assuming this is the code-path in question, it looks like a backend
>> device is being released via:
>>
>>     rmdir /sys/kernel/config/target/core/iblock_XXXX/$FOO
>>
>> at the same time another process is attempting to symlink the same
>> configfs location via target_fabric_port_link() to a ibmvscsis port via:
>>
>>     ln -s /sys/kernel/config/target/core/iblock_XXX/$FOO \
>>           /sys/kernel/config/target/ibmvscsis/$WWPN/$TPGT/lun/lun_XXX/$LINK
>>
>> So let's first confirm this is in fact what's triggering the kref_get()
>> warning.  With CONFIG_DYNAMIC_DEBUG=y, and enabling full debug with:
>>
>>     echo 'module target_core_mod +p' > /debug/dynamic_debug/control
>>
>> should do the trick.
>>
>> There is enough debug in target-core-mod already to
>> verify configfs_symlink() -> type->ct_item_ops->allow_link()
>> target_fabric_port_link() -> core_dev_add_lun() is what's getting
>> invoked, which happens immediately before create_link() ->
>> config_item_get() -> kref_get() gets called and hits the WARN_ON.
>>
>> That said, I do see a race in fs/configfs/symlink.c:create_link() that
>> might explain what you've observed..
>>
>> Looking at create_link(), the config_item_get() -> kref_get() for the
>> struct config_item (eg: se_dev_ci from target_fabric_port_link) symlink
>> target is attempted before the check for
>> target_sd->s_type & CONFIGFS_USET_DROPPING occurs, eg:
>>
>>          sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
>>          if (sl) {
>>                  sl->sl_target = config_item_get(item);
>>                  spin_lock(&configfs_dirent_lock);
>>                  if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
>>                          spin_unlock(&configfs_dirent_lock);
>>                          config_item_put(item);
>>                          kfree(sl);
>>                          return -ENOENT;
>>                  }
>>                  list_add(&sl->sl_list, &target_sd->s_links);
>>                  spin_unlock(&configfs_dirent_lock);
>>                  ret = configfs_create_link(sl, parent_item->ci_dentry,
>>                                             dentry);
>>
>> which means configfs_rmdir() -> configfs_detach_prep() setting
>> CONFIGFS_USET_DROPPING and subsequently doing the final:
>>
>>          /* Drop our reference from above */
>>          config_item_put(item);
>>
>> could happen while config_item_get() is called from create_link(),
>> before the CONFIGFS_USET_DROPPING check occurs..
>>
>> Here's a quick (untested) patch to close that potential race.
>>
>> Please confirm with and without the following with debug enabled, and
>> let's see what we find.
>>
>> diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
>> index a6ab012..c8aabba 100644
>> --- a/fs/configfs/symlink.c
>> +++ b/fs/configfs/symlink.c
>> @@ -83,14 +83,13 @@ static int create_link(struct config_item *parent_item,
>>          ret = -ENOMEM;
>>          sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
>>          if (sl) {
>> -               sl->sl_target = config_item_get(item);
>>                  spin_lock(&configfs_dirent_lock);
>>                  if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
>>                          spin_unlock(&configfs_dirent_lock);
>> -                       config_item_put(item);
>>                          kfree(sl);
>>                          return -ENOENT;
>>                  }
>> +               sl->sl_target = config_item_get(item);
>>                  list_add(&sl->sl_list, &target_sd->s_links);
>>                  spin_unlock(&configfs_dirent_lock);
>>                  ret = configfs_create_link(sl, parent_item->ci_dentry,
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bryant G. Ly June 5, 2017, 4:18 p.m. UTC | #3
>
>> Hey Bryant,
>>
>> After spending more time reviewing fs/configfs code today, I've
>> convenienced myself this is a long standing configfs race between
>> config_group shutdown and symlink creation.
>>
>> Have you been able to verify the particular scenario is in fact what's
>> happening..?
>
> Hi Nick,
>
> I have been testing, and trying to free up resources.
>
> We only ran into the issue when we had 10 hosts + 1k + vms.
>
> SO far no issue, ill let you know if we see anything.
>
> -Bryant

We are confident in saying that this fixes our issue now in a GPFS cluster environment.

Thanks,

Bryant.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index a6ab012..c8aabba 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -83,14 +83,13 @@  static int create_link(struct config_item *parent_item,
        ret = -ENOMEM;
        sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
        if (sl) {
-               sl->sl_target = config_item_get(item);
                spin_lock(&configfs_dirent_lock);
                if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
                        spin_unlock(&configfs_dirent_lock);
-                       config_item_put(item);
                        kfree(sl);
                        return -ENOENT;
                }
+               sl->sl_target = config_item_get(item);
                list_add(&sl->sl_list, &target_sd->s_links);
                spin_unlock(&configfs_dirent_lock);
                ret = configfs_create_link(sl, parent_item->ci_dentry,