@@ -1322,14 +1322,14 @@ struct lu_kmem_descr {
extern u32 lu_context_tags_default;
extern u32 lu_session_tags_default;
-/* Generic subset of OSTs */
-struct ost_pool {
+/* Generic subset of tgts */
+struct lu_tgt_pool {
u32 *op_array; /* array of index of
* lov_obd->lov_tgts
*/
- unsigned int op_count; /* number of OSTs in the array */
- unsigned int op_size; /* allocated size of lp_array */
- struct rw_semaphore op_rw_sem; /* to protect ost_pool use */
+ unsigned int op_count; /* number of tgts in the array */
+ unsigned int op_size; /* allocated size of op_array */
+ struct rw_semaphore op_rw_sem; /* to protect lu_tgt_pool use */
};
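
For illustration (not part of the patch), op_rw_sem implies the usual locking discipline: readers hold it shared while walking op_array, because lov_ost_pool_extend() may reallocate the array concurrently. A minimal sketch, with tgt_pool_dump() as a hypothetical helper:

/* Hypothetical helper: walk a lu_tgt_pool under its semaphore. */
static void tgt_pool_dump(struct lu_tgt_pool *op)
{
	unsigned int i;

	down_read(&op->op_rw_sem);
	for (i = 0; i < op->op_count; i++)
		CDEBUG(D_OTHER, "pool[%u] -> tgt index %u\n",
		       i, op->op_array[i]);
	up_read(&op->op_rw_sem);
}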
/* round-robin QoS data for LOD/LMV */
@@ -1338,7 +1338,7 @@ struct lu_qos_rr {
u32 lqr_start_idx; /* start index of new inode */
u32 lqr_offset_idx;/* aliasing for start_idx */
int lqr_start_count;/* reseed counter */
- struct ost_pool lqr_pool; /* round-robin optimized list */
+ struct lu_tgt_pool lqr_pool; /* round-robin optimized list */
unsigned long lqr_dirty:1; /* recalc round-robin list */
};
@@ -1401,13 +1401,30 @@ struct lu_tgt_desc_idx {
struct lu_tgt_desc *ldi_tgt[TGT_PTRS_PER_BLOCK];
};
+/* QoS data for LOD/LMV */
+struct lu_qos {
+ struct list_head lq_svr_list; /* lu_svr_qos list */
+ struct rw_semaphore lq_rw_sem;
+ u32 lq_active_svr_count;
+ unsigned int lq_prio_free; /* priority for free space */
+ unsigned int lq_threshold_rr;/* threshold for rr */
+ struct lu_qos_rr lq_rr; /* round robin qos data */
+ unsigned long lq_dirty:1, /* recalc qos data */
+ lq_same_space:1,/* the servers all have approx.
+ * the same space avail
+ */
+ lq_reset:1; /* zero current penalties */
+};
+
struct lu_tgt_descs {
+ union {
+ struct lov_desc ltd_lov_desc;
+ struct lmv_desc ltd_lmv_desc;
+ };
/* list of known TGTs */
struct lu_tgt_desc_idx *ltd_tgt_idx[TGT_PTRS];
/* Size of the lu_tgts array, guaranteed to be a power of 2 */
u32 ltd_tgts_size;
- /* number of registered TGTs */
- u32 ltd_tgtnr;
/* bitmap of TGTs available */
unsigned long *ltd_tgt_bitmap;
/* TGTs scheduled to be deleted */
@@ -1418,43 +1435,31 @@ struct lu_tgt_descs {
struct mutex ltd_mutex;
/* read/write semaphore used for array relocation */
struct rw_semaphore ltd_rw_sem;
+ /* QoS */
+ struct lu_qos ltd_qos;
+ /* all tgts in a packed array */
+ struct lu_tgt_pool ltd_tgt_pool;
+ /* true if tgt is MDT */
+ bool ltd_is_mdt;
};
#define LTD_TGT(ltd, index) \
- ((ltd)->ltd_tgt_idx[(index) / TGT_PTRS_PER_BLOCK] \
- ->ldi_tgt[(index) % TGT_PTRS_PER_BLOCK])
+ (ltd)->ltd_tgt_idx[(index) / TGT_PTRS_PER_BLOCK] \
+ ->ldi_tgt[(index) % TGT_PTRS_PER_BLOCK]
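
For illustration (not part of the patch): LTD_TGT() is a two-level lookup that splits the index into a block slot and an offset within the block; if, say, TGT_PTRS_PER_BLOCK were 512, index 1000 would resolve to ltd_tgt_idx[1]->ldi_tgt[488]. The macro yields an lvalue, which is how ltd_add_tgt()/ltd_del_tgt() below assign slots. A hedged usage sketch with hypothetical locals:

/* Read a slot, then (re)assign it; index and new_tgt are hypothetical. */
struct lu_tgt_desc *tgt = LTD_TGT(ltd, index);

if (!tgt)
	LTD_TGT(ltd, index) = new_tgt;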
-/* QoS data for LOD/LMV */
-struct lu_qos {
- struct list_head lq_svr_list; /* lu_svr_qos list */
- struct rw_semaphore lq_rw_sem;
- u32 lq_active_svr_count;
- unsigned int lq_prio_free; /* priority for free space */
- unsigned int lq_threshold_rr;/* priority for rr */
- struct lu_qos_rr lq_rr; /* round robin qos data */
- unsigned long lq_dirty:1, /* recalc qos data */
- lq_same_space:1,/* the servers all have approx.
- * the same space avail
- */
- lq_reset:1; /* zero current penalties */
-};
-
-void lu_qos_rr_init(struct lu_qos_rr *lqr);
-int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd);
-int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd);
-bool lqos_is_usable(struct lu_qos *qos, u32 active_tgt_nr);
-int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd,
- u32 active_tgt_nr, u32 maxage, bool is_mdt);
-void lqos_calc_weight(struct lu_tgt_desc *tgt);
-int lqos_recalc_weight(struct lu_qos *qos, struct lu_tgt_descs *ltd,
- struct lu_tgt_desc *tgt, u32 active_tgt_nr,
- u64 *total_wt);
u64 lu_prandom_u64_max(u64 ep_ro);
+void lu_qos_rr_init(struct lu_qos_rr *lqr);
+int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd);
+void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt);
-int lu_tgt_descs_init(struct lu_tgt_descs *ltd);
+int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt);
void lu_tgt_descs_fini(struct lu_tgt_descs *ltd);
-int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt);
-void lu_tgt_descs_del(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt);
+int ltd_add_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt);
+void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt);
+bool ltd_qos_is_usable(struct lu_tgt_descs *ltd);
+int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd);
+int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt,
+ u64 *total_wt);
static inline struct lu_tgt_desc *ltd_first_tgt(struct lu_tgt_descs *ltd)
{
@@ -394,7 +394,7 @@ struct lov_md_tgt_desc {
struct lov_obd {
struct lov_desc desc;
struct lov_tgt_desc **lov_tgts; /* sparse array */
- struct ost_pool lov_packed; /* all OSTs in a packed array */
+ struct lu_tgt_pool lov_packed; /* all OSTs in a packed array */
struct mutex lov_lock;
struct obd_connect_data lov_ocd;
atomic_t lov_refcount;
@@ -422,7 +422,6 @@ struct lov_obd {
struct lmv_obd {
struct lu_client_fld lmv_fld;
spinlock_t lmv_lock;
- struct lmv_desc desc;
int connected;
int max_easize;
@@ -435,10 +434,12 @@ struct lmv_obd {
struct kobject *lmv_tgts_kobj;
void *lmv_cache;
- struct lu_qos lmv_qos;
u32 lmv_qos_rr_index;
};
+#define lmv_mdt_count lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count
+#define lmv_qos lmv_mdt_descs.ltd_qos
+
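These two macros are plain field aliases, so call sites keep a short spelling while the data now lives in the shared lu_tgt_descs. Illustrative sketch (count is a hypothetical local):

/* Equivalent after this patch: */
count = lmv->lmv_mdt_count;
count = lmv->lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count;
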
struct niobuf_local {
u64 lnb_file_offset;
u32 lnb_page_offset;
@@ -75,11 +75,11 @@ int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds)
CDEBUG(D_INODE, "FLD lookup got mds #%x for fid=" DFID "\n",
*mds, PFID(fid));
- if (*mds >= lmv->desc.ld_tgt_count) {
+ if (*mds >= lmv->lmv_mdt_descs.ltd_tgts_size) {
rc = -EINVAL;
CERROR("%s: FLD lookup got invalid mds #%x (max: %x) for fid=" DFID ": rc = %d\n",
- obd->obd_name, *mds, lmv->desc.ld_tgt_count, PFID(fid),
- rc);
+ obd->obd_name, *mds, lmv->lmv_mdt_descs.ltd_tgts_size,
+ PFID(fid), rc);
}
return rc;
}
@@ -122,7 +122,7 @@ struct lu_tgt_desc *lmv_next_connected_tgt(struct lmv_obd *lmv,
u32 mdt_idx;
int rc;
- if (lmv->desc.ld_tgt_count < 2)
+ if (lmv->lmv_mdt_count < 2)
return 0;
rc = lmv_fld_lookup(lmv, fid, &mdt_idx);
@@ -64,7 +64,8 @@ void lmv_activate_target(struct lmv_obd *lmv, struct lmv_tgt_desc *tgt,
return;
tgt->ltd_active = activate;
- lmv->desc.ld_active_tgt_count += (activate ? 1 : -1);
+ lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count +=
+ (activate ? 1 : -1);
tgt->ltd_exp->exp_obd->obd_inactive = !activate;
}
@@ -330,11 +331,11 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt)
tgt->ltd_active = 1;
tgt->ltd_exp = mdc_exp;
- lmv->desc.ld_active_tgt_count++;
+ lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count++;
md_init_ea_size(tgt->ltd_exp, lmv->max_easize, lmv->max_def_easize);
- rc = lqos_add_tgt(&lmv->lmv_qos, tgt);
+ rc = lu_qos_add_tgt(&lmv->lmv_qos, tgt);
if (rc) {
obd_disconnect(mdc_exp);
return rc;
@@ -357,8 +358,7 @@ static int lmv_connect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt)
static void lmv_del_target(struct lmv_obd *lmv, struct lu_tgt_desc *tgt)
{
LASSERT(tgt);
- lqos_del_tgt(&lmv->lmv_qos, tgt);
- lu_tgt_descs_del(&lmv->lmv_mdt_descs, tgt);
+ ltd_del_tgt(&lmv->lmv_mdt_descs, tgt);
kfree(tgt);
}
@@ -369,7 +369,6 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
struct obd_device *mdc_obd;
struct lmv_tgt_desc *tgt;
struct lu_tgt_descs *ltd = &lmv->lmv_mdt_descs;
- int orig_tgt_count = 0;
int rc = 0;
CDEBUG(D_CONFIG, "Target uuid: %s. index %d\n", uuidp->uuid, index);
@@ -392,11 +391,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
tgt->ltd_active = 0;
mutex_lock(<d->ltd_mutex);
- rc = lu_tgt_descs_add(ltd, tgt);
- if (!rc && index >= lmv->desc.ld_tgt_count) {
- orig_tgt_count = lmv->desc.ld_tgt_count;
- lmv->desc.ld_tgt_count = index + 1;
- }
+ rc = ltd_add_tgt(ltd, tgt);
mutex_unlock(<d->ltd_mutex);
if (rc)
@@ -407,14 +402,10 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
return rc;
rc = lmv_connect_mdc(obd, tgt);
- if (rc) {
- mutex_lock(<d->ltd_mutex);
- lmv->desc.ld_tgt_count = orig_tgt_count;
- memset(tgt, 0, sizeof(*tgt));
- mutex_unlock(<d->ltd_mutex);
- } else {
+ if (!rc) {
int easize = sizeof(struct lmv_stripe_md) +
- lmv->desc.ld_tgt_count * sizeof(struct lu_fid);
+ lmv->lmv_mdt_count * sizeof(struct lu_fid);
+
lmv_init_ea_size(obd->obd_self_export, easize, 0);
}
@@ -441,7 +432,7 @@ static int lmv_check_connect(struct obd_device *obd)
goto unlock;
}
- if (lmv->desc.ld_tgt_count == 0) {
+ if (!lmv->lmv_mdt_count) {
CERROR("%s: no targets configured: rc = -EINVAL\n",
obd->obd_name);
rc = -EINVAL;
@@ -465,7 +456,7 @@ static int lmv_check_connect(struct obd_device *obd)
}
lmv->connected = 1;
- easize = lmv_mds_md_size(lmv->desc.ld_tgt_count, LMV_MAGIC);
+ easize = lmv_mds_md_size(lmv->lmv_mdt_count, LMV_MAGIC);
lmv_init_ea_size(obd->obd_self_export, easize, 0);
unlock:
mutex_unlock(&lmv->lmv_mdt_descs.ltd_mutex);
@@ -478,7 +469,7 @@ static int lmv_check_connect(struct obd_device *obd)
if (!tgt->ltd_exp)
continue;
- --lmv->desc.ld_active_tgt_count;
+ --lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count;
obd_disconnect(tgt->ltd_exp);
}
@@ -810,7 +801,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
struct lmv_obd *lmv = &obddev->u.lmv;
struct lu_tgt_desc *tgt = NULL;
int set = 0;
- u32 count = lmv->desc.ld_tgt_count;
+ u32 count = lmv->lmv_mdt_count;
int rc = 0;
if (count == 0)
@@ -824,7 +815,8 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
u32 index;
memcpy(&index, data->ioc_inlbuf2, sizeof(u32));
- if (index >= count)
+
+ if (index >= lmv->lmv_mdt_descs.ltd_tgts_size)
return -ENODEV;
tgt = lmv_tgt(lmv, index);
@@ -857,12 +849,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
struct obd_quotactl *oqctl;
if (qctl->qc_valid == QC_MDTIDX) {
- if (count <= qctl->qc_idx)
- return -EINVAL;
-
tgt = lmv_tgt(lmv, qctl->qc_idx);
- if (!tgt || !tgt->ltd_exp)
- return -EINVAL;
} else if (qctl->qc_valid == QC_UUID) {
lmv_foreach_tgt(lmv, tgt) {
if (!obd_uuid_equals(&tgt->ltd_uuid,
@@ -878,10 +865,9 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
return -EINVAL;
}
- if (tgt->ltd_index >= count)
- return -EAGAIN;
+ if (!tgt || !tgt->ltd_exp)
+ return -EINVAL;
- LASSERT(tgt && tgt->ltd_exp);
oqctl = kzalloc(sizeof(*oqctl), GFP_KERNEL);
if (!oqctl)
return -ENOMEM;
@@ -1069,7 +1055,7 @@ static u32 lmv_placement_policy(struct obd_device *obd,
struct lmv_user_md *lum;
u32 mdt;
- if (lmv->desc.ld_tgt_count == 1)
+ if (lmv->lmv_mdt_count == 1)
return 0;
lum = op_data->op_data;
@@ -1182,27 +1168,17 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
return -EINVAL;
}
- obd_str2uuid(&lmv->desc.ld_uuid, desc->ld_uuid.uuid);
- lmv->desc.ld_tgt_count = 0;
- lmv->desc.ld_active_tgt_count = 0;
- lmv->desc.ld_qos_maxage = LMV_DESC_QOS_MAXAGE_DEFAULT;
+ obd_str2uuid(&lmv->lmv_mdt_descs.ltd_lmv_desc.ld_uuid,
+ desc->ld_uuid.uuid);
+ lmv->lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count = 0;
+ lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count = 0;
+ lmv->lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage =
+ LMV_DESC_QOS_MAXAGE_DEFAULT;
lmv->max_def_easize = 0;
lmv->max_easize = 0;
spin_lock_init(&lmv->lmv_lock);
- /* Set up allocation policy (QoS and RR) */
- INIT_LIST_HEAD(&lmv->lmv_qos.lq_svr_list);
- init_rwsem(&lmv->lmv_qos.lq_rw_sem);
- lmv->lmv_qos.lq_dirty = 1;
- lmv->lmv_qos.lq_reset = 1;
- /* Default priority is toward free space balance */
- lmv->lmv_qos.lq_prio_free = 232;
- /* Default threshold for rr (roughly 17%) */
- lmv->lmv_qos.lq_threshold_rr = 43;
-
- lu_qos_rr_init(&lmv->lmv_qos.lq_rr);
-
/*
* initialize rr_index to lower 32bit of netid, so that client
* can distribute subdirs evenly from the beginning.
@@ -1224,7 +1200,7 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
if (rc)
CERROR("Can't init FLD, err %d\n", rc);
- rc = lu_tgt_descs_init(&lmv->lmv_mdt_descs);
+ rc = lu_tgt_descs_init(&lmv->lmv_mdt_descs, true);
if (rc)
CWARN("%s: error initialize target table: rc = %d\n",
obd->obd_name, rc);
@@ -1292,7 +1268,7 @@ static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags)
if (flags & OBD_STATFS_FOR_MDT0)
return 0;
- if (lmv->lmv_statfs_start || lmv->desc.ld_tgt_count == 1)
+ if (lmv->lmv_statfs_start || lmv->lmv_mdt_count == 1)
return lmv->lmv_statfs_start;
/* choose initial MDT for this client */
@@ -1306,8 +1282,8 @@ static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags)
/* We don't need a full 64-bit modulus, just enough
* to distribute the requests across MDTs evenly.
*/
- lmv->lmv_statfs_start =
- (u32)lnet_id.nid % lmv->desc.ld_tgt_count;
+ lmv->lmv_statfs_start = (u32)lnet_id.nid %
+ lmv->lmv_mdt_count;
break;
}
}
@@ -1333,8 +1309,8 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp,
/* distribute statfs among MDTs */
idx = lmv_select_statfs_mdt(lmv, flags);
- for (i = 0; i < lmv->desc.ld_tgt_count; i++, idx++) {
- idx = idx % lmv->desc.ld_tgt_count;
+ for (i = 0; i < lmv->lmv_mdt_descs.ltd_tgts_size; i++, idx++) {
+ idx = idx % lmv->lmv_mdt_descs.ltd_tgts_size;
tgt = lmv_tgt(lmv, idx);
if (!tgt || !tgt->ltd_exp)
continue;
@@ -1410,7 +1386,7 @@ int lmv_statfs_check_update(struct obd_device *obd, struct lmv_tgt_desc *tgt)
int rc;
if (ktime_get_seconds() - tgt->ltd_statfs_age <
- obd->u.lmv.desc.ld_qos_maxage)
+ obd->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage)
return 0;
rc = obd_statfs_async(tgt->ltd_exp, &oinfo, 0, NULL);
@@ -1526,19 +1502,17 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
u64 rand;
int rc;
- if (!lqos_is_usable(&lmv->lmv_qos, lmv->desc.ld_active_tgt_count))
+ if (!ltd_qos_is_usable(&lmv->lmv_mdt_descs))
return ERR_PTR(-EAGAIN);
down_write(&lmv->lmv_qos.lq_rw_sem);
- if (!lqos_is_usable(&lmv->lmv_qos, lmv->desc.ld_active_tgt_count)) {
+ if (!ltd_qos_is_usable(&lmv->lmv_mdt_descs)) {
tgt = ERR_PTR(-EAGAIN);
goto unlock;
}
- rc = lqos_calc_penalties(&lmv->lmv_qos, &lmv->lmv_mdt_descs,
- lmv->desc.ld_active_tgt_count,
- lmv->desc.ld_qos_maxage, true);
+ rc = ltd_qos_penalties_calc(&lmv->lmv_mdt_descs);
if (rc) {
tgt = ERR_PTR(rc);
goto unlock;
@@ -1550,7 +1524,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
continue;
tgt->ltd_qos.ltq_usable = 1;
- lqos_calc_weight(tgt);
+ lu_tgt_qos_weight_calc(tgt);
total_weight += tgt->ltd_qos.ltq_weight;
}
@@ -1565,9 +1539,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
continue;
*mdt = tgt->ltd_index;
- lqos_recalc_weight(&lmv->lmv_qos, &lmv->lmv_mdt_descs, tgt,
- lmv->desc.ld_active_tgt_count,
- &total_weight);
+ ltd_qos_update(&lmv->lmv_mdt_descs, tgt, &total_weight);
rc = 0;
goto unlock;
}
@@ -1588,14 +1560,16 @@ static struct lu_tgt_desc *lmv_locate_tgt_rr(struct lmv_obd *lmv, u32 *mdt)
int index;
spin_lock(&lmv->lmv_qos.lq_rr.lqr_alloc);
- for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
- index = (i + lmv->lmv_qos_rr_index) % lmv->desc.ld_tgt_count;
+ for (i = 0; i < lmv->lmv_mdt_descs.ltd_tgts_size; i++) {
+ index = (i + lmv->lmv_qos_rr_index) %
+ lmv->lmv_mdt_descs.ltd_tgts_size;
tgt = lmv_tgt(lmv, index);
if (!tgt || !tgt->ltd_exp || !tgt->ltd_active)
continue;
*mdt = tgt->ltd_index;
- lmv->lmv_qos_rr_index = (*mdt + 1) % lmv->desc.ld_tgt_count;
+ lmv->lmv_qos_rr_index = (*mdt + 1) %
+ lmv->lmv_mdt_descs.ltd_tgts_size;
spin_unlock(&lmv->lmv_qos.lq_rr.lqr_alloc);
return tgt;
@@ -1791,7 +1765,7 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
struct lmv_tgt_desc *tgt;
int rc;
- if (!lmv->desc.ld_active_tgt_count)
+ if (!lmv->lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count)
return -EIO;
if (lmv_dir_bad_hash(op_data->op_mea1))
@@ -2903,7 +2877,7 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp,
exp->exp_connect_data = *(struct obd_connect_data *)val;
return rc;
} else if (KEY_IS(KEY_TGT_COUNT)) {
- *((int *)val) = lmv->desc.ld_tgt_count;
+ *((int *)val) = lmv->lmv_mdt_descs.ltd_tgts_size;
return 0;
}
@@ -2917,7 +2891,7 @@ static int lmv_rmfid(struct obd_export *exp, struct fid_array *fa,
struct obd_device *obddev = class_exp2obd(exp);
struct ptlrpc_request_set *set = _set;
struct lmv_obd *lmv = &obddev->u.lmv;
- int tgt_count = lmv->desc.ld_tgt_count;
+ int tgt_count = lmv->lmv_mdt_count;
struct lu_tgt_desc *tgt;
struct fid_array *fat, **fas = NULL;
int i, rc, **rcs = NULL;
@@ -3303,8 +3277,8 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, u64 flags,
* since this can be easily found, and only try others if that fails.
*/
for (i = 0, index = lmv_fid2tgt_index(lmv, fid);
- i < lmv->desc.ld_tgt_count;
- i++, index = (index + 1) % lmv->desc.ld_tgt_count) {
+ i < lmv->lmv_mdt_descs.ltd_tgts_size;
+ i++, index = (index + 1) % lmv->lmv_mdt_descs.ltd_tgts_size) {
if (index < 0) {
CDEBUG(D_HA, "%s: " DFID " is inaccessible: rc = %d\n",
obd->obd_name, PFID(fid), index);
@@ -45,10 +45,8 @@ static ssize_t numobd_show(struct kobject *kobj, struct attribute *attr,
{
struct obd_device *dev = container_of(kobj, struct obd_device,
obd_kset.kobj);
- struct lmv_desc *desc;
- desc = &dev->u.lmv.desc;
- return sprintf(buf, "%u\n", desc->ld_tgt_count);
+ return sprintf(buf, "%u\n", dev->u.lmv.lmv_mdt_count);
}
LUSTRE_RO_ATTR(numobd);
@@ -57,10 +55,9 @@ static ssize_t activeobd_show(struct kobject *kobj, struct attribute *attr,
{
struct obd_device *dev = container_of(kobj, struct obd_device,
obd_kset.kobj);
- struct lmv_desc *desc;
- desc = &dev->u.lmv.desc;
- return sprintf(buf, "%u\n", desc->ld_active_tgt_count);
+ return sprintf(buf, "%u\n",
+ dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_active_tgt_count);
}
LUSTRE_RO_ATTR(activeobd);
@@ -69,10 +66,9 @@ static ssize_t desc_uuid_show(struct kobject *kobj, struct attribute *attr,
{
struct obd_device *dev = container_of(kobj, struct obd_device,
obd_kset.kobj);
- struct lmv_desc *desc;
- desc = &dev->u.lmv.desc;
- return sprintf(buf, "%s\n", desc->ld_uuid.uuid);
+ return sprintf(buf, "%s\n",
+ dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_uuid.uuid);
}
LUSTRE_RO_ATTR(desc_uuid);
@@ -83,7 +79,8 @@ static ssize_t qos_maxage_show(struct kobject *kobj,
struct obd_device *dev = container_of(kobj, struct obd_device,
obd_kset.kobj);
- return sprintf(buf, "%u\n", dev->u.lmv.desc.ld_qos_maxage);
+ return sprintf(buf, "%u\n",
+ dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage);
}
static ssize_t qos_maxage_store(struct kobject *kobj,
@@ -100,7 +97,7 @@ static ssize_t qos_maxage_store(struct kobject *kobj,
if (rc)
return rc;
- dev->u.lmv.desc.ld_qos_maxage = val;
+ dev->u.lmv.lmv_mdt_descs.ltd_lmv_desc.ld_qos_maxage = val;
return count;
}
@@ -221,7 +221,7 @@ struct lsm_operations {
struct pool_desc {
char pool_name[LOV_MAXPOOLNAME + 1];
- struct ost_pool pool_obds;
+ struct lu_tgt_pool pool_obds;
atomic_t pool_refcount;
struct rhash_head pool_hash; /* access by poolname */
union {
@@ -322,12 +322,12 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
#define LOV_MDC_TGT_MAX 256
-/* ost_pool methods */
-int lov_ost_pool_init(struct ost_pool *op, unsigned int count);
-int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count);
-int lov_ost_pool_add(struct ost_pool *op, u32 idx, unsigned int min_count);
-int lov_ost_pool_remove(struct ost_pool *op, u32 idx);
-int lov_ost_pool_free(struct ost_pool *op);
+/* lu_tgt_pool methods */
+int lov_ost_pool_init(struct lu_tgt_pool *op, unsigned int count);
+int lov_ost_pool_extend(struct lu_tgt_pool *op, unsigned int min_count);
+int lov_ost_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count);
+int lov_ost_pool_remove(struct lu_tgt_pool *op, u32 idx);
+int lov_ost_pool_free(struct lu_tgt_pool *op);
/* high level pool methods */
int lov_pool_new(struct obd_device *obd, char *poolname);
@@ -231,7 +231,7 @@ static int pool_proc_open(struct inode *inode, struct file *file)
};
#define LOV_POOL_INIT_COUNT 2
-int lov_ost_pool_init(struct ost_pool *op, unsigned int count)
+int lov_ost_pool_init(struct lu_tgt_pool *op, unsigned int count)
{
if (count == 0)
count = LOV_POOL_INIT_COUNT;
@@ -249,7 +249,7 @@ int lov_ost_pool_init(struct ost_pool *op, unsigned int count)
}
/* Caller must hold write op_rw_sem */
-int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count)
+int lov_ost_pool_extend(struct lu_tgt_pool *op, unsigned int min_count)
{
int new_count;
u32 *new;
@@ -273,7 +273,7 @@ int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count)
return 0;
}
-int lov_ost_pool_add(struct ost_pool *op, u32 idx, unsigned int min_count)
+int lov_ost_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count)
{
int rc = 0, i;
@@ -298,7 +298,7 @@ int lov_ost_pool_add(struct ost_pool *op, u32 idx, unsigned int min_count)
return rc;
}
-int lov_ost_pool_remove(struct ost_pool *op, u32 idx)
+int lov_ost_pool_remove(struct lu_tgt_pool *op, u32 idx)
{
int i;
@@ -318,7 +318,7 @@ int lov_ost_pool_remove(struct ost_pool *op, u32 idx)
return -EINVAL;
}
-int lov_ost_pool_free(struct ost_pool *op)
+int lov_ost_pool_free(struct lu_tgt_pool *op)
{
if (op->op_size == 0)
return 0;
@@ -8,4 +8,4 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \
lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \
obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \
cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \
- jobid.o integrity.o obd_cksum.o lu_qos.o lu_tgt_descs.o
+ jobid.o integrity.o obd_cksum.o lu_tgt_descs.o
deleted file mode 100644
@@ -1,512 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- *
- * lustre/obdclass/lu_qos.c
- *
- * Lustre QoS.
- * These are the only exported functions, they provide some generic
- * infrastructure for object allocation QoS
- *
- */
-
-#define DEBUG_SUBSYSTEM S_CLASS
-
-#include <linux/module.h>
-#include <linux/list.h>
-#include <linux/random.h>
-#include <obd_class.h>
-#include <obd_support.h>
-#include <lustre_disk.h>
-#include <lustre_fid.h>
-#include <lu_object.h>
-
-void lu_qos_rr_init(struct lu_qos_rr *lqr)
-{
- spin_lock_init(&lqr->lqr_alloc);
- lqr->lqr_dirty = 1;
-}
-EXPORT_SYMBOL(lu_qos_rr_init);
-
-/**
- * Add a new target to Quality of Service (QoS) target table.
- *
- * Add a new MDT/OST target to the structure representing an OSS. Resort the
- * list of known MDSs/OSSs by the number of MDTs/OSTs attached to each MDS/OSS.
- * The MDS/OSS list is protected internally and no external locking is required.
- *
- * @qos lu_qos data
- * @ltd target description
- *
- * Return: 0 on success
- * -ENOMEM on error
- */
-int lqos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
-{
- struct lu_svr_qos *svr = NULL;
- struct lu_svr_qos *tempsvr;
- struct obd_export *exp = ltd->ltd_exp;
- int found = 0;
- u32 id = 0;
- int rc = 0;
-
- down_write(&qos->lq_rw_sem);
- /*
- * a bit hacky approach to learn NID of corresponding connection
- * but there is no official API to access information like this
- * with OSD API.
- */
- list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
- if (obd_uuid_equals(&svr->lsq_uuid,
- &exp->exp_connection->c_remote_uuid)) {
- found++;
- break;
- }
- if (svr->lsq_id > id)
- id = svr->lsq_id;
- }
-
- if (!found) {
- svr = kmalloc(sizeof(*svr), GFP_NOFS);
- if (!svr) {
- rc = -ENOMEM;
- goto out;
- }
- memcpy(&svr->lsq_uuid, &exp->exp_connection->c_remote_uuid,
- sizeof(svr->lsq_uuid));
- ++id;
- svr->lsq_id = id;
- } else {
- /* Assume we have to move this one */
- list_del(&svr->lsq_svr_list);
- }
-
- svr->lsq_tgt_count++;
- ltd->ltd_qos.ltq_svr = svr;
-
- CDEBUG(D_OTHER, "add tgt %s to server %s (%d targets)\n",
- obd_uuid2str(<d->ltd_uuid), obd_uuid2str(&svr->lsq_uuid),
- svr->lsq_tgt_count);
-
- /*
- * Add sorted by # of tgts. Find the first entry that we're
- * bigger than...
- */
- list_for_each_entry(tempsvr, &qos->lq_svr_list, lsq_svr_list) {
- if (svr->lsq_tgt_count > tempsvr->lsq_tgt_count)
- break;
- }
- /*
- * ...and add before it. If we're the first or smallest, tempsvr
- * points to the list head, and we add to the end.
- */
- list_add_tail(&svr->lsq_svr_list, &tempsvr->lsq_svr_list);
-
- qos->lq_dirty = 1;
- qos->lq_rr.lqr_dirty = 1;
-
-out:
- up_write(&qos->lq_rw_sem);
- return rc;
-}
-EXPORT_SYMBOL(lqos_add_tgt);
-
-/**
- * Remove MDT/OST target from QoS table.
- *
- * Removes given MDT/OST target from QoS table and releases related
- * MDS/OSS structure if no target remain on the MDS/OSS.
- *
- * @qos lu_qos data
- * @ltd target description
- *
- * Return: 0 on success
- * -ENOENT if no server was found
- */
-int lqos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
-{
- struct lu_svr_qos *svr;
- int rc = 0;
-
- down_write(&qos->lq_rw_sem);
- svr = ltd->ltd_qos.ltq_svr;
- if (!svr) {
- rc = -ENOENT;
- goto out;
- }
-
- svr->lsq_tgt_count--;
- if (svr->lsq_tgt_count == 0) {
- CDEBUG(D_OTHER, "removing server %s\n",
- obd_uuid2str(&svr->lsq_uuid));
- list_del(&svr->lsq_svr_list);
- ltd->ltd_qos.ltq_svr = NULL;
- kfree(svr);
- }
-
- qos->lq_dirty = 1;
- qos->lq_rr.lqr_dirty = 1;
-out:
- up_write(&qos->lq_rw_sem);
- return rc;
-}
-EXPORT_SYMBOL(lqos_del_tgt);
-
-/**
- * lu_prandom_u64_max - returns a pseudo-random u64 number in interval
- * [0, ep_ro)
- *
- * @ep_ro right open interval endpoint
- *
- * Return: a pseudo-random 64-bit number that is in interval [0, ep_ro).
- */
-u64 lu_prandom_u64_max(u64 ep_ro)
-{
- u64 rand = 0;
-
- if (ep_ro) {
-#if BITS_PER_LONG == 32
- /*
- * If ep_ro > 32-bit, first generate the high
- * 32 bits of the random number, then add in the low
- * 32 bits (truncated to the upper limit, if needed)
- */
- if (ep_ro > 0xffffffffULL)
- rand = (u64)prandom_u32_max((u32)(ep_ro >> 32)) << 32;
-
- if (rand == (ep_ro & 0xffffffff00000000ULL))
- rand |= prandom_u32_max((u32)ep_ro);
- else
- rand |= prandom_u32();
-#else
- rand = ((u64)prandom_u32() << 32 | prandom_u32()) % ep_ro;
-#endif
- }
-
- return rand;
-}
-EXPORT_SYMBOL(lu_prandom_u64_max);
-
-static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt)
-{
- struct obd_statfs *statfs = &tgt->ltd_statfs;
-
- return statfs->os_bavail * statfs->os_bsize;
-}
-
-static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt)
-{
- return tgt->ltd_statfs.os_ffree;
-}
-
-/**
- * Calculate penalties per-tgt and per-server
- *
- * Re-calculate penalties when the configuration changes, active targets
- * change and after statfs refresh (all these are reflected by lq_dirty flag).
- * On every tgt and server: decay the penalty by half for every 8x the update
- * interval that the device has been idle. That gives lots of time for the
- * statfs information to be updated (which the penalty is only a proxy for),
- * and avoids penalizing server/tgt under light load.
- * See lqos_calc_weight() for how penalties are factored into the weight.
- *
- * @qos lu_qos
- * @ltd lu_tgt_descs
- * @active_tgt_nr active tgt number
- * @maxage qos max age
- * @is_mdt MDT will count inode usage
- *
- * Return: 0 on success
- * -EAGAIN the number of tgt isn't enough or all
- * tgt spaces are almost the same
- */
-int lqos_calc_penalties(struct lu_qos *qos, struct lu_tgt_descs *ltd,
- u32 active_tgt_nr, u32 maxage, bool is_mdt)
-{
- struct lu_tgt_desc *tgt;
- struct lu_svr_qos *svr;
- u64 ba_max, ba_min, ba;
- u64 ia_max, ia_min, ia = 1;
- u32 num_active;
- int prio_wide;
- time64_t now, age;
- int rc;
-
- if (!qos->lq_dirty) {
- rc = 0;
- goto out;
- }
-
- num_active = active_tgt_nr - 1;
- if (num_active < 1) {
- rc = -EAGAIN;
- goto out;
- }
-
- /* find bavail on each server */
- list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
- svr->lsq_bavail = 0;
- /* if inode is not counted, set to 1 to ignore */
- svr->lsq_iavail = is_mdt ? 0 : 1;
- }
- qos->lq_active_svr_count = 0;
-
- /*
- * How badly user wants to select targets "widely" (not recently chosen
- * and not on recent MDS's). As opposed to "freely" (free space avail.)
- * 0-256
- */
- prio_wide = 256 - qos->lq_prio_free;
-
- ba_min = (u64)(-1);
- ba_max = 0;
- ia_min = (u64)(-1);
- ia_max = 0;
- now = ktime_get_real_seconds();
-
- /* Calculate server penalty per object */
- ltd_foreach_tgt(ltd, tgt) {
- if (!tgt->ltd_active)
- continue;
-
- /* when inode is counted, bavail >> 16 to avoid overflow */
- ba = tgt_statfs_bavail(tgt);
- if (is_mdt)
- ba >>= 16;
- else
- ba >>= 8;
- if (!ba)
- continue;
-
- ba_min = min(ba, ba_min);
- ba_max = max(ba, ba_max);
-
- /* Count the number of usable servers */
- if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0)
- qos->lq_active_svr_count++;
- tgt->ltd_qos.ltq_svr->lsq_bavail += ba;
-
- if (is_mdt) {
- /* iavail >> 8 to avoid overflow */
- ia = tgt_statfs_iavail(tgt) >> 8;
- if (!ia)
- continue;
-
- ia_min = min(ia, ia_min);
- ia_max = max(ia, ia_max);
-
- tgt->ltd_qos.ltq_svr->lsq_iavail += ia;
- }
-
- /*
- * per-tgt penalty is
- * prio * bavail * iavail / (num_tgt - 1) / 2
- */
- tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 8;
- do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active);
- tgt->ltd_qos.ltq_penalty_per_obj >>= 1;
-
- age = (now - tgt->ltd_qos.ltq_used) >> 3;
- if (qos->lq_reset || age > 32 * maxage)
- tgt->ltd_qos.ltq_penalty = 0;
- else if (age > maxage)
- /* Decay tgt penalty. */
- tgt->ltd_qos.ltq_penalty >>= (age / maxage);
- }
-
- num_active = qos->lq_active_svr_count - 1;
- if (num_active < 1) {
- /*
- * If there's only 1 server, we can't penalize it, so instead
- * we have to double the tgt penalty
- */
- num_active = 1;
- ltd_foreach_tgt(ltd, tgt) {
- if (!tgt->ltd_active)
- continue;
-
- tgt->ltd_qos.ltq_penalty_per_obj <<= 1;
- }
- }
-
- /*
- * Per-server penalty is
- * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2
- */
- list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
- ba = svr->lsq_bavail;
- ia = svr->lsq_iavail;
- svr->lsq_penalty_per_obj = prio_wide * ba * ia >> 8;
- do_div(svr->lsq_penalty_per_obj, svr->lsq_tgt_count * num_active);
- svr->lsq_penalty_per_obj >>= 1;
-
- age = (now - svr->lsq_used) >> 3;
- if (qos->lq_reset || age > 32 * maxage)
- svr->lsq_penalty = 0;
- else if (age > maxage)
- /* Decay server penalty. */
- svr->lsq_penalty >>= age / maxage;
- }
-
- qos->lq_dirty = 0;
- qos->lq_reset = 0;
-
- /*
- * If each tgt has almost same free space, do rr allocation for better
- * creation performance
- */
- qos->lq_same_space = 0;
- if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min &&
- (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) {
- qos->lq_same_space = 1;
- /* Reset weights for the next time we enter qos mode */
- qos->lq_reset = 1;
- }
- rc = 0;
-
-out:
- if (!rc && qos->lq_same_space)
- return -EAGAIN;
-
- return rc;
-}
-EXPORT_SYMBOL(lqos_calc_penalties);
-
-bool lqos_is_usable(struct lu_qos *qos, u32 active_tgt_nr)
-{
- if (!qos->lq_dirty && qos->lq_same_space)
- return false;
-
- if (active_tgt_nr < 2)
- return false;
-
- return true;
-}
-EXPORT_SYMBOL(lqos_is_usable);
-
-/**
- * Calculate weight for a given tgt.
- *
- * The final tgt weight is bavail >> 16 * iavail >> 8 minus the tgt and server
- * penalties. See lqos_calc_ppts() for how penalties are calculated.
- *
- * @tgt target descriptor
- */
-void lqos_calc_weight(struct lu_tgt_desc *tgt)
-{
- struct lu_tgt_qos *ltq = &tgt->ltd_qos;
- u64 temp, temp2;
-
- temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8);
- temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty;
- if (temp < temp2)
- ltq->ltq_weight = 0;
- else
- ltq->ltq_weight = temp - temp2;
-}
-EXPORT_SYMBOL(lqos_calc_weight);
-
-/**
- * Re-calculate weights.
- *
- * The function is called when some target was used for a new object. In
- * this case we should re-calculate all the weights to keep new allocations
- * balanced well.
- *
- * @qos lu_qos
- * @ltd lu_tgt_descs
- * @tgt target where a new object was placed
- * @active_tgt_nr active tgt number
- * @total_wt new total weight for the pool
- *
- * Return: 0
- */
-int lqos_recalc_weight(struct lu_qos *qos, struct lu_tgt_descs *ltd,
- struct lu_tgt_desc *tgt, u32 active_tgt_nr,
- u64 *total_wt)
-{
- struct lu_tgt_qos *ltq;
- struct lu_svr_qos *svr;
-
- ltq = &tgt->ltd_qos;
- LASSERT(ltq);
-
- /* Don't allocate on this device anymore, until the next alloc_qos */
- ltq->ltq_usable = 0;
-
- svr = ltq->ltq_svr;
-
- /*
- * Decay old penalty by half (we're adding max penalty, and don't
- * want it to run away.)
- */
- ltq->ltq_penalty >>= 1;
- svr->lsq_penalty >>= 1;
-
- /* mark the server and tgt as recently used */
- ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds();
-
- /* Set max penalties for this tgt and server */
- ltq->ltq_penalty += ltq->ltq_penalty_per_obj * active_tgt_nr;
- svr->lsq_penalty += svr->lsq_penalty_per_obj * active_tgt_nr;
-
- /* Decrease all MDS penalties */
- list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
- if (svr->lsq_penalty < svr->lsq_penalty_per_obj)
- svr->lsq_penalty = 0;
- else
- svr->lsq_penalty -= svr->lsq_penalty_per_obj;
- }
-
- *total_wt = 0;
- /* Decrease all tgt penalties */
- ltd_foreach_tgt(ltd, tgt) {
- if (!tgt->ltd_active)
- continue;
-
- if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj)
- ltq->ltq_penalty = 0;
- else
- ltq->ltq_penalty -= ltq->ltq_penalty_per_obj;
-
- lqos_calc_weight(tgt);
-
- /* Recalc the total weight of usable osts */
- if (ltq->ltq_usable)
- *total_wt += ltq->ltq_weight;
-
- CDEBUG(D_OTHER,
- "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n",
- tgt->ltd_index, ltq->ltq_usable,
- tgt_statfs_bavail(tgt) >> 10,
- ltq->ltq_penalty_per_obj >> 10,
- ltq->ltq_penalty >> 10,
- ltq->ltq_svr->lsq_penalty_per_obj >> 10,
- ltq->ltq_svr->lsq_penalty >> 10,
- ltq->ltq_weight >> 10);
- }
-
- return 0;
-}
-EXPORT_SYMBOL(lqos_recalc_weight);
@@ -35,6 +35,7 @@
#include <linux/module.h>
#include <linux/list.h>
+#include <linux/random.h>
#include <obd_class.h>
#include <obd_support.h>
#include <lustre_disk.h>
@@ -42,17 +43,221 @@
#include <lu_object.h>
/**
+ * lu_prandom_u64_max - returns a pseudo-random u64 number in interval
+ * [0, ep_ro)
+ *
+ * @ep_ro right open interval endpoint
+ *
+ * Return: a pseudo-random 64-bit number that is in interval [0, ep_ro).
+ */
+u64 lu_prandom_u64_max(u64 ep_ro)
+{
+ u64 rand = 0;
+
+ if (ep_ro) {
+#if BITS_PER_LONG == 32
+ /*
+ * If ep_ro > 32-bit, first generate the high
+ * 32 bits of the random number, then add in the low
+ * 32 bits (truncated to the upper limit, if needed)
+ */
+ if (ep_ro > 0xffffffffULL)
+ rand = (u64)prandom_u32_max((u32)(ep_ro >> 32)) << 32;
+
+ if (rand == (ep_ro & 0xffffffff00000000ULL))
+ rand |= prandom_u32_max((u32)ep_ro);
+ else
+ rand |= prandom_u32();
+#else
+ rand = ((u64)prandom_u32() << 32 | prandom_u32()) % ep_ro;
+#endif
+ }
+
+ return rand;
+}
+EXPORT_SYMBOL(lu_prandom_u64_max);
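
As a usage sketch (illustrative, mirroring how lmv_locate_tgt_qos() consumes this helper): draw a value uniformly from [0, total_weight) and walk the targets until the cumulative weight reaches it, so each target is chosen with probability proportional to its weight:

/* Weighted pick; total_weight, rand and cur_weight are hypothetical locals. */
u64 rand = lu_prandom_u64_max(total_weight);
u64 cur_weight = 0;

ltd_foreach_tgt(ltd, tgt) {
	if (!tgt->ltd_qos.ltq_usable)
		continue;
	cur_weight += tgt->ltd_qos.ltq_weight;
	if (cur_weight >= rand)
		break;	/* tgt is the weighted-random choice */
}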
+
+void lu_qos_rr_init(struct lu_qos_rr *lqr)
+{
+ spin_lock_init(&lqr->lqr_alloc);
+ lqr->lqr_dirty = 1;
+}
+EXPORT_SYMBOL(lu_qos_rr_init);
+
+/**
+ * Add a new target to Quality of Service (QoS) target table.
+ *
+ * Add a new MDT/OST target to the structure representing an OSS. Resort the
+ * list of known MDSs/OSSs by the number of MDTs/OSTs attached to each MDS/OSS.
+ * The MDS/OSS list is protected internally and no external locking is required.
+ *
+ * @qos lu_qos data
+ * @tgt target description
+ *
+ * Return: 0 on success
+ * -ENOMEM on error
+ */
+int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *tgt)
+{
+ struct lu_svr_qos *svr = NULL;
+ struct lu_svr_qos *tempsvr;
+ struct obd_export *exp = tgt->ltd_exp;
+ int found = 0;
+ u32 id = 0;
+ int rc = 0;
+
+ /* tgt not connected, this function will be called again later */
+ if (!exp)
+ return 0;
+
+ down_write(&qos->lq_rw_sem);
+ /*
+ * a bit hacky approach to learn NID of corresponding connection
+ * but there is no official API to access information like this
+ * with OSD API.
+ */
+ list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
+ if (obd_uuid_equals(&svr->lsq_uuid,
+ &exp->exp_connection->c_remote_uuid)) {
+ found++;
+ break;
+ }
+ if (svr->lsq_id > id)
+ id = svr->lsq_id;
+ }
+
+ if (!found) {
+ svr = kzalloc(sizeof(*svr), GFP_NOFS);
+ if (!svr) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ memcpy(&svr->lsq_uuid, &exp->exp_connection->c_remote_uuid,
+ sizeof(svr->lsq_uuid));
+ ++id;
+ svr->lsq_id = id;
+ } else {
+ /* Assume we have to move this one */
+ list_del(&svr->lsq_svr_list);
+ }
+
+ svr->lsq_tgt_count++;
+ tgt->ltd_qos.ltq_svr = svr;
+
+ CDEBUG(D_OTHER, "add tgt %s to server %s (%d targets)\n",
+ obd_uuid2str(&tgt->ltd_uuid), obd_uuid2str(&svr->lsq_uuid),
+ svr->lsq_tgt_count);
+
+ /*
+ * Add sorted by # of tgts. Find the first entry that we're
+ * bigger than...
+ */
+ list_for_each_entry(tempsvr, &qos->lq_svr_list, lsq_svr_list) {
+ if (svr->lsq_tgt_count > tempsvr->lsq_tgt_count)
+ break;
+ }
+ /*
+ * ...and add before it. If we're the first or smallest, tempsvr
+ * points to the list head, and we add to the end.
+ */
+ list_add_tail(&svr->lsq_svr_list, &tempsvr->lsq_svr_list);
+
+ qos->lq_dirty = 1;
+ qos->lq_rr.lqr_dirty = 1;
+
+out:
+ up_write(&qos->lq_rw_sem);
+ return rc;
+}
+EXPORT_SYMBOL(lu_qos_add_tgt);
+
+/**
+ * Remove MDT/OST target from QoS table.
+ *
+ * Removes given MDT/OST target from QoS table and releases related
+ * MDS/OSS structure if no target remain on the MDS/OSS.
+ *
+ * @qos lu_qos data
+ * @ltd target description
+ *
+ * Return: 0 on success
+ * -ENOENT if no server was found
+ */
+static int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
+{
+ struct lu_svr_qos *svr;
+ int rc = 0;
+
+ down_write(&qos->lq_rw_sem);
+ svr = ltd->ltd_qos.ltq_svr;
+ if (!svr) {
+ rc = -ENOENT;
+ goto out;
+ }
+
+ svr->lsq_tgt_count--;
+ if (svr->lsq_tgt_count == 0) {
+ CDEBUG(D_OTHER, "removing server %s\n",
+ obd_uuid2str(&svr->lsq_uuid));
+ list_del(&svr->lsq_svr_list);
+ ltd->ltd_qos.ltq_svr = NULL;
+ kfree(svr);
+ }
+
+ qos->lq_dirty = 1;
+ qos->lq_rr.lqr_dirty = 1;
+out:
+ up_write(&qos->lq_rw_sem);
+ return rc;
+}
+
+static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt)
+{
+ struct obd_statfs *statfs = &tgt->ltd_statfs;
+
+ return statfs->os_bavail * statfs->os_bsize;
+}
+
+static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt)
+{
+ return tgt->ltd_statfs.os_ffree;
+}
+
+/**
+ * Calculate weight for a given tgt.
+ *
+ * The final tgt weight is bavail >> 16 * iavail >> 8 minus the tgt and server
+ * penalties. See ltd_qos_penalties_calc() for how penalties are calculated.
+ *
+ * @tgt target descriptor
+ */
+void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt)
+{
+ struct lu_tgt_qos *ltq = &tgt->ltd_qos;
+ u64 temp, temp2;
+
+ temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8);
+ temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty;
+ if (temp < temp2)
+ ltq->ltq_weight = 0;
+ else
+ ltq->ltq_weight = temp - temp2;
+}
+EXPORT_SYMBOL(lu_tgt_qos_weight_calc);
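
A worked example of the clamped weight computation, with made-up numbers:

/*
 * Example: bavail = 2^40 bytes and iavail = 2^24 inodes give a base
 * weight of (2^40 >> 16) * (2^24 >> 8) = 2^24 * 2^16 = 2^40. With
 * ltq_penalty + lsq_penalty = 2^39 the final ltq_weight is 2^39; if
 * the penalties exceeded the base weight, ltq_weight would clamp to 0
 * instead of wrapping.
 */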
+
+/**
* Allocate and initialize target table.
*
* A helper function to initialize the target table and allocate
* a bitmap of the available targets.
*
* @ltd target's table to initialize
+ * @is_mdt true if the target table is for MDTs
*
* Return: 0 on success
* negated errno on error
**/
-int lu_tgt_descs_init(struct lu_tgt_descs *ltd)
+int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt)
{
mutex_init(<d->ltd_mutex);
init_rwsem(<d->ltd_rw_sem);
@@ -66,11 +271,22 @@ int lu_tgt_descs_init(struct lu_tgt_descs *ltd)
return -ENOMEM;
ltd->ltd_tgts_size = BITS_PER_LONG;
- ltd->ltd_tgtnr = 0;
-
ltd->ltd_death_row = 0;
ltd->ltd_refcount = 0;
+ /* Set up allocation policy (QoS and RR) */
+ INIT_LIST_HEAD(<d->ltd_qos.lq_svr_list);
+ init_rwsem(<d->ltd_qos.lq_rw_sem);
+ ltd->ltd_qos.lq_dirty = 1;
+ ltd->ltd_qos.lq_reset = 1;
+ /* Default priority is toward free space balance */
+ ltd->ltd_qos.lq_prio_free = 232;
+ /* Default threshold for rr (roughly 17%) */
+ ltd->ltd_qos.lq_threshold_rr = 43;
+ ltd->ltd_is_mdt = is_mdt;
+
+ lu_qos_rr_init(<d->ltd_qos.lq_rr);
+
return 0;
}
EXPORT_SYMBOL(lu_tgt_descs_init);
@@ -147,7 +363,7 @@ static int lu_tgt_descs_resize(struct lu_tgt_descs *ltd, u32 newsize)
* -ENOMEM if reallocation failed
* -EEXIST if target existed
*/
-int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
+int ltd_add_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
{
u32 index = tgt->ltd_index;
int rc;
@@ -174,19 +390,294 @@ int lu_tgt_descs_add(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
LTD_TGT(ltd, tgt->ltd_index) = tgt;
set_bit(tgt->ltd_index, ltd->ltd_tgt_bitmap);
- ltd->ltd_tgtnr++;
+
+ ltd->ltd_lov_desc.ld_tgt_count++;
+ if (tgt->ltd_active)
+ ltd->ltd_lov_desc.ld_active_tgt_count++;
return 0;
}
-EXPORT_SYMBOL(lu_tgt_descs_add);
+EXPORT_SYMBOL(ltd_add_tgt);
/**
* Delete target from target table
*/
-void lu_tgt_descs_del(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
+void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
{
+ lu_qos_del_tgt(<d->ltd_qos, tgt);
LTD_TGT(ltd, tgt->ltd_index) = NULL;
clear_bit(tgt->ltd_index, ltd->ltd_tgt_bitmap);
- ltd->ltd_tgtnr--;
+ ltd->ltd_lov_desc.ld_tgt_count--;
+ if (tgt->ltd_active)
+ ltd->ltd_lov_desc.ld_active_tgt_count--;
+}
+EXPORT_SYMBOL(ltd_del_tgt);
+
+/**
+ * Whether QoS data is up-to-date and QoS can be applied.
+ */
+bool ltd_qos_is_usable(struct lu_tgt_descs *ltd)
+{
+ if (!ltd->ltd_qos.lq_dirty && ltd->ltd_qos.lq_same_space)
+ return false;
+
+ if (ltd->ltd_lov_desc.ld_active_tgt_count < 2)
+ return false;
+
+ return true;
+}
+EXPORT_SYMBOL(ltd_qos_is_usable);
+
+/**
+ * Calculate penalties per-tgt and per-server
+ *
+ * Re-calculate penalties when the configuration changes, active targets
+ * change and after statfs refresh (all these are reflected by lq_dirty flag).
+ * On every tgt and server: decay the penalty by half for every 8x the update
+ * interval that the device has been idle. That gives lots of time for the
+ * statfs information to be updated (which the penalty is only a proxy for),
+ * and avoids penalizing server/tgt under light load.
+ * See lu_tgt_qos_weight_calc() for how penalties are factored into the weight.
+ *
+ * @ltd lu_tgt_descs
+ *
+ * Return: 0 on success
+ * -EAGAIN if there are not enough tgts or all tgt spaces are
+ * almost the same
+ */
+int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd)
+{
+ struct lu_qos *qos = <d->ltd_qos;
+ struct lov_desc *desc = <d->ltd_lov_desc;
+ struct lu_tgt_desc *tgt;
+ struct lu_svr_qos *svr;
+ u64 ba_max, ba_min, ba;
+ u64 ia_max, ia_min, ia = 1;
+ u32 num_active;
+ int prio_wide;
+ time64_t now, age;
+ int rc;
+
+ if (!qos->lq_dirty) {
+ rc = 0;
+ goto out;
+ }
+
+ num_active = desc->ld_active_tgt_count - 1;
+ if (num_active < 1) {
+ rc = -EAGAIN;
+ goto out;
+ }
+
+ /* find bavail on each server */
+ list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
+ svr->lsq_bavail = 0;
+ /* if inode is not counted, set to 1 to ignore */
+ svr->lsq_iavail = ltd->ltd_is_mdt ? 0 : 1;
+ }
+ qos->lq_active_svr_count = 0;
+
+ /*
+ * How badly user wants to select targets "widely" (not recently chosen
+ * and not on recent MDS's). As opposed to "freely" (free space avail.)
+ * 0-256
+ */
+ prio_wide = 256 - qos->lq_prio_free;
+
+ ba_min = (u64)(-1);
+ ba_max = 0;
+ ia_min = (u64)(-1);
+ ia_max = 0;
+ now = ktime_get_real_seconds();
+
+ /* Calculate server penalty per object */
+ ltd_foreach_tgt(ltd, tgt) {
+ if (!tgt->ltd_active)
+ continue;
+
+ /* when inode is counted, bavail >> 16 to avoid overflow */
+ ba = tgt_statfs_bavail(tgt);
+ if (ltd->ltd_is_mdt)
+ ba >>= 16;
+ else
+ ba >>= 8;
+ if (!ba)
+ continue;
+
+ ba_min = min(ba, ba_min);
+ ba_max = max(ba, ba_max);
+
+ /* Count the number of usable servers */
+ if (tgt->ltd_qos.ltq_svr->lsq_bavail == 0)
+ qos->lq_active_svr_count++;
+ tgt->ltd_qos.ltq_svr->lsq_bavail += ba;
+
+ if (ltd->ltd_is_mdt) {
+ /* iavail >> 8 to avoid overflow */
+ ia = tgt_statfs_iavail(tgt) >> 8;
+ if (!ia)
+ continue;
+
+ ia_min = min(ia, ia_min);
+ ia_max = max(ia, ia_max);
+
+ tgt->ltd_qos.ltq_svr->lsq_iavail += ia;
+ }
+
+ /*
+ * per-tgt penalty is
+ * prio * bavail * iavail / (num_tgt - 1) / 2
+ */
+ tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 8;
+ do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active);
+ tgt->ltd_qos.ltq_penalty_per_obj >>= 1;
+
+ age = (now - tgt->ltd_qos.ltq_used) >> 3;
+ if (qos->lq_reset || age > 32 * desc->ld_qos_maxage)
+ tgt->ltd_qos.ltq_penalty = 0;
+ else if (age > desc->ld_qos_maxage)
+ /* Decay tgt penalty. */
+ tgt->ltd_qos.ltq_penalty >>= age / desc->ld_qos_maxage;
+ }
+
+ num_active = qos->lq_active_svr_count - 1;
+ if (num_active < 1) {
+ /*
+ * If there's only 1 server, we can't penalize it, so instead
+ * we have to double the tgt penalty
+ */
+ num_active = 1;
+ ltd_foreach_tgt(ltd, tgt) {
+ if (!tgt->ltd_active)
+ continue;
+
+ tgt->ltd_qos.ltq_penalty_per_obj <<= 1;
+ }
+ }
+
+ /*
+ * Per-server penalty is
+ * prio * bavail * iavail / server_tgts / (num_svr - 1) / 2
+ */
+ list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
+ ba = svr->lsq_bavail;
+ ia = svr->lsq_iavail;
+ svr->lsq_penalty_per_obj = prio_wide * ba * ia >> 8;
+ do_div(svr->lsq_penalty_per_obj, svr->lsq_tgt_count * num_active);
+ svr->lsq_penalty_per_obj >>= 1;
+
+ age = (now - svr->lsq_used) >> 3;
+ if (qos->lq_reset || age > 32 * desc->ld_qos_maxage)
+ svr->lsq_penalty = 0;
+ else if (age > desc->ld_qos_maxage)
+ /* Decay server penalty. */
+ svr->lsq_penalty >>= age / desc->ld_qos_maxage;
+ }
+
+ qos->lq_dirty = 0;
+ qos->lq_reset = 0;
+
+ /*
+ * If each tgt has almost same free space, do rr allocation for better
+ * creation performance
+ */
+ qos->lq_same_space = 0;
+ if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min &&
+ (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) {
+ qos->lq_same_space = 1;
+ /* Reset weights for the next time we enter qos mode */
+ qos->lq_reset = 1;
+ }
+ rc = 0;
+
+out:
+ if (!rc && qos->lq_same_space)
+ return -EAGAIN;
+
+ return rc;
+}
+EXPORT_SYMBOL(ltd_qos_penalties_calc);
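
The decay rule described in the function's comment works out as follows; a numeric sketch with made-up values:

/*
 * Example: with ld_qos_maxage = 5s, a tgt idle for 120s has
 * age = 120 >> 3 = 15, so ltq_penalty >>= 15 / 5, i.e. the penalty is
 * divided by 8. Once age > 32 * 5 = 160 (idle for more than 1280s), or
 * after lq_reset, the penalty is cleared to 0 outright.
 */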
+
+/**
+ * Re-calculate penalties and weights of all tgts.
+ *
+ * The function is called when some target was used for a new object. In
+ * this case we should re-calculate all the weights to keep new allocations
+ * balanced well.
+ *
+ * @ltd lu_tgt_descs
+ * @tgt recently used tgt
+ * @total_wt new total weight for the pool
+ *
+ * Return: 0
+ */
+int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt,
+ u64 *total_wt)
+{
+ struct lu_qos *qos = <d->ltd_qos;
+ struct lu_tgt_qos *ltq;
+ struct lu_svr_qos *svr;
+
+ ltq = &tgt->ltd_qos;
+ LASSERT(ltq);
+
+ /* Don't allocate on this device anymore, until the next alloc_qos */
+ ltq->ltq_usable = 0;
+
+ svr = ltq->ltq_svr;
+
+ /*
+ * Decay old penalty by half (we're adding max penalty, and don't
+ * want it to run away.)
+ */
+ ltq->ltq_penalty >>= 1;
+ svr->lsq_penalty >>= 1;
+
+ /* mark the server and tgt as recently used */
+ ltq->ltq_used = svr->lsq_used = ktime_get_real_seconds();
+
+ /* Set max penalties for this tgt and server */
+ ltq->ltq_penalty += ltq->ltq_penalty_per_obj *
+ ltd->ltd_lov_desc.ld_active_tgt_count;
+ svr->lsq_penalty += svr->lsq_penalty_per_obj *
+ ltd->ltd_lov_desc.ld_active_tgt_count;
+
+ /* Decrease all MDS penalties */
+ list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
+ if (svr->lsq_penalty < svr->lsq_penalty_per_obj)
+ svr->lsq_penalty = 0;
+ else
+ svr->lsq_penalty -= svr->lsq_penalty_per_obj;
+ }
+
+ *total_wt = 0;
+ /* Decrease all tgt penalties */
+ ltd_foreach_tgt(ltd, tgt) {
+ if (!tgt->ltd_active)
+ continue;
+
+ if (ltq->ltq_penalty < ltq->ltq_penalty_per_obj)
+ ltq->ltq_penalty = 0;
+ else
+ ltq->ltq_penalty -= ltq->ltq_penalty_per_obj;
+
+ lu_tgt_qos_weight_calc(tgt);
+
+ /* Recalc the total weight of usable osts */
+ if (ltq->ltq_usable)
+ *total_wt += ltq->ltq_weight;
+
+ CDEBUG(D_OTHER,
+ "recalc tgt %d usable=%d avail=%llu tgtppo=%llu tgtp=%llu svrppo=%llu svrp=%llu wt=%llu\n",
+ tgt->ltd_index, ltq->ltq_usable,
+ tgt_statfs_bavail(tgt) >> 10,
+ ltq->ltq_penalty_per_obj >> 10,
+ ltq->ltq_penalty >> 10,
+ ltq->ltq_svr->lsq_penalty_per_obj >> 10,
+ ltq->ltq_svr->lsq_penalty >> 10,
+ ltq->ltq_weight >> 10);
+ }
+
+ return 0;
}
-EXPORT_SYMBOL(lu_tgt_descs_del);
+EXPORT_SYMBOL(ltd_qos_update);
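
Taken together, a caller drives one QoS allocation round much as the lmv_locate_tgt_qos() hunk earlier in this patch does. A condensed sketch of the intended call sequence (illustrative locals, error handling trimmed):

/* One QoS round; ltd, rc, tgt, total_weight and rand are locals. */
if (!ltd_qos_is_usable(ltd))
	return -EAGAIN;				/* caller falls back to RR */

down_write(&ltd->ltd_qos.lq_rw_sem);
rc = ltd_qos_penalties_calc(ltd);		/* refresh if lq_dirty */
if (rc)
	goto unlock;

total_weight = 0;
ltd_foreach_tgt(ltd, tgt) {			/* recompute per-tgt weights */
	if (!tgt->ltd_active)
		continue;
	tgt->ltd_qos.ltq_usable = 1;
	lu_tgt_qos_weight_calc(tgt);
	total_weight += tgt->ltd_qos.ltq_weight;
}

rand = lu_prandom_u64_max(total_weight);
/* ... pick the tgt whose cumulative weight first reaches rand ... */
ltd_qos_update(ltd, tgt, &total_weight);	/* penalize the chosen tgt */
unlock:
up_write(&ltd->ltd_qos.lq_rw_sem);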