Message ID: 20231022010812.9201-1-eddyz87@gmail.com (mailing list archive)
Series: exact states comparison for iterator convergence checks
On Sun, 2023-10-22 at 04:08 +0300, Eduard Zingerman wrote:
[...]
> Changelog:
> V1 -> V2 [2], applied changes suggested by Alexei offlist:
> - __explored_state() function removed;
> - same_callsites() function is now used in clean_live_states();
> - patches #1,2 are added as preparatory code movement;
> - in process_iter_next_call() a safeguard is added to verify that
>   cur_st->parent exists and has expected insn index / call sites.

I have V3 ready and passing CI.

However, I checked on Alexei's concerns regarding performance on an
explored states cache miss, and the verifier does not behave well with
this patch-set. For example, the program at the end of this email causes
the verifier to "hang" (loop inside is_state_visited() to no end).

There are several options to fix this:
(a) limit the total iteration depth, as in [1]; the limit would have to
    be at least 1000 to make iters/task_vma pass;
(b) limit the maximal number of checkpoint states associated with an
    instruction and evict those with the lowest dfs_depth;
(c) choose bigger constants in `sl->miss_cnt > sl->hit_cnt * 3 + 3` for
    checkpoint states.

Given that the current failure mode is bad, should I submit V3 as-is, or
should I explore options (b,c) and add one of those to V3?

[1] https://github.com/eddyz87/bpf/tree/bpf-iter-next-exact-widening-max-iter-depth

---

SEC("?raw_tp")
__failure __naked
int max_iter_depth(void)
{
	/* This is equivalent to the C program below.
	 * The counter stored in r6 is used as precise after the loop,
	 * thus preventing widening. The verifier won't be able to
	 * conclude that such a program terminates, but it should
	 * gracefully exit.
	 *
	 * r6 = 0
	 * bpf_iter_num_new(&fp[-8], 0, 10)
	 * while (bpf_iter_num_next(&fp[-8])) {
	 *   r6 += 1;
	 * }
	 * bpf_iter_num_destroy(&fp[-8])
	 * ... force r6 precise ...
	 * return 0
	 */
	asm volatile (
		"r6 = 0;"
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 0;"
		"r3 = 10;"
		"call %[bpf_iter_num_new];"
	"loop_%=:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_next];"
		"if r0 == 0 goto loop_end_%=;"
		"r6 += 1;"
		"goto loop_%=;"
	"loop_end_%=:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_destroy];"
		"r0 = r10;"
		"r0 += r6;" /* this forces r6 to be precise */
		"r0 = 0;"
		"exit;"
		:
		: __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy)
		: __clobber_all
	);
}
On Mon, 2023-10-23 at 20:17 +0300, Eduard Zingerman wrote:
> On Sun, 2023-10-22 at 04:08 +0300, Eduard Zingerman wrote:
> [...]
> > Changelog:
> > V1 -> V2 [2], applied changes suggested by Alexei offlist:
> > - __explored_state() function removed;
> > - same_callsites() function is now used in clean_live_states();
> > - patches #1,2 are added as preparatory code movement;
> > - in process_iter_next_call() a safeguard is added to verify that
> >   cur_st->parent exists and has expected insn index / call sites.
>
> I have V3 ready and passing CI.
>
> However, I checked on Alexei's concerns regarding performance on an
> explored states cache miss, and the verifier does not behave well with
> this patch-set. For example, the program at the end of this email
> causes the verifier to "hang" (loop inside is_state_visited() to no
> end).
>
> There are several options to fix this:
> (a) limit the total iteration depth, as in [1]; the limit would have
>     to be at least 1000 to make iters/task_vma pass;
> (b) limit the maximal number of checkpoint states associated with an
>     instruction and evict those with the lowest dfs_depth;
> (c) choose bigger constants in `sl->miss_cnt > sl->hit_cnt * 3 + 3`
>     for checkpoint states.

I played a bit with the constants in the 'eviction on miss' formula
using [1] (option c).

There are three relevant tests:
- iters/max_iter_depth: should report load failure within a reasonable
  time;
- iters/checkpoint_states_deletion: should pass;
- verif_scale_pyperf600_iter: should pass.

I think iters/checkpoint_states_deletion represents the worst case
scenario because, depending on the number of variables N, it produces
2**N distinct states. The eviction formula that does not lose relevant
states is:

  sl->miss_cnt > sl->hit_cnt * 2**N + 2**N

(because states start to repeat after 2**N iterations).

Without eviction for checkpoint states, the maximal number of variables
the verifier could handle in this test is 9, with 958,883 insns
processed reported. This corresponds to the formula
(sl->miss_cnt > sl->hit_cnt * 512 + 512). Using these values I get the
following execution times:

| test                       | time ./test_progs -a <test> |
|----------------------------+-----------------------------|
| verif_scale_pyperf600_iter | 0.2s                        |
| checkpoint_states_deletion | 5.8s                        |
| max_iter_depth             | 23.9s                       |

Going one step lower to 8 variables (and 256 as the constant),
checkpoint_states_deletion takes 248,133 insns to complete and the
timings table looks as follows:

| test                       | time ./test_progs -a <test> |
|----------------------------+-----------------------------|
| verif_scale_pyperf600_iter | 0.2s                        |
| checkpoint_states_deletion | 1.0s                        |
| max_iter_depth             | 15.2s                       |

So it is possible to get a speedup for the worst case scenario by
leaving some instruction budget on the table.

IMO, using the formula (sl->miss_cnt > sl->hit_cnt * 512 + 512)
(or 256) to evict checkpoints kind of, sort of makes sense, but is very
hacky.

I think a better solution would be to go for option (b) from the
previous email:
- evict old checkpoints based on dfs_depth;
- also use a secondary hash table for checkpoints, hashing not only
  insn_idx but also some fingerprint of the register states, thus
  avoiding long state-list walks.

But that would require some additional coding, and I know that Alexei
wants to land this sooner rather than later.

[1] https://github.com/eddyz87/bpf/tree/iter-exact-eviction-formula-experiments