Eliminate BCE in mask algorithm
Thanks again to @renthraysk.

This provides another significant speedup.

benchmark                         old MB/s     new MB/s     speedup
Benchmark_mask/2/fast-8           405.48       513.25       1.27x
Benchmark_mask/3/fast-8           518.93       661.92       1.28x
Benchmark_mask/4/fast-8           1207.10      1252.39      1.04x
Benchmark_mask/8/fast-8           1708.82      1655.63      0.97x
Benchmark_mask/16/fast-8          3418.58      3051.25      0.89x
Benchmark_mask/32/fast-8          5789.43      5813.31      1.00x
Benchmark_mask/128/fast-8         12819.53     14804.50     1.15x
Benchmark_mask/512/fast-8         18247.06     21659.50     1.19x
Benchmark_mask/4096/fast-8        19802.31     23885.68     1.21x
Benchmark_mask/16384/fast-8       20896.97     25081.11     1.20x
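For context, here is a minimal sketch of the kind of loop where bounds checks matter in an XOR mask. It is not the code from this commit: `maskSketch`, its signature, and the 8-byte chunk size are assumptions made for illustration. The idea is that shrinking the slice under a `len(b) >= 8` condition usually lets the Go compiler prove the 8-byte loads and stores are in range, so it need not emit a bounds check per access.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// maskSketch is a hypothetical helper, not the function changed in this commit.
// It XORs b in place with a 4-byte key (loaded little-endian into key) and
// returns the key rotated so that a later call can continue the same stream.
func maskSketch(key uint32, b []byte) uint32 {
	if len(b) >= 8 {
		// Repeat the 4-byte key twice to cover an 8-byte word.
		key64 := uint64(key)<<32 | uint64(key)
		// The loop condition guarantees at least 8 bytes remain, which is
		// what typically allows the compiler to drop the bounds checks
		// inside Uint64 and PutUint64.
		for len(b) >= 8 {
			v := binary.LittleEndian.Uint64(b)
			binary.LittleEndian.PutUint64(b, v^key64)
			b = b[8:]
		}
	}
	// Handle the remaining 0 to 7 bytes one at a time, rotating the key
	// so that its low byte is always the next key byte to apply.
	for i := range b {
		b[i] ^= byte(key)
		key = bits.RotateLeft32(key, -8)
	}
	return key
}

func main() {
	keyBytes := []byte{0x12, 0x34, 0x56, 0x78}
	key := binary.LittleEndian.Uint32(keyBytes)

	payload := []byte("hello, websocket masking")
	maskSketch(key, payload) // mask
	maskSketch(key, payload) // XOR with the same key again restores the input
	fmt.Printf("%s\n", payload)
}
```

Whether a given check is actually removed depends on the compiler version; building with `go build -gcflags="-d=ssa/check_bce/debug=1"` reports the bounds checks that remain.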