perf: Add new efficient API read_to_vec #246

frankdavid · 2024-11-20T14:03:17Z

I found that a source of significant performance loss is the read method of Memory. The read method takes a mutable buffer which it fills with values read from the stable memory. According to Rust rules, the buffer passed to read must be initialized before it's passed to read (buffers containing uninitialized values are unsound and can cause UB).

The usual pattern is to create a properly sized Vec, eg. by using vec![0; size] or vec.resize(size, 0) and pass that to read. However, initializing the bytes with values that get overwritten by read is only necessary in order to be sound and requires significant number of instructions.

This PR introduces a new method read_to_vec which allows passing in a Vec with any size (most of the time an empty Vec will be used). Implementations can be more efficient by reading directly into the Vec and skipping initialization. This can lead to instruction reductions of up to 20%.

github-actions · 2024-11-20T14:12:45Z

`canbench` 🏋 (dir: .)

Significant performance change detected! ⚠️

./canbench_results.yml is up to date ✅


---------------------------------------------------

Benchmark: btreemap_insert_blob_4_1024
  total:
    instructions: 512.87 M (improved by 19.10%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 123 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_4_1024_v2
  total:
    instructions: 613.16 M (improved by 16.93%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 92 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_1024
  total:
    instructions: 623.85 M (improved by 17.72%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 183 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_1024_v2
  total:
    instructions: 729.17 M (improved by 15.83%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 138 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_16_1024
  total:
    instructions: 712.61 M (improved by 15.07%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 215 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_16_1024_v2
  total:
    instructions: 810.60 M (improved by 15.16%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 161 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_32_1024
  total:
    instructions: 753.24 M (improved by 15.44%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 230 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_32_1024_v2
  total:
    instructions: 851.68 M (improved by 15.45%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 173 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_64_1024
  total:
    instructions: 1.02 B (improved by 11.89%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 245 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_64_1024_v2
  total:
    instructions: 1.12 B (improved by 12.74%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 183 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_128_1024
  total:
    instructions: 1.29 B (improved by 11.30%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 260 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_128_1024_v2
  total:
    instructions: 1.39 B (improved by 11.19%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 195 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_256_1024
  total:
    instructions: 1.85 B (improved by 8.90%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 292 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_256_1024_v2
  total:
    instructions: 1.95 B (improved by 9.27%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 219 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_512_1024
  total:
    instructions: 2.94 B (improved by 7.06%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 351 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_512_1024_v2
  total:
    instructions: 3.03 B (improved by 7.59%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 263 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_4
  total:
    instructions: 4.82 B (improved by 2.91%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 235 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_4_v2
  total:
    instructions: 4.87 B (improved by 4.19%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 176 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_8
  total:
    instructions: 4.88 B (improved by 3.31%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 237 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_8_v2
  total:
    instructions: 4.96 B (improved by 4.00%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 178 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_16
  total:
    instructions: 4.91 B (improved by 2.90%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 241 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_16_v2
  total:
    instructions: 4.97 B (improved by 4.06%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 181 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_32
  total:
    instructions: 4.93 B (improved by 2.83%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 239 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_32_v2
  total:
    instructions: 4.99 B (improved by 3.91%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 180 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_64
  total:
    instructions: 4.93 B (improved by 3.31%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 250 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_64_v2
  total:
    instructions: 5.01 B (improved by 4.10%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 188 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_128
  total:
    instructions: 4.93 B (improved by 3.49%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 262 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_128_v2
  total:
    instructions: 4.99 B (improved by 4.44%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 196 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_256
  total:
    instructions: 4.94 B (improved by 3.70%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 292 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_256_v2
  total:
    instructions: 5.02 B (improved by 4.39%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 219 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_512
  total:
    instructions: 5.03 B (improved by 4.24%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 348 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_512_v2
  total:
    instructions: 5.11 B (improved by 4.93%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 261 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_u64
  total:
    instructions: 372.74 M (-0.52%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 7 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_u64_v2
  total:
    instructions: 456.25 M (-1.44%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 6 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_blob_8
  total:
    instructions: 366.62 M (0.55%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 7 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_blob_8_v2
  total:
    instructions: 447.26 M (-0.66%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 5 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_u64
  total:
    instructions: 348.87 M (-1.40%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 6 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_u64_v2
  total:
    instructions: 463.56 M (improved by 2.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 4 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_10mib_values
  total:
    instructions: 143.83 M (0.30%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 32 pages (no change)

---------------------------------------------------

Benchmark: btreemap_read_keys_from_range
  total:
    instructions: 115.75 M (regressed by 2.35%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_read_every_third_value_from_range
  total:
    instructions: 115.75 M (regressed by 2.35%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_small_values
  total:
    instructions: 15.06 M (regressed by 3.65%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_rev_small_values
  total:
    instructions: 15.10 M (regressed by 3.95%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_10mib_values
  total:
    instructions: 17.17 M (0.31%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_rev_10mib_values
  total:
    instructions: 17.17 M (0.32%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_small_values
  total:
    instructions: 9.93 M (improved by 3.57%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_rev_small_values
  total:
    instructions: 10.16 M (improved by 3.49%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_10mib_values
  total:
    instructions: 509.52 K (-1.40%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_rev_10mib_values
  total:
    instructions: 511.87 K (-1.40%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_small_values
  total:
    instructions: 14.99 M (improved by 4.63%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_rev_small_values
  total:
    instructions: 15.03 M (improved by 4.62%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_10mib_values
  total:
    instructions: 17.17 M (0.16%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_rev_10mib_values
  total:
    instructions: 17.17 M (0.16%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_count_small_values
  total:
    instructions: 9.79 M (improved by 3.62%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_count_10mib_values
  total:
    instructions: 523.05 K (-0.78%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_4_1024
  total:
    instructions: 506.15 M (improved by 20.91%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_4_1024_v2
  total:
    instructions: 624.71 M (improved by 18.20%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_1024
  total:
    instructions: 659.76 M (improved by 21.77%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_1024_v2
  total:
    instructions: 797.26 M (improved by 19.19%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_16_1024
  total:
    instructions: 838.51 M (improved by 19.88%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_16_1024_v2
  total:
    instructions: 980.30 M (improved by 18.81%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_32_1024
  total:
    instructions: 905.60 M (improved by 19.51%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_32_1024_v2
  total:
    instructions: 1.05 B (improved by 18.77%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_64_1024
  total:
    instructions: 1.23 B (improved by 15.36%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_64_1024_v2
  total:
    instructions: 1.38 B (improved by 15.48%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_128_1024
  total:
    instructions: 1.57 B (improved by 13.67%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_128_1024_v2
  total:
    instructions: 1.71 B (improved by 13.55%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_256_1024
  total:
    instructions: 2.21 B (improved by 10.97%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_256_1024_v2
  total:
    instructions: 2.35 B (improved by 11.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_512_1024
  total:
    instructions: 3.56 B (improved by 8.27%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_512_1024_v2
  total:
    instructions: 3.69 B (improved by 8.79%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_u64
  total:
    instructions: 536.83 M (-0.20%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_u64_v2
  total:
    instructions: 664.97 M (-0.54%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_blob_8
  total:
    instructions: 522.41 M (1.07%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_blob_8_v2
  total:
    instructions: 639.86 M (0.32%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_u64
  total:
    instructions: 461.07 M (-1.05%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_u64_v2
  total:
    instructions: 617.03 M (-1.48%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_4_1024
  total:
    instructions: 193.37 M (improved by 6.10%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_4_1024_v2
  total:
    instructions: 282.41 M (improved by 7.22%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_1024
  total:
    instructions: 221.01 M (improved by 8.97%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_1024_v2
  total:
    instructions: 306.87 M (improved by 9.17%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_16_1024
  total:
    instructions: 312.57 M (-0.93%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_16_1024_v2
  total:
    instructions: 389.36 M (improved by 8.26%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_32_1024
  total:
    instructions: 341.51 M (improved by 4.89%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_32_1024_v2
  total:
    instructions: 423.41 M (improved by 9.46%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_64_1024
  total:
    instructions: 585.78 M (improved by 4.16%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_64_1024_v2
  total:
    instructions: 675.98 M (improved by 7.09%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_128_1024
  total:
    instructions: 838.31 M (improved by 4.66%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_128_1024_v2
  total:
    instructions: 925.30 M (improved by 6.18%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_256_1024
  total:
    instructions: 1.34 B (improved by 4.89%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_256_1024_v2
  total:
    instructions: 1.43 B (improved by 5.67%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_512_1024
  total:
    instructions: 2.35 B (improved by 4.67%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_512_1024_v2
  total:
    instructions: 2.43 B (improved by 5.33%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_u64
  total:
    instructions: 196.25 M (-1.58%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_u64_v2
  total:
    instructions: 279.73 M (improved by 4.17%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_blob_8
  total:
    instructions: 195.43 M (-1.38%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_blob_8_v2
  total:
    instructions: 273.40 M (improved by 4.13%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_u64
  total:
    instructions: 209.05 M (improved by 2.98%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_u64_v2
  total:
    instructions: 311.22 M (improved by 4.66%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: memory_manager_baseline
  total:
    instructions: 1.18 B (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 8000 pages (no change)

---------------------------------------------------

Benchmark: memory_manager_overhead
  total:
    instructions: 1.18 B (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 8320 pages (no change)

---------------------------------------------------

Benchmark: memory_manager_grow
  total:
    instructions: 351.69 M (no change)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 32.00 K pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_4
  total:
    instructions: 3.98 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_8
  total:
    instructions: 4.01 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 1 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_16
  total:
    instructions: 4.07 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 2 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_32
  total:
    instructions: 4.19 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 5 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_64
  total:
    instructions: 4.43 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 9 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_128
  total:
    instructions: 4.90 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 19 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_u64
  total:
    instructions: 6.11 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 1 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_4
  total:
    instructions: 5.71 M (improved by 3.08%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_8
  total:
    instructions: 6.51 M (improved by 9.02%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_16
  total:
    instructions: 8.87 M (improved by 10.92%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_32
  total:
    instructions: 9.51 M (improved by 12.45%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_64
  total:
    instructions: 13.89 M (improved by 11.57%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_128
  total:
    instructions: 18.89 M (improved by 12.61%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_u64
  total:
    instructions: 5.87 M (improved by 8.71%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

dsarlis · 2024-11-20T15:16:28Z

src/lib.rs

+    fn read_to_vec(&self, offset: u64, len: usize, dst: &mut std::vec::Vec<u8>) {
+        dst.resize(len, 0);
+        self.read(offset, &mut dst[..]);
+    }


I am not sure if we want to have this be part of the Memory trait. It depends on how religious we want to be here.

The Memory trait is supposed to model the operations that are available for stable memory which in turn models the operations available for heap memory in Wasm. There's no read_to_vec so following that logic, maybe we avoid adding it to the trait.

We can probably have it as a free function or you think there's a significant advantage in having it on the trait?

The benefit of having it on the trait is that implementors of the trait can decide to override it in an efficient way. For example, Ic0StableMemory can read directly read into a Vec skipping initialization.

is supposed to model the operations that are available for stable memory

That's a valid point. Another alternative could be changing the read method to read into dst: &mut [MaybeUninit<u8>] which is technically the truth. It's unfortunately an unsolved problem in Rust to create uninitialized [u8] slices (something totally acceptable in C or C++). The documentation is very explicit about warning people not to create slices with uninitialized values even if those values will never be read.

One more alternative would be changing the read method to take a raw pointer and a length. In such cases, Rust does not require any initialization. Callers could then do:

let mut v = Vec::with_capacity(len); memory.read(offset, len, v.as_mut_ptr()); v.set_len(x);

However, read would likely need to be marked as unsafe since clients would need to guarantee that len number of bytes after the passed pointer are writable.

I tested what if we replace read_to_vec with unsafe fn read_unsafe(&self, offset: u64, dst: *mut u8, count: usize)

It allows for even better optimizations than read_to_vec leading to up to 40% instruction reduction. This function is unsafe because clients must promise that count number of bytes after dst are writeable.

(We could still keep read_to_vec but make it into an associated method like you mentioned. It would provide a safe and efficient way for clients to read and would use read_unsafe under the hood.)

So, to make sure I understand your suggestion correctly, it would be:

Add unsafe fn read_unsafe for maximum performance if clients know what they are doing -- not super excited because I think it can be a big trap but as long as we provide safe alternatives I'm not strongly opposed.

Add read_to_vec as an associated method, not on the trait.

We would also leave the original read but it's the slowest of them all?

Thanks for all the exploration around possible options.

That's correct. I'd keep the original read because sometimes you have a fixed-size array that you want to fill. If the array is small, initializing it is fast and using a vector is unnecessary (and slower). read is the best option for this use-case.

When reading a larger blob, it's better not to allocate on the stack, so reading into a Vec is the best approach. read_to_vec can be used for that.

And Memory implementations must be able to read into uninitialized buffers (pointer + length) which is trivial for the mostly used Ic0Memory implementation. The default implementation will just initialize the buffer with 0-s first and then call the existing read(). If we make read_to_vec into an associated function, we don't have to pollute the Memory interface with Vec.

if clients know what they are doing

The API will be the same as for example std::ptr::copy with the same requirements. And we can provide safe wrappers while staying efficient.

I'll send a new PR with this new approach.

Bit of a random question, but we have several canisters that use stable-structures, and we'd love to benefit from these improvements - would we be able to just by upgrading the crate once this change is out?

@rikonor Yeah, we'll need to cut a new release because there's a few optimizations already included that are unreleased. Maybe we can target to make a release after this change with read_to_vec is merged.

frankdavid · 2024-11-21T15:39:33Z

Superseded by #248

perf: Add new efficient API read_to_vec

d724d21

frankdavid requested a review from a team as a code owner November 20, 2024 14:03

dsarlis reviewed Nov 20, 2024

View reviewed changes

frankdavid closed this Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Add new efficient API read_to_vec #246

perf: Add new efficient API read_to_vec #246

frankdavid commented Nov 20, 2024 •

edited

Loading

github-actions bot commented Nov 20, 2024

dsarlis Nov 20, 2024 •

edited

Loading

frankdavid Nov 20, 2024

frankdavid Nov 20, 2024 •

edited

Loading

frankdavid Nov 21, 2024 •

edited

Loading

dsarlis Nov 21, 2024

frankdavid Nov 21, 2024

rikonor Nov 21, 2024

dsarlis Nov 21, 2024

frankdavid commented Nov 21, 2024

perf: Add new efficient API read_to_vec #246

perf: Add new efficient API read_to_vec #246

Conversation

frankdavid commented Nov 20, 2024 • edited Loading

github-actions bot commented Nov 20, 2024

canbench 🏋 (dir: .)

dsarlis Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

frankdavid Nov 20, 2024

Choose a reason for hiding this comment

frankdavid Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

frankdavid Nov 21, 2024 • edited Loading

Choose a reason for hiding this comment

dsarlis Nov 21, 2024

Choose a reason for hiding this comment

frankdavid Nov 21, 2024

Choose a reason for hiding this comment

rikonor Nov 21, 2024

Choose a reason for hiding this comment

dsarlis Nov 21, 2024

Choose a reason for hiding this comment

frankdavid commented Nov 21, 2024

frankdavid commented Nov 20, 2024 •

edited

Loading

`canbench` 🏋 (dir: .)

dsarlis Nov 20, 2024 •

edited

Loading

frankdavid Nov 20, 2024 •

edited

Loading

frankdavid Nov 21, 2024 •

edited

Loading