Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Add new efficient API read_to_vec #246

Closed
wants to merge 1 commit into from

Conversation

frankdavid
Copy link
Contributor

@frankdavid frankdavid commented Nov 20, 2024

I found that a source of significant performance loss is the read method of Memory. The read method takes a mutable buffer which it fills with values read from the stable memory. According to Rust rules, the buffer passed to read must be initialized before it's passed to read (buffers containing uninitialized values are unsound and can cause UB).

The usual pattern is to create a properly sized Vec, eg. by using vec![0; size] or vec.resize(size, 0) and pass that to read. However, initializing the bytes with values that get overwritten by read is only necessary in order to be sound and requires significant number of instructions.

This PR introduces a new method read_to_vec which allows passing in a Vec with any size (most of the time an empty Vec will be used). Implementations can be more efficient by reading directly into the Vec and skipping initialization. This can lead to instruction reductions of up to 20%.

@frankdavid frankdavid requested a review from a team as a code owner November 20, 2024 14:03
Copy link

canbench 🏋 (dir: .)

Significant performance change detected! ⚠️

./canbench_results.yml is up to date ✅


---------------------------------------------------

Benchmark: btreemap_insert_blob_4_1024
  total:
    instructions: 512.87 M (improved by 19.10%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 123 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_4_1024_v2
  total:
    instructions: 613.16 M (improved by 16.93%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 92 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_1024
  total:
    instructions: 623.85 M (improved by 17.72%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 183 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_1024_v2
  total:
    instructions: 729.17 M (improved by 15.83%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 138 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_16_1024
  total:
    instructions: 712.61 M (improved by 15.07%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 215 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_16_1024_v2
  total:
    instructions: 810.60 M (improved by 15.16%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 161 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_32_1024
  total:
    instructions: 753.24 M (improved by 15.44%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 230 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_32_1024_v2
  total:
    instructions: 851.68 M (improved by 15.45%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 173 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_64_1024
  total:
    instructions: 1.02 B (improved by 11.89%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 245 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_64_1024_v2
  total:
    instructions: 1.12 B (improved by 12.74%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 183 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_128_1024
  total:
    instructions: 1.29 B (improved by 11.30%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 260 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_128_1024_v2
  total:
    instructions: 1.39 B (improved by 11.19%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 195 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_256_1024
  total:
    instructions: 1.85 B (improved by 8.90%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 292 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_256_1024_v2
  total:
    instructions: 1.95 B (improved by 9.27%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 219 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_512_1024
  total:
    instructions: 2.94 B (improved by 7.06%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 351 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_512_1024_v2
  total:
    instructions: 3.03 B (improved by 7.59%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 263 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_4
  total:
    instructions: 4.82 B (improved by 2.91%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 235 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_4_v2
  total:
    instructions: 4.87 B (improved by 4.19%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 176 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_8
  total:
    instructions: 4.88 B (improved by 3.31%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 237 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_8_v2
  total:
    instructions: 4.96 B (improved by 4.00%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 178 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_16
  total:
    instructions: 4.91 B (improved by 2.90%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 241 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_16_v2
  total:
    instructions: 4.97 B (improved by 4.06%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 181 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_32
  total:
    instructions: 4.93 B (improved by 2.83%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 239 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_32_v2
  total:
    instructions: 4.99 B (improved by 3.91%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 180 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_64
  total:
    instructions: 4.93 B (improved by 3.31%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 250 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_64_v2
  total:
    instructions: 5.01 B (improved by 4.10%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 188 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_128
  total:
    instructions: 4.93 B (improved by 3.49%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 262 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_128_v2
  total:
    instructions: 4.99 B (improved by 4.44%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 196 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_256
  total:
    instructions: 4.94 B (improved by 3.70%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 292 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_256_v2
  total:
    instructions: 5.02 B (improved by 4.39%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 219 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_512
  total:
    instructions: 5.03 B (improved by 4.24%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 348 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_1024_512_v2
  total:
    instructions: 5.11 B (improved by 4.93%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 261 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_u64
  total:
    instructions: 372.74 M (-0.52%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 7 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_u64_v2
  total:
    instructions: 456.25 M (-1.44%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 6 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_blob_8
  total:
    instructions: 366.62 M (0.55%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 7 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_u64_blob_8_v2
  total:
    instructions: 447.26 M (-0.66%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 5 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_u64
  total:
    instructions: 348.87 M (-1.40%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 6 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_blob_8_u64_v2
  total:
    instructions: 463.56 M (improved by 2.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 4 pages (no change)

---------------------------------------------------

Benchmark: btreemap_insert_10mib_values
  total:
    instructions: 143.83 M (0.30%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 32 pages (no change)

---------------------------------------------------

Benchmark: btreemap_read_keys_from_range
  total:
    instructions: 115.75 M (regressed by 2.35%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_read_every_third_value_from_range
  total:
    instructions: 115.75 M (regressed by 2.35%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_small_values
  total:
    instructions: 15.06 M (regressed by 3.65%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_rev_small_values
  total:
    instructions: 15.10 M (regressed by 3.95%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_10mib_values
  total:
    instructions: 17.17 M (0.31%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_rev_10mib_values
  total:
    instructions: 17.17 M (0.32%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_small_values
  total:
    instructions: 9.93 M (improved by 3.57%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_rev_small_values
  total:
    instructions: 10.16 M (improved by 3.49%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_10mib_values
  total:
    instructions: 509.52 K (-1.40%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_keys_rev_10mib_values
  total:
    instructions: 511.87 K (-1.40%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_small_values
  total:
    instructions: 14.99 M (improved by 4.63%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_rev_small_values
  total:
    instructions: 15.03 M (improved by 4.62%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_10mib_values
  total:
    instructions: 17.17 M (0.16%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_values_rev_10mib_values
  total:
    instructions: 17.17 M (0.16%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_count_small_values
  total:
    instructions: 9.79 M (improved by 3.62%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_iter_count_10mib_values
  total:
    instructions: 523.05 K (-0.78%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_4_1024
  total:
    instructions: 506.15 M (improved by 20.91%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_4_1024_v2
  total:
    instructions: 624.71 M (improved by 18.20%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_1024
  total:
    instructions: 659.76 M (improved by 21.77%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_1024_v2
  total:
    instructions: 797.26 M (improved by 19.19%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_16_1024
  total:
    instructions: 838.51 M (improved by 19.88%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_16_1024_v2
  total:
    instructions: 980.30 M (improved by 18.81%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_32_1024
  total:
    instructions: 905.60 M (improved by 19.51%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_32_1024_v2
  total:
    instructions: 1.05 B (improved by 18.77%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_64_1024
  total:
    instructions: 1.23 B (improved by 15.36%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_64_1024_v2
  total:
    instructions: 1.38 B (improved by 15.48%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_128_1024
  total:
    instructions: 1.57 B (improved by 13.67%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_128_1024_v2
  total:
    instructions: 1.71 B (improved by 13.55%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_256_1024
  total:
    instructions: 2.21 B (improved by 10.97%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_256_1024_v2
  total:
    instructions: 2.35 B (improved by 11.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_512_1024
  total:
    instructions: 3.56 B (improved by 8.27%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_512_1024_v2
  total:
    instructions: 3.69 B (improved by 8.79%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_u64
  total:
    instructions: 536.83 M (-0.20%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_u64_v2
  total:
    instructions: 664.97 M (-0.54%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_blob_8
  total:
    instructions: 522.41 M (1.07%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_u64_blob_8_v2
  total:
    instructions: 639.86 M (0.32%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_u64
  total:
    instructions: 461.07 M (-1.05%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_remove_blob_8_u64_v2
  total:
    instructions: 617.03 M (-1.48%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_4_1024
  total:
    instructions: 193.37 M (improved by 6.10%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_4_1024_v2
  total:
    instructions: 282.41 M (improved by 7.22%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_1024
  total:
    instructions: 221.01 M (improved by 8.97%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_1024_v2
  total:
    instructions: 306.87 M (improved by 9.17%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_16_1024
  total:
    instructions: 312.57 M (-0.93%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_16_1024_v2
  total:
    instructions: 389.36 M (improved by 8.26%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_32_1024
  total:
    instructions: 341.51 M (improved by 4.89%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_32_1024_v2
  total:
    instructions: 423.41 M (improved by 9.46%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_64_1024
  total:
    instructions: 585.78 M (improved by 4.16%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_64_1024_v2
  total:
    instructions: 675.98 M (improved by 7.09%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_128_1024
  total:
    instructions: 838.31 M (improved by 4.66%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_128_1024_v2
  total:
    instructions: 925.30 M (improved by 6.18%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_256_1024
  total:
    instructions: 1.34 B (improved by 4.89%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_256_1024_v2
  total:
    instructions: 1.43 B (improved by 5.67%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_512_1024
  total:
    instructions: 2.35 B (improved by 4.67%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_512_1024_v2
  total:
    instructions: 2.43 B (improved by 5.33%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_u64
  total:
    instructions: 196.25 M (-1.58%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_u64_v2
  total:
    instructions: 279.73 M (improved by 4.17%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_blob_8
  total:
    instructions: 195.43 M (-1.38%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_u64_blob_8_v2
  total:
    instructions: 273.40 M (improved by 4.13%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_u64
  total:
    instructions: 209.05 M (improved by 2.98%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap_get_blob_8_u64_v2
  total:
    instructions: 311.22 M (improved by 4.66%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: memory_manager_baseline
  total:
    instructions: 1.18 B (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 8000 pages (no change)

---------------------------------------------------

Benchmark: memory_manager_overhead
  total:
    instructions: 1.18 B (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 8320 pages (no change)

---------------------------------------------------

Benchmark: memory_manager_grow
  total:
    instructions: 351.69 M (no change)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 32.00 K pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_4
  total:
    instructions: 3.98 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_8
  total:
    instructions: 4.01 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 1 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_16
  total:
    instructions: 4.07 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 2 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_32
  total:
    instructions: 4.19 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 5 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_64
  total:
    instructions: 4.43 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 9 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_blob_128
  total:
    instructions: 4.90 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 19 pages (no change)

---------------------------------------------------

Benchmark: vec_insert_u64
  total:
    instructions: 6.11 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 1 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_4
  total:
    instructions: 5.71 M (improved by 3.08%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_8
  total:
    instructions: 6.51 M (improved by 9.02%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_16
  total:
    instructions: 8.87 M (improved by 10.92%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_32
  total:
    instructions: 9.51 M (improved by 12.45%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_64
  total:
    instructions: 13.89 M (improved by 11.57%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_blob_128
  total:
    instructions: 18.89 M (improved by 12.61%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_get_u64
  total:
    instructions: 5.87 M (improved by 8.71%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Comment on lines +63 to +66
fn read_to_vec(&self, offset: u64, len: usize, dst: &mut std::vec::Vec<u8>) {
dst.resize(len, 0);
self.read(offset, &mut dst[..]);
}
Copy link
Member

@dsarlis dsarlis Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we want to have this be part of the Memory trait. It depends on how religious we want to be here.

The Memory trait is supposed to model the operations that are available for stable memory which in turn models the operations available for heap memory in Wasm. There's no read_to_vec so following that logic, maybe we avoid adding it to the trait.

We can probably have it as a free function or you think there's a significant advantage in having it on the trait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of having it on the trait is that implementors of the trait can decide to override it in an efficient way. For example, Ic0StableMemory can read directly read into a Vec skipping initialization.

is supposed to model the operations that are available for stable memory

That's a valid point. Another alternative could be changing the read method to read into dst: &mut [MaybeUninit<u8>] which is technically the truth. It's unfortunately an unsolved problem in Rust to create uninitialized [u8] slices (something totally acceptable in C or C++). The documentation is very explicit about warning people not to create slices with uninitialized values even if those values will never be read.

Copy link
Contributor Author

@frankdavid frankdavid Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more alternative would be changing the read method to take a raw pointer and a length. In such cases, Rust does not require any initialization. Callers could then do:

let mut v = Vec::with_capacity(len);
memory.read(offset, len, v.as_mut_ptr());
v.set_len(x);

However, read would likely need to be marked as unsafe since clients would need to guarantee that len number of bytes after the passed pointer are writable.

Copy link
Contributor Author

@frankdavid frankdavid Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested what if we replace read_to_vec with unsafe fn read_unsafe(&self, offset: u64, dst: *mut u8, count: usize)

It allows for even better optimizations than read_to_vec leading to up to 40% instruction reduction. This function is unsafe because clients must promise that count number of bytes after dst are writeable.

(We could still keep read_to_vec but make it into an associated method like you mentioned. It would provide a safe and efficient way for clients to read and would use read_unsafe under the hood.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to make sure I understand your suggestion correctly, it would be:

  1. Add unsafe fn read_unsafe for maximum performance if clients know what they are doing -- not super excited because I think it can be a big trap but as long as we provide safe alternatives I'm not strongly opposed.
  2. Add read_to_vec as an associated method, not on the trait.
  3. We would also leave the original read but it's the slowest of them all?

Thanks for all the exploration around possible options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I'd keep the original read because sometimes you have a fixed-size array that you want to fill. If the array is small, initializing it is fast and using a vector is unnecessary (and slower). read is the best option for this use-case.

When reading a larger blob, it's better not to allocate on the stack, so reading into a Vec is the best approach. read_to_vec can be used for that.

And Memory implementations must be able to read into uninitialized buffers (pointer + length) which is trivial for the mostly used Ic0Memory implementation. The default implementation will just initialize the buffer with 0-s first and then call the existing read(). If we make read_to_vec into an associated function, we don't have to pollute the Memory interface with Vec.

if clients know what they are doing

The API will be the same as for example std::ptr::copy with the same requirements. And we can provide safe wrappers while staying efficient.

I'll send a new PR with this new approach.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a random question, but we have several canisters that use stable-structures, and we'd love to benefit from these improvements - would we be able to just by upgrading the crate once this change is out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rikonor Yeah, we'll need to cut a new release because there's a few optimizations already included that are unreleased. Maybe we can target to make a release after this change with read_to_vec is merged.

@frankdavid
Copy link
Contributor Author

Superseded by #248

@frankdavid frankdavid closed this Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants