New APIs: ByteString.toIndex() and ByteString.toFraction() #729

swankjesse · 2020-06-01T23:43:20Z

toIndex() may be useful with hashing to assign byte strings to
partition buckets when scaling. For example, to divide a dataset
into 32 partitions, hash the key, then use toIndex(32) to map the
key to its partition.
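
A minimal sketch of the partitioning use case, using only the JDK rather than Okio. `toIndexSketch` and `partitionOf` are hypothetical names, and the arithmetic (first four bytes of the hash, read as an unsigned big-endian integer, modulo the bucket count) is an assumption inferred from the `toIndexHonorsFirstFourBytesOnly` test quoted later in this thread, not a confirmed implementation.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for the proposed toIndex(); not an Okio API.
// Assumption: only the first four bytes are used, read as an unsigned
// big-endian 32-bit integer and reduced modulo bucketCount.
fun ByteArray.toIndexSketch(bucketCount: Int): Int {
  require(bucketCount > 0) { "bucketCount must be positive" }
  require(size >= 4) { "need at least four bytes" }
  var prefix = 0L
  for (i in 0 until 4) {
    prefix = (prefix shl 8) or (this[i].toLong() and 0xffL)
  }
  return (prefix % bucketCount).toInt()
}

// Hash the key with SHA-256, then map the hash to a partition number.
fun partitionOf(key: String, partitions: Int): Int {
  val hash = MessageDigest.getInstance("SHA-256")
    .digest(key.toByteArray(Charsets.UTF_8))
  return hash.toIndexSketch(partitions)
}
```

Calling `partitionOf("customer-42", 32)` returns a value from 0 through 31, and the same key always lands in the same partition.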

toFraction() may be useful with dynamic experiments and A/B
tests. For example, to assign 5% of customers to a control group,
hash the customer key, then check whether toFraction() is less than 0.05.
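
A rough sketch of the control-group check, again using only the JDK. `toFractionSketch` and `inControlGroup` are hypothetical names. How the proposed function derives its value is not shown in this thread, so dividing the first four bytes by 2^32 is an assumption chosen only because it yields a value in the documented range.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for the proposed toFraction(); not an Okio API.
// Assumption: the first four bytes, read as an unsigned big-endian
// integer, are divided by 2^32, giving a value in [0.0, 1.0).
fun ByteArray.toFractionSketch(): Double {
  require(size >= 4) { "need at least four bytes" }
  var prefix = 0L
  for (i in 0 until 4) {
    prefix = (prefix shl 8) or (this[i].toLong() and 0xffL)
  }
  return prefix.toDouble() / 4294967296.0 // 2^32
}

// True when the customer's hash falls in the lowest `share` of the range.
fun inControlGroup(customerKey: String, share: Double = 0.05): Boolean {
  val hash = MessageDigest.getInstance("SHA-256")
    .digest(customerKey.toByteArray(Charsets.UTF_8))
  return hash.toFractionSketch() < share
}
```

Because the assignment depends only on the hash of the key, a given customer stays in the same group across calls, and raising `share` only ever adds customers to the group.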

swankjesse · 2020-06-02T01:34:09Z

okio/src/commonTest/kotlin/okio/ByteStringTest.kt

+   * bytes. For example, "aaaaaaaaab".toIndex(3) is 1, but if we did arbitrary-precision math the
+   * result would be 2.
+   */
+  @Test fun toIndexHonorsFirstFourBytesOnly() {


of everything, this part hurts me the most
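
The truncation described in the test comment above can be reproduced with plain JDK code. This is a sketch under the assumption that the index is a modulo of the leading four bytes; both function names are hypothetical. That assumption does reproduce the two values the comment quotes for "aaaaaaaaab" and 3.

```kotlin
import java.math.BigInteger

// Hypothetical reconstruction: index from the first four bytes only,
// read as an unsigned big-endian integer, modulo bucketCount.
fun firstFourBytesIndex(bytes: ByteArray, bucketCount: Int): Int {
  require(bucketCount > 0 && bytes.size >= 4)
  var prefix = 0L
  for (i in 0 until 4) {
    prefix = (prefix shl 8) or (bytes[i].toLong() and 0xffL)
  }
  return (prefix % bucketCount).toInt()
}

// The same modulo over every byte, using arbitrary-precision math.
fun allBytesIndex(bytes: ByteArray, bucketCount: Int): Int {
  require(bucketCount > 0)
  return BigInteger(1, bytes).mod(BigInteger.valueOf(bucketCount.toLong())).toInt()
}
```

For the ASCII bytes of "aaaaaaaaab" and a bucket count of 3, `firstFourBytesIndex` returns 1 and `allBytesIndex` returns 2, so bytes past the fourth never influence the four-byte result.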

zhxnlai · 2020-06-02T02:02:17Z

okio/src/commonMain/kotlin/okio/ByteString.kt

+   *
+   * @return a value that is greater than or equal to `0.0` and less than `1.0`.
+   */
+  fun toFraction(): Double


Great comments!

JakeWharton · 2020-06-19T02:34:05Z

Recording some out-of-band discussion points: we're looking for a word to replace "to" here, since the index and fraction concepts are not intrinsic to the domain of how you think of a byte string. Jesse had a fun analogy where it's like having String.toIndex(), which returned the absolute position of a string in a dictionary of all words. It's not perfect, of course, since strings can be arbitrary characters that aren't words. It is, however, a useful mental model, since String.toIndex() would be a weird API but String.dictionaryIndex() makes sense. So what's the "dictionary" of the set of all possible bytes such that we can name these methods something like thingIndex() and thingFraction()?

swankjesse · 2020-06-20T19:19:20Z

I've been thinking about the names here. My biggest concern with the current names is they could be interpreted as decoding ASCII, like readDecimalLong.

My new recommendation is to drop the index function, and rename toFraction() to unitIntervalDouble(). You can get an index like the one from toIndex by multiplying that by the target size and rounding down.
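
One way to read this suggestion, as a sketch: scale a value in [0.0, 1.0) up by the target size and truncate, which always gives a valid index because the fraction is below 1.0. `indexFromFraction` is a hypothetical helper, and the result is not guaranteed to match what the dropped index function would have returned for the same bytes.

```kotlin
// Hypothetical helper; not an Okio API.
// Maps a fraction in [0.0, 1.0) to an index in [0, size).
fun indexFromFraction(fraction: Double, size: Int): Int {
  require(size > 0) { "size must be positive" }
  require(fraction >= 0.0 && fraction < 1.0) { "fraction must be in [0.0, 1.0)" }
  // Truncation rounds down for non-negative values; the clamp is a
  // defensive guard against floating-point rounding at the top edge.
  return (fraction * size).toInt().coerceAtMost(size - 1)
}
```

For example, `indexFromFraction(0.5, 32)` is 16 and `indexFromFraction(0.0, 32)` is 0.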

swankjesse force-pushed the jwilson.0601.toFraction_toIndex branch from b2ea609 to c5d9dca on June 1, 2020 23:44

swankjesse force-pushed the jwilson.0601.toFraction_toIndex branch from c5d9dca to 3d41608 on June 2, 2020 01:28

zhxnlai approved these changes Jun 2, 2020

swankjesse force-pushed the master branch from 2a696bb to ef6bf7f on June 27, 2022 00:31
