New APIs: ByteString.toIndex() and ByteString.toFraction() #729
The first one may be useful with hashing to put byte strings in partitioning buckets for scaling. For example, to divide a dataset into 32 partitions, hash the key, then use toIndex(32) to map the key to its partition. The second one may be useful with dynamic experiments and A/B tests. For example, to assign 5% of customers to a control group, hash the customer key, then check whether toFraction() is less than 0.05.
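The two use cases above can be sketched without depending on the proposed API. This is a minimal illustration, not the PR's implementation: the helper names (`leadingUInt32`, `toIndexSketch`, `toFractionSketch`, `sha256`) are hypothetical, and the arithmetic (first four bytes read as an unsigned big-endian integer, then reduced by modulus or divided by 2^32) is an assumption chosen to agree with the documented example where "aaaaaaaaab".toIndex(3) is 1.

```kotlin
import java.security.MessageDigest

// Hypothetical helper: reads the first four bytes as an unsigned big-endian 32-bit integer.
fun leadingUInt32(bytes: ByteArray): Long {
    require(bytes.size >= 4) { "need at least 4 bytes" }
    var result = 0L
    for (i in 0 until 4) {
        result = (result shl 8) or (bytes[i].toLong() and 0xffL)
    }
    return result
}

// Assumed semantics: only the first four bytes matter; the result is in [0, size).
fun toIndexSketch(bytes: ByteArray, size: Int): Int {
    require(size > 0) { "size must be positive" }
    return (leadingUInt32(bytes) % size).toInt()
}

// Assumed semantics: the result is greater than or equal to 0.0 and less than 1.0.
fun toFractionSketch(bytes: ByteArray): Double = leadingUInt32(bytes) / 4294967296.0

// Hashing first spreads keys evenly across buckets.
fun sha256(key: String): ByteArray =
    MessageDigest.getInstance("SHA-256").digest(key.toByteArray(Charsets.UTF_8))

fun main() {
    val hashed = sha256("customer-42")
    val partition = toIndexSketch(hashed, 32)           // one of 32 partitions
    val inControlGroup = toFractionSketch(hashed) < 0.05 // roughly 5% of keys
    println("partition=$partition inControlGroup=$inControlGroup")
}
```

Hashing before bucketing matters because raw keys are rarely uniformly distributed; the hash output is what makes the 32 partitions, or the 5% slice, come out roughly even.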
 * bytes. For example, "aaaaaaaaab".toIndex(3) is 1, but if we did arbitrary-precision math the
 * result would be 2.
 */
@Test fun toIndexHonorsFirstFourBytesOnly() {
of everything, this part hurts me the most
 *
 * @return a value that is greater than or equal to `0.0` and less than `1.0`.
 */
fun toFraction(): Double
Great comments!
Recording some out-of-band discussion points: we're looking for a word to replace "to" here, since the index and fraction concepts are not intrinsic to the domain of how you think of a byte string. Jesse had a fun analogy where it's like having
I've been thinking about the names here. My biggest concern with the current names is that they could be interpreted as decoding ASCII, like readDecimalLong. My new recommendation is to drop the index function and rename toFraction() to unitIntervalDouble(). You can get the result of toIndex by multiplying that by the target size.
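To illustrate how an index could be derived from a unit-interval value, here is a small sketch. The function name `indexFromUnitInterval` is hypothetical, and it assumes the input is greater than or equal to 0.0 and less than 1.0, matching the documented contract of toFraction(). It scales the value by the bucket count and truncates.

```kotlin
// Hypothetical helper: maps a value in [0.0, 1.0) to a bucket index in [0, size).
fun indexFromUnitInterval(fraction: Double, size: Int): Int {
    require(fraction >= 0.0 && fraction < 1.0) { "fraction must be in [0.0, 1.0)" }
    require(size > 0) { "size must be positive" }
    // Scale to [0.0, size), then truncate toward zero.
    return (fraction * size).toInt()
}
```

Note that this scaling approach need not produce the same index as a modulus-based toIndex for a given byte string; it only produces an index in the same range.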