New APIs: ByteString.toIndex() and ByteString.toFraction() #729

Open
wants to merge 1 commit into
base: master

Conversation

Collaborator

@swankjesse swankjesse commented Jun 1, 2020

ByteString.toIndex() may be useful with hashing to put byte strings in
partitioning buckets for scaling. For example, to divide a dataset
into 32 partitions, hash the key, then use toIndex(32) to map the
key to its partition.

ByteString.toFraction() may be useful with dynamic experiments and A/B
tests. For example, to assign 5% of customers to a control group,
hash the customer key, then check if toFraction() is less than 0.05.
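
A minimal sketch of both use cases, using only the JDK. The helpers `leadingUInt32`, `toIndex`, `toFraction`, `sha256`, `partitionOf`, and `inControlGroup` are hypothetical stand-ins written for illustration; they are not the code in this pull request. The sketch assumes that only the first four bytes are used and that the index is a remainder.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-ins for the proposed ByteString.toIndex() and
// ByteString.toFraction(). Assumption: only the first four bytes matter.

/** Reads the first four bytes as an unsigned 32-bit big-endian integer. */
fun leadingUInt32(bytes: ByteArray): Long {
  var result = 0L
  for (i in 0 until 4) {
    // Missing bytes are treated as zero (an assumption of this sketch).
    val b = if (i < bytes.size) bytes[i].toLong() and 0xff else 0L
    result = (result shl 8) or b
  }
  return result
}

/** Returns a value in [0, size). Assumes the index is a remainder. */
fun toIndex(bytes: ByteArray, size: Int): Int {
  require(size > 0) { "size must be positive" }
  return (leadingUInt32(bytes) % size).toInt()
}

/** Returns a value in [0.0, 1.0): the leading 32 bits divided by 2^32. */
fun toFraction(bytes: ByteArray): Double = leadingUInt32(bytes) / 4294967296.0

fun sha256(key: String): ByteArray =
  MessageDigest.getInstance("SHA-256").digest(key.toByteArray(Charsets.UTF_8))

/** Maps a key to one of 32 partitions. */
fun partitionOf(key: String): Int = toIndex(sha256(key), 32)

/** True for roughly 5% of customer keys. */
fun inControlGroup(customerKey: String): Boolean =
  toFraction(sha256(customerKey)) < 0.05

fun main() {
  println(partitionOf("customer-1234"))
  println(inControlGroup("customer-1234"))
}
```

Hashing first matters here: raw keys often share prefixes, so their leading bytes alone would spread unevenly across buckets.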

@swankjesse swankjesse force-pushed the jwilson.0601.toFraction_toIndex branch from b2ea609 to c5d9dca Compare June 1, 2020 23:44
@swankjesse swankjesse force-pushed the jwilson.0601.toFraction_toIndex branch from c5d9dca to 3d41608 Compare June 2, 2020 01:28
* bytes. For example, "aaaaaaaaab".toIndex(3) is 1, but if we did arbitrary-precision math the
* result would be 2.
*/
@Test fun toIndexHonorsFirstFourBytesOnly() {
Collaborator Author

of everything, this part hurts me the most
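
One reading consistent with the numbers in that test comment is that the index is a remainder taken over the leading bytes. That is an assumption, since the implementation is not shown in this conversation. Under that reading, the gap between four-byte and arbitrary-precision math can be reproduced with `java.math.BigInteger`; the helper names below are hypothetical.

```kotlin
import java.math.BigInteger

/** Remainder of the first four bytes, read as an unsigned big-endian integer. */
fun firstFourRemainder(bytes: ByteArray, size: Int): Int =
  BigInteger(1, bytes.copyOfRange(0, 4)).mod(BigInteger.valueOf(size.toLong())).toInt()

/** Remainder of the whole byte string, using arbitrary-precision math. */
fun fullRemainder(bytes: ByteArray, size: Int): Int =
  BigInteger(1, bytes).mod(BigInteger.valueOf(size.toLong())).toInt()

fun main() {
  val bytes = "aaaaaaaaab".toByteArray(Charsets.US_ASCII)
  println(firstFourRemainder(bytes, 3)) // prints 1
  println(fullRemainder(bytes, 3)) // prints 2
}
```

Because 256 leaves remainder 1 when divided by 3, each result equals the sum of the byte values modulo 3: 4 × 97 = 388 gives 1, and 9 × 97 + 98 = 971 gives 2.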

*
* @return a value that is greater than or equal to `0.0` and less than `1.0`.
*/
fun toFraction(): Double

Great comments!

@JakeWharton
Collaborator

Recording some out-of-band discussion points: we're looking for a word to replace "to" here, since the index and fraction concepts are not intrinsic to how you think of a byte string. Jesse had a fun analogy: it's like having String.toIndex() return the absolute position of a string in a dictionary of all words. It's not perfect, of course, since strings can contain arbitrary characters that aren't words. It is, however, a useful mental model, since String.toIndex() would be a weird API but String.dictionaryIndex() makes sense. So what's the "dictionary" of the set of all possible byte strings, such that we can name these methods something like thingIndex() and thingFraction()?

@swankjesse
Collaborator Author

I've been thinking about the names here. My biggest concern with the current names is that they could be interpreted as decoding ASCII, like readDecimalLong.

My new recommendation is to drop the index function and rename toFraction() to unitIntervalDouble(). You can get an index like the one from toIndex by multiplying that by the target size and rounding down.
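
A minimal sketch of deriving a bucket index from a unit-interval value: scale the value by the target size and truncate. The function `indexFromFraction` is hypothetical and only illustrates the arithmetic. It assumes its input is already in the range [0.0, 1.0).

```kotlin
/**
 * Maps a fraction in [0.0, 1.0) to an index in [0, size).
 * Hypothetical helper; not part of Okio.
 */
fun indexFromFraction(fraction: Double, size: Int): Int {
  require(fraction >= 0.0 && fraction < 1.0) { "fraction must be in [0.0, 1.0)" }
  require(size > 0) { "size must be positive" }
  // Scaling by size gives a value in [0.0, size); truncation picks the bucket.
  return (fraction * size).toInt()
}

fun main() {
  println(indexFromFraction(0.5, 32)) // prints 16
  println(indexFromFraction(0.0, 32)) // prints 0
}
```

Scaling assigns contiguous ranges of the unit interval to each bucket, so a given byte string may land in a different bucket than it would under a remainder-based index, even though both spread well-hashed keys evenly.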

4 participants