
Fixed false positives #2273 #2285

Merged: 3 commits into sherlock-project:master on Nov 12, 2024
Conversation

@rsb-23 (Contributor) commented Sep 3, 2024

  • Updated user-agent in header
  • Removed redundant user-agents from data.json
  • Fixed false positives for the username zqxzxzcvj (except a few)

- Updated user-agent in header and removed duplicate
@ppfeister (Member)

> Removed redundant user-agents from data.json

Did you test those targets for both positives and negatives after removal? Some of those user agents were specifically tailored, and the targets were non-functional without them; with YouTube, for example, you need to masquerade as Googlebot or all queries fail.
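For context, a hedged sketch of roughly how such a tailored agent is attached to a single target: the discussion implies that per-site user-agents live in sherlock_project/resources/data.json under a per-site headers field, rendered here as a Python dict. The URL, errorType, and User-Agent values are illustrative placeholders, not the real YouTube entry.

```python
# Illustrative only: mirrors the shape of a data.json site entry.
# All field values are placeholders, not the actual manifest contents.
youtube_entry = {
    "YouTube": {
        "url": "https://www.youtube.com/@{}",   # {} is replaced by the username
        "errorType": "status_code",
        "headers": {
            # Masquerade as Googlebot so queries are not rejected.
            "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                          "+http://www.google.com/bot.html)",
        },
    }
}
```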

@rsb-23 (Contributor, Author) commented Sep 3, 2024

> Removed redundant user-agents from data.json
>
> Did you test those targets for both positives and negatives after removal? Some of those user agents were specifically tailored, and the targets were non-functional without them; with YouTube, for example, you need to masquerade as Googlebot or all queries fail.

Yes, all test cases passed.
Also, I tested --site Youtube using mohak_mangal as a positive case and mohak_mangal2z as a negative one.
I'll test and confirm for LinkedIn and Spotify too.

Btw, we should include test cases for both positives and negatives for each site when it is added.

PS: Can we add a test/function to check for probable false positives using a random string made of characters like fjvxqz?
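A minimal sketch of what that probe could look like, assuming a plain random string over rarely used letters is enough; the function name and default length are made up for illustration.

```python
import random

RARE_LETTERS = "fjvxqz"  # letters that rarely combine into real usernames

def improbable_username(length: int = 9) -> str:
    """Return a throwaway name (e.g. 'zqxjvfqzx') that is very unlikely
    to be registered anywhere, for probing probable false positives."""
    return "".join(random.choice(RARE_LETTERS) for _ in range(length))

print(improbable_username())  # e.g. 'qjvzxfqzj'
```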

@ppfeister (Member)

Yeah, the unit tests themselves don't check individual targets yet; they just test sections of the code itself. That would require a manual check before/after the change. I believe there used to be some form of this, but it broke at some point and was later removed.

I don't think it'd be wise to test every single target as part of the PR/push CI, since that would generate a lot of unnecessary traffic; the original behavior ran either when prompted by the developer or via a weekly CI run. It's on my to-do list to re-implement this.

Note that the negative test cases won't be hardcoded, as that is too easily broken (and can very easily be intentionally broken by someone simply registering the hardcoded names). Rather, they will be generated at runtime based on either the given regexCheck or a simple string, with a few attempts allowed for the unlikely chance a generated name hits a real username.


I'll jump on in a bit to review these changes here
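A rough sketch of the runtime negative-case generation described above, assuming a rare-character alphabet plus a small retry budget; re.fullmatch against the site's regexCheck keeps candidates format-valid, and the caller can regenerate on the off chance a candidate is actually taken. Function and parameter names are invented for illustration.

```python
import random
import re
import string
from typing import Optional

def negative_candidate(regex_check: Optional[str] = None,
                       attempts: int = 5,
                       length: int = 10) -> Optional[str]:
    """Generate a username that should not exist on the target site.

    If the site defines a regexCheck, only format-valid candidates are
    returned; the retry budget covers format mismatches and leaves room
    to regenerate if a candidate unexpectedly turns out to be registered.
    """
    alphabet = "fjvxqz" + string.digits  # characters rare in real usernames
    for _ in range(attempts):
        candidate = "".join(random.choice(alphabet) for _ in range(length))
        if regex_check is None or re.fullmatch(regex_check, candidate):
            return candidate
    return None  # caller may fall back to a fixed improbable string
```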

@rsb-23 (Contributor, Author) commented Sep 4, 2024

  • I have successfully tested all 3 sites that had a unique user-agent (LinkedIn, Spotify and YouTube) for both positives and negatives, using a claimed username and zqxzxzcvj.

Yeah, I absolutely agree with your points. So, I have a few approaches to it (feel free to ignore 😜):

  1. We can have a check.py script covering all the cases and checks required from a developer for a contribution. It can then be run locally as a pre-commit hook or manually, so no additional traffic is generated and the developer's task is simplified (a rough sketch follows this list).

  2. For negative test cases, generate 2 usernames from less frequent characters like q, v, etc., and flag the sites that still report them as found as probable false positives.
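A hypothetical check.py along the lines of point 1, assuming the installed `sherlock` command, its `--site` flag (used earlier in this thread), and its `[+]` prefix on found results; the site and usernames are taken from this conversation, everything else is a placeholder.

```python
#!/usr/bin/env python3
"""Hypothetical local check: one positive and one negative probe per site."""
import subprocess
import sys

SITE = "YouTube"
POSITIVE = "mohak_mangal"   # known-claimed username mentioned in this thread
NEGATIVE = "zqxzxzcvj"      # improbable username, should not be found

def found(username: str) -> bool:
    # Assumes the 'sherlock' console script is on PATH and prints found
    # profiles on lines beginning with '[+]'.
    result = subprocess.run(
        ["sherlock", username, "--site", SITE],
        capture_output=True, text=True, check=False,
    )
    return any(line.startswith("[+]") for line in result.stdout.splitlines())

if __name__ == "__main__":
    ok = found(POSITIVE) and not found(NEGATIVE)
    print(f"{SITE}: {'PASS' if ok else 'FAIL'}")
    sys.exit(0 if ok else 1)
```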

(Two review threads on sherlock_project/resources/data.json were marked outdated and resolved.)
@ppfeister merged commit 46023a8 into sherlock-project:master on Nov 12, 2024
13 checks passed