Regex update to avoid over-redaction of GitHub issues (#325)
Hello - this PR adjusts the regexes in `airflow/include/tasks/extract/github.py` that remove boilerplate text from GitHub issue thread text. Before this change, the regular expressions removed too much text because of greedy matching. Consider the following example:

```
Discussion here.
<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->
More discussion here.
<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->
Even more discussion here
```

The two lines containing comments should be removed, but the greedy match in the regular expression `<!--\r\nThank you.*http://chris.beams.io/posts/git-commit/\r\n-->` also removes the line `More discussion here.`, because `.*` extends the match from the first `<!--` to the last `-->`. To fix that behavior, this PR replaces each greedy `.*` sequence with a lazy `.*?` sequence so that only the minimum (intended) match is removed.
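The difference can be reproduced with a minimal sketch like the one below. It is not the code from `github.py`: the names `GREEDY`, `LAZY`, and `text` are hypothetical, the example is assumed to contain real CR/LF characters with a newline between lines, and `re.DOTALL` is an assumption made so that `.` can cross those newlines.

```python
import re

# Sample thread text modeled on the example above (assumed layout:
# real "\r\n" inside each comment, "\n" between lines).
text = (
    "Discussion here.\n"
    "<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->\n"
    "More discussion here.\n"
    "<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->\n"
    "Even more discussion here"
)

# Pattern text copied from the PR description; only the quantifier differs.
# re.DOTALL is an assumption of this sketch, not a confirmed flag in github.py.
GREEDY = re.compile(
    r"<!--\r\nThank you.*http://chris.beams.io/posts/git-commit/\r\n-->",
    re.DOTALL,
)
LAZY = re.compile(
    r"<!--\r\nThank you.*?http://chris.beams.io/posts/git-commit/\r\n-->",
    re.DOTALL,
)

# Greedy: one match spans from the first "<!--" to the last "-->",
# so the discussion line between the two comments is lost.
greedy_result = GREEDY.sub("", text)

# Lazy: each comment is matched and removed separately.
lazy_result = LAZY.sub("", text)

print(repr(greedy_result))
print(repr(lazy_result))
```

With these assumptions, the greedy version drops `More discussion here.` while the lazy version keeps all three discussion lines and removes only the two comments.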