Regex update to avoid over-redaction of GitHub issues (#325)
Hello - this PR adjusts the regexes found in
`airflow/include/tasks/extract/github.py` that remove boilerplate text
from GitHub issue thread text. Before this change, the regular
expressions removed too much text because of greedy matching.

Consider the following example:

```
Discussion here.
<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->
More discussion here.
<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->
Even more discussion here
```

The two comment blocks should be removed, but the greedy match in the
original regular expression
`<!--\r\nThank you.*http://chris.beams.io/posts/git-commit/\r\n-->`
extends to the last occurrence of the URL rather than the first, so the
line `More discussion here.` is removed as well.

To fix that behavior, this PR replaces each greedy `.*` sequence with a
lazy `.*?` sequence so that the minimum (intended) match is removed.
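
The difference can be reproduced with a minimal sketch like the one
below. It uses the example text and the "Thank you" pattern from this
PR. How `github.py` actually applies the patterns is not shown here, so
the use of `re.sub` with `re.DOTALL` is an assumption; without a flag
that lets `.` cross line breaks, the greedy pattern would not span the
two comments in this sample.

```python
import re

# Example thread text from the PR description, with real CR/LF characters.
text = (
    "Discussion here.\n"
    "<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->\n"
    "More discussion here.\n"
    "<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->\n"
    "Even more discussion here\n"
)

# Pattern before the change (greedy) and after the change (lazy).
greedy = "<!--\r\nThank you.*http://chris.beams.io/posts/git-commit/\r\n-->"
lazy = "<!--\r\nThank you.*?http://chris.beams.io/posts/git-commit/\r\n-->"

# Assumption: the patterns are applied with re.DOTALL so "." matches newlines.
greedy_result = re.sub(greedy, "", text, flags=re.DOTALL)
lazy_result = re.sub(lazy, "", text, flags=re.DOTALL)

# Greedy: one match from the first "<!--" to the last "-->",
# so the discussion line between the two comments is lost.
print(repr(greedy_result))

# Lazy: each comment is matched separately and the discussion line survives.
print(repr(lazy_result))
```

With the lazy pattern, both comments are stripped and all three
discussion lines remain; with the greedy pattern, only the first and
last discussion lines remain.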
bismuthsalamander authored May 29, 2024
1 parent e8ee422 commit 5a4fed4
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions airflow/include/tasks/extract/github.py
@@ -214,12 +214,12 @@ def extract_github_issues(repo_base: str, github_conn_id: str, cutoff_date: str
  issues_drop_text = [
  dedent(
  """ <\\!--\r
- .*Licensed to the Apache Software Foundation \\(ASF\\) under one.*under the License\\.\r
+ .*?Licensed to the Apache Software Foundation \\(ASF\\) under one.*?under the License\\.\r
  -->"""
  ),
  "<!-- Please keep an empty line above the dashes. -->",
- "<!--\r\nThank you.*http://chris.beams.io/posts/git-commit/\r\n-->",
- r"\*\*\^ Add meaningful description above.*newsfragments\)\.",
+ "<!--\r\nThank you.*?http://chris.beams.io/posts/git-commit/\r\n-->",
+ r"\*\*\^ Add meaningful description above.*?newsfragments\)\.",
  ]

  issue_markdown_template = dedent(