Checks for problems in Accumulo #4957
base: 3.1
This commit:
- Moves existing checks (`checkTablets` and the fate check for dangling locks) into the appropriate new `admin check` command
- Adds new checks
- New tests in AdminCheckIT
- SYSTEM_CONFIG now checks for
  - valid locked table/namespace ids (the locked table/namespaces exist)
  - locked table/namespaces are associated with a fate op
- ROOT_METADATA now checks for
  - offline tablets
  - missing "columns"
  - invalid "columns"
- ROOT_TABLE now checks for
  - offline tablets
  - tablets for metadata table have no holes, valid (null) prev end row for first tablet, and valid (null) end row for last tablet
  - missing columns
  - invalid columns
- METADATA_TABLE now checks for
  - offline tablets
  - tablets for user tables (and scanref) have no holes, valid (null) prev end row for first tablet, and valid (null) end row for last tablet
  - missing columns
  - invalid columns
- SYSTEM_FILES now checks for
  - missing system files
- USER_FILES now checks for
  - missing user files

Part of apache#4892
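For readers unfamiliar with the "no holes" condition used by the ROOT_TABLE and METADATA_TABLE checks, here is a minimal sketch of the idea. It is not the code from this PR: `TabletRangeCheckSketch`, `TabletRange`, and `findProblems` are hypothetical names, rows are modeled as plain strings, and the tablets of a single table are assumed to already be sorted by end row.

```java
import java.util.ArrayList;
import java.util.List;

public class TabletRangeCheckSketch {

  /** Hypothetical stand-in for one tablet's row range; null means unbounded. */
  public static final class TabletRange {
    final String prevEndRow;
    final String endRow;

    public TabletRange(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }
  }

  /**
   * Returns a description of each problem found in one table's tablets.
   * Assumes the list is sorted by end row, with the last tablet at the end.
   */
  public static List<String> findProblems(List<TabletRange> tablets) {
    List<String> problems = new ArrayList<>();
    if (tablets.isEmpty()) {
      problems.add("no tablets found");
      return problems;
    }
    TabletRange first = tablets.get(0);
    TabletRange last = tablets.get(tablets.size() - 1);
    // The first tablet must start at -inf, i.e. have a null prev end row.
    if (first.prevEndRow != null) {
      problems.add("first tablet has non-null prev end row: " + first.prevEndRow);
    }
    // The last tablet must end at +inf, i.e. have a null end row.
    if (last.endRow != null) {
      problems.add("last tablet has non-null end row: " + last.endRow);
    }
    // Each tablet's prev end row must equal the previous tablet's end row.
    for (int i = 1; i < tablets.size(); i++) {
      String expected = tablets.get(i - 1).endRow;
      String actual = tablets.get(i).prevEndRow;
      if (expected == null) {
        problems.add("tablet " + (i - 1) + " has a null end row but is not the last tablet");
      } else if (!expected.equals(actual)) {
        problems.add("hole: tablet " + i + " has prev end row " + actual
            + " but expected " + expected);
      }
    }
    return problems;
  }
}
```

An empty result means the tablets cover the whole row space with no gaps. The sketch returns every problem rather than stopping at the first, on the assumption that a check report is more useful when it lists everything that is wrong.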
@kevinrr888 these changes look good. I was running the changes locally and noticed that the `accumulo check-server-config` command exists at the top level.
I was wondering whether that should be rolled into this check command; if it should, it could be added to the checklist on #4892.
...er/base/src/main/java/org/apache/accumulo/server/util/checkCommand/UserFilesCheckRunner.java
server/base/src/main/java/org/apache/accumulo/server/util/checkCommand/MetadataCheckRunner.java
server/base/src/main/java/org/apache/accumulo/server/util/checkCommand/MetadataCheckRunner.java
...base/src/main/java/org/apache/accumulo/server/util/checkCommand/SystemConfigCheckRunner.java
- `System.out` -> `log.trace/warn` to avoid flooding output with unnecessary/detailed info. The most important info (e.g., output of the `admin check list` command, and the final run status table from the `admin check run` command) is still printed to stdout. Problems found are now logged at warn instead of stdout. Detailed, non-error info is logged at trace.
- Created new check `Check.TABLE_LOCKS`, which ensures that table and namespace locks are valid and are associated with a FATE op.
- New check `assertNoOtherChecksRan()` in `AdminCheckIT`, which ensures only the expected checks ran
- A few misc review changes: `MetadataCheckRunner` code improved to only fetch required columns when scanning, object creation moved outside of a loop
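To illustrate the two conditions `Check.TABLE_LOCKS` is described as enforcing (the locked table/namespace exists, and the lock belongs to a FATE op), here is a rough sketch. It is not the implementation in this PR: `TableLocksCheckSketch`, `findInvalidLocks`, and all three parameters are hypothetical, and the lock, id, and FATE state are assumed to have already been read into plain collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TableLocksCheckSketch {

  /**
   * Hypothetical inputs, all assumptions for this sketch:
   * lockHolders maps a locked table/namespace id to the fate tx ids holding locks on it,
   * existingIds is the set of table/namespace ids that currently exist,
   * activeFateTxIds is the set of fate tx ids that are currently known.
   */
  public static List<String> findInvalidLocks(Map<String, Set<Long>> lockHolders,
      Set<String> existingIds, Set<Long> activeFateTxIds) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, Set<Long>> entry : lockHolders.entrySet()) {
      String lockedId = entry.getKey();
      // Condition 1: the locked table/namespace must exist.
      if (!existingIds.contains(lockedId)) {
        problems.add("lock on nonexistent table/namespace id " + lockedId);
      }
      // Condition 2: every lock holder must be a known fate op.
      for (Long txId : entry.getValue()) {
        if (!activeFateTxIds.contains(txId)) {
          problems.add("lock on " + lockedId + " held by " + txId
              + ", which is not a known fate op");
        }
      }
    }
    return problems;
  }
}
```

The two conditions are reported independently, so a single dangling lock on a deleted table yields two entries, one for each violated condition.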
Thanks, I had missed that. Added to #4892. Also noticed
Changes in 11f1f76:
Let me know how these changes look. One thing still left to do is tests for failing checks: tests should be added to ensure that everything that is supposed to be checked fails when it is expected to. I imagine this will take quite a bit of time, and the changes are already pretty large, so I'm thinking it might be best to leave that for a follow-on.
This PR:
- Moves existing checks (`checkTablets` and the fate check for dangling locks) into the appropriate new `admin check` command

There are still quite a few checks that need to be added (mentioned in #4687), and probably more. This is a first, starting PR for these checks; more checks will be added in follow-ons. Something else still left to do is tests for these checks for FAILING cases.

Part of #4892