-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify stop-here behavior to use accumulo-service #5094
base: 2.1
Are you sure you want to change the base?
Conversation
Fixes a bug in stop-here where the process would shut down other parts of the cluster. Modifies the behavior to only stop local processes.
@@ -573,49 +573,32 @@ function stop_all() { | |||
} | |||
|
|||
function stop_here() { | |||
# Determine hostname without errors to user | |||
hosts_to_check=("$(hostname -a 2>/dev/null | head -1)" "$(hostname -f)") | |||
|
|||
if echo "${TSERVER_HOSTS}" | grep -Eq 'localhost|127[.]0[.]0[.]1'; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the issue is really with this check. start_here
gets the local addresses using:
local_hosts="$(hostname -a 2>/dev/null) $(hostname) localhost 127.0.0.1 $(hostname -I)"
then iterates over ${TSERVER_HOSTS}
, and if there is a match, then it starts the service. I'm thinking we should be doing the same thing here instead of what we are currently doing. I'm concerned about calling accumulo-service
here vs calling it in control_service
because as this change gets merged up to main
I don't think it will work the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to "local_hosts", the logic in control_service
attempts to do another validation to figure out if it should talk locally or create an SSH connection.
if [[ $host == localhost || $host == "$(hostname -s)" || $host == "$(hostname -f)" || "$(hostname -I)" =~ $host ]]; then
// do local connection
else
// SSH to host
Having two different host validation methods doesn't seem ideal and since stop-here
should be locally scoped it seemed a better form of control to always run the local accumulo-service
command vs possibly ssh to the target host.
If the concern is that we are calling accumulo-service
two separate ways, I could change the structure of control_service to accept a --local
flag which would never allow the SSH option to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike start-here
, there are only two reasons why this method needs knowledge of its own hostname.
The first is for issuing an admin stop
command to try and gracefully stop the tserver via a FATE operation.
The second is for control_service
to figure out if it needs to run a local command or not.
However I'm not sure of the value of running that command as both this commit and the previous code did not wait for the stop command to complete successfully before starting to issue direct stop
and kill
commands to accumulo-service
.
If we bypass the fate step, then there's no reason to bother with looking at the hostname for stop-here
as accumulo-service lists all of the running processes by looking at pid files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change the structure of control_service to accept a --local flag which would never allow the SSH option to run
This makes sense to me. Obviously if the user is executing a command for the local host, we should not be ssh'ing.
[[ -n ${!var_name} ]] && NUM_SSERVERS=${!var_name} | ||
G="SSERVER_HOSTS_${group}" | ||
for sserver in ${!G}; do | ||
end_service $end_cmd "$sserver" sserver "-g" "$group" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the bugs that is fixed is here where $sserver
is being used instead of $host
.
This causes all scan servers to be terminated across the cluster.
Fixes a bug in stop-here where accumulo-cluster would shut down other parts of the cluster.
Modifies the behavior to only stop local processes.