New Cluster Nodes Unable to Contact Webseed in TLS-Enabled Cluster Setup #964
Comments
At a guess, the webseed URL doesn't conform to the BEP for a multi-file torrent. Make sure it's a single-file torrent if you're going to specify a URL to a single file. You could also put a panic where it closes the webseed to find out why it is being closed. |
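To illustrate the single-file versus multi-file distinction mentioned above, here is a minimal sketch of how a client might derive the request URL from a webseed entry. It follows a simplified reading of BEP 19 and is not the library's actual code; `webseedFileURL`, its parameters, and the example host are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// webseedFileURL builds the URL requested for one file, under a simplified
// reading of BEP 19 (an assumption, not anacrolix/torrent's implementation).
// base is the url-list entry, name is the info "name" field, and path holds
// the file's path components (empty for a single-file torrent).
func webseedFileURL(base, name string, path []string) string {
	multiFile := len(path) > 0
	if !strings.HasSuffix(base, "/") {
		if !multiFile {
			// Single file and no trailing slash: the URL is used as-is.
			return base
		}
		base += "/"
	}
	// Otherwise the torrent name (and, for multi-file torrents, the file
	// path) is appended to the base URL.
	segs := append([]string{name}, path...)
	for i, s := range segs {
		segs[i] = url.PathEscape(s)
	}
	return base + strings.Join(segs, "/")
}

func main() {
	// Hypothetical host and file names.
	fmt.Println(webseedFileURL("https://master.example/download/parcel.gz", "parcel.gz", nil))
	fmt.Println(webseedFileURL("https://master.example/download/", "parcel.gz", nil))
	fmt.Println(webseedFileURL("https://master.example/download/", "parcels", []string{"a", "b.gz"}))
}
```

If the metainfo describes a directory while the URL points at one file, the derived URLs would not match what the server serves, which is the kind of mismatch the comment above is guessing at.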
It is a single file of size 14GB. To be more accurate it is "gzip compressed data" of 14GB.
|
Could you provide the metainfo here? I'll get back to you on the close thing tomorrow. |
Here is the metainfo (.torrent)
|
Feel free to email it to me. Specifically I want to check the structure of the internal fields as that affects how webseeding works. |
Thanks! Sending you. One thing though:
Once this timeout occurs, then |
The info checks out (it is a single file, but the URL should also be fine). I need to find out why the webseed peer is being closed. There should only be two ways: It's banned, or the torrent closes. There should be copious logging calling out why, or you can put a panic here: Line 149 in 33e0ed5
|
There is a sort of integration test in a semi-formed state that could help with this once we have a better reason. |
I'm not sure if you're looking at the correct case I mentioned. Apologies for any confusion. Here is the issue more clearly explained (copied from post above):
Case: Anacrolix/Torrent Process Runs on the New Hosts with existing cluster having TLS
The 14 GB file distribution gets stuck on these new nodes because none of the new peers can contact the web seed (master node) present in the existing cluster. In the web seed section of full-status, it is empty:
I believe you might be looking at the wrong case. The CLOSED status shown below indicates that torrenting through Anacrolix completed successfully. I captured this full-status after the torrent process finished.
My main question is why the webseed section remains empty when TLS is enabled. In the logs of the master node (the webseed), I do not see any of these new peers contacting it. |
I don't quite follow. If they're not able to contact the webseed, there should be errors generated telling you why. |
hi @anacrolix,
For the webseed client, I've already configured it to skip server certificate verification during the torrent client setup:
config.WebTransport = &http.Transport{
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
This configuration works when I download the torrent file directly from the master node using a similar API:
url := "https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent"
client := http.DefaultClient
if se.configs.AllowInsecureCerts {
	// Skip server certificate verification when insecure certs are allowed.
	client = &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
}
resp, err := client.Get(url)
This request results in a 200 response, successfully downloading the .torrent file. However, when I provide a similar API/URL while adding the webseed ( Could you please help me understand why the torrent file downloads successfully from the master node, but the webseed client fails to download from the master node when using the similar API and both having the |
There's no reason TLS shouldn't work, I've had it work before with webseeding in production scenarios. I think if there's a bug it's that you're not seeing helpful log messages. I don't have much time to allocate to this at the moment but the webseed code isn't lengthy and some tracing through to find where things are going wrong might be worthwhile. |
I'm not sure WebTransport is the correct config item. Unfortunately there are quite a few of them, due to slight variations in how HTTP is consumed in BitTorrent that I haven't been able to merge. However, as above, you should be seeing a reason for it not working, so fixing only the config isn't productive, for the project at least. |
I am also using a TLS config through WebTransport. It is able to connect and send requests, but after some time I see the error below and get the following status. Status:
Error :
|
Okay, as above being banned would make sense. Is it possible your http server does not implement range requests or is serving incorrect or incomplete data? |
Yes, we have added response.addHeader("Accept-Ranges", "bytes"); One more thing I have observed: when we add the webseed peer and call download right away, it starts downloading. If we wait 2-3 minutes before adding the webseed, the download does not start. I put a torrent.AddWebSeedsOpt in AddWebSeeds to trace this, and I see the torrent is not sending requests to the server. |
Great. It's very likely missing a "tickle" for webseed peers if reader priorities have already been set. I should be able to statically verify that. |
I am also seeing error |
Hi @anacrolix, which configuration should be set to true to enable "Local Service Discovery"? I want to ensure cross-rack communication is possible. |
I've not implemented this yet. #248. |
Thanks for the update, @anacrolix. I have another question based on this. In my public cloud environment, nodes are spread across different AZs/racks within a VPC network. The security group for this VPC network allows "All traffic" (all protocols, all ports) from all sources (0.0.0.0/0), which should mean that the torrent port 7191 (in my case) is open for communication across racks in the VPC network. However, when I attempted to start a connection between two nodes located in different racks, the connection was reset or closed every time.
The code snippet above shows that when trying to connect to the destination IP via the torrent port, the connection gets reset. Could this be because there is no LSD (Local Service Discovery) implementation within the library, which uses multicast advertisements to enable nodes to discover peers that may be able to help them with their downloads? |
I've pushed fixes to master that should improve webseed performance, and fix the stall that occurs if you add webseeds after adding the torrent (and some delay). |
I've checked this, you are setting it in the correct place. |
This may be due to automatic blocking of internal IPs in the client. It won't be anything to do with the lack of LSD. |
Can you try running
Maybe take a look at Lines 220 to 228 in f471182
|
Is there any update on this? |
Overview
Adding new hosts to a cluster with TLS enabled fails. As a prerequisite, each new node must receive a 14 GB file, which is distributed by the BitTorrent client running on these hosts. With TLS enabled, this torrent process gets stuck indefinitely.
Architecture
Cluster Architecture
Within our cluster, we have a master node and worker nodes that report the cluster's state to the master. The master generates the .torrent file, which is trackerless. The master acts somewhat like a tracker, providing each peer with information about other peers to communicate with during torrenting.
Torrent Architecture
Torrent Process During Fresh Cluster Install
This is the process followed during a fresh cluster setup:
Torrent Process During New Host Addition in Existing Cluster
This is the general flow of how new hosts are added in an existing cluster:
Scenarios with New Host(s) Addition
Without TLS Enabled on the Existing Cluster
With TLS Enabled in the Existing Cluster
Case #1: Libtorrent Client Process Runs on the New Hosts
The 14 GB file gets distributed within a few minutes.
Case #2: Anacrolix/Torrent Process Runs on the New Hosts
The 14 GB file distribution gets stuck on these new nodes because none of the new peers can contact the web seed (master node) present in the existing cluster. In the web seed section of full-status, it is empty:
Hi @anacrolix, Can you please provide pointers on why this API:
http://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>
is not reachable from the peers when it is used as the web seed?