ESPNow recovery from "invalid buffer" #9816

Open
stanelie opened this issue Nov 18, 2024 · 6 comments
Labels: bug, network, espressif (applies to multiple Espressif chips)

@stanelie

CircuitPython version

Adafruit CircuitPython 9.2.0 on 2024-10-28; Waveshare ESP32-S3-Zero with ESP32S3

Code/REPL

# e is presumably an espnow.ESPNow() instance created elsewhere in the program.
def read_packet():
    global packet
    try:
        packet = e.read()
    except ValueError as error:
        print(f"{error}")

Behavior

Hello.
This is more a support question than a bug report. I cannot find anywhere in the documentation how to purge the ESPNow buffer, or how to destroy the ESPNow object so that I can recreate it (to recover from errors).

I run this code on an ESP32 that is receiving packets from 2 different senders, and I guess the packets interfere when they arrive at the same time? When that happens, I get an "Invalid buffer" error and I cannot recover from it. I can catch the error so the code keeps going, but the buffer keeps saying it is invalid.


@stanelie stanelie added the bug label Nov 18, 2024
@anecdata
Member

anecdata commented Nov 18, 2024

@stanelie you can try deinit (or, as a workaround, reload or reset if that doesn't work):
https://docs.circuitpython.org/en/latest/shared-bindings/espnow/index.html#espnow.ESPNow.deinit
(I saw this in some early testing, and opted to reload on "Invalid buffer".)

@stanelie
Author

deinit works perfectly for my purpose. Thanks @anecdata !

@jepler
Member

jepler commented Nov 19, 2024

Inspecting the C source, I do believe there's an implementation error.

 68 // Callback triggered when an ESP-NOW packet is received.
 69 // Write the peer MAC address and the message into the recv_buffer as an ESPNow packet.
 70 // If the buffer is full, drop the message and increment the dropped count.
 71 static void recv_cb(const esp_now_recv_info_t *esp_now_info, const uint8_t *msg, int msg_len) {
 72     espnow_obj_t *self = MP_STATE_PORT(espnow_singleton);
 73     ringbuf_t *buf = self->recv_buffer;
 74 
 75     if (sizeof(espnow_packet_t) + msg_len > ringbuf_num_empty(buf)) {
 76         self->read_failure++;
 77         return;
 78     }
 79 
 80     espnow_header_t header;
 81     header.magic = ESPNOW_MAGIC;
 82     header.msg_len = msg_len;
 83     header.rssi = esp_now_info->rx_ctrl->rssi;
 84     header.time_ms = mp_hal_ticks_ms();
 85 
 86     ringbuf_put_n(buf, (uint8_t *)&header, sizeof(header));
 87     ringbuf_put_n(buf, esp_now_info->src_addr, ESP_NOW_ETH_ALEN);
 88     ringbuf_put_n(buf, msg, msg_len);
 89 
 90     self->read_success++;
 91 }

At line 75, the code intends to check that all the ringbuf_put_n calls will succeed, by checking the available space against a calculated value. However, this calculation neglects that the source address of length ESP_NOW_ETH_ALEN is also put into the ring buffer.

Consequently, the subsequent calls to ringbuf_get_n within common_hal_espnow_read can fail.

The fix may be as simple as adding ESP_NOW_ETH_ALEN to the calculation in line 75.
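
A minimal sketch of that one-line change, replacing lines 75-78 of the quoted recv_cb and reusing its names (this assumes sizeof(espnow_packet_t) does not already account for the source address):

// Hypothetical corrected check: reserve space for everything the three
// ringbuf_put_n calls below will actually write (header + source MAC + payload).
if (sizeof(espnow_packet_t) + ESP_NOW_ETH_ALEN + msg_len > ringbuf_num_empty(buf)) {
    self->read_failure++;
    return;
}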

Another potential problem source—and I don't know if this can occur in practice—is if common_hal_espnow_read and recv_cb can execute at the same time, for instance because they are on different cores. Suppose that recv_cb has done just 1 of the put_n calls at the same time as common_hal_espnow_read tries to perform the second get_n call. In this case, the call could fail. The docs state "The receiving callback function also runs from the Wi-Fi task" so I believe this means it will run on the PRO core while CircuitPython is running on the APP core, which means this kind of interleaved behavior is possible.

Actually, now that I think about it, ringbuf is a CircuitPython API and it is not multithread / multicore safe, so it may not be appropriate to use here at all. For instance, if get and put are happening from multiple cores at the same time, the used field (which is not volatile and is simply incremented and decremented with ++ and --) could get garbled. Using FreeRTOS ring buffers might be more advisable: https://docs.espressif.com/projects/esp-idf/en/v5.3/esp32/api-reference/system/freertos_additions.html#ring-buffers
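
Purely as an illustration (not a patch): the sketch below shows the shape of a receive path built on the IDF FreeRTOS ring buffer API, which serializes producers and consumers internally. Everything except the xRingbuffer*/vRingbufferReturnItem calls is invented here, and the header/RSSI/timestamp bookkeeping from recv_cb is omitted for brevity.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/ringbuf.h"

#define MAC_LEN 6          // ESP_NOW_ETH_ALEN
#define MAX_PAYLOAD 250    // ESP-NOW maximum payload length

// Hypothetical replacement for self->recv_buffer.
static RingbufHandle_t recv_ringbuf;

void example_init(size_t buffer_size) {
    // NOSPLIT keeps every packet contiguous: one send = one receivable item.
    recv_ringbuf = xRingbufferCreate(buffer_size, RINGBUF_TYPE_NOSPLIT);
}

// Producer side (Wi-Fi task, i.e. where recv_cb runs today).
bool example_enqueue(const uint8_t *src_addr, const uint8_t *msg, int msg_len) {
    uint8_t item[MAC_LEN + MAX_PAYLOAD];
    memcpy(item, src_addr, MAC_LEN);
    memcpy(item + MAC_LEN, msg, msg_len);
    // Fails cleanly (returns pdFALSE) when the buffer is full, instead of
    // leaving a partially written packet behind.
    return xRingbufferSend(recv_ringbuf, item, MAC_LEN + msg_len, 0) == pdTRUE;
}

// Consumer side (the CircuitPython VM, i.e. common_hal_espnow_read).
bool example_dequeue(uint8_t *mac_out, uint8_t *msg_out, size_t *msg_len_out) {
    size_t item_size;
    uint8_t *item = xRingbufferReceive(recv_ringbuf, &item_size, 0);
    if (item == NULL) {
        return false;                  // nothing pending
    }
    memcpy(mac_out, item, MAC_LEN);
    *msg_len_out = item_size - MAC_LEN;
    memcpy(msg_out, item + MAC_LEN, *msg_len_out);
    vRingbufferReturnItem(recv_ringbuf, item);
    return true;
}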

@jepler
Member

jepler commented Nov 19, 2024

However, I'm glad you were able to work around the problem with the use of deinit.

@dhalbert
Collaborator

Actually, now that I think about it, ringbuf is a CircuitPython API and it is not multithread / multicore safe, so it may not be appropriate to use here at all. For instance, if get and put are happening from multiple cores at the same time, the used field (which is not volatile and is simply incremented and decremented with ++ and --) could get garbled.

I think the original, original ringbuf may have been atomically safe, but once we went to two-byte entries, I think that went out the window. That's why I made the implementation more straightforward; there were many latent bugs having to do with the requested size vs needed size of the buffer (one entry more). My current expectation is that ringbuf operations will be done in a critical section.
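
For concreteness, here is a sketch of what "ringbuf operations in a critical section" could look like on a dual-core ESP32, where a spinlock-based critical section also excludes the other core rather than just disabling interrupts locally. The mux and the wrapper function are hypothetical; the other names come from the recv_cb quoted above, and their headers are assumed to be included elsewhere.

#include "freertos/FreeRTOS.h"

// Hypothetical spinlock shared by recv_cb and common_hal_espnow_read.
// On ESP-IDF, portENTER_CRITICAL takes a spinlock, so it is multicore safe.
static portMUX_TYPE espnow_ringbuf_mux = portMUX_INITIALIZER_UNLOCKED;

static void put_packet_locked(ringbuf_t *buf, const espnow_header_t *header,
                              const uint8_t *src_addr, const uint8_t *msg, int msg_len) {
    portENTER_CRITICAL(&espnow_ringbuf_mux);
    // With both sides taking the same lock around their put/get sequences,
    // the reader can never observe a header without the MAC and payload
    // that follow it, and the used counter is never updated concurrently.
    ringbuf_put_n(buf, (const uint8_t *)header, sizeof(*header));
    ringbuf_put_n(buf, src_addr, ESP_NOW_ETH_ALEN);
    ringbuf_put_n(buf, msg, msg_len);
    portEXIT_CRITICAL(&espnow_ringbuf_mux);
}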

@stanelie
Author

This high-level discussion is way above my level, but if you do implement a fix, I will be ready to check if it works. :)

@dhalbert dhalbert added this to the 9.x.x milestone Nov 25, 2024
@dhalbert dhalbert added the network and espressif (applies to multiple Espressif chips) labels Nov 25, 2024