steamcompmgr, wlserver: fix various data races found w/ thread sanitizer #1405

sharkautarch · 2024-07-04T22:15:02Z

For testing w/ TSAN, I added on the meson flags -Db_sanitize=thread --buildtype=debugoptimized when compiling gamescope

I've only done tests w/ running gamescope in nested mode on my X11 host desktop, so this PR might need testing on a wayland host w/ the gamescope wayland backend.

I used the following command to test the latest commit from master w/ TSAN:
TSAN_OPTIONS="suppressions=${HOME}/gamescope/suppress_mangoapp" build/src/gamescope -- vkcube |& tee gamescope_tsan.log
note that I had to suppress mangoapp_update() (the suppress_mangoapp files is only one line: race:mangoapp_update) to avoid the log getting filled with messages about it

I observed at least three unique data race messages, occurring after gamescope + vkcube were actively running (so after initialization), that would reliably get triggered whenever I did the following in order:

Cause gamescope's window(s) to go out-of-focus
wait
bring the window back into focus
move the cursor around a bit

SUMMARY: ThreadSanitizer: data race ../src/wlserver.cpp:337 in bump_input_counter

SUMMARY: ThreadSanitizer: data race ../src/wlserver.cpp:2398 in wlserver_mousebutton(int, bool, unsigned int)

SUMMARY: ThreadSanitizer: data race ../src/wlserver.cpp:2379 in wlserver_mousewarp(double, double, unsigned int, bool)

Full log: https://gist.github.com/sharkautarch/4a8cf292d8b70d133904708ac29465ed

While there were a lot of TSAN warnings which could be false positives and/or non-issues,
the data races addressed by this PR seemed like a bigger deal

EDIT:
~~I just realized that if (bPrevState & suspended) should be if (!bPrevState & suspended) OOPSIES~~
~~Will fix that soon~~
^FIXED

Also, some explanations for some of the weird things in this PR:

I had to make bCursorHidden be std::atomic<char> in order to be able to use atomic fetch_or. This should be safe because I made sure that bCursorHidden is always initialized and assigned bool values which means that the variable will always be either 1 or 0
I use atomic_ref w/ inputCounter instead of a normal atomic because inputCounter is passed into an Xlib function. That means that the Xlib function is doing a non-atomic write to inputCounter, but TSAN doesn’t seem to complain, so either it’s a false negative or the threads just never actually race at that spot… I’ve been using gcc’s thread sanitizer, so maybe this might warrant checking w/ clang’s thread sanitizer…!
This PR uses a lot of acquire/release memory ordering. This is just to minimize any performance impact, since this PR adds a significant amount of atomic ops. I’m pretty sure that only two threads could simultaneously access memory (at least in the areas that this PR deals with) so, from what I understand, using acquire/release should be fine

EDIT: for the second bullet point, now that I think about it, Xlib is a shared library, which means that the only way for TSAN to catch races from Xlib function calls would be to rebuild Xlib w/ TSAN. So I think I’ll need to edit this PR again
~~^FIXED~~
^Just realized that XChangeProperty doesn't modify its input data (or at least I don't think it does). I updated the PR again

last update:
to deal with the one remaining edge case I saw, I changed all instances of
~~wlserver.bCursorHidden.store(!wlserver.bCursorHasImage, std::memory_order_release);~~
to:
~~std::atomic<char> atomicTrue{true};~~
~~atomicTrue.compare_exchange_strong(wlserver.bCursorHidden, wlserver.bCursorHasImage);~~

~~which makes the line a single CAS RMW operation~~

~~Not sure if the change is actually necessary,~~
~~and the compare-and-exchange is definitely slower than a load+store.~~
~~So let me know whether you think I should undo that...~~
~~(edit: for now I've decided to change compare_exchange_strong to compare_exchange_weak, tho this only makes a difference w/ non-x86 architectures)~~
^ I undid the CAS thing because I realized it didn't work the way I was thinking it would

misyltoad · 2024-07-06T03:43:23Z

Please don't do this alignas(std::atomic_ref<bool>::required_alignment)

Just make them atomics.

misyltoad · 2024-07-06T03:44:18Z

You also don't really need to do explicit acquire/release here. We can optimize those later. Just keeo them default (seq_cst)

sharkautarch · 2024-07-06T13:58:23Z

@Joshua-Ashton

Please don't do this alignas(std::atomic_ref::required_alignment)

ok

You also don't really need to do explicit acquire/release here
I don't think we need seq_cst, all seq_cst adds compared to acquire/release is synchronization between different variables, whereas acquire/release already does synchronization between memory read/writes to the same variable.

If you're sure, then I'll change everything to seq_cst

but on the otherhand:
Just doing some quick calculations, if someone was using a 1000hz polling rate mouse, assuming that translates into wlserver_mousemotion() running 1000 times per second (1hz = 1 cycle/second), using seq_cst atomic writes would mean adding 46000 cpu cycles/second to wlserver_mousemotion() on x86 on an intel coffeelake cpu

seq_cst atomic writes to memory on x86 compile down to a mov+xchg (the latter does synchronization) (memory order doesn't make any difference for atomic reads on x86 [as far as the cpu is concerned, that is])
looking up the latency of xchg r,m on https://www.agner.org/optimize/instruction_tables.pdf for intel coffeelake, shows that it has a latency of 23 cycles
there are two atomic stores in wlserver_mousemotion()
2 x 23 x 1000=46000
though now that I think about it, the lowest clock rates of all the coffeelake cpus is 1.7GHz, so: ( (46000Hz)/(1.7GHz) ) * 1second =~ 0.0273ms ... Ok I think I may have just defeated my own point lol
RIP anyone who's still using a pentium cpu: xchg r,m is either >100 cpu cycles (pentium 4) or just high, whatever that means...

on the steam deck's zen 2-based-cpu xchg r,m has a latency of about 18 cycles, tho I guess it's unlikely that someone would use a 1000hz input device on the steam deck

sharkautarch · 2024-07-06T14:21:49Z

@Joshua-Ashton
Ok, I have changed the bCursorHasImage stuff to normal std::atomic stuff
and I changed the acquire/release stuff to seq_cst

misyltoad · 2024-07-06T16:59:16Z

You don't need to specify it... you can just use =

You can also simplify a lot of your code with std::exchange

sharkautarch · 2024-07-06T19:31:38Z

@Joshua-Ashton

You don't need to specify it... you can just use =

ok

You can also simplify a lot of your code with std::exchange

Uhhhhh I'm not sure if you meant to say std::atomic<T>::exchange instead,
because I don't think std::exchange works with atomics:
https://godbolt.org/z/YaMfzvnzd

if you actually meant to say std::atomic<T>::exchange, then I don't see any place where that'd be useful,
because for the places where there is an atomic load + atomic store, ~~I don't want to modify the variable that I do .load() from~~

I had tried to replace some atomic load+atomic stores with one compare and exchange to get around that issue, but wasn't able to use it without getting into the same issue as w/ atomic exchange.

EDIT: I just realized that std::atomic<T>::exchange only modifies the atomic itself -- not whatever's passed into it.
and it also returns the old value of the atomic.
So I was able to replace const bool bCursorWasHidden = wlserver.bCursorHidden.fetch_or(true); with const bool bCursorWasHidden = wlserver.bCursorHidden.exchange(true);

However, c++ std::atomic exchange's function prototype is: T std::atomic<T>::exchange(T desired)
so attempting to replace, say, wlserver.bCursorHidden.store(wlserver.bCursorHasImage.load()) with wlserver.bCursorHidden.exchange(wlserver.bCursorHasImage) I think would be a compile error

sharkautarch · 2024-07-08T13:54:33Z

@Joshua-Ashton
I’ve edited this PR based on your last feedback
Anything else that I need to change?

misyltoad · 2024-07-14T21:13:29Z

src/wlserver.hpp

-	bool bCursorHidden = true;
-	bool bCursorHasImage = true;
+	std::atomic<uint64_t> ulLastMovedCursorTime = 0;
+	std::atomic<char> bCursorHidden = true;


Why char?

For some reason, you can't do fetch_or on a boolean atomic...
Which in the one case where I use it, I can't cleanly replace it with something like an atomic exchange.
only integral atomics can do stuff like fetch_or
https://en.cppreference.com/w/cpp/atomic/atomic#Specializations_for_integral_types
(note that technically on x86_64, w/ how I grab the old value of the atomic when using fetch_or, fetch_or actually gets compiled into an atomic compare exchange (which is still lock-free) because x86_64 lock or instruction does not return the old value of the atomic. Funnily enough, aarch64 and riscv [w/ the right atomic isa support] have their own version of lock or that actually returns the old value of the atomic.)

So I use char, since a bool, when stored in a struct/class (in most implementations), actually occupies the same amount of memory as a char.
Would you prefer I use uint8_t instead of char?

I believe most modern cpus should support lock-free atomics on 8 bit values, so I think it should be fine?

I know that basically all x86_64 cpus support it,
plus the same aarch64 isa extension that provided support for stuff like atomic increment also gave support for operations on a wider variety of atomic sizes, including 8bit:
https://devblogs.microsoft.com/oldnewthing/20220811-00/?p=106963

sharkautarch · 2024-09-01T14:22:01Z

@Joshua-Ashton
Update:
I changed the one remaining fetch_or to a ternary of suspended ? exchange(true) : load
So now bCursorHidden is std::atomic<bool>

sharkautarch force-pushed the wlserver_fix_data_races branch 13 times, most recently from c0251ce to a542451 Compare July 5, 2024 21:04

sharkautarch force-pushed the wlserver_fix_data_races branch from a542451 to d3625ed Compare July 6, 2024 14:19

sharkautarch force-pushed the wlserver_fix_data_races branch 8 times, most recently from ddc0f15 to 69a4a18 Compare July 7, 2024 19:30

misyltoad reviewed Jul 14, 2024

View reviewed changes

sharkautarch requested a review from misyltoad July 19, 2024 17:12

sharkautarch force-pushed the wlserver_fix_data_races branch from 54f8202 to 6ed69b0 Compare September 1, 2024 14:19

steamcompmgr, wlserver: fix various data races found w/ thread sanitizer

387113d

sharkautarch force-pushed the wlserver_fix_data_races branch from 6ed69b0 to 387113d Compare September 1, 2024 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

steamcompmgr, wlserver: fix various data races found w/ thread sanitizer #1405

steamcompmgr, wlserver: fix various data races found w/ thread sanitizer #1405

sharkautarch commented Jul 4, 2024 •

edited

Loading

misyltoad commented Jul 6, 2024

misyltoad commented Jul 6, 2024

sharkautarch commented Jul 6, 2024 •

edited

Loading

sharkautarch commented Jul 6, 2024

misyltoad commented Jul 6, 2024

sharkautarch commented Jul 6, 2024 •

edited

Loading

sharkautarch commented Jul 8, 2024

misyltoad Jul 14, 2024

sharkautarch Jul 14, 2024 •

edited

Loading

sharkautarch commented Sep 1, 2024

steamcompmgr, wlserver: fix various data races found w/ thread sanitizer #1405

Are you sure you want to change the base?

steamcompmgr, wlserver: fix various data races found w/ thread sanitizer #1405

Conversation

sharkautarch commented Jul 4, 2024 • edited Loading

misyltoad commented Jul 6, 2024

misyltoad commented Jul 6, 2024

sharkautarch commented Jul 6, 2024 • edited Loading

sharkautarch commented Jul 6, 2024

misyltoad commented Jul 6, 2024

sharkautarch commented Jul 6, 2024 • edited Loading

sharkautarch commented Jul 8, 2024

misyltoad Jul 14, 2024

Choose a reason for hiding this comment

sharkautarch Jul 14, 2024 • edited Loading

Choose a reason for hiding this comment

sharkautarch commented Sep 1, 2024

sharkautarch commented Jul 4, 2024 •

edited

Loading

sharkautarch commented Jul 6, 2024 •

edited

Loading

sharkautarch commented Jul 6, 2024 •

edited

Loading

sharkautarch Jul 14, 2024 •

edited

Loading