DB cannot be opened because the MANIFEST is written before the SSTs #13161

Open
NeverMore2744 opened this issue Nov 26, 2024 · 2 comments

Comments

@NeverMore2744

I run RocksDB 9.7.4 on BlueFS backed by remote cloud disks. After I disconnect the disks and reconnect them, I can no longer open the DB because an SST file's size is wrong:

Corruption: file is too short (3582 bytes) to be an sstableblue/000105.sst

I then opened the MANIFEST file and found that it already records the SST as 3582 bytes, while the file's actual size is 0 bytes. It seems that RocksDB writes the MANIFEST file BEFORE it writes the SST file.

This never happened during the several months I previously used RocksDB 6.11.4.
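The manual comparison described above (MANIFEST-recorded size versus actual file size) can be repeated across every table file with a short script. This is a hypothetical helper, not a RocksDB tool: it assumes the DB files are reachable through an ordinary POSIX path (for BlueFS that means exporting them first) and that the MANIFEST-recorded sizes are already in hand, for example from RocksDB's `ldb manifest_dump` output. Parsing that output is left out because its exact format is not shown here.

```python
import os
from typing import Dict, List, Optional, Tuple


def find_size_mismatches(
    db_dir: str, recorded_sizes: Dict[str, int]
) -> List[Tuple[str, int, Optional[int]]]:
    """Compare SST sizes recorded in the MANIFEST with the files on disk.

    Hypothetical helper: `recorded_sizes` maps file names (e.g. "000105.sst")
    to the size the MANIFEST records for them; obtaining it is out of scope.
    Returns (name, recorded, actual) for every file whose size differs;
    `actual` is None when the file is missing entirely.
    """
    mismatches = []
    for name, recorded in sorted(recorded_sizes.items()):
        path = os.path.join(db_dir, name)
        actual = os.path.getsize(path) if os.path.exists(path) else None
        if actual != recorded:
            mismatches.append((name, recorded, actual))
    return mismatches
```

With an empty `000105.sst` on disk, `find_size_mismatches(db_dir, {"000105.sst": 3582})` reports `[("000105.sst", 3582, 0)]`, which is the situation described in this issue.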

@cbi42
Member

cbi42 commented Nov 27, 2024

> It seems that RocksDB writes the MANIFEST file BEFORE it writes the SST file.

Hi @NeverMore2744, for flush and compaction, RocksDB always writes the SST file before adding it to the MANIFEST. I suspect this is more likely a file system or storage issue.
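The write-then-record ordering described above can be sketched with plain POSIX calls. This is a simplified illustration under stated assumptions, not RocksDB's code: it assumes a POSIX filesystem (RocksDB on BlueFS goes through Ceph's own storage layer instead), and the names `install_table`, `fsync_dir`, `MANIFEST-sketch` and the one-line text record are hypothetical stand-ins for RocksDB's binary VersionEdit records. What the sketch makes visible is that the ordering only helps if `fsync` is honest: if the storage acknowledges a sync before the data is actually persisted (for instance a remote disk that drops in-flight writes on disconnect), the manifest record can still end up pointing at an empty file.

```python
import os


def fsync_dir(path: str) -> None:
    """Persist a directory's entries (e.g. a newly created file) on POSIX systems."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def install_table(db_dir: str, sst_name: str, data: bytes,
                  manifest_name: str = "MANIFEST-sketch") -> None:
    """Write a table file, make it durable, and only then record it.

    Hypothetical sketch: the text record below stands in for a real
    manifest entry; only the ordering of the steps is the point.
    """
    sst_path = os.path.join(db_dir, sst_name)
    # 1. Write the table contents and force them to stable storage.
    with open(sst_path, "wb") as f:
        f.write(data)
        f.flush()              # Python's buffer -> OS page cache
        os.fsync(f.fileno())   # OS page cache -> storage device
    # 2. Persist the directory entry so the file itself survives a crash.
    fsync_dir(db_dir)
    # 3. Only now append the record (including the file size) to the manifest.
    with open(os.path.join(db_dir, manifest_name), "a") as m:
        m.write(f"add {sst_name} {len(data)}\n")
        m.flush()
        os.fsync(m.fileno())
    # 4. In case the manifest file was just created, persist its directory entry too.
    fsync_dir(db_dir)
```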

@NeverMore2744
Author

NeverMore2744 commented Nov 28, 2024

> Hi @NeverMore2744, for flush and compaction, RocksDB always writes the SST file before adding it to the MANIFEST. I suspect this is more likely a file system or storage issue.

Hi @cbi42, thank you for your comment. That makes sense, and I wonder whether it is related to the options used when writing SSTs and the MANIFEST. For example, the DB might write the SST asynchronously (or the write looks synchronous but the storage handles it asynchronously), so the SST write returns immediately and the DB then updates the MANIFEST.

In my RocksDB options, direct I/O should be enabled, i.e., use_direct_io_for_flush_and_compaction=true. Could it be that this does not guarantee synchronous writes, since "direct" is not the same as "synchronous"?
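On Linux, the `open(2)` man page notes that `O_DIRECT` on its own only makes an effort to transfer data synchronously and does not give the guarantees of `O_SYNC`; to guarantee synchronous I/O, a sync flag must be used in addition to `O_DIRECT`. The sketch below illustrates that distinction at the OS level only. It is not how RocksDB or BlueFS performs I/O, and `write_direct_and_sync` and `ALIGN` are hypothetical names. It assumes a 4 KiB alignment satisfies the device's `O_DIRECT` rules, pairs `O_DIRECT` with `O_DSYNC` (data plus the metadata needed to read it back) and a final `fsync`, and falls back to buffered I/O where `O_DIRECT` is unavailable.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumption: 4 KiB satisfies the device's O_DIRECT alignment rules


def _write_once(path: str, buf: mmap.mmap, size: int, direct: bool) -> None:
    """Open, write the first `size` bytes of `buf`, and fsync; raises OSError on failure."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_DSYNC", 0)
    if direct:
        flags |= os.O_DIRECT  # bypass the page cache; says nothing about durability
    fd = os.open(path, flags, 0o644)
    try:
        # With O_DIRECT the length must be a multiple of ALIGN, so write the
        # whole padded buffer and trim the file back to the real size afterwards.
        to_write = len(buf) if direct else size
        with memoryview(buf) as view:
            written = os.write(fd, view[:to_write])
        if written != to_write:
            raise OSError(errno.EIO, "short write")
        os.ftruncate(fd, size)
        # O_DSYNC covers each write(); fsync also persists the truncate and
        # covers platforms that lack O_DSYNC.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_direct_and_sync(path: str, data: bytes) -> bool:
    """Durably write `data`; returns True if O_DIRECT was actually used."""
    padded = max(ALIGN, -(-len(data) // ALIGN) * ALIGN)  # round up to ALIGN
    buf = mmap.mmap(-1, padded)  # anonymous mappings are page-aligned
    try:
        buf.write(data)
        if hasattr(os, "O_DIRECT"):
            try:
                _write_once(path, buf, len(data), direct=True)
                return True
            except OSError as e:
                # EINVAL/EOPNOTSUPP: this filesystem rejects O_DIRECT.
                if e.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                    raise
        _write_once(path, buf, len(data), direct=False)
        return False
    finally:
        buf.close()
```

Dropping the sync flag and the final `fsync` from `_write_once` would leave only "direct" semantics: the page cache is bypassed, but nothing in this sketch would then wait for the device to report the data as stable.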
