
[Question] Appending to existing parquet files in Azure Blob Storage #471

Closed. Answered by aloneguid
kylemottmac asked this question in Q&A

I don't think that's possible: Azure block blobs do not support random-access writes, and appending to an existing Parquet file requires them. In general, you should treat Parquet files as immutable. Most big data engines implement "append" by writing a new file and uploading it to blob storage. This is actually cheaper for the writer and more performant for the reader, because the entire large file does not need to be downloaded every time.
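The "append as a new file" pattern can be sketched roughly as follows. This is a minimal, stdlib-only Python illustration with several assumptions. A local directory stands in for a blob container prefix. CSV stands in for Parquet so the example runs without extra packages. The helper names `append_batch` and `read_dataset` are hypothetical. In a real pipeline, each part would be a Parquet file written by a Parquet library and uploaded as a new blob.

```python
import csv
import tempfile
import uuid
from pathlib import Path

# ASSUMPTION: a local directory models a blob "folder" (prefix), and CSV
# models Parquet purely so this sketch has no third-party dependencies.


def append_batch(dataset_dir: Path, rows: list[dict]) -> Path:
    """'Append' by writing a brand-new part file; existing files are never modified."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    # A random unique name avoids collisions between concurrent writers.
    part = dataset_dir / f"part-{uuid.uuid4().hex}.csv"
    # Mode "x" refuses to overwrite, enforcing write-once (immutable) files.
    with part.open("x", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return part


def read_dataset(dataset_dir: Path) -> list[dict]:
    """Readers treat every part file under the prefix as one logical table.

    Row order across parts is not guaranteed (names are random).
    """
    rows: list[dict] = []
    for part in sorted(dataset_dir.glob("part-*.csv")):
        with part.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        events = Path(tmp) / "events"
        append_batch(events, [{"id": 1, "value": "a"}])
        append_batch(events, [{"id": 2, "value": "b"}])
        print(len(list(events.glob("part-*.csv"))), len(read_dataset(events)))  # 2 2
```

Each append costs one small write, and no existing object is downloaded or rewritten. The trade-off is many small files, which engines usually handle by periodically compacting parts into larger ones.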

In addition, most Parquet libraries do not support append mode at all (see https://issues.apache.org/jira/browse/PARQUET-1022); parquet.net implements it as a bonus ;)
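To make the random-access point concrete: a Parquet file starts with the magic bytes `PAR1` and ends with its metadata footer, followed by a 4-byte little-endian footer length and `PAR1` again. An in-place append therefore has to find the old footer from the tail, write new row groups over it, and then write a new footer. That means seeking back into the file rather than only writing at the end. The sketch below uses a hypothetical `footer_span` helper and the Python stdlib only. It parses just that tail framing and does not decode the Thrift-encoded metadata. It also makes no claim about how parquet.net implements append internally.

```python
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def footer_span(data: bytes) -> tuple[int, int]:
    """Return (start, end) byte offsets of the footer (FileMetaData) in a Parquet file.

    Layout of the tail: <footer bytes> <4-byte little-endian footer length> "PAR1".
    An in-place append would start writing new row groups at `start`,
    overwriting the old footer, which is why it needs a seekable target.
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("corrupt footer length")
    return start, end
```

Any real Parquet file's bytes could be passed in. With block blobs, though, you would still have to re-upload the whole object to change those bytes.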

Answer selected by kylemottmac