[Question] Appending to existing parquet files in Azure Blob Storage #471
-
I'm working on a project and want to use your library to read & write parquet files in Azure Blob Storage for storing time series data. Parquet seems to be a very good fit for the kinds of workloads my service will need to execute. I want to append large chunks of rows (as a new row group) to existing parquet files stored in Azure Blob Storage. I can't do this directly because the blob writer creates a stream that is not seekable, as was pointed out in this issue. Can anyone recommend workarounds other than downloading the parquet file into a memory stream, appending, and persisting the result? It's been a few years since that issue was raised.
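For concreteness, the memory-stream workaround I'm referring to looks roughly like this. This is a sketch only, assuming a recent Parquet.Net 4.x async API and Azure.Storage.Blobs; the connection string, container, blob name and schema are made-up examples:

```csharp
using Azure.Storage.Blobs;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

// Placeholder connection details and sample data - replace with your own.
string connectionString = "<storage-connection-string>";
DateTime[] newTimestamps = { DateTime.UtcNow };
double[] newValues = { 42.0 };

var blob = new BlobClient(connectionString, "timeseries", "sensor1.parquet");

var timestamp = new DataField<DateTime>("timestamp");
var value = new DataField<double>("value");
var schema = new ParquetSchema(timestamp, value);

// 1. Download the whole existing file into a seekable in-memory stream.
using var ms = new MemoryStream();
await blob.DownloadToAsync(ms);

// 2. Re-open it in append mode and write the new rows as an extra row group.
ms.Position = 0;
using (ParquetWriter writer = await ParquetWriter.CreateAsync(schema, ms, append: true))
{
    using ParquetRowGroupWriter rg = writer.CreateRowGroup();
    await rg.WriteColumnAsync(new DataColumn(timestamp, newTimestamps));
    await rg.WriteColumnAsync(new DataColumn(value, newValues));
}

// 3. Upload the whole file back, replacing the original blob.
ms.Position = 0;
await blob.UploadAsync(ms, overwrite: true);
```

This works, but it means downloading and re-uploading the entire file on every append, which is what I'd like to avoid.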
Replies: 1 comment 1 reply
-
I don't think that's possible, because block blobs do not support random access, which is required for append. In general, you should treat parquet files as immutable. Most big data engines implement the "append" operation by creating a new file and uploading it to blob storage. This is in fact cheaper for the writer and easier for the reader to manage, as the entire large file does not need to be downloaded and rewritten every time.

In addition to that, most parquet libraries do not support append mode (see https://issues.apache.org/jira/browse/PARQUET-1022), and parquet.net implements it as a bonus ;)
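So instead of appending in place, each batch of rows can become its own small file next to the existing ones. A minimal sketch of that pattern, again assuming a recent Parquet.Net 4.x async API and Azure.Storage.Blobs (connection string, container, blob prefix and schema are made-up examples):

```csharp
using Azure.Storage.Blobs;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

// Placeholder connection details and sample data - replace with your own.
string connectionString = "<storage-connection-string>";
DateTime[] batchTimestamps = { DateTime.UtcNow };
double[] batchValues = { 42.0 };

var container = new BlobContainerClient(connectionString, "timeseries");

var timestamp = new DataField<DateTime>("timestamp");
var value = new DataField<double>("value");
var schema = new ParquetSchema(timestamp, value);

// Write just this batch to its own small parquet file in memory...
using var ms = new MemoryStream();
using (ParquetWriter writer = await ParquetWriter.CreateAsync(schema, ms))
{
    using ParquetRowGroupWriter rg = writer.CreateRowGroup();
    await rg.WriteColumnAsync(new DataColumn(timestamp, batchTimestamps));
    await rg.WriteColumnAsync(new DataColumn(value, batchValues));
}

// ...then upload it under a unique, time-ordered name. Readers list the
// "sensor1/" prefix and treat all files under it as one logical dataset.
ms.Position = 0;
string blobName = $"sensor1/{DateTime.UtcNow:yyyyMMddHHmmssfff}.parquet";
await container.UploadBlobAsync(blobName, ms);
```

If the number of small files eventually becomes a problem for readers, a periodic compaction job can merge them into larger files offline.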