cpdash
is a small cli tool for printing the content of objects on s3 to stdout. Objects will be gunzipped before printing, if that fails the raw content will be printed.
Note: Remember that aws will charge you by the byte for data moved from s3 to the internet.
If you have Go installed, installing cpdash
is as easy as
go install ./cmd/cpdash
See go help install
for more info.
The command
cpdash "s3://bucket/prefix**"
is intended to provide functionality similar to what one would expect from the call
aws s3 cp --recursive s3://bucket/prefix -
Currently, aws-cli does not support streaming in combination with --recursive
. A poor substitute is given by
aws s3 ls --recursive s3://bucket/prefix | awk '{print($4)}' | xargs -n 1 -P 32 -I {} sh -c "aws s3 cp s3://bucket/{} -"
That is, download all objects prefixed by prefix
in the bucket bucket
in parallel, printing the content to stdout. The main additional features of cpdash are:
-
Content of different objects will not be interleaved
-
Objects that are gzip or zstd compressed will have their content decompressed before printing
-
More general globbing than just
**
at the end
Group globs will not work if the groups contain path separators /
, i.e. commands like
cpdash "s3://bucket/prefix/{pattern1/pattern2,pattern3/pattern4}/*"
will silently not match any paths.
cpdash -h
Licensed under the Apache License, Version 2, see LICENSE for more information