Split gzipped logfiles without storing the uncompressed splits on disk

I have a recurring task of splitting a set of large (about 1-2 GiB each) gzipped Apache logfiles into several parts (say chunks of 500K lines). The final files should be gzipped again to limit the disk usage.

On Linux I would normally do:

zcat biglogfile.gz | split -l500000

The resulting files will be named xaa, xab, xac, etc., so I then run:

gzip x*

The drawback of this method is that the huge uncompressed files are temporarily stored on disk as an intermediate result. Is there a way to avoid this intermediate disk usage?

Can I (in a way similar to what xargs does) have split pipe the output through a command (like gzip) and recompress the output on the fly? Or am I looking in the wrong direction and is there a much better way to do this?

Thanks.

3 solutions

#1

You can use the split --filter option, as explained in the manual, e.g.:

zcat biglogfile.gz | split -l500000 --filter='gzip > $FILE.gz'
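
To spell out how the one-liner above works: split runs the filter command once per chunk, feeds the chunk to it on standard input, and sets $FILE to the name the chunk would otherwise have had. The single quotes matter, because they stop the calling shell from expanding $FILE too early. Below is a minimal sketch that wraps this in a function, assuming a GNU split that supports --filter; the function name split_gz_filter and its arguments are made up for illustration.

```shell
# Hypothetical helper: split_gz_filter INPUT.gz LINES_PER_CHUNK PREFIX
# Assumes GNU split with --filter support.
split_gz_filter() {
    # -d asks for numeric suffixes (PREFIX00, PREFIX01, ...) instead of aa, ab, ...
    # "-" tells split to read from standard input.
    # Single quotes keep $FILE literal so that split, not this shell, sets it.
    gzip -dc "$1" | split -l "$2" -d --filter='gzip > "$FILE.gz"' - "$3"
}

# Example (not executed here): split_gz_filter biglogfile.gz 500000 biglog_part_
```

Only compressed chunks are written; the uncompressed data exists only inside the pipes.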

Edit: I'm not sure when the --filter option was introduced, but according to the comments it does not work in coreutils 8.4.
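
For systems where split does not have --filter, one possible workaround is to let awk do the splitting and pipe each chunk into its own gzip process. This is only a sketch of a different technique, not something from the split manual; the function name split_gz_awk and the four-digit chunk numbering are assumptions made for illustration.

```shell
# Hypothetical helper: split_gz_awk INPUT.gz LINES_PER_CHUNK PREFIX
# Writes PREFIX0000.gz, PREFIX0001.gz, ... without uncompressed temp files.
split_gz_awk() {
    gzip -dc "$1" | awk -v n="$2" -v prefix="$3" '
        # On the first line of each chunk, finish the previous gzip
        # and build the command for the next output file.
        (NR - 1) % n == 0 {
            if (cmd != "") close(cmd)
            cmd = sprintf("gzip > \"%s%04d.gz\"", prefix, chunk++)
        }
        # Send every line to the gzip process of the current chunk.
        { print | cmd }
        # Close the last pipe so awk waits for gzip to finish writing.
        END { if (cmd != "") close(cmd) }
    '
}

# Example (not executed here): split_gz_awk biglogfile.gz 500000 x
```

Closing each pipe before opening the next keeps only one gzip process running at a time.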
