Tuesday, October 14, 2014

Operate on the body of a file but not the header

Sometimes you need to run some UNIX command on a file but only want to operate on the body of the file, not the header. Create a file called body somewhere in your $PATH, make it executable, and add this to it:
#!/bin/bash
IFS= read -r header
printf '%s\n' "$header"
eval $@
Now, when you need to run something but ignore the header, use the body command first. For example, we can create a simple data set with a header row and some numbers:
$ echo -e "header\n1\n5\n4\n7\n3"
header
1
5
4
7
3
We can pipe the whole thing to sort:
$ echo -e "header\n1\n5\n4\n7\n3" | sort
1
3
4
5
7
header
Oops, we don’t want the header to be included in the sort. Let’s use the body command to operate only on the body, skipping the header:
$ echo -e "header\n1\n5\n4\n7\n3" | body sort
header
1
3
4
5
7
Sure, there are other ways to solve the problem with sort, but body will solve many more problems. If you have multiple header rows, just call body multiple times.
Inspired by this post on Stack Exchange.

2 comments:

  1. Why not using awk then ? You only have to call it once, whatever the number of header lines is:
    Example: if your file has 4 header lines

    awk 'NR>4' myfile.txt | sort

    ReplyDelete
    Replies
    1. Thanks for that tip. I wonder if either one of these is faster than another? Either way, I'd rather type "body" than the awk command with special characters, but I suppose you could always alias body to awk 'NR>1' or something like that.

      Delete

Note: Only a member of this blog may post a comment.

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.