How can I remove from one file the lines that also appear in a second file in Linux? Let's discuss how to perform this task using the Linux comm and sort commands.
Suppose you have two files, say file1 and file2, with the following contents:
$ cat file1
email@example.com
firstname.lastname@example.org
email@example.com
firstname.lastname@example.org
$ cat file2
email@example.com
firstname.lastname@example.org
The goal is to produce a file containing only the lines that are unique to file1; that is, every line that also appears in file2 should be removed from file1.
So, the resulting file should be as follows.
$ cat file3
email@example.com
firstname.lastname@example.org
This can be done with the Linux command “comm”. The basic syntax of this command is as follows:
comm [-1] [-2] [-3] file1 file2
-1     Suppress the output column of lines unique to file1.
-2     Suppress the output column of lines unique to file2.
-3     Suppress the output column of lines duplicated in file1 and file2.
file1  Name of the first file to compare.
file2  Name of the second file to compare.
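To see how the three options interact, here is a minimal sketch that runs comm on two small, already sorted files. The file names a.txt and b.txt, the scratch directory, and the fruit names are all made up for illustration and are not part of the example above.

```shell
# Hypothetical sample data, already sorted, in a scratch directory
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/a.txt"
printf 'banana\ncherry\ndate\n' > "$dir/b.txt"

# No options: three tab-separated columns
# (only in a.txt, only in b.txt, in both)
comm "$dir/a.txt" "$dir/b.txt"

# Suppress columns 2 and 3: lines found only in a.txt
only_a=$(comm -23 "$dir/a.txt" "$dir/b.txt")

# Suppress columns 1 and 3: lines found only in b.txt
only_b=$(comm -13 "$dir/a.txt" "$dir/b.txt")

# Suppress columns 1 and 2: lines found in both files
common=$(comm -12 "$dir/a.txt" "$dir/b.txt")

printf 'only in a.txt: %s\n' "$only_a"
printf 'only in b.txt: %s\n' "$only_b"
printf 'in both:\n%s\n' "$common"
```

Combining two of the three options leaves exactly one column, which is usually the most convenient form for redirecting into a new file.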
Before applying “comm”, we need to sort the input files. So, in order to get the lines unique to file1, we can use a combination of “comm” and “sort” commands as follows.
$ comm -23 <(sort file1) <(sort file2) > file3
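The above command creates file3 containing the lines that appear in file1 but not in file2. Because the inputs are sorted first, the lines in file3 come out in sorted order rather than in their original file1 order. The <(...) syntax is process substitution, which is available in shells such as bash, zsh, and ksh but not in plain POSIX sh. You can read the comm man page for more details.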
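If your shell does not support the <(...) syntax, the same result can be reached by sorting into temporary files first. The following is only a sketch under assumptions: the addresses are invented sample data, and the scratch directory and the .sorted file names are arbitrary choices for illustration.

```shell
# Hypothetical sample data (not the original files from this article)
work=$(mktemp -d)
printf 'carol@example.com\nalice@example.com\nbob@example.com\ndave@example.com\n' > "$work/file1"
printf 'dave@example.com\nbob@example.com\n' > "$work/file2"

# comm expects sorted input, so write sorted copies first
sort "$work/file1" > "$work/file1.sorted"
sort "$work/file2" > "$work/file2.sorted"

# -23 drops the "only in file2" and "in both" columns,
# leaving the lines that appear only in file1
comm -23 "$work/file1.sorted" "$work/file2.sorted" > "$work/file3"

cat "$work/file3"
```

With this sample data, file3 ends up holding alice@example.com and carol@example.com, in sorted order.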