Often, you need to eliminate duplicates from an input file. This could be based on the entire line content or on certain fields. Such tasks are typically solved with the sort and uniq commands. Advantages of awk include regexp based field and record separators, no requirement that the input be sorted and, in general, more flexibility because it is a programming language.

Whole line duplicates

awk '!a[$0]++' is one of the most famous awk one-liners. It eliminates line based duplicates while retaining the input order. The following example shows it in action along with an illustration of how the logic works: a is an associative array keyed by the entire line, so !a[$0] is true only when a line is seen for the first time, and the ++ suffix increments the count afterwards so later copies are skipped.

$ awk '!a[$0]++' duplicates.txt

This chapter showed how to work with duplicate contents, both record and field based. If you don't need regexp based separators and if your input is too big to handle, then the specialized command line tools sort and uniq will be better suited compared to awk. The next chapter will show how to write awk scripts instead of the usual one-liners.

Exercises

a) Retain only the first copy of a line for the input file lines.txt.

$ cat lines.txt

b) Retain only the first copy of a line for the input file lines.txt. Case should be ignored while comparing lines. For example, hi there and HI TheRE will be considered as duplicates.

c) For the input file twos.txt, create a file uniq.txt with all the unique lines and a file dupl.txt with all the duplicate lines. For example, hehe haha and haha hehe will be considered as duplicates. Compare the lines irrespective of the order of the fields. Assume space as the field separator, with two fields on each line.

$ cat twos.txt
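As a minimal, self-contained sketch of the whole line deduplication one-liner discussed earlier (the sample file name and its contents here are invented for illustration):

```shell
# Sample input invented for illustration.
printf 'hehe haha\nhi there\nhehe haha\nhi there\nbye now\n' > demo_dup.txt

# a[$0] starts at 0 (false), so !a[$0] is true only on the first
# occurrence of a line; the suffix ++ then makes a[$0] non-zero,
# so every later copy of the same line is suppressed.
awk '!a[$0]++' demo_dup.txt
# → hehe haha
# → hi there
# → bye now
```

Note that input order is preserved and the file does not need to be sorted first, unlike the sort | uniq approach.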
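For the field-order-independent comparison mentioned above (where hehe haha and haha hehe count as duplicates), one possible approach, not necessarily the intended solution, is to build a canonical key by putting the two fields in a fixed order. The file name and sample lines are invented for illustration:

```shell
# Sample two-field input invented for illustration.
printf 'hehe haha\nhaha hehe\nfoo bar\n' > demo_twos.txt

# Sort the two fields within the key, so 'hehe haha' and 'haha hehe'
# both map to the key 'haha hehe'; print only the first occurrence.
awk '{k = ($1 < $2) ? $1 FS $2 : $2 FS $1} !seen[k]++' demo_twos.txt
# → hehe haha
# → foo bar
```

The same seen[k] counts could be used in a second pass (or an END block) to split the input into unique and duplicate lines.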