"Mike" == Mike Peckar <fog@fognet.com> writes:
Mike> This seemed like it should be simple, but I'm at wits' end. I
Mike> simply want to find duplicates in the third column of a csv
Mike> file, and output the duplicate line _and_ the original line
Mike> that matched it. There are a million examples out there that
Mike> will output just the duplicate but not both.

Mike> In the data below, I'm looking for lines that match in the 3rd
Mike> column...

The sorting part is easy...

    sort -t "," -k 3,3 <file>

Now to find the duplicates... I'd probably jump to a perl script:

    perl -e 'while (<>) { @t = split(",", $_); push @{$t{$t[2]}}, $_; }
             foreach (sort keys %t) { print @{$t{$_}} if $#{$t{$_}} > 0; }' <file>

Should also do the right thing. First it splits out the third column as
a key and stuffs each line into a hash of arrays keyed on it. Then it
sorts the keys and prints the groups holding more than one line.
Admittedly done off the top of my head, without any actual testing. :-)

Mike> Normal,Server,xldspntc02,,10.33.52.185,
Mike> Normal,Server,xldspntc02,,10.33.52.186,
Mike> Normal,Server,xldspntc04,,10.33.52.187,
Mike> Normal,Server,xldspntcs01,10.33.16.198,
Mike> Normal,Server,xldspntcs01,,10.33.16.199,
Mike> Normal,Server,xldsps01,10.33.16.162,
Mike> Normal,Server,xldsps02,10.33.16.163,

Mike> My desired output would be:

Mike> Normal,Server,xldspntc02,,10.33.52.185,
Mike> Normal,Server,xldspntc02,,10.33.52.186,
Mike> Normal,Server,xldspntcs01,10.33.16.198,
Mike> Normal,Server,xldspntcs01,,10.33.16.199,

Mike> $ awk -F, 'dup[$3]++' file.csv

Mike> I played around with the prev variable, but could not plumb it
Mike> out fully, e.g. { print prev }

Mike> Mike

Mike> _______________________________________________
Mike> Wlug mailing list
Mike> Wlug@mail.wlug.org
Mike> http://mail.wlug.org/mailman/listinfo/wlug
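Since Mike was already close with `awk -F, 'dup[$3]++'` (which prints only the second and later occurrences), a two-pass awk variant can emit the originals as well. This is an untested sketch along the same lines, not Mike's or John's solution; `file.csv` is a placeholder name, and it assumes a plain CSV with no quoted commas:

```shell
# Pass 1 (NR==FNR): count how often each column-3 value appears.
# Pass 2: print every line whose column-3 value occurred more than
# once -- the duplicates *and* the original lines they matched.
awk -F, 'NR==FNR { count[$3]++; next } count[$3] > 1' file.csv file.csv
```

The same file is named twice on purpose: `NR==FNR` is only true while awk is still reading the first copy, so the counting pass finishes before any printing starts, and the output keeps the input's original line order.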
participants (1)
- John Stoffel