Unix: copy lines to a new file if one column matches and another has a value below 5x10^-8
This is similar to many previous questions (including mine), but I can't find a solution. It is purely a syntax error; I cannot figure out how to make it work.
I have 2 files in Unix. file1 has 5 columns and 6000 rows. I am trying to match rows in file2 to rows in file1 if column 1 matches, but only if the value in column 5 of file1 is less than 0.00000005 in that row.
file1:
snps context intergenic risk allele frequency p-value
rs9747992 intergenic 1 0.086 2.00e-07
rs2059865 intron 0 0.235 3.00e-07
rs117020818 intergenic 1 0.046 7.00e-07
rs1074145 intergenic 1 0.162 4.00e-09
file2:
snpid hg18chr bp a1 a2 zscore pval ceumaf
rs3131972 1 742584 g 0.289 0.7726 .
rs3131969 1 744045 g 0.393 0.6946 .
rs3131967 1 744197 t c 0.443 0.658 .
rs1048488 1 750775 t c -0.289 0.7726 .
I can do the first part, but it keeps outputting a file larger than the first two. I'm unsure whether it is a real result file or full of duplicates. I cannot get the 'less than' part to work. I have tried putting the command in as a second pattern and piping it, as below:
awk 'FNR==NR{a[$1]=$0;next}{if ($1 in a) {print $0}}' file1 file2 > output | awk '{if (a[$5] < 0.00000005)}'
and
awk 'FNR==NR{a[$1]=$0;next}{if ($1 in a && $5 < 0.00000005)} {print $0}}' file1 file2 > output
Both times it gives me the same size file, larger than either file1 or file2. If you want examples of the tables, please say.
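In the first attempt, `> output` sends the first awk's output to the file, so nothing reaches the second awk through the pipe (and that awk has no `print` anyway). In the second attempt, `$5` is tested while file2 is being read, where column 5 is `a2`, not file1's p-value. The threshold therefore has to be applied while file1 is loaded. Below is a minimal one-pass sketch. It assumes whitespace-separated columns, one header line in file1, and the ID in column 1 of both files. The names `sample_file1`, `sample_file2`, `sample_output` and `keep` are hypothetical. The `rs3131969` row in file1 is also hypothetical and exists only so that something can match.

```shell
#!/bin/sh
# Hypothetical sample files (named so that real file1/file2 are not overwritten).
# The rs3131969 row in sample_file1 is made up so that a file2 ID can match.
cat > sample_file1 <<'EOF'
snps context intergenic risk allele frequency p-value
rs9747992 intergenic 1 0.086 2.00e-07
rs1074145 intergenic 1 0.162 4.00e-09
rs3131972 intergenic 1 0.086 2.00e-07
rs3131969 intergenic 1 0.120 1.00e-09
EOF
cat > sample_file2 <<'EOF'
snpid hg18chr bp a1 a2 zscore pval ceumaf
rs3131972 1 742584 g 0.289 0.7726 .
rs3131969 1 744045 g 0.393 0.6946 .
rs3131967 1 744197 t c 0.443 0.658 .
EOF

awk '
  # First file: remember IDs whose p-value (column 5) is below 5x10^-8.
  # FNR > 1 skips the header line.
  FNR == NR {
    if (FNR > 1 && $5 + 0 < 0.00000005) keep[$1] = 1
    next
  }
  # Second file: print lines whose column 1 was remembered.
  $1 in keep
' sample_file1 sample_file2 > sample_output

cat sample_output
```

With the real files, the same program would run as `... file1 file2 > output`. Drop the `FNR > 1` check if file1 has no header.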
Tentative solution:
A tentative solution is to make a new file containing only the lines of file1 that have a value < 0.00000005. It works, though I'd like to know the original answer for posterity.
awk '$5<=0.00000005' file1 > file11
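The tentative solution can also be finished without the intermediate file11. Pipe the filtered lines into a second awk and name standard input as `-`, which POSIX awk accepts as a file operand, so those lines become the first file. Below is a sketch using hypothetical sample files (`demo_file1`, `demo_file2`, `demo_output`, and the array name `ids` are all illustrative).

```shell
#!/bin/sh
# Hypothetical samples standing in for file1 and file2.
printf '%s\n' 'rs3131967 intergenic 1 0.050 1.00e-09' \
              'rs3131972 intergenic 1 0.086 2.00e-07' > demo_file1
printf '%s\n' 'rs3131972 1 742584 g 0.289 0.7726 .' \
              'rs3131967 1 744197 t c 0.443 0.658 .' > demo_file2

# Step 1 (as in the tentative solution): keep lines whose column 5 is <= 5x10^-8.
# Step 2: read those lines from stdin ("-") as the first file, then print
# demo_file2 lines whose column 1 appeared in them.
awk '$5 <= 0.00000005' demo_file1 |
  awk 'FNR == NR { ids[$1]; next } $1 in ids' - demo_file2 > demo_output

cat demo_output
```

Here the redirect goes on the last command of the pipeline, so the filtered lines actually reach the second awk.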
Per the comments above, if you're using file2 as the filter list, you need to load the a[] array.
I've made a small sample of how it works (for your test $28 < .000005).
It should be easy to add what you have in your code.
With the file data1
1 2 3 4 5 6 7
2 3 4 5 6 7 8
4 5 8 7 8 9 10
and the file searchlist
3
then
awk 'FNR==NR{a[$0]=$0;next}
     FNR!=NR{ if ($2 in a) print $0}
     #dbg END{for (x in a) print "x="x " a[x]=" a[x] }' searchlist data1
gives the output
2 3 4 5 6 7 8
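`FNR==NR` is true only while awk is reading its first file operand. NR counts records across all input, while FNR restarts at 1 in each file. awk variable names are case-sensitive, so a lowercase `fnr==nr` compares two uninitialized variables and is true on every line. Below is a small check on copies of the sample files. The `_demo` file names and the variables `table` and `lower` are hypothetical.

```shell
#!/bin/sh
printf '3\n' > searchlist_demo
printf '%s\n' '1 2 3 4 5 6 7' '2 3 4 5 6 7 8' '4 5 8 7 8 9 10' > data1_demo

# Print file name, NR, FNR, and whether FNR == NR holds for each record.
table=$(awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first file" : "later file") }' \
  searchlist_demo data1_demo)
printf '%s\n' "$table"

# Lowercase names are ordinary (empty) variables, so this matches every line.
lower=$(awk 'fnr == nr { n++ } END { print n }' searchlist_demo data1_demo)
echo "lines matched by fnr == nr: $lower"
```

Only the single line of searchlist_demo is reported as "first file". The lowercase version counts all four lines.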
Edit: per our conversation in the comments, my best guess, without seeing the required output, would be the following.
I've added a record to file1 so there can be a match:
rs3131972 intergenic 1 0.086 2.00e-07

awk '( FNR==NR && (sprintf("%.07f",$5) < .000000005) ) {
        a[$1]=$0
        #dbg print "a["$1"]="a[$1]
        next
     }
     FNR!=NR{
        #dbg print "$1="$1
        if ($1 in a) print "matched:" $0
     }' file1 file2
The output is now:
matched:rs3131972 1 742584 g 0.289 0.7726 .
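In the condition above, `sprintf` returns a string, so awk compares it with the number `.000000005` as strings. The number is converted with CONVFMT ("%.6g" by default, giving "5e-09"), and "0.0000002" sorts before "5e-09" because '0' comes before '5'. That is why the 2.00e-07 record is kept even though it is larger than the threshold. Note also that .000000005 is 5x10^-9, not the 5x10^-8 from the question. Below is a minimal check of both kinds of comparison on that record. The variable names `out`, `as_string` and `as_number` are illustrative.

```shell
#!/bin/sh
# Evaluate both comparisons on the record added to file1 above.
out=$(echo 'rs3131972 intergenic 1 0.086 2.00e-07' | awk '{
  as_string = (sprintf("%.07f", $5) < .000000005)   # string vs number: compared as strings
  as_number = ($5 + 0 < 0.00000005)                 # numeric comparison, 5x10^-8 threshold
  print "string:", as_string, "numeric:", as_number
}')
echo "$out"
```

Replacing the `sprintf` test with `$5 + 0 < 0.00000005` gives a numeric comparison. With that test the 2.00e-07 record no longer matches, so a match needs a file1 row whose p-value is below 5x10^-8.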
IHTH