UNIX copy lines to new file IF one column matches AND another has a value below 5x10^-8


This is similar to many previous questions (including my own), but I can't find a solution. It is purely a syntax error, and I cannot figure out how to make it work.

I have 2 files in Unix. file1 has 5 columns and 6000 rows. I am trying to match rows in file2 to rows in file1: if column 1 matches, and if the value in column 5 of file1 is less than 0.00000005, print said row.

file1:

snps         context     intergenic  risk allele frequency  p-value
rs9747992    intergenic  1           0.086                  2.00e-07
rs2059865    intron      0           0.235                  3.00e-07
rs117020818  intergenic  1           0.046                  7.00e-07
rs1074145    intergenic  1           0.162                  4.00e-09

file2:

snpid hg18chr bp a1 a2 zscore pval ceumaf
rs3131972   1   742584    g   0.289   0.7726  .
rs3131969   1   744045    g   0.393   0.6946  .
rs3131967   1   744197  t   c   0.443   0.658   .
rs1048488   1   750775  t   c   -0.289  0.7726  .

I can do the first part, but it keeps outputting a file larger than the first two. I am unsure whether this is a real result file or whether it is full of duplicates. I also cannot get the 'less than' test to work. I have tried putting the test in as a second pattern, and piping it, as below:

awk 'FNR==NR{a[$1]=$0;next}{if ($1 in a) {print $0}}' file1 file2 > output | awk '{if (a[$5] < 0.00000005)}'

and

awk 'FNR==NR{a[$1]=$0;next}{if ($1 in a && $5 < 0.00000005)} {print $0}}' file1 file2 > output

Both times it gives me the same size file, which is larger than either file1 or file2. If you want examples of the tables, please say.

tentative solution:

A tentative solution is to make a new file containing only the lines of file1 that have a value <0.00000005. It works, though I would still like to know the original answer for posterity.

awk '$5<=0.00000005' file1 > file11 
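With the pre-filtered file in hand, the remaining step is to keep the file2 rows whose first column appears in it. A minimal sketch, assuming whitespace-separated columns; the sample rows written to the temporary files are hypothetical stand-ins for file11 and file2, and the array name seen is arbitrary:

```shell
# Hypothetical sample data standing in for file11 (pre-filtered file1) and file2.
tmp=$(mktemp -d)
printf '%s\n' \
  'rs1074145 intergenic 1 0.162 4.00e-09' > "$tmp/file11"
printf '%s\n' \
  'snpid hg18chr bp a1 a2 zscore pval ceumaf' \
  'rs3131972 1 742584 a g 0.289 0.7726 .' \
  'rs1074145 1 750775 t c -0.289 0.7726 .' > "$tmp/file2"

# While reading the first file (FNR==NR), remember each SNP id.
# While reading the second file, print rows whose id was remembered.
result=$(awk 'FNR==NR { seen[$1]; next } $1 in seen' "$tmp/file11" "$tmp/file2")
echo "$result"
rm -rf "$tmp"
```

Note that awk's built-in variables are case-sensitive, so FNR and NR must be written in capitals.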

Per the comments above, if you're using file2 as the filter list, you need to load it into the a[] array.

I've made a small sample of how this works; the test $28 < .000005 should be easy to add to what you already have in your code.

with file data1

1 2 3 4 5 6 7
2 3 4 5 6 7 8
4 5 8 7 8 9 10

and file searchlist

3 

then

awk 'FNR==NR{a[$0]=$0;next}
     FNR!=NR{ if ($2 in a) print $0}
     #dbg END{for (x in a) print "x="x " a[x]=" a[x] }
' searchlist data1

gives output

2 3 4 5 6 7 8 
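Adding a numeric test to this sample is a matter of extending the condition. A rough sketch, in which the tested field ($7), the cutoff (9), and the extra searchlist entry 5 are all hypothetical choices made only so that the numeric test has something to reject:

```shell
# Recreate the sample files; the second searchlist entry (5) is a hypothetical addition.
tmp=$(mktemp -d)
printf '%s\n' '1 2 3 4 5 6 7' '2 3 4 5 6 7 8' '4 5 8 7 8 9 10' > "$tmp/data1"
printf '%s\n' '3' '5' > "$tmp/searchlist"

# Keep data1 rows whose 2nd field is in searchlist AND whose 7th field is below 9.
# Fields read from input that look like numbers are compared numerically,
# so 10 < 9 is false here (as text, "10" would sort before "9").
result=$(awk 'FNR==NR { a[$0]; next }
              ($2 in a) && $7 < 9' "$tmp/searchlist" "$tmp/data1")
echo "$result"
rm -rf "$tmp"
```

Only the second row of data1 survives: the third row matches the list on $2 but fails the numeric test on $7.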

Edit: per our conversation in the comments, my best guess without seeing the required output would be the following.

I've added a record to file1 so that there can be a match:

rs3131972   intergenic  1   0.086   2.00e-07

awk '( FNR==NR && (sprintf("%.07f",$5) < .000000005) ) {
        a[$1]=$0
        #dbg print "a["$1"]="a[$1]
        next
    }
    FNR!=NR{
        #dbg print "$1="$1
        if ($1 in a)print "matched:" $0
    }' file1 file2

The output is now

matched:rs3131972   1   742584    g   0.289   0.7726  . 
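One thing to watch in the comparison above: sprintf returns a string, so awk compares it with the threshold as text rather than as a number, and a p-value such as 2.00e-07 would not pass a numeric test against 0.00000005. A minimal single-pass sketch that forces a numeric comparison, assuming whitespace-separated columns, a header line in file1, and the threshold from the question; the sample rows (including the p-value given to rs3131969 and the a1 values) and the variable name max are hypothetical:

```shell
# Hypothetical sample data; the header is simplified so every row has five fields.
tmp=$(mktemp -d)
printf '%s\n' \
  'snps context intergenic risk_allele_frequency p-value' \
  'rs3131972 intergenic 1 0.086 2.00e-07' \
  'rs3131969 intron 0 0.235 4.00e-09' > "$tmp/file1"
printf '%s\n' \
  'snpid hg18chr bp a1 a2 zscore pval ceumaf' \
  'rs3131972 1 742584 a g 0.289 0.7726 .' \
  'rs3131969 1 744045 a g 0.393 0.6946 .' > "$tmp/file2"

# First file: skip the header (FNR > 1) and remember ids whose p-value is
# numerically below the threshold; adding 0 forces a numeric comparison.
# Second file: print rows whose id was remembered.
result=$(awk -v max=0.00000005 '
    FNR==NR { if (FNR > 1 && $5+0 < max+0) a[$1]; next }
    $1 in a { print "matched:" $0 }
' "$tmp/file1" "$tmp/file2")
echo "$result"
rm -rf "$tmp"
```

Here only rs3131969 is reported, because 4.00e-09 is below the threshold while 2.00e-07 is not.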

I hope this helps.

