Reading complex and uneven data from a text file using Java


I have to read a text file whose format is a little complex. Each entry looks like this:

index . word / doc_id : position1 position2 (...and so on), doc_id : position1 position2 (...and so on),

So a word can appear in any number of documents, and it can appear any number of times within each document. Below is a small section of the file; I cannot include words that occur many times because of space constraints.

Example:

13137 . speeding / d85 : 5999  ,  13138 . spell / d53 : 1513  ,  13139 . spelling / d3 : 344 351  ,  13140 . spending / d71 : 398  ,  13141 . spiderman / d60 : 650 733 997 1023 1053 1133 1152 1169  ,  13142 . spiders / d75 : 704  , d91 : 19834  , (...and on) 
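In this single-line layout the same comma separates both whole entries and the extra documents belonging to one entry, so each comma-separated chunk has to be classified by its shape: "number . word / doc : positions" starts a new entry, while "doc : positions" continues the current one. A minimal sketch, assuming doc ids and words are single whitespace-free tokens; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleLineIndexParser {

    // Chunk that starts a new entry, e.g. "13142 . spiders / d75 : 704"
    private static final Pattern ENTRY =
            Pattern.compile("^(\\d+)\\s*\\.\\s*(\\S+)\\s*/\\s*(\\S+)\\s*:\\s*([\\d\\s]*)$");
    // Chunk that adds another document to the current entry, e.g. "d91 : 19834"
    private static final Pattern DOC =
            Pattern.compile("^(\\S+)\\s*:\\s*([\\d\\s]*)$");

    /** Returns word -> (doc id -> positions), preserving input order. */
    public static Map<String, Map<String, List<Integer>>> parse(String text) {
        Map<String, Map<String, List<Integer>>> index = new LinkedHashMap<>();
        Map<String, List<Integer>> current = null;
        for (String raw : text.split(",")) {
            String chunk = raw.trim();
            if (chunk.isEmpty()) continue;              // e.g. after a trailing comma
            Matcher m = ENTRY.matcher(chunk);
            if (m.matches()) {                          // a new word begins here
                current = index.computeIfAbsent(m.group(2), k -> new LinkedHashMap<>());
                current.put(m.group(3), positions(m.group(4)));
                continue;
            }
            m = DOC.matcher(chunk);
            if (m.matches() && current != null) {       // another document for the same word
                current.put(m.group(1), positions(m.group(2)));
            } else {
                throw new IllegalArgumentException("Unrecognised chunk: " + chunk);
            }
        }
        return index;
    }

    // Turns "650 733 997" into [650, 733, 997].
    private static List<Integer> positions(String s) {
        List<Integer> out = new ArrayList<>();
        for (String p : s.trim().split("\\s+")) {
            if (!p.isEmpty()) out.add(Integer.parseInt(p));
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "13141 . spiderman / d60 : 650 733 ,  13142 . spiders / d75 : 704  , d91 : 19834  ,";
        System.out.println(parse(sample));
        // prints: {spiderman={d60=[650, 733]}, spiders={d75=[704], d91=[19834]}}
    }
}
```

This works, but it depends on every doc id being distinguishable from the start of a new entry, which is why a less ambiguous file layout is worth considering.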

Please help me with this. Also, if there is a better way to format the file (I generated it myself), I can reformat it and generate a better-formatted text file.

Thanks :)

Perhaps you should use a newline delimiter. Here's what I mean:

13137 . speeding / d85 : 5999
13138 . spell / d53 : 1513
13139 . spelling / d3 : 344 351
13140 . spending / d71 : 398
13141 . spiderman / d60 : 650 733 997 1023 1053 1133 1152 1169
13142 . spiders / d75 : 704 , d91 : 19834
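Since you generate the file yourself, the generating program could write one entry per line. A sketch, assuming the in-memory index is a word → (doc id → positions) map and that the leading index number is just a running counter (both are assumptions; NewlineIndexWriter, formatEntry and write are hypothetical names):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class NewlineIndexWriter {

    /** Builds one line: "index . word / doc_id : p1 p2 , doc_id : p1 ..." */
    public static String formatEntry(int index, String word, Map<String, List<Integer>> docs) {
        StringJoiner docParts = new StringJoiner(" , ");
        for (Map.Entry<String, List<Integer>> doc : docs.entrySet()) {
            StringJoiner positions = new StringJoiner(" ");
            for (int p : doc.getValue()) positions.add(Integer.toString(p));
            docParts.add(doc.getKey() + " : " + positions);
        }
        return index + " . " + word + " / " + docParts;
    }

    /** Writes one entry per line, numbering entries from firstIndex upwards. */
    public static void write(Path file, int firstIndex,
                             Map<String, Map<String, List<Integer>>> index) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            int i = firstIndex;
            for (Map.Entry<String, Map<String, List<Integer>>> e : index.entrySet()) {
                out.write(formatEntry(i++, e.getKey(), e.getValue()));
                out.newLine();   // the line break is now the entry delimiter
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> docs = new LinkedHashMap<>();
        docs.put("d75", Arrays.asList(704));
        docs.put("d91", Arrays.asList(19834));
        System.out.println(formatEntry(13142, "spiders", docs));
        // prints: 13142 . spiders / d75 : 704 , d91 : 19834
    }
}
```

Within a line the comma still separates documents, but it can no longer be confused with the start of the next word.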

In other words, a format like this:

index . word / doc_id : position1 position2 ... , doc_id : position1 ...
index . word / doc_id : position1 position2 ... , doc_id : position1 ...
index . word / doc_id : position1 position2 ... , doc_id : position1 ...

Edit:

Now you can retrieve one line at a time and pass it to a Scanner or StringTokenizer, or use String.split, remembering that whitespace is used as the delimiter. Parse through each token, keeping track of the separators (the period, slash, colon and comma). You know the format of each line and the separators it uses, so use that information and proceed.
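A minimal sketch of that approach using String.split, assuming every separator (period, slash, colon, comma) is surrounded by spaces as in the sample lines; LineTokenParser, Entry, parseLine and parseAll are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LineTokenParser {

    /** One parsed line: index number, word, and doc id -> positions. */
    public static final class Entry {
        public final int index;
        public final String word;
        public final Map<String, List<Integer>> docs = new LinkedHashMap<>();
        Entry(int index, String word) { this.index = index; this.word = word; }
        @Override public String toString() { return index + " " + word + " " + docs; }
    }

    /** Parses "index . word / doc : p1 p2 , doc : p1 ..." token by token. */
    public static Entry parseLine(String line) {
        String[] t = line.trim().split("\\s+");
        // Minimum shape: index . word / doc :
        if (t.length < 6 || !t[1].equals(".") || !t[3].equals("/")) {
            throw new IllegalArgumentException("Bad line: " + line);
        }
        Entry e = new Entry(Integer.parseInt(t[0]), t[2]);
        int i = 4;
        while (i < t.length) {
            String docId = t[i++];
            if (i >= t.length || !t[i++].equals(":")) {
                throw new IllegalArgumentException("Expected ':' after " + docId);
            }
            List<Integer> positions = new ArrayList<>();
            // Positions run until the next "," or the end of the line.
            while (i < t.length && !t[i].equals(",")) {
                positions.add(Integer.parseInt(t[i++]));
            }
            if (i < t.length) i++;   // skip the ","
            e.docs.put(docId, positions);
        }
        return e;
    }

    /** Reads one line at a time, skipping blank lines. */
    public static List<Entry> parseAll(BufferedReader reader) throws IOException {
        List<Entry> entries = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) entries.add(parseLine(line));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String text = "13139 . spelling / d3 : 344 351\n"
                    + "13142 . spiders / d75 : 704 , d91 : 19834\n";
        for (Entry e : parseAll(new BufferedReader(new StringReader(text)))) {
            System.out.println(e);
        }
        // prints:
        // 13139 spelling {d3=[344, 351]}
        // 13142 spiders {d75=[704], d91=[19834]}
    }
}
```

For a real file, the StringReader would be replaced by a reader over the file, for example one obtained from Files.newBufferedReader.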