Reading complex and uneven data from a text file using Java
I have to read a text file that is not simple; its contents are a little complex and follow this order:
index . word / doc_id : position1 position2 (... and so on) , doc_id : position1 position2 (... and so on) ,
So a word can appear in any number of documents, and any number of times in each document. The example below is a small section copied from the file; I cannot include words that occur many times because of space constraints.
Example:
13137 . speeding / d85 : 5999 , 13138 . spell / d53 : 1513 , 13139 . spelling / d3 : 344 351 , 13140 . spending / d71 : 398 , 13141 . spiderman / d60 : 650 733 997 1023 1053 1133 1152 1169 , 13142 . spiders / d75 : 704 , d91 : 19834 , (...and on)
Please help me with this. Also, could the file be formatted in a better way? I generated the file myself, so I can reformat it and generate a better-formatted text file.
Thank you :)
Perhaps you should use a new line as the delimiter between entries. Here's what I mean:
13137 . speeding / d85 : 5999
13138 . spell / d53 : 1513
13139 . spelling / d3 : 344 351
13140 . spending / d71 : 398
13141 . spiderman / d60 : 650 733 997 1023 1053 1133 1152 1169
13142 . spiders / d75 : 704 , d91 : 19834
In other words, a format of the following nature:
index . word / doc_id : position1 position2 ... , doc_id : position1 ...
index . word / doc_id : position1 position2 ... , doc_id : position1 ...
index . word / doc_id : position1 position2 ... , doc_id : position1 ...
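If regenerating the file is not convenient, the existing single-stream file could probably be converted to this layout instead. The following is only a rough sketch: the class name IndexReformatter and the method splitEntries are made up for illustration, and it assumes that a new entry always begins with an all-digit token immediately followed by a "." token, which holds for the sample in the question but may not hold for every word in the real file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class IndexReformatter {

    // Splits the single-stream index text into one string per entry.
    // Assumption: an entry starts at an all-digit token followed by ".".
    static List<String> splitEntries(String stream) {
        List<String> entries = new ArrayList<>();
        String trimmed = stream.trim();
        if (trimmed.isEmpty()) {
            return entries;
        }
        String[] tokens = trimmed.split("\\s+");
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            boolean startsEntry = tokens[i].matches("\\d+")
                    && i + 1 < tokens.length
                    && tokens[i + 1].equals(".");
            if (startsEntry && current.length() > 0) {
                entries.add(stripTrailingComma(current.toString()));
                current.setLength(0);
            }
            current.append(tokens[i]).append(' ');
        }
        entries.add(stripTrailingComma(current.toString()));
        return entries;
    }

    // Removes the "," that terminates the last document list of an entry.
    private static String stripTrailingComma(String entry) {
        String s = entry.trim();
        if (s.endsWith(",")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s;
    }

    public static void main(String[] args) {
        String sample = "13139 . spelling / d3 : 344 351 , "
                + "13140 . spending / d71 : 398 , "
                + "13142 . spiders / d75 : 704 , d91 : 19834 ,";
        // Prints one entry per line, without the trailing comma.
        for (String entry : splitEntries(sample)) {
            System.out.println(entry);
        }
    }
}
```

Position numbers are also all-digit tokens, but in the sample they are always followed by another position or a ",", never by ".", which is why the look-ahead at the next token is needed.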
Edit:
Now you can retrieve one line at a time and push each line through a Scanner, or a StringTokenizer, or use String.split, remembering that whitespace is used as the delimiter. Parse through each token, keeping track of the separators ".", "/", ":" and ",". You know the format of each line and the separators used; use that information and proceed.
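As a rough illustration of the String.split approach, here is a minimal sketch that parses one line of the one-entry-per-line format. The names IndexLineParser, Entry and parseLine are invented for this example, and it assumes every separator is surrounded by whitespace, as in the sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical parser, not part of any library.
final class IndexLineParser {

    // One parsed entry: index number, word, and positions per document id.
    static final class Entry {
        final int index;
        final String word;
        final Map<String, List<Integer>> postings = new LinkedHashMap<>();

        Entry(int index, String word) {
            this.index = index;
            this.word = word;
        }
    }

    // Expected shape: index . word / docId : pos pos ... [, docId : pos ...]
    static Entry parseLine(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 4 || !t[1].equals(".") || !t[3].equals("/")) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        Entry entry = new Entry(Integer.parseInt(t[0]), t[2]);
        List<Integer> positions = null; // null means "next token is a doc id"
        for (int i = 4; i < t.length; i++) {
            String tok = t[i];
            if (tok.equals(",")) {
                positions = null;               // end of this document's list
            } else if (tok.equals(":")) {
                continue;                       // separator after a doc id
            } else if (positions == null) {
                positions = new ArrayList<>();  // tok is a doc id
                entry.postings.put(tok, positions);
            } else {
                positions.add(Integer.parseInt(tok)); // tok is a position
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        String text = "13139 . spelling / d3 : 344 351\n"
                + "13142 . spiders / d75 : 704 , d91 : 19834\n";
        // A Scanner over a String stands in for a Scanner over the real file.
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                if (line.trim().isEmpty()) {
                    continue;
                }
                Entry e = parseLine(line);
                System.out.println(e.index + " " + e.word + " " + e.postings);
            }
        }
    }
}
```

A LinkedHashMap is used here only so that documents come back in the order they appear on the line; a plain HashMap would also work if order does not matter.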