Reading complex and uneven data from a text file using Java

- January 15, 2014

I have to read a text file that is not uniform and is a little complex in how it is ordered. The format is:

index . word / doc_id : position1 position2 (... and so on), doc_id : position1 position2 (... and so on),

So a word can appear in any number of documents, and any number of times in each document. The example below is a small section copied from the file; I cannot include words that occur many times because of space constraints.

Example:

13137 . speeding / d85 : 5999  ,  13138 . spell / d53 : 1513  ,  13139 . spelling / d3 : 344 351  ,  13140 . spending / d71 : 398  ,  13141 . spiderman / d60 : 650 733 997 1023 1053 1133 1152 1169  ,  13142 . spiders / d75 : 704  , d91 : 19834  , (...and on)

Please help me with this. Also, if the file could be formatted in a better way, tell me: I generated the file myself, so I can reformat it and generate a better-formatted text file.

Thanks :)

Perhaps you should use a newline as the delimiter between entries. Here's what I mean:

13137 . speeding / d85 : 5999
13138 . spell / d53 : 1513
13139 . spelling / d3 : 344 351
13140 . spending / d71 : 398
13141 . spiderman / d60 : 650 733 997 1023 1053 1133 1152 1169
13142 . spiders / d75 : 704 , d91 : 19834

In other words, a format of the following nature, with one entry per line:

index . word / doc_id : position1 position2 ... , doc_id : position1 ...
index . word / doc_id : position1 position2 ... , doc_id : position1 ...
index . word / doc_id : position1 position2 ... , doc_id : position1 ...

Edit

Now you can retrieve one line at a time and push it through a Scanner or a StringTokenizer, or use String.split, remembering that whitespace is used as the delimiter. Parse through each token, keeping track of the separators . / : and , as you go. You know the format of each line and the separators used; use that information and proceed.
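As a rough sketch of that approach, here is one way the token walk could look with String.split. It assumes one entry per line and whitespace around every separator, as in the samples above, and that a word never contains whitespace. The names IndexLineParser, Entry and parseLine are made up for illustration; they are not from the original file or any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexLineParser {

    /** One parsed entry: index number, word, and positions grouped by doc id. */
    public static final class Entry {
        public final int index;
        public final String word;
        public final Map<String, List<Integer>> positionsByDoc = new LinkedHashMap<>();

        Entry(int index, String word) {
            this.index = index;
            this.word = word;
        }
    }

    /** Parses one line shaped like: index . word / doc_id : pos pos , doc_id : pos */
    public static Entry parseLine(String line) {
        // Whitespace is the delimiter, so every separator is its own token.
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 4 || !tokens[1].equals(".") || !tokens[3].equals("/")) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        Entry entry = new Entry(Integer.parseInt(tokens[0]), tokens[2]);

        List<Integer> current = null; // positions of the doc id being read
        for (int i = 4; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.equals(",")) {
                current = null; // a comma ends the current doc's positions
            } else if (i + 1 < tokens.length && tokens[i + 1].equals(":")) {
                // A token followed by ':' is a doc id.
                current = entry.positionsByDoc.computeIfAbsent(token, k -> new ArrayList<>());
                i++; // skip the ':' token
            } else if (current != null) {
                current.add(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("Position without a doc id: " + token);
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        Entry e = parseLine("13142 . spiders / d75 : 704 , d91 : 19834");
        // prints: 13142 spiders {d75=[704], d91=[19834]}
        System.out.println(e.index + " " + e.word + " " + e.positionsByDoc);
    }
}
```

To read the whole file you would call parseLine on each line returned by a BufferedReader or Scanner. A LinkedHashMap is used here only so the doc ids keep the order in which they appear on the line.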
