apache - Error in remote streaming PDF files to Solr


I am trying to stream remote files to Solr for indexing, using the stream.url parameter as follows:

curl 'http://localhost:8983/solr/update/csv?stream.url=http://www.artofproblemsolving.com/resources/papers/satont.pdf&stream.contentType=application/pdf;charset=utf-8'

I am following the solution given here: remote streaming with Solr. However, the Solr server throws this error:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">518</int>
  </lst>
  <lst name="error">
    <str name="msg">Document is missing mandatory uniqueKey field: id</str>
    <int name="code">400</int>
  </lst>
</response>

I tried looking in the Solr documentation and wiki pages but couldn't find a single example. Any help would be appreciated.

Update

Here is my schema.xml file: http://pastebin.com/akmrud9n

The problem is that there is one field, id, which has the required="true" and multiValued="false" properties and is being used as the uniqueKey:

<uniqueKey>id</uniqueKey>

There must be a field set as the uniqueKey, or else Solr remote streaming doesn't work. Which field should I use instead of id, then?

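You are trying to send a PDF file to the legacy CSV import endpoint (/update/csv). That handler tries to parse the PDF as CSV rows, finds no id column, and so complains about the missing uniqueKey.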

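You want to use the extract handler (/update/extract) instead. Its documentation covers a lot of information, including examples of indexing a PDF file and of setting the id explicitly through the literal.id parameter: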

curl "http://example.com:8983/solr/update/extract?stream.file=/path/to/file/stateslefttovisit.doc&stream.contentType=application/msword&literal.id=states.doc"
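
Applied to the remote PDF from the question, the same idea might look like the sketch below. It assumes remote streaming is already enabled on the server and that the extract handler is mounted at /update/extract. The build_extract_url helper, the document id satont.pdf, and the commit=true parameter are illustrative choices, not part of the original question.

```shell
#!/bin/sh
# Sketch only: builds an /update/extract request for a remote PDF.
# Assumptions: remote streaming is enabled and the extract handler is
# configured at /update/extract on this Solr instance.

SOLR_BASE="http://localhost:8983/solr"
REMOTE_URL="http://www.artofproblemsolving.com/resources/papers/satont.pdf"
DOC_ID="satont.pdf"   # hypothetical id; any value unique in the index works

# Hypothetical helper: prints the request URL for the extract handler.
# $1 = Solr base URL, $2 = remote file URL, $3 = value for the uniqueKey field
build_extract_url() {
  printf '%s/update/extract?stream.url=%s&stream.contentType=%s&literal.id=%s&commit=true' \
    "$1" "$2" "application/pdf" "$3"
}

EXTRACT_URL="$(build_extract_url "$SOLR_BASE" "$REMOTE_URL" "$DOC_ID")"
echo "$EXTRACT_URL"

# To actually send the request, quote the URL so the shell does not
# interpret the & characters:
# curl "$EXTRACT_URL"
```

literal.id supplies the value for the mandatory id field, so the uniqueKey definition in schema.xml can stay as it is. If the remote URL itself contains characters such as & or ?, it would need to be URL-encoded before being placed in stream.url.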

