I am trying to get all of the terms and their postings (a Terms instance) from a field of a Lucene document (see: How to calculate term frequency in Lucene?). According to the documentation, there is a method for that:
public final Terms getTermVector(int docID, String field) throws IOException
Retrieve term vector for this document and field, or null if term vectors were not indexed. The returned Fields instance acts like a single-document inverted index (the docID will be 0).
The method takes a parameter int docID. What is it? For a given document, which field holds that ID, and how does Lucene determine it?
Following Lucene's documentation, I used a StringField as the id, and it is not an int.
import org.apache.lucene.document.*;
Document doc = new Document();
Field idField = new StringField("id", post.Id, Field.Store.YES);
Field bodyField = new TextField("body", post.Body, Field.Store.YES);
doc.add(idField);
doc.add(bodyField);
I have five questions:
- How does Lucene recognize that the id field is used as the docID for this document? Or does Lucene even do that at all?
- I used a String for the id, but this method takes an int. Does that cause a problem?
- Is there an appropriate method to get the postings?
- I have used TextField. Is there any way to retrieve the term vector (Terms) of that field? I don't want to re-index my documents as explained here, because the index is too large (35 GB).
- Is there any way to get the term count and each term's frequency from a TextField?
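To be concrete about the last question, the per-document output I am after looks roughly like the map built below. This is only a plain-Java sketch with no Lucene calls in it. The class name TermFrequencySketch, the method termFrequencies, and the lowercase/split-on-non-alphanumerics tokenization are all assumptions made for illustration, and Lucene's analyzers may well split the text differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: it only illustrates the desired result.
class TermFrequencySketch {

    // Counts how often each term occurs in a single document body.
    // Assumption: a term is a lowercased run of letters or digits.
    static Map<String, Integer> termFrequencies(String body) {
        Map<String, Integer> frequencies = new TreeMap<>();
        for (String token : body.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = termFrequencies("Lucene indexes terms; terms have postings.");
        System.out.println("distinct terms: " + tf.size());
        tf.forEach((term, freq) -> System.out.println(term + " -> " + freq));
    }
}
```

What I would like is the same kind of term-to-frequency view, but read from the existing index for the body field rather than recomputed from the stored text.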