The core of the answer is: use the pipeline component "beam_ner" and look at the EntityRecognizer.pyx code. Then there is the unit test test_beam_ner_scores() in test_ner.py, which pretty much shows how to do it.
If you want to see how to modify your config.cfg, save the model (as done in make_nlp() below) and look at the saved model's config.cfg.
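For example, once make_nlp() below has saved the pipeline, you can print the settings that end up under [components.beam_ner] in config.cfg. A minimal sketch (assuming the model was saved to ./test_model, as in the code below):

import spacy

# Load the saved pipeline and show the beam_ner component settings,
# i.e. what config.cfg contains under [components.beam_ner]
nlp = spacy.load("./test_model")
print(nlp.config["components"]["beam_ner"])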
The PROBLEM is that this only works for the unit-test-generated 'model'. It fails miserably for my real models (5000 docs, ~4k of text each, training NER F-scores around 75%).
By 'miserably' I mean that the 'greedy' search finds my entities, while the 'beam search' reports hundreds of tokens (even punctuation) with 'scores' such as 0.013. And (based on the offsets) those usually come from one small section of the document.
This is frustrating, because I believe spacy train (for 'beam_ner') uses the same code to 'validate' training iterations, and the training-reported scores are almost decent (well, 10% below spaCy 2, but that happens both when training with 'ner' and with 'beam_ner').
So I am posting this in the hope that someone has better luck OR can point out WHAT I am doing wrong.
So far spaCy 3 has been a major disaster for me: I cannot get confidences, I can no longer use the GPU (I have only 6 GB), the Ray-based parallelization does not work (experimental on Windows), and with a 'transformer'-based model my training NER scores are 10% worse than in spaCy 2.
Code
import spacy
from spacy.lang.en import English
from spacy.training import Example
# Based upon test_ner.py test_beam_ner_scores()
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.",  {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
    ("You like Paris and Prague.", {"entities": [(9, 14, "LOC"), (19, 25, "LOC")]}),
]
def make_nlp(model_dir):
    # ORIGINALLY: Test that we can get confidence values out of the beam_ner pipe
    nlp = English()
    config = { "beam_width": 32, "beam_density": 0.001 }
    ner = nlp.add_pipe("beam_ner", config=config)
    train_examples = []
    for text, annotations in TRAIN_DATA:
        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])
    optimizer = nlp.initialize()
    # update once
    losses = {}
    nlp.update(train_examples, sgd=optimizer, losses=losses)
    # save the model so its config.cfg can be inspected
    nlp.to_disk(model_dir)
    print("Saved model to", model_dir)
    return nlp
def test_greedy(nlp, text):
    # Report predicted entities using the default 'greedy' search (no confidences)
    doc = nlp(text)
    print("GREEDY search")
    for ent in doc.ents:
        print("Greedy offset=", ent.start_char, "-", ent.end_char, ent.label_, "text=", ent.text)

def test_beam(nlp, text):
    # Report predicted entities using the beam search (beam_width 16 or higher)
    ner = nlp.get_pipe("beam_ner")
    # Get the prediction scores from the beam search
    doc = nlp.make_doc(text)
    docs = [doc]
    # beams returned from ner.predict(docs); scored_ents() turns them into span scores
    beams = ner.predict(docs)
    print("BEAM search, labels", ner.labels)
    # Show individual entity candidates and their scores;
    # each key is a (start_token, end_token, label) tuple
    scores = ner.scored_ents(beams)[0]
    for ent, sco in scores.items():
        tok = doc[ent[0]]
        lbl = ent[2]
        spn = doc[ent[0]: ent[1]]
        print('Beam-search', ent[0], ent[1], 'offset=', tok.idx, lbl, 'score=', sco,
              'text=', spn.text.replace('\n', '  '))
MODEL_DIR = "./test_model"
TEST_TEXT = "I like London and Paris."
  
if __name__ == "__main__":
    # You may have to repeat make_nlp() several times to produce a semi-decent 'model'
    # nlp = make_nlp(MODEL_DIR)
    nlp = spacy.load(MODEL_DIR)
    test_greedy(nlp, TEST_TEXT)
    test_beam(nlp, TEST_TEXT)
The result should look like this (after repeating make_nlp() until it generates a usable 'model'):
GREEDY search
Greedy offset= 7 - 13 LOC text= London
Greedy offset= 18 - 23 LOC text= Paris
BEAM search, labels ('LOC', 'PERSON')
Beam-search 2 3 offset= 7 LOC score= 0.5315668466265199 text= London
Beam-search 4 5 offset= 18 LOC score= 0.7206478212662492 text= Paris
Beam-search 0 1 offset= 0 LOC score= 0.4679245513356703 text= I
Beam-search 3 4 offset= 14 LOC score= 0.4670399792743775 text= and
Beam-search 5 6 offset= 23 LOC score= 0.2799470367073933 text= .
Beam-search 1 2 offset= 2 LOC score= 0.21658368070744227 text= like
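If you only want the confident spans, one option is to threshold the scored_ents() output. A minimal sketch (the 0.5 cut-off is my own arbitrary choice, not a spaCy default):

def beam_ents_above(nlp, text, min_score=0.5):
    # Return (text, label, score) for beam candidates scoring at least min_score
    ner = nlp.get_pipe("beam_ner")
    doc = nlp.make_doc(text)
    beams = ner.predict([doc])
    scores = ner.scored_ents(beams)[0]   # {(start_token, end_token, label): score}
    return [(doc[s:e].text, label, score)
            for (s, e, label), score in scores.items()
            if score >= min_score]

With the toy model above, a 0.5 cut-off keeps London and Paris and drops the noise tokens; it does not, of course, explain why my real models produce such noise in the first place.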