The core of the answer is: use the pipeline component "beam_ner" and look at the EntityRecognizer.pyx code. Then there is the unit test test_beam_ner_scores() in test_ner.py, which pretty much shows how to do it.
If you want to see how to modify your config.cfg, save the model (as done in make_nlp() below) and look at the saved model's config.cfg.
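For example, once make_nlp() below has saved the pipeline, you can print the settings that end up under [components.beam_ner] in config.cfg. A minimal sketch (assuming the model was saved to ./test_model, as in the code below):

import spacy

# Load the saved pipeline and show the beam_ner component settings,
# i.e. what config.cfg contains under [components.beam_ner]
nlp = spacy.load("./test_model")
print(nlp.config["components"]["beam_ner"])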
The PROBLEM is that this only works for the unit-test-generated 'model'. It fails miserably for my real models (5000 docs, ~4k of text each, training NER F-scores around 75%).
By 'miserably' I mean that the 'greedy' search finds my entities, while the 'beam search' reports hundreds of tokens (even punctuation) with 'scores' such as 0.013. And (based on the offsets) those usually come from one small section of the document.
This is frustrating, because I believe spacy train (for 'beam_ner') uses the same code to 'validate' training iterations, and the training-reported scores are almost decent (well, 10% below spaCy 2, but that happens both when training with 'ner' and with 'beam_ner').
So I am posting this in the hope that someone has better luck OR can point out WHAT I am doing wrong.
So far spaCy 3 has been a major disaster for me: I cannot get confidences, I can no longer use the GPU (I have only 6 GB), the Ray-based parallelization does not work (experimental on Windows), and with a 'transformer'-based model my training NER scores are 10% worse than in spaCy 2.
Code
import spacy
from spacy.lang.en import English
from spacy.training import Example
# Based upon test_ner.py test_beam_ner_scores()
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.",  {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
    ("You like Paris and Prague.", {"entities": [(9, 14, "LOC"), (19, 25, "LOC")]}),
]
def make_nlp(model_dir):
    # ORIGINALLY: Test that we can get confidence values out of the beam_ner pipe
    nlp = English()
    config = { "beam_width": 32, "beam_density": 0.001 }
    ner = nlp.add_pipe("beam_ner", config=config)
    train_examples = []
    for text, annotations in TRAIN_DATA:
        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])
    optimizer = nlp.initialize()
    # update once
    losses = {}
    nlp.update(train_examples, sgd=optimizer, losses=losses)
    # save the model so its config.cfg can be inspected
    nlp.to_disk(model_dir)
    print("Saved model to", model_dir)
    return nlp
def test_greedy(nlp, text):
    # Report predicted entities using the default 'greedy' search (no confidences)
    doc = nlp(text)
    print("GREEDY search")
    for ent in doc.ents:
        print("Greedy offset=", ent.start_char, "-", ent.end_char, ent.label_, "text=", ent.text)

def test_beam(nlp, text):
    # Report predicted entities using the beam search (beam_width 16 or higher)
    ner = nlp.get_pipe("beam_ner")
    # Get the prediction scores from the beam search
    doc = nlp.make_doc(text)
    docs = [doc]
    # beams returned from ner.predict(docs); scored_ents() turns them into span scores
    beams = ner.predict(docs)
    print("BEAM search, labels", ner.labels)
    # Show individual entity candidates and their scores;
    # each key is a (start_token, end_token, label) tuple
    scores = ner.scored_ents(beams)[0]
    for ent, sco in scores.items():
        tok = doc[ent[0]]
        lbl = ent[2]
        spn = doc[ent[0]: ent[1]]
        print('Beam-search', ent[0], ent[1], 'offset=', tok.idx, lbl, 'score=', sco,
              'text=', spn.text.replace('\n', '  '))
MODEL_DIR = "./test_model"
TEST_TEXT = "I like London and Paris."
  
if __name__ == "__main__":
    # You may have to repeat make_nlp() several times to produce a semi-decent 'model'
    # nlp = make_nlp(MODEL_DIR)
    nlp = spacy.load(MODEL_DIR)
    test_greedy(nlp, TEST_TEXT)
    test_beam(nlp, TEST_TEXT)
The result should look like this (after repeating make_nlp() until it generates a usable 'model'):
GREEDY search
Greedy offset= 7 - 13 LOC text= London
Greedy offset= 18 - 23 LOC text= Paris
BEAM search, labels ('LOC', 'PERSON')
Beam-search 2 3 offset= 7 LOC score= 0.5315668466265199 text= London
Beam-search 4 5 offset= 18 LOC score= 0.7206478212662492 text= Paris
Beam-search 0 1 offset= 0 LOC score= 0.4679245513356703 text= I
Beam-search 3 4 offset= 14 LOC score= 0.4670399792743775 text= and
Beam-search 5 6 offset= 23 LOC score= 0.2799470367073933 text= .
Beam-search 1 2 offset= 2 LOC score= 0.21658368070744227 text= like
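If you only want the confident spans, one option is to threshold the scored_ents() output. A minimal sketch (the 0.5 cut-off is my own arbitrary choice, not a spaCy default):

def beam_ents_above(nlp, text, min_score=0.5):
    # Return (text, label, score) for beam candidates scoring at least min_score
    ner = nlp.get_pipe("beam_ner")
    doc = nlp.make_doc(text)
    beams = ner.predict([doc])
    scores = ner.scored_ents(beams)[0]   # {(start_token, end_token, label): score}
    return [(doc[s:e].text, label, score)
            for (s, e, label), score in scores.items()
            if score >= min_score]

With the toy model above, a 0.5 cut-off keeps London and Paris and drops the noise tokens; it does not, of course, explain why my real models produce such noise in the first place.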