I love FastAPI, but I'm not well-versed in the nuances of the "guts" behind its magic (uvicorn, watchgod, etc.).
I'm developing an API that lets users interact with a Hugging Face transformer model. Specifically, I'm using a RobertaForQuestionAnswering model.
However, I'm running into a few problems that I'm unable to debug with the VS Code debugger; trying to "step into" the problem doesn't yield anything at all.
Problems:
- Even when I declare the RobertaForQuestionAnswering model at the top of my routes file like so:
from fastapi import APIRouter
from transformers import RobertaForQuestionAnswering, RobertaTokenizer, pipeline

router = APIRouter()
tokenizer = RobertaTokenizer.from_pretrained(data_dir)  # data_dir defined elsewhere
model = RobertaForQuestionAnswering.from_pretrained(data_dir)
transformer_pipeline = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

@router.post("/test")
async def test():
    return transformer_pipeline(question="What is my name?", context="My name is Joe")
FastAPI starts a new subprocess EVERY TIME the model is invoked, and re-creates the model EVERY TIME, which is very time-consuming. I have tried FastAPI's dependency injection and a few other methods, but I think these efforts are futile, since FastAPI creates a new ephemeral subprocess for every invocation of the transformer model.
- When running the FastAPI app with uvicorn and a single worker (the default), the API "shuts down" gracefully whenever the transformer model is run. So I need to run the app with workers=2, like so:
uvicorn.run(
    "project_name.api.webapp:init_fastapi_application",
    host=host,
    port=port,
    log_level="debug",
    workers=2,
)
Where init_fastapi_application is just a custom function that returns the app object. Running with multiple workers is a totally hacky fix for the "graceful shutdown" problem, and I frustratingly can't figure out why it fixes it!
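(Aside: since init_fastapi_application is a factory rather than an app instance, uvicorn also has a factory=True flag for exactly this case; whether that interacts with the shutdown behaviour, I don't know. A minimal sketch, using the host/port from the log below:)
import uvicorn

if __name__ == "__main__":
    # factory=True tells uvicorn that the import string points at a
    # zero-argument callable returning the app, not at the app itself.
    uvicorn.run(
        "project_name.api.webapp:init_fastapi_application",
        host="0.0.0.0",
        port=8080,
        log_level="debug",
        factory=True,
    )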
TL;DR:
- How/why/when does FastAPI create new subprocesses, and why are they created every time I invoke a Hugging Face transformer model?
- Why does running FastAPI without multiple workers fail when interacting with the transformer model?
Tried:
- Dependency injection with FastAPI's Depends
- await-ing the long-running transformer model creation
- Using a lifespan via contextlib's @asynccontextmanager
- Using @app.on_event("startup") to load the transformer model at app startup
- Creating a singleton TransformerQA class to force FastAPI to share the same object instance (futile, because FastAPI creates a new subprocess every time); see the sketch below
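For reference, the singleton isn't shown in full anywhere above, but the EDIT below calls it as TransformerQA.get_instance().model. A minimal sketch of what that class (plus the Depends wiring) might look like; the internals and the placeholder data_dir default are my reconstruction, not verbatim code:
from fastapi import APIRouter, Depends
from transformers import RobertaForQuestionAnswering, RobertaTokenizer, pipeline

router = APIRouter()

class TransformerQA:
    '''Process-wide singleton wrapping the QA pipeline.'''
    _instance = None

    def __init__(self, data_dir):
        tokenizer = RobertaTokenizer.from_pretrained(data_dir)
        model = RobertaForQuestionAnswering.from_pretrained(data_dir)
        self.model = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

    @classmethod
    def get_instance(cls, data_dir="path/to/model"):  # placeholder path
        # Build the pipeline at most once per process, then reuse it.
        if cls._instance is None:
            cls._instance = cls(data_dir)
        return cls._instance

def get_pipeline():
    return TransformerQA.get_instance().model

@router.post("/test-di")
async def test_di(qa=Depends(get_pipeline)):
    return qa(question="What is my name?", context="My name is Joe")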
EDIT: I was able to access my state like so:
webapp.py:
from contextlib import asynccontextmanager
from fastapi import FastAPI

import api_routers  # project-local package providing routes.router
# TransformerQA is my singleton wrapper (sketched above)

def init_fastapi_application():
    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Run at startup: initialise the pipeline and add it to the
        # lifespan state, which is exposed on request.state.
        transformer_pipeline = TransformerQA.get_instance().model
        yield {'transformer_pipeline': transformer_pipeline}
        # Run on shutdown: close connections and release resources.
        print("in asynccontextmanager shutdown block")

    app = FastAPI(lifespan=lifespan)
    app.include_router(api_routers.routes.router)
    return app
routes.py:
@router.post("/test2")
async def test2(request: Request):
    model = request.state._state['transformer_pipeline']
    return model("What is my name?","My name is Joe")
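Since the pipeline call is blocking CPU work inside an async def endpoint, one variant I could still try is offloading it to Starlette's threadpool so it doesn't block the event loop (a guess at a workaround, not a known fix; the endpoint name is mine):
from fastapi.concurrency import run_in_threadpool

@router.post("/test3")
async def test3(request: Request):
    model = request.state._state['transformer_pipeline']
    # Run the blocking inference in a worker thread instead of on the event loop.
    return await run_in_threadpool(model, question="What is my name?", context="My name is Joe")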
But whenever the model is run (with a single worker), the line
print("in asynccontextmanager shutdown block")
gets executed, indicating the app shuts down. Here is the full console output:
######### Start the app
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
######### Can hit endpoints that don't involve my 
######### model, and they work just fine here.
...
######### Hit the /test2 endpoint (interestingly, it still returns the payload to the client)
INFO:     Shutting down
INFO:     Waiting for connections to close. (CTRL+C to force quit)
INFO:     127.0.0.1:39138 - "POST /test2 HTTP/1.1" 200 OK
INFO:     Waiting for application shutdown.
in asynccontextmanager shutdown block
INFO:     Application shutdown complete.
INFO:     Finished server process [126197]
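To make the subprocess question (first TL;DR point) observable, one diagnostic I can add is logging os.getpid() at import time and per request, to see whether requests are actually served by the same process (endpoint name is mine; router as in routes.py):
import os

print(f"routes.py imported in PID {os.getpid()}")

@router.post("/pid")
async def pid():
    # Compare this PID across requests, and against the import-time print,
    # to see whether a new process is created per invocation.
    return {"pid": os.getpid()}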
