I love FastAPI, but I'm not well-versed in the nuances of the "guts" behind its magic (uvicorn, watchgod, etc.).
I'm developing an API that lets users interact with a Hugging Face transformer model. Specifically, I'm using a RobertaForQuestionAnswering model.
However, I'm running into a few problems that I'm unable to debug with the VS Code debugger; trying to "step into" the problem doesn't yield anything at all.
Problems:
- Even when I declare the RobertaForQuestionAnswering model at the top of my routes file like so:
from fastapi import APIRouter
from transformers import RobertaForQuestionAnswering, RobertaTokenizer, pipeline

router = APIRouter()
tokenizer = RobertaTokenizer.from_pretrained(data_dir)  # data_dir defined elsewhere
model = RobertaForQuestionAnswering.from_pretrained(data_dir)
transformer_pipeline = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

@router.post("/test")
async def test():
    return transformer_pipeline(question="What is my name?", context="My name is Joe")
FastAPI starts a new subprocess EVERY TIME the model is invoked, and re-creates the model EVERY TIME, which is very time-consuming. I have tried FastAPI's dependency injection and a few other methods, but I think these efforts are futile, since FastAPI creates a new ephemeral subprocess for every invocation of the transformer model.
- When running the FastAPI app with uvicorn and a single worker (the default), the API "shuts down" gracefully whenever the transformer model is run. So I need to run the app with workers=2, like so:
uvicorn.run(
    "project_name.api.webapp:init_fastapi_application",
    host=host,
    port=port,
    log_level="debug",
    workers=2,
)
Where init_fastapi_application is just a custom function that returns the app object. Running with multiple workers is a totally hacky fix for the "graceful shutdown" problem, and I frustratingly can't figure out why it fixes it!
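(Aside: since init_fastapi_application is a factory rather than an app instance, uvicorn also has a factory=True flag for exactly this case; whether that interacts with the shutdown behaviour, I don't know. A minimal sketch, using the host/port from the log below:)
import uvicorn

if __name__ == "__main__":
    # factory=True tells uvicorn that the import string points at a
    # zero-argument callable returning the app, not at the app itself.
    uvicorn.run(
        "project_name.api.webapp:init_fastapi_application",
        host="0.0.0.0",
        port=8080,
        log_level="debug",
        factory=True,
    )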
TL;DR:
- How/why/when does FastAPI create new subprocesses, and why are they created every time I invoke a Hugging Face transformer model?
- Why does running FastAPI without multiple workers fail when interacting with the transformer model?
Tried:
- Dependency injection with FastAPI's Depends
- await-ing the long-running transformer model creation
- Using a lifespan via contextlib's @asynccontextmanager
- Using @app.on_event("startup") to load the transformer model at app startup
- Creating a singleton TransformerQA class to force FastAPI to share the same object instance (futile, because FastAPI creates a new subprocess every time); see the sketch below
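For reference, the singleton isn't shown in full anywhere above, but the EDIT below calls it as TransformerQA.get_instance().model. A minimal sketch of what that class (plus the Depends wiring) might look like; the internals and the placeholder data_dir default are my reconstruction, not verbatim code:
from fastapi import APIRouter, Depends
from transformers import RobertaForQuestionAnswering, RobertaTokenizer, pipeline

router = APIRouter()

class TransformerQA:
    '''Process-wide singleton wrapping the QA pipeline.'''
    _instance = None

    def __init__(self, data_dir):
        tokenizer = RobertaTokenizer.from_pretrained(data_dir)
        model = RobertaForQuestionAnswering.from_pretrained(data_dir)
        self.model = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

    @classmethod
    def get_instance(cls, data_dir="path/to/model"):  # placeholder path
        # Build the pipeline at most once per process, then reuse it.
        if cls._instance is None:
            cls._instance = cls(data_dir)
        return cls._instance

def get_pipeline():
    return TransformerQA.get_instance().model

@router.post("/test-di")
async def test_di(qa=Depends(get_pipeline)):
    return qa(question="What is my name?", context="My name is Joe")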
EDIT: I was able to access my state like so:
webapp.py:
from contextlib import asynccontextmanager
from fastapi import FastAPI

import api_routers  # project-local package providing routes.router
# TransformerQA is my singleton wrapper (sketched above)

def init_fastapi_application():
    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Run at startup: initialise the pipeline and add it to the
        # lifespan state, which is exposed on request.state.
        transformer_pipeline = TransformerQA.get_instance().model
        yield {'transformer_pipeline': transformer_pipeline}
        # Run on shutdown: close connections and release resources.
        print("in asynccontextmanager shutdown block")

    app = FastAPI(lifespan=lifespan)
    app.include_router(api_routers.routes.router)
    return app
routes.py:
@router.post("/test2")
async def test2(request: Request):
    model = request.state._state['transformer_pipeline']
    return model("What is my name?","My name is Joe")
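Since the pipeline call is blocking CPU work inside an async def endpoint, one variant I could still try is offloading it to Starlette's threadpool so it doesn't block the event loop (a guess at a workaround, not a known fix; the endpoint name is mine):
from fastapi.concurrency import run_in_threadpool

@router.post("/test3")
async def test3(request: Request):
    model = request.state._state['transformer_pipeline']
    # Run the blocking inference in a worker thread instead of on the event loop.
    return await run_in_threadpool(model, question="What is my name?", context="My name is Joe")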
But whenever the model is run (with a single worker), the line
print("in asynccontextmanager shutdown block")
gets executed, indicating the app shuts down. Here is the full console output:
######### Start the app
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
######### Can hit endpoints that don't involve my 
######### model, and they work just fine here.
...
######### Hit the /test2 endpoint (interestingly, it still returns the payload to the client)
INFO:     Shutting down
INFO:     Waiting for connections to close. (CTRL+C to force quit)
INFO:     127.0.0.1:39138 - "POST /test2 HTTP/1.1" 200 OK
INFO:     Waiting for application shutdown.
in asynccontextmanager shutdown block
INFO:     Application shutdown complete.
INFO:     Finished server process [126197]
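To make the subprocess question (first TL;DR point) observable, one diagnostic I can add is logging os.getpid() at import time and per request, to see whether requests are actually served by the same process (endpoint name is mine; router as in routes.py):
import os

print(f"routes.py imported in PID {os.getpid()}")

@router.post("/pid")
async def pid():
    # Compare this PID across requests, and against the import-time print,
    # to see whether a new process is created per invocation.
    return {"pid": os.getpid()}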
