Multi-tenancy with SQLAlchemy

Question

I've got a web application built with Pyramid/SQLAlchemy/PostgreSQL that allows users to manage some data, and that data is almost completely independent between users. Say, Alice visits alice.domain.com and can upload pictures and documents, and Bob visits bob.domain.com and can also upload pictures and documents. Alice never sees anything created by Bob and vice versa (this is a simplified example; in reality there may be a lot of data in multiple tables, but the idea is the same).

Now, the most straightforward way to organize the data in the DB backend is to use a single database where each table (pictures and documents) has a user_id field, so to get all of Alice's pictures I can do something like

user_id = _figure_out_user_id_from_domain_name(request)
pictures = session.query(Picture).filter(Picture.user_id==user_id).all()

This is all easy and simple, but there are some disadvantages:

- I need to remember to always add the extra filter condition when making queries, otherwise Alice may see Bob's pictures.
- If there are many users, the tables may grow huge.
- It may be tricky to split the web application between multiple machines.
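The `_figure_out_user_id_from_domain_name` helper in the query above is not shown. As a rough sketch of its hostname-parsing half only, assuming tenants live exactly one label below a known base domain, something like the following could work. The names `tenant_from_host` and `BASE_DOMAIN` are hypothetical, and mapping the returned label to a `user_id` is left out.

```python
from typing import Optional

# Assumption: the base domain from the example hostnames in the question.
BASE_DOMAIN = "domain.com"


def tenant_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return 'alice' for 'alice.domain.com', or None if no tenant is present."""
    # Drop an optional ":port" suffix and normalise case and trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None  # bare base domain or an unrelated host
    label = hostname[: -len(suffix)]
    # Accept exactly one non-empty label, so 'a.b.domain.com' is rejected.
    if not label or "." in label:
        return None
    return label
```

The function takes the Host header value as a plain string rather than a request object, which keeps it independent of the web framework.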

So I'm thinking it would be really nice to somehow split the data per-user. I can think of two approaches:

1. Have separate tables for Alice's and Bob's pictures and documents within the same database (Postgres schemas seem to be the right tool for this):

documents_alice
documents_bob
pictures_alice
pictures_bob

and then, using some dark magic, "route" all queries to one table or the other according to the current request's domain:

_use_dark_magic_to_configure_sqlalchemy('alice.domain.com')
pictures = session.query(Picture).all()  # selects all Alice's pictures from "pictures_alice" table
...
_use_dark_magic_to_configure_sqlalchemy('bob.domain.com')
pictures = session.query(Picture).all()  # selects all Bob's pictures from "pictures_bob" table

2. Use a separate database for each user:
```
- database_alice
   - pictures
   - documents
- database_bob
   - pictures
   - documents 
```
which seems like the cleanest solution, but I'm not sure if multiple database connections would require much more RAM and other resources, limiting the number of possible "tenants".

So, the question is, does it all make sense? If yes, how do I configure SQLAlchemy to either modify the table names dynamically on each HTTP request (for option 1) or to maintain a pool of connections to different databases and use the correct connection for each request (for option 2)?
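For the separate-database option, one possible shape for "a pool of connections to different databases" is a small registry that lazily creates and caches one engine per tenant. The sketch below is an illustration under assumptions, not a recipe from the SQLAlchemy documentation: `TenantEngineRegistry`, the URL template, and the injected `engine_factory` are all hypothetical names, and the factory is passed in so the class itself has no dependency on SQLAlchemy.

```python
import threading
from typing import Any, Callable, Dict


class TenantEngineRegistry:
    """Lazily create and cache one engine-like object per tenant."""

    def __init__(self, url_template: str, engine_factory: Callable[[str], Any]):
        # url_template must contain a "{tenant}" placeholder (assumption).
        self._url_template = url_template
        self._engine_factory = engine_factory
        self._engines: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, tenant: str) -> Any:
        # The lock keeps two threads from creating two engines for one tenant.
        with self._lock:
            engine = self._engines.get(tenant)
            if engine is None:
                url = self._url_template.format(tenant=tenant)
                engine = self._engine_factory(url)
                self._engines[tenant] = engine
            return engine
```

In a real application `engine_factory` would presumably be `sqlalchemy.create_engine`, and each request would bind its session to `registry.get(tenant)`. Since every engine keeps its own connection pool, the number of open connections can grow with the number of active tenants times the pool size, which is the resource concern raised above.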

Closely related: http://stackoverflow.com/questions/9298296/sqlalchemy-support-of-postgres-schemas — Craig Ringer, Nov 14 '12 at 02:34
@CraigRinger: yeah, if the "SET search_path TO ..." thingie from the accepted answer works, that would be a solution for option #1. Thanks. — Sergey, Nov 14 '12 at 03:18
If you want to avoid sharding your database right off the bat, there are a pair of recipes on sqlalchemy.org for [Pre-Filtered Queries](http://www.sqlalchemy.org/trac/wiki/UsageRecipes/PreFilteredQuery) and [Global Filters](http://www.sqlalchemy.org/trac/wiki/UsageRecipes/GlobalFilter) that may help you avoid pulling data that you don't want by accident. — Sean Vieira, Nov 14 '12 at 03:24

score 9 · Answer 1 · answered Sep 12 '13 at 06:36

After pondering jd's answer I was able to achieve the same result with PostgreSQL 9.2, SQLAlchemy 0.8, and the Flask 0.9 framework:

from flask import session  # assumed: `session` below is Flask's request-bound session
from sqlalchemy import event
from sqlalchemy.pool import Pool


@event.listens_for(Pool, 'checkout')
def on_pool_checkout(dbapi_conn, connection_rec, connection_proxy):
    # Runs every time a connection is checked out of the pool.
    tenant_id = session.get('tenant_id')
    cursor = dbapi_conn.cursor()
    if tenant_id is None:
        cursor.execute("SET search_path TO public, shared;")
    else:
        # Tenant schemas are named t<tenant_id>.
        cursor.execute("SET search_path TO t" + str(tenant_id) + ", shared;")
    dbapi_conn.commit()
    cursor.close()

score 4 · Accepted Answer · answered Jan 05 '13 at 00:07

OK, I ended up modifying search_path at the beginning of every request, using Pyramid's NewRequest event:

from pyramid import events

def on_new_request(event):
    # Point the session at the current tenant's schema for this request.
    schema_name = _figure_out_schema_name_from_request(event.request)
    DBSession.execute("SET search_path TO %s" % schema_name)


def app(global_config, **settings):
    """ This function returns a WSGI application.

    It is usually called by the PasteDeploy framework during
    ``paster serve``.
    """

    ....

    config.add_subscriber(on_new_request, events.NewRequest)
    return config.make_wsgi_app()

This works really well as long as you leave transaction management to Pyramid (i.e. do not commit or roll back transactions manually, and let Pyramid do that at the end of the request), which is fine, as committing transactions manually is not a good approach anyway.
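The `SET search_path` statement above is built by string interpolation, so it is only as safe as the schema name fed into it. A minimal sketch of validating and quoting the name first, assuming tenant schemas use plain lowercase names such as `alice` or `t42`, might look like this. `quote_schema` and `build_search_path_sql` are hypothetical helpers, not part of SQLAlchemy or Pyramid.

```python
import re
from typing import Sequence

# Assumption: schema names are lowercase letters, digits and underscores,
# at most 63 characters (PostgreSQL's default identifier length limit).
_SCHEMA_NAME = re.compile(r"[a-z_][a-z0-9_]{0,62}")


def quote_schema(name: str) -> str:
    """Validate a schema name and wrap it in double quotes."""
    if not _SCHEMA_NAME.fullmatch(name):
        raise ValueError("invalid schema name: %r" % (name,))
    return '"%s"' % name


def build_search_path_sql(schema: str, tail: Sequence[str] = ("public",)) -> str:
    """Build a SET search_path statement with the tenant schema first."""
    parts = [quote_schema(schema)] + [quote_schema(s) for s in tail]
    return "SET search_path TO %s" % ", ".join(parts)
```

The resulting string could then be passed to `DBSession.execute(...)` in place of the interpolated one. Rejecting unexpected names outright is a deliberate choice here: a tenant whose name does not fit the pattern fails loudly instead of silently selecting another schema.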

score 3 · Answer 3 · answered Dec 15 '12 at 09:39

What works very well for me is to set the search path at the connection pool level, rather than in the session. This example uses Flask and its thread-local proxies to pass the schema name, so if you use something else you'll have to change schema = current_schema._get_current_object() and the try block around it.

from sqlalchemy.interfaces import PoolListener
class SearchPathSetter(PoolListener):
    '''
    Dynamically sets the search path on connections checked out from a pool.
    '''
    def __init__(self, search_path_tail='shared, public'):
        self.search_path_tail = search_path_tail

    @staticmethod
    def quote_schema(dialect, schema):
        return dialect.identifier_preparer.quote_schema(schema, False)

    def checkout(self, dbapi_con, con_record, con_proxy):
        try:
            schema = current_schema._get_current_object()
        except RuntimeError:
            search_path = self.search_path_tail
        else:
            if schema:
                search_path = self.quote_schema(con_proxy._pool._dialect, schema) + ', ' + self.search_path_tail
            else:
                search_path = self.search_path_tail
        cursor = dbapi_con.cursor()
        cursor.execute("SET search_path TO %s;" % search_path)
        dbapi_con.commit()
        cursor.close()

At engine creation time:

engine = create_engine(dsn, listeners=[SearchPathSetter()])

`current_schema` is a proxy created by an instance of `werkzeug.local.Local()`. Something like `thread_locals = Local(); current_schema = thread_locals('schema')`. The current value of the schema is set at the start of a request. It's a convenient way of having a globally accessible value tied to the current thread. — jd., Sep 12 '13 at 09:02
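For readers not using Werkzeug, the same idea of a globally accessible value tied to the current thread can be sketched with the standard library's `threading.local`. This is a swapped-in technique rather than what the answer uses, it only covers plain threads, and `set_current_schema` and `get_current_schema` are hypothetical names.

```python
import threading
from typing import Optional

# Each thread sees its own independent attributes on this object.
_state = threading.local()


def set_current_schema(schema: Optional[str]) -> None:
    """Record the schema for the current thread, e.g. at the start of a request."""
    _state.schema = schema


def get_current_schema() -> Optional[str]:
    """Return the current thread's schema, or None if it was never set."""
    return getattr(_state, "schema", None)
```

A pool checkout hook could call `get_current_schema()` where the answer calls `current_schema._get_current_object()`, falling back to the default search path when it returns `None`. The value should be reset at the end of each request so that a reused worker thread does not carry over the previous tenant.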
