I'm working on a Hadoop cluster (HDP) with Hadoop 3. Spark and Hive are also installed.
Since the Spark and Hive catalogs are separated, it's sometimes confusing to know how and where to save data in a Spark application.
I know that the property spark.sql.catalogImplementation can be set to either in-memory (to use a Spark session-based catalog) or hive (to use the Hive catalog for persistent metadata storage — but the metadata is still kept separate from the Hive databases and tables).
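For reference, this is roughly how I toggle that property when creating the session (a minimal sketch; the app name is just a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: build a session with the Hive-backed catalog.
// ("in-memory" would give a purely session-scoped catalog instead.)
val spark = SparkSession.builder()
  .appName("catalog-test")                            // placeholder app name
  .config("spark.sql.catalogImplementation", "hive")  // or "in-memory"
  .getOrCreate()

// List the databases visible through whichever catalog is configured.
spark.sql("SHOW DATABASES").show()
```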
I'm wondering what the property metastore.catalog.default does. When I set it to hive I can see my Hive tables, but those tables are stored under the /warehouse/tablespace/managed/hive directory in HDFS and my user has no access to that directory (because hive is, of course, the owner).
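This is how I pass that property to my application (a sketch — I'm assuming the usual spark.hadoop. prefix to forward it to the metastore client; the same key can presumably also be set via --conf on spark-submit / spark-shell or in hive-site.xml):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: forward metastore.catalog.default to the Hive metastore client
// via the spark.hadoop. prefix, so Spark queries the "hive" catalog instead
// of the separate "spark" catalog that HDP 3 uses by default.
val spark = SparkSession.builder()
  .appName("hive-catalog-test")                              // placeholder app name
  .config("spark.sql.catalogImplementation", "hive")
  .config("spark.hadoop.metastore.catalog.default", "hive")  // default would be "spark"
  .getOrCreate()

// The Hive-managed tables now show up, but reading them fails for my user
// because /warehouse/tablespace/managed/hive is owned by hive.
spark.sql("SHOW TABLES IN default").show()
```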
So, why should I set metastore.catalog.default = hive if I can't access the tables from Spark anyway? Does it have something to do with Hortonworks' Hive Warehouse Connector?
Thank you for your help.