I'm fitting a model with GLM (using ML in Spark 2.0) on data that has one categorical independent variable.  I'm converting that column into dummy variables using StringIndexer and OneHotEncoder, then using VectorAssembler to combine it with a continuous independent variable into a column of sparse vectors.
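For concreteness, a toy frame with the same shape might look like this (the values here are invented; my real categorical column has 8 levels):
# Hypothetical stand-in for my data: a float column, a string
# category column, and a float label column.
df = spark.createDataFrame(
    [(1.5, 'a', 10.0),
     (2.0, 'b', 12.5),
     (0.7, 'c', 9.8)],
    ['continuous', 'categorical', 'dep_var'])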
My column names are continuous and categorical, where the first is a column of floats and the second is a column of strings denoting (in this case) 8 different categories. The pipeline looks like this:
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler

# Index the category strings, one-hot encode the indices, then
# assemble the dummies together with the continuous column.
string_indexer = StringIndexer(inputCol='categorical',
                               outputCol='categorical_index')
encoder = OneHotEncoder(inputCol='categorical_index',
                        outputCol='categorical_vector')
assembler = VectorAssembler(inputCols=['continuous', 'categorical_vector'],
                            outputCol='indep_vars')
pipeline = Pipeline(stages=[string_indexer, encoder, assembler])
model = pipeline.fit(df)
df = model.transform(df)
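To sanity-check the encoding, I peek at a few transformed rows (output omitted):
# Show the raw, indexed, encoded, and assembled columns side by side.
df.select('categorical', 'categorical_index',
          'categorical_vector', 'indep_vars').show(5, truncate=False)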
Everything works fine up to this point, and then I run the model:
from pyspark.ml.regression import GeneralizedLinearRegression

glm = GeneralizedLinearRegression(family='gaussian',
                                  link='identity',
                                  labelCol='dep_var',
                                  featuresCol='indep_vars')
model = glm.fit(df)  # note: this shadows the pipeline model above
model.coefficients
Which outputs:
DenseVector([8440.0573, 3729.449, 4388.9042, 2879.1802, 4613.7646, 5163.3233, 5186.6189, 5513.1392])
Which is great, because I can verify that these coefficients are essentially correct (via other sources). However, I haven't found a good way to link these coefficients back to the original column names, which I need to do (I've simplified this model for SO; there's more involved).
The relationship between column names and coefficients is broken by StringIndexer and OneHotEncoder.  I've found one fairly slow way to recover it:
df.select('categorical', 'categorical_index').distinct()
Which gives me a small dataframe relating the string names to the numerical indices, which I think I could then relate back to the positions in the sparse vector. This is very clunky and slow, though, when you consider the scale of the data.
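Concretely, the stitching would look something like this sketch (assuming the continuous column sits in slot 0 of the assembled vector, and that OneHotEncoder dropped the last category, which is its default):
# Collect the (label, index) pairs, order them by index, drop the
# last category (the encoder's dropLast default), and prepend the
# continuous column, which VectorAssembler placed first.
pairs = df.select('categorical', 'categorical_index').distinct().collect()
labels = [r['categorical']
          for r in sorted(pairs, key=lambda r: r['categorical_index'])]
feature_names = ['continuous'] + labels[:-1]
print(dict(zip(feature_names, model.coefficients.toArray())))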
Is there a better way to do this?