Pandas create dataframe from distance matrix with multiindex

Question

I have two kinds of objects, F and P, each carrying tags. I want to calculate the distance between every pair of tags that belong to different categories, and then build a dataframe with one row per such pair and its distance. The code below seems to do what I want:

import itertools
import operator
from collections import OrderedDict

import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

i = np.sqrt(2)
j = 2 * i
# dicts mapping category and tag to x, y coordinates
timeframe_f = OrderedDict(
    [(('F1', 'tag1f1'), (0, 0)), (('F2', 'tag1f2'), (-i, -i)), ])
timeframe_p = OrderedDict(
    [(('B1', 'tag1b1'), (i, i)), (('B2', 'tag1b2'), (j, j)),
     (('B2', 'tag2b2'), (2 * j, 2 * j)), ])
# calculate the pairwise distances ('sqeuclidean' gives squared Euclidean distances)
distances = cdist(np.array(list(timeframe_f.values())),
                  np.array(list(timeframe_p.values())), 'sqeuclidean')
print('distances:\n', distances, '\n')
# here is the matrix with the MultiIndex
distances_matrix = pd.DataFrame(data=distances,
                                index=pd.MultiIndex.from_tuples(
                                    timeframe_f.keys(),
                                    names=['F', 'Ftags']),
                                columns=pd.MultiIndex.from_tuples(
                                    timeframe_p.keys(),
                                    names=['P', 'Ptags']), )
print('distances_matrix:\n', distances_matrix, '\n')
# hacky construction of the data frame
index = list(map(lambda x: operator.add(*x), (
    itertools.product(timeframe_f.keys(), timeframe_p.keys()))))
# print(index)
multi_index = pd.MultiIndex.from_tuples(index)
distances_df = pd.DataFrame(data=distances.ravel(),
                            index=multi_index, ).reset_index()
print('distances_df:\n', distances_df)

It prints:

distances:
 [[  4.  16.  64.]
 [ 16.  36. 100.]] 

distances_matrix:
 P             B1     B2       
Ptags     tag1b1 tag1b2 tag2b2
F  Ftags                      
F1 tag1f1    4.0   16.0   64.0
F2 tag1f2   16.0   36.0  100.0 

distances_df:
   level_0 level_1 level_2 level_3      0
0      F1  tag1f1      B1  tag1b1    4.0
1      F1  tag1f1      B2  tag1b2   16.0
2      F1  tag1f1      B2  tag2b2   64.0
3      F2  tag1f2      B1  tag1b1   16.0
4      F2  tag1f2      B2  tag1b2   36.0
5      F2  tag1f2      B2  tag2b2  100.0

but I would like to find a way to do this directly from distances_matrix. I had a look at various other questions, such as:

Python Pandas - How to flatten a hierarchical index in columns: this manipulates the column names as strings, whereas I want to construct the index from the product of the index and column labels
Pandas Multiindex Groupby on Columns: that question has no MultiIndex in the columns, although a way to use one would be great, as I eventually want to group by category

score 2 · Accepted Answer · answered May 30 '18 at 19:04

Is this what you need?

distances_matrix.reset_index().melt(id_vars=['F','Ftags'])
Out[434]: 
    F   Ftags   P   Ptags  value
0  F1  tag1f1  B1  tag1b1    4.0
1  F2  tag1f2  B1  tag1b1   16.0
2  F1  tag1f1  B2  tag1b2   16.0
3  F2  tag1f2  B2  tag1b2   36.0
4  F1  tag1f1  B2  tag2b2   64.0
5  F2  tag1f2  B2  tag2b2  100.0
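
If the goal is to keep one row per (F tag, P tag) pair with the rows grouped by the F index as in the question's output, stacking both column levels is another option. The following is a minimal sketch, not part of the thread's code: it rebuilds distances_matrix from the values printed in the question, and the column name 'distance' is an illustrative choice. Recent pandas 2.x releases may emit a FutureWarning about the stack implementation.

```python
import numpy as np
import pandas as pd

# Same values and labels as distances_matrix in the question
distances = np.array([[4.0, 16.0, 64.0], [16.0, 36.0, 100.0]])
distances_matrix = pd.DataFrame(
    distances,
    index=pd.MultiIndex.from_tuples(
        [('F1', 'tag1f1'), ('F2', 'tag1f2')], names=['F', 'Ftags']),
    columns=pd.MultiIndex.from_tuples(
        [('B1', 'tag1b1'), ('B2', 'tag1b2'), ('B2', 'tag2b2')],
        names=['P', 'Ptags']),
)

# Move both column levels into the row index, which gives a Series
# indexed by (F, Ftags, P, Ptags), then turn the index into columns.
# 'distance' is an illustrative name, not one used in the thread.
long_df = (distances_matrix
           .stack(['P', 'Ptags'])
           .rename('distance')
           .reset_index())
print(long_df)
```

Because stack carries the row and column labels along with each value, the result keeps the level names F, Ftags, P and Ptags without any manual index construction.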

BENY

Oh great - let me check - what is this melt method (I really can't seem to wrap my brain around these arcane pandas column/index manipulations)? – Mr_and_Mrs_D May 30 '18 at 19:07
@Mr_and_Mrs_D it is more like a reshape, from wide to long format – BENY May 30 '18 at 19:07
I had tried combinations of reset_index, stack, etc. to no avail - this is exactly what I want. Do you think that, given the OrderedDicts I use, this method is guaranteed to give me the correct rows (mapping the correct distances)? Also, is there maybe another way to do it - immediately constructing the final dataframe (out of curiosity)? – Mr_and_Mrs_D May 30 '18 at 19:23
Aside: "This function is useful to massage a DataFrame..." - [melt docs](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.melt.html) ... I should have searched for "massage" apparently... – Mr_and_Mrs_D May 30 '18 at 19:25
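
On the two open questions in the comments (whether the rows are guaranteed to map to the correct distances, and whether the final dataframe can be built directly), here is a hedged sketch. It relies on two facts: keys() and values() of the same unmodified dict iterate in the same order, and itertools.product varies its last argument fastest, which matches the row-major order of distances.ravel(). The names direct_df, all_match and the 'distance' column are illustrative, and the broadcasting expression is a NumPy stand-in for cdist(..., 'sqeuclidean') so that the example does not need SciPy.

```python
import itertools
from collections import OrderedDict

import numpy as np
import pandas as pd

i = np.sqrt(2)
j = 2 * i
timeframe_f = OrderedDict(
    [(('F1', 'tag1f1'), (0, 0)), (('F2', 'tag1f2'), (-i, -i))])
timeframe_p = OrderedDict(
    [(('B1', 'tag1b1'), (i, i)), (('B2', 'tag1b2'), (j, j)),
     (('B2', 'tag2b2'), (2 * j, 2 * j))])

f_xy = np.array(list(timeframe_f.values()))
p_xy = np.array(list(timeframe_p.values()))
# Squared Euclidean distances via broadcasting; shape (len(F), len(P))
distances = ((f_xy[:, None, :] - p_xy[None, :, :]) ** 2).sum(axis=-1)

# Build the long frame directly: one record per (F key, P key) pair.
pairs = itertools.product(timeframe_f.keys(), timeframe_p.keys())
rows = [f_key + p_key + (d,)
        for (f_key, p_key), d in zip(pairs, distances.ravel())]
direct_df = pd.DataFrame(
    rows, columns=['F', 'Ftags', 'P', 'Ptags', 'distance'])

# Cross-check every row against a label-based lookup in the matrix form.
distances_matrix = pd.DataFrame(
    distances,
    index=pd.MultiIndex.from_tuples(
        list(timeframe_f.keys()), names=['F', 'Ftags']),
    columns=pd.MultiIndex.from_tuples(
        list(timeframe_p.keys()), names=['P', 'Ptags']),
)
all_match = all(
    np.isclose(row.distance,
               distances_matrix.loc[(row.F, row.Ftags), (row.P, row.Ptags)])
    for row in direct_df.itertuples(index=False)
)
print(direct_df)
print('all rows match the matrix:', all_match)
```

The same label-based check can be run against any reshaped frame that has these five columns, since it looks values up by label rather than by position.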
