Assume I have a DataFrame df like:
source      tables      columns   data_type   length
src1        table1      col1      INT         4
src1        table1      col2      CHAR        2
src1        table2      col1      CHAR        2
src2        table1      col1      INT         4
src2        table1      col2      DATE        3
NOTE: the DataFrame also has another 4 columns which aren't relevant to the problem
Need an output that looks similar to:
{
  "src1": {
    "table1": {
      "col1": {
        "type": "INT"
        "length": 4
      },
      "col2": {
        "type": "CHAR"
        "length": 2
      }
    },
    "table2": {
      "col1": {
        "type": "CHAR"
        "length": 2
      }
    }
  },
  "src2": {
    "table1": {
      "col1": {
        "type": "INT"
        "length": 4
      },
      "col2": {
        "type": "DATE"
        "length": 3
      }
    }
  }
}
The code the I currently have produces the same output as above with the exclusion of the actual data type values (ie. instead of "type": "CHAR", I'm getting "type": "") as I'm not sure how I'd be able to nest the values accordingly. Here is the code:
def make_nested(df): 
        f = lambda: defaultdict(f)   
        data = f()  
        for row in df.to_numpy().tolist():
            t = data
            for r in row[:-6]:
                t = t[r]
            t[row[-6]] = {
                "type": '',
                "length": ''
            }
        return data
My question is how can I properly append the data_type and length column values into each columns JSON object without sacrificing the exact format? Thanks.
 
    