This was an interesting problem. Here's what I came up with:
// Utility functions
const isInt = Number.isInteger
const path = (ps = [], obj = {}) =>
ps .reduce ((o, p) => (o || {}) [p], obj)
const assoc = (prop, val, obj) =>
isInt (prop) && Array .isArray (obj)
? [... obj .slice (0, prop), val, ...obj .slice (prop + 1)]
: {...obj, [prop]: val}
const assocPath = ([p = undefined, ...ps], val, obj) =>
p == undefined
? obj
: ps.length == 0
? assoc(p, val, obj)
: assoc(p, assocPath(ps, val, obj[p] || (obj[p] = isInt(ps[0]) ? [] : {})), obj)
// Helper functions
function * getPaths(o, p = []) {
if (Object(o) !== o || Object .keys (o) .length == 0) yield p
if (Object(o) === o)
for (let k of Object .keys (o))
yield * getPaths (o[k], [...p, isInt (Number (k)) ? Number (k) : k])
}
const canonicalPath = (path) =>
path.map (n => isInt (Number (n)) ? 0 : n)
const splitPaths = (xs) =>
Object .values ( xs.reduce (
(a, p, _, __, cp = canonicalPath (p), key = cp .join ('\u0000')) =>
({...a, [key]: a [key] || {canonical: cp, path: p} })
, {}
))
// Main function
const canonicalRep = (data) => splitPaths ([...getPaths (data)])
.reduce (
(a, {path:p, canonical}) => assocPath(canonical, path(p, data), a),
Array.isArray(data) ? [] : {}
)
// Test
const data = [{"dog": "lmn", "tiger": [{"bengoltiger": {"height": {"x": 4}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"width": {"a": 8}}, "indiantiger": {"b": 3}}]}, {"dog": "pqr", "lion": 90, "tiger": [{"bengoltiger": {"width": {"m": 3}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"height": {"n": 8}}, "indiantiger": {"b": 3}}]}]
console .log (
canonicalRep (data)
)
The first few functions are plain utility functions that I would keep in a system library. They have plenty of uses outside this code:
isInt is simply a first-class function alias to Number.isInteger
path finds the nested property of an object along a given pathway
path(['b', 1, 'c'], {a: 10, b: [{c: 20, d: 30}, {c: 40}], e: 50}) //=> 40
assoc returns a new object cloning your original, but with the value of a certain property set to or replaced with the supplied one.
assoc('c', 42, {a: 1, b: 2, c: 3, d: 4}) //=> {a: 1, b: 2, c: 42, d: 4}
Note that internal objects are shared by reference where possible.
assocPath does this same thing, but with a deeper path, building nodes as needed.
assocPath(['a', 'b', 1, 'c', 'd'], 42, {a: {b: [{x: 1}, {x: 2}], e: 3})
//=> {a: {b: [{x: 1}, {c: {d: 42}, x: 2}], e: 3}}
Except for isInt, these borrow their APIs from Ramda. (Disclaimer: I'm a Ramda author.) But these are unique implementations.
The next function, getPaths, is an adaptation of one from another SO answer. It lists all the paths in your object in the format used by path and assocPath, returning an array of values which are integers if the relevant nested object is an array and strings otherwise. Unlike the function from which is was borrowed, it only returns paths to leaf values.
For your original object, it returns an iterator for this data:
[
[0, "dog"],
[0, "tiger", 0, "bengoltiger", "height", "x"],
[0, "tiger", 0, "indiantiger", "foor"],
[0, "tiger", 0, "indiantiger", "paw"],
[0, "tiger", 1, "bengoltiger", "width", "a"],
[0, "tiger", 1, "indiantiger", "b"],
[1, "dog"],
[1, "lion"],
[1, "tiger", 0, "bengoltiger", "width", "m"],
[1, "tiger", 0, "indiantiger", "foor"],
[1, "tiger", 0, "indiantiger", "paw"],
[1, "tiger", 1, "bengoltiger", "height", "n"],
[1, "tiger", 1, "indiantiger", "b"]
]
If I wanted to spend more time on this, I would replace that version of getPaths with a non-generator version, just to keep this code consistent. It shouldn't be hard, but I'm not interested in spending more time on it.
We can't use those results directly to build your output, since they refer to array elements beyond the first one. That's where splitPaths and its helper canonicalPath come in. We create the canonical paths by replacing all integers with 0, giving us a data structure like this:
[{
canonical: [0, "dog"],
path: [0, "dog"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "height", "x"],
path: [0, "tiger", 0, "bengoltiger", "height", "x"]
}, {
canonical: [0, "tiger", 0, "indiantiger", "foor"],
path: [0, "tiger", 0, "indiantiger", "foor"]
}, {
canonical: [0, "tiger", 0, "indiantiger", "paw"],
path: [0, "tiger", 0, "indiantiger", "paw"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "width", "a"],
path: [0, "tiger", 1, "bengoltiger", "width", "a"]
}, {
canonical: [0, "tiger", 0, "indiantiger", "b"],
path: [0, "tiger", 1, "indiantiger", "b"]
}, {
canonical: [0, "lion"],
path: [1, "lion"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "width", "m"],
path: [1, "tiger", 0, "bengoltiger", "width", "m"]
}, {
canonical: [0, "tiger", 0, "bengoltiger", "height", "n"],
path: [1, "tiger", 1, "bengoltiger", "height", "n"]
}]
Note that this function also removes duplicate canonical paths. We originally had both [0, "tiger", 0, "indiantiger", "foor"] and [1, "tiger", 0, "indiantiger", "foor"], but the output only contains the first one.
It does this by storing them in an object under a key created by joining the path together with the non-printable character \u0000. This was the easiest way to accomplish this task, but there is an extremely unlikely failure mode possible 1 so if we really wanted we could do a more sophisticated duplicate checking. I wouldn't bother.
Finally, the main function, canonicalRep builds a representation out of your object by calling splitPaths and folding over the result, using canonical to say where to put the new data, and applying the path function to your path property and the original object.
Our final output, as requested, looks like this:
[
{
dog: "lmn",
lion: 90,
tiger: [
{
bengoltiger: {
height: {
n: 8,
x: 4
},
width: {
a: 8,
m: 3
}
},
indiantiger: {
b: 3,
foor: "b",
paw: "a"
}
}
]
}
]
What's fascinating for me is that I saw this as an interesting programming challenge, although I couldn't really imagine any practical uses for it. But now that I've coded it, I realize it will solve a problem in my current project that I'd put aside a few weeks ago. I will probably implement this on Monday!
Update
Some comments discuss a problem with a subsequent empty value tries to override a prior filled value, causing a loss in data.
This version attempts to alleviate this with the following main function:
const canonicalRep = (data) => splitPaths ([...getPaths (data)])
.reduce (
(a, {path: p, canonical}, _, __, val = path(p, data)) =>
isEmpty(val) && !isEmpty(path(canonical, a))
? a
: assocPath(canonical, val, a),
Array.isArray(data) ? [] : {}
)
using a simple isEmpty helper function:
const isEmpty = (x) =>
x == null || (typeof x == 'object' && Object.keys(x).length == 0)
You might want to update or expand this helper in various ways.
My first pass worked fine with the alternate data supplied, but not when I switched the two entries in the outer array. I fixed that, and also made sure that an empty value is kept if it's not overridden with actual data (that's the z property in my test object.)
I believe this snippet solves the original problem and the new one:
// Utility functions
const isInt = Number.isInteger
const path = (ps = [], obj = {}) =>
ps .reduce ((o, p) => (o || {}) [p], obj)
const assoc = (prop, val, obj) =>
isInt (prop) && Array .isArray (obj)
? [... obj .slice (0, prop), val, ...obj .slice (prop + 1)]
: {...obj, [prop]: val}
const assocPath = ([p = undefined, ...ps], val, obj) =>
p == undefined
? obj
: ps.length == 0
? assoc(p, val, obj)
: assoc(p, assocPath(ps, val, obj[p] || (obj[p] = isInt(ps[0]) ? [] : {})), obj)
const isEmpty = (x) =>
x == null || (typeof x == 'object' && Object.keys(x).length == 0)
function * getPaths(o, p = []) {
if (Object(o) !== o || Object .keys (o) .length == 0) yield p
if (Object(o) === o)
for (let k of Object .keys (o))
yield * getPaths (o[k], [...p, isInt (Number (k)) ? Number (k) : k])
}
// Helper functions
const canonicalPath = (path) =>
path.map (n => isInt (Number (n)) ? 0 : n)
const splitPaths = (xs) =>
Object .values ( xs.reduce (
(a, p, _, __, cp = canonicalPath (p), key = cp .join ('\u0000')) =>
({...a, [key]: a [key] || {canonical: cp, path: p} })
, {}
))
// Main function
const canonicalRep = (data) => splitPaths ([...getPaths (data)])
.reduce (
(a, {path: p, canonical}, _, __, val = path(p, data)) =>
isEmpty(val) && !isEmpty(path(canonical, a))
? a
: assocPath(canonical, val, a),
Array.isArray(data) ? [] : {}
)
// Test data
const data1 = [{"dog": "lmn", "tiger": [{"bengoltiger": {"height": {"x": 4}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"width": {"a": 8}}, "indiantiger": {"b": 3}}]}, {"dog": "pqr", "lion": 90, "tiger": [{"bengoltiger": {"width": {"m": 3}}, "indiantiger": {"foor": "b", "paw": "a"}}, {"bengoltiger": {"height": {"n": 8}}, "indiantiger": {"b": 3}}]}]
const data2 = [{"d": "Foreign Trade: Export/Import: Header Data", "a": "false", "f": [{"g": "TRANSPORT_MODE", "i": "2"}, {"k": "System.String", "h": "6"}], "l": "true"}, {"a": "false", "f": [], "l": "false", "z": []}]
const data3 = [data2[1], data2[0]]
// Demo
console .log (canonicalRep (data1))
console .log (canonicalRep (data2))
console .log (canonicalRep (data3))
.as-console-wrapper {max-height: 100% !important; top: 0}
Why not change assoc?
This update grew out of discussion after I rejected an edit attempt to do the same sort of empty-checking inside assoc. I rejected that as too far removed from the original attempt. When I learned what it was supposed to do, I knew that what had to be changed was canonicalRep or one of its immediate helper functions.
The rationale is simple. assoc is a general-purpose utility function designed to do a shallow clone of an object, changing the named property to the new value. This should not have complex logic regarding whether the value is empty. It should remain simple.
By introducing the isEmpty helper function, we can do all this with only a minor tweak to canonicalRep.
1That failure mode could happen if you had certain nodes containing that separator, \u0000. For instance, if you had paths [...nodes, "abc\u0000", "def", ...nodes] and [...nodes, "abc", "\u0000def", ...nodes], they would both map to "...abc\u0000\u0000def...". If this is a real concern, we could certainly use other forms of deduplication.