I need to convert hierarchical data (AVRO data, which boils down to JSON) into tabular data (csv). Since AVRO have strict schema, I know essentially what form the JSON will take, but I have to do this for many different schema, so I'm looking for a consistent, declarative way to express the transformations I need to make. For example, if my incoming data looks like this…
{
    "customers": [
        {
            "addresses": [
                {
                    "city": "Los Angeles", 
                    "country": "USA", 
                    "county": null, 
                    "postalCode": "90064", 
                    "stateOrProvince": "California", 
                    "street1": "11832 W. Pico Blvd.", 
                    "street2": "", 
                    "street3": "", 
                    "street4": "", 
                    "tags": [
                        "BILLING"
                    ]
                }
            ], 
            "company": "", 
            "dateCreated": "2009-04-24T11:42:31+00:00", 
            "dateOfBirth": null, 
            "doNotCall": null, 
            "email": {
                "emailAddress": "general@magentocommerce.com"
            }, 
            "emailOptOut": null, 
            "fullName": {
                "firstName": "Test", 
                "lastName": "General", 
                "middleName": "", 
                "prefix": "", 
                "suffix": ""
            }, 
            "gender": null, 
            "id": {
                "Id": "2", 
                "namespace": "1000020016"
            }, 
            "lastModified": "2009-05-08T23:33:06+00:00", 
            "primaryPhone": {
                "number": "866.4.VARIEN", 
                "type": "UNKNOWN"
            }, 
            "sourceIds": null
        }
    ], 
    "totalItemsFound": 3
}
…I might need to output one row for each customer, like this:
MERCHANT ID|NUM CUSTOMERS|ID|FIRST NAME|LAST NAME|EMAIL|PHONE|STREET|CITY|STATE|ZIP|COUNTRY|EMAIL PREFERENCE
some.merch|3|1000020016-2|Test|General|general@magentocommerce.com|866.4.VARIEN|11832 W. Pico Blvd.|Los Angeles|California|90064|USA|N
I need to be able to express the following things:
- Get all the values from a given key as an array: All the dates-of-birth
- Repeat one value over every row: totalItemsFound, repeated in every row
- Repeat a static value in every row that comes from static data I already know the merchant channel never changes
- And the tricky one: Arbitrarily manipulate the incoming data to produce the desired output:
- Convert the customer's id into namespace-id
- Invert and change a null/boolean value into y/n, as in emailOptOut to EMAIL PREFERENCE
- (re-)format a date or currency
- etc
 
I started out with jsonpath, but that only solves #1 above. I've been slowly adding a language around jsonpath to serve 2 and 3, but I really don't have a good answer for 4 (besides eval., and I'd really hate to do that). I looked at JSON/T, but couldn't find a python library for it. I even seriously considered writing a middleware to convert the JSON into XML so that I could use XSLT, but I'm hoping someone here at S/O has a better solution before I get that desperate.
 
     
    