I would like to use the open data IMDb, but they serve it in TSV format, which is not very convenient.
https://datasets.imdbws.com/title.crew.tsv.gz
tconst  directors   writers
tt0000238   nm0349785   \N
tt0000239   nm0349785   \N
tt0000240   \N  \N
tt0000241   nm0349785   \N
tt0000242   nm0617588   nm0617588
tt0000243   nm0349785   \N
tt0000244   nm0349785   \N
tt0000245   \N  \N
tt0000246   nm0617588   \N
tt0000247   nm0002504,nm0005690,nm2156608   nm0000636,nm0002504
tt0000248   nm0808310   \N
tt0000249   nm0808310   \N
tt0000250   nm0005717   \N
tt0000251   nm0177862   \N
I want to convert TSV data to JSON.
[
  {
    "tconst": "tt0000247",
    "directors": [
      "nm0005690",
      "nm0002504",
      "nm2156608"
    ],
    "writers": [
      "nm0000636",
      "nm0002504"
    ]
  },
  {
    "tconst": "tt0000248",
    "directors": [
      "nm0808310"
    ],
    "writers": [
      "\\N"
    ]
  }
]
I can do this with the command:
jq -rRs 'split("\n")[1:-1] |
         map([split("\t")[]|split(",")] | {
                 "tconst":.[0][0],
                 "directors":.[1],
                 "writers":.[2]
             }
    )' ./title.crew.tsv > ./title.crew.json
However, the file turns out to be very large, I get out of memory errors.
1. How can split this TSV file into several JSON files, each with 1000 records?
./title.crew.page1.json
./title.crew.page2.json
./title.crew.page3.json
2. How can exclude empty fields? To have an empty array.
"writers": [ "\\N" ] -> "writers": [ ]
UPD (The second question was solved.):
jq -rRs 'split("\n")[1:-1] |
         map([split("\t")[]|split(",")] | 
         .[2] |= if .[0] == "\\N" then [] else . end | {
                 "tconst":.[0][0],
                 "directors":.[1],
                 "writers":.[2]
             }
    )' ./title.crew.tsv > ./title.crew.json
[
  {
    "tconst": "tt0000247",
    "directors": [
      "nm0005690",
      "nm0002504",
      "nm2156608"
    ],
    "writers": [
      "nm0000636",
      "nm0002504"
    ]
  },
  {
    "tconst": "tt0000248",
    "directors": [
      "nm0808310"
    ],
    "writers": []
  }
]
Thanks for answers.
 
     
    