Apologies if this is very simple or has already been asked. I am new to Python and to working with JSON files, so I'm quite confused.
I have a 9 GB JSON file scraped from a website. The data describes some 3 million individuals. Each individual has attributes, but not all individuals have the same attributes. An attribute corresponds to a key in the JSON file, like so:
{
  "_id": "in-00000001",
  "name": {
    "family_name": "Trump",
    "given_name": "Donald"
  },
  "locality": "United States",
  "skills": [
    "Twitter",
    "Real Estate",
    "Golf"
  ],
  "industry": "Government",
  "experience": [
    {
      "org": "Republican",
      "end": "Present",
      "start": "January 2017",
      "title": "President of the United States"
    },
    {
      "org": "The Apprentice",
      "end": "2015",
      "start": "2003",
      "title": "The guy that fires people"
    }
  ]
}
So here, _id, name, locality, skills, industry and experience are attributes (keys). Another profile may have additional attributes, such as education, awards or interests, or may lack an attribute found in other profiles, such as skills.
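To make the idea of attributes as keys concrete, here is a minimal sketch of checking one parsed profile for the three keys of interest. The two sample profiles and the names REQUIRED_KEYS and has_required_keys are made up for illustration; they are not taken from the real data.

```python
import json

# Two hypothetical profiles, cut down to the parts that matter for the check.
raw_profiles = [
    '{"_id": "in-00000001", "skills": ["Golf"], "industry": "Government", "experience": []}',
    '{"_id": "in-00000002", "industry": "Marketing", "education": []}',
]

# The attributes a profile must have to be kept.
REQUIRED_KEYS = {"skills", "industry", "experience"}


def has_required_keys(profile):
    """Return True if the parsed profile (a dict) contains every required key."""
    return REQUIRED_KEYS.issubset(profile)


for raw in raw_profiles:
    profile = json.loads(raw)  # one JSON object becomes one Python dict
    missing = sorted(REQUIRED_KEYS.difference(profile))
    print(profile["_id"], has_required_keys(profile), missing)
```

Once a profile is a dict, a missing attribute is simply a missing key, so the test does not depend on which other attributes happen to be present.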
What I'd like to do is scan through each profile in the JSON file and, if a profile contains all three of the attributes skills, industry and experience, extract that information and insert it into a data frame (I suppose I need Pandas for this?). From the experience attribute, I specifically want the name of the current employer, i.e. the org of the most recent listing. The data frame would look like this:
    Industry   | Current employer | Skills
    ___________________________________________________________________
    Government | Republican       | Twitter, Real Estate, Golf
    Marketing  | Marketers R Us   | Branding, Social Media, Advertising
... and so on for all profiles with these three attributes.
I'm struggling to find a good resource that explains how to do this kind of thing, hence my question.
I suppose rough pseudocode would be:
for each profile in open(path to .json file):
    if profile has keys "experience", "industry" AND "skills":
        on the same row of the data frame:
            insert current employer into "current employer" column of data frame
            insert industry into "industry" column of data frame
            insert list of skills into "skills" column of data frame
I just need to know how to write this in Python.
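One possible translation of the pseudocode, offered as a sketch rather than a definitive solution. It rests on two assumptions that the question does not confirm: that the file holds one complete JSON object per line (JSON Lines style), and that the current employer is the entry whose "end" is "Present", falling back to the first entry in experience. The names current_employer, iter_rows and build_frame are illustrative, and pandas is a third-party package that has to be installed separately.

```python
import json

REQUIRED_KEYS = ("skills", "industry", "experience")
COLUMNS = ["Industry", "Current employer", "Skills"]


def current_employer(experience):
    """Return the org of the entry marked "Present", else of the first entry.

    Assumption: entries are listed most recent first, as in the sample profile.
    """
    if not experience:
        return None
    for job in experience:
        if job.get("end") == "Present":
            return job.get("org")
    return experience[0].get("org")


def iter_rows(lines):
    """Yield one row dict for each profile that has all the required keys."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        profile = json.loads(line)  # assumption: one JSON object per line
        if not all(key in profile for key in REQUIRED_KEYS):
            continue
        yield {
            "Industry": profile["industry"],
            "Current employer": current_employer(profile["experience"]),
            "Skills": ", ".join(profile["skills"]),
        }


def build_frame(path):
    """Read the file one line at a time and collect matching rows in a DataFrame."""
    import pandas as pd  # imported here so the helpers above work without pandas

    with open(path, encoding="utf-8") as handle:
        return pd.DataFrame(list(iter_rows(handle)), columns=COLUMNS)
```

Usage would be something like df = build_frame("profiles.jsonl"), where the file name is a placeholder. Reading line by line keeps only one profile in memory at a time, which matters at 9 GB; only the extracted rows are held until the DataFrame is built. If the file is instead a single large JSON array or pretty-printed across many lines, this line-by-line approach will not work as written and an incremental JSON parser would be needed.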
 
     
    