I'm scraping data from ~500 .js files, all of which are formatted like this:
dict[0]=[{"some_key": "<b>名詞</b>", "another_key": "modification"}, {"some_key": "<b>名詞</b>", "another_key": "idea"}]
My code looks like this:
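with open(filename, 'r', encoding='utf-8', errors='ignore') as my_file:
    obj = my_file.read()
# Mask the first '[' (the one in "dict[0]") with a single character so the
# indexes still match obj, then find the '[' that opens the array.
my_indexer_left = obj.replace('[', 'x', 1).find('[')
my_indexer_right = obj.rfind(']')
new_obj = obj[my_indexer_left:my_indexer_right+1]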
Once new_obj is created, it is still a string, and I can't convert it into a list of dictionaries.
I tried list(new_obj):
new_list_obj = list(new_obj)
for item in new_list_obj:
    print(item)
While print(type(new_list_obj)) reports list, the loop prints one character at a time.
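For reference, this matches what list() does with any string: it iterates over the string, so each character becomes its own element. A minimal illustration, using a made-up sample string rather than the contents of the real files:

```python
# Hypothetical sample string, shortened for illustration.
sample = '[{"a": 1}]'

# list() iterates over the string, so every character becomes one element.
chars = list(sample)

print(type(chars).__name__)
print(chars[:3])
```

The result is a list, but a list of single-character strings, not of dictionaries.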
I've tried several other things along these lines to get this to work.
The closest I came was referencing the answer here to come up with the following:
j = json.dumps(new_obj,ensure_ascii=False).encode('utf8').decode()
But when I print(j), all of the quotation marks (") are turned into \", and print(type(j)) still says str.
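The escaping appears to come from json.dumps working in the serializing direction: given a str, it returns the JSON string literal for that text, with the inner quotes escaped. json.loads is the standard-library function that goes the other way. A small sketch with a made-up sample string (not taken from the real files):

```python
import json

# Hypothetical sample, already a str like new_obj.
text = '[{"another_key": "idea"}]'

# dumps serializes the str itself: it wraps it in quotes and escapes inner quotes.
dumped = json.dumps(text, ensure_ascii=False)

# loads parses the str into Python objects (here, a list of dicts).
parsed = json.loads(text)

print(dumped)
print(type(parsed).__name__, parsed[0]["another_key"])
```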
I want to be able to read these files, iterate over all of the dictionary (JSON) objects, and access their keys and values.
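A minimal sketch of that goal, assuming every file holds exactly one dict[0]=[...] assignment as in the sample above and that everything after the first = is valid JSON. The function name parse_entries is hypothetical, and the tolerance for a trailing semicolon is an assumption about the files, not something the sample shows:

```python
import json


def parse_entries(text):
    """Return the list of dicts from a 'dict[0]=[...]' string."""
    # Assumption: everything after the first '=' is a JSON array.
    _, _, payload = text.partition('=')
    # Tolerate surrounding whitespace and an optional trailing semicolon.
    payload = payload.strip().rstrip(';')
    return json.loads(payload)


sample = 'dict[0]=[{"some_key": "<b>名詞</b>", "another_key": "modification"}, {"some_key": "<b>名詞</b>", "another_key": "idea"}]'

# Each entry is a regular dict, so keys and values are directly accessible.
for entry in parse_entries(sample):
    for key, value in entry.items():
        print(key, value)
```

For the real files, the text read from each file (obj in the code above) could be passed to the same function in place of sample.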