I am trying to retrieve all journals within a Scopus subject area, say 'Medicine', using the Python package pybliometrics.
According to the Scopus search (online), there are 13,477 Journals in this category.
I am accessing the Scopus SerialTitle API via pybliometrics.scopus.SerialSearch(). For the category Medicine, subjArea='MEDI' and subjCode='2700'. The codes associated with the Scopus subject categories are listed here
With subjCode='2700' I am not able to get more than 5000 journals. With subjArea='MEDI' I am able to retrieve more than 5000 journals, but not more than 10,000.
I do not understand why searching with subjArea and subjCode fetches different results for me. Can anyone help me understand why this could be happening?
I am adding my code for both these search queries for better understanding:
import pandas as pd
from pybliometrics.scopus import SerialSearch


def search_by_subject_area(subject_area):
    print("Searching journals by subject area....")
    df = pd.DataFrame()
    i = 0
    # limitation of i<10000 is added otherwise raises error of scopus500
    while (i > -1 and i < 10000):
        s = SerialSearch(query={"subj": f"{str(subject_area)}"}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        else:
            i += s.get_results_size()
            df_new = pd.DataFrame(s.results)
            df = pd.concat([df, df_new], axis=0, ignore_index=True)
    print(i, " journals obtained!")


def search_by_subject_code(code):
    print("------------------------------------------------\n Searching journals by subject codes....")
    df = pd.DataFrame()
    i = 0
    while (i > -1):
        s = SerialSearch(query={"subjCode": f"{code}"}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        else:
            i += s.get_results_size()
            df_new = pd.DataFrame(s.results)
            df = pd.concat([df, df_new], axis=0, ignore_index=True)
    print(i, " journals obtained!")


if __name__ == '__main__':
    search_by_subject_area(subject_area = 'MEDI')
    search_by_subject_code('2700')
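
The two functions above differ only in the query and in the 10,000 offset cap, so the paging logic could be pulled into one helper and checked offline. The following is only a minimal sketch of that idea, not pybliometrics' own API: `collect_pages`, `make_fake_fetcher`, the `source_id` field and the page size of 200 are hypothetical names and values made up for illustration, and the fake fetcher merely stands in for the real Scopus calls.

```python
def collect_pages(fetch_page, max_start=None):
    """Collect records page by page.

    Stops at the first empty page, or once the offset reaches `max_start`
    (mirroring the i < 10000 guard in search_by_subject_area).
    """
    records = []
    start = 0
    while max_start is None or start < max_start:
        page = fetch_page(start) or []  # treat None as an empty page
        if not page:
            break
        records.extend(page)
        start += len(page)
    return records


def make_fake_fetcher(total, page_size=200):
    """Hypothetical offline stand-in for the API: serves `total` dummy records."""
    def fetch_page(start):
        stop = min(start + page_size, total)
        return [{"source_id": str(n)} for n in range(start, stop)]
    return fetch_page


if __name__ == "__main__":
    capped = collect_pages(make_fake_fetcher(13477), max_start=10000)
    uncapped = collect_pages(make_fake_fetcher(4321))
    print(len(capped), len(uncapped))
```

With the real API, `fetch_page` would presumably be something like `lambda start: SerialSearch(query={"subjCode": "2700"}, start=str(start), refresh=True).results`, reusing the same call as in my code above. Collecting both result sets this way would make it easier to compare them record by record and see where the subjArea and subjCode searches diverge.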