I would like to write a scraper that has 3 different "groups" of attributes (or data) that would [and likely should] be kept separately.
I was hoping to use DataClasses and aim at Pythonic practices, but DataClasses don't feel appropriate for reasons stated in more detail later.
The 3 groups [or "interfaces"] are as follows:
#1: HTTP Header fields
- has defaults, but needs to be mutable at/after object instantiation of the (#3) request class object
- ideally acts like a dict when using a request method inside #3 request object
#2: API parameters for the URL query request
- has defaults, but also needs to be mutable at/after instantiation
- ideally acts like a dict when using a request method inside #3 request object
#3: Response Object (the data) returned to the user after the API server answers the request.
- I would later implement methods so the object can output formats such as CSV, JSON, SQL DB, S3, etc. That would be [at least] a 4th interface.
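To make the idea of output methods on the response object concrete, here is a minimal sketch. It assumes the response has already been flattened into a list of row dicts. The `ResponseData` class and its method names are hypothetical and not part of any library, and the real API payload may need reshaping first.

```python
import csv
import io
import json


class ResponseData:
    """Hypothetical wrapper around already-parsed rows (a list of dicts)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def to_json(self) -> str:
        # Serialize the rows as a JSON array of objects.
        return json.dumps(self.rows)

    def to_csv(self) -> str:
        # Return CSV text; column names are taken from the first row.
        if not self.rows:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(self.rows[0]))
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


# Made-up rows, purely for illustration.
data = ResponseData([{"id": 1, "name": "A"}, {"id": 2, "name": "B"}])
csv_text = data.to_csv()
json_text = data.to_json()
```

Writing to a file, a SQL table, or S3 could then be further methods on the same wrapper, which keeps the output formats apart from the request side.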
The Task I've been trying to Accomplish
I want an interface where a user can instantiate a class, e.g. Players, with the API params they need and can also update the HTTP header (as needed).
The HTTP Header and the API Params are both easily stored as Python dicts (or JSON). I have included them below.
=> The question is how do I make them mutable in the Request object (Class) at instantiation (creation) and able to be updated after instantiation (creation)?
Inheritance via DataClasses? I have tried to put these dictionaries in DataClasses, but dataclasses don't like mutable defaults (link), and getting around that means a hack with default_factory using field from the dataclasses module. It's possible, but it defeats the point of using DataClasses to avoid all the extra syntax. Using DataClasses also means MyDataClass.__dict__ has way more stuff in it than PythonClass.__dict__.
=> Thus use a regular Python class or dict...
Using a regular Python class: there seem to be two options to allow mutability of the HTTP Header at creation.
1) Inheritance, but that muddies the waters by mixing the attributes of the HTTP Header with the API Params.
2) Composition: setting an attribute field to the HTTPClassHeader and doing some work to convert it back to a dict for use in the request_data() method.
Putting the dicts into Players (the request class) doesn't allow mutability via a nice keyword interface (or I'm not aware how to implement it).
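For reference, the default_factory workaround mentioned above might look roughly like this. It is only a sketch: `PlayersParams`, `PlayersRequest` and the shortened `DEFAULT_HEADERS` dict are hypothetical names chosen for illustration, not existing code.

```python
from dataclasses import asdict, dataclass, field

# Shortened stand-in for the full header dict (assumption for brevity).
DEFAULT_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Referer": "https://www.nba.com/",
}


@dataclass
class PlayersParams:
    """Hypothetical dataclass holding only the (#2) API params."""
    IsOnlyCurrentSeason: int = 0
    LeagueID: str = "00"
    Season: str = "2021-22"


@dataclass
class PlayersRequest:
    """Hypothetical request object composed of params and headers."""
    params: PlayersParams = field(default_factory=PlayersParams)
    # default_factory gives every instance its own copy of the defaults,
    # so changing one instance's headers never touches DEFAULT_HEADERS.
    headers: dict = field(default_factory=lambda: dict(DEFAULT_HEADERS))


req = PlayersRequest(params=PlayersParams(Season="1999-00"))
req.headers["Referer"] = "https://www.another-website.com/"

# asdict() turns the params dataclass back into a plain dict.
params_dict = asdict(req.params)
```

Keeping the params in their own dataclass means `asdict()` returns only the API params, with the headers left out.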
Here's my code in text form:
class Players:
    __endpoint__ = "CommonallPlayers"

    def __init__(self, IsOnlyCurrentSeason=0, LeagueID="00", Season="2021-22", header=HTTPHeader) -> None:
        # these first 3 attributes constitute the (#2) API Params
        self.IsOnlyCurrentSeason = IsOnlyCurrentSeason
        self.LeagueID = LeagueID
        self.Season = Season
        self.header = HTTPHeader  # (1) inherit as a Class or Dict?

    def encode_api_params(self):
        return self.__dict__  # if only 3 attributes, this works, but not if I add more attributes HTTP or self.request_data

    def get_http_header(self):
        # ideally can return the http_header as a dict
        pass

    # ideally this is NOT instantiated (as doesn't have data, shouldn't be accessible to user until AFTER request)
    def request_data(self):
        url_api = f"{BASE_URL}/{self.__endpoint__}"
        return requests.get(url_api,
                            params=self.encode_api_params(),
                            headers=self.get_http_header())

# works, has current defaults (current season)
c = Players()
# a common use case, using a different Season than the default (current season)
c = Players(Season="1999-00")
# A possible needed change, with 2 possible desired interfaces
c = Players(Season="1999-00", header={"Referer": "https://www.another-website.com/"})
c = Players(Season="1999-00").header(Referer="https://www.another-website.com/")
# Final outputs
c.request_data().to_csv("downloads/my_data.csv")
c.request_data().to_sql("table-name")
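One possible shape for the two desired interfaces above, using composition with plain dicts, is sketched below. It is an illustration under assumptions, not a definitive design: the `update_headers` method, the `params`/`headers` attribute names and the shortened `DEFAULT_HEADERS` are all hypothetical.

```python
from typing import Mapping, Optional

# Shortened stand-in for the full header dict (assumption for brevity).
DEFAULT_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Referer": "https://www.nba.com/",
}


class Players:
    """Hypothetical variant that keeps params and headers as separate dicts."""

    endpoint = "commonallplayers"

    def __init__(self, IsOnlyCurrentSeason: int = 0, LeagueID: str = "00",
                 Season: str = "2021-22",
                 headers: Optional[Mapping[str, str]] = None) -> None:
        # (#2) API params live in their own dict, separate from the headers.
        self.params = {
            "IsOnlyCurrentSeason": IsOnlyCurrentSeason,
            "LeagueID": LeagueID,
            "Season": Season,
        }
        # (#1) Per-instance copy of the defaults, overlaid with any overrides.
        self.headers = {**DEFAULT_HEADERS, **(headers or {})}

    def update_headers(self, extra=None, **overrides) -> "Players":
        # Accept a mapping (for names like "Accept-Language") and/or keywords.
        self.headers.update(extra or {}, **overrides)
        return self  # returning self allows chaining after construction


a = Players(Season="1999-00", headers={"Referer": "https://www.another-website.com/"})
b = Players(Season="1999-00").update_headers(Referer="https://www.another-website.com/")
b.update_headers({"Accept-Language": "en-GB"})
```

Both `a.params` and `a.headers` are ordinary dicts, so they could be passed directly as the `params=` and `headers=` arguments of `requests.get`. Header names containing a hyphen cannot be written as keyword arguments, which is why the sketch also accepts a mapping.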
Here are the HTTP header, the API params, and the request in their simplest form (running these together would return some data):
import requests

HTTP_HEADER = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Host": "stats.nba.com",
    "Origin": "https://www.nba.com",
    "Referer": "https://www.nba.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
    "Sec-GPC": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
}
params = {'IsOnlyCurrentSeason': 0, 'LeagueID': '00', 'Season': '2021-22'}
r = requests.get("https://stats.nba.com/stats/commonallplayers",  # base url
                 params=params,  # expects (#2) params, the api parameters to be a dict
                 headers=HTTP_HEADER)  # expects (#1) headers to be a dict
r.json()