I made a small program in Python that searches through a music website and collects music data. Each entry has the format [artist] - [music name] [music file format]. At first I used re.search to find a certain artist (I used regex because the music info contains some other characters and irregularities, and the only indicator for finding the artist is the - that follows the artist name).
Somehow it didn't work, so I changed it to re.findall just in case, but it still didn't work. Since I'm a beginner at Python, I thought I did something wrong, so I wrote some test code to study what was wrong. And this is what I got.
When I changed the x string (which would be the music info) and ran re.findall again, it gave me a different result (none). I 100% thought the result would be the same. Why is it behaving like this? And could this be the reason why re.search and re.findall weren't working in my original code?
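For context, here is a minimal sketch of how the two functions report a miss, which may be relevant to the "none" result described above. The sample strings are made up for illustration and are not the actual test data; the second one assumes an en dash where a plain hyphen was expected, something that can happen with text scraped from a web page.

```python
import re

pattern = "Michael Jackson - "  # plain hyphen surrounded by spaces

# Hypothetical inputs: they look almost identical on screen.
same_dash = "Michael Jackson - Beat it [FLAC, MP3, WAV]"
en_dash = "Michael Jackson \u2013 Beat it [FLAC, MP3, WAV]"  # en dash, not a hyphen

hits_same = re.findall(pattern, same_dash)
hits_other = re.findall(pattern, en_dash)
miss = re.search(pattern, en_dash)

print(hits_same)   # ['Michael Jackson - ']
print(hits_other)  # []   (findall returns an empty list on a miss)
print(miss)        # None (search returns None on a miss)
```

Printing `repr(info)` for a title that fails to match is one way to check whether the characters are really the ones expected.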
I've included the code just in case (it uses Selenium).
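import re

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# 'driver' (the Selenium WebDriver) and 'ARTIST' are defined elsewhere
idx = 1
while True:
    try:
        # XPath of the idx-th entry title link on the page
        hxp1 = "(//h3[@class='entry-title td-module-title']/a)[" + str(idx) + "]"
        text = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, hxp1)))
        # e.g. info = 'Michael Jackson - Beat it [FLAC, MP3, WAV]'
        info = text.get_attribute('title')  # get 'info' as string
        # e.g. ARTIST = 'Michael Jackson'
        regex = ARTIST + ' - '
        match = re.findall(regex, info)  # or use re.search
        # do something with 'match'...
        idx += 1
    except:
        # do something...
        break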
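
One thing that may be worth checking in the loop above: `ARTIST + ' - '` is used directly as a regular expression, so any character in the artist name that has a special meaning in regex (such as `+`, `(`, `?` or `.`) is not matched literally. The sketch below shows the effect and how `re.escape` avoids it. The artist name, the title string and the helper `find_artist` are hypothetical, chosen only for illustration, and this is not necessarily the cause of the problem described here.

```python
import re


def find_artist(artist, info):
    """Hypothetical helper: find the literal text '<artist> - ' in info."""
    # re.escape makes every character of the artist name match literally
    regex = re.escape(artist) + " - "
    return re.findall(regex, info)


# Made-up example data: the '+' is a regex quantifier when left unescaped
artist = "Florence + The Machine"
info = "Florence + The Machine - Dog Days Are Over [MP3]"

unescaped = re.findall(artist + " - ", info)
escaped = find_artist(artist, info)

print(unescaped)  # []
print(escaped)    # ['Florence + The Machine - ']
```

For names without special characters, such as 'Michael Jackson', both versions behave the same, so escaping only matters for some artists.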
