I want to get images of Discogs releases. Can I do it without the Discogs API? Their database dumps don't include links to the images.
2 Answers
To do this without the API, you would have to load a web page and extract the image URL from the HTML source code. You can find the relevant page by loading https://www.discogs.com/release/xxxx where xxxx is the release number. Since HTML is just text, you can then extract the JPEG URL from it.
I don't know what your programming language is, but I'm sure it can handle string functions like indexOf and substring. You could extract the content value of the page's og:image meta tag to get the picture URL.
So taking an example: https://www.discogs.com/release/8140515
- Find the position of the tag with `.indexOf("og:image\" content=\"")` and save it as `startPos` in an integer.
- That search string is 19 chars long, so next do `.indexOf(".jpg", startPos + 19)` and save the result as `endPos`.

This gets the first occurrence of .jpg at or after index startPos + 19, which is where the URL itself begins. Now extract a substring from the HTML text, adding 4 to endPos so that the ".jpg" itself is included:

`img_URL = myHtmlStr.substring(startPos + 19, endPos + 4);`

You should end up with a string reading like this (the extracted URL):

https://img.discogs.com/_zHBK73yJ5oON197YTDXM7JoBjA=/fit-in/600x600/filters:strip_icc():format(jpeg):mode_rgb():quality(90)/discogs-images/R-8140515-1460073064-5890.jpeg.jpg

The process can be shortened to finding the startPos index of https://img., then finding the first occurrence of .jpg when searching from after that startPos index, and extracting that range. This works because, in the HTML source, https://img. only appears as the start of the image URL.
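The steps above could be sketched in Java roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `OgImageExtractor`, the method name, and the sample HTML string in `main` are assumptions made up for the example, and it relies on the page containing the exact text `og:image" content="` followed by a URL ending in `.jpg`.

```java
public class OgImageExtractor {
    // The text searched for; it is 19 characters long, matching the "+19" above.
    private static final String MARKER = "og:image\" content=\"";

    /** Returns the image URL, or null if the marker or ".jpg" is not found. */
    public static String extractImageUrl(String html) {
        int startPos = html.indexOf(MARKER);
        if (startPos < 0) {
            return null; // no og:image tag in this exact form
        }
        int urlStart = startPos + MARKER.length();
        int endPos = html.indexOf(".jpg", urlStart);
        if (endPos < 0) {
            return null; // no ".jpg" after the marker
        }
        // +4 keeps the ".jpg" extension in the result.
        return html.substring(urlStart, endPos + 4);
    }

    public static void main(String[] args) {
        // Hypothetical HTML fragment, only for demonstration.
        String html = "<meta property=\"og:image\" "
                + "content=\"https://img.discogs.com/example/R-1.jpeg.jpg\" />";
        System.out.println(extractImageUrl(html));
    }
}
```

Returning null when nothing is found keeps the caller from hitting a substring exception on pages that have no image or use a different tag layout.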
Compare the page at https://www.discogs.com/release/8140515 with the image at the extracted URL.

- **Note:** You might have to fine-tune those index position numbers. E.g. you might change from **+19** to **+21** in order to cut off the quotation marks etc. (**if needed** by your coding tool). You'll figure it out when testing... – VC.One Feb 20 '16 at 04:21
- Trying to fetch images of many releases, won't Discogs block automatic access? – Collector Feb 20 '16 at 10:25
- @Collector, I don't think so (unless you can show otherwise). Access was not blocked for any of my test AS3 or PHP code. Each loaded 5 images just to check that the paths were parsed correctly. – VC.One Feb 21 '16 at 16:25
- Okay. The question was to get images without the API. I believe I showed a good / correct answer for that. As for 5000 pics, that's a new detail. I'm not a server expert. I can only suggest you pace it out to fly under the radar, because I can imagine 5000 requests from the same IP address **at once** will look suspicious & be IP blocked. An "all day, every day" site user could access 5000 images spread over a week & won't be blocked, so y'know... pace it out. – VC.One Feb 23 '16 at 00:14
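The "pace it out" suggestion from the comments could look something like the Java sketch below. Everything here is an assumption for illustration: the class name `PacedFetcher`, the `fetch` callback (a stand-in for whatever page-loading code you use), the two-second delay, and the second release number in `main`. Nothing in the thread says what request rate Discogs actually tolerates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PacedFetcher {
    /** Builds the release page URL from a release number. */
    public static String releaseUrl(long releaseId) {
        return "https://www.discogs.com/release/" + releaseId;
    }

    /**
     * Calls fetch once per release URL, sleeping delayMillis between calls.
     * Stops early (returning what it has so far) if the thread is interrupted.
     */
    public static List<String> fetchAll(List<Long> releaseIds, long delayMillis,
                                        Function<String, String> fetch) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < releaseIds.size(); i++) {
            if (i > 0) {
                try {
                    Thread.sleep(delayMillis); // pause between requests
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            results.add(fetch.apply(releaseUrl(releaseIds.get(i))));
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in fetcher that only echoes the URL; replace with real page loading.
        List<Long> ids = Arrays.asList(8140515L, 8140516L); // second ID is hypothetical
        for (String result : fetchAll(ids, 2000L, url -> url)) {
            System.out.println(result);
        }
    }
}
```

Passing the fetcher in as a function keeps the pacing logic separate from the page loading, so the delay can be tuned or tested without touching the network.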
This is how to do it with Java and the Jsoup library.
- Get the HTML page of the release.
- Parse the HTML and find `<meta property="og:image" content=".." />` to get its `content` value.
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DiscogRelease {
    private final String url;

    public DiscogRelease(String url) {
        this.url = url;
    }

    // Returns the og:image URL of the release page, or null if it cannot be found.
    public String getImageUrl() {
        try {
            // Download and parse the release page.
            Document doc = Jsoup.connect(this.url).get();
            // Select the og:image meta tag from the document head.
            Elements metas = doc.head().select("meta[property=\"og:image\"]");
            if (!metas.isEmpty()) {
                Element element = metas.get(0);
                return element.attr("content");
            }
        } catch (IOException ex) {
            Logger.getLogger(DiscogRelease.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }
}