
I'd like to back up the content of my blog, which is powered by posterous.com. I'd like to save all text and images to the local disk. The ability to browse it offline would be a plus.

What I've already tried:

wget

wget -mk http://myblogurl

It downloads the first page with the list of posts, then stops with a "20 redirections exceeded" message.

WinHTTrack

It downloads the first page, which redirects to the www.posterous.com home page, instead of the real page content.

Edit: The URL of the site I'm trying to back up is blog.safabyte.net

3 Answers


Posterous.com maintains an API that might help you. In particular, their http://posterous.com/api/reading API might be of use: you can use it to obtain an XML file containing all of your posts and their content.

For example, http://posterous.com/api/readposts?hostname=jasonpearce retrieves all 12 posts that I've made to Posterous.
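As a rough, untested sketch of a backup built on that API (the hostname parameter is your blog's subdomain, and the grep pattern for media URLs is an assumption about how the feed is marked up):

# Fetch all posts for the given hostname as a single XML file
wget -O posts.xml 'http://posterous.com/api/readposts?hostname=jasonpearce'

# Pull any media URLs out of the feed and download each one, keeping
# the host/path directory structure (-x) and skipping existing files (-nc)
grep -oE 'http://[^"<]*posterous\.com/[^"<]*' posts.xml | sort -u | xargs -r wget -nc -x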


This worked for me:

wget -r -l inf -k -E -p -nc http://blog.safabyte.net/

It seems that using -m turns on -N (timestamping), and Posterous is not sending Last-Modified headers, which upsets wget, so instead I used -r -l inf directly.

The options used are:

-r recursive
-l inf infinite recursion depth
-k convert links in the saved files to point to the local copies
-E save HTML files with an .html suffix
-p download page requisites (images, stylesheets, etc.)
-nc don't re-download URLs more than once

This command still does not download resources from other domains, which means it misses the images, as they're hosted on a different CDN.
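One way around that, untested but in keeping with the flags above, would be to let wget span hosts while whitelisting the domains the images appear to live on (-H enables host spanning, -D restricts it to the listed domains; the domain list is an assumption based on the URLs mentioned elsewhere on this page):

wget -r -l inf -k -E -p -nc -H -Dblog.safabyte.net,posterous.com,files.posterous.com http://blog.safabyte.net/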


Managed to download at least all the HTML content. The following command seems to download all pages from the blog (using Wget 1.11.3 on Windows XP):

wget -mk http://blog.safabyte.net/*

Post images are still not downloaded, probably because they are stored on different domains.

The HTML content is on blog.safabyte.net/* while the images are on http://posterous.com/getfile/files.posterous.com/cheated-by-safabyte/* and files.posterous.com.
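If spanning hosts during the mirror doesn't work out, a second-pass sketch (assuming a Unix-like shell, and guessing at the image URL pattern from the paths above) would be to harvest the image links from the already-saved HTML and fetch them separately:

# Collect Posterous file URLs from the mirrored pages and download each once,
# recreating the host/path structure locally (-x) without clobbering (-nc)
find blog.safabyte.net -name '*.html' -exec grep -ohE 'http://[^"]*posterous\.com/[^"]*' {} + | sort -u | xargs -r wget -nc -x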