Questions tagged [webarchive]
17 questions
20
votes
7 answers
Trouble using wget or httrack to mirror archived website
I am trying to use wget to create a local mirror of a website, but I am not getting all of the linked pages.
Here is the website:
http://web.archive.org/web/20110722080716/http://cst-www.nrl.navy.mil/lattice/
I don't want all pages…
user695322
- 301
13
votes
7 answers
Extract files from a web archive (.warc)
I have a number of web sites I am archiving in order to retain many of the linked files there, specifically a number of PDFs.
I haven't had a problem using the Heritrix crawler to collect the sites. However I haven't found a good solution to…
wxs
- 245
6
votes
4 answers
How can I recover data from a website after shutdown?
Background: Twitch had a karaoke service called Twitch Sings, which shut down last December. It was developed by Harmonix, the same studio that created Guitar Hero and Rock Band, which is why information about it is worth preserving. I am trying to…
Malvineous
- 85
2
votes
1 answer
Seeking a tool that creates space-efficient web archives
I am seeking a tool to space-efficiently archive a blog that is changing every day or even two or three times a day. I don't mean that individual blog posts change - not regularly anyway - I just mean that new blog entries are added and older…
H2ONaCl
- 1,458
2
votes
2 answers
How can I archive web pages linked to in my Delicious bookmarks?
I'm looking for ways to quickly back up the web pages I have collected in my Delicious Bookmarks, to guard against linkrot, etc. The most efficient method I've come up with so far would be to export my Delicious bookmarks into a single web page/HTML…
Stephen Lynch
- 141
2
votes
0 answers
Have httrack back up a site?
I am trying to back up a site with httrack, but it isn't doing what I want.
It has been running for 20 minutes already and is downloading what look like irrelevant images and JS files from other sites. The page I linked was the 'archive' page, which has a link…
user3109
2
votes
1 answer
How to recover emails from a closed server
I have an old email account on Altern.org. Unfortunately, the server has been closed and I have not been able to contact the moderator to retrieve my emails.
Is there any archive server (like web.archive.org) that I could use to recover my old emails?
user56980
- 123
1
vote
1 answer
Browse archived website
I have HTML/webpage files stored in a folder locally on my machine. I can view this content just fine using any browser. However, long term, I would prefer to have the contents stored inside some kind of archive format (ZIP?). I could do this and…
dtmland
- 2,933
1
vote
1 answer
How to archive a website with occasional PHP errors?
I am trying to archive a website that will soon vanish. I tried wget and httrack.
The problem is that the website returns PHP errors (database connection error) from time to time and the downloaded page is worthless. In any case the HTTP status is…
filo
- 223
1
vote
0 answers
What exactly is the point of WARC files with wget?
I am trying to archive my small advertisements from a small advertisement portal immediately after I have sold the respective article and before I delete the advertisement.
I know that I can't do that using the means that browsers provide by…
Binarus
- 2,039
1
vote
1 answer
Is there a way I can get previous messages from a mailing list into my mailbox?
I've just joined a certain mailing list. The list has a web archive, which is nice, but I would like to have past messages from that list - all of them or just a few months back - in my own mailbox, as though I had been a subscriber. Preferably…
einpoklum
- 10,666
1
vote
1 answer
Why am I getting http://takeoverAd.html/? added to my URL on archive.org?
I am trying to browse to the following URL on archive.org:
https://web.archive.org/web/20020304231443/http://www.everlore.com/items/items.asp?mode=show&IID=641
This redirects me to an error page which says:
This page is not available on the…
Zhro
- 957
1
vote
1 answer
How to open a (maybe) corrupted webarchive on Windows
First of all, this is the first time I have handled a WARC file...
I have a webarchive file which seems to be corrupted (in some way). I have installed Safari on Windows and I get this (the same thing happens on a Mac):
I tried to open it with 7-Zip but it says…
Barzo
- 111
0
votes
1 answer
Best approach to archive a website periodically
I am working on an approach to archive our (dynamically generated) website periodically (say, every month) and keep it versioned so that I can go back and pull a page as it was at a certain point in time.
My initial approach is to crawl the site recursively and…
Balaji Natarajan
- 101
0
votes
0 answers
How can I recreate an email folder from online web archives?
For reasons, I have misplaced a mail folder with a bunch of messages from a mailing list. Luckily - this list has an online archive:
https://listarchives.libreoffice.org/global/design/
Is there any way to recreate a Mozilla MBox (more specifically,…
einpoklum
- 10,666