1

I have this setup:

  • The remote server is very unstable
  • The content it serves is pretty much static: for a given URL, the content doesn't change much.
  • I'm using Firefox, Windows platform

"Unstable" means the server may return 5xx, take too long to reply, etc. So if I am lucky enough to land on an actual page, I'd like to save it locally to read later when I have time. As for "how lucky": if I loaded a page once, I might be able to load something again in 1-2 minutes, or 1-2 hours, or days. I really don't know. That's outside my control and I'd rather not focus on "trying to fix the server", since I don't own it.

Problem is - if I try to "Save page", Firefox tries to request the content from the server instead of saving the content that I already have loaded in the tab. Obviously, since the server is unstable, more often than not it will save the 5xx response page and not the actual content.

Since the content is mostly text, I'm fine even if I save just the page without styles, header/footer images etc. But "view source" (aka Ctrl+U) seems to do the same thing, i.e. it tries to request the content from the server. How can I avoid that? How can I save just the content that I already have loaded in memory, without asking the server to return it to me once again?

This answer implies that Firefox doesn't actually request the page a second time, but that contradicts what I see happening in my case. I clearly see the request in the network log, and the saved page shows up in "Downloads" as well. And of course, the fact that it "saves" the server's error response pages speaks for itself.

EDIT: The bare minimum I'd like to have is being able to follow the links (articles can have references) and copy / paste the text. So screen capture is hardly an option. Besides, is it too much to expect to be able to save the content that's already loaded?

Alma Do

3 Answers

2

You can use the Web Developer Tools built into Firefox to do this.

  • Open the tools: Ctrl+Shift+I or menu Tools / Web Developer / Toggle Tools
  • In the tool area, click on the "Inspector" tab. This shows you the source of the page as Firefox currently displays it, including any changes made by JavaScript.
  • At the top, there should be a line starting with <html ... Right-click this line and select Copy / Outer HTML.
  • Paste the clipboard into an editor and save.

This will give you the complete HTML source of the page, as displayed. This also works for complex web applications, because the HTML will reflect any changes that scripts made, such as loading additional content via AJAX.
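
The same idea can also be scripted from the "Console" tab of the developer tools. The snippet below is only a minimal sketch under a few assumptions: the function names buildSnapshot and saveLoadedPage and the file name saved-page.html are made up for illustration, and it assumes the page does not block blob: downloads. It serializes the DOM that is already in memory and hands it to the download manager through a blob: URL, so nothing should need to be fetched from the server again.

```javascript
// Pure helper (hypothetical name): combine a doctype name and the serialized
// <html> element into the text of an .html file. outerHTML does not include
// the doctype line, so it is rebuilt here in its short form.
function buildSnapshot(doctypeName, outerHtml) {
  const doctype = doctypeName ? "<!DOCTYPE " + doctypeName + ">\n" : "";
  return doctype + outerHtml;
}

// Browser-only (hypothetical name): serialize the live DOM and offer it as a
// download. The data comes from an in-memory Blob, not from the remote server.
function saveLoadedPage(filename) {
  const name = document.doctype ? document.doctype.name : "";
  const html = buildSnapshot(name, document.documentElement.outerHTML);
  const blob = new Blob([html], { type: "text/html;charset=utf-8" });
  const url = URL.createObjectURL(blob);

  // A temporary <a download> element triggers the "save" without navigating.
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();

  // Release the blob URL a little later, once the download has started.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}

// Only run where a DOM exists, i.e. when pasted into the browser console.
if (typeof document !== "undefined") {
  saveLoadedPage("saved-page.html");
}
```

As with Copy / Outer HTML, the saved file contains only the HTML as currently displayed. Relative links and external resources still point to wherever the page's markup points.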

What will be missing are external files, such as images, CSS and scripts. If you want to include those, it's best to use a Firefox add-on, for example Save Page WE.

sleske
1

I guess the reason Firefox reloads the page is that the page as displayed may not be the same as the original. For example, it might have been modified by JavaScript code or by an installed extension such as Greasemonkey.

There is a method for getting the raw HTML. It preserves text and links, but not always the exact look, because it saves neither the CSS files nor the external JavaScript.

Here it is:

  • With the page displayed, press Ctrl+U to display the page source
  • Use the menu File > Save Page As ..., or use the right-click context menu
  • Save the contents in an .html file.

This method should not cause Firefox (or any other browser) to access the website again.

harrymc
1

You could:

  • Select (drag select) what you want, or select everything on the page with Ctrl+A
  • Copy with Ctrl+C
  • Paste into Word, LibreOffice Writer, or some other application that renders HTML as a page.

This is a little cumbersome, but would accomplish your goal as stated. Links are still accessible, and text still savable.

Rick