
My objective is to read a .txt file of websites that I have stored and then save the contents of each entry in the .txt file as a .nc file. This is my first time using Git Bash, and I think I have some of the basic loop constructed, but I'm not sure how to curl each website from the .txt file and save it with the appropriate name. Below I have included a minimal example:

Input (mytextfile.txt):

https://data.pacificclimate.org/data/downscaled_gcms/tasmax_day_BCCAQv2+ANUSPLIN300_MRI-CGCM3_historical+rcp45_r1i1p1_19500101-21001231.nc.nc?tasmax[0:55114][152:152][290:290]
https://data.pacificclimate.org/data/downscaled_gcms/tasmax_day_BCCAQv2+ANUSPLIN300_GFDL-ESM2G_historical+rcp45_r1i1p1_19500101-21001231.nc.nc?tasmax[0:55114][152:152][290:290]
https://data.pacificclimate.org/data/downscaled_gcms/tasmax_day_BCCAQv2+ANUSPLIN300_HadGEM2-ES_historical+rcp45_r1i1p1_19500101-21001231.nc.nc?tasmax[0:55114][152:152][290:290]

Code:

for url in $(cat mytextfile.txt); do 
    curl --globoff "$url" > FileNameGoesHere.nc
done

Desired output (one individual .nc file per URL):

tasmax45_MRI-CGCM3.nc
tasmax45_GFDL-ESM2G.nc
tasmax45_HadGEM2-ES.nc

So to my understanding, right now the script would read in each line of my text file, execute a curl on it, then output with >, but I want each of the output files to have a particular name.

If I were using Python, then zip(url, out_name) would be something I would reach for (not sure if this is a thing in Git Bash though). Additionally, sometimes when I curl a site the contents do not download fully (e.g. if a file is 500 KB, sometimes only 450 KB arrives because of connection errors), so if someone is interested in helping me build some retry functionality for when this occurs, that would be great, but it is not necessary.
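If it helps, here is the kind of pairing I have in mind, sketched against a hypothetical two-column file urls_and_names.txt, where each line holds a URL and the desired output name separated by whitespace (the URLs contain no spaces, so a single read call can split the two columns):

while read -r url out_name; do
    # Fetch each URL and write it to its paired name from the second column
    curl --globoff -o "$out_name" "$url"
done <urls_and_names.txt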

  • What do you mean by "a particular name"? Please [edit] your question to show expected results for a few files, or a way to deduce the correct file name for each. – tripleee Nov 08 '20 at 16:35
  • Possible duplicate of https://stackoverflow.com/questions/28725333/looping-over-pairs-of-values-in-bash – tripleee Nov 08 '20 at 16:37
  • Does this answer your question? [How to loop over files in directory and change path and add suffix to filename](https://stackoverflow.com/questions/20796200/how-to-loop-over-files-in-directory-and-change-path-and-add-suffix-to-filename) – tripleee Nov 08 '20 at 16:47

1 Answer


Don't read lines with `for`.

while read -r url; do
    # Model name: the part between "ANUSPLIN300_" and "_historical"
    model=${url#*ANUSPLIN300_}; model=${model%%_historical*}
    # Scenario number: the digits after "rcp" (e.g. 45 from "rcp45")
    scen=${url#*rcp}; scen=${scen%%_*}
    curl --globoff -o "tasmax${scen}_${model}.nc" "$url"
done <mytextfile.txt

This uses a few parameter expansions to cut the model name and scenario number out of each URL and paste them into the file name you want. Also, we use the -o option of curl to specify the output file name, rather than shell redirection.
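For the redownload behaviour mentioned in the question: rather than comparing file sizes by hand, a lightweight option is to let curl retry transient failures itself. A sketch (--fail and --retry are standard curl options, but this is untested against this particular server):

while read -r url; do
    model=${url#*ANUSPLIN300_}; model=${model%%_historical*}
    scen=${url#*rcp}; scen=${scen%%_*}
    # --fail makes curl exit non-zero on HTTP errors instead of saving the
    # error page; --retry 5 re-attempts transient failures (timeouts,
    # dropped connections) up to five times before giving up
    curl --globoff --fail --retry 5 -o "tasmax${scen}_${model}.nc" "$url"
done <mytextfile.txt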

  • I tried to execute this, but after the `done` line I just get another carrot that is looking for a command? – GrayLiterature Nov 08 '20 at 17:36
  • You probably mean "caret", or do you really see an orange vegetable? If the file contained valid URLs it should have downloaded them (though usually `curl` without `-s` prints a progress report for each). – tripleee Nov 09 '20 at 05:28
  • Thanks for the updated question; I refactored this to specify the output file name as you imply (but don't explain) in your question. The previous code will have attempted to create directories named `http:` etc in the current directory, and then a subdirectory `data.pacificclimate.org` inside that, etc, which probably failed (unless these directories already existed for other reasons). – tripleee Nov 09 '20 at 05:55