I have a form with fields for a couple of URLs. I wrote a Zend Framework validator that does a trivial preg_match to screen out ridiculous strings, and then sends a curl HEAD request (CURLOPT_NOBODY) to screen out 404s and other connectivity issues. In testing I came across the mysterious return code 0 with "unknown SSL protocol error", so I added a check that accepts as valid any result whose error message contains "SSL", since that suggests the URL reached a web server.
But one particular URL that our customers would likely use in practice redirects to an s3.amazonaws.com URL for a PDF file. In a browser, both the original URL and the S3 URL it redirects to display the PDF just fine. Since I set CURLOPT_FOLLOWLOCATION, I expected my validator to accept it, but instead it got a 404. I then tried specifying the S3 URL directly, and that gave a 403(!). Thinking the 403 might have been triggered by the 'HTTP_X_REQUESTED_WITH: XMLHttpRequest' header I was sending, I commented out that line in the code, but it still gave a 403.
How can this happen? It seems to me that Amazon S3 would have to look for HEAD requests explicitly, and deliberately issue a 404 or 403 depending on whether the request came via a redirect?
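One possible explanation for the 403, offered as a hypothesis rather than a confirmed diagnosis: the AWSAccessKeyId, Expires, and Signature query parameters are the form used by AWS Signature Version 2 presigned URLs, and in that scheme the HTTP verb is part of the string that gets signed. A URL signed for GET would then fail signature verification when requested with HEAD. The sketch below only illustrates that the verb changes the signature. The function name sketchSignV2, the secret key, and the shortened resource path are all made up for illustration, and a real canonicalized resource would also include sub-resources such as response-content-disposition.

```php
<?php
// Sketch of Signature Version 2 query-string signing (hypothetical helper name).
// StringToSign = verb, Content-MD5, Content-Type, Expires, then the resource path,
// joined by newlines; Content-MD5 and Content-Type are empty for a plain download.
function sketchSignV2(string $verb, string $resource, int $expires, string $secret): string
{
    $stringToSign = implode("\n", [$verb, '', '', (string) $expires, $resource]);
    // HMAC-SHA1 over the string, base64-encoded (URL-encoding is applied later).
    return base64_encode(hash_hmac('sha1', $stringToSign, $secret, true));
}

$resource = '/ClubExpressClubFiles/548049/documents/example.pdf'; // assumption: shortened path
$expires  = 1683645984;                                           // Expires value from the URL
$secret   = 'not-the-real-secret';                                // assumption: made-up key

$forGet  = sketchSignV2('GET', $resource, $expires, $secret);
$forHead = sketchSignV2('HEAD', $resource, $expires, $secret);

// The verb is part of the signed string, so the two signatures do not match.
echo $forGet === $forHead ? "same\n" : "different\n"; // prints "different"
```

If this is what is happening, the 403 would depend only on the request method, not on whether the request arrived via a redirect.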
I suppose I could remove CURLOPT_NOBODY so that it sends a GET request, but that seems silly since I don't care about the body.
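For what it's worth, a GET does not have to download the whole body. A minimal sketch, assuming curl's documented behaviour of aborting a transfer when the write callback reports a byte count different from the chunk length: send a GET, abort on the first body chunk, and judge the result by the status code alone. The names sketchProbeWithGet, sketchStatusLooksReachable, and SKETCH_WRITE_ABORTED are hypothetical and not part of the validator.

```php
<?php
// Numeric value of CURLE_WRITE_ERROR, which curl reports when the callback aborts.
const SKETCH_WRITE_ABORTED = 23;

// Send a GET but stop as soon as body data starts to arrive.
// Returns [HTTP status of the last response, curl error number].
function sketchProbeWithGet(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Returning 0 for a non-empty chunk tells curl to abort the transfer.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
        return 0;
    });
    curl_exec($ch);
    return [curl_getinfo($ch, CURLINFO_HTTP_CODE), curl_errno($ch)];
}

// Treat a deliberate abort like a clean finish, then apply the same
// 100..399 status range that the validator uses.
function sketchStatusLooksReachable(int $status, int $errno): bool
{
    $errnoOk = ($errno === 0 || $errno === SKETCH_WRITE_ABORTED);
    return $errnoOk && $status >= 100 && $status < 400;
}

// Only touch the network when a URL is passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1]) && function_exists('curl_init')) {
    list($status, $errno) = sketchProbeWithGet($argv[1]);
    var_dump($status, $errno, sketchStatusLooksReachable($status, $errno));
}
```

The trade-off is that the server still starts sending the body, so this costs a little more bandwidth than a true HEAD request.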
Here is my complete code:
<?php
class Oshk_ZendX_Validate_Url {
    static $debug = true;
    // Based on https://stackoverflow.com/a/42619410/467590
    const PATTERN = '/^(https?:\/\/)?[^" ]+(\.[^" ]+)*$/';
    public static function isValid($value) {
        $STDERR = fopen("php://stderr", "w");
        $value = (string) $value;
        $matches = array();
        if (! preg_match(self::PATTERN, $value, $matches)) {
            fwrite($STDERR, sprintf("File '%s', line %d, value '%s' does not match pattern '%s'\n", __FILE__, __LINE__, $value, self::PATTERN));
            fclose($STDERR);
            return false;
        }
        if (! array_key_exists(1, $matches)) {
            $value = "https://$value";
        }
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, \$value = '%s', \$matches = %s", __FILE__, __LINE__, $value, print_r($matches, true)));
        }
        // URL looks well-formed. Ask curl to send a HEAD request to it
        $ch = curl_init($value);
        if ($ch === false) {
            throw new Exception("curl_init($value) failed!");
        }
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, 0); // From https://www.php.net/manual/en/curl.examples-basic.php
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('HTTP_X_REQUESTED_WITH: XMLHttpRequest'));
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36');
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        if (self::$debug) {
            curl_setopt($ch, CURLOPT_VERBOSE, true);
            curl_setopt($ch, CURLOPT_STDERR, $STDERR);
        }
        $data = curl_exec($ch);
        $msg = curl_error($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if (self::$debug) {
            // https://stackoverflow.com/a/14436877/467590
            $allinfo = curl_getinfo($ch);
            fwrite($STDERR, sprintf("File '%s', line %d, \$allinfo = %s\n", __FILE__, __LINE__, print_r($allinfo, true)));
        }
        curl_close($ch);
        if (self::$debug) {
            fwrite($STDERR,  sprintf("File '%s', line %d, data = '%s'\n", __FILE__, __LINE__, substr($data, 0, 255)));
        }
        if(! strlen($data) && $status != 0 && false === strpos($msg, 'SSL')) {
            fwrite($STDERR, sprintf("File '%s', line %d, '%s' gives bad status code %d when accessed, with message '%s'\n", __FILE__, __LINE__, $value, $status, $msg));
            fclose($STDERR);
            return false;
        }
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, url = '%s'\n", __FILE__, __LINE__, $value));
            fwrite($STDERR, sprintf("File '%s', line %d, data = '%s'\n", __FILE__, __LINE__, substr($data, 0, 255)));
        }
        unset($data);
        if (self::$debug) {
            fwrite($STDERR, sprintf("File '%s', line %d, \$msg = '%s'\n", __FILE__, __LINE__, $msg));
            fwrite($STDERR, sprintf("File '%s', line %d, \$status = '%s'\n", __FILE__, __LINE__, $status));
            fwrite($STDERR, sprintf("File '%s', line %d, \$value = '%s'\n", __FILE__, __LINE__, $value));
        }
        if (($status >= 100 && $status < 400) || false !== strpos($msg, 'SSL')) {
            fclose($STDERR);
            return true;
        }
        fwrite($STDERR, sprintf("File '%s', line %d, '%s' gives bad status code %d when accessed, with message '%s'\n", __FILE__, __LINE__, $value, $status, $msg));
        fclose($STDERR);
        return false;
    }
}
var_dump(Oshk_ZendX_Validate_Url::isValid($argv[1]));
Here is the bash shell session running it with the original URL:
$ php curltest.php 'https://americandrivingsociety.org/docs.ashx?id=1037680'
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 21, $value = 'https://americandrivingsociety.org/docs.ashx?id=1037680', $matches = Array
(
        [0] => https://americandrivingsociety.org/docs.ashx?id=1037680
        [1] => https://
)
*   Trying 208.66.171.71:443...
* Connected to americandrivingsociety.org (208.66.171.71) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: \xampp7412\apache\bin\curl-ca-bundle.crt
    CApath: none
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=americandrivingsociety.org
*  start date: Sep  2 00:00:00 2022 GMT
*  expire date: Oct  3 23:59:59 2023 GMT
*  subjectAltName: host "americandrivingsociety.org" matched cert's "americandrivingsociety.org"
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
> HEAD /docs.ashx?id=1037680 HTTP/1.1
Host: americandrivingsociety.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: */*
HTTP_X_REQUESTED_WITH: XMLHttpRequest
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
* The requested URL returned error: 404 Not Found
* Closing connection 0
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 46, $allinfo = Array
(
        [url] => https://americandrivingsociety.org/docs.ashx?id=1037680
        [content_type] =>
        [http_code] => 404
        [header_size] => 0
        [request_size] => 250
        [filetime] => -1
        [ssl_verify_result] => 0
        [redirect_count] => 0
        [total_time] => 0.132769
        [namelookup_time] => 0.009406
        [connect_time] => 0.035694
        [pretransfer_time] => 0.090879
        [size_upload] => 0
        [size_download] => 0
        [speed_download] => 0
        [speed_upload] => 0
        [download_content_length] => -1
        [upload_content_length] => -1
        [starttransfer_time] => 0.132714
        [redirect_time] => 0
        [redirect_url] =>
        [primary_ip] => 208.66.171.71
        [certinfo] => Array
                (
                )
        [primary_port] => 443
        [local_ip] => 16.1.1.151
        [local_port] => 55977
        [http_version] => 2
        [protocol] => 2
        [ssl_verifyresult] => 0
        [scheme] => HTTPS
        [appconnect_time_us] => 90757
        [connect_time_us] => 35694
        [namelookup_time_us] => 9406
        [pretransfer_time_us] => 90879
        [redirect_time_us] => 0
        [starttransfer_time_us] => 132714
        [total_time_us] => 132769
)
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 50, data = ''
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 53, 'https://americandrivingsociety.org/docs.ashx?id=1037680' gives bad status code 404 when accessed, with message 'The requested URL returned error: 404 Not Found'
C:\xampp1826\htdocs\OSH0\curltest.php:77:
bool(false)
repete@DESKTOP-CLQS7C1 /cygdrive/c/xampp1826/htdocs/OSH0
$
Here's the same thing using the S3 URL it redirects to:
$ php curltest.php 'https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D'
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 21, $value = 'https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D', $matches = Array
(
        [0] => https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D
        [1] => https://
)
*   Trying 52.216.56.0:443...
* Connected to s3.amazonaws.com (52.216.56.0) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: \xampp7412\apache\bin\curl-ca-bundle.crt
    CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=s3.amazonaws.com
*  start date: Apr 11 00:00:00 2023 GMT
*  expire date: Dec 20 23:59:59 2023 GMT
*  subjectAltName: host "s3.amazonaws.com" matched cert's "s3.amazonaws.com"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
*  SSL certificate verify ok.
> HEAD /ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D HTTP/1.1
Host: s3.amazonaws.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept: */*
HTTP_X_REQUESTED_WITH: XMLHttpRequest
* Mark bundle as not supporting multiuse
* The requested URL returned error: 403 Forbidden
* Closing connection 0
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 46, $allinfo = Array
(
        [url] => https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D
        [content_type] =>
        [http_code] => 403
        [header_size] => 0
        [request_size] => 523
        [filetime] => -1
        [ssl_verify_result] => 0
        [redirect_count] => 0
        [total_time] => 0.128771
        [namelookup_time] => 0.027331
        [connect_time] => 0.043198
        [pretransfer_time] => 0.107906
        [size_upload] => 0
        [size_download] => 0
        [speed_download] => 0
        [speed_upload] => 0
        [download_content_length] => -1
        [upload_content_length] => -1
        [starttransfer_time] => 0.128721
        [redirect_time] => 0
        [redirect_url] =>
        [primary_ip] => 52.216.56.0
        [certinfo] => Array
                (
                )
        [primary_port] => 443
        [local_ip] => 16.1.1.151
        [local_port] => 56277
        [http_version] => 2
        [protocol] => 2
        [ssl_verifyresult] => 0
        [scheme] => HTTPS
        [appconnect_time_us] => 107740
        [connect_time_us] => 43198
        [namelookup_time_us] => 27331
        [pretransfer_time_us] => 107906
        [redirect_time_us] => 0
        [starttransfer_time_us] => 128721
        [total_time_us] => 128771
)
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 50, data = ''
File 'C:\xampp1826\htdocs\OSH0\curltest.php', line 53, 'https://s3.amazonaws.com/ClubExpressClubFiles/548049/documents/Omnibus_02-01-2023_Black_Prong_Driving_Derby_4_581817244.pdf?AWSAccessKeyId=AKIA6MYUE6DNNNCCDT4J&Expires=1683645984&response-content-disposition=inline%3B%20filename%3DOmnibus_02-01-2023_Black_Prong_Driving_Derby_4.pdf&Signature=YQGemVm9Gphf2EZ%2F4K%2FIyK%2Bmm7I%3D' gives bad status code 403 when accessed, with message 'The requested URL returned error: 403 Forbidden'
C:\xampp1826\htdocs\OSH0\curltest.php:77:
bool(false)
repete@DESKTOP-CLQS7C1 /cygdrive/c/xampp1826/htdocs/OSH0
$
 
    