I need to scrape data from a secure (HTTPS) website after logging in and show that data in my MVC5 application. Scraping data from an unsecured website after login is easy; I have done it using the following method:

    public async Task<ActionResult> Index()
    {
        HttpClient client = new HttpClient();

        var values = new Dictionary<string, string>
        {
            { "User.UserName", "abc" },
            { "User.Password", "abc" }
        };

        var content = new FormUrlEncodedContent(values);
        client.BaseAddress = new Uri("http://abc1.com/Account/Login");
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/x-www-form-urlencoded"));
        var response = await client.PostAsync("http://abc1.com/Account/Login", content);
        HttpResponseMessage response1 = await client.GetAsync("http://abc1.com/user/Index"); // This page's data is required
        var responseString = await response1.Content.ReadAsStringAsync();
        ViewBag.LogedIn = responseString;
        return View();
    }

After this I get the next page's data in the ViewBag, as required. But with an HTTPS website it does not work, and no error occurs either.

Please suggest what changes I should make to this method so that it can also log in to a secured website.

Thanks in advance.

Amandeep Singh
  • Just use the correct address. `https` or `http`, the treatment is the same. As for `no error occurs` - you don't check the status code of the response. How do you know that there was no error? A response with a 500 status code is still a response with a 500 status code. Its content may contain some data about the error, maybe not. Check `response1.StatusCode` – Panagiotis Kanavos Apr 06 '17 at 07:45
  • Use Chrome developer tools to inspect the request sent when you log in. Maybe you are missing some important field or header. – Reygoch Apr 06 '17 at 07:48
  • @Reygoch and @PanagiotisKanavos, thanks for the replies. I checked the status code after the request and it is always "OK", but when I try to fetch the next page it shows the login page again. Please see my first comment under the answer below, where I have provided details. – Amandeep Singh Apr 06 '17 at 09:21
  • Please post a *complete* question explaining what you tried and what the problem is. What you describe is normal behaviour. There are no errors to report. You didn't keep any cookies around, so there's no way for the server to know that you've logged in in the past. Create and add a CookieContainer to HttpClient – Panagiotis Kanavos Apr 06 '17 at 09:22
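
The suggestions in the comments above (keep cookies, check the status code) could be sketched roughly as follows. This is only a minimal sketch, not the asker's final solution: `LoginScraper`, `FetchAfterLoginAsync` and `LandedOnLogin` are hypothetical names, and the two checks that throw are heuristics that assume the site tracks the session with cookies and sends unauthenticated requests back to the login path.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginScraper
{
    // Hypothetical helper: true when the final URI has the same path as the login
    // page, which suggests the server redirected the request back to the login form.
    public static bool LandedOnLogin(Uri finalUri, Uri loginUri)
    {
        return string.Equals(finalUri.AbsolutePath, loginUri.AbsolutePath,
                             StringComparison.OrdinalIgnoreCase);
    }

    // Posts the login form, then fetches the protected page on the same client,
    // so cookies stored by the login response are sent with the second request.
    public static async Task<string> FetchAfterLoginAsync(
        Uri loginUri, Uri pageUri, IDictionary<string, string> formFields)
    {
        // An explicit container lets us inspect what the login response stored.
        var cookies = new CookieContainer();
        using (var handler = new HttpClientHandler { CookieContainer = cookies })
        using (var client = new HttpClient(handler))
        {
            using (var content = new FormUrlEncodedContent(formFields))
            using (var login = await client.PostAsync(loginUri, content))
            {
                login.EnsureSuccessStatusCode(); // throws on 4xx/5xx
                // Heuristic (assumption): a cookie-based login sets at least one cookie.
                if (cookies.GetCookies(loginUri).Count == 0)
                    throw new InvalidOperationException("Login set no cookies.");
            }

            using (var page = await client.GetAsync(pageUri))
            {
                page.EnsureSuccessStatusCode();
                // After automatic redirects, RequestMessage.RequestUri is the final URI.
                if (LandedOnLogin(page.RequestMessage.RequestUri, loginUri))
                    throw new InvalidOperationException("Redirected back to the login page.");
                return await page.Content.ReadAsStringAsync();
            }
        }
    }
}
```

With the question's values, this would be called with the HTTPS versions of the login and target URLs and the same `User.UserName` / `User.Password` dictionary. A 200 status alone does not prove the login worked, which is why the sketch also looks at where the second request ended up. If the login form carries extra hidden fields (as hinted in the comment about missing fields), they would have to be added to `formFields` as well.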

1 Answer

I'm sure it's nothing in your code.

You could use Postman to send your request and help debug the error. It is very helpful for this kind of thing:

https://www.getpostman.com/

Perhaps the server you are trying to connect to is using a different protocol to what you expect?

Check out this answer for more info about forcing a protocol change in your code :-)

Make Https call using HttpClient
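
For reference, forcing the protocol usually means something like the sketch below. It assumes the mismatch is that the server requires TLS 1.2 while the application runs on a .NET Framework version that does not enable it by default; `TlsSetup` and `EnableTls12` are hypothetical names. Note that a failed TLS negotiation normally surfaces as an exception rather than as a silent return to the login page, so this may not be the cause here.

```csharp
using System.Net;

public static class TlsSetup
{
    // Call once at startup (for example from Application_Start),
    // before the first HTTPS request is made.
    public static void EnableTls12()
    {
        // Add TLS 1.2 to the protocols already enabled instead of replacing them.
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
    }
}
```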

grimdog_john
  • It looks like the login portal is using OpenID/OAuth as an authentication method. Check out more on this here - http://madskristensen.net/post/openid-implementation-in-csharp-and-aspnet. There's some example code to download. You'll probably need to change your code after all to implement OpenID. – grimdog_john Apr 06 '17 at 09:31
  • Postman also allows you to use OAuth authentication (check the Authorization tab on your request), but you'll need the various credentials to grab a valid token – grimdog_john Apr 06 '17 at 09:38
  • One more thing I should have added - you can use the interceptor feature within Postman to 'capture' a request as it's going through. You basically turn the interceptor on, log in to the site as you would normally, then you should see the POST request in your Postman history so you can inspect it to see where you're going wrong. Given the site you want to scrape is using OpenID/OAuth, it may be less straightforward than this, but you should at least see all the requests being made so you can inspect them. https://www.getpostman.com/docs/capture – grimdog_john Apr 06 '17 at 10:04
  • Yes, if I log in to the same site in another tab, the same request from the REST client then returns the next page's data. But if I log out, the request from the REST client always returns the login page. So I am not able to log in through this REST client either. – Amandeep Singh Apr 06 '17 at 10:14
  • That's a shame - it's probably something to do with the OpenID/OAuth. I tried a similar thing with just a standard login page with basic authentication and it worked OK. You might be able to combine the request with the Authorization tab I mentioned earlier. Unfortunately I don't know a whole lot about OAuth :-( – grimdog_john Apr 06 '17 at 10:24
  • Thanks for your valuable time. I have resolved this by testing with the REST client. Thanks again for your efforts. – Amandeep Singh Apr 06 '17 at 11:10