Scraping password protected sites

by **neeraj** on Fri Jul 24, 2015 6:12 am

How would you do this in Dyalog?

```python
__author__ = 'ngupta'

from bs4 import BeautifulSoup
import mechanize

LOGIN_URL = "https://www.schwab.com/"
LOGIN_FORM_NAME = "SignonForm"
LOGIN_USER_ID_FIELD = "SignonAccountNumber"
LOGIN_PASSWORD_FIELD = "SignonPassword"

# Create browser
mech_br = mechanize.Browser()
mech_br.set_handle_robots(False)   # do not fetch or obey robots.txt
mech_br.set_handle_refresh(False)  # do not follow Refresh redirects
mech_br.addheaders = [('User-agent', 'Firefox')]

user_id = "your_id"
password = "your_pwd"

# Open the login page, fill in the sign-on form and submit it
mech_br.open(LOGIN_URL)
mech_br.select_form(name=LOGIN_FORM_NAME)
mech_br[LOGIN_USER_ID_FIELD] = user_id
mech_br[LOGIN_PASSWORD_FIELD] = password
login_response = mech_br.submit()

# Parse the page returned after login and read the balance
soup = BeautifulSoup(login_response.read(), "html.parser")
table = soup.find("table", {"id": "tblCharlesSchwabBank"})
# 2nd row, 3rd cell; [1:] drops the leading character of the text
balance = float(table('tr')[1]('td')[2].span.text[1:])
print(balance)
```
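For comparison, the same form login can be sketched with Python's standard library alone (no mechanize or bs4). This is a minimal sketch under stated assumptions, not a drop-in replacement: `LOGIN_POST_URL` is a hypothetical placeholder (a real form posts to the URL in its `action` attribute), and unlike mechanize's `select_form`/`submit`, which send every control in the selected form including hidden ones, it sends only the two fields named in the script. A site that requires a hidden token or JavaScript will not accept it as-is.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the form's real "action" URL.
LOGIN_POST_URL = "https://www.example.com/login"


def build_login_request(user_id, password):
    """Return a POST request carrying the two sign-on fields from the script."""
    fields = {
        "SignonAccountNumber": user_id,
        "SignonPassword": password,
    }
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        LOGIN_POST_URL,
        data=body,
        headers={"User-Agent": "Firefox"},
        method="POST",
    )


def make_opener():
    """Create an opener that keeps session cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


if __name__ == "__main__":
    req = build_login_request("your_id", "your_pwd")
    print(req.get_method(), req.full_url)
    print(req.data)
    # Sending is left out so the sketch does not touch the network:
    # opener = make_opener()
    # html = opener.open(req).read()
```

The cookie jar is what keeps the logged-in session alive, so later requests for the account page should go through the same opener.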

by **neeraj** on Fri Jul 24, 2015 6:17 am

RUNNING THE SCRIPT:

```
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 "/Users/ngupta/Dropbox/python/pycharm projects/MechanizeTest/Test4.py"
698.53

Process finished with exit code 0
```
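The printed value comes from the script's last step: it takes the second row, third cell of the table with id `tblCharlesSchwabBank` and drops the first character of the span's text (presumably a `$` sign) before calling `float`. Below is a standard-library sketch of that extraction. The HTML in `SAMPLE` is hypothetical, built only to mirror the row/cell position and the printed value, and the parser assumes the target table is not nested inside another table. Stripping `$` and thousands separators explicitly is a small hardening over `[1:]`, which would raise `ValueError` on a value such as `$1,234.50`.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>, row by row, in the <table> with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_td = False
        self.rows = []  # one list of cell strings per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag == "td" and self.rows:
            self.rows[-1].append("")
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
        elif tag == "table" and self.in_table:
            self.in_table = False

    def handle_data(self, data):
        # Text inside nested tags such as <span> is included in the cell.
        if self.in_td:
            self.rows[-1][-1] += data.strip()


def parse_balance(html, table_id="tblCharlesSchwabBank"):
    """Return the 2nd-row, 3rd-cell amount of the table as a float."""
    parser = TableCellParser(table_id)
    parser.feed(html)
    if len(parser.rows) < 2 or len(parser.rows[1]) < 3:
        raise ValueError("table %r has no 2nd-row, 3rd cell" % table_id)
    cell = parser.rows[1][2]
    return float(cell.lstrip("$").replace(",", ""))


# Hypothetical page fragment (header row uses <th>, so it has no <td> cells).
SAMPLE = """
<table id="tblCharlesSchwabBank">
  <tr><th>Account</th><th>Type</th><th>Balance</th></tr>
  <tr><td>Checking</td><td>Bank</td><td><span>$698.53</span></td></tr>
</table>
"""

if __name__ == "__main__":
    print(parse_balance(SAMPLE))  # prints 698.53
```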

by **Vince|Dyalog** on Tue Jul 28, 2015 11:14 am

Hi Neeraj,

I would suggest searching the internet for "c# web scrape login" and then translating the C# examples into APL using our .NET interface.

Regards,

Vince

by **PGilbert** on Tue Jul 28, 2015 3:23 pm

Based on Vince's suggestion and this web page: http://webdata-scraping.com/login-website-programmatically-using-c-web-scraping/ you can do the following in .NET:

```apl
url←'https://www.schwab.com/'
⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
⎕USING,←⊂'System.Drawing,System.Drawing.dll'

wb←⎕NEW WebBrowser
wb.Dock←wb.Dock.Fill
wb.Navigate(⊂url)
⎕DL 5                                 ⍝ wait 5 seconds for the page to load
htmlDoc←wb.Document
html←⎕UCS wb.DocumentStream.ToArray   ⍝ raw HTML of the page

signonAcc←htmlDoc.GetElementById(⊂'SignonAccountNumber')
⍝ signonAcc.InnerText←'user_id'      ⍝ No error but property is not changed
signonAcc.InnerHtml←'user_id'
signonPwd←htmlDoc.GetElementById(⊂'SignonPassword')
⍝ signonPwd.InnerText←'password'     ⍝ No error but property is not changed
signonPwd.InnerHtml←'password'
⍝ For <input> fields, setting the value attribute may work where InnerText does not, e.g.
⍝ signonAcc.SetAttribute('value' 'user_id')
loginBtn←htmlDoc.GetElementById(⊂'&lid=Log in')
loginBtn.InvokeMember(⊂'click')

⍝ Show the WebBrowser in a WindowsForm
fm←⎕NEW Form
fm.Size←⎕NEW Size(1100,680)
fm.Text←'URL [ ',url,' ]'
fm.onClosed←'_GetWebResults_onClosed'
fm.Controls.Add wb
fm.Show ⍬
```

and for the onClosed event function:

```apl
 _GetWebResults_onClosed(sender event)
 (⌷sender.Controls).Dispose   ⍝ dispose of the WebBrowser when the form closes
```

This code runs without errors, but you will have to try it with your own ID and password. 'htmlDoc' is a System.Windows.Forms.HtmlDocument that you can easily interrogate with .GetElementById or .GetElementsByTagName. You can find the IDs and tag names by inspecting the page's HTML manually; in Safari you can also right-click an element on the page and choose 'Inspect Element' from the context menu, which shows the HTML of that element so you can find its ID more easily. Sometimes you may need to put ⌷ or ⍬⍴⌷ in front of the result of .GetElementById or .GetElementsByTagName to get it in the proper rank.
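The same inspection can also be done on the raw page source, for example the `html` string the APL code builds from `wb.DocumentStream`. Below is a minimal Python sketch, standard library only, that lists the `id`, `name` and `type` of every `<input>` element, which is one way to find the IDs to pass to `.GetElementById`. The form in `PAGE` is hypothetical; on a real page the `id` and `name` attributes need not match, and fields created by JavaScript will not appear in the static HTML.

```python
from html.parser import HTMLParser


class InputLister(HTMLParser):
    """Record (id, name, type) for every <input> element, in document order."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing "<input ... />" also arrives here, because the default
        # handle_startendtag calls handle_starttag.
        if tag == "input":
            a = dict(attrs)
            self.inputs.append((a.get("id"), a.get("name"), a.get("type", "text")))


def list_inputs(html):
    parser = InputLister()
    parser.feed(html)
    return parser.inputs


# Hypothetical sign-on form using the field names from the Python script.
PAGE = """<form name="SignonForm">
<input id="SignonAccountNumber" name="SignonAccountNumber" type="text">
<input id="SignonPassword" name="SignonPassword" type="password" />
<input type="hidden" name="token" value="abc">
</form>"""

if __name__ == "__main__":
    for field in list_inputs(PAGE):
        print(field)
```

Listing hidden inputs as well is deliberate: they are often tokens the server expects back with the login.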

Good luck.

by **neeraj** on Thu Jul 30, 2015 4:09 am

Thanks to both of you. I will try and see how it works out.
