Scraping password protected sites
Scraping password protected sites
How would you do this in Dyalog?
Code:
__author__ = 'ngupta'
from bs4 import BeautifulSoup
import mechanize
LOGIN_URL = "https://www.schwab.com/"
LOGIN_FORM_NAME = "SignonForm"
LOGIN_USER_ID_FIELD = "SignonAccountNumber"
LOGIN_PASSWORD_FIELD = "SignonPassword"
# Create the browser
mech_br = mechanize.Browser()
mech_br.set_handle_robots(False)   # do not obey robots.txt
mech_br.set_handle_refresh(False)  # do not follow refresh redirects
mech_br.addheaders = [('User-agent', 'Firefox')]
user_id = "your_id"
password = "your_pwd"
# Open the login page, fill in the sign-on form and submit it
mech_br.open(LOGIN_URL)
mech_br.select_form(name=LOGIN_FORM_NAME)
mech_br[LOGIN_USER_ID_FIELD] = user_id
mech_br[LOGIN_PASSWORD_FIELD] = password
login_response = mech_br.submit()
# Parse the returned page and read the balance from the account table
soup = BeautifulSoup(login_response.read(), "html.parser")
table = soup.find("table", {"id": "tblCharlesSchwabBank"})
balance = float(table('tr')[1]('td')[2].span.text[1:])  # 2nd row, 3rd cell; [1:] drops the currency sign
print(balance)
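The `[1:]` slice in the balance line assumes the cell text is exactly one currency sign followed by plain digits. Here is a minimal sketch of a more tolerant conversion, assuming balances may also carry thousands separators or surrounding whitespace (for example `$1,698.53`). `parse_balance` is a hypothetical helper, not part of the original script:

```python
def parse_balance(cell_text):
    """Convert text such as '$1,698.53' to a float.

    Hypothetical helper: assumes a leading '$' and ',' as the
    thousands separator; other formats are not handled.
    """
    cleaned = cell_text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


print(parse_balance("$698.53"))
print(parse_balance(" $1,698.53 "))
```

In the script this would be used as `balance = parse_balance(table('tr')[1]('td')[2].span.text)`.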
- neeraj
- Posts: 82
- Joined: Wed Dec 02, 2009 12:10 am
- Location: Ithaca, NY, USA
Re: Scraping password protected sites
RUNNING THE SCRIPT:
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 "/Users/ngupta/Dropbox/python/pycharm projects/MechanizeTest/Test4.py"
698.53
Process finished with exit code 0
- neeraj
- Posts: 82
- Joined: Wed Dec 02, 2009 12:10 am
- Location: Ithaca, NY, USA
Re: Scraping password protected sites
Hi Neeraj,
I would suggest searching the internet for "C# web scrape login" and then translating the C# examples into APL using our .NET interface.
Regards,
Vince
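Whatever the language, a programmatic login typically comes down to posting the form fields and keeping the session cookie for later requests. The following is a rough sketch of that shape in Python 3 (standard library only), using the field names from the first post. `FORM_ACTION_URL` and `build_login_request` are hypothetical, and the real form target has to be read from the page's HTML. The request is built but not sent:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Hypothetical form target; the real action URL must be taken from the page.
FORM_ACTION_URL = "https://example.com/login"


def build_login_request(user_id, password):
    """Build (but do not send) a POST request carrying the form fields."""
    fields = {
        "SignonAccountNumber": user_id,  # field names from the first post
        "SignonPassword": password,
    }
    body = urlencode(fields).encode("ascii")
    return Request(FORM_ACTION_URL, data=body,
                   headers={"User-Agent": "Firefox"})


# An opener with a cookie jar keeps the session cookie between requests.
opener = build_opener(HTTPCookieProcessor(CookieJar()))
request = build_login_request("your_id", "your_pwd")
print(request.get_method(), request.data)
# To actually log in: response = opener.open(request)
```

Sites that build the form with JavaScript or add hidden tokens will likely need more than this, which is where a browser control such as the .NET WebBrowser can help.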
- Vince|Dyalog
- Posts: 413
- Joined: Wed Oct 01, 2008 9:39 am
Re: Scraping password protected sites
Based on Vince's suggestion and this web page: http://webdata-scraping.com/login-website-programmatically-using-c-web-scraping/ you can do the following in .NET:
Code:
url←'https://www.schwab.com/'
⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
⎕USING,←⊂'System.Drawing,System.Drawing.dll'
wb←⎕NEW WebBrowser
wb.Dock←wb.Dock.Fill
wb.Navigate(⊂url)
⎕DL 5
htmlDoc←wb.Document
html←⎕UCS wb.DocumentStream.ToArray
signonAcc←htmlDoc.GetElementById(⊂'SignonAccountNumber')
⍝ signonAcc.InnerText←'user_id' ⍝ No error but property is not changed
signonAcc.InnerHtml←'user_id'
signonPwd←htmlDoc.GetElementById(⊂'SignonPassword')
⍝ signonPwd.InnerText←'password' ⍝ No error but property is not changed
signonPwd.InnerHtml←'password'
loginBtn←htmlDoc.GetElementById(⊂'&lid=Log in')
loginBtn.InvokeMember(⊂'click')
⍝ Show the WebBrowser in a WindowsForm
fm←⎕NEW Form
fm.Size←⎕NEW Size(1100,680)
fm.Text←'URL [ ',url,' ]'
fm.onClosed←'_GetWebResults_onClosed'
fm.Controls.Add wb
fm.Show ⍬
and for the onClosed event function:
Code:
_GetWebResults_onClosed(sender event)
(⌷sender.Controls).Dispose
This code runs without errors, but you will have to try it with your own ID and password. 'htmlDoc' is a System.Windows.Forms.HtmlDocument that you can query easily with .GetElementById or .GetElementsByTagName. You can find those IDs and tag names by inspecting the page's HTML manually. If you use Safari, you can right-click an element on the page and choose 'Inspect Element' from the context menu, which shows the HTML of that element and makes its ID easier to find. Sometimes you may need to put ⌷ or ⍬⍴⌷ in front of the result of .GetElementById or .GetElementsByTagName to get it in the proper rank.
Good luck.
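As an alternative to inspecting the page by hand, the candidate IDs can be listed programmatically. This is a minimal Python sketch (matching the language of the original script) using only the standard library. `InputIdCollector` is a hypothetical helper, and `SAMPLE_HTML` is a made-up fragment for illustration, not the real page's HTML:

```python
from html.parser import HTMLParser


class InputIdCollector(HTMLParser):
    """Collect (id, type) pairs for every <input> element that has an id."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr_map = dict(attrs)
            if "id" in attr_map:
                self.inputs.append(
                    (attr_map["id"], attr_map.get("type", "text")))


# Made-up fragment; element names reuse those mentioned in this thread.
SAMPLE_HTML = """
<form name="SignonForm">
  <input type="text" id="SignonAccountNumber">
  <input type="password" id="SignonPassword">
  <input type="submit" value="Log in">
</form>
"""

collector = InputIdCollector()
collector.feed(SAMPLE_HTML)
print(collector.inputs)
```

Inputs without an id (the submit button in this fragment) are skipped, so they would still need to be located by tag name or another attribute.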
- PGilbert
- Posts: 436
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: Scraping password protected sites
Thanks to both of you. I will try and see how it works out.
- neeraj
- Posts: 82
- Joined: Wed Dec 02, 2009 12:10 am
- Location: Ithaca, NY, USA