Receiving HTML from AJAX appl

Using (or providing) Microsoft.NET Classes

Postby nah on Mon Sep 27, 2010 9:40 pm

I have been running an application that fetched and parsed HTML from a web page.
Now the web page has been redesigned to use AJAX,
which means that the HTML I download is no longer updated with the data shown in the browser.
Does anyone here have experience with this?

Best regards
Niels
nah
 
Posts: 2
Joined: Thu May 20, 2010 6:45 am

Re: Receiving HTML from AJAX appl

Postby Morten|Dyalog on Tue Oct 05, 2010 10:20 am

I don't think there is an easy solution to this, as the web page now expects client code to run in the web browser. Writing your own JavaScript interpreter (or similar) is probably more work than you would like to do :-).

You might be able to get away with using a tool like "Fiddler" to spy on the HTTP communication, see what the AJAX client code sends to the server, and reverse-engineer that. This MIGHT give you the information that you need, depending on what is going on, but it is probably a very long shot.

Can you let us know which page you are trying to "scrape"?
Morten|Dyalog
 
Posts: 453
Joined: Tue Sep 09, 2008 3:52 pm

Re: Receiving HTML from AJAX appl

Postby nah on Tue Oct 05, 2010 8:13 pm

A good example is "http://www.soccerway.com/national/sweden/allsvenskan/2010/regular-season/",
which shows Swedish soccer results. When I study the source of the displayed page I can extract the content,
but after I press "Previous" the previous results appear on screen while the page source stays unchanged.
nah
 
Posts: 2
Joined: Thu May 20, 2010 6:45 am

Re: Receiving HTML from AJAX appl

Postby harsman on Wed Oct 06, 2010 7:40 am

That the page uses AJAX means it retrieves data from the server in a more data-oriented format than HTML, usually JSON or XML. This might actually make the data easier to extract than scraping it from HTML.

If you look at the JavaScript source or watch the network traffic (either with an external tool like Fiddler, as Morten suggested, or with a browser-integrated tool like Firebug for Firefox), you should be able to reverse-engineer which HTTP requests to make to get the data.
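A minimal sketch of the "replay the request" idea, written in Python rather than APL purely for illustration. The URL, query parameter names and extra headers are hypothetical placeholders, to be replaced with whatever Fiddler or Firebug shows the page actually sending. The `X-Requested-With` header is an assumption: jQuery-style AJAX code commonly sends it and some servers look for it, but a given site may not care.

```python
# Sketch: replaying an AJAX request observed with Fiddler or Firebug.
# The URL, parameter names and extra headers are hypothetical placeholders;
# copy the real ones from the captured request.
import json
import urllib.parse
import urllib.request


def build_ajax_request(base_url, params, extra_headers=None):
    """Build (but do not send) a GET request that mimics the browser's AJAX call."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}" if query else base_url
    req = urllib.request.Request(url, method="GET")
    # jQuery and similar libraries send this header; some servers check for it.
    req.add_header("X-Requested-With", "XMLHttpRequest")
    req.add_header("Accept", "application/json, text/javascript, */*")
    for name, value in (extra_headers or {}).items():
        req.add_header(name, value)
    return req


def fetch_json(req, timeout=10):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))
```

For example, `build_ajax_request("https://example.com/ajax/results", {"season": 2010, "page": 2}).full_url` is `https://example.com/ajax/results?season=2010&page=2`. Building the request separately from sending it makes it easy to compare against the captured traffic before hitting the server with `fetch_json`.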
harsman
 
Posts: 27
Joined: Thu Nov 26, 2009 12:21 pm

Re: Receiving HTML from AJAX appl

Postby alexbalako on Thu Oct 07, 2010 6:00 pm

Niels,

You could try using the Internet Explorer ActiveX control, which will execute the JavaScript on the page for you,
then pull the HTML from it.
alexbalako
 
Posts: 16
Joined: Mon Nov 30, 2009 8:58 pm

Re: Receiving HTML from AJAX appl

Postby Dick Bowman on Thu Nov 03, 2011 3:52 pm

Have there been any further developments on this topic in the past year?

I find myself in a similar situation: a little application that scraped HTML pages is now broken because the site's author (the British Met Office) now generates the pages seen in the browser with JavaScript. Obviously (?) the data I want to bring into APL is reaching my computer, but the browser seems to hide it from me.

Any specific suggestions about tools to look at? I'm not sure whether the last post is talking about general principles or something specific.
Visit http://apl.dickbowman.com to read more from Dick Bowman
Dick Bowman
 
Posts: 235
Joined: Thu Jun 18, 2009 4:55 pm

Re: Receiving HTML from AJAX appl

Postby Morten|Dyalog on Thu Nov 03, 2011 4:12 pm

Dick Bowman wrote: "Have there been any further developments on this topic in the past year?"

Not directly, but the MiServer team has a prototype of a tool to encode and decode JSON, that will be used for AJAX-style interaction with MiServer applications.

However, unless the data supplier documents the format of the required HTTP transactions, the only "solution" to the problem of extracting data from web applications that use AJAX is to snoop on the communication between the JavaScript application running in the browser and the server, use Conga to send a similar request to the server, and then use either ⎕XML or the JSON-decoding tools (or something else, depending on the format) to take the result apart.
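Once the response is in hand, taking it apart depends on its format. The Python sketch below is an illustration only (it is not Conga, ⎕XML or the MiServer tools): it dispatches on the `Content-Type` header and uses the standard `json` and `xml.etree.ElementTree` modules. The set of recognised content types is an assumption; some servers label JSON as `text/plain` or `text/javascript`, so the list may need extending for a particular site.

```python
# Sketch: decoding an AJAX response body once it has been fetched.
# Python's json and xml.etree modules stand in for the JSON tools and ⎕XML.
import json
import xml.etree.ElementTree as ET


def _charset(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header, if any."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            return value.strip('"')
    return default


def decode_payload(body, content_type):
    """Decode a response body (bytes) according to its Content-Type."""
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/json" or mime.endswith("+json"):
        return json.loads(body.decode(_charset(content_type)))
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        # Pass bytes so the XML parser can honour the document's own encoding.
        return ET.fromstring(body)
    raise ValueError(f"unhandled content type: {content_type!r}")
```

For example, `decode_payload(b'{"matches": [{"home": "Team A", "goals": 2}]}', "application/json; charset=utf-8")["matches"][0]["goals"]` returns `2`; an XML response comes back as an `Element` tree to be walked with `find`/`get`.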
Morten|Dyalog
 
Posts: 453
Joined: Tue Sep 09, 2008 3:52 pm

Re: Receiving HTML from AJAX appl

Postby Dick Bowman on Wed Nov 16, 2011 3:05 pm

Quick update to confirm that this thread has shown me what I needed to do...

0⊃ Firebug revealed that the JavaScript was pulling files with the .json extension from the distant server
1⊃ Put together a quick/dirty decoder for the .json files

This has put the broken part of the application back into action.

Thanks to all.
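For anyone wanting to do the same without a ready-made library, a hand-written decoder of the "quick/dirty" kind is fairly small. The sketch below is a recursive-descent JSON decoder in Python, not the decoder described above: it follows the JSON grammar for objects, arrays, strings, numbers, `true`, `false` and `null`, but does not combine `\uXXXX` surrogate pairs and gives only basic error messages.

```python
# Minimal recursive-descent JSON decoder (illustrative sketch).
# Handles objects, arrays, strings with escapes, numbers, true/false/null.
# Not handled: \uXXXX surrogate pairs are decoded as separate code points.
import re

_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
            "n": "\n", "r": "\r", "t": "\t"}
_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")

def decode_json(text):
    """Decode a complete JSON document held in a string."""
    value, pos = _parse_value(text, _skip_ws(text, 0))
    if _skip_ws(text, pos) != len(text):
        raise ValueError(f"trailing data at position {pos}")
    return value

def _skip_ws(s, i):
    while i < len(s) and s[i] in " \t\r\n":
        i += 1
    return i

def _parse_value(s, i):
    # Dispatch on the first character; i must already point past whitespace.
    if i >= len(s):
        raise ValueError("unexpected end of input")
    c = s[i]
    if c == "{":
        return _parse_object(s, i + 1)
    if c == "[":
        return _parse_array(s, i + 1)
    if c == '"':
        return _parse_string(s, i + 1)
    for literal, value in (("true", True), ("false", False), ("null", None)):
        if s.startswith(literal, i):
            return value, i + len(literal)
    m = _NUMBER.match(s, i)
    if not m:
        raise ValueError(f"unexpected character at position {i}")
    # Integers stay int; anything with a fraction or exponent becomes float.
    number = float(m.group(0)) if m.group(1) or m.group(2) else int(m.group(0))
    return number, m.end()

def _parse_string(s, i):
    # i points just after the opening quote.
    out = []
    while i < len(s):
        c = s[i]
        if c == '"':
            return "".join(out), i + 1
        if c == "\\":
            esc = s[i + 1:i + 2]
            if esc == "u":
                out.append(chr(int(s[i + 2:i + 6], 16)))
                i += 6
                continue
            if esc not in _ESCAPES:
                raise ValueError(f"bad escape at position {i}")
            out.append(_ESCAPES[esc])
            i += 2
            continue
        out.append(c)
        i += 1
    raise ValueError("unterminated string")

def _parse_array(s, i):
    result = []
    i = _skip_ws(s, i)
    if i < len(s) and s[i] == "]":
        return result, i + 1
    while True:
        value, i = _parse_value(s, _skip_ws(s, i))
        result.append(value)
        i = _skip_ws(s, i)
        if i < len(s) and s[i] == ",":
            i += 1
        elif i < len(s) and s[i] == "]":
            return result, i + 1
        else:
            raise ValueError(f"expected ',' or ']' at position {i}")

def _parse_object(s, i):
    result = {}
    i = _skip_ws(s, i)
    if i < len(s) and s[i] == "}":
        return result, i + 1
    while True:
        i = _skip_ws(s, i)
        if i >= len(s) or s[i] != '"':
            raise ValueError(f"expected a string key at position {i}")
        key, i = _parse_string(s, i + 1)
        i = _skip_ws(s, i)
        if i >= len(s) or s[i] != ":":
            raise ValueError(f"expected ':' at position {i}")
        value, i = _parse_value(s, _skip_ws(s, i + 1))
        result[key] = value
        i = _skip_ws(s, i)
        if i < len(s) and s[i] == ",":
            i += 1
        elif i < len(s) and s[i] == "}":
            return result, i + 1
        else:
            raise ValueError(f"expected ',' or '}}' at position {i}")
```

For example, `decode_json('{"t": [1, 2.5], "ok": true}')` returns `{'t': [1, 2.5], 'ok': True}`. Each `_parse_*` function returns the decoded value together with the position just after it, which keeps the recursion simple and mirrors the structure of the JSON grammar.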
Dick Bowman
 
Posts: 235
Joined: Thu Jun 18, 2009 4:55 pm

