C# WebBrowser control - Synchronization for page navigation/loading

When we use the .NET WebBrowser control for scraping or crawling web pages, we will run into many difficulties and errors if we do not handle page synchronization properly.

That is, our code must start any follow-up activity only after page navigation has completely finished.

We can use the function waitTillLoad() below for this synchronization.

It waits until the browser's ReadyState becomes Complete.

However, the ReadyState is initially still Complete (left over from the previous page), so a simple check could exit the function before the new page has even started loading.

To avoid this, we have enhanced the function so that it first waits for a non-Complete state and only then waits for Complete.

In other words, we wait for the page to finish loading only after the new navigation has actually started.
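
To see why the first phase matters, here is a minimal sketch that replays a fixed sequence of ReadyState observations instead of driving a real browser. The ReadyState enum, the PhaseDemo class and its Replay helper are hypothetical stand-ins introduced for illustration (the enum only mirrors the member names of WebBrowserReadyState), and the assumed scenario is that the old page is still reported as Complete for a couple of polls after navigation begins.

```csharp
using System;

// Hypothetical stand-in for System.Windows.Forms.WebBrowserReadyState so the
// sketch runs without WinForms; the member names mirror the real enum.
public enum ReadyState { Uninitialized, Loading, Loaded, Interactive, Complete }

public static class PhaseDemo
{
    // One-phase wait: stops at the first Complete it sees.
    // Returns the number of polls used, or -1 if maxPolls is exhausted.
    public static int OnePhase(Func<ReadyState> poll, int maxPolls)
    {
        for (int polls = 1; polls <= maxPolls; polls++)
            if (poll() == ReadyState.Complete) return polls;
        return -1;
    }

    // Two-phase wait: first wait (at most maxStartPolls) for any non-Complete
    // state, meaning the new navigation has started, then wait for Complete.
    // Returns the total number of polls used, or -1 if maxPolls is exhausted.
    public static int TwoPhase(Func<ReadyState> poll, int maxStartPolls, int maxPolls)
    {
        int polls = 0;
        // Phase 1: stop waiting for the start after maxStartPolls,
        // e.g. when no new navigation was issued.
        while (polls < maxStartPolls)
        {
            polls++;
            if (poll() != ReadyState.Complete) break;
        }
        // Phase 2: wait for the page to report Complete.
        while (polls < maxPolls)
        {
            polls++;
            if (poll() == ReadyState.Complete) return polls;
        }
        return -1;
    }

    // Replays a fixed sequence of observed states, repeating the last one
    // once the sequence is exhausted.
    public static Func<ReadyState> Replay(params ReadyState[] states)
    {
        int i = 0;
        return () => states[Math.Min(i++, states.Length - 1)];
    }
}
```

With the replayed sequence Complete, Complete, Loading, Interactive, Complete, OnePhase returns 1 because it stops at the stale Complete of the old page, while TwoPhase returns 5 because it stops at the Complete of the new page. If the state never leaves Complete, phase 1 gives up after maxStartPolls and phase 2 returns on the next poll.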

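We need a timeout (waittime) in the first loop: if the function is called twice without starting a new page navigation in between, the ReadyState stays Complete and the first loop would otherwise run forever. Note that waittime counts loop iterations, not milliseconds.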
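
An alternative to counting iterations is to bound each phase by elapsed time, so the limit does not depend on how fast the loop spins. The sketch below assumes a hypothetical helper, BoundedWait.Until, that polls a condition, calls a pump action between polls and gives up after a TimeSpan; only Stopwatch (System.Diagnostics) is a real library type here. The commented WebBrowser usage and its 5 s / 60 s limits are illustrative assumptions, not tested values.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: a time-bounded polling loop.
public static class BoundedWait
{
    // Polls condition until it returns true or timeout elapses.
    // pump runs between polls; in a WinForms form it would be
    // Application.DoEvents so the browser keeps processing messages.
    // Returns true if the condition became true before the timeout.
    public static bool Until(Func<bool> condition, Action pump, TimeSpan timeout)
    {
        var clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout) return false;
            pump();
        }
        return true;
    }
}

// Assumed usage inside a form (not executed here):
//   // Phase 1: wait for loading to start; a timeout here is not an error.
//   BoundedWait.Until(() => webBrowser1.ReadyState != WebBrowserReadyState.Complete,
//                     Application.DoEvents, TimeSpan.FromSeconds(5));
//   // Phase 2: wait for loading to finish; false means the page timed out.
//   bool loaded = BoundedWait.Until(() => webBrowser1.ReadyState == WebBrowserReadyState.Complete,
//                                   Application.DoEvents, TimeSpan.FromSeconds(60));
```

Returning a bool instead of breaking silently lets the caller tell a finished page apart from a timed-out one.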

The same function can be ported to VB.NET with small modifications.
We have used it in many tools and applications for a long time, so I hope you will find it useful and reliable.

In my experience, it is essential whenever you use the WebBrowser control for page scraping or web crawling.

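private void waitTillLoad()
{
    WebBrowserReadyState loadStatus;
    // Limit for the first loop, counted in loop iterations (not milliseconds).
    int waittime = 100000;
    int counter = 0;

    // Phase 1: wait until loading of the next page has begun.
    while (true)
    {
        loadStatus = webBrowser1.ReadyState;
        Application.DoEvents();   // let the browser process pending messages

        // Exit once the state leaves Complete, or when the limit is exceeded
        // (e.g. no new navigation was started).
        if ((counter > waittime) ||
            (loadStatus == WebBrowserReadyState.Uninitialized) ||
            (loadStatus == WebBrowserReadyState.Loading) ||
            (loadStatus == WebBrowserReadyState.Interactive))
        {
            break;
        }
        counter++;
    }

    // Phase 2: wait until the page has finished loading.
    // Note: counter is not checked here, so this loop has no upper bound;
    // if the page never reaches Complete, the call will not return.
    counter = 0;
    while (true)
    {
        loadStatus = webBrowser1.ReadyState;
        Application.DoEvents();

        if (loadStatus == WebBrowserReadyState.Complete)
        {
            break;
        }
        counter++;
    }
}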

If you have a better solution, please let me know!
