In a previous post I covered the basics of using the Python module Selenium to automate filling out web forms. This is useful when you need to extract large quantities of data through such an interface, or when there is not that much data to extract but getting it involves too many manual clicks.

 

A problem you might face while using Selenium is that some web forms span more than a single page, and you need to know that you are on the right page before you start issuing new Selenium commands. Issuing commands on the wrong page will at best raise an error; at worst you will fill in the wrong item (one with the same name or id) on a different page without ever knowing about it.

 

So how do you know whether you are on the right page, or which page you are on? There is a very simple trick that is well detailed at the following reference.

(If you have been watching this for more than a few seconds I feel obliged to tell you – nothing is going to load)

The basic concept of the method is simple. You know you are on some page i and you have issued a command that should take you to page i+1. You enter a while loop and keep checking an item that you grabbed on page i and that you know is not present on page i+1. Once the browser has left page i, trying to use that item fails with a “stale error” (in Selenium's Python bindings, a StaleElementReferenceException), and that signals that you are no longer on page i. You use this error to exit the while loop and you are once again on your way to scrape those precious bytes. Unless something very abnormal happened you should now be on page i+1, but if you are working with something that is abnormal by definition, or are just very risk averse, it might be necessary to add a second check to verify that you are indeed on page i+1.
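
The loop described above might be sketched roughly as follows. This is only an illustration, not the code from the reference: the function names `wait_until_stale` and `page_has_marker`, the timeout, and the polling interval are all my own assumptions, and the fallback exception class exists only so the sketch can be read and run without Selenium installed.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in so the sketch runs without Selenium installed (assumption).
    class StaleElementReferenceException(Exception):
        """Placeholder for Selenium's exception of the same name."""


def wait_until_stale(element, timeout=10.0, poll_interval=0.1):
    """Poll an element grabbed on page i until using it raises a stale error.

    Returns True once the element is stale (we have left page i) and
    False if the timeout expires first. The name and defaults are
    hypothetical choices for this sketch.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Any call that touches the element will do; is_enabled() is cheap.
            element.is_enabled()
        except StaleElementReferenceException:
            return True
        time.sleep(poll_interval)
    return False


def page_has_marker(driver, by, value):
    """Optional second check: is an item known to exist only on page i+1 there?

    find_elements returns an empty list instead of raising when nothing
    matches, so the result can be used directly as a yes/no answer.
    """
    return len(driver.find_elements(by, value)) > 0
```

In a real script you would grab the item while still on page i, for example `old_item = driver.find_element(By.ID, "page-one-field")`, then click whatever advances the form and call `wait_until_stale(old_item)` before issuing further commands. The id `page-one-field` is made up for illustration. The risk-averse can follow up with `page_has_marker(driver, By.ID, ...)` using an id that appears only on page i+1.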

 

It is a very short and clean implementation that solves a problem that is easy for a user to understand but very complicated for Selenium (for reasons briefly discussed in the above reference).
