RPA Challenge with Python and Selenium

I found Robotic Process Automation (RPA) Challenge recently. It may have few months already, but still completing it was fun. The challenge idea is simple – there is an Excel file with 10 lines and 7 columns and it needs to be submitted line by line through online form. The trick is the form changes location of input fields after each submission.

I did some quick research for other’s results to have a benchmark, and view into tools they used. You can find it on LinkedIn or Youtube. Completion time varied, fastest around 7-8 seconds, but most people resulted around 10-12 seconds. Ok, so need to beat that 🙂

First try

I used Python and Selenium driving Chrome. My initial approach was to read Excel with openpyxl and locate right fields using mainly XPath, then update sending keys to them, and finally click the submit button through Selenium as well. Code is below. The result was 7.5 sec. Surely not that good I was targeting it to be. I knew I need a second try.

driver = webdriver.Chrome()
driver.get("http://www.rpachallenge.com/")

wb = openpyxl.load_workbook('challenge.xlsx')
ws = wb.active
col_names = [ws.cell(row=1,column=j).value.strip() for j in range(1,8)]

driver.execute_script("document.getElementById('start').click()")

for i in range(2,12):
    vals = [ws.cell(row=i,column=j).value for j in range(1,8)]

    submit_button = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "//*[@id='randomForm']/div[2]/input")))
    any_field = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'js-inputContainer') and contains(@class, 'input-group')]")))
    all_fields = driver.find_elements_by_xpath("//div[contains(@class, 'js-inputContainer') and contains(@class, 'input-group')]")
    
    for field in all_fields:
        field_label = field.find_element_by_tag_name("label").text
        field_input = field.find_element_by_tag_name("input")
        
        for idx, col in enumerate(col_names):
            if field_label == col:
                field_input.send_keys(vals[idx])

    submit_button.click()

Final try

I modified the approach and used Selenium method to run Javascript within the browser. In short words, with this approach, I located all the fields and updated them in one move. Submit button click has been operated using same method. This time result was satisfactory, 0.17 sec. It is 40 times faster than using previous approach and more than 60 times faster than using RPA tool (comparing to what I found on the web) .
Here’s the code.

driver = webdriver.Chrome()
driver.get("http://www.rpachallenge.com/")

wb = openpyxl.load_workbook('challenge.xlsx')
ws = wb.active
col_names = [ws.cell(row=1,column=j).value.strip() for j in range(1,8)]

driver.execute_script("document.getElementById('start').click()")

for i in range(2,12):
    vals = [ws.cell(row=i,column=j).value for j in range(1,8)]
    scripts = ["document.evaluate('//label[contains(text()," + '"' + col_names[idx]  + '"' + ")]', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.parentElement.getElementsByTagName('input')[0].setAttribute('value', '" + str(val) + "')" for idx, val in enumerate(vals)]
    script_to_be_executed = ";".join(scripts)
    driver.execute_script(script_to_be_executed)
    driver.execute_script("document.getElementsByClassName('btn btn-default')[0].click()")

The code is 12 lines length (plus no more than a few lines for library imports), so it is really short. I bet using any RPA tool you’ll have much more code-like lines.

Video

You can see the results in short video below.

Please check previous article I wrote comparing 4 different automation methods as well. If you have some spare time – give this challenge a try. It is a nice one!

Update

I’ve been challenge by another person who did it with twice better result. And this is what I love about challenges 🙂 So I had another thought on this and finally I managed to decrease time to 0.034 sec. with following code.

execute = []

wb = openpyxl.load_workbook('challenge.xlsx')
ws = wb.active
col_names = [ws.cell(row=1,column=j).value.strip() for j in range(1,8)]

execute.append("document.getElementById('start').click()")
for i in range(2,12):
    vals = [ws.cell(row=i,column=j).value for j in range(1,8)]
    scripts = ["document.evaluate('//label[contains(text()," + '"' + col_names[idx]  + '"' + ")]', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.parentElement.getElementsByTagName('input')[0].setAttribute('value', '" + str(val) + "')" for idx, val in enumerate(vals)]
    scripts = ";".join(scripts)
    execute.append(scripts)
    execute.append("document.getElementsByClassName('btn btn-default')[0].click()")
    
execute = ";".join(execute)
driver.execute_script(execute)

The code above is basically translates Excel data into series of JS lines and runs it with one command. Very effective in this case, not recommended in real life though.

@sejm_watch - what are the sparrows in the Polish parliament chirping about<<

Dodaj komentarz

avatar
  Subscribe  
Notify of