RPA Challenge (Shortest Path) with Python, Selenium & Tesseract
A few weeks ago I discovered the RPA Challenge and tried my hand at it. It was fun indeed. A few new challenges have been released lately, and I'm taking a closer look at one of them: Shortest Path. It seems to be even more interesting!
So what's the task? The authors expect challengers to build an attended bot, but I decided to make it unattended, since that is perfectly doable. In short, the goal of the exercise is to pair up balloons on the map, matching each red balloon with the closest green one, then read the data shown below the map, fill in the form with it and submit. The difficulty is that the data table rows come in random order and most of the row headers are images containing text and noise, so it is not easy to read them and match each row with the corresponding form field.
I solved it in two different ways: with and without OCR (Optical Character Recognition). The approach without OCR is much faster (~7-8 seconds for the entire exercise); for the OCR approach I used Tesseract OCR (~40 seconds). The final time is not that important, though: both methods require a few tricks to work well, and that is the much more interesting part. Let me share the details.
Selecting the right points on the map
Let me first describe how the bot can select the points (balloons) on the map in unattended mode. The aim is to pair each red balloon (demand) with the closest green one (supply). For a human operator this is no problem unless the points are very close to each other; a bot, however, needs to do some calculation.
One possible way is to locate each image's position on the screen and then calculate the distances. However, not all balloons are visible without scrolling the map. Let's look at the HTML behind them instead.
In the page source, each balloon's inline style attribute contains a CSS transform function, translate3d(), which moves the element in 3-dimensional space; its pixel values are the coordinates (x, y, z) of the translation vector. The third dimension is not used (always 0px). Once we have the x and y coordinates, we can calculate the distances between the balloons using the Pythagorean theorem and pick the shortest ones.
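Before any distance can be computed, x and y have to be pulled out of the style attribute. The bot's code further below does this with chained replace() calls; as an alternative, here is a minimal sketch using a regular expression. parse_translate3d is a hypothetical helper, not part of the original bot, and the style string is a made-up example of what such an attribute might look like.

```python
import re

# Captures the x, y and z values of e.g. "translate3d(245px, 120px, 0px)"
TRANSLATE3D = re.compile(
    r"translate3d\(\s*(-?[\d.]+)px,\s*(-?[\d.]+)px,\s*(-?[\d.]+)px\s*\)"
)


def parse_translate3d(style):
    """Return the (x, y) translation found in an inline CSS style string, or None."""
    match = TRANSLATE3D.search(style)
    if match is None:
        return None
    x, y, _z = (float(value) for value in match.groups())  # z is always 0px here
    return x, y


# Hypothetical style attribute of a balloon element
style = "margin-left: -12px; transform: translate3d(245px, 120px, 0px); z-index: 245;"
print(parse_translate3d(style))  # (245.0, 120.0)
```

A regex tolerates extra spaces and negative or fractional values, at the cost of being a little less obvious than plain string operations.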
(x1, y1) and (x2, y2) are the coordinates of balloons 1 and 2 respectively, c is the distance we're looking for, and a and b are the two remaining sides of the right triangle (with c as the hypotenuse): a = x2 - x1 and b = y2 - y1, therefore c = sqrt((x2 - x1)^2 + (y2 - y1)^2).
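As a quick worked example of the formula, take two made-up balloon positions chosen so the numbers come out round: a = 180 and b = 240, so c = sqrt(32400 + 57600) = 300. The standard library's math.hypot computes the same hypotenuse directly.

```python
import math


def distance(p1, p2):
    """Euclidean distance c between points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    a = x2 - x1  # horizontal leg of the right triangle
    b = y2 - y1  # vertical leg of the right triangle
    return math.sqrt(a ** 2 + b ** 2)


# Hypothetical balloon positions
print(distance((120, 45), (300, 285)))  # 300.0
print(math.hypot(300 - 120, 285 - 45))  # same result via the standard library
```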
Here's the Python code for that. The function takes the red balloon we want to find a pair for and the list of all green balloons as arguments. It calculates the distance between the red balloon and each green balloon, then returns the green balloon with the smallest distance.
```python
import math


# calculate the balloon from the 'targets' list that is closest to 'balloon'
def findClosestBalloon(balloon, targets):
    closest_distance = float("inf")
    closest_target = None
    x1, y1 = getBalloonCoord(balloon)
    for target in targets:
        x2, y2 = getBalloonCoord(target)
        a = abs(x2 - x1)  # horizontal leg
        b = abs(y2 - y1)  # vertical leg
        c = math.sqrt(a * a + b * b)  # hypotenuse = distance
        if c < closest_distance:
            closest_target = target
            closest_distance = c
    return closest_target


# get red/green balloon (x, y) coordinates from its inline CSS transform
def getBalloonCoord(balloon):
    styles = balloon['style'].split(";")
    for style in styles:
        style = style.split(":", 1)  # split only on the first colon
        if style[0].strip() == "transform":
            coords = (style[1].strip()
                      .replace("translate3d(", "")
                      .replace(")", "")
                      .replace("px", "")
                      .split(",")[0:2])
            # float() also copes with fractional pixel values
            return float(coords[0]), float(coords[1])
```
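To pair every red balloon rather than just one, the same search can be written more compactly with min() and a key function; math.dist (Python 3.8+) returns the Euclidean distance, i.e. the same c as the formula above. The balloon names and coordinates below are hypothetical, standing in for values already extracted from the style attributes, and closest_green is an illustrative helper name. Like the loop above, each red balloon is matched independently of the others.

```python
import math

# Hypothetical coordinates, already extracted from the balloons' style attributes
red = {"red_1": (100.0, 80.0), "red_2": (400.0, 300.0)}
green = {"green_1": (390.0, 310.0), "green_2": (130.0, 60.0), "green_3": (250.0, 500.0)}


def closest_green(red_xy, greens):
    """Name of the green balloon with the smallest Euclidean distance to red_xy."""
    return min(greens, key=lambda name: math.dist(red_xy, greens[name]))


pairs = {name: closest_green(xy, green) for name, xy in red.items()}
print(pairs)  # {'red_1': 'green_2', 'red_2': 'green_1'}
```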
Reading text from images
So what's the challenge here? Most of the row headers are noisy images containing text. To find out reliably which detail a row holds, we would need to read the text from the image. But is that really necessary?
You have to be clever!
We already have everything we need to identify, or rather make an educated guess about, what each table row represents. How? Through data analysis:
- Ship preference – only three possible values: Enclosed, Flatbed or SteepDeck
- Cargo preference – only two possible values: Urgent or Permit Required (spelled "Premit Required" on the page)
- State – 2 characters, non-numerical
- Zip Code – 5 characters, the last one always numerical
- Demand date – 10 characters, the 3rd and 6th are "-", the remaining ones numerical
- Address 2 – can be found in the balloon popup
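The rules above translate almost one-to-one into code. Here is a minimal sketch as a standalone function; classify_value and the sample values are hypothetical, and the date pattern assumes exactly the layout described (two digits, "-", two digits, "-", four digits). Values that match none of the rules return None, which is precisely the Address 1 / City / Cargo case discussed next.

```python
import re

SHIP_PREFERENCES = {"Enclosed", "Flatbed", "SteepDeck"}
CARGO_PREFERENCES = {"Urgent", "Premit Required"}  # "Premit" is the page's own spelling


def classify_value(text, address2=None):
    """Guess which form field a table value belongs to from the value alone.

    Returns None when the rules are not enough (Address 1, City, Cargo).
    """
    if address2 is not None and text == address2:
        return "Address 2"  # known from the balloon popup
    if text in SHIP_PREFERENCES:
        return "Ship preference"
    if text in CARGO_PREFERENCES:
        return "Cargo preference"
    if len(text) == 2 and not any(ch.isdigit() for ch in text):
        return "State"
    if re.fullmatch(r"\d{2}-\d{2}-\d{4}", text):
        return "Demand date"  # 10 characters, '-' as the 3rd and 6th
    if len(text) == 5 and text[-1].isdigit():
        return "Zip Code"
    return None


# Hypothetical row values
for value in ["Flatbed", "TX", "12-31-2019", "90210", "Springfield"]:
    print(value, "->", classify_value(value))
```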
That leaves Address 1, City and Cargo, for which there are no clear rules like the ones above. What now? Assuming the noise on the images is random, and knowing that the header texts differ in length, we can look at the average color of each image: presumably the 'Address 1' image, with more characters, contains more "ink" than the 'City' one. Another handy detail is that the images are embedded in the page as base64-encoded data, so they can be decoded directly from the src attribute.
```python
import base64
from io import BytesIO

import numpy as np
from PIL import Image

# getting image data
img_base64 = td_img['src'].replace("data:image/png;base64,", "")
# decoding image data, and reading the image
img = Image.open(BytesIO(base64.b64decode(img_base64)))
# calculating image mean
img_mean = np.mean(img)
```
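The replace() call above assumes the prefix is exactly data:image/png;base64,. A slightly more general sketch, standard library only, splits the data URI at its first comma and checks the header. decode_data_uri is a hypothetical helper, and the payload here is only the 8-byte PNG file signature rather than a real image.

```python
import base64


def decode_data_uri(uri):
    """Split a 'data:<mime>;base64,<payload>' URI into (mime type, decoded bytes)."""
    header, payload = uri.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# Stand-in payload: just the 8-byte PNG signature, not a complete image
uri = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
mime, raw = decode_data_uri(uri)
print(mime)     # image/png
print(raw[:4])  # b'\x89PNG'
```

With a real image, the decoded bytes would go to Image.open(BytesIO(raw)) just as in the snippet above.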
The results are promising: the 'City' image mean falls in the range (1.2, 1.75), 'Cargo' in (1.75, 2.17) and 'Address 1' in (2.17, 3). Note that the 'Address 2' image mean is very similar to that of 'Address 1', so Address 2 has to be excluded explicitly (its value is already known from the balloon popup). This gives us a complete set of rules to assign our variables.
```python
if len(tds.text) == 2 and not tds.text.isnumeric():
    demand_state = tds.text
elif tds.text == "Enclosed":
    demand_ship_preference = "Enclosed"
elif tds.text == "Flatbed":
    demand_ship_preference = "Flatbed"
elif tds.text == "SteepDeck":
    demand_ship_preference = "SteepDeck"
elif tds.text == "Premit Required":
    demand_cargo_preference = "Premit Required"
elif tds.text == "Urgent":
    demand_cargo_preference = "Urgent"
elif len(tds.text) == 10 and tds.text[2:3] == "-":
    demand_date = tds.text
elif len(tds.text) == 5 and tds.text[-1:].isnumeric():
    demand_zip = tds.text
elif 1.2 < img_mean < 1.75:
    demand_city = tds.text
elif 1.75 < img_mean < 2.17:
    demand_cargo = tds.text
elif 2.17 < img_mean < 3 and tds.text != demand_address2:
    demand_address1 = tds.text
```
We can treat supply data similarly.
The second approach is to read the image text with OCR. I used Tesseract OCR because, first, it is free and, second, it is widely considered one of the best free OCR engines. As mentioned, apart from the text the images contain plenty of random noise, so OCR initially didn't return good results. What can we do about it? Denoise it!
```python
import cv2
import numpy as np

img = np.array(img)  # convert the (RGBA) PIL image to an array
alpha = img[:, :, 3]  # extract the alpha channel
img = ~alpha  # invert black/white
_, blackAndWhite = cv2.threshold(img, 140, 255, cv2.THRESH_BINARY_INV)  # apply threshold with cv2
img = cv2.bitwise_not(blackAndWhite)  # invert again
```
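What do these steps do to a single pixel? THRESH_BINARY_INV sets a pixel to 0 when it is above the threshold and to maxval (255) otherwise, and ~ on a uint8 array computes 255 - value. Combined, a pixel ends up black (0) when its alpha is at least 115 and white (255) otherwise. The sketch below is an illustration only: it reproduces the pipeline in plain NumPy, with np.where standing in for cv2.threshold, so the cut-off can be checked on a tiny made-up array. binarize_alpha is a hypothetical name.

```python
import numpy as np


def binarize_alpha(rgba, thresh=140):
    """NumPy re-implementation of the cv2 denoising steps (illustration only)."""
    alpha = rgba[:, :, 3]  # alpha channel, dtype uint8
    inverted = ~alpha  # uint8 bitwise NOT == 255 - alpha
    # like cv2.THRESH_BINARY_INV: 0 where value > thresh, 255 elsewhere
    black_and_white = np.where(inverted > thresh, 0, 255).astype(np.uint8)
    return ~black_and_white  # invert again, like cv2.bitwise_not


# A 1x5 RGBA "image" whose alpha values span the cut-off
rgba = np.zeros((1, 5, 4), dtype=np.uint8)
rgba[0, :, 3] = [0, 100, 114, 115, 255]
print(binarize_alpha(rgba))  # [[255 255 255   0   0]]
```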
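The results are not stunning, but they are enough to improve Tesseract's hit rate.
However, this is still not enough. Some results are distorted, e.g. 'Slate' or '.Stale' instead of 'State'. Fortunately, the distance between two words can be measured numerically: the Levenshtein distance (or edit distance) is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into the other, and it is easy to use in Python. Here's the full code for text recognition; it returns the dictionary phrase closest to the OCR output.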
```python
import string

import cv2
import numpy as np
import pytesseract
from nltk.metrics import edit_distance

word_dict = ['State', 'Cargo', 'Ship preference', 'Zip Code', 'City',
             'Address 1', 'Address 2', 'Cargo preference', 'Shipping date']


def findClosestPhrase(img, word_dict):
    img = np.array(img)
    alpha = img[:, :, 3]  # extract the alpha channel
    img = ~alpha  # invert black/white
    _, blackAndWhite = cv2.threshold(img, 140, 255, cv2.THRESH_BINARY_INV)
    img = cv2.bitwise_not(blackAndWhite)  # invert again
    img_text = pytesseract.image_to_string(img, lang='eng')  # read text from image
    # remove punctuation, then surrounding whitespace (Tesseract output usually ends with a newline)
    img_text = img_text.translate(str.maketrans('', '', string.punctuation)).strip()
    words_distance = [edit_distance(img_text, x) for x in word_dict]  # Levenshtein distance
    return word_dict[words_distance.index(min(words_distance))]
```
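For the curious, the Levenshtein distance is a short dynamic-programming routine. The pure-Python sketch below is for illustration only (the script itself uses nltk's edit_distance, which with its default settings counts the same three operations at cost 1); levenshtein is a hypothetical name. It shows both distorted readings landing on 'State'. Note that the script strips punctuation before comparing, so '.Stale' would actually arrive as 'Stale'.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,  # delete cs
                current[j - 1] + 1,  # insert ct
                previous[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


word_dict = ['State', 'Cargo', 'Ship preference', 'Zip Code', 'City',
             'Address 1', 'Address 2', 'Cargo preference', 'Shipping date']

for ocr_text in ["Slate", ".Stale"]:
    best = min(word_dict, key=lambda w: levenshtein(ocr_text, w))
    print(ocr_text, "->", best, levenshtein(ocr_text, best))
# Slate -> State 1
# .Stale -> State 2
```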
The good things about the RPA Challenge are that you can attempt it as many times as you want and that it shows your score. Having scripts for both methods, I checked how accurate they are. Each script ran 100 times (updating 8,000 fields in total), and the results are as follows.
The method without OCR identified a field incorrectly 204 times out of 8,000 updated fields (97.45% accuracy). Tesseract, supported by denoising and the Levenshtein distance, missed only 16 times out of 8,000 fields (99.8% accuracy). Both results are pretty good, but I believe there's still plenty of room for improvement and 100% is within reach.
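The accuracy figures follow directly from the miss counts. A quick check, which also shows that 8,000 fields over 100 runs means 80 fields per run:

```python
def accuracy(misses, total):
    """Share of correctly filled fields, in percent."""
    return 100 * (total - misses) / total


runs, total_fields = 100, 8000
print(total_fields // runs)  # 80 fields per run
print(round(accuracy(204, total_fields), 2))  # 97.45 (no OCR)
print(round(accuracy(16, total_fields), 2))  # 99.8 (Tesseract)
```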
Updating form fields
Once all the contract data is assigned to variables, I inject a JavaScript snippet into the page to update the fields. I won't describe the method here; you can find it in my previous article.
You can find a video version here.
Of course, this challenge is not a real-life scenario, but it includes elements you may encounter on your RPA journey. I encourage you to give this one and the previous challenge a try!