Resume parsing

Want to talk about it?

It’s a boring time to be a recruiter. SAP knows how to do the job well, and there is nothing left to do. Position requests from managers are processed automatically and posted to internal and external web portals or sent to agencies. Feedback goes back to managers automatically. Everything is integrated. A CV is read from email, parsed to the bone, and stored in the candidate database. Interviews are initiated from a mobile phone, and rooms are reserved. Boring, no fun at all.

Everything is clear except the CV. We know every resume is built on a typical skeleton: personal info, contacts, work experience. Each part can be formalized, broken into its components, and analyzed against a number of factors and variants of appearance.

We understand that the first and last names may match the file name, are as a rule written without punctuation characters, start with a capital letter or are in all capitals, and sit in the top part of the document.
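The capitalization rule can be tried on a single line from the top of the document. The following is only a sketch: the report name, the variable names and the sample line are made up for illustration, and the pattern covers plain Latin letters only, so hyphenated or accented names would need a wider character set.

```abap
REPORT z_cv_name_sketch.

* Hypothetical sample: one line from the top part of a resume.
DATA lv_line  TYPE string VALUE `John SMITH`.
DATA lv_first TYPE string.
DATA lv_last  TYPE string.

* Two words and nothing else on the line. Each word is either
* Capitalized or ALL CAPS. FIND respects case by default.
FIND REGEX '^([A-Z][a-z]+|[A-Z]+)\s+([A-Z][a-z]+|[A-Z]+)$'
  IN lv_line
  SUBMATCHES lv_first lv_last.

IF sy-subrc = 0.
  WRITE: / 'First name:', lv_first.
  WRITE: / 'Last name:', lv_last.
ELSE.
  WRITE / 'No name candidate on this line.'.
ENDIF.
```

A hit here is only a candidate. Comparing it with the file name, as described above, would be a natural second check.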

We also understand that a contact phone number has a fixed number of digits, that its patterns are well known, and that it is placed somewhere near the name or e-mail address.
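A phone number can be approached in two steps: find a run of digits and separators, then count the digits. This is a rough sketch. The sample text, the names and the accepted range of 10 to 15 digits are assumptions for illustration (15 is the upper limit for international numbers, the lower bound is a guess), so the range should be adjusted to the country in question.

```abap
REPORT z_cv_phone_sketch.

* Hypothetical sample line from the header of a resume.
DATA lv_text   TYPE string VALUE `John Smith, +1 (202) 555-0143, john.smith@example.com`.
DATA lv_phone  TYPE string.
DATA lv_digits TYPE string.
DATA lv_len    TYPE i.

* Candidate: optional "+", a digit, at least 8 digits or separators
* (blank, parentheses, dot, hyphen), and a closing digit.
FIND REGEX '(\+?[0-9][0-9 ().-]{8,}[0-9])' IN lv_text SUBMATCHES lv_phone.

IF sy-subrc = 0.
  " Keep only the digits and check how many there are.
  lv_digits = lv_phone.
  REPLACE ALL OCCURRENCES OF REGEX '[^0-9]' IN lv_digits WITH ``.
  lv_len = strlen( lv_digits ).
  IF lv_len >= 10 AND lv_len <= 15.
    WRITE: / 'Phone:', lv_phone.
  ENDIF.
ENDIF.
```

Counting digits after stripping the separators keeps the pattern itself loose, so one expression covers several ways of writing the same number.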

We understand that work experience is a sequence of same-type blocks, each with a company, period, position and description of job functions. It’s just a table that can be retrieved from the CV somehow. Say the CV is exported in XML format, where we can easily find elements that appear more than once.
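In plain text, one plausible way to spot these repeating blocks is to look for the period, since it has the most regular shape. The sketch below assumes a period is written as two four-digit years joined by a hyphen. The sample text and all names are hypothetical, and real resumes will also use months, other dashes or words like "present".

```abap
REPORT z_cv_periods_sketch.

* Hypothetical work-experience fragment flattened to one string.
DATA lv_text    TYPE string VALUE `Acme Ltd, 2005 - 2008, Developer. Globex, 2008 - 2010, Team lead.`.
DATA lt_results TYPE match_result_tab.
DATA ls_result  TYPE match_result.
DATA lv_period  TYPE string.

* Assumption: a period is two years (19xx or 20xx) joined by a hyphen.
FIND ALL OCCURRENCES OF REGEX '(19|20)\d\d\s*-\s*(19|20)\d\d'
  IN lv_text RESULTS lt_results.

* Each hit is taken as the marker of one experience block.
LOOP AT lt_results INTO ls_result.
  lv_period = lv_text+ls_result-offset(ls_result-length).
  WRITE: / 'Period:', lv_period.
ENDLOOP.
```

The offsets in lt_results could then be used to cut the text into blocks, with the company and position searched for around each period.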

But how do we find all this in plain text? Elementary! There are so-called regular expressions. Google them. Let’s use them for resume parsing.

Here is the way to work with them in ABAP:
REPLACE ALL OCCURRENCES OF REGEX pattern
IN text
WITH new.

And this code checks whether a piece of text is a valid email address:

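DATA v_pattern TYPE string VALUE '^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$'.
DATA v_email TYPE string.   " the address to check; the name is assumed here
DATA lr_matcher TYPE REF TO cl_abap_matcher.
DATA v_success TYPE abap_bool.

lr_matcher = cl_abap_matcher=>create( pattern = v_pattern text = v_email ).
v_success = lr_matcher->match( ).

IF v_success = abap_false.
  MESSAGE 'Invalid email id' TYPE 'I'.
ENDIF.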
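The pattern above is anchored with ^ and $, so it tests a whole string. To pull addresses out of the running text of a CV, the same pattern without the anchors can be handed to FIND. This is a sketch under that assumption; the sample text and all names are hypothetical.

```abap
REPORT z_cv_email_sketch.

* Hypothetical contact line from a resume.
DATA lv_text    TYPE string VALUE `Contact: john.smith@example.com or j_smith@mail.example.org`.
DATA lt_results TYPE match_result_tab.
DATA ls_result  TYPE match_result.
DATA lv_email   TYPE string.

* Same pattern as above, minus the ^ and $ anchors.
FIND ALL OCCURRENCES OF REGEX '[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9}'
  IN lv_text RESULTS lt_results.

* Cut each match out of the text by offset and length.
LOOP AT lt_results INTO ls_result.
  lv_email = lv_text+ls_result-offset(ls_result-length).
  WRITE: / 'E-mail:', lv_email.
ENDLOOP.
```

The position of the first match is also useful on its own, since the phone number and the name are expected somewhere close to it.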


(c) Code samples are from SDN. Author details:
Author: Shaira Madhu
Company: Applexus Software Solutions (P) Ltd
Created on: 25 October 2010