Classify job titles (from 2 sources) into standard categories using scikit-learn, comparing 2-3 different approaches
Project detail
Create software in Python from scratch, to run on AWS EC2.
This software will take data from an .xls spreadsheet and return another .xls with a new column, “standardized role”.
We need to extract standard company roles (e.g. HR or Operations) from 2 fields, headline and title, and categorize each record
using a Python library, possibly scikit-learn.
We need to use TfidfVectorizer to find keywords in these 2 original fields,
and also clustering/classification to assign the category.
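
To illustrate the kind of approach we have in mind, here is a minimal sketch of one possible classifier: the headline and title are joined into one text, vectorized with TfidfVectorizer, and passed to a LogisticRegression model. The training rows, the combine_fields helper and the choice of LogisticRegression are assumptions made for illustration only, not part of the requirements.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples: (headline, title, standardized role).
# Real training data would come from manually categorized spreadsheet rows.
TRAINING_ROWS = [
    ("Experienced HR professional", "HR Manager", "HR"),
    ("Talent acquisition and recruiting", "Recruiter", "HR"),
    ("People and culture leader", "Human Resources Director", "HR"),
    ("Supply chain and logistics expert", "Operations Manager", "Operations"),
    ("Running warehouse operations", "Head of Operations", "Operations"),
    ("Process improvement and logistics", "Operations Analyst", "Operations"),
]


def combine_fields(headline, title):
    """Join the two source fields into a single string for vectorizing."""
    return f"{headline} {title}".strip()


texts = [combine_fields(h, t) for h, t, _ in TRAINING_ROWS]
labels = [role for _, _, role in TRAINING_ROWS]

# TF-IDF over single words and word pairs, followed by a linear classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Classify two unseen (hypothetical) records.
new_records = [
    ("HR business partner", "HR Manager"),
    ("Operations lead", "Operations Manager"),
]
predicted = model.predict([combine_fields(h, t) for h, t in new_records])
for (headline, title), role in zip(new_records, predicted):
    print(f"{title!r} / {headline!r} -> {role}")
```

A second or third approach (for example a different classifier on the same TF-IDF features, or a keyword lookup) could be built the same way, so that their outputs can be compared record by record.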
It would be nice to compare the results from 2-3 different approaches: if they all match, automatically categorize the record; otherwise, flag it for human review.
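
The agreement check could look roughly like the sketch below. The function name consensus_label, the sample predictions and the record IDs are hypothetical; the only rule taken from this brief is "all approaches match, or send to a human".

```python
def consensus_label(predictions):
    """Return (label, needs_review) for one record.

    predictions: list of labels, one per approach (2-3 expected).
    The label is accepted only when every approach agrees.
    """
    if not predictions:
        return None, True
    if len(set(predictions)) == 1:
        return predictions[0], False
    return None, True


# Hypothetical outputs of three approaches for three records.
per_record_predictions = {
    "row-1": ["HR", "HR", "HR"],
    "row-2": ["Operations", "HR", "Operations"],
    "row-3": ["Operations", "Operations", "Operations"],
}

results = {
    record_id: consensus_label(preds)
    for record_id, preds in per_record_predictions.items()
}

for record_id, (label, needs_review) in results.items():
    status = "needs human review" if needs_review else f"auto: {label}"
    print(record_id, status)
```

Records flagged for review would keep an empty “standardized role” value until a person fills it in.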