Job Statistics Scraper


This project collects and organizes data from several job-listing sources, then runs a descriptive analysis of that data and identifies the skills employers demand most frequently.

Description

This project performs data analysis, data manipulation and prediction on scraped job listings. The application is meant to help users who are looking for a job: it scrapes job posts, presents an exploratory analysis, and outputs the hard and soft skills that companies most commonly require for a set number of job titles.

Analytics

Exploratory Data Analysis

The steps I explore are the following:

  • Initial Data Exploration
  • Text Pre-Processing
    • Convert all text to lower case
    • Remove special characters, unnecessary punctuation and digits
    • Tokenize and remove stop words
    • Lemmatize the description words
  • Wordcloud visualizations
  • Visualizing n-gram distributions
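The text pre-processing steps listed above, plus the n-gram counts that feed the distribution plots, can be sketched in a few lines of Python. This is only an illustration using the standard library: the stop-word set, the `LEMMAS` lookup table, and the sample description are hypothetical stand-ins, and a real pipeline would more likely rely on a full stop-word list and a proper lemmatizer from an NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources (assumptions for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "with", "for", "to", "is", "are"}
LEMMAS = {"skills": "skill", "years": "year", "databases": "database"}


def preprocess(text):
    """Lower-case, strip non-letters, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    # Replace everything except ASCII letters and whitespace with a space;
    # this removes digits, punctuation and special characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lookup-table "lemmatization": unknown tokens are kept unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


def ngram_counts(tokens, n=2):
    """Count contiguous n-grams in a token list."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


description = "5+ years of experience with Python, SQL databases and strong communication skills!"
tokens = preprocess(description)
print(tokens)
print(ngram_counts(tokens, n=2).most_common(3))
```

The token list returned by `preprocess` is the kind of input a word cloud or n-gram bar chart would be built from.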

Pie Distribution

Classifiers

The problem is framed as a supervised learning problem. Several classifiers are tested and their results are presented. Before any machine learning technique is applied, the dataset has to be prepared. For this I did the following:

  • Bag-of-Words
  • Null accuracy checks
  • Optimizing BoW
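To make the first two preparation steps concrete, here is a minimal standard-library sketch of a Bag-of-Words representation and a null accuracy check. The toy documents are hypothetical, and using job titles as class labels is an assumption made for illustration; the helper names are not taken from the project.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Map each distinct token to a column index (sorted for determinism)."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(tokens, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


def null_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical toy data: tokenized descriptions and assumed job-title labels.
docs = [["python", "sql", "python"], ["excel", "sql"], ["python", "pandas"]]
labels = ["data engineer", "analyst", "data engineer"]

vocab = build_vocabulary(docs)
matrix = [bag_of_words(d, vocab) for d in docs]
print(vocab)
print(matrix)
print(null_accuracy(labels))
```

Null accuracy is the baseline any trained classifier has to beat: a model that scores no higher than always guessing the majority class has learned nothing useful from the features.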

Then I trained several classifiers. In more detail, the classifiers I tested are the following:

  • MultinomialNB
  • Support Vector Machines
  • Linear Support Vector Machine
  • Random Forest
  • Logistic Regression
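As an illustration of how the first classifier in this list works on Bag-of-Words counts, the sketch below implements a small multinomial Naive Bayes with Laplace smoothing in plain Python. It is not the project's code: the class name `TinyMultinomialNB`, the toy training data, and the job-title labels are all hypothetical, and the project itself presumably used a library implementation.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total = len(labels)
        return self

    def _log_score(self, doc, label):
        # log P(label) + sum over tokens of log P(token | label)
        score = math.log(self.class_counts[label] / self.total)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in doc:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest log score for one document."""
        return max(self.classes, key=lambda c: self._log_score(doc, c))


# Hypothetical toy training set.
train_docs = [
    ["python", "sql", "spark"],
    ["python", "airflow", "sql"],
    ["excel", "reporting", "sql"],
    ["excel", "dashboard", "reporting"],
]
train_labels = ["data engineer", "data engineer", "analyst", "analyst"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["python", "spark"]))
print(model.predict(["excel", "dashboard"]))
```

Working in log space avoids numeric underflow on long descriptions, and the smoothing term keeps a single unseen token from driving a class probability to zero.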

Simple Classifiers

In addition, more advanced classifiers were built using neural networks and word embeddings such as word2vec.

Advanced Classifiers