Job Statistics Scraper
The aim of this project is to collect and organize job-listing data from different sources, perform a descriptive analysis of that data, and identify the skills most frequently demanded by employers.
Description
This project performs data analysis, data manipulation and prediction on job listings scraped from several sources. The idea is that the application helps users who are in the process of job seeking: it scrapes job posts, presents an exploratory analysis, and outputs the most common hard and soft skills that companies require for a chosen set of job titles.
Exploratory Data Analysis
The steps I follow are listed below; a code sketch of the pre-processing pipeline follows the list.
- Initial Data Exploration
- Text Pre-Processing
- Convert all text to lower case
- Remove special characters, unnecessary punctuation and digits
- Tokenize and remove stop words
- Lemmatize the description words
- Wordcloud visualizations
- Visualizing n-gram distributions
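As a concrete illustration of the pre-processing steps, the sketch below assumes the scraped listings sit in a pandas DataFrame with a `description` column (the column name is an assumption, not necessarily what the code uses) and relies on NLTK for stop-word removal and lemmatization:

```python
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time download of the NLTK resources used below.
nltk.download("stopwords")
nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()


def clean_description(text: str) -> str:
    """Lower-case, strip special characters and digits, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters only: drops punctuation, digits, symbols
    tokens = text.split()                  # whitespace tokenization is enough after the cleaning above
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [LEMMATIZER.lemmatize(t) for t in tokens]
    return " ".join(tokens)


# Example on a tiny DataFrame of scraped posts.
df = pd.DataFrame({"description": ["5+ years of Python and SQL experience required!"]})
df["clean_description"] = df["description"].apply(clean_description)
print(df["clean_description"].iloc[0])  # -> "year python sql experience required"
```

The word clouds and n-gram distributions are then built on the cleaned text, for example with `wordcloud.WordCloud` and `sklearn.feature_extraction.text.CountVectorizer(ngram_range=(2, 2))`.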
Classifiers
The problem at hand is transformed into a supervised learning problem, and several classifiers are tested; their results are presented below. First, the dataset has to be prepared before any machine learning technique takes place. For this I did the following (a code sketch follows the list):
- Bag-of-Words
- Null accuracy checks
- Optimizing BoW
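A minimal sketch of this preparation, assuming a DataFrame `df` holding the cleaned descriptions and a `job_title` label column (both column names are assumptions): Bag-of-Words features via scikit-learn's `CountVectorizer`, plus a null-accuracy baseline taken from the majority class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

X_text = df["clean_description"]   # cleaned job descriptions
y = df["job_title"]                # target label (assumed column name)

X_train_text, X_test_text, y_train, y_test = train_test_split(
    X_text, y, test_size=0.2, random_state=42, stratify=y
)

# Plain Bag-of-Words; min_df and ngram_range are the kind of knobs tuned when optimizing BoW.
vectorizer = CountVectorizer(min_df=2, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Null accuracy: the score obtained by always predicting the most frequent job title.
null_accuracy = y_test.value_counts(normalize=True).max()
print(f"Null accuracy baseline: {null_accuracy:.3f}")
```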
Then I trained several classifiers. In more detail, the classifiers that I tested are the following (compared in the sketch after this list):
- MultinomialNB
- Support Vector Machines
- Linear Support Vector Machine
- Random Forest
- Logistic Regression
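Continuing from the Bag-of-Words split above, a compact way to compare these classifiers is to loop over them with shared train/test data; the hyperparameters below are illustrative and not necessarily the ones used in the project.

```python
# Train and compare the classical classifiers on the BoW features from above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC, LinearSVC

classifiers = {
    "MultinomialNB": MultinomialNB(),
    "SVM (RBF kernel)": SVC(),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: test accuracy = {acc:.3f}")
```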
In addition, more advanced classifiers were created based on word embeddings, neural networks and word2vec.
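One possible setup for the word2vec-based models, sketched here under the assumption that gensim is used: train word2vec on the tokenized descriptions and average the word vectors into fixed-length document vectors.

```python
# Word2vec document vectors via mean pooling (a sketch; gensim assumed).
import numpy as np
from gensim.models import Word2Vec

tokenized_docs = [doc.split() for doc in df["clean_description"]]

w2v = Word2Vec(sentences=tokenized_docs, vector_size=100, window=5, min_count=2, workers=4)


def document_vector(tokens, model):
    """Average the word2vec vectors of the tokens present in the vocabulary."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)


doc_embeddings = np.vstack([document_vector(tokens, w2v) for tokens in tokenized_docs])
print(doc_embeddings.shape)  # (n_documents, 100)
```

The resulting `doc_embeddings` matrix can then replace the Bag-of-Words features as input to any of the classifiers above or to a small feed-forward neural network.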