Real Life Application I
Build Skill Recommendation for a Job Title
This is a very basic skill recommendation engine, built for practice and to illustrate how a small amount of logic can yield useful insights when the data is interpreted effectively. We have taken a sample of just 1000, so we are not concerned about the accuracy of the output here.
Learnings
Reading a .json file and writing a JSON-format file in Python
Practice with for loops, dictionaries, and functions, along with error handling
Finally, we will create a recommendation engine
This will give a graph-like data structure that you can practice with as well: link to be updated for details
Sample Dataset
The .json file contains multilayer nested lists
The database is prepared by scraping 3000 LinkedIn profiles
Each element (itself a list) of the outer list corresponds to one individual profile on LinkedIn
Within each profile, the first element is the list of job titles and the second element is the list of skills
Example: [[j1,s1], [j2,s2], [j3,s3]] holds 3 individual profiles, where j1 is the list of job titles for the first profile and s1 is the list of skills for the first profile
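Reading and writing such a file can be sketched as follows. This is a minimal example under assumptions: the file name `profiles.json` is hypothetical, and the two-profile list is only a stand-in for the real dataset, written out first so the snippet runs on its own.

```python
import json
import os
import tempfile

# Tiny stand-in for the real dataset: one inner list per profile,
# holding [job_titles, skills].
sample_dataset = [
    [["full stack developer", "java developer"], ["java", "maven", "angular"]],
    [["product manager", "java developer"], ["Kibana", "system design", "java"]],
]

# Hypothetical file name; the real dataset file may be named differently.
path = os.path.join(tempfile.gettempdir(), "profiles.json")

# Write the nested list in JSON format.
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample_dataset, f)

# Read it back; json.load returns ordinary Python lists.
with open(path, encoding="utf-8") as f:
    profiles = json.load(f)

# Each profile unpacks into its two inner lists.
for job_titles, skills in profiles:
    print(job_titles, "->", skills)
```

Because JSON arrays map directly to Python lists, the nested structure survives the round trip unchanged.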
# Sample dataset and output here
sample_dataset = [
    [["full stack developer", "java developer"], ["java", "maven", "angular"]],
    [["product manager", "java developer"], ["Kibana", "system design", "java"]],
    [["python developer", "full stack developer"], ["python", "django", "system design", "angular"]],
]
output_dictionary = {
    'full stack developer': {
        'count': 2,
        'skills': [('angular', 2),
                   ('django', 1),
                   ('java', 1),
                   ('python', 1),
                   ('maven', 1),
                   ('system design', 1)]},
    'java developer': {
        'count': 2,
        'skills': [('java', 2),
                   ('maven', 1),
                   ('Kibana', 1),
                   ('angular', 1),
                   ('system design', 1)]},
    'product manager': {
        'count': 1,
        'skills': [('java', 1), ('Kibana', 1), ('system design', 1)]},
    'python developer': {
        'count': 1,
        'skills': [('django', 1),
                   ('python', 1),
                   ('angular', 1),
                   ('system design', 1)]},
}
# We will turn this into a function that returns the output shown below
# If you get stuck, look at the application code first and then try it yourself
What and How to do?
Create a dictionary, holding a list of tuples, with the following structure:
# For "full stack developer" and a limit of 20 skills, the output is:
# Within skills we use a list of tuples to keep the skills ordered by count
# A plain dictionary does not express a ranking (and was unordered before Python 3.7), so it would make the output less useful
{'count': 37,
 'skills': [('java', 0.5945945945945946),
            ('javascript', 0.24324324324324326),
            ('reactjs', 0.1891891891891892),
            ('spring framework', 0.1891891891891892),
            ('fullstack', 0.16216216216216217),
            ('spring boot', 0.16216216216216217),
            ('microservices', 0.16216216216216217),
            ('developer', 0.16216216216216217),
            ('css', 0.13513513513513514),
            ('angular', 0.13513513513513514),
            ('html', 0.13513513513513514),
            ('ios', 0.10810810810810811),
            ('ui', 0.10810810810810811),
            ('php', 0.10810810810810811),
            ('c#', 0.10810810810810811),
            ('advance java', 0.10810810810810811),
            ('android', 0.10810810810810811),
            ('sql', 0.10810810810810811),
            ('ui_html', 0.08108108108108109),
            ('scala', 0.08108108108108109)]}
To do it:
Create a dictionary within a dictionary
where the outermost dictionary keys are the job titles, and the first-level dictionary keys are the skills
For each job title, keep a count of every skill that appears alongside it, stored as the values
Keep track of the count of each job title as well
Finally, divide the count of each skill for a job title by the job title count to get the degree of association between the job title and the skill
Store the result as a JSON file
Application Github link
Flask API - Recommendation Engine
Objective:
Create a Flask API that takes a job title and the number of skills to return as input, and returns the skills corresponding to that job title
How to do it?
Use the previously built model for this purpose: load the stored JSON file
Check whether the job title is present as a key. If it is, return the top skills; otherwise return the error "model needs to be retrained"
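A minimal sketch of such an API is shown below. The route `/skills`, the query parameters `title` and `n`, the default of 10 skills, and the 404 status for a missing title are all assumptions made for illustration; the in-memory `model` stands in for the dictionary you would load from the stored JSON file.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around an already-loaded model dictionary."""
    app = Flask(__name__)

    @app.route("/skills")
    def skills():
        # Hypothetical query parameters: ?title=<job title>&n=<number of skills>
        title = request.args.get("title", "").strip()
        n = request.args.get("n", default=10, type=int)
        if title not in model:
            # Unknown job title: signal that the model has to be retrained.
            return jsonify(error="model needs to be retrained"), 404
        top = model[title]["skills"][:max(n, 0)]
        return jsonify(title=title, skills=top)

    return app


# Stand-in for the stored model; in practice load it with json.load
# from the JSON file written by the model-building step.
model = {
    "java developer": {"count": 2, "skills": [["java", 1.0], ["maven", 0.5]]},
}
app = create_app(model)

# Exercise the endpoint without starting a server.
client = app.test_client()
found = client.get("/skills", query_string={"title": "java developer", "n": 1})
missing = client.get("/skills", query_string={"title": "astronaut"})
print(found.get_json())
print(missing.status_code, missing.get_json())
```

Passing the model into `create_app` keeps file loading separate from request handling, so the endpoint can be tried with Flask's test client before any server is started. To serve it for real, call `app.run()` or use the `flask run` command.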
If you want to read more about RESTful API, refer to the following link
Application Github Link
Web Scraper (without Selenium)
This is done for web pages with static content only (HTML and CSS)
For dynamic content rendered by JavaScript, we need a tool that drives a browser, such as Selenium (Scrapy is another well-known scraping framework, but it does not execute JavaScript on its own)
Objective
Create the best-selling books database with the following information:
Name of the book
Author of the book
Language
First edition year
Genre
First Paragraph of the book's Wikipedia page
Our aim is to collect as much information as we can, but not every field will be present for every book, so save an empty string as the default value when a piece of information is not available
Packages to be Used
BeautifulSoup
requests
time
How to do it?
We will store all the information in a dictionary, where the key is the book name and the value holds all attributes related to the book
Use the following page as the starting point for different books: https://en.wikipedia.org/wiki/List_of_best-selling_books
After getting the list of books and the links to their Wikipedia pages, store whatever information this list page provides in the dictionary, then follow each link to collect the remaining information from the book's own Wikipedia page
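The dictionary layout described above could look like the sketch below. The field names, the extra `wiki_url` field, and the helper `new_book_record` are hypothetical choices for illustration, and the book shown is placeholder data rather than a real entry.

```python
# Hypothetical field names covering the attributes listed in the objective,
# plus the link to the book's own Wikipedia page.
BOOK_FIELDS = (
    "author",
    "language",
    "first_edition_year",
    "genre",
    "first_paragraph",
    "wiki_url",
)


def new_book_record(**known):
    """Return a record with every field present, defaulting to ""."""
    record = dict.fromkeys(BOOK_FIELDS, "")
    for field, value in known.items():
        if field not in record:
            raise KeyError(f"unknown field: {field}")
        record[field] = value or ""  # missing values become empty strings
    return record


books = {}  # book name -> attributes

# Placeholder data: only part of the information is known at first.
books["Example Book"] = new_book_record(
    author="Example Author",
    wiki_url="/wiki/Example_Book",
)

# Fill in more fields later, after visiting the book's own page.
books["Example Book"]["genre"] = "Example genre"
print(books["Example Book"])
```

Creating every record with all fields set to an empty string means later code can read any attribute without first checking whether it exists.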
Steps to follow for Scraping
Use the requests package to load the static content of the page; the content is returned as a string
It is difficult to pull individual attributes out of a raw string, so we use BeautifulSoup for the extraction
To identify the field we want to scrape, open the page in the browser, right-click the item, and click Inspect. Selecting the item shows its tag and attributes, which help in getting the exact content with BeautifulSoup
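These steps can be sketched as follows. The selector `table.wikitable tr` and the column order (book first, author second) are assumptions that should be confirmed with the browser's Inspect tool, the function names are hypothetical, and the HTML string is a hand-written stand-in so that the parsing step can be tried without a network request.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_html(url, delay=1.0):
    """Download a static page and return its HTML as a string."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    time.sleep(delay)  # pause between requests to go easy on the server
    return response.text


def parse_book_table(html):
    """Extract {book name: {"author": ..., "link": ...}} from table rows."""
    soup = BeautifulSoup(html, "html.parser")
    books = {}
    # Assumed layout: rows of a table with class "wikitable".
    for row in soup.select("table.wikitable tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # header row or a row without enough columns
        link = cells[0].find("a")
        name = cells[0].get_text(strip=True)
        books[name] = {
            "author": cells[1].get_text(strip=True),
            # Default to an empty string when the book has no linked page.
            "link": link["href"] if link and link.has_attr("href") else "",
        }
    return books


# Hand-written stand-in for a downloaded page.
sample_html = """
<table class="wikitable">
  <tr><th>Book</th><th>Author(s)</th></tr>
  <tr><td><i><a href="/wiki/Example_Book">Example Book</a></i></td>
      <td>Example Author</td></tr>
  <tr><td>Unlinked Book</td><td></td></tr>
</table>
"""
books = parse_book_table(sample_html)
print(books)

# For the real page, something like:
# html = fetch_html("https://en.wikipedia.org/wiki/List_of_best-selling_books")
# books = parse_book_table(html)
```

Keeping the download and the parsing in separate functions makes it possible to test the extraction logic on a saved or hand-written page before sending any requests.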
Application Github Link
to be updated....