Real Life Application I

Build Skill Recommendation for a Job Title

This is a very basic skill recommendation engine, built for practice and illustration. It shows how a small amount of logic can yield useful insights when the data is interpreted effectively. The sample is small (just 1000), so we are not concerned with the accuracy of the output here.

Learnings

  • Reading a .json file and writing a JSON-format file in Python

  • Practice with for loops, dictionaries, and functions, along with error handling

  • Finally, we will create a recommendation engine

  • This will give a graph-like data structure that you can practice with as well: link to be updated for details

Sample Dataset

LinkedIn profiles Job Title and Skill Data - 3000 Profiles
  • The .json file contains multilayer nested lists

  • The database was prepared by scraping 3000 LinkedIn profiles

  • Each element of the outer list is itself a list and corresponds to one individual LinkedIn profile

  • Within each profile's list, the first element is the list of job titles and the second element is the list of skills

  • Example: [[j1,s1], [j2,s2], [j3,s3]], this list has 3 individual profiles, j1 is a list of job titles for the first profile, and s1 is a list of skills of the first profile
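As a minimal sketch of reading and writing this nested-list format with Python's json module, the example below round-trips two toy profiles through a file. The file name `profiles_sample.json` and the helper `load_profiles` are placeholders for illustration, not the real dataset file or the author's code.

```python
import json
import os
import tempfile

# Two toy profiles in the [[job_titles, skills], ...] layout described above.
profiles = [
    [["full stack developer", "java developer"], ["java", "maven", "angular"]],
    [["product manager", "java developer"], ["Kibana", "system design", "java"]],
]

# Hypothetical file name; a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "profiles_sample.json")

# Write the nested lists to disk as JSON.
with open(path, "w", encoding="utf-8") as f:
    json.dump(profiles, f)


def load_profiles(file_path):
    """Return the parsed profiles, or an empty list if the file is missing or invalid."""
    try:
        with open(file_path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError) as exc:
        print(f"Could not load {file_path}: {exc}")
        return []


loaded = load_profiles(path)
missing = load_profiles(path + ".does_not_exist")

# Each profile unpacks into its list of job titles and its list of skills.
for job_titles, skills in loaded:
    print(job_titles, "->", skills)
```

JSON arrays come back as Python lists, so the loaded data has the same shape as what was written.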

# Sample dataset and output here
sample_dataset = [
    [["full stack developer", "java developer"], ["java", "maven", "angular"]],
    [["product manager", "java developer"], ["Kibana", "system design", "java"]],
    [["python developer", "full stack developer"], ["python", "django", "system design", "angular"]],
]

output_dictionary = {'full stack developer': {'count': 2,
  'skills': [('angular', 2),
   ('django', 1),
   ('java', 1),
   ('python', 1),
   ('maven', 1),
   ('system design', 1)]},
 'java developer': {'count': 2,
  'skills': [('java', 2),
   ('maven', 1),
   ('Kibana', 1),
   ('angular', 1),
   ('system design', 1)]},
 'product manager': {'count': 1,
  'skills': [('java', 1), ('Kibana', 1), ('system design', 1)]},
 'python developer': {'count': 1,
  'skills': [('django', 1),
   ('python', 1),
   ('angular', 1),
   ('system design', 1)]}}
   
# We will write a function that produces an output like the one shown below.
# If you get stuck, look at the application code first and then try it yourself.

What and How to do?

  • Create a dictionary keyed by job title, where each value has the following structure (a count plus a list of skill tuples):

# For "full stack developer" with a limit of 20 skills, the output is:
# "skills" is a list of tuples so that the skills stay ordered by score.
# A dictionary is not a sorted structure, so storing the skills in one would lose the ranking and reduce the value of the dataset.
 
 {'count': 37,
 'skills': [('java', 0.5945945945945946),
  ('javascript', 0.24324324324324326),
  ('reactjs', 0.1891891891891892),
  ('spring framework', 0.1891891891891892),
  ('fullstack', 0.16216216216216217),
  ('spring boot', 0.16216216216216217),
  ('microservices', 0.16216216216216217),
  ('developer', 0.16216216216216217),
  ('css', 0.13513513513513514),
  ('angular', 0.13513513513513514),
  ('html', 0.13513513513513514),
  ('ios', 0.10810810810810811),
  ('ui', 0.10810810810810811),
  ('php', 0.10810810810810811),
  ('c#', 0.10810810810810811),
  ('advance java', 0.10810810810810811),
  ('android', 0.10810810810810811),
  ('sql', 0.10810810810810811),
  ('ui_html', 0.08108108108108109),
  ('scala', 0.08108108108108109)]}
  • To do it:

    • Create a dictionary within a dictionary

    • The outermost dictionary keys are the job titles, and the first-level dictionary keys are the skills

    • For each job title, keep track of how often each skill appears alongside it, storing the count as the value

    • Keep track of the count of each job title as well

    • Finally, divide the count of each skill for a job title by the job title count to get the degree of association between the job title and the skill

    • Store the result as a JSON file
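The steps above can be sketched roughly as follows. This is one possible implementation, not the application code itself: the function name `build_skill_model`, the file name `skill_model.json`, the tie-breaking by skill name, and the choice to count each title and skill at most once per profile are all assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict


def build_skill_model(profiles):
    """Return {job_title: {"count": n, "skills": [(skill, ratio), ...]}}."""
    title_counts = Counter()
    skill_counts = defaultdict(Counter)  # job title -> Counter of skills

    for profile in profiles:
        try:
            job_titles, skills = profile
            # Assumption: count each title and skill at most once per profile.
            titles, skills = set(job_titles), set(skills)
        except (TypeError, ValueError):
            continue  # skip malformed profiles
        for title in titles:
            title_counts[title] += 1
            skill_counts[title].update(skills)

    model = {}
    for title, count in title_counts.items():
        # Highest count first; ties are broken alphabetically (an arbitrary choice).
        ranked = sorted(skill_counts[title].items(), key=lambda item: (-item[1], item[0]))
        model[title] = {
            "count": count,
            "skills": [(skill, n / count) for skill, n in ranked],
        }
    return model


sample_dataset = [
    [["full stack developer", "java developer"], ["java", "maven", "angular"]],
    [["product manager", "java developer"], ["Kibana", "system design", "java"]],
    [["python developer", "full stack developer"], ["python", "django", "system design", "angular"]],
]

model = build_skill_model(sample_dataset)

# Hypothetical output file; json.dump writes the tuples as two-element lists.
model_path = os.path.join(tempfile.mkdtemp(), "skill_model.json")
with open(model_path, "w", encoding="utf-8") as f:
    json.dump(model, f, indent=2)

print(model["full stack developer"])
```

Because JSON has no tuple type, the skills come back as `[skill, ratio]` lists when the file is loaded again; their order is preserved.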

Flask API - Recommendation Engine

Objective:

  • Create a Flask API that takes a job title and the number of skills to return as input, and returns the skills corresponding to that job title

How to do it?

  • Use the previously built model for this purpose: load the stored JSON file

  • Check whether the job title is present as a key. If it is, return the top skills; otherwise return an error such as "model needs to be retrained"

  • If you want to read more about RESTful API, refer to the following link
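A minimal sketch of such an API is shown below, assuming the model has the job title to count/skills structure described earlier. The route `/skills`, the query parameters `job_title` and `limit`, the default limit of 10, the 404 status, and the `create_app` helper are illustrative choices, not part of the original application. A toy model stands in for the stored JSON file, which would normally be read with `json.load`.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around an already-loaded model dictionary."""
    app = Flask(__name__)

    @app.route("/skills")
    def recommend_skills():
        title = request.args.get("job_title", "").strip()
        # Falls back to 10 when "limit" is missing or not an integer.
        limit = request.args.get("limit", default=10, type=int)
        if title not in model:
            return jsonify(error="model needs to be retrained"), 404
        top_skills = model[title]["skills"][:max(limit, 0)]
        return jsonify(job_title=title, count=model[title]["count"], skills=top_skills)

    return app


# Toy model; skills are [skill, ratio] lists, as they would be after loading JSON.
toy_model = {
    "java developer": {
        "count": 2,
        "skills": [["java", 1.0], ["angular", 0.5], ["maven", 0.5]],
    }
}
app = create_app(toy_model)

# Exercise the endpoint without starting a server, using Flask's test client.
client = app.test_client()
response = client.get("/skills", query_string={"job_title": "java developer", "limit": 2})
missing = client.get("/skills", query_string={"job_title": "astronaut"})
print(response.get_json())
print(missing.status_code, missing.get_json())
```

To serve real requests, the same `app` object can be started with `app.run()` or the `flask run` command.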

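Web Scraper (without Selenium)

  • This approach works for web pages with static content only (HTML and CSS)

  • For dynamic content (rendered by JavaScript), we need a tool that drives a real browser, such as Selenium. Scrapy is another well-known scraping framework, but on its own it does not execute JavaScript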

Objective

  • Create the best-selling books database with the following information:

    • Name of the book

    • Author of the book

    • Language

    • First edition year

    • Genre

    • First Paragraph of the book's Wikipedia page

Our aim is to collect as much information as we can, but not every field will be available for every book, so store an empty string as the default value when a piece of information is missing.

Packages to be Used

  • BeautifulSoup

  • requests

  • time

How to do it?

  • We will store all the information in a dictionary, where the key is the book name and the value holds all the attributes related to that book

  • Use the following page as the starting point for different books: https://en.wikipedia.org/wiki/List_of_best-selling_books

  • After getting the list of books and the links to their Wikipedia pages, first store whatever can be collected from the list page itself in the dictionary, then follow each link to collect the remaining information from the book's own Wikipedia page
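One way to hold this data is sketched below: a dictionary keyed by book name, where every attribute defaults to an empty string until a value is found. The field names, the extra `wiki_url` field, and the helper `new_book_record` are assumptions for illustration, and "Example Book" is a placeholder rather than real scraped data.

```python
# Attributes tracked for each book; "wiki_url" is an added convenience field.
BOOK_FIELDS = (
    "author",
    "language",
    "first_edition_year",
    "genre",
    "first_paragraph",
    "wiki_url",
)


def new_book_record(**known):
    """Return a record with every field present, defaulting to an empty string."""
    record = dict.fromkeys(BOOK_FIELDS, "")
    for field, value in known.items():
        # Ignore unknown fields and empty values so the defaults stay in place.
        if field in record and value:
            record[field] = str(value).strip()
    return record


books = {}
# Step 1: fill in whatever the list page provides.
books["Example Book"] = new_book_record(author="Jane Doe", language="English")
# Step 2: add details found later on the book's own page.
books["Example Book"]["first_paragraph"] = "Example Book is a placeholder."

print(books["Example Book"])
```

Creating every field up front means later code can read any attribute without checking whether the key exists.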

Steps to follow for Scraping

  • Use the requests package to load the static content of the page; the content is returned as a string

  • It is difficult to pull individual attributes out of a raw string, so we use BeautifulSoup for the extraction

  • To identify the field we want to scrape, open the page in the browser, right-click the item, and choose Inspect. Selecting the item will show its properties (tag, class, id), which help in getting the exact content with BeautifulSoup
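The steps above can be sketched as follows. The function names, the User-Agent string, the one-second delay, and the assumption that the book tables carry the `wikitable` class with a header row are all illustrative; check the real page with the browser inspector before relying on them. The demonstration parses a small inline HTML snippet instead of the live page, so it runs without network access.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_html(url, delay_seconds=1.0):
    """Download a page and return its HTML as a string, pausing first to be polite."""
    time.sleep(delay_seconds)
    response = requests.get(url, headers={"User-Agent": "book-scraper-tutorial/0.1"}, timeout=10)
    response.raise_for_status()
    return response.text


def parse_book_rows(html):
    """Return one dict per table row, keyed by the table's own header text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table", class_="wikitable"):
        header_row = table.find("tr")
        if header_row is None:
            continue
        headers = [th.get_text(strip=True) for th in header_row.find_all("th")]
        for tr in table.find_all("tr")[1:]:
            cells = tr.find_all(["td", "th"])
            if len(cells) != len(headers):
                continue  # skip rows with merged or missing cells
            row = {h: c.get_text(" ", strip=True) for h, c in zip(headers, cells)}
            link = cells[0].find("a", href=True)
            row["link"] = link["href"] if link else ""  # empty string when absent
            rows.append(row)
    return rows


def first_paragraph(html):
    """Return the text of the first non-empty <p> element, or an empty string."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        text = p.get_text(" ", strip=True)
        if text:
            return text
    return ""


# Placeholder HTML that mimics the assumed structure; not real scraped data.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Book</th><th>Author(s)</th><th>Genre</th></tr>
  <tr><td><i><a href="/wiki/Example_Book">Example Book</a></i></td><td>Jane Doe</td><td></td></tr>
</table>
<p></p>
<p>Example Book is a placeholder.</p>
"""

rows = parse_book_rows(sample_html)
intro = first_paragraph(sample_html)
print(rows)
print(intro)
```

For the real task, `fetch_html` would be called on the list page and then on each book's page. The links found in the table are relative (for example `/wiki/...`), so they need to be joined with the site's base URL, for instance with `urllib.parse.urljoin`.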

to be updated....
