Real Life Application I
This is a very basic skill-recommendation engine, built purely for practice and illustration: it shows how a small piece of logic can produce useful insights when data is interpreted effectively. The sample size is only 1000 profiles, so we are not concerned with the accuracy of the output here.
Reading a .json file and writing a JSON-format file in Python
Practice with for loops, dictionaries, and functions, along with error handling
Finally, we will create a recommendation engine
This gives a graph-like data structure that you can practice with as well: link to be updated for details
The .json file contains multilayer nested lists
The database is prepared by scraping 3000 LinkedIn profiles
Each element (a list) of the outer list corresponds to one individual LinkedIn profile
Within each inner list, the first element is a list of job titles and the second is a list of skills
Example: [[j1,s1], [j2,s2], [j3,s3]], this list has 3 individual profiles, j1 is a list of job titles for the first profile, and s1 is a list of skills of the first profile
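The read/write steps above can be sketched as follows. The file name `profiles.json` and the sample records are assumptions for illustration, not part of the original dataset:

```python
import json

# Hypothetical sample matching the described structure:
# each element is [job_titles, skills] for one profile.
sample = [
    [["Data Analyst", "Analyst"], ["SQL", "Excel", "Python"]],
    [["Software Engineer"], ["Python", "Git"]],
]

# Write the nested list out, then read it back.
with open("profiles.json", "w") as f:
    json.dump(sample, f)

with open("profiles.json") as f:
    profiles = json.load(f)

for job_titles, skills in profiles:
    print(job_titles, "->", skills)
```

Because JSON lists map directly to Python lists, the multilayer nesting survives the round trip unchanged.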
Create a dictionary of dictionaries with the following structure:
To do it:
Create a dictionary within a dictionary
where the outermost dictionary keys are the job titles, and the first-level dictionary keys are the skills
Corresponding to each job title, whatever skill you are getting, keep track of the count as values
Keep track of the count of each job title as well
Finally, divide the count of each skill for a job title by the job-title count to get the degree of association between the job title and the skill
Store the result as a JSON file
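The counting steps above can be sketched like this; the sample profiles and the output file name `model.json` are assumptions for illustration:

```python
import json
from collections import defaultdict

# Hypothetical sample in the [[job_titles, skills], ...] shape.
profiles = [
    [["Data Analyst"], ["SQL", "Python"]],
    [["Data Analyst"], ["SQL", "Excel"]],
    [["Software Engineer"], ["Python", "Git"]],
]

skill_counts = defaultdict(lambda: defaultdict(int))  # job title -> skill -> count
title_counts = defaultdict(int)                       # job title -> count

for job_titles, skills in profiles:
    for title in job_titles:
        title_counts[title] += 1
        for skill in skills:
            skill_counts[title][skill] += 1

# Degree of association = skill count / job-title count.
model = {
    title: {skill: cnt / title_counts[title] for skill, cnt in skills.items()}
    for title, skills in skill_counts.items()
}

with open("model.json", "w") as f:
    json.dump(model, f)
```

With the sample above, "SQL" appears in both "Data Analyst" profiles, so its association score is 1.0, while "Python" appears in one of two, giving 0.5.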
Create a Flask API, where you take job title and number of skills to return as input, and return skills corresponding to the job title
Check if the key is present; if so, return the top skills, else return an error such as "model needs to be retrained"
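A minimal sketch of such an API, assuming the model is stored in a file named `model.json` and using hypothetical route and parameter names (`/skills`, `job_title`, `n`):

```python
import json
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assume model.json maps job title -> {skill: association score};
# fall back to an empty model if the file has not been created yet.
try:
    with open("model.json") as f:
        MODEL = json.load(f)
except FileNotFoundError:
    MODEL = {}

@app.route("/skills")
def skills():
    title = request.args.get("job_title", "")
    n = int(request.args.get("n", 5))
    if title not in MODEL:
        # Unknown job title: the model has no data for it yet.
        return jsonify({"error": "model needs to be retrained"}), 404
    # Rank skills by association score and return the top n.
    top = sorted(MODEL[title].items(), key=lambda kv: kv[1], reverse=True)[:n]
    return jsonify({"job_title": title, "skills": [s for s, _ in top]})

# To run locally: app.run(debug=True)
```

A request like `GET /skills?job_title=Data Analyst&n=3` would then return the three most associated skills, or the retrain error for an unseen title.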
If you want to read more about RESTful API, refer to the following link
This works only for web pages with static content (HTML and CSS)
For dynamic content (JavaScript), we need browser-automation or crawling tools (Selenium and Scrapy are well known)
Create the best-selling books database with the following information:
Name of the book
Author of the book
Language
First edition year
Genre
First Paragraph of the book's Wikipedia page
Our aim is to collect as much information as we can, but not every attribute will necessarily be present, so save an empty string as the default value when information is not available
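The empty-string default can be handled neatly with `dict.get`. The field names and the helper `make_record` below are illustrative choices, not prescribed by the exercise:

```python
# Attributes we try to collect for each book.
FIELDS = ["author", "language", "first_edition_year", "genre", "first_paragraph"]

def make_record(**found):
    """Build a book record, filling absent attributes with empty strings."""
    return {field: found.get(field, "") for field in FIELDS}

books = {}
# Only some attributes were found for this book; the rest default to "".
books["A Tale of Two Cities"] = make_record(
    author="Charles Dickens", language="English"
)
```

This keeps every record uniform, so later code can index any field without `KeyError` checks.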
BeautifulSoup
requests
time
We will store all the information as a dictionary, where each key is a book name and the value holds all attributes related to that book
After getting the list of books and their Wikipedia links, store whatever information can be collected from the listing page in the dictionary, then follow each link to gather the remaining details from the book's own Wikipedia page
Use the requests package to load the static content of the page; the content arrives as a string
Extracting individual attributes from a raw string is difficult, so we use BeautifulSoup for the extraction
To identify the field we want to scrape, open the page in the browser, right-click, and choose Inspect. Selecting an item shows its tag and attributes, which helps in getting the exact content with BeautifulSoup
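A sketch of the extraction step. To stay runnable offline it parses a small inline HTML snippet shaped like a Wikipedia article; with a live page you would fetch the string first with `requests.get(url).text`. The tag names (`h1#firstHeading`, `div.mw-parser-output`) are what browser inspection typically shows on Wikipedia, but treat them as assumptions to verify:

```python
from bs4 import BeautifulSoup

# Inline stand-in for html = requests.get(url).text
html = """
<html><body>
  <h1 id="firstHeading">A Tale of Two Cities</h1>
  <div class="mw-parser-output">
    <p>A Tale of Two Cities is an 1859 historical novel by Charles Dickens.</p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Use the tag/attribute pair found via Inspect; default to "" when absent.
title_tag = soup.find("h1", id="firstHeading")
title = title_tag.get_text(strip=True) if title_tag else ""

para_tag = soup.select_one("div.mw-parser-output p")
first_paragraph = para_tag.get_text(strip=True) if para_tag else ""
```

The `if ... else ""` guards implement the empty-string default from the requirements: a missing field yields `""` instead of raising an error.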
to be updated....
Load the stored JSON file for this purpose
Use the following page as the starting point for different books: