Real Life Application II

Miscellaneous Applications

Kafka Functionalities

  • Create a topic

  • List all topics

  • Delete a topic

  • Consume messages from a Kafka topic

  • Produce messages to a Kafka topic
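A minimal sketch of these operations using the kafka-python client (assuming a local broker at localhost:9092; the topic name and message contents are illustrative):

```python
from kafka import KafkaConsumer, KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP = "localhost:9092"  # assumption: a local Kafka broker

# Create and list topics via the admin client
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
admin.create_topics([NewTopic(name="demo_topic", num_partitions=1, replication_factor=1)])
print(admin.list_topics())              # list all topics
# admin.delete_topics(["demo_topic"])   # delete a topic

# Produce a message to the topic
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send("demo_topic", value=b'{"hello": "kafka"}')
producer.flush()

# Consume messages from the topic
consumer = KafkaConsumer(
    "demo_topic",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,           # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)
```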

Elasticsearch (ES) Functionalities

  • Writing data to an ES index (bulk writes)

  • Query ES data

  • Update ES data
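A minimal sketch with the official elasticsearch Python client (assuming a local cluster at http://localhost:9200; the index name and documents are illustrative, and the keyword arguments follow the 8.x client, where older clients use body=... instead):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumption: local ES cluster

# Bulk-write documents to an index
docs = [
    {"title": "The Hobbit", "author": "J. R. R. Tolkien", "language": "English"},
    {"title": "The Alchemist", "author": "Paulo Coelho", "language": "Portuguese"},
]
actions = [{"_index": "book_dataset", "_source": d} for d in docs]
helpers.bulk(es, actions)

# Query the index (match on the author field)
resp = es.search(index="book_dataset", query={"match": {"author": "Tolkien"}}, size=10)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])

# Update a document by its _id
doc_id = resp["hits"]["hits"][0]["_id"]
es.update(index="book_dataset", id=doc_id, doc={"language": "English"})
```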

Redis Functionalities

  • Different data types in Redis and their use

  • Writing data to Redis

  • Updating Redis data
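A minimal sketch with the redis-py client showing the common data types, writes, and updates (assuming Redis runs locally on the default port; keys and values are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# String: simple key/value write and update
r.set("book:count", 100)
r.incr("book:count")                      # update by incrementing

# Hash: one object with named fields
r.hset("book:1", mapping={"title": "The Hobbit", "author": "Tolkien"})
r.hset("book:1", "language", "English")   # update a single field

# List: ordered collection (e.g. recently viewed books)
r.lpush("recent_books", "book:1", "book:2")

# Set: unique members (e.g. genres)
r.sadd("genres", "fantasy", "drama")

# Sorted set: members with a score (e.g. a rating leaderboard)
r.zadd("ratings", {"book:1": 4.8, "book:2": 4.2})

print(r.hgetall("book:1"))
```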

Search Engines: Books and Movies

Dependencies

  • Elasticsearch (a Python-based data writer to ES)

  • Flask REST API (for GET requests)

Objective

  • We will be creating two search engines:

    • Book search engine

    • Movie search engine

  • Inputs should be:

    • Where to search: books or movies

    • Search fields:

      • For books: author, copies sold, keyword-based search, language, genre, etc.

      • For movies: director, actors, rating, keyword-based search, release year, etc.

    • Number of results (default: 10) — see the sketch below
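As a sketch of how these inputs could map to an Elasticsearch query (the index names match the ones created later; the helper name, field names, and filter structure are assumptions for illustration):

```python
def build_query(where, keywords=None, filters=None, size=10):
    """Map the user's inputs to an ES index name, query body, and result size.

    where    -- "books" or "movies" (chooses the index)
    keywords -- free-text string searched across all text fields
    filters  -- dict of field-specific matches, e.g. {"language": "English"}
    size     -- number of results to return (default 10)
    """
    index = "book_dataset" if where == "books" else "movie_dataset"

    must = []
    if keywords:
        # keyword-based search across every field
        must.append({"multi_match": {"query": keywords, "fields": ["*"]}})
    for field, value in (filters or {}).items():
        # field-specific search, e.g. author, director, genre, release year
        must.append({"match": {field: value}})

    return index, {"bool": {"must": must}}, size


# Example: movies directed by Nolan that mention "space", top 5 results
index, query, size = build_query("movies", keywords="space",
                                 filters={"director": "Nolan"}, size=5)
```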

Hint: The same IMDb movie dataset can also be used for a recommendation engine:

  • Input is a movie name; the recommendation system returns the top 10 most similar movies

  • To read more about the recommendation system, here is the link (further data-science-related applications and content are in progress)

Dataset

Dataset for books: Use the JSON file prepared with scraping

Dataset for movies: Use the attached CSV file

For a more detailed (more attributes) and larger movie dataset (~85,000 movies), refer to this Kaggle link

How to do it?

  • We will be creating two indices:

    • book_dataset

    • movie_dataset

  • Write and test queries in Dev Tools (within Kibana)

  • Create a Flask API for communication with the application (see the sketch below)
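A minimal sketch of such a Flask GET endpoint (the endpoint path, parameter names, and default values are assumptions; a real implementation would add validation and error handling):

```python
from elasticsearch import Elasticsearch
from flask import Flask, jsonify, request

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")   # assumption: local ES cluster

@app.route("/search", methods=["GET"])
def search():
    # e.g. /search?where=movies&q=space&fields=title,genre&size=5
    where = request.args.get("where", "books")
    keywords = request.args.get("q", "")
    fields = request.args.get("fields", "*").split(",")
    size = int(request.args.get("size", 10))  # default: 10 results

    index = "book_dataset" if where == "books" else "movie_dataset"
    resp = es.search(
        index=index,
        query={"multi_match": {"query": keywords, "fields": fields}},
        size=size,
    )
    return jsonify([hit["_source"] for hit in resp["hits"]["hits"]])

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```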

GitHub Repository: Basic Search Engine

to be updated....

Creating the First Data Pipeline

Data Pipeline Flow

  • Elasticsearch (a Python-based data writer to ES)

  • Kafka reader and ES writer; example data pipeline (see the sketch after the hint below):

    • Scrape book data from Wikipedia and store it in a JSON file

    • Move the data from the JSON file to a Kafka topic

    • Read from the Kafka topic and write to the designated ES index

    • Use this ES index for the search engine

Hint: In a real pipeline, we should not go through a JSON file; instead, we should write directly to a Kafka topic (the data is written to a JSON file first here only so that the two applications can be built separately, for easier understanding).
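A minimal sketch of the two pipeline applications (the file name, topic name, and index name are assumptions; keyword arguments follow the 8.x ES client; the producer side loads the scraped JSON file into Kafka, and the consumer side indexes each message into ES):

```python
import json

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP = "localhost:9092"
TOPIC = "books_raw"

# Application 1: read the scraped JSON file and push each record to Kafka
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)
with open("books.json") as f:           # assumption: output of the Wikipedia scraper
    for record in json.load(f):
        producer.send(TOPIC, value=record)
producer.flush()

# Application 2: consume from the topic and index each record into ES
es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,          # stop when no new messages arrive
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    es.index(index="book_dataset", document=message.value)
```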

Pre-requisites

Here, we are simply bringing together the different elements and applications we built in the previous sections.

link to be updated....
