Real Life Application II

Miscellaneous Applications

Kafka Functionalities

  • Create a topic

  • List all topics

  • Delete a topic

  • Consume messages from a Kafka topic

  • Produce messages to a Kafka topic
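A minimal sketch of these operations using the kafka-python client (assuming a local broker at localhost:9092; the topic name and message contents are illustrative):

```python
from kafka import KafkaConsumer, KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP = "localhost:9092"  # assumption: a local Kafka broker

# Create and list topics via the admin client
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
admin.create_topics([NewTopic(name="demo_topic", num_partitions=1, replication_factor=1)])
print(admin.list_topics())              # list all topics
# admin.delete_topics(["demo_topic"])   # delete a topic

# Produce a message to the topic
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send("demo_topic", value=b'{"hello": "kafka"}')
producer.flush()

# Consume messages from the topic
consumer = KafkaConsumer(
    "demo_topic",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,           # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)
```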

Elasticsearch (ES) Functionalities

  • Writing data to an ES index (bulk writes)

  • Query ES data

  • Update ES data
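A minimal sketch with the official elasticsearch Python client (assuming a local cluster at http://localhost:9200; the index name and documents are illustrative, and the keyword arguments follow the 8.x client, where older clients use body=... instead):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumption: local ES cluster

# Bulk-write documents to an index
docs = [
    {"title": "The Hobbit", "author": "J. R. R. Tolkien", "language": "English"},
    {"title": "The Alchemist", "author": "Paulo Coelho", "language": "Portuguese"},
]
actions = [{"_index": "book_dataset", "_source": d} for d in docs]
helpers.bulk(es, actions)

# Query the index (match on the author field)
resp = es.search(index="book_dataset", query={"match": {"author": "Tolkien"}}, size=10)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])

# Update a document by its _id
doc_id = resp["hits"]["hits"][0]["_id"]
es.update(index="book_dataset", id=doc_id, doc={"language": "English"})
```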

Redis Functionalities

  • Different data types in Redis and their use

  • Writing data to Redis

  • Updating Redis data
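A minimal sketch with the redis-py client showing the common data types, writes, and updates (assuming Redis runs locally on the default port; keys and values are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# String: simple key/value write and update
r.set("book:count", 100)
r.incr("book:count")                      # update by incrementing

# Hash: one object with named fields
r.hset("book:1", mapping={"title": "The Hobbit", "author": "Tolkien"})
r.hset("book:1", "language", "English")   # update a single field

# List: ordered collection (e.g. recently viewed books)
r.lpush("recent_books", "book:1", "book:2")

# Set: unique members (e.g. genres)
r.sadd("genres", "fantasy", "drama")

# Sorted set: members with a score (e.g. a rating leaderboard)
r.zadd("ratings", {"book:1": 4.8, "book:2": 4.2})

print(r.hgetall("book:1"))
```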

Search Engines: Books and Movies

Dependencies

  • Elasticsearch (a Python-based data writer to ES)

  • Flask REST API (for GET requests)

Objective

  • We will be creating two search engines:

    • Book search engine

    • Movie search engine

  • Inputs should be:

    • Where to search: books or movies

    • Search fields:

      • For books: author, copies sold, keyword-based search, language, genre, etc.

      • For movies: director, actors, rating, keyword-based search, release year, etc.

    • Number of results (default: 10) — see the sketch below
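As a sketch of how these inputs could map to an Elasticsearch query (the index names match the ones created later; the helper name, field names, and filter structure are assumptions for illustration):

```python
def build_query(where, keywords=None, filters=None, size=10):
    """Map the user's inputs to an ES index name, query body, and result size.

    where    -- "books" or "movies" (chooses the index)
    keywords -- free-text string searched across all text fields
    filters  -- dict of field-specific matches, e.g. {"language": "English"}
    size     -- number of results to return (default 10)
    """
    index = "book_dataset" if where == "books" else "movie_dataset"

    must = []
    if keywords:
        # keyword-based search across every field
        must.append({"multi_match": {"query": keywords, "fields": ["*"]}})
    for field, value in (filters or {}).items():
        # field-specific search, e.g. author, director, genre, release year
        must.append({"match": {field: value}})

    return index, {"bool": {"must": must}}, size


# Example: movies directed by Nolan that mention "space", top 5 results
index, query, size = build_query("movies", keywords="space",
                                 filters={"director": "Nolan"}, size=5)
```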

Hint: The same IMDb movie dataset can also be used for a recommendation engine:

  • Input is a movie name; the recommendation system returns the top 10 most similar movies

  • To read more about the recommendation system, here is the link (further data-science-related applications and content are in progress)

Dataset

Dataset for books: Use the JSON file prepared with scraping

Dataset for movies: Use the attached CSV file

For a more detailed (more attributes) and larger movie dataset (~85,000 movies), refer to this Kaggle link

How to do it?

  • We will be creating two indices:

    • book_dataset

    • movie_dataset

  • Write and test queries in Dev Tools (within Kibana)

  • Create a Flask API for communication with the application (see the sketch below)
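A minimal sketch of such a Flask GET endpoint (the endpoint path, parameter names, and default values are assumptions; a real implementation would add validation and error handling):

```python
from elasticsearch import Elasticsearch
from flask import Flask, jsonify, request

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")   # assumption: local ES cluster

@app.route("/search", methods=["GET"])
def search():
    # e.g. /search?where=movies&q=space&fields=title,genre&size=5
    where = request.args.get("where", "books")
    keywords = request.args.get("q", "")
    fields = request.args.get("fields", "*").split(",")
    size = int(request.args.get("size", 10))  # default: 10 results

    index = "book_dataset" if where == "books" else "movie_dataset"
    resp = es.search(
        index=index,
        query={"multi_match": {"query": keywords, "fields": fields}},
        size=size,
    )
    return jsonify([hit["_source"] for hit in resp["hits"]["hits"]])

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```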

GitHub Repository: Basic Search Engine

to be updated....

Creating the First Data Pipeline

Data Pipeline Flow

  • Elasticsearch (a Python-based data writer to ES)

  • Kafka reader and ES writer; example data pipeline (see the sketch after the hint below):

    • Scrape book data from Wikipedia and store it in a JSON file

    • Move the data from the JSON file to a Kafka topic

    • Read from the Kafka topic and write to the designated ES index

    • Use this ES index for the search engine

Hint: In a real pipeline, we should not go through a JSON file; instead, we should write directly to a Kafka topic (the data is written to a JSON file first here only so that the two applications can be built separately, for easier understanding).
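A minimal sketch of the two pipeline applications (the file name, topic name, and index name are assumptions; keyword arguments follow the 8.x ES client; the producer side loads the scraped JSON file into Kafka, and the consumer side indexes each message into ES):

```python
import json

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP = "localhost:9092"
TOPIC = "books_raw"

# Application 1: read the scraped JSON file and push each record to Kafka
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)
with open("books.json") as f:           # assumption: output of the Wikipedia scraper
    for record in json.load(f):
        producer.send(TOPIC, value=record)
producer.flush()

# Application 2: consume from the topic and index each record into ES
es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,          # stop when no new messages arrive
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    es.index(index="book_dataset", document=message.value)
```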

Pre-requisites

Here, we are simply bringing together the different elements and applications we built in the previous sections.

link to be updated....
