A simple web scraper that extracts metadata from a movie or show’s IMDb page.
Requirements
How to Run the Program
Metadata Attributes
Built-in Functions
[ ! ] Before you begin
```shell
pip install beautifulsoup4
pip install requests
```
time.sleep() as shown in main.py

Run the included main.py file or create your own instance:
```python
# main.py
import IMDbScraper

# Create an instance
scraper = IMDbScraper.IMDb_Scraper()

# Scrape by title name or by IMDb title page URL
scraper.scrape("Morbius")
scraper.scrape("https://www.imdb.com/title/tt5108870/?ref_=fn_al_tt_1")

# Output
# Title: Morbius
# Type: Movie
# Year: 2022
# Runtime: 1h 44m
# Date: April 1, 2022
# Age Rating: PG-13
# Genre: Action, Adventure, Horror, Sci-Fi, Thriller
# Cast: Jared Leto, Matt Smith, Adria Arjona, Jared Harris
# Directed by: Daniel Espinosa
# Writers: Matt Sazama, Burk Sharpless
# Keywords: vampire, based on comic, marvel comics, superhero, blood
```
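The note about time.sleep() in main.py is incomplete in this copy. Assuming its purpose is to space out requests when scraping several titles in a row, the pattern could look like the minimal sketch below; `scrape_all` and the 2-second default delay are hypothetical and not part of IMDbScraper.

```python
import time
from typing import Callable, Iterable, List


def scrape_all(
    scrape: Callable[[str], dict],
    queries: Iterable[str],
    delay: float = 2.0,  # hypothetical default, tune as needed
) -> List[dict]:
    """Call `scrape` on each query, sleeping `delay` seconds between calls."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            # Pause between requests so IMDb is not hit in rapid succession.
            time.sleep(delay)
        results.append(scrape(query))
    return results


# Usage with the scraper from main.py (assumes scrape() returns a dict):
# scraper = IMDbScraper.IMDb_Scraper()
# data = scrape_all(scraper.scrape, ["Morbius", "https://www.imdb.com/title/tt0111161"])
```

Passing the scrape function in as a parameter keeps the throttling logic independent of the scraper class.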
Metadata Attributes

| Attribute | Data type |
|---|---|
| title | str |
| original_title | str |
| title_type | str |
| year | int |
| end_year | int |
| day | int |
| month | int |
| date | str |
| runtime | int (seconds) |
| age_rating | str |
| imdb_rating | float |
| votes | int |
| plot | str |
| poster_url | str |
| trailer_url | str |
| url | str |
| genre | list |
| cast | list |
| directors | list |
| writers | list |
| keywords | list |
| countries* | list |
| languages* | list |
| locations* | list |
*Results for these attributes may not be fully accurate.
Built-in Functions

scrape()
Takes the name of a movie or show, or an IMDb title page URL (such as https://www.imdb.com/title/tt0111161), and returns a dictionary containing all extracted metadata.
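Because scrape() accepts either a name or a URL, it has to tell the two apart. A minimal sketch of one way to do that, assuming a title page URL always contains `/title/tt<digits>`; `parse_query` is a hypothetical helper, not part of IMDbScraper.

```python
import re
from typing import Tuple

# Matches an IMDb title page URL and captures the title ID (e.g. tt0111161).
_TITLE_URL = re.compile(r"^https?://(?:www\.|m\.)?imdb\.com/title/(tt\d+)")


def parse_query(query: str) -> Tuple[str, str]:
    """Return ("id", title_id) for an IMDb title URL, otherwise ("name", query)."""
    query = query.strip()
    match = _TITLE_URL.match(query)
    if match:
        return ("id", match.group(1))
    return ("name", query)


print(parse_query("https://www.imdb.com/title/tt5108870/?ref_=fn_al_tt_1"))  # ('id', 'tt5108870')
print(parse_query("Morbius"))  # ('name', 'Morbius')
```

Query-string suffixes such as `?ref_=fn_al_tt_1` are ignored because the pattern stops at the end of the title ID.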
format_runtime()
Converts a duration in seconds to hours and minutes and returns it as a formatted string.

```python
format_runtime(5570)
# Returns the string "1h 32m"
```
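A sketch of how format_runtime() could work, assuming leftover seconds are simply dropped (which matches 5570 seconds becoming "1h 32m"). This is an illustration, not the library's actual code.

```python
def format_runtime(seconds: int) -> str:
    """Convert a duration in seconds to an "Xh Ym" string, dropping leftover seconds."""
    hours, remainder = divmod(int(seconds), 3600)  # whole hours, seconds left over
    minutes = remainder // 60                      # whole minutes in the remainder
    return f"{hours}h {minutes}m"


print(format_runtime(5570))  # 1h 32m
```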
Prints the metadata in a readable format.
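A minimal sketch of that kind of display, assuming the dictionary returned by scrape() uses the attribute names from the table as keys and stores runtime in seconds. `format_metadata` and `FIELDS` are hypothetical names; the labels are copied from the sample output in main.py.

```python
# Hypothetical (label, key) pairs in the order used by the sample output.
FIELDS = [
    ("Title", "title"),
    ("Type", "title_type"),
    ("Year", "year"),
    ("Runtime", "runtime"),
    ("Date", "date"),
    ("Age Rating", "age_rating"),
    ("Genre", "genre"),
    ("Cast", "cast"),
    ("Directed by", "directors"),
    ("Writers", "writers"),
    ("Keywords", "keywords"),
]


def format_metadata(data: dict) -> str:
    """Render a metadata dict as "Label: value" lines, skipping missing keys."""
    lines = []
    for label, key in FIELDS:
        value = data.get(key)
        if value is None:
            continue
        if key == "runtime":
            # Seconds -> whole minutes -> hours and minutes.
            hours, minutes = divmod(int(value) // 60, 60)
            value = f"{hours}h {minutes}m"
        elif isinstance(value, list):
            value = ", ".join(str(v) for v in value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


print(format_metadata({"title": "Morbius", "year": 2022, "runtime": 6240,
                       "genre": ["Action", "Adventure"]}))
```

Returning a string and printing it separately makes the formatting easy to test or reuse.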
to_string()
Joins the items of a list into a single comma-separated string and returns it.

```python
my_list = ["spam", "eggs", "foo", "bar"]
to_string(my_list)
# Returns the string "spam, eggs, foo, bar"
```
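The same behavior can be sketched with `str.join`. Converting each item with `str()` is an assumption added here so that non-string items do not raise a `TypeError`.

```python
def to_string(items: list) -> str:
    """Join list items into one comma-separated string."""
    return ", ".join(str(item) for item in items)


print(to_string(["spam", "eggs", "foo", "bar"]))  # spam, eggs, foo, bar
```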
Creates a simple webpage from the scraped data, showing the poster and trailer.
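A minimal sketch of such a page, assuming the metadata dictionary has `title`, `poster_url` and `trailer_url` keys as listed in the attribute table. `build_page` is hypothetical, and this version links to the trailer rather than embedding a player.

```python
import html


def build_page(data: dict) -> str:
    """Return a standalone HTML page with the title, poster and a trailer link."""
    # Escape every value so special characters cannot break the markup.
    title = html.escape(data.get("title", "Untitled"))
    poster = html.escape(data.get("poster_url", ""))
    trailer = html.escape(data.get("trailer_url", ""))
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{title}</title></head>\n'
        "<body>\n"
        f"<h1>{title}</h1>\n"
        f'<img src="{poster}" alt="{title} poster" width="300">\n'
        f'<p><a href="{trailer}">Watch the trailer</a></p>\n'
        "</body>\n"
        "</html>\n"
    )


# Write the page to disk so it can be opened in a browser:
# with open("page.html", "w", encoding="utf-8") as f:
#     f.write(build_page(metadata))
```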