rmdb
IMDb enthält hunderttausende Filme, Serien, Schauspieler, Bewertungen und andere Film-Infos. Da wäre es schön, wenn man diese Daten irgendwie analysieren könnte. Leider gibt's keine bezahlbare API, wohl aber einen automatischen Export der wichtigsten Filmdaten, als Textdateien.
rmdb ist mein kleines PHP-Script, das diese Export-Dateien runterlädt und in eine relationale Datenbank importiert, damit man die Filmdaten per SQL durchsuchen kann.
Damit kann man einige interessante Infos zu Tage fördern:
Verteilung von Genres über die Jahre
SELECT start_year,
COUNT(*) AS count
FROM titles
LEFT JOIN titles_genres AS tg1
ON tg1.title_id = titles.id
WHERE tg1.genre_id = "drama" -- repeat for a few genres
-- Exclude tv episodes etc.
AND title_type = "movie"
AND start_year < YEAR(CURRENT_DATE())
GROUP BY start_year
ORDER BY start_year ASC;
Die 10 produktivsten Regisseure von guten Filmen in den 60ern
SELECT n.primary_name AS name,
COUNT(*) AS movies
FROM principals
LEFT JOIN names AS n
ON n.id = name_id
LEFT JOIN titles AS t
ON t.id = title_id
WHERE category_id = "director"
AND t.title_type = "movie"
AND t.start_year >= 1960
AND t.start_year <= 1969
AND t.runtime_minutes >= 90
AND t.average_rating > 6
AND t.num_votes > 10000
GROUP BY name_id
ORDER BY movies DESC
LIMIT 10;
name | movies |
---|---|
Jean-Luc Godard | 7 |
Blake Edwards | 6 |
Akira Kurosawa | 5 |
Michelangelo Antonioni | 5 |
Alfred Hitchcock | 5 |
John Frankenheimer | 5 |
Stanley Kramer | 4 |
Robert Aldrich | 4 |
Sergio Leone | 4 |
Roman Polanski | 4 |
Die 50 am besten bewerteten Horror-Komödien
SELECT CONCAT("[", primary_title, "](https://www.imdb.com/title/", id, ")") AS primary_title,
start_year,
average_rating,
num_votes
FROM titles
LEFT JOIN titles_genres AS tg1
ON tg1.title_id = titles.id
LEFT JOIN titles_genres AS tg2
ON tg2.title_id = titles.id
WHERE tg1.genre_id = "horror"
AND tg2.genre_id = "comedy"
-- Exclude TV episodes etc.
AND title_type = "movie"
-- Exclude little voted on movies where average_rating is often too high.
-- Higher num_votes = more popular
AND num_votes > 20000
ORDER BY average_rating DESC,
num_votes DESC
LIMIT 50;
Beliebte SciFi-Filme der 70er
Ohne Star Wars, da es nur Action, Adventure und Fantasy Genres hat.
SELECT CONCAT("[", primary_title, "](https://www.imdb.com/title/", id, ")") AS
primary_title,
start_year,
average_rating,
num_votes
FROM titles
LEFT JOIN titles_genres AS tg1
ON tg1.title_id = titles.id
WHERE tg1.genre_id = "sci-fi"
-- Exclude TV episodes etc.
AND title_type = "movie"
-- Exclude little voted on movies where average_rating is often too high.
-- Higher num_votes = more popular
AND num_votes > 10000
AND start_year >= 1970
AND start_year <= 1979
ORDER BY average_rating DESC,
num_votes DESC;
Die 10 schlechtesten Regisseure, die trotzdem noch Filme drehen
SELECT CONCAT("[", d_name, "](https://www.imdb.com/name/", name_id, ")") AS name,
ROUND(avg, 2) AS average_rating,
movies
FROM (SELECT name_id,
n.primary_name AS d_name,
SUM(t.average_rating) / COUNT(*) AS avg,
COUNT(*) AS movies
FROM principals
LEFT JOIN names AS n
ON n.id = name_id
LEFT JOIN titles AS t
ON t.id = title_id
WHERE category_id = "director"
AND t.title_type = "movie"
AND t.num_votes > 1000
AND n.death_year IS NULL
GROUP BY name_id
ORDER BY avg ASC) AS t
WHERE movies > 10
ORDER BY avg ASC
LIMIT 10;
name | average_rating | movies |
---|---|---|
Uwe Boll | 3.603 | 29 |
Fred Olen Ray | 3.7 | 11 |
Albert Pyun | 4.146 | 28 |
Bert I. Gordon | 4.158 | 12 |
Timothy Woodward Jr. | 4.191 | 11 |
Jim Wynorski | 4.25 | 14 |
Nico Mastorakis | 4.307 | 15 |
Charles Band | 4.515 | 13 |
Steven C. Miller | 4.636 | 11 |
Gregory Hatanaka | 4.727 | 11 |