robsite

rmdb

IMDb enthält hunderttausende Filme, Serien, Schauspieler, Bewertungen und andere Film-Infos. Da wäre es schön, wenn man diese Daten irgendwie analysieren könnte. Leider gibt's keine bezahlbare API, wohl aber einen automatischen Export der wichtigsten Filmdaten, als Textdateien.

rmdb ist mein kleines PHP-Script, das diese Export-Dateien runterlädt und in eine relationale Datenbank importiert, damit man die Filmdaten per SQL durchsuchen kann.

Damit kann man einige interessante Infos zu Tage fördern:

Verteilung von Genres über die Jahre

SELECT start_year,
       COUNT(*) AS count
FROM   titles
       LEFT JOIN titles_genres AS tg1
              ON tg1.title_id = titles.id
WHERE  tg1.genre_id = "drama" -- repeat for a few genres

       -- Exclude tv episodes etc.
       AND title_type = "movie"

       AND start_year < YEAR(CURRENT_DATE())

GROUP  BY start_year
ORDER  BY start_year ASC; 
Verteilung von Horror, Western und Sci-Fi Filmen über die Jahre

Die 10 produktivsten Regisseure von guten Filmen in den 60ern

SELECT n.primary_name AS name,
       COUNT(*)       AS movies
FROM   principals
       LEFT JOIN names AS n
              ON n.id = name_id
       LEFT JOIN titles AS t
              ON t.id = title_id
WHERE  category_id = "director"
       AND t.title_type = "movie"
       AND t.start_year >= 1960
       AND t.start_year <= 1969
       AND t.runtime_minutes >= 90
       AND t.average_rating > 6
       AND t.num_votes > 10000
GROUP  BY name_id
ORDER  BY movies DESC
LIMIT  10; 
namemovies
Jean-Luc Godard7
Blake Edwards6
Akira Kurosawa5
Michelangelo Antonioni5
Alfred Hitchcock5
John Frankenheimer5
Stanley Kramer4
Robert Aldrich4
Sergio Leone4
Roman Polanski4

Die 50 am besten bewerteten Horror-Komödien

SELECT CONCAT("[", primary_title, "](https://www.imdb.com/title/", id, ")") AS primary_title,
       start_year,
       average_rating,
       num_votes
FROM   titles
       LEFT JOIN titles_genres AS tg1
              ON tg1.title_id = titles.id
       LEFT JOIN titles_genres AS tg2
              ON tg2.title_id = titles.id
WHERE  tg1.genre_id = "horror"
       AND tg2.genre_id = "comedy"

       -- Exclude TV episodes etc.
       AND title_type = "movie"
       
       -- Exclude little voted on movies where average_rating is often too high. 
       -- Higher num_votes = more popular
       AND num_votes > 20000

ORDER  BY average_rating DESC,
          num_votes DESC
LIMIT  50; 
primary_titlestart_yearaverage_ratingnum_votes
Shaun of the Dead20047.9547079
Evil Dead II19877.7161069
Zombieland20097.6562503
What We Do in the Shadows20147.6178609
One Cut of the Dead20177.623294
Tucker and Dale vs Evil20107.5176660
An American Werewolf in London19817.5104122
Dead Alive19927.596468
Stree20187.533219
Army of Darkness19927.4174895
The Rocky Horror Picture Show19757.4148043
Bhool Bhulaiyaa20077.426087
Gremlins19847.3219049
The Return of the Living Dead19857.360200
House19777.326640
The Lost Boys19877.2138911
Re-Animator19857.263177
Bhoot Police20217.226129
Tremors19907.1135475
The Frighteners19967.187546
Little Shop of Horrors19867.174525
The Fearless Vampire Killers19677.131521
Dellamorte Dellamore19947.121375
Bubba Ho-Tep20026.948378
Warm Bodies20136.8229755
Ready or Not20196.8141645
Odd Thomas20136.852720
Creepshow19826.846916
Zombieland: Double Tap20196.7173135
Trick 'r Treat20076.790509
Fresh20226.738519
Housebound20146.734328
Fido20066.729187
Behind the Mask: The Rise of Leslie Vernon20066.723800
Night of the Creeps19866.723355
Happy Death Day20176.6136327
Death Becomes Her19926.6115999
May20026.636838
Slither20066.581269
Arachnophobia19906.569144
The Witches of Eastwick19876.569079
Bad Taste19876.546735
Better Watch Out20166.537711
Fright Night20116.4105215
Gremlins 2: The New Batch19906.4102085
Severance20066.438791
The People Under the Stairs19916.434750
Mayhem20176.420928
The Babysitter20176.389374
Dead Snow20096.366839

Beliebte SciFi-Filme der 70er

Ohne Star Wars, da es nur Action, Adventure und Fantasy Genres hat.

SELECT CONCAT("[", primary_title, "](https://www.imdb.com/title/", id, ")") AS
       primary_title,
       start_year,
       average_rating,
       num_votes
FROM   titles
       LEFT JOIN titles_genres AS tg1
              ON tg1.title_id = titles.id
WHERE  tg1.genre_id = "sci-fi"

       -- Exclude TV episodes etc.
       AND title_type = "movie"

       -- Exclude little voted on movies where average_rating is often too high. 
       -- Higher num_votes = more popular
       AND num_votes > 10000

       AND start_year >= 1970
       AND start_year <= 1979

ORDER  BY average_rating DESC,
          num_votes DESC; 
primary_titlestart_yearaverage_ratingnum_votes
Alien19798.5855270
A Clockwork Orange19718.3811076
Ivan Vasilyevich Changes His Profession19738.216458
Stalker19798.1131654
Solaris1972889827
Fantastic Planet19737.730480
Close Encounters of the Third Kind19777.6198472
Superman19787.4172371
Invasion of the Body Snatchers19787.459190
The Andromeda Strain19717.236803
Sleeper19737.142854
Time After Time19797.118387
Soylent Green1973764000
The Boys from Brazil1978727798
Westworld19736.957182
The Stepford Wives19756.917631
Mad Max19796.8204229
Logan's Run19766.856057
The Brood19796.829614
Slaughterhouse-Five19726.813000
THX 113819716.751240
Phantasm19796.636464
Silent Running19726.628965
The Man Who Fell to Earth19766.626180
Rollerball19756.624752
Horror Express19726.510758
Star Trek: The Motion Picture19796.489032
The Omega Man19716.431325
Shivers19756.420837
A Boy and His Dog19756.417756
Escape from the Planet of the Apes19716.335051
Rabid19776.318535
The Fury19786.315211
Moonraker19796.299575
Death Race 200019756.227669
Dark Star19746.224315
Conquest of the Planet of the Apes19726.132523
The Crazies19736.113134
Beneath the Planet of the Apes1970646579
The Black Hole19795.925455
Piranha19785.921507
Zardoz19745.822467
Futureworld19765.710713
Battle for the Planet of the Apes19735.430854

Die 10 schlechtesten Regisseure, die trotzdem noch Filme drehen

SELECT CONCAT("[", d_name, "](https://www.imdb.com/name/", name_id, ")") AS name,
       ROUND(avg, 2) AS average_rating,
       movies
FROM   (SELECT name_id,
               n.primary_name                       AS d_name,
               SUM(t.average_rating) / COUNT(*)     AS avg,
               COUNT(*)                             AS movies
        FROM   principals
               LEFT JOIN names AS n
                      ON n.id = name_id
               LEFT JOIN titles AS t
                      ON t.id = title_id
        WHERE  category_id = "director"
               AND t.title_type = "movie"
               AND t.num_votes > 1000
               AND n.death_year IS NULL
        GROUP  BY name_id
        ORDER  BY avg ASC) AS t
WHERE  movies > 10
ORDER  BY avg ASC
LIMIT  10; 
nameaverage_ratingmovies
Uwe Boll3.60329
Fred Olen Ray3.711
Albert Pyun4.14628
Bert I. Gordon4.15812
Timothy Woodward Jr.4.19111
Jim Wynorski4.2514
Nico Mastorakis4.30715
Charles Band4.51513
Steven C. Miller4.63611
Gregory Hatanaka4.72711
· robsite, meins, imdb, filme ·
Mastodon