import pandas as pd
movies = pd.read_csv("25k_IMDb_movie_Dataset.csv")
movies
| | movie title | Run Time | Rating | User Rating | Generes | Overview | Plot Kyeword | Director | Top 5 Casts | Writer | year | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Top Gun: Maverick | $170,000,000 (estimated) | 8.6 | 187K | ['Action', 'Drama'] | After more than thirty years of service as one... | ['fighter jet', 'sequel', 'u.s. navy', 'fighte... | Joseph Kosinski | ['Jack Epps Jr.', 'Peter Craig', 'Tom Cruise',... | Jim Cash | -2022 | /title/tt1745960/ |
1 | Jurassic World Dominion | 2 hours 27 minutes | 6 | 56K | ['Action', 'Adventure', 'Sci-Fi'] | Four years after the destruction of Isla Nubla... | ['dinosaur', 'jurassic park', 'tyrannosaurus r... | Colin Trevorrow | ['Colin Trevorrow', 'Derek Connolly', 'Chris P... | Emily Carmichael | -2022 | /title/tt8041270/ |
2 | Top Gun | $15,000,000 (estimated) | 6.9 | 380K | ['Action', 'Drama'] | As students at the United States Navy's elite ... | ['pilot', 'male camaraderie', 'u.s. navy', 'gr... | Tony Scott | ['Jack Epps Jr.', 'Ehud Yonay', 'Tom Cruise', ... | Jim Cash | -1986 | /title/tt0092099/ |
3 | Lightyear | $71,101,257 | 5.2 | 32K | ['Animation', 'Action', 'Adventure'] | While spending years attempting to return home... | ['galaxy', 'spaceship', 'robot', 'rocket', 'sp... | Angus MacLane | ['Jason Headley', 'Matthew Aldrich', 'Chris Ev... | Angus MacLane | -2022 | /title/tt10298810/ |
4 | Spiderhead | not-released | 5.4 | 23K | ['Action', 'Crime', 'Drama'] | In the near future, convicts are offered the c... | ['discover', 'medical', 'test', 'reality', 'fi... | Joseph Kosinski | ['Rhett Reese', 'Paul Wernick', 'Chris Hemswor... | George Saunders | -2022 | /title/tt9783600/ |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
24397 | Delicatessen | FRF 24,000,000 (estimated) | 7.6 | 85K | ['Comedy', 'Crime'] | Post-apocalyptic surrealist black comedy about... | ['surrealist', 'black comedy', 'human meat', '... | Marc Caro | ['Jean-Pierre Jeunet', 'Marc Caro', 'Gilles Ad... | Jean-Pierre Jeunet | -1991 | /title/tt0101700/ |
24398 | Bitch Ass | not-released | 5.5 | 52 | ['Crime', 'Horror'] | A gang initiation goes wrong when a group of f... | [] | Bill Posley | ['Bill Posley', 'Teon Kelley', 'Tunde Laleye',... | Jonathan Colomb | -2022 | /title/tt13991504/ |
24399 | Bullwhip | not-released | 5.1 | 398 | ['Crime', 'Romance', 'Western'] | In order to avoid the hangman's noose, a cowbo... | ['taming of the shrew', 'fur trader', 'busines... | Harmon Jones | ['Guy Madison', 'Rhonda Fleming', 'James Griff... | Adele Buffington | -1958 | /title/tt0051438/ |
24400 | The Freshman | 1 hour 42 minutes | 6.4 | 20K | ['Comedy', 'Crime'] | An N.Y.C. film school student accepts a job wi... | ['endangered species', 'fish out of water', 'g... | Andrew Bergman | ['Marlon Brando', 'Matthew Broderick', 'Bruno ... | Andrew Bergman | -1990 | /title/tt0099615/ |
24401 | Guys and Dolls | $5,500,000 (estimated) | 7.1 | 18K | ['Comedy', 'Crime', 'Musical'] | In New York, a gambler is challenged to take a... | ['mission', 'gambler', 'new york city', 'based... | Joseph L. Mankiewicz | ['Abe Burrows', 'Damon Runyon', 'Marlon Brando... | Jo Swerling | -1955 | /title/tt0048140/ |
24402 rows × 12 columns
import numpy as np
train, dev, test = np.split(movies.sample(frac=1, random_state=69), [int(.6*len(movies)), int(.8*len(movies))])
train
| | movie title | Run Time | Rating | User Rating | Generes | Overview | Plot Kyeword | Director | Top 5 Casts | Writer | year | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|
4782 | Golmaal: Fun Unlimited | 2 hours 30 minutes | 7.4 | 17K | ['Action', 'Comedy', 'Drama'] | Four runaway crooks take shelter in a bungalow... | ['blind couple', 'friend', 'mute', 'slacker', ... | Rohit Shetty | ['Ajay Devgn', 'Arshad Warsi', 'Sharman Joshi'... | Neeraj Vora | -2006 | /title/tt0495034/ |
9291 | The Remains of the Day | 2 hours 14 minutes | 7.8 | 74K | ['Drama', 'Romance'] | A butler who sacrificed body and soul to servi... | ['class differences', 'butler', 'housekeeper',... | James Ivory | ['Ruth Prawer Jhabvala', 'Anthony Hopkins', 'E... | Kazuo Ishiguro | -1993 | /title/tt0107943/ |
3137 | Blue Iguana | 1 hour 40 minutes | 5.6 | 5.2K | ['Action', 'Comedy', 'Crime'] | Ex-jailbirds Eddie and Paul are on parole and ... | ['singing in a car', 'reference to the world c... | Hadi Hajaig | ['Sam Rockwell', 'Phoebe Fox', 'Ben Schwartz',... | Hadi Hajaig | -2018 | /title/tt2316479/ |
17052 | Nocturna | 1 hour 20 minutes | 7.2 | 2.4K | ['Animation', 'Adventure', 'Family'] | An orphan boy named Tim is afraid of the dark.... | ['orphan', 'night', 'cat', 'one word title', '... | Adrià García | ['Adrià García', 'Víctor Maldonado', 'Teresa V... | Víctor Maldonado | -2007 | /title/tt0836682/ |
675 | Alexander | $155,000,000 (estimated) | 5.6 | 169K | ['Action', 'Biography', 'Drama'] | Alexander, the King of Macedonia and one of th... | ['ancient greece', 'greek', 'macedonia', 'sex ... | Oliver Stone | ['Christopher Kyle', 'Laeta Kalogridis', 'Coli... | Oliver Stone | -2004 | /title/tt0346491/ |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6365 | Yalghaar | 2 hours 38 minutes | 6.4 | 1K | ['Action', 'Romance', 'War'] | The film "explores what happens in the lives o... | ['live'] | Hassan Rana | ['Shaan Shahid', 'Humayun Saeed', 'Adnan Siddi... | Hassan Rana | -2017 | /title/tt3945864/ |
7172 | Brown of Harvard | not-released | 6.2 | 1.5K | ['Action', 'Drama', 'Romance'] | Tom Brown shows up at Harvard, confident and a... | ['harvard', 'no homoeroticism', 'pre code film... | Jack Conway | ['Donald Ogden Stewart', 'Andrew Percival Youn... | Rida Johnson Young | -1926 | /title/tt0016690/ |
12644 | Deseo | 1 hour 37 minutes | 4.6 | 417 | ['Comedy', 'Drama', 'Romance'] | A succession of erotic encounters weaved into ... | ['smoking marijuana', 'female full frontal nud... | Antonio Zavala Kugler | ['Antonio Zavala Kugler', 'Christian Bach', 'A... | Arthur Schnitzler | -2013 | /title/tt1236434/ |
23970 | Saezuru Tori Wa Habatakanai: The Clouds Gather | 1 hour 25 minutes | 6.9 | 681 | ['Animation', 'Crime', 'Drama'] | Yashiro is the president of the Shinseikai Ent... | ['yaoi', 'boys love', 'gay', 'anime', 'yakuza'... | Kaori Makita | ['Kou Yoneda', 'Tarusuke Shingaki', 'Wataru Ha... | Hiroshi Seko | -2020 | /title/tt10675392/ |
13423 | 99 | 2 hours 15 minutes | 7.3 | 2.9K | ['Comedy', 'Crime', 'Drama'] | A gangster deputes two of his men to recover m... | ['cricket the sport', 'briefcase', 'caper', 'e... | Krishna D.K. | ['Raj Nidimoru', 'Krishna D.K.', 'Sita Menon',... | Raj Nidimoru | (I) (2009) | /title/tt1370429/ |
14641 rows × 12 columns
dev
| | movie title | Run Time | Rating | User Rating | Generes | Overview | Plot Kyeword | Director | Top 5 Casts | Writer | year | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|
20683 | The Animals | not-released | 4.2 | 398 | ['Western'] | A woman tracks down the five men who raped her... | ['rape and revenge', 'arizona territory', 'ari... | Ron Joy | ['Michele Carey', 'Henry Silva', 'Keenan Wynn'... | Richard Bakalyan | -1970 | /title/tt0065407/ |
8040 | The Lincoln Lawyer | $40,000,000 (estimated) | 7.3 | 236K | ['Crime', 'Drama', 'Mystery'] | A lawyer defending a wealthy man begins to bel... | ['defense lawyer', 'plot twist', 'drug rehabil... | Brad Furman | ['Michael Connelly', 'Matthew McConaughey', 'M... | John Romano | -2011 | /title/tt1189340/ |
23664 | What's the Worst That Could Happen? | 1 hour 34 minutes | 5.4 | 16K | ['Comedy', 'Crime'] | A rich man catches a thief burglarizing his ho... | ['breaking and entering', 'bankruptcy', 'quest... | Sam Weisman | ['Matthew Chapman', 'Martin Lawrence', 'Danny ... | Donald E. Westlake | -2001 | /title/tt0161083/ |
9078 | Cleopatra | not-released | 6.2 | 652 | ['Animation', 'History', 'Romance'] | In order to foil the enemy aliens' "Cleopatra ... | ['adult anime', 'adult animation', 'anime', 'd... | Osamu Tezuka | ['Osamu Tezuka', 'Shigemi Satoyoshi', 'Chinats... | Eiichi Yamamoto | -1963 | /title/tt0056937/ |
17118 | Seven Cities of Gold | 1 hour 43 minutes | 5.9 | 479 | ['Adventure', 'Biography', 'History'] | In 1769, a Spanish expedition to California se... | ['limping man', 'prologue', 'voice over narrat... | Robert D. Webb | ['John C. Higgins', 'Joseph Petracca', 'Richar... | Richard L. Breen | -1955 | /title/tt0048603/ |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1248 | The Wall | 1 hour 28 minutes | 6.2 | 27K | ['Action', 'Drama', 'Thriller'] | Two American Soldiers are trapped by a lethal ... | ['sniper', 'soldier', 'deception', 'wound', 'd... | Doug Liman | ['Aaron Taylor-Johnson', 'John Cena', 'Laith N... | Dwain Worrell | (II) (2017) | /title/tt4218696/ |
7091 | Commando 3 | 2 hours 13 minutes | 5.7 | 3K | ['Action', 'Adventure', 'Thriller'] | Karan goes to London to stop a terrorist attac... | ['chase'] | Aditya Datt | ['Junaid Wasi', 'Gulshan Devaiah', 'Robin Chau... | Darius Yarmil | -2019 | /title/tt8983168/ |
15635 | Adam Resurrected | 1 hour 46 minutes | 6.2 | 4.2K | ['Drama', 'War'] | In the aftermath of World War II, a former cir... | ['man wears a white suit', 'desert hotel', 'ti... | Paul Schrader | ['Noah Stollman', 'Jeff Goldblum', 'Willem Daf... | Yoram Kaniuk | -2008 | /title/tt0479341/ |
12914 | Main Street | 1 hour 32 minutes | 4.7 | 2.9K | ['Drama'] | Durham is slowly dying like the tobacco busine... | ['economic depression', 'reference to lucky st... | John Doyle | ['Colin Firth', 'Ellen Burstyn', 'Patricia Cla... | Horton Foote | -2010 | /title/tt1365483/ |
2825 | The Outrage | 1 hour 36 minutes | 6.2 | 2.2K | ['Crime', 'Drama', 'Western'] | Travelers in the 1870s Southwest discuss a rec... | ['highwayman', 'man bound and gagged', 'gun du... | Martin Ritt | ['Akira Kurosawa', 'Ryûnosuke Akutagawa', 'Pau... | Michael Kanin | -1964 | /title/tt0058437/ |
4880 rows × 12 columns
test
| | movie title | Run Time | Rating | User Rating | Generes | Overview | Plot Kyeword | Director | Top 5 Casts | Writer | year | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|
16788 | High in the Clouds | not-released | no-rating | 0 | ['Animation', 'Adventure', 'Comedy'] | A squirrel embarks on a journey to find an ani... | ['journey', 'cloud', 'animal', 'friend', 'trag... | Timothy Reckart | ['Jon Croker', 'Geoff Dunbar', 'Timothy Reckar... | Philip Ardagh | -2022 | /title/tt1458167/ |
17155 | Curious George 3: Back to the Jungle | 1 hour 21 minutes | 5.4 | 598 | ['Animation', 'Adventure', 'Comedy'] | Curious George goes on an epic adventure to sp... | ['sequel', 'third part', 'jungle', 'curious ge... | Phil Weinstein | ['H.A. Rey', 'Chuck Tately', 'Frank Welker', '... | Margret Rey | -2015 | /title/tt4622340/ |
5985 | Tarkan: Altin Madalyon | not-released | 6.2 | 1.6K | ['Action', 'Adventure', 'History'] | The story of Tarkan and his friends efforts to... | ['black magic', 'blood', 'axe', 'snake', 'pros... | Mehmet Aslan | ['Sadik Sendil', 'Kartal Tibet', 'Eva Bender',... | Sezgin Burak | -1973 | /title/tt0274933/ |
1640 | Harley Davidson and the Marlboro Man | $23,000,000 (estimated) | 6.1 | 20K | ['Action', 'Crime', 'Drama'] | Forced by the imminent foreclosure of their fr... | ['swimming pool', 'night club', 'voyeurism', '... | Simon Wincer | ['Mickey Rourke', 'Don Johnson', 'Chelsea Fiel... | Don Michael Paul | -1991 | /title/tt0102005/ |
9826 | I Am All Girls | not-released | 5.9 | 5.8K | ['Crime', 'Drama', 'Mystery'] | A special crimes investigator forms an unlikel... | ['child', 'bond', 'investigator', 'murder', 'c... | Donovan Marsh | ['Marcell Greeff', 'Emile Leuvennink', 'Erica ... | Wayne Fitzjohn | -2021 | /title/tt9013182/ |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11825 | The Apostle | 2 hours 14 minutes | 7.2 | 14K | ['Drama'] | After his happy life spins out of control, a p... | ['southern gothic', 'timeframe 1930s', 'preach... | Robert Duvall | ['Robert Duvall', 'Todd Allen', 'Paul Bagget',... | Robert Duvall | -1997 | /title/tt0118632/ |
14740 | My Normal | not-released | 4.1 | 756 | ['Drama', 'Romance'] | A lesbian dominatrix finds a way to use her un... | ['dominatrix', 'lesbian', 'lesbian kiss', 'blo... | Irving Schwartz | ['Renee Garzon', 'Keith Planit', 'Nicole LaLib... | Abdul Malik Abbott | -2009 | /title/tt1117983/ |
9818 | Lust och fägring stor | 2 hours 10 minutes | 6.9 | 6.5K | ['Drama', 'Romance', 'War'] | Malmö, Sweden during the Second World War. Sti... | ['extramarital affair', 'teacher student sex',... | Bo Widerberg | ['Johan Widerberg', 'Marika Lagercrantz', 'Tom... | Bo Widerberg | -1995 | /title/tt0113720/ |
4041 | RoboCop Returns | not-released | no-rating | 0 | ['Action', 'Adventure', 'Crime'] | RoboCop returns to fight crime in Detroit. | ['sequel', 'reboot', 'non comic book superhero... | Abe Forsythe | ['Edward Neumeier', 'Justin Rhodes', 'Abe Fors... | Michael Miner | NaN | /title/tt8688550/ |
23755 | The Maiden Heist | 1 hour 30 minutes | 6 | 17K | ['Comedy', 'Crime'] | A comedy centered on three museum security gua... | ['heist crime', 'caper crime', 'forgery', 'hei... | Peter Hewitt | ['Christopher Walken', 'Joseph McKenna', 'Wynn... | Michael LeSieur | -2009 | /title/tt1107860/ |
4881 rows × 12 columns
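The row counts of the three splits follow directly from the two cut points passed to `np.split`. As a quick sanity check, the arithmetic can be sketched as below. The helper name `split_sizes` is hypothetical and not part of the notebook; it only reproduces the index calculation, not the shuffling.

```python
def split_sizes(n_rows, cut_fractions=(0.6, 0.8)):
    """Return the size of each chunk produced by cutting n_rows at the given fractions.

    Hypothetical helper: mirrors the int(.6*len(movies)), int(.8*len(movies))
    cut points used in the np.split call above.
    """
    cuts = [int(fraction * n_rows) for fraction in cut_fractions]
    bounds = [0, *cuts, n_rows]
    # Each chunk spans from one boundary to the next.
    return [high - low for low, high in zip(bounds, bounds[1:])]


train_size, dev_size, test_size = split_sizes(24402)
print(train_size, dev_size, test_size)
```

For 24402 rows this gives the 14641 / 4880 / 4881 sizes reported under the `train`, `dev`, and `test` tables.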
movies.describe(include='all')
| | movie title | Run Time | Rating | User Rating | Generes | Overview | Plot Kyeword | Director | Top 5 Casts | Writer | year | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 24402 | 24402 | 24402 | 24402 | 24402 | 24158 | 24402 | 24402 | 24402 | 24402 | 23624 | 24402 |
unique | 23922 | 1556 | 91 | 1684 | 746 | 23957 | 21546 | 11604 | 24211 | 15562 | 250 | 23922 |
top | Rage | not-released | no-rating | 0 | ['Drama'] | none | [] | See company contact information | ['See producer', 'See preliminary cast'] | See writer | -2022 | /title/tt0114224/ |
freq | 4 | 8475 | 1740 | 1740 | 943 | 142 | 1696 | 142 | 142 | 142 | 1201 | 4 |
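The `year` column has fewer non-null entries than the others (23624 of 24402), and the values shown in the tables above come in forms such as `-2022`, `(I) (2009)`, and `NaN`. A minimal sketch of one way to extract a four-digit year, assuming every usable value contains exactly one such number; `parse_year` is a hypothetical helper, not something defined in the notebook:

```python
import re

# Assumption: the release year is the first run of four digits in the string.
_YEAR = re.compile(r"\d{4}")


def parse_year(value):
    """Return the year as an int, or None for missing or unparseable values."""
    if not isinstance(value, str):
        # pandas represents missing strings as float NaN.
        return None
    match = _YEAR.search(value)
    return int(match.group()) if match else None


for raw in ["-2022", "(I) (2009)", "(II) (2017)", float("nan")]:
    print(repr(raw), "->", parse_year(raw))
```

It could be applied with something like `movies["year"].map(parse_year)`.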
movies["movie title"].value_counts()
Rage    4
The Killer    4
The Beast    4
Spiral    4
The Silence    3
..
The Mule    1
Donnie Brasco    1
Little Miss Sunshine    1
Three Billboards Outside Ebbing, Missouri    1
Guys and Dolls    1
Name: movie title, Length: 23922, dtype: int64
movies["Run Time"].value_counts()
not-released    8475
1 hour 30 minutes    503
1 hour 35 minutes    376
1 hour 38 minutes    350
1 hour 31 minutes    338
...
$14,492    1
$181,415    1
$11,060,485    1
$1,043,910    1
FRF 24,000,000 (estimated)    1
Name: Run Time, Length: 1556, dtype: int64
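The counts above show that `Run Time` mixes three kinds of values: real durations (`1 hour 30 minutes`), the placeholder `not-released`, and what look like budget figures (`$14,492`, `FRF 24,000,000 (estimated)`). A minimal sketch of converting only the real durations to minutes, assuming they always follow the "N hour(s) M minute(s)" pattern; `runtime_to_minutes` is a hypothetical helper:

```python
import re

# Assumption: durations look like "1 hour 30 minutes", "2 hours", or "45 minutes".
_RUNTIME = re.compile(r"^(?:(\d+)\s+hours?)?\s*(?:(\d+)\s+minutes?)?$")


def runtime_to_minutes(text):
    """Return the duration in minutes, or None if text is not a duration."""
    if not isinstance(text, str):
        return None
    match = _RUNTIME.match(text.strip())
    if not match or (match.group(1) is None and match.group(2) is None):
        # Covers "not-released", budget strings, and empty input.
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


for raw in ["1 hour 30 minutes", "2 hours 27 minutes", "not-released",
            "$170,000,000 (estimated)"]:
    print(raw, "->", runtime_to_minutes(raw))
```

Returning `None` for the budget strings keeps them out of any runtime statistics while leaving the original column untouched.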
movies["Rating"].value_counts()
no-rating    1740
6.4    852
6.2    847
6.1    819
6.3    809
...
9.9    2
9.8    2
9.4    2
1    2
9.5    2
Name: Rating, Length: 91, dtype: int64
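Because of the `no-rating` placeholder, the `Rating` column holds strings rather than numbers. A minimal sketch of a conversion that maps the placeholder to a missing value; `rating_to_float` is a hypothetical helper, and `pd.to_numeric(movies["Rating"], errors="coerce")` would be the vectorised pandas alternative:

```python
def rating_to_float(text):
    """Return the rating as a float, or None when it cannot be parsed.

    Hypothetical helper: treats "no-rating" (and any other non-numeric
    value) as missing.
    """
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


for raw in ["8.6", "6", "no-rating"]:
    print(raw, "->", rating_to_float(raw))
```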
movies["User Rating"].value_counts()
0    1740
11K    325
1.2K    323
1.1K    315
1.3K    295
...
501K    1
769K    1
321K    1
991K    1
347K    1
Name: User Rating, Length: 1684, dtype: int64
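The `User Rating` values are abbreviated vote counts such as `11K` or `1.2K`, plus plain integers like `0` and `52`. A minimal sketch of expanding them to integers; `vote_count` is a hypothetical helper, and support for an `M` suffix is an assumption, since only `K` appears in the output above:

```python
# Assumption: "K" means thousands; "M" (millions) is included defensively
# even though it does not appear in the excerpt shown.
_SUFFIX = {"K": 1_000, "M": 1_000_000}


def vote_count(text):
    """Expand an abbreviated count such as '5.2K' into an integer."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIX:
        # round() absorbs float noise such as 1.1 * 1000 == 1100.0000000000002.
        return round(float(text[:-1]) * _SUFFIX[text[-1]])
    return int(text)


for raw in ["187K", "5.2K", "52", "0"]:
    print(raw, "->", vote_count(raw))
```

Note that the expanded numbers are only as precise as the abbreviation: `187K` becomes 187000 even if the true count differs by a few hundred.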
movies["Generes"].value_counts()
['Drama']    943
['Action', 'Crime', 'Drama']    867
['Crime', 'Drama', 'Thriller']    609
['Comedy', 'Drama', 'Romance']    608
['Crime', 'Drama']    550
...
['Drama', 'Romance', 'Crime']    1
['Drama', 'Crime', 'Mystery']    1
['Family', 'Adventure', 'Comedy']    1
['Crime', 'Mystery', 'Horror']    1
['Crime', 'Romance', 'Western']    1
Name: Generes, Length: 746, dtype: int64
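`value_counts()` treats each genre list as one opaque string, so `['Drama', 'Romance', 'Crime']` and `['Crime', 'Drama', 'Romance']` are counted separately. A minimal sketch of counting individual genres instead, assuming every cell is a valid Python list literal; `count_genres` is a hypothetical helper:

```python
import ast
from collections import Counter


def count_genres(genre_strings):
    """Count how often each individual genre appears across all rows."""
    counts = Counter()
    for raw in genre_strings:
        # literal_eval safely parses "['Action', 'Drama']" into a real list.
        counts.update(ast.literal_eval(raw))
    return counts


sample = ["['Drama']", "['Action', 'Crime', 'Drama']", "['Crime', 'Drama']"]
print(count_genres(sample).most_common())
```

On the full dataset this would be called as `count_genres(movies["Generes"])`.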