
IMDb Movie Reviews

imdb dataset of 50k movie reviews

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
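The selection rule described above can be sketched in a few lines. This is only an illustration of the stated thresholds and the per-movie cap, not the dataset authors' code; the function names, the `(movie_id, score, text)` tuple layout, and the 0/1 label encoding are assumptions made for the example. The sketch does not enforce the even positive/negative balance.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def sentiment_label(score: int) -> Optional[int]:
    """Map a rating out of 10 to 0 (negative), 1 (positive), or None (excluded)."""
    if score <= 4:
        return 0
    if score >= 7:
        return 1
    return None  # ratings of 5 and 6 are not polar enough to be included


def select_reviews(
    reviews: Iterable[Tuple[str, int, str]], max_per_movie: int = 30
) -> List[Tuple[str, int]]:
    """Keep polar reviews only, with at most max_per_movie reviews per movie.

    `reviews` yields hypothetical (movie_id, score, text) tuples.
    """
    kept = []
    per_movie = defaultdict(int)
    for movie_id, score, text in reviews:
        label = sentiment_label(score)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        kept.append((text, label))
    return kept
```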


IMDB Movie Reviews Large Dataset - 50k Reviews

laxmimerit/IMDB-Movie-Reviews-Large-Dataset-50k


IMDB-Movie-Reviews-Large-Dataset-50k

This dataset is taken from https://ai.stanford.edu/~amaas/data/sentiment/ and then preprocessed so that all positive and negative reviews are in the same file, for both training and testing. This lets you spend your effort on the algorithm instead of on data collection.
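The repository's own preprocessing script is not shown here, so the following is only a minimal sketch of one way to do such a merge. It assumes the extracted Stanford archive layout, where each split directory (for example `aclImdb/train`) contains `pos` and `neg` folders with one `.txt` file per review. The function name, the CSV column names, and the 0/1 label encoding are hypothetical.

```python
import csv
from pathlib import Path


def merge_reviews(split_dir, out_csv):
    """Write every review under split_dir/pos and split_dir/neg to one CSV file.

    Returns the number of reviews written.
    """
    split_dir = Path(split_dir)
    rows = []
    for folder, label in (("pos", 1), ("neg", 0)):
        # Sort so the output order is reproducible across runs.
        for path in sorted((split_dir / folder).glob("*.txt")):
            rows.append((path.read_text(encoding="utf-8"), label))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["review", "label"])
        writer.writerows(rows)
    return len(rows)
```

Calling `merge_reviews("aclImdb/train", "train.csv")` and then the same for `aclImdb/test` would give one file per split, assuming the archive was extracted into the working directory.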

imdb_reviews

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Additional Documentation: Explore on Papers With Code

Homepage: http://ai.stanford.edu/~amaas/data/sentiment/

Source code: tfds.datasets.imdb_reviews.Builder

Download size: 80.23 MiB

Auto-cached (documentation): Yes

Supervised keys (see as_supervised doc): ('text', 'label')

Figure (tfds.show_examples): Not supported.

imdb_reviews/plain_text (default config)

Config description: Plain text

Dataset size: 129.83 MiB

Feature structure:

imdb_reviews/bytes

Config description: Uses byte-level text encoding with tfds.deprecated.text.ByteTextEncoder

Dataset size: 129.88 MiB

imdb_reviews/subwords8k

Config description: Uses tfds.deprecated.text.SubwordTextEncoder with 8k vocab size

Dataset size: 54.72 MiB

imdb_reviews/subwords32k

Config description: Uses tfds.deprecated.text.SubwordTextEncoder with 32k vocab size

Dataset size: 50.33 MiB
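As a rough sketch of how one of these configs might be loaded, the helper below builds the `imdb_reviews/<config>` name and passes it to `tfds.load` with `as_supervised=True`, which yields `(text, label)` pairs according to the supervised keys listed above. The helper names are hypothetical, and the config list is copied from this page; newer TFDS releases may not ship the encoder-based configs, since they rely on deprecated encoders.

```python
# Config names as listed on this page (as of the 2022-12-10 update).
CONFIGS = ("plain_text", "bytes", "subwords8k", "subwords32k")


def dataset_name(config="plain_text"):
    """Return the TFDS name 'imdb_reviews/<config>' for a known config."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"imdb_reviews/{config}"


def load_split(split="train", config="plain_text"):
    """Load one split as a tf.data.Dataset of (text, label) pairs."""
    # Imported lazily so the helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split=split, as_supervised=True)


# Usage (the first call downloads roughly 80 MiB):
# for text, label in load_split("train").take(1):
#     print(text.numpy()[:80], label.numpy())
```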


Last updated 2022-12-10 UTC.

Datasets: datasets-maintainers / imdb

Dataset Card for "imdb"

Dataset Summary

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Supported Tasks and Leaderboards

More Information Needed

An example of 'train' looks as follows.

The data fields are the same among all splits.
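A minimal sketch of loading this dataset with the Hugging Face `datasets` library follows. It assumes each example is a dict with a `text` string and an integer `label`, matching the supervised keys reported for the same data elsewhere on this page. Both helper names are hypothetical.

```python
def load_imdb(split="train"):
    """Load one split of the 'imdb' dataset from the Hugging Face Hub."""
    # Imported lazily so the preview helper below has no dependencies.
    from datasets import load_dataset

    return load_dataset("imdb", split=split)


def preview(example, width=60):
    """Return a short 'label: text' preview of one example dict."""
    text = example["text"]
    shortened = text if len(text) <= width else text[:width] + "..."
    return f"{example['label']}: {shortened}"


# Usage (downloads the data on first call):
# train = load_imdb("train")
# print(len(train), preview(train[0]))
```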


Thanks to @ghazi-f, @patrickvonplaten, @lhoestq, @thomwolf for adding this dataset.

Models trained or fine-tuned on imdb

lvwerra/distilbert-imdb

Sileod/deberta-v3-base-tasksource-nli

mrm8488/t5-base-finetuned-imdb-sentiment

Fabriceyhc/bert-base-uncased-imdb

edbeeching/gpt-neo-125M-imdb

federicopascual/finetuning-sentiment-model-3000-samples

IMDB movie review sentiment classification dataset

load_data function

Loads the IMDB dataset.

This is a dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a list of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words".

As a convention, "0" does not stand for a specific word, but instead is used to encode the pad token.

x_train, x_test: lists of sequences, which are lists of indexes (integers). If the num_words argument was specified, the maximum possible index value is num_words - 1. If the maxlen argument was specified, the largest possible sequence length is maxlen.

y_train, y_test: lists of integer labels (1 or 0).

Note that the 'out of vocabulary' character is only used for words that were present in the training set but are not included because they're not making the num_words cut here. Words that were not seen in the training set but are in the test set have simply been skipped.
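The frequency-based filtering described above can be illustrated without downloading anything. The sketch below is a plain-Python imitation of what the `num_words` and `skip_top` arguments of `keras.datasets.imdb.load_data` do to each sequence, not the library's source. The function name is hypothetical, and the default `oov_char=2` is assumed to match the loader's default.

```python
def filter_sequence(seq, num_words=None, skip_top=0, oov_char=2):
    """Replace every index outside [skip_top, num_words) with oov_char."""
    upper = num_words if num_words is not None else float("inf")
    return [w if skip_top <= w < upper else oov_char for w in seq]


# "Only consider the top 10,000 most common words,
#  but eliminate the top 20 most common words":
encoded = [1, 14, 22, 9999, 10000, 25000]
print(filter_sequence(encoded, num_words=10000, skip_top=20))
# prints [2, 2, 22, 9999, 2, 2]
```

With the real loader, the equivalent request would be `load_data(num_words=10000, skip_top=20)`; the largest index that survives is `num_words - 1`, as stated above.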

get_word_index function

Retrieves a dict mapping words to their index in the IMDB dataset.

The word index dictionary. Keys are word strings, values are their index.
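One common use of this dictionary is turning an encoded review back into words. The sketch below assumes the `load_data` defaults, under which indexes 0, 1, and 2 are reserved (padding, start of sequence, out of vocabulary) and every word index is shifted by `index_from=3`. The function name, the placeholder strings, and the toy dictionary are made up for the example.

```python
def decode_review(sequence, word_index, index_from=3):
    """Turn a list of integer indexes back into a space-separated string."""
    # Invert the word -> index mapping and apply the loader's offset.
    reverse = {idx + index_from: word for word, idx in word_index.items()}
    # Hypothetical placeholder strings for the reserved indexes.
    special = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    words = []
    for i in sequence:
        if i in special:
            words.append(special[i])
        else:
            words.append(reverse.get(i, "<oov>"))
    return " ".join(words)


# Toy dictionary standing in for the result of get_word_index().
toy_index = {"the": 1, "movie": 2, "great": 3}
print(decode_review([1, 4, 5, 6, 2], toy_index))
# prints: <start> the movie great <oov>
```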
