# Sentiment Analysis: Using Recurrent Neural Networks
:label:`sec_sentiment_rnn`
Like word similarity and analogy tasks, we can also apply pretrained word vectors to sentiment analysis. Since the IMDb review dataset in :numref:`sec_sentiment` is not very big, using text representations that were pretrained on large-scale corpora may reduce overfitting of the model. As a specific example illustrated in :numref:`fig_nlp-map-sa-rnn`, we will represent each token using the pretrained GloVe model, and feed these token representations into a multilayer bidirectional RNN to obtain the text sequence representation, which will be transformed into sentiment analysis outputs :cite:`Maas.Daly.Pham.ea.2011`. For the same downstream application, we will consider a different architectural choice later.
from d2l import mxnet as d2l
from mxnet import gluon, init, np, npx
from mxnet.gluon import nn, rnn
npx.set_np()
batch_size = 64
train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)
#@tab pytorch
from d2l import torch as d2l
import torch
from torch import nn
batch_size = 64
train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)
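Before defining the model, let's peek at one minibatch to see what the iterator yields (a quick check; we assume `d2l.load_data_imdb` pads or truncates each review to a fixed number of time steps):

```python
# X holds token indices of shape (batch size, no. of time steps);
# y holds the binary sentiment labels
for X, y in train_iter:
    print('X:', X.shape, 'y:', y.shape)
    break
```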
In text classification tasks, such as sentiment analysis, a varying-length text sequence will be transformed into fixed-length categories. In the following `BiRNN` class, while each token of a text sequence gets its individual pretrained GloVe representation via the embedding layer (`self.embedding`), the entire sequence is encoded by a bidirectional RNN (`self.encoder`). More concretely, the hidden states (at the last layer) of the bidirectional LSTM at both the initial and final time steps are concatenated as the representation of the text sequence. This single text representation is then transformed into output categories by a fully connected layer (`self.decoder`) with two outputs ("positive" and "negative").
class BiRNN(nn.Block):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = rnn.LSTM(num_hiddens, num_layers=num_layers,
                                bidirectional=True, input_size=embed_size)
        self.decoder = nn.Dense(2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # the LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch
        # size, word vector dimension)
        embeddings = self.embedding(inputs.T)
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps
        # as the input of the fully connected layer. Its shape is (batch
        # size, 4 * no. of hidden units)
        encoding = np.concatenate((outputs[0], outputs[-1]), axis=1)
        outs = self.decoder(encoding)
        return outs
#@tab pytorch
class BiRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = nn.LSTM(embed_size, num_hiddens, num_layers=num_layers,
                               bidirectional=True)
        self.decoder = nn.Linear(4 * num_hiddens, 2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # the LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch
        # size, word vector dimension)
        embeddings = self.embedding(inputs.T)
        self.encoder.flatten_parameters()
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs, _ = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps
        # as the input of the fully connected layer. Its shape is (batch
        # size, 4 * no. of hidden units)
        encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
        outs = self.decoder(encoding)
        return outs
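To make the shapes concrete, the following sketch (written for the PyTorch tab; the toy sizes are assumptions for illustration only) passes a random batch through a bidirectional LSTM and confirms that concatenating the hidden states at the first and last time steps yields a (batch size, 4 * no. of hidden units) representation:

```python
# Toy shape check: 500 time steps, batch of 64, 100-dimensional embeddings
lstm = nn.LSTM(100, 100, num_layers=2, bidirectional=True)
X = torch.randn(500, 64, 100)          # (steps, batch, embed_size)
outputs, _ = lstm(X)                   # (500, 64, 2 * 100)
encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
print(outputs.shape, encoding.shape)   # (500, 64, 200) and (64, 400)
```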
Let's construct a bidirectional RNN with two hidden layers to represent a single text sequence for sentiment analysis.
#@tab all
embed_size, num_hiddens, num_layers, devices = 100, 100, 2, d2l.try_all_gpus()
net = BiRNN(len(vocab), embed_size, num_hiddens, num_layers)
net.initialize(init.Xavier(), ctx=devices)
#@tab pytorch
def init_weights(m):
    # Apply Xavier initialization to the fully connected layer and to all
    # weight matrices of the LSTM
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
    if type(m) == nn.LSTM:
        for param in m._flat_weights_names:
            if "weight" in param:
                nn.init.xavier_uniform_(m._parameters[param])

net.apply(init_weights);
Below we load the pretrained 100-dimensional (needs to be consistent with `embed_size`) GloVe embeddings for tokens in the vocabulary.
#@tab all
glove_embedding = d2l.TokenEmbedding('glove.6b.100d')
Print the shape of the vectors
for all the tokens in the vocabulary.
#@tab all
embeds = glove_embedding[vocab.idx_to_token]
embeds.shape
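Tokens that are absent from the pretrained GloVe vocabulary (such as `<unk>` and `<pad>`) are mapped to zero vectors by `d2l.TokenEmbedding`; a quick check (a sketch, assuming this zero-vector convention):

```python
# The vector for an out-of-vocabulary token should be all zeros
glove_embedding[['<unk>']].sum()
```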
We use these pretrained
word vectors
to represent tokens in the reviews
and will not update
these vectors during training.
net.embedding.weight.set_data(embeds)
net.embedding.collect_params().setattr('grad_req', 'null')
#@tab pytorch
net.embedding.weight.data.copy_(embeds)
net.embedding.weight.requires_grad = False
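To confirm that the embedding layer is indeed frozen, we can compare the numbers of trainable and total parameters (a sketch for the PyTorch tab; the exact counts depend on the model sizes chosen above):

```python
# The GloVe embedding table should not appear among trainable parameters
num_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
num_total = sum(p.numel() for p in net.parameters())
print(f'{num_trainable} trainable of {num_total} total parameters')
```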
Now we can train the bidirectional RNN for sentiment analysis.
lr, num_epochs = 0.01, 5
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})
loss = gluon.loss.SoftmaxCrossEntropyLoss()
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
#@tab pytorch
lr, num_epochs = 0.01, 5
trainer = torch.optim.Adam(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss(reduction="none")
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
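Although `d2l.train_ch13` already reports the test accuracy, we can also compute it directly after training (a sketch for the PyTorch tab, assuming `test_iter` yields `(X, y)` minibatches as above and that the trained parameters reside on `d2l.try_gpu()`):

```python
def evaluate_accuracy(net, data_iter, device=d2l.try_gpu()):
    """Fraction of correctly classified reviews."""
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            correct += (net(X).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```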
We define the following function to predict the sentiment of a text sequence using the trained model `net`.
#@save
def predict_sentiment(net, vocab, sequence):
    """Predict the sentiment of a text sequence."""
    sequence = np.array(vocab[sequence.split()], ctx=d2l.try_gpu())
    label = np.argmax(net(sequence.reshape(1, -1)), axis=1)
    return 'positive' if label == 1 else 'negative'
#@tab pytorch
#@save
def predict_sentiment(net, vocab, sequence):
    """Predict the sentiment of a text sequence."""
    sequence = torch.tensor(vocab[sequence.split()], device=d2l.try_gpu())
    label = torch.argmax(net(sequence.reshape(1, -1)), dim=1)
    return 'positive' if label == 1 else 'negative'
Finally, let's use the trained model to predict the sentiment for two simple sentences.
#@tab all
predict_sentiment(net, vocab, 'this movie is so great')
#@tab all
predict_sentiment(net, vocab, 'this movie is so bad')
Can we improve the classification accuracy by using the spaCy tokenization? You need to install spaCy (`pip install spacy`) and install the English package (`python -m spacy download en`). In the code, first, import spaCy (`import spacy`). Then, load the spaCy English package (`spacy_en = spacy.load('en')`). Finally, define the function `def tokenizer(text): return [tok.text for tok in spacy_en.tokenizer(text)]` and replace the original `tokenizer` function. Note the different forms of phrase tokens in GloVe and spaCy. For example, the phrase token "new york" takes the form of "new-york" in GloVe and the form of "new york" after the spaCy tokenization.
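As a starting point for this exercise, here is a minimal sketch of the spaCy-based tokenizer (note that recent spaCy versions name the English model `en_core_web_sm` rather than `en`, installed via `python -m spacy download en_core_web_sm`):

```python
# Sketch of the spaCy tokenizer described in the exercise; assumes spaCy
# and its English model are installed
import spacy

spacy_en = spacy.load('en_core_web_sm')

def tokenizer(text):
    return [tok.text for tok in spacy_en.tokenizer(text)]

tokenizer('this movie is so great')
```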
:begin_tab:mxnet
Discussions
:end_tab:
:begin_tab:pytorch
Discussions
:end_tab: