- Introduction
- Quick Links
- Installation
- Get Started
- Tutorials
- Notes
MindNLP is an open-source NLP library based on MindSpore. It provides a platform for solving natural language processing tasks, with implementations of many common NLP approaches. It helps researchers and developers construct and train models more conveniently and rapidly.
The master branch works with MindSpore master.
To install MindNLP from source, please run:
```bash
pip install git+https://github.com/mindspore-lab/mindnlp.git
```
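A quick way to confirm the installation succeeded is a plain import check (nothing MindNLP-specific):

```python
# sanity check: the package should import without errors
import mindnlp
print(mindnlp.__name__)
```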
Next, we will quickly implement a sentiment classification task using MindNLP.
```python
from mindspore import ops
from mindnlp.abc import Seq2vecModel

class SentimentClassification(Seq2vecModel):
    def construct(self, text):
        # take the final hidden states from the LSTM encoder
        # (the encoder output and the cell states are unused here)
        _, (hidden, _), _ = self.encoder(text)
        # concatenate the top layer's forward and backward hidden states
        context = ops.concat((hidden[-2, :, :], hidden[-1, :, :]), axis=1)
        output = self.head(context)
        return output
```
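To see why the last two slices of `hidden` are concatenated: for a bidirectional LSTM, the final hidden state is stacked as `(num_layers * num_directions, batch, hidden_size)`, so `hidden[-2]` and `hidden[-1]` are the top layer's forward and backward states. Below is a standalone sketch of the shape arithmetic with made-up sizes, using `mindspore.nn.LSTM` directly:

```python
import numpy as np
import mindspore
from mindspore import nn, ops

batch_size, seq_len, embed_dim, hidden_size, num_layers = 4, 10, 100, 256, 2
lstm = nn.LSTM(embed_dim, hidden_size, num_layers=num_layers,
               batch_first=True, bidirectional=True)

x = mindspore.Tensor(np.random.randn(batch_size, seq_len, embed_dim), mindspore.float32)
_, (hidden, _) = lstm(x)

print(hidden.shape)  # (num_layers * 2, batch, hidden_size) -> (4, 4, 256)

# concatenating the top layer's forward and backward states yields the
# (batch, hidden_size * 2) context vector the classification head consumes
context = ops.concat((hidden[-2, :, :], hidden[-1, :, :]), axis=1)
print(context.shape)  # (4, 512)
```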
The following are some of the hyperparameters required for model training (note that `dropout` is reused when building the encoder and head below):

```python
# define model, loss & optimizer hyperparameters
hidden_size = 256
output_size = 1
num_layers = 2
bidirectional = True
dropout = 0.5
lr = 0.001
```
The dataset is downloaded and preprocessed through MindNLP's dataset interface.

Load the dataset:
```python
from mindnlp import load_dataset

imdb_train, imdb_test = load_dataset('imdb', shuffle=True)
```
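Each split returned by `load_dataset` is a MindSpore dataset object, so the standard `mindspore.dataset` inspection methods should apply, for example:

```python
# inspect the raw splits before preprocessing
print(imdb_train.get_dataset_size())  # number of examples in the training split
print(imdb_train.get_col_names())     # column names, e.g. the text and label columns
```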
Initialize the vocab and tokenizer for preprocessing:
```python
from mindnlp import Vocab
from mindnlp.transforms import BasicTokenizer

tokenizer = BasicTokenizer(True)  # True -> lower-case the text while tokenizing
vocab = Vocab.from_pretrained(name="glove.6B.100d")
```
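MindSpore data transforms can usually be executed eagerly by calling them on a single sample, which is a quick way to see what the tokenizer produces. The `tokens_to_ids` call below assumes `mindnlp.Vocab` mirrors the `mindspore.dataset.text.Vocab` interface; treat both calls as assumptions to verify against your installed version:

```python
# eager-mode check of the preprocessing pieces (interfaces assumed, see above)
tokens = tokenizer("This movie was surprisingly good!")
print(tokens)                              # tokenized, lower-cased words

ids = vocab.tokens_to_ids(list(tokens))    # assumed tokens_to_ids-style lookup
print(ids)
```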
The loaded splits are then preprocessed: tokenized, converted to ids, and bucketed by length:
```python
from mindnlp.dataset import process

imdb_train = process('imdb', imdb_train, tokenizer=tokenizer, vocab=vocab,
                     bucket_boundaries=[400, 500], max_len=600, drop_remainder=True)
imdb_test = process('imdb', imdb_test, tokenizer=tokenizer, vocab=vocab,
                    bucket_boundaries=[400, 500], max_len=600, drop_remainder=False)
```
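The bucketing arguments control how reviews of different lengths are batched: each tokenized review is padded up to a bucket boundary that fits it, and anything longer is capped at `max_len`. Below is a plain-Python sketch of that rule; it is my reading of the parameters, for illustration only, not MindNLP code:

```python
def padded_length(seq_len, bucket_boundaries=(400, 500), max_len=600):
    """Illustrative only: the length a tokenized review is padded/truncated to."""
    for boundary in bucket_boundaries:
        if seq_len <= boundary:
            return boundary          # falls into this bucket, pad up to it
    return max_len                   # longer reviews are capped at max_len

print(padded_length(123))  # 400
print(padded_length(450))  # 500
print(padded_length(900))  # 600
```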
```python
import math

from mindspore import nn
from mindspore.common.initializer import HeUniform, Uniform
from mindnlp.modules import RNNEncoder, Glove

# build embedding from pretrained GloVe vectors
embedding = Glove.from_pretrained('6B', 100, special_tokens=["<unk>", "<pad>"])

# build encoder
lstm_layer = nn.LSTM(100, hidden_size, num_layers=num_layers, batch_first=True,
                     dropout=dropout, bidirectional=bidirectional)
encoder = RNNEncoder(embedding, lstm_layer)

# build head (Dense before Sigmoid, so BCELoss receives probabilities in [0, 1])
head = nn.SequentialCell([
    nn.Dropout(p=dropout),
    nn.Dense(hidden_size * 2, output_size,
             weight_init=HeUniform(math.sqrt(5)),
             bias_init=Uniform(1 / math.sqrt(hidden_size * 2))),
    nn.Sigmoid()
])

# build network
network = SentimentClassification(encoder, head)
loss = nn.BCELoss(reduction='mean')
optimizer = nn.Adam(network.trainable_params(), learning_rate=lr)
```
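Before handing everything to the trainer, it can be worth pushing one fabricated batch through the network to confirm the shapes line up. The snippet below is only a smoke test: the ids are random (glove.6B has a 400k-word vocabulary, so ids below 400000 are in range), and the batch size and sequence length are arbitrary:

```python
import numpy as np
import mindspore

# fabricate a random batch of token ids purely as a shape smoke test
dummy_text = mindspore.Tensor(np.random.randint(0, 400000, (8, 400)), mindspore.int32)
dummy_label = mindspore.Tensor(np.random.randint(0, 2, (8, 1)), mindspore.float32)

probs = network(dummy_text)
print(probs.shape)                # expected: (8, 1), probabilities from the sigmoid
print(loss(probs, dummy_label))   # scalar BCE loss
```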
Now that we have completed all the preparations, we can begin to train the model.
```python
from mindnlp.engine.metrics import Accuracy
from mindnlp.engine.trainer import Trainer

# define metrics
metric = Accuracy()

# define trainer
trainer = Trainer(network=network, train_dataset=imdb_train, eval_dataset=imdb_test,
                  metrics=metric, epochs=5, loss_fn=loss, optimizer=optimizer)
trainer.run(tgt_columns="label")
```
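After training, the `network` can be called directly for prediction. The helper below is hypothetical: it reuses the eager tokenization and `tokens_to_ids` assumptions from earlier, and the 0.5 decision threshold is a common default rather than anything the library prescribes:

```python
import numpy as np
import mindspore

def predict_sentiment(sentence, threshold=0.5):
    """Hypothetical helper: classify a single sentence with the trained network."""
    tokens = tokenizer(sentence)                 # eager tokenization (assumed to work)
    ids = vocab.tokens_to_ids(list(tokens))      # assumed Vocab lookup interface
    batch = mindspore.Tensor(np.array([ids], dtype=np.int32))
    prob = network(batch).asnumpy().item()       # sigmoid output in [0, 1]
    return ("positive" if prob >= threshold else "negative", prob)

print(predict_sentiment("One of the best films I have seen this year."))
```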
This project is released under the Apache 2.0 license.
The dynamic version is still under development; if you find any issues or have ideas for new features, please don't hesitate to contact us via GitHub Issues.
MindSpore is an open-source project that welcomes any contribution and feedback. We hope that this toolbox and benchmark can serve the growing research community by providing a flexible and standardized toolkit to reimplement existing methods and develop new NLP methods.
If you find this project useful in your research, please consider citing:
```bibtex
@misc{mindnlp2022,
    title={{MindNLP}: a MindSpore NLP library},
    author={MindNLP Contributors},
    howpublished={\url{https://github.com/mindlab-ai/mindnlp}},
    year={2022}
}
```