This example reads the aclImdb dataset and converts it to mindrecord. It converts the data as-is, without any preprocessing. You can modify the example or use it as a reference when implementing your own.
Download the aclImdb dataset, convert it to mindrecord, then use MindDataset to read the mindrecord.
Download the training data archive.
aclImdb dataset download address -> Large Movie Review Dataset v1.0
Extract the training data to the directory example/nlp_to_mindrecord/aclImdb/data.
tar -zxvf aclImdb_v1.tar.gz -C {your-mindspore}/example/nlp_to_mindrecord/aclImdb/data/
Run the run.sh script.
bash run.sh
The output looks like this:
...
>> begin generate mindrecord by train data
...
[INFO] ME(20928,python):2020-05-07-23:02:40.066.546 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
>> transformed 24320 record...
[INFO] ME(20928,python):2020-05-07-23:02:40.078.344 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
>> transformed 24576 record...
[INFO] ME(20928,python):2020-05-07-23:02:40.090.237 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
>> transformed 24832 record...
[INFO] ME(20928,python):2020-05-07-23:02:40.098.785 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 168 records successfully.
>> transformed 25000 record...
[INFO] ME(20928,python):2020-05-07-23:02:40.098.957 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:214] Commit] Write metadata successfully.
[INFO] ME(20928,python):2020-05-07-23:02:40.099.302 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:45] Build] Init header from mindrecord file for index successfully.
[INFO] ME(20928,python):2020-05-07-23:02:40.122.271 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:586] DatabaseWriter] Init index db for shard: 0 successfully.
[INFO] ME(20928,python):2020-05-07-23:02:40.932.360 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:535] ExecuteTransaction] Insert 24596 rows to index db.
[INFO] ME(20928,python):2020-05-07-23:02:40.953.177 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:535] ExecuteTransaction] Insert 404 rows to index db.
[INFO] ME(20928,python):2020-05-07-23:02:40.963.400 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:606] DatabaseWriter] Generate index db for shard: 0 successfully.
[INFO] ME(20928:139630558652224,MainProcess):2020-05-07-23:02:40.964.973 [mindspore/mindrecord/filewriter.py:313] The list of mindrecord files created are: ['output/aclImdb_train.mindrecord'], and the list of index files are: ['output/aclImdb_train.mindrecord.db']
>> begin generate mindrecord by test data
...
>> transformed 24576 record...
[INFO] ME(20928,python):2020-05-07-23:02:42.120.007 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
>> transformed 24832 record...
[INFO] ME(20928,python):2020-05-07-23:02:42.128.862 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 168 records successfully.
>> transformed 25000 record...
[INFO] ME(20928,python):2020-05-07-23:02:42.129.024 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:214] Commit] Write metadata successfully.
[INFO] ME(20928,python):2020-05-07-23:02:42.129.362 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:45] Build] Init header from mindrecord file for index successfully.
[INFO] ME(20928,python):2020-05-07-23:02:42.151.237 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:586] DatabaseWriter] Init index db for shard: 0 successfully.
[INFO] ME(20928,python):2020-05-07-23:02:42.935.496 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:535] ExecuteTransaction] Insert 25000 rows to index db.
[INFO] ME(20928,python):2020-05-07-23:02:42.949.319 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:606] DatabaseWriter] Generate index db for shard: 0 successfully.
[INFO] ME(20928:139630558652224,MainProcess):2020-05-07-23:02:42.951.794 [mindspore/mindrecord/filewriter.py:313] The list of mindrecord files created are: ['output/aclImdb_test.mindrecord'], and the list of index files are: ['output/aclImdb_test.mindrecord.db']
The following mindrecord files are generated:
$ ls output/
aclImdb_test.mindrecord aclImdb_test.mindrecord.db aclImdb_train.mindrecord aclImdb_train.mindrecord.db README.md
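The records written above carry an id and a score for each review. In the Large Movie Review Dataset archive, each review is a text file named `<id>_<rating>.txt` under folders such as `train/pos` and `train/neg`. The following is a minimal sketch of how those two fields could be recovered from a file path; the function name `parse_review_path` and the returned keys are hypothetical, and this is not necessarily how gen_mindrecord.py does it.

```python
from pathlib import PurePosixPath


def parse_review_path(path):
    """Split an aclImdb review path into split, sentiment folder, id and score.

    Assumes the layout <root>/<split>/<sentiment>/<id>_<rating>.txt.
    """
    p = PurePosixPath(path)
    review_id, score = p.stem.split("_")
    return {
        "split": p.parent.parent.name,   # e.g. "train" or "test"
        "sentiment": p.parent.name,      # e.g. "pos" or "neg"
        "id": int(review_id),
        "score": int(score),
    }


print(parse_review_path("aclImdb/train/pos/0_9.txt"))
```

How the sentiment folder maps to the integer label field is left out on purpose, since that mapping is defined by the conversion script.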
Run the run_read.sh script.
bash run_read.sh
The output looks like this:
Caution: the "review" field is a string, but it is displayed as an array of uint8 values.
...
example 2056: {'label': array(1, dtype=int32), 'score': array(4, dtype=int32), 'id': array(5871, dtype=int32), 'review': array([ 70, 111, 114, ..., 111, 110, 46], dtype=uint8)}
example 2057: {'label': array(1, dtype=int32), 'score': array(1, dtype=int32), 'id': array(6092, dtype=int32), 'review': array([ 83, 111, 109, ..., 115, 101, 46], dtype=uint8)}
example 2058: {'label': array(1, dtype=int32), 'score': array(4, dtype=int32), 'id': array(1357, dtype=int32), 'review': array([ 42, 109, 97, ..., 58, 32, 67], dtype=uint8)}
...
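Because the review text comes back as uint8 values, it has to be converted to a string before it can be read. The sketch below shows one way to do that in plain Python; the helper name `decode_review` is hypothetical, and it assumes the bytes are UTF-8 encoded text. It accepts any sequence of integers, so a NumPy uint8 array such as the one shown above should work as well.

```python
def decode_review(values):
    """Turn a sequence of uint8 byte values back into a Python string.

    Assumes the bytes are UTF-8 text; undecodable bytes are replaced
    rather than raising an error.
    """
    raw = bytes(bytearray(int(v) for v in values))
    return raw.decode("utf-8", errors="replace")


# The first three values of the review in example 2056 above.
print(decode_review([70, 111, 114]))  # prints "For"
```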