This repository contains code for Semi-Supervised Sequence Modeling with Cross-View Training. Currently sequence tagging and dependency parsing tasks are supported.
This code has been run with TensorFlow 1.10.1 and NumPy 1.14.5; other versions may work, but have not been tested.
Run `fetch_data.sh` to download and extract pretrained GloVe vectors, the 1 Billion Word Language Model Benchmark corpus of unlabeled data, and the CoNLL-2000 text chunking dataset. Unfortunately, the other datasets from our paper are not freely available and so cannot be included in this repository.
To apply CVT to other datasets, place the data in `data/raw_data/<task_name>/(train|dev|test).txt`. For sequence tagging data, each line should contain a word followed by a space followed by that word's tag; sentences should be separated by empty lines. For dependency parsing, each tag should be of the form `<index_of_head>-<relation>` (e.g., `0-root`).
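As an illustration, the format above can be read with a few lines of Python. This is a hedged sketch, not code from the repository; the function names are hypothetical:

```python
def read_sentences(path):
    """Read a file in the word-space-tag format described above.

    Returns a list of sentences, each a list of (word, tag) pairs.
    Sentences are separated by empty lines.
    """
    sentences, current = [], []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:  # an empty line ends the current sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                word, tag = line.split(" ", 1)
                current.append((word, tag))
    if current:  # flush a final sentence with no trailing blank line
        sentences.append(current)
    return sentences


def split_dep_tag(tag):
    """For dependency parsing, a tag like "0-root" encodes
    <index_of_head>-<relation>; index 0 is the ROOT token."""
    head, relation = tag.split("-", 1)
    return int(head), relation
```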
After all of the data has been downloaded, run `preprocessing.py`.
Run `python cvt.py --mode=train --model_name=chunking_model`. By default this trains a model on the chunking data downloaded by `fetch_data.sh`. To change which task(s) are trained on or to adjust model hyperparameters, modify `base/configure.py`. Models are automatically checkpointed every 1000 steps; if training is interrupted and restarted, it resumes from the latest checkpoint. Model checkpoints and other data, such as dev-set accuracy over time, are stored in `data/models/<model_name>`.
Run `python cvt.py --mode=eval --model_name=chunking_model`. A CVT model trained on the chunking data for 200k steps should reach at least 97.1 F1 on the dev set and 96.6 F1 on the test set.
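The F1 figures above are span-level, as in the standard CoNLL-2000 evaluation: a predicted chunk counts as correct only if both its boundaries and its type exactly match a gold chunk. A minimal sketch of this metric over BIO tags (the helper names are mine, not from this repository, and this is a simplification of the official `conlleval` scorer):

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to a set of (start, end, type) chunks.

    `end` is exclusive. A stray I- tag with no preceding B- is treated
    as starting a new chunk, as conlleval does.
    """
    spans, start, ctype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last chunk
        ends_chunk = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != ctype)
        if ends_chunk and start is not None:
            spans.add((start, i, ctype))
            start, ctype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    return spans


def chunk_f1(gold_tags, pred_tags):
    """Span-level F1: precision/recall over exactly matching chunks."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```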
If you use this code for your publication, please cite the original paper:
@inproceedings{clark2018semi,
title = {Semi-Supervised Sequence Modeling with Cross-View Training},
author = {Kevin Clark and Minh-Thang Luong and Christopher D. Manning and Quoc V. Le},
booktitle = {EMNLP},
year = {2018}
}