AdaptiveAttention

Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"

Requirements

Training the model requires a GPU with 12 GB of memory. If you do not have a GPU, you can use the pretrained model directly for inference.

This code is written in Lua and requires Torch. The preprocessing code is in Python, and you need to install NLTK if you want to use it to tokenize the captions.
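As a rough illustration of the tokenization step, the sketch below lowercases a caption and splits it into tokens with NLTK's `TreebankWordTokenizer`, falling back to a simple regular expression when NLTK is not installed. The function name `tokenize_caption` is hypothetical and is not taken from the repository's preprocessing code, which may tokenize captions differently.

```python
import re

try:
    # TreebankWordTokenizer is rule-based and needs no extra NLTK data downloads.
    from nltk.tokenize import TreebankWordTokenizer
    _treebank = TreebankWordTokenizer()
except ImportError:
    _treebank = None


def tokenize_caption(caption):
    """Lowercase a caption and split it into word and punctuation tokens.

    Hypothetical helper for illustration only; not part of the repository.
    """
    text = caption.lower().strip()
    if _treebank is not None:
        return _treebank.tokenize(text)
    # Fallback when NLTK is missing: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    print(tokenize_caption("A man riding a horse."))
```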

You also need to install the following packages in order to run the code successfully.

Pretrained Model

The pre-trained model for COCO can be downloaded here.
The pre-trained model for Flickr30K can be downloaded here.

Vocabulary File

Download the corresponding vocabulary file for COCO and Flickr30k.

Download Dataset

The first thing you need to do is download the data and do some preprocessing. Head over to the data/ folder and run the corresponding IPython script. It will download and preprocess the data and generate coco_raw.json.

Download the COCO and Flickr30k image datasets, extract the images, and place them in a directory of your choice.

Training a new model on MS COCO

First, train the language model without fine-tuning the image model (CNN).

th train.lua -batch_size 20

To fine-tune the CNN, load the saved model and train for another 15 to 20 epochs.

th train.lua -batch_size 16 -startEpoch 21 -start_from 'model_id1_20.t7'
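The two training stages above can be scripted. The following is a minimal sketch that only assembles the two `th train.lua` command lines shown in this README. The helper name `stage_commands` and its parameters are hypothetical. The checkpoint name `model_id1_20.t7` is copied from the command above and may differ in your run.

```python
import shlex


def stage_commands(checkpoint="model_id1_20.t7", start_epoch=21):
    """Return the two training commands from this README as argument lists.

    Hypothetical helper for illustration; it uses only the flags shown above.
    """
    # Stage 1: train the language model only.
    stage1 = ["th", "train.lua", "-batch_size", "20"]
    # Stage 2: resume from the saved checkpoint with a smaller batch size.
    stage2 = [
        "th", "train.lua",
        "-batch_size", "16",
        "-startEpoch", str(start_epoch),
        "-start_from", checkpoint,
    ]
    return [stage1, stage2]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in stage_commands():
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to run the stages in sequence, assuming `th` is on your `PATH` and the script is run from the repository root.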

More results on spatial attention and the visual sentinel

For more visualization results, you can visit here
(it will load more than 1000 images and their results).

Reference

If you use this code as part of any published research, please acknowledge the following paper:

@misc{Lu2017Adaptive,
author = {Lu, Jiasen and Xiong, Caiming and Parikh, Devi and Socher, Richard},
title = {Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning},
journal = {CVPR},
year = {2017}
}

Acknowledgement

This code is developed based on NeuralTalk2.

Thanks to the Torch team and the Facebook ResNet implementation.

License

BSD 3-Clause License
