Visual Question Answering (git-base-vqav2)

GIT (short for GenerativeImage2Text) model, base-sized version, fine-tuned on VQAv2.

It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Wang et al. and first released in this repository.

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on a large number of (image, text) pairs.

The model's training objective is simply to predict the next text token, given the image tokens and the previous text tokens.

The model has full access to (i.e. a bidirectional attention mask is used for) the image patch tokens, but only has access to the previous text tokens (i.e. a causal attention mask is used for the text tokens) when predicting the next text token.
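The masking scheme described above can be illustrated with a small sketch. It is not the model's actual implementation: the function name `build_git_style_mask` and the token counts are hypothetical, and the choice that image tokens do not attend to text tokens is an assumption based on the GIT paper's description rather than on anything stated here.

```python
def build_git_style_mask(num_image_tokens, num_text_tokens):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions 0..num_image_tokens-1 are image tokens; the rest are text tokens.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for query in range(total):
        for key in range(total):
            if key < num_image_tokens:
                # Every position sees all image tokens (bidirectional over the image).
                mask[query][key] = True
            elif query >= num_image_tokens:
                # Text tokens see only themselves and earlier text tokens (causal).
                mask[query][key] = key <= query
            # Assumption: image tokens never attend to text tokens, so those stay False.
    return mask


# Hypothetical sizes: 2 image tokens followed by 3 text tokens.
for row in build_git_style_mask(2, 3):
    print(" ".join("1" if allowed else "0" for allowed in row))
```

With 2 image tokens and 3 text tokens, the first two columns are all ones (image tokens are visible to every position), while the lower-right 3x3 block is lower-triangular (each text token sees only itself and earlier text tokens).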

This allows the model to be used for tasks like:

image and video captioning
visual question answering (VQA) on images and videos
even image classification (by simply conditioning the model on the image and asking it to generate a class for it in text).
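For the VQA use case, a minimal inference sketch with the Hugging Face `transformers` library might look like the following. It assumes `torch`, `transformers` and `Pillow` are installed and that the checkpoint is reachable under the name `microsoft/git-base-vqav2` (taken from the model source URL). The helper names `prepend_cls` and `answer_question` are hypothetical, and this is not how the ServiceBoot wrapper in this repository is necessarily implemented.

```python
def prepend_cls(token_ids, cls_token_id):
    """Return the question token ids with the [CLS] id placed in front."""
    return [cls_token_id] + list(token_ids)


def answer_question(image_path, question, model_name="microsoft/git-base-vqav2"):
    """Generate a short text answer to `question` about the image at `image_path`."""
    # Heavy dependencies are imported lazily so prepend_cls stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Encode the image into pixel values for the image encoder.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The question serves as the text prefix the decoder continues from.
    question_ids = processor(text=question, add_special_tokens=False).input_ids
    prompt_ids = prepend_cls(question_ids, processor.tokenizer.cls_token_id)
    input_ids = torch.tensor(prompt_ids).unsqueeze(0)

    generated_ids = model.generate(
        pixel_values=pixel_values, input_ids=input_ids, max_length=50
    )

    # Assumption: generate() returns the prompt followed by the new tokens,
    # so the answer is whatever comes after the question prefix.
    answer_ids = generated_ids[:, input_ids.shape[1]:]
    return processor.batch_decode(answer_ids, skip_special_tokens=True)[0].strip()
```

A call such as `answer_question("photo.jpg", "what color is the car?")` would then return the generated answer as a string; `photo.jpg` is a placeholder path.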

Model source: https://hf-mirror.com/microsoft/git-base-vqav2

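Model application development and deployment

Serving the model

This model is packaged as a service using the ServiceBoot microservice engine. See: "CubeAI Model Development Guide" (《CubeAI模型开发指南》)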

Running directly from source

$ sh pip-install-reqs.sh
$ serviceboot start
or
$ python3 run_model_server.py

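Local containerized deployment

For one-click local containerized deployment and execution, see: "CubeAI Model Standalone Deployment Guide" (《CubeAI模型独立部署指南》) or CubeAI Docker Builder

Cloud-native network deployment

This model service can be published with one click to the CubeAI智立方 platform for sharing and deployment. See: "CubeAI Model Publishing Guide" (《CubeAI模型发布指南》)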

For more CubeAI model services, see: "CubeAI Service-Native Model Showcase" (《CubeAI服务原生模型示范库》)
