InternLM (书生·浦语) LLM Practical Camp: Multimodal Training and Testing with XTuner

Table of Contents

  • XTuner Multimodal Training and Testing
  • Giving the LLM Electronic Eyes: A Brief Introduction to Multimodal LLMs
    • Text-Only Modality
    • Text + Image Multimodality
  • Electronic Eyes: An Overview of the LLaVA Approach
    • LLaVA Training-Stage Diagram
    • LLaVA Testing-Stage Diagram
  • Hands-On Project
    • Environment Setup
    • Installing XTuner
    • Overview
      • Pretrain Stage
      • Finetune Stage
        • Building the Training Data
        • Creating the Config File
        • Starting the Finetune
      • Comparing Performance Before and After Finetuning
        • Before Finetuning
        • After Finetuning


XTuner Multimodal Training and Testing

In this lesson, we will learn to fine-tune a multimodal LLM with XTuner. This part requires a 30% slice of one A100 (about 24 GB of GPU memory).

Here is what the multimodal LLM can do after completing this section:

  • Before finetuning (InternLM_Chat_1.8B_llava): it can only caption the input image

  • After finetuning (InternLM_Chat_1.8B_llava): it can answer questions about the image

Giving the LLM Electronic Eyes: A Brief Introduction to Multimodal LLMs

Text-Only Modality

[Figure: a text-only LLM pipeline]

Text + Image Multimodality

[Figure: a text + image multimodal LLM pipeline]
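
The two pipelines reduce to one idea: a text-only LLM consumes a sequence of text token embeddings, while a multimodal LLM additionally receives image features that have been projected into that same embedding space, so both kinds of tokens flow through one model. Below is a minimal toy sketch of this fusion; every dimension and module choice here is illustrative only, not the actual InternLM2/CLIP architecture.

import torch
import torch.nn as nn

# Toy sizes; real models are much larger (e.g. 1024-d CLIP ViT-L features).
vision_dim, llm_dim, vocab = 32, 64, 100

class ToyMultimodalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, llm_dim)        # text token embeddings
        self.projector = nn.Linear(vision_dim, llm_dim)  # the "electronic eye"
        layer = nn.TransformerEncoderLayer(llm_dim, nhead=4, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(llm_dim, vocab)

    def forward(self, image_feats, text_ids):
        img_tokens = self.projector(image_feats)          # (B, N_img, llm_dim)
        txt_tokens = self.embed(text_ids)                 # (B, N_txt, llm_dim)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)  # image tokens first
        return self.head(self.llm(seq))

model = ToyMultimodalLM()
logits = model(torch.randn(1, 9, vision_dim), torch.randint(0, vocab, (1, 5)))
print(logits.shape)  # torch.Size([1, 14, 100])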

Electronic Eyes: An Overview of the LLaVA Approach

Haotian Liu et al. used GPT-4V to generate descriptions for images, and from them constructed a large number of <image, question-answer text> data pairs. Using these pairs together with a text-only LLM, they trained an Image Projector.

The text-only LLM and the trained Image Projector are together referred to as the LLaVA model
(LLaVA: Large Language and Vision Assistant)

https://arxiv.org/pdf/2304.08485.pdf

LLaVA Training-Stage Diagram

[Figure: LLaVA training stage]

LLaVA Testing-Stage Diagram

[Figure: LLaVA testing stage]

The training and testing of the Image Projector is somewhat analogous to the LoRA finetuning scheme.

Both take an existing LLM and use new data to train a new, small file alongside it.

With LoRA, the LLM gains a new soul (a persona); with an Image Projector, it finally gains eyes.
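
The projector itself is tiny compared to the LLM: in LLaVA v1 it is a single linear layer, upgraded to a two-layer MLP in LLaVA-1.5. Here is a sketch of the MLP variant, assuming CLIP ViT-L/14 patch features (1024-d) and the InternLM2-1.8B hidden size (2048-d); the exact dimensions are assumptions for illustration.

import torch.nn as nn

class ImageProjector(nn.Module):
    """Maps frozen vision-encoder features into the LLM's embedding space.
    This small module (plus any LoRA weights) is all that training produces."""

    def __init__(self, vision_dim=1024, llm_dim=2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features):    # (B, 576, vision_dim) for ViT-L/14 @ 336px
        return self.mlp(patch_features)   # (B, 576, llm_dim)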

Hands-On Project

https://llava-vl.github.io/

Environment Setup

Use the Cuda11.7-conda image, and in the resource configuration choose the 30% A100 * 1 option.

Installing XTuner

# If you are on the InternStudio platform, clone a local environment that already has pytorch:
# pytorch    2.0.1   py3.10_cuda11.7_cudnn8.5.0_0

cd ~ && studio-conda xtuner0.1.17
# If you are on another platform:
# conda create --name xtuner0.1.17 python=3.10 -y

# Activate the environment
conda activate xtuner0.1.17
# Go to the home directory (~ means "the current user's home path")
cd ~
# Create a version folder and enter it, to follow along with this tutorial
mkdir -p /root/xtuner0117 && cd /root/xtuner0117

# Pull the v0.1.17 source code
git clone -b v0.1.17  https://github.com/InternLM/xtuner
# If you cannot access github, pull from gitee instead:
# git clone -b v0.1.15 https://gitee.com/Internlm/xtuner

# Enter the source directory
cd /root/xtuner0117/xtuner

# Install XTuner from source
pip install -e '.[all]' && cd ~
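
A quick way to confirm the installation from Python (assuming the package exposes __version__, as OpenMMLab-style projects normally do):

# Sanity-check the installation
import xtuner
print(xtuner.__version__)  # expect 0.1.17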

Overview

In this section, we will construct our own <question text, answer text> data pairs and, taking the text-only InternLM2_Chat_1.8B model as the base, use the LLaVA approach to train an Image Projector file for InternLM2_Chat_1.8B.

In the LLaVA approach, giving the LLM visual capability is precisely the process of training the Image Projector file. That process has two stages: Pretrain and Finetune.

Pretrain Stage

In the Pretrain stage, a large number of image + simple-text (caption, i.e. image title) pairs are used to teach the LLM the general features found in images.

After the Pretrain stage, the model already has visual capability! But because the training data consists only of images and their captions, no matter what the user asks, the model will only answer with a caption for the input image; it can only "write titles" for pictures.

The Pretrain stage is comparable to the pretraining phase of LLM development and is very demanding on hardware. Students with 8 GPUs and spare energy can try it themselves with the commands below (pretrain first, then the follow-up finetune):

NPROC_PER_NODE=8 xtuner train llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2

NPROC_PER_NODE=8 xtuner train llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2

https://github.com/InternLM/xtuner/blob/main/docs/zh_cn/user_guides/dataset_prepare.md#llava-dataset

In this camp, the Pretrain-stage checkpoint "iter_2181.pth" has already been prepared, so we can go straight to the Finetune step.

Finetune Stage

In the Finetune stage, we use image + complex-text data pairs to further train the Image Projector obtained from Pretrain (iter_2181.pth).

Building the Training Data

The training data follows this JSON format:
[
    {
        "id": "a string",
        "image": "relative path to the image file",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nQuestion 1."
            },
            {
                "from": "gpt",
                "value": "Answer 1."
            },
            {
                "from": "human",
                "value": "Question 2."
            },
            {
                "from": "gpt",
                "value": "Answer 2."
            },
            # ......
            {
                "from": "human",
                "value": "Question n."
            },
            {
                "from": "gpt",
                "value": "Answer n."
            },
        ]
    },

    # The second training-data group starts below.

    {
        "id": "a string",
        "image": "relative path to the image file",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nQuestion 1."
            },
            # ......
            {
                "from": "gpt",
                "value": "Answer n."
            }
        ]
    }
]

The first human question in each training-data group must be prefixed with the image placeholder, i.e. <image>.

Example:

[
  {
    "id": "000000033471",
    "image": "coco/train2017/000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      },
      {
        "from": "human",
        "value": "What feature can be seen on the back of the bus?"
      },
      {
        "from": "gpt",
        "value": "The back of the bus features an advertisement."
      },
      {
        "from": "human",
        "value": "Is the bus driving down the street or pulled off to the side?"
      },
      {
        "from": "gpt",
        "value": "The bus is driving down the street, which is crowded with people and other vehicles."
      }
    ]
  },
  {
    "id": "000000052846",
    "image": "coco/train2017/000000052846.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhere is the cat positioned in the image?"
      },
      {
        "from": "gpt",
        "value": "The cat is positioned on top of the back of the couch in the living room."
      }
    ]
  }
 ]
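
Before training, it is worth sanity-checking the data against this format. Below is a minimal validator sketch; the path is the one this tutorial generates later, so adjust it as needed.

import json

def validate(path):
    with open(path, encoding='utf-8') as f:
        data = json.load(f)
    for i, sample in enumerate(data):
        assert {'id', 'image', 'conversations'} <= sample.keys(), f'sample {i}: missing keys'
        conv = sample['conversations']
        # Turns must alternate human/gpt, starting with human.
        for j, turn in enumerate(conv):
            expected = 'human' if j % 2 == 0 else 'gpt'
            assert turn['from'] == expected, f'sample {i}, turn {j}: expected {expected}'
        # The first human question must carry the <image> placeholder.
        assert conv[0]['value'].startswith('<image>'), f'sample {i}: missing <image> tag'
    print(f'{len(data)} samples OK')

validate('/root/tutorial/xtuner/llava/llava_data/repeated_data.json')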

Following the LLaVA authors' approach, you can send your own image to GPT and ask it to generate several question-answer pairs in the format above.


The prompt is as follows:

Create a dataset for me, following this format.

[
  {
    "id": "<random_number_string>",
    "image": "test_img/oph.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nDescribe this image."
      },
      {
        "from": "gpt",
        "value": "<answer1>"
      },
      {
        "from": "human",
        "value": "<question2>"
      },
      {
        "from": "gpt",
        "value": "<answer2>"
      },
      {
        "from": "human",
        "value": "<question3>"
      },
      {
        "from": "gpt",
        "value": "<answer3>"
      }
    ]
  }
]

Please generate the questions and answers for me based on the image I sent to you. The questions should go from shallow to deep, and the answers should be as detailed and correct as possible. The questions and answers should stick to the contents of the image itself, like objects, people, equipment, environment, purpose, color, attitude, etc. Generate 5 question and answer pairs.

To make it easy to follow along, the Q&A data for this sample image has already been prepared (repeated_data.json); just run the commands below to generate it (the unique data repeated 200 times):

cd ~ && git clone https://github.com/InternLM/tutorial -b camp2 && conda activate xtuner0.1.17 && cd tutorial

python /root/tutorial/xtuner/llava/llava_data/repeat.py \
  -i /root/tutorial/xtuner/llava/llava_data/unique_data.json \
  -o /root/tutorial/xtuner/llava/llava_data/repeated_data.json \
  -n 200

The repeat.py source:

(xtuner0.1.17) root@intern-studio-061925:~/tutorial# cat  /root/tutorial/xtuner/llava/llava_data/repeat.py
import json
import argparse

def generate_new_json(input_file, output_file, repeat_times):
    # Load the original JSON data
    with open(input_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Repeat the data
    new_data = data * repeat_times

    # Save the new JSON data to a file
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(new_data, f, ensure_ascii=False, indent=4)

def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(description='Generate a new JSON file based on the input file and repetition times.')
    parser.add_argument('-i', '--input', required=True, help='Input JSON file path')
    parser.add_argument('-o', '--output', required=True, help='Output JSON file path')
    parser.add_argument('-n', '--number', type=int, required=True, help='Number of times to repeat the data')
    args = parser.parse_args()

    # Generate the new JSON file
    generate_new_json(args.input, args.output, args.number)

if __name__ == '__main__':
    main()
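
Since unique_data.json (shown below) contains 6 samples, repeating it 200 times should yield 1200 entries, which matches the "1200/1200 examples" counts that appear later in the training log. A quick check:

import json

with open('/root/tutorial/xtuner/llava/llava_data/repeated_data.json', encoding='utf-8') as f:
    data = json.load(f)
print(len(data))  # 6 unique samples * 200 repetitions = 1200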


Contents of unique_data.json:

[
    {
        "id": "4672496731",
        "image": "test_img/oph.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nDescribe this image."
            },
            {
                "from": "gpt",
                "value": "This is an image of an eye examination in progress. A man in a white lab coat, likely an optometrist, is attending to a female patient who is undergoing an eye test. She is using a phoropter, a device used to measure an individual's prescription for glasses or contact lenses. In the background, there's an eye chart with letters decreasing in size. The environment looks like a professional medical office, and both individuals seem focused on the task."
            },
            {
                "from": "human",
                "value": "Describe the objects and people in this image."
            },
            {
                "from": "gpt",
                "value": "In the image, there's an eye examination machine, likely a phoropter, operated by a male optometrist. A female patient is seated, engaging with the machine. The optometrist is wearing a white coat, indicating a medical professional. The environment looks clean and professional, typical of a clinical setting."
            },
            {
                "from": "human",
                "value": "What equipment is being used, and for what purpose?"
            },
            {
                "from": "gpt",
                "value": "The equipment in use is a phoropter, a common optometric device. It's used to measure the patient's refractive error and to determine an appropriate prescription for glasses or contact lenses."
            },
            {
                "from": "human",
                "value": "Describe the colors and lighting in the room."
            },
            {
                "from": "gpt",
                "value": "The room is brightly lit, which is essential for an accurate eye examination. The colors are muted, with white walls and a white coat worn by the optometrist, which conveys a sense of cleanliness and simplicity typical of medical environments."
            },
            {
                "from": "human",
                "value": "Can you comment on the patient's attitude?"
            },
            {
                "from": "gpt",
                "value": "The patient seems to be cooperating with the examination process, looking attentively into the eye examination machine. Her posture and focus suggest she is taking the procedure seriously."
            }
        ]
    },
    {
        "id": "123456789",
        "image": "test_img/oph.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nDescribe this image."
            },
            {
                "from": "gpt",
                "value": "This is a photograph of a patient undergoing an eye examination. A healthcare professional, possibly an optometrist, is using a slit lamp to examine the patient's eyes. The patient is seated, leaning into the machine which has a chin rest and a forehead support. In the background, there's an eye chart."
            },
            {
                "from": "human",
                "value": "What is the purpose of the examination in the image?"
            },
            {
                "from": "gpt",
                "value": "The examination is likely assessing the health of the patient's eyes and vision. The slit lamp allows the doctor to observe the structure and clarity of the eye's optical pathway, which can help diagnose conditions like cataracts or corneal injuries."
            },
            {
                "from": "human",
                "value": "How does the equipment work?"
            },
            {
                "from": "gpt",
                "value": "The slit lamp is an instrument with a high-intensity light source that can be focused to shine a thin sheet of light into the eye. It is used with a biomicroscope and facilitates the examination of the eye's anterior and posterior segments under magnification."
            },
            {
                "from": "human",
                "value": "Are there any risks associated with this procedure?"
            },
            {
                "from": "gpt",
                "value": "The procedure is generally safe and non-invasive. There might be some discomfort due to bright light, and it's important to avoid sudden movements during the exam, but there are no significant risks associated with the slit lamp examination."
            },
            {
                "from": "human",
                "value": "What would a patient experience during this examination?"
            },
            {
                "from": "gpt",
                "value": "During the slit lamp examination, a patient might experience bright light and the need to keep their eyes open. They might be asked to look in various directions, and while the light can be uncomfortable, the examination is usually quick and painless."
            }
        ]
    },
    {
        "id": "123456789",
        "image": "test_img/oph.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nDescribe this image."
            },
            {
                "from": "gpt",
                "value": "The image shows a medical professional, possibly an optometrist, using a phoropter to examine a patient's eyesight. The patient is sitting at the eye examination machine, looking through it. The optometrist is on the other side of the machine, likely adjusting the lenses and conducting the vision test. In the background, there is an eye chart with letters that decrease in size, used for testing visual acuity."
            },
            {
                "from": "human",
                "value": "What kind of equipment is the patient looking through?"
            },
            {
                "from": "gpt",
                "value": "The patient is looking through a phoropter, an ophthalmic testing device used to measure refractive errors and determine eyeglass prescriptions."
            },
            {
                "from": "human",
                "value": "What is the purpose of the equipment?"
            },
            {
                "from": "gpt",
                "value": "The purpose of the phoropter is to determine the correct prescription for eyeglasses or contact lenses. It measures how light is bent as it enters the patient's eye to identify issues like nearsightedness, farsightedness, astigmatism, and presbyopia."
            },
            {
                "from": "human",
                "value": "Can you describe the environment where this activity is taking place?"
            },
            {
                "from": "gpt",
                "value": "The environment appears to be a professional optometry clinic or an ophthalmologist's examination room, designed specifically for conducting eye exams. It's clean, well-lit, and equipped with the necessary medical instruments for vision testing."
            },
            {
                "from": "human",
                "value": "What can be inferred about the eye chart in the background?"
            },
            {
                "from": "gpt",
                "value": "The eye chart in the background is used to measure visual acuity. The descending size of letters allows optometrists to determine how well a patient can see at various distances. Its presence indicates that visual acuity tests are likely part of the examination process."
            }
        ]
    },
    {
        "id": "123456789",
        "image": "test_img/oph.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nWhat is the gender and approximate age of the patient?"
            },
            {
                "from": "gpt",
                "value": "The patient appears to be a female in her late twenties to thirties, based on her physical features and hairstyle."
            },
            {
                "from": "human",
                "value": "What type of clothing is the optometrist wearing?"
            },
            {
                "from": "gpt",
                "value": "The optometrist is wearing a professional white lab coat, which is typical attire for medical professionals, suggesting cleanliness and expertise."
            },
            {
                "from": "human",
                "value": "What is the facial expression of the optometrist?"
            },
            {
                "from": "gpt",
                "value": "The optometrist has a focused expression, indicative of concentration on the task at hand, likely assessing the patient's vision or the settings on the phoropter."
            },
            {
                "from": "human",
                "value": "Is there any branding or specific model visible on the equipment?"
            },
            {
                "from": "gpt",
                "value": "No specific branding or model details are visible on the equipment in the image, which focuses more on the interaction between patient and optometrist."
            },
            {
                "from": "human",
                "value": "How does the environment appear in terms of comfort and modernity?"
            },
            {
                "from": "gpt",
                "value": "The environment appears clean and functional, with modern medical equipment and a comfortable chair for the patient, suggesting a well-maintained and contemporary optometry practice."
            }
        ]
    },
    {
        "id": "123456789",
        "image": "test_img/oph.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nWhat equipment is the patient using?"
            },
            {
                "from": "gpt",
                "value": "The patient is using a phoropter, a specialized piece of eye examination equipment that measures refractive errors to determine the correct prescription for eyeglasses or contact lenses."
            },
            {
                "from": "human",
                "value": "What does the chart in the background represent?"
            },
            {
                "from": "gpt",
                "value": "The chart in the background is a Snellen chart, commonly used to measure visual acuity. It has several rows of letters that decrease in size, allowing the optometrist to determine how well the patient can see at various distances."
            },
            {
                "from": "human",
                "value": "Can you describe the environment where this examination is taking place?"
            },
            {
                "from": "gpt",
                "value": "The environment appears to be a clean and well-lit optometrist's office. It's equipped with professional medical tools for eye examinations, indicative of a setting where health and precision are priorities."
            },
            {
                "from": "human",
                "value": "What can you say about the optometrist's demeanor?"
            },
            {
                "from": "gpt",
                "value": "The optometrist's demeanor is professional and attentive. He appears to be carefully operating the phoropter and focusing on the patient's responses to ensure an accurate assessment of her vision."
            },
            {
                "from": "human",
                "value": "What objects are in the foreground of this image?"
            },
            {
                "from": "gpt",
                "value": "In the foreground of the image, there is an eye examination machine known as a phoropter, which is used for eye testing. There's also a support device that the patient is using to stabilize her head during the eye test."
            }
        ]
    },
    {
        "id": "123456789",
        "image": "test_img/oph.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nCan you describe the people in the image?"
            },
            {
                "from": "gpt",
                "value": "The image shows two individuals. One is a man in a white lab coat who appears to be an optometrist. He is focused on the equipment, likely adjusting it for the examination. The other is a woman with her chin resting on the machine's support, presumably the patient, undergoing an eye examination."
            },
            {
                "from": "human",
                "value": "What is the equipment being used for?"
            },
            {
                "from": "gpt",
                "value": "The equipment being used is for conducting an eye exam to measure the patient's vision and to check for refractive errors. The device can help the optometrist determine the correct prescription for eyeglasses or contact lenses."
            },
            {
                "from": "human",
                "value": "Describe the environment in which the people are."
            },
            {
                "from": "gpt",
                "value": "The environment looks like a professional optometrist's office, with a clean and clinical setting. In the background, there's an eye chart with letters of decreasing size, which is commonly used for testing visual acuity."
            },
            {
                "from": "human",
                "value": "What are the colors and the general mood in the image?"
            },
            {
                "from": "gpt",
                "value": "The image features a neutral color palette, with the white of the optometrist's coat standing out. The equipment is mostly black and silver, and the office walls are a light color. The mood seems calm and focused, indicative of a professional medical examination."
            },
            {
                "from": "human",
                "value": "What do the expressions and body language of the individuals suggest?"
            },
            {
                "from": "gpt",
                "value": "The optometrist's expression and body language suggest concentration and professionalism, while the patient seems to be calmly cooperating with the examination process. There's an air of attentiveness and care in the interaction."
            },
            {
                "from": "human",
                "value": "Can you infer anything about the purpose of the visit from the image?"
            },
            {
                "from": "gpt",
                "value": "The purpose of the visit is likely for a routine eye examination, which may include updating the patient’s eyeglass prescription, checking for vision problems, or assessing the overall health of the eyes."
            }
        ]
    }
]

Creating the Config File

# List XTuner's built-in config files matching this pattern
xtuner list-cfg -p llava_internlm2_chat_1_8b

# Copy the config file into the tutorial directory
xtuner copy-cfg \
  llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune \
  /root/tutorial/xtuner/llava

After copying, the directory /root/tutorial/xtuner/llava/ contains llava_data/ (unique_data.json, repeated_data.json, repeat.py and test_img/oph.jpg) plus the copied config file.

Edit the config file llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py, modifying the following items:

  • pretrained_pth
  • llm_name_or_path
  • visual_encoder_name_or_path
  • data_root
  • data_path
  • image_folder

(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# cat llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, CLIPImageProcessor,
                          CLIPVisionModel)

from xtuner.dataset import LLaVADataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import llava_map_fn, template_map_fn_factory
from xtuner.dataset.samplers import LengthGroupedSampler
from xtuner.engine.hooks import DatasetInfoHook, EvaluateChatHook
from xtuner.engine.runner import TrainLoop
from xtuner.model import LLaVAModel
from xtuner.utils import PROMPT_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
#llm_name_or_path = 'internlm/internlm2-chat-1_8b'
llm_name_or_path = '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b'


#visual_encoder_name_or_path = 'openai/clip-vit-large-patch14-336'
visual_encoder_name_or_path = '/root/share/new_models/openai/clip-vit-large-patch14-336'

# Specify the pretrained pth
#pretrained_pth = './work_dirs/llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain/iter_2181.pth'  # noqa: E501
pretrained_pth = '/root/share/new_models/xtuner/iter_2181.pth'

# Data
#data_root = './data/llava_data/'
data_root = '/root/tutorial/xtuner/llava/llava_data/'

#data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
data_path = data_root + 'repeated_data.json'



#image_folder = data_root + 'llava_images'
image_folder = data_root

prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = int(2048 - (336 / 14)**2)
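# (336 / 14)**2 = 576 is the number of image patch tokens produced by
# CLIP ViT-L/14 at 336px, so 2048 - 576 = 1472 tokens remain for the text.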

# Scheduler & Optimizer
#batch_size = 16  # per_device
batch_size = 1  # per_device

accumulative_counts = 1
dataloader_num_workers = 0
max_epochs = 1
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 500
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 500
SYSTEM = ''
evaluation_images = 'https://llava-vl.github.io/static/images/view.jpg'
#evaluation_inputs = ['请描述一下这张照片', 'Please describe this picture']
evaluation_inputs = ['Please describe this picture','What is the equipment in the image?']
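# EvaluateChatHook (see PART 5) feeds these prompts, together with
# `evaluation_images`, to the model every `evaluation_freq` iterations,
# so the training log shows how the answers evolve during training.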



#######################################################################
#            PART 2  Model & Tokenizer & Image Processor              #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=llm_name_or_path,
    trust_remote_code=True,
    padding_side='right')

image_processor = dict(
    type=CLIPImageProcessor.from_pretrained,
    pretrained_model_name_or_path=visual_encoder_name_or_path,
    trust_remote_code=True)

model = dict(
    type=LLaVAModel,
    freeze_llm=True,
    freeze_visual_encoder=True,
    pretrained_pth=pretrained_pth,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=llm_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    llm_lora=dict(
        type=LoraConfig,
        r=512,
        lora_alpha=256,
        lora_dropout=0.05,
        bias='none',
        task_type='CAUSAL_LM'),
    visual_encoder=dict(
        type=CLIPVisionModel.from_pretrained,
        pretrained_model_name_or_path=visual_encoder_name_or_path),
    visual_encoder_lora=dict(
        type=LoraConfig, r=64, lora_alpha=16, lora_dropout=0.05, bias='none'))
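
# Note: the 4-bit (NF4) quantized LLM and the visual encoder are both frozen;
# training updates only the Image Projector plus the LoRA adapters
# (r=512 on the LLM, r=64 on the visual encoder), i.e. a QLoRA-style setup.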

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
llava_dataset = dict(
    type=LLaVADataset,
    data_path=data_path,
    image_folder=image_folder,
    tokenizer=tokenizer,
    image_processor=image_processor,
    dataset_map_fn=llava_map_fn,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    max_length=max_length,
    pad_image_to_square=True)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=llava_dataset,
    sampler=dict(
        type=LengthGroupedSampler,
        length_property='modality_length',
        per_device_batch_size=batch_size * accumulative_counts),
    collate_fn=dict(type=default_collate_fn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        image_processor=image_processor,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        evaluation_images=evaluation_images,
        system=SYSTEM,
        prompt_template=prompt_template)
]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed evrionment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)

Starting the Finetune
cd /root/tutorial/xtuner/llava/
xtuner train /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py --deepspeed deepspeed_zero2

The training log:

(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# xtuner train /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py --deepspeed deepspeed_zero2
[2024-04-16 10:26:11,946] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-16 10:26:23,510] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
04/16 10:26:32 - mmengine - INFO -
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 2048456403
    GPU 0: NVIDIA A100-SXM4-80GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.7, V11.7.99
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
    PyTorch: 2.0.1
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.15.2
    OpenCV: 4.9.0
    MMEngine: 0.10.3

Runtime environment:
    launcher: none
    randomness: {'seed': None, 'deterministic': False}
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

04/16 10:26:32 - mmengine - INFO - Config:
SYSTEM = ''
accumulative_counts = 1
batch_size = 1
betas = (
    0.9,
    0.999,
)
custom_hooks = [
    dict(
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path=
            '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.hooks.DatasetInfoHook'),
    dict(
        evaluation_images='https://llava-vl.github.io/static/images/view.jpg',
        evaluation_inputs=[
            'Please describe this picture',
            'What is the equipment in the image?',
        ],
        every_n_iters=500,
        image_processor=dict(
            pretrained_model_name_or_path=
            '/root/share/new_models/openai/clip-vit-large-patch14-336',
            trust_remote_code=True,
            type='transformers.CLIPImageProcessor.from_pretrained'),
        prompt_template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
        system='',
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path=
            '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.hooks.EvaluateChatHook'),
]
data_path = '/root/tutorial/xtuner/llava/llava_data/repeated_data.json'
data_root = '/root/tutorial/xtuner/llava/llava_data/'
dataloader_num_workers = 0
default_hooks = dict(
    checkpoint=dict(
        by_epoch=False,
        interval=500,
        max_keep_ckpts=2,
        type='mmengine.hooks.CheckpointHook'),
    logger=dict(
        interval=10,
        log_metric_by_epoch=False,
        type='mmengine.hooks.LoggerHook'),
    param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
    sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
    timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_images = 'https://llava-vl.github.io/static/images/view.jpg'
evaluation_inputs = [
    'Please describe this picture',
    'What is the equipment in the image?',
]
image_folder = '/root/tutorial/xtuner/llava/llava_data/'
image_processor = dict(
    pretrained_model_name_or_path=
    '/root/share/new_models/openai/clip-vit-large-patch14-336',
    trust_remote_code=True,
    type='transformers.CLIPImageProcessor.from_pretrained')
launcher = 'none'
llava_dataset = dict(
    data_path='/root/tutorial/xtuner/llava/llava_data/repeated_data.json',
    dataset_map_fn='xtuner.dataset.map_fns.llava_map_fn',
    image_folder='/root/tutorial/xtuner/llava/llava_data/',
    image_processor=dict(
        pretrained_model_name_or_path=
        '/root/share/new_models/openai/clip-vit-large-patch14-336',
        trust_remote_code=True,
        type='transformers.CLIPImageProcessor.from_pretrained'),
    max_length=1472,
    pad_image_to_square=True,
    template_map_fn=dict(
        template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
        type='xtuner.dataset.map_fns.template_map_fn_factory'),
    tokenizer=dict(
        padding_side='right',
        pretrained_model_name_or_path=
        '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        trust_remote_code=True,
        type='transformers.AutoTokenizer.from_pretrained'),
    type='xtuner.dataset.LLaVADataset')
llm_name_or_path = '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
lr = 0.0002
max_epochs = 1
max_length = 1472
max_norm = 1
model = dict(
    freeze_llm=True,
    freeze_visual_encoder=True,
    llm=dict(
        pretrained_model_name_or_path=
        '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        quantization_config=dict(
            bnb_4bit_compute_dtype='torch.float16',
            bnb_4bit_quant_type='nf4',
            bnb_4bit_use_double_quant=True,
            llm_int8_has_fp16_weight=False,
            llm_int8_threshold=6.0,
            load_in_4bit=True,
            load_in_8bit=False,
            type='transformers.BitsAndBytesConfig'),
        torch_dtype='torch.float16',
        trust_remote_code=True,
        type='transformers.AutoModelForCausalLM.from_pretrained'),
    llm_lora=dict(
        bias='none',
        lora_alpha=256,
        lora_dropout=0.05,
        r=512,
        task_type='CAUSAL_LM',
        type='peft.LoraConfig'),
    pretrained_pth='/root/share/new_models/xtuner/iter_2181.pth',
    type='xtuner.model.LLaVAModel',
    visual_encoder=dict(
        pretrained_model_name_or_path=
        '/root/share/new_models/openai/clip-vit-large-patch14-336',
        type='transformers.CLIPVisionModel.from_pretrained'),
    visual_encoder_lora=dict(
        bias='none',
        lora_alpha=16,
        lora_dropout=0.05,
        r=64,
        type='peft.LoraConfig'))
optim_type = 'torch.optim.AdamW'
optim_wrapper = dict(
    optimizer=dict(
        betas=(
            0.9,
            0.999,
        ),
        lr=0.0002,
        type='torch.optim.AdamW',
        weight_decay=0),
    type='DeepSpeedOptimWrapper')
param_scheduler = [
    dict(
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=0.03,
        start_factor=1e-05,
        type='mmengine.optim.LinearLR'),
    dict(
        begin=0.03,
        by_epoch=True,
        convert_to_iter_based=True,
        end=1,
        eta_min=0.0,
        type='mmengine.optim.CosineAnnealingLR'),
]
pretrained_pth = '/root/share/new_models/xtuner/iter_2181.pth'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm2_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
runner_type = 'FlexibleRunner'
save_steps = 500
save_total_limit = 2
strategy = dict(
    config=dict(
        bf16=dict(enabled=True),
        fp16=dict(enabled=False, initial_scale_power=16),
        gradient_accumulation_steps='auto',
        gradient_clipping='auto',
        train_micro_batch_size_per_gpu='auto',
        zero_allow_untested_optimizer=True,
        zero_force_ds_cpu_optimizer=False,
        zero_optimization=dict(overlap_comm=True, stage=2)),
    exclude_frozen_parameters=True,
    gradient_accumulation_steps=1,
    gradient_clipping=1,
    sequence_parallel_size=1,
    train_micro_batch_size_per_gpu=1,
    type='xtuner.engine.DeepSpeedStrategy')
tokenizer = dict(
    padding_side='right',
    pretrained_model_name_or_path=
    '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
    trust_remote_code=True,
    type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(max_epochs=1, type='xtuner.engine.runner.TrainLoop')
train_dataloader = dict(
    batch_size=1,
    collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
    dataset=dict(
        data_path='/root/tutorial/xtuner/llava/llava_data/repeated_data.json',
        dataset_map_fn='xtuner.dataset.map_fns.llava_map_fn',
        image_folder='/root/tutorial/xtuner/llava/llava_data/',
        image_processor=dict(
            pretrained_model_name_or_path=
            '/root/share/new_models/openai/clip-vit-large-patch14-336',
            trust_remote_code=True,
            type='transformers.CLIPImageProcessor.from_pretrained'),
        max_length=1472,
        pad_image_to_square=True,
        template_map_fn=dict(
            template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
            type='xtuner.dataset.map_fns.template_map_fn_factory'),
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path=
            '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.dataset.LLaVADataset'),
    num_workers=0,
    sampler=dict(
        length_property='modality_length',
        per_device_batch_size=1,
        type='xtuner.dataset.samplers.LengthGroupedSampler'))
visual_encoder_name_or_path = '/root/share/new_models/openai/clip-vit-large-patch14-336'
visualizer = None
warmup_ratio = 0.03
weight_decay = 0
work_dir = './work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy'

04/16 10:26:32 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
04/16 10:26:40 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
 --------------------
before_train:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DatasetInfoHook
(LOW         ) EvaluateChatHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DistSamplerSeedHook
 --------------------
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
 --------------------
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(LOW         ) EvaluateChatHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_train_epoch:
(NORMAL      ) IterTimerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_val:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) DatasetInfoHook
 --------------------
before_val_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_val_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_val_iter:
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_val:
(VERY_HIGH   ) RuntimeInfoHook
(LOW         ) EvaluateChatHook
 --------------------
after_train:
(VERY_HIGH   ) RuntimeInfoHook
(LOW         ) EvaluateChatHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_test:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) DatasetInfoHook
 --------------------
before_test_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_test_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_test_iter:
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_run:
(BELOW_NORMAL) LoggerHook
 --------------------
 --------------------
Map (num_proc=32): 100%|████████████████████████████████████████████████████████| 1200/1200 [00:01<00:00, 972.12 examples/s]
Map (num_proc=32): 100%|███████████████████████████████████████████████████████| 1200/1200 [00:00<00:00, 1436.19 examples/s]
Filter (num_proc=32): 100%|████████████████████████████████████████████████████| 1200/1200 [00:00<00:00, 2407.56 examples/s]
Map (num_proc=32): 100%|████████████████████████████████████████████████████████| 1200/1200 [00:07<00:00, 154.36 examples/s]
Filter (num_proc=32): 100%|████████████████████████████████████████████████████| 1200/1200 [00:00<00:00, 1471.60 examples/s]
Map (num_proc=32): 100%|████████████████████████████████████████████████████████| 1200/1200 [00:06<00:00, 193.98 examples/s]
04/16 10:27:03 - mmengine - WARNING - Dataset LLaVADataset has no metainfo. ``dataset_meta`` in visualizer will be None.
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:50<00:00, 25.19s/it]
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
04/16 10:28:09 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - dispatch internlm2 attn forward
04/16 10:28:09 - mmengine - INFO - replace internlm2 rope
04/16 10:28:09 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:10 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:11 - mmengine - INFO - replace internlm2 rope
04/16 10:28:12 - mmengine - INFO - replace internlm2 rope
04/16 10:28:12 - mmengine - INFO - replace internlm2 rope
04/16 10:28:12 - mmengine - INFO - replace internlm2 rope
04/16 10:28:12 - mmengine - INFO - replace internlm2 rope
04/16 10:28:12 - mmengine - INFO - replace internlm2 rope
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
Load pretrained weight from /root/share/new_models/xtuner/iter_2181.pth
[2024-04-16 10:28:53,489] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown
[2024-04-16 10:28:53,489] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-04-16 10:28:53,489] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-04-16 10:28:53,765] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.235.222, master_port=29500
[2024-04-16 10:28:53,765] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-04-16 10:28:58,718] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-04-16 10:28:58,726] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-04-16 10:28:58,726] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-04-16 10:28:58,849] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-04-16 10:28:58,849] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2024-04-16 10:28:58,849] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-04-16 10:28:58,850] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500,000,000
[2024-04-16 10:28:58,850] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500,000,000
[2024-04-16 10:28:58,850] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-04-16 10:28:58,850] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
[2024-04-16 10:29:03,457] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-04-16 10:29:03,457] [INFO] [utils.py:801:see_memory_usage] MA 5.27 GB         Max_MA 6.37 GB         CA 7.1 GB         Max_CA 7 GB
[2024-04-16 10:29:03,458] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 84.13 GB, percent = 4.2%
[2024-04-16 10:29:03,614] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-04-16 10:29:03,614] [INFO] [utils.py:801:see_memory_usage] MA 5.27 GB         Max_MA 7.46 GB         CA 9.28 GB         Max_CA 9 GB
[2024-04-16 10:29:03,615] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 84.15 GB, percent = 4.2%
[2024-04-16 10:29:03,615] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized
[2024-04-16 10:29:03,748] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-04-16 10:29:03,749] [INFO] [utils.py:801:see_memory_usage] MA 5.27 GB         Max_MA 5.27 GB         CA 9.28 GB         Max_CA 9 GB
[2024-04-16 10:29:03,749] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 84.16 GB, percent = 4.2%
[2024-04-16 10:29:03,758] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2024-04-16 10:29:03,758] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-04-16 10:29:03,758] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-04-16 10:29:03,758] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002], mom=[(0.9, 0.999)]
[2024-04-16 10:29:03,763] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   amp_enabled .................. False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   amp_params ................... False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   bfloat16_enabled ............. True
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   bfloat16_immediate_grad_update  False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   checkpoint_parallel_write_pipeline  False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   checkpoint_tag_validation_enabled  True
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   checkpoint_tag_validation_fail  False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fc986e1ca30>
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   communication_data_type ...... None
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   curriculum_enabled_legacy .... False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   curriculum_params_legacy ..... False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   data_efficiency_enabled ...... False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   dataloader_drop_last ......... False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   disable_allgather ............ False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   dump_state ................... False
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   dynamic_loss_scale_args ...... None
[2024-04-16 10:29:03,763] [INFO] [config.py:1000:print]   eigenvalue_enabled ........... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_gas_boundary_resolution  1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_layer_num ......... 0
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_max_iter .......... 100
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_stability ......... 1e-06
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_tol ............... 0.01
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   eigenvalue_verbose ........... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   elasticity_enabled ........... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   flops_profiler_config ........ {
    "enabled": false,
    "recompute_fwd_factor": 0.0,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   fp16_auto_cast ............... None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   fp16_enabled ................. False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   fp16_master_weights_and_gradients  False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   global_rank .................. 0
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   grad_accum_dtype ............. None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   gradient_accumulation_steps .. 1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   gradient_clipping ............ 1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   gradient_predivide_factor .... 1.0
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   graph_harvesting ............. False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   initial_dynamic_scale ........ 1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   load_universal_checkpoint .... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   loss_scale ................... 1.0
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   memory_breakdown ............. False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   mics_hierarchial_params_gather  False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   mics_shard_size .............. -1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   optimizer_legacy_fusion ...... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   optimizer_name ............... None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   optimizer_params ............. None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   pld_enabled .................. False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   pld_params ................... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   prescale_gradients ........... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   scheduler_name ............... None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   scheduler_params ............. None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   seq_parallel_communication_data_type  torch.float32
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   sparse_attention ............. None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   sparse_gradients_enabled ..... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   steps_per_print .............. 10000000000000
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   train_batch_size ............. 1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   train_micro_batch_size_per_gpu  1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   use_data_before_expert_parallel_  False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   use_node_local_storage ....... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   wall_clock_breakdown ......... False
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   weight_quantization_config ... None
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   world_size ................... 1
[2024-04-16 10:29:03,764] [INFO] [config.py:1000:print]   zero_allow_untested_optimizer  True
[2024-04-16 10:29:03,765] [INFO] [config.py:1000:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-04-16 10:29:03,765] [INFO] [config.py:1000:print]   zero_enabled ................. True
[2024-04-16 10:29:03,765] [INFO] [config.py:1000:print]   zero_force_ds_cpu_optimizer .. False
[2024-04-16 10:29:03,765] [INFO] [config.py:1000:print]   zero_optimization_stage ...... 2
[2024-04-16 10:29:03,765] [INFO] [config.py:986:print_user_config]   json = {
    "gradient_accumulation_steps": 1,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_clipping": 1,
    "zero_allow_untested_optimizer": true,
    "zero_force_ds_cpu_optimizer": false,
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": true
    },
    "fp16": {
        "enabled": false,
        "initial_scale_power": 16
    },
    "bf16": {
        "enabled": true
    },
    "steps_per_print": 1.000000e+13
}
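
A quick sanity check on the user config above: the effective batch size is the product of the per-GPU micro batch, the gradient-accumulation steps, and the world size. A minimal Python check (not XTuner code; values read off the printed config):

# Values taken from the DeepSpeed config printed above
train_micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 1
world_size = 1  # a single 30% A100
train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
assert train_batch_size == 1  # matches "train_batch_size ............. 1" above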
04/16 10:29:03 - mmengine - INFO - Num train samples 1200
04/16 10:29:03 - mmengine - INFO - train example:
04/16 10:29:03 - mmengine - INFO - <s><|im_start|>user
<image>
Describe this image.<|im_end|>
<|im_start|>assistant
This is an image of an eye examination in progress. A man in a white lab coat, likely an optometrist, is attending to a female patient who is undergoing an eye test. She is using a phoropter, a device used to measure an individual's prescription for glasses or contact lenses. In the background, there's an eye chart with letters decreasing in size. The environment looks like a professional medical office, and both individuals seem focused on the task.<|im_end|>
<|im_start|>user
Describe the objects and people in this image.<|im_end|>
<|im_start|>assistant
In the image, there's an eye examination machine, likely a phoropter, operated by a male optometrist. A female patient is seated, engaging with the machine. The optometrist is wearing a white coat, indicating a medical professional. The environment looks clean and professional, typical of a clinical setting.<|im_end|>
<|im_start|>user
What equipment is being used, and for what purpose?<|im_end|>
<|im_start|>assistant
The equipment in use is a phoropter, a common optometric device. It's used to measure the patient's refractive error and to determine an appropriate prescription for glasses or contact lenses.<|im_end|>
<|im_start|>user
Describe the colors and lighting in the room.<|im_end|>
<|im_start|>assistant
The room is brightly lit, which is essential for an accurate eye examination. The colors are muted, with white walls and a white coat worn by the optometrist, which conveys a sense of cleanliness and simplicity typical of medical environments.<|im_end|>
<|im_start|>user
Can you comment on the patient's attitude?<|im_end|>
<|im_start|>assistant
The patient seems to be cooperating with the examination process, looking attentively into the eye examination machine. Her posture and focus suggest she is taking the procedure seriously.<|im_end|>

04/16 10:29:03 - mmengine - INFO - before_train in EvaluateChatHook.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
04/16 10:29:17 - mmengine - INFO - Sample output:
<|im_start|>user
<image>
Please describe this picture<|im_end|>
<|im_start|>assistant
a wooden dock in the middle of a lake with a boat on it<|im_end|>

04/16 10:29:19 - mmengine - INFO - Sample output:
<|im_start|>user
<image>
What is the equipment in the image?<|im_end|>
<|im_start|>assistant
a wooden dock in the lake with a boat on it<|im_end|>

You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
04/16 10:29:19 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
04/16 10:29:19 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
04/16 10:29:19 - mmengine - INFO - Checkpoints will be saved to /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.

04/16 10:29:44 - mmengine - INFO - Iter(train) [  10/1200]  lr: 5.1430e-05  eta: 0:48:55  time: 2.4668  data_time: 0.0197  memory: 16620  loss: 1.2818
04/16 10:30:06 - mmengine - INFO - Iter(train) [  20/1200]  lr: 1.0857e-04  eta: 0:46:22  time: 2.2496  data_time: 0.0201  memory: 16620  loss: 0.7928
04/16 10:30:25 - mmengine - INFO - Iter(train) [  30/1200]  lr: 1.6571e-04  eta: 0:42:41  time: 1.8507  data_time: 0.0243  memory: 16620  loss: 0.5194
04/16 10:30:42 - mmengine - INFO - Iter(train) [  40/1200]  lr: 2.0000e-04  eta: 0:39:58  time: 1.7036  data_time: 0.0226  memory: 16620  loss: 0.4638
04/16 10:30:58 - mmengine - INFO - Iter(train) [  50/1200]  lr: 1.9994e-04  eta: 0:37:54  time: 1.6181  data_time: 0.0216  memory: 16619  loss: 0.5079
04/16 10:31:14 - mmengine - INFO - Iter(train) [  60/1200]  lr: 1.9981e-04  eta: 0:36:29  time: 1.6374  data_time: 0.0208  memory: 16620  loss: 0.1340
04/16 10:31:30 - mmengine - INFO - Iter(train) [  70/1200]  lr: 1.9960e-04  eta: 0:35:17  time: 1.5889  data_time: 0.0240  memory: 16620  loss: 0.2321
04/16 10:31:46 - mmengine - INFO - Iter(train) [  80/1200]  lr: 1.9933e-04  eta: 0:34:11  time: 1.5396  data_time: 0.0216  memory: 16620  loss: 0.0642
04/16 10:32:00 - mmengine - INFO - Iter(train) [  90/1200]  lr: 1.9898e-04  eta: 0:33:11  time: 1.4900  data_time: 0.0231  memory: 16620  loss: 0.1266
04/16 10:32:15 - mmengine - INFO - Iter(train) [ 100/1200]  lr: 1.9856e-04  eta: 0:32:13  time: 1.4293  data_time: 0.0198  memory: 16620  loss: 0.0730
04/16 10:32:30 - mmengine - INFO - Iter(train) [ 110/1200]  lr: 1.9807e-04  eta: 0:31:30  time: 1.4995  data_time: 0.0254  memory: 16620  loss: 0.0651
04/16 10:32:45 - mmengine - INFO - Iter(train) [ 120/1200]  lr: 1.9750e-04  eta: 0:30:50  time: 1.4880  data_time: 0.0231  memory: 16620  loss: 0.2211
04/16 10:32:59 - mmengine - INFO - Iter(train) [ 130/1200]  lr: 1.9687e-04  eta: 0:30:11  time: 1.4467  data_time: 0.0220  memory: 16620  loss: 0.0437
04/16 10:33:13 - mmengine - INFO - Iter(train) [ 140/1200]  lr: 1.9616e-04  eta: 0:29:33  time: 1.4143  data_time: 0.0219  memory: 16620  loss: 0.0761
04/16 10:33:27 - mmengine - INFO - Iter(train) [ 150/1200]  lr: 1.9539e-04  eta: 0:28:56  time: 1.3872  data_time: 0.0221  memory: 16620  loss: 0.0642
04/16 10:33:42 - mmengine - INFO - Iter(train) [ 160/1200]  lr: 1.9454e-04  eta: 0:28:29  time: 1.4894  data_time: 0.0234  memory: 16620  loss: 0.1415
04/16 10:33:56 - mmengine - INFO - Iter(train) [ 170/1200]  lr: 1.9363e-04  eta: 0:27:59  time: 1.4240  data_time: 0.0212  memory: 16620  loss: 0.0980
04/16 10:34:11 - mmengine - INFO - Iter(train) [ 180/1200]  lr: 1.9264e-04  eta: 0:27:33  time: 1.4620  data_time: 0.0219  memory: 16620  loss: 0.9051
04/16 10:34:25 - mmengine - INFO - Iter(train) [ 190/1200]  lr: 1.9159e-04  eta: 0:27:07  time: 1.4320  data_time: 0.0210  memory: 16620  loss: 0.1338
04/16 10:34:39 - mmengine - INFO - Iter(train) [ 200/1200]  lr: 1.9048e-04  eta: 0:26:40  time: 1.3987  data_time: 0.0232  memory: 16620  loss: 0.0191
04/16 10:34:54 - mmengine - INFO - Iter(train) [ 210/1200]  lr: 1.8930e-04  eta: 0:26:20  time: 1.5031  data_time: 0.0241  memory: 16620  loss: 0.2276
04/16 10:35:08 - mmengine - INFO - Iter(train) [ 220/1200]  lr: 1.8805e-04  eta: 0:25:56  time: 1.4197  data_time: 0.0195  memory: 16620  loss: 0.0439
04/16 10:35:23 - mmengine - INFO - Iter(train) [ 230/1200]  lr: 1.8674e-04  eta: 0:25:33  time: 1.4285  data_time: 0.0232  memory: 16620  loss: 0.0267
...
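The lr column in the log follows a linear warmup and then a cosine decay to zero. A minimal sketch that reproduces the logged values closely, assuming XTuner's default warmup_ratio of 0.03 (36 of 1200 iterations; the actual config may differ slightly):

import math

MAX_ITERS = 1200
BASE_LR = 2e-4                        # the peak lr seen in the log
WARMUP_ITERS = int(0.03 * MAX_ITERS)  # 36, assuming warmup_ratio = 0.03

def lr_at(it: int) -> float:
    """Linear warmup, then cosine annealing to zero."""
    if it < WARMUP_ITERS:
        return BASE_LR * (it - 1) / (WARMUP_ITERS - 1)
    progress = (it - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)
    return 0.5 * BASE_LR * (1 + math.cos(math.pi * progress))

for it in (10, 40, 230, 510):
    print(it, f"{lr_at(it):.4e}")  # ~5.14e-05, ~2.00e-04, ~1.87e-04, ~1.29e-04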

04/16 10:41:45 - mmengine - INFO - after_train_iter in EvaluateChatHook.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
04/16 10:41:50 - mmengine - INFO - Sample output:
<|im_start|>user
<image>
Please describe this picture<|im_end|>
<|im_start|>assistant
This is a photograph of a patient undergoing an eye examination. A healthcare professional, possibly an optometrist, is using a slit lamp to examine the patient's eyes. The patient is seated, leaning into the machine which has a chin rest and a forehead support. In the background, there's an eye chart.<|im_end|>

04/16 10:41:54 - mmengine - INFO - Sample output:
<|im_start|>user
<image>
What is the equipment in the image?<|im_end|>
<|im_start|>assistant
The equipment in the image is a phoropter, a specialized piece of eye examination equipment that measures the patient's eyes and allows the optometrist to determine the correct prescription for glasses or contact lenses.<|im_end|>

You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
04/16 10:41:54 - mmengine - INFO - Saving checkpoint at 500 iterations
[2024-04-16 10:41:55,119] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint iter_500.pth is about to be saved!
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
[2024-04-16 10:41:55,302] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_500.pth/mp_rank_00_model_states.pt
[2024-04-16 10:41:55,302] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_500.pth/mp_rank_00_model_states.pt...
[2024-04-16 10:41:56,488] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_500.pth/mp_rank_00_model_states.pt.
[2024-04-16 10:41:56,489] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_500.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-04-16 10:42:01,836] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_500.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-04-16 10:42:01,854] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_500.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-04-16 10:42:01,854] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint iter_500.pth is ready now!
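
As the log shows, a DeepSpeed ZeRO-2 checkpoint such as iter_500.pth is a directory, not a single file. Roughly (file names as saved above):

iter_500.pth/
├── mp_rank_00_model_states.pt                      # model weights (rank 0)
└── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt  # bf16 ZeRO optimizer shard

This is why `xtuner convert pth_to_hf` later reports "Processing zero checkpoint" and reconstructs an fp32 state dict before exporting.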
04/16 10:42:17 - mmengine - INFO - Iter(train) [ 510/1200]  lr: 1.2900e-04  eta: 0:17:33  time: 3.2612  data_time: 1.6803  memory: 16620  loss: 0.0486


..._internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-04-16 10:59:31,597] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint iter_1200.pth is ready now!
04/16 10:59:31 - mmengine - INFO - after_train in EvaluateChatHook.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
04/16 10:59:37 - mmengine - INFO - Sample output:
<|im_start|>user
<image>
Please describe this picture<|im_end|>
<|im_start|>assistant
This is a photograph of a patient undergoing an eye examination. A healthcare professional, possibly an optometrist, is using a slit lamp to examine the patient's eyes. The patient is seated, leaning into the machine which has a chin rest and a forehead support. In the background, there's an eye chart.<|im_end|>

04/16 10:59:40 - mmengine - INFO - Sample output:
<|im_start|>user
<image>
What is the equipment in the image?<|im_end|>
<|im_start|>assistant
The equipment in the image is a phoropter, a specialized piece of eye examination equipment that measures refractive errors to determine the correct prescription for eyeglasses or contact lenses.<|im_end|>

You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava#




Comparing performance before and after Finetune

The clip-vit-large-patch14-336 model (the CLIP visual encoder used in this tutorial):


(base) root@intern-studio-061925:~# ls -ltr -h /root/share/new_models/openai/clip-vit-large-patch14-336
total 3.2G
-rw-r--r-- 1 root root 4.7K Apr 10 11:15 config.json
-rw-r--r-- 1 root root 1.1K Apr 10 11:15 README.md
-rw-r--r-- 1 root root  316 Apr 10 11:15 preprocessor_config.json
-rw-r--r-- 1 root root  844 Apr 10 11:15 tokenizer_config.json
-rw-r--r-- 1 root root  389 Apr 10 11:15 special_tokens_map.json
-rw-r--r-- 1 root root 2.2M Apr 10 11:15 tokenizer.json
-rw-r--r-- 1 root root 513K Apr 10 11:15 merges.txt
-rw-r--r-- 1 root root 843K Apr 10 11:15 vocab.json
-rw-r--r-- 1 root root 1.6G Apr 10 11:16 pytorch_model.bin
-rw-r--r-- 1 root root 1.6G Apr 10 11:17 tf_model.h5
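
As a quick sanity check, the visual encoder can be loaded locally with the standard transformers classes. A minimal sketch (the test image path follows the one used by `xtuner chat` below):

from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

path = "/root/share/new_models/openai/clip-vit-large-patch14-336"
model = CLIPVisionModel.from_pretrained(path)
processor = CLIPImageProcessor.from_pretrained(path)

image = Image.open("/root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
# ViT-L/14 at 336px: 24 x 24 = 576 patch tokens + 1 CLS token, hidden size 1024
print(outputs.last_hidden_state.shape)  # torch.Size([1, 577, 1024])

These patch features are exactly what the Image Projector maps into the LLM's embedding space.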

Before Finetune

That is, load the 1.8B LLM together with the Pretrain-stage artifact (iter_2181) into GPU memory.

# Work around a minor MKL bug
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU

# Convert the .pth checkpoint to HuggingFace format
xtuner convert pth_to_hf \
  llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain \
  /root/share/new_models/xtuner/iter_2181.pth \
  /root/tutorial/xtuner/llava/llava_data/iter_2181_hf


The run log:

(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# export MKL_SERVICE_FORCE_INTEL=1
(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# export MKL_THREADING_LAYER=GNU
(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# xtuner convert pth_to_hf \
>   llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain \
>   /root/share/new_models/xtuner/iter_2181.pth \
>   /root/tutorial/xtuner/llava/llava_data/iter_2181_hf



[2024-04-16 11:08:53,579] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-16 11:09:14,683] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
config.json: 850B [00:00, 3.99MB/s]
configuration_internlm2.py: 7.02kB [00:00, 34.2MB/s]
A new version of the following files was downloaded from https://huggingface.co/internlm/internlm2-chat-1_8b:
- configuration_internlm2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
modeling_internlm2.py: 60.0kB [00:00, 943kB/s]
A new version of the following files was downloaded from https://huggingface.co/internlm/internlm2-chat-1_8b:
- modeling_internlm2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
model.safetensors.index.json: 13.7kB [00:00, 97.1MB/s]
model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████| 1.98G/1.98G [02:08<00:00, 15.5MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████| 1.80G/1.80G [02:06<00:00, 14.2MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 2/2 [04:16<00:00, 128.48s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:31<00:00, 15.61s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████| 132/132 [00:00<00:00, 1.19MB/s]
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
04/16 11:14:34 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:14:34 - mmengine - INFO - replace internlm2 rope
04/16 11:14:34 - mmengine - INFO - replace internlm2 rope
04/16 11:14:34 - mmengine - INFO - replace internlm2 rope
04/16 11:14:34 - mmengine - INFO - replace internlm2 rope
04/16 11:14:34 - mmengine - INFO - replace internlm2 rope
04/16 11:14:35 - mmengine - INFO - replace internlm2 rope
04/16 11:14:35 - mmengine - INFO - replace internlm2 rope
04/16 11:14:35 - mmengine - INFO - replace internlm2 rope
04/16 11:14:35 - mmengine - INFO - replace internlm2 rope
04/16 11:14:35 - mmengine - INFO - replace internlm2 rope
04/16 11:14:35 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:36 - mmengine - INFO - replace internlm2 rope
04/16 11:14:37 - mmengine - INFO - replace internlm2 rope
04/16 11:14:37 - mmengine - INFO - replace internlm2 rope
04/16 11:14:37 - mmengine - INFO - replace internlm2 rope
04/16 11:14:37 - mmengine - INFO - replace internlm2 rope
04/16 11:14:37 - mmengine - INFO - replace internlm2 rope
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
Load PTH model from /root/share/new_models/xtuner/iter_2181.pth
Saving projector to /root/tutorial/xtuner/llava/llava_data/iter_2181_hf/projector
All done!
(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava#
(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava#
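
Per the log, the Pretrain-stage conversion exports only the Image Projector; the output folder looks roughly like this (a sketch based on the "Saving projector" line above):

iter_2181_hf/
└── projector/   # the Image Projector in HuggingFace format; the LLM and CLIP weights stay where they are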

# Launch!
xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
  --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
  --llava /root/tutorial/xtuner/llava/llava_data/iter_2181_hf \
  --prompt-template internlm2_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

Q1: Describe this image.
Q2: What is the equipment in the image?


(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
>   --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
>   --llava /root/tutorial/xtuner/llava/llava_data/iter_2181_hf \
>   --prompt-template internlm2_chat \
>   --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
[2024-04-16 11:23:58,124] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-16 11:24:21,334] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:19<00:00,  9.94s/it]
Load LLM from /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Load visual_encoder from /root/share/new_models/openai/clip-vit-large-patch14-336
Load projector from /root/tutorial/xtuner/llava/llava_data/iter_2181_hf

double enter to end input (EXIT: exit chat, RESET: reset history) >>> Describe this image.

a doctor and a woman looking at a computer screen with a computer screen behind them<|im_end|>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> What is the equipment in the image?

a doctor and a woman looking at a computer screen with a computer screen behind them<|im_end|>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> hello

a doctor and a woman looking at a computer screen with a computer screen behind them<|im_end|>

double enter to end input (EXIT: exit chat, RESET: reset history) >>>


After Finetune

That is, load the 1.8B LLM together with the Finetune-stage artifact (iter_1200) into GPU memory.

# Work around a minor MKL bug
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU

# Convert the .pth checkpoint to HuggingFace format
xtuner convert pth_to_hf \
  /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py \
  /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth \
  /root/tutorial/xtuner/llava/llava_data/iter_1200_hf

The run log:


(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# xtuner convert pth_to_hf \
>   /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py \
>   /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth \
>   /root/tutorial/xtuner/llava/llava_data/iter_1200_hf
[2024-04-16 11:19:35,645] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-16 11:20:06,303] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:36<00:00, 18.04s/it]
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
04/16 11:21:11 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - dispatch internlm2 attn forward
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:11 - mmengine - INFO - replace internlm2 rope
04/16 11:21:12 - mmengine - INFO - replace internlm2 rope
04/16 11:21:12 - mmengine - INFO - replace internlm2 rope
04/16 11:21:12 - mmengine - INFO - replace internlm2 rope
04/16 11:21:12 - mmengine - INFO - replace internlm2 rope
04/16 11:21:12 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
04/16 11:21:13 - mmengine - INFO - replace internlm2 rope
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
Processing zero checkpoint '/root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth'
Detected checkpoint of type zero stage 2, world_size: 1
Parsing checkpoint created by deepspeed==0.14.0
Reconstructed fp32 state dict with 534 params 586354688 elements
Load PTH model from /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth
Saving LLM adapter to /root/tutorial/xtuner/llava/llava_data/iter_1200_hf/llm_adapter
Convert LLM to float16
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b - will assume that the vocabulary was not modified.
  warnings.warn(
Saving visual_encoder adapter to /root/tutorial/xtuner/llava/llava_data/iter_1200_hf/visual_encoder_adapter
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /root/share/new_models/openai/clip-vit-large-patch14-336 - will assume that the vocabulary was not modified.
  warnings.warn(
Saving projector to /root/tutorial/xtuner/llava/llava_data/iter_1200_hf/projector
All done!
(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava#
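
Unlike the Pretrain-stage export, the Finetune-stage conversion saves three artifacts, matching the "Saving ..." lines in the log above (a sketch):

iter_1200_hf/
├── llm_adapter/              # QLoRA adapter for InternLM2-Chat-1.8B
├── visual_encoder_adapter/   # LoRA adapter for the CLIP visual encoder
└── projector/                # the trained Image Projector

`xtuner chat` picks all three up from the --llava directory, as the "Load LLM adapter / visual_encoder adapter / projector" lines below show.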



(xtuner0.1.17) root@intern-studio-061925:~/tutorial/xtuner/llava# xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
>   --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
>   --llava /root/tutorial/xtuner/llava/llava_data/iter_1200_hf \
>   --prompt-template internlm2_chat \
>   --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
[2024-04-16 11:29:15,950] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-16 11:29:29,847] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:23<00:00, 11.94s/it]
Load LLM from /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b
/root/.conda/envs/xtuner0.1.17/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Load visual_encoder from /root/share/new_models/openai/clip-vit-large-patch14-336
Load LLM adapter from /root/tutorial/xtuner/llava/llava_data/iter_1200_hf
Load visual_encoder adapter from /root/tutorial/xtuner/llava/llava_data/iter_1200_hf
Load projector from /root/tutorial/xtuner/llava/llava_data/iter_1200_hf

double enter to end input (EXIT: exit chat, RESET: reset history) >>> Describe this image.

This is an image of an eye examination in progress. A man in a white lab coat, likely an optometrist, is attending to a female patient who is undergoing an eye test. She is using a phoropter, a device used to measure an individual's prescription for glasses or contact lenses. In the background, there's an eye chart with letters decreasing in size. The environment looks like a professional medical office, and both individuals seem focused on the task.<|im_end|>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> This is a photograph of a patient undergoing an eye examination. A healthcare professional, possibly an optometrist, is using a slit lamp to examine the patient's eyes. The patient is seated, leaning into the machine which has a chin rest and a forehead support. In the background, there's an eye chart.<|im_end|>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> What is the equipment in the image?

The equipment in the image is a phoropter, a common optometric device. It's used to measure the patient's refractive error and to determine an appropriate prescription for glasses or contact lenses.<|im_end|>


Large Model Technology Sharing


"Enterprise-Grade Generative AI: LLM Technology, Algorithms, and Case Studies in Practice" Online Advanced Seminar

Module 1: The principles and essence of Generative AI, its technical internals, and the full engineering lifecycle
Module 2: Industrial-grade prompting internals and an end-to-end LLM-based meeting assistant in practice
Module 3: The three Llama 2 models in depth, with hands-on construction of a safe and reliable conversational system
Module 4: The five core problems of GenAI/LLMs in production environments and building robust applications
Module 5: LLM application development: agentic-based techniques and case studies
Module 6: LLM fine-tuning and model quantization techniques, with case studies
Module 7: Advanced PEFT: efficient fine-tuning algorithms, techniques, workflows, and code
Module 8: LLM alignment techniques and workflows, with hands-on text toxicity analysis
Module 9: Red Teaming demystified: the core technique for building safe GenAI/LLMs
Module 10: Building a trustworthy, enterprise-private, secure large model with Responsible AI

Llama 3 Key Technologies in Depth, and Building Responsible AI: Algorithms and Hands-On Development

1. The Llama open-source model family: technologies, tooling, and multimodality. What is new in Meta Llama 3, including its breakthroughs in language modeling and how to build trust-and-safety AI on it; Llama 3's five technical branches and tools, plus a hands-on case of Llama instruction fine-tuning on AWS.
2. Inside the Llama 3 Foundation Model, part 1: techniques such as Tiktokenizer, KV Cache, and Grouped Multi-Query Attention, with a line-by-line walkthrough of the Llama 3 source code in Project 2.
3. Inside the Llama 3 Foundation Model, part 2: the SwiGLU activation function, the FeedForward block, the Encoder block, and more; Project 3 studies Llama 3's inference code to reinforce the theory in practice.
4. Hands-on Responsible AI with LangGraph on Llama 3: Project 4 builds a LangGraph-based Responsible AI project on Llama 3, covering LangGraph's three core components, its execution model, and its workflow.
5. Building safe and trustworthy enterprise AI applications with the Llama family: key technologies such as Code Llama and Llama Guard; Project 5 upgrades the safe and reliable conversational-intelligence project.
6. Fine-tuning techniques and algorithms for the Llama family: Supervised Fine-Tuning (SFT), reward models, PPO, DPO, and more; Project 6 implements PPO and DPO by hand.
7. Reinforcement learning from AI feedback in the Llama family: RLAIF and RLHF; Project 7 builds Constitutional AI on top of RLAIF.
8. DPO in Llama 3: principles, algorithm, components, and implementation. How Llama 3 combines PPO and DPO, how DPO works internally, and its key algorithmic components; Project 8 implements and tests DPO from scratch, and the course also covers the advanced Iterative DPO and IPO algorithms.
9. Safety design and implementation in the Llama family: Safety in Pretraining, Safety Fine-Tuning, and building safe and reliable GenAI/LLM projects.
10. Llama 3 for a trustworthy, enterprise-private, secure Responsible AI system: mastering Llama 3's Constitutional AI and Red Teaming.

Decoding Sora: Architecture, Technology, and Applications

I. Why is Sora a milestone on the road to AGI?
1. The key shift from large language models (LLMs) to large vision models (LVMs), and its role in reaching artificial general intelligence (AGI).
2. Successful cases of combining visual data with text data, and the key role Sora plays in that combination.
3. How Sora generates video with 3D consistency from text instructions.
4. The technical path by which Sora generates high-fidelity content from images or videos.
5. Sora's practical value across application scenarios, along with its challenges and limitations.

II. Decoding the Sora architecture
1. The DiT (Diffusion Transformer) architecture in detail.
2. How DiT helps Sora produce consistent, realistic, and imaginative video content.
3. Why a Transformer, rather than an architecture such as U-Net, serves as the diffusion backbone.
4. DiT's patchification principle and pipeline, and why it matters for processing video and image data.
5. The conditional diffusion process in detail, and its role in content generation.
III. Decoding Sora's key technologies
1. How Sora uses Transformer and diffusion techniques to understand interactions between objects, and why that matters for simulating complex interactive scenes.
2. Why space-time patches are the core of Sora's technology, and how they raise video-generation capability.
3. Spacetime latent patches in detail, and their key role in video compression and generation.
4. How the Sora simulator uses space-time patches to build digital and physical worlds, and to simulate real-world change.
5. How Sora generates content faithfully from user-input text, and the technology and innovation behind it.
6. Why Sora generates content from abstract concepts rather than concrete pixels, and how that affects generation quality and diversity.

