【论文笔记】Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks

论文地址:Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

代码地址:https://github.com/jierunchen/fasternet

该论文主要提出了PConv,通过优化FLOPS提出了快速推理模型FasterNet。

在设计神经网络结构的时候,大部分注意力都会放在降低FLOPs( floating-point opera-
tions)上,有的时候FLOPs降低了,并不意味了推理速度加快了,这主要是因为没考虑到FLOPS(floating-point operations per second)。针对该问题,作者提出了PConv( partial convolution),通过提高FLOPS来加快推理速度。

一、引言

      非常多的实时推理模型都将重点放在降低FLOPs上,比如:MobileNet,ShuffleNet,GhostNet等等。虽然这些网络都降低了FLOPs,但是他们没有考虑到FLOPS,所以推理速度仍有优化空间,推理的延时计算公式如下:

由上式可以看出,要想加快推理速度,不仅可以从FLOPs入手,也可以优化FLOPS。作者在多个模型上做了实验,发现很多模型的FLOPS低于ResNet50。于是作者提出了PConv,通过提高FLOPS来加快推理速度。

二、PConv

为了提高FLOPS,作者提出了PConv,其结构如下图:

部分通道数经过卷积运算,其他通道不进行运算。再看了几眼。。。。这个和GhostConv好像呀。。。。

网络整体结构如下:

三、模型性能

FasterNet在ImageNet-1K上的表现如下:

在coco数据集上的表现如下:

四、代码

给出PConv的代码,也是非常简单:

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch
import torch.nn as nn
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from functools import partial
from typing import List
from torch import Tensor
import copy
import os

try:
    from mmdet.models.builder import BACKBONES as det_BACKBONES
    from mmdet.utils import get_root_logger
    from mmcv.runner import _load_checkpoint
    has_mmdet = True
except ImportError:
    print("If for detection, please install mmdetection first")
    has_mmdet = False


class Partial_conv3(nn.Module):

    def __init__(self, dim, n_div, forward):
        super().__init__()
        self.dim_conv3 = dim // n_div
        self.dim_untouched = dim - self.dim_conv3
        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)

        if forward == 'slicing':
            self.forward = self.forward_slicing
        elif forward == 'split_cat':
            self.forward = self.forward_split_cat
        else:
            raise NotImplementedError

    def forward_slicing(self, x: Tensor) -> Tensor:
        # only for inference
        x = x.clone()   # !!! Keep the original input intact for the residual connection later
        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])

        return x

    def forward_split_cat(self, x: Tensor) -> Tensor:
        # for training/inference
        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)
        x1 = self.partial_conv3(x1)
        x = torch.cat((x1, x2), 1)

        return x


class MLPBlock(nn.Module):

    def __init__(self,
                 dim,
                 n_div,
                 mlp_ratio,
                 drop_path,
                 layer_scale_init_value,
                 act_layer,
                 norm_layer,
                 pconv_fw_type
                 ):

        super().__init__()
        self.dim = dim
        self.mlp_ratio = mlp_ratio
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.n_div = n_div

        mlp_hidden_dim = int(dim * mlp_ratio)

        mlp_layer: List[nn.Module] = [
            nn.Conv2d(dim, mlp_hidden_dim, 1, bias=False),
            norm_layer(mlp_hidden_dim),
            act_layer(),
            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)
        ]

        self.mlp = nn.Sequential(*mlp_layer)

        self.spatial_mixing = Partial_conv3(
            dim,
            n_div,
            pconv_fw_type
        )

        if layer_scale_init_value > 0:
            self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
            self.forward = self.forward_layer_scale
        else:
            self.forward = self.forward

    def forward(self, x: Tensor) -> Tensor:
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(self.mlp(x))
        return x

    def forward_layer_scale(self, x: Tensor) -> Tensor:
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(
            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))
        return x


class BasicStage(nn.Module):

    def __init__(self,
                 dim,
                 depth,
                 n_div,
                 mlp_ratio,
                 drop_path,
                 layer_scale_init_value,
                 norm_layer,
                 act_layer,
                 pconv_fw_type
                 ):

        super().__init__()

        blocks_list = [
            MLPBlock(
                dim=dim,
                n_div=n_div,
                mlp_ratio=mlp_ratio,
                drop_path=drop_path[i],
                layer_scale_init_value=layer_scale_init_value,
                norm_layer=norm_layer,
                act_layer=act_layer,
                pconv_fw_type=pconv_fw_type
            )
            for i in range(depth)
        ]

        self.blocks = nn.Sequential(*blocks_list)

    def forward(self, x: Tensor) -> Tensor:
        x = self.blocks(x)
        return x


class PatchEmbed(nn.Module):

    def __init__(self, patch_size, patch_stride, in_chans, embed_dim, norm_layer):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_stride, bias=False)
        if norm_layer is not None:
            self.norm = norm_layer(embed_dim)
        else:
            self.norm = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        x = self.norm(self.proj(x))
        return x


class PatchMerging(nn.Module):

    def __init__(self, patch_size2, patch_stride2, dim, norm_layer):
        super().__init__()
        self.reduction = nn.Conv2d(dim, 2 * dim, kernel_size=patch_size2, stride=patch_stride2, bias=False)
        if norm_layer is not None:
            self.norm = norm_layer(2 * dim)
        else:
            self.norm = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        x = self.norm(self.reduction(x))
        return x


class FasterNet(nn.Module):

    def __init__(self,
                 in_chans=3,
                 num_classes=1000,
                 embed_dim=96,
                 depths=(1, 2, 8, 2),
                 mlp_ratio=2.,
                 n_div=4,
                 patch_size=4,
                 patch_stride=4,
                 patch_size2=2,  # for subsequent layers
                 patch_stride2=2,
                 patch_norm=True,
                 feature_dim=1280,
                 drop_path_rate=0.1,
                 layer_scale_init_value=0,
                 norm_layer='BN',
                 act_layer='RELU',
                 fork_feat=False,
                 init_cfg=None,
                 pretrained=None,
                 pconv_fw_type='split_cat',
                 **kwargs):
        super().__init__()

        if norm_layer == 'BN':
            norm_layer = nn.BatchNorm2d
        else:
            raise NotImplementedError

        if act_layer == 'GELU':
            act_layer = nn.GELU
        elif act_layer == 'RELU':
            act_layer = partial(nn.ReLU, inplace=True)
        else:
            raise NotImplementedError

        if not fork_feat:
            self.num_classes = num_classes
        self.num_stages = len(depths)
        self.embed_dim = embed_dim
        self.patch_norm = patch_norm
        self.num_features = int(embed_dim * 2 ** (self.num_stages - 1))
        self.mlp_ratio = mlp_ratio
        self.depths = depths

        # split image into non-overlapping patches
        self.patch_embed = PatchEmbed(
            patch_size=patch_size,
            patch_stride=patch_stride,
            in_chans=in_chans,
            embed_dim=embed_dim,
            norm_layer=norm_layer if self.patch_norm else None
        )

        # stochastic depth decay rule
        dpr = [x.item()
               for x in torch.linspace(0, drop_path_rate, sum(depths))]

        # build layers
        stages_list = []
        for i_stage in range(self.num_stages):
            stage = BasicStage(dim=int(embed_dim * 2 ** i_stage),
                               n_div=n_div,
                               depth=depths[i_stage],
                               mlp_ratio=self.mlp_ratio,
                               drop_path=dpr[sum(depths[:i_stage]):sum(depths[:i_stage + 1])],
                               layer_scale_init_value=layer_scale_init_value,
                               norm_layer=norm_layer,
                               act_layer=act_layer,
                               pconv_fw_type=pconv_fw_type
                               )
            stages_list.append(stage)

            # patch merging layer
            if i_stage < self.num_stages - 1:
                stages_list.append(
                    PatchMerging(patch_size2=patch_size2,
                                 patch_stride2=patch_stride2,
                                 dim=int(embed_dim * 2 ** i_stage),
                                 norm_layer=norm_layer)
                )

        self.stages = nn.Sequential(*stages_list)

        self.fork_feat = fork_feat

        if self.fork_feat:
            self.forward = self.forward_det
            # add a norm layer for each output
            self.out_indices = [0, 2, 4, 6]
            for i_emb, i_layer in enumerate(self.out_indices):
                if i_emb == 0 and os.environ.get('FORK_LAST3', None):
                    raise NotImplementedError
                else:
                    layer = norm_layer(int(embed_dim * 2 ** i_emb))
                layer_name = f'norm{i_layer}'
                self.add_module(layer_name, layer)
        else:
            self.forward = self.forward_cls
            # Classifier head
            self.avgpool_pre_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(self.num_features, feature_dim, 1, bias=False),
                act_layer()
            )
            self.head = nn.Linear(feature_dim, num_classes) \
                if num_classes > 0 else nn.Identity()

        self.apply(self.cls_init_weights)
        self.init_cfg = copy.deepcopy(init_cfg)
        if self.fork_feat and (self.init_cfg is not None or pretrained is not None):
            self.init_weights()

    def cls_init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, (nn.Conv1d, nn.Conv2d)):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, (nn.LayerNorm, nn.GroupNorm)):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)

    # init for mmdetection by loading imagenet pre-trained weights
    def init_weights(self, pretrained=None):
        logger = get_root_logger()
        if self.init_cfg is None and pretrained is None:
            logger.warn(f'No pre-trained weights for '
                        f'{self.__class__.__name__}, '
                        f'training start from scratch')
            pass
        else:
            assert 'checkpoint' in self.init_cfg, f'Only support ' \
                                                  f'specify `Pretrained` in ' \
                                                  f'`init_cfg` in ' \
                                                  f'{self.__class__.__name__} '
            if self.init_cfg is not None:
                ckpt_path = self.init_cfg['checkpoint']
            elif pretrained is not None:
                ckpt_path = pretrained

            ckpt = _load_checkpoint(
                ckpt_path, logger=logger, map_location='cpu')
            if 'state_dict' in ckpt:
                _state_dict = ckpt['state_dict']
            elif 'model' in ckpt:
                _state_dict = ckpt['model']
            else:
                _state_dict = ckpt

            state_dict = _state_dict
            missing_keys, unexpected_keys = \
                self.load_state_dict(state_dict, False)

            # show for debug
            print('missing_keys: ', missing_keys)
            print('unexpected_keys: ', unexpected_keys)

    def forward_cls(self, x):
        # output only the features of last layer for image classification
        x = self.patch_embed(x)
        x = self.stages(x)
        x = self.avgpool_pre_head(x)  # B C 1 1
        x = torch.flatten(x, 1)
        x = self.head(x)

        return x

    def forward_det(self, x: Tensor) -> Tensor:
        # output the features of four stages for dense prediction
        x = self.patch_embed(x)
        outs = []
        for idx, stage in enumerate(self.stages):
            x = stage(x)
            if self.fork_feat and idx in self.out_indices:
                norm_layer = getattr(self, f'norm{idx}')
                x_out = norm_layer(x)
                outs.append(x_out)

        return outs

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mfbz.cn/a/272842.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

网络编程--网络基础

这里写目录标题 协议的概念什么是协议典型协议 分层模型OSI七层模型与TCP/TP四层模型 通信过程协议格式以太网帧协议&#xff08;主要作用与mac地址&#xff0c;也就是网卡&#xff09;mac地址格式ARP协议总结 IP协议&#xff08;主要作用于IP&#xff09;UDP与TCP协议&#xf…

one wire(单总线)FPGA代码篇

一.引言 单总线&#xff08;OneWire&#xff09;是一种串行通信协议&#xff0c;它允许多个设备通过一个单一的数据线进行通信。这个协议通常用于低速、短距离的数字通信&#xff0c;特别适用于嵌入式系统和传感器网络。 二.one wire通信优点缺点 优点&#xff1a; 单一数据线…

扫描全能王启动鸿蒙原生应用开发,系HarmonyOS NEXT智能扫描领域首批

近期&#xff0c;“鸿蒙合作签约暨扫描全能王鸿蒙原生应用开发启动仪式”&#xff08;简称“签约仪式”&#xff09;正式举行。合合信息与华为达成鸿蒙合作&#xff0c;旗下扫描全能王将基于HarmonyOS NEXT正式启动鸿蒙原生应用开发。据悉&#xff0c;扫描全能王是鸿蒙在智能扫…

TG7050CKN,TG7050SKN ,TG7050CMN,TG7050SMN

爱普生推出的温补晶振型号&#xff1a;TG7050CKN&#xff0c;TG7050SKN &#xff0c;TG7050CMN&#xff0c;TG7050SMN频率范围为 10mhz ~ 54mhz 适用于广泛的频率需求。这几款的特点就是耐高温&#xff0c;温度可达105℃高温&#xff0c;而且都是高稳定性温补晶振&#xff0c;&…

【C++】开源:fast-cpp-csv-parser数据解析库配置使用

&#x1f60f;★,:.☆(&#xffe3;▽&#xffe3;)/$:.★ &#x1f60f; 这篇文章主要介绍fast-cpp-csv-parser数据解析库配置使用。 无专精则不能成&#xff0c;无涉猎则不能通。——梁启超 欢迎来到我的博客&#xff0c;一起学习&#xff0c;共同进步。 喜欢的朋友可以关注一…

钦丰科技(安徽)股份有限公司携卫生级阀门管件盛装亮相2024发酵展

钦丰科技(安徽)股份有限公司携卫生级阀门管件盛装亮相2024济南生物发酵展&#xff01; 展位号&#xff1a;2号馆A65展位 2024第12届国际生物发酵产品与技术装备展览会&#xff08;济南&#xff09;于3月5-7日在山东国际会展中心盛大召开&#xff0c;展会同期将举办30余场高质…

ubuntu22.04搭建RTSP服务器

大致命令如下&#xff1a; git clone --depth 1 gitgithub.com:ZLMediaKit/ZLMediaKit.git sudo apt-get install build-essential sudo apt-get install cmake #除了openssl,其他其实都可以不安装 sudo apt-get install libssl-dev sudo apt-get install libsdl-dev sudo apt…

医院信息化-6 大模型与医疗

之前写了一系列跟医疗信息化相关的内容&#xff0c;其中有提到人工智能&#xff0c;但是写的都是原先的一些AI算法基础上的医疗应用。现在大模型出现的涌现推理能力确实让人惊讶&#xff0c;并且出现可商用化的可能性&#xff0c;因此最近一年关于大模型在医疗的应用也开始出现…

使用ffmpeg实现视频旋转并保持清晰度不变

1 原始视频信息 通过ffmpeg -i命令查看视频基本信息 ffmpeg -i source.mp4 ffmpeg version 6.1-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developersbuilt with gcc 12.2.0 (Rev10, Built by MSYS2 project)configuration: --enable-gpl --enable-…

智能三维数据虚拟现实电子沙盘

一、概述 易图讯科技&#xff08;www.3dgis.top&#xff09;以大数据、云计算、虚拟现实、物联网、AI等先进技术为支撑&#xff0c;支持高清卫星影像、DEM高程数据、矢量数据、无人机倾斜摄像、BIM模型、点云、城市白模、等高线、标高点等数据融合和切换&#xff0c;智能三维数…

python作业题百度网盘,python作业答案怎么查

大家好&#xff0c;小编来为大家解答以下问题&#xff0c;python作业题百度网盘&#xff0c;python作业答案怎么查&#xff0c;今天让我们一起来看看吧&#xff01; 1 以下代码的输出结果为&#xff1a; alist [1, 2, 3, 4] print(alist.reverse()) print(alist) A.[4, 3, 2, …

根据DCT特征训练CNN

记录一次改代码的挣扎经历&#xff1a; 看了几篇关于DCT频域的深度模型文献&#xff0c;尤其是21年FcaNet&#xff1a;基于DCT 的attention model&#xff0c;咱就是说想试试将我模型的输入改为分组的DCT系数&#xff0c;然后就开始下面的波折了。 第一次尝试&#xf…

在Centos7中利用Shell脚本:实现MySQL的数据备份

目录 自动化备份MySQL 一.备份数据库脚本 1.创建备份目录 2.创建脚本文件 3.新建配置文件&#xff08;连接数据库的配置文件&#xff09; 4.给文件权限(mysql_backup.sh) ​编辑 5.执行命令 (mysql_backup.sh) ​编辑 二.数据库通过备份恢复 1.创建脚…

多维时序 | MATLAB实现SSA-BiLSTM麻雀算法优化双向长短期记忆神经网络多变量时间序列预测

多维时序 | MATLAB实现SSA-BiLSTM麻雀算法优化双向长短期记忆神经网络多变量时间序列预测 目录 多维时序 | MATLAB实现SSA-BiLSTM麻雀算法优化双向长短期记忆神经网络多变量时间序列预测预测效果基本介绍程序设计参考资料 预测效果 基本介绍 1.MATLAB实现SSA-BiLSTM麻雀算法优化…

k8s的二进制部署: 源码包部署

服务器IP软件包k8s--master0120.0.0.61kube-aplserver&#xff0c;kube-controer-manager&#xff0c;kube-scheduler&#xff0c;etcdk8s--master0220.0.0.62kube-controer-manager&#xff0c;kube-schedulernode节点0120.0.0.62kubelet&#xff0c;kube-proxy&#xff0c;et…

第九部分 图论

目录 例 相关概念 握手定理 例1 图的度数列 例 无向图的连通性 无向图的连通度 例2 例3 有向图D如图所示&#xff0c;求 A, A2, A3, A4&#xff0c;并回答诸问题&#xff1a; 中间有几章这里没有写&#xff0c;感兴趣可以自己去学&#xff0c;组合数学跟高中差不多&#xff0c…

目标检测-Two Stage-SPP Net

文章目录 前言一、SPP Net 的网络结构和流程二、SPP的创新点总结 前言 SPP Net&#xff1a;Spatial Pyramid Pooling Net&#xff08;空间金字塔池化网络&#xff09; SPP-Net是出自何凯明教授于2015年发表在IEEE上的论文-《Spatial Pyramid Pooling in Deep ConvolutionalNetw…

设计模式(4)--对象行为(5)--中介者

1. 意图 用一个中介对象来封装一系列的对象交互。 中介者使各对象不需要显式地相互引用&#xff0c;从而使其耦合松散&#xff0c; 而且可以独立地改变它们之间的交互。 2. 四种角色 抽象中介者(Mediator)、具体中介者(Concrete Mediator)、抽象同事(Colleague)、 具体同事(Co…

https密钥认证、上传镜像实验

一、第一台主机通过https密钥对认证 1、安装docker服务 &#xff08;1&#xff09;安装环境依赖包 yum -y install yum-utils device-mapper-persistent-data lvm2 &#xff08;2&#xff09;设置阿里云镜像源 yum-config-manager --add-repo http://mirrors.aliyun.com/do…

时序预测 | Matlab实现SSA-CNN-LSTM麻雀算法优化卷积长短期记忆神经网络时间序列预测

时序预测 | Matlab实现SSA-CNN-LSTM麻雀算法优化卷积长短期记忆神经网络时间序列预测 目录 时序预测 | Matlab实现SSA-CNN-LSTM麻雀算法优化卷积长短期记忆神经网络时间序列预测预测效果基本介绍程序设计参考资料 预测效果 基本介绍 MATLAB实现SSA-CNN-LSTM麻雀算法优化卷积长短…
最新文章