
[Paper Notes] LSNet: Extremely Light-Weight Siamese Network For Change Detection in Remote Sensing Image

Posted 2022-05-15 07:19:57 by m0_61899108

Paper

Title: LSNET: EXTREMELY LIGHT-WEIGHT SIAMESE NETWORK FOR CHANGE DETECTION OF REMOTE SENSING IMAGE

Submitted to: CVPR 2022

Paper: https://arxiv.org/abs/2201.09156

Code: https://github.com/qaz670756/LSNet

The idea of the paper is fairly simple and consists of two modifications. First, the backbone is made lightweight: a Siamese light-weight backbone is built from CGB (Context Guide Block) modules. Second, the pyramid feature fusion is improved: building on denseFPN, redundant connections are removed and a bottom-up fusion path is added. The large reduction in parameters and computation comes mainly from the lightweight backbone, which replaces standard convolutions with depthwise separable convolutions.

Experimental Results

Official training parameters:

{
  "patch_size": 256,
  "augmentation": true,
  "num_gpus": 1,
  "num_workers": 8,
  "num_channel": 3,
  "EF": false,
  "epochs": 101,
  "batch_size": 12,
  "learning_rate": 1e-3,
  "model_name": "denseFPN",
  "loss_function": "contra_hybrid",
  "dataset_dir": "data/Real/subset/",
  "weight_dir": "./outputs/",
  "log_dir": "./log/"
}
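
These parameters can be loaded straight into a training script; a minimal sketch (the file name config.json and the attribute-style access are assumptions, not necessarily how the repo parses its arguments):

import json
from types import SimpleNamespace

# Load the training parameters shown above (file name is an assumption)
# so they can be accessed as attributes, e.g. cfg.batch_size.
with open("config.json") as f:
    cfg = SimpleNamespace(**json.load(f))

print(cfg.model_name, cfg.epochs, cfg.learning_rate)  # denseFPN 101 0.001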

 

Abstract

Siamese networks have gradually become the mainstream for change detection in remote sensing images (RSI). However, as structures, modules, and training procedures grow more complex, the models become increasingly heavy and hard to apply in practice.

This paper proposes an extremely light-weight Siamese network (LSNet) for RSI change detection. It replaces standard convolutions with depthwise separable atrous (dilated) convolutions and removes redundant dense connections, keeping only effective feature flows during Siamese feature fusion, which greatly compresses parameters and computation. On the CDD dataset, compared with the first-ranked model, LSNet reduces parameters and computation by 90.35% and 91.34% respectively, with only a 1.5% drop in accuracy.

Introduction

Traditional RSI change detection methods rely on hand-crafted features and time-consuming pre- and post-processing, and have difficulty distinguishing semantic changes from background noise.

Image pairs can be fed directly into a Siamese convolutional network without preprocessing; end-to-end supervised learning is enough to separate semantically changed regions from unchanged regions.

  • This paper proposes a lightweight Siamese network, LSNet, which is highly efficient (Fig. 1). The backbone is built from Context Guide Blocks (CGB), whose core components are depthwise separable atrous convolution and global feature aggregation. Compared with using ResNet-50 as the backbone, the LSNet backbone has only 3.97% of the parameters and 32.56% of the computation.
  • A differential feature pyramid network (diffFPN) is proposed for progressive feature-pair difference extraction and resolution recovery (removing redundant connections while preserving the feature flow), finally separating changed image regions from unchanged ones.

Method

LSNet consists of a Siamese backbone (LightSiamese Backbone) and a differential feature pyramid network (diffFPN). The backbone is built from Context Guide Blocks (CGB), and diffFPN performs efficient fusion of the Siamese feature pairs.

Light-Siamese backbone

Images T1 and T2 pass through a Siamese backbone with shared weights. The backbone consists of 4 composite layers (from top to bottom composed of 3/3/8/12 CGB modules respectively), and each CGB counts as two levels, so there are 4 groups of feature outputs and 52 layers in total.

The basic component, the Context Guide Block (CGB), is shown on the right of Fig. 2. The input X passes through parallel dilated (atrous) convolutions to obtain local context at different ranges (receptive fields). The dilated convolutions are computed in a depthwise separable manner, i.e. the channels are grouped and each convolution operates only within its own group. (Depthwise separable convolution greatly reduces computation, but its speed has an upper bound: the bottleneck is memory access bandwidth.)
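
Where the savings come from can be seen by counting weights; a minimal illustrative sketch (not the authors' code) comparing a standard and a depthwise 3x3 dilated convolution at 64 channels:

import torch.nn as nn

channels = 64
# Standard 3x3 dilated conv: 3*3*64*64 = 36,864 weights
standard = nn.Conv2d(channels, channels, kernel_size=3, padding=2,
                     dilation=2, bias=False)
# Depthwise 3x3 dilated conv (groups=channels): 3*3*64 = 576 weights
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=2,
                      dilation=2, groups=channels, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(depthwise))  # 36864 576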

The joint feature then undergoes channel interaction and global information extraction.

Differential feature pyramid network

SNUNet proposed a densely connected pyramid feature fusion scheme, shown in Fig. 3(a).

This denseFPN structure has two problems:

  • Redundant connections. (Shallow features such as T_1,0 and T_2,0 are repeatedly fed into d_1,0, d_2,0 and d_3,0, which is inefficient.)
  • Unreasonable feature flow. (In denseFPN, the output layers d_0,0 and d_1,0 contain incomplete features from the backbone.)

Therefore, the paper proposes the diffFPN structure, which removes redundant connections and adds a bottom-up fusion path, so that all three output layers contain complete backbone features.

Experiment and Results

Dataset and evaluation metrics

Dataset: CDD

Common metrics: precision, recall, F1-score, overall accuracy

Efficiency metrics: F1-P and F1-G (quantifying the F1 score obtained per unit of parameters and per unit of computation, respectively), and F1-Eff (evaluating the overall efficiency of the model)
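
For reference, the common metrics above can be computed from a pixel-level confusion matrix; a minimal sketch using the usual definitions (not the repo's evaluation code):

import numpy as np

def change_detection_metrics(pred, gt):
    """pred, gt: binary arrays where 1 = changed, 0 = unchanged."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    eps = 1e-10  # avoid division by zero
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    oa = (tp + tn) / (tp + tn + fp + fn + eps)
    return precision, recall, f1, oa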

Accuracy and efficiency comparison 

Comparing the two modules in terms of parameters and computation (CDD dataset), the table shows:

  • Compared with ResNet-50, the LightSiamese-52 backbone has only about 1/25 of the parameters and 1/3 of the computation.
  • The denseFPN structure suffers from unreasonable feature flow; diffFPN adds only 0.0709M parameters while reducing computation by 1.0884 GFLOPs, a reduction of more than half.

Comparing the performance of multiple methods on the CDD dataset, LSNet's metrics are all reasonable and rank in the top three.

Comparing the efficiency of multiple methods, the diffFPN-based method achieves the highest F1-P and F1-G.

Combining Table 2 and Fig. 3: compared with SNUNet, LSNet reduces parameters and computation by 90.35% and 91.34% respectively, with only a 1.5% drop in accuracy.

Visualization results of LSNet. The results are fairly accurate, but the edge details could be further refined. From (e), the edges of changed regions have higher probabilities than their interiors, indicating that the network uses the structure of a region as a discriminative feature, which improves its robustness to color and texture variations.

Conclusion

To detect RSI changes effectively, a lightweight Siamese network is proposed, with a light Siamese backbone (LightSiamese Backbone) built from Context Guide Blocks (CGB) and a feature-pair fusion module (diffFPN). Results on the challenging CDD dataset show that, compared with other mainstream methods, the approach achieves competitive results with limited parameters and computation, demonstrating its effectiveness.

Core Code

Context Guide Block

import torch
import torch.nn as nn


class ContextGuidedBlock(nn.Module):
    """Context Guided Block for CGNet.

    This class consists of four components: local feature extractor,
    surrounding feature extractor, joint feature extractor and global
    context extractor.

    Args:
        in_channels (int): Number of input feature channels.
        out_channels (int): Number of output feature channels.
        dilation (int): Dilation rate for surrounding context extractor.
            Default: 2.
        reduction (int): Reduction for global context extractor. Default: 16.
        skip_connect (bool): Add input to output or not. Default: True.
        downsample (bool): Downsample the input to 1/2 or not. Default: False.
        conv_cfg (dict): Config dict for convolution layer.
            Default: None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='BN', requires_grad=True).
        act_cfg (dict): Config dict for activation layer.
            Default: dict(type='PReLU').
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed. Default: False.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 dilation=2,
                 reduction=16,
                 skip_connect=True,
                 downsample=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN', requires_grad=True),
                 act_cfg=dict(type='PReLU'),
                 with_cp=False):
        super(ContextGuidedBlock, self).__init__()
        self.with_cp = with_cp
        self.downsample = downsample

        # channels = out_channels if downsample else out_channels // 2
        channels = out_channels // 2
        if 'type' in act_cfg and act_cfg['type'] == 'PReLU':
            act_cfg['num_parameters'] = channels
        kernel_size = 3 if downsample else 1
        stride = 2 if downsample else 1
        padding = (kernel_size - 1) // 2
        # self.channel_shuffle = ChannelShuffle(2 if in_channels==in_channels//2*2 else in_channels)
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(in_channels, channels, kernel_size=kernel_size, stride=stride, padding=padding),
            build_norm_layer(channels),
            nn.PReLU(num_parameters=channels)
        )

        self.f_loc = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=1, groups=channels, bias=False)

        self.f_sur = nn.Conv2d(channels, channels, kernel_size=3, padding=dilation,
                               dilation=dilation, groups=channels, bias=False)

        self.bn = build_norm_layer(2 * channels)
        self.activate = nn.PReLU(2 * channels)

        # original bottleneck in CGNet: A light weight context guided network for segmantic segmentation
        # is removed for saving computation amount
        # if downsample:
        #     self.bottleneck = build_conv_layer(
        #         conv_cfg,
        #         2 * channels,
        #         out_channels,
        #         kernel_size=1,
        #         bias=False)

        self.skip_connect = skip_connect and not downsample
        self.f_glo = GlobalContextExtractor(out_channels, reduction, with_cp)
        # self.f_glo = CoordAtt(out_channels,out_channels,groups=reduction)

    def forward(self, x):

        def _inner_forward(x):
            # x = self.channel_shuffle(x)
            out = self.conv1x1(x)
            loc = self.f_loc(out)
            sur = self.f_sur(out)

            joi_feat = torch.cat([loc, sur], 1)  # the joint feature
            joi_feat = self.bn(joi_feat)
            joi_feat = self.activate(joi_feat)
            if self.downsample:
                pass
                # joi_feat = self.bottleneck(joi_feat)  # channel = out_channels
            # f_glo is employed to refine the joint feature
            out = self.f_glo(joi_feat)

            if self.skip_connect:
                return x + out
            else:
                return out

        return _inner_forward(x)


def cgblock(in_ch, out_ch, dilation=2, reduction=8, skip_connect=False):
    return nn.Sequential(
        ContextGuidedBlock(in_ch, out_ch,
                           dilation=dilation,
                           reduction=reduction,
                           downsample=False,
                           skip_connect=skip_connect))
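
The GlobalContextExtractor used for self.f_glo is not included in the excerpt; a minimal sketch in the style of CGNet's squeeze-and-excitation global context module (an assumption based on CGNet, the authors' version may differ):

class GlobalContextExtractor(nn.Module):
    """Channel-wise reweighting from globally pooled features (SE-style sketch)."""

    def __init__(self, channel, reduction=16, with_cp=False):
        super(GlobalContextExtractor, self).__init__()
        self.with_cp = with_cp  # kept only for interface compatibility
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid())

    def forward(self, x):
        n, c = x.size()[:2]
        y = x.view(n, c, -1).mean(dim=2)   # global average pooling
        y = self.fc(y).view(n, c, 1, 1)    # per-channel gates in (0, 1)
        return x * y                       # refine the joint feature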

light_siamese_backbone

class light_siamese_backbone(nn.Module):
    def __init__(self, in_ch=None, num_blocks=None, cur_channels=None,
                 filters=None, dilations=None, reductions=None):
        super(light_siamese_backbone, self).__init__()
        norm_cfg = {'type': 'BN', 'eps': 0.001, 'requires_grad': True}
        act_cfg = {'type': 'PReLU', 'num_parameters': 32}
        self.inject_2x = InputInjection(1)  # down-sample for Input, factor=2
        self.inject_4x = InputInjection(2)  # down-sample for Input, factor=4
        # stage 0
        self.stem = nn.ModuleList()
        for i in range(num_blocks[0]):
            self.stem.append(
                ContextGuidedBlock(
                    cur_channels[0], filters[0],
                    dilations[0], reductions[0],
                    skip_connect=(i != 0),
                    downsample=False,
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg)  # CG block
            )
            cur_channels[0] = filters[0]

        cur_channels[0] += in_ch
        self.norm_prelu_0 = nn.Sequential(
            build_norm_layer(cur_channels[0]),
            nn.PReLU(cur_channels[0]))

        # stage 1
        self.level1 = nn.ModuleList()
        for i in range(num_blocks[1]):
            self.level1.append(
                ContextGuidedBlock(
                    cur_channels[0] if i == 0 else filters[1],
                    filters[1], dilations[1], reductions[1],
                    downsample=(i == 0),
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg))  # CG block

        cur_channels[1] = 2 * filters[1] + in_ch
        self.norm_prelu_1 = nn.Sequential(
            build_norm_layer(cur_channels[1]),
            nn.PReLU(cur_channels[1]))

        # stage 2
        self.level2 = nn.ModuleList()
        for i in range(num_blocks[2]):
            self.level2.append(
                ContextGuidedBlock(
                    cur_channels[1] if i == 0 else filters[2],
                    filters[2], dilations[2], reductions[2],
                    downsample=(i == 0),
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg))  # CG block

        cur_channels[2] = 2 * filters[2]
        self.norm_prelu_2 = nn.Sequential(
            build_norm_layer(cur_channels[2]),
            nn.PReLU(cur_channels[2]))

        # stage 3
        self.level3 = nn.ModuleList()
        for i in range(num_blocks[3]):
            self.level3.append(
                ContextGuidedBlock(
                    cur_channels[2] if i == 0 else filters[3],
                    filters[3], dilations[3], reductions[3],
                    downsample=(i == 0),
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg))  # CG block

        cur_channels[3] = 2 * filters[3]
        self.norm_prelu_3 = nn.Sequential(
            build_norm_layer(cur_channels[3]),
            nn.PReLU(cur_channels[3]))

    def forward(self, x):
        # x = torch.cat([xA, xB], dim=0)
        # stage 0
        inp_2x = x  # self.inject_2x(x)
        inp_4x = self.inject_2x(x)
        for layer in self.stem:
            x = layer(x)
        x = self.norm_prelu_0(torch.cat([x, inp_2x], 1))
        x0_0A, x0_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        # stage 1
        for i, layer in enumerate(self.level1):
            x = layer(x)
            if i == 0:
                down1 = x
        x = self.norm_prelu_1(torch.cat([x, down1, inp_4x], 1))
        x1_0A, x1_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        # stage 2
        for i, layer in enumerate(self.level2):
            x = layer(x)
            if i == 0:
                down1 = x
        x = self.norm_prelu_2(torch.cat([x, down1], 1))
        x2_0A, x2_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        # stage 3
        for i, layer in enumerate(self.level3):
            x = layer(x)
            if i == 0:
                down1 = x
        x = self.norm_prelu_3(torch.cat([x, down1], 1))
        x3_0A, x3_0B = x[:x.shape[0] // 2, :, :, :], x[x.shape[0] // 2:, :, :, :]

        return [x0_0A, x0_0B, x1_0A, x1_0B, x2_0A, x2_0B, x3_0A, x3_0B]


class InputInjection(nn.Module):
    """Downsampling module for CGNet."""

    def __init__(self, num_downsampling):
        super(InputInjection, self).__init__()
        self.pool = nn.ModuleList()
        for i in range(num_downsampling):
            self.pool.append(nn.AvgPool2d(3, stride=2, padding=1))

    def forward(self, x):
        for pool in self.pool:
            x = pool(x)
        return x

def build_norm_layer(ch):
    layer = nn.BatchNorm2d(ch, eps=0.01)
    for param in layer.parameters():
        param.requires_grad = True
    return layer

diffFPN

class diffFPN(nn.Module):
    def __init__(self, cur_channels=None, mid_ch=None,
                 dilations=None, reductions=None,
                 bilinear=True):
        super(diffFPN, self).__init__()
        # lateral convs for unifing channels
        self.lateral_convs = nn.ModuleList()
        for i in range(4):
            self.lateral_convs.append(
                cgblock(cur_channels[i] * 2, mid_ch * 2 ** i, dilations[i], reductions[i])
            )
        # top_down_convs
        self.top_down_convs = nn.ModuleList()
        for i in range(3, 0, -1):
            self.top_down_convs.append(
                cgblock(mid_ch * 2 ** i, mid_ch * 2 ** (i - 1), dilation=dilations[i], reduction=reductions[i])
            )

        # diff convs
        self.diff_convs = nn.ModuleList()
        for i in range(3):
            self.diff_convs.append(
                cgblock(mid_ch * (3 * 2 ** i), mid_ch * 2 ** i, dilations[i], reductions[i])
            )
        for i in range(2):
            self.diff_convs.append(
                cgblock(mid_ch * (3 * 2 ** i), mid_ch * 2 ** i, dilations[i], reductions[i])
            )
        self.diff_convs.append(
            cgblock(mid_ch * 3, mid_ch * 2,
                    dilation=dilations[0], reduction=reductions[0])
        )
        self.up2x = up(32, bilinear)

    def forward(self, output):
        tmp = [self.lateral_convs[i](torch.cat([output[i * 2], output[i * 2 + 1]], dim=1))
               for i in range(4)]

        # top_down_path
        for i in range(3, 0, -1):
            tmp[i - 1] += self.up2x(self.top_down_convs[3 - i](tmp[i]))

        # x0_1
        tmp = [self.diff_convs[i](torch.cat([tmp[i], self.up2x(tmp[i + 1])], dim=1)) for i in [0, 1, 2]]
        x0_1 = tmp[0]
        # x0_2
        tmp = [self.diff_convs[i](torch.cat([tmp[i - 3], self.up2x(tmp[i - 2])], dim=1)) for i in [3, 4]]
        x0_2 = tmp[0]
        # x0_3
        x0_3 = self.diff_convs[5](torch.cat([tmp[0], self.up2x(tmp[1])], dim=1))

        return x0_1, x0_2, x0_3
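
The up module behind self.up2x is likewise not shown; a minimal 2x upsampling sketch consistent with how it is called here with bilinear=True (an assumption; the repo's version may differ, e.g. a transposed convolution when bilinear=False):

class up(nn.Module):
    """2x spatial upsampling; channel count is unchanged in the bilinear case."""

    def __init__(self, in_ch, bilinear=True):
        super(up, self).__init__()
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                                  align_corners=True)
        else:
            # assumption: a learned alternative when bilinear=False
            self.up = nn.ConvTranspose2d(in_ch, in_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(x)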

LSNet_diffFPN

class LSNet_diffFPN(nn.Module):
    # LSNet: LightSiamese backbone + diffFPN (class adapted from SNUNet-CD with ECAM)
    def __init__(self, in_ch=3, mid_ch=32, out_ch=2, bilinear=True):
        super(LSNet_diffFPN, self).__init__()
        torch.nn.Module.dump_patches = True

        n1 = 32  # the initial number of channels of feature map
        filters = (n1, n1 * 2, n1 * 4, n1 * 8, n1 * 16)
        num_blocks = (3, 3, 8, 12)
        dilations = (1, 2, 4, 8)
        reductions = (4, 8, 16, 32)
        cur_channels = [0, 0, 0, 0]
        cur_channels[0] = in_ch

        self.backbone = light_siamese_backbone(in_ch=in_ch, num_blocks=num_blocks,
                                               cur_channels=cur_channels,
                                               filters=filters, dilations=dilations,
                                               reductions=reductions)

        self.head = cam_head(mid_ch=mid_ch, out_ch=out_ch)

        self.FPN = diffFPN(cur_channels=cur_channels, mid_ch=mid_ch,
                           dilations=dilations, reductions=reductions, bilinear=bilinear)


        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x, debug=False):

        output = self.backbone(x)

        x0_1, x0_2, x0_3 = self.FPN(output)

        out = self.head(x0_1, x0_2, x0_3)

        if debug:
            print_flops_params(self.backbone, [x], 'backbone')
            print_flops_params(self.FPN, [output], 'diffFPN')
            print_flops_params(self.head, [x0_1, x0_2, x0_3], 'head')

        return (x0_1, x0_2, x0_3, x0_3, out,)
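
A minimal usage sketch (assumptions: cam_head and print_flops_params are provided elsewhere in the repo; the two temporal images are concatenated along the batch dimension, since the backbone's forward splits the batch in half to form the Siamese pair):

import torch

# Hypothetical bi-temporal batches T1 and T2 (batch of 4, 256x256 RGB patches).
xA = torch.randn(4, 3, 256, 256)
xB = torch.randn(4, 3, 256, 256)

model = LSNet_diffFPN(in_ch=3, mid_ch=32, out_ch=2)
x = torch.cat([xA, xB], dim=0)       # both branches share one forward pass
x0_1, x0_2, x0_3, _, out = model(x)  # matches the 5-tuple returned above
print(out.shape)                     # change-map logits from the head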

