A Script That Automatically Backs Up Your Source Code on Every Training Run

June 3, 2020

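When running deep learning jobs, we often tweak the code a little, launch an experiment, and keep many of them running at once. After a while it is easy to forget what was changed for each training run shown in TensorBoard.
That makes it worth recording the state of the source code at every run. There are many ways to do this, such as committing each version to git. Since the code for a deep learning project is usually tiny (perhaps a few dozen KB), I offer the simplest, crudest approach here: on every run, automatically copy the files with certain extensions to another directory.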

1 Script Source

Here is the script itself. I save it as train.sh.

#!/bin/bash

currentdir=${PWD##*/}

echo "["${currentdir}"] command line is : " "$*"

echo '********************************'
echo 'Begin backup source codes'

# backup all necessary files
# "%Y-%m-%d-%H-%M-%S"
time=$(date "+%Y-%m-%d-%H-%M-%S")

# echo ${time}

destpath='../'${currentdir}'_backups/'${time}'/'

echo 'Copy files to : '${destpath}

mkdir -p "${destpath}"

# config file extensions you need
# find ./ -name '*.py' -exec  cp --parents '{}' ${destpath} \;
# find ./ -name '*.sh' -exec  cp --parents '{}' ${destpath} \;
# find ./ -name '*.txt' -exec  cp --parents '{}' ${destpath} \;
# find ./ -name '*.md' -exec  cp --parents '{}' ${destpath} \;
# find ./ -name '*.ipynb' -exec  cp --parents '{}' ${destpath} \;

find ./ -regex ".*\.\(sh\|py\|txt\|md\|ipynb\)" -exec cp --parents '{}' "${destpath}" \;

echo 'End backup source codes'
echo '********************************'

# your training command line
python train.py --time "${time}" "$@"

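2 What It Does

The script is very simple. It does the following:

Create a backup directory at ../<current directory name>_backups/<system time>/

The system time is used as the directory name to avoid name clashes, in the format year-month-day-hour-minute-second. We assume that no two runs start within the same second. The corresponding code is: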

#!/bin/bash

currentdir=${PWD##*/}

echo "["${currentdir}"] command line is : " "$*"

echo '********************************'
echo 'Begin backup source codes'

# backup all necessary files
# "%Y-%m-%d-%H-%M-%S"
time=$(date "+%Y-%m-%d-%H-%M-%S")

# echo ${time}

destpath='../'${currentdir}'_backups/'${time}'/'

echo 'Copy files to : '${destpath}

mkdir -p "${destpath}"
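For readers who would rather do this step from Python, the same naming scheme can be sketched roughly as follows. The helper name `make_backup_dir` and its parameters are hypothetical and not part of the original script. Unlike `mkdir -p`, this sketch deliberately fails if the directory already exists, so that a same-second collision is noticed rather than silently merged.

```python
import time
from pathlib import Path


def make_backup_dir(project_dir, timestamp=None):
    """Create <parent>/<project name>_backups/<timestamp>/ and return its path.

    Hypothetical helper for illustration; it mirrors the naming used in train.sh.
    """
    project_dir = Path(project_dir).resolve()
    if timestamp is None:
        # Same format as `date "+%Y-%m-%d-%H-%M-%S"` in train.sh
        timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
    dest = project_dir.parent / (project_dir.name + "_backups") / timestamp
    # exist_ok=False: raise FileExistsError if two runs share the same second
    dest.mkdir(parents=True, exist_ok=False)
    return dest
```

Calling `make_backup_dir(".")` from the project directory would create the same kind of path that the shell script builds.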

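Copy files with the specified extensions to the backup directory, preserving their original directory structure.

The corresponding code is: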

find ./ -regex ".*\.\(sh\|py\|txt\|md\|ipynb\)" -exec cp --parents '{}' "${destpath}" \;

echo 'End backup source codes'
echo '********************************'
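The `--parents` option of `cp` is a GNU extension, so the command above may not work where GNU coreutils is unavailable. As a portable alternative, the same copy step could be sketched in Python along these lines. The function name `backup_sources` and the `EXTENSIONS` constant are hypothetical, and the sketch assumes the destination lies outside the source tree, as it does in train.sh.

```python
import shutil
from pathlib import Path

# Extensions to back up; mirrors the list in the find command.
EXTENSIONS = {".sh", ".py", ".txt", ".md", ".ipynb"}


def backup_sources(src_root, dest_root, extensions=EXTENSIONS):
    """Copy matching files from src_root into dest_root, keeping relative paths.

    Hypothetical helper; assumes dest_root is not inside src_root.
    """
    src_root = Path(src_root)
    dest_root = Path(dest_root)
    copied = []
    # sorted() materializes the file list before anything is copied
    for path in sorted(src_root.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            rel = path.relative_to(src_root)
            target = dest_root / rel
            # Recreate the sub-directory layout, like `cp --parents`
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel)
    return copied
```

The returned list of relative paths can be printed as a quick record of what was backed up.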

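Run the training script, passing the system time as the --time argument. Any arguments given to the script are passed through unchanged to your training file.

The corresponding code is below. It assumes training is launched as python train.py [args]; adjust it to your needs: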

# your training command line
python train.py --time "${time}" "$@"

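3 Changing the TensorBoard Log Path (Optional)

To make it quick to match the source code you just saved with the experiment in TensorBoard, you can add the same date suffix to the path where you save your TensorBoard logs.
Use the following as a reference and adapt it (here we assume there is only one argument, exp_id, plus the --time argument we append):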

# Argparse
import argparse
import time

# TensorBoard
from torch.utils.tensorboard import SummaryWriter

# Fallback timestamp, used when --time is not passed (e.g. train.py is run directly)
t_str = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())

parser = argparse.ArgumentParser('train.py')
parser.add_argument("--exp_id", default='dsnt', type=str)
parser.add_argument("--time", default=t_str, type=str)
args = parser.parse_args()

# Log directory name: <exp_id>_<time>, matching the backup directory's timestamp
EXP_PATH = "exp/{}_{}".format(args.exp_id, args.time)
print('EXP_PATH = {}'.format(EXP_PATH))

writer = SummaryWriter(EXP_PATH)

4 Usage

Usage is just as simple. If you used to train like this:

python train.py --exp_id shufflenet_x0.5.unet.up0.loss.smooth_l1

change it to this:

sh train.sh --exp_id shufflenet_x0.5.unet.up0.loss.smooth_l1

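After that, you will find your main source files saved under the backup directory.

The TensorBoard log path also gains the date suffix. When you need to trace back, use it as an index to quickly find the backed-up source code.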
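The lookup from a TensorBoard log directory back to its backup can also be automated. The sketch below is only an illustration: the helper name `backup_dir_for` is hypothetical, and it assumes the log directory is named `<exp_id>_<time>` and the backups live in `<project>_backups/<time>/` next to the project, as set up above. Pass a project path that has a name (not `.`).

```python
import re
from pathlib import Path

# Matches the trailing timestamp produced by "%Y-%m-%d-%H-%M-%S".
_TIME_RE = re.compile(r"_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})$")


def backup_dir_for(exp_path, project_dir):
    """Map a log directory such as exp/<exp_id>_<time> to its source backup.

    Hypothetical helper; it only builds the path and does not check it exists.
    """
    exp_name = Path(exp_path).name
    match = _TIME_RE.search(exp_name)
    if match is None:
        raise ValueError("no timestamp suffix in {!r}".format(exp_name))
    project_dir = Path(project_dir)
    return project_dir.parent / (project_dir.name + "_backups") / match.group(1)
```

Given the log directory name shown in TensorBoard and the project path, this returns the directory that should hold the matching source snapshot.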

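PS:
This approach is simple and crude, so it only suits small codebases. And of course, backups that are no longer useful should be deleted promptly.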
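One possible way to find candidates for cleanup is to list backup directories older than some age. This is only a sketch under an assumption the post does not make, namely that "no longer useful" can be approximated by age; the function name `old_backups` and its parameters are hypothetical. It only lists directories and deletes nothing.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Directory-name format used by train.sh
TIME_FORMAT = "%Y-%m-%d-%H-%M-%S"


def old_backups(backups_root, max_age_days, now=None):
    """Return backup directories whose name encodes a time before the cutoff.

    Hypothetical helper; age-based cleanup is an assumption, not the author's rule.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for entry in sorted(Path(backups_root).iterdir()):
        if not entry.is_dir():
            continue
        try:
            stamp = datetime.strptime(entry.name, TIME_FORMAT)
        except ValueError:
            continue  # not a timestamp-named directory; leave it alone
        if stamp < cutoff:
            stale.append(entry)
    return stale
```

After reviewing the returned list, each directory could be removed with `shutil.rmtree`.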

If you know a cleaner, more elegant way to keep track of experiment files, feel free to leave a comment.

About The Author

skylook

An enthusiast of augmented reality and image recognition technology.
