ops-doc

Operations handover document

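Jenkins environment migration and autodl deployment

1. Install OpenJDK 8 and Jenkins on the target machine (the .deb packages are in the "other" directory on Alibaba Cloud OSS; keep the versions identical, do not change them).

2. Pack the Jenkins data directory, move it to the target environment, and unpack it there (the ownership and permissions of the data directory must be adjusted so that Jenkins can access it).

3. Restart Jenkins, then update the SSH server IP and password in the system configuration, and update the Jenkins server URI (access address).

4. On the k8s master server, download the configmap-kpl, seetaas-deploy (can be skipped if it will not be deployed in the future), and autodl-deploy projects.

5. Create the /home/wangtingwei/workspace directory, migrate deploy.sh, and change the harbor_host address in it to the internal Harbor address of this environment.

6. Update the hosts file and the configuration files under group_vars in the autodl-deploy project.

7. For the master machine's root user and the Jenkins user, set up passwordless SSH to the master machine; also set up passwordless SSH access for the wangtingwei GitLab project directory.

8. Migrate the autodl MySQL data directory gpuhub_v21 to MySQL in the new environment.

9. Install the Go and Node.js build environments on the master machine.

10. Use Jenkins to update the configuration files, gpuhub server, and gpuhub frontend. (The first Jenkins deployment reports an error; deploy create_agent_server manually first, then run the Jenkins job again.)

11. Label the node machines that will run frp with frp=true, then go to the gpuhub server directory and deploy the frp and nginx services (if the deployment procedure is unclear, confirm with the developers).

12. Deploy ingress and configure DNS resolution to expose the services.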
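
Steps 2 and 11 above can be sketched as shell commands. The helper below is a dry run that only prints what would be executed. It assumes the Jenkins data directory is /var/lib/jenkins (the Debian package default, not stated in this document) and that it is owned by jenkins:jenkins; the archive path and the node name are placeholders.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 2 and 11 instead of running them.
# Assumptions (not from the original notes): JENKINS_HOME=/var/lib/jenkins,
# owner jenkins:jenkins, archive /tmp/jenkins-home.tar.gz, node name node-1.

print_jenkins_migration() {
    jenkins_home=$1   # data directory, same path on the old and new host
    archive=$2        # tarball that carries the data directory
    # On the old host: pack the data directory.
    echo "tar -czf $archive -C $(dirname "$jenkins_home") $(basename "$jenkins_home")"
    # On the target host: unpack, fix ownership so Jenkins can read it, restart.
    echo "tar -xzf $archive -C $(dirname "$jenkins_home")"
    echo "chown -R jenkins:jenkins $jenkins_home"
    echo "systemctl restart jenkins"
}

print_frp_label() {
    node=$1           # Kubernetes node that will run frp
    echo "kubectl label nodes $node frp=true"
}

print_jenkins_migration /var/lib/jenkins /tmp/jenkins-home.tar.gz
print_frp_label node-1
```

Printing the commands first makes it easy to review paths and node names before running anything on the real hosts.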

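Network drive

  • Architecture components

1. keepalived (failover; deployed on both master and slave)

2. sersync2 (data synchronization; sersync2 runs on the master node, the rsync service on the slave node)

3. nfs (storage server; deployed on both master and slave)

  • Service files and paths

1. keepalived configuration

  • keepalived master (/etc/keepalived/keepalived.conf)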
! Configuration File for keepalived
global_defs {
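    router_id  nfs  # the id can be any value
}
vrrp_script chk_nfs {
    script "/etc/keepalived/nfs_check.sh"    # health check script
    interval 2
    weight -20   # two keepalived nodes are deployed, so the weight is 20; with three nodes, use 30
}
vrrp_instance VI_1 {
    state BACKUP    # both hosts are set to BACKUP for non-preemptive mode
    interface bond0.282  # network interface
    virtual_router_id 51
    priority 100    # 100 on the master, 80 on the backup
    advert_int 1
    nopreempt       # required for non-preemptive mode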
    authentication {
        auth_type PASS
        auth_pass XcSqeOUjW3TpmptR
    }  
    track_script {
        chk_nfs
    }
    virtual_ipaddress {
        100.72.64.10     # virtual IP
    }
}
  • keepalived slave (/etc/keepalived/keepalived.conf)
! Configuration File for keepalived
global_defs {
    router_id nfs 
}
vrrp_script chk_nfs {
    script "/etc/keepalived/nfs_check.sh"   
    interval 2
    weight -20  
}
vrrp_instance VI_1 {
    state BACKUP   
    interface bond0.282 
    virtual_router_id 51
    priority 80  
    advert_int 1
    nopreempt      
    authentication {
        auth_type PASS
        auth_pass XcSqeOUjW3TpmptR
    }  
    track_script {
        chk_nfs
    }
    virtual_ipaddress {
        100.72.64.10
    }
}
  • nfs_check.sh (/etc/keepalived/nfs_check.sh)
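#!/bin/bash
# Health check run by keepalived: restart nfs-server if no nfsd process exists;
# if nfsd is still missing 10 seconds later, stop keepalived so the VIP fails over.
count=$(ps -C nfsd --no-header | wc -l)
if [ "$count" -eq 0 ]; then
        systemctl restart nfs-server.service
        sleep 10
        if [ "$(ps -C nfsd --no-header | wc -l)" -eq 0 ]; then
            pkill keepalived
        fi
fi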
  • keepalived service file (the default installed by apt is fine)
[Unit]
Description=Keepalive Daemon (LVS and VRRP)
After=network-online.target
Wants=network-online.target
# Only start if there is a configuration file
ConditionFileNotEmpty=/etc/keepalived/keepalived.conf

[Service]
Type=simple
# Read configuration variable file if it is present
EnvironmentFile=-/etc/default/keepalived
ExecStart=/usr/sbin/keepalived --dont-fork $DAEMON_ARGS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
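
To see which node currently holds the virtual IP from the keepalived configuration above, a check along the following lines may help. It is a minimal sketch: the VIP 100.72.64.10 and the interface bond0.282 are taken from the configs, while the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether this node currently holds the keepalived virtual IP.
# VIP and interface come from keepalived.conf above; function names are hypothetical.

# Reads `ip -o -4 addr show` style output on stdin; succeeds if the VIP is listed.
has_vip() {
    vip=$1
    # Fixed-string match on " inet <vip>/" so 100.72.64.10 does not match 100.72.64.100.
    grep -qF " inet $vip/"
}

# Prints whether the given VIP is bound to the given interface on this node.
report_role() {
    iface=$1
    vip=$2
    if ip -o -4 addr show dev "$iface" 2>/dev/null | has_vip "$vip"; then
        echo "active: $vip is bound to $iface"
    else
        echo "standby: $vip is not bound to $iface"
    fi
}

report_role bond0.282 100.72.64.10
```

Running it on both nodes before and after stopping nfs-server on the active one is a simple way to confirm that failover actually moves the address.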

2. sersync2

  • sersync2 binary (/usr/local/sersync/sersync2)
  • user.pass (/usr/local/sersync/user.pass): password file for rsync authentication; permissions 600
XcSqeOUjW3TpmptR
  • confxml.xml (/usr/local/sersync/confxml.xml; sersync2 configuration file)
<?xml version="1.0" encoding="ISO-8859-1"?>
<head version="2.5">
    <host hostip="localhost" port="8008"></host>
    <debug start="false"/>
    <fileSystem xfs="false"/>
    <filter start="false">
    <exclude expression="(.*)\.svn"></exclude>
    <exclude expression="(.*)\.gz"></exclude>
    <exclude expression="^info/*"></exclude>
    <exclude expression="^static/*"></exclude>
    </filter>
    <inotify>
    <delete start="true"/>
    <createFolder start="true"/>
    <createFile start="true"/>
    <closeWrite start="true"/>
    <moveFrom start="true"/>
    <moveTo start="true"/>
    <attrib start="true"/>
    <modify start="true"/>
    </inotify>

    <sersync>
        <localpath watch="/data">   <!-- modify as needed; the default is data -->
        <remote ip="100.72.64.12" name="nfs_rsync"/>     <!--<modify"/>-->
        <!--<remote ip="192.168.8.39" name="tongbu"/>-->
        <!--<remote ip="192.168.8.40" name="tongbu"/>-->
    </localpath>
    <rsync>
        <commonParams params="-artuz"/>
        <auth start="true" users="nfs_rsync" passwordfile="/usr/local/sersync/user.pass"/>  <!-- modify as needed; the user and the password file path are hard-coded here -->
        <userDefinedPort start="false" port="874"/><!-- port=874 -->
        <timeout start="true" time="10"/><!-- timeout=100 -->
        <ssh start="false"/>
    </rsync>
    <failLog path="/tmp/rsync_fail_log.sh" timeToExecute="60"/><!--default every 60mins execute once-->
    <crontab start="false" schedule="600"><!--600mins-->
        <crontabfilter start="false">
        <exclude expression="*.php"></exclude>
        <exclude expression="info/*"></exclude>
        </crontabfilter>
    </crontab>
    <plugin start="false" name="command"/>
    </sersync>

    <plugin name="command">
    <param prefix="/bin/sh" suffix="" ignoreError="true"/>  <!--prefix /opt/tongbu/mmm.sh suffix-->
    <filter start="false">
        <include expression="(.*)\.php"/>
        <include expression="(.*)\.sh"/>
    </filter>
    </plugin>

    <plugin name="socket">
    <localpath watch="/opt/tongbu">
        <deshost ip="192.168.138.20" port="8009"/>
    </localpath>
    </plugin>
    <plugin name="refreshCDN">
    <localpath watch="/data0/htdocs/cms.xoyo.com/site/">
        <cdninfo domainname="ccms.chinacache.com" port="80" username="xxxx" passwd="xxxx"/>
        <sendurl base="http://pic.xoyo.com/cms"/>
        <regexurl regex="false" match="cms.xoyo.com/site([/a-zA-Z0-9]*).xoyo.com/images"/>
    </localpath>
    </plugin>
</head>
  • Start-on-boot service
[Unit]
Description=sersync service
After=network.target network-online.target syslog.target
Wants=network.target network-online.target

[Service]
Type=simple
# background mode; simple is the default

ExecStart=/usr/local/sersync/sersync2 -r -o /usr/local/sersync/confxml.xml
# The IP address below is the slave's IP.
ExecStartPost=/usr/bin/ssh 100.72.64.12 "service rsync restart"
Restart=always
RuntimeMaxSec=24h
RestartSec=30
ExecStop=/usr/bin/pkill sersync2

[Install]
WantedBy=multi-user.target
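
The 600 permission required for user.pass can be verified before starting the service. The following is a minimal sketch assuming a Linux host with GNU coreutils (`stat -c %a`); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that a password file has mode 600, as required for user.pass.
# Assumes GNU coreutils `stat -c %a`; the function name is hypothetical.

check_pass_mode() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    mode=$(stat -c %a "$file")
    if [ "$mode" = "600" ]; then
        echo "ok: $file has mode 600"
    else
        echo "fix needed: $file has mode $mode (run: chmod 600 $file)"
        return 1
    fi
}

# Example call; "|| true" keeps the script going when the file is absent.
check_pass_mode /usr/local/sersync/user.pass || true
```

The same check is worth running against /etc/rsync.pass on the slave, because rsyncd's strict modes setting (on by default) rejects a secrets file that other users can read.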

3. rsync on the slave

  • rsync.pass (/etc/rsync.pass)
nfs_rsync:XcSqeOUjW3TpmptR
  • rsyncd.conf (/etc/rsyncd.conf)
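timeout = 10
pid file = /var/run/rsyncd.pid
# rsyncd.conf on the slave server
uid=root
gid=root
# maximum number of connections
max connections=36000
# default is true; set to no so that symlinks to directories and files are also backed up
use chroot=no
# log file location
log file=/var/log/rsyncd.log
# ignore irrelevant errors
ignore errors = yes
# make the rsync server side read-write
read only = no
# authentication user; unrelated to system accounts and defined in the secrets file; without this line access is anonymous
auth users = nfs_rsync
# password file, format (virtual user name:password)
secrets file = /etc/rsync.pass
# disable reverse DNS lookup
reverse lookup = no
# module name used for authentication; the client must specify it; multiple modules and paths can be defined
[nfs_rsync]
# free-form comment
comment  = nfs_rsync
# path where the synced files are stored on server B (the slave)
path=/data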
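
Before relying on sersync2, the rsync daemon on the slave can be tested by hand from the master. The sketch below only builds and prints the command; the user, module, IP, and password file are the values from confxml.xml and rsyncd.conf above, `--dry-run` is added so that nothing is transferred, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: build the rsync command for a manual master -> slave sync test.
# Values come from confxml.xml and rsyncd.conf above; the function name is hypothetical.

build_rsync_cmd() {
    src=$1        # local watched directory, e.g. /data
    user=$2       # rsync auth user
    host=$3       # slave IP
    module=$4     # rsyncd module name
    passfile=$5   # file that contains only the password
    # A trailing slash on the source syncs the directory contents, not the directory itself.
    echo "rsync -artuz --dry-run --password-file=$passfile ${src%/}/ $user@$host::$module"
}

build_rsync_cmd /data nfs_rsync 100.72.64.12 nfs_rsync /usr/local/sersync/user.pass
```

If the printed command fails with an authentication error when run, the user name, the module name, and the contents of the two password files are the first things to compare.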

4. nfs (/etc/exports)

/data \
127.0.0.1(rw,async,all_squash,no_subtree_check) \
100.72.64.21(rw,async,all_squash,no_subtree_check) \
100.72.64.22(rw,async,all_squash,no_subtree_check) \
100.72.64.23(rw,async,all_squash,no_subtree_check) \
100.72.64.24(rw,async,all_squash,no_subtree_check) \
100.72.64.25(rw,async,all_squash,no_subtree_check) \
100.72.64.26(rw,async,all_squash,no_subtree_check) \
100.72.64.27(rw,async,all_squash,no_subtree_check) \
100.72.64.28(rw,async,all_squash,no_subtree_check) \
100.72.64.29(rw,async,all_squash,no_subtree_check) \
100.72.64.30(rw,async,all_squash,no_subtree_check) \
100.72.64.31(rw,async,all_squash,no_subtree_check) \
100.72.64.32(rw,async,all_squash,no_subtree_check) \
100.72.64.150(rw,async,all_squash,no_subtree_check) \
100.72.64.151(rw,async,all_squash,no_subtree_check) \
100.72.64.152(rw,async,all_squash,no_subtree_check) \
100.72.64.153(rw,async,all_squash,no_subtree_check) \
100.72.64.154(rw,async,all_squash,no_subtree_check) \
100.72.64.155(rw,async,all_squash,no_subtree_check) \
100.72.64.156(rw,async,all_squash,no_subtree_check) \
100.72.64.157(rw,async,all_squash,no_subtree_check) \
100.72.64.158(rw,async,all_squash,no_subtree_check) \
100.72.64.159(rw,async,all_squash,no_subtree_check) \
100.72.64.165(rw,async,all_squash,no_subtree_check) \
100.72.64.186(rw,async,all_squash,no_subtree_check) \
100.72.64.187(rw,async,all_squash,no_subtree_check) \
100.72.64.196(rw,async,all_squash,no_subtree_check) \
100.72.64.197(rw,async,all_squash,no_subtree_check) \
100.72.64.200(rw,async,all_squash,no_subtree_check)
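
Because every client uses the same option string, the entry above can be generated from a list of client IPs instead of being edited by hand. This is a minimal sketch; the option string is copied from the existing entry and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: generate the /etc/exports entry from a list of client IPs.
# The option string is copied from the existing entry; the function name is hypothetical.

gen_exports() {
    path=$1; shift
    opts="rw,async,all_squash,no_subtree_check"
    printf '%s' "$path"
    for ip in "$@"; do
        # Backslash-newline continues the single logical exports line.
        printf ' \\\n%s(%s)' "$ip" "$opts"
    done
    printf '\n'
}

gen_exports /data 127.0.0.1 100.72.64.21 100.72.64.22
```

After /etc/exports is updated on both NFS servers, `exportfs -ra` re-reads the file without restarting the NFS service.
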
  • Reference documents

https://www.cnblogs.com/Csir/p/6921635.html

https://cloud.tencent.com/developer/article/1445884

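MinIO object storage

Architecture

A single-node MinIO instance is deployed on both the master and the slave, and sersync2 keeps the two in sync. keepalived cannot be used because the data center binds public IPs to MAC addresses, so after a failure the domain name must be switched manually.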
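
Since the failover is manual, a small probe can help decide where the domain should point. The sketch below assumes that the MinIO API listens on its default port 9000 and uses the `/minio/health/live` liveness endpoint; the host names in the comments are placeholders and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide where the MinIO domain should point during a manual failover.
# Assumptions: API on MinIO's default port 9000; host names are placeholders.

# Succeeds if the MinIO liveness endpoint of host $1 answers successfully.
minio_alive() {
    curl -fsS -o /dev/null --max-time 5 "http://$1:9000/minio/health/live" 2>/dev/null
}

# Prints "up" or "down" for one host.
probe() {
    if minio_alive "$1"; then echo up; else echo down; fi
}

# Pure decision helper: arguments are "up" or "down" for master and slave.
choose_target() {
    case "$1/$2" in
        up/*)    echo "master" ;;
        down/up) echo "slave" ;;
        *)       echo "none" ;;
    esac
}

# Real use would be:
#   choose_target "$(probe minio-master.example.com)" "$(probe minio-slave.example.com)"
choose_target down up   # prints: slave
```

The script only reports a recommendation; changing the DNS record stays a manual step, as described above.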

Configuration file paths and templates

  • MinIO configuration

1. Binary (/usr/local/minio)

2. MinIO configuration file (/etc/minio.conf)

MINIO_PROMETHEUS_JOB_ID="minio"
MINIO_PROMETHEUS_URL="http://prometheus.autodl.com/"
MINIO_PROMETHEUS_AUTH_TYPE="public"
MINIO_ROOT_USER="minio"
MINIO_ROOT_PASSWORD="h41rDTvC1QvdMaFg"
MINIO_VOLUMES="/data/minio"
MINIO_OPTS="--console-address ':9090'"

3. MinIO service file (/lib/systemd/system/minio.service)

[Unit]
Description=MinIO
Documentation=https://docs.min.io
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/minio

[Service]
WorkingDirectory=/usr/local/

User=root
Group=root

EnvironmentFile=/etc/minio.conf
ExecStartPre=/bin/bash -c "if [ -z \"${MINIO_VOLUMES}\" ]; then echo \"Variable MINIO_VOLUMES not set in /etc/minio.conf\"; exit 1; fi"
ExecStart=/usr/local/minio server $MINIO_OPTS $MINIO_VOLUMES

# Let systemd restart this service always
Restart=always

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65536

# Specifies the maximum number of threads this process can create
TasksMax=infinity

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=infinity
SendSIGKILL=no

[Install]
WantedBy=multi-user.target
  • sersync2 configuration (see the sersync2 part of the Network drive section)

Core deployment steps

  • Deploy the MinIO server
  • Deploy sersync2
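
Before the MinIO service is started for the first time, the environment file can be checked for the variables the unit depends on. This is a minimal sketch that mirrors the ExecStartPre test in minio.service and additionally checks the root credentials; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: pre-flight check of the MinIO environment file before starting the service.
# Mirrors the ExecStartPre test in minio.service; the function name is hypothetical.

check_minio_env() {
    envfile=$1
    [ -r "$envfile" ] || { echo "cannot read $envfile"; return 2; }
    missing=""
    for var in MINIO_VOLUMES MINIO_ROOT_USER MINIO_ROOT_PASSWORD; do
        # Accept NAME="value" or NAME=value, as long as the value is not empty.
        grep -Eq "^$var=\"?[^\"[:space:]]" "$envfile" || missing="$missing $var"
    done
    if [ -n "$missing" ]; then
        echo "not set in $envfile:$missing"
        return 1
    fi
    echo "ok: $envfile"
}

# Example call; "|| true" keeps the script going when the file is absent.
check_minio_env /etc/minio.conf || true
# Once the check passes, the service would typically be enabled with:
#   systemctl daemon-reload && systemctl enable --now minio
```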

Reference links

https://docs.min.io/docs/minio-quickstart-guide.html