kubefate训练测试

kubefate训练测试

多方连接测试

1: Party-9999和Party-10000

在A集群的Master机器操作

1
2
3
4
5
6
7
8
9
kubectl get pods -n fate-9999 | grep client

返回
client-5fcd4fdcfd-nqvrz           1/1     Running   0          2d1h

获取client容器ID
kubectl exec -it client-5fcd4fdcfd-nqvrz -n fate-9999 bash

flow test toy -gid 9999 -hid 10000

3: Party-10000和Party-9999

在C集群的Master机器操作

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
kubectl get pods -n fate-10000 | grep client

返回
client-845c4df74-xbf89            1/1     Running   0          47h

获取client容器ID

kubectl exec -it client-845c4df74-xbf89  -n fate-10001 bash

flow test toy -gid 10000 -hid 9999

成功显示结果

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
toy test job 202205090852477671460 is waiting
toy test job 202205090852477671460 is waiting
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is running
toy test job 202205090852477671460 is success
[INFO] [2022-05-09 08:52:53,088] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:96]: begin to init parameters of secure add example guest
[INFO] [2022-05-09 08:52:53,088] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:100]: begin to make guest data
[INFO] [2022-05-09 08:52:53,136] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:103]: split data into two random parts
[INFO] [2022-05-09 08:52:54,031] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:106]: share one random part data to host
[INFO] [2022-05-09 08:52:54,035] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:109]: get share of one random part data from host
[INFO] [2022-05-09 08:52:54,573] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:112]: begin to get sum of guest and host
[INFO] [2022-05-09 08:52:54,645] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:115]: receive host sum from guest
[INFO] [2022-05-09 08:52:54,721] [202205090852477671460] [1339:139984863901504] - [secure_add_guest.run] [line:122]: success to calculate secure_sum, it is 2000.0000000000002

6.9.5.2 多方训练测试

如何两两之间互通测试通过,我们就可以跑一个三方的test_hetero_lr来测试多方的任务训练。

test_hetero_lr的任务需要三方参与,Guest,Host和Arbiter。我们将Party-9999作为Guest,Party-10000作为Host和Arbiter。

1:在Party-10000上传 host 方数据集

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
kubectl get pods -n fate-10000 | grep client

返回
client-64f7959ff5-7hvt9           1/1     Running   0          2d

获取client容器ID
kubectl exec -it client-64f7959ff5-7hvt9 -n fate-10000 bash

上传数据
flow data upload -c fateflow/examples/upload/upload_host.json

2:在Party-9999上传 guest 方数据集

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
kubectl get pods -n fate-9999 | grep client

返回
client-5fcd4fdcfd-nqvrz           1/1     Running   0          2d1h

获取client容器ID
kubectl exec -it client-5fcd4fdcfd-nqvrz -n fate-9999 bash

上传数据
flow data upload -c fateflow/examples/upload/upload_guest.json

3:在Party-9999发起训练任务

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
kubectl get pods -n fate-9999 | grep client

返回
client-5fcd4fdcfd-nqvrz           1/1     Running   0          2d1h

获取client容器ID
kubectl exec -it client-5fcd4fdcfd-nqvrz -n fate-9999 bash

修改文件
cat > fateflow/examples/lr/test_hetero_lr_job_conf.json <<EOF
{
  "dsl_version": "2",
  "initiator": {
    "role": "guest",
    "party_id": 10004
  },
  "role": {
    "guest": [
      10004
    ],
    "host": [
      10001
    ],
    "arbiter": [
      10001
    ]
  },
  "job_parameters": {
    "common": {
      "task_parallelism": 2,
      "computing_partitions": 8,
      "task_cores": 4,
      "auto_retries": 1
    }
  },
  "component_parameters": {
    "common": {
      "intersection_0": {
        "intersect_method": "raw",
        "sync_intersect_ids": true,
        "only_output_key": false
      },
      "hetero_lr_0": {
        "penalty": "L2",
        "optimizer": "rmsprop",
        "alpha": 0.01,
        "max_iter": 3,
        "batch_size": 320,
        "learning_rate": 0.15,
        "init_param": {
          "init_method": "random_uniform"
        }
      }
    },
    "role": {
      "guest": {
        "0": {
          "reader_0": {
            "table": {
              "name": "breast_hetero_guest",
              "namespace": "experiment"
            }
          },
          "dataio_0": {
            "with_label": true,
            "label_name": "y",
            "label_type": "int",
            "output_format": "dense"
          }
        }
      },
      "host": {
        "0": {
          "reader_0": {
            "table": {
              "name": "breast_hetero_host",
              "namespace": "experiment"
            }
          },
          "dataio_0": {
            "with_label": false,
            "output_format": "dense"
          },
          "evaluation_0": {
            "need_run": false
          }
        }
      }
    }
  }
}
EOF

发起训练任务
flow job submit -d fateflow/examples/lr/test_hetero_lr_job_dsl.json -c fateflow/examples/lr/test_hetero_lr_job_conf.json

返回结果
{
    "data": {
        "board_url": "http://fateboard:8080/index.html#/dashboard?job_id=202208310748436341990&role=guest&party_id=10001",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202208310748436341990/job_dsl.json",
        "job_id": "202208310748436341990",
        "logs_directory": "/data/projects/fate/fateflow/logs/202208310748436341990",
        "message": "success",
        "model_info": {
            "model_id": "arbiter-9006#guest-10001#host-9006#model",
            "model_version": "202208310748436341990"
        },
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202208310748436341990/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202208310748436341990/guest/10001/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/fateflow/jobs/202208310748436341990/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202208310748436341990/train_runtime_conf.json"
    },
    "jobId": "202208310748436341990",
    "retcode": 0,
    "retmsg": "success"
}

4:命令查看任务结果

1
flow task query -r guest -j 202208310748436341990 | grep -w f_status

6.9.6 联邦推理测试

上面已经基于hetero_lr进行多方训练,训练出模型,下面基于上面的多方训练后部署模型、加载模型、绑定模型、模型推理

6.9.6.1 部署模型

进入party-9999容器内

1
2
3
4
5
6
7
kubectl get pods -n fate-9999 | grep client

返回
client-5fcd4fdcfd-nqvrz           1/1     Running   0          2d1h

获取client容器ID
kubectl exec -it client-5fcd4fdcfd-nqvrz -n fate-9999 bash

部署模型

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
flow model deploy --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202205180944078730290(任务ID)

返回
{
    "data": {
        "arbiter": {
            "9006": 0
        },
        "detail": {
            "arbiter": {
                "9006": {
                    "retcode": 0,
                    "retmsg": "deploy model of role arbiter 9006 success"
                }
            },
            "guest": {
                "10001": {
                    "retcode": 0,
                    "retmsg": "deploy model of role guest 10001 success"
                }
            },
            "host": {
                "9006": {
                    "retcode": 0,
                    "retmsg": "deploy model of role host 9006 success"
                }
            }
        },
        "guest": {
            "10001": 0
        },
        "host": {
            "9006": 0
        },
        "model_id": "arbiter-9006#guest-10001#host-9006#model",
        "model_version": "202208310755426963170"
    },
    "retcode": 0,
    "retmsg": "success"
}

获取到的信息是

1
2
        "model_id": "arbiter-9006#guest-10001#host-9006#model",
        "model_version": "202208310755426963170"

6.9.6.2 加载模型

修改publish_load_model.json如下

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
cat > fateflow/examples/model/publish_load_model.json <<EOF
{
  "initiator": {
    "party_id": "10001",
    "role": "guest"
  },
  "role": {
    "guest": [
      "10001"
    ],
    "host": [
      "10004"
    ],
    "arbiter": [
      "10004"
    ]
  },
  "job_parameters": {
    "model_id": "arbiter-10004#guest-10001#host-10004#model",
    "model_version": "202209010444329886110"
  }
}
EOF

加载模型

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
flow model load -c fateflow/examples/model/publish_load_model.json

返回信息
{
    "data": {
        "detail": {
            "guest": {
                "10001": {
                    "retcode": 0,
                    "retmsg": "success"
                }
            },
            "host": {
                "9006": {
                    "retcode": 0,
                    "retmsg": "success"
                }
            }
        },
        "guest": {
            "10001": 0
        },
        "host": {
            "9006": 0
        }
    },
    "jobId": "202208310757075993710",
    "retcode": 0,
    "retmsg": "success"
}

6.9.6.3 绑定模型

修改版定模型配置

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
cat > fateflow/examples/model/bind_model_service.json <<EOF
{
    "service_id": "201",
    "initiator": {
        "party_id": "10001",
        "role": "guest"
    },
    "role": {
        "guest": ["10001"],
        "host": ["10004"],
        "arbiter": ["10004"]
    },
    "job_parameters": {
        "work_mode": 1,
        "model_id": "arbiter-10004#guest-10001#host-10004#model",
        "model_version": "202209010444329886110"
    }
}
EOF

绑定模型

1
2
3
4
5
6
7
flow model bind -c fateflow/examples/model/bind_model_service.json

返回信息
{
    "retcode": 0,
    "retmsg": "service id is test03"
}

6.9.6.4 联邦推理

配置party-10000的ingressHost

1
2
3
4
5
6
sudo vi /etc/hosts

增加一行
10.0.1.199 9999.serving-proxy.sj.pclab

10.0.1.199 为partry-9999所在集群的Master节点IP,请根据集群IP进行修改
6.9.6.4.1 单笔推理
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
curl -X POST -H 'Content-Type: application/json' -i 'http://10004.serving-proxy.204.pclab/federation/v1/inference' --data '{
  "head": {
    "serviceId": "204"
  },
  "body": {
    "featureData": {
        "x0": 1.88669,
        "x1": -1.359293,
        "x2": 2.303601,
        "x3": 2.00137,
        "x4": 1.307686
    },
    "sendToRemoteFeatureData": {
        "phone_num": "122222222"
    }
  }
}'
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
curl -X POST -H 'Content-Type: application/json' -i 'http://10001.serving-proxy.sh.com/federation/v1/inference' --data '{
  "head": {
    "serviceId": "201"
  },
  "body": {
    "featureData": {
        "x0": 1.88669,
        "x1": -1.359293,
        "x2": 2.303601,
        "x3": 2.00137,
        "x4": 1.307686
    },
    "sendToRemoteFeatureData": {
        "phone_num": "122222222"
    }
  }
}'

返回

1
{"retcode":0,"retmsg":"","data":{"score":0.9983544928940998,"modelId":"guest#9999#arbiter-10000#guest-9999#host-10000#model","modelVersion":"202205090912483804370","timestamp":1652087899953},"flag":0}
6.9.6.4.2 批量推理
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
curl -X POST -H 'Content-Type: application/json' -i 'http://10004.serving-proxy.204.pclab/federation/v1/batchInference' --data '{
  "head": {
    "serviceId": "204"
  },
  "body": {
      "batchDataList": [
          {
              "index": 0,
              "featureData": {
                  "x0": 1.88669,
                  "x1": -1.359293,
                  "x2": 2.303601,
                  "x3": 2.00137,
                  "x4": 1.307686
              },
              "sendToRemoteFeatureData": {
                  "device_id": "aaaaa",
                  "phone_num": "122222222"
              }
          },
          {
              "index": 1,
              "featureData": {
                  "x0": 1.88669,
                  "x1": -1.359293,
                  "x2": 2.303601,
                  "x3": 2.00137,
                  "x4": 1.307686
              },
              "sendToRemoteFeatureData": {
                  "device_id": "aaaaa",
                  "phone_num": "122222222"
              }
          }
      ]
  }
}'

返回

1
{"retcode":0,"retmsg":"","data":{"modelId":"guest#9999#arbiter-10000#guest-9999#host-10000#model","modelVersion":"202205110759591171670","timestamp":1652256064349},"flag":0,"batchDataList":[{"index":0,"retcode":0,"retmsg":"","data":{"score":0.3813935132948826}},{"index":1,"retcode":0,"retmsg":"","data":{"score":0.3813935132948826}}],"singleInferenceResultMap":{"0":{"index":0,"retcode":0,"retmsg":"","data":{"score":0.3813935132948826}},"1":{"index":1,"retcode":0,"retmsg":"","data":{"score":0.3813935132948826}}}}%
Licensed under CC BY-NC-SA 4.0
最后更新于 Aug 31, 2022 00:00 UTC
comments powered by Disqus