Opsworks agent error debugging

어느날 Prometheus alert이 와서 Portus 에 문제가 생겼음을 알렸습니다.

Prerequisite

  • AWS Opsworks chef12
  • ubuntu 16.04

디버깅

Disk Full

확인해보니 문제는

no space ….

이런 에러…

$ df -h 

로 보니 root가 disk full 입니다.

확인해보니 portus의 nginx 로그가 2GB 씩 stdout, stderr 두개….

로그 지우고 nginx 재시작으로 임시 해결..

AWS opsworks 의 agent 가 업데이트 될때 마침 디스크가 꽉 차있었나 봅니다…

용량 확보 후 재시작…

opsworks-agent-cli를 수동으로 실행해서 configure를 다시 실행시키는 것을 시도 합니다….

root@portus2:/var/lib/aws/opsworks/chef# opsworks-agent-cli run_command configure

결과는 에러…

Couldn't execute run_command: RuntimeError - Couldn't gather commands. RuntimeError - Could not parse /var/lib/aws/opsworks/chef/2018-08-28-10-28-23-01.json: JSON::ParserError - A JSON text must at least contain two octets! - /opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `initialize'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `new'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
/opt/aws/opsworks/current/lib/cli/base.rb:69:in `parse_json'
/opt/aws/opsworks/current/lib/cli/base.rb:17:in `block in gather_commands'
/opt/aws/opsworks/current/lib/cli/base.rb:16:in `each'
/opt/aws/opsworks/current/lib/cli/base.rb:16:in `gather_commands'
/opt/aws/opsworks/current/lib/cli/runner.rb:72:in `initialize'
/opt/aws/opsworks/current/lib/cli/runner.rb:50:in `new'
/opt/aws/opsworks/current/lib/cli/runner.rb:50:in `run'
/usr/sbin/opsworks-agent-cli:91:in `block (2 levels) in <main>'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/command_support.rb:126:in `call'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/command_support.rb:126:in `execute'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:296:in `block in call_command'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:309:in `call'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:309:in `call_command'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:83:in `run'
/usr/sbin/opsworks-agent-cli:110:in `<main>' - /opt/aws/opsworks/current/lib/cli/base.rb:71:in `rescue in parse_json'
/opt/aws/opsworks/current/lib/cli/base.rb:69:in `parse_json'
/opt/aws/opsworks/current/lib/cli/base.rb:17:in `block in gather_commands'
/opt/aws/opsworks/current/lib/cli/base.rb:16:in `each'
/opt/aws/opsworks/current/lib/cli/base.rb:16:in `gather_commands'
/opt/aws/opsworks/current/lib/cli/runner.rb:72:in `initialize'
/opt/aws/opsworks/current/lib/cli/runner.rb:50:in `new'
/opt/aws/opsworks/current/lib/cli/runner.rb:50:in `run'
/usr/sbin/opsworks-agent-cli:91:in `block (2 levels) in <main>'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/command_support.rb:126:in `call'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/command_support.rb:126:in `execute'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:296:in `block in call_command'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:309:in `call'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:309:in `call_command'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:83:in `run'
/usr/sbin/opsworks-agent-cli:110:in `<main>' - /opt/aws/opsworks/current/lib/cli/base.rb:33:in `rescue in gather_commands'
/opt/aws/opsworks/current/lib/cli/base.rb:15:in `gather_commands'
/opt/aws/opsworks/current/lib/cli/runner.rb:72:in `initialize'
/opt/aws/opsworks/current/lib/cli/runner.rb:50:in `new'
/opt/aws/opsworks/current/lib/cli/runner.rb:50:in `run'
/usr/sbin/opsworks-agent-cli:91:in `block (2 levels) in <main>'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/command_support.rb:126:in `call'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/command_support.rb:126:in `execute'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:296:in `block in call_command'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:309:in `call'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:309:in `call_command'
/opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/gli-2.13.4/lib/gli/app_support.rb:83:in `run'
/usr/sbin/opsworks-agent-cli:110:in `<main>'

File check

파싱이 안된다고 합니다. 파일을 오픈 합니다.

$ vi /var/lib/aws/opsworks/chef/2018-08-28-10-28-23-01.json

파일이 비어있습니다… 이상합니다…

JSON 파일은 opsworks 의 리소스들의 정보가 모여있는 파일입니다. 파일 내용이 있어야하는데…. 아마도 용량문제로 파일을 쓰지 못한 듯 합니다.

/var/lib/aws/opsworks/chef 이 폴더로 들어가서 뭐하는 폴더인지 확인합니다.

Directory check

$ ll
root@portus2:/var/lib/aws/opsworks/chef# ll
total 2168
drwxr-xr-x 2 aws  aws    4096 Aug 29 11:28 ./
drwxr-x--- 6 aws  aws    4096 Aug 29 11:21 ../
-rw-r--r-- 1 aws  aws       0 May 10 03:44 2018-05-08-10-10-48-01.log.gz
-rw-r--r-- 1 aws  aws  102226 Jul 17 09:55 2018-07-17-09-55-08-01.log
-rw-r--r-- 1 aws  aws       0 Aug 28 10:24 2018-08-28-10-24-52-01.json
-rw-r--r-- 1 aws  aws       0 Aug 28 10:28 2018-08-28-10-28-23-01.json
-rw-r--r-- 1 aws  aws       0 Aug 28 16:20 2018-08-28-16-20-30-01.json
-rw-r--r-- 1 aws  aws  157625 Aug 28 16:31 2018-08-28-16-31-31-01.json
-rw-r--r-- 1 root root 105741 Aug 28 16:31 2018-08-28-16-31-31-01.log
-rw-r--r-- 1 aws  aws  157559 Aug 29 05:55. 2018-08-29-05-55-37-01.json
-rw-r--r-- 1 root root 102946 Aug 29 05:55 2018-08-29-05-55-37-01.log
-rw-r--r-- 1 aws  aws  157560 Aug 29 06:14 2018-08-29-06-14-17-01.json
-rw-r--r-- 1 root root 103341 Aug 29 06:14 2018-08-29-06-14-17-01.log
-rw-r--r-- 1 aws  aws  149311 Aug 29 11:04 2018-08-29-11-04-15-01.json
-rw-r--r-- 1 root root  98073 Aug 29 11:04 2018-08-29-11-04-15-01.log
-rw-r--r-- 1 aws  aws  157559 Aug 29 11:07 2018-08-29-11-07-52-01.json
-rw-r--r-- 1 root root 103340 Aug 29 11:07 2018-08-29-11-07-52-01.log
-rw-r--r-- 1 aws  aws  157558 Aug 29 11:11 2018-08-29-11-11-30-01.json
-rw-r--r-- 1 root root 103651 Aug 29 11:11 2018-08-29-11-11-30-01.log
-rw-r--r-- 1 aws  aws  157656 Aug 29 11:21 2018-08-29-11-21-38-01.json
-rw-r--r-- 1 root root 103648 Aug 29 11:21 2018-08-29-11-21-38-01.log
-rw-r--r-- 1 aws  aws  157670 Aug 29 11:24 2018-08-29-11-24-45-01.json
-rw-r--r-- 1 root root 100655 Aug 29 11:24 2018-08-29-11-24-45-01.log

빈 파일이 몇개 있습니다..

Removing empty files

뭔지 모르겠지만 비어서 parsing error 가 나는 듯 합니다.
과감히 빈 파일을 지워줍니다.

$> rm 2018-08-28-10-24-52-01.json

그리고 다시 위의 명령을 실행합니다.

뭔가 바뀔까요?

root@portus2:/var/lib/aws/opsworks/chef# opsworks-agent-cli run_command configure
Couldn't execute run_command: RuntimeError - Couldn't gather commands. RuntimeError - Could not parse /var/lib/aws/opsworks/chef/2018-08-28-10-28-23-01.json: JSON::ParserError - A JSON text must at least contain two octets! - /opt/aws/opsworks/current/vendor/bundle/ruby/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `initialize'
...

그 뒤의 파일을 읽기 시도하다가 에러가 다시 발생했습니다.
이제 이상한 json 파일들을 전부 지워줍시다.

그리고 다시 agent 를 실행하면 정상동작하는 것을 볼 수 있습니다.

[2018-08-29T11:50:10+00:00] INFO: Processing directory[/etc/logrotate.d] action create (/var/chef/runs/5f5d34e2-e02f-4871-98b9-c842712610de/local-mode-cache/cache/cookbooks/logrotate/resources/app.rb line 60)
[2018-08-29T11:50:10+00:00] INFO: Processing template[/etc/logrotate.d/nginx] action create (/var/chef/runs/5f5d34e2-e02f-4871-98b9-c842712610de/local-mode-cache/cache/cookbooks/logrotate/resources/app.rb line 67)
[2018-08-29T11:50:10+00:00] INFO: HTTP Request Returned 404 Not Found: Object not found: chefzero://localhost:8889/nodes/portus2
[2018-08-29T11:50:10+00:00] INFO: Chef Run complete in 0.286619869 seconds
[2018-08-29T11:50:10+00:00] INFO: Running report handlers
[2018-08-29T11:50:10+00:00] INFO: Report handlers complete
[2018-08-29 11:50:10]  INFO [opsworks-agent(16252)]: Finished Chef run with exitcode 0

chef12 opsworks 실행

Chef-client 를 수동으로 실행하려면 아래와 같은 명령을 사용하면 가능합니다.

opsworks-agent-cli 실행 시 확인 가능

  1. 첫번째 chef-client 실행
# RUBYOPT="-E utf-8" /opt/aws/opsworks/current/bin/chef-client -j /var/lib/aws/opsworks/chef/2018-08-30-05-10-55-01.json -c /var/lib/aws/opsworks/client.internal.rb -o aws_opsworks_agent 2>&1
  1. 두번째 chef-client 실행
# RUBYOPT="-E utf-8" /opt/chef/bin/chef-client -j /var/chef/runs/fd1800b7-5b1e-4ec8-b7dd-3579f6c568c3/attribs.json -c /var/chef/runs/fd1800b7-5b1e-4ec8-b7dd-3579f6c568c3/client.rb  2>&1

위의 attribs.json 파일을 살펴보니…

{
  "region_project": "TEST",
  "region": "TEST01",
  ...
  ...
  "run_list": [
    "recipe[logrotate::nginx]"
  ]
}

run_list 로 제가 OpsWorks Stack 에서 execute_recipes 한 내용이 기록되어 있습니다.
logrotate 레시피를 추가해줬죠…

attribs.json 파일에 현재 실행할 명령들이 들어가는 구조로 보입니다.
필요하다면 run_list 에 명령을 변경해서 실행한다면 가능할 듯….

이상 opsworks-agent-cli 관련 디버깅 경험 이슈였습니다.