Troubleshoot a deployment¶
- Technical Architecture
- Prepare yourself with the survival kit
- No package matching python-pip
- Failure talking to yum
- nc: command not found
- TASK [Gathering Facts]
- TASK [survivalkit: Install packages]
- TASK [openio-sds: OpenIO SDS: Set sysctl parameters]
- TASK [openio-sds: Install OpenIO puppet module]
- TASK [repositories: Configure repositories for Ubuntu xenial]
- TASK [survivalkit: Include Ubuntu variables]
Below are the results of the checks we ran when deploying OpenIO SDS on CentOS 7 and Ubuntu 16.04 clusters with the latest release pulled by Cédric. The overall aim is to check compatibility between the old playbook and the latest roles.
Technical Architecture¶
Each platform will be composed of the following servers:
- 3 x nodes with CentOS 7 (1CPU, 1GB RAM, 1Gb/s)
- 3 x nodes with Ubuntu 16.04 (1CPU, 1GB RAM, 1Gb/s)
- 1 x admin node with CentOS 7 (1CPU, 1GB RAM)
Prepare yourself with the survival kit¶
As a first step, only one OpenIO SDS cluster playbook is checked, along with the new install_basic_needs role. After discussion with Cédric, we decided that the install_basic_needs role should be deployed first.
Execute this playbook with the following command line:
# ansible-playbook install_basic_needs.yml
This playbook runs the repositories, ntp, and survivalkit roles.
No package matching python-pip¶
Global Task Status for CentOS Troubleshooting:
failed: [192.168.1.105] (item=python-pip) => {"changed": false, "msg": "No package matching 'python-pip' found available, installed or updated", "pkg": "python-pip", "rc": 126, "results": ["No package matching 'python-pip' found available, installed or updated"]}
failed: [192.168.1.106] (item=htop) => {"changed": false, "msg": "No package matching 'htop' found available, installed or updated", "pkg": "htop", "rc": 126, "results": ["No package matching 'htop' found available, installed or updated"]}
failed: [192.168.1.103] (item=iftop) => {"changed": false, "msg": "No package matching 'iftop' found available, installed or updated", "pkg": "iftop", "rc": 126, "results": ["No package matching 'iftop' found available, installed or updated"]}
Cause: These packages come from the EPEL repository, which is not installed on every node.
Resolving the problem:
- Add epel-release to the top of the packages list, because the iftop, atop and python2-pip packages are provided by this repository.
- Patch(es) to apply to resolve the above cases:
--- /root/customer-centos/ansible/deployment/roles/survivalkit/vars/RedHat.yml.bak 2018-06-26 14:19:08.739985248 +0000
+++ /root/customer-centos/ansible/deployment/roles/survivalkit/vars/RedHat.yml 2018-06-26 14:19:51.800607516 +0000
@@ -2,6 +2,7 @@
 # Distribution-specific variables for RHEL, CentOS, ...
 ---
 survivalkit_packages:
+  - epel-release
   - atop
   - bash-completion
   - bind-utils
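The ordering fix made by the patch can be sketched in Python. This is only an illustration and not part of the role: the helper name `with_epel_first` is hypothetical, and the package list is a short sample taken from the patch context.

```python
def with_epel_first(packages, repo_package="epel-release"):
    """Return a copy of `packages` with `repo_package` as the first entry.

    Any existing occurrence is removed first, so the result has no duplicate.
    """
    rest = [name for name in packages if name != repo_package]
    return [repo_package] + rest


if __name__ == "__main__":
    # Sample list only; the real list lives in roles/survivalkit/vars/RedHat.yml.
    print(with_epel_first(["atop", "bash-completion", "bind-utils"]))
```

The point of the ordering is that the repository package is handled before the packages that depend on that repository.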
Failure talking to yum¶
failed: [192.168.1.106] (item=nc) => {"changed": false, "msg": "Failure talking to yum: failure: repodata/repomd.xml from centos-qemu-ev: [Errno 256] No more mirrors to try.\nhttp://mirror.centos.org/altarch/7/virt/x86_64/kvm-common/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found", "pkg": "nc"}
failed: [192.168.1.103] (item=nc) => {"changed": false, "msg": "Failure talking to yum: failure: repodata/repomd.xml from centos-qemu-ev: [Errno 256] No more mirrors to try.\nhttp://mirror.centos.org/altarch/7/virt/x86_64/kvm-common/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found", "pkg": "nc"}
Cause: One of the configured repositories failed (CentOS-7 - QEMU EV).
Resolving the problem: One of the workarounds listed in the yum message below is needed to resolve these errors.
Patch(es) to apply to resolve the above cases:
One of the configured repositories failed (CentOS-7 - QEMU EV), and yum doesn’t have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work “fix” this:
- Contact the upstream for the repository and get them to fix the problem.
- Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is most often useful if you are using a newer distribution release than is supported by the repository (and the packages for the previous distribution release still work).
- Run the command with the repository temporarily disabled: yum --disablerepo=centos-qemu-ev ...
- Disable the repository permanently, so yum won’t use it by default. Yum will then just ignore the repository until you permanently enable it again or use --enablerepo for temporary usage: yum-config-manager --disable centos-qemu-ev or subscription-manager repos --disable=centos-qemu-ev
- Configure the failing repository to be skipped, if it is unavailable. Note that yum will try to contact the repo. when it runs most commands, so will have to try and fail each time (and thus. yum will be be much slower). If it is a very temporary problem though, this is often a nice compromise:
$ yum-config-manager --save --setopt=centos-qemu-ev.skip_if_unavailable=true
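For readers who want to see what this option amounts to in the repository file, here is a minimal Python sketch. It assumes the `.repo` file is plain INI text with a `[centos-qemu-ev]` section; the function name is hypothetical, and the sketch works on a string rather than on a real file under /etc/yum.repos.d. Note that `configparser` drops comments when it rewrites the text, so `yum-config-manager` remains the safer tool on a real system.

```python
import configparser
import io


def set_skip_if_unavailable(repo_text, repo_id="centos-qemu-ev"):
    """Return `repo_text` with skip_if_unavailable=true set for `repo_id`."""
    # interpolation=None leaves yum variables such as $releasever untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    if not parser.has_section(repo_id):
        raise KeyError("repository %r not found" % repo_id)
    parser.set(repo_id, "skip_if_unavailable", "true")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```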
nc: command not found¶
The "Wait for Zookeeper to be online" task fails because netcat must be installed before it runs.
TASK (failed: [10.0.0.14] (item=10.0.0.14) => {"attempts": 10, "changed": true, "cmd": "echo ruok | nc 10.0.0.14 6005", "delta": "0:00:00.004663", "end": "2018-06-11 15:26:20.575484", "item": "10.0.0.14", "msg": "non-zero return code", "rc": 127, "start": "2018-06-11 15:26:20.570821", "stderr": "/bin/sh: nc : commande introuvable", "stderr_lines": ["/bin/sh: nc : commande introuvable"], "stdout": "", "stdout_lines": []})
Cause: Netcat is not installed
Resolving the problem:
- Install netcat (nc) before this task runs.
- Then wait for Zookeeper to be online.
- Patch(es) to apply to resolve the above cases.
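The failing command, `echo ruok | nc 10.0.0.14 6005`, can also be reproduced without netcat. The following is a minimal Python sketch, not the playbook's own check: both function names are hypothetical, and it assumes the server accepts the `ruok` command and answers `imok` as in the task above.

```python
import shutil
import socket


def netcat_available():
    """Return True if an `nc` executable is found on PATH."""
    return shutil.which("nc") is not None


def zookeeper_ruok(host, port, timeout=2.0):
    """Send `ruok` to host:port and return True if the reply is `imok`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # signal that nothing more will be sent
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Connection refused, timeout, unreachable host, and similar errors.
        return False
    return b"".join(chunks).strip() == b"imok"
```

For example, `zookeeper_ruok("10.0.0.14", 6005)` mirrors the check in the failed task, and `netcat_available()` tells whether the original command could run at all on the current machine.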
TASK [Gathering Facts]¶
FAILED! => {"changed": false, "module_stderr": "Shared connection to xx.xx.xx.xx closed.\r\n", "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", "msg": "MODULE FAILURE", "rc": 0}
Cause: Python is not installed on remote node.
Resolving the problem:
- Install Python manually on each node with the following command line: # apt-get -y install python
- Patch(es) to apply to resolve the above cases.
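When scanning many hosts, this failure can be recognised from the fields of the result shown above. The following Python sketch is an illustration only: the function name is hypothetical, and the matching rule is based solely on the single failure reported in this section.

```python
def is_missing_python(result):
    """Return True if an Ansible result dict looks like 'no Python on target'."""
    stdout = result.get("module_stdout", "")
    return result.get("msg") == "MODULE FAILURE" and "python: not found" in stdout


if __name__ == "__main__":
    sample = {
        "changed": False,
        "module_stderr": "Shared connection to xx.xx.xx.xx closed.\r\n",
        "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n",
        "msg": "MODULE FAILURE",
        "rc": 0,
    }
    print(is_missing_python(sample))
```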
TASK [survivalkit: Install packages]¶
failed: [xx.xx.xx.xx] (item=tdpdump) => {"changed": false, "msg": "No package matching 'tdpdump' is available", "pkg": "tdpdump"}
Cause: Typo in the package name.
Resolving the problem:
- Replace tdpdump with tcpdump.
- Patch(es) to apply to resolve the above cases:
--- roles/survivalkit/vars/Debian.yml.back 2018-06-26 14:06:57.811271968 +0000
+++ roles/survivalkit/vars/Debian.yml 2018-06-26 14:07:19.791084043 +0000
@@ -25,7 +25,7 @@
   - strace
   - sysstat
   - tar
-  - tdpdump
+  - tcpdump
   - telnet
   - tmux
   - vim
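A misspelt package name like this one can be caught before a deployment by comparing each entry with a list of known names. The sketch below uses Python's standard `difflib` module; the function name and the 0.8 similarity cutoff are assumptions made for illustration, and the known-package list is just the patch context.

```python
import difflib


def suggest_package(name, known_packages):
    """Return the closest known package name, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, known_packages, n=1, cutoff=0.8)
    return matches[0] if matches else None


if __name__ == "__main__":
    known = ["strace", "sysstat", "tar", "tcpdump", "telnet", "tmux", "vim"]
    print(suggest_package("tdpdump", known))
```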
TASK [openio-sds: OpenIO SDS: Set sysctl parameters]¶
Buffer sizes and other kernel variables are not set automatically by the OS, so TCP buffer errors may occur.
Cause: These variables are set by OpenIO, not by the OS.
Resolving the problem:
- The kernel variables should be set.
- Patch(es) to apply to resolve the above cases.
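Before changing anything, it can help to look at the values the kernel currently uses. The source does not name the exact parameters, so the Python sketch below assumes the standard Linux TCP buffer settings `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem`, each of which holds three numbers (minimum, default, maximum). The function names are hypothetical.

```python
from pathlib import Path


def parse_triplet(text):
    """Parse a 'min default max' sysctl value into a tuple of three ints."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError("expected three fields, got %r" % text)
    return tuple(int(p) for p in parts)


def read_tcp_buffers(proc_root="/proc/sys"):
    """Read net.ipv4.tcp_rmem and net.ipv4.tcp_wmem from a Linux /proc tree."""
    base = Path(proc_root) / "net" / "ipv4"
    return {
        name: parse_triplet((base / name).read_text())
        for name in ("tcp_rmem", "tcp_wmem")
    }
```

Calling `read_tcp_buffers()` on a node before and after the role runs shows whether the task actually changed these settings.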
TASK [openio-sds: Install OpenIO puppet module]¶
fatal: [192.168.1.138]: FAILED! => {"changed": false, "msg": "No package matching 'puppet-module-openio-openiosds' is available"}
Cause: Repositories are not deployed.
Resolving the problem:
- Use the install_basic_needs.yml playbook and define openio_sds_version.
- Patch(es) to apply to resolve the above cases.
TASK [repositories: Configure repositories for Ubuntu xenial]¶
fatal: [192.168.1.116]: FAILED! => {"msg": "{u'sds': {u'release': u'{{ openio_sds_release }}'}}: 'openio_sds_release' is undefined"}
Cause: The openio_sds_release variable is not set correctly. This might happen, for example, when we deploy a {RELEASE} release instead of a {RELEASE} sys release.
Resolving the problem:
- Remove the wrong repository.
- Patch(es) to apply to resolve the above cases.
- Add a release check (wget or curl) before deploying the repository configuration on each node.
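The release check suggested in the last item could look like the following Python sketch, which does the same job as a `wget` or `curl` probe. It is only an illustration: the function name is hypothetical, and since the source does not give the layout of the repository URLs, the caller has to pass the full URL to test.

```python
import urllib.error
import urllib.request


def release_url_exists(url, timeout=5.0):
    """Return True if a HEAD request to `url` succeeds, False otherwise."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 404 as well as DNS and connection failures.
        return False
```

A deployment script would call this once per repository URL and stop before writing any repository configuration if it returns False.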
TASK [survivalkit: Include Ubuntu variables]¶
fatal: [192.168.1.116]: FAILED! => {"msg": "The conditional check 'install_survival_kit' failed. The error was: error while evaluating conditional (install_survival_kit): 'install_survival_kit' is undefined\n\nThe error appears to have been in '/root/customer-ubuntu/ansible/deployment/roles/survivalkit/tasks/main.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: \"Include {{ ansible_distribution }} variables\"\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items: \n - \"{{ foo }}\"\n"}
Cause: The install_survival_kit variable is undefined.
Resolving the problem:
- As a reminder, this list of installed packages is slated to disappear, so we disable it.
- Patch(es) to apply to resolve the above cases.