Handle Bash config file variables like a pro

A good practice when writing large Bash scripts is to separate the config data from the actual processing logic: hence the humble config file.

The Config File

After playing around with various config file types (INI, JSON, YAML, TXT), I found the YAML config file to be the most flexible, due to its support for comments and nested structures. The only downside of YAML is its strict 2-space indentation requirement, meaning that you need to run 'yamllint' or some other linter to make sure the YAML has no errors, otherwise you will get runtime errors.

Another thing to consider when writing Bash (or any shell) is that shells do not handle nested data structures well, hence the need for higher-level languages like Python or Ruby to parse YAML or JSON files.

If you want to keep things relatively sane, all-Bash and dependency-free, then here is a good method to handle your config parameters.

Let's use this sample config file, 'config.yaml':

---
#Company ABC Config
company:
  address: 123 new city
  datacenters:
    CA:
      North: 'San Francisco'
      South: 'San Diego'
      East:
    OH:
      West: 'Cincinnati'
      East: 'Cleveland'
  phone:
    # memo: need to change US number to new area code
    US: '1-800-222-3333'
    EU: '2-1234-3433-33444'

This looks painful but it's actually very simple: the data is arranged logically, with 'company' being the top node, 'address', 'datacenters' and 'phone' being the 3 subnodes, and each subnode having its own subnodes, all indented by 2 spaces per level. You can also add comments to the config, unlike JSON.

Parsing YAML data with pure Bash

To get all these values as parameters into Bash, without using Python or Ruby and without explicitly declaring each parameter, use the following Bash function:

function parse_yaml {
  local prefix=$2
  local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
  sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
      -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 |
  awk -F$fs '{
    indent = length($1)/2;
    vname[indent] = $2;
    for (i in vname) {if (i > indent) {delete vname[i]}}
    if (length($3) > 0) {
      vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
      printf("%s%s%s=\"%s\"\n", "'$prefix'", vn, $2, $3);
    }
  }'
}

Add this function to your Bash script.

To get all your parameters from config.yaml in one shot, add this line to the script (after the function declaration):

eval $(parse_yaml config.yaml)

This will read your config file and add each data node as a parameter to your script. For example, you will end up with the following parameters (an underscore separates each YAML level):

echo $company_address  # outputs "123 new city"
echo $company_phone_EU # outputs "2-1234-3433-33444"
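
The whole flow can be exercised end-to-end, including the optional prefix argument (the function's second parameter). A self-contained sketch; the temp file and its values are just for illustration:

```shell
#!/usr/bin/env bash
# Same parse_yaml function as above
function parse_yaml {
  local prefix=$2
  local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
  sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
      -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
  awk -F"$fs" '{
    indent = length($1)/2;
    vname[indent] = $2;
    for (i in vname) {if (i > indent) {delete vname[i]}}
    if (length($3) > 0) {
      vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
      printf("%s%s%s=\"%s\"\n", "'"$prefix"'", vn, $2, $3);
    }
  }'
}

# Write a tiny config to a temp file. Note: the sed pass only strips
# double quotes from values, so double-quote anything you want unquoted.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
company:
  address: 123 new city
  phone:
    US: "1-800-222-3333"
EOF

# Second argument is the optional variable-name prefix
eval "$(parse_yaml "$tmp" "conf_")"
echo "$conf_company_address"
echo "$conf_company_phone_US"
rm -f "$tmp"
```

Passing a prefix like "conf_" keeps the generated variables from colliding with anything already defined in your script.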

The beauty of this approach is that all your parameters are inside your Bash script without explicit individual declarations or messy iteration loops. You now have all your variables and are ready to work with them.
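
To see every variable the eval created, Bash's compgen builtin can list variable names by prefix. A small sketch with hypothetical values standing in for the parse_yaml output:

```shell
#!/usr/bin/env bash
# Hypothetical variables, as parse_yaml would create them
company_address="123 new city"
company_phone_US="1-800-222-3333"
company_phone_EU="2-1234-3433-33444"

# compgen -v prints every shell variable whose name starts with the prefix
compgen -v company_
```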

The only caveat with the "parse_yaml" function is that it can't handle YAML lists, for example:

company:
  employees:
    - joe
    - mary
    - bob

This is a limitation of Bash, and a workaround is to put all your list items into a string like this:

company:
  employees: 'joe, mary, bob'
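
Once that comma-separated string is in a variable, it can be split back into a real Bash array with `read` and a custom IFS. A sketch, assuming the value has already landed in a plain `company_employees` variable:

```shell
#!/usr/bin/env bash
# Assumed: the employees value has been read into this variable
company_employees="joe, mary, bob"

# Split on commas (and the surrounding spaces) into an array
IFS=', ' read -r -a employees <<< "$company_employees"

echo "${#employees[@]}"   # number of employees
echo "${employees[1]}"    # second employee
```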

OK, now we are ready...

But wait... what if the data is missing, or someone removed a value from config.yaml? Here's a simple method to check for null or empty values.

Verify your Parameter Values

Add a new function to your Bash script to verify each param:

function verify_param() {
  [[ -z "${!1}" ]] && { echo "$1 value is not present in config.yaml, exiting."; exit 1; }
}

To verify a param:

verify_param company_datacenters_CA_East

Since this param is null in our config file, it will give you an error when you run your script:

"company_datacenters_CA_East value is not present in config.yaml, exiting."
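
If you have many required params, it can be friendlier to report every missing one before aborting, instead of exiting on the first. A sketch using the same indirect-expansion trick (the variable values here are illustrative):

```shell
#!/usr/bin/env bash
# Check a whole list of required params in one pass
company_address="123 new city"
company_phone_US="1-800-222-3333"
# company_datacenters_CA_East intentionally left unset

missing=0
for p in company_address company_phone_US company_datacenters_CA_East; do
  if [[ -z "${!p}" ]]; then
    echo "$p value is not present in config.yaml"
    missing=1
  fi
done

# a real script would 'exit 1' here when missing=1
echo "missing=$missing"
```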

That's it! You can now handle all your params with 2 simple functions; there is no need to initialize each parameter individually, and checking for null values is a breeze.


How to parse a nested YAML config file in Python and Ruby

If you have a complex config schema, you may need to store it in a YAML or JSON format.

Having been used to .INI-style configs, I recently had to store nested values, and INI style gets very complex, very fast.

For instance in YAML:

---
person:
  Joe:
    age: 32
    children:
      - Katie
      - Frank

  Bob:
    age: 43
    children:
      - Lisa

To get the names of Joe's children, the lookup in JSON or YAML would look something like this:

data['person']['Joe']['children']
['Katie','Frank']

In INI, this would be something like:

[person]
[person/joe]
age=32
children=Katie,Frank

This is really ugly and not nested visually. To get an individual child's name, you would need to additionally parse a comma-separated string. Fugly.

Much better to use YAML. I prefer YAML over JSON because it's much easier for humans to read, although the parser converts it all into native data structures at run-time anyway.

Here's a Python and a Ruby example of how to parse this sample config file.

config.yaml

---
scanners:
  hostname:
    web01.nyc.mycorp.com:
      port: 9900
      scans:
        - "cisco scan"
        - "network sec scan"
        - "windows sec scan"
    web05.tex.mycorp.com:
      port: 9923
      scans:
        - "tex network"
        - "infra scan"

 

The Python and Ruby parser scripts are structurally very similar.

Python

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import yaml

def get_config(*args):
    with open('config.yaml', 'r') as f:
        conf = yaml.safe_load(f)

    # get section
    section = args[0]

    # check if Config file has Section
    if section not in conf:
        print("key missing")
        quit()

    # get values
    argList = list(args)  # convert tuple to list
    argList.pop(0)        # remove Section from list

    # create lookup path
    parsepath = "conf['" + section + "']"

    for arg in argList:
        parsepath = parsepath + "['" + arg + "']"
    return eval(parsepath)

scans = get_config('scanners','hostname','web01.nyc.mycorp.com','scans')
print(scans)
['cisco scan', 'network sec scan', 'windows sec scan']

or if you want a list of all hostnames,

scans = list(get_config('scanners','hostname'))
['web01.nyc.mycorp.com', 'web05.tex.mycorp.com']

 

Here's a simple one-liner if you want to bypass error checking:

for key, value in yaml.safe_load(open('config.yaml'))['scanners']['hostname'].items():
    print(key, value)
web01.nyc.mycorp.com {'port': 9900, 'scans': ['cisco scan', 'network sec scan', 'windows sec scan']}
web05.tex.mycorp.com {'port': 9923, 'scans': ['tex network', 'infra scan']}

Ruby

#!/usr/bin/env ruby
require 'yaml'

# get config file value
def get_config(*args)
    conf = YAML.load_file('config.yaml')

    # get section
    section = args[0]

    # check if Config file has Section
    if not conf.has_key?(section)
        puts "Config file does not have section information for #{section}"
        exit(1)
    end

    # remove 1st element, "section"
    args.shift
    parsepath = "conf[\"#{section}\"]"
    args.each do |arg|
        parsepath = parsepath + "[\"#{arg}\"]"
    end

    # Handle errors
    begin
        return eval(parsepath)
    rescue
        puts "Config file does not have information for #{parsepath}"
        exit(1)
    end
end

scans = get_config('scanners','hostname','web05.tex.mycorp.com','scans')
p scans
["tex network", "infra scan"]

My Big-Ass, verbose and overly complicated Puppet, SaltStack, Ansible review


What's all this about?

I've been reading various reviews of Config Management tools, especially of the big 4 (Puppet, Chef, SaltStack, Ansible), but haven't seen anything concrete and technical with examples. This article is a "civilian"-level review of 3 of these tools with some helpful code examples and pros/cons of each. I tried not to go too deep into the weeds, but there are weeds here everywhere. Enjoy.

If you see any technical inaccuracies, feel free to flame me and comment, and I'll update the article. Also, my Ansible exposure is limited, so any Ansible people, please chime in with information. Thanks.

My Background

I've been working as an IT automation consultant with a proprietary automation tool (cannot give the name here) for over 6 years, but about a year and a half ago I hit a major limit with this application when it comes to things like version control, the DevOps lifecycle and continuous deployments, and the ability to integrate with cloud providers and basic APIs and web services.

Another major problem was scaling it to handle thousands of virtualized servers; scaling was very painful and involved constant JVM and OS worker-thread tuning. It also required a very complex setup to do basic remote execution, with a cumbersome authentication mechanism. Additionally, the price of this application was astronomical, so I looked at what other solutions were out there.

My needs are simple and probably the same as those of other admins:

  • configure OS and application layouts
  • maintain configuration code in a clear manner with separation of environments (dev, stage, prod)
  • ability to maintain a form of RBAC (role-based access control) to this application
  • deploy software packages to my managed nodes
  • develop some form of orchestration between managed objects
  • ensure my nodes are always up-to-date on releases and patches

There are many other typical sysadmin areas I could list, but I will keep this article short (ha ha!) and to the point. My review focuses on 3 of the 4 most popular config mgmt tools (Puppet, Ansible, SaltStack, Chef). The following is my experience level with each application.

Puppet

Knowledgeable. I have been working with Puppet for at least 12 months and have developed my own modules that are up on Puppet Forge.

SaltStack

Knowledgeable but not an expert. I have worked with Salt in my personal dev environment for about 12-15 months, mainly to configure my own laptop and a few cloud VMs, nothing major. I have experience with basic configuration tasks but have not dived into advanced stuff like Beacons and RAET.

Ansible

Low level of knowledge. I installed and played with Ansible Tower on Vagrant and ran a few playbooks on test nodes.

Chef

No knowledge. I haven't had time to look at this tool, so I cannot rate it in this review 😦


Versions Tested

Versions of each product used for this article:

Puppet: PE 2016.2 (Puppet 4.2)
SaltStack: 2015.5.3
Ansible: 2.1.0


Each application in 60 seconds

Some quick background info on each product

Puppet

  • Ruby and Java based configuration management tool that uses an agent installed on each managed system to process a compiled catalog
  • The idea behind Puppet is a "Pull > Push" mechanism: every 30 min an Agent requests its Catalog from the Puppet Master, which then sends it to the Agent
  • Puppet Master receives a list of Facts (generated by Facter, a ruby gem) from managed nodes and then compiles a catalog based on latest Facts
  • Puppet Masters generate JSON catalogs and send these catalogs back to nodes
  • Puppet Master does not do any execution on the nodes; the agents do the actual legwork. The Master's primary function is to compile catalogs, send them to nodes and run Reports
  • node agents do the heavy lifting of package installs, configuration,etc
  • focuses on declarative and idempotent infrastructure configuration using Modules which are written in its own DSL (domain specific language), no knowledge of Ruby required
  • has some remote execution capabilities with MCollective (a Ruby messaging framework that uses ActiveMQ broker)
  • Enterprise version comes with PuppetDB that is used for storing data retrieved from nodes (reporting)
  • can store hierarchical lookup data for its modules using Hiera
  • can do OS provisioning using Razor
  • can parse file structures and pull out or modify specific data using Augeas
  • has Puppet Forge that stores collections of modules to perform various tasks
  • started in 2006
  • core strength: infrastructure modeling and configuration, user-contributed content, reporting

SaltStack

  • Python-based event-driven remote execution & config management tool that can manage nodes either through a Salt agent or SSH
  • Can be both Push and Pull. Can have Master send instructions to nodes and nodes pull instructions from Master
  • Idea behind Salt is very different from Puppet. Salt is essentially a messaging bus with Config Mgmt built on top of the bus
  • No direct connection from Master to nodes, instead Master publishes commands and nodes listen on a port, then execute commands if they match the target filter, reply back to Master on another port
  • No connection makes Salt execution lightning fast
  • Salt uses a TOP file that flows down hierarchically and sends agents a list of changes to be made. Essentially “node classification”
  • uses the ZeroMQ messaging library for a persistent AES-encrypted event bus, resulting in extremely fast execution. (Also has the UDP-based RAET protocol as an alternative transport that's even faster)
  • uses YAML-based configuration ‘state’ files to model infrastructure, “states” are similar to Puppet’s “classes” and Ansible’s “playbook”
  • Salt Master serves a ‘state’ file to a minion and Salt agents do all the heavy work of package installs, configurations, etc.
  • has additional processors called “modules”, unlike Puppet modules, Salt modules are subsystems that perform work, not collections of DSL code, for example,
salt ‘*’ pkg.list (PKG is a module, LIST is a command)
salt nychost123 test.ping (Test is a module, Ping is a command)
salt nychost123 hosts.add_host 192.168.33.100 myHost
  • also has orchestrational features like Beacon & Reactor, which are actions driven by events that are sent out from nodes
  • similarly to Puppet Facter, Salt uses “Grains” to obtain facts about each node
  • can store data for its ‘states’ using Pillar, a centralized “data provider” for nodes
  • started in 2011
  • core strength: fastest execution and parallelism of any product, remote execution, scalability, cloud integration

Ansible

  • Python-based configuration management and remote execution tool
  • The Master server pushes 'modules' to managed nodes and lets each node configure itself. These 'modules' (basically Python scripts) are removed from the host after each execution
  • uses “Playbooks” which are similar to Puppet’s “classes” and Salt’s “state” files
  • has Ansible Galaxy that stores Roles (collections of Playbooks) used to perform various tasks
  • doesn't have anything similar to Hiera or Pillar for data management, although it integrates with other data providers
  • started in 2012
  • core strength: simplicity of use and deployment, cloud integration, agentless SSH connection (although Salt can do the same thing)

 

Jargon Breakdown

One of the more annoying things is that each product has a different name for a similar concept; here's a breakdown.
NOTE: this comparison is not 1:1 but a general summary; there are many important differences between each concept, so read each product's documents carefully!!

Puppet | SaltStack | Ansible | Description
------ | --------- | ------- | -----------
Facts / Facter | Grains | Facts | facts about a managed system (OS name, IP address, role, owner, memory size, Python version, etc.)
Class | State | Playbook | a set of instructions to be performed on a managed system (install the MySQL package, create a user, change a permission, etc.)
Module | Formula | Role | a collection of the instructions above that performs a logical function (Hadoop cluster install, NGINX installation + config, management of an F5 load balancer, etc.)
Provider | Module | Module | performs various system tasks on the target node (User module, Package module, Network module, etc.)
Node | Minion | Host | any system managed by the application (the target)
Hiera | Pillar | N/A | data provider for the application's config management code; best practice is to separate user and company data from the code that does the actual configuration change
MOM (Master of Masters) | Syndic | N/A | a top Master controller for other slave Masters in a scaled environment
Forge | GitHub (no centralized forge) | Galaxy | online repository of user-built content

In Puppet's and Salt's case, there is true Resource Abstraction (see "Defining the Infrastructure" section below).

Note: Hiera works very differently from Pillar, but the idea is the same.

The Review – Let's Begin

All areas reviewed on 5-point basis


01 – Learning Curve

This is a purely subjective rating: it reflects how difficult I found each product to be. Obviously this will vary with each person.

Puppet:  2

Puppet had by far the steepest learning curve, but this isn't necessarily a knock against the product. Puppet does have a much richer and more capable CM modelling language than the other 2. Because of this, there are a lot of concepts that Puppet covers, like Defined Types, Virtual Resources, Functions, etc., that do not apply to the other 2 applications.

The bad part about this complexity is the large number of components that make up Puppet Enterprise, especially the MCollective piece (which is an entire product in its own right). It takes a lot of time to understand how everything works together and how to troubleshoot common issues.

SaltStack:  5

I found Salt to be extremely simple to understand and use. Salt's simplicity is misleading, however, because it does offer a very complex and powerful orchestration and event management system along with its CM capability. I found Salt to be even easier to pick up than Ansible; it just makes sense to me.

Ansible:  5

Simple to pick up as well, although I found the syntax of playbooks to be verbose. For example, in the template line here, placing everything in 1 single line makes it too jumbled, but this is subjective obviously:

- name: configure hive in /etc/hive/conf.{{ site_name|lower }}
  template: src={{ item }} dest=/etc/hive/conf.{{ site_name|lower }}/{{ item }} owner=root group=root mode=0644
  with_items:
    - hive-site.xml
  notify:
    - restart hive-metastore

02 – Installation of Masters and Agents

Puppet:  3
  • Master is relatively simple to install but involves more steps than others.
  • If installing Monolithic (single) master, the installer starts a webserver and the installation asks for various input details like Master name and DNS alias. Install takes 15-20 min.
  • Additionally the installation tarball is roughly 250-300 MBs which is much larger than the other products.
  • agents can be installed via pkg managers or through a bash script that reaches out to its Master’s package directory in /opt and pulls down the installer
  • Puppet can also be run without the Master, which would only require the installation of an agent and then periodically updating the JSON catalog on each machine
SaltStack:  5
  • I found Salt to be very simple to install, install time 2-3 minutes depending on connection speed. Size of package is only 20-25 MB when installed and this includes all the necessary python packages.
  • Agents: very simple to install as well, via a package manager like apt-get
  • Salt provides a Bootstrap script that can be run on any OS, that will install Master or agent. You can even use Salt to install its own agents on nodes using SSH,
    salt-ssh  -i targetName cmd.run "sh bootstrap.sh"
  • Can also be run via pure SSH using ‘salt-ssh’ command. (Note: I found salt-ssh to be fantastically easy to use, even more so than Ansible)
Ansible:  5
  • very simple install for Debian and Fedora based masters,
    sudo yum install ansible 
  • Agents: no agents to install, everything done with SSH
  • Has a really nice Inventory management feature: you can import hosts to manage using a script from AWS, Rackspace, OpenStack, DigitalOcean, etc. It's very simple to add massive inventory data and get going managing it (requires the Enterprise version though)

03 – Defining the Infrastructure (Config Management)

All 3 products use a declarative approach to infrastructure definition (with some being more Declarative than others)

Puppet:  5

  • Puppet's core strength is its DSL, which is becoming the lingua franca of CM. It has very robust and expressive conditionals, which let you create additional things like Defined Resources (cookie-cutter resource management structures)
  • Has Ruby's powerful hashes and functions; very expressive and relatively simple to model your infrastructure

    sample Nginx declaration

        package { "nginx":
             ensure  => installed,
             require => Yumrepo['nginx-repo'],
             before  => File['/etc/nginx/nginx.conf'],
        }
    
  • Has Resource Providers, which are Ruby code that translate the Puppet DSL declaration into the actual OS commands, making Puppet completely abstract when it comes to managing resources. This makes it easy to manage a resource regardless of the underlying OS type of the target, meaning that your config code is reduced since you don't need additional declarations for each OS type; you only need 1 declaration.

    For example, if you need to have user Joe created on all your managed nodes regardless of OS:

    user { "joe":
        ensure => present,
     }

    This will create Joe user on any machine that runs this class.

  • Has lots of Resource Providers making configuration of your infrastructure components very easy, providers are also extendible with Ruby (full list of Resources: https://docs.puppet.com/puppet/latest/reference/type.html )
  • Can get very complex when processing a large catalog, especially when including other classes and variables (include vs inherit vs require). Not as straightforward as YAML
  • “Hiera” is an excellent data provider tool that works by pulling down data in a hierarchical fashion during the Puppet Agent’s run and providing data to the Module (for example, passwords or company-specific data like Datacenter Name). I found Hiera to be difficult to debug however.
  • Has “Augeas”, a file parser that breaks down any file into a tree-like data structure which can then be parsed and manipulated. This is useful but I found it very difficult to use and debug.
  • Has many modules that make it easy to manage specifics about files, regex, computation, etc (for example Puppet’s Standard Lib module)

SaltStack:  5

  • Config Management is built on top of messaging bus, using YAML to define the infrastructure (although other syntax can be used other than YAML like Jinja, JSON or pure Python). Has “Renderers” to process any YAML or other syntax into Python execution code
  • Uses “Modules” similar to Puppet Providers to manage actual resources on targets, for example:
httpd:
  pkg.installed:
    - fromrepo: mycustomrepo
    - skip_verify: True
    - skip_suggestions: True
    - version: 2.0.6~ubuntu3
    - refresh: True
    - allow_updates: True
    - hold: False
  • Has a massive list of Modules that manage all kinds of object types, which are extendible with Python (Apache hosts, Augeas, Grafana, etc,
    full list here: https://docs.saltstack.com/en/latest/ref/states/all/index.html )
  • Relationships between resources have very flexible and elegant expression keywords ( require, watch, prereq, use, onchanges, onfail, etc), this gives you more flexibility in managing relationships between resources than Puppet or Ansible
Ansible:  5
  • uses Jinja as the template engine for config files, I found it very simple to use
  • Lots of included Modules to manage resources, similar to Salt, full list: http://docs.ansible.com/ansible/modules_by_category.html
  • Low level of abstraction with resources as compared to others. For example, package manager needs a specific apt-get or yum to install a package in a playbook. This results in Case or For loops to take care of every possibility, or creating a separate playbook for each common installer (for each pkg manager), which is a hassle.
  • Is both declarative and imperative, meaning you can specify explicit instructions or model the end state in abstract terms
  • I really liked some aspects of its syntax, for example installing a package with the ability to provide additional install parameters like package modules or libs
- name: install Ganglia web dependencies via apt
  apt: name={{ item }}
  with_items:
    - apache2
    - php5
    - rrdtool
  tags: ganglia
  • Another great feature of Ansible's syntax is the 'when' conditional: you don't need For or Case loops to determine the OS type. For example, this runs a command when the OS family is Debian. Very simple and clean syntax:
tasks:
  - name: "shutdown Debian flavored systems"
    command: /sbin/shutdown -t now
    when: ansible_os_family == "Debian"

04 – Security

Puppet:  5
  • Agents are handled via SSL certs, Puppet Master acts as a CA.
  • Can be difficult to set up certs across multiple Masters in a large environment. Communication is encrypted via SSL, so all catalogs sent to nodes are encrypted.
  • Management of nodes is easy with “puppet cert sign” command
  • For passing data to "modules", Puppet has hiera-eyaml, which encrypts data like passwords with AES-256, as well as additional plugins like Amazon Web Services key management.
SaltStack:  5
  • No direct connection to agents so no handshake, all comms done through AES encrypted ZeroMQ messaging bus using Ports 4505 (publisher), 4506 (requester)
  • ZeroMQ comms use the PyCrypto Python security library, which apparently had some issues, although from what I've read these have been addressed
  • if using RAET instead of ZeroMQ for communication, RAET is AES encrypted
  • The AES encryption key uses an explicit initialization vector and a CBC block-chaining algorithm in accordance with the latest accepted version of TLS
  • Agent keys are managed with ‘salt-key’ command. Once agent is installed it contacts the Master via port 4506 and sits until its public key is accepted on the Master side.
Ansible:  5
  • uses standard SSH public/private key authentication, industry-tested and used everywhere
  • Doesn't use any port besides SSH (which can be changed from the default 22), so fewer areas of exposure

05 – Remote Execution & SSH

Puppet:  2
  • Puppet does install the MCollective (MCO) orchestration piece with the Enterprise install, but I found it very complex and much more limited than the others.
  • MCO runs on top of an ActiveMQ (or RabbitMQ) messaging broker and has a concept of a “messaging Server”
  • MCO was designed to orchestrate puppet runs across the environment, instead of waiting for each agent to check in with the master every 30 min
  • I found MCO to be very capable but not as robust as the others; for example, there's no way (that I know of) to run direct system commands. It can only control resources already managed by Puppet:
# start NTP service on hostname ny14.nyc
mco service ntp start -I ny14.nyc
  • Puppet does not have a pure SSH-only controller, the agent is the key driver of change on each managed node.
SaltStack:  5
  • Salt is excellent at remote execution and parallelism, it was designed from ground-up to be a true remote execution system
  • has a Reactor subsystem on Master that can handle logic when receiving events from managed nodes, making it very powerful for orchestration
  • similar to how Ansible handles SSH logins, Salt keeps an SSH roster file in /etc/salt/roster
  • I found Salt’s SSH setup to be even simpler to use than Ansible
# Sample salt-ssh config file 
mrxcloud1: 
    host: 104.131.102.230 
centos7node: 
    host: 192.168.56.20 
    user: root
  • has a "salt-thin" agent, which is compiled by the Master. It's a tarball of vital Salt modules that is copied to the nodes through SSH (but isn't installed; it's just Python files). Salt-thin enhances SSH management of nodes
Ansible:  4
  • Ansible remote commands are simple as well, commands are passed with “-a” parameter
[root@ansible-tower .ssh]# ansible all -a "uname -a"
centos7node | SUCCESS | rc=0 >>
Linux centos7node 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
  • Ansible’s SSH communication is handled via its own Hosts file:
  • the /etc/ansible/hosts file contains a list of SSH entries for each managed node. Unlike Salt, SSH is the only way Ansible can manage its nodes.
# Ex 1: Ungrouped hosts, specify before any group headers. 

## green.example.com 
## blue.example.com 
## 192.168.100.1 
## 192.168.100.10 

# Ex 2: A collection of hosts belonging to the 'webservers' group 
## [webservers] 
## alpha.example.org 
## beta.example.org 
## 192.168.1.100 
## 192.168.1.110
  • SSH information can also be pulled in from other sources like a DB or AWS
  • Because of Ansible's SSH-only connection, there can be delays and slowness when executing on large numbers of systems (there are parallelism settings that can be adjusted)

06 – Node Classification & Assignment

Puppet: 5
  • Provides lots of choices on how to classify your nodes
  • Can assign nodes to Classes through: Console, Site.pp file (on Master) or using Hiera
  • Enterprise Console lets you group Nodes into logical groups (environment, datacenter, working role, etc) and assign a Class to each node group. For example, a Webserver node group can have the following classes assigned to it: Apache, HAProxy, Firewall, Common
  • Example of Hiera classification:
/etc/puppetlabs/code/environments/productions/hiera.yaml
---
:backends: 
- yaml 

:hierarchy: 
   - "node/%{::trusted.certname}" 
   - "datacenter/%{::datacenter}" 
   - "os/%{::os.name}" 
   - common
Hiera searches the top level (trusted certname) first; if it finds a YAML file that matches the hostname, it pulls values from that file.

Example of a hostname Yaml file:
/etc/puppetlabs/code/environments/productions/hieradata/node/hostA.yaml

## host A
sysctl:
  - some_val: 123

Hiera then traverses down the hierarchy to the datacenter level. If the managed node has a datacenter fact that matches, it pulls values from this YAML file:

/etc/puppetlabs/code/environments/productions/hieradata/datacenter/London.yaml

## LONDON
ntp::servers:
 - 0.uk.pool.ntp.org
 - 1.uk.pool.ntp.org
 - 2.uk.pool.ntp.org
 - 3.uk.pool.ntp.org
SaltStack: 3
  • lacks true classification of nodes. Salt's remote-execution priority is clearly on display here: targeting nodes for specific state files is done from the command line using either the direct hostname of the node or some type of additional filtering parameter, for example the OS type.
  • Uses a TOP file to classify each node with its state, which works well:
# /srv/salt/top.sls
base:
  'mrxpalmeiras':
    - linux.git
    - linux.virtualbox

  'centos7node':
    - linux.git

  'mrxwin7':
    - win.notepadpp
    - win.firefox
  • Can also configure a Node Group like Puppet, but this has to be configured in the Salt config file in /etc/salt/master

    Example node targeting:

    direct hostname:

salt nycdev12 state.sls "apache"
  • by grains(facts)
salt -G os:Windows cmd.run "net stop Firewall"  
  • by Node Group (NGs are configured in /etc/salt/master)
salt -N db_servers cmd.run 'ps -ef | grep mysql'
  • Can also classify using Pillar, which is data stored on the Master about a particular node

Ansible: 2

  • Ansible uses a Host file to configure what nodes go where
    # Ex 2: A collection of hosts belonging to the 'webservers' group 
    [webservers] 
     alpha.example.org 
     beta.example.org 
     192.168.1.100 
     192.168.1.110
  • Playbooks are then run according to the tag that's placed in the playbook, i.e.
    ---
    - hosts: webservers
      serial: 5  # update 5 machines at a time
      roles:
        - common
        - webapp
    
  • This makes it simple to assign nodes to playbooks or Roles, but it also makes tracking what goes where difficult. There is no truly dynamic way of passing data down for each host; you have to be sure which playbook is tied to which group of hosts.

07 – Scaling

Puppet:  2
  • a single Puppet Master can handle up to 1,000 nodes so for any large company with tens of thousands of servers, scaling can be painful. Scaling Puppet would involve setting up additional Compile masters that crunch each node’s catalogs and send the catalogs to the node for processing. This also involves using HAProxy or some other proxy and LB mechanism. Because Puppet is a one-way “pull” service (the agent checks in with the master every “x” minutes and requests its catalog), Puppet Masters can sometimes be overloaded with requests (“Thundering Herd phenomenon”), so additional strategies are needed, like setting the check-in time for each node to some randomly generated number.
  • Catalog compilation can be made more efficient by adding "compile masters", which process and generate the catalog JSON files. There is some complexity involved in setting this up, making sure certs are correct, and monitoring for service uptime.
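The random check-in strategy mentioned above maps to standard agent settings; a minimal sketch of the relevant `puppet.conf` entries (the interval values are illustrative):

```ini
# /etc/puppetlabs/puppet/puppet.conf on each agent
[agent]
runinterval = 30m   # how often the agent requests its catalog
splay       = true  # wait a random delay before each run
splaylimit  = 10m   # upper bound on that random delay
```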
SaltStack:  5
  • A single Master can run upwards of 10,000 nodes or more because Salt uses a message bus (ZeroMQ) instead of SSH or per-node TCP connections, making Masters extremely efficient
  • Can scale using Syndic (Master of Masters), very simple to configure.
  • No need to set up strategies to deal with the 'thundering herd' of node check-ins; everything is done via a messaging queue.
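A minimal sketch of the Syndic (Master of Masters) setup mentioned above (the hostname is illustrative):

```yaml
# /etc/salt/master on the Master of Masters:
order_masters: True

# /etc/salt/master on each intermediate (Syndic) master:
syndic_master: mom.example.com
```

The intermediate box runs both the salt-master and salt-syndic daemons; the Master of Masters then targets nodes through its syndics.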
Ansible:  No Score

Need more info here

  • I have no experience scaling Ansible
  • From reading other comments, Ansible doesn't seem to scale as well as Salt because of its SSH communication limitation, although I can't comment on how

08 – UI / Console / Portal

Puppet:  3
  • Puppet 4 comes with a very mature console that shows Classification, Job and Node data, as well as Facts about each node
  • Console also has a very under-rated Resource Dependency Graph, which visually shows how all managed resources tie into each other, making it easier to troubleshoot or see what will be affected
  • Lacks a remote execution function; there is no way to talk to a node directly from the console (Live Management was removed in version 4)
  • has LDAP integration to add users to Console
  • easy to create Node Groups according to various criteria (Fact data, Name, Regex, Direct Pin, etc)
  • Lacks some reporting content, for example a visual breakdown of nodes by OS type, Arch, etc (pie chart or bar graph)

    example of events: [image: cm_events.png]

    resource graph: [image: cm_nodegraph]


SaltStack: 0
  • As of now (Sept. 2016) there is no stable and mature console, although Salt Enterprise 5 has a very nice console in the works; for the moment Salt is purely command line

Ansible:  4

  • Ansible has a very mature console with Inventory, Job and Project views.
  • has LDAP integration for Users, as well as notification settings for each user
  • Has a great feature to plugin directly to Github or other VCS and pull playbooks directly into the console and run them.
  • Lacks reporting and metrics, for example showing host Facter or relevant information
  • No clear way to look at nodes listed in Inventory (for example Fact data, OS type, etc)
  • Can run remote commands against Groups of managed nodes but not directly against a node
  • No easy way to create dynamic Node Group population (that I know of); Groups are populated manually or from an external source like AWS or a script.
  • Has a great Credentials management space, can store creds for servers, AWS, Rackspace, network, VMWare, etc

    Ansible’s management areas: [image: ansible1]

    running a command on a host: [image: ansible2.png]


09 – Community & Contributions

Puppet:  5
  • Puppet has active contributions to the core project but with fewer commits, which is typically a sign of a mature/stable product
    Commit graph: https://github.com/puppetlabs/puppet/graphs/contributors

  • Has massive number of Modules which do anything from install Apache to run F5 load balancers, although some are better written than others
  • Has both a forum and an IRC channel for issues and help, as well as an external Google Groups forum

SaltStack:  4
  • Has active contributions on Github and active community
  • Commit graph: https://github.com/saltstack/salt/graphs/contributors
  • Has lots of Formulas on the main Salt GitHub page, but lacks a front end like Puppet Forge or Ansible Galaxy
  • Doesn't have a true user forum; has an active Google Group for questions and an IRC channel


Ansible:  5
  • Has active contributions on Github and active community
  • Commit graph: https://github.com/ansible/ansible/graphs/contributors
  • Lots of user content for “Roles” (Ansible’s version of Puppet Modules)
  • Similar to Puppet Forge, Ansible has Galaxy, which can be used from the command line
  • Has a Google Groups forum for questions
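As a concrete example of the Galaxy command-line workflow mentioned above (the role name here is just a well-known community role, chosen for illustration):

```shell
# Install a community role from Galaxy into the local roles path
ansible-galaxy install geerlingguy.apache

# List the roles that are currently installed
ansible-galaxy list
```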