Handle Bash config file variables like a pro

A good practice when writing large Bash scripts is to separate the config data from the actual processor, hence the humble config file.

The Config File

After playing around with various config file types (INI, JSON, YAML, TXT), I found the YAML config file to be the most flexible because it supports comments and nested structure. The only downside of YAML is its strict, whitespace-sensitive indentation (the examples here use 2 spaces per level, and tabs are not allowed), so you need to run ‘yamllint’ or some other linter to make sure the YAML has no errors; otherwise you will get runtime errors.

Another thing to consider when writing Bash, or any shell script, is that shells do not handle nested data structures well, hence the need for higher-level languages like Python or Ruby to parse YAML or JSON files.

If you want to keep things relatively sane, all-Bash and dependency-free, then here is a good method to handle your config parameters.

Let's use this sample config file, ‘config.yaml’:

---
#Company ABC Config
company:
  address: 123 new city
  datacenters:
    CA:
      North: 'San Francisco'
      South: 'San Diego'
      East:
    OH:
      West: 'Cincinnati'
      East: 'Cleveland'
  phone:
    # memo: need to change US number to new area code
    US: '1-800-222-3333'
    EU: '2-1234-3433-33444'

This looks painful but it's actually very simple: the data is arranged logically with ‘company’ being the top node, with address, datacenters and phone being the 3 subnodes, and each subnode with its own subnodes, each level indented by 2 spaces. Unlike JSON, you can also add comments to the config.

Parsing YAML data with pure Bash

To get all these values as parameters into Bash without using Python or Ruby and without explicitly declaring each parameter, use the following Bash function:

function parse_yaml {
  local prefix=$2
  # s: optional whitespace, w: a key name, fs: ASCII field separator (octal 034)
  local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
  # 1st expression handles quoted values (quotes are stripped), 2nd handles unquoted values
  sed -ne "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
  -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
  awk -F$fs '{
    # nesting depth, assuming 2 spaces per level
    indent = length($1)/2;
    vname[indent] = $2;
    for (i in vname) {if (i > indent) {delete vname[i]}}
    if (length($3) > 0) {
      vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
      printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
    }
  }'
}

Add this function to your Bash script.

To get all your parameters from config.yaml in 1 shot, add this line to the script (after the function declaration):

eval "$(parse_yaml config.yaml)"

This will read your config file and add each data node as a parameter to your script. For example, you will end up with the following parameters (an underscore separates each YAML level):

echo "$company_address" # outputs "123 new city"
echo "$company_phone_EU" # outputs "2-1234-3433-33444"

The beauty of this approach is that all your parameters are inside your Bash script without explicit individual declaration or messy iteration loops. You now have all your variables and are ready to work with them.
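
The function also accepts an optional second argument (the ‘local prefix=$2’ line), which is prepended to every variable name. Here is a minimal sketch of why that can be handy, assuming a ‘conf_’ prefix (my own choice, nothing in the function requires it). The parser output is hard-coded as a stand-in so the snippet runs without config.yaml:

```shell
#!/usr/bin/env bash
# Stand-in for what `parse_yaml config.yaml conf_` would print:
# one assignment per leaf node, each name starting with the prefix.
generated='conf_company_address="123 new city"
conf_company_phone_US="1-800-222-3333"'

# eval runs the assignment text in the current shell
eval "$generated"

# ${!conf_@} expands to the names of all variables starting with conf_,
# and ${!name} is indirect expansion (the value of the variable named in $name)
for name in "${!conf_@}"; do
  printf '%s=%s\n' "$name" "${!name}"
done
```

With a prefix, everything that came from the config file can be listed or inspected in one loop, and it cannot collide with variables the script defines itself.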

The only caveat with the “parse_yaml” function is that it can't handle YAML lists, for example:

company:
  employees:
    - joe
    - mary
    - bob

This is a limitation of the simple line-by-line parser (it only matches ‘key: value’ lines, so the list items are skipped). A workaround is to put all your list items into a single string like this:

company:
  employees: 'joe, mary, bob'
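
Once the list is a single string, Bash can split it back into an array. A small sketch, assuming the parser produced a company_employees variable (hard-coded here as a stand-in) and that no item contains a comma:

```shell
#!/usr/bin/env bash
# Stand-in for the variable parse_yaml would create from the config above
company_employees='joe, mary, bob'

# Split on commas into an array; IFS is changed only for this one read
IFS=',' read -r -a employees <<< "$company_employees"

for e in "${employees[@]}"; do
  # Strip the leading whitespace left behind after each comma
  e="${e#"${e%%[![:space:]]*}"}"
  echo "employee: $e"
done
```

Splitting on the comma alone (rather than on comma and space) keeps items that contain spaces, such as a full name, in one piece.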

OK, now we are ready...

But wait... what if the data is missing or someone removed a value from config.yaml? Here's a simple method to check for null or empty values:

Verify your Parameter Values

Add a new function to your Bash script to verify each param:

function verify_param() {
  # ${!1} expands the variable whose name is passed in $1
  [[ -z "${!1}" ]] && { echo "$1 value not present in config.yaml, exiting."; exit 1; }
  return 0
}

To verify a param:

verify_param company_datacenters_CA_East

Since this param is null in our config file, it will give you an error when you run your script:

"company_datacenters_CA_East value not present in config.yaml, exiting."
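
If you have more than a handful of required params, you could check them in one pass and report everything that is missing before giving up. A rough sketch: verify_params is a hypothetical helper name, and the two variables are hard-coded stand-ins for values loaded from config.yaml:

```shell
#!/usr/bin/env bash
# Stand-ins for variables loaded from config.yaml
company_address='123 new city'
company_datacenters_CA_East=''

# Hypothetical helper: reports every empty or unset parameter,
# returns 1 if at least one is missing and 0 otherwise
verify_params() {
  local missing=0 name
  for name in "$@"; do
    if [[ -z "${!name}" ]]; then
      echo "$name value not present in config.yaml" >&2
      missing=1
    fi
  done
  return "$missing"
}

if ! verify_params company_address company_datacenters_CA_East; then
  echo "config check failed"
fi
```

Returning a status instead of exiting inside the helper leaves it up to the caller whether a missing value is fatal.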

That's it. You can now handle all your params with 2 simple functions; there is no need to initialize each parameter individually, and checking for null values is a breeze.

How to parse a nested YAML config file in Python and Ruby

If you have a complex config schema, you may need to store it in YAML or JSON format.

Having been used to INI-style configs, I recently had to store nested values, and the INI style gets very complex, very fast.

For instance in YAML:

---
person:
  Joe:
    age: 32
    children:
      - Katie
      - Frank

  Bob:
    age: 43
    children:
      - Lisa

Getting the names of Joe’s children from JSON or YAML would look something like this:

data['person']['Joe']['children']
['Katie','Frank']

In INI, this would be something like:

[person]
[person/joe]
age=32
children=Katie,Frank

This is really ugly and not nested visually. To get an individual child’s name, you would need to additionally parse a comma-separated string. Fugly.

Much better to use YAML. I prefer YAML over JSON because it's much easier for humans to read, although once parsed, both end up as the same native data structures (dicts and lists in Python, hashes and arrays in Ruby).

Here's a Python and a Ruby example of how to parse this sample config file:

config.yaml

---
scanners:
  hostname:
    web01.nyc.mycorp.com:
      port: 9900
      scans:
        - "cisco scan"
        - "network sec scan"
        - "windows sec scan"
    web05.tex.mycorp.com:
      port: 9923
      scans:
        - "tex network"
        - "infra scan"

The Python and Ruby parser scripts are structurally very similar.

Python

#!/usr/bin/env python3
import sys
import yaml

def get_config(*args):
    # safe_load builds plain dicts and lists and refuses arbitrary Python objects
    with open('config.yaml', 'r') as f:
        conf = yaml.safe_load(f)

    # get section
    section = args[0]

    # check if Config file has Section
    if section not in conf:
        print("key missing")
        sys.exit(1)

    # walk down the remaining keys one level at a time
    # (this replaces building a lookup string and passing it to eval)
    value = conf[section]
    for arg in args[1:]:
        value = value[arg]
    return value

scans = get_config('scanners', 'hostname', 'web01.nyc.mycorp.com', 'scans')
print(scans)  # outputs ['cisco scan', 'network sec scan', 'windows sec scan']

Or, if you want a list of all hostnames:

hostnames = list(get_config('scanners', 'hostname'))
print(hostnames)  # outputs ['web01.nyc.mycorp.com', 'web05.tex.mycorp.com']

Here’s a simple one-liner if you want to bypass error checking:

for key, value in yaml.safe_load(open('config.yaml'))['scanners']['hostname'].items():
    print(key, value)
# web01.nyc.mycorp.com {'port': 9900, 'scans': ['cisco scan', 'network sec scan', 'windows sec scan']}
# web05.tex.mycorp.com {'port': 9923, 'scans': ['tex network', 'infra scan']}

Ruby

#!/usr/bin/env ruby
require 'yaml'

$config_file = 'config.yaml'

# get config file value
def get_config(*args)
   conf = YAML.load_file($config_file)

   # get section
   section = args[0]

   # check if Config file has Section
   unless conf.key?(section)
       puts "Config file does not have section information for #{section}"
       exit(1)
   end

   # walk down the keys; dig returns nil when a key is missing
   # (this replaces building a lookup string and passing it to eval)
   begin
     value = conf.dig(*args)
   rescue TypeError
     # a key was requested below a value that is not a hash or array
     value = nil
   end

   # Handle errors
   if value.nil?
      puts "Config file does not have information for #{args.join(' > ')}"
      exit(1)
   end
   value
end

scans = get_config('scanners','hostname','web05.tex.mycorp.com','scans')
p scans # outputs ["tex network", "infra scan"]