IoT Management - Plan B // Grinning Cat

A little bit of history

Appears in the first post in this series.

Probably the bit that matters most is the list of requirements:

I want to update the code running on the Gateways
I want to make sure the Gateways are alive
I want the Gateways to be configurable remotely
I need to ship stats off the Gateways (for instance how much data are they sending and receiving?)

Plan B

About two weeks ago, when I drafted this post, I wrote: "If all else fails this is what I'll ship. It'll be ugly, I'll be sad but it will work. That will open up the field for the classier solutions."

Yeah, software has a habit of making a fool of you, and sure enough that's what has happened here. The underlying script that collected the data and shipped it to the cloud turned out to be hopelessly naive. So I ended up rewriting the blessed thing in the time I thought I had for evaluating alternatives. End result: I shipped Plan B.

A cunning plan…


So I have a little git repo that I'm keeping the code in. I could just automate a git pull. So long as I do nothing daft I can just keep pulling the repo and voilà, updated code!

Then I need some way to run things on a box with no SSH. Ah-ha, I set up a cron job to run a script in the repo. So long as the script isn't ridiculous I can run and re-run commands.

That leaves me with the last issue: how do I monitor the vital statistics of the box? Some people will say "so what?" But my experience is that excessive writes and pegging the CPU at 100% are portents of doom for your hardware. Certainly in my last job that's exactly what happened. We used these Building Management System (BMS) units and arguably didn't monitor them well enough (although that is a never-ending quest for balance). Certainly bad things happened when the O/S couldn't get the space to do things like scheduling. Furthermore, multiple times we detected attacks (and in at least one case successfully defended against the attack) because we saw elevated metrics.

The plan in practice

Before we begin

As you can imagine most of this is not warrantied or recommended in any way. But I assume if you've been led to this blog post you are at the point where that's not your primary concern.


Prepping the device

So in order to get rolling you need the following:

Your code under source control (I assume git)
Enough key management to make the thing work
An image of the OS for the device
Access to the hardware

Access the hardware

In our case, when you boot the Raspberry Pi (hereafter RPi) the easiest thing to do is give it an IP address and SSH in. That's one cable to plug in rather than two. Most modern Ethernet ports work out the crossover automatically (Auto MDI-X; see this Wikipedia article for more details), so point-to-point Ethernet is a matter of plugging the device into your machine. This means you can provide in-field support by having a human with a computer and an Ethernet cable. Note that pretty much all major OSes these days come with a DHCP server. It's under internet sharing in both Mac OS and Windows. I do have a script that puts my SSH keys on the RPi and prompts me to reset the password on the machine. But it's weirdly flaky. One to fix another time 😬. Your checklist should be:

Connect to the RPi
Change the pi user password (see the Raspberry Pi user documentation for more)
Copy your SSH keys onto the RPi

An image of the OS

This section is super short, just to note that the default Raspbian image needs a little encouragement to fire up SSH. For my approach you need to copy a file called ssh onto the boot partition of the SD card (see Raspberry Pi SSH documentation). As an aside, this is perfectly reasonable security practice by the Raspberry Pi Foundation. It's just that I'm not their normal punter.
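The "file called ssh" step can be scripted. Below is a minimal Python sketch, assuming the SD card's boot partition (the small FAT volume your computer mounts when you insert the card) is already mounted on the machine doing the imaging. The function name `enable_ssh` and any mount path you pass it are illustrative, not part of the Raspberry Pi tooling.

```python
from pathlib import Path


def enable_ssh(boot_partition: str) -> Path:
    """Create the empty marker file named 'ssh' on the mounted boot partition.

    The mount point differs per OS and is supplied by the caller; nothing
    here tries to guess it.
    """
    boot = Path(boot_partition)
    if not boot.is_dir():
        raise FileNotFoundError(f"boot partition not mounted at {boot}")
    marker = boot / "ssh"  # no extension
    marker.touch()         # the file's presence is what matters, not its contents
    return marker
```

You would call it with wherever your OS mounted the card, for example `enable_ssh("/Volumes/boot")` on a Mac (that path is an assumption; check your own mount point first).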

Enough key management

Almost as short as the last section 😃 There are more and less clever ways to do this. I'm using Keybase.IO right now. Previously I've seen key management done through other things like GPG, and certainly whatever becomes Plan A is going to need to answer this in a satisfactory way. Don't use a magic USB key. Don't transmit keys in plain text. You know the drill.

Your source code

Once you're here you're getting close to a usable system. A couple of tips. First, it is good security to use read-only accounts for source control. So don't make the RPi a user with write access on your source control. Remember these devices are out in the wild and you should assume they will get hacked. You don't want random RPis pushing updates to your repo. Second, don't deploy from master. As a branch, master is the git default for almost everything, so tools you don't even know people are using will be messing with it. So use a release branch, pre-commit hooks, proper TDD… Anything and everything that stops master getting messy and that mess flowing down to your devices. Because no one is proud of a commit that appears on the Developers Swearing twitter feed 😉 Also, what you should get the vibe of here is that really I'm overloading source control as package management.

A few helpful gists

So now you have it what does it look like?

This is the feedback bit. I would love it if you looked at these scripts and made improvements. Hence why they're gists. Share the code, make it better 😃

Git Updating

OK so here's the first script in the series. This script basically moves to the location of the repo and pulls down the contents of the release branch. Two little things here. First, the tendency for updates to be scuppered by an operator logging in and "hot-fixing" the code is high. So we reset the repo before the update. Second, I didn't enforce the branch, to allow just a cheeky touch of hot-fixing if required. That will likely hoist me by my own petard but I figured I'd give it a shot 😓
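A minimal Python sketch of the behaviour just described, not the gist itself: reset the working copy, then pull whatever branch happens to be checked out. It assumes git is installed on the box and the checked-out branch has an upstream configured. The names `update_commands` and `run_update` are made up for illustration, and `--ff-only` is an extra safety choice of this sketch rather than something the post specifies.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

Runner = Callable[[Sequence[str]], object]


def update_commands(repo_dir: str) -> List[List[str]]:
    """Commands that discard local edits, then pull the checked-out branch."""
    git = ["git", "-C", repo_dir]  # -C runs git as if started in repo_dir
    return [
        git + ["reset", "--hard", "HEAD"],  # drop on-box "hot-fixes" to tracked files
        git + ["pull", "--ff-only"],        # no branch named: follow whatever is checked out
    ]


def run_update(repo_dir: str, run: Optional[Runner] = None) -> None:
    """Run each command in order; a failing git command raises and stops the update."""
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)
    for cmd in update_commands(repo_dir):
        run(cmd)
```

From a cron job you would call something like `run_update("/home/pi/gateway")`, where the path is hypothetical. The `run` parameter exists only so the sequence can be dry-run or tested without touching a real repo.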

Running arbitrary tasks

Now I'm playing with a couple of updated approaches here, but the first and simplest is to have a special script in the repo. Here's my default example:

This is an even simpler script because in reality it does nothing. The point being if there's something there it can be run by the machine.

It's probably also worth noting that I'm expecting to build almost all of my configuration within the script. That's because I'm running inside relatively sparse environments like cron.
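To make the "configuration lives in the script" idea concrete, here is a minimal Python sketch of such a task script. It assumes cron hands the script only a bare-bones environment, so the script builds its own before running anything. The names `cron_safe_env` and `run_tasks`, the default `/home/pi` home directory, and the `GATEWAY_ID` variable in the usage note are all hypothetical.

```python
import os
import subprocess
from typing import Dict, List


def cron_safe_env(extra: Dict[str, str]) -> Dict[str, str]:
    """Build an explicit environment instead of trusting what cron provides."""
    env = {
        "PATH": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "HOME": os.environ.get("HOME", "/home/pi"),  # assumed default user
        "LANG": "C.UTF-8",
    }
    env.update(extra)  # task-specific settings are declared here, in the repo
    return env


def run_tasks(tasks: List[List[str]], env: Dict[str, str]) -> List[int]:
    """Run each task with the explicit environment and collect exit codes."""
    return [subprocess.run(cmd, env=env, check=False).returncode for cmd in tasks]


# The default is an empty task list: the script exists but does nothing.
TASKS: List[List[str]] = []
```

Pushing a one-off job to the fleet is then a commit that adds an entry to `TASKS`, and the cron-triggered call is `run_tasks(TASKS, cron_safe_env({"GATEWAY_ID": "demo"}))`.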

Health of the Gateway

So I've rolled both of these (is the gateway alive, and what are its stats?) into one easy answer: Datadog. I can install the agent onto an RPi from source (see https://docs.datadoghq.com/agent/basic_agent_usage/source/?tab=agentv6 and https://docs.datadoghq.com/developers/faq/deploying-the-agent-on-raspberrypi/). The only pain in that is that I think it takes some 40-ish minutes to install the agent. Seriously, it's excruciating. From there I know that the gateway is both up and functional. There's a relatively simple monitor you can configure in Datadog to then tell you what's going on. As importantly, I can track the amount of bandwidth being used at any one time. Which is super helpful as I start to work out the operational characteristics of the platform.
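For a feel of the raw numbers behind "how much data are they sending and receiving", here is a small Python sketch that is independent of Datadog and makes no claim about how the agent collects its metrics. It assumes a Linux box, where the kernel exposes cumulative per-interface byte counters in /proc/net/dev. The function names are illustrative.

```python
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (bytes received, bytes transmitted)."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Field 0 is receive bytes, field 8 is transmit bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_net_dev(path: str = "/proc/net/dev") -> Dict[str, Tuple[int, int]]:
    """Read the live counters from the kernel (Linux only)."""
    with open(path, "r", encoding="ascii") as handle:
        return parse_net_dev(handle.read())
```

The counters are totals since boot, so bandwidth over a period is the difference between two readings divided by the time between them.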

Where next?

The short answer is if the solution holds it is time to go find something better 😄