On commenting code: external vs internal documentation

In my day job, I sometimes have to write scripts to automate things that either happen too often to be handled manually or need to be handled as quickly as possible. More often, though, I am tasked with modifying the behavior of an already existing script and/or adding new functionality. Needless to say, some scripts are way longer than they should be, and they do things that a shell-scripting language is not supposed to do (think of doing stuff on/to an XML document using only the Bash shell and tools from the coreutils package). You might think that stuff like that is crazy, and yes, it is a bit crazy, but it works. Quite frankly, I got to see some rather clever solutions and approaches to unusual shell-scripting problems, and some very creative uses of more or less well-known features of Bash.

Call it masochism, but I am quite fond of reading code.

Sometimes, though, reading scripts is not so pleasant. Shell scripting is still programming, and I see a recurring pattern of poor programming in the form of poor (or non-existent) comments in the code.

I have come to think that code should really be interspersed with two kinds of documentation, and I like to call them “external” and “internal” documentation.

External documentation is supposed to answer the following questions:

  • what does this script/function do?
  • what are, if any, its side effects?
  • what parameters are needed?
  • can you give an example of how to use this script/function?

Long story short, external documentation is supposed to help the user of your code use it correctly, particularly when calling your function for the first time. Documenting parameters is especially important: in dynamic languages like Python, the name of a parameter says very little about its type, and in Bash you don’t even have to list formal parameters: whether you pass two or five of them, they will all be available by position through variables like $1, $5 and so on.

Internal documentation on the other hand, should really address and describe the internals and the details of the implementation. We could say internal documentation answers the following questions:

  • how is the problem being approached?
  • what is the general strategy?
  • what’s the meaning/the semantics of the various magic numbers and magic strings in the code?

If you think magic strings are bad, you probably haven’t had the pleasure of writing that many shell scripts. Magic strings are things like utility-specific format strings. Think of the date utility and its many available format strings, or the ps utility (see the “STANDARD FORMAT SPECIFIERS” section in the manpage). Sometimes magic strings are just weird corners of Bash syntax.

You don’t notice magic strings when you’re writing code, because you have just looked them up and they’re obvious to you, but that might not be the case for someone else.

As an example: when formatting a timestamp using the date command, it is a good idea to add a comment with an example output of said format string. Say you are writing:

date '+%D @ %T'

This is more effective:

date '+%D @ %T'  ## 09/02/17 @ 10:20:53

And this is better commenting:

## date: '+%D @ %T' => '09/02/17 @ 10:20:53'

No need to look it up. You can immediately see what generates what.
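The same convention pays off with other utilities’ format specifiers too; here is a small sketch using ps’s etime specifier (the example value in the comment is illustrative, not real output):

```shell
## ps: -o etime= => e.g. '01-02:33:10'  (elapsed time as [[dd-]hh:]mm:ss)
ps -o etime= -p "$$"
```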


In the end, since text processing is a significant part of pretty much every system administration task, you can go very far using the standard basic tools from the coreutils package. Once you get to know them, they can be super effective and reasonably fast for most tasks.

But shell scripting is sometimes underappreciated, and this usually leads to many issues of poor documentation.


Decommissioning acso-explorer.santoro.tk


Today I stopped serving requests for the domain http://acso-explorer.santoro.tk.

Acso-explorer was a patched version of gcc-explorer, bundled with pre-built cross-toolchains of gcc-m68k, gcc-mips and gcc-amd64.

The last request had been served on October 29, 2016, yet the node.js process was still consuming 5-6% CPU, with unjustified spikes to 10-12%.

I’ll be setting up a pointer to the container image on the Docker hub, and that will be the end of acso-explorer, at least for now.

XDM, fingerprint readers and ecryptfs

How to set up PAM/XDM to work correctly with fingerprint login and an ecryptfs-encrypted filesystem

I got myself a new toy last week: a ThinkPad X200s. I brought it home for less than 60 euros, and for 24 more I got a compatible battery. It’s a nice toy, definitely slow, but I now have something light I can carry around without worrying too much.

Of course, I installed GNU/Linux on it, and to keep things simple I chose Debian with a combination of light applications: XDM as login manager, i3 as window manager, xterm as terminal emulator, GNU Emacs as editor, claws-mail as mail client. The only fat cow on this laptop is Firefox.

As I started saving passwords and stuff on its disk, I decided to encrypt my home folder and my choice for this “simple” setup is ecryptfs. It’s secure enough, simple to setup and integrates very well with other Debian stuff.

This ThinkPad also has a fingerprint reader, correctly recognized by the operating system and I enrolled one of my fingerprints. It’s very comfortable, and it allows me to “sudo su” to root even with people around without fearing shoulder-surfing.

The first login, though, still requires me to input my password, as it is needed to unwrap the ecryptfs passphrase. And here is where the problem arises: the first login is usually done via the display manager, namely XDM.

As far as I know, XDM, being an old-times thing, has no clue about fingerprint readers and such, but the underlying PAM does. And since XDM relies on PAM for authentication… it works, kinda. Basically you either swipe your finger and log in without all your files (you didn’t input your password ⇒ your passphrase was not unwrapped ⇒ your private data was not mounted in your home) or wait for the fingerprint-based login to time out and XDM to prompt you for your password.

So if you are okay with waiting 10-15 seconds before being asked for your password then you can stop reading, otherwise keep reading to see how to fix it.

Long story short, PAM in Debian (my guess is that it works pretty much the same on other distributions too) has some nicely formatted, application-specific configuration files under /etc/pam.d. One of those is /etc/pam.d/xdm, which defines how PAM should behave when interacting with XDM.

If you open it, you’ll see it actually does nothing particularly fancy: it just imports some common definitions and uses the same settings every application uses, that is: try the fingerprint first, then fall back on the password if it fails or times out.

Such behaviour is defined in /etc/pam.d/common-auth and it is just fine for all other applications. For XDM, though, it’s advisable in my opinion to ask for the password right away and not ask for a fingerprint swipe at all.

My fix for this problem is then to:

  1. open /etc/pam.d/xdm
  2. replace the line importing /etc/pam.d/common-auth with the content of such file
  3. alter the newly added content to ask right away for the password

tl;dr:

[screenshot: the resulting /etc/pam.d/xdm]
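For reference, here is a hypothetical sketch of what the altered auth section of /etc/pam.d/xdm could look like (module names and options vary between Debian releases, so adapt it to what your common-auth actually contains):

```
# Inlined from /etc/pam.d/common-auth, with the pam_fprintd.so line
# removed so XDM asks for the password immediately.
auth    [success=1 default=ignore]      pam_unix.so nullok_secure try_first_pass
auth    requisite                       pam_deny.so
auth    required                        pam_permit.so
```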

Now XDM is going to ask for the password straight away.

Learn how to use GNU info

Recently I’ve been digging a lot into GNU/Linux system administration and, as part of this, I have finally taken some time to google that mysterious info command that has been sitting in my GNU/Linux systems, unused, for years.

Well, I can tell you, it has been a life-changing experience.

Texinfo-based documentation is awesome.

In this article, I want to share why info documentation is cool and why you should read it if you haven’t already.

First, some terminology.

  • info: the command-line tool you use to read documents written in the Texinfo format.
  • Texinfo: a document format, and a software package providing tools to manipulate Texinfo documents.

Texinfo was invented by the Free Software Foundation to overcome the limitations of manpages, such as their “un-browseability” (man pages are really documents meant to be printed on paper but rendered to the terminal instead) and the lack of hyperlinking capabilities in the format.

GNU info was designed as a third level of documentation. If you take a typical program from the Free Software Foundation, you can expect it to have three levels of documentation:

  • the --help option: a quick, one- or two-screenful piece of documentation
  • the man page
  • the info manual

Try “ls --help”, “man ls” and “info ls” and see the difference.

So info documents are documents that can be divided into sections, browsed in parts, and can link to other pages of the same documentation as well as to other pieces of documentation entirely! Also, they can be viewed with different viewers.

How do I learn to use info?

Well, if the right way to learn about man is “man man”, the right way to learn about info is “info info”, and indeed that command will teach you how to use the info tool.

Basically, you can browse documents going forwards and backwards between nodes using n/] and p/[. Scrolling happens with space and backspace.

That is really the basic usage. Now go and type “info info” in your terminal.

The game-changer

As I said earlier, Texinfo documents can be viewed outside the terminal too, while retaining all of their capabilities. If you have ever read some documentation on the website of the Free Software Foundation then congrats, you have been reading a Texinfo document translated to HTML.

For me, the game changer has been reading the GNU Emacs manual (a Texinfo document) using GNU Emacs itself! The keystrokes are pretty much the same as the terminal ones, but you get variable-sized fonts and different colors for, say, hyperlinks and stuff like that.

Being able to read the Emacs manual inside Emacs is a game-changer for me because every time I don’t know something about Emacs, I can just start the manual and look it up. Clean and fast, no browser required (my CPUs are thanking me a lot).

Writing Texinfo documents

Here is where, sadly, the story takes an ugly turn.

I didn’t dig very deep into this topic, but as far as I’ve seen, Texinfo documents are a major pain to write. The syntax looks quirky, but it seems worth it, as Texinfo documents can be exported to HTML and PDF too.

There should be an org-mode plugin to export to Texinfo, but I couldn’t get it to work.

Again, I have to dig into this topic a bit more, but it seems quite worth it.

Some notes on bash shell programming

I recently began working as a GNU/Linux System Engineer at a software company. My job involves answering tickets from customers and taking action in the case of blocking issues. In doing all this, I am reading a lot of bash scripts, and I am also deepening my knowledge of the Bash shell as a programming environment.

As someone who has been toying with software engineering for a while, I am both pleased by bash shell programming and a bit displeased by how scripts are often written, mostly due to a widespread non-application of basic software engineering principles when writing shell scripts. As if bash scripts were not software like any other: software that has to last for a possibly long time and will need maintenance, possibly by different people, over that time.

In this article, I want to take some notes about problems I have faced and things I have noticed. Not all of this directly relates to my current employer, a lot of what I’ll be writing about is something that has been floating in my brain for a while.

Common sense: avoid side effects

Some operations can easily be done via “tricks” that you know thanks to an intimate knowledge of how bash works.

Real-world example: your script rotates logfiles, and after a logfile has been copied, you have to flush its content and make it a blank file again.

This is a good way to do it:


user@host $ truncate -s 0 "$FILENAME"

And here is a not-so-good way to do it:


> "$FILENAME"

The main difference between the two is that the former explicitly accomplishes the task, while the latter does the same by exploiting a side effect of the bash redirection operator. The former is only a quick “man truncate” away from being clearly understood, while the latter is quite far from easy to understand unless you already know it or have seen it before.

Common sense: use side effects

Despite what I have said earlier, there are some side-effects explicitly built into Bash in order to let scripters solve a particular kind of problem in a fast, concise and effective way.

Scenario: you are writing an init-script for something. You spawn a process, and then you are supposed to save its PID to a pidfile.

The bad way to do it:


user@host: ~ # ./my_service -a arg1 -b arg2

user@host: ~ # ps auxwhatever  | grep "my_service -a arg1" | awk '{print $3}' > PIDFILE

A good way to do it:


user@host: ~ # ./my_service -a arg1 -b arg2 &

user@host: ~ # echo $! > PIDFILE

The “good” way is good because:

  • it uses a bash feature explicitly designed for this situation
  • it’s easy to read, as it is a usual echo output-redirection scenario
  • it’s clear that $! is a special variable, so it’s clear where to look it up (just search for “special variable” in the bash manual or something like that)

On the other hand, the “bad” way:

  • combines many commands, each of them with many possible options, together
  • might be using obscure/arcane/unusual parameters you’ll have to look up in order to have a clear understanding of what the code is doing

If you expect parameters, at least check their presence

This might be obvious, but I’ve seen a number of scripts failing SILENTLY because of this. If your script or your function expects some parameters, you’d better check their presence (again, this is basic software engineering: validate input, always).

This is particularly true because parameters are positional and depending on how you traverse the parameter array (via $* instead of $@, for example), you might get different behaviours.
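The difference between the two only shows up once they are quoted; here is a quick sketch (count_args is a made-up helper for illustration):

```shell
#!/usr/bin/env bash
# count_args reports how many positional parameters it received.
count_args() { echo "$#"; }

set -- one "two words" three

count_args "$@"   # "$@" keeps each parameter as its own word: prints 3
count_args "$*"   # "$*" joins everything into one word: prints 1
```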

A basic check might be just checking that the number of parameters ($#) is correct (unless you’re writing “variadic functions” in bash script… that’s cool).
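A minimal sketch of such a check (rotate_log, its parameters and its messages are all hypothetical):

```shell
#!/usr/bin/env bash
# Fail loudly instead of silently when parameters are missing.
# rotate_log is a made-up function expecting exactly two parameters.
rotate_log() {
    if [ "$#" -ne 2 ]; then
        echo "usage: rotate_log SOURCE DEST" >&2
        return 1
    fi
    echo "rotating '$1' to '$2'"
}

rotate_log /var/log/app.log /var/log/app.log.1
```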

Protip: if you’re writing debug routines, don’t just print “\$variable=$variable”. It helps (a lot!) to know what values were expected.

A better debug code could be one of these lines:


echo "\$VAR1='$VAR1' [expected values: 'A', 'B' or 'C']"

echo "\$VAR1='$VAR1' [expected value x should be 0 < x <= 100]"

The rationale is that a debug line telling you only the current value of a variable actually tells you very little: you would have to read the whole script anyway in order to understand whether that value is okay or not. But when you see the value of a variable together with the range of values it is supposed to have, there’s no need to go further: half of the problem (finding the bug) is done.

Pay attention to quoting

Quoting is good, but pay attention to single quotes vs double quotes, and pay double attention if you’re programmatically running a script on a remote machine via ssh.

I had to deal with a nasty bug of logfiles not being correctly collected and rotated because of a line like this:


ssh $USER@$HOST 'command $ARG1 $ARG2 /a/path  $ARG3 /another/path'

What’s wrong with this code? Nothing? Yup, it’s correct. Or at least, it’s syntactically correct.

What I didn’t tell you is that one of the arguments (let’s say $ARG2) was locally supplied, while the others were variables defined on the remote machine.

Can you guess what went wrong?

I am sure you can, but in case you’re wondering: single quotes do not perform variable substitution, so the local argument ($ARG2) was not being supplied. On the other side of the ssh connection (that is, on the remote machine) all of the remaining variables were being correctly substituted, so the script was running almost correctly, except for the part that used the unbound variable.

In this case, the fix was to switch from single quotes to double quotes and escape the dollar signs of the remote variables. The script ran. (Actually, the script then ran into another bug, but that is a story for another post… oh, and by the way, we fixed that one too.)
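The effect is easy to reproduce without an actual remote machine by simulating the remote side with “bash -c” (variable names here are made up; the child shell plays the role of the remote host):

```shell
#!/usr/bin/env bash
LOCAL_ARG="from-local"

# Single quotes: nothing is expanded locally, so the child ("remote")
# shell sees a literal $LOCAL_ARG, which is unbound there.
bash -c 'REMOTE_VAR="from-remote"; echo "$REMOTE_VAR $LOCAL_ARG"'

# Double quotes: $LOCAL_ARG is expanded locally before the child runs,
# while the escaped \$REMOTE_VAR survives to be expanded remotely.
bash -c "REMOTE_VAR=\"from-remote\"; echo \"\$REMOTE_VAR $LOCAL_ARG\""
```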

For Christ’s sake, comment your scripts

This is old, everybody says that. But shell scripts are way more neglected than scripts in other languages, for some reason, despite the fact that quite a number of smart tricks are possible in shell scripts.

In my opinion, the very first five or ten lines past the shebang should describe what the script is supposed to do and what its parameters are. Better: include a synopsis, i.e. an example of how to call the script with its various parameters.

Protip: past the aforementioned five or ten lines, spend a couple more lines documenting exit codes (you’re using exit codes, right?).

The rationale behind commenting is that another person in the future (or possibly yourself, still in the future) is not supposed to have to read the whole script.
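A hypothetical header sketch along these lines (the script name, parameters and exit codes are all made up for illustration):

```shell
#!/bin/bash
## rotate_logs.sh -- copy and truncate the logfiles given as parameters.
##
## Synopsis:
##   ./rotate_logs.sh /var/log/app.log /var/log/other.log
##
## Exit codes:
##   0  all logfiles rotated
##   1  wrong number of parameters
##   2  a logfile was missing or unreadable
```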

For Christ’s sake, use meaningful variable names

i, j, k. When I see so-named variables I am sorry I don’t live in the US and can’t get a firearm and bullets at the mall down the road.

No rant here, just a hint: if you really have to use short variable names, at least write a comment saying what said variable is going to hold, especially if you’re going to reference those variables outside of the function where you define and initialize them (ah, the joys of the global scope).

Effective debug lines: some useful variables

I want to close this post with a couple of hints for writing more effective debug lines, and a pointer to proper further documentation.

First things first: useful variables.

First of all, if a workflow (a nightly cronjob or something) involves running many scripts one after another and redirecting all of their output to a single file, and if said scripts call other scripts, it can be a good idea, when writing debug lines, to state which shell script is being executed. To that end, the $0 variable holds the path the script was invoked with.

If you are writing a debug message inside a function, it is useful to know the offending line too. At any point in the script, the $LINENO variable holds the current line number, which can be a very good hint on where to look.
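Putting the two together, here is a minimal sketch of a debug helper (the function and its message are hypothetical):

```shell
#!/usr/bin/env bash
# debug stamps a message with the script path ($0) and a line number.
# $LINENO must be expanded at the call site: inside the function it
# would report the function's own line, not the caller's.
debug() {
    local line="$1"; shift
    echo "[${0}:${line}] $*" >&2
}

debug "$LINENO" "starting log rotation"
```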

Finally, I want to say that the best documentation about the bash shell, the best “book” I have stumbled upon, is the bash man page itself. It’s quite long and very exhaustive.

My suggestion is to convert it to PDF and print it (if you like to read on paper or, like me, like to use a highlighter):


man -t bash | ps2pdf - bash.pdf


Well… I think that’s all for now 😀

Building GNU Emacs from sources

I want to look at the GNU Emacs source code because I have some ideas I want to try and implement.

If you want to write patches for an open-source project, the first thing to do is to check out the latest version from the repository and make sure it compiles and runs. At this stage, you also want to make sure that all of the tests pass, so that you can start working from the verified assumption that you branched off a fully working commit.

In short, this post is a micro-tutorial on building GNU Emacs from source on a Debian-compatible (read: apt-based) system.

The first thing to do is to gather all the required build dependencies:

sudo apt-get build-dep emacs24

This alone will install all of the necessary build dependencies.

The next step is to check out the source code. You can find the URL of the repository on the Savannah page:

git clone -b master git://git.sv.gnu.org/emacs.git

At this point you might want to slow down a little and skim through the INSTALL file. It contains instructions for building.


At this point you’re almost ready for compilation. You have to run the autotools and generate the Makefile:

./autogen.sh

./autogen.sh git

And now you can generate a Makefile with all the features you want (you can look up some of the configure options in the INSTALL file, or run ./configure --help):

./configure --with-x-toolkit=gtk --with-cairo --with-modules

And now, finally, compile:

make -j9

On my machine (ThinkPad W530, quad-core i7 and 7200rpm rotational hard drive), it took about five minutes:

real	5m9.283s
user	21m21.412s
sys	0m48.284s

You will now find the newly built emacs binaries in the src/ directory.

Have fun!

I wrote a toy url shortener and people started using it

So yesterday I gave an introductory presentation on how to write web applications using Python, Flask and Dokku. In my talk I showed how to go from zero to a fully functional application with a small but functional deployment environment based on Dokku.

The nice thing I noticed is that people started using it, and now I have some URLs shortened through my system.

The application is called musho, as in Micro Url SHOrtener. Here it is live, and you can find the source code on Gitlab.

I hope you like it 🙂