Monday, 27 May 2013

GStreamer toy application template

Rationale

This time I'd like to share a piece of code I find very useful when experimenting with GStreamer. That includes writing small tests to understand particular features and debugging specific cases after distilling them from larger applications. The thing is that I don't want to start over with the common stuff whenever I want to quickly hack together a GStreamer application. This is mostly about writing the main() function and a few bits and pieces:

  • parse command line options
  • initialise GStreamer
  • create the pipeline
  • optionally add a probe
  • optionally send EOS
At the same time I want to keep it simple and not turn it into some versatile utility which would obscure the main idea. Plain and simple. I know there's the gst-template project, but as I understand it, it's more GStreamer-plugin oriented and helps with some GLib/GObject boilerplate.

The template

The goal for this "template" is to be reusable, but mostly for throw-away prototypes or tests. Still, I wanted to keep the quality as high as possible, with proper error checking etc., as in practice a lot of prototype code ends up in production.

The cool thing about it is that it reads the pipeline description from the command line, which is very useful when experimenting or debugging. It also offers sending an EOS event upon SIGINT (-e), adding a probe on a pad (-p) and verbose output (-v) if e.g. one wants to see the output of an identity element. There are a few gaps to fill in, all marked with TODO, e.g. the type of probe and the probe return value. As I said, I didn't want to over-complicate this, so I left those to be edited in the code.

Building and running

The source code is available here. This time I'm using an uninstalled version of GStreamer's master branch. On my Debian laptop I build it as follows:

libtool --mode=link \
  gcc -O2 \
  $(pkg-config --cflags --libs gstreamer-1.0) \
  gst-test-app.c \
  -o /tmp/test
And a couple of tests:
[kris@lenovo-x1 master]$ /tmp/test -e fakesrc ! fakesink
** Message: Running...
^C** Message: Handling interrupt:  forcing EOS on the pipeline

** Message: End of stream
** Message: Returned, stopping playback
[kris@lenovo-x1 master]$ /tmp/test -e -p "sink:sink" videotestsrc num-buffers=5 ! fakesink name=sink
** Message: Successfully installed probe on 'sink:sink'
** Message: Running...
Got event: stream-start

** (lt-test:27135): WARNING **: TODO: Implement me!!!
Got event: caps

** (lt-test:27135): WARNING **: TODO: Implement me!!!
Got event: segment

** (lt-test:27135): WARNING **: TODO: Implement me!!!
Got event: eos

** (lt-test:27135): WARNING **: TODO: Implement me!!!
** Message: End of stream
** Message: Returned, stopping playback
Voilà! Enjoy!


Wednesday, 8 May 2013

C14N for signed XML

Confession

Here's another, somewhat exotic subject: canonicalisation, quite often abbreviated to C14N. No, this has nothing to do with the pope and saints. This is about getting the canonical form of an XML file. Why? Well, XML is a pretty ambiguous format: there's a lot of white space (for poor human readability), no strict attribute ordering, and implicit namespaces. So what? This means that for a dumb machine two identical-looking XML files may be very different. An extreme example would be a terminating newline character, which is very difficult to spot. So what? Sometimes you want to find out if two XML files are identical. Or, even more, whether anyone has tampered with a file and whether it was sent by someone you expect when it's transferred over the internet. This all leads to a state-of-the-art, 21st-century invention: signed XML, or XMLDSig.

XML files are not perfect but some people love them. This is not going to be an overview of this format as this is a very broad topic and I'm not qualified to write such an overview. I'm just a lowly engineer. Let me just say that XML is useful sometimes. I'd even dare to say that it's inevitable to use sometimes (as a dead simple INI format would raise too many eyebrows... OK, I won't be sarcastic any more). XML is inevitable sometimes. So I convinced you and you think that signed XML is cool and you want to have one, aye? Your mates will be jealous. You may even be tempted to get your partner's name (or your pet's name if you don't have a partner or anyone or anything you're attached to) into XML form, sign it and tattoo it on your thigh. I'll leave it to you and focus only on one mysterious step toward that goal, which is canonicalisation.

Since you're still reading, you're either one of my committed readers or you're genuinely interested in XMLDSig. I won't recommend reading the XMLDSig reference or the C14N one unless you have a strong urge to wade through rather dry standards. I also won't recommend implementing anything from scratch. Check whether the xmlsec library suits you. I confess I knew about xmlsec but decided to use bare libxml2 and get my hands dirty for at least two reasons: I was scared and I wasn't sure what I was doing. I wanted to reassure myself by having as much control as possible. I hope you're in a better position.

Not very discerning dissection

Briefly, XMLDSig consists of three parts: SignedInfo, KeyInfo and Object.

SignedInfo describes how the content is signed, KeyInfo identifies the signer and Object is the signed content [1]. Since signing and verification involve computing a hash (digest) and asymmetric cryptography, care needs to be taken to sign and verify exactly the same content. Those beasts are ruthless if even a single byte has changed. To get a canonical form of XML content, some uniform rules need to be applied so that two logically equal XML files look exactly the same. Then the result can be either signed or verified.

Once you've got a glimpse of the whole process, you may want to (or someone may force you to) do it programmatically. Now you might be tempted to call it a day after finding the xmlC14NDocSave() function in the libxml2 API documentation. But before you cross this bridge, you first need to answer what your quest is. And believe it or not, this is not easy. The catch is that xmlC14NDocSave() takes the set of nodes you want it to operate on and doesn't do all the dirty work for you. You need to provide it with the right set of nodes in the right order. Here's the XPath incantation:

descendant-or-self::* | descendant-or-self::text()[normalize-space(.)] |
.//attribute::* | .//namespace::* | .//comment()
There's some uncertainty about whether to use normalize-space(.) or not and what the relative order of attributes and namespaces should be. Unfortunately this depends on who you talk to, where you received the signed XML from and where you intend to send it. For example, signed XML files in Adobe AIR packages require space normalisation while the xmlsec tool doesn't. This is the mundane and brutal reality of format incompatibilities. Beware.

Getting hands dirty

Since I got your full and utter attention and you're so excited that you probably dropped some of your late at-the-desk lunch onto your smart looking office trousers (or onto your pants if you're home alone or in the United States of America), let me show you some sample code. Full source code along with test scripts is available here. Here's the essence with mundane error checking removed here for brevity:

  // Now this is old granny's secret recipe for a delicious cheesecake.
  const xmlChar C14N_CONTENT[] =
    "descendant-or-self::* | descendant-or-self::text()[normalize-space(.)]"
    "| .//attribute::* | .//namespace::* | .//comment()";

  // Get some cheese... all sub-document content we need for canonicalisation.
  const auto sinfo =
    make_scoped (xmlXPathEvalExpression (C14N_CONTENT, ctx.get ()),
   xmlXPathFreeObject);

  // And finally... bake it!
  xmlC14NDocSave (doc.get (), sinfo->nodesetval, 0, NULL, 0, file_out, 0);
The full implementation is available here. And here's how I compile it on my Fedora 18 laptop:
g++ -O3 -std=c++11 \
  $(pkg-config --cflags --libs libxml-2.0) \
  xmldsig-c14n.cpp \
  -o /tmp/c14n-exe
And here's a Bash one-liner that takes a signed canonicalised XML on its input and verifies it (provided that you have one—see below what to do if you don't). For convenience I split it into multiple lines but essentially it's a single line command.
/tmp/c14n-exe "/default:Signature/default:SignedInfo" /tmp/sample.xml |
openssl dgst -sha1 -binary -verify <(
  printf -- \
    "-----BEGIN CERTIFICATE-----\n%s\n-----END CERTIFICATE-----\n" \
    "$(xmllint \
        --xpath "//*[local-name()=\"X509Certificate\"][1]/text()" \
        /tmp/sample.xml)" | \
  openssl x509 -pubkey -noout) \
  -signature <(
    xmllint \
      --xpath "//*[local-name()='SignatureValue'][1]/text()" \
      /tmp/sample.xml | \
    openssl base64 -d)
A more convenient script is here. If you don't have an XMLDSig file at hand, you can generate one with the script I provided.
./generate-xmldsig.sh /tmp/c14n-exe > /tmp/sample.xml
./test-c14n.sh /tmp/c14n-exe /tmp/sample.xml
That's it for now. Enjoy!

[1] Actually Object is digested and the digest is included in SignedInfo which itself is digested and signed.

Saturday, 13 April 2013

Verifying signatures with OpenSSL API

A brief introduction

This is another article where I try to tame the OpenSSL API using C++11. This time round I describe a small example showing how to verify signed data programmatically. There are many message formats catering for different needs. In this example I show how to verify that the data has not been tampered with and was sent by a party identified by a PKI certificate. Please refer to my other article to learn how to verify a certificate.

Again I want to emphasize that you should not implement this functionality yourself if you can use the openssl tool:

openssl dgst -verify test-key-pub.pem \
  -signature /tmp/signature </tmp/data
This command checks that the data stored in /tmp/data has not been tampered with. The tool calculates a checksum (a digest) and verifies it against the signature stored in /tmp/signature. The signature has been produced with the private key paired with the public key stored in test-key-pub.pem. If a certificate associated with the public key is available, it can also be verified to check that the data hasn't been signed by an intruder in the middle.

As you can see, there's no need to reinvent the wheel if your requirements are simple enough. Depending on the circumstances this approach might not be sufficient or acceptable, and only then should you come to grips with your own implementation.

Producing or verifying a signature is a rather expensive operation as it involves asymmetric cryptography. In practice a digest is produced first (e.g. using SHA1) and then the digest is signed with one of the asymmetric keys. The verification comprises applying the same digest function to the received data and checking whether the signature of that digest "matches" when using the other key of the asymmetric pair. You don't have to worry about these details though, as they are hidden behind the OpenSSL API. Hopefully this also allays concerns about the use of the openssl dgst command, which stands for "digest". The signature is simply another step in the process of digesting data.

Code

Code for this example is available here. There's also a very basic test script provided.

The main three functions we are going to use are EVP_VerifyInit_ex(), EVP_VerifyUpdate(), and EVP_VerifyFinal(). The first two of them are simply aliases (macros) of equivalent "digest" functions. Of course you shouldn't abuse them and better use the macros provided to be explicit about the intentions. Please also note that in general *_ex() versions of OpenSSL API functions are recommended if available as they are more general and allow you to use an engine. If you don't intend to use an engine simply use nullptr.

The "trinity" is a common pattern in the OpenSSL API.

First you initialise the algorithm, then there are one or more updates that feed the algorithm with data, and in the end you finalise the algorithm. The update step lets you process "streamed" data, i.e. you feed the algorithm with data as it arrives. If all the data is available at once, you can make just one update call. In many situations though you might want to process data in chunks, e.g. when you read a large file or read from a network socket.

More details about the API used in this example are available in the manual so there's no point in duplicating them here. If you're off-line and have openssl-devel (or equivalent) package installed (which you should in order to compile this example), you can also use info or man pages. Don't forget to read about EVP_MD_CTX_create() and EVP_MD_CTX_destroy().

Build and test

This is how I build the example on my Fedora 18 laptop:

g++ -std=c++11 -O3 -DNDEBUG signature-verify.cpp \
  -lcrypto -o /tmp/my-verifier
I think that the most frustrating thing about keys, certificates and all this cryptographic stuff is testing. Creating test assets (key material, certificates etc.) can be truly onerous. But this is still not as hard as testing a full production system with real cryptographic material (very often hardware assisted), so let's get on with it:
# generate test private key and associated certificate
openssl req -x509 -newkey rsa:2048 \
  -keyout test-key-priv.pem -subj "/CN=FakeSigner" \
  -passout pass:none -out test-cert.pem

# sign some test data
echo -n "test" | tee /tmp/data | \
openssl dgst -sha1 -sign test-key-priv.pem \
  -passin pass:none -out signature
And finally we can run our verifier:
# verify signed data
/tmp/my-verifier test-cert.pem /tmp/data signature
As the steps above are a bit tedious, you can use a test script I provide here. Simply give it the path to the verifier executable as an argument and that's it. It creates a temporary scratch directory where it generates the assets and runs rudimentary tests using the executable provided. As I wanted to keep it dead simple, it doesn't provide any additional options like preserving the scratch directory, setting verbosity level etc. It'll probably evolve in future incarnations once I've got examples in my repository a bit reorganised.
./signature-verify-test.sh /tmp/my-verifier

Happy verifying!


Tuesday, 26 March 2013

Verifying certificates with OpenSSL API

I'd like to allude to my previous article where I superficially described how certificates and PKI work, at least the way I understand it. This time I want it to be more tangible and hope to make it helpful to anyone grappling with a problem: how to verify certificates programmatically with the OpenSSL API.

Let me make it clear: don't use the OpenSSL API, don't write any C/C++ or any other application. Check out a ready-to-use tool, namely the openssl verify command:

[kris@lenovo-x1 tmp]$ cat c1.pem c2.pem c3.pem > untrusted.pem
[kris@lenovo-x1 tmp]$ openssl verify \
  -CAfile ca.pem \
  -untrusted untrusted.pem \
  leaf.pem 
leaf.pem: OK
This should cover most common cases. Sure, one may say this is not a production tool and, despite being written by experts (the authors of the OpenSSL package), it's hard to audit or it doesn't provide the required features from the command line. Someone may not be able to deploy the executable onto the target system due to space limitations or may not be able to execute it from their program. There are plenty of reasons why someone would want to use the OpenSSL API directly. But if you're not constrained by any of those reasons, just don't, unless you want to learn something or do it for dubious fun. And the OpenSSL API is not the most beautiful one. See this article on how botched SSL APIs make it easy to introduce vulnerabilities into applications.

I doubt you can get the ball rolling after reading the rather sloppy and incomplete documentation. Maybe it's a good starting point but sooner or later you'll be discontented. The way I learned how to use the API, and I believe it's the best way, is to read the OpenSSL source code and maybe run the harder bits under a debugger. Certificate verification in particular is implemented in apps/verify.c. A good code cross-reference tool might be handy as well, as any grep-ish approach will be hindered by a wodge of preprocessor-generated functions.

It's also interesting to see how C++11 features can be leveraged to deliver robust and nice to read code. Some developers slog through C in their applications as they feel they have to because OpenSSL is written in C. Sure, I appreciate OpenSSL itself is written in C as it's portable, but writing robust code in C is really hard and unpleasant. I wouldn't bother writing an example for this article in C at my leisure. I wouldn't even do it for money if there was no good reason for it. With C++11 on the other hand it's fun!

This is a rudimentary example and there are no classes and all that fancy stuff. I wanted to keep it dead simple. You may wonder why I bother checking all error codes and throw everywhere. Well, the reason is simple: I never know when I want to copy it (ehem, re-factor) into some utility, class, library or anything else in production code. I strongly believe that it's very important to start with a prototype (the simpler the better) to explore new areas and spot problems as soon as possible. But the sad thing is that quite often prototype code ends up in a final release, most notably because "if it works, don't touch it". I see nothing wrong in reusing prototype code that proved to work really well. But the caveat is to make it robust from the outset. So all these "small" things like checking error codes and deallocating those tiny bits and pieces may not matter now, but they might in the future. And you don't want to go through the code scouring it for these little nuisances. It's much easier to deal with the details as you write the "essential" code.

Enough preaching so let's do it, shall we? The source code is available as usual on GitHub here.

I want to declare some function return types with my beloved std::unique_ptr. As this is a very lightweight smart pointer, the pointee deleter is part of its type. This is different when compared to std::shared_ptr which uses type erasure for the deleter. As I don't care too much about the performance here, I want to do the same for my std::unique_ptr:

template<class T>
using scoped_ptr = std::unique_ptr<T, std::function<void (T *const)>>;
And I also reuse a teensy make_scoped() utility from one of the previous articles. It's worth noting that this might be considered std::unique_ptr abuse, as it just happens that objects returned by the OpenSSL API are literally pointers. Had they been some sort of handles only, I wouldn't have used std::unique_ptr here. And I named these utilities scoped_ptr and make_scoped to make them different from the proposed std::make_unique().

Before you can call X509_verify_cert(), you need to prepare a certificate store context which comprises the ultimately trusted root certificate, the untrusted intermediate certificates and the leaf certificate. This is all done in the main() function and is fairly easy to follow. Providing the actual error message is quite challenging. In this example I limited it to loading a rather succinct message from the OpenSSL library and extracting certificate expiry dates, as I've found expiry to be quite a common reason for verification failure (when dealing with a lot of generated short-term test certificates). See the formidable print_verification_failure_msg(). Basically OpenSSL provides a verification algorithm but you could provide your own. Believe me though: you don't want to do it. So here's a simplified sequence (please refer to the source code for detailed error handling):

// Let's create a certificate store for the root CA
const auto trusted =
  make_scoped (X509_STORE_new (), X509_STORE_free);

// The lookup method is owned by the store once created
// and added to the store
const auto lookup =
  X509_STORE_add_lookup (trusted.get (), X509_LOOKUP_file ());

// Load the root CA into the store
X509_LOOKUP_load_file (lookup, argv[1], X509_FILETYPE_PEM);

// Create a X509 store context required for the verification
const auto ctx =
  make_scoped (X509_STORE_CTX_new (), X509_STORE_CTX_free);

// Now our untrusted (intermediate) certificates (if any)
const auto untrusted =
  read_untrusted (argv + 2, argv + argc - 1);

// And our leaf certificate we want to verify
const auto cert =
  read_x509 (argv[argc - 1]);

// Initialize the context for the verification
X509_STORE_CTX_init (
  ctx.get (), trusted.get (), cert.get (), untrusted.get ());

// Verify!
const int result = X509_verify_cert (ctx.get ());

To build and try the example I did the following on my Fedora 18 laptop:

[kris@lenovo-x1 kriscience]$ g++ -std=c++11 -O2 -DNDEBUG \
cert-verify.cpp -lssl -lcrypto -o /tmp/test
[kris@lenovo-x1 kriscience]$ !$ ca.pem c1.pem c2.pem c3.pem leaf.pem
Verification OK
This is a very trivial example but it's a foothold for someone starting their adventure with the OpenSSL API. A more representative application would use policy verification, a time parameter and CRL checks. Maybe one day I'll present it here.

Saturday, 2 March 2013

Supporting trusted but untrusted certificates with OpenSSL

This time, for something completely different, I'll broach a somewhat intimidating area: PKI certificate chains that link back to both trusted and untrusted root certificates. That is, how to recognise different trees from quite a long way away.

For readers who know at least a little bit about the matter this might be a quick and easy recipe to solve their problem, if they understand it. For those who have no idea what I'm talking about this still might be an interesting read if they like exotic trips. Don't worry though, as I'll provide some basic introduction without going into much detail. There's already a smorgasbord of material to learn from if someone's interested.

I won't even bother providing links as a reference (except for salient ones) as I find it overwhelming and distracting when a very specific story is sprinkled with lots of links for everything. Sure, one may ignore them (so I'd waste my time providing them) but equally one may look something up on their own if they have an urge to do so (whatever you look for, if you need authoritative information, always reach for the RFCs). This is what I find to be a pragmatic approach, which I think is different from a purely scientific one.

A bit of introduction

PKI stands for Public Key Infrastructure and is a monster which a lot of people find hard to get on with. For this article it's only important to know that it provides means of distributing some cryptographic material referred to as public key in a trusted manner.

What is a public key, you may ask, and why is it public? Some smart guys in the past combined a bunch of mathematical operations with huge random numbers and came up with something known as asymmetric keys. Keys in turn are specially crafted strings of digits used as an input for ciphers. So you have: data + key -> cipher => blob (a `cipher' is a function that eats a `data' and a `key' and spews out a `blob'). Asymmetric means that there's a pair of keys related to each other through some mathematical operations: you can use one to encipher data and the second one to decipher it, and the other way around. If you keep one of them secret and give the other one to everyone else, you have a public-private key pair.

Publishing keys

When you give out your public key, someone may want to use it to encipher something that is only intended for you (some secret) as you are supposed to use your private key to decipher it. The problem is that by simply publishing your public key there's no guarantee that it's actually your key. No one can really trust it to encipher secrets they want to share only with you. Here is where certificates come into play.

Certificates leverage at least two properties of asymmetric ciphers: authenticity and integrity of data. Again, using some smart mathematical operations and having one of the asymmetric keys, one can tell that the data has not been tampered with and has been ciphered with the other key of the same pair. This process is known as signing and verification and the data exchanged between parties is called a signature.

Certificates are structured bits of information where the essential part is a signature of one party's (A) public key created with a more trusted party's (B) private key. Such signature in turn can be verified with a more trusted party's (B) public key which again can be signed by a more trusted party's (C) private key (another certificate). This forms a chain of trust (chain of certificates) which ends up at the top with a root certificate.

Root certificates

Root certificates are special as there's no one who would sign them. They are in fact self-signed. So how does it all make sense? At the very top is a human. It might be you or a system administrator (some people claim they are not humans). No matter how complex the processes and policies employed are, it's always a human being that makes the ultimate decision: I trust something or not.

Root certificate bundles

A system that is supposed to use PKI needs a set of trusted certificates, also known as a root CA certificate store (CA stands for Certificate Authority) or a root CA certificate bundle (CA bundle for short). Whatever chain of certificates it needs to verify, it expects that the top-level untrusted certificate in that chain is signed with one of the root certificates stored in the root CA bundle, which it ultimately trusts. As a PC user you usually get a CA bundle onto your system along with a web browser so that you can use the HTTPS protocol. You might have been prompted by a browser with some sort of security exception pop-up when visiting an HTTPS website whose authenticity couldn't be verified by the browser based on the CA bundle installed (actually the certificate chain sent to your browser as part of the TLS protocol could not be verified or, for example, the website domain name did not match the one signed in the leaf certificate). In such a situation the browser might leave the decision whether to trust the website up to you.

Summary

This short introduction just scratched the surface. It's not even the tip of the iceberg. There's a lot more to talk about, like on what basis a human being can make a decision to trust an authority (or simply someone else's public key), how to ensure key privacy, what the different key usages are and how they are ensured and enforced, cipher suite properties etc. Firstly, I'm not an expert nor a scientist, and secondly, it's not directly related to the rest of this article.

What is this article actually about?

Given the introduction above or the knowledge you might have already had and a CA bundle, you might face a situation when you need to verify a certificate chain as follows: A signed by B signed by C signed by D but C is in your CA bundle (your system trusts it ultimately) and D is not (your system doesn't trust it). I'm pretty sure it happened at least once in everyone's life, even in my dog's life which I actually don't have.

You may wonder how it's possible to have certificate C in the bundle (self-signed) and at the same time have it signed with certificate D. Here's where I'd like to refer to some external resource where it's nicely depicted. Note that some people may burble something here about cross-signing, but it's not specific to this situation as cross-signing simply refers to anything other than self-signing. Don't get confused by the diagram on the website I referred to. The article there says that one certificate is signed by two other certificates. But it's not a certificate that is signed, nor does any certificate sign another one. It's keys that actually get signed. So by saying that certificate B signs certificate A, one (hopefully) means that the public key represented by certificate A is signed by the private key corresponding to the public key represented by certificate B.

What's the catch?

The catch is that if you use OpenSSL, you may not be able to verify a chain which contains a root certificate included in the system CA bundle but signed in that chain by some other non-trusted root certificate. This is not what is expected in many situations. Fear not my friend, as you're not left alone. OpenSSL has a solution for this situation: the X509_V_FLAG_TRUSTED_FIRST verification flag. It tells OpenSSL not to follow the chain any further once a trusted certificate is found.

Now the problem is that it's not available in all OpenSSL versions. The way I understand OpenSSL releases is that at the time of this writing there are three "production" branches available: 0.9.8x, 1.0.0x and 1.0.1x. You're very likely using one of them. None of them supports X509_V_FLAG_TRUSTED_FIRST though. All you need to do is to apply the following patch from the OpenSSL mainline. Now given a naughty certificate chain in chain.pem and a CA bundle in ca-bundle.crt you may try:

[kris@lenovo-x1 kriscience]$ openssl verify \
-CAfile ca-bundle.crt -untrusted chain.pem chain.pem

chain.pem: C = US, O = "VeriSign, Inc.", OU = VeriSign Trust Network, OU = "(c) \
2006 VeriSign, Inc. - For authorized use only", CN = VeriSign Class 3 Public \
Primary Certification Authority - G5
error 20 at 2 depth lookup:unable to get local issuer certificate
but if you use a new -trusted_first option, it should succeed:
[kris@lenovo-x1 kriscience]$ openssl verify \
-CAfile ca-bundle.crt -trusted_first -untrusted chain.pem chain.pem
chain.pem: OK
Now all you need to do is to convince your client application to use the X509_V_FLAG_TRUSTED_FIRST flag. For example if you are using libcurl, you may want to apply this patch:
--- a/lib/ssluse.c
+++ b/lib/ssluse.c
@@ -1651,6 +1651,26 @@ ossl_connect_step1(struct connectdata *conn,
           data->set.str[STRING_SSL_CRLFILE]: "none");
   }
 
+  if (1) { // artificial scope to keep the patch local
+    X509_VERIFY_PARAM *x509_param = X509_VERIFY_PARAM_new();
+      if (!x509_param) {
+          failf(data,"failed to create X509 verification parameter");
+          return CURLE_OUT_OF_MEMORY;
+      }
+
+      if (!X509_VERIFY_PARAM_set_flags(x509_param, X509_V_FLAG_TRUSTED_FIRST)) {
+          failf(data,"failed to set X509 flag - trusted certificate first");
+          return CURLE_PEER_FAILED_VERIFICATION;
+      }
+
+      if (!SSL_CTX_set1_param(connssl->ctx, x509_param)) {
+          failf(data,"failed to set X509 trusted certificate first parameter");
+          return CURLE_PEER_FAILED_VERIFICATION;
+      }
+
+      X509_VERIFY_PARAM_free(x509_param);
+  }
+  
   /* SSL always tries to verify the peer, this only says whether it should
    * fail to connect if the verification fails, or if it should continue
    * anyway. In the latter case the result of the verification is checked with

Saturday, 2 February 2013

Handling POSIX signals

When programming for *nix systems, like it or not, sooner or later your application will be bothered with POSIX signals. And even if signals are considered to be a broken design, you have to live with them, hammered with some trepidation. And even if you've tinkered with it and managed to install some handlers, you can't really do much in a signal handler context. You can merely set some flag and get out of there if you don't want to get into trouble. And if you have to whack some threads in, you feel completely screwed as there's no certainty whatsoever of which thread would receive signals. A bit intimidating, isn't it?

There's an attempt to remedy this creepy situation with sigwait(3) and there's a nice article about it. So fear no more, you're not stranded as there's even a better solution: signalfd(2). It gives you a file descriptor so you can choose how to handle signals—be it blocking read(2) in a separate thread, some sort of poll(2) which nicely integrates with polling loops (GMainLoop, DBusWatch etc.) or anything else you'd like to do with a file descriptor.

The problem

Before I show an example of one approach to signal handling, I'd like to elaborate a bit more on problems you may encounter with "traditional" signal handling. It'll be easier for me to use a multi-threaded application example although single-threaded applications are also negatively affected but in a more subtle manner. Feels a bit contrived but I've seen it in the field.

Let's suppose someone wants to use a thread synchronization mechanism (e.g. a mutex) in a signal handler. Signals are software interrupts: a handler runs on top of whatever the interrupted thread was doing, and that thread does not resume until the handler returns. If someone attempts to protect some shared (global) data with thread-based locks, it can lead to a deadlock:

#include <mutex>

int globalData;
std::mutex globalMutex;

void foo()
{
  // ... some stuff

  {
    std::lock_guard<std::mutex> lock(globalMutex);
    globalData = 1; // deal with the global data
  }

  // ... some other stuff
}

void handler(int /* signo */)
{
  std::lock_guard<std::mutex> lock(globalMutex);
  if (globalData == 0) // deal with the global data
  {
    // ... something miserable
  } else {
    // ... something funny
  }
}
It's conceivable that foo() grabs the mutex right before the process receives a signal. If the signal is delivered to that very thread, handler() runs on top of foo() and tries to take the mutex that foo() already holds. It never succeeds: foo() cannot continue and release the mutex until the handler returns, so the result is a deadlock. Completely botched.

On the other hand, we cannot leave global data unprotected since there's no guarantee that the data can be accessed atomically. We could use atomics here but some architectures still use gUSA to provide atomicity which might complicate things in the context of a signal handler and in general the resource you want to protect might not be as simple as a single atomic structure.
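For reference, the one thing a traditional handler can do safely is what was mentioned at the start: set a flag and get out. Below is a minimal sketch of that pattern. The names g_pending_signal, on_signal, install_flag_handler and take_pending_signal are made up for this illustration and do not come from the example application.

```cpp
#include <cassert>
#include <signal.h>

namespace {
// volatile sig_atomic_t is the object type a handler may portably write to
volatile sig_atomic_t g_pending_signal = 0;
}

// The handler itself: no locks, no allocation, no I/O, just a flag.
extern "C" void on_signal(int signo) {
  g_pending_signal = signo;
}

// Installs on_signal for the given signal; returns false on failure.
bool install_flag_handler(const int signo) {
  assert(signo > 0);
  struct sigaction sa = {};
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  return sigaction(signo, &sa, nullptr) == 0;
}

// Called from normal (non-handler) context, e.g. once per main loop iteration.
// Returns the last signal seen, or 0 if there was none.
// Simplification: a signal arriving between the read and the reset is lost.
int take_pending_signal() {
  const int signo = g_pending_signal;
  g_pending_signal = 0;
  return signo;
}
```

Everything interesting then happens in the code that polls take_pending_signal(), where mutexes are allowed again. The price is that the application has to come round and check the flag often enough.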

The solution

I chose to civilize signal handling by making it thread-friendly (I can use mutexes and what not). The signal handler runs in a dedicated thread but there are other ways of doing it as the signal delivery is serialized by signalfd(2). You may even use it in a single-threaded application where you will probably have some sort of a main loop and check for any signals delivered to the application process at your leisure.

The example application is available here. Below I present only the main idea.

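#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

#include <mutex>
#include <thread>

// protect some shared resource, let it be the standard output
std::mutex g_mutex;

void signal_handler(const int fd) {
  struct ::signalfd_siginfo si;
  
  // every successful read(2) yields one complete structure;
  // stop on an error or a short read instead of spinning
  while (::read(fd, &si, sizeof(si)) == static_cast<ssize_t>(sizeof(si))) {
    std::lock_guard<std::mutex> lock(g_mutex);
    
    switch (si.ssi_signo) {
    // Handle signals here (return on e.g. SIGTERM to end the thread)
    }
  }
}

int main() {
  ::sigset_t mask;
  ::sigfillset(&mask);
  // block all signals
  ::pthread_sigmask(SIG_BLOCK, &mask, NULL);

  // keep the signals blocked but accept them through a file descriptor
  const int fd = ::signalfd(-1, &mask, 0);
  
  std::thread t(signal_handler, fd);
  
  // do some other stuff

  // a joinable std::thread must be joined (or detached) before destruction
  t.join();
}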
First it blocks all signals and then hands the same mask to signalfd(2), so the signals stay blocked for normal delivery and are accepted through the file descriptor instead. The signals do not affect the application (except for SIGKILL and SIGSTOP, which can be neither blocked nor caught) until the application reads from the descriptor and decides what to do about them. Nice and elegant.
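The single-threaded variant mentioned earlier can be sketched with poll(2). This is only an illustration under a few assumptions: the helper names make_signal_fd and poll_signal are mine, and sigprocmask(2) is used because the sketch has no threads (a threaded program would stick to pthread_sigmask(3) as above).

```cpp
#include <cassert>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

// Blocks the listed signals and returns a descriptor that reports them
// (-1 on error).
int make_signal_fd(const int* const signals, const int count) {
  sigset_t mask;
  sigemptyset(&mask);
  for (int i = 0; i < count; ++i) {
    sigaddset(&mask, signals[i]);
  }
  if (sigprocmask(SIG_BLOCK, &mask, nullptr) != 0) {
    return -1;
  }
  return signalfd(-1, &mask, SFD_CLOEXEC);
}

// One iteration of a polling loop: waits up to timeout_ms milliseconds.
// Returns the signal number, 0 on timeout and -1 on error.
int poll_signal(const int fd, const int timeout_ms) {
  assert(fd >= 0);
  pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  const int ready = poll(&pfd, 1, timeout_ms);
  if (ready <= 0) {
    return ready; // 0: timeout, -1: error
  }

  signalfd_siginfo si;
  if (read(fd, &si, sizeof(si)) != static_cast<ssize_t>(sizeof(si))) {
    return -1;
  }
  return static_cast<int>(si.ssi_signo);
}
```

In a real main loop the descriptor would simply be one more entry in the pollfd array next to sockets and pipes, which is what makes this approach blend in so well.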

And here's an example session on an x86 Linux PC with GCC 4.7.2.

$ g++ -O3 -std=c++11 signal-handler.cpp -pthread -o /tmp/test
$ !$ &
[1] 4135
$ kill -SIGUSR1 %
Received signal 10
$ kill -SIGUSR2 %
Received signal 12
$ kill -SIGTERM %
Quitting application...

Bear in mind that any approach to signal handling assumes you have control over the main thread of the application. If you are writing a library that is called from a main application you can't control, then you're at the mercy of its way of handling signals and of the other libraries it uses. This is a difficult situation and frankly there's no robust solution to it. Every new thread inherits the signal mask of the thread that created it, so if the creator has signals unblocked, the new thread starts with them unblocked too and stays that way until it blocks them with a pthread_sigmask(3) call. Until that point new threads may still receive unsolicited signals.

With the caveat above in mind, I wish you the best of luck with signal handling. It doesn't have to be difficult. Just try to push the traditional way of doing it toward oblivion and embrace signalfd(2) as soon as you can.

Saturday, 19 January 2013

Sneaky resources

What's the fuss all about?

Every so often an application has to deal with some sort of resources. Plenty of books and articles have been written about it and they are all useful. They even attempt to go beyond managing memory as this is only one of many resources. The resources often mentioned are network connections, database connections, file system objects and the like. Not only does the application have to release them when it's done with them, it also has to ensure that they are not held when errors occur and a different execution path is taken.

In C++ error conditions are very often signalled by throwing an exception which automatically unwinds the call stack. This is very handy if one knows how to leverage it. For the unwary this could be pernicious though. In some ways C is more convenient in this regard as no exception can be thrown when a C function is called. The real problem I often see in code is when both worlds have to coexist:

  • C++ -> C call
    The most likely scenario is when C++ code calls some C functions that return some sort of handles (very often pointers) to some resources (or simply memory) which are expected to be released with another (complementary) C function call.
  • C -> C++ call
    The other scenario is when C code calls some C++ function (quite often some sort of C++ callback function registered with a C API). Care needs to be taken not to let any exceptions escape back into C as this causes undefined behaviour. If you're lucky the program aborts immediately, otherwise you have to gape at meaningless call stacks and waste some time to figure out what happened. In this situation (if you know that C calls C++) a good idea is to set a breakpoint on the runtime routine that is responsible for raising exceptions (for example __cxa_throw in libstdc++) and laboriously debug the program to find which exception gets into C.
In this article I want to focus on the C++ calling C scenario.
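Before moving on, here is a brief sketch of how the C -> C++ direction is usually defused: the callback handed to C is a thin trampoline that catches everything and turns it into an error code. The callback signature c_callback_t and the routine c_api_invoke are hypothetical stand-ins for a real C API, and the error codes are arbitrary.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical C-style callback type: returns 0 on success.
extern "C" typedef int (*c_callback_t)(void* user_data);

// Stand-in for a C library routine that calls back into user code.
extern "C" int c_api_invoke(c_callback_t callback, void* user_data) {
  return callback(user_data);
}

// The function actually registered with the C API. It expects user_data to
// point at a std::function<void()> and never lets an exception escape.
extern "C" int trampoline(void* user_data) {
  assert(user_data != nullptr);
  try {
    (*static_cast<std::function<void()>*>(user_data))();
    return 0;
  } catch (const std::exception&) {
    return -1; // a standard exception was stopped at the boundary
  } catch (...) {
    return -2; // anything else was stopped too
  }
}
```

A caller would pass the address of a std::function<void()> as user data, for example one that throws std::runtime_error, and receive -1 from c_api_invoke() instead of an exception travelling through C frames.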

Most resources are released by the operating system when the application exits or terminates. So network connections, file descriptors and all sorts of sockets are closed, memory is released etc. There are some less obvious resources though, and neglecting them leaves undeleted temporary files behind, unreleased IPC resources or (for the paranoid) unwiped memory (RSA encryption/decryption). I admit I haven't ever attempted to deal with application recovery and I'm usually happy if the application at least reports some error message and simply exits. But in the face of less obvious resource leaks, and simply for diligence, I like to have all toys tidied up nicely no matter what bad course the application has taken. This does not include an immediate abort, which is really horrible and there's not much one can do about it. Even if releasing resources on application exit doesn't matter now, some bits of code or the entire application may get reused somewhere where it matters.

Some unwary programmer may write something like this:

void
foo()
{
  char *const p = get_some_c_string();
  std::string s(p);
  // do some fancy stuff with the string
  free(p);
}
Now this is a typical scenario when talking about exceptions. In this situation a lot of operations with std::string (including its creation) can throw. If one of them does, the memory returned by get_some_c_string() leaks because free() is never reached. I won't dwell on it as a lot has been said about it elsewhere and it usually ends up with using some sort of smart pointer.
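To make the leak and its cure tangible, here is a small sketch of the same function with the pointer owned by a std::unique_ptr from the very first line. get_some_c_string() and release_c_string() are hypothetical stand-ins for a C API, and the counter g_live_strings exists only so that the effect can be observed.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <stdexcept>
#include <string>

int g_live_strings = 0; // C strings handed out and not yet released

// Hypothetical stand-ins for a C API that hands out malloc-ed strings.
char* get_some_c_string() {
  static const char text[] = "hello";
  char* const p = static_cast<char*>(std::malloc(sizeof(text)));
  if (p) {
    std::memcpy(p, text, sizeof(text));
    ++g_live_strings;
  }
  return p;
}

void release_c_string(char* const p) {
  assert(p && g_live_strings > 0);
  --g_live_strings;
  std::free(p);
}

// Same shape as foo(), but an exception can no longer skip the release.
std::string foo_safe(const bool fail) {
  const std::unique_ptr<char, void (*)(char*)> p(get_some_c_string(),
                                                 release_c_string);
  if (!p) {
    throw std::bad_alloc();
  }
  std::string s(p.get());
  if (fail) {
    throw std::runtime_error("simulated failure"); // p is still released
  }
  return s;
}
```

Whether foo_safe() returns normally or throws, the counter goes back to zero because the deleter runs during stack unwinding.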

What to do? What to do?

Let me introduce some small utility function that creates a scoped pointer that draws upon std::unique_ptr from C++11.
template<class T, class D = std::default_delete<T>>
// no constexpr: std::unique_ptr is not a literal type in C++11
std::unique_ptr<T, D>
make_scoped(T *const ptr, D deleter = D())
{
  return std::unique_ptr<T, D>(ptr, deleter);
}
Now let's assume that one wants to verify a certificate using OpenSSL. This task itself could probably make a lot of people nervous not to mention they would have to manage related resources properly. Now let's focus on adding some policies to the verification process:
void
addPolicy(X509_VERIFY_PARAM *const params, const std::string& policy)
{
  auto policyObj =
    make_scoped(OBJ_txt2obj(policy.c_str(), 1), ASN1_OBJECT_free);
    
  if (!policyObj) {
    throw std::runtime_error("Cannot create policy object");
  }

  if (!X509_VERIFY_PARAM_add0_policy(params, policyObj.get())) {
    throw std::runtime_error("Cannot add policy");
  }

  // committed
  (void)policyObj.release();
}
An interesting use case here is a "transactional" approach. Note that OpenSSL functions that have 0 in their name take ownership of the objects passed to them. This is explained in the notes section here but it may appear to you to be the opposite if you read it for the first time. The addPolicy() function creates a managed policy object and does not hesitate to throw if something goes wrong. But once we have successfully passed the ownership of the policy object to another object, we can give up its ownership in our scope (remember that C functions do not throw) and the "transaction" is "committed". Nice and easy. Then some part of the initialization of the verification process could look like this:
const auto params =
  make_scoped(X509_VERIFY_PARAM_new(), X509_VERIFY_PARAM_free);
  
(void)X509_VERIFY_PARAM_set_flags(
  params.get(), X509_V_FLAG_POLICY_CHECK | X509_V_FLAG_EXPLICIT_POLICY);
(void)X509_VERIFY_PARAM_clear_flags(
  params.get(), X509_V_FLAG_INHIBIT_ANY | X509_V_FLAG_INHIBIT_MAP);
  
addPolicy(params.get(), "1.2.3.4");

// set up other stuff and do the verification
// ...
This is even nicer. It's bliss and harmony.

I mentioned that letting the program exit without releasing memory shouldn't do any harm as the operating system would reclaim it anyway. But if you're paranoid and deal with some secrets in memory, then this becomes an issue. To address this problem just do the following:

void
someRsaStuff()
{
  auto rsa = make_scoped(RSA_new(), RSA_free);
  // do some stuff with the RSA key
  // ...
}
Note that the RSA_free() function erases the memory.
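For secrets that live in your own buffers rather than in OpenSSL objects, the same idea can be sketched with a deleter that wipes before it releases. All names below (secure_wipe, WipingDeleter, SecretBuffer, make_secret_buffer) are made up for this illustration. Writing through a volatile pointer is a common idiom to discourage the compiler from dropping the stores, not a guarantee of the standard; dedicated routines such as OPENSSL_cleanse() exist for production use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Overwrites the buffer with zeros through a volatile pointer.
void secure_wipe(unsigned char* const data, const std::size_t size) {
  assert(data != nullptr || size == 0);
  volatile unsigned char* const p = data;
  for (std::size_t i = 0; i < size; ++i) {
    p[i] = 0;
  }
}

// Deleter that remembers the buffer size, wipes, then releases.
struct WipingDeleter {
  std::size_t size;

  void operator()(unsigned char* const p) const {
    secure_wipe(p, size);
    delete[] p;
  }
};

typedef std::unique_ptr<unsigned char[], WipingDeleter> SecretBuffer;

// Allocates a zero-initialized buffer that is wiped when it goes out of scope.
SecretBuffer make_secret_buffer(const std::size_t size) {
  const WipingDeleter deleter = { size };
  return SecretBuffer(new unsigned char[size](), deleter);
}
```

The buffer is then used like any other scoped pointer: key material goes in, and whichever path leaves the scope, the bytes are overwritten before the memory returns to the allocator.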

It's also convenient to do some more complicated things with lambdas which in pre-C++11 would have to be wrapped into a functor:

const auto untrustedCerts =
  make_scoped(
    sk_X509_new_null(),
    [](STACK_OF(X509)* const s) { sk_X509_pop_free(s, X509_free); });
// now add untrusted certificates to the stack
// ...

No worries

Using some C++11 goodies can help you manage different sorts of resources and focus on the problem rather than worrying about releasing stuff, especially in error paths. It also gives confidence and clarity about what is released where and how. In some scenarios you can also use the transactional approach.

Source code for the examples above is available here. On my Fedora 17 I built it as follows:

g++ -std=c++11 scoped-ptr-example.cpp -lcrypto -o /tmp/scoped
For those who can't afford using C++11 but can use Boost libraries, the following could be a replacement for a scoped pointer with a custom deleter:
#include <boost/function.hpp>
#include <boost/interprocess/smart_ptr/unique_ptr.hpp>

namespace detail {

template<class T>
class unique_ptr_deleter
{
public:
    template<class D>
    unique_ptr_deleter(const D& d) : deleter(d) {}

    void operator()(T* const p) throw() { deleter(p); }

private:
    const boost::function<void (T* const)> deleter;
};

} // namespace detail

template<class T>
struct unique_ptr
{
    typedef boost::interprocess::unique_ptr<
      T, detail::unique_ptr_deleter<T> > type;
};
This can be used as follows:
const unique_ptr<RSA>::type rsa(RSA_new(), RSA_free);

Saturday, 12 January 2013

Debugging threaded application crash

The story

Some time ago I had a nasty issue of an application crashing due to heap corruption. Quite quickly I discovered it was related to the intensively multi-threaded nature of the test case that reproduced the crash that happened in the field. The call stacks used to end up somewhere in the malloc()/free() pair. I say they used to, as there were different ways it was crashing (different call stacks in the back trace). It was another reason to think that multiple threads were doing harm to each other. To make this story more exciting (or painful to me at that time), the problem couldn't be reproduced on an x86 PC. Only some specific timing conditions on the MIPSEL system could make the crash happen.

The malloc()/free() functions are thread safe (even in uClibc). I proved it to myself by looking at the back traces showing some internal locking functions called from within malloc()/free() (obviously this could also be inferred from the code). Once the memory is malloc-ed in one thread, there's no harm the other thread can do as there's no way malloc() can have internal structures corrupted from the other thread. The free() function is a different story. Although it is not possible to free memory simultaneously (and corrupt allocator structures that way), it is possible to free the same block twice, once per thread. This was my theory from day one.

Tools

The first thing to do is to throw some ready-to-use tools at the problem. Ideally I'd have looked at some reverse debugging tool, had I known at that time they existed for real. Recently someone I worked with referred me to UndoDB which I'd have at least evaluated, had I known about it and if it were available for MIPSEL (it's not as of the time of this writing). Most memory debugging tools have receded, having their functionality taken over by Valgrind. As I was (un)lucky to observe the problem only on the MIPSEL platform, there was no chance to use it as it does not support MIPS(EL) and is not likely to support it in the near future. Another one is efence which is excellent in detecting memory corruption but it uses memory very intensively (every malloc() call results in at least two pages allocated due to the way it works). In my particular case the system was running out of memory when using efence.

It was time to get a grip and write my own rough and ready tool.

Let's crack on with it

The idea was to interpose malloc()/free() and their friends. Once they are interposed, I could put my own housekeeping information in allocated blocks to help analyse the cause of the crash. And even more, I could detect some conditions, e.g. double free() calls (thanks to Tish for the inspiration).

First of all I wanted to have some counter incremented on every free() call and once the non-zero value was detected I wanted it to abort immediately so I could analyse the state and the context. It was also useful to store some information (return address, thread ID) about the caller that malloc-ed the memory chunk and the caller that free-ed it.

To make it easier to find my housekeeping information when examining memory, I wanted to put some magic and easy-to-spot values there. They would also serve as execution fences, i.e. with the LSB set on a value interpreted as a function pointer, the call on many platforms would result in SIGBUS.
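The two ideas above, a marker that is easy to spot and a counter of free() calls, can be tried out in isolation before any interposing is involved. The sketch below uses a cut-down, hypothetical header (BlockHeader and the helper names are mine, not the real MallocInfo) and the same GCC builtin for the atomic increment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::uint32_t kFrontMagic = 0xaaaaaaaau; // easy to spot in a memory dump

// Hypothetical, cut-down header placed in front of an allocated block.
struct BlockHeader {
  std::uint32_t magic[4];
  std::size_t size;   // size requested by the caller
  unsigned freeCnt;   // number of free() calls seen for this block
};

void init_header(BlockHeader* const h, const std::size_t size) {
  assert(h != nullptr);
  for (int i = 0; i < 4; ++i) {
    h->magic[i] = kFrontMagic;
  }
  h->size = size;
  h->freeCnt = 0u;
}

// Returns true as long as nobody has scribbled over the marker.
bool header_intact(const BlockHeader* const h) {
  for (int i = 0; i < 4; ++i) {
    if (h->magic[i] != kFrontMagic) {
      return false;
    }
  }
  return true;
}

// Atomically counts a release: true for the first one, false for any repeat.
bool mark_freed(BlockHeader* const h) {
  return __sync_fetch_and_add(&h->freeCnt, 1u) == 0u;
}
```

Because the increment is atomic, two threads racing to release the same block cannot both see zero, so exactly one of them is told it came second.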

Implementation

I got to the final implementation in a few iterations after making some mistakes. At least it might save someone else's time, although getting through all these mistakes was enriching. The implementation is (intended to be) thread safe. Note that a naive use of mutexes here causes a lot of problems (atomic operations are used directly instead). Similarly calling printf()-like functions is not easy/possible as they call malloc()/free().

The intention of this implementation is to detect double free() call under the control of the debugger (similar to the way efence works). It is MIPSEL specific but it should be easy to adapt for other platforms. It's available here. Below I only present the gist of it.

template<class Fn>
Fn getNextFunction(char const* const name) {
  ::dlerror();
  if (void* const sym = ::dlsym(RTLD_NEXT, name)) {
    return reinterpret_cast<Fn>(sym);
  }
  
  ABORT_HERE;
}

void* malloc(size_t size) {
  size_t ra;
  asm volatile("move %0, $ra" : "=r" (ra)); // get the return address

  typedef void*(*fn_malloc_t)(size_t);
  static fn_malloc_t fn_malloc = getNextFunction<fn_malloc_t>("malloc");

  char* const ptr =
    static_cast<char*>(fn_malloc(sizeof(MallocInfo) +
    size +
    sizeof(MallocInfoBack)));

  if (!ptr) {
    return ptr;
  }

  const MallocInfo info = {
    { MAGIC1, MAGIC1, MAGIC1, MAGIC1 },
    ra, pthread_self(), 0u, 0u, size, 0u,
    { MAGIC2, MAGIC2, MAGIC2, MAGIC2 }
    };

  const MallocInfoBack infoBack = {
    { MAGIC3, MAGIC3, MAGIC3, MAGIC3 },
    pthread_self(), size,
    { MAGIC4, MAGIC4, MAGIC4, MAGIC4 }
  };

  *reinterpret_cast<MallocInfo*>(ptr) = info;
  *reinterpret_cast<MallocInfoBack*>(ptr + sizeof(MallocInfo) + size) =
    infoBack;

  return ptr + sizeof(MallocInfo);
}

void free(void* ptr) {
  size_t ra;
  asm volatile("move %0, $ra" : "=r" (ra)); // get the return address
  typedef void (*fn_free_t)(void*);
  static fn_free_t fn_free = getNextFunction<fn_free_t>("free");

  if (ptr) {
    ptr = static_cast<char*>(ptr) - sizeof(MallocInfo);
    MallocInfo* const mi = reinterpret_cast<MallocInfo*>(ptr);

    if (0 != __sync_fetch_and_add(&mi->freeCnt, 1)) {
      // this is it - someone already freed the memory

      // now these two bits of information are preserved in the global variables
      // as the compiler may (re)use registers and stack heavily and it's easier
      // to find out in the disassembly where these values are stored when they
      // are assigned to global variables
      gRaFree = mi->raFree;
      gTidTerminator = mi->tidTerminator;

      ABORT_HERE;
    }

    mi->raFree = ra;
    mi->tidTerminator = pthread_self();
  }

  fn_free(ptr);
}
To compile/build:
mipsel-linux-g++ \
  -shared -fPIC -O2 -o libkrismalloc.so \
  kris-malloc-mipsel.cpp -pthread -ldl
Now you can LD_PRELOAD it after transferring it onto the target system:
gdb \
  -ex "set exec-wrapper env LD_PRELOAD=/kris/libkrismalloc.so" \
  --args <app> <args>
You can also compile it into an object file and link it statically into your executable, although it's slightly less flexible.

Usage

Basically you need a bit of luck if the problem is strongly related to thread race conditions. You can make yourself luckier by reducing the test case to the very minimum and making it more thread intensive if possible. This will let you get to the crash point sooner so you can repeat attempts more often.

On my MIPSEL system I got something like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x42361520 (LWP 28702)]
0x2aac09d0 in free () from /kris/libkrismalloc.so
Lucky me. I did x/i $pc and got:
0x2aac09d0 <free+196>:  sw  zero,0(zero)
I was very lucky. I could examine the housekeeping information in the allocated block and it's still not clobbered.

How to examine the preserved information? The argument passed to free() should be in the $a0 register according to the MIPS(EL) calling convention. If you compile the interposing library with -O2, which is useful to keep the speed and to not affect the race conditions between threads, then you may be a wee bit less lucky. The $a0 register might get reused for something else and you need to look around in the disassembly for the register that $a0 is copied to (it has to be copied if $a0 gets reused). In my case it used to be the $s0 register, with the additional bonus of the memory block pointer already being offset to the housekeeping information. Hence I was examining it like this (I marked some elements with (1), (2), ... to comment on them later on):

(gdb) x/20xw $s0
0xbd96c8: (1) 0xaaaaaaaa     0xaaaaaaaa     0xaaaaaaaa     0xaaaaaaaa
0xbd96d8: (2) 0x2b566d84 (3) 0x2cf21520 (4) 0x2b6ae248 (5) 0x2cf21520
0xbd96e8: (6) 0x00000088 (7) 0x00000002 (8) 0xbbbbbbbb     0xbbbbbbbb
0xbd96f8:     0xbbbbbbbb     0xbbbbbbbb     0x00000000     0x00000000
0xbd9708:     0x0025002f     0x00450037     0x006c0066     0x00730061

##
# get the name of the function that originally freed the memory
# (note that I have release build with debug information)
#

(gdb) x/i 0x2b6ae248
0x2b6ae248 <operator delete(void *, unsigned int, const char *, int, struct {...} *)+120>:  lw  gp,16(s8)
A legend (memory interpretation according to the MallocInfo structure):

  • (1) - MallocInfo begin marker
  • (2) - the address of the caller that malloc-ed memory
  • (3) - the thread ID of the caller that malloc-ed memory
  • (4) - the address of the caller that originally free-ed memory
  • (5) - the thread ID of the caller that originally free-ed memory
  • (6) - the size of the allocated block
  • (7) - free() calls counter
  • (8) - MallocInfo end marker

Moving forwards and backwards with memory examination you should also find the MallocInfoBack structure. It helps a lot if you are unlucky and some part of any housekeeping information gets clobbered.

The happy end

This method helped me to track the problem down to a std::string variable not being locked properly. Adding a lock on a mutex when accessing it solved the problem. The code base was large enough to make it impossible to find the problem by simple code inspection, and on top of that the thread-unsafe use of this particular variable wasn't obvious from the code at first glance. So having a nice crash and the right approach can set you right on track.

Tuesday, 8 January 2013

System V shared memory fun

Why, oh why?

Let me begin with a disclaimer that if you ever have a chance to deal with shared memory, you're better off with the POSIX shared memory API or simply using files and mmap(2). You still need some means of synchronisation though, like atomic operations and good spinning strategies. System V shared memory (shmget(2), shmop(2), etc.) on the other hand is an old skool toolkit.

One of the reasons I mention it here is nostalgia, as my first exposure to parallel programming in academia was with the use of processes and System V. If you're too cool for old skool, better start with threads instead. The very first *nix system I ever used was FreeBSD. The very first *nix system I happened to use in my professional career was Solaris. And I still use hjkl to move around in vi since once I was given access to a Solaris server terminal with no arrow keys on the keyboard and vi being the only editor available on the system (visiting a large IT company in Asia as a support engineer). For these and other reasons I'm probably becoming an old grumpy man.

Nostalgia is of course not a good enough reason to mention System V shared memory. In fact I'm not that old and my career is less than ten years but I've come across a few projects where it was used. Although I was lucky to get a good start from my academia on this subject, I had to learn more about it the hard way. Maintaining and improving code around it has taught me some lessons. Whatever one may think of this API, it's still quite popular and more available in a variety of systems than the better POSIX alternative.

A smidgen of an introduction

Good old books about the theory of parallel programming talk about semaphores and processes, and use the historical V and P notation for operations on semaphores. See an overview here. And if you're used to the fork(2) and execv(3) pair, better forget about the latter here. It's mind bending sometimes when there's a lot of forking going on but no exec-ing. With some strategy applied we should be fine though.

Example

Let's have a look at a problem where there's one server and multiple clients. All parties need to send (share) data somehow. They also need some exclusion mechanism to use a shared resource which in this case is a dazzling computing power (the server). A scenario where a client wants to use the server would be as in the following drama:

Scene:
  somewhere in an inter-process jungle

Characters:
  Master of puppets
  Server
  a number of clients

Appliances:
  3 semaphores: SemClient, SemServer, SemResultReady

Act I (prologue):
  Master of puppets enters the stage and initializes 1 semaphore to "up"
  (or "unlocked" if you're too cool for old skool)
  
  Master of puppets:
    V (SemServer)

  Two other semaphores are left "down" (or "locked" if you're too cool...)
  
  Master of puppets:
    Fork (Server)       [Let it be a server]
    Fork (Client)       [Let it be a client]
    Fork (Client)       [Let it be a client]
    ...

Act II:
  The server enters timidly the scene

  Server:
    P (SemClient)       [Is there anybody here?]
    
  A client enters the scene with a stately procession
  
  Client:
    P (SemServer)       [I'd like to speak to the server in private]
    
  The client gets private access to the server
  
  Client:
    (produces data in the shared memory area)
    V (SemClient)       [Hey server, here's some data I'd like you to munge]
    P (SemResultReady)  [Let me know when you're done]
    
  Server:
    (processes data in the shared memory area)
    V (SemResultReady)  [Mmm.. yummy data, here's the result]
    
  Client:
    V (SemServer)       [Thanks. Bye!]
    
Act III (epilogue):

  Master of puppets decides to cease everyone.
  
  Master of puppets:
    JoinAll ()
    
  Master of puppets leaves the scene
This boils down to the following lines:
Client:
  // get exclusive access to the server (and the shared memory area)
  P (SemServer)
  (generate some data in the shared memory area)
  // notify the server about the data to process
  V (SemClient)
    
  // wait for server
  P (SemResultReady)
  (consume the result)
  
  // release server access
  V (SemServer)
    
Server:
  // wait for a client
  P (SemClient)
  (process the data)
  // notify the client
  V (SemResultReady)
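In System V terms P and V boil down to semop(2) with -1 and +1. The sketch below is a free-standing illustration and not the code of the example application: it works on a private semaphore set (IPC_PRIVATE), and the helper names create_semaphores, remove_semaphores, change_semaphore and TryP are assumptions of mine.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

// Creates a private set of 'count' semaphores, all "down"; returns id or -1.
int create_semaphores(const int count) {
  assert(count > 0);
  const int id = semget(IPC_PRIVATE, count, IPC_CREAT | 0600);
  if (id == -1) {
    return -1;
  }
  // semctl(2) expects a union semun; Linux leaves its definition to the caller
  union semun_arg { int val; semid_ds* buf; unsigned short* array; } arg;
  arg.val = 0;
  for (int i = 0; i < count; ++i) {
    if (semctl(id, i, SETVAL, arg) == -1) {
      semctl(id, 0, IPC_RMID);
      return -1;
    }
  }
  return id;
}

void remove_semaphores(const int id) {
  semctl(id, 0, IPC_RMID);
}

// Adds delta to one semaphore, retrying when interrupted by a signal.
bool change_semaphore(const int id, const unsigned short index,
                      const short delta, const short flags) {
  sembuf op;
  op.sem_num = index;
  op.sem_op = delta;
  op.sem_flg = flags;
  int status;
  do {
    status = semop(id, &op, 1);
  } while (status == -1 && errno == EINTR);
  return status == 0;
}

// P waits until the semaphore is "up" and takes it "down".
bool P(const int id, const unsigned short index) {
  return change_semaphore(id, index, -1, 0);
}

// V puts the semaphore "up" and wakes a waiter, if any.
bool V(const int id, const unsigned short index) {
  return change_semaphore(id, index, 1, 0);
}

// Non-blocking P: fails instead of waiting when the semaphore is "down".
bool TryP(const int id, const unsigned short index) {
  return change_semaphore(id, index, -1, IPC_NOWAIT);
}
```

With three semaphores created this way, index 0, 1 and 2 could play SemServer, SemClient and SemResultReady, after an initial V on SemServer as in the prologue.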

Implementation

I try to simplify things by isolating the mind bending bits and writing things in such a way that I can focus on the gist of the problem rather than nasty details. I also try to use concepts from the problem domain. That's the strategy I mentioned before to cope with the complexity.

Sources are available here. The way I compile the sources and run the program is as follows (x86 Linux/Fedora 17, GCC 4.7.2):

##
# Version without debugging messages
#
$ g++ -Wall -DNDEBUG -O3 -std=c++11 \
sysv-ipc-example/*.cpp -I./sysv-ipc-example -o /tmp/ipc-test

$ !$
/tmp/ipc-test
Client 4315
 result   : 863000
 expected : 863000 [OK]
Client 4316
 result   : 863200
 expected : 863200 [OK]
Client 4317
 result   : 863400
 expected : 863400 [OK]
Client 4318
 result   : 863600
 expected : 863600 [OK]
Client 4319
 result   : 863800
 expected : 863800 [OK]

##
# Version with some drama to follow (no -DNDEBUG)
#
$ g++ -Wall -O3 -std=c++11 \
sysv-ipc-example/*.cpp -I./sysv-ipc-example -o /tmp/ipc-test

$ !$
Start main 4345...
Starting server 4346...
Starting client 4347[Server] P (SemClient)

Starting client 4348[Client 4347] P (SemServer)

[Client 4347] V (SemClient)
[Client 4347] P (SemResultReady)
[Server] V (SemResultReady)
[Server] P (SemClient)

... (lots of drama)

Client 4351
 result   : 870200
 expected : 870200 [OK]
[Client 4351] V (SemConsole)
Terminating client 4351
Finished waiting for 4351
Terminating main...
~IpcManager() : freeing resources in 4345

Let's have a look at some code excerpts in client-server-example.cpp. Client() and Server() almost replicate what I sketched about their roles above. To make the program finite there's some expected (constant) number of clients and transactions to occur but in an undetermined order. The data "protocol" between the server and the clients is established by the Packet structure. Note that there's also an additional semaphore, SemConsole, to avoid interleaving log messages from processes. The essence of the client implementation looks as follows:

void
Client ()
{
  /* ... */
  
  int sum = 0;
  Packet *const pData = static_cast<Packet *> (GetShm ());
  
  for (unsigned i = 0; i < NO_OF_PACKETS_PER_CLIENT; ++i) {
    // wait for server access
    CLIENT (P (SemServer))
    // generate data:
    // easy to predict the result but still individual for every client
    std::fill_n (pData->numbers, NO_OF_ITEMS_IN_PACKET, pid);
    // notify the server
    CLIENT (V (SemClient))
    
    // wait for the server and get the result
    CLIENT (P (SemResultReady))
    sum += pData->result;
    
    // free server access
    CLIENT (V (SemServer))
  }
  
  /* ... */
}
I use CLIENT and SERVER macros to add some debugging messages so they form a nice story on the console.

Now some juicy details of the API usage in IpcManager.cpp. For example the V operation looks like this:

void
IpcManager::V (const unsigned short sem) const
{
  ::sembuf sb = { sem, 1, 0 };
  const int status = ::semop (m_semId, &sb, 1);
  CHECK (0 == status);
}
That's it. I invite you to read semop(2) rather than repeating what's already explained there. And last but not least, how we initialize all that stuff:
// only the first instance of the manager (creator) is responsible
// for resources
IpcManager::IpcManager (const char *key, const size_t memSize, const int semNum)
  : m_bCreator (true), m_data (nullptr)
{
  assert (key && key[0]);
  m_key = ::ftok (key, key[0]);
  CHECK (-1 != m_key);
  
  m_memId = ::shmget (m_key, memSize, IPC_CREAT | 0600);
  CHECK (-1 != m_memId);
  
  m_semId = ::semget (m_key, semNum, IPC_CREAT | 0600);
  CHECK (-1 != m_semId);
  
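  // semctl(2) takes a union semun as its fourth argument; on Linux the
  // caller has to define it
  union semun { int val; ::semid_ds *buf; unsigned short *array; } arg;
  arg.val = 0; // every semaphore starts "down"
  
  for (int i = 0; i < semNum; ++i) {
    const int status = ::semctl (m_semId, i, SETVAL, arg);
    CHECK (-1 != status);
  }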
}
Note that the key passed to ftok(3) is literally a key that identifies a shared resource so that processes know which one to connect to in order to communicate (it's a bit like a token for a conference call). Note that the key must actually be a path to an existing file system object.
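The requirement about an existing path is easy to check in isolation. The wrapper below is only a sketch and its name make_ipc_key is an assumption. It shows that ftok(3) fails with -1 for a path that does not exist and produces a stable key otherwise.

```cpp
#include <cassert>
#include <sys/ipc.h>
#include <sys/types.h>

// Returns the System V key derived from 'path' and 'project_id',
// or -1 if the path cannot be examined (e.g. it does not exist).
key_t make_ipc_key(const char* const path, const char project_id) {
  assert(path && path[0]);
  return ftok(path, project_id);
}
```

Two processes calling make_ipc_key() with the same existing path and the same project id get the same key, which is how they find each other. Bear in mind that the key is derived from file metadata, so deleting and recreating the file may change it.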

Bear in mind that all processes are forked from the very first one. This means that after forking all of them have their memory in exactly the same state as their parent. This also means that all variables have exactly the same values (which could be problematic if you fork from a multi-threaded process and have some mutexes locked and only one thread left in the child process that waits for some of the locked mutexes). As System V API resources have to be allocated and deallocated somewhere (see the Gotchas section), we declare only one process to be responsible for both operations. It doesn't have to be like this, it just has to be done once. The IpcManager object is initialized in the first process so it becomes a "creator". The same object manages forking new processes and ensures they won't attempt to de-initialize the resources.

Gotchas

Managing resources

To me the most difficult aspect of the System V API is ensuring reliable shared memory deallocation. Bear in mind that if you don't deallocate it, the system (not just a single process) will be leaking and sooner or later the host will not be able to operate properly if there are new processes requesting System V memory. A modern system usually has only a few megabytes of System V memory available so it's not something you can splurge. ipcs(1) and ipcrm(1) are your friends here.

What makes the deallocation actually tricky is deciding who should do it. If the process that was supposed to deal with it crashed, then we've got a problem. Different strategies can be applied depending on the situation but they tend to be complicated. In this regard the aforementioned POSIX shared memory API seems to be much better.
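One of the simpler strategies, sketched below, is to mark the segment for removal right after attaching it. The kernel then destroys it when the last attachment goes away, even if that happens because a process crashed. This is not what the example application does. The class name ScopedSegment is made up, and the sketch leans on Linux behaviour: children forked later inherit the attachment, but on other systems unrelated processes may not be able to attach a segment that is already marked.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/ipc.h>
#include <sys/shm.h>

// Owns one attachment to a private System V segment.
class ScopedSegment {
public:
  explicit ScopedSegment(const std::size_t size) : m_id(-1), m_addr(nullptr) {
    assert(size > 0);
    m_id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (m_id == -1) {
      return;
    }
    void* const addr = shmat(m_id, nullptr, 0);
    if (addr != reinterpret_cast<void*>(-1)) {
      m_addr = addr;
    }
    // Mark for destruction now: the segment lives on only as long as
    // somebody is attached, so a crash cannot leave it behind.
    shmctl(m_id, IPC_RMID, nullptr);
  }

  ~ScopedSegment() {
    if (m_addr) {
      shmdt(m_addr);
    }
  }

  void* get() const { return m_addr; } // nullptr if the setup failed
  int id() const { return m_id; }

private:
  ScopedSegment(const ScopedSegment&);            // not copyable
  ScopedSegment& operator=(const ScopedSegment&);

  int m_id;
  void* m_addr;
};
```

Once the object goes out of scope in the last attached process, the segment should no longer show up in ipcs(1).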

Memory segment initialization

Although the shmget(2) documentation promises:

When a new shared memory segment is created, its contents are initialized to zero values [...]
that's certainly not true on all systems. I spent days chasing a problem on a server farm consisting of tens of machines of different makes, hardware, systems, versions etc. After what was literally a hunt, it turned out that the problem occurred only on particular machines and was diagnosed to be non-conforming implementations not initializing the shared segment.

Now I'm a bit sceptical about such promises if the software is intended to run on unspecified hardware and systems. Unless it's a promise made by a portable and well maintained library, I prefer to ensure that a resource I'm given is initialized properly.
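In practice that means a few extra lines in whichever process creates the segment. A minimal sketch, assuming the creator is the only one that calls it and does so before any other process touches the memory (the function name attach_and_clear is mine):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>

// Attaches the segment and clears it explicitly instead of trusting the
// system to hand out zeroed memory. Returns nullptr if attaching fails.
void* attach_and_clear(const int shmId, const std::size_t size) {
  assert(size > 0);
  void* const addr = shmat(shmId, nullptr, 0);
  if (addr == reinterpret_cast<void*>(-1)) {
    return nullptr;
  }
  std::memset(addr, 0, size);
  return addr;
}
```

The size passed in has to be the one used for shmget(2). Processes that attach later must not clear the segment again, or they would wipe data that is already being shared.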