Prev | Contents | Next

18 The Outside Environment

When you run a program, it’s actually you talking to the shell, saying, “Hey, please run this thing.” And the shell says, “Sure,” and then tells the operating system, “Hey, could you please make a new process and run this thing?” And if all goes well, the OS complies and your program runs.

But there’s a whole world outside your program in the shell that can be interacted with from within C. We’ll look at a few of those in this chapter.

18.1 Command Line Arguments

Many command line utilities accept command line arguments. For example, if we want to see all files that end in .txt, we can type something like this on a Unix-like system:

ls *.txt

(or dir instead of ls on a Windows system).

In this case, the command is ls, but it arguments are all all files that end with .txt118.

So how can we see what is passed into program from the command line?

Say we have a program called add that adds all numbers passed on the command line and prints the result:

./add 10 30 5
45

That’s gonna pay the bills for sure!

But seriously, this is a great tool for seeing how to get those arguments from the command line and break them down.

First, let’s see how to get them at all. For this, we’re going to need a new main()!

Here’s a program that prints out all the command line arguments. For example, if we name the executable foo, we can run it like this:

./foo i like turtles

and we’ll see this output:

arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles

It’s a little weird, because the zeroth argument is the name of the executable, itself. But that’s just something to get used to. The arguments themselves follow directly.

Source:

#include <stdio.h>

int main(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++) {
        printf("arg %d: %s\n", i, argv[i]);
    }
}

Whoa! What’s going on with the main() function signature? What’s argc and argv119 (pronounced arg-cee and arg-vee)?

Let’s start with the easy one first: argc. This is the argument count, including the program name, itself. If you think of all the arguments as an array of strings, which is exactly what they are, then you can think of argc as the length of that array, which is exactly what it is.

And so what we’re doing in that loop is going through all the argvs and printing them out one at a time, so for a given input:

./foo i like turtles

we get a corresponding output:

arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles

With that in mind, we should be good to go with our adder program.

Our plan:

Let’s get to it!

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int total = 0;

    for (int i = 1; i < argc; i++) {  // Start at 1, the first argument
        int value = atoi(argv[i]);    // Use strtol() for better error handling

        total += value;
    }

    printf("%d\n", total);
}

Sample runs:

$ ./add
0
$ ./add 1
1
$ ./add 1 2
3
$ ./add 1 2 3
6
$ ./add 1 2 3 4
10

Of course, it might puke if you pass in a non-integer, but hardening against that is left as an exercise to the reader.

18.1.1 The Last argv is NULL

One bit of fun trivia about argv is that after the last string is a pointer to NULL.

That is:

argv[argc] == NULL

is always true!

This might seem pointless, but it turns out to be useful in a couple places; we’ll take a look at one of those right now.

18.1.2 The Alternate: char **argv

Remember that when you call a function, C doesn’t differentiate between array notation and pointer notation in the function signature.

That is, these are the same:

void foo(char a[])
void foo(char *a)

Now, it’s been convenient to think of argv as an array of strings, i.e. an array of char*s, so this made sense:

int main(int argc, char *argv[])

but because of the equivalence, you could also write:

int main(int argc, char **argv)

Yeah, that’s a pointer to a pointer, all right! If it makes it easier, think of it as a pointer to a string. But really, it’s a pointer to a value that points to a char.

Also recall that these are equivalent:

argv[i]
*(argv + i)

which means you can do pointer arithmetic on argv.

So an alternate way to consume the command line arguments might be to just walk along the argv array by bumping up a pointer until we hit that NULL at the end.

Let’s modify our adder to do that:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int total = 0;
    
    // Cute trick to get the compiler to stop warning about the
    // unused variable argc:
    (void)argc;

    for (char **p = argv + 1; *p != NULL; p++) {
        int value = atoi(*p);  // Use strtol() for better error handling

        total += value;
    }

    printf("%d\n", total);
}

Personally, I use array notation to access argv, but have seen this style floating around, as well.

18.1.3 Fun Facts

Just a few more things about argc and argv.

18.2 Exit Status

Did you notice that the function signatures for main() have it returning type int? What’s that all about? It has to do with a thing called the exit status, which is an integer that can be returned to the program that launched yours to let it know how things went.

Now, there are a number of ways a program can exit in C, including returning from main(), or calling one of the exit() variants.

All of these methods accept an int as an argument.

Side note: did you see that in basically all my examples, even though main() is supposed to return an int, I don’t actually return anything? In any other function, this would be illegal, but there’s a special case in C: if execution reaches the end of main() without finding a return, it automatically does a return 0.

But what does the 0 mean? What other numbers can we put there? And how are they used?

The spec is both clear and vague on the matter, as is common. Clear because it spells out what you can do, but vague in that it doesn’t particularly limit it, either.

Nothing for it but to forge ahead and figure it out!

Let’s get Inception121 for a second: turns out that when you run your program, you’re running it from another program.

Usually this other program is some kind of shell122 that doesn’t do much on its own except launch other programs.

But this is a multi-phase process, especially visible in command-line shells:

  1. The shell launches your program
  2. The shell typically goes to sleep (for command-line shells)
  3. Your program runs
  4. Your program terminates
  5. The shell wakes up and waits for another command

Now, there’s a little piece of communication that takes place between steps 4 and 5: the program can return a status value that the shell can interrogate. Typically, this value is used to indicate the success or failure of your program, and, if a failure, what type of failure.

This value is what we’ve been returning from main(). That’s the status.

Now, the C spec allows for two different status values, which have macro names defined in <stdlib.h>:

Status Description
EXIT_SUCCESS or 0 Program terminated successfully.
EXIT_FAILURE Program terminated with an error.

Let’s write a short program that multiplies two numbers from the command line. We’ll require that you specify exactly two values. If you don’t, we’ll print an error message, and exit with an error status.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        printf("usage: mult x y\n");
        return EXIT_FAILURE;   // Indicate to shell that it didn't work
    }

    printf("%d\n", atoi(argv[1]) * atoi(argv[2]));

    return 0;  // same as EXIT_SUCCESS, everything was good.
}

Now if we try to run this, we get the expected effect until we specify exactly the right number of command-line arguments:

$ ./mult
usage: mult x y

$ ./mult 3 4 5
usage: mult x y

$ ./mult 3 4
12

But that doesn’t really show the exit status that we returned, does it? We can get the shell to print it out, though. Assuming you’re running Bash or another POSIX shell, you can use echo $? to see it123.

Let’s try:

$ ./mult
usage: mult x y
$ echo $?
1

$ ./mult 3 4 5
usage: mult x y
$ echo $?
1

$ ./mult 3 4
12
$ echo $?
0

Interesting! We see that on my system, EXIT_FAILURE is 1. The spec doesn’t spell this out, so it could be any number. But try it; it’s probably 1 on your system, too.

18.2.1 Other Exit Status Values

The status 0 most definitely means success, but what about all the other integers, even negative ones?

Here we’re going off the C spec and into Unix land. In general, while 0 means success, a positive non-zero number means failure. So you can only have one type of success, and multiple types of failure. Bash says the exit code should be between 0 and 255, though a number of codes are reserved.

In short, if you want to indicate different error exit statuses in a Unix environment, you can start with 1 and work your way up.

On Linux, if you try any code outside the range 0-255, it will bitwise AND the code with 0xff, effectively clamping it to that range.

You can script the shell to later use these status codes to make decisions about what to do next.

18.3 Environment Variables

Before I get into this, I need to warn you that C doesn’t specify what an environment variable is. So I’m going to describe the environment variable system that works on every major platform I’m aware of.

Basically, the environment is the program that’s going to run your program, e.g. the bash shell. And it might have some bash variables defined. In case you didn’t know, the shell can make its own variables. Each shell is different, but in bash you can just type set and it’ll show you all of them.

Here’s an excerpt from the 61 variables that are defined in my bash shell:

HISTFILE=/home/beej/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/beej
HOSTNAME=FBILAPTOP
HOSTTYPE=x86_64
IFS=$' \t\n'

Notice they are in the form of key/value pairs. For example, one key is HOSTTYPE and its value is x86_64. From a C perspective, all values are strings, even if they’re numbers124.

So, anyway! Long story short, it’s possible to get these values from inside your C program.

Let’s write a program that uses the standard getenv() function to look up a value that you set in the shell.

getenv() will return a pointer to the value string, or else NULL if the environment variable doesn’t exist.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *val = getenv("FROTZ");  // Try to get the value

    // Check to make sure it exists
    if (val == NULL) {
        printf("Cannot find the FROTZ environment variable\n");
        return EXIT_FAILURE;
    }

    printf("Value: %s\n", val);
}

If I run this directly, I get this:

$ ./foo
Cannot find the FROTZ environment variable

which makes sense, since I haven’t set it yet.

In bash, I can set it to something with125:

$ export FROTZ="C is awesome!"

Then if I run it, I get:

$ ./foo
Value: C is awesome!

In this way, you can set up data in environment variables, and you can get it in your C code and modify your behavior accordingly.

18.3.1 Setting Environment Variables

This isn’t standard, but a lot of systems provide ways to set environment variables.

If on a Unix-like, look up the documentation for putenv(), setenv(), and unsetenv(). On Windows, see _putenv().

18.3.2 Unix-like Alternative Environment Variables

If you’re on a Unix-like system, odds are you have another couple ways of getting access to environment variables. Note that although the spec points this out as a common extension, it’s not truly part of the C standard. It is, however, part of the POSIX standard.

One of these is a variable called environ that must be declared like so:

extern char **environ;

It’s an array of strings terminated with a NULL pointer.

You should declare it yourself before you use it, or you might find it in the non-standard <unistd.h> header file.

Each string is in the form "key=value" so you’ll have to split it and parse it yourself if you want to get the keys and values out.

Here’s an example of looping through and printing out the environment variables a couple different ways:

#include <stdio.h>

extern char **environ;  // MUST be extern AND named "environ"

int main(void)
{
    for (char **p = environ; *p != NULL; p++) {
        printf("%s\n", *p);
    }

    // Or you could do this:
    for (int i = 0; environ[i] != NULL; i++) {
        printf("%s\n", environ[i]);
    }
}

For a bunch of output that looks like this:

SHELL=/bin/bash
COLORTERM=truecolor
TERM_PROGRAM_VERSION=1.53.2
LOGNAME=beej
HOME=/home/beej
... etc ...

Use getenv() if at all possible because it’s more portable. But if you have to iterate over environment variables, using environ might be the way to go.

Another non-standard way to get the environment variables is as a parameter to main(). It works much the same way, but you avoid needing to add your extern environ variable. Not even the POSIX spec supports this126 as far as I can tell, but it’s common in Unix land.

#include <stdio.h>

int main(int argc, char **argv, char **env)  // <-- env!
{
    (void)argc; (void)argv;  // Suppress unused warnings

    for (char **p = env; *p != NULL; p++) {
        printf("%s\n", *p);
    }

    // Or you could do this:
    for (int i = 0; env[i] != NULL; i++) {
        printf("%s\n", env[i]);
    }
}

Just like using environ but even less portable. It’s good to have goals.


Prev | Contents | Next