Processes are fundamental to the Linux operating system. To understand how a Linux process works, it is important to know about its environment: how the main() function is called, how command line arguments are passed to a program, how a program accesses environment variables, how a process is laid out in memory, and the different ways to terminate a process. In this two-part article series, we will touch on all these aspects from a beginner's point of view.
The main() Function
As programmers, we generally know that main() is the first function of our own code that gets called when a program is executed. But have you ever wondered who calls it? What happens before main() is executed?
Whenever you execute a program from the command line, here is an overview of what happens:
- The shell calls one of the exec family of functions, passing the path of the binary, the argument array (argv), and, depending on the variant, the environment array. The argument count (argc) is not passed explicitly; it is derived from the number of entries in argv.
- The call enters the kernel through the system call handler, which receives a pointer to the program name string, the argv array pointer, the environment variable array pointer, and more.
- The kernel then determines the executable file format (for example, ELF or a.out) and, based on it, sets up related data structures such as the code size, data segment start, and stack segment start.
- The kernel then allocates user-mode pages for the process and copies the argument array and environment variables into those pages.
- Finally, control is transferred to the executable's entry point, which for a C program is typically the _start() function supplied by the C runtime. _start() then calls main() and passes all the required information to it.
That was just a brief overview. If you want to understand in detail each and every step that happens between when a program is executed and when main() is called, refer to this excellent tutorial.
Command Line Arguments
Now let's understand how a program accesses command line arguments. Take the following code as an example:
#include <stdio.h>

int main(void)
{
    printf("\n This is a test program\n");
    return 0;
}
As you can see, it's a very basic program that just prints a string to the output. If you want the program to accept command line arguments, you first have to change the parameter list of the main() function.
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("\n This is a test program\n");
    return 0;
}
In the code above, argc is an integer that represents the number of arguments (including the name of the binary), and argv is an array containing command line arguments as strings. Here are some more modifications that show how you can access command line arguments in code:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("\n This is a test program\n");

    /* argc counts the program name too, so two user arguments give argc == 3 */
    if (argc != 3)
    {
        printf("\n The program expects 2 arguments after the program name\n");
        return 1;
    }

    int temp = 0;
    while (temp < argc)
    {
        printf("\n %s \n", argv[temp]);
        temp++;
    }
    return 0;
}
The program now makes use of argc and argv. It loops argc times and prints each command line argument passed to the program, starting with the program name. Here is the output of the program when it is executed:
$ ./arg prog 5
This is a test program
./arg
prog
5
So you can see that the program, when executed from the shell, prints all of its command line arguments on stdout.
Environment List
Besides command line arguments, a program also receives information about the context in which it was invoked through the environment list passed to it. A standard environment list contains information such as the user's home directory, terminal type, and current locale; you can also define additional variables for other purposes.
By convention, the environment variables are defined in the following format:
name=value
The names are defined in upper case, but this is only a convention.
Just like the command line argument list, the environment list is an array of pointers, each pointing to a null-terminated string, and the array itself ends with a NULL pointer. The environment list can be accessed through the global variable environ, which is defined as a pointer to pointer to char.
Here is an example of how to use the environment list in a C program:
#include <stdio.h>

extern char **environ;

int main(int argc, char *argv[])
{
    printf("\n This is a test program\n");

    char **tmp = environ;
    /* The environment list ends with a NULL pointer */
    while (*tmp != NULL)
    {
        printf("\n %s \n", *tmp);
        tmp++;
    }
    return 0;
}
As you can see, the environ variable is already defined as a global variable by the C library, so you just have to declare it as an extern variable. Using a temporary pointer initialized from environ, the program loops over the list and prints each environment variable until it reaches the terminating NULL pointer.
Here is the output:
This is a test program
XDG_VTNR=7
SSH_AGENT_PID=1508
XDG_SESSION_ID=c2
CLUTTER_IM_MODULE=xim
SESSION=ubuntu
GPG_AGENT_INFO=/run/user/1000/keyring-TfWiqP/gpg:0:1
TERM=xterm
XDG_MENU_PREFIX=gnome-
SHELL=/bin/bash
VTE_VERSION=3409
WINDOWID=69206026
UPSTART_SESSION=unix:abstract=/com/ubuntu/upstart-session/1000/1441
GNOME_KEYRING_CONTROL=/run/user/1000/keyring-TfWiqP
GTK_MODULES=overlay-scrollbar:unity-gtk-module
USER=himanshu
...
...
...
Conclusion
In this article, we learned how the main() function is called and how command line arguments and the environment list can be accessed in code.