Why Python is better than C
I love C. I’ve written a little bit of C code in my time, both UNIX userland and kernel stuff. I co-wrote OpenBSD’s rum(4) 802.11a/b/g wireless driver for Ralink USB devices [article here] and also made large contributions to OpenRCS and OpenCVS [articles here, here and here]. I’m also the author of the small, portable and efficient BitTorrent implementation, Unworkable, which is part of our work at P2P Research. So I am relatively familiar with the language.
I’ve been hacking Python code for around two years now, and have really developed a taste for it through my day job. I would not consider myself a Python guru by any stretch, but I’ve worked with many different parts of the standard library, and used enough of the features (generators, lambdas, list comprehensions, classes, etc.) that I reckon I have a pretty solid handle on what it offers.
The majority of the crawling and data analysis software developed here at P2P Research is written in Python - with a little bit of C here and there, for performance. I suppose that the system features our stuff uses can be broken down into the following categories:
- String manipulation / parsing.
- Fast dynamic data structures. Lists and dictionaries, at a high level, including sorting etc.
- Networking. Specifically, a lot of HTTP is spoken.
- Threading. For increased throughput.
- File I/O. For archival purposes.
- Database. We use PostgreSQL for some reporting and analysis.
I’m going to briefly compare the two languages on each of these items. All of these things can be achieved relatively straightforwardly in both C and Python; consider how many network servers, text editors and databases are written purely in C. The POSIX and ANSI standards actually give you a pretty good set of library functions for doing these things, too, apart from the data structure area I suppose. There are also mature C interfaces available for working with databases.
What Python really gives you that C does not, in my opinion, are the following:
- Largely eliminates the headaches of memory management.
- Similarly, makes string manipulation much less painful, while maintaining much of C’s performance by interfacing directly with the printf family of functions. Consider the following C snippet, followed by the Python equivalent:
    /* Format a HTTP 1.0 GET request safely in C */
    l = snprintf(request, GETSTRINGLEN,
        "GET %s%s HTTP/1.0\r\nHost: %s\r\nUser-agent: Unworkable/%s\r\n\r\n",
        path, params, host, UNWORKABLE_VERSION);
    if (l == -1 || l >= GETSTRINGLEN)
        goto trunc;

    /* ... */

    trunc:
        trace("announce: string truncation detected");
        xfree(params);
        xfree(request);
        xfree(tparams);
        return (-1);
    # Format a HTTP 1.0 GET request safely in Python
    request = "GET %s%s HTTP/1.0\r\nHOST: %s\r\nUser-agent: Unworkable/%s\r\n\r\n" \
        % (path, params, host, UNWORKABLE_VERSION)
The big difference in this case is the amount of care you need to take with memory cleanup and error checking in C. Python strings are sized automatically, so there is no fixed-length buffer to overflow or truncate. Python is far more forgiving than C when it comes to string and memory manipulation, which removes a great deal of complexity.
- There are good, relatively straightforward implementations of various data structures for C. Well-known examples are the venerable sys/queue.h for various sorts of linked lists, and the similar sys/tree.h for red-black trees or splay trees, which are typically used to implement dictionaries.
But these C macros, while extremely helpful, are still tricky. It is not obvious, for example, how to let an object (in C, something declared with the struct keyword) be a member of an arbitrary set of TAILQs. In fact, you need a fairly convoluted definition, let alone complex management code:
    /* An actual node, which can be used in arbitrary lists */
    struct node {
        char *key;
    };

    /* Separated list structure for managing nodes */
    struct node_list_entry {
        TAILQ_ENTRY(node_list_entry) node_list;
        struct node *item;
    };
It makes you appreciate Python code like this:
    mylist = []
    mylist.append("foo")
    mylist.append(1)
    # Mixed types sort fine in Python 2; Python 3 raises TypeError here
    mylist.sort()
And after investigating what is involved in getting dictionary-like storage from C (left as an exercise to the reader), code like this:
    mydict = {}
    mydict['foo'] = 'bar'
    del mydict['foo']
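To make the contrast with the TAILQ arrangement above concrete, here is a minimal sketch of one object belonging to several containers at once. The `Node` class and the container names are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: one object shared by several containers.
class Node:
    def __init__(self, key):
        self.key = key


node = Node("foo")

# The same object can sit in any number of lists and dictionaries.
# Each container only holds a reference, so the node itself needs
# no per-list link fields.
pending = [node]
seen = [node, Node("bar")]
by_key = {n.key: n for n in seen}

# Sorting by an attribute needs no hand-written comparison plumbing.
seen.sort(key=lambda n: n.key)

# Removing the node from one list leaves the other containers untouched.
pending.remove(node)
```

The separate `node_list_entry` structure from the C version has no counterpart here, because membership is tracked by the containers rather than by the node.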
- The TCP/IP stacks in all major operating systems are written in C, and so are a good number of extremely popular network clients and servers (Apache, Sendmail, OpenSSH). One could perhaps even argue that networking, particularly very low-level networking, is one of the things that C is best suited for. However, just opening a TCP socket safely is quite a lot of C code:
    /* C snippet to connect to a remote host via TCP */
    struct addrinfo hints, *res, *res0;
    int error, sockfd;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_INET;
    hints.ai_socktype = SOCK_STREAM;
    error = getaddrinfo(host, port, &hints, &res0);
    if (error) {
        /* handle error */
    }
    res = res0;
    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (sockfd == -1) {
        /* handle error */
    }
    if (connect(sockfd, res->ai_addr, res->ai_addrlen) == -1) {
        /* handle error */
    }
    freeaddrinfo(res0);
    return (sockfd);
Now compare this to the Python equivalent:
    import socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
When it comes to HTTP, or other protocols, the difference is even greater. Of course, much of this can be attributed to string and memory handling. To be fair, implementing a basic HTTP/1.0 client in C is not that hard - I did it in under 500 lines of code in Unworkable. However, Python’s standard library - whether via urllib, urllib2 or httplib directly - just makes it at least an order of magnitude less of a headache compared to C.
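As a rough illustration of the point about the standard library, the sketch below performs an HTTP GET with `http.client`, which is the Python 3 name for `httplib`. So that it runs without network access, it first starts a throwaway server on localhost. The `HelloHandler` class, the `fetch` helper and the response body are all made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello from the test server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path="/"):
    """Issue a GET request and return (status code, body bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


# Port 0 asks the OS for any free port, so the example does not
# depend on a particular port being available.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, body = fetch("127.0.0.1", server.server_address[1])

server.shutdown()
server.server_close()
```

The client side is the six lines inside `fetch`; request formatting, header parsing and buffer management are all handled by the library.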
- In the realm of threading, it seems pretty clear to me that the POSIX threads (pthreads) interface has won, and the API is available on all POSIX-compliant operating systems. I don’t have a huge amount of experience using it from C; a few years ago I did some very simple stuff with it. While not impossible, it is complicated and tricky to deal with. Python, on the other hand, offers its own threading module, loosely based on Java’s API. I find it very easy to use threads in Python. Perhaps the most striking feature is that the threading module supports both an object-oriented style, where you extend the Thread class with your own, and a functional style, where you simply pass a function for the thread to run. The functional approach makes great sense to me. Creating a thread like this is as simple as:
    # Simple Python threads example, using functional paradigm
    import threading

    def worker():
        while True:
            # do work then break
            break

    t = threading.Thread(target=worker)
    t.start()
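For comparison, here is a minimal sketch of the object-oriented style, where `Thread` is extended with your own class. The `Worker` class, the squaring job and the `None` sentinel convention are assumptions made for the example, not anything taken from our production code.

```python
import queue
import threading


class Worker(threading.Thread):
    """Hypothetical worker that squares numbers taken from a queue."""

    def __init__(self, jobs, results):
        super().__init__()
        self.jobs = jobs
        self.results = results

    def run(self):
        while True:
            item = self.jobs.get()
            if item is None:  # sentinel: no more work
                break
            self.results.put(item * item)


jobs = queue.Queue()
results = queue.Queue()
workers = [Worker(jobs, results) for _ in range(3)]
for w in workers:
    w.start()

for n in range(5):
    jobs.put(n)
for _ in workers:
    jobs.put(None)  # one sentinel per worker
for w in workers:
    w.join()

# Completion order is not deterministic, so sort before using the results.
squares = sorted(results.get() for _ in range(5))
```

The `queue.Queue` class does its own locking, so the workers need no explicit mutexes to share it.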
- File I/O is an area where straight C really isn’t too bad. You have your POSIX interface via open(2), read(2), write(2), etc., and you have your ANSI buffered I/O functions with fopen(3), fread(3), fwrite(3), etc. Many of the shell commands for file system manipulation map very closely to libc calls: for example, mkdir(2), dirname(3), stat(2) and so on. Python, once again mostly thanks to handling the memory management for you, helps a lot when you are reading from a file whose size is unknown (for example, a pipe or a network socket).
I would also mention that Python’s standard library has a concept of ‘file-like objects’ which are essentially opaque data buffers which can be accessed through exactly the same interfaces as actual files. Common examples are StringIO, urllib and urllib2.
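A minimal sketch of both points, written for Python 3, where `StringIO` lives in the `io` module. The `count_words` helper and the sample text are hypothetical.

```python
import io
import os


def count_words(stream):
    """Count whitespace-separated words in any file-like object."""
    return sum(len(line.split()) for line in stream)


# An in-memory buffer behaves like an open text file.
in_memory = io.StringIO("GET / HTTP/1.0\nHost: example.org\n")
memory_words = count_words(in_memory)

# A pipe has no size known in advance; read() simply returns
# everything up to end-of-file and the string grows as needed.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "w") as writer:
    writer.write("one two three\nfour five\n")
with os.fdopen(read_fd) as reader:
    piped = reader.read()
pipe_words = count_words(io.StringIO(piped))
```

Because `count_words` only iterates over lines, it would accept a real file opened with `open()` just as well.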
- When it comes to working with databases, Python has the usual advantage of making it easy to deal with dynamic result sets. Additionally, abstractions like DB API 2, together with language features such as list comprehensions and generators, can greatly reduce the amount of code required for filtering and processing data from databases. Furthermore, I have found that psycopg2 (the website of which is unfortunately in bad shape) works extremely well in a threaded environment.
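The following sketch shows the general shape of DB API 2 code combined with a generator and a list comprehension. To stay self-contained it uses the standard library’s `sqlite3` module as a stand-in for psycopg2; both follow DB API 2, although psycopg2 uses `%s` placeholders where sqlite3 uses `?`. The `peers` table, its columns and the `rows` helper are invented for the example.

```python
import sqlite3

# sqlite3 stands in for psycopg2 here (an assumption made so that the
# example runs without a PostgreSQL server).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE peers (host TEXT, bytes_sent INTEGER)")
cur.executemany(
    "INSERT INTO peers VALUES (?, ?)",
    [("a.example", 1200), ("b.example", 50), ("c.example", 9000)],
)
conn.commit()


def rows(cursor, batch_size=2):
    """Generator yielding rows in batches instead of loading them all."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        for row in batch:
            yield row


cur.execute("SELECT host, bytes_sent FROM peers ORDER BY host")

# Filter the result set with a list comprehension over the generator.
busy_hosts = [host for host, sent in rows(cur) if sent > 1000]
conn.close()
```

The result set never needs a pre-sized buffer, and the filtering step is a single line.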
In conclusion, Python allows you to write complicated, useful applications with fewer bugs, and much faster than in C. It removes many (but not all) headaches associated with memory management and data structures. Many of the portability issues are taken care of for you. Essentially, with Python you stand on the shoulders of giants. While C is still extremely useful and important, Python makes excellent sense for many classes of program.
Comments
Comment from Nick
Date: July 18, 2008, 12:04 pm
Hi, I’ve found your post interesting. However, I don’t think that it is appropriate to compare a “high-level-dynamic” language to a low level one. C is all about the imperative paradigm, low level structures, and is also statically typed. Python is about list comprehensions, lambdas, OO-programming AND imperative programming. It’s clear to me that a lot of things will be more concise when done with Python than with C. But also I think that some problems are better handled with C than Python. Also I (still) find it hard to believe that for most cases, (high level) Python code can match C performance. I’m not saying that it can’t. I’d just like to see some numbers and detailed cases about that. I’m still sticking to C when it comes to performance.
Comment from Niall O’Higgins
Date: July 18, 2008, 12:32 pm
Hi Nick,
Of course Python and C are very different. I admit to being somewhat polemic in tone in this article. However, I don’t think there is anything wrong with a comparison of C and Python in the context of the software I develop - which consists of pieces in both languages.
I don’t have hard numbers on the efficiency of Python vs C in specific cases. I have found that my Bencode parser which is written in YACC/C is about an order of magnitude faster than the equivalent Python parser.
However, the vast majority of my Python programs (and I do not think I am any exception) spend their time waiting on I/O, and so it really seems like performance is equivalent.
For hard numbers on various kinds of work loads, you can always check out the shootout results:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=python&lang2=gcc
Comment from Bobby
Date: August 3, 2008, 2:06 pm
Some of the comparison seems to be about the language itself (memory management), while other parts seem to be about the standard library (sockets, http, threads). While I can see some value in comparing the standard libraries, it doesn’t seem that important to me, as there are plenty of libraries available in C to simplify sockets, http/ftp/etc, various data structures, and such. It just doesn’t come bundled with the compiler.
It’s also worth noting that the python interpreter is not thread-safe (see http://docs.python.org/api/threads.html), so when you create multiple threads, the global interpreter lock means that only one thread will run at a time, even on a multicore or SMP system.




