Why Python sucks for BitTorrent and P2P

Niall O'Higgins @ 19 June, 2008 (06:10) | P2P Software

Python is a wonderful programming language. We use it very heavily at P2P Research for all our statistical analysis and data mining of peer-to-peer networks. It is highly expressive and lets a software author jump very quickly from idea to implementation. It removes many (but by no means all) of the headaches around memory management and choosing efficient data structures. With its ‘batteries included’ philosophy, the standard library contains a vast array of very useful modules straight off. For the majority of tasks, there is already something in the standard library which will greatly simplify them, from excellent Berkeley DB support to convenient, secure temporary file handling.

However, Python’s “one size fits all” standard library can also be a weakness in cases where you need certain performance optimisations. My specific example here is the mmap module. On UNIX systems, the mmap(2) system call supports an ‘offset’ parameter, which lets you map a region that starts part-way through a file rather than at its beginning. Mmap is extremely useful for BitTorrent because it lets you read and write on-disk data directly, easily and very efficiently. Mmap access can be implemented by the kernel using zero-copy semantics, and it avoids the overhead (system call and otherwise) of the read(2) and write(2) family of functions. BitTorrent fundamentally involves a huge amount of IO dancing, and mmap(2) offers an elegant, high-performance boulevard for this.
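As a rough illustration of the idiom (a minimal sketch, not code from any real client), the following maps a small pre-allocated file in full and then reads and writes pieces by slicing the mapping. The names PIECE_LENGTH, write_piece and read_piece are hypothetical and chosen only for this example.

```python
import mmap
import os
import tempfile

PIECE_LENGTH = 16 * 1024  # hypothetical piece size, chosen for illustration


def write_piece(mm, index, data):
    """Copy one piece into the mapping at its position in the file."""
    start = index * PIECE_LENGTH
    mm[start:start + len(data)] = data


def read_piece(mm, index, length=PIECE_LENGTH):
    """Return one piece as bytes, sliced straight out of the mapping."""
    start = index * PIECE_LENGTH
    return mm[start:start + length]


def demo():
    # Pre-allocate a 4-piece file, then map all of it read/write.
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4 * PIECE_LENGTH)
        mm = mmap.mmap(fd, 0)  # length 0 maps the whole file
        try:
            # Pieces arrive in arbitrary order; no seek() calls are needed.
            write_piece(mm, 3, b"d" * PIECE_LENGTH)
            write_piece(mm, 1, b"b" * PIECE_LENGTH)
            mm.flush()
            return (read_piece(mm, 1)[:4],
                    read_piece(mm, 3)[:4],
                    read_piece(mm, 0)[:4])
        finally:
            mm.close()
    finally:
        os.close(fd)
        os.remove(path)
```

Once the mapping exists, each piece access is an ordinary slice operation; the kernel decides when pages are actually read from or written to disk.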

While the UNIX system call supports an ‘offset’ parameter, the Python mmap module, at the time of writing, does not. From what I have been able to find out, the biggest reason appears to be that the Python mmap module also supports Windows, so any addition or change to it (adding an offset parameter, for example) must be ported to Windows as well. While Windows does support offsets in its memory mapping routines, it seems the Python developers have found them difficult to use. Presumably there is not enough demand for the feature to truly warrant implementation. Now that I think of it, perhaps I will take a stab at implementing it myself, since I have no prejudice against Windows users being able to run my software.
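Later Python releases (2.6 onward) did add an offset argument to mmap.mmap, with the restriction that it must be a multiple of mmap.ALLOCATIONGRANULARITY. The sketch below shows, under that assumption, how an arbitrary byte range could be mapped by rounding the start down to an allowed offset. The helper name map_window is hypothetical.

```python
import mmap
import os
import tempfile


def map_window(fd, start, length):
    """Map only the bytes [start, start + length) of the file.

    Returns (mapping, skip): the requested bytes begin at mapping[skip].
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran  # round down to an allowed offset
    skip = start - aligned
    mm = mmap.mmap(fd, skip + length, offset=aligned)
    return mm, skip


def demo():
    gran = mmap.ALLOCATIONGRANULARITY
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 3 * gran)
        start = 2 * gran + 100  # deliberately not aligned
        mm, skip = map_window(fd, start, 5)
        try:
            mm[skip:skip + 5] = b"hello"
            mm.flush()
        finally:
            mm.close()
        # Check the result through the ordinary file interface.
        os.lseek(fd, start, os.SEEK_SET)
        return skip, os.read(fd, 5)
    finally:
        os.close(fd)
        os.remove(path)
```

Mapping a window rather than the whole file matters most when the data is larger than the address space a process can comfortably spare, which is where an offset parameter earns its keep.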

In any case, because of this situation, Python BitTorrent implementations such as Bram Cohen’s mainline client are limited to using the read and write calls, which are less efficient and more unwieldy for random access. As a result, the mainline BitTorrent code is slower and more complicated than it could be. On the other hand, my high-performance BitTorrent implementation, written in C and portable to UNIX and Windows, is able to take advantage of memory-mapped file support, precisely because it is written in C and not Python, which gives me sufficient control.
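For contrast, a piece store built on seek, read and write might look roughly like the sketch below. This is an illustrative assumption, not code taken from the mainline client, and the names PIECE_LENGTH, write_piece_seek and read_piece_seek are hypothetical.

```python
import tempfile

PIECE_LENGTH = 16 * 1024  # hypothetical piece size, chosen for illustration


def write_piece_seek(f, index, data):
    """Position the file pointer, then write: at least two calls per piece."""
    f.seek(index * PIECE_LENGTH)
    f.write(data)


def read_piece_seek(f, index, length=PIECE_LENGTH):
    """Position the file pointer, then read the piece into a new bytes object."""
    f.seek(index * PIECE_LENGTH)
    return f.read(length)


def demo():
    with tempfile.TemporaryFile() as f:
        f.truncate(4 * PIECE_LENGTH)  # pre-allocate room for 4 pieces
        write_piece_seek(f, 3, b"d" * PIECE_LENGTH)
        write_piece_seek(f, 1, b"b" * PIECE_LENGTH)
        return read_piece_seek(f, 1)[:4], read_piece_seek(f, 3)[:4]
```

Every access has to reposition the file pointer first and then copy data between the file and a separate buffer, which is the bookkeeping a memory mapping makes unnecessary.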

The same idiom of using memory-mapped files to avoid system call overhead and let the kernel’s VM optimise IO for you of course applies to a much broader spectrum of applications than just BitTorrent. Just about all the major P2P applications that I am aware of could use this feature to efficiently read and write large blocks of data in random order. Unfortunately, until this is fixed, Python remains a sub-optimal choice for BitTorrent and P2P applications.


Comments

Comment from wancharle
Date: September 22, 2009, 6:22 am

Python's mmap now supports offset.
http://docs.python.org/library/mmap.html

Sorry for my English; I speak Portuguese.
