<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/atom10full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xml:lang="en" xml:base="http://blog.p2presearch.com/wp-atom.php">
	<title type="text">P2P Research Institute Blog</title>
	<subtitle type="text">what's happening in the peersphere</subtitle>

	<updated>2008-11-10T04:33:38Z</updated>
	<generator uri="http://wordpress.org/" version="2.6.3">WordPress</generator>

	<link rel="alternate" type="text/html" href="http://blog.p2presearch.com" />
	<id>http://blog.p2presearch.com/feed/atom/</id>
	

			<link rel="self" href="http://feeds.feedburner.com/P2pResearchInstituteBlog" type="application/atom+xml" /><entry>
		<author>
			<name>Harry Tormey</name>
						<uri>http://p2presearch.com</uri>
					</author>
		<title type="html"><![CDATA[Secure, fast, P2P filesystem: An interview with DustFS author Michael Stapelberg]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/10/05/secure-fast-p2p-filesystem-an-interview-with-dustfs-author-michael-stapelberg/" />
		<id>http://blog.p2presearch.com/?p=145</id>
		<updated>2008-10-06T06:57:16Z</updated>
		<published>2008-10-06T06:25:42Z</published>
		<category scheme="http://blog.p2presearch.com" term="BitTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="Interview" /><category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="Piracy Research" /><category scheme="http://blog.p2presearch.com" term="BiTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="crawling" /><category scheme="http://blog.p2presearch.com" term="encryption" /><category scheme="http://blog.p2presearch.com" term="future" /><category scheme="http://blog.p2presearch.com" term="Harry Tormey" /><category scheme="http://blog.p2presearch.com" term="Linux" /><category scheme="http://blog.p2presearch.com" term="OpenBSD" /><category scheme="http://blog.p2presearch.com" term="OSX" /><category scheme="http://blog.p2presearch.com" term="piracy" /><category scheme="http://blog.p2presearch.com" term="potential" /><category scheme="http://blog.p2presearch.com" term="protocol" /><category scheme="http://blog.p2presearch.com" term="security" /><category scheme="http://blog.p2presearch.com" term="streaming" /><category scheme="http://blog.p2presearch.com" term="unworkable" /><category scheme="http://blog.p2presearch.com" term="Windows" />		<summary type="html"><![CDATA[Download MP3 audio of interview here.
A couple of weeks ago I interviewed Michael Stapelberg, head developer of the DustFS project. You can listen to a recording of this interview here. DustFS is an encrypted, distributed file system based on the BitTorrent protocol. DustFS&#8217;s major features are full encryption, authentication using certificates and usage of BitTorrent [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/10/05/secure-fast-p2p-filesystem-an-interview-with-dustfs-author-michael-stapelberg/"><![CDATA[<p><a href="http://p2presearch.com/Audio/2008/October/5/michael0945am12sept2008.mp3">Download MP3 audio of interview here</a>.</p>
<p>A couple of weeks ago I interviewed <a href="http://michael.stapelberg.de">Michael Stapelberg</a>, head developer of the <a href="http://dustfs.zekjur.net">DustFS</a> project. You can listen to a recording of this interview <a href="http://p2presearch.com/Audio/2008/October/5/michael0945am12sept2008.mp3">here</a>. DustFS is an <a href="http://en.wikipedia.org/wiki/Encryption">encrypted</a>, <a href="http://en.wikipedia.org/wiki/Distributed_file_system">distributed file system</a> based on the <a href="http://en.wikipedia.org/wiki/BitTorrent">BitTorrent</a> protocol. DustFS&#8217;s major features are full encryption, certificate-based authentication and the use of BitTorrent for fast distribution of files (accelerated by so-called <a href="http://en.wikipedia.org/wiki/Cache">cache</a> servers).</p>
<p>Michael is quite active in the <a href="http://en.wikipedia.org/wiki/Open_source">Open Source</a> community, having contributed to a number of projects such as <a href="http://en.wikipedia.org/wiki/Enlightenment_(window_manager)">Enlightenment</a> (e17), <a href="http://collectd.org/">collectd</a> and <a href="http://www.bacula.org/">Bacula</a>. Michael is also the author of <a href="http://michael.stapelberg.de/mxallowd.en">mxallowd</a>, an anti-spam daemon for Linux/*BSD, and is about to enter a Computer Science program at the <a href="http://www.uni-heidelberg.de/index_e.html">University of Heidelberg</a> in southern Germany.</p>
<p>According to Michael, the idea for DustFS came from a local <a href="http://www.ccc.de/?language=en">Chaos Computer Club (CCC)</a> meeting. In Germany, most <a href="http://en.wikipedia.org/wiki/DSL">DSL</a> providers offer Internet connections with high downstream but low upstream bandwidth. If you are a DSL customer in Germany and want to conveniently share large files with your friends, your options are essentially limited to existing, well-known protocols such as <a href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">FTP</a>, <a href="http://en.wikipedia.org/wiki/Network_File_System">NFS</a>, <a href="http://en.wikipedia.org/wiki/Samba_(software)">Samba</a> and <a href="http://en.wikipedia.org/wiki/Andrew_File_System">AFS</a>. The problem, however, is that all of these protocols require a central server. Combined with the limited upstream bandwidth, this makes them sub-optimal for distributing large files amongst a group of friends. A distributed P2P protocol has the advantage that you can download from multiple peers simultaneously, effectively combining their upstream bandwidth rather than being limited by the pipe of the central server, which greatly improves download speed. Another issue is that most DSL connections in Germany do not have a static IPv4 address, so you must also deal with the problem of co-ordinating server addresses with your friends. P2P networks, on the other hand, are inherently distributed and self-configuring.</p>
<p>One solution to the above problem would be to build a torrent file for the data you want to share, set up a BitTorrent tracker and attempt to keep it private. However, this involves a lot of manual setup and would be quite annoying to do every time you wanted to share a single file. The main aim of DustFS is to make sharing a file among a small group of friends easy and painless, while keeping it secure and trustworthy.</p>
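<p>To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. The link speeds are illustrative assumptions rather than figures for any particular DSL product, the function name is invented for this example, and protocol overhead is ignored.</p>
<pre>
```python
def effective_download_rate(downstream_kbps, peer_upstreams_kbps):
    """Best-case rate: the combined upstream of all uploaders,
    capped by the downloader's own downstream rate."""
    return min(downstream_kbps, sum(peer_upstreams_kbps))


# Illustrative numbers only: a line with 16 Mbit/s down and 1 Mbit/s up.
DOWN = 16000
UP = 1000

# One friend acting as the central server.
single_server = effective_download_rate(DOWN, [UP])
# Four friends uploading different parts of the file at the same time.
four_peers = effective_download_rate(DOWN, [UP] * 4)

print(single_server, four_peers)
```
</pre>
<p>Under these assumptions the download is bounded by one friend&#8217;s upstream in the central-server case, while each additional peer adds its upstream until the downloader&#8217;s own downstream becomes the limit.</p>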
<p>As I mentioned already, DustFS is an encrypted FUSE file system built on top of the <a href="http://www.p2presearch.com/">P2P Research Institute&#8217;s</a> BitTorrent implementation <a href="http://p2presearch.com/unworkable/">Unworkable</a>. Originally DustFS used <a href="http://en.wikipedia.org/wiki/Transmission_(BitTorrent_client)">Libtransmission</a> as its BitTorrent implementation. However, despite its name, Libtransmission is not in fact just a library. It provides a number of features not required by DustFS and, at 17,000 lines of source code, is quite large. Unworkable, on the other hand, is closer to 4,000 lines of code. To illustrate the difference this makes, the DustFS binary built with Libtransmission is 1.7 megabytes in size, as opposed to 700 kilobytes when built with Unworkable.</p>
<p>In Germany it is also quite cheap to rent servers which have reasonable upstream bandwidth allowances, a fact that DustFS plans to leverage to accelerate transfers. These rented servers could be used as intermediate cache servers for the DustFS network. The idea is that when you transfer a file over DustFS, a cache server kicks in and mirrors this file. If another user then requests the same file, they download the copy stored on the cache server - at a vastly greater rate than from a DSL peer. A cache server is a stripped-down DustFS node. It doesn&#8217;t need FUSE, and generally has a smaller set of dependencies than a normal node.</p>
<p>While DustFS is still in the early stages of development, it looks extremely promising. We think that Michael has hit upon a very good idea. If his project can gain enough momentum to solve the non-trivial engineering problems, DustFS could revolutionise file-sharing among groups of friends. Imagine a world where everyone has an embedded, DustFS-capable device in their home. Family videos, photos and so on would be easily shareable between trusted parties in a secure manner. Musicians could collaborate on tracks effortlessly without worrying about their intellectual property being stolen. The potential is huge. We wish Michael and DustFS the best of luck and look forward to collaborating with him in the future.</p>
]]></content>
<link href="http://p2presearch.com/Audio/2008/October/5/michael0945am12sept2008.mp3" rel="enclosure" length="14724485" type="audio/mpeg" />
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/10/05/secure-fast-p2p-filesystem-an-interview-with-dustfs-author-michael-stapelberg/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/10/05/secure-fast-p2p-filesystem-an-interview-with-dustfs-author-michael-stapelberg/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[Unworkable 0.51 Released - Major bugfixes]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/10/01/unworkable-051-released-major-bugfixes/" />
		<id>http://blog.p2presearch.com/?p=140</id>
		<updated>2008-10-02T03:51:17Z</updated>
		<published>2008-10-02T03:51:17Z</published>
		<category scheme="http://blog.p2presearch.com" term="BitTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="BitTorrent" /><category scheme="http://blog.p2presearch.com" term="C" /><category scheme="http://blog.p2presearch.com" term="p2p" /><category scheme="http://blog.p2presearch.com" term="unworkable" />		<summary type="html"><![CDATA[I just released version 0.51 of our high-performance, BSD-licensed C implementation of BitTorrent, Unworkable.  This release contains very minor code changes which fix some important bugs in the mapping of torrent pieces to on-disk mmap()&#8217;d regions.  In particular, this can fix some edge-cases in the downloading of large, multi-file torrents.

Direct download link to [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/10/01/unworkable-051-released-major-bugfixes/"><![CDATA[<p>I just released <a href="http://p2presearch.com/unworkable/dist/unworkable-0.51.tar.gz">version 0.51</a> of our <a href="http://p2presearch.com/unworkable/">high-performance, BSD-licensed C implementation of BitTorrent, Unworkable</a>.  This release contains very minor code changes which fix some important bugs in the mapping of torrent pieces to on-disk mmap()&#8217;d regions.  In particular, this can fix some edge-cases in the downloading of large, multi-file torrents.</p>
<p><a href="http://p2presearch.com/unworkable/dist/unworkable-0.51.tar.gz">Direct download link to the tarball</a></p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/10/01/unworkable-051-released-major-bugfixes/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/10/01/unworkable-051-released-major-bugfixes/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Harry Tormey</name>
						<uri>http://p2presearch.com</uri>
					</author>
		<title type="html"><![CDATA[Battling Climate Change with BitTorrent - An Interview with Jeremy Blackburn of the University of South Florida]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/09/22/battling-climate-change-with-bittorrent-an-interview-with-jeremy-blackburn-of-the-university-of-south-florida/" />
		<id>http://blog.p2presearch.com/?p=67</id>
		<updated>2008-11-10T04:33:38Z</updated>
		<published>2008-09-23T05:22:35Z</published>
		<category scheme="http://blog.p2presearch.com" term="BitTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="BitTorrent" /><category scheme="http://blog.p2presearch.com" term="climate change" /><category scheme="http://blog.p2presearch.com" term="global warming" /><category scheme="http://blog.p2presearch.com" term="p2p" />		<summary type="html"><![CDATA[

Download MP3 audio of interview, part one.
Download MP3 audio of interview, part two.

Last week I had the chance to interview Jeremy Blackburn, a graduate student at the University of South Florida. Jeremy’s research interests are in the energy efficiency of computer networks. I spoke to Jeremy concerning his work to improve the power management of [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/09/22/battling-climate-change-with-bittorrent-an-interview-with-jeremy-blackburn-of-the-university-of-south-florida/"><![CDATA[<p><a href="http://blog.p2presearch.com/wp-content/uploads/2008/09/pic.png"><img class="size-full wp-image-119" title="btproto" src="http://blog.p2presearch.com/wp-content/uploads/2008/09/pic.png" alt="BitTorrent Protocol" width="708" height="328" /></a></p>
<p>
<a href="http://p2presearch.com/Audio/2008/September/15/09-08-2008-01-36-p1.mp3">Download MP3 audio of interview, part one</a>.<br/><br />
<a href="http://p2presearch.com/Audio/2008/September/15/09-08-2008-01-36-p2.mp3">Download MP3 audio of interview, part two</a>.<br/>
</p>
<p class="MsoNormal">Last week I had the chance to interview <a href="http://www.csee.usf.edu/~jhblackb/">Jeremy Blackburn</a>, a graduate student at the <a href="http://www.usf.edu/index.asp">University of South Florida</a>. Jeremy’s research interests are in the energy efficiency of computer networks. I spoke to Jeremy concerning his work to improve the power management of the <a href="http://en.wikipedia.org/wiki/BitTorrent_(protocol)">BitTorrent P2P protocol</a>. The  interview is available in two parts (<a href="http://p2presearch.com/Audio/2008/September/15/09-08-2008-01-36-p1.mp3">part one</a> and <a href="http://p2presearch.com/Audio/2008/September/15/09-08-2008-01-36-p2.mp3">part two</a>).</p>
<p>Jeremy&#8217;s main argument is that data centers are very expensive. In the US alone, they consume about 1.2% of all electricity, costing approximately $3bn a year. Worldwide, data centers consume roughly $7bn worth of electricity. Jeremy argues that this high cost is a strong incentive to offload some of the hosting traditionally done in the data center onto the consumer through P2P technologies. In other words, the person downloading a file from you can contribute some of their own computing and bandwidth resources to the hosting of that file, thus reducing data center expenditure. However, he goes on to say that while this does reduce the energy cost for the content providers, it will probably increase overall energy consumption - through losses in efficiency - and will certainly increase the energy consumption for the users themselves.</p>
<p>A problem with the above premise is that it assumes that the world’s data centers are mainly used for file sharing or other services which are easily distributed across a P2P network. This is clearly not the case. I would argue that most of the computing resources in data centers are dedicated to database-related work, as opposed to activities easily distributed through P2P technologies.  One is unlikely to see <a href="http://en.wikipedia.org/wiki/Ecommerce">eCommerce</a> transaction processing being carried out by consumer computers participating in a P2P network any time soon.  Surely this kind of activity makes up a large chunk of that 1.2% figure.  Furthermore, when I asked Jeremy for evidence of this being a major industry trend, the only research study he could cite was about a P2P set-top box released by <a href="http://www.telefonica.com/home_eng.shtml">Telefonica</a>, a Spanish <a href="http://en.wikipedia.org/wiki/Internet_service_provider">ISP</a>.</p>
<p>Having said all this, no one questions the fact that BitTorrent comprises an estimated 18-35% of traffic on the Internet (according to CacheLabs and CacheLogic) or the fact that large businesses such as <a href="http://www.blizzard.com">Blizzard Entertainment</a> are using it to reduce their software distribution costs. Also consider that the <a href="http://en.wikipedia.org/wiki/Kyoto_Protocol">Kyoto Protocol</a> only requires that countries reduce their emissions, on average, to 5.2% below their 1990 baseline by 2010 - even a reduction of a fraction of a percent in data center energy consumption would be significant toward this goal. Indeed, how many people leave their computers on all day to download files? With all of this in mind, surely a more energy-efficient BitTorrent protocol is absolutely worth pursuing.</p>
<p>The main focus of Jeremy’s investigation is to see what can be done to the BitTorrent protocol to reduce energy consumption. Jeremy has been studying the protocol by running simulations using an implementation of a BitTorrent client within an open source network simulator called <a href="http://nsnam.isi.edu/nsnam/index.php/Main_Page">NS-2</a>. These simulations consist of a 51-peer swarm with varying arrival times and one seed awake at all times for file availability. The files used in these simulations are between 10 and 100 MB in size (Jeremy intends to use files larger than a gigabyte in later simulations).</p>
<p>Jeremy introduces the concept of putting a node to &#8220;sleep&#8221;. This would involve having the computer enter a low-power mode, akin to putting one&#8217;s laptop to sleep. According to Jeremy&#8217;s simulations, putting &#8220;seeds&#8221; (peers that have all of the data in the torrent) to sleep 50% of the time can, in theory, halve the energy cost while increasing download times by only about 10%. The technique relies on the peer table, the list every peer in a swarm keeps of all the other peers it is connected to. Normally, if a peer went to sleep, the TCP connections between it and all of the other peers it was connected to would be dropped, and those peers would forget that the sleeping peer ever existed. The only way this peer&#8217;s resources could be made available again would be for it to reconnect to the swarm. In the modified BitTorrent client used within Jeremy&#8217;s simulation, a seed goes to sleep when it is not uploading, and the peers it is connected to do not forget about it. When those peers need that seed&#8217;s resources again, they can wake it up.</p>
<p style="text-align: center;"><a href="http://blog.p2presearch.com/wp-content/uploads/2008/09/pic2.png"><img class="size-full wp-image-131 aligncenter" title="sleep" src="http://blog.p2presearch.com/wp-content/uploads/2008/09/pic2.png" alt="" width="500" height="307" /></a></p>
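<p>The difference between the normal and the modified peer table behaviour can be sketched in a few lines of Python. This is only a toy model under our own assumptions: the class and method names are invented for illustration and are not taken from Jeremy&#8217;s simulator or from any real client, and the actual wake-up signal is left out.</p>
<pre>
```python
class PeerTable:
    """Toy peer table that keeps sleeping seeds instead of forgetting them."""

    def __init__(self):
        # Maps a peer address to either "awake" or "asleep".
        self.state = {}

    def add(self, addr):
        self.state[addr] = "awake"

    def connection_closed(self, addr, went_to_sleep):
        if went_to_sleep:
            # Modified behaviour: remember the seed so it can be woken later.
            self.state[addr] = "asleep"
        else:
            # Standard behaviour: a dropped peer is simply forgotten.
            self.state.pop(addr, None)

    def wake(self, addr):
        """Mark a known sleeping peer as awake; return False for unknown peers."""
        if self.state.get(addr) == "asleep":
            # A real client would send some wake-up signal at this point.
            self.state[addr] = "awake"
            return True
        return False


table = PeerTable()
table.add("seed-1")
table.add("peer-2")
table.connection_closed("seed-1", went_to_sleep=True)
table.connection_closed("peer-2", went_to_sleep=False)
print(sorted(table.state.items()))
print(table.wake("seed-1"), table.wake("peer-2"))
```
</pre>
<p>In this sketch the sleeping seed stays in the table and can be woken, whereas the peer that merely disconnected has been forgotten and would have to rejoin the swarm by itself.</p>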
<p>However, the idea of putting one&#8217;s personal computer to sleep based on network activity is unlikely to be practical for, or of interest to, typical computer users. Although wake-on-LAN technology exists and indeed is bundled with most modern computers, a more likely application of this sleeping technique would be in set-top boxes. A set-top box is not a general-purpose computer. It is inherently a single-purpose device, dedicated to an individual task such as downloading video. Such set-top boxes generally consume a lot less power than general-purpose computers, and furthermore the user is less concerned should the device suspend itself.</p>
<p>At some point in the near future Jeremy intends to apply his research to a real-world BitTorrent client. At the <a href="http://p2presearch.com">P2P Research Institute</a> we have developed our own <a href="http://p2presearch.com/unworkable/">highly optimized BitTorrent client, Unworkable</a>. Jeremy mentioned our client as a prime candidate for testing out these changes. Unworkable is written in C and uses a high-performance, single-threaded, asynchronous networking communication model. Unworkable has been measured downloading torrents on a ten-year-old 270 MHz UltraSparc II machine, using less CPU time than the top(1) process used to monitor it.</p>
<p>All of the above of course raises the question: what other protocols could be modified for greater energy efficiency? We look forward to seeing what Jeremy comes up with next in his quest to reduce the energy consumption of computer networks. To read more about Jeremy&#8217;s research in this area, download his poster <a href="http://blog.p2presearch.com/wp-content/uploads/2008/09/jeremy.pdf">[Adobe PDF]</a> <a href="http://blog.p2presearch.com/wp-content/uploads/2008/09/jeremy.ppt">[Microsoft PowerPoint]</a>.</p>
]]></content>
<link href="http://p2presearch.com/Audio/2008/September/15/09-08-2008-01-36-p2.mp3" rel="enclosure" length="7606853" type="audio/mpeg" />
<link href="http://p2presearch.com/Audio/2008/September/15/09-08-2008-01-36-p1.mp3" rel="enclosure" length="4502213" type="audio/mpeg" />
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/09/22/battling-climate-change-with-bittorrent-an-interview-with-jeremy-blackburn-of-the-university-of-south-florida/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/09/22/battling-climate-change-with-bittorrent-an-interview-with-jeremy-blackburn-of-the-university-of-south-florida/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[Unworkable 0.5 Released - Fast Extension, Fast Resume and many bugfixes]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/09/22/unworkable-05-released-fast-extension-fast-resume-and-many-bugfixes/" />
		<id>http://blog.p2presearch.com/?p=91</id>
		<updated>2008-09-23T04:09:09Z</updated>
		<published>2008-09-23T03:44:25Z</published>
		<category scheme="http://blog.p2presearch.com" term="BitTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="BitTorrent" /><category scheme="http://blog.p2presearch.com" term="p2p" /><category scheme="http://blog.p2presearch.com" term="unworkable" />		<summary type="html"><![CDATA[Better late than never
After more than six months of working on our data-mining and statistical analysis software, I&#8217;ve finally had some free time to work on our high-performance cross-platform BitTorrent implementation, Unworkable.  My interest in Unworkable was greatly renewed by the fact that Michael Stapelberg is using it as a basis for his distributed [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/09/22/unworkable-05-released-fast-extension-fast-resume-and-many-bugfixes/"><![CDATA[<p><b>Better late than never</b></p>
<p>After more than six months of working on our data-mining and statistical analysis software, I&#8217;ve finally had some free time to work on our <a href="http://p2presearch.com/unworkable">high-performance cross-platform BitTorrent implementation, Unworkable</a>.  My interest in Unworkable was greatly renewed by the fact that Michael Stapelberg is using it as a basis for his distributed user-land file system (via <a href="http://fuse.sourceforge.net/">FUSE</a>) called <a href="https://dustfs.zekjur.net/">DustFS</a>.  Michael sent me a bunch of useful patches and fixes for this release.  Watch this space for an article and interview with Michael from us, hopefully in a week or two.</p>
<p><b>What&#8217;s new</b></p>
<p>In any case, Unworkable 0.5 is a great improvement. There were a number of critical bug fixes, eliminating potential crashes and properly handling certain bitwise operations. When taken together, these fixes make Unworkable a much more effective participant in BitTorrent swarms. While the main focus of this release was to improve stability and effectiveness, I also added two nice new features. Firstly, I implemented initial support for the <a href="http://bittorrent.org/beps/bep_0006.html">BitTorrent Fast Extension, a.k.a. BEP 6</a>. This extension eliminates some potential race conditions inherent in the choking algorithm, and also has some bandwidth-saving messages - for example, a peer can simply say &#8220;I have all the data&#8221; or &#8220;I have none of the data&#8221; in a single short message, instead of sending its entire bitfield representation of which pieces it does and does not have. For large torrents, this bitfield can get quite long. There is some quite <a href="http://forum.bittorrent.org/viewtopic.php?id=13">in-depth discussion about the Fast Extension / BEP 6 from BitTorrent, Inc developers David Harrison and Bram Cohen</a> on the BitTorrent.org forum if you would like more info.</p>
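<p>To give a feel for the bandwidth saving, the following Python sketch compares the size of a full bitfield message with the single-byte Have All message from BEP 6. It is an illustration only and is not taken from the Unworkable sources (which are written in C); the function names and the piece count are made up for the example.</p>
<pre>
```python
import struct

# Message IDs: 5 is the standard bitfield message; 0x0E and 0x0F are the
# Have All and Have None messages defined by BEP 6.
BITFIELD, HAVE_ALL, HAVE_NONE = 5, 0x0E, 0x0F


def bitfield_message(have):
    """Encode a list of booleans (one per piece) as a bitfield message."""
    payload = bytearray((len(have) + 7) // 8)
    for index, present in enumerate(have):
        if present:
            # The high bit of the first byte corresponds to piece 0.
            payload[index // 8] |= 2 ** (7 - index % 8)
    # Four-byte big-endian length prefix, one-byte message ID, then payload.
    return struct.pack("!IB", 1 + len(payload), BITFIELD) + bytes(payload)


def have_all_message():
    """Have All carries no payload: length prefix plus message ID only."""
    return struct.pack("!IB", 1, HAVE_ALL)


def have_none_message():
    """Have None is equally short."""
    return struct.pack("!IB", 1, HAVE_NONE)


pieces = 20000  # hypothetical number of pieces in a large torrent
print(len(bitfield_message([True] * pieces)), len(have_all_message()))
```
</pre>
<p>For a seed or a brand new peer, the short message carries the same information as the full bitfield, so the saving grows with the number of pieces in the torrent.</p>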
<p>I also implemented - at long last - a fast resume mechanism. For large torrents, and especially on slower computers, the initial hash check for a torrent can take a long time. With the fast resume support in Unworkable 0.5, a record of the data you already have is kept in a small on-disk file. If this file is present, the hash check can be skipped. In practice, this means you can quit torrents and resume them almost instantaneously.</p>
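<p>The general idea behind fast resume can be sketched as follows. This Python example is a simplified illustration: the JSON resume file and all of the function names are assumptions made for the example, and they do not describe the file format or the C code that Unworkable actually uses.</p>
<pre>
```python
import hashlib
import json
import os


def verify_pieces(data, piece_length, piece_hashes):
    """Slow path: SHA-1 every piece and compare it with the expected hash."""
    have = []
    for index, expected in enumerate(piece_hashes):
        piece = data[index * piece_length:(index + 1) * piece_length]
        have.append(hashlib.sha1(piece).hexdigest() == expected)
    return have


def load_or_check(resume_path, data, piece_length, piece_hashes):
    """Return (have, used_resume), skipping the hash check if possible."""
    if os.path.exists(resume_path):
        # Fast path: trust the record written by an earlier run.
        with open(resume_path) as handle:
            return json.load(handle), True
    have = verify_pieces(data, piece_length, piece_hashes)
    # Hypothetical resume format: a JSON list with one boolean per piece.
    with open(resume_path, "w") as handle:
        json.dump(have, handle)
    return have, False
```
</pre>
<p>The first call for a torrent pays the full cost of hashing, and later calls only read the small resume file. The trade-off in a scheme like this is that the resume file is trusted, so it has to be kept in step with the data on disk.</p>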
<p>As usual, this release includes some portability improvements. Somehow I had neglected to build Unworkable with 64-bit file offsets under Linux, which was a major oversight on my part. Also as usual, I have been developing Unworkable on both fast (2 GHz amd64) and slow (270 MHz UltraSparc II) machines. My 270 MHz Sun Microsystems Ultra 5 with 192 MB of RAM can happily download large Linux distro DVD images at speeds saturating my DSL connection!</p>
<p><b>What&#8217;s next</b></p>
<p>Now that Unworkable is pretty stable, I would like to work more on adding features such as native rate-limiting, multi-torrent support, and improvements to the control protocol.  Once these features are complete, work on the GUI can be resumed and we should have quite a compelling BitTorrent client!</p>
<p><b>Download</b></p>
<p>Download the sources for <a href="http://p2presearch.com/unworkable/dist/unworkable-0.5.tar.gz">Unworkable 0.5</a>. Builds and runs on OpenBSD, FreeBSD 6.2, Ubuntu Linux 8.04, CentOS 5, Fedora 7 (on EC2), Gentoo Linux, Arch Linux, Mac OS X, Solaris 10 and Windows XP (Cygwin).</p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/09/22/unworkable-05-released-fast-extension-fast-resume-and-many-bugfixes/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/09/22/unworkable-05-released-fast-extension-fast-resume-and-many-bugfixes/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Harry Tormey</name>
						<uri>http://p2presearch.com</uri>
					</author>
		<title type="html"><![CDATA[Podmailing: the next p2p craze?]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/08/07/podmailing-the-next-p2p-craze/" />
		<id>http://blog.p2presearch.com/?p=46</id>
		<updated>2008-08-07T19:38:14Z</updated>
		<published>2008-08-07T17:42:59Z</published>
		<category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="p2p bittorrent podmailing" />		<summary type="html"><![CDATA[A couple of weeks ago I had the opportunity to interview Louis Choquel CEO of zSlide the company behind a new service called podmailing. Podmailing is a simple way to send and receive large files and folders by e-mail. Prior to launching in the US last month the service had over 40,000 registered users mostly [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/08/07/podmailing-the-next-p2p-craze/"><![CDATA[<p class="MsoNormal">A couple of weeks ago I had the opportunity to interview Louis Choquel, CEO of <a href="http://www.zslide.com/">zSlide</a>, the company behind a new service called <a href="http://www.podmailing.com/">podmailing</a>. Podmailing is a simple way to send and receive large files and folders by e-mail. Prior to launching in the US last month, the service had over 40,000 registered users, mostly in France and Spain. You can listen to a podcast of the interview <a href="http://p2presearch.com/Audio/2008/August/7/02-07-2008-09-57-22.mp3">here</a>.</p>
<p class="MsoNormal">The service works by uploading file(s) from your computer to a tracker/server hosted on <a href="http://www.amazon.com/AWS-home-page-Money/b/ref=sc_iw_l_0?ie=UTF8&amp;node=3435361&amp;no=3435361&amp;me=A36L942TSJ2AJA">Amazon Web Services</a> through the podmailing client. The provided software is a modified version of the original open source <a href="http://www.bittorrent.com/">BitTorrent, Inc.</a> Python client. Once you register and upload your files, a link to the torrent and a direct download option are emailed to the recipient, who may choose to use either option. Seeding of the torrent is handled on the podmailing servers.</p>
<p class="MsoNormal">In order to get to either the torrent or the direct download link, the user must first visit a webpage covered in advertisements. Currently the service is free, with few restrictions on file size. I asked Louis directly if advertising was the only planned source of revenue for this service. His reply was that eventually users will be able to purchase premium accounts which will offer larger file hosting, encryption and higher bandwidth downloads.</p>
<p class="MsoNormal">At present the average file/folder size sent through this service is 200 megabytes. Another interesting point is that the number of successfully completed downloads made using the direct download link is approximately equal to the number made using the podmailing BitTorrent client.</p>
<p class="MsoNormal">While podmailing is an interesting twist on the existing <a href="http://en.wikipedia.org/wiki/One-click_hosting">one-click hosting</a> concept, I am unsure as to whether the P2P aspect sufficiently differentiates it from giants such as <a href="http://rapidshare.com/">RapidShare</a> and others.</p>
]]></content>
<link href="http://p2presearch.com/Audio/2008/August/7/02-07-2008-09-57-22.mp3" rel="enclosure" length="6231996" type="audio/mpeg" />
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/08/07/podmailing-the-next-p2p-craze/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/08/07/podmailing-the-next-p2p-craze/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[Three reasons why video is the Holy Grail of P2P]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/07/21/three-reasons-why-video-is-the-holy-grail-of-p2p/" />
		<id>http://blog.p2presearch.com/?p=49</id>
		<updated>2008-07-21T22:55:58Z</updated>
		<published>2008-07-21T22:55:58Z</published>
		<category scheme="http://blog.p2presearch.com" term="BitTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="BitTorrent" /><category scheme="http://blog.p2presearch.com" term="p2p streaming" /><category scheme="http://blog.p2presearch.com" term="p2p video" />		<summary type="html"><![CDATA[Peer-to-peer technology has many extremely useful applications.  Fundamentally P2P is about increasing network resilience and decreasing bandwidth costs.  Privacy, anonymity and security are  all secondary to these essential principles.  While BitTorrent has been an extremely successful P2P protocol for certain types of P2P applications, such as patch distribution for Blizzard&#8217;s World [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/07/21/three-reasons-why-video-is-the-holy-grail-of-p2p/"><![CDATA[<p>Peer-to-peer technology has many extremely useful applications.  Fundamentally P2P is about increasing network resilience and decreasing bandwidth costs.  Privacy, anonymity and security are  all secondary to these essential principles.  While BitTorrent has been an extremely successful P2P protocol for certain types of P2P applications, such as patch distribution for Blizzard&#8217;s World of Warcraft, it has also been a failure in other areas.</p>
<p><strong>The Holy Grail</strong></p>
<p>Streaming video is one of the largest consumers of bandwidth today.  Many estimates put it ahead of P2P in terms of gross bandwidth consumption.  Sites like <a href="http://youtube.com">YouTube</a> and <a href="http://video.google.com">Google Video</a> attract vast numbers of viewers.  Various online video streaming services are starting up, offered by companies like <a href="http://www.amazon.com">Amazon</a> and <a href="http://www.netflix.com">Netflix</a>.  It seems that streaming video, and the sales and advertising opportunities which come along with it, represents an irresistible revenue source for large companies.</p>
<p>However, streaming video also has large bandwidth costs associated with it.  According to the <a href="http://en.wikipedia.org/wiki/Streaming_media">Wikipedia page on streaming video</a>, to stream a standard video to 1,000 viewers would require 300 Mbit/sec of bandwidth using a traditional unicast approach.  All this bandwidth is expensive.  How can streaming providers spread this cost out?  With P2P.  Make your consumers pitch in to host the video.  As the number of viewers increases, so should the number of peers which can serve up the data, allowing you to scale up the number of participants without proportionate increases in your own bandwidth.</p>
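<p>To make the arithmetic concrete, here is a rough back-of-the-envelope sketch.  The per-stream rate of 300 kbit/sec is an assumption derived from the figure quoted above (300 Mbit/sec spread over 1,000 viewers), and <em>peer_share</em> is a hypothetical parameter invented purely for illustration, not a measured value.</p>

```python
# Back-of-the-envelope unicast vs. P2P-assisted bandwidth estimate.
# STREAM_KBIT is an assumption: 300 Mbit/sec for 1,000 viewers implies
# roughly 300 kbit/sec per stream.
STREAM_KBIT = 300


def unicast_mbit(viewers, stream_kbit=STREAM_KBIT):
    """Bandwidth the provider serves when every viewer gets a separate stream."""
    return viewers * stream_kbit / 1000.0


def p2p_assisted_mbit(viewers, peer_share, stream_kbit=STREAM_KBIT):
    """Provider bandwidth when peers carry `peer_share` (0.0 to 1.0) of the data.

    peer_share is a hypothetical knob used only for illustration.
    """
    return unicast_mbit(viewers, stream_kbit) * (1.0 - peer_share)


if __name__ == "__main__":
    print(unicast_mbit(1000))            # 300.0
    print(p2p_assisted_mbit(1000, 0.8))  # provider's share if peers carry 80%
```

<p>Under these assumptions the provider's bill grows with the share of data that peers cannot supply, rather than with the raw number of viewers.</p>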
<p><strong>Why not BitTorrent</strong></p>
<p>As we have written about before on this blog, <a href="http://blog.p2presearch.com/2008/04/10/bittorrent-download-strategy-in-the-beginning/">BitTorrent is not good for streaming video due to its rarest-first download ordering policy</a>.  In order to stream video, or music, or whatever - you want it to arrive in a predictable order.  Typically that order is linear, starting at the start.  This way data arrives in the order of consumption.  But BitTorrent does not provide this.  In fact, it almost explicitly guarantees that it will not order data in a linear fashion.  BitTorrent trades predictable ordering for replication increases.  Under BitTorrent, the rarest pieces of data will be replicated the most, and so become less rare.</p>
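<p>The difference between the two orderings can be sketched in a few lines.  This is a simplified illustration, not the selection code of any real client: the function names and the representation of each peer as a set of piece indices are assumptions made for the example, and real clients layer further rules on top.</p>

```python
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Order needed piece indices by how few peers hold them (ties by index).

    peer_pieces is a list of sets, one per peer, holding the piece
    indices that peer has (a simplified stand-in for a bitfield).
    """
    availability = Counter()
    for pieces in peer_pieces:
        availability.update(pieces)
    return sorted(needed, key=lambda idx: (availability[idx], idx))


def in_order(needed):
    """Order needed piece indices linearly, as a streaming player wants."""
    return sorted(needed)


if __name__ == "__main__":
    peers = [{0, 1, 2}, {0, 1}, {0, 3}]
    print(rarest_first({0, 1, 2, 3}, peers))  # [2, 3, 1, 0]
    print(in_order({0, 1, 2, 3}))             # [0, 1, 2, 3]
```

<p>In this toy swarm, piece 0 is the one a player needs first, yet rarest-first fetches it last because every peer already has it.</p>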
<p><strong>So what, then?</strong></p>
<p>Companies are instead developing their own protocols.  The <a href="http://www.p2p-next.org/">EU has given 19 million euro to one P2P group</a> which is modifying the BitTorrent protocol to support streaming - presumably by doing away with the rarest-first policy.</p>
<p>China has a number of well-funded start ups developing their own P2P video streaming technologies.  <a href="http://www.blin.cn/">Blin.cn</a> <a href="http://www.techcrunch.com/2007/10/16/50x-faster-than-bittorrent-i-want/">claims to be 50x faster than BitTorrent for video streaming</a>.  Google is <a href="http://gigaom.com/2007/01/05/google-confirms-xunlei/">an investor in Chinese streaming company Xunlei</a>.</p>
<p>And of course <a href="http://www.bittorrent.com">BitTorrent, Inc</a> have been <a href="http://blog.streamingmedia.com/the_business_of_online_vi/2007/05/bittorrent_laun.html">working to develop their own video streaming version of their protocol</a>.</p>
<p>Who will come out on top remains to be seen.</p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/07/21/three-reasons-why-video-is-the-holy-grail-of-p2p/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/07/21/three-reasons-why-video-is-the-holy-grail-of-p2p/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[Why Python is better than C]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/07/17/why-python-is-better-than-c/" />
		<id>http://blog.p2presearch.com/?p=17</id>
		<updated>2008-11-10T04:26:08Z</updated>
		<published>2008-07-18T00:33:56Z</published>
		<category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="C" /><category scheme="http://blog.p2presearch.com" term="hacking" /><category scheme="http://blog.p2presearch.com" term="Python" />		<summary type="html"><![CDATA[I love C.  I&#8217;ve written a little bit of C code in my time - both UNIX userland and kernel stuff.  I co-wrote OpenBSD&#8217;s rum(4) 802.11a/b/g wireless driver for Ralink USB devices [article here] and also made large contributions to OpenRCS and OpenCVS [articles here, here and here].  I&#8217;m also the [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/07/17/why-python-is-better-than-c/"><![CDATA[<p>I love C.  I&#8217;ve written a little bit of C code in my time - both UNIX userland and kernel stuff.  I co-wrote OpenBSD&#8217;s <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=rum&#038;apropos=0&#038;sektion=0&#038;manpath=OpenBSD+Current&#038;arch=i386&#038;format=html">rum(4) 802.11a/b/g wireless driver for Ralink USB devices</a> [article <a href="http://undeadly.org/cgi?action=article&#038;sid=20060406233608">here</a>] and also made large contributions to OpenRCS and <a href="http://www.opencvs.org">OpenCVS</a> [articles <a href="http://undeadly.org/cgi?action=article&#038;sid=20060310202814">here</a>, <a href="http://undeadly.org/cgi?action=article&#038;sid=20070214225624">here</a> and <a href="http://undeadly.org/cgi?action=article&#038;sid=20070117165957">here</a>].  I&#8217;m also the author of the <a href="http://p2presearch.com/unworkable/">small, portable and efficient BitTorrent implementation, Unworkable</a>, which is part of our work at P2P Research.  So I am relatively familiar with the language.</p>
<p>I&#8217;ve been hacking <a href="http://www.python.org">Python</a> code for around two years now, really developing a taste for it from my day job.  I would not consider myself a Python guru by any stretch, but I&#8217;ve worked with many different parts of the standard library, and used enough of the features (generators, lambdas, list comprehensions, classes etc) that I reckon I have a pretty solid handle on what it offers.</p>
<p>The majority of the crawling and data analysis software developed here at <a href="http://p2presearch.com">P2P Research</a> is written in Python - with a little bit of C here and there, for performance.  I suppose that the system features our stuff uses can be broken down into the following categories:</p>
<ul>
<li><strong>String manipulation / parsing</strong>.</li>
<li><strong>Fast dynamic data structures</strong>.  Lists and dictionaries, at a high level, including sorting etc.</li>
<li><strong>Networking.</strong>  Specifically, a lot of HTTP is spoken.</li>
<li><strong>Threading</strong>. For increased throughput.</li>
<li><strong>File I/O</strong>.  For archival purposes.</li>
<li><strong>Database</strong>. We use PostgreSQL for some reporting and analysis.</li>
</ul>
<p>I&#8217;m going to do a brief comparison of the two languages on each of these items.  All these things can be achieved relatively straightforwardly with both C and Python.  Consider how many network servers, text editors and databases are written purely in C.  The POSIX and ANSI standards actually give you a pretty good set of library functions for doing these things, too - apart from the data structure area I suppose.  There are mature interfaces available for working with databases.</p>
<p>What Python really gives you that C does not, in my opinion, are the following:</p>
<ul>
<li>
<p>
Largely eliminates the headaches of memory management.
</p>
</li>
<li>
<p>
Similarly, it makes string manipulation much less painful, while maintaining much of C&#8217;s performance by interfacing directly with the <em>printf</em> family of functions.  Consider the following C snippet, followed by the Python equivalent:
</p>
<p>
<div class="wp_syntax"><table><tr><td class="code"><pre class="c"><span style="color: #808080; font-style: italic;">/* Format a HTTP 1.0 GET request safely in C */</span>
l <span style="color: #339933;">=</span> snprintf<span style="color: #009900;">&#40;</span>request, GETSTRINGLEN,
    <span style="color: #ff0000;">&quot;GET %s%s HTTP/1.0<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>Host: %s<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>User-agent: Unworkable/%s<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>, path,
    params, host, UNWORKABLE_VERSION<span style="color: #009900;">&#41;</span>;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>l <span style="color: #339933;">==</span> <span style="color: #cc66cc;">-1</span> || l &gt;<span style="color: #339933;">=</span> GETSTRINGLEN<span style="color: #009900;">&#41;</span>
        <span style="color: #b1b100;">goto</span> trunc;
<span style="color: #808080; font-style: italic;">/* ... */</span>
trunc<span style="color: #339933;">:</span>
trace<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;announce: string truncation detected&quot;</span><span style="color: #009900;">&#41;</span>;
xfree<span style="color: #009900;">&#40;</span>params<span style="color: #009900;">&#41;</span>;
xfree<span style="color: #009900;">&#40;</span>request<span style="color: #009900;">&#41;</span>;
xfree<span style="color: #009900;">&#40;</span>tparams<span style="color: #009900;">&#41;</span>;
<span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">-1</span><span style="color: #009900;">&#41;</span>;</pre></td></tr></table></div>

</p>
<p>
<div class="wp_syntax"><table><tr><td class="code"><pre class="python"><span style="color: #808080; font-style: italic;"># Format a HTTP 1.0 GET request safely in Python</span>
request = <span style="color: #483d8b;">&quot;GET %s%s HTTP/1.0<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>Host: %s<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>User-agent: Unworkable/%s<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span><span style="color: black;">&#40;</span>path, 
    params, host, UNWORKABLE_VERSION<span style="color: black;">&#41;</span></pre></td></tr></table></div>

</p>
<p>
The big difference in this case is really the amount of care you need to take with memory cleanup and error checking in C.  Python is far more lenient than C when it comes to string and memory manipulation, which saves a great deal of complexity.
</p>
</li>
<li>
<p>
There are good, relatively straightforward implementations of various data structures for C.  Well-known examples are the venerable <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=queue&#038;apropos=0&#038;sektion=0&#038;manpath=OpenBSD+Current&#038;arch=i386&#038;format=html">sys/queue.h</a> for various sorts of linked lists, and the similar <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=tree&#038;apropos=0&#038;sektion=3&#038;manpath=OpenBSD+Current&#038;arch=i386&#038;format=html">sys/tree.h</a> for <a href="http://en.wikipedia.org/wiki/Red-black_tree">red-black trees</a> or <a href="http://en.wikipedia.org/wiki/Splay_tree">splay trees</a>, which are typically used to implement dictionaries.
</p>
<p>
But these C macros, while extremely helpful, are still tricky.  It is not obvious, for example, how to let an object (in C, something declared with the struct keyword) be a member of an arbitrary set of TAILQs.  In fact, you need a fairly convoluted definition, not to mention complex management code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c"><span style="color: #808080; font-style: italic;">/* An actual node, which can be used in arbitrary lists */</span>
<span style="color: #993333;">struct</span> node <span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>key;
<span style="color: #009900;">&#125;</span>;
&nbsp;
<span style="color: #808080; font-style: italic;">/* Separated list structure for managing nodes */</span>
<span style="color: #993333;">struct</span> node_list_entry <span style="color: #009900;">&#123;</span>
        TAILQ_ENTRY<span style="color: #009900;">&#40;</span>node_list_entry<span style="color: #009900;">&#41;</span>     node_list;
        <span style="color: #993333;">struct</span> node <span style="color: #339933;">*</span>item;
<span style="color: #009900;">&#125;</span>;</pre></td></tr></table></div>

<p>It makes you appreciate Python code like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python">mylist = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
mylist.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;foo&quot;</span><span style="color: black;">&#41;</span>
mylist.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
mylist.<span style="color: black;">sort</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>And after investigating what is involved in getting dictionary-like storage from C (left as an exercise to the reader), code like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python">mydict = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
mydict<span style="color: black;">&#91;</span><span style="color: #483d8b;">'foo'</span><span style="color: black;">&#93;</span> = <span style="color: #483d8b;">'bar'</span>
<span style="color: #ff7700;font-weight:bold;">del</span> mydict<span style="color: black;">&#91;</span><span style="color: #483d8b;">'foo'</span><span style="color: black;">&#93;</span></pre></td></tr></table></div>

</p>
</li>
<li>
<p>
The TCP/IP stacks in all major operating systems are written in C, and a good number of extremely popular network clients and servers are also (Apache, Sendmail, OpenSSH).  One could perhaps even argue that networking is one of the things that C is best suited for, in fact, particularly very low level networking.  However, just opening a TCP socket safely is quite a lot of C code:
</p>
<p>
<div class="wp_syntax"><table><tr><td class="code"><pre class="c"><span style="color: #808080; font-style: italic;">/* C snippet to connect to a remote host via TCP */</span>
<span style="color: #993333;">struct</span> addrinfo hints, <span style="color: #339933;">*</span>res, <span style="color: #339933;">*</span>res0;
<span style="color: #993333;">int</span> error, sockfd;
memset<span style="color: #009900;">&#40;</span><span style="color: #339933;">&amp;</span>hints, <span style="color: #cc66cc;">0</span>, <span style="color: #993333;">sizeof</span><span style="color: #009900;">&#40;</span>hints<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>;
hints.<span style="color: #202020;">ai_family</span> <span style="color: #339933;">=</span> AF_INET;
hints.<span style="color: #202020;">ai_socktype</span> <span style="color: #339933;">=</span> SOCK_STREAM;
error <span style="color: #339933;">=</span> getaddrinfo<span style="color: #009900;">&#40;</span>host, port, <span style="color: #339933;">&amp;</span>hints, <span style="color: #339933;">&amp;</span>res0<span style="color: #009900;">&#41;</span>;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>error<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #808080; font-style: italic;">/* handle error */</span>
<span style="color: #009900;">&#125;</span>
res <span style="color: #339933;">=</span> res0;
sockfd <span style="color: #339933;">=</span> socket<span style="color: #009900;">&#40;</span>res<span style="color: #339933;">-</span>&gt;ai_family, res<span style="color: #339933;">-</span>&gt;ai_socktype, res<span style="color: #339933;">-</span>&gt;ai_protocol<span style="color: #009900;">&#41;</span>;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>sockfd <span style="color: #339933;">==</span> <span style="color: #cc66cc;">-1</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
 <span style="color: #808080; font-style: italic;">/* handle error */</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>connect<span style="color: #009900;">&#40;</span>sockfd, res<span style="color: #339933;">-</span>&gt;ai_addr, res<span style="color: #339933;">-</span>&gt;ai_addrlen<span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #cc66cc;">-1</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
 <span style="color: #808080; font-style: italic;">/* handle error */</span>
<span style="color: #009900;">&#125;</span>
freeaddrinfo<span style="color: #009900;">&#40;</span>res0<span style="color: #009900;">&#41;</span>;
<span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span>sockfd<span style="color: #009900;">&#41;</span>;</pre></td></tr></table></div>

</p>
<p>
Now compare this to the Python equivalent:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">socket</span>
s = <span style="color: #dc143c;">socket</span>.<span style="color: #dc143c;">socket</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">socket</span>.<span style="color: black;">AF_INET</span>, <span style="color: #dc143c;">socket</span>.<span style="color: black;">SOCK_STREAM</span><span style="color: black;">&#41;</span>
s.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>HOST, PORT<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

</p>
<p>
When it comes to HTTP, or other protocols, the difference is even greater.  Of course, much of this can be attributed to string and memory handling.  To be fair, implementing a basic HTTP/1.0 client in C is not that hard - I did it in under 500 lines of code in Unworkable.  However, Python&#8217;s standard library - whether via <a href="http://docs.python.org/lib/module-urllib.html">urllib</a>, <a href="http://docs.python.org/lib/module-urllib2.html">urllib2</a> or <a href="http://www.python.org/doc/lib/module-httplib.html">httplib</a> directly - just makes it at least an order of magnitude less of a headache compared to C.
</p>
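<p>As a rough illustration of how little code an HTTP fetch needs, here is a self-contained sketch.  It is written for Python 3, where urllib2 was folded into <em>urllib.request</em>, and it talks to a throwaway server on localhost so that it does not depend on any external site.  The handler class, function name and response body are all made up for the example.</p>

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello from a local test server\n"


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with a fixed plain-text body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def fetch_from_local_server():
    """Start a local server on a free port, fetch '/' and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_from_local_server())
```

<p>The client side is the three lines inside the <em>try</em> block; everything else is scaffolding for the local server.</p>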
</li>
<li>
<p>In the realm of threading, it seems pretty clear to me that the <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=pthreads&#038;apropos=0&#038;sektion=0&#038;manpath=OpenBSD+Current&#038;arch=i386&#038;format=html">POSIX threads</a> (pthreads) interface has won.  The API is available on all POSIX-compliant operating systems.  I don&#8217;t have a huge amount of experience with using it through C - a few years ago I did some very simple stuff with it.  While not impossible, it is complicated and tricky to deal with.  On the other hand, Python offers its own <a href="http://docs.python.org/lib/module-threading.html">threading module</a>, loosely based on Java&#8217;s API.  I find it very easy to use threads in Python - perhaps the most striking feature being that the Python threading module supports both an object-oriented paradigm - where you extend the Thread class with your own - and also a functional approach, where you simply pass a function as the thread&#8217;s target.  The functional approach makes great sense to me - I very much like the idea.  Creating a thread like this is as simple as:
</p>
<p>
<div class="wp_syntax"><table><tr><td class="code"><pre class="python"><span style="color: #808080; font-style: italic;"># Simple Python threads example, using functional paradigm</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">threading</span>
<span style="color: #ff7700;font-weight:bold;">def</span> worker<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
        <span style="color: #808080; font-style: italic;"># do work then break</span>
        <span style="color: #ff7700;font-weight:bold;">break</span>
&nbsp;
t = <span style="color: #dc143c;">threading</span>.<span style="color: black;">Thread</span><span style="color: black;">&#40;</span>target=worker<span style="color: black;">&#41;</span>
t.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

</p>
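<p>For comparison, the object-oriented paradigm mentioned above might look something like the following sketch.  The <em>Worker</em> class and <em>run_workers</em> helper are hypothetical names chosen for the example; the only library pieces relied on are <em>threading.Thread</em> and its <em>start</em>, <em>run</em> and <em>join</em> methods.</p>

```python
import threading


class Worker(threading.Thread):
    """Thread subclass that sums a list of numbers in run()."""

    def __init__(self, numbers):
        threading.Thread.__init__(self)
        self.numbers = numbers
        self.total = None  # filled in by run()

    def run(self):
        # run() is what executes in the new thread after start() is called.
        self.total = sum(self.numbers)


def run_workers(chunks):
    """Start one Worker per chunk, wait for all of them, return the totals."""
    workers = [Worker(chunk) for chunk in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [w.total for w in workers]


if __name__ == "__main__":
    print(run_workers([[1, 2, 3], [10, 20]]))  # [6, 30]
```

<p>Subclassing is handy when a thread needs to carry its own state, as with <em>total</em> here; for one-off jobs the functional form is shorter.</p>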
</li>
<li>
<p>
File I/O is an area where straight C really isn&#8217;t too bad.  You have your POSIX interface, via <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=open&#038;sektion=2">open(2)</a>, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=read">read(2)</a>, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=write&#038;sektion=2">write(2)</a>, etc - and you have your ANSI buffered I/O functions with <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=fopen&#038;sektion=3">fopen(3)</a>, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=fread&#038;sektion=3">fread(3)</a>, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=fwrite&#038;sektion=3">fwrite(3)</a>, etc.  Many of the shell commands for file system manipulation map very closely to libc calls.  For example, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=mkdir&#038;sektion=2">mkdir(2)</a>, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=dirname&#038;sektion=3">dirname(3)</a>, <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=stat&#038;sektion=2">stat(2)</a> and so on.  Python - once again mostly thanks to being able to handle the memory management for you - helps a lot in the situation where you are reading from a file whose size is unknown (for example, a pipe, or a network socket).
</p>
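<p>A minimal sketch of the unknown-size case, assuming a plain OS pipe as the data source: the reader never declares a buffer size, it just reads until end-of-file and lets Python grow the result.  The function name is made up for the example.</p>

```python
import os
import threading


def read_all_from_pipe(chunks):
    """Write `chunks` (bytes objects) into a pipe and read them back until EOF."""
    read_fd, write_fd = os.pipe()

    def writer():
        # Closing the write end (via the with block) is what signals EOF.
        with os.fdopen(write_fd, "wb") as w:
            for chunk in chunks:
                w.write(chunk)

    t = threading.Thread(target=writer)
    t.start()
    with os.fdopen(read_fd, "rb") as r:
        data = r.read()  # no size given: read until EOF, buffer grows as needed
    t.join()
    return data


if __name__ == "__main__":
    print(read_all_from_pipe([b"abc", b"def"]))  # b'abcdef'
```

<p>The equivalent C needs a read loop plus a buffer that is reallocated as it fills, with error handling at each step.</p>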
<p>
I would also mention that Python&#8217;s standard library has a concept of &#8216;file-like objects&#8217; which are essentially opaque data buffers which can be accessed through exactly the same interfaces as actual files.  Common examples are <a href="http://docs.python.org/lib/module-StringIO.html">StringIO</a>, <a href="http://docs.python.org/lib/module-urllib.html">urllib</a> and <a href="http://docs.python.org/lib/module-urllib2.html">urllib2</a>.
</p>
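<p>To illustrate the idea, the sketch below passes an in-memory buffer and a real temporary file to the same function.  It is written for Python 3, where StringIO lives in the <em>io</em> module; the function names are invented for the example.</p>

```python
import io
import tempfile


def count_nonblank_lines(fileobj):
    """Count non-blank lines in anything that can be iterated like a text file."""
    return sum(1 for line in fileobj if line.strip())


def demo():
    """Run the same function over an in-memory buffer and an on-disk file."""
    text = "alpha\n\nbeta\n"
    in_memory = count_nonblank_lines(io.StringIO(text))
    with tempfile.TemporaryFile("w+") as f:
        f.write(text)
        f.seek(0)
        on_disk = count_nonblank_lines(f)
    return in_memory, on_disk


if __name__ == "__main__":
    print(demo())  # (2, 2)
```

<p>Because the function only iterates over its argument, it never needs to know which kind of object it was given.</p>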
</li>
<li>
<p>
When it comes to working with databases, Python has the usual advantage of making it easy to deal with dynamic result sets.  Additionally, abstractions like <a href="http://www.python.org/dev/peps/pep-0249/">DB API 2</a> and some of the advanced language features such as <a href="http://www.python.org/dev/peps/pep-0202/">list comprehensions</a> and <a href="http://www.python.org/dev/peps/pep-0342/">generators</a>, can greatly reduce the amount of code required for filtering and processing data from databases.  Furthermore, I have found that <a href="http://www.initd.org/pub/software/psycopg/">psycopg2</a> (the website of which is unfortunately in bad shape) works extremely well in a threaded environment.</p>
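<p>Here is a small sketch of that combination.  To keep it self-contained it uses the standard library&#8217;s <em>sqlite3</em> module, which also follows DB API 2, rather than PostgreSQL and psycopg2; the <em>torrents</em> table, its columns and the function names are all hypothetical.  Note that placeholder syntax differs between drivers: sqlite3 uses <em>?</em> while psycopg2 uses <em>%s</em>.</p>

```python
import sqlite3


def iter_rows(cursor, batch_size=100):
    """Generator yielding rows one at a time, fetched in batches via fetchmany()."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row


def well_seeded(conn, min_seeders):
    """Names of torrents with at least `min_seeders` seeders, filtered in Python."""
    cur = conn.cursor()
    cur.execute("SELECT name, seeders FROM torrents ORDER BY name")
    return [name for name, seeders in iter_rows(cur) if seeders >= min_seeders]


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical torrents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE torrents (name TEXT, seeders INTEGER)")
    conn.executemany(
        "INSERT INTO torrents VALUES (?, ?)",
        [("a.iso", 120), ("b.iso", 3), ("c.iso", 45)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(well_seeded(make_demo_db(), 40))  # ['a.iso', 'c.iso']
```

<p>In practice a filter this simple belongs in the SQL WHERE clause; the generator plus list comprehension pays off when the per-row logic is awkward to express in SQL.</p>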
</li>
</ul>
<p>
In conclusion, Python allows you to write complicated, useful applications, with fewer bugs, much faster than in C.  It removes many (but not all) headaches associated with memory management and data structures.  Many of the portability issues are taken care of for you.  Essentially with Python you stand on the shoulders of giants.  While C is still extremely useful and important, Python makes excellent sense for many classes of program.</p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/07/17/why-python-is-better-than-c/#comments" thr:count="3" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/07/17/why-python-is-better-than-c/feed/atom/" thr:count="3" />
		<thr:total>3</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[P2P Research at Google wrap up and slides]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/07/16/p2p-research-at-google-wrap-up-and-slides/" />
		<id>http://blog.p2presearch.com/?p=19</id>
		<updated>2008-07-17T03:50:07Z</updated>
		<published>2008-07-17T03:50:07Z</published>
		<category scheme="http://blog.p2presearch.com" term="BitTorrent Protocol" /><category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="Piracy Research" /><category scheme="http://blog.p2presearch.com" term="baypiggies" /><category scheme="http://blog.p2presearch.com" term="google" /><category scheme="http://blog.p2presearch.com" term="presentation" />		<summary type="html"><![CDATA[I gave a talk at Google/bayPIGgies last week.  I was very pleasantly surprised by the turnout - and most of all by the excellent questions asked by the audience.  The interest from people at the talk crossed many domains - people were generally curious about many aspects, from security to technical scalability concerns [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/07/16/p2p-research-at-google-wrap-up-and-slides/"><![CDATA[<p>I gave a talk at Google/bayPIGgies last week.  I was very pleasantly surprised by the turnout - and most of all by the excellent questions asked by the audience.  The interest from people at the talk crossed many domains - people were generally curious about many aspects, from security to technical scalability concerns to legal issues.</p>
<p>Four employees from <a href="http://www.bittorrent.com/">BitTorrent, Inc</a> were present.  It was wonderful to have the chance to talk to some of the pioneers in this technology.  They seemed very interested in the work we are doing and invited us to call in to their offices in downtown SF some time.</p>
<p>Overall, it was wonderful to have so much interest in our research and to be able to spread some information about BitTorrent and p2p to a wider audience.  I believe the talk was recorded on video and should be made available by Google on YouTube or Google Video; however, I don&#8217;t have the URL just yet.  I did the slides in S5 though, and you can view <a href="http://p2presearch.com/baypiggies20080710/slides.html">the slides for the p2p research talk at Google here</a> in a regular web browser.</p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/07/16/p2p-research-at-google-wrap-up-and-slides/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/07/16/p2p-research-at-google-wrap-up-and-slides/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[P2P Research Talk @ Google in Mountain View tonight]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/07/10/p2p-research-talk-google-in-mountain-view-tonight/" />
		<id>http://blog.p2presearch.com/?p=18</id>
		<updated>2008-07-10T16:53:10Z</updated>
		<published>2008-07-10T16:53:10Z</published>
		<category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="Piracy Research" />		<summary type="html"><![CDATA[Just a quick note, I will be speaking on the subject of our research at the Google amphitheatre this evening.  Details on the talk, along with directions etc, can be found at the BayPiggies site. 
I will post my slides online and I believe there will be a good-quality recording of the talk made [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/07/10/p2p-research-talk-google-in-mountain-view-tonight/"><![CDATA[<p>Just a quick note, I will be speaking on the subject of our research at the Google amphitheatre this evening.  Details on the talk, along with directions etc, can be found at <a href="http://baypiggies.net">the BayPiggies site</a>. </p>
<p>I will post my slides online and I believe there will be a good-quality recording of the talk made available on YouTube or Google Video.</p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/07/10/p2p-research-talk-google-in-mountain-view-tonight/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/07/10/p2p-research-talk-google-in-mountain-view-tonight/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>Niall O'Higgins</name>
						<uri>http://niallohiggins.com</uri>
					</author>
		<title type="html"><![CDATA[Have your software email you when it is in trouble with Python logging]]></title>
		<link rel="alternate" type="text/html" href="http://blog.p2presearch.com/2008/06/29/have-your-software-email-you-when-it-is-in-trouble-with-python-logging/" />
		<id>http://blog.p2presearch.com/?p=16</id>
		<updated>2008-07-01T13:02:45Z</updated>
		<published>2008-06-29T16:08:03Z</published>
		<category scheme="http://blog.p2presearch.com" term="P2P Software" /><category scheme="http://blog.p2presearch.com" term="bittorrent crawling" /><category scheme="http://blog.p2presearch.com" term="email notification" /><category scheme="http://blog.p2presearch.com" term="p2p agents" /><category scheme="http://blog.p2presearch.com" term="python logging" />		<summary type="html"><![CDATA[Here at P2P Research, we have numerous long-running software agents written mostly in Python, which crawl BitTorrent and perform various types of analysis.  While these agents are relatively robust, every few months they might hiccup, sometimes because of a condition outside their control, such as a full disk.
In any case, we want these [...]]]></summary>
		<content type="html" xml:base="http://blog.p2presearch.com/2008/06/29/have-your-software-email-you-when-it-is-in-trouble-with-python-logging/"><![CDATA[<p>Here at P2P Research, we have numerous long-running software agents written mostly in <a href="http://www.python.org">Python</a>, which crawl BitTorrent and perform various types of analysis.  While these agents are relatively robust, every few months they might hiccup, sometimes because of a condition outside their control, such as a full disk.</p>
<p>In any case, we want these agents to be running as much as possible.  If any of the agents exits for some reason, we want to know about it.  One of the most important pieces in any production software system is robust logging.  Long-running daemon processes should at a minimum log to an on-disk file in a clean, consistent format with timestamps.  Beyond basic logging, the next requirement is a log rotation scheme, which prevents the disk from filling up with logs, and perhaps log levels, which make it easy to tell warnings, critical events and debug output apart.</p>
<p>Fortunately, <a href="http://docs.python.org/lib/module-logging.html">Python&#8217;s logging module</a> makes this trivial to implement.  Where you might usually use a print statement in debugging, you just use one of the logging functions.  Instead of</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="python"><span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #66cc66;">&gt;&gt;</span> stderr, <span style="color: #483d8b;">&quot;message&quot;</span></pre></td></tr></table></div>

<p>you probably want</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="python"><span style="color: #dc143c;">logging</span>.<span style="color: black;">warning</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;message&quot;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>or</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="python"><span style="color: #dc143c;">logging</span>.<span style="color: black;">critical</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;message&quot;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>
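<p>To make the role of log levels concrete, here is a minimal sketch (not taken from our agents) in which a logger set to WARNING drops lower-severity messages.  The logger name <code>demo_agent</code>, the message texts and the in-memory buffer are assumptions chosen purely for illustration.</p>

```python
import io
import logging

# Hypothetical logger name; any name works.
log = logging.getLogger("demo_agent")
log.setLevel(logging.WARNING)   # drop anything below WARNING
log.propagate = False           # keep the demo output away from the root logger

# Write to an in-memory buffer instead of a file so the sketch is self-contained.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log.addHandler(handler)

log.debug("tracker list refreshed")       # below the threshold, discarded
log.warning("tracker timed out")          # recorded
log.critical("disk full, shutting down")  # recorded

logged_lines = buffer.getvalue().splitlines()
print(logged_lines)
```

<p>Only the warning and the critical message end up in the buffer; the debug call is filtered out by the logger level before it reaches any handler.</p>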

<p>Setting up log rotation is very easy, since Python has the <a href="http://docs.python.org/lib/node412.html">RotatingFileHandler</a> already implemented in its standard library.  To set it up for use in your program, you can add something like the following snippet:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="code"><pre class="python"><span style="color: #808080; font-style: italic;"># Rotated logging setup snippet</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">logging</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">logging</span>.<span style="color: black;">handlers</span> <span style="color: #ff7700;font-weight:bold;">import</span> RotatingFileHandler
&nbsp;
LOG_FILE=<span style="color: #483d8b;">'/path/to/logfile'</span>
LOG_FORMAT=<span style="color: #483d8b;">'%(asctime)s %(levelname)s %(message)s'</span>
MAX_BYTES=<span style="color: #ff4500;">1024</span> <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">100</span> <span style="color: #808080; font-style: italic;"># 100KB</span>
BACKUP_COUNT=<span style="color: #ff4500;">10</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># ...</span>
&nbsp;
rotating_handler = RotatingFileHandler<span style="color: black;">&#40;</span>LOG_FILE, <span style="color: #483d8b;">'a'</span>,MAX_BYTES, BACKUP_COUNT<span style="color: black;">&#41;</span>
rotating_handler.<span style="color: black;">setFormatter</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">logging</span>.<span style="color: black;">Formatter</span><span style="color: black;">&#40;</span>LOG_FORMAT<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">logging</span>.<span style="color: black;">getLogger</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">addHandler</span><span style="color: black;">&#40;</span>rotating_handler<span style="color: black;">&#41;</span>
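<span style="color: #dc143c;">logging</span>.<span style="color: black;">getLogger</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">setLevel</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">logging</span>.<span style="color: black;">INFO</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># root logger defaults to WARNING, which would drop the info message below</span>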
<span style="color: #dc143c;">logging</span>.<span style="color: black;">info</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Logging setup complete&quot;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>
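<p>As a rough illustration of what the rotation settings do in practice, the following sketch uses deliberately tiny limits so that rollover happens after a handful of records.  The temporary directory, the logger name <code>rotation_demo</code> and the values <code>maxBytes=200</code> and <code>backupCount=3</code> are assumptions for the demo, not the values our agents use.</p>

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Throwaway directory and tiny limits so rotation happens quickly (assumed values).
log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, "agent.log")

demo_logger = logging.getLogger("rotation_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

handler = RotatingFileHandler(log_file, mode="a", maxBytes=200, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
demo_logger.addHandler(handler)

# Each record is roughly 50 bytes, so 50 records force many rollovers.
for i in range(50):
    demo_logger.info("crawl iteration %d", i)

handler.close()

# The live log plus at most backupCount rotated files remain on disk.
log_files = sorted(os.listdir(log_dir))
print(log_files)
```

<p>However many times the log rolls over, the directory only ever holds the current file and the three most recent backups, which is what keeps the disk from filling up.</p>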

<p>Logging to disk is a great start.  But what if you want to notify a human if something goes horribly wrong?  We can define &#8220;horribly wrong&#8221; as any call to logging.critical(), and set up an <a href="http://docs.python.org/lib/node418.html">SMTPHandler</a> to send email whenever this happens:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="code"><pre class="python"><span style="color: #808080; font-style: italic;"># Email logging setup snippet</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">logging</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">logging</span>.<span style="color: black;">handlers</span> <span style="color: #ff7700;font-weight:bold;">import</span> SMTPHandler
LOG_FORMAT=<span style="color: #483d8b;">'%(asctime)s %(levelname)s %(message)s'</span>
<span style="color: #808080; font-style: italic;"># who do we send the log email to</span>
TO_LIST=<span style="color: black;">&#91;</span><span style="color: #483d8b;">'admin@example.com'</span>, <span style="color: #483d8b;">'me@mydomain.com'</span><span style="color: black;">&#93;</span>
<span style="color: #808080; font-style: italic;"># who is it from</span>
FROM=<span style="color: #483d8b;">'agent@example.com'</span>
<span style="color: #808080; font-style: italic;"># subject template</span>
SUBJECT=<span style="color: #483d8b;">'[my agent] Critical event'</span>
<span style="color: #808080; font-style: italic;"># SMTP server</span>
SMTP_SERVER=<span style="color: #483d8b;">'localhost'</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># ...</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># smtp handler, only logs on CRITICAL level messages</span>
smtp_handler = SMTPHandler<span style="color: black;">&#40;</span>SMTP_SERVER, FROM, TO_LIST, SUBJECT<span style="color: black;">&#41;</span>
smtp_handler.<span style="color: black;">setLevel</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">logging</span>.<span style="color: black;">CRITICAL</span><span style="color: black;">&#41;</span>
smtp_handler.<span style="color: black;">setFormatter</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">logging</span>.<span style="color: black;">Formatter</span><span style="color: black;">&#40;</span>LOG_FORMAT<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">logging</span>.<span style="color: black;">getLogger</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">addHandler</span><span style="color: black;">&#40;</span>smtp_handler<span style="color: black;">&#41;</span></pre></td></tr></table></div>
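<p>The important detail in the snippet above is that the level is set on the handler, not on the logger, so other handlers still see every record.  The sketch below shows that behaviour without needing a mail server: <code>ListHandler</code> is a hypothetical stand-in that collects messages in a list instead of sending email, and the logger name <code>email_demo</code> is likewise an assumption.</p>

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical stand-in for SMTPHandler: stores formatted records in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))

agent_log = logging.getLogger("email_demo")  # hypothetical logger name
agent_log.setLevel(logging.DEBUG)
agent_log.propagate = False

everything = ListHandler()          # plays the role of the on-disk log
alerts = ListHandler()              # plays the role of the SMTP handler
alerts.setLevel(logging.CRITICAL)   # same threshold as smtp_handler above
for h in (everything, alerts):
    h.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    agent_log.addHandler(h)

agent_log.warning("peer list is empty")
agent_log.error("tracker returned HTTP 500")
agent_log.critical("database connection lost")

print(everything.messages)
print(alerts.messages)
```

<p>All three records reach the first handler, while only the critical one reaches the second, so a human is emailed only for the events that matter.</p>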

<p>Now we are configured to send notices via email on any critical event.  However, what about uncaught exceptions that never reach a logging.critical() call, where the interpreter just exits?  We install our own uncaught exception handler, which calls logging.critical(), of course!</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre class="python"><span style="color: #808080; font-style: italic;"># Snippet to set up a custom uncaught exception handler</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">logging</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> except_hook<span style="color: black;">&#40;</span>exc_type, value, tb<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''' handler for exceptions which would cause the interpreter to exit '''</span>
    <span style="color: #dc143c;">logging</span>.<span style="color: black;">critical</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Uncaught exception of type '%s' with value '%s'&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>exc_type, value<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">os</span>._exit<span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># exit immediately with a non-zero status</span>
&nbsp;
<span style="color: #dc143c;">sys</span>.<span style="color: black;">excepthook</span> = except_hook</pre></td></tr></table></div>
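<p>One possible variation, sketched here under a few assumptions, also records the traceback by passing <code>exc_info</code> to the logging call, and leaves Ctrl-C alone.  The function name <code>logging_except_hook</code>, the logger name <code>hook_demo</code> and the in-memory buffer are illustrative only; the hook is invoked by hand so the sketch can run to completion.</p>

```python
import io
import logging
import sys

def logging_except_hook(exc_type, value, tb):
    """Log uncaught exceptions at CRITICAL, traceback included."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl-C behave as usual instead of raising an alert.
        sys.__excepthook__(exc_type, value, tb)
        return
    logging.getLogger("hook_demo").critical(
        "Uncaught exception", exc_info=(exc_type, value, tb))

# Capture output in memory so the sketch can be checked without a mail server.
buffer = io.StringIO()
hook_log = logging.getLogger("hook_demo")  # hypothetical logger name
hook_log.propagate = False
hook_log.addHandler(logging.StreamHandler(buffer))

sys.excepthook = logging_except_hook

# Simulate what the interpreter does when an exception reaches the top level.
try:
    raise ValueError("disk is full")
except ValueError:
    sys.excepthook(*sys.exc_info())

hook_output = buffer.getvalue()
print(hook_output)

# Restore the default hook so the demo leaves no side effects behind.
sys.excepthook = sys.__excepthook__
```

<p>Unlike the snippet above, this variant does not call os._exit(); once the hook returns, the interpreter carries on with its normal shutdown for an uncaught exception.  Which behaviour is preferable depends on whether your agent has cleanup work that must still run.</p>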

<p>And there you have it!</p>
]]></content>
		<link rel="replies" type="text/html" href="http://blog.p2presearch.com/2008/06/29/have-your-software-email-you-when-it-is-in-trouble-with-python-logging/#comments" thr:count="3" />
		<link rel="replies" type="application/atom+xml" href="http://blog.p2presearch.com/2008/06/29/have-your-software-email-you-when-it-is-in-trouble-with-python-logging/feed/atom/" thr:count="3" />
		<thr:total>3</thr:total>
	</entry>
	</feed>
