I am talking here about the case where you start a download with no existing data, in other words, from scratch. Torrents are divided into units called pieces. The torrent meta data contains the SHA-1 checksums for each piece, so that we can hash each piece once we have downloaded it, for verification purposes. Pieces are downloaded on a block-by-block basis - typically the block size is sixteen kilobytes (16KB).
Another important thing to understand is that BitTorrent peers use a “tit-for-tat” transfer scheduling algorithm. That is, peers will be rewarded by other peers for uploading data to them. It is therefore important that a bootstrapping peer have at least a small number of complete pieces as soon as possible, in order to share them, and hence be rewarded by the “tit-for-tat” algorithm. The bootstrapping peer wants to get at least a few pieces to share as soon as possible.
Additionally, pieces are usually downloaded in rarest-first order. This ensures that rare pieces are replicated sufficiently so that holes do not appear in the torrent as peers leave the swarm. One of the design considerations of BitTorrent - vs other distributed data mechanisms - is to have as reliable replication as possible. It is highly undesirable in bit torrent for even a single piece to disappear, as this could render the final file(s) completely unusable. This contrasts heavily with a file distribution system such as Amazon’s Dynamo, where it may be quite acceptable for data to disappear - is it really a huge deal if a user loses the contents of their shopping cart in an outage?
In any case, the desire to ensure replication of the rarest pieces in a torrent is overridden by the need to get a small number of pieces as a peer is bootstrapping. To this end, the initial pieces are downloaded in random order as opposed to rarest-first order. In practice this means the bootstrapping peer should get its initial data quite a bit faster. The number of pieces to download before switching to rarest-first order is suggested to be four.
As a general point, the peer will aim to complete a piece fully before requesting blocks belonging to another piece. Alternately put, if the peer has a few blocks in one piece, it will concentrate on downloading the remaining blocks belonging to that piece before starting on another piece.