BitTorrent FAQ and Guide
What is BitTorrent?
BitTorrent is a protocol designed for transferring files. It is peer-to-peer in nature, as users connect to each other directly to send and receive portions of the file. However, there is a central server (called a tracker) which coordinates the action of all such peers. The tracker only manages connections; it does not have any knowledge of the contents of the files being distributed, and therefore a large number of users can be supported with relatively limited tracker bandwidth. The key philosophy of BitTorrent is that users should upload (transmit outbound) at the same time they are downloading (receiving inbound). In this manner, network bandwidth is utilized as efficiently as possible. BitTorrent is designed to work better as the number of people interested in a certain file increases, in contrast to other file transfer protocols.
One analogy to describe this process might be to visualize a group of people sitting at a table. Each person at the table can both talk and listen to any other person at the table. These people are each trying to get a complete copy of a book. Person A announces that he has pages 1-10, 23, 42-50, and 75. Persons C, D, and E are each missing some of those pages that A has, and so they coordinate such that A gives them each copies of the pages he has that they are missing. Person B then announces that she has pages 11-22, 31-37, and 63-70. Persons A, D, and E tell B they would like some of her pages, so she gives them copies of the pages that she has. The process continues around the table until everyone has announced what they have (and hence what they are missing.) The people at the table coordinate to swap parts of this book until everyone has everything. There is also another person at the table, who we’ll call ‘S’. This person has a complete copy of the book, and so doesn’t need anything sent to him. He responds with pages that no one else in the group has. At first, when everyone has just arrived, they all must talk to him to get their first set of pages. However, the people are smart enough to not all get the same pages from him. After a short while they all have most of the book amongst themselves, even if no one person has the whole thing. In this manner, this one person can share a book that he has with many other people, without having to give a full copy to everyone that’s interested. He can instead give out different parts to different people, and they will be able to share it amongst themselves. This person who we’ve referred to as ‘S’ is called a seed in the terminology of BitTorrent. There’s more about the various terms in a later section.
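The table analogy can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not how a real client schedules transfers: the function name `simulate_table`, the one-page-per-round pacing, and the rule for choosing who receives a page from the seed are all made up for the example.

```python
def simulate_table(num_pages, num_peers):
    """Toy model of the 'people at a table' analogy: one seed, several peers."""
    all_pages = set(range(1, num_pages + 1))
    peers = [set() for _ in range(num_peers)]  # everyone but 'S' starts empty
    seed_uploads = 0
    rounds = 0
    while any(p != all_pages for p in peers):
        rounds += 1
        # The seed only hands out a page that nobody else at the table has.
        held = set().union(*peers)
        missing_everywhere = sorted(all_pages - held)
        if missing_everywhere:
            # Assumption: give it to whoever currently holds the fewest pages.
            target = min(range(num_peers), key=lambda i: len(peers[i]))
            peers[target].add(missing_everywhere[0])
            seed_uploads += 1
        # Each person then copies one missing page from someone else.
        snapshot = [set(p) for p in peers]
        for i, mine in enumerate(peers):
            for j, theirs in enumerate(snapshot):
                if i == j:
                    continue
                wanted = sorted(theirs - mine)
                if wanted:
                    mine.add(wanted[0])
                    break
    return rounds, seed_uploads


rounds, seed_uploads = simulate_table(num_pages=10, num_peers=4)
print(rounds, seed_uploads)
```

In this model the seed sends each page exactly once, so for a 10-page book it uploads 10 pages in total, whereas handing a full copy to each of the 4 people would take 40.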
How does BitTorrent compare to other forms of file transfer?
The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it; this is how both HTTP and FTP work. The clients only speak to the server, and never to each other. The main advantages of this method are that it’s simple to set up, and the files are usually available since the servers tend to be dedicated to the task of serving, and are always on and connected to the Internet. However, this model has a significant problem with files that are large or very popular, or both. Namely, it takes a great deal of bandwidth and server resources to distribute such a file, since the server must transmit the entire file to each client. Perhaps you have tried to download a demo of a new game just released, or CD images of a new Linux distribution, and found that all the servers report “too many users,” or there is a long queue that you have to wait through. The concept of mirrors partially addresses this shortcoming by distributing the load across multiple servers. But it requires a lot of coordination and effort to set up an efficient network of mirrors, and it’s usually only feasible for the busiest of sites.
Another method of transferring files has become popular recently: the peer-to-peer network, as seen in systems such as Kazaa, eDonkey, Gnutella, Direct Connect, etc. In most of these networks, ordinary Internet users trade files by directly connecting one-to-one. The advantage here is that files can be shared without having access to a proper server, and because of this there is little accountability for the contents of the files. Hence, these networks tend to be very popular for illicit files such as music, movies, pirated software, etc. Typically, a downloader receives a file from a single source; however, the newest versions of some clients allow downloading a single file from multiple sources for higher speeds. The problem discussed above of popular downloads is somewhat mitigated, because there’s a greater chance that a popular file will be offered by a number of peers. The breadth of files available tends to be fairly good, though download speeds for obscure files tend to be low. Another common problem associated with these systems is the significant protocol overhead for passing search queries amongst the peers, and the number of peers that one can reach is often limited as a result. Partially downloaded files are usually not available to other peers, although some newer clients may offer this functionality. Availability is generally dependent on the goodwill of the users, to the extent that some of these networks have tried to enforce rules or restrictions regarding send/receive ratios.
Use of the Usenet binary newsgroups is yet another method of file distribution, one that is substantially different from the other methods. Files transferred over Usenet are often subject to minuscule windows of opportunity. Typical retention times of binary news servers are often as low as 24 hours, and having a posted file available for a week is considered a long time. However, the Usenet model is relatively efficient, in that the messages are passed around a large web of peers from one news server to another, and finally fanned out to the end user from there. Often the end user connects to a server provided by his or her ISP, resulting in further bandwidth savings. Usenet is also one of the more anonymous forms of file sharing, and it too is often used for illicit files of almost any nature. Due to the nature of NNTP, a file’s popularity has little to do with its availability and hence downloads from Usenet tend to be quite fast regardless of content. The downsides of this method include a baroque set of rules and procedures, which require a certain amount of effort and understanding from the user. Patience is often required to get a complete file, because big files are split into a huge number of smaller posts. Finally, access to Usenet often must be purchased due to the extremely high volume of messages in the binary groups.
BitTorrent is closest to Usenet, in my opinion. It is best suited to newer files in which a number of people are interested. Obscure or older files tend to not be available. Perhaps as the software matures a more suitable means of keeping torrents seeded will emerge, but currently the client is quite resource-intensive, making it cumbersome to share a number of files. BitTorrent also deals well with files that are in high demand, especially compared to the other methods.
What do all these words mean? (seeding, uploading, share rating, etc.)
Here is a brief list of words associated with BitTorrent and their meanings.
torrent
Usually this refers to the small metadata file you receive from the web server (the one that ends in .torrent.) Metadata here means that the file contains information about the data you want to download, not the data itself. This is what is sent to your computer when you click on a download link on a website. You can also save the torrent file to your local system, and then click on it to open the BitTorrent download. This is useful if you want to be able to re-open the torrent later on without having to find the link again.
In some uses, it can also refer to everything associated with a certain file available with BitTorrent. For example, someone might say “I downloaded that torrent” or “that server has a lot of good torrents”, meaning there are lots of good files available via BitTorrent on that server.
peer
A peer is another computer on the Internet that you connect to and exchange data with. Generally a peer does not have the complete file, otherwise it would be called a seed. Some people also refer to peers as leeches, to distinguish them from those generous folks who have completed their download and continue to leave the client running and act as a seed.
seed
A computer that has a complete copy of a certain torrent. Once your client finishes downloading, it will remain open until you click the Finish button (or otherwise close it.) This is known as being a seed or seeding. You can also start a BT client with a complete file, and once BT has checked the file it will connect and seed the file to others. Generally, it’s considered good manners to continue seeding a file after you have finished downloading, to help out others. Also, when a new torrent is posted to a tracker, someone must seed it in order for it to be available to others. Remember, the tracker doesn’t know anything of the actual contents of a file, so it’s important to follow through and seed a file if you upload the torrent to a tracker.
reseed
When there are zero seeds for a given torrent (and not enough peers to have a distributed copy), then eventually all the peers will get stuck with an incomplete file, since no one in the swarm has the missing pieces. When this happens, someone with a complete file (a seed) must connect to the swarm so that those missing pieces can be transferred. This is called reseeding. Usually a request for a reseed comes with an implicit promise that the requester will leave his or her client open for some time period after finishing (to add longevity to the torrent) in return for the kind soul reseeding the file.
swarm
The group of machines that are collectively connected for a particular file. For example, if you start a BitTorrent client and it tells you that you’re connected to 10 peers and 3 seeds, then the swarm consists of you and those 13 other people.
tracker
A server on the Internet that acts to coordinate the action of BitTorrent clients. When you open a torrent, your machine contacts the tracker and asks for a list of peers to contact. Periodically throughout the transfer, your machine will check in with the tracker, telling it how much you’ve downloaded and uploaded, how much you have left before finishing, and the state you’re in (starting, finished download, stopping.) If a tracker is down and you try to open a torrent, you will be unable to connect. If a tracker goes down during a torrent (i.e., you have already connected at some point and are already talking to peers), you will be able to continue transferring with those peers, but no new peers will be able to contact you. Often tracker errors are temporary, so the best thing to do is just wait and leave the client open to continue trying.
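The periodic check-in described above can be sketched as building a request URL. The parameter names used here (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`, `event`) and the event values follow the commonly documented HTTP tracker convention, which this guide does not spell out, so treat them as assumptions; the tracker address, peer id and helper name `build_announce_url` are purely hypothetical.

```python
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None):
    """Build the URL a client would request when checking in with a tracker."""
    params = {
        "info_hash": info_hash,    # 20-byte identifier of the torrent
        "peer_id": peer_id,        # identifier this client chose for itself
        "port": port,              # port where other peers can reach us
        "uploaded": uploaded,      # bytes sent so far
        "downloaded": downloaded,  # bytes received so far
        "left": left,              # bytes still needed to finish
    }
    if event is not None:
        # Assumed values: "started", "completed", "stopped".
        params["event"] = event
    return tracker_url + "?" + urlencode(params)


url = build_announce_url(
    "http://tracker.example/announce",  # hypothetical tracker address
    info_hash=b"\x12" * 20,
    peer_id=b"-XX0001-123456789012",    # made-up peer id
    port=6881,
    uploaded=0,
    downloaded=0,
    left=500,
    event="started",
)
print(url)
```

Note that nothing in the request describes the contents of the file, which fits the point that the tracker only coordinates peers.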
downloading
Receiving data FROM another computer.
uploading
Sending data TO another computer.
share rating
If you are using the experimental client with the stats-patch, you will see a share rating displayed on the GUI panel. This is simply the ratio of your amount uploaded divided by your amount downloaded. The amounts used are for the current session only, not over the history of the file. If you achieve a share ratio of 1.0, that would mean you’ve uploaded as much as you’ve downloaded. The higher the number, the more you have contributed. If you see a share ratio of “oo”, this means infinity, which will happen if you open a BT client with a complete file (i.e., you seed the file.) In this case you download nothing since you have the full file, and so anything you send will cause the ratio to reach infinity. Note: The share rating is just a number that is displayed for your convenience. It does not directly affect any aspect of the client at all. In general, out of courtesy to others you should strive to keep this ratio as high as possible, of course.
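As a small worked example, the share rating is just a division with one special case. The function names are hypothetical, and returning 0.0 when nothing has been transferred in either direction is an assumption, since the text above only covers the case where something has been sent.

```python
import math


def share_rating(uploaded_bytes, downloaded_bytes):
    """Ratio of amount uploaded to amount downloaded for the current session."""
    if downloaded_bytes == 0:
        # Seeding a complete file: nothing downloaded, so any upload is 'infinite'.
        return math.inf if uploaded_bytes > 0 else 0.0  # 0.0 is an assumption
    return uploaded_bytes / downloaded_bytes


def format_rating(rating):
    """Show infinity as 'oo', as the client's GUI does."""
    return "oo" if math.isinf(rating) else f"{rating:.3f}"


print(format_rating(share_rating(700, 700)))  # uploaded as much as downloaded
print(format_rating(share_rating(50, 0)))     # seeding a complete file
```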
distributed copies
In some versions of the client, you will see the text “Connected to n seeds; also seeing n.nnn distributed copies.” A seed is a machine with the complete file. However, the swarm can collectively have a complete copy (or copies) of the file, and that is what this is telling you. Referring again to the “people at a table” analogy, consider the case where the book has 10 pages, and person A has pp.1-5 and B has pp.6-10. Collectively, A and B have a complete copy of the book, even though no one person has the whole thing. In other words, even if there are no seeds, as long as there is at least one distributed copy of the file everyone can eventually get a complete file. Meditate on this, the Zen of BitTorrent, grasshopper.
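One plausible way to arrive at a figure like "n.nnn distributed copies" is sketched below. The exact formula the client uses is not given here, so this is an assumption: the whole part is the number of times the rarest piece appears in the swarm, and the fraction is the share of pieces that appear more often than that. The function name is hypothetical.

```python
def distributed_copies(peer_pieces, num_pieces):
    """Estimate how many complete copies the swarm holds collectively.

    peer_pieces: list of sets, each holding the piece indexes one peer has.
    """
    if num_pieces == 0:
        return 0.0
    # How many peers hold each piece.
    counts = [sum(1 for have in peer_pieces if i in have)
              for i in range(num_pieces)]
    whole = min(counts)                          # the rarest piece limits whole copies
    extra = sum(1 for c in counts if c > whole)  # pieces available beyond that
    return whole + extra / num_pieces


a = {0, 1, 2, 3, 4}   # person A: first half of a 10-page book
b = {5, 6, 7, 8, 9}   # person B: second half
print(distributed_copies([a, b], 10))     # one full copy between them
print(distributed_copies([a, b, a], 10))  # a third person duplicating A's half
```

With only A and B the result is 1.0: no one is a seed, yet a complete copy exists at the table.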
choked
This is a term used in the description of the BitTorrent protocol. It refers to the state of an uploader, i.e. the thread that sends data to another peer. When a connection is choked, it means that the transmitter doesn’t currently want to send anything on that link. A BT client signals that it’s choked to other clients for a number of reasons, but the most common is that by default a client will only maintain --max_uploads active simultaneous uploads; the rest will be marked choked. (The default value is 4 and this is the same setting that the experimental client GUI lets you adjust.) A connection can also be choked for other reasons, for example a peer downloading from a seed will mark his connection as choked since the seed is not interested in receiving anything. Note that since each connection is bidirectional and symmetrical, there are two choked flags for each connection, one for each Tx endpoint.
interested
Another term used in the protocol specification. This is the counterpart to the choked flag, in that interested refers to the state of a downloader with respect to a connection. A downloader is marked as interested if the other end of the link has any pieces that the client wants, otherwise the connection is marked as not interested.
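The two flags can be pictured together with a minimal sketch. The class and function names are hypothetical, and `pick_unchoked` uses a simple first-come rule only for illustration; it is not the selection policy of any real client. The limit of 4 is the default mentioned under choked.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One direction of a connection: an uploader sending to a downloader."""
    choked: bool = True        # uploader's flag: not willing to send right now
    interested: bool = False   # downloader's flag: other end has wanted pieces

    def can_transfer(self):
        # Data flows only if the downloader wants it and the uploader allows it.
        return self.interested and not self.choked


def is_interested(my_pieces, their_pieces):
    """A downloader is interested if the other end has any piece it lacks."""
    return bool(set(their_pieces) - set(my_pieces))


def pick_unchoked(interested_peers, max_uploads=4):
    """Assumed policy: unchoke the first max_uploads interested peers."""
    return set(interested_peers[:max_uploads])


link = Link()
link.interested = is_interested(my_pieces={1, 2}, their_pieces={2, 3})
link.choked = False
print(link.can_transfer())
print(pick_unchoked(["a", "b", "c", "d", "e", "f"]))
```

Since every connection is bidirectional, a full connection would hold two such `Link` objects, one for each sending direction.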
snubbed
If the client has not received anything after a certain period (default: 60 seconds), it marks a connection as snubbed, in that the peer on the other end has chosen not to send in a while. See the definition of choked for reasons why an uploader might mark a connection as choked. The real function of keeping track of this variable is to improve download speeds. Occasionally the client will find itself in a state where even though it is connected to many peers, it is choked by all of them. The client uses the snubbed flag in an attempt to prevent this situation. It notes that a peer with whom it would like to trade pieces has not sent anything in a while, and rather than leaving it up to optimistic unchoking to eventually select that peer, it instead reserves one of its upload slots for sending to that peer.
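The timeout check behind the snubbed flag can be sketched in a few lines. The 60-second default comes from the description above; the class name, method names and the constant `SNUB_TIMEOUT` are hypothetical, and timestamps are passed in explicitly so the example does not depend on the wall clock.

```python
SNUB_TIMEOUT = 60.0  # seconds without data before a peer counts as snubbing us


class PeerLink:
    """Tracks when we last received data from one peer."""

    def __init__(self, now):
        self.last_received = now

    def on_data(self, now):
        # Any incoming data resets the timer.
        self.last_received = now

    def is_snubbed(self, now):
        return now - self.last_received > SNUB_TIMEOUT


link = PeerLink(now=0.0)
print(link.is_snubbed(now=30.0))   # still within the timeout
print(link.is_snubbed(now=61.0))   # nothing received for over 60 seconds
link.on_data(now=61.0)
print(link.is_snubbed(now=100.0))  # timer was reset by the new data
```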
optimistic unchoking
Periodically, the client shakes up the list of uploaders and tries sending on different connections that were previously choked, and choking the connections it was just using. You can observe this action every 10 or 20 seconds or so, by watching the “Advanced” panel of one of the experimental clients.
I just downloaded a file ending in .xyz, how do I open it?
Below is a list of common file types you will encounter with BitTorrent, and how to handle them.
| File type | How to handle it |
| --- | --- |
| .R00, .R01, .Rnn | If you find a directory with a bunch of files ending in .Rnn, it’s a RAR archive split into multiple parts. This is commonly done for posting to Usenet newsgroups. Open the .RAR file and extract the contents with WinRAR (Windows) or UnRarX (OS X.) Either program should automatically see all the parts if they are in the same directory. |
| .CBR, .CBZ | These are comics in a compressed archive. For Windows, download the free program CDisplay. Or simply rename them (CBR to RAR, CBZ to ZIP) and open with your usual archive program, such as WinRAR or WinZIP. For OS X, try Book Image Viewer after extracting with unrar or unzip. |
| .PAR, .P01, .Pnn | These are parity files, used to reconstruct any missing parts of the archive. Ordinarily you will not have to do anything with them — they are extraneous unless a part is missing or bad, in which case the torrent’s creator should have fixed the archive before distributing the torrent. If WinRAR does give you a message about a missing or corrupt part, then get SmartPAR (Windows) and open the .PAR file. The program will then check all the files and recreate any missing or damaged parts. For OS X, UnRarX should also process the PAR file. |
| .NFO | Files that end in .NFO are plain text files that often contain very useful information about the files you have just downloaded. Always read the NFO file if you are having a problem! Unfortunately, the .NFO extension also has another meaning to Windows, so sometimes when you try to open these files you will get an error from MS System Information about a corrupt file. If this is the case you will also probably see the file listed with a type of “MSInfo File” or something similar. You should open the NFO file in Notepad, or any plain-text editor. |
| .SFV | Simple File Verification file – used to verify the integrity of a set of files, this is a text file containing file names and typically CRC32 checksums. For Windows, try a program such as QuickSFV or fsum to verify the integrity. Mac OS X users should try MacSFV. Normally these files should not be necessary with BitTorrent, since the BT protocol has its own error checking method (on top of TCP’s checksumming.) If you find some file that doesn’t match the checksum in its SFV file, blame the torrent’s creator, since he or she should have fixed it before creating and distributing the torrent. |
| .BIN, .CUE, .ISO | These are images of a CD. If the file is a movie, they are most likely VCDs or SVCDs. There are several ways to deal with these. For Windows:
For OS X:
|
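For readers who want to see what an SFV check from the table above actually involves, here is a minimal sketch. It assumes the usual SFV layout of one `filename CRC32` pair per line, with comment lines starting with `;`; the function names are hypothetical, and the dedicated tools listed in the table are the practical choice.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Compute the CRC32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def parse_sfv(text):
    """Return {filename: expected_crc} from the text of an SFV file."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.rsplit(None, 1)          # the checksum is the last token
        if len(parts) == 2:
            entries[parts[0]] = int(parts[1], 16)
    return entries


def verify(sfv_text, directory):
    """Return {filename: True/False} for every entry in the SFV text."""
    results = {}
    for name, expected in parse_sfv(sfv_text).items():
        path = os.path.join(directory, name)
        results[name] = os.path.exists(path) and crc32_of_file(path) == expected
    return results
```

Calling `verify(sfv_text, directory)` reports each listed file as matching or not; a missing file counts as a mismatch.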
Source: Dessent.