Proposal for Yum updates via BitTorrent

From Software libre para los países en desarrollo


We propose to extend the Yum package manager and Yum mirror servers with BitTorrent facilities, through a Yum plugin.

Various applications already use BitTorrent technology, and we see no reason not to leverage that same technology in Yum and other Linux package managers. After all, BitTorrent is already used to distribute CD and DVD images of distributions. Why not updates?

You are encouraged to modify this page as you see fit, to enhance the proposal or to post questions (I suggest you use the discussion page for questions).


Objectives of the proposal

If this proposal is implemented, we expect to achieve the following objectives:

  • faster downloads for end-users
  • built-in data integrity assurances
  • a considerable decrease in bandwidth consumption of mirrors
  • possibly, the future elimination of mirror sites, drastically reducing management overhead for Linux distributors
  • increased legitimacy of BitTorrent (and all of its political consequences) as an Internet protocol

This proposal should be implementable via a Yum plugin. I chose Yum to explain the concept because Python is a sensible language to implement the idea in: Python interpreters are widely deployed, Yum itself is written in Python, BitTorrent libraries exist for it, etcetera.

Yum repository layout

Yum repos are simple. A mirror exposes a directory that holds the RPMs, usually in a subdirectory, next to the repository metadata:

 http://someserver/mirrordir/RPMS/*
 http://someserver/mirrordir/repodata/

repodata/ contains the repository metadata files built by createrepo. RPMS/ contains the RPMs for the distribution/mirror.

How Yum goes about doing its job

Yum:

  1. first contacts the server that carries the Yum repository (we call it the mirror) and downloads the contents of repodata/,
  2. uses repodata/ data to perform dependency resolution, then
  3. uses the manifest in repodata/ to download the appropriate RPMs.
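
As an illustration of the first step, here is a minimal sketch (Python 3) of reading the index file repodata/repomd.xml that createrepo writes. The function name parse_repomd and the SAMPLE document are ours, invented for the example; a real repomd.xml carries more entries and fields than shown here.

```python
import xml.etree.ElementTree as ET

# XML namespace used by createrepo-generated repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def parse_repomd(xml_text):
    """Return {data type: (href, checksum type, checksum)} from repomd.xml text."""
    root = ET.fromstring(xml_text)
    entries = {}
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        checksum = data.find("repo:checksum", REPO_NS)
        entries[data.get("type")] = (
            location.get("href"),
            checksum.get("type"),
            checksum.text.strip(),
        )
    return entries


# Hypothetical, heavily trimmed repomd.xml used only for this example.
SAMPLE = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
    <checksum type="sha256">0123abcd</checksum>
  </data>
</repomd>
"""
```

Calling parse_repomd(SAMPLE) gives one entry, keyed "primary", that tells Yum where the package list lives and which checksum it should match.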

After the download, Yum keeps a good deal of state on disk, including a cache of the metadata and a cache of the downloaded packages.

Torrent extension

We propose that the mirror manager:

  • create a torrent that refers to the contents of the Yum mirror directory, then
  • run a BitTorrent server to seed the very same files in that directory (not needed with HTTP seeds), and
  • run an open tracker (the trackers could conceivably be run by a consortium, but we envision that mirror managers would start the practice, or that trackerless torrents would be used).

Every time the Yum metadata is updated, a new .torrent file is generated and the old one is replaced.
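
To make the regeneration step concrete, here is a rough sketch (Python 3) of building a multi-file .torrent by hand. The bencoding rules and the metainfo keys (announce, info, name, piece length, pieces, files) come from the BitTorrent specification; the function names, the tracker URL in the usage note and the choice to pass file contents in memory are assumptions made for brevity. A real mirror tool would stream files from disk, or simply call an existing torrent-creation utility.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes, str, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_metainfo(name, files, announce, piece_length=262144):
    """Build .torrent bytes; files is a list of (relative path, content bytes)."""
    # Pieces are hashed over the concatenation of all files, in list order.
    stream = b"".join(content for _, content in files)
    pieces = b"".join(
        hashlib.sha1(stream[start:start + piece_length]).digest()
        for start in range(0, len(stream), piece_length)
    )
    info = {
        "name": name,
        "piece length": piece_length,
        "pieces": pieces,
        "files": [
            {"length": len(content), "path": path.split("/")}
            for path, content in files
        ],
    }
    return bencode({"announce": announce, "info": info})
```

For example, build_metainfo("mirrordir", [("repodata/repomd.xml", data)], "http://tracker.example/announce") returns the bytes to write over the old .torrent file; the tracker URL here is a placeholder.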

The .torrent file could be placed in either:

  • the repodata/ directory, using a proposed mirror.torrent file name, or
  • the directory above the mirror directory, named after the mirror directory plus a .torrent extension
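
A plugin could simply probe both locations in order. The small sketch below (Python 3) derives the two candidate URLs from a mirror URL; the function name torrent_candidates is made up for the example, and mirror.torrent is only the file name proposed above.

```python
def torrent_candidates(mirror_url):
    """Return the two proposed .torrent locations for a mirror directory URL."""
    base = mirror_url.rstrip("/")
    return [
        base + "/repodata/mirror.torrent",  # inside repodata/
        base + ".torrent",                  # next to the mirror directory
    ]
```

For http://someserver/mirrordir/ this yields http://someserver/mirrordir/repodata/mirror.torrent and http://someserver/mirrordir.torrent.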

How we can extend Yum to do the downloading job

In this case, a hypothetical extended Yum would:

  1. look for the .torrent file in the mirror
  2. download the .torrent file if it is newer than, or differs from, the cached copy (perhaps detected with an ETag)
  3. contact the tracker to get peer information
  4. use BitTorrent to download repodata/ contents (from this point on, the Yum instance would serve data to other Yum instances)
  5. use BitTorrent to download the selected packages (not all of them, but only the ones the Yum instance needs)
  6. fork() to continue serving the torrent in the background until a fair ratio has been served to the network, or a reasonable time window has elapsed
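
Two of the decisions in the steps above are easy to sketch in isolation (Python 3): which entries of the torrent to fetch, and when the background process should stop seeding. Everything here is an assumption for illustration: the function names, the default ratio of 1.0 and the one-hour window are ours, and a real plugin would hand the selected indices to whatever BitTorrent library it uses.

```python
def select_files(torrent_paths, needed_rpms):
    """Indices of torrent entries to fetch: all of repodata/ plus the needed RPMs."""
    needed = set(needed_rpms)
    return [
        index
        for index, path in enumerate(torrent_paths)
        if path.startswith("repodata/") or path.rsplit("/", 1)[-1] in needed
    ]


def should_keep_seeding(uploaded, downloaded, elapsed_seconds,
                        target_ratio=1.0, max_seconds=3600):
    """True while the fair ratio is unmet and the time window is still open."""
    if elapsed_seconds >= max_seconds:
        return False  # time window exhausted
    if downloaded == 0:
        return False  # nothing was taken from the swarm, so nothing is owed
    return uploaded / downloaded < target_ratio
```

With entries repodata/repomd.xml, RPMS/a.rpm and RPMS/b.rpm and only b.rpm needed, select_files picks the first and third entries; the forked process would poll should_keep_seeding and exit once it returns False.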

Security and resources

The BitTorrent servent would need to run in a reduced-privilege mode. Dropping privileges by changing effective user IDs would be a good strategy.

The last step (background seeding) would not take extra disk space on the machine running the BitTorrent servent, because it serves files that already sit in Yum's cache. The only negative consequence is that seeding consumes upload bandwidth (we can control this with judicious upload speed settings), but that is simply a consequence of BitTorrent's tit-for-tat strategy.
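
A minimal sketch of dropping privileges in the seeding process (Python 3, POSIX only) could look like this. Note that it goes one step further than changing only the effective user ID: calling os.setuid as root gives up root for good, which seems the safer choice for a process that only serves cached files. The account name nobody and the function name are assumptions.

```python
import os
import pwd


def drop_privileges(username="nobody"):
    """Permanently switch from root to an unprivileged account."""
    if os.getuid() != 0:
        return False  # not running as root: nothing to drop
    account = pwd.getpwnam(username)
    os.setgroups([])           # clear supplementary groups while still root
    os.setgid(account.pw_gid)  # change group first; impossible after setuid
    os.setuid(account.pw_uid)  # finally give up root itself
    return True
```

The seeding child would call drop_privileges() right after fork(), before opening any listening socket for peers.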

Questions

Wouldn't rogue people corrupt the torrent?

People can try to poison the torrent. BitTorrent, however, has integrated data integrity facilities that, in practice, help:

  1. detect peers who poison the torrent
  2. shut them out of the torrent, marking those peers as banned

Additional assurances:

  1. RPM packages are usually distributed signed. A signed package is almost impossible to tamper with undetected.
  2. The metadata in repodata/ also provides a checksum for each package.
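
Both hash checks are cheap to perform. The sketch below (Python 3) shows the idea: BitTorrent stores one 20-byte SHA-1 digest per piece in the "pieces" field of the .torrent, and the repository metadata names a checksum algorithm and a hex digest. The function names are ours, and RPM signature verification is deliberately left out.

```python
import hashlib

SHA1_LENGTH = 20  # bytes per piece digest in the torrent "pieces" field


def verify_piece(index, data, pieces_field):
    """Check one downloaded piece against the digest stored in the .torrent."""
    expected = pieces_field[index * SHA1_LENGTH:(index + 1) * SHA1_LENGTH]
    return hashlib.sha1(data).digest() == expected


def verify_checksum(data, algorithm, expected_hex):
    """Check file contents against a checksum taken from the repository metadata."""
    return hashlib.new(algorithm, data).hexdigest() == expected_hex.lower()
```

A piece that fails verify_piece is discarded and requested again, and the peer that sent it is the natural candidate for being shut out.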

Would this be mandatory for mirror servers?

No. The mere presence of the .torrent would activate the BitTorrent strategy, but if the .torrent file isn't present, then Yum would proceed as usual.

What about my upload bandwidth?

BitTorrent seeding is meant to be optional. You should be able to disable it with a configuration statement, and distributors will be free to disable it by default. You are not helping your fellow users if you choose to do so, however, and distributors would need to pay much more for upload bandwidth if they decide to ship it turned off.
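
As a sketch of what such a configuration statement might look like, the snippet below (Python 3) parses an INI-style plugin file of the kind Yum plugins keep under /etc/yum/pluginconf.d/. Only the [main] section and its enabled switch follow the usual plugin convention; the seed and max_upload_kbps options, their defaults and the function name are hypothetical.

```python
import configparser

# Hypothetical defaults; a distributor could ship seed = 0 instead.
DEFAULTS = {"enabled": "1", "seed": "1", "max_upload_kbps": "32"}


def load_settings(text):
    """Parse plugin configuration text, filling in defaults for missing options."""
    parser = configparser.ConfigParser()
    parser.read_dict({"main": DEFAULTS})
    parser.read_string(text)
    main = parser["main"]
    return {
        "enabled": main.getboolean("enabled"),
        "seed": main.getboolean("seed"),
        "max_upload_kbps": main.getint("max_upload_kbps"),
    }


# Example file content: BitTorrent downloads stay on, seeding is turned off.
SAMPLE = """[main]
enabled = 1
seed = 0
"""
```

With SAMPLE, the plugin would still download over BitTorrent but skip the background seeding step.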

But distros ship with firewalls on

They can:

  • continue to ship with their firewalls on, in which case the distros will continue having high bandwidth bills,
  • open ports for Yum BitTorrent servents, or
  • find any other strategy.