Showing posts with label zsync. Show all posts

Monday, 15 November 2010

More zsync users

MirrorBrain is now offering .zsync files for its downloads. This is exactly the sort of model that I originally had in mind for zsync: file distribution services making .zsync files automatically to enable people to download new versions faster and to reduce their own bandwidth costs.

BAILII has come up with a more unusual use case; they are offering .zsync files for their (in some cases large) RSS feeds. That probably only becomes generally useful if RSS feed readers were to support it; but it would be handy at least for feed aggregators to reduce bandwidth use (and indeed if feed aggregators do use it then it could give a useful bandwidth saving there alone).

And OpenSUSE are using zsync in their new libzypp backend for package downloads. I need to find time to have a look at the code for that...

Monday, 20 September 2010

zsync 0.6.2 released

zsync 0.6.2 is now available from the download page.

This fixes a few bugs that have been spotted in the previous version:
  • fix for using zsync client on files >2GB on 32bit systems.
  • fix redirect handling.
  • improve some edge cases dealing with unusual seed data patterns.
  • optimise by stopping reading seed files if target file is complete.
  • fix infinite loop in zsyncmake when given a truncated (invalid) .gz
  • fix --disable-profile to configure.

Wednesday, 14 July 2010

EVE Online using zsync

I was pointed today at the EVE Online Dev Blog, where they are apparently using zsync as a backup system for repairing incomplete downloads or corrupted files. It sounds like they are not using zsync itself but have probably taken the idea and/or parts of the code and built it into their own repair tool.

It seems like an odd application to me. Incomplete downloads are a rather easy case and do not require the rsync algorithm (there are plenty of HTTP and of course FTP clients that will resume partial downloads). And corrupt files - well, I'm surprised if having corrupted parts of files is so common that they have to optimise for it. But I don't know anything more about their problem than is in their blog post. zsync does have the advantage that it combines downloading with verification of what you have, and it does it with minimal work for the central servers (as I guess EVE have the common problem of a high ratio of users to servers).

Saturday, 3 April 2010

Ubuntu using zsync for distribution

Someone drew my attention to Ubuntu using zsync to distribute ISOs (thanks Jon). Cool — that's one of the use cases that I was testing with many years ago.

Sunday, 13 December 2009

zsync's custom zlib

There seems to be a lot of confusion about zsync's custom embedded version of zlib. I had not documented the exact reason for the patches very well; so I have now committed an explanation of the changes to my own repo. As I may not make another release for a little while yet, I am posting the explanation here also.

I have started a discussion with the zlib maintainer about what sort of API
changes could be made such that I could use the standard zlib, but so far
no-one other than me understands the requirements and I'm not actually bothered
about it (it's the Fedora people who are worked up about it). So, absent anyone
else stepping in to do the work, it may take me a while to get to it.

Local changes to zlib used by zsync

These patches support two different modes of operation in zsync:

Changes to the deflate code: Compressing a file in a way that is optimised for
zsync's block-based rsync algorithm ‒ starting a new zlib block for each 1024
byte (for example) block in the source file. cf
http://zsync.moria.org.uk/paper/ch03s04.html . This is used by makegz.c in the
zsync source.

Changes to the inflate code: Working with files compressed with the standard
gzip(1). To enable people to get started with zsync, I want it to work with
existing compressed content. To achieve optimal results with standard gzip
files, I made zsync capable of starting decompression in the middle of a block.
In these cases it has to download the block header, then skip forward to the
part of the block that gives it the data that it wants. cf
http://zsync.moria.org.uk/paper/ch03s02.html

Contrary to some internet discussion, the changes are not related to rsync's
changes nor to rsync compatibility (zsync isn't compatible with rsync -
whatever that would mean ‒ nor do these changes relate in any way to the
rsyncable gzip patch).

Changes to the deflate code


Essentially, I hijacked Z_PARTIAL_FLUSH to mean something new ‒ I want to start
a new zlib block, but unlike Z_PARTIAL_FLUSH I don't need to emit a whole byte
between blocks, so I took that out. Correctly this ought to be implemented by
adding a Z_NEWBLOCKONLY_FLUSH or something like that instead of repurposing an
existing state.
(If this were the only issue preventing the use of a standard zlib, distros
could change it to use Z_PARTIAL_FLUSH with only a slight loss of compression
efficiency.)
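
A rough illustration of the one-deflate-block-per-zsync-block idea, using only what stock zlib offers. The sketch assumes Z_SYNC_FLUSH as a stand-in: it is neither the patched flush mode described above nor Z_PARTIAL_FLUSH, and it pads every boundary out to a whole byte, so it costs more than the patched version would. The function names and the sample data are invented for the example.

```python
import zlib

def deflate_in_blocks(data, blocksize=1024):
    """Raw-deflate data, ending the current deflate block every blocksize bytes."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15 selects a raw deflate stream
    out = []
    for start in range(0, len(data), blocksize):
        out.append(comp.compress(data[start:start + blocksize]))
        # Stock-zlib stand-in for the patched flush: ends the block, then byte-aligns.
        out.append(comp.flush(zlib.Z_SYNC_FLUSH))
    out.append(comp.flush(zlib.Z_FINISH))
    return b"".join(out)

def deflate_plain(data):
    """Raw-deflate data in one go, letting zlib choose its own block boundaries."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

# Invented sample: text-like records that differ slightly from one another.
sample = b"".join(
    b"Package: example-%d\nVersion: 1.%d\n\n" % (i, i % 7) for i in range(4000)
)

blocked = deflate_in_blocks(sample)
plain = deflate_plain(sample)

# Both streams must inflate back to the same input.
assert zlib.decompress(blocked, -15) == sample
print(len(sample), len(plain), len(blocked))
```

The three printed sizes give a feel for what forcing a block boundary every 1024 bytes costs relative to letting zlib pick its own boundaries.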

Changes to the inflate code

zsync uses the rsync algorithm to construct a desired file from an
(e.g.) older local version of the file and then downloading any
new/needed blocks from a server; the aim being to minimise the amount of
data downloaded to construct the target file. It supports downloading
those blocks from a gzipped version of the file on the server. If I want
e.g. bytes 4096-8192 of the file from inside the gzipped file, I could
download the whole zlib block (using a map of the compressed file that I
construct beforehand and is downloaded first) containing the range
4096-8192 (zsync 0.1.0 used this method); but it can do better (fewer
bytes downloaded) than that, by downloading just the block header and
then downloading the bytes within that compressed block that correspond
to bytes 4096-8192 of the contained data.

To do that, I need:
a) to be able to start inflating at the start of "any length/literal/eob code
in any dynamic or fixed block, or at any stored byte in a stored block.".
That is what the additional function inflate_advance() and the export of
updatewindow() allow me to do.

b) make a map of the gzip file that lets me know what points I should start
downloading at in order to inflate particular byte ranges of the contained
content. To do this, I can decompress each byte range into a buffer of that
size and then quiz zlib for the position in the stream; but I need to know that
the position in the stream does correspond to the start of a code or the middle
of a stored block (not, e.g., that we have just read a backref and the backref
expands to span the boundary; in that case, I would need to know that position
where the backref started and the lib doesn't give me a way to find that out).

This is given by inflateSafePoint(), by the modification to cause the inflator
to return to the caller at each code in a dynamic block (the LENDO change), and
the implicit guarantee provided by using my own copy of the library that I know
how the library behaves around internal states and stream position (I need a
guarantee that the library won't read ahead more than it needs to, and I need
to access certain member variables directly to get the bit position in the
stream).

I also removed inflate_fast as I did not want to spend the time working out if
it was compatible with these changes.
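
For contrast, here is roughly how far an unpatched zlib can go. The sketch below assumes the compressor was told to do a Z_FULL_FLUSH every 4096 bytes, which resets its history so that a fresh raw inflater can begin at the recorded byte offset. That is closer in spirit to the whole-block download than to inflate_advance(): starting at an arbitrary code inside a block of an ordinary gzip file is exactly what stock zlib does not offer. The function names, the dictionary used as a map, and the sample data are all invented for illustration.

```python
import zlib

def compress_with_map(data, blocksize=4096):
    """Return (raw deflate stream, {uncompressed offset: compressed offset})."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    out = bytearray()
    zmap = {}
    for start in range(0, len(data), blocksize):
        zmap[start] = len(out)  # byte-aligned, because the previous flush ended here
        out += comp.compress(data[start:start + blocksize])
        out += comp.flush(zlib.Z_FULL_FLUSH)  # end the block and reset the history
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), zmap

def read_range(stream, zmap, start, length, blocksize=4096):
    """Inflate only the compressed bytes that cover [start, start + length)."""
    first = (start // blocksize) * blocksize
    last = ((start + length - 1) // blocksize + 1) * blocksize
    comp_from = zmap[first]
    comp_to = zmap.get(last, len(stream))  # past the last entry: read to the end
    inflater = zlib.decompressobj(-15)
    plain = inflater.decompress(stream[comp_from:comp_to])
    offset = start - first
    return plain[offset:offset + length], comp_to - comp_from

# Invented sample data standing in for the target file.
data = b"".join(b"line %06d of the target file\n" % i for i in range(3000))
stream, zmap = compress_with_map(data)

chunk, fetched = read_range(stream, zmap, 4096, 4096)
assert chunk == data[4096:8192]
print(fetched, len(stream))
```

Only `fetched` compressed bytes need to be read to recover the requested range, which is the point of building the map up front.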

Tuesday, 28 April 2009

zsync 0.6.1 released

zsync 0.6.1 is now available from the download page.
This fixes a few bugs that have been spotted in the previous version, plus a few minor feature changes:
  • recompression support for gzip files made with zlib:gzio.c or gzip -n
  • fix compilation on MacOS X
  • allow HTTP redirects on the target file; not sure whether this is a good idea
    or not...
  • fix unnecessary transfer of whole file where file is smaller than the context
    size (1x or 2x blocksize)
  • use sequential_matches=1 when there is only one block; otherwise we're forced
    to transfer the whole file for files below 2kiB
  • fix librcksum handling of zsync streams with sequential_matches == 1; it was
    giving false negatives when applying the rsync algorithm, resulting in poor
    use of local source data when sequential_matches == 1 (which didn't actually
    occur in any recent version of zsync)

Saturday, 24 January 2009

zsync 0.6 released

zsync 0.6 is now available from the download page. This is mainly a maintenance release, fixing various minor bugs that people have noted over the 2 years
since the last release. I have also gone through and tidied up the source code
somewhat.

The only functional changes are:
  • zsync now preserves the mtime on downloaded files (this requires an extra
    field in the .zsync, but this format change is entirely compatible with old
    clients);
  • -q option replaces -s (but -s is retained temporarily as a synonym).
These make zsync align better with wget as a file download client.
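
A minimal sketch of what preserving the mtime amounts to on the client side, written in Python rather than taken from zsync's own code. It assumes the timestamp arrives as an RFC 2822-style date string; the field layout, the function name and the sample date are assumptions made for the example.

```python
import os
import tempfile
from email.utils import parsedate_to_datetime

def apply_mtime(path, mtime_text):
    """Set path's modification time from an RFC 2822-style date string."""
    stamp = parsedate_to_datetime(mtime_text).timestamp()
    atime = os.stat(path).st_atime  # leave the access time as it was
    os.utime(path, (atime, stamp))
    return stamp

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "download.bin")
    with open(target, "wb") as fh:
        fh.write(b"payload")  # stands in for the freshly downloaded file
    wanted = apply_mtime(target, "Sat, 24 Jan 2009 12:00:00 +0000")
    restored = os.stat(target).st_mtime

print(wanted, restored)
```

A downloader that does this leaves the local copy with the server's timestamp, which is also how wget behaves by default.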

The full changelog:
  • fix out-of-bounds memory access when processing last block of non-compressed
    download (patch from Timothy Lee). Also fix an error handling fault for the
    same.
  • fix "try a smaller blocksize" failures when zsyncmakeing for huge compressed
    files on 32bit systems
  • preserve mtime on downloaded files
  • fix potential crash when re/deallocating checksum hash in librcksum (patch
    from Timothy Lee)
  • explain status code errors better
  • better URL handling
  • add -q as a substitute for -s, as -q is more conventional (re wget). -q also
    suppresses the 'no relevant local data' warning now.
  • fix some warnings
  • code tidy-up and better commenting of what it is doing
  • tidy up autoconf use
Version 0.6 is available from the download page, as are all previous versions and the bzr repository.

Sunday, 6 August 2006

zsync 0.5

The main feature of this release is that large file support is now enabled on
systems (Linux/i386 in particular) where it needs to be explicitly selected.
As I do most of the development on Linux/x86_64 now, I was blissfully ignorant
of this problem until Robert Lemmen forwarded a complaint about it.


I have also fixed some compilation problems on MacOS X and Solaris. There is also a substitute for getaddrinfo provided for systems that need it, which someone emailed me about.

Finally, I have made the source code repository for zsync available online.
This, and the new release, are available from the download page.

Saturday, 8 July 2006

zsync 0.4.3

I have had this sitting around for a while, so it is about time that I released it. No big changes in this release; I have tidied up the program output, so the program is more silent with -s now. I have also added HTTP basic authentication support ‒ this makes zsync usable in places where you can't have everybody accessing your downloads. Get it from the download page.
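
For readers unfamiliar with it, HTTP basic authentication is just an extra request header carrying base64 of "user:password". The sketch below shows how such a header could sit alongside a Range request; it is not zsync's implementation. The credentials are the standard example pair from the HTTP authentication RFCs, and the byte range is invented.

```python
import base64

def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Headers for a hypothetical partial download from a protected server.
headers = {
    "Range": "bytes=4096-8191",
    "Authorization": basic_auth_header("Aladdin", "open sesame"),
}

for name, value in headers.items():
    print(f"{name}: {value}")
```

Because the credentials are only encoded, not encrypted, this scheme protects downloads from casual access rather than from anyone who can see the traffic.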

Saturday, 24 June 2006

Succinct

I have not seen this before — some of the Ubuntu people are experimenting
with methods based off of rsync/zsync for doing package updates (link).

In other news, I have frozen a copy of the zsync technical paper, for reference purposes. It has been ages since I updated it anyway, so I am keeping a copy as-is before I update it with some of my current ideas. It is important that I get a comparison in the paper with what I am calling structured patching systems: like the new differential Package list updating that Debian have implemented.

Sunday, 9 October 2005

Tuesday, 12 July 2005

zsync 0.4.1

This is just a bugfix release. Someone noticed that zsync would be vulnerable
to CAN-2005-2096, due to its use of zlib code. So the patch for that is now done.

This release also includes some HTTP protocol fixes, as there have been some complaints/observations on how zsync failed to correctly implement some details from the standard. I still have some of these to do, and the fixes in this release have not been heavily tested yet, so let me know if there are any problems.

Get it from the download page.

Sunday, 8 May 2005

zsync 0.4.0

Nothing much has changed, but I thought it was about time for another release. The only significant change in this release is some fixes to the progress bar display. As zsync is quite stable now, and bug reports have dried up for the moment, I am declaring it to be a beta instead of an alpha now; I expect the file format and command-line interface to be fairly stable from now on.

Sunday, 10 April 2005

zsync Progress

0.3.3 has made its way into Debian testing. FreeBSD still only has 0.2.2 in the ports, though. I see there are RPMs starting to appear, and it is included in Mandrake cooker.


I haven't touched it for two weeks now. No bug reports, which might be a good sign - or might not be. I think I will make the next release a beta instead of an alpha, and just do any small fixes that I think of by then.

Saturday, 26 March 2005

zsync 0.3.3

Here is 0.3.3. The major new feature is the optimised gzip compressor, which I described previously. This, implemented by the new -z option, is the recommended way of distributing files that are not already compressed.

I have also added the -k option, for keeping the zsync file on your local computer. This is useful if you just want to run zsync from a cron job, and have it download only when there is something newer on the server. Also new is the -e option, which causes zsyncmake to abort if it would not give the client an exact copy of the target file; and -C to disable the recompression support (which restores the 0.3.1 and earlier behaviour - sorry about that unannounced usage change in the previous version).

Apart from that, there are some bug fixes, for crashes that could occur in unusual cases. As usual, you can get the latest version from the download page. I am increasingly happy with the way zsync is working, and assuming the latest changes all settle down well, I am thinking of declaring a beta (instead of alpha) quality release. So I am interested in feedback on how people are getting on with it.

Downloading source tarballs with zsync

Another interesting application for zsync: downloading updated source tarballs. For regular software updaters, this has to be a big win. I wrote a small wrapper script for the fetch operation in the FreeBSD ports system, and was able to cut the total download for updating XFCE by roughly a third.
% make FETCH_CMD="/home/cph/src/zsync-fetch -ARr"
===>  Vulnerability check disabled, database not found
===>  Found saved configuration for xfce4-wm-4.2.1
=> xfwm4-4.2.1.tar.gz doesn't seem to exist in /usr/ports/distfiles/xfce4.
=> Attempting to fetch from http://www.us.xfce.org/archive/xfce-4.2.1/src/.
Downloading xfwm4-4.2.1.tar.gz; old versions ./xfce4/xfwm4-4.0.6.tar.gz found.
#################### 100.0% 21.2 kBps DONE    
Read /usr/ports/distfiles/./xfce4/xfwm4-4.0.6.tar.gz. Target 27.4% complete.
downloading from http://www.us.xfce.org/archive/xfce-4.2.1/src/xfwm4-4.2.1.tar.gz:
#################### 100.0% 47.3 kBps DONE    
verifying download...checksum matches OK
used 1681408 local, fetched 857709
===>  Extracting for xfce4-wm-4.2.1
=> Checksum OK for xfce4/xfwm4-4.2.1.tar.gz.
…
=> libxfcegui4-4.2.1.tar.gz doesn't seem to exist in /usr/ports/distfiles/xfce4.
=> Attempting to fetch from http://www.us.xfce.org/archive/xfce-4.2.1/src/.
Downloading libxfcegui4-4.2.1.tar.gz; old versions ./xfce4/libxfcegui4-4.0.6.tar.gz found.
#################### 100.0% 15.2 kBps DONE
Read /usr/ports/distfiles/./xfce4/libxfcegui4-4.0.6.tar.gz. Target 33.8% complete. 
downloading from http://www.us.xfce.org/archive/xfce-4.2.1/src/libxfcegui4-4.2.1.tar.gz:
#################### 100.0% 48.8 kBps DONE
verifying download...checksum matches OK
used 1582080 local, fetched 548199
===>  Extracting for libxfce4gui-4.2.1
=> Checksum OK for xfce4/libxfcegui4-4.2.1.tar.gz.
…
In total, a 6.1MB download was reduced to 4.0MB. You can see the benefits of the recompression support added in 0.3.2 now — it reconstructs the original .gz file, allowing it to pass the checksum applied by the ports framework. This was on a fairly large update, from 4.0.x to 4.2.x, so a smaller update would probably see a larger benefit; this has to be a big application of zsync, I think.


For the moment I am running a CGI on moria.org.uk which will automatically generate .zsync files for FreeBSD package sources. And I am providing the zsync-fetch wrapper script which you see in use above. Feel free to try it out (note that you should upgrade to zsync-0.3.3 at least before using it, though, as 0.3.2 had some glitches in the recompression support) — it obviously only kicks in if you have an older version of a file to update from, and it only works for ports with gzip-compressed source tarballs (it just falls back to fetch for .tbz, .tar.bz, and files that moria.org.uk rejects or is unable to download). This is highly experimental, but I hope gives a good idea of the potential that zsync has.
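
The decision such a wrapper has to make can be sketched as follows. This is not the zsync-fetch script itself, only a guess at the shape of the logic described above: look for older gzip tarballs of the same package to use as seeds, and fall back to fetch otherwise. The version-matching pattern, the function names and the example.org URLs are hypothetical; the only zsync option used is -i, which names an extra input file.

```python
import os
import re
import tempfile

# Hypothetical naming rule: <name>-<version>.tar.gz, version starting with a digit.
VERSIONED = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.tar\.gz$")

def find_seeds(distdir, wanted):
    """List other .tar.gz files in distdir that share wanted's package name."""
    match = VERSIONED.match(wanted)
    if not match:
        return []  # not a gzip tarball, so there is nothing zsync can reuse
    seeds = []
    for entry in sorted(os.listdir(distdir)):
        other = VERSIONED.match(entry)
        if other and other["name"] == match["name"] and entry != wanted:
            seeds.append(os.path.join(distdir, entry))
    return seeds

def plan_fetch(distdir, url, zsync_url):
    """Return the command to run: zsync with seeds, or plain fetch as fallback."""
    seeds = find_seeds(distdir, os.path.basename(url))
    if not seeds:
        return ["fetch", url]
    cmd = ["zsync"]
    for seed in seeds:
        cmd += ["-i", seed]
    return cmd + [zsync_url]

ZURL = "http://example.org/zsync/xfwm4-4.2.1.tar.gz.zsync"  # hypothetical

with tempfile.TemporaryDirectory() as distdir:
    for name in ("xfwm4-4.0.6.tar.gz", "libxfcegui4-4.0.6.tar.gz", "notes.txt"):
        open(os.path.join(distdir, name), "wb").close()
    seed_path = os.path.join(distdir, "xfwm4-4.0.6.tar.gz")
    with_seed = plan_fetch(distdir, "http://example.org/src/xfwm4-4.2.1.tar.gz", ZURL)
    no_seed = plan_fetch(distdir, "http://example.org/src/newport-1.0.tar.gz", ZURL)
    not_gzip = plan_fetch(distdir, "http://example.org/src/xfwm4-4.2.1.tar.bz2", ZURL)

print(with_seed[0], no_seed[0], not_gzip[0])
```

The sketch only builds the command line; running it and falling back again when the .zsync file cannot be fetched is left out.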

Friday, 25 March 2005

Breakthrough!

Committed revision 399.
cph@athlon zsync/c% svn log -r399 makegz.c
-----------------------------------------------------------
r399 | cph | 2005-03-25 09:36:36 +0000 (Fri, 25 Mar 2005)

Built in gzip compressor which optimises for zsync.

I have been thinking about the problem of compression with zsync. Strangely, the code for looking inside gzipped files - written as a workaround, to help kickstart zsync when there is little rsync-able content already available - is currently the most efficient way of transferring most files. I was sure that there had to be something more efficient than compressing stuff with gzip --best and then doing elaborate hacks in zlib to enable us to decompress mid-file.

I had been intending to look at Transfer-Encoding: gzip, using mod_deflate/mod_gzip, to see if this could get us compression without the nasty hacks. But I was sceptical that this could take off, because it puts some load back on the server, and these modules are far from ubiquitous. What we want is the individual blocks to be stored compressed on the server, so it does not have to do any compression when clients connect.

Now we have it. I have imported the deflate code from zlib into zsync, and written a small gzip program (actually built into zsyncmake) which optimises the compressed file for zsync. It starts a new deflate block in the output at the start of every zsync block (so every 1024, 2048 or whatever bytes). Initial tests are very promising - on my main test case, of updating a 12MB Debian Packages file with 1 week of changes, the total transfer drops from 140KB to 107KB, taking us amazingly close to rsync -z's best result of 82KB (in fact, the difference between zsync and rsync now is almost precisely the size of the Z-Map, the map of the .gz file, at 23KB).


I have updated the technical paper with the theory and the results.

Thursday, 24 March 2005

tar pit

I've been working with some people on the use of zsync for Gentoo mentioned previously, and we came across this oddity of tar. We were trying to work out why a given directory tree took 160MB as a .tar, but only 88MB when stored in a different format. It turns out that tar uses a block size of 512 bytes, so every file takes at least 512 bytes for the data, and 512 bytes for the metadata (filename etc). So half the tarball was empty space because... because the tar format was designed to go to 512-byte-block tapes. Given that most of tar's use is for distributing files online now, I don't want to know how much space is wasted just because the old format demanded it.


cpio seems to be more efficient - it doesn't use anything like as much padding.
Somehow I don't think I will boost zsync's popularity if I advise people to use cpio :-).
zip also seems to have very compact metadata.
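
The padding arithmetic is easy to check with Python's tarfile module. The sketch below predicts the size of each member as one 512-byte header block plus the data rounded up to whole blocks, then builds a real archive in memory for comparison. The file names and the 100-byte file size are invented; the directory tree from the post is not reproduced here.

```python
import io
import tarfile

BLOCK = 512  # tar's fixed block size

def predicted_member_bytes(data_len):
    """One header block plus the data rounded up to whole 512-byte blocks."""
    data_blocks = -(-data_len // BLOCK)  # ceiling division
    return BLOCK + data_blocks * BLOCK

def build_tar(files):
    """Write a {name: bytes} mapping into an uncompressed ustar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Invented tree: 200 small files of 100 bytes each.
files = {"tree/file%03d.txt" % i: b"x" * 100 for i in range(200)}

payload = sum(len(data) for data in files.values())
predicted = sum(predicted_member_bytes(len(data)) for data in files.values())
archive = build_tar(files)

print(payload, predicted, len(archive))
```

The real archive comes out slightly larger than the prediction because tarfile also writes an end-of-archive marker and pads the result to a whole record. For small files, almost all of the archive is header and padding rather than payload.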

Tuesday, 22 March 2005

Application to Gentoo emerge

Some discussion about possibly using zsync to speed up emerge downloads for Gentoo. The idea of building the data into an ISO so the download can be immediately mounted for use is certainly interesting. It's nice to see that zsync is holding up fairly well against rsync; but more importantly, being built on HTTP it has simpler requirements and can benefit from proxies, web caching etc.

zsync 0.3.2 is here

I have added some better progress indication to downloads now, which should
make it easier to see how fast zsync is going, how near it is to completion
etc. I have also fixed a SEGV that you could run into using locally downloaded
.zsync files. So I hope this version is a little more user-friendly.


The big feature in this version is recompression. zsync will now compress a
file after you download it if the original was compressed; and it does its best
to make the gzip file identical with the original (fixing the timestamp,
filenames etc in the gzip header). I think this opens up a whole new area of
applications, but I'll defer talking about that until I can put up some kind of
demonstration.