Thursday, 10 May 2012

Uncompressing Pkzip files with Zlib and Minizip

I had a compressed file that I needed to decompress inside a C++ program. The file had been zipped by a Java program. "Not a big deal" is what I thought while adding zlib to the C++ project. Then I tried zlib's uncompress() function, and all I got was this lousy t-shirt with Z_DATA_ERROR. I evolved further and gave the inflate() function a chance, but to no avail. I came across a Stack Overflow post and gave inflate() a second chance, this time with an inflateInit2() call. Still nothing worked.

Soon I looked inside my zipped file and saw that the first two bytes were 0x50 0x4B, which correspond to the characters "PK". That made me think the file was in the PKZIP format. A PKZIP file is a single archive that can contain multiple inner files (or just one, as in my case). This makes sense if we look at the Java code and note that there could be more than one ZipEntry.

     ZipOutputStream zs = new ZipOutputStream(new FileOutputStream(file));
     zs.putNextEntry(new ZipEntry("file1.txt"));
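
The two-byte check described above is easy to automate before reaching for any decompression call. Below is a minimal sketch that guesses the container from the leading bytes; the enum and the function name GuessContainer are hypothetical and not part of zlib or Minizip.

```cpp
#include <cstddef>

// Hypothetical helper: the enum and function names are illustrative only.
enum Container { kUnknown, kPkzip, kGzip, kZlib };

Container GuessContainer(const unsigned char* data, std::size_t size)
{
    if (size < 2) return kUnknown;

    // PKZIP records start with the characters 'P' 'K'.
    if (data[0] == 0x50 && data[1] == 0x4B) return kPkzip;

    // gzip streams start with the magic bytes 0x1F 0x8B.
    if (data[0] == 0x1F && data[1] == 0x8B) return kGzip;

    // A zlib stream has compression method 8 in the low nibble of byte 0,
    // and its first two bytes, read big-endian, are a multiple of 31.
    const unsigned header = (data[0] << 8) | data[1];
    if ((data[0] & 0x0F) == 8 && header % 31 == 0) return kZlib;

    return kUnknown;
}
```

Only the kZlib case is something uncompress() can be expected to handle directly, which would explain the Z_DATA_ERROR on a file that starts with "PK".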

Could we use zlib to extract PKZIP files? Zlib's docs beat around the bush, but I soon gathered that it doesn't understand the header of PKZIP files in the first place. There's a wonderful additional library, built on top of zlib, called Minizip, and it knows how to deal with PKZIP files. One recommendation I found is to use the cleaned-up version of Minizip by Sam Soff (and his friends). For the purpose of just unzipping, you will need four files from there: unzip.c, unzip.h, ioapi.c and ioapi.h. Of course, zlib needs to be included in your project as well.
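
To make it concrete what zlib trips over, here is a rough sketch that reads the fixed 30-byte local file header at the start of a PKZIP entry, following the field layout in the PKWARE APPNOTE. The LocalHeader struct and the ParseLocalHeader, ReadLE16 and ReadLE32 names are hypothetical; Minizip does this work (and much more) for you.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct holding a few fields of a PKZIP local file header.
struct LocalHeader
{
    std::uint16_t flags;           // bit 3 set: sizes follow the data instead
    std::uint16_t method;          // 0 = stored, 8 = deflate
    std::uint32_t compressedSize;  // may be 0 when flag bit 3 is set
    std::uint16_t nameLength;
    std::uint16_t extraLength;
    std::size_t   dataOffset;      // where the compressed bytes begin
};

// All multi-byte fields in the header are little-endian.
static std::uint16_t ReadLE16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t ReadLE32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

bool ParseLocalHeader(const unsigned char* data, std::size_t size, LocalHeader* out)
{
    const std::size_t kFixedSize = 30;
    if (size < kFixedSize) return false;

    // Signature "PK\3\4", stored little-endian as 0x04034B50.
    if (ReadLE32(data) != 0x04034B50u) return false;

    out->flags          = ReadLE16(data + 6);
    out->method         = ReadLE16(data + 8);
    out->compressedSize = ReadLE32(data + 18);
    out->nameLength     = ReadLE16(data + 26);
    out->extraLength    = ReadLE16(data + 28);

    // The file name and the extra field follow the fixed part.
    out->dataOffset = kFixedSize + out->nameLength + out->extraLength;
    return out->dataOffset <= size;
}
```

Even at dataOffset, the bytes are a raw deflate stream without the zlib wrapper, so uncompress() would still reject them; raw deflate needs inflateInit2() with a negative windowBits value. That, plus multiple entries and the central directory, is the bookkeeping Minizip takes care of.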

The quick method to include those Minizip files in your cpp project would be to:

1. Rename unzip.c and ioapi.c to unzip.cpp and ioapi.cpp
2. Comment out void fill_fopen_filefunc (pzlib_filefunc_def){...} from ioapi.cpp
3. Add #define NOUNCRYPT at the top of unzip.h
4. Use it like this:

#include <string.h>
#include "unzip.h"

bool Unzip(const char* fileNameIn, const char* fileNameOut)
{
    unzFile hFile = unzOpen(fileNameIn);
    if (!hFile) return false;

    // Check that the archive is readable, then open its first entry.
    unz_global_info globalInfo = {0};
    if (unzGetGlobalInfo(hFile, &globalInfo) != UNZ_OK ||
        unzGoToFirstFile(hFile) != UNZ_OK ||
        unzOpenCurrentFile(hFile) != UNZ_OK)
    {
        unzClose(hFile);
        return false;
    }

    const int SizeBuffer = 32768;
    unsigned char* Buffer = new unsigned char[SizeBuffer];
    ::memset(Buffer, 0, SizeBuffer);

    // unzReadCurrentFile returns the number of bytes read, 0 at the end
    // of the entry, and a negative value on error.
    int ReadSize, Totalsize = 0;
    while ((ReadSize = unzReadCurrentFile(hFile, Buffer, SizeBuffer)) > 0)
    {
        Totalsize += ReadSize;
        //... Write to output file
    }

    delete [] Buffer;
    Buffer = NULL;

    // Release the entry and the archive handle.
    unzCloseCurrentFile(hFile);
    unzClose(hFile);

    return (Totalsize > 0);
}

This handles the case of a single file, but it can easily be extended to PKZIP archives with many files (use the unzGoToNextFile() function).