Well, a full non-merged set is a format that NOBODY should be using, hence the hugely bloated estimate the other poster gave. In that format every clone zip contains both the clone's roms and the parent's roms, so there are a ton of duplicate files.
You need to be using a split set, in which clones and parents each get their own zip file but files they both share aren't duplicated. (A merged set goes one step further and packs a parent and all of its clones into a single zip.) This is how MAME roms are distributed throughout the net. This is why a parent rom might be 20 megs but the clone is only 2... MAME gets the other 18 megs of data by reading the parent zip.
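If it helps to see the parent/clone fallback spelled out, here is a rough Python sketch of the idea. It is not how MAME is actually implemented: the file names, the sizes, and the `locate` helper are all made up for illustration, and real MAME identifies roms by checksum rather than by file name.

```python
# Hypothetical split set: each "zip" is modelled as {file name: size in megs}.
# All names and sizes here are invented to mirror the 20 meg / 2 meg example.
SPLIT_SET = {
    "parent": {"prog.bin": 2, "gfx1.bin": 9, "gfx2.bin": 9},
    "clone": {"prog_alt.bin": 2},  # only the files that differ from the parent
}

# Which zip each clone falls back to (assumed mapping).
PARENT_OF = {"clone": "parent"}

# Everything the clone needs in order to run (assumed list).
CLONE_NEEDS = ["prog_alt.bin", "gfx1.bin", "gfx2.bin"]


def locate(game, filename):
    """Return the zip that supplies `filename`: the game's own zip first, then its parent's."""
    current = game
    while current is not None:
        if filename in SPLIT_SET.get(current, {}):
            return current
        current = PARENT_OF.get(current)
    raise FileNotFoundError(f"{filename} not found for {game}")


# Add up how many megs the clone borrows from the parent zip.
borrowed = sum(
    SPLIT_SET["parent"][name]
    for name in CLONE_NEEDS
    if locate("clone", name) == "parent"
)
own = sum(SPLIT_SET["clone"].values())
parent_total = sum(SPLIT_SET["parent"].values())

print(f"parent zip: {parent_total} megs, clone zip: {own} megs, read from parent: {borrowed} megs")
```

With these made-up numbers the clone zip only holds its own 2 megs and the remaining 18 come out of the parent zip, which is why you need the parent zip present for the clone to run.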
In other words, don't worry about it, as people who distribute roms keep them in the proper format for you.