MultiHash builds MD5 hashes for anything from a single file all the way up to an entire hard drive. Every individual file has an MD5 hash generated, and then an overall hash is produced, which can be used to compare directory trees and ensure the entire tree is identical without having to verify that each individual file’s hash value matches. These hash values are useful for verifying downloads from sites that provide hashes for their downloads. Typically, you will find these for ISO files, Linux distributions, and many applications. Hashes are also useful for verifying software escrow deposits, since they often require multiple identical submissions to verify the data is not corrupt. MultiHash makes all of this easy, saving a log file of the hashes that are calculated, along with providing an overall hash value.
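The per-file and overall hashes described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the way the overall hash is derived (an MD5 over the sorted "path, digest" lines) is an assumption, since the text does not say how MultiHash combines the individual values.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of one file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root: Path) -> tuple[dict[str, str], str]:
    """Hash every file under root and derive one overall hash for the tree.

    Assumption: the overall hash is the MD5 of the sorted
    "relative/path<TAB>digest" lines. The real tool may combine them differently.
    """
    per_file = {
        path.relative_to(root).as_posix(): file_md5(path)
        for path in root.rglob("*")
        if path.is_file()
    }
    overall = hashlib.md5()
    for name in sorted(per_file):
        overall.update(f"{name}\t{per_file[name]}\n".encode("utf-8"))
    return per_file, overall.hexdigest()
```

Two trees with the same relative paths and contents yield the same overall value, so a single comparison can stand in for checking every file's hash.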
With Multihash the user is given the freedom to compute hash values using combinations of algorithms. Multihash offers the same core features as similar tools while adding many features not currently found elsewhere:
- Find-mode: Instead of hashing the command-line arguments, treat them as lists of files to be hashed.
- Match-mode: Compare the hashes of two files, or recursively compare two directory structures.
- Uses the OpenSSL library to support fast and efficient hash value generation, optimized for multiple platforms.
- Compute hash values from different algorithms in parallel for maximum efficiency.
- Support for md5sum, md5deep, and OpenSSL output-modes for drop-in compatibility.
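As a rough illustration of computing several algorithms over the same input, the sketch below reads a stream once and feeds each chunk to every hash object. It uses Python's hashlib rather than the tool's own code, the function name is hypothetical, and it runs the updates one after another in a single pass instead of on separate threads.

```python
import hashlib
from typing import BinaryIO, Iterable


def multi_digest(stream: BinaryIO, algorithms: Iterable[str],
                 chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute several digests while reading the input only once."""
    # One hash object per requested algorithm name (hashlib/OpenSSL names).
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Every algorithm sees the same chunk, so the data is read a single time.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

A call such as `multi_digest(open("image.iso", "rb"), ["md5", "sha1", "sha256"])` (the file name is only an example) returns a dictionary keyed by algorithm name.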
For a full list of features, their uses, and examples please see the man page.
Multihash is able to offer all this while maintaining performance comparable to other tools. In recursive mode, Multihash matches the performance of md5deep. Because it uses the OpenSSL library, Multihash pays a higher process creation cost than tools with in-core hash functions. This causes benchmarks involving many processes (such as the -exec option of find(1)) to suffer. However, when inputs are aggregated using the xargs(1) utility, Multihash performs as well as other tools. Multihash also offers find-mode as an alternative that cuts down the cost of process creation. In the future, once the rest of the code has had a chance to air out, I hope to move to some more time-saving techniques.
Multihash is a protocol for differentiating outputs from various well-established hash functions, addressing size and encoding considerations. It is useful for writing applications that future-proof their use of hashes and allow multiple hash functions to coexist.
Multihash is particularly important in systems which depend on cryptographically secure hash functions. Attacks may break the cryptographic properties of secure hash functions. These cryptographic breaks are particularly painful in large tool ecosystems, where tools may have made assumptions about hash values, such as function and digest size. Upgrading becomes a nightmare, as all tools which make those assumptions would have to be upgraded to use the new hash function and new hash digest length. Tools may face serious interoperability problems or error-prone special casing.
How many programs out there assume a git hash is a sha1 hash?
How many scripts assume the hash value digest is exactly 160 bits?
How many tools will break when these values change?
How many programs will fail silently when these values change?
This is precisely where Multihash shines. It was designed for upgrading.
When using Multihash, a system warns the consumers of its hash values that these may have to be upgraded in case of a break. Even though the system may still only use a single hash function at a time, the use of multihash makes it clear to applications that hash values may use different hash functions or be longer in the future. Tooling, applications, and scripts can avoid making assumptions about the length, and read it from the multihash value instead. This way, the vast majority of tooling – which may not do any checking of hashes – would not have to be upgraded at all. This vastly simplifies the upgrade process, avoiding the waste of hundreds or thousands of software engineering hours, deep frustrations, and high blood pressure.
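To make "read it from the multihash value" concrete, here is a minimal sketch of the multihash layout: an unsigned varint function code, an unsigned varint digest length, then the digest bytes. The helper names are hypothetical, and only three codes from the multihash table are included (sha1 = 0x11, sha2-256 = 0x12, sha2-512 = 0x13).

```python
import hashlib

# Small subset of the multihash function-code table.
CODES = {"sha1": 0x11, "sha2-256": 0x12, "sha2-512": 0x13}
# Mapping from multihash names to hashlib names (needed only for this sketch).
HASHLIB_NAMES = {"sha1": "sha1", "sha2-256": "sha256", "sha2-512": "sha512"}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned varint; return (value, offset after the varint)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def multihash_encode(name: str, payload: bytes) -> bytes:
    """Hash payload and prefix the digest with its function code and length."""
    digest = hashlib.new(HASHLIB_NAMES[name], payload).digest()
    return encode_varint(CODES[name]) + encode_varint(len(digest)) + digest


def multihash_decode(value: bytes) -> tuple[int, int, bytes]:
    """Split a multihash into (function code, digest length, digest)."""
    code, offset = decode_varint(value)
    length, offset = decode_varint(value, offset)
    digest = value[offset:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length prefix")
    return code, length, digest
```

A consumer that calls `multihash_decode` never hard-codes 160 bits or any other size: the function code and length travel with the value, so a later switch from sha1 to sha2-256 changes the prefix rather than breaking the parser.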
Generate hash values for files, directories, and entire drives. These hash values can be used to verify that two files or directory trees are identical, or to check a download against published hash values. You can also publish your own hash values so your customers can verify their downloads. Hash values can also be used to verify identical copies submitted for escrow or publication to CD. The user interface allows you to add files in a freeform manner, and it automatically begins processing the files. You can watch as it progresses from file to file. There is a progress bar for the entire file set, and for individual files. Two typical ways to use this software are to either compute a hash of a large archive file containing all of your individual files (like an installer or .iso image), or to calculate individual hash values for every file. To make it simpler to verify a large number of files, MultiHash can generate an overall hash for the file set, and save a data file showing the hash of the individual files.
- This release adds verification of image data downloads using the Glance “multihash” feature introduced in the OpenStack Rocky release. When the os_hash_value is populated on an image, the glanceclient will verify this value by computing the hexdigest of the downloaded data using the algorithm specified by the image’s os_hash_algo property.
  Because the secure hash algorithm specified is determined by the cloud provider, it is possible that the os_hash_algo may identify an algorithm not available in the version of the Python hashlib library used by the client. In such a case the download will fail due to an unsupported hash type. In the event this occurs, a new option, --allow-md5-fallback, is introduced to the image-download command. When present, this option will allow the glanceclient to use the legacy MD5 checksum to verify the downloaded data if the secure hash algorithm specified by the os_hash_algo image property is not supported.
  Note that the fallback is not used in the case where the algorithm is supported but the hexdigest of the downloaded data does not match the os_hash_value. In that case the download fails regardless of whether the option is present or not.
  Whether using the --allow-md5-fallback option is a good idea depends upon the user’s expectations for the verification. MD5 is an insecure hashing algorithm, so if you are interested in making sure that the downloaded image data has not been replaced by a datastream carefully crafted to have the same MD5 checksum, then you should not use the fallback. If, however, you are using Glance in a trusted environment and your interest is simply to verify that no bits have flipped during the data transfer, the MD5 fallback is sufficient for that purpose. That being said, it is our recommendation that the multihash should be used whenever possible.
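The verification rules in this release note can be summarized in a short sketch. This is not the glanceclient implementation: the function name, its parameters, and the exception class are hypothetical, and the data is hashed from memory here, whereas a real client would more likely hash it as it streams in.

```python
import hashlib


class VerificationError(Exception):
    """Raised when downloaded data cannot be verified (hypothetical class)."""


def verify_download(data: bytes, os_hash_algo: str, os_hash_value: str,
                    md5_checksum: str, allow_md5_fallback: bool = False) -> str:
    """Verify data and return the name of the algorithm that was used."""
    try:
        hasher = hashlib.new(os_hash_algo)
    except ValueError as exc:
        # hashlib does not support the algorithm chosen by the cloud provider.
        if not allow_md5_fallback:
            raise VerificationError(
                f"unsupported hash type: {os_hash_algo}") from exc
        algo, expected, hasher = "md5", md5_checksum, hashlib.md5()
    else:
        algo, expected = os_hash_algo, os_hash_value
    hasher.update(data)
    # A mismatch always fails; the fallback never rescues a supported algorithm.
    if hasher.hexdigest() != expected:
        raise VerificationError(f"{algo} hexdigest does not match")
    return algo
```

The fallback branch is reached only when `hashlib.new` rejects the algorithm name, which mirrors the rule that a mismatch under a supported algorithm fails whether or not the option is present.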
If you have a previous version (1.0, 1.1, or 1.2) you will get a dialog box informing you that you must uninstall the previous version. Do this, and then restart the installation.
Once the installation has completed, you will need to start BIDS, and add the new component into the Toolbox. This is done as follows:
1. Add a data flow task into a new Integration Services package
2. Open the data flow task
3. Display the Toolbox
4. Right-click Data Flow Transformations within the Toolbox
5. Select Choose Items
6. Switch to the SSIS Data Flow Items tab
7. Tick the check box next to Multiple Hash in the list
8. Click OK to close the dialog boxes
The component generates a single output. This output will add new columns to your data flow that will be Binary data from the Hash functions.
The following Hash functions are supported:
* MD5 Size: 16 bytes
* RIPEMD-160 Size: 20 bytes
* SHA1 Size: 20 bytes
* SHA256 Size: 32 bytes
* SHA384 Size: 48 bytes
* SHA512 Size: 64 bytes
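The sizes listed above are the digest lengths in bytes, which is how wide each binary output column needs to be. They can be cross-checked outside SSIS with a few lines of Python. This is only a sanity check using hashlib, not part of the component, and the helper name is hypothetical. RIPEMD-160 is reported as None when the local OpenSSL build does not provide it.

```python
import hashlib
from typing import Iterable, Optional

# Digest sizes in bytes, as listed for the component's hash functions.
EXPECTED_SIZES = {
    "md5": 16,
    "ripemd160": 20,
    "sha1": 20,
    "sha256": 32,
    "sha384": 48,
    "sha512": 64,
}


def digest_sizes(names: Iterable[str]) -> dict[str, Optional[int]]:
    """Return the digest size in bytes per algorithm, or None if unavailable."""
    sizes: dict[str, Optional[int]] = {}
    for name in names:
        try:
            sizes[name] = hashlib.new(name).digest_size
        except ValueError:
            # The algorithm is not offered by this Python/OpenSSL build.
            sizes[name] = None
    return sizes
```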
1. To use, drop the component on the design surface.
2. Connect it to a Data Flow Source
3. Edit the component
4. Select the Input Columns Tab (should already be active)
5. Tick the columns that will be used to generate the hashes. If you are planning more than one hash, ensure that you select the columns for all hashes to be generated.
6. If you will have a large number of output columns and will be executing on a multi-core machine, consider enabling Multiple Threading:
* None will not use multiple threading
* Auto will do some basic checking (number of CPUs and number of outputs) before enabling multiple threading
* On will enable multiple threading (regardless of the number of CPU cores, etc.)
7. Switch to the Output Columns Tab
8. In the Output Columns list, enter a new column name, and then select the Hash function
9. The list to the right should now be populated with the columns that you selected on the Input Columns Tab.
10. Tick the columns that you wish to use for this Hash.
11. Repeat steps 8 through 10 until finished.
12. Add a Data Flow Destination
13. Connect the Output from the component to the Data Flow Destination
14. Run your SSIS package.
The BCryptCreateMultiHash function creates a multi-hash state that allows for the parallel computation of multiple hash operations. This multi-hash state is used by the BCryptProcessMultiOperations function. The multi-hash state can be thought of as an array of hash objects, each of which is equivalent to one created by BCryptCreateHash.
Parallel computations can greatly increase overall throughput, at the expense of increased latency for individual computations.
Parallel hash computations are currently only implemented for SHA-256, SHA-384, and SHA-512. Other hash algorithms can be used with the parallel computation API but they run at the throughput of the sequential hash operations. The set of hash algorithms that can benefit from parallel computations might change in future updates.