Pruning, Quantization and Huffman Coding to Compress Deep Neural Networks — 2
Project detail
• How to apply weight pruning to remove redundant weights from a large DNN so as to reduce its
memory consumption;
• How to apply quantization (weight sharing) to encode the weights of a large DNN with fewer bits
for further memory consumption reduction; and
• How to apply Huffman coding to optimize the storage of a large DNN model.
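The three stages above can be sketched on a single weight array. This is a minimal illustration, not the project's required implementation: it assumes magnitude-based pruning, a simple 1-D k-means for weight sharing, and a standard heap-based Huffman coder; all function names (`prune`, `kmeans_1d`, `huffman_codes`) and parameters (`sparsity=0.7`, `k=16`) are illustrative choices.

```python
import heapq
from collections import Counter
import numpy as np

def prune(weights, sparsity=0.7):
    """Magnitude pruning: zero out the smallest |w| so that roughly
    `sparsity` of the entries become zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def kmeans_1d(values, k=16, iters=20):
    """Weight sharing: cluster the surviving weights into k shared values.
    Returns the k centroids and, for each weight, its cluster index."""
    centroids = np.linspace(values.min(), values.max(), k)
    assign = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = values[assign == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, assign

def huffman_codes(symbols):
    """Build a Huffman code for the given symbol stream; frequent cluster
    indices get shorter bit strings."""
    heap = [[count, [sym, ""]] for sym, count in sorted(Counter(symbols).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

# Toy pipeline: prune -> quantize (weight sharing) -> Huffman-code the indices.
rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)     # stand-in for a layer's weights
pw = prune(w, sparsity=0.7)                      # ~70% of entries become zero
nonzero = pw[pw != 0]
centroids, idx = kmeans_1d(nonzero, k=16)        # 16 shared values -> 4-bit indices
codes = huffman_codes(idx.tolist())              # variable-length codes for indices
total_bits = sum(len(codes[s]) for s in idx.tolist())
```

Storing only the nonzero cluster indices (at most 4 bits each before Huffman coding, and on average fewer after) plus the 16 shared centroids is what yields the compounded memory savings the three bullets describe.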