This may be used as an interview question for IT-based jobs. The compression algorithms aim to compress the given text/binary data so that they can benefit from the internet transfer. The one simplest compression algorithm is called Run-Length. It is just counting the number of identical sequential characters and note down each time as a pair of appearance and its counter. For example, if a given text is aaabbccc, it can be shorten to a3b2c3. This is great if you have texts such as aaaaaaaaaa … (one more million times) . However, on the contrast, if a very simple text is like this abcde, the compression algorithm will not do any good. Rather it wastes another byte to store the number one. This is why best compression software such as WinRAR implements several combination of compression algorithms, not just a single one.
The idea of the Runlength compression algorithm can be illustrated by the following Python code, simple and elegant. The complexity of the algorithm is efficient with simple pass .
#!/usr/bin/env python # https://helloacm.com def runlen(s): r = "" l = len(s) if l == 0: return "" if l == 1: return s + "1" last = s[0] cnt = 1 i = 1 while i < l: if s[i] == s[i - 1]: # check it is the same letter cnt += 1 else: r = r + s[i - 1] + str(cnt) # if not, store the previous data cnt = 1 i += 1 r = r + s[i - 1] + str(cnt) return r if __name__ == "__main__": print runlen("aaabbccccddddd") print runlen("a") print runlen("") print runlen("abcdefg") print runlen("eeeeeaaaff")
The output of the above code is
a3b2c4d5 a1 a1b1c1d1e1f1g1 e5a3f2
–EOF (The Ultimate Computing & Technology Blog) —
loading...
Last Post: Interview Question: How to Check Integer is the Power of Two?
Next Post: One Interesting Linux Command (Steam Locomotive in BASH)
Superb!!