Count the Repeated K-Length Substrings


Given a string s and an integer k, return the number of k-length substrings that occur more than once in s.

Constraints
n ≤ 100,000 where n is the length of s.
k ≤ 10
Example 1
Input
s = “abcdabc”
k = 3
Output
1
Explanation
“abc” occurs twice in the string

Example 2
Input
s = “aaabbb”
k = 2
Output
2
Explanation
Both “aa” and “bb” occurs twice.

Algorithm to Count K-repeated SubString

We can count the number of K-substring in a hash table. Then, we can iterate the hash map to count those who have occured more than once.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int countKRepeatedSubString(string s, int k) {
    unordered_map<string, int> data;
    string cur = "";
    for (int i = 0; i < s.size(); ++ i) {
        cur += s[i];
        if (cur.size() > k) {
            cur = cur.substr(1);
        }
        data[cur] ++;
    }
    int ans = 0;
    for (auto &[a, b]: data) {
        if (b > 1) {
            ans ++;
        }
    }
    return ans;
}
int countKRepeatedSubString(string s, int k) {
    unordered_map<string, int> data;
    string cur = "";
    for (int i = 0; i < s.size(); ++ i) {
        cur += s[i];
        if (cur.size() > k) {
            cur = cur.substr(1);
        }
        data[cur] ++;
    }
    int ans = 0;
    for (auto &[a, b]: data) {
        if (b > 1) {
            ans ++;
        }
    }
    return ans;
}

Using the Standard C++ count_if we can avoid a for-loop with if statement.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
int countKRepeatedSubString(string s, int k) {
    unordered_map<string, int> data;
    string cur = "";
    for (int i = 0; i < s.size(); ++ i) {
        cur += s[i];
        if (cur.size() > k) {
            cur = cur.substr(1);
        }
        data[cur] ++;
    }
    return count_if(begin(data), end(data), [](auto &x) { 
        return x.second > 1;
    });
}
int countKRepeatedSubString(string s, int k) {
    unordered_map<string, int> data;
    string cur = "";
    for (int i = 0; i < s.size(); ++ i) {
        cur += s[i];
        if (cur.size() > k) {
            cur = cur.substr(1);
        }
        data[cur] ++;
    }
    return count_if(begin(data), end(data), [](auto &x) { 
        return x.second > 1;
    });
}

Time complexity is O(N) – assuming the C++ substr is O(1) constant. And the space complexity is O(N) as we are using a hash map as additional space.

See also: Teaching Kids Programming – Repeated K-Length Substrings (Sliding Window)

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
374 words
Last Post: Teaching Kids Programming - Compute the Number of Set Bits in an Integer
Next Post: Teaching Kids Programming - Python Function to Check If Valid IPv4 Address

The Permanent URL is: Count the Repeated K-Length Substrings

Leave a Reply