Teaching Kids Programming – Repeated K-Length Substrings (Sliding Window)


Teaching Kids Programming: Videos on Data Structures and Algorithms

Given a string s and an integer k, return the number of k-length substrings that occur more than once in s.

Constraints
n ≤ 100,000 where n is the length of s.
k ≤ 10
Example 1
Input
s = “abcdabc”
k = 3
Output
1
Explanation
“abc” occurs twice in the string

Example 2
Input
s = “aaabbb”
k = 2
Output
2
Explanation
Both “aa” and “bb” occurs twice.

Hints:
Use Hash Table to keep track of k-length substrings
s=”aaaa” and k = 3 expects 1.
Sliding window and frequency?

Repeated K-Length Substrings (Sliding Window Algorithm)

Let’s build the substring with sliding window – if it is more than K characters, we just pop out the leftmost one. And we keep the frequency in a hash table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class Solution:
    def repeatedKLengthSubString(self, s, k):
        data = defaultdict(int)
        cur = deque()
        for i in s:
            cur.append(i)
            if len(cur) > k:
                cur.popleft()
            if len(cur) == k:
                data["".join(cur)] += 1
        ans = 0
        for i in data:
            if data[i] > 1:
                ans += 1
        return ans
class Solution:
    def repeatedKLengthSubString(self, s, k):
        data = defaultdict(int)
        cur = deque()
        for i in s:
            cur.append(i)
            if len(cur) > k:
                cur.popleft()
            if len(cur) == k:
                data["".join(cur)] += 1
        ans = 0
        for i in data:
            if data[i] > 1:
                ans += 1
        return ans

We use a double-ended queue aka deque as a list so that when we pop the leftmost element, the complexity is O(1) constant. We can use list’s pop(0) which is linear. When the current sublist is equal to K then we update the frqeuency in the hash table.

Time complexity is O(N), and the space complexity O(N-K+K) hence O(N) i.e. hash table stores at most N-K sublists, O(K) the cur list stores at most K characters.

We can use the following List Comprehension to return the count of the values that is at least one:

1
return len([i for i, j in data.items() if j > 1])
return len([i for i, j in data.items() if j > 1])

We can also filter the values of the hash table:

1
return len(list(filter(lambda x: x > 1, data.values())))
return len(list(filter(lambda x: x > 1, data.values())))

A second implementation is to iterate over the index and build the K-length substring directly:

1
2
3
4
5
6
class Solution:
    def repeatedKLengthSubString(self, s, k):
        data = defaultdict(int)        
        for i in range(len(s) - k + 1):
            data[s[i:i+k]] += 1
        return len(list(filter(lambda x: x > 1, data.values())))
class Solution:
    def repeatedKLengthSubString(self, s, k):
        data = defaultdict(int)        
        for i in range(len(s) - k + 1):
            data[s[i:i+k]] += 1
        return len(list(filter(lambda x: x > 1, data.values())))

Time complexity is O(N-K), the space complexity is O(N-K).

See also: Count the Repeated K Length SubStrings

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
556 words
Last Post: Teaching Kids Programming - Python Function to Find the Mode in an Array (Most Frequent Number)
Next Post: Teaching Kids Programming - Number of Sublists with Max in Interval

The Permanent URL is: Teaching Kids Programming – Repeated K-Length Substrings (Sliding Window)

Leave a Reply