Break a Sentence into Words (Word Break Algorithms) – DFS, BFS and DP


Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words. For example, given s = “helloworld”, dict = [“world”, “hello”]. Return true because “helloworld” can be segmented as “hello world”.

Word Break by Breadth First Search Algorithm

If the problem only allows to break the words into two, it would be much easier with just O(n) loop. However, it says it can be segmented into one or more.

Breadth First Search (BFS) constructs the search tree by trying different possibilities from the current start point, jump as far as possible as long as the next segment is in the dictionary. If it reaches the end, then we have found a route.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        queue<int> q;
        q.push(0);
        while (!q.empty()) {
            int x = q.front();
            q.pop();
            for (int i = s.length() - 1; i > x; --i) {
                // get the word between x and i
                string word = s.substr(x, i - x + 1); 
                if (wordDict.count(word)) {
                    if (i == s.length() - 1) {
                        return true; // reach the end
                    }
                    q.push(i);
                }
            }
        }
        return false; 
    }
};
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        queue<int> q;
        q.push(0);
        while (!q.empty()) {
            int x = q.front();
            q.pop();
            for (int i = s.length() - 1; i > x; --i) {
                // get the word between x and i
                string word = s.substr(x, i - x + 1); 
                if (wordDict.count(word)) {
                    if (i == s.length() - 1) {
                        return true; // reach the end
                    }
                    q.push(i);
                }
            }
        }
        return false; 
    }
};

It will be a lot slower if you try from the shortest match. BFS does not memorize the intermediate solutions and therefore, both BFS will give a time out exceed limit on the following input:

1
2
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"]
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"]

Word Break by Depth First Search Algorithm

Depth First Search (DFS), unlike BFS, it will terminate once found a route, therefore the best case would be very quick, abandoning the remaining search tree.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class Solution {
public:
    bool wordBreak(string s, int x, unordered_set<string>& wordDict) {
        for (int i = s.length() - 1; i >= x; --i) {
            if (wordDict.count(s.substr(x, i - x + 1))) {
                if (i == s.length() - 1) {
                    return true;  // reach the end of word. 
                }
                if (wordBreak(s, i + 1, wordDict)) { // jump to next match
                    return true;
                }
            }
        }
        return false;
    }
 
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        return wordBreak(s, 0, wordDict);
    }
};
class Solution {
public:
    bool wordBreak(string s, int x, unordered_set<string>& wordDict) {
        for (int i = s.length() - 1; i >= x; --i) {
            if (wordDict.count(s.substr(x, i - x + 1))) {
                if (i == s.length() - 1) {
                    return true;  // reach the end of word. 
                }
                if (wordBreak(s, i + 1, wordDict)) { // jump to next match
                    return true;
                }
            }
        }
        return false;
    }

    bool wordBreak(string s, unordered_set<string>& wordDict) {
        return wordBreak(s, 0, wordDict);
    }
};

However, the same problem remains, many many intermediate results are computed over and over again.

Word Break by Recursive Depth First Search Algorithm with Memoization

The above two Bruteforce algorithms (DFS or BFS) are slow because the intermediate results are calculated over and over again. There are N+1 split points for a string size of N, so in the worst cases, the time complexity is O(2^N).

If we memorize the intermediate results (based on above solution) and this will give a Accepted result:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Solution {
public:
    unordered_map<int, bool> cache;
 
    bool wordBreak(string s, int x, unordered_set<string>& wordDict) {
        if (cache.count(x)) { // has been calculated before
            return cache[x];
        }
        for (int i = s.length() - 1; i >= x; --i) {
            if (wordDict.count(s.substr(x, i - x + 1))) {
                if (i == s.length() - 1) {
                    cache[x] = true;  // store the intermediate result
                    return true;  // reach the end of word. 
                }
                if (wordBreak(s, i + 1, wordDict)) {
                    cache[x] = true; // store the intermediate result
                    return true;
                }
            }
        }
        cache[x] = false; // store the intermediate result
        return false;
    }
 
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        return wordBreak(s, 0, wordDict);
    }
};
class Solution {
public:
    unordered_map<int, bool> cache;

    bool wordBreak(string s, int x, unordered_set<string>& wordDict) {
        if (cache.count(x)) { // has been calculated before
            return cache[x];
        }
        for (int i = s.length() - 1; i >= x; --i) {
            if (wordDict.count(s.substr(x, i - x + 1))) {
                if (i == s.length() - 1) {
                    cache[x] = true;  // store the intermediate result
                    return true;  // reach the end of word. 
                }
                if (wordBreak(s, i + 1, wordDict)) {
                    cache[x] = true; // store the intermediate result
                    return true;
                }
            }
        }
        cache[x] = false; // store the intermediate result
        return false;
    }

    bool wordBreak(string s, unordered_set<string>& wordDict) {
        return wordBreak(s, 0, wordDict);
    }
};

The time complexity is O(N^3) and the space complexity is O(N) as the recursion depth can go up to N.

This is sometimes knowns as Top Down Dynamic Programming Algorithm.

Word Break Algorithm by Dynamic Programming (Bottom Up)

From DFS + Memorization, we notice that many many intermediate results can be cached for later retrieval, so this comes the idea of DP. The recurrence formula of DP is:

1
2
f[i] =  f[i] || (f[j] && wordDict.count(s.substr(j, i - j))); // for 0 =< j < i
f[0] = true;
f[i] =  f[i] || (f[j] && wordDict.count(s.substr(j, i - j))); // for 0 =< j < i
f[0] = true;

If f[i] denotes that the result for the first i-th letter (as a word), so if it is connected, we can try the remaining substring from index i+1 to j to see if it is in dictionary. If yes, then f[j] is true. The final result will be in f[n]. We need a O(N^2) loop.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int n = s.length();
        vector<bool> f(n + 1, false);
        f[0] = true;
        for (int i = 1; i <= n; i ++) { // break word at i-th letter
            for (int j = 0; j < i; j ++) {
                f[i] =  f[j] && wordDict.count(s.substr(j, i - j));
                if (f[i]) break;
            }
        }
        return f[n];
    }
};
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int n = s.length();
        vector<bool> f(n + 1, false);
        f[0] = true;
        for (int i = 1; i <= n; i ++) { // break word at i-th letter
            for (int j = 0; j < i; j ++) {
                f[i] =  f[j] && wordDict.count(s.substr(j, i - j));
                if (f[i]) break;
            }
        }
        return f[n];
    }
};

The DP is the most efficient method as it solves the problem directly to the point. You don’t need to break the loop, but you should use the following instead:

1
f[i] = f[i] && f[j] && wordDict.count(s.substr(j, i - j));
f[i] = f[i] && f[j] && wordDict.count(s.substr(j, i - j));

However, exiting current loop will be faster because once found, we can set intermediate result f[i] to true.

Word Break Algorithms

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
a WordPress rating system
1138 words
Last Post: How to Reverse Vowels of a String in C/C++?
Next Post: How to Remove Nth Node From End of List in C/C++?

The Permanent URL is: Break a Sentence into Words (Word Break Algorithms) – DFS, BFS and DP

Leave a Reply