Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words. It is guaranteed there is at least one word that isn’t banned, and that the answer is unique.
Words in the list of banned words are given in lowercase, and free of punctuation. Words in the paragraph are not case sensitive. The answer is in lowercase.
Example:
Input:
paragraph = “Bob hit a ball, the hit BALL flew far after it was hit.”
banned = [“hit”]
Output: “ball”Explanation:
“hit” occurs 3 times, but it is a banned word.
“ball” occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph.
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as “ball,”),
and that “hit” isn’t the answer even though it occurs more because it is banned.Note:
1 <= paragraph.length <= 1000.
0 <= banned.length <= 100.
1 <= banned[i].length <= 10.
The answer is unique, and written in lowercase (even if its occurrences in paragraph may have uppercase symbols, and even if it is a proper noun.)
paragraph only consists of letters, spaces, or the punctuation symbols !?’,;.
There are no hyphens or hyphenated words.
Words only consist of letters, never apostrophes or other punctuation symbols.
We may use a Hash map which is unordered_map in C++ to store the frequencies for the valid words. This is O(N) space complexity where N is the number of words in the paragraph.
We may actively construct the current word when the current letter falls into a valid word-character i.e. upper or lower case. If it is a word boundary, we have the current word, and need to check if appears in the banned list, we can do list using std::find, but this will take O(N) to complete.
After the O(N^2), we need to go through the frequency hash table to find the word that appears the most-frequently.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | class Solution { public: string mostCommonWord(string paragraph, vector<string>& banned) { unordered_map<string, int> freq; for (int i = 0; i < paragraph.length(); ++ i) { string word = ""; while (i < paragraph.length() && ((paragraph[i] >= 'A' && paragraph[i] <= 'Z') || (paragraph[i] >= 'a' && paragraph[i] >= 'z'))) { word.push_back(std::tolower(paragraph[i])); i ++; } if (word.length() > 0) { if (std::find(banned.begin(), banned.end(), word) == banned.end()) { if (freq.find(word) != freq.end()) { // we don't need this actually freq[word] += 1; / } else { freq[word] = 1; } } } } string target; int cnt = 0; for (const auto &n: freq) { // iterate the items in the hash map to find the maximum value if (n.second > cnt) { cnt = n.second; target = n.first; } } return target; } }; |
class Solution { public: string mostCommonWord(string paragraph, vector<string>& banned) { unordered_map<string, int> freq; for (int i = 0; i < paragraph.length(); ++ i) { string word = ""; while (i < paragraph.length() && ((paragraph[i] >= 'A' && paragraph[i] <= 'Z') || (paragraph[i] >= 'a' && paragraph[i] >= 'z'))) { word.push_back(std::tolower(paragraph[i])); i ++; } if (word.length() > 0) { if (std::find(banned.begin(), banned.end(), word) == banned.end()) { if (freq.find(word) != freq.end()) { // we don't need this actually freq[word] += 1; / } else { freq[word] = 1; } } } } string target; int cnt = 0; for (const auto &n: freq) { // iterate the items in the hash map to find the maximum value if (n.second > cnt) { cnt = n.second; target = n.first; } } return target; } };
We can use modern C++ features to make the above algorithm concise. First, we can convert the vector to the set, which helps us to check if a word is banned using O(1).
1 | unordered_set<string> words(begin(banned), end(banned)); |
unordered_set<string> words(begin(banned), end(banned));
Then we can use isalpha function to check if a character is either uppercase or lowercase. Then we can use std::transform to convert the word into lowercase using the following:
1 2 3 | std::transform(begin(cur), end(cur), begin(cur), [](auto c) { return std::tolower(c); }); |
std::transform(begin(cur), end(cur), begin(cur), [](auto c) { return std::tolower(c); });
Then, we can use max_element to find the most-frequently-occurred item in an unordered_map
1 2 3 | auto ele = max_element(begin(count), end(count), [](auto &a, auto &b) { return a.second < b.second; }); |
auto ele = max_element(begin(count), end(count), [](auto &a, auto &b) { return a.second < b.second; });
The following C++ code does look shorter and better!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | class Solution { public: string mostCommonWord(string paragraph, vector<string>& banned) { int i = 0; int len = paragraph.size(); unordered_set<string> words(begin(banned), end(banned)); unordered_map<string, int> count; while (i < len) { // skip non-word characters while ((i < len) && !isalpha(paragraph[i])) i ++; int j = i; // find next word boundary while ((j < len) && isalpha(paragraph[j])) j ++; string cur = (paragraph.substr(i, j - i)); // convert the word to lowercase std::transform(begin(cur), end(cur), begin(cur), [](auto c) { return std::tolower(c); }); i = j + 1; // appears in the banned list? if (words.count(cur)) continue; count[cur] ++; } // find the max value in the hash table auto ele = max_element(begin(count), end(count), [](auto &a, auto &b) { return a.second < b.second; }); return ele->first; } }; |
class Solution { public: string mostCommonWord(string paragraph, vector<string>& banned) { int i = 0; int len = paragraph.size(); unordered_set<string> words(begin(banned), end(banned)); unordered_map<string, int> count; while (i < len) { // skip non-word characters while ((i < len) && !isalpha(paragraph[i])) i ++; int j = i; // find next word boundary while ((j < len) && isalpha(paragraph[j])) j ++; string cur = (paragraph.substr(i, j - i)); // convert the word to lowercase std::transform(begin(cur), end(cur), begin(cur), [](auto c) { return std::tolower(c); }); i = j + 1; // appears in the banned list? if (words.count(cur)) continue; count[cur] ++; } // find the max value in the hash table auto ele = max_element(begin(count), end(count), [](auto &a, auto &b) { return a.second < b.second; }); return ele->first; } };
–EOF (The Ultimate Computing & Technology Blog) —
loading...
Last Post: How to Construct Minimum Spanning Tree using Kruskal or Breadth First Search Algorithm?
Next Post: The Git Pre-Commit Hook to Avoid Pushing Only Unit Tests In NodeJs
On lines 16 and 25, can someone explain what the this syntax means:
Line 16: [](auto c)
Line 25: [](auto &a, auto &b)
Thanks.
This is the lambda syntax in C++. auto is the keyword for auto type deduction