Top K Frequent Words

 

1. Clarify the problem: Before diving into solving the problem, let's clarify the requirements:

  • We are given a list of strings.
  • Our task is to find the k most frequent elements in the list.
  • If there are multiple elements with the same frequency, they should be ordered lexicographically.

2. Analyze the problem: To solve this problem, we can use a combination of hash maps and sorting.

  • We can use a hash map to count the frequency of each word in the list.
  • After counting the frequencies, we can sort the words based on their frequency and lexicographic order.
  • Finally, we can return the top k elements from the sorted list.

3. Design an algorithm: Here is the algorithm to solve the problem:

  1. Create a hash map counter to store the frequency of each word in the list.
  2. Iterate through the list and update the frequency in the counter hash map.
  3. Sort the unique words in lexicographic order.
    • If two words have the same frequency, sort them lexicographically.
    • We can use a custom sorting function that compares the frequency first and then the words lexicographically.
  4. Return the top k elements from the sorted list.

4. Explain your approach: The approach involves using a hash map to count the frequency of each word in the list. We then sort the unique words based on their frequency and lexicographic order. To achieve this, we use a custom sorting function that compares the frequency first and then the words lexicographically. Finally, we return the top k elements from the sorted list.

5. Write clean and readable code:

python
def topKFrequent(words, k): counter = {} for word in words: counter[word] = counter.get(word, 0) + 1 unique_words = sorted(counter.keys(), key=lambda x: (-counter[x], x)) return unique_words[:k]

6. Test your code: Let's test the code with different test cases:

  1. words = ["i", "love", "leetcode", "i", "love", "coding"], k = 2
    • Expected output: ["i", "love"]
    • Explanation: The word "i" appears twice, and the word "love" appears twice. Since "i" comes before "love" lexicographically, "i" is the first word in the output.
  2. words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
    • Expected output: ["the", "is", "sunny", "day"]
    • Explanation: The word "the" appears four times, "is" appears three times, "sunny" appears two times, and "day" appears once. Sorting these words lexicographically, we get the output as shown.
  3. words = ["aaa", "aa", "a"], k = 3
    • Expected output: ["a", "aa", "aaa"]
    • Explanation: All three words appear once, so we sort them lexicographically.

7. Optimize if necessary: The initial solution already follows an optimal approach using a hash map and sorting. There are no further optimizations required.

8. Handle error cases: The code assumes that the input words is a valid list of strings and k is a positive integer. If words is empty or k is zero, the code will still work correctly and return an empty list.

9. Discuss complexity analysis:

  • The time complexity of this solution is O(n log n), where n is the length of the input list. This is because we iterate through the list to count the frequency of each word and sort the unique words.
  • The space complexity is O(n), where n is the length of the input list. We store the frequency of each word in a hash map, which can take up to n elements. Additionally, the sorted list of unique words can also have up to n elements.
  • The solution has an optimal time and space complexity given the problem requirements.
Next Post Previous Post