Leetcode Repeated-DNA-Sequences

2022-09-06 19:58

Data structures:

#DS #hash_table #string

Difficulty:

#coding_problems #difficulty_medium

Additional tags:

#leetcode

Revisions:

N/A

Problem

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Examples

Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

Input: s = "AAAAAAAAAAAAA" Output: ["AAAAAAAAAA"]

Constraints

Thoughts

[!summary] This is a #hash_table problem.

The question asks for each repeated substring exactly once, and the substrings can overlap. So, using a map is preferred. (Why?)

Two reasons:

1. It gives an easy way to know whether a substring is a duplicate (a set or a map suffices for this).
2. It keeps a count of how many times each substring has been seen, so we append a substring to the answer only the first time it repeats.
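To make the second reason concrete, here is a minimal sketch of what a set-only approach might look like. The function name `repeatedWithSets` is hypothetical and is not part of the solution in this note. One set can tell us that a substring was seen before, but not whether it was already reported, so a second set is needed to keep duplicates out of the answer. A counting map folds both jobs into one structure.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical set-only variant, shown only for comparison with the map.
std::vector<std::string> repeatedWithSets(const std::string& s) {
  std::unordered_set<std::string> seen;      // every window met so far
  std::unordered_set<std::string> reported;  // windows already in the answer
  std::vector<std::string> ans;
  // i + 10 <= s.size() avoids unsigned subtraction on short strings.
  for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
    std::string window = s.substr(i, 10);
    // insert().second is false when the element was already present.
    if (!seen.insert(window).second && reported.insert(window).second) {
      ans.push_back(window);
    }
  }
  return ans;
}
```

With a map, the count itself records "already reported": any count above 1 means the substring is in the answer.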

One pitfall: in the for loop, the upper bound should be:
```
for (int i = 0, top = s.size() - 9; i < top; i++)
                              ^^^
```
Subtract 9 because a 10-letter substring starting at index i ends at index i + 9, so the last valid start is s.size() - 10 and the loop condition is i < s.size() - 9.
```
1234567890
^        ^
|--------|
i       i+9

i + 9 - i + 1 = 10.
```
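The same counting can be written as a small helper. This is a sketch under assumptions: `windowCount` is a hypothetical function, not part of the original solution. It shows that a string of length n has n - 9 windows of length 10, and none when n < 10. It converts the size to a signed value first, because `s.size()` is unsigned and `s.size() - 9` wraps around for strings shorter than 9 characters.

```cpp
#include <string>

// Hypothetical helper: number of length-k windows in s (k = 10 here).
int windowCount(const std::string& s, int k = 10) {
  int n = static_cast<int>(s.size());  // convert before subtracting
  return n >= k ? n - k + 1 : 0;       // n - k + 1 == n - 9 when k == 10
}
```

For the 10-character string in the diagram above this gives exactly one window, starting at i = 0.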
With this edge case handled, we can proceed to the solution:

Solution

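```cpp
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
  vector<string> findRepeatedDnaSequences(string s) {
    unordered_map<string, int> used;  // window -> times seen so far
    vector<string> ans = {};
    // Convert size() to int before subtracting: size() is unsigned, so
    // s.size() - 9 would wrap around for strings shorter than 9 characters.
    for (int i = 0, size = static_cast<int>(s.size()) - 9; i < size; i++) {
      string tmp = s.substr(i, 10);

      // Post-increment yields the old count: 1 means this is the second
      // sighting, so the window is appended exactly once.
      if ((used[tmp]++) == 1) {
        ans.push_back(tmp);
      }
    }

    return ans;
  }
};
```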
