# Leetcode Repeated-DNA-Sequences
2022-09-06 19:58
> ##### Data structures:
>
> #DS #hash_table #string
>
> ##### Difficulty:
>
> #coding_problems #difficulty_medium
>
> ##### Additional tags:
>
> #leetcode
>
> ##### Revisions:
>
> N/A
##### Links:
- [Link to problem](https://leetcode.com/problems/repeated-dna-sequences/)
***
### Problem
The **DNA sequence** is composed of a series of nucleotides abbreviated as `'A'`, `'C'`, `'G'`, and `'T'`.
- For example, `"ACGAATTCCG"` is a **DNA sequence**.
When studying **DNA**, it is useful to identify repeated sequences within the DNA.
Given a string `s` that represents a **DNA sequence**, return all the **`10`-letter-long** sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in **any order**.
#### Examples
**Example 1:**
**Input:** s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
**Output:** ["AAAAACCCCC","CCCCCAAAAA"]
**Example 2:**
**Input:** s = "AAAAAAAAAAAAA"
**Output:** ["AAAAAAAAAA"]
#### Constraints
- `1 <= s.length <= 10^5`
- `s[i]` is either `'A'`, `'C'`, `'G'`, or `'T'`.
### Thoughts
> [!summary]
> This is a #hash_table problem.
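The problem asks for every 10-letter substring that occurs more than once, and occurrences may overlap (see Example 2). So a map from substring to count is preferred. Why?
Two reasons:
- It gives an easy way to check whether a substring is a duplicate (a set alone would suffice for this part).
- It records how many times each substring has been seen, so we append a substring to the answer only once: on its second occurrence, when its count goes from 1 to 2.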
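Since a set already handles the duplicate check, the counting map can also be swapped for two sets. A minimal sketch of that variant, which is not the approach used in the Solution section; `repeatedWithSets`, `seen`, and `added` are hypothetical names:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical set-based variant: `seen` holds every 10-letter window met so
// far, `added` holds windows that have already been pushed to the answer.
std::vector<std::string> repeatedWithSets(const std::string& s) {
    std::unordered_set<std::string> seen, added;
    std::vector<std::string> ans;
    for (int i = 0; i + 10 <= (int)s.size(); i++) {
        std::string w = s.substr(i, 10);
        // seen.insert(w).second is false when w was already present, i.e. this
        // is at least its second occurrence; added.insert(w).second is true
        // only the first time we report it.
        if (!seen.insert(w).second && added.insert(w).second) {
            ans.push_back(w);
        }
    }
    return ans;
}
```

The map version does the same bookkeeping with one container: a count of 1 plays the role of "in `seen`", and a count of 2 or more plays the role of "in `added`".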
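One pitfall: in the for loop, the upper bound should be:
```cpp
for (int i = 0, top = (int)s.size() - 9; i < top; i++)
//                                  ^^^
```
Minus 9, because a 10-letter substring starting at index `i` ends at index `i + 9`, so the last valid start is `s.size() - 10`, i.e. `i < s.size() - 9`. Cast to `int` before subtracting: `s.size()` is an unsigned `size_t`, so for a string shorter than 9 characters `s.size() - 9` wraps around to a huge value, and a comparison such as `i < s.size() - 9` would keep iterating past the end of `s`.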
```
1234567890
^        ^
|--------|
i        i+9
(i + 9) - i + 1 = 10
```
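`s.size()` returns an unsigned `size_t`, so the bound `s.size() - 9` behaves differently depending on whether it is computed in unsigned or signed arithmetic. A small sketch contrasting the two; `windowBound` and `unsignedBound` are hypothetical helper names:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of 10-letter windows in s, i.e. the exclusive
// upper bound for the start index i. Casting to int *before* subtracting keeps
// the arithmetic signed, so a short string yields a value <= 0.
int windowBound(const std::string& s) {
    return static_cast<int>(s.size()) - 9;
}

// For contrast: the unsigned expression wraps around when s.size() < 9
// (e.g. 5 - 9 becomes SIZE_MAX - 3).
std::size_t unsignedBound(const std::string& s) {
    return s.size() - 9;
}
```

With the signed bound, `for (int i = 0; i < windowBound(s); i++)` runs zero times for inputs shorter than 10 letters, which is exactly what the problem needs.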
With these edge cases taken care of, we can proceed to the solution:
### Solution
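```cpp
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<string, int> used;  // substring -> times seen so far
        vector<string> ans;
        // Cast before subtracting: s.size() is unsigned, so s.size() - 9
        // would wrap around when s is shorter than 9 characters.
        for (int i = 0, top = (int)s.size() - 9; i < top; i++) {
            string tmp = s.substr(i, 10);
            // Post-increment returns the old count; it equals 1 exactly on
            // the second occurrence, so each repeated substring is added once.
            if (used[tmp]++ == 1) {
                ans.push_back(tmp);
            }
        }
        return ans;
    }
};
```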
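A common refinement, not the approach used above: with only four nucleotides, each letter fits in 2 bits, so a 10-letter window fits in a 20-bit integer that can be updated in O(1) per step instead of building a new `std::string` with `substr` for every window. A minimal sketch under that assumption; the name `findRepeatedDnaBits` and the A/C/G/T to 0/1/2/3 mapping are illustrative choices:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical bit-packed variant. Assumes s contains only 'A', 'C', 'G', 'T'.
std::vector<std::string> findRepeatedDnaBits(const std::string& s) {
    auto code = [](char c) -> int {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;  // 'T'
        }
    };
    const int mask = (1 << 20) - 1;      // keep only the last 10 letters (20 bits)
    std::unordered_map<int, int> count;  // packed window -> times seen so far
    std::vector<std::string> ans;
    int window = 0;
    for (int i = 0; i < (int)s.size(); i++) {
        // Shift in the new letter and drop the one that left the window.
        window = ((window << 2) | code(s[i])) & mask;
        // Once i >= 9, `window` encodes exactly s[i-9..i]; report on the
        // second occurrence, as in the map version.
        if (i >= 9 && count[window]++ == 1) {
            ans.push_back(s.substr(i - 9, 10));
        }
    }
    return ans;
}
```

The trade-off: the integer key avoids allocating a string per window, at the cost of relying on the input alphabet being exactly `A`, `C`, `G`, `T`.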