# Leetcode Repeated-DNA-Sequences

2022-09-06 19:58

> ##### Data structures:
>
> #DS #hash_table #string
>
> ##### Difficulty:
>
> #coding_problem #difficulty-medium
>
> ##### Additional tags:
>
> #leetcode
>
> ##### Revisions:
>
> N/A

##### Links:

- [Link to problem](https://leetcode.com/problems/repeated-dna-sequences/)

---

### Problem

The **DNA sequence** is composed of a series of nucleotides abbreviated as `'A'`, `'C'`, `'G'`, and `'T'`.

- For example, `"ACGAATTCCG"` is a **DNA sequence**.

When studying **DNA**, it is useful to identify repeated sequences within the DNA.

Given a string `s` that represents a **DNA sequence**, return all the **`10`-letter-long** sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in **any order**.

#### Examples

**Example 1:**

**Input:** s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
**Output:** ["AAAAACCCCC","CCCCCAAAAA"]

**Example 2:**

**Input:** s = "AAAAAAAAAAAAA"
**Output:** ["AAAAAAAAAA"]

#### Constraints

### Thoughts

> [!summary]
> This is a #hash_table problem.

The question asks for every 10-letter substring that occurs more
than once, and the occurrences can overlap. So using a map is
preferred. Why?

Two reasons:
- It is an easy way to know if a substring is a duplicate (a set
  or a map can suffice for this).
- It keeps track of how many times each substring has been seen,
  so we only append it to the answer the first time we meet it again.

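To make the set-versus-map trade-off concrete, here is a minimal sketch of a set-only variant, for comparison. The function name `findRepeatedWithSets` is hypothetical and not part of the original note. With sets alone, a second set is needed so that a substring seen three or more times is still reported only once; the map's counter does that job in a single container.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical name; a set-only variant shown for comparison.
std::vector<std::string> findRepeatedWithSets(const std::string& s) {
    std::unordered_set<std::string> seen;      // every window met so far
    std::unordered_set<std::string> repeated;  // windows met at least twice
    const int n = static_cast<int>(s.size());
    for (int i = 0; i + 10 <= n; ++i) {
        std::string window = s.substr(i, 10);
        // insert().second is false when the window was already present.
        if (!seen.insert(window).second) {
            repeated.insert(window);
        }
    }
    // Order is unspecified, which the problem statement allows.
    return std::vector<std::string>(repeated.begin(), repeated.end());
}
```

Note that the result order here depends on the hash set, whereas a counting map combined with a result vector reports repeats in the order they are detected.
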
One pitfall: in the for loop, the upper bound should be:
```cpp
for (int i = 0, top = s.size() - 9; i < top; i++)
                               ^^^
```

Minus 9, because a 10-letter substring starting at `i` ends at `i + 9`, so the last valid start index is `s.size() - 10`.
```
1234567890
^        ^
|--------|
i        i+9

(i + 9) - i + 1 = 10.
```

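The bound can also be written as a count of valid start indices. The sketch below uses a hypothetical helper, `windowCount`, that is not in the original note. For a window of 10 letters it returns `n - 9`, which matches the loop bound above, and it compares before subtracting because `s.size()` is unsigned and `n - 10` would wrap around for strings shorter than 10 letters.

```cpp
#include <cstddef>

// Hypothetical helper: number of valid start indices for a window of
// `len` letters in a string of `n` letters.
std::size_t windowCount(std::size_t n, std::size_t len = 10) {
    // Compare before subtracting: n - len would wrap around for n < len
    // because std::size_t is unsigned.
    return n >= len ? n - len + 1 : 0;
}
```

For Example 2, the input has 13 letters, so there are 4 windows, starting at indices 0 through 3.
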
With these edge cases taken care of, we can proceed to the
solution:

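### Solution

```cpp
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<string, int> used;  // substring -> times seen so far
        vector<string> ans = {};
        // Cast before subtracting so a short input gives a negative bound
        // instead of relying on unsigned wrap-around.
        for (int i = 0, size = static_cast<int>(s.size()) - 9; i < size; i++) {
            string tmp = s.substr(i, 10);

            // Post-increment yields the old count: 1 means this is the
            // second sighting, so each repeat is appended exactly once.
            if ((used[tmp]++) == 1) {
                ans.push_back(tmp);
            }
        }

        return ans;
    }
};
```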
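
As a possible refinement, not taken from the original note, each nucleotide can be packed into 2 bits so that a 10-letter window becomes a 20-bit integer key that is updated in constant time per step. The sketch below swaps the string keys for such a rolling integer key; `encodeBase` and `findRepeatedDnaRolling` are hypothetical names, and the code assumes the input contains only `'A'`, `'C'`, `'G'`, and `'T'`.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps a nucleotide to a 2-bit code.
// Assumes the input contains only 'A', 'C', 'G', 'T'.
inline unsigned encodeBase(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Hypothetical name; same counting idea as the solution above,
// but keyed by a 20-bit integer instead of a 10-letter string.
std::vector<std::string> findRepeatedDnaRolling(const std::string& s) {
    const int kLen = 10;
    const unsigned kMask = (1u << (2 * kLen)) - 1;  // keep the low 20 bits
    std::unordered_map<unsigned, int> seen;         // window key -> times seen
    std::vector<std::string> ans;
    unsigned key = 0;
    const int n = static_cast<int>(s.size());
    for (int i = 0; i < n; ++i) {
        // Shift in the new letter and drop the one that left the window.
        key = ((key << 2) | encodeBase(s[i])) & kMask;
        // A full window ends at i once i >= 9; report on the second sighting.
        if (i >= kLen - 1 && seen[key]++ == 1) {
            ans.push_back(s.substr(i - kLen + 1, kLen));
        }
    }
    return ans;
}
```

The string is only materialized with `substr` when a repeat is actually reported, so the loop avoids building and hashing a 10-letter string at every index.
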