# Leetcode Repeated-DNA-Sequences

2022-09-06 19:58

> ##### Data structures:
>
> #DS #hash_table #string
>
> ##### Difficulty:
>
> #coding_problem #difficulty_medium
>
> ##### Additional tags:
>
> #leetcode
>
> ##### Revisions:
>
> N/A

##### Links:
- [Link to problem](https://leetcode.com/problems/repeated-dna-sequences/)

---

### Problem

The **DNA sequence** is composed of a series of nucleotides abbreviated as `'A'`, `'C'`, `'G'`, and `'T'`.

- For example, `"ACGAATTCCG"` is a **DNA sequence**.

When studying **DNA**, it is useful to identify repeated sequences within the DNA.

Given a string `s` that represents a **DNA sequence**, return all the **`10`-letter-long** sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in **any order**.

#### Examples

**Example 1:**

**Input:** s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
**Output:** ["AAAAACCCCC","CCCCCAAAAA"]

**Example 2:**

**Input:** s = "AAAAAAAAAAAAA"
**Output:** ["AAAAAAAAAA"]

#### Constraints

### Thoughts

> [!summary]
> This is a #hash_table problem.

The question asks for every repeated 10-letter substring, and the substrings can overlap. So using a map is preferred. Why?

Two reasons:
- It gives an easy way to tell whether a substring is a duplicate (a set or a map would suffice for this).
- It keeps a count of how many times each substring has been seen, so we append a substring to the answer only once: the first time it shows up as a duplicate.

One pitfall: in the for loop, the upper bound should be:

```cpp
for (int i = 0, top = s.size() - 9; i < top; i++)
                               ^^^
```

Minus 9, because a 10-letter substring starting at `i` extends 9 positions further, to `i + 9`. The last valid start is therefore `s.size() - 10`, which is exactly what `i < s.size() - 9` allows.

```
1234567890
^        ^
|--------|
i        i+9
i + 9 - i + 1 = 10.
```

Note that `s.size()` is unsigned, so for a string shorter than 9 characters the subtraction wraps around before it is stored in the `int`. Casting the size to `int` first keeps the bound negative, and the loop body never runs.

With these edge cases taken care of, we can proceed to the solution:

### Solution

```cpp
class Solution {
public:
  vector<string> findRepeatedDnaSequences(string s) {
    // Maps each 10-letter substring to the number of times it has been seen.
    unordered_map<string, int> used;
    vector<string> ans = {};
    // Cast before subtracting so short inputs give a negative bound
    // instead of an unsigned wrap-around.
    for (int i = 0, size = static_cast<int>(s.size()) - 9; i < size; i++) {
      string tmp = s.substr(i, 10);
      // Post-increment yields the old count: 1 means this is the second sighting.
      if ((used[tmp]++) == 1) {
        ans.push_back(tmp);
      }
    }
    return ans;
  }
};
```
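
To sanity-check the loop bound and the "count equals 1" trick outside of the LeetCode template, here is a minimal sketch of the same counting idea as a free function. The name `repeatedDna` is hypothetical and chosen only for illustration, and the bound is written as `i + 10 <= n`, which is equivalent to `i < n - 9` but avoids the subtraction altogether.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical free-function version of the approach in this note,
// written for illustration only.
std::vector<std::string> repeatedDna(const std::string& s) {
  std::vector<std::string> ans;
  std::unordered_map<std::string, int> seen;
  const int n = static_cast<int>(s.size());
  // A window starting at i covers indices i .. i + 9, so it fits
  // only while i + 10 <= n. Inputs shorter than 10 never enter the loop.
  for (int i = 0; i + 10 <= n; i++) {
    std::string window = s.substr(i, 10);
    // Post-increment returns the previous count; 1 means second sighting,
    // so each repeated window is reported exactly once.
    if (seen[window]++ == 1) {
      ans.push_back(window);
    }
  }
  return ans;
}
```

Called on `"AAAAAAAAAAAAA"`, this returns a single entry, `"AAAAAAAAAA"`, even though that window occurs four times. Any input shorter than 10 characters returns an empty vector.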