All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].Solution 1: Use hash map, but store the Integer value instead of String. 2bit can express on DNA bit, 10 DNA bits need 20 binary bits.
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
char[] c=s.toCharArray();
List<String> res=new ArrayList<>();
int n=c.length, div=1<<20;
if (n<10) return res;
int num=0;
for (int i=0; i<10; i++) num=num*4+value(c[i]);
Map<Integer,Integer> map=new HashMap<>();
map.put(num,1);
for (int i=10; i<n; i++) {
num=(num*4+value(c[i]))%div;
if (!map.containsKey(num)) map.put(num,1);
else if (map.get(num)==1) {
map.put(num,2);
res.add(new String(c,i-10+1,10));
}
}
return res;
}
private int value(char c) {
if (c=='A') return 0;
if (c=='C') return 1;
if (c=='G') return 2;
return 3;
}
}
没有评论:
发表评论