2015年10月24日星期六

Leetcode 187 Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Solution 1: Use hash map, but store the Integer value instead of String. 2bit can express on DNA bit, 10 DNA bits need 20 binary bits.
 public class Solution {  
   public List<String> findRepeatedDnaSequences(String s) {  
     char[] c=s.toCharArray();  
     List<String> res=new ArrayList<>();  
     int n=c.length, div=1<<20;  
     if (n<10) return res;  
     int num=0;  
     for (int i=0; i<10; i++) num=num*4+value(c[i]);  
     Map<Integer,Integer> map=new HashMap<>();  
     map.put(num,1);  
     for (int i=10; i<n; i++) {  
       num=(num*4+value(c[i]))%div;  
       if (!map.containsKey(num)) map.put(num,1);  
       else if (map.get(num)==1) {  
         map.put(num,2);  
         res.add(new String(c,i-10+1,10));  
       }  
     }  
     return res;  
   }  
   private int value(char c) {  
     if (c=='A') return 0;  
     if (c=='C') return 1;  
     if (c=='G') return 2;  
     return 3;  
   }  
 }  

没有评论:

发表评论