TCS / Studies / T-79.4201 / Home Assignments
Helsinki University of Technology, 
     Laboratory for Theoretical Computer Science

Assignment round two

Problem

A collection of DNA strings, each of length 20 are combined to form one global cycle which contains each string. A pair of strings can have overlap of 1 to 5 base-pairs. Your task is to find an overlapped concatenation that makes the length of the global cycle no longer than the given limit.

Assignment

Create a Java program that encodes the decision problem instance as a boolean constraint satisfaction problem which is then solved using the boolean circuits solver bczchaff taking as input boolean circuit satisfiability problems given in the described here.

Test your solution with two sets of instances (optimal length is in the file name): set 1 set 2. For each set, test how many can be solved in 10 minutes with bczchaff and what is the average and median time. Note: The instances of the second set had been initially assigned the wrong filenames, which means that the information of the minimal DNA cycle length was wrong. This problem should be solved now.

Input

The program will be run with command

java DNA length <graph>
where <graph> is the file name of a graph with each node representing a DNA string and length is the maximum allowed length of the cycle. An edge between two nodes means that the corresponding strings can be concatenated. Each edge has an integer attribute "overlap" that represents the amount of overlap the corresponding strings have. Name your class containing the main()-method DNA.java.

Output

Your program should output a boolean constraint satisfaction problem for the given problem instance (graph and length parameter) that can be used as input to bczchaff. In order to validate the result, your problem formulation should contain boolean variables ei, which set to true indicate that the ith edge in the graph input file is included in the global cycle and false otherwise.

Example

Consider the input file toyexample.g.

There are four global cycles: 1-2-3-4, 1-2-4-3, 1-3-2-4 and 1-3-4-2. The cycle with the largest overlap is the cycle 1-3-2-4 with 15 overlapping base-pairs. The length of the cycle is therefore 65. The program should be executed by

java DNA 65 toyexample.g | bczchaff | java sat.FromBC
and the truth assignment for the variables should be
e1 = 0;
e2 = 1;
e3 = 1;
e4 = 1;
e5 = 1;
e6 = 0;

Deliverable

Return the commented source code and a small report describing your approach and results to schumach(at)tcs.hut.fi.

Grading

Correct solution on first try with sufficient report earns 3 points, each failed attempt decreasing points by 1. Submitted solutions are compared with each other and a reference solution. Good solutions are granted 1-2 extra points.

Example Instances

Instance with minimum length of 160
Instance with minimum length of 194
Instance with minimum length of 175


[TCS main] [Current] [Contact Info] [Personnel] [Research] [Publications] [Software] [Studies] [Links]