THE PROGRAM

Several versions of the same program, iso, were used at different stages of our calculations.  The program runs in three modes: plan, calculate, and finalize.  The planning mode is optional; its purpose is to estimate how long it will take to run all the calculations that determine the nontrivial isospectrality classes for binary sequences of a given length L, so that a job that would take a long time can be cut into smaller parts.  The calculation mode runs those smaller parts (or everything in one part, if that is manageable).  The finalization mode assembles the data from the various calculation runs into one final report that has all the information for the sequences of the given length L.

The way in which a big job is cut into smaller parts is related to how we encode sequences as numbers within the program.  Every sequence of length L has a unique code, which is an unsigned integer from 0 to 2^L-1.  To get the code of a sequence (s_0,...,s_{L-1}) in +1/-1 form, change all +1's to 0's and all -1's to 1's to get a sequence (t_0,...,t_{L-1}) of 0's and 1's; the code for that sequence is then t_0+t_1*2+t_2*2^2+...+t_{L-1}*2^{L-1}.

To break a big run into many smaller jobs (to be done in the calculation phase), we pick some modulus M and then run M jobs in the calculation mode, where each job only considers sequences whose codes have a particular remainder modulo M (the remainders range over 0,1,...,M-1).  The finalizer then merges the M reports from the calculation phase into one final report.

We now describe more specifically what happens in each of the three types of runs (plan, calculate, finalize).
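
Before turning to the individual modes, the encoding and the splitting of codes by remainder described above can be illustrated with a short Python sketch.  This is not taken from the iso source; the function names encode and codes_for_job are hypothetical and are used only for illustration.

```python
def encode(seq):
    """Return the code of a sequence (s_0, ..., s_{L-1}) with terms +1 or -1."""
    code = 0
    for i, s in enumerate(seq):
        if s not in (1, -1):
            raise ValueError("terms must be +1 or -1")
        if s == -1:
            # -1 becomes t_i = 1 and contributes t_i * 2^i; +1 becomes t_i = 0
            code |= 1 << i
    return code


def codes_for_job(length, modulus, remainder):
    """Return the codes in 0..2^length-1 that reduce to remainder modulo modulus.

    Hypothetical helper: assumes 0 <= remainder < modulus.
    """
    return range(remainder, 1 << length, modulus)
```

For example, encode((1, -1, -1)) gives 0 + 1*2 + 1*4 = 6, and with length 3 and modulus 3 the job for remainder 1 would cover the codes 1, 4, and 7.  Taken over all remainders 0,1,...,M-1, the ranges cover every code exactly once.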

Planning runs:

When one runs the program in planning mode, one gets four output files, with extensions .txt, .dat, .job, and .sum.  The .txt file gives an overall account of what was computed during the planning run, an estimate of the expected total duration of the calculation phase for the given sequence length, and a recommendation of how many jobs (M) the calculation phase should be cut up into.  The .dat file is the binary data produced during the planning run (similar to what would be produced in a calculation run; see below).  This file exists because writing output is part of the job that the planning run is trying to simulate, but these .dat files are not used for any actual calculation.  The .job file is a batch file that one can run to do all M runs of the calculation phase, T runs at a time, where T is the number of threads that the user specified when running the planning phase.  The .sum file is a batch file containing a single command that performs the finalization run to merge the results of the calculation runs done with the .job file.

Calculation runs:

When one runs the program in calculation mode, one gets three output files: one with a .txt extension, one with a .dat extension, and one with a .tim extension.  The .txt file is an account of how long the run took and how many candidate sequences were found.  A candidate sequence is a binary sequence that is nontrivially isospectral to another sequence with integer (but perhaps not binary) terms.  The .dat file stores the codes of these candidates (written as 64-bit words that are unsigned integers in little endian order).  The .tim file contains a single 64-bit word, an unsigned integer that is the duration (in microseconds) of the calculation run.  (This is used by the finalizer to determine the sum total of time spent by all the calculation runs.)
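
As an illustration of the binary formats just described, the following Python sketch reads a file of 64-bit little endian unsigned words and adds up the durations stored in a set of .tim files.  It assumes only what is stated above (a .dat file is a plain run of such words with no header, and a .tim file holds exactly one word); the function names are hypothetical and the iso program itself may do this differently.

```python
import struct


def read_words(path):
    """Read a file as a list of 64-bit unsigned integers in little endian order."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % 8 != 0:
        raise ValueError("file size is not a multiple of 8 bytes")
    count = len(data) // 8
    return list(struct.unpack(f"<{count}Q", data))


def total_duration_us(tim_paths):
    """Sum the durations (in microseconds) stored in the given .tim files."""
    total = 0
    for path in tim_paths:
        words = read_words(path)
        if len(words) != 1:
            raise ValueError(f"{path} should contain exactly one word")
        total += words[0]
    return total
```

Calling read_words on a calculation-mode .dat file would then give the list of candidate codes found in that run.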

Finalization runs:

When one runs the program in finalization mode, one must have all the .dat and all the .tim files produced by the calculation runs present in the directory where the program is invoked.  The finalization run produces two output files, one with a .txt extension and one with a .dat extension.  These are the data files that we present here, and they are described in detail below under "THE DATA FILES".  Essentially, the .txt file contains a full, human-readable account of all the findings, and the .dat file is binary data containing the codes of sequences that are representatives of the trivial equicorrelationality classes that make up each nontrivial equicorrelationality class for the binary sequences of the specified length of the run.  But see "THE DATA FILES" below for the exact format.

The specific usage for iso is

./iso [p/c/f] length_of_sequences {planning_time: for p mode only OR modulus_for_codes: for c,f modes only} {target_time: for p mode only OR remainder_for_codes: for c mode only} {speed: for p mode only} {number_of_threads: for p mode only}

The first argument is a 'p' (for planning mode), 'c' (for calculation mode), or 'f' (for finalization mode).

The second argument is the length of the sequences (a positive integer).

The remaining arguments vary depending on the mode.

For planning mode:

The third argument is the number of hours the program is to be run to estimate how much time it will take to do the entire analysis for the given sequence length.

The fourth argument is the target duration (in hours) for each calculation run.  The planner will plan to cut the work up into enough runs so that each run takes at most this duration (according to its estimates).

The fifth argument is speed, which is the user's guess as to the ratio of the speed of the computers on which the calculation phase will be performed to the speed of the computer on which this planning run is being performed.

The sixth argument is the number T of threads on the machine where the calculation phase will be performed.  This is to enable the planning run to make a convenient batch file for job submission that submits T jobs at a time.

For calculation and finalization mode:

The third argument is the number M of jobs that make up the calculation phase.  Basically, this means that each job in the calculation phase only considers codes of sequences that are in a certain residue class modulo M.

For calculation mode only:

The fourth argument R is the remainder for the particular calculation run being performed.  When the third argument is M, this means that this calculation run only considers sequences whose codes reduce to R modulo M.
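
To make the argument order in the usage line above concrete, here is a small Python sketch that builds the argument lists for the M calculation runs and for the finalization run of a given length.  The helper names and the example values (length 20, modulus 4) are hypothetical; the actual .job and .sum files written by the planner may be laid out differently.

```python
def calculation_commands(length, modulus, exe="./iso"):
    """One argument list per remainder 0..modulus-1: iso c length modulus remainder."""
    return [[exe, "c", str(length), str(modulus), str(r)] for r in range(modulus)]


def finalization_command(length, modulus, exe="./iso"):
    """Argument list for the finalization run: iso f length modulus."""
    return [exe, "f", str(length), str(modulus)]


if __name__ == "__main__":
    # Hypothetical example values: sequences of length 20 split into 4 jobs.
    for cmd in calculation_commands(20, 4):
        print(" ".join(cmd))
    print(" ".join(finalization_command(20, 4)))
```

Each list could be passed to subprocess.run to launch the corresponding run; the finalization command should only be run after all M calculation runs have finished and their .dat and .tim files are in the current directory.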

============================================================

THE VERSIONS OF THE PROGRAM

Four versions were used to produce the data that is presented here.

Version A was used for the calculation and finalization phases that produced the data sets for sequences of lengths 1-34.

Version B was used for the calculation phases for lengths 35-40.  These were run on the Open Science Grid using the container that is also downloadable here.

Version C was used for the calculation phases for lengths 41-44.  These were run on the Open Science Grid using the container that is also downloadable here.

Version D was used for the finalization phases for lengths 35-44.

These versions differ in minor details related to the platform that they were run on, but do not differ in usage.

============================================================

THE DATA FILES

Before we describe the contents of the data files, we note that some of the terminology in these data files uses words slightly different from those in the paper.  "Isospectral" in the file is the same as "equicorrelational" in the paper.  The "size" of a nontrivial isospectrality class is not its cardinality, but its "volume" as defined in the paper.  An "amphidromic" sequence is what the paper calls a "generalized palindrome", so a "nonamphidromic" sequence is one that is not a generalized palindrome.

We present the finalized data for all binary sequences of lengths 1 to 44.  Each finalization run produces two files: iso_f_L_M.txt and iso_f_L_M.dat, where L is the length of the sequences for that file and M is the number of different calculation runs that were merged to make that finalized data.  The iso_f_L_M.txt file is human-readable, and indicates on its third line exactly what was typed on the command line to invoke iso to do the finalization that produced iso_f_L_M.txt and iso_f_L_M.dat.  The start time, end time, and duration (listed in the penultimate line) pertain to the running of the finalization that produced these iso_f_L_M.txt and iso_f_L_M.dat files.  The sum of the durations of all the calculation runs that went into this finalized data can be found under "total duration of calculation phase".  After this, there are three sections of data: one for palindromic nontrivial isospectrality classes, one for antipalindromic nontrivial isospectrality classes, and one for nonamphidromic nontrivial isospectrality classes.  A palindromic nontrivial isospectrality class is a nontrivial equicorrelationality class that contains a sequence that is a palindrome (1-palindrome).  An antipalindromic nontrivial isospectrality class is a nontrivial equicorrelationality class that contains a sequence that is an antipalindrome ((-1)-palindrome).  A nonamphidromic nontrivial isospectrality class is a nontrivial equicorrelationality class that has no generalized palindrome in it.  Any nontrivial equicorrelationality class is of one and only one of these three types.  Within each of the three sections (palindromic, antipalindromic, nonamphidromic), there is a count of how many nontrivial isospectrality classes (i.e., nontrivial equicorrelationality classes) there are for that section, and a count of how many there are of each size (volume).  For each class, we then show the autocorrelation spectrum as a list [C(0), C(1),...,C(L-1)], where C(s) is the autocorrelation at shift s.
After the autocorrelation spectrum, we have a list of codes that encode the sequences.  Each code encodes a single sequence that is a representative of a trivial equicorrelationality class, and so the nontrivial equicorrelationality class described by our list of codes is the union of the trivial equicorrelationality classes that contain the sequences whose codes are on our list.  To decode one of the codes, replace each hexadecimal digit by its binary equivalent (so 0 becomes 0000, 1 becomes 0001, ..., F becomes 1111), so that k hexadecimal digits become 4*k binary digits, and when the length of sequences being represented is not a multiple of 4, remove leading (leftmost) 0's until the binary string is of the correct length.  Then interpret binary 0's as +1's and binary 1's as -1's if you want the sequence in +1/-1 form.  When choosing the representative of a trivial equicorrelationality class, we always choose the sequence with the smallest code, and when listing codes, we always sort the list in increasing order.  Within each of the three sections (palindromic, antipalindromic, nonamphidromic), the nontrivial equicorrelationality classes are listed in order of increasing volume, and within the list of classes of a particular volume, we order them according to the smallest (first) code on their list of representatives (in increasing order of that smallest code).
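
The decoding procedure described above can be sketched in Python as follows.  The function names are hypothetical, and the sketch assumes that a code appears in the .txt file as a plain string of hexadecimal digits.

```python
def hex_to_bits(code_hex, length):
    """Expand a hexadecimal code to a binary string of exactly `length` digits.

    This is equivalent to replacing each hexadecimal digit by four binary
    digits and then removing leading 0's until the string has the right length.
    """
    value = int(code_hex, 16)
    if value >> length:
        raise ValueError("code does not fit in a sequence of this length")
    return format(value, "b").zfill(length)


def bits_to_sequence(bits):
    """Read the binary string left to right: 0 becomes +1 and 1 becomes -1."""
    return [1 if b == "0" else -1 for b in bits]
```

For instance, for sequences of length 5 the code 1A expands to 00011010, which is trimmed to 11010 and read as (-1,-1,+1,-1,+1).  Under the encoding described in "THE PROGRAM", the leftmost binary digit is t_{L-1}, so reversing this list gives the terms in the order (s_0,...,s_{L-1}).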

There is also the binary data file iso_f_L_M.dat that corresponds to iso_f_L_M.txt.  All binary data is written in words that are 64-bit unsigned integers in little endian order.  The iso_f_L_M.dat file is the concatenation of three sections that correspond to the three sections (palindromic, antipalindromic, nonamphidromic) in the iso_f_L_M.txt file.  Each section in the iso_f_L_M.dat file begins with a word that has the total number A of nontrivial equicorrelationality classes for that section.  Following that word, there are then A subsections.  Each subsection corresponds to one nontrivial equicorrelationality class, and begins with a word that has the total number B of trivial equicorrelationality classes that make up the nontrivial equicorrelationality class for that subsection.  This word is then followed by B words that are the codes of the sequences that are representatives of the B trivial equicorrelationality classes that make up the nontrivial equicorrelationality class for that subsection.
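
The layout of iso_f_L_M.dat described above can be read with a short Python sketch like the one below.  It assumes that the file contains exactly the three sections in the stated order and nothing else; the function name parse_final_dat and the dictionary keys are hypothetical.

```python
import struct

SECTION_NAMES = ("palindromic", "antipalindromic", "nonamphidromic")


def parse_final_dat(data):
    """Parse the bytes of an iso_f_L_M.dat file.

    Returns a dict mapping each section name to a list of classes, where each
    class is the list of codes of its trivial-class representatives.
    """
    if len(data) % 8 != 0:
        raise ValueError("data size is not a multiple of 8 bytes")
    words = struct.unpack(f"<{len(data) // 8}Q", data)
    pos = 0

    def take(n):
        # Return the next n words and advance, failing on truncated input.
        nonlocal pos
        if pos + n > len(words):
            raise ValueError("unexpected end of data")
        chunk = words[pos:pos + n]
        pos += n
        return chunk

    sections = {}
    for name in SECTION_NAMES:
        (num_classes,) = take(1)        # the word A for this section
        classes = []
        for _ in range(num_classes):
            (num_codes,) = take(1)      # the word B for this class
            classes.append(list(take(num_codes)))
        sections[name] = classes
    if pos != len(words):
        raise ValueError("trailing data after the third section")
    return sections
```

A typical call would be parse_final_dat(open(path, "rb").read()), where path names one of the iso_f_L_M.dat files.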