General Description of the rs Program:

This is a program designed for computing the asymptotic autocorrelation demerit factors (ADFs), crosscorrelation demerit factors (CDFs), and Pursley-Sarwate criteria (PSCs) for families of Rudin-Shapiro-like Littlewood polynomials.  Each such family originates from a seed polynomial to which a recursive doubling construction is applied to produce larger and larger degree polynomials (the sequence of these is the family, also called the stem arising from the seed).  The user specifies the seed length n, and the program computes the asymptotic ADFs of all stems arising from all 2^n seeds of that length (the length of a Littlewood polynomial is the number of nonzero coefficients it has).  The program records the lowest value of asymptotic ADF that is obtained, and indicates how many seeds give rise to this lowest value.  Then it finds the asymptotic CDFs for pairs of stems among some subset (possibly the full set) of the stems specified by the user (each stem comes from a single seed, as noted before).   It records the lowest values of the asymptotic Pursley-Sarwate criterion obtained for the pairs of stems and indicates the pairs of seeds that give rise to stems with the lowest values.

The program is designed to distribute the work to collection of processors, so the computation is broken into various phases:

Phase 1. "Generation": Plans how to divide the work of calculating asymptotic demerit factors for all stems arising from the 2^n seeds of length n.  (In fact, only 2^(n-2) need to be checked due to symmetry.)  The program produces scripts that tell one how to run this same program in Phase 2, where the asymptotic ADFs are actually calculated, possibly in multiple jobs so that the work can be distributed.  In practical terms, we found that checking 2^34 seeds took a typical computer on the order of one hour.  This phase also supplies a script with the single command that will be used to run Phase 2, which combines all the reports from Phase 3.

Phase 2. "Autocorrelation": Determining the asymptotic ADFs of the stems from a certain collection of seeds.  The seeds are run through in a Gray code order, and one indicates how many seeds will be checked (this must be some number of the form 2^m) and then one gives a number k to indicate that one will start checking at the (k*2^m)th seed (in Gray code order).

3. "Collection": All the reports on asymptotic ADFs from the (possibly many) runs in Phase 2 are gathered together and compared.  The minimum asymptotic ADF among the stems from our seeds of length n is determined.  Then those seeds whose stems we want to crosscorrelate in Phase 4 are collected and their Fourier transforms are saved to speed the calculations in the next phase.  Depending on the mode in which the program is run, this could be all seeds ("Unrestricted"), only those seeds that produce stems with the lowest asymptotic ADF ("Minimizing"), or only those seeds that produce stems with asymptotic ADF less than or equal to a certain value ("Threshold").  This phase of the program produces scripts that divide the upcoming work of calculating asymptotic CDFs into multiple jobs, so the work can be distributed.  In practical terms, we found that checking the crosscorrelations of around 2^17 to 2^18 stems (based on 2^17 or 2^18 seeds) with another similar sized collection of stems was a reasonable amount of work for a typical computer to do in about an hour.  This phase also supplies a script with the single command that will be used to run Phase 5, which combines all the reports from Phase 4.

4. "Crosscorrelation": Computes the asymptotic CDFs and asymptotic Pursley-Sarwate criteria (PSCs) for stems arising from pairs of seeds collected in the previous phase.  We run this program with pairs of data files produced in the previous phase (each one recording subsets of the total collection of stems to be crosscorrelated against each other along with their Fourier transforms).

5. "Finalization": All the reports on the asymptotic CDFs from the (possibly many) runs in the previous phase are gathered together and compared.  The minimum asymptotic PSC among the stems from pairs of seeds retained in the collection phase is determined, and those seed pairs and their performance attributes are recorded.

The program symmetry groups to organize seeds into orbits; the nature of the symmetries is such that all seeds in the same orbit produce stems with the same asymptotic ADF.  Group theory is also used to organize seed pairs into orbits, and seed pairs in the same orbit produce stems with the same asymptotic PSC.

----------------------------------------

Usage Specifications for the rs Program:

rs seed_length (u/m/t) [auto_cutoff_numerator auto_cutoff_denominator {for option t}] (g/a/c/x/f) [log_ADF_frame_length log_CDF_frame_length {for option g}]/[log_ADF_frame_length frame_number {for option a}]/[log_ADF_frame_length log_CDF_frame_length {for option c}]/[log_CDF_frame_length first_frame_number second_frame_number {for option x}]/[log_CDF_frame_length number_of_CDF_frames {for option f}]

* seed_length is the length of the seeds from which the stems will be grown.  Each seed of the specified length is tested determine the asymptotic ADF of the stem it produces by the Rudin-Shapiro-like recursion.

* The (u/m/t) must be one of those three letters, and dictates which seed pairs will have the asymptotic CDFs of their stems computed, based on a criterion concerning the asymptotic ADF of said seeds:

  u indicates "unrestricted" (all stems arising from seeds of length seed_length will be tried)
  m indicates "minimum" (only stems achieving whatever is the minimum asymptotic demerit factor among stems from seeds of length seed_length will be tried)
  t indicates a "threshold" (only stems arising from seeds of length seed_length whose asymptotic demerit factor is less than or equal to auto_cutoff_numerator/auto_cutoff_denominator are retained for the crosscorrelation)

* auto_cutoff_numerator and auto_cutoff_denominator are used for option t described in the previous point

* (g/a/c/x/f) indicates what phase of the program is being run, corresponding respectively to the 5 phases listed above (Generation, Autocorrelation, Collection, Crosscorrelation, Finalization).

* The final two or three arguments indicate how work is being divided, but how they are interpreted depends on which phase we are in:

  Phases 1 (Generation) and 3 (Collection): there are two final arguments, a and b, indicating that we want to check 2^a seeds per run in the Phase 2 (Autocorrelation) and check about 2^b seeds against about 2^b seeds in Phase 4 (Crosscorrelation).

  Phase 2 (Autocorrelation): There are two final arguments, a and c, indicating that we want to check 2^a seeds starting with seed number c*2^a (in Gray code order).

  Phase 4 (Crosscorrelation): There are three final arguments, b, d, and e, indicating that we want to check about 2^b seeds against about 2^b seeds per run of this phase, and that we are checking the dth grouping of seeds against the eth grouping.

  Phase 5 (Finalization): There are two final arguments, b and f, indicating that we were checking about 2^b seeds against about 2^b seeds per run in the Phase 4 (Crosscorrelation), and that there were f groupings of about 2^b seeds being checked against each other.

----------------------------------------

Example of Usage:

Let us consider a tutorial use of the program: suppose that we want to find the minimum asymptotic ADF for Rudin-Shapiro-like Littlewood polynomial stems arising from seeds of length 10, and then determine the minimum asymptotic PSC among all pairs of stems.  So we are running in Unrestricted mode.

Phase 1 (Generation): We begin by executing

./rs 10 u g 6 5

indicating that we want to generate the scripts for studying stems of R-S-like Littlewood polynomials derived from seeds of length 10 in Unrestricted mode.  We plan to study these in batches of 2^6 seeds per run in Phase 2 (Autocorrelation), and then we plan the asymptotic CDFs and PSCs of stems arising from pairs of stems (with all pairs of stems considered since we are in Unrestricted mode) by comparing pairs of batches of about 2^5 stems each in Phase 4 (Crosscorrelation).  These runs are very computationally light, so one can get results quickly for the purpose of this tutorial.

The output of this run consists of three files:

rs-10-u-g-6-5.txt   documents what we did and what we plan to do: distribute the calculation of 2^(10-2) asymptotic ADFs into 4 frames of 64 calculations each.

rs-10-u-g-6-5.job   is a script, which tells us exactly what to run in the next (Autocorrelation) phase:

./rs 10 u a 6 0
./rs 10 u a 6 1
./rs 10 u a 6 2
./rs 10 u a 6 3

rs-10-u-g-6-5.sum   is a script, which tells us exactly what to run after that (for the Collection phase).

./rs 10 u c 6 5

Phase 2 (Autocorrelation): We run the script

./rs-10-u-g-6-5.job

produced in the previous phase (Phase 1: Generation), which is equivalent to running the following 4 jobs:

./rs 10 u a 6 0
./rs 10 u a 6 1
./rs 10 u a 6 2
./rs 10 u a 6 3

These are four different runs in Phase 2 (Autocorrelation), each calculating 2^6 asymptotic ADFs for the various seeds organized in Gray code order.  Note that we still have "10 u" since our seeds are of length 10 and our mode is Unrestricted.  Each such run produces two output files.  For example, the penultimate command produces output

./rs-10-u-a-6-2.txt

which tells us what we did: the autocorrelation mode computations for the 2^6 seeds that are in "frame 2"---that is, starting with the (2*2^6)th seed in Gray code order.  The other file produced by the same run

./rs-10-u-a-6-2

which is a hexadecimal dump of the results of the calculations.  Each seed receives a record of two consecutive words of raw output.  The first word is the numerator of the asymptotic ADF for the stem that comes from this seed  (all denominators here are understood to be three times the square of the seed length).  The second word is the seed sequence (recorded as an integer from 0 to 2^n, with each 0 representing a +1 and each 1 representing a -1 in the Littlewood polynomial).

The full collection of the hexadecimal output files needs to be present when we run the Phase 3 (Collection).

Phase 3 (Collection): We run the script

./rs-10-u-g-6-5.sum

produced in Phase 1 (Generation), which is equivalent to running the following job:

./rs 10 u c 6 5

This indicates Collection phase and tells us that we want to collect all the data (in frames of size 2^6) from the previous phase, and then repackage all the seeds of interest (which is all of them, since we are running in u=Unrestricted mode) into collections of size about 2^5.  Since the program uses group theory symmetries to lighten the work and orbit sizes vary, the actual number of seeds we deal with will usually not be an exact power of 2, so the program makes packages of about size 2^5.

Running this produces various output files:

rs-10-u-c-6-5.txt   documents how the sequence stems are organized into orbits and how the program intends to divide them into subsets of size about 2^5 (it turns out that it needs 5 sets, with each set of size 27 or 28).  The actual data for these 5 sets lies in the hexadecimal output files

rs-10-u-c-6-5-0
rs-10-u-c-6-5-1
rs-10-u-c-6-5-2
rs-10-u-c-6-5-3
rs-10-u-c-6-5-4

[NOTE: if we had been running in Minimizing mode, then we would have rs-10-m-c-6-5.txt with an "m" instead of a "u", and this report would also contain information about the lowest observed asymptotic ADF and which seeds produce stems that achieve it.]

We describe the data format for the hexadecimal output files when the seed length is n.  The size of the cyclic group over which discrete Fourier analysis is performed (which we call the Fourier length, denoted by m) is the least power of 2 that is equal to or greater than 2*n (except when n=1, when the Fourier length is set to be 4).  Each saved seed gets an entry that consists of 3*m/2+7 words of raw output.  The first word is the numerator of the asymptotic ADF for the stem that comes from this seed (all denominators here are understood to be three times the square of the seed length).  The second word is the seed sequence (recorded as an integer from 0 to 2^n, with each 0 representing a +1 and each 1 representing a -1 in the Littlewood polynomial).  The next m/2+1 words are double precision floating point values representing the squared magnitude of the first m/2+1 coefficients of the Fourier transform of the seed (the rest are omitted because they can be determined by symmetry).  The next m/2+2 words are m/4+1 pairs of double precision floating point values (each pair representing the real and imaginary part of a complex number) where the jth entry on this list (counting from j=0) is the product of the jth coefficient of the Fourier transform of the seed times the conjugate of the (m/2-j)th coefficient of the Fourier transform of the seed.  The next m/2+2 words are m/4+1 pairs of double precision floating point values (each pair representing the real and imaginary part of a complex number) where the jth entry on this list (counting from j=0) is the product of e^(2*pi*i*j/m) times the conjugate of the jth coefficient of the Fourier transform of the seed times the (m/2-j)th coefficient of the Fourier transform of the seed.

The full collection of the hexadecimal output files needs to be present when we run the Phase 3 (Collection).

Two scripts are also produced:

rs-10-u-c-6-5.job   which is what you need to run to do the next phase (Phase 4: Crosscorrelation), namely:

./rs 10 u x 5 0 0
./rs 10 u x 5 0 1
./rs 10 u x 5 0 2
./rs 10 u x 5 0 3
./rs 10 u x 5 0 4
./rs 10 u x 5 1 1
./rs 10 u x 5 1 2
./rs 10 u x 5 1 3
./rs 10 u x 5 1 4
./rs 10 u x 5 2 2
./rs 10 u x 5 2 3
./rs 10 u x 5 2 4
./rs 10 u x 5 3 3
./rs 10 u x 5 3 4
./rs 10 u x 5 4 4

and we also get as an output the script

rs-10-u-c-6-5.sum   which is what we need to run for the last phase (Phase 5: Finalization), namely:

./rs 10 u f 5 5


Phase 4 (Crosscorrelation): We run the script

./rs-10-u-c-6-5.job

produced in Phase 3 (Collection), which is equivalent to running

./rs 10 u x 5 0 0
./rs 10 u x 5 0 1
./rs 10 u x 5 0 2
./rs 10 u x 5 0 3
./rs 10 u x 5 0 4
./rs 10 u x 5 1 1
./rs 10 u x 5 1 2
./rs 10 u x 5 1 3
./rs 10 u x 5 1 4
./rs 10 u x 5 2 2
./rs 10 u x 5 2 3
./rs 10 u x 5 2 4
./rs 10 u x 5 3 3
./rs 10 u x 5 3 4
./rs 10 u x 5 4 4

which pairs off the collections of saved seeds in every possible way so as to crosscorrelate their stems.  Each run produces more data, for example running the penultimate command in the list above produces

rs-10-u-x-5-3-4.txt

which tells us what we did (crosscorrelated stems corresponding to seeds from frame 3 with stems for seed from frame 4), and it reports the lowest asymptotic PSC observed as well as rounding errors observed in Fourier analysis.  It also produces a hexadecimal data file

rs-10-u-x-5-3-4

This data file has an initial segment of three words, and then each seed pair that minimizes asymptotic PSC among the pairs checked in this run is given an entry that consists of five consecutive words of raw output.  The initial segment begins with two words representing integers a and b such that the asymptotic PSC for stems from seed pairs recorded in the file is (a+sqrt(b))/(3*n^2), where n is our seed length.  The initial segment concludes with one word representing a double precision floating point number that tells us the furthest distance from an integer that was observed for values that should have been integers but which were calculated using double precision floating point approximations in the Fourier analysis.

Now we describe the five words recorded for each seed pair that minimizes asymptotic PSC within this run.  The first word is the numerator of the asymptotic CDF (all denominators here are understood to be three times the square of the seed length).  The second word is the numerator of the asymptotic ADF for the first stem.  The third word is the seed sequence for the first stem (recorded as an integer from 0 to 2^n, with each 0 representing a +1 and each 1 representing a -1 in the Littlewood polynomial).  The fourth word is the numerator of the asymptotic ADF for the second stem.  The fifth word is the seed sequence for the second stem (recorded as an integer from 0 to 2^n, with each 0 representing a +1 and each 1 representing a -1 in the Littlewood polynomial).

The hexadecimal data file needs to be present when we run the next phase (Phase 5: Finalization).

Phase 5 (Finalization): We run the script

./rs-10-u-c-6-5.sum

produced in Phase 3 (Collection), which is the same as running

./rs 10 u f 5 5

which puts together all the data from the previous phase and gives us two files.  One is

rs-10-u-f-5-5.txt which gives us a report on the minimum PSC observed among all pairs.

The other file produced is

rs-10-u-f

which has the data on these minimizing seed pairs in hexadecimal form.  This data file has an initial segment of three words, and then each seed pair that minimizes asymptotic PSC among the pairs checked in this run is given an entry that consists of six consecutive words of raw output.  The initial segment begins with two words representing integers a and b such that the asymptotic PSC for stems from seed pairs recorded in the file is (a+sqrt(b))/(3*n^2), where n is our seed length.  The initial segment concludes with one word representing a double precision floating point number that tells us the furthest distance from an integer that was observed for values that should have been integers but which were calculated using double precision floating point approximations in the Fourier analysis.

Now we describe the six words recorded for each seed pair that minimizes asymptotic PSC within this run.  The first word is the numerator of the asymptotic CDF (all denominators here are understood to be three times the square of the seed length).  The second word is the numerator of the asymptotic ADF for the first stem.  The third word is the seed sequence for the first stem (recorded as an integer from 0 to 2^n, with each 0 representing a +1 and each 1 representing a -1 in the Littlewood polynomial).  The fourth word is the numerator of the asymptotic ADF for the second stem.  The fifth word is the seed sequence for the second stem (recorded as an integer from 0 to 2^n, with each 0 representing a +1 and each 1 representing a -1 in the Littlewood polynomial).  The sixth word is the number of seed pairs in the orbit of this pair under the action of the symmetry group.  We only report one seed pair per orbit.

----------------------------------------

Versions of the Program:

There are three versions of the program here: A, B, and C.

Version A was the initial version used for the calculations.

At some point it was discovered that when Version A reported the asymptotic demerit factors and PSCs for a pair of stems in Phases 4 (Crosscorrelation) and 5 (Finalization), the order in which the seeds were recorded and the order in which the asymptotic ADFs for the two stems arising from the two seeds were recorded did not always correspond, so one only knew the two values but not which corresponded to which seed.  When run in Minimizing mode, this made no difference (since after Phase 3, all seeds under consideration have the same asymptotic ADF).  But this did create a problem when run in Unrestricted mode, as the seed pairs minimizing asymptotic PSC could generate stems with unequal asymptotic ADFs.  Practically speaking, one could just compute the (usually very small number of) asymptotic ADFs directly from the optimal seed pairs with little computational effort.  But Version B of the program was made to report the asymptotic ADFs in an order corresponding to the order that the seeds were recorded.

At some point it was discovered that when Versions A and B were run in Minimizing mode, the part of the program that reported orbit representatives for seeds whose stems minimized asymptotic ADF did not do the reporting properly when the seed length n was precisely 1, because the group theory degenerates for n=1.  One can readily compute everything for n=1 without a program.  But Version C of the program was made that rectifies this problem (as well as the one described in the previous paragraph).

----------------------------------------

Data Presented Here:

We present data from runs in Minimizing mode for seed lengths 1 to 52 and in Unrestricted mode for seed lengths from 1 to 28.

For the Minimizing run with seed length 1, we present data using version C.

For the Minimizing runs for seed lengths 2 to 52, we present data obtained using version A.  The two issues mentioned in the section on versions cannot manifest when we are in Minimizing mode and the seed length is greater than 1.

For the Unrestricted runs for seed lengths 1 to 27, we present data obtained using version B.  The issue in Version B which Version C rectifies does not manifest in Unrestricted runs.

For the Unrestricted run for seed length 28, we present data obtained using version A.  The issue in Version A that Version B rectifies does not arise in this case, because it happens that pairs of seeds of length 28 whose stems have minimal asymptotic PSC are such that the asymptotic ADFs are the same for the two stems arising from the seeds.  And the issue in Version B which Version C rectifies does not manifest in Unrestricted runs.

For each run, we present record files from Phases 1 (Generation), 3 (Collection, with some files in abridged form as discussed below), and 5 (Finalization), whose filenames contain the letters "g," "c," and "f," respectively.  Every file begins with rs-N-M, where N is a number indicating the seed length and M is a letter (either "m" for Minimizing or "u" for Unrestricted) indicating the mode.  One can rerun these jobs oneself by following the same approach as in the Example tutorial above.  To start, open the rs-N-M-g-... file.  The third line records the command that was run for Phase 1.  Then Phase 1 produces scripts ending in ".job" and ".sum" for running Phase 2 and Phase 3, and then Phase 3 produces scripts ending in ".job" and ".sum" for running Phase 4 and Phase 5 (see Example tutorial above).

* The records for Phase 3 of the Minimizing runs contain the data in Table 1 of the paper (use the lines that begin with "ART" and remove that prefix to construct the lines of the table in the paper: order of data fields is as in the paper's table), but because the original files record all seeds of a given length that produce the minimum asymptotic ADF for that length, space constraints prevent us from presenting the full files here: if more than ten seeds are reported, we cut out all but the first five and last five and place the line [AND OTHERS...] in between to show where we removed lines.

* The records for Phase 5 of the Unrestricted runs contain the data in Table 2 of the paper (use the lines that begin with "PT" and remove that prefix to construct the lines of the table in the paper: order of data fields is as in the paper's table).

* The records for Phase 5 of the Minimizing runs contain the data presented in Table 3 of the paper  (use the lines that begin with "PT" and remove that prefix to construct the lines of the table in the paper: order of data fields is as in the paper's table).