Eliminating Intermediate Data Structures via Inductive Synthesis


SuFu is a program optimizer specialized in reducing the time cost of operating intermediate data structures. It is effective enough to offer complex and algorithm-level optimizations, such as generating structural recursions and synthesizing divide-and-conquer algorithms.

The input of SuFu is a reference program with annotations that explicitly specify those intermediate data structures to be optimized and all code fragments operating these data structures. To generate a more efficient program, SuFu will search to replace annotated code fragments with constant-time expressions and, if necessary, will properly change the representations for annotated data structures.

SuFu has been evaluated on a dataset of 290 tasks, which covers a wide range of paradigms for program optimization, including fusing recursive functions, synthesizing structural recursions, and applying algorithmic paradigms. SuFu can solve 264 out of 290 tasks with an average time cost of 24.4 seconds, significantly outperforming previous approaches.

Online Demo

An online demo of SuFu is available here.

Example: Maximum Suffix Sum

The current version of SuFu requires the reference program to be written in a customized language built on the simply typed lambda calculus. Besides, SuFu also requires the reference program to come with some annotations that explicitly specify those intermediate data structures to be optimized.

To see the usage of SuFu, let us consider a function mts that returns the maximum suffix sum of a (possibly negative) integer list. For example, mts [1, -2, 3, -1, 2] is equal to 4, corresponding to the sum of suffix [3, -1, 2]. This function can be implemented by composing basic list-operating functions, as shown on the right.

This program, though clear, is inefficient due to the large intermediate data structure generated by tails, i.e., the list of suffixes. The operation maximum (map sum .) applied to this list takes quadratic time, which forms the bottleneck of the whole program. To improve the efficiency of this program, we need to speed up this operation on the bottleneck, which also requires optimizing the intermediate data structure returned by tails.

SuFu can automate the above optimization if in the reference program, the intermediate data structure to be optimized and the related operations are explictly annotated, as shown on the right.

After receiving this annotated program, SuFu applies techniques of program synthesis to rewrite all those annotated types and code fragments and then generates a more efficient program for mts, as shown on the right. In this program, instead of generating the whole list of all suffixes, the optimized version of tails returns only two integers - the maximum suffix sum and the sum of all elements, respectively. Consequently, the time cost of operating the result of tails is reduced to a constant and the time complexity of the whole program is reduced to linear.

SuFu considers both the correctness and the efficiency when rewriting annotated types and code fragments.

Besides, to reduce the user burden of providing annotations, SuFu includes a heuristic approach to generate annotations (option "autolabel" in the online demo). Once this approach is enabled, the user needs only to annotate the types of data structures to be optimized. Then SuFu will automatically supply those code-level annotations (label, unlabel, and rewrite) according to a heuristic rule that the size enclosed by rewrite should be as small as possible. The right side is a program with only type annotations, and SuFu can automatically convert it to the fully annotated program above.

Surface Language

The surface language of SuFu is basically the simply typed lambda calculus augmented with inductive data structures and annotations for optimization. Its syntax is shown below.

The type system and the semantics of this language are almost standard. The following supplies some details related to the annotations.

Program Space

Following the SyGuS framework, SuFu synthesizes programs from pre-defined program spaces. As a result, it will fail when the program spaces are not expressive enough to implement necessary programs. SuFu uses two program spaces during the optimization, as shown below.

SuFu provides some interfaces for customizing these program spaces. The following are some examples.

Example 1: Rewrite using programs with non-scalar outputs.

SuFu provides an option to consider programs with nonscalar outputs in the rewrite (option "nonscalar" in the online demo). When this option is enabled, the rewrite space will include programs returning inductive data structures and thus will be more expressive.

The following is an example where programs with non-scalar outputs are necessary for the rewrite.

Note that there are two drawbacks to enabling this option. First, AutoLifter will become less effective due to the increased difficulty of program synthesis. Second, SuFu can no longer automatically supply code-level annotations because of some limitations on the generation approach.

Example 2: Rewrite using programs with recursions.

To ensure the efficiency of the resulting program, SuFu uses only recursion-free functions to construct the rewrite space. However, sometimes recursions are necessary to accomplish the rewrite. For these cases, the user can use a decoration @Combine@Extract to force SuFu to include recursive functions into the rewrite space.

The following is an example where recursive programs are necessary for the rewrite.

Example 3: Use global inputs in the representation function.

When optimizing the reference program, SuFu will first synthesize a representation function to help the rewrite of annotated terms. By default, the representation function takes a value of the annotated data structure (e.g., a concrete list of suffixes) as the input and returns the corresponding value after the optimization (e.g., the pair of the maximum suffix sum and the sum of elements). However, sometimes the representation function may also depend on some global inputs. For these cases, the user can use a decoration @Input to declare global inputs accessible by the representation function.

The following is an example where global inputs are necessary to define the representation function.

To let SuFu perform this optimization, a natural idea is to annotate the output type of tails and also to include w in the input of tails to make it accessible during the rewrite (here we assume that the "autolabel" option is enabled). However, SuFu cannot perform this optimization because it cannot find a valid representation function. Although a program mapping a list of suffixes (denoted as ts) to the result after the optimization can be implemented as (exists w (map sum ts), sum (head ts)), this program requires an extra input w and thus is unavailable in the representation space.

There are two ways to solve this problem, as shown below.


SuFu also provides some interfaces for configuring its synthesis procedure, as listed in the table below. To change the default configurations, the user needs to put the corresponding command (demonstrated in the "Use Case" column) at the top of the reference program.

Name Description Default Use Case
SampleIntMin The minimum integer considered when generating random inputs. -5
SampleIntMax The maximum integer considered when generating random inputs. 5
SampleSize The maximum size of inductive data structures (defined as the number of used constructors) considered when generating random inputs. 10
VerifyBase The base number of random inputs used to test the correctness of a candidate result. The number of tests is directly proportional to this value. 1000
NonLinear Whether to consider the product operator of integers in the rewrite space and the representation space. false
ExtraGrammar Whether to load a pre-defined library into the rewrite space and the representation space. The current version of SuFu supports only the DeepCoder's library, which is comprised of 17 common list-operating functions. None
ComposeNum The maximum number of scalar values in the context accessed by each program used in the rewrite. 3
TermNum The maximum number of if-branches in each program used in the rewrite. 4
ClauseNum The maximum number of DNF clauses in each if-condition used in the rewrite. 4