SuperFusion

Eliminating Intermediate Data Structures via Inductive Synthesis

Overview

SuFu is a program optimizer specialized in reducing the time cost of operating on intermediate data structures. It is powerful enough to perform complex, algorithm-level optimizations, such as generating structural recursions and synthesizing divide-and-conquer algorithms.

The input of SuFu is a reference program with annotations that explicitly specify the intermediate data structures to be optimized and all code fragments operating on these data structures. To generate a more efficient program, SuFu searches for constant-time expressions to replace the annotated code fragments and, if necessary, changes the representations of the annotated data structures.

SuFu has been evaluated on a dataset of 290 tasks, which covers a wide range of paradigms for program optimization, including fusing recursive functions, synthesizing structural recursions, and applying algorithmic paradigms. SuFu can solve 264 out of 290 tasks with an average time cost of 24.4 seconds, significantly outperforming previous approaches.

Online Demo

An online demo of SuFu is available here.

Example: Maximum Suffix Sum

The current version of SuFu requires the reference program to be written in a customized language built on the simply typed lambda calculus. SuFu also requires the reference program to come with annotations that explicitly specify the intermediate data structures to be optimized.

To see the usage of SuFu, let us consider a function mts that returns the maximum suffix sum of a (possibly negative) integer list. For example, mts [1, -2, 3, -1, 2] is equal to 4, corresponding to the sum of suffix [3, -1, 2]. This function can be implemented by composing basic list-operating functions, as shown on the right.

In Line 1, List is defined as an inductive data type with two constructors: nil unit constructs an empty list, and cons {h, t} constructs a list with head element h and tail list t. In this definition, Unit is a singleton type with element unit and {Int, List} represents the product type of Int and List.
In this program, \x:T. t represents a lambda function where the input variable is x, the input type is T, and the function body is t; fix represents the standard fix point; and t.k returns the k-th component of tuple t.
To calculate the maximum suffix sum, tails generates a (nested) list including all suffixes of the input list, map applies function sum to each suffix and obtains the list of all suffix sums, and at last maximum returns the maximum among these suffix sums.
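The composition described above can be sketched in Python for readers who want to run it. This is not SuFu's surface language, only an illustrative transliteration; it assumes that tails also includes the empty suffix, so the result is never negative.

```python
from typing import List


def tails(xs: List[int]) -> List[List[int]]:
    """All suffixes of xs, from the whole list down to the empty list."""
    return [xs[i:] for i in range(len(xs) + 1)]


def maximum(xs: List[int]) -> int:
    """Largest element of a non-empty list."""
    return max(xs)


def mts(xs: List[int]) -> int:
    """Maximum suffix sum, written as maximum (map sum (tails xs))."""
    return maximum([sum(suffix) for suffix in tails(xs)])


print(mts([1, -2, 3, -1, 2]))
```

Each call to sum walks one suffix, so the whole computation takes quadratic time in the length of the input list.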

This program, though clear, is inefficient due to the large intermediate data structure generated by tails, i.e., the list of suffixes. The operation maximum (map sum .) applied to this list takes quadratic time, which forms the bottleneck of the whole program. To improve the efficiency of this program, we need to speed up this operation on the bottleneck, which also requires optimizing the intermediate data structure returned by tails.

SuFu can automate the above optimization if, in the reference program, the intermediate data structure to be optimized and the related operations are explicitly annotated, as shown on the right.

There are two primary kinds of annotations. Reframe annotates the types of those data structures to be optimized, and rewrite annotates those code fragments operating Reframe data structures.
At the language level, Reframe is treated as a type constructor, that is, Reframe List is the type of lists to be reframed and is distinguished from List. Correspondingly, two constructors label and unlabel are provided to construct and access Reframe data structures. They act as polymorphic functions of type T → Reframe T and Reframe T → T, respectively. To ensure that all operations on Reframe data structures are annotated, label and unlabel can only be used within the scope of rewrite.
In the annotated mts program, the output type of tails is annotated because it corresponds to the data structure to be optimized. Values of this type are operated on in three places: two in tails for constructing the list of suffixes and one in mts for calculating the maximum suffix sum. The code in each of these three places is annotated by rewrite.

After receiving this annotated program, SuFu applies techniques of program synthesis to rewrite all those annotated types and code fragments and then generates a more efficient program for mts, as shown on the right. In this program, instead of generating the whole list of all suffixes, the optimized version of tails returns only two integers - the maximum suffix sum and the sum of all elements, respectively. Consequently, the time cost of operating the result of tails is reduced to a constant and the time complexity of the whole program is reduced to linear.
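The shape of the optimized program can be illustrated with a Python sketch. It is a hand-written approximation of the described result, not SuFu's actual output; the names tails_opt and mts_opt are hypothetical, and the empty suffix is assumed to count with sum 0.

```python
from typing import List, Tuple


def tails_opt(xs: List[int]) -> Tuple[int, int]:
    """Return (maximum suffix sum, sum of all elements) instead of the list of suffixes."""
    best, total = 0, 0  # values for the empty list, whose only suffix is empty
    # Mirror the structural recursion: handle the tail first, then the head.
    for h in reversed(xs):
        total = h + total        # sum of the suffix that starts at h
        best = max(best, total)  # constant-time update of the maximum
    return best, total


def mts_opt(xs: List[int]) -> int:
    """Maximum suffix sum, read off the first component in constant time."""
    return tails_opt(xs)[0]


print(tails_opt([1, -2, 3, -1, 2]))
```

Every step performs a constant amount of work on two integers, which is why the whole program runs in linear time.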

SuFu considers both the correctness and the efficiency when rewriting annotated types and code fragments.

(Correctness) Ideally, the optimized program should be semantically equivalent to the reference program, that is, should always produce the same output on all possible inputs. However, the current version of SuFu cannot ensure complete correctness due to the limited effect of existing program verifiers. Alternatively, SuFu checks the correctness of candidate programs by random testing. It will regard a candidate program as correct if this program passes a large enough number of randomly generated tests. The configurations of this test procedure can be found in Section Configurations.
(Efficiency) SuFu ensures the efficiency of the resulting program by limiting the space of candidate programs for the rewrite, following the SyGuS framework. Concretely, SuFu rewrites only using programs in a pre-defined program space, in which all programs are ensured to be efficient. By default, this program space includes only constant-time expressions with scalar outputs. The configuration of this program space can be found in Section Program Space.
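The random-testing check described under Correctness can be sketched as follows. This is only an illustration of the idea, not SuFu's verifier: the function names are hypothetical, the defaults borrow the values of SampleIntMin, SampleIntMax, SampleSize, and VerifyBase from the Configurations table, and list length stands in for the constructor count.

```python
import random
from typing import Callable, List


def random_list(rng: random.Random, max_size: int = 10, lo: int = -5, hi: int = 5) -> List[int]:
    """Draw a random integer list with bounded length and bounded elements."""
    return [rng.randint(lo, hi) for _ in range(rng.randint(0, max_size))]


def passes_random_tests(reference: Callable[[List[int]], int],
                        candidate: Callable[[List[int]], int],
                        num_tests: int = 1000, seed: int = 0) -> bool:
    """Accept the candidate only if it agrees with the reference on every sampled input."""
    rng = random.Random(seed)
    for _ in range(num_tests):
        xs = random_list(rng)
        if reference(xs) != candidate(xs):
            return False  # a counterexample was found
    return True


def mts_reference(xs: List[int]) -> int:
    """Quadratic reference: maximum over the sums of all suffixes (including the empty one)."""
    return max(sum(xs[i:]) for i in range(len(xs) + 1))


def mts_candidate(xs: List[int]) -> int:
    """Linear candidate that tracks the running suffix sum from right to left."""
    best = total = 0
    for h in reversed(xs):
        total += h
        best = max(best, total)
    return best


print(passes_random_tests(mts_reference, mts_candidate))
```

Passing such a test gives evidence of equivalence rather than a proof, which matches the caveat stated above.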

To reduce the user burden of providing annotations, SuFu also includes a heuristic approach to generate annotations (option "autolabel" in the online demo). Once this approach is enabled, the user needs only to annotate the types of data structures to be optimized. SuFu will then automatically supply the code-level annotations (label, unlabel, and rewrite) according to a heuristic rule that the code enclosed by each rewrite should be as small as possible. The right side is a program with only type annotations, and SuFu can automatically convert it to the fully annotated program above.

Surface Language

The surface language of SuFu is basically the simply typed lambda calculus augmented with inductive data structures and annotations for optimization. Its syntax is shown below.

The type system and the semantics of this language are almost standard. The following supplies some details related to the annotations.

Reframe is not allowed to annotate types involving functions, such as List → Int and {List, Int → List}, because of the limit of the backend program synthesizer. Similarly, rewrite is not allowed to annotate terms whose result involves functions.
label and unlabel act as polymorphic functions of type T → Reframe T and Reframe T → T, respectively. They are not allowed to be used outside the scope of rewrite to ensure that all operations on Reframe data structures are annotated.

Program Space

Following the SyGuS framework, SuFu synthesizes programs from pre-defined program spaces. As a result, it will fail when the program spaces are not expressive enough to implement necessary programs. SuFu uses two program spaces during the optimization, as shown below.

The first program space (denoted as the rewrite space) specifies programs that can be used in the rewrite. For each rewrite term, its rewrite space is made up of some common scalar operators (such as arithmetic operators + and -) and some of the user-defined functions available in the context. By default, to ensure the efficiency of the resulting program, the rewrite space includes only programs with scalar outputs and user-defined functions that are recursion-free (which implies a constant-time cost).
The second program space (denoted as the representation space) specifies the possible programs for the representation function, an auxiliary function synthesized by SuFu during the optimization. For each data structure to be optimized, its representation function specifies how it is reframed by mapping each of its possible values to the corresponding value in the optimized program. For example, in the mts example, the output data structure of tails is reframed from the list of suffixes to a pair of two integers - the maximum suffix sum and the sum of all elements. The representation function corresponding to this procedure can be implemented as repr ts = {maximum (map sum ts), sum (head ts)}, which maps a list of suffixes to the corresponding two integer values.
The representation function is initially unknown and is synthesized from the representation space. Similar to the rewrite spaces, the representation space is made up of some common operators (such as arithmetic operators and the fold operator for implementing recursions) and the user-defined functions in the reference program. The difference is that the representation space does not need to consider efficiency, because the representation function is only an auxiliary function for synthesis and will not appear in the optimized program.
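The representation function for the mts example, repr ts = {maximum (map sum ts), sum (head ts)}, can be written out in Python to make the mapping concrete. This is an illustrative sketch; the names repr_fn and tails_opt are hypothetical, and tails is assumed to include the empty suffix.

```python
from typing import List, Tuple


def tails(xs: List[int]) -> List[List[int]]:
    """The original data structure: the list of all suffixes of xs."""
    return [xs[i:] for i in range(len(xs) + 1)]


def repr_fn(ts: List[List[int]]) -> Tuple[int, int]:
    """Map a list of suffixes to (maximum suffix sum, sum of all elements)."""
    return max(sum(s) for s in ts), sum(ts[0])  # ts[0] is the whole list


def tails_opt(xs: List[int]) -> Tuple[int, int]:
    """The reframed tails, computing the pair directly in linear time."""
    best = total = 0
    for h in reversed(xs):
        total += h
        best = max(best, total)
    return best, total


xs = [1, -2, 3, -1, 2]
# The reframed value equals the representation of the original value.
print(repr_fn(tails(xs)) == tails_opt(xs))
```

The equality printed at the end is the relationship the representation function is meant to capture: the optimized program computes repr_fn of the original value without ever building that value. Note that repr_fn itself is quadratic, which is harmless because it does not appear in the optimized program.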

SuFu provides some interfaces for customizing these program spaces. The following are some examples.

Example 1: Rewrite using programs with non-scalar outputs.

SuFu provides an option to consider programs with non-scalar outputs in the rewrite (option "nonscalar" in the online demo). When this option is enabled, the rewrite space will include programs returning inductive data structures and thus will be more expressive.

The following is an example where programs with non-scalar outputs are necessary for the rewrite.

The left is another possible annotated program for the mts example, where the last operation on the list of suffixes is regarded as calculating the list of suffix sums. The default SuFu cannot optimize this program because the last annotated term can only be rewritten with terms of type List, otherwise the type will mismatch when applying function maximum.
The right is the optimized program generated by SuFu when the "nonscalar" option is enabled. In this optimization, SuFu reframes the list of suffixes to the list of suffix sums and thus reduces the time complexity of the whole program to linear.
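The reframing to a list of suffix sums can be sketched in Python as below. This is a hand-written illustration of the described result rather than SuFu's output; the name tails_sums is hypothetical and the empty suffix is assumed to be included.

```python
from typing import List


def tails_sums(xs: List[int]) -> List[int]:
    """Suffix sums of xs, ordered from the whole list down to the empty suffix."""
    sums = [0]  # sum of the empty suffix
    for h in reversed(xs):
        sums.append(h + sums[-1])  # extend the previous suffix by one element
    sums.reverse()
    return sums


def mts(xs: List[int]) -> int:
    """maximum is applied outside the rewritten term, so it still receives a list."""
    return max(tails_sums(xs))


print(tails_sums([1, -2, 3, -1, 2]))
```

The rewritten term returns a list instead of a scalar, which is exactly what the default rewrite space excludes.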

Note that there are two drawbacks to enabling this option. First, AutoLifter will become less effective due to the increased difficulty of program synthesis. Second, SuFu can no longer automatically supply code-level annotations because of some limitations on the generation approach.

Example 2: Rewrite using programs with recursions.

To ensure the efficiency of the resulting program, SuFu uses only recursion-free functions to construct the rewrite space. However, sometimes recursions are necessary to accomplish the rewrite. For these cases, the user can use a decoration @Combine@Extract to force SuFu to include recursive functions into the rewrite space.

The following is an example where recursive programs are necessary for the rewrite.

The left is another possible annotated program for the mts example. In this program, function tails recurses with an index of the whole list, and each time, it inserts the suffix starting from the current index (i.e., drop n xs) into the list of suffixes.
The default SuFu cannot optimize this program because when rewriting the second annotated term in tails, it is impossible to access the n-th element of the whole list without recursions. Here, we solve this problem by providing a function access for accessing by index and forcing the rewrite space to include it using the @Combine@Extract decoration.
The right is the "optimized" program generated by SuFu. In this optimization, SuFu reframes the list of suffixes to the pair of the maximum suffix sum and the sum of all elements and thus rewrites each annotated term with programs that are almost recursion-free except for invoking access. However, since function access costs linear time, the optimized program still runs in quadratic time.
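The effect of relying on a linear-time access function can be seen in a small Python sketch. It approximates the shape of the described result and is not SuFu's output; tails_indexed is a hypothetical name, and access is written as an explicit walk to mimic a recursive list lookup.

```python
from typing import List, Tuple


def access(xs: List[int], n: int) -> int:
    """Return the n-th element by walking the list, as a recursive lookup would."""
    for i, x in enumerate(xs):
        if i == n:
            return x
    raise IndexError(n)


def tails_indexed(xs: List[int], n: int = 0) -> Tuple[int, int]:
    """(maximum suffix sum, total sum) of the suffix of xs that starts at index n."""
    if n == len(xs):
        return 0, 0  # the empty suffix
    best, total = tails_indexed(xs, n + 1)
    total = access(xs, n) + total  # the only non-constant-time step
    return max(best, total), total


print(tails_indexed([1, -2, 3, -1, 2]))
```

Each of the n recursive steps calls access once, and each call may walk up to n elements, which accounts for the quadratic running time.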

Example 3: Use global inputs in the representation function.

When optimizing the reference program, SuFu will first synthesize a representation function to help the rewrite of annotated terms. By default, the representation function takes a value of the annotated data structure (e.g., a concrete list of suffixes) as the input and returns the corresponding value after the optimization (e.g., the pair of the maximum suffix sum and the sum of elements). However, sometimes the representation function may also depend on some global inputs. For these cases, the user can use a decoration @Input to declare global inputs accessible by the representation function.

The following is an example where global inputs are necessary to define the representation function.

The left is a brute-force program for checking whether there exists a suffix of the input list xs whose sum is equal to a given value w. This program runs in quadratic time due to the large intermediate data structure generated by tails.
The right is an optimized program that runs in linear time, where the optimized tails returns a pair of scalar values (which are the boolean result and the sum of all elements, respectively) instead of constructing the whole list of suffixes.
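Both programs can be approximated in Python as follows. These are illustrative sketches rather than the original listings; the function names are hypothetical, and the empty suffix (with sum 0) is assumed to count as a suffix.

```python
from typing import List, Tuple


def exists_suffix_sum_ref(w: int, xs: List[int]) -> bool:
    """Brute force: build every suffix and compare its sum with w (quadratic time)."""
    return any(sum(xs[i:]) == w for i in range(len(xs) + 1))


def tails_opt(w: int, xs: List[int]) -> Tuple[bool, int]:
    """Return (whether some suffix sums to w, sum of all elements) in linear time."""
    found, total = (w == 0), 0  # the empty suffix has sum 0
    for h in reversed(xs):
        total += h
        found = found or total == w
    return found, total


def exists_suffix_sum_opt(w: int, xs: List[int]) -> bool:
    """The boolean answer is the first component of the reframed value."""
    return tails_opt(w, xs)[0]


print(exists_suffix_sum_opt(4, [1, -2, 3, -1, 2]))
```

Note that the update of found needs w at every step, which is why w has to be passed to tails.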

To let SuFu perform this optimization, a natural idea is to annotate the output type of tails and also to include w in the input of tails to make it accessible during the rewrite (here we assume that the "autolabel" option is enabled). However, SuFu cannot perform this optimization because it cannot find a valid representation function. Although a program mapping a list of suffixes (denoted as ts) to the result after the optimization can be implemented as (exists w (map sum ts), sum (head ts)), this program requires an extra input w and thus is unavailable in the representation space.

There are two ways to solve this problem, as shown below.

One way is to include w as a component of the data structure to be optimized, as shown in the left program. At this time, the value of w can be accessed by the representation function as a part of the input data structure.
The other way is to separately declare w as a global input using the decoration @Input, as shown in the right program. SuFu will consider all variables decorated by @Input as the input when synthesizing the representation function.
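In either approach, the representation function ends up with access to w. A Python sketch of such a function is given below; it is an illustration only, with the hypothetical name repr_with_input, and it passes w as an explicit parameter to stand in for a global input.

```python
from typing import List, Tuple


def tails(xs: List[int]) -> List[List[int]]:
    """The list of all suffixes of xs, including the empty one."""
    return [xs[i:] for i in range(len(xs) + 1)]


def repr_with_input(w: int, ts: List[List[int]]) -> Tuple[bool, int]:
    """Map a list of suffixes to (some suffix sums to w, sum of all elements)."""
    return any(sum(s) == w for s in ts), sum(ts[0])


print(repr_with_input(4, tails([1, -2, 3, -1, 2])))
```

Without w among its inputs, no function of ts alone can produce the first component, which is the failure described above.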

Configurations

SuFu also provides some interfaces for configuring its synthesis procedure, as listed in the table below. To change the default configurations, the user needs to put the corresponding command (demonstrated in the "Use Case" column) at the top of the reference program.

Name	Description	Default	Use Case
SampleIntMin	The minimum integer considered when generating random inputs.	-5	Config SampleIntMin = 0;
SampleIntMax	The maximum integer considered when generating random inputs.	5	Config SampleIntMax = 0;
SampleSize	The maximum size of inductive data structures (defined as the number of used constructors) considered when generating random inputs.	10	Config SampleSize = 20;
VerifyBase	The base number of random inputs used to test the correctness of a candidate result. The number of tests is directly proportional to this value.	1000	Config VerifyBase = 10000;
NonLinear	Whether to consider the product operator of integers in the rewrite space and the representation space.	false	Config NonLinear = true;
ExtraGrammar	Whether to load a pre-defined library into the rewrite space and the representation space. The current version of SuFu supports only the DeepCoder's library, which is comprised of 17 common list-operating functions.	None	Config ExtraGrammar = "DeepCoder";
ComposeNum	The maximum number of scalar values in the context accessed by each program used in the rewrite.	3	Config ComposeNum = 4;
TermNum	The maximum number of if-branches in each program used in the rewrite.	4	Config TermNum = 2;
ClauseNum	The maximum number of DNF clauses in each if-condition used in the rewrite.	4	Config ClauseNum = 2;

People

Advisors: Yingfei Xiong, Nadia Polikarpova, Zhenjiang Hu
Developers: Ruyi Ji, Yuwei Zhao