Skip to content

Subset-AVG – Finding a subset of List Which Matches Known Rational Number

I’ve asked this on math overflow and used comments to clarify/overstate my question. I hope it has the intended effect and doesn’t come off as jarring.

I’m attempting to find what subset of numbers reach a known average.

I have a list of known values, negative and possible decimals. They look something like this {-.32,-.64,-.12,.08,-.54,-.43, …}

It’s around 50 numbers in some cases, though this problem would be tested for other cases as well.

The set mostly contains negative decimal numbers, while in rare cases, has a few positive decimals – it never has whole numbers.

I also have a known value, which I know to be the average of some subset of the above list.

The known value is similar to -.03.

I’m uncertain of the grouping mechanism used, but seemed to reach stack overflow trying to solve this problem when not grouping.

I’ve attempted a few ways of going about solving this problem. I’m using Python 3.6 and imported numpy as np.

I’m wondering if the “subset-avg” code I’ve adapted from another solution for subset-sum here (I’ll give due credit when i can find that question again) is not the most efficient way/if there’s any huge mistake in my even attempting to resolve this that I haven’t seen.

Thanks in advance for any thoughts.

def subset_avg(numbers, target, partial=[],depth=1):
    # create AVG function
    # set average of partial
    a = np.mean(partial)
    # check if the partial sum is equals to target
    if a != target:
        print("Currently Testing the Following Subset's " " " + "Average(%s)  =  %snn" % (partial, round(a,2)))
    print(depth)
    if a == target or round(a,2) == target:
            print('nn')
            print("************")
            print("************")
            print('nn')
            print("Found Subset AVG " + "Average(%s)  =  %s" % (partial, target))
            print('nn')
            print("************")
            print("************")
            print('nn')
    print(depth)
    # for each number in range of list
    for i in range(len(numbers)):
        # set n = current iteration in list
        n = numbers[i]
        # remaining values is current iteration + 1 through end of list
        remaining = numbers[i+1:]
        # calculate mean of partial, set partial = partial plus n
        subset_avg(remaining, target, partial + [n],depth+1)
# Example of use
x = [-.32,-.64,-.12,.08,-.54,-.43]
subset_avg(x,-.03)

Answer

Here is a solution I adapted from a subSet sum algorithm I posted for another question (here). Since the algorithm loops through potential solution sizes, it was easy to adapt it to search for an average.

The iSubSum() function takes 3 parameters: The target average, the list of values and an optional rounding precision parameter. It is a generator so it will produce all possible solutions when used in a loop. You can also get the first solution quickly using the next()function. This should produce results much faster than a brute force approach, especially for large lists.

The function is based on a modified version of a subset-sum algorithm that returns solutions as lists of indices. This is intended to distinguish combinations that have duplicate values coming from different indices in the original list.

from bisect import bisect_right
from itertools import accumulate
def iSubAverage(M,A,P=0):
    smallSize     = 20
    smallSums     = set()
    def subSumForSize(S,A,size,failedSums=None):
        nextSum = A[size-2][2] if size>1 else 0
        index   = bisect_right([a for a,_,_ in A],S-nextSum) # max element for target
        A       = A[:index]
        if len(A) < size:    return                  # not enough elements for size
        if A[size-1][2]  > S: return                 # minimum sum > target
        maxSum = A[-1][2]
        if len(A) > size: maxSum -= A[-size-1][2]
        if maxSum < S:  return                       # maximum sum < target
        if len(A) <= smallSize and S not in smallSums: return
        if failedSums is None: failedSums = set()
        while index >= size:
            index -= 1
            a,i,ca = A[index]
            if size == 1:
                if a == S: yield [i]
                continue
            c0 = A[index-size][2] if index>size else 0
            if ca-c0 < S: break
            subS = S-a
            if subS in failedSums: continue # known unreachable sum
            failed = True
            for result in subSumForSize(subS,A[:index],size-1,failedSums):
                yield result+[i]
                failed = False
            if failed: failedSums.add(subS)
    if not A: return
    if M < 0: M,A = -M,[-a for a in A] # must have positive target
    offset = max(0,-min(A)) # circumvent negatives (requires loop on sizes)
    A      = sorted([(round(a+offset,P),i) for i,a in enumerate(A)])
    cumA   = accumulate(a for a,i in A)
    A      = [(a,i,ca) for (a,i),ca in zip(A,cumA)]
    for a,_,_ in A[:smallSize]:
        newSums = [a+s for s in smallSums] + [a]
        smallSums.update(newSums)
    for size in range(1,len(A)+1):
        subS  = round(M*size,P)
        if subS != round(M*size,P*2): continue # fractional numerator
        subS += round(offset*size,P)
        for result in subSumForSize(subS,A,size):
            yield result

To get the actual values, the iSubAvg() function maps indices to the corresponding values in the list:

def iSubAvg(M,A,P=0):
    for iA in iSubAverage(M,A,P):
        yield sorted([A[i] for i in iA])
L       = [-.32,-.64,-.12,.08,-.54,-.43]
targetL = -0.02
for solution in iSubAvg(targetL,L,2):
    print(solution)
# [-0.12, 0.08]   (there isn't a solution for -0.03)
K = [0.72, 0.69, 0.81, -0.28, 0.6, 0.59, 0.77, 0.46, 0.36, 0.66, 0.88, 0.88, 0.9, -0.24, 0.5, -0.5, 0.46, 0.96, -0.22, -0.8, -0.13, 0.87, 0.78, 0.2]
targetK = -0.02
for solution in iSubAvg(targetK,K,2):
    print(solution)
# [-0.5, 0.46]
# [-0.5, 0.46]
# [-0.8, -0.22, 0.96]
# [-0.5, -0.28, 0.72]
# [-0.28, -0.24, 0.46]
# [-0.28, -0.24, 0.46]
# [-0.5, -0.24, 0.2, 0.46]
# [-0.5, -0.24, 0.2, 0.46]
# [-0.8, -0.28, -0.24, -0.22, 0.46, 0.96]
# [-0.8, -0.28, -0.24, -0.22, 0.46, 0.96]
next(iSubAvg(0.165,K,2))
# [-0.8, -0.28, -0.24, 0.66, 0.69, 0.96]

note that the function returns all combinations including repetitions for duplicate values in the source list. You can filter out these duplicates if you don’t need them