Introduction to pyfora¶
pyfora sends your Python code to a local or remote cluster, where it is compiled and can be automatically run in parallel on many machines. With pyfora you can run distributed computations on a cluster without having to modify your code. You can speed up your computations by running them on hundreds of CPU cores, and you can operate on datasets many times larger than the RAM of a single machine.
Setting up a Cluster¶
To get started with pyfora you will need a backend cluster with at least one worker. The backend is available as a docker image that can be run locally on your machine in a single-node setup, or on a cluster of machines on a local network or in the cloud.
Connecting to a Cluster¶
Once you have a running cluster, you can connect to it and start executing code:
import pyfora executor = pyfora.connect('http://localhost:30000')
- Asynchronously using
- Synchronously by enclosing code to be run remotely in a special python
In this tutorial we will use the synchronous model.
First, we’ll define a function that we are going to work with:
def isPrime(p): x = 2 while x*x <= p: if p%x == 0: return 0 x = x + 1 return 1
Now we can use the
Executor to run some remote computations with the function
with executor.remotely.downloadAll(): result = sum(isPrime(x) for x in xrange(10 * 1000 * 1000)) print result
What just happened?¶
The code in the body of the
with block was shipped to the cluster, along with any dependent
objects and code (like
isPrime() in this case) that it refers to, directly or indirectly.
The python code was translated into pyfora bitcode and executed on the cluster. The resulting
result in the example above) were downloaded from the cluster and copied back
back into the local python environment because we used
Depending on the code you are running, and the number of CPU cores available in your cluster, the
runtime looks for opportunities to parallelize the execution of your code.
In the example above, the runtime can see that individual invocations of
isPrime() within the generator
isPrime(x) for x in xrange(10 * 1000 * 1000) are independent of each other can therefore
be run in parallel.
In fact, what the runtime does in this example is to split the
xrange() iteration across all available
cores in the cluster. If a particular subrange completes while others are still running, the runtime
dynamically subdivides a range that is still computing to maximize the utilization of CPUs.
This is something that is bound to happen in problems like this when time-complexity of a computation
is not constant across the entire input space (determining whether a large number is prime is much
harder than a small number).
Not all python code can be converted to pyfora bitcode and run in this way. In order to benefit form the performance and scalability advantages of pyfora, your code must be:
- Side-effectless: data structures cannot be mutated.
- Deterministic: running with the same input must always yield the same result.
See Pure Python for more details.
Working with proxies¶
In the previous example, the result of the computation was the number of prime numbers in the specified
range. That’s a single
int that can be easily downloaded from the cluster and copied into
the local python environment.
Now consider this variation of the code:
with executor.remotely.remoteAll(): primes = [x for x in xrange(10 * 1000 * 1000) if isPrime(x)]
This time the result of the computation, the variable
list of all prime numbers in the range. But because we used
primes is a proxy to a list of primes that lives in-memory on the
cluster (it is actually an instance of
There are two things you can do with remote python objects:
- Download them into the local python scope where they become regular python objects.
- Use them in subsequent remote computations on the cluster.
primes = primes.toLocal().result()
This call downloads all the data in the remote
primes list from the cluster to the
client machine where it is converted back into python. If the list is very large, or the connection
to the cluster is slow, this can be a slow operation.
Furthermore, the size of the list may be greater than the amount of memory available on the local
machine, in which case it is impossible to download it this way.
As an alternative to downloading the entire result, we may choose to compute with it inside of
with executor.remotely block. For example:
with executor.remotely.downloadAll(): lastFewPrimes = primes[-100:]
The backend recognizes that
primes is a proxy to data that lives remotely in the cluster,
and lets us refer to it in dependent computations, the result of which we then return as regular
For convenience, it also possible to write:
with executor.remotely.downloadSmall(bytecount=100*1024): ...
In this case, objects larger than
bytecount are left in the cluster and returned as proxies,
while smaller objects are downloaded and copied into the local scope.
The pyfora runtime supports a restricted, “purely functional” subset of python that we call “Pure Python”. By “purely functional” we mean code in which:
- All data-structures are immutable (e.g. no modification of lists).
- No operations have side-effects (e.g. no sockets, no
- All operations are deterministic - running them on the same input always yields the same result (e.g. no access to system time, amount of available memory, etc.)
These restrictions are essential to the kinds of reasoning that pyfora applies to your code. Some of these restrictions may be relaxed in the future under certain circumstances, but at this time the following constraints are enforced:
Objects are immutable (except for
selfwithin an object’s
__init__()). Expressions like
obj.x = 10are disallowed, as they would modify
obj. The exception to this rule is
__init__(), where assignments are used to provide initial values to object members.
Lists are immutable. Expressions like
a = 10won’t work, nor will
However, given a list
a, “appending” a value
xto it by producing a new list using
a + [x]results in effecient code without superflous copying of data.
Dictionaries are immutable. In the future, updates to dictionaries will be allowed in cases where pyfora can prove that there is exactly one reference to the dictionary. But for the moment dictionaries can only be constructed from iterators, as in:
dict((x, x**2) for x in xrange(100))
Also note that at the moment, dictionaries can be quite slow, so use them sparingly.
No augmented assignment. Expressions like
x += 10are disallowed since they modify
Regular variable assignments do work as expected. The following code, for example, is allowed:
x = x + 5 v = [x] v = v + 
Violation of Constraints¶
Whenever you invoke pyfora on a block of python code, the runtime attempts to give you either (a) the exact same answer you would have received had you run the same code in your python interpreter locally, or (b) an exception .
Constraint checking happens in two places. Some of the constraints are enforced at parse-time.
As soon as you enter a
with executor.remotely block, pyfora tries to determine all the code your
invocation can touch. If any of that code contains syntatic elements that pyfora knows are invalid
print() statements), it will generate an exception.
Other constraints are enforced at runtime. For instance, the append method of lists, when invoked in pyfora,
InvalidPyforaOperation exception that’s not catchable by python
code running inside of pyfora. This indicates that the program has attempted to execute semantics that
pyfora can’t faithfully reproduce.
|||Currently, the only intended exception to this rule is integer arithmetic:
on the occurrence of an integer arithmetic overflow, pyfora will give you the semantics of the
underyling hardware, whereas python will produce an object of type |