This project implements a middleware that works as a message switch on the AWS Lambda platform. Serverless function instances simply send messages tagged with the receiver's id to an input interface; the receiver, another serverless function instance, then receives those messages automatically.
```python
import json
from typing import Any, Callable, List

import softwareforwarder


def lambda_handler(event, context):
    @softwareforwarder.SoftwareForwarder(url=event['server_url'])
    def user_function(event, context):
        send: Callable = event['sen']        # injected by the forwarder
        receive: Callable = event['rec']     # injected by the forwarder
        uid: int = event['uid']              # this instance's id
        peers: List[int] = event['peers']    # ids of peer instances

        # send a message to another instance
        message: Any = ...                   # any picklable object
        peer_id: int = ...
        send(peer_id, message)

        # receive a message
        message = receive()
        ...

    user_function(event, context)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
    }
```
A simple distributed PageRank task is deployed as a demonstration of this project.
First, an immense graph is partitioned into small sub-tasks by leveraging METIS. The sub-tasks are then sent to different workers, and each worker takes responsibility for its own sub-task. During the PageRank calculation, the necessary states are exchanged among the workers in each iteration.
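A minimal sketch of the partitioning step, assuming a graph given as an edge list. A simple modulo assignment stands in for METIS here, and the helper names are illustrative, not the project's actual API:

```python
from typing import Dict, List, Tuple


def partition(vertices: List[int], num_workers: int) -> Dict[int, int]:
    """Assign each vertex to a worker. A real deployment would call
    metis.part_graph here; modulo assignment is only a stand-in."""
    return {v: v % num_workers for v in vertices}


def build_subtasks(edges: List[Tuple[int, int]], num_workers: int) -> List[dict]:
    """Split a graph given as (src, dst) edges into per-worker sub-tasks.
    Each sub-task records the worker's own vertices, the incoming edges
    its vertices depend on, and the out-degree of every source vertex."""
    vertices = sorted({v for e in edges for v in e})
    owner = partition(vertices, num_workers)
    subtasks = [{'vertices': set(), 'in_edges': [], 'out_degree': {}}
                for _ in range(num_workers)]
    for v in vertices:
        subtasks[owner[v]]['vertices'].add(v)
    for src, dst in edges:
        w = owner[dst]  # whoever owns dst needs this edge and src's out-degree
        subtasks[w]['in_edges'].append((src, dst))
        subtasks[w]['out_degree'][src] = sum(1 for s, _ in edges if s == src)
    return subtasks
```

Each sub-task is self-contained, so it can be serialized and handed to an independent Lambda worker.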
If we look at the PageRank equation (ignoring damping here to make the calculation simpler):

    PR_{n+1}(m) = Σ_{v ∈ In(m)} PR_n(v) / OutDeg(v)

where PR_n(m) is the PageRank of vertex m in iteration n, and In(m) is the set of any vertex v such that the edge (v, m) exists.

All the states a worker X responsible for vertex m needs to know are In(m), OutDeg(v), and PR_n(v) for every v in In(m). In(m) and OutDeg(v) are constant, but PR_n(v) will be updated in each iteration. Worse, worker X is probably not responsible for the vertices in In(m), so the workers that own them have to send the calculated PR_n(v) back to worker X. E.g. worker A sends PR_n(a) to worker B and worker C sends PR_n(c) to worker B in iteration n. As long as worker B has collected PR_n(a) and PR_n(c), worker B can start iteration n+1.
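The exchange described above can be sketched as follows. The three-vertex graph, the worker layout, and the phase split are made up for illustration, and in-process queues stand in for the send/receive callables the forwarder injects:

```python
import queue

# Toy setup (illustrative only): graph 0 -> 1 -> 2 -> 0,
# worker 0 owns vertices {0, 2}, worker 1 owns vertex {1}.
inboxes = {0: queue.Queue(), 1: queue.Queue()}


def send(peer_id, message):
    inboxes[peer_id].put(message)


def receive(worker_id):
    return inboxes[worker_id].get()


owned = {0: {0, 2}, 1: {1}}                 # worker id -> vertices it owns
owner = {0: 0, 1: 1, 2: 0}                  # vertex -> responsible worker
in_neighbors = {0: [2], 1: [0], 2: [1]}     # In(m), constant per sub-task
out_degree = {0: 1, 1: 1, 2: 1}             # OutDeg(v), constant per sub-task


def send_phase(worker_id, pr_local):
    """Ship the ranks that other workers' vertices depend on."""
    for peer in owned:
        if peer == worker_id:
            continue
        needed = {v for m in owned[peer] for v in in_neighbors[m]
                  if owner[v] == worker_id}
        if needed:
            send(peer, {v: pr_local[v] for v in needed})


def compute_phase(worker_id, pr_local):
    """Collect the remote ranks this worker depends on, then apply
    PR_{n+1}(m) = sum of PR_n(v) / OutDeg(v) over v in In(m)."""
    remote = {owner[v] for m in owned[worker_id]
              for v in in_neighbors[m]} - {worker_id}
    known = dict(pr_local)
    for _ in remote:
        known.update(receive(worker_id))
    return {m: sum(known[v] / out_degree[v] for v in in_neighbors[m])
            for m in owned[worker_id]}


# Interleave the two workers by hand (real instances run concurrently):
pr0, pr1 = {0: 1 / 3, 2: 1 / 3}, {1: 1 / 3}
send_phase(0, pr0)               # iteration n: ship PR_n values out ...
send_phase(1, pr1)
pr0 = compute_phase(0, pr0)      # ... and start iteration n+1 once all
pr1 = compute_phase(1, pr1)      # required remote ranks have arrived
```

On a cycle graph the ranks stay at 1/3, which makes the exchange easy to check by hand.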
preprocess.py, invoker.py and server.py can be deployed on different EC2 instances; client.py and softwareforwarder.py should be deployed to the same AWS Lambda function.
- Download data from the Stanford Large Network Dataset Collection
- Put preprocess.py, invoker.py and server.py on an EC2 instance with sufficient resources, and check the configuration. Create an S3 bucket.
- Deploy client.py and softwareforwarder.py to AWS Lambda
- Run preprocess.py to generate the dataset from the raw data downloaded from the Stanford Large Network Dataset Collection; if preprocess.py cannot parse the raw data, adjust it. The generated dataset will be uploaded to S3.
- Run server.py
- Run invoker.py
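Assuming the layout above, the run order on the EC2 side might look like this (the script names come from this repo; everything else, such as backgrounding the server, is a guess about a typical session):

```shell
# on the EC2 instance, after configuring AWS credentials and the S3 bucket
python3 preprocess.py      # partition the raw graph and upload sub-tasks to S3
python3 server.py &        # start the forwarding server first
python3 invoker.py         # then invoke the Lambda instances running client.py
```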