Friday, November 13, 2015

Building a Compute Cluster with the Beagle Bone Black

With a small size, extremely low power consumption and great Linux support, ARM-based boards are great for developing small projects. The original BeagleBoard was launched in 2008, developed by Texas Instruments as an open source computer. It featured a 720 MHz Cortex-A8 ARM chip and 256MB of memory. The BeagleBoard-xM and BeagleBone were released in subsequent years, leading to the BeagleBone Black as the most recent release.
Setting up the Cluster
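You will need:
2x BeagleBone Black boards (each connected to a host machine)
2x Ethernet cables
On each board, change the IP address with ifconfig, then edit /etc/hostname and place the new hostname in the file. The hostname command applies the new name immediately: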
hostname beaglebone1

Then map every node's name to its address in /etc/hosts on each board:
127.0.0.1     localhost
192.168.1.51  beaglebone1   # example address
192.168.1.52  beaglebone2   # example address
Creating a Compute Cluster With MPI
MPI (Message Passing Interface) is a standardized system for passing messages between machines on a network. It is powerful in that it distributes programs across nodes so each instance has access to the local memory of its machine, and it is supported by several languages such as C, Python and Java. There are many implementations of MPI available; MPICH is one of them. Execute the following commands as root:
sudo apt-get update
sudo apt-get install gcc
sudo apt-get install libcr-dev mpich2 mpich2-doc
MPI uses SSH to communicate between nodes and a shared folder to share data. The first step is to set up NFS.
Install the NFS server on the first board (BBB1):   # apt-get install nfs-server
Install the NFS client on the second board (BBB2):  # apt-get install nfs-client
Create a directory to be used for MPI on each BBB:  # mkdir /hpcuser
Export the folder from the master node by adding it to /etc/exports (restart the NFS server or run exportfs -a afterwards so the change takes effect):
# echo "/hpcuser *(rw,sync)" | sudo tee -a /etc/exports
Mount the master node's folder on the slave so it can see any files that are added on the master node:
# mount beaglebone1:/hpcuser /hpcuser
Create the hpcuser user and assign it the shared folder as its home directory:
# useradd -d /hpcuser hpcuser
Generate a key to use for the SSH communication:
# su - hpcuser
# ssh-keygen -t rsa
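Before moving on, it can help to confirm that the slave really sees /hpcuser over NFS. The sketch below is one possible check, not part of the original setup: it scans text in the /proc/mounts layout for NFS entries. The helper name nfs_mounts and the sample text are illustrative assumptions; on a real node you would pass in the contents of /proc/mounts.

```python
def nfs_mounts(mounts_text):
    """Return {mount_point: device} for every NFS entry in /proc/mounts-style text."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each entry reads: device mount_point fstype options dump pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            found[fields[1]] = fields[0]
    return found


# Sample text in the /proc/mounts layout (made up for illustration).
# On a slave node, use: open("/proc/mounts").read()
sample = (
    "rootfs / rootfs rw 0 0\n"
    "beaglebone1:/hpcuser /hpcuser nfs4 rw,relatime 0 0\n"
)

mounts = nfs_mounts(sample)
print(mounts.get("/hpcuser", "not mounted"))
```

If the mount point is missing from the result, the mount command above has not taken effect on that node.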
Testing MPI
Once the machines can successfully connect to each other, write a simple program on the master node.
Log in as hpcuser.
Create a simple program in its home directory /hpcuser and call it mpi1.c. MPI needs the program to exist in the shared folder so it can run on each machine.
The program below simply displays the index number of the current process, the total number of processes running and the name of the host of the current process. Finally, the main node receives a sum of all the process indexes from the other nodes and displays it:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* gethostname() */
int main(int argc, char* argv[])
{
    int rank, size, total;
    char hostname[1024];
    gethostname(hostname, 1023);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    printf("Testing MPI index %d of %d on hostname %s\n", rank, size, hostname);
    if (rank==0)
    {
        printf("Process sum is %d\n", total);
    }
    MPI_Finalize();
    return 0;
}

Create a file called machines.txt in the same directory and place the names of the nodes in the cluster inside, one per line. This file tells MPI where it should run:
beaglebone1
beaglebone2
Compile the program using mpicc and then test it:

mpicc mpi1.c -o mpiprogram
mpiexec -n 8 -f machines.txt ./mpiprogram
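To know what to expect from the run, the reduction can be worked out by hand: MPI_Reduce with MPI_SUM adds the ranks 0 to 7. The sketch below computes that value in plain Python (it does not use MPI). The round_robin helper is an assumption about how eight processes might be spread over the two hosts; the real placement depends on the process manager and the machine file, and the order of the printed lines in a real run is not fixed.

```python
def rank_sum(num_procs):
    """Value MPI_Reduce with MPI_SUM delivers to rank 0: 0 + 1 + ... + (n - 1)."""
    return sum(range(num_procs))


def round_robin(hosts, num_procs):
    """Assumed placement: rank r runs on hosts[r % len(hosts)]."""
    return {rank: hosts[rank % len(hosts)] for rank in range(num_procs)}


hosts = ["beaglebone1", "beaglebone2"]   # contents of machines.txt
placement = round_robin(hosts, 8)

for rank, host in placement.items():
    print("Testing MPI index %d of %d on hostname %s" % (rank, 8, host))
print("Process sum is %d" % rank_sum(8))   # 0+1+...+7 = 28
```

A process sum other than 28 with -n 8 would suggest that not every process took part in the reduction.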


TITLE : INTERFACING OF STEPPER MOTOR.
OBJECTIVE :
Study of stepper motor interface with BBB.
S/W AND H/W REQUIREMENT :
BBB, stepper motor driving circuitry and stepper motor host.
REFERENCES :
       www.beagleboard.org

THEORY :
A unique type of motor useful for moving things in small increments is a stepper motor. Stepper motors rotate or step from one fixed position to the next. Stepper motors are used in dot matrix printers, floppy disk drives (to position the read and write head over the desired track) and to move the pen around on x-y plotters.
Common step sizes for stepper motors range from 0.9 to 30 degrees. A stepper motor is stepped from one position to the next by changing the currents through the fields in the motor. The two common field connections are referred to as two phase and four phase. The table below gives the switch sequence for successive steps:

Step    Switch
        SW4   SW3   SW2   SW1
1       0     0     1     1
2       1     0     0     1
3       1     1     0     0
4       0     1     1     0
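Each row of the table is the previous row rotated right by one bit, so the whole sequence can be generated rather than stored. The following is a minimal sketch of that idea; the function name full_step_sequence is an illustrative choice and the code only prints the patterns, it does not drive any hardware.

```python
def full_step_sequence(start=0b0011, steps=4):
    """Rotate a 4-bit pattern (SW4 SW3 SW2 SW1) right by one bit per step."""
    pattern, sequence = start, []
    for _ in range(steps):
        sequence.append(pattern)
        # Move bit 0 (SW1) up to bit 3 (SW4) and shift the other bits down
        pattern = ((pattern >> 1) | ((pattern & 1) << 3)) & 0b1111
    return sequence


for step, pattern in enumerate(full_step_sequence(), start=1):
    print(step, format(pattern, "04b"))
```

Walking the same list in reverse order would be expected to turn the motor the other way.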


STEPS TO DO :
The stepper motor PIO is connected to the BBB board. Connections are made using 4 GPIOs on header P9.
       Send a rotation pattern for the stepper motor.
       Observe the rotation of the motor.

Interfacing diagram




Program in Python to rotate the stepper motor
import Adafruit_BBIO.GPIO as GPIO
import time

# The four GPIO pins on header P9 that drive the motor coils
PINS = ["P9_11", "P9_12", "P9_13", "P9_14"]

for pin in PINS:
    GPIO.setup(pin, GPIO.OUT)

try:
    while True:
        # Energise one coil at a time, in order, to step the motor
        for active in PINS:
            for pin in PINS:
                GPIO.output(pin, GPIO.HIGH if pin == active else GPIO.LOW)
            time.sleep(0.25)
except KeyboardInterrupt:
    pass
finally:
    # Release the pins once the program is stopped with Ctrl+C
    GPIO.cleanup()
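The 0.25 second delay in the program sets how fast the motor turns, since one step is issued per delay. The sketch below works out the resulting speed. The 7.5 degree step angle is an assumption chosen for illustration; the real value depends on the motor used and should be taken from its datasheet.

```python
def seconds_per_revolution(step_angle_deg, step_delay_s):
    """Time for one full turn when one step is issued every step_delay_s seconds."""
    steps_per_rev = 360.0 / step_angle_deg
    return steps_per_rev * step_delay_s


STEP_ANGLE = 7.5   # degrees per step; assumed, check the motor datasheet
DELAY = 0.25       # seconds, as used in the program above

period = seconds_per_revolution(STEP_ANGLE, DELAY)
print("%.1f s per revolution, %.2f rpm" % (period, 60.0 / period))
```

Under these assumptions the motor needs 48 steps and 12 seconds per turn, which is 5 rpm; a shorter delay gives a faster rotation until the motor can no longer keep up.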

A file holds a data structure that is written and modified by a number of users in a distributed manner. Multiple users on multiple computers use a Read-Modify-Write cycle provided the resource is available, else modify once before exit. Write the necessary program using OpenCL.

PROGRAM

b13.c

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>
#define SRC_SIZE (0x100000)
     
int main()
{
    cl_device_id device_id = NULL;
    cl_context context = NULL;
    cl_command_queue command_queue = NULL;
    cl_program program = NULL;
    cl_kernel kernel = NULL;
    cl_platform_id platform_id = NULL;
    cl_uint num_devices, num_platforms;
    cl_mem memobj1 = NULL, memobj2 = NULL, memobj3 = NULL;
    size_t g=16, l=1;
     
    FILE *fp = fopen("./b13.cl", "r");
    if(!fp) 
    {
        printf("Failed to load kernel.\n");
        exit(1);
    }
    char * src = (char*)malloc(SRC_SIZE);
    size_t src_size = fread(src, 1, SRC_SIZE, fp);
    fclose(fp);
    
    clGetPlatformIDs(1, &platform_id, &num_platforms);
    clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &num_devices);
    context = clCreateContext(NULL, 1, &device_id, NULL, NULL, NULL); 
    command_queue = clCreateCommandQueue(context, device_id, 0, NULL); 
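    memobj1 = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(int), NULL, NULL);
    memobj2 = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(int), NULL, NULL);
    memobj3 = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(int), NULL, NULL);
    /* Start with the semaphore free, x = 0 and the lock flag clear;
       the contents of a new buffer are otherwise undefined. */
    int zero = 0;
    clEnqueueWriteBuffer(command_queue, memobj1, CL_TRUE, 0, sizeof(int), &zero, 0, NULL, NULL);
    clEnqueueWriteBuffer(command_queue, memobj2, CL_TRUE, 0, sizeof(int), &zero, 0, NULL, NULL);
    clEnqueueWriteBuffer(command_queue, memobj3, CL_TRUE, 0, sizeof(int), &zero, 0, NULL, NULL);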
    
    program = clCreateProgramWithSource(context, 1, (const char **)&src, (const size_t *)&src_size, NULL); 
    clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
    kernel = clCreateKernel(program, "b13", NULL);   
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobj1);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobj2);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&memobj3); 
    clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &g, &l, 0, NULL, NULL);
      
    clFlush(command_queue);
    clFinish(command_queue);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseMemObject(memobj1);
    clReleaseMemObject(memobj2);
    clReleaseMemObject(memobj3);
    clReleaseCommandQueue(command_queue);
    clReleaseContext(context);
    free(src);
    return 0;

}

b13.cl

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_global_int32_extended_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_extended_atomics : enable

void GetSem(__global int * sem) 
{
    int occupied = atom_xchg(sem, 1);
    while(occupied > 0)
        occupied = atom_xchg(sem, 1);
}

void ReleaseSem(__global int * sem)
{
    int prevVal = atom_xchg(sem, 0);
}

__kernel void b13(__global int * sem, __global int * x, __global int * lock)
{
    int i = get_global_id(0);
    if(i%2 == 0)
    {
        GetSem(&sem[0]);
        *x = i;
        *lock = 1;
        printf("Kernel %d setting value of x: %d\n", i, *x);
        ReleaseSem(&sem[0]);
    }
    else
    {
        while((*lock)!=1)
            printf("Kernel %d waiting for first write\n",i);
        GetSem(&sem[0]);
        printf("Kernel %d reading value of x: %d\n", i, *x); 
        ReleaseSem(&sem[0]);  
    }
}
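The GetSem and ReleaseSem pair in the kernel guards a read-modify-write cycle so that only one work-item touches the shared value at a time. The same pattern can be sketched on the host side with Python threads; this is an analogue for illustration only, not OpenCL, and the names shared and read_modify_write are made up for the example.

```python
import threading

lock = threading.Lock()      # plays the role of the sem[0] flag in the kernel
shared = {"x": 0}            # plays the role of the shared data structure


def read_modify_write(times):
    for _ in range(times):
        with lock:                      # GetSem ... ReleaseSem
            value = shared["x"]         # read
            shared["x"] = value + 1     # modify and write back


workers = [threading.Thread(target=read_modify_write, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(shared["x"])   # 4 workers x 1000 updates = 4000
```

Because every update happens while the lock is held, no increment is lost, which is the guarantee the semaphore is meant to give the kernel.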

A text file is stored in a distributed manner on three hard disks on three machines such that consecutive lines, one per hard disk, are stored in a cyclic manner. Write a program using OpenCL to read/write/modify the file.
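The cyclic layout boils down to two integer operations: the remainder of the line number picks the disk, and the quotient gives the position inside that disk's file. The sketch below shows the mapping in plain Python. Counting lines from 0 and the helper names locate and stripe are assumptions made for illustration; the file names match those used in the program that follows.

```python
FILES = ["file1.txt", "file2.txt", "file3.txt"]   # one file per disk


def locate(line_number):
    """Map a global line number (0-based) to (file name, line index in that file)."""
    return FILES[line_number % len(FILES)], line_number // len(FILES)


def stripe(lines):
    """Deal the lines out to the three files in cyclic order."""
    parts = {name: [] for name in FILES}
    for number, text in enumerate(lines):
        name, _ = locate(number)
        parts[name].append(text)
    return parts


parts = stripe(["a", "b", "c", "d", "e"])
print(parts)
print(locate(4))   # fifth line: second file, second position
```

The same split appears in the program: the host selects a file with line % 3 and the kernel skips (*l) / 3 lines inside it.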

PROGRAM

b12.c

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>
#include <string.h>
#define SRC_SIZE (0x100000)

int total_lines()
{
    FILE * fp;
    char s[50];
    int line = 0;
    fp = fopen("file1.txt", "r");
    while(!feof(fp))
    {
        fgets(s, 50, fp);
        line++;
    }
    fclose(fp);
    fp = fopen("file2.txt", "r");
    while(!feof(fp))
    {
        fgets(s, 50, fp);
        line++;
    }
    fclose(fp);
    fp = fopen("file3.txt", "r");
    while(!feof(fp))
    {
        fgets(s, 50, fp);
        line++;
    }
    line  -= 6;
    fclose(fp);
    return line;
}

int main()
{
    cl_device_id device_id = NULL;
    cl_context context = NULL;
    cl_command_queue command_queue = NULL;
    cl_program program = NULL;
    cl_kernel kernel = NULL;
    cl_platform_id platform_id = NULL;
    cl_uint num_devices, num_platforms;
    cl_mem buff1 = NULL, buff2 = NULL, x = NULL, l = NULL;
    size_t gl = 3, lo = 1;
     
    FILE *fp = fopen("./b12.cl", "r");
    if(!fp) 
    {
        printf("Failed to load kernel.\n");
        exit(1);
    }
    char * src = (char*)malloc(SRC_SIZE);
    size_t src_size = fread(src, 1, SRC_SIZE, fp);
    fclose(fp);
    
    clGetPlatformIDs(1, &platform_id, &num_platforms);
    clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &num_devices);
    context = clCreateContext(NULL, 1, &device_id, NULL, NULL, NULL); 
    command_queue = clCreateCommandQueue(context, device_id, 0, NULL); 
    
    buff1 = clCreateBuffer(context, CL_MEM_READ_WRITE, 50*sizeof(char), NULL, NULL);
    buff2 = clCreateBuffer(context, CL_MEM_READ_WRITE, 50*sizeof(char), NULL, NULL);
    x = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(int), NULL, NULL);
    l = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(int), NULL, NULL);
    
    program = clCreateProgramWithSource(context, 1, (const char **)&src, (const size_t *)&src_size, NULL); 
    clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
    kernel = clCreateKernel(program, "b12", NULL);
   
    while(1)
    {
        int choice = 0, line = 0;
        char s[50];
         
        printf("1. Read\t2. Write\t3. Exit\n");
        scanf("%d", &choice);

        switch(choice)
        {
            case 1:
                printf("Enter line number: ");
                scanf("%d", &line);
                if (line > total_lines())
                {
                    printf("Invalid line number\n");
                    continue;    
                }
                break;
                
            case 2:
                printf("Enter line to write: ");
                scanf("%49s", s);   /* limit input to the size of s */
                clEnqueueWriteBuffer(command_queue, buff2, CL_TRUE, 0, 50, s, 0, NULL, NULL);
                line = total_lines();
                break;
            case 3:
                exit(0);
                
            default:
                printf("Wrong option!\n");
                continue;
        }   
        /* Pick the file that holds this line: lines are striped across three files. */
        const char *names[3] = {"file1.txt", "file2.txt", "file3.txt"};
        fp = fopen(names[line % 3], "r");
        if (!fp)
        {
            printf("Failed to open %s\n", names[line % 3]);
            continue;
        }
        size_t n = fread(s, 1, sizeof(s) - 1, fp);
        s[n] = '\0';   /* fread does not terminate the buffer */
        fclose(fp);
        /* Copy the file contents and the terminator, which the kernel scans for. */
        clEnqueueWriteBuffer(command_queue, buff1, CL_TRUE, 0, n + 1, s, 0, NULL, NULL);
        
        clEnqueueWriteBuffer(command_queue, x, CL_TRUE, 0, sizeof(int), &choice, 0, NULL, NULL);
        clEnqueueWriteBuffer(command_queue, l, CL_TRUE, 0, sizeof(int), &line, 0, NULL, NULL);
        clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&buff1);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&buff2);
        clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&x);
        clSetKernelArg(kernel, 3, sizeof(cl_mem), (void *)&l);
        clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &gl, &lo, 0, NULL, NULL);
        
        switch(choice)
        {
            case 1:
                clEnqueueReadBuffer(command_queue, buff2, CL_TRUE, 0, 50, s, 0, NULL, NULL);
                printf("%s\n", s);
                break;
                
            case 2:
                line++;
                clEnqueueReadBuffer(command_queue, buff1, CL_TRUE, 0, 50, s, 0, NULL, NULL);
                switch(line%3)
                {
                case 0:
                    fp = fopen("file3.txt", "w");
                    fwrite(s, 1, strlen(s), fp);
                    fclose(fp);
                    break;
                
                case 1:
                    fp = fopen("file1.txt", "w");
                    fwrite(s, 1, strlen(s), fp);
                    fclose(fp);
                    break;
            
                case 2:
                    fp = fopen("file2.txt", "w");
                    fwrite(s, 1, strlen(s), fp);
                    fclose(fp);          
                    break;
                }
                break;
        }   
    }  
    
    clFlush(command_queue);
    clFinish(command_queue);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseMemObject(buff1);
    clReleaseMemObject(buff2);
    clReleaseMemObject(x);
    clReleaseMemObject(l);
    clReleaseCommandQueue(command_queue);
    clReleaseContext(context);
    free(src);
    return 0;

}

b12.cl

void readline(__global char * buff1, __global char * buff2, __global int * l)
{
    int i=0, j=0, k=0, line = (*l)/3;
    while(j < line)
    {
        if(buff1[i] == '\n')
            j++;
        i++;
    }
    while(buff1[i] != '\n' && buff1[i] != '\0')
    {
        buff2[k] = buff1[i];
        i++;
        k++;
    }
    buff2[k] = '\0';   /* terminate the copied line so the host can print it */
}

void writeline(__global char * buff1, __global char * buff2,__global int *l)
{
    int i=0, j=0;
    while(buff1[i]!='\0')
        i++;
    do
    {
        buff1[i] = buff2[j];
        i++;
        j++;
    }while(buff2[j]!='\0');
    buff1[i] = '\0';   /* terminate the result so the host can measure its length */
}

__kernel void b12(__global char * buff1, __global char * buff2, __global int * x, __global int * l)
{
    int i = get_global_id(0);
    if(i != (*l)%3)
         return;
    if(*x == 1)
        readline(buff1, buff2, &l[0]);
    else if(*x == 2)
        writeline(buff1, buff2,&l[0]);
}

Perform a suitable assignment using Xen Hypervisor or equivalent open source to configure it. Give necessary GUI.

To install KVM on Fedora:
yum install kvm
yum install virt-manager libvirt libvirt-python python-virtinst
su -c "yum install @v...