MPI Fox Algorithm

Implementation of the Fox algorithm for square matrix multiplication using MPI

Fox algorithm

The objective of this project is to show the execution time and RAM usage of the algorithm for different matrix sizes and number of processes.

The program will run the algorithm for all the matrix sizes \( 2^x\ \forall\ x \in [n, m] \) five times and will calculate the average execution time and RAM usage. The results will be used to generate a graph that will be saved in the charts folder.

Developed by

José Felipe Marcial Rojas. Github
Miguel Ángel Hernández Moreno. Github

TO DO

  • Add RAM measurement for Windows.
  • Create CSV files with the results.
  • Add a script to generate the charts from the CSV files.
  • Add charts to docs page.

References

How It Works

This chapter is a detailed explanation of how the algorithm works, its implementation in Python and its dependencies, how to measure the time and ram memory consumption, as well as its graphing.

Algorithm Explanation

Checkerboard

Most parallel matrix multiplication functions use a checkerboard distribution of the matrices. This means that the processes are viewed as a grid, and, rather than assigning entire rows or entire columns to each process, we assign small sub-matrices. For example, if we have four processes, we might assign the element of a 4x4 matrix as shown below, checkerboard mapping of a 4x4 matrix to four processes.

Fox Algorithm

Fox‘s algorithm is a one that distributes the matrix using a checkerboard scheme like the above.

In order to simplify the discussion, lets assume that the matrices have order \(n\), and the number of processes, \(p\), equals \(n^2\). Then a checkerboard mapping assigns \(a_{ij}\), \(b_{ij}\) , and \(c_{ij}\) to process (\(i\), \(j\)).

Fox‘s algorithm takes \(n\) stages for matrices of order n one stage for each term aik bkj in the dot product

\(C_{i,j} = a_{i,0}b_{0,j} + a_{i,1}b_{1,i} +. . . +a_{i,n−1}b_{n−1,j}\)

Initial stage, each process multiplies the diagonal entry of A in its process row by its element of B:

Stage \(0\) on process\((i, j)\): \(c_{i,j} = a_{i,i} b_{i,j}\)

Next stage, each process multiplies the element immediately to the right of the diagonal of A by the element of B directly beneath its own element of B:

Stage \(1\) on process\((i, j)\): \(c_{i,j} = a_{i,i+1} b_{i+1,j}\)

In general, during the kth stage, each process multiplies the element k columns to the right of the diagonal of A by the element \(k\) rows below its own element of B:

Stage \(k\) on process\((i, j)\): \(c_{i,j} = a_{i,i+k} b_{i+k,j}\)

Example 2x2 Fox's Algorithm

Consider multiplying 2x2 block matrices \(A\), \(B\), \(C\):

\[ \begin{aligned} A \times B &= C \\ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} &= \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \end{aligned} \]

Stage 0:

Process (i, i mod 2)Broadcast along row i
(0,0)\(a_{00}\)
(1,1)\(a_{11}\)

\[ \begin{bmatrix} a_{00}, b_{00} & a_{00}, b_{01}\\ a_{11}, b_{10} & a_{11}, b_{11}\\ \end{bmatrix} \]

Process (\(i\), \(j\)) computes:

\[ \begin{aligned} C_{\text{Stage 0}} = \begin{bmatrix} 1 \times 1 & 1 \times 1 \\ 1 \times 1 & 1 \times 1 \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \end{aligned} \]

Shift-rotate on the columns of \(B\)

Stage 1:

Process (\(i\), (\(i\) + 1) mod 2)Broadcast along row \(i\)
(0,1)\(a_{01}\)
(1,0)\(a_{10}\)

\[ \begin{matrix} a_{01}, b_{10} & a_{01}, b_{11}\\ a_{10}, b_{00} & a_{10}, b_{01}\\ \end{matrix} \]

Process (\(i\), \(j\)) computes:

\[ \begin{aligned} C_{\text{Stage 1}} = \begin{bmatrix} 1+1 \times 1 & 1+1 \times 1 \\ 1+1 \times 1 & 1+1 \times 1 \end{bmatrix} &= \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \end{aligned} \]

Fox in Secuencial

Sequential implementation of the Fox matrix multiplication algorithm. It initializes matrices, performs the multiplication, prints execution time and resource usage. The progress bar (tqdm) visualizes the iteration progress.

Imports

import os
import numpy as np
from sys import argv
from tqdm import tqdm
from time import perf_counter

We import necessary libraries, including os, numpy, sys, tqdm (for progress bar), and time.

Check Operating System and Define Constants

isLinux = os.name == 'posix'
if isLinux:
   from resource import getrusage, RUSAGE_SELF

Determines if the operating system is Linux and imports resource-related functions for measuring resource usage.

Matrix Multiplication Function:**

def fox_SEC(exponent: int, isInt: bool) -> float:

Defines a function fox_SEC that takes the exponent and a flag isInt as parameters and returns the execution time.

Matrix Initialization

MATRIX_SIZE = 2**exponent
matrix_A = ...
matrix_B = ...
matrix_C = np.zeros((MATRIX_SIZE, MATRIX_SIZE), dtype=int) if isInt else np.zeros((MATRIX_SIZE, MATRIX_SIZE))

Initializes square matrices (A), (B), and (C) of size (2^{\text{exponent}} \times 2^{\text{exponent}}) with random integers or floats based on the isInt flag.

Matrix Multiplication Loop

start_time = perf_counter()

for x in tqdm(range(MATRIX_SIZE)):
   for i in range(MATRIX_SIZE):
       y = (x + i) % MATRIX_SIZE
       matrix_C[i] += matrix_A[i, y] * matrix_B[y]

Performs matrix multiplication using nested loops. It calculates the product matrix (C) by iterating through rows and columns.

Print Results

print(perf_counter() - start_time)
print(getrusage(RUSAGE_SELF).ru_maxrss) if isLinux else print(0)

Prints the execution time and the RAM usage if the operating system is Linux.

Main Execution

if __name__ == '__main__':
   fox_SEC(int(argv[1]), bool(argv[2]))

Calls the fox_SEC function with command-line arguments for exponent and isInt flag if we want matrices with integers or float type number.

Fox MPI

We use MPI to parallelize the matrix multiplication of randomly generated squared matrices A and B, and it calculates the product matrix C. The matrices are distributed among processes, and each process computes a portion of the final result. The MPI Allreduce function is used to combine the partial results from all processes.

Imports

import os
import numpy as np
from sys import argv
from mpi4py import MPI
from time import perf_counter

The code imports necessary libraries, including os, numpy, sys, mpi4py, and time. mpi4py is used for MPI (Message Passing Interface) communication in parallel computing.

Check Operating System and Define Constants

isLinux = os.name == 'posix'

Determines if the operating system is Linux.

if isLinux:
   from resource import getrusage, RUSAGE_SELF

On Linux, it imports resource-related functions to measure resource usage.

Command Line Arguments

exponent = int(argv[1])
isInt = bool(argv[2])

Reads command line arguments: exponent and isInt. These are used to define the size of the matrices and the type of matrix elements.

MPI Initialization

comm = MPI.COMM_WORLD  # get the communicator object
size = comm.Get_size() # total number of processes
rank = comm.Get_rank() # rank of this process

Initializes MPI communication, retrieves the total number of processes, and the rank of the current process.

Matrix Initialization on Rank 0

if rank == 0:
   MATRIX_SIZE = 2**exponent
   # generate two random matrices of size MATRIX_SIZE
   # initialize the matrices A, B, and C
   # matrices A and B contain random integers or floats based on 'isInt'
   matrix_A = ...
   matrix_B = ...
   matrix_C = np.zeros((MATRIX_SIZE, MATRIX_SIZE), dtype=int) if isInt else np.zeros((MATRIX_SIZE, MATRIX_SIZE))
   data = (MATRIX_SIZE, matrix_A, matrix_B, matrix_C)
else:
   data = None

On the master process (rank 0), it initializes matrices A, B, and C, and packs the data into a tuple (data). Other processes receive None.

Broadcast Data to All Processes

data = comm.bcast(data, root=0)  # broadcast the data to all processes
MATRIX_SIZE, matrix_A, matrix_B, matrix_C = data  # unpack the data

All processes receive the data using MPI broadcast.

Matrix Multiplication Loop

start_time = perf_counter()
for x in range(MATRIX_SIZE):
   if rank == x % size:
       for i in range(MATRIX_SIZE):
           y = (x + i) % MATRIX_SIZE
           matrix_C[i] += matrix_A[i, y] * matrix_B[y]

Each process calculates a row of the matrix C. The rows are distributed among processes, and the multiplication is performed locally.

MPI Allreduce

comm.Allreduce(MPI.IN_PLACE, matrix_C, op=MPI.SUM)

MPI Allreduce function is used to sum the rows of matrix C calculated by each process.

Print Results

if rank == 0:
   print(perf_counter() - start_time)
   print(getrusage(RUSAGE_SELF).ru_maxrss) if isLinux else print(0)

The master process prints the execution time and the RAM memory usage if the operating system is Linux.

Time and Memory Measurement

This code measures the execution time and memory consumption using the time and resource modules. Let's break down how the time and memory measurements are implemented:

Time Measurement

The perf_counter function from the time module is used to measure the execution time.

start_time = perf_counter()
# ... (code to be measured)
print(perf_counter() - start_time)
  • start_time is recorded using perf_counter() before the code to be measured.
  • The execution time is calculated by subtracting start_time from the current time after the code has executed.
  • The result is printed, representing the time taken for the code to run.

Memory Consumption Measurement (RAM)

getrusage only works in linux/unix and FreeBSD operative systems based, therefore, this functionality is disabled in windows. For more compatibility information check the resource library docs.

The getrusage function from the resource module is used to measure the maximum resident set size, which is an indicator of memory consumption.

print(getrusage(RUSAGE_SELF).ru_maxrss) if isLinux else print(0)
  • getrusage(RUSAGE_SELF).ru_maxrss retrieves the maximum consumption of RAM used by the process.
  • This value is printed only if the operating system is Linux (isLinux is True); otherwise, it prints 0.

Charts Generator

We have used the matplotlib and manim library to represent the comparison of the execution times and memory usage of the Fox algorithm with matrices of either integers or floats.

Matplotlib

Calculate Mean Times and Memory Usage

times_MPI, times_SEC, memory_MPI, memory_SEC = data(min_e, max_e, isInt=isInt)

Calls the data function to obtain execution times and memory usage for sequential and parallel executions.

Prepare Data for Plotting

y_Time_MPI   = [np.mean(times_MPI [exponent]) for exponent in times_MPI ]
y_Time_SEC   = [np.mean(times_SEC [exponent]) for exponent in times_SEC ]
y_Memory_MPI = [np.mean(memory_MPI[exponent]) for exponent in memory_MPI]
y_Memory_SEC = [np.mean(memory_SEC[exponent]) for exponent in memory_SEC]

Calculates the mean times and memory usage for both MPI and sequential executions.

Configure Bar Chart Parameters

bar_w = 0.35
x_Time_MPI   = np.array([exponent - bar_w/2 for exponent in times_MPI])
x_Time_SEC   = np.array([exponent + bar_w/2 for exponent in times_SEC])
x_Memory_MPI = np.array([exponent - bar_w/2 for exponent in memory_MPI])
x_Memory_SEC = np.array([exponent + bar_w/2 for exponent in memory_SEC])

Sets parameters for the bar chart, including bar width and positions for both time and memory plots.

Create Matplotlib Figure

fig, (Time_ax, Memory_ax) = plt.subplots(1, 2, figsize=(20, 5)) if isLinux else plt.subplots(1, 1, figsize=(10, 5))

Creates a Matplotlib figure with one or two subplots, depending on whether the operating system is Linux.

Configure Plot Titles and Labels

fig.suptitle(f'Comparison of Fox Algorithm Executions with {num_type} numbers.\n{PROCESSOR} - {nthreads} Threads - {RAM}GB')
Time_ax.set_title('Sequential and Parallel Execution Time Comparison')
Time_ax.set_xlabel('Matrix Order')
Time_ax.set_ylabel('Execution Time (s)')

Sets titles and labels for the plots.

Plot Execution Time Bars

bars_Time_MPI = Time_ax.bar(x_Time_MPI, y_Time_MPI, bar_w, label='MPI', color='#EA75FA')
bars_Time_SEC = Time_ax.bar(x_Time_SEC, y_Time_SEC, bar_w, label='Sequential', color='#4590FA')

Plots bars for execution times for both MPI and sequential executions.

Plot Memory Usage Bars

In the background we use getrusage. This function only works in linux/unix and FreeBSD operative systems based, therefore, this functionality is disabled in windows. For more compatibility information check the resource library docs.

if isLinux:
   Memory_ax.set_title('Memory Usage (RAM) Comparison')
   Memory_ax.set_xlabel('Matrix Order')
   Memory_ax.set_ylabel('Memory Used (MB)')
   bars_Memory_MPI = Memory_ax.bar(x_Memory_MPI, y_Memory_MPI, bar_w, label='MPI', color='#EA75FA')
   bars_Memory_SEC = Memory_ax.bar(x_Memory_SEC, y_Memory_SEC, bar_w, label='Sequential', color='#4590FA')

Plots bars for memory usage for both MPI and sequential executions.

Display Labels and Legends

for bar in bars_Time_SEC + bars_Time_MPI:
   x_pos   = bar.get_x() + bar.get_width()/2.0
   y_value = bar.get_height()
   Time_ax.text(x_pos, y_value, f'{y_value:5.5}s', va='bottom', ha='center')
Time_ax.legend()

if isLinux:
   for bar in bars_Memory_SEC + bars_Memory_MPI:
       x_pos   = bar.get_x() + bar.get_width()/2.0
       y_value = bar.get_height()
       Memory_ax.text(x_pos, y_value, f'{(y_value/1024):5.5}MB', va='bottom', ha='center')
   Memory_ax.legend()

Displays labels and legends on the plots.

Save Plots

plt.tight_layout()
fig.savefig(create_dir('charts', f'{2**(min_e)}-{2**(max_e-1)}', 'int' if isInt else 'float'))

Saves the plots to a file.

Manim

This Manim script defines a scene that creates an animated bar chart using data from different matrix multiplication scenarios. The script loops over various parameters, generates animations, and saves the results in an organized directory structure.

Manim Scene Class

class chart(Scene):
   def construct(self):

Defines a Manim scene class named chart.

Configure Frame Dimensions

config["frame_width"] = 12
config["frame_height"] = 8

Sets the width and height of the animation frame.

Create Bar Chart

chart = BarChart(
   values = np.zeros(len(data)),
   y_range=[0, round(max_v + max_v/8,2), round(max_v/4,2)],
   x_length=16,
   y_length=6,
   bar_width=0.8,
   bar_names=[2**exp for exp in EXP],
   y_axis_config={"font_size": 24},
   x_axis_config={"font_size": 24},
).scale(0.7)

Creates a bar chart using Manim's BarChart class with specified configurations.

Create Axes Labels

x_label = chart.get_x_axis_label('Matrix Size', edge=DOWN, direction=DOWN, buff=0.05).scale(0.6)
y_l = 'Time (s)' if MEASURE == 'time_mean' else 'Memory (MB)'
y_label = chart.get_y_axis_label(y_l, edge=UP, direction=UP, buff=-1).scale(0.6).rotate(PI/2).next_to(chart, LEFT)

Adds labels for the x-axis and y-axis.

Create Group for Chart and Labels

gv = VGroup(chart, x_label, y_label).to_edge(DOWN)

Groups the chart and labels together.

Create Title and Architecture Information

Title = f'Comparison of RAM used by the Fox algorithm with {dt} numbers using the {meth} method' if MEASURE == 'memory_mean' else f'Comparison of execution time of the Fox algorithm with {dt} numbers using the {meth} method'
t_top = Tex(Title).scale(0.75).to_edge(UP)
architecture = Tex(f'{PROCESSOR} {THREADS} {RAM} DDR4 3200MHz', color='purple_a').scale(0.6).next_to(t_top, DOWN)

Creates a title and information about the system architecture.

Display Elements

self.play(Create(gv), Create(t_top), Create(architecture))

Displays the grouped chart, labels, title, and architecture information.

Animate Chart Bars

self.play(chart.animate.change_bar_values(data), run_time=1.5)

Animates the bars in the chart based on the data.

Display Bar Labels

self.play(Create(chart.get_bar_labels(font_size=30, color=WHITE)))

Displays labels for the bars in the chart.

Save and Display Animation

self.wait(3)

Waits for 3 seconds before completing the animation.

Main Execution Loop

if __name__ == '__main__':
    # Loop over different parameters (DTYPE, METHOD, EXP) and generate animations for each case.

Uses nested loops to iterate over different data parameters and create animations for each case.

Move Generated Files

shutil.move('manim/media/images/chart.png', f'{path}/{file}.png')
shutil.move('manim/media/videos/720p30/chart.mp4', f'{path}/{file}.mp4')

Moves the generated image and video files to a specified directory.

Results

AMD Ryzen 5 3600

Characteristics

  • 6 Core Processor 12 Threads
  • 16GB RAM DDR3
  • Windows 11

Float

Float Sequential Time

Float MPI Time

exponenttime_meanmemory_meant1t2t3t4t5m1m2m3m4m5
6 MPI0.00305518000095617030.00.0034430999949108810.0035917000059271230.0026457999920239670.00240620000113267450.00318910001078620550.00.00.00.00.0
7 MPI0.0121475800027837970.00.0087016000034054740.0074114000017289070.0116362000117078420.0126347000041278080.0203539999929489570.00.00.00.00.0
8 MPI0.047850700002163650.00.063282799994340170.068042900005821140.044276300002820790.0316117000038502740.0320398000039858740.00.00.00.00.0
9 MPI0.150076639998587780.00.140424199998960830.149269699992146340.178173799999058250.139485599996987730.143029900005785750.00.00.00.00.0
10 MPI0.72852932000241710.00.70565750000241680.73111819999758150.72296020000067070.7416932000051020.74121750000631440.00.00.00.00.0
11 MPI4.2200994399987390.04.4612885999958964.1565965000045254.1576512999890844.1546249000093674.1703358999948250.00.00.00.00.0
12 MPI30.018965879999450.029.8808261999947730.0879026999900830.04735210000944830.12382789999537629.9549205000075740.00.00.00.00.0
13 MPI236.67505233999690.0237.7994949999993235.69214719999582236.10693409999658236.1462773999956237.630407999997260.00.00.00.00.0
6 SEC0.029381759997340850.00.0295653999928617850.0285342999995918940.0288250000012340020.0300177999888546760.0299663000041618940.00.00.00.00.0
7 SEC0.06880043999699410.00.060923900004127060.0610880000022007150.073839099990436810.064722199997049760.083428999991156160.00.00.00.00.0
8 SEC0.23220374000084120.00.27957880000758450.220144799997797240.23787890000676270.211738600002718160.211677599989343430.00.00.00.00.0
9 SEC0.91525499999988820.00.9196588999911910.91549749999830960.91782800000510180.90526720001071230.91802339999412650.00.00.00.00.0
10 SEC4.2414400199981180.04.1964372999937044.211148000002144.2525618999934524.3470263000053824.2000265999959080.00.00.00.00.0
11 SEC22.6003117999993250.022.4882649999926822.7749230000044922.64343920000828822.54217909999715622.5527526999940160.00.00.00.00.0
12 SEC133.526420859998320.0133.1521122999984133.94261509999342133.40880069999548133.72657189999882133.40200430000550.00.00.00.00.0
13 SEC867.60510434000170.0861.1038434999937863.3430384000094882.7370500000106865.974721999999864.86686779999580.00.00.00.00.0

Integer

Integer Sequential Time

Integer MPI Time

exponenttime_meanmemory_meant1t2t3t4t5m1m2m3m4m5
6 MPI0.0026893000002019110.00.00237280000146711250.00313109999842708930.0031932999991113320.0023044999979902060.0024448000040138140.00.00.00.00.0
7 MPI0.0081290800022543410.00.0073570000022300520.0070352000038838010.0107852000001003030.0073534000039217060.0081146000011358410.00.00.00.00.0
8 MPI0.0290781600007903770.00.035054100000706970.024487700000463520.028022300000884570.0286545999988447870.0291721000030520370.00.00.00.00.0
9 MPI0.143465039998409340.00.135133299998415170.15674719999515220.13235970000096130.145035499997902660.148049499999615360.00.00.00.00.0
10 MPI0.71030645999999250.00.70790919999853940.73687069999868980.70450279999931810.71581860000151210.68643100000190320.00.00.00.00.0
11 MPI4.1973816199999420.04.14381149999826454.21077299999888064.1552248000007244.2933555000054184.1837432999964220.00.00.00.00.0
12 MPI29.8978940400018480.030.0669922000015530.01674579999962629.88980450000235629.8393082000038729.6766195000018340.00.00.00.00.0
13 MPI235.111447040000340.0236.3504993000024234.47165250000398233.7182845999996233.5796835000001237.437115299995640.00.00.00.00.0
6 SEC0.0292828800011193380.00.0307144000034895730.0291313999987323770.0293040000033215620.0286360000027343630.0286285999973188150.00.00.00.00.0
7 SEC0.0603334999992512150.00.060738000000128520.059981599995808210.060192199998709840.0604928000029758550.060262899998633660.00.00.00.00.0
8 SEC0.204144600003201060.00.201372000003175340.20146080000267830.203067500006000050.20439890000125160.210423800002899950.00.00.00.00.0
9 SEC0.89347962000174450.00.89701330000389130.89097250000486380.89462569999886910.89273020000109680.89205640000000130.00.00.00.00.0
10 SEC4.0997622199996840.04.09905539999454054.084894999999964.1083941000033524.10301729999628154.1034493000042860.00.00.00.00.0
11 SEC22.426186499999310.022.35521049999806622.4143879000039322.4038996000017522.53602589999354722.4214085999992680.00.00.00.00.0
12 SEC133.97278387999830.0134.05730229999608133.69151380000403134.34823799999867134.0702519999977133.696613299995080.00.00.00.00.0
13 SEC868.04116364000220.0864.2142595000041865.9321351000035870.7643005000064871.6217934000015867.67332969999550.00.00.00.00.0

AMD Ryzen 3 5300U with Radeon Graphics

Characteristics

  • 4 Threads
  • 8 GB RAM
  • Ubuntu 20.01 LTS

Float

Float Sequential Time

Float Sequential Memory

Float MPI Time

Float MPI Memory

exponenttime_meanmemory_meant1t2t3t4t5m1m2m3m4m5
6 MPI0.00688241399984690444827.20.006160767999972450.0075712679999924150.0091208499998174370.0057883129993570040.00577087100009521244516.044976.044856.044984.044804.0
7 MPI0.02621681200034845505.60.02353088900053990.030572395000490360.0302958709999074930.0232326150007793330.02345229000002291345340.045380.045916.045408.045484.0
8 MPI0.0710334415998659148777.60.06442596699980640.074588970000149860.064091004000147220.082473239999671930.0695880269995541348832.048752.048840.048740.048724.0
9 MPI0.3151936476000628560998.40.30904532899967310.31053867399987210.3232623320000130.3280399640007090.30508193900004760808.061292.061040.061132.060720.0
10 MPI1.9657097786001032110150.41.9566007160001391.9724600889994691.9677149290000671.9639821250002571.9677910340005837110216.0110316.0109880.0110424.0109916.0
11 MPI15.086349821199837306739.215.09783160899951315.0621904830004515.08103473299979715.1194803359994715.071211944999959306568.0306676.0306928.0306700.0306824.0
12 MPI118.096014119399841093108.0117.49792914299996117.55301655800031117.4915157659998120.29619238499981117.641416744999331093068.01093104.01093056.01093292.01093020.0
6 SEC0.02014490380006464134108.00.0201554820005185320.020167664999462430.020308080999711820.01965200900031050.020441282000319916134108.0134108.0134108.0134108.0134108.0
7 SEC0.050219269600347616134108.00.049544436000360290.049978838000242830.0510603160000755450.050306920000366520.05020583800069289134108.0134108.0134108.0134108.0134108.0
8 SEC0.17468092960007198134108.00.17341186999965430.173800725999171850.17398194900033560.175968498000656840.1762416050005413134108.0134108.0134108.0134108.0134108.0
9 SEC0.8327643823999097134108.00.82146007899973480.83393793100003680.83753140099997840.84238352400006990.8285089769997285134108.0134108.0134108.0134108.0134108.0
10 SEC4.5236622611999335134108.04.5045095990008124.536391652000024.5177550449998314.5367356729993844.522919336999621134108.0134108.0134108.0134108.0134108.0
11 SEC25.990362968200134134108.026.09920662700005826.007437245000826.02876390300025425.856019145999225.960387920000358134108.0134108.0134108.0134108.0134108.0
12 SEC168.52421300860004426997.6168.4287555890005167.8469278450002168.67190678299994169.53767954900013168.1357952769995426988.0426992.0426944.0426948.0427116.0

Integer

Integer Sequential Time

Integer Sequential Memory

Integer MPI Time

Integer MPI Memory

exponenttime_meanmemory_meant1t2t3t4t5m1m2m3m4m5
6 MPI0.005505195800014944852.00.00315173999979370.00573806300008070.00574684000002880.00400651700010710.00888281900006444136.044744.045052.045140.045188.0
7 MPI0.027289842000118345420.80.02315024700010320.02341501000000780.02921513100000080.03484493700034360.025823885000136245544.044824.045760.045392.045584.0
8 MPI0.073961638800028648340.00.07291026100028830.08112366199975440.08582476900028260.06329018499991430.066659316999903248452.048448.047740.048440.048620.0
9 MPI0.323901389999991860753.60.33502739999994440.32625286499978750.31300125000007030.33365519500011940.311570240000037360692.060584.060768.060932.060792.0
10 MPI1.9734662450000544109984.81.97455656800002481.9738934460001481.97081187599997091.9697811229998481.97828821200028109884.0110200.0109584.0110072.0110184.0
11 MPI15.107547805800005306612.815.1171799120002115.07320491699965515.1107372450001115.11220300800005115.124413946999994306548.0306824.0306568.0306516.0306608.0
12 MPI117.229145200400131093093.6117.3144714117.3090739270001117.34079946900056117.31188550499974116.869495701000251093016.01092912.01093332.01093060.01093148.0
6 SEC0.019605919799960290328.00.01897243699977480.01967908100004930.0203757969998150.01940697300005920.019595311000102890328.090328.090328.090328.090328.0
7 SEC0.049217039999984990328.00.04837420100011510.04938016599999170.04930457099999330.04966163900007810.049364622999746690328.090328.090328.090328.090328.0
8 SEC0.174486473600063690328.00.17725510200034470.17333942599998410.17357615199989590.17391980399997920.17434188400011490328.090328.090328.090328.090328.0
9 SEC0.831870109600185990328.00.83061390000011670.84018698500040050.83812962299998620.82367606200023150.826743978000195190328.090328.090328.090328.090328.0
10 SEC4.52091578340005190328.04.5182786250002214.5152608059997874.5177351090001144.4886141040001354.564690272999996590328.090328.090328.090328.090328.0
11 SEC26.019327671799875131990.425.92774936399973725.8835637150000425.88751162600010526.45281803299985825.944995620999634131952.0131952.0131952.0132144.0131952.0
12 SEC169.48952086539992426860.8167.03305509500024169.1147969169997168.76519542999995176.81053264399998165.72402424099982426904.0426864.0426868.0426860.0426808.0

Installation

This chapter explains how to test this implementation on both unix/linux and Windows based systems.

Linux Installation

For this installation we used:

  • Arch Linux 2023.11.1 and Ubuntu 20.04 LTS
  • Python 3.11.5
  • pip 23.3.1

First, clone the github repository:

git clone https://github.com/miguehm/fox-algorithm.git && cd fox-algorithm

or alternatively, download the zip file and extract it:

Then, create a virtual environment

python3 -m venv venv

activate the virtual environment

source ./venv/bin/activate # for linux

and install requirements

pip install -r requirements.txt

Windows installation

For this installation we used:

  • Windows 11 Nov 2023 update
  • Python 3.11.5
  • pip 23.3.1

You need to have MPI installed, you can download the installer from the microsoft official page.

First, clone the github repository:

git clone https://github.com/miguehm/fox-algorithm.git && cd fox-algorithm

or alternatively, download the zip file and extract it:

Then, create a virtual environment

python -m venv venv

activate the virtual environment

.\venv\Scripts\Activate

and install requirements

pip install -r requirements.txt

Usage

WARNING: The program will execute the algorithm for all the matrix sizes \(2^x\ \forall\ x \in [n, m]\). For values of \(n\) and \(m\) greater than 11, the execution time will be very long. For example, for \(n=11\) and \(m=12\), the execution time will be approximately 1 hour.

python3 is for linux/unix OS based, if you have windows use python instead.

Help

python3 src/main.py --help

Output

Usage: main.py [OPTIONS]                        
                                                 
╭─ Options ─────────────────────────────────────╮
│ --help          Show this message and exit.   │
╰───────────────────────────────────────────────╯
╭─ Matrix order range ──────────────────────────╮
│ --min-exp        INTEGER  Exponent of base 2  │
│                           matrix order (2^)   │
│                           [default: 6]        │
│ --max-exp        INTEGER  Exponent of base 2  │
│                           matrix order (2^)   │
│                           [default: 13]       │
╰───────────────────────────────────────────────╯
╭─ MPI options ─────────────────────────────────╮
│ --threads        INTEGER  Number of cores to  │
│                           use                 │
│                           [default: 4]        │
╰───────────────────────────────────────────────╯

Default execution

python3 src/main.py

The execution will be done for the matrix of size \(2^6\) with 4 threads.

Multiple matrix multiplication by range

python3 src/main.py --min-exp 6 --max-exp 13

The charts with the results will be created in the charts folder.

Specify threads number

python3 src/main.py --threads 4

You can find out the number of threads available on your computer using top command.