MPI Fox Algorithm

Implementation of the Fox algorithm for square matrix multiplication using MPI

Fox algorithm

The objective of this project is to show the execution time and RAM usage of the algorithm for different matrix sizes and number of processes.

The program will run the algorithm for all the matrix sizes \( 2^x\ \forall\ x \in [n, m] \) five times and will calculate the average execution time and RAM usage. The results will be used to generate a graph that will be saved in the charts folder.

Developed by

José Felipe Marcial Rojas. Github
Miguel Ángel Hernández Moreno. Github

TO DO

Add RAM measurement for Windows.
Create CSV files with the results.
Add a script to generate the charts from the CSV files.
Add charts to docs page.

References

Dean, N. (2017). Matrix Multiplication. https://www.cs.csi.cuny.edu/~gu/teaching/courses/csc76010/slides/Matrix%20Multiplication%20by%20Nur.pdf
mpi4py. (2023). https://mpi4py.readthedocs.io/en/stable/
Matplotlib (2023) https://matplotlib.org/
Manim (2023) https://docs.manim.community/en/stable/

How It Works

This chapter is a detailed explanation of how the algorithm works, its implementation in Python and its dependencies, how to measure the time and ram memory consumption, as well as its graphing.

Algorithm Explanation

Checkerboard

Most parallel matrix multiplication functions use a checkerboard distribution of the matrices. This means that the processes are viewed as a grid, and, rather than assigning entire rows or entire columns to each process, we assign small sub-matrices. For example, if we have four processes, we might assign the element of a 4x4 matrix as shown below, checkerboard mapping of a 4x4 matrix to four processes.

Fox Algorithm

Fox‘s algorithm is a one that distributes the matrix using a checkerboard scheme like the above.

In order to simplify the discussion, lets assume that the matrices have order \(n\), and the number of processes, \(p\), equals \(n^2\). Then a checkerboard mapping assigns \(a_{ij}\), \(b_{ij}\) , and \(c_{ij}\) to process (\(i\), \(j\)).

Fox‘s algorithm takes \(n\) stages for matrices of order n one stage for each term aik bkj in the dot product

\(C_{i,j} = a_{i,0}b_{0,j} + a_{i,1}b_{1,i} +. . . +a_{i,n−1}b_{n−1,j}\)

Initial stage, each process multiplies the diagonal entry of A in its process row by its element of B:

Stage \(0\) on process\((i, j)\): \(c_{i,j} = a_{i,i} b_{i,j}\)

Next stage, each process multiplies the element immediately to the right of the diagonal of A by the element of B directly beneath its own element of B:

Stage \(1\) on process\((i, j)\): \(c_{i,j} = a_{i,i+1} b_{i+1,j}\)

In general, during the kth stage, each process multiplies the element k columns to the right of the diagonal of A by the element \(k\) rows below its own element of B:

Stage \(k\) on process\((i, j)\): \(c_{i,j} = a_{i,i+k} b_{i+k,j}\)

Example 2x2 Fox's Algorithm

Consider multiplying 2x2 block matrices \(A\), \(B\), \(C\):

\[ \begin{aligned} A \times B &= C \\ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} &= \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \end{aligned} \]

Stage 0:

Process (i, i mod 2)	Broadcast along row i
(0,0)	\(a_{00}\)
(1,1)	\(a_{11}\)

\[ \begin{bmatrix} a_{00}, b_{00} & a_{00}, b_{01}\\ a_{11}, b_{10} & a_{11}, b_{11}\\ \end{bmatrix} \]

Process (\(i\), \(j\)) computes:

\[ \begin{aligned} C_{\text{Stage 0}} = \begin{bmatrix} 1 \times 1 & 1 \times 1 \\ 1 \times 1 & 1 \times 1 \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \end{aligned} \]

Shift-rotate on the columns of \(B\)

Stage 1:

Process (\(i\), (\(i\) + 1) mod 2)	Broadcast along row \(i\)
(0,1)	\(a_{01}\)
(1,0)	\(a_{10}\)

\[ \begin{matrix} a_{01}, b_{10} & a_{01}, b_{11}\\ a_{10}, b_{00} & a_{10}, b_{01}\\ \end{matrix} \]

Process (\(i\), \(j\)) computes:

\[ \begin{aligned} C_{\text{Stage 1}} = \begin{bmatrix} 1+1 \times 1 & 1+1 \times 1 \\ 1+1 \times 1 & 1+1 \times 1 \end{bmatrix} &= \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \end{aligned} \]

Fox in Secuencial

Sequential implementation of the Fox matrix multiplication algorithm. It initializes matrices, performs the multiplication, prints execution time and resource usage. The progress bar (tqdm) visualizes the iteration progress.

Imports

import os
import numpy as np
from sys import argv
from tqdm import tqdm
from time import perf_counter

We import necessary libraries, including os, numpy, sys, tqdm (for progress bar), and time.

Check Operating System and Define Constants

isLinux = os.name == 'posix'
if isLinux:
   from resource import getrusage, RUSAGE_SELF

Determines if the operating system is Linux and imports resource-related functions for measuring resource usage.

Matrix Multiplication Function:**

def fox_SEC(exponent: int, isInt: bool) -> float:

Defines a function fox_SEC that takes the exponent and a flag isInt as parameters and returns the execution time.

Matrix Initialization

MATRIX_SIZE = 2**exponent
matrix_A = ...
matrix_B = ...
matrix_C = np.zeros((MATRIX_SIZE, MATRIX_SIZE), dtype=int) if isInt else np.zeros((MATRIX_SIZE, MATRIX_SIZE))

Initializes square matrices (A), (B), and (C) of size (2^{\text{exponent}} \times 2^{\text{exponent}}) with random integers or floats based on the isInt flag.

Matrix Multiplication Loop

start_time = perf_counter()

for x in tqdm(range(MATRIX_SIZE)):
   for i in range(MATRIX_SIZE):
       y = (x + i) % MATRIX_SIZE
       matrix_C[i] += matrix_A[i, y] * matrix_B[y]

Performs matrix multiplication using nested loops. It calculates the product matrix (C) by iterating through rows and columns.

Print Results

print(perf_counter() - start_time)
print(getrusage(RUSAGE_SELF).ru_maxrss) if isLinux else print(0)

Prints the execution time and the RAM usage if the operating system is Linux.

Main Execution

if __name__ == '__main__':
   fox_SEC(int(argv[1]), bool(argv[2]))

Calls the fox_SEC function with command-line arguments for exponent and isInt flag if we want matrices with integers or float type number.

Fox MPI

We use MPI to parallelize the matrix multiplication of randomly generated squared matrices A and B, and it calculates the product matrix C. The matrices are distributed among processes, and each process computes a portion of the final result. The MPI Allreduce function is used to combine the partial results from all processes.

Imports

import os
import numpy as np
from sys import argv
from mpi4py import MPI
from time import perf_counter

The code imports necessary libraries, including os, numpy, sys, mpi4py, and time. mpi4py is used for MPI (Message Passing Interface) communication in parallel computing.

Check Operating System and Define Constants

isLinux = os.name == 'posix'

Determines if the operating system is Linux.

if isLinux:
   from resource import getrusage, RUSAGE_SELF

On Linux, it imports resource-related functions to measure resource usage.

Command Line Arguments

exponent = int(argv[1])
isInt = bool(argv[2])

Reads command line arguments: exponent and isInt. These are used to define the size of the matrices and the type of matrix elements.

MPI Initialization

comm = MPI.COMM_WORLD  # get the communicator object
size = comm.Get_size() # total number of processes
rank = comm.Get_rank() # rank of this process

Initializes MPI communication, retrieves the total number of processes, and the rank of the current process.

Matrix Initialization on Rank 0

if rank == 0:
   MATRIX_SIZE = 2**exponent
   # generate two random matrices of size MATRIX_SIZE
   # initialize the matrices A, B, and C
   # matrices A and B contain random integers or floats based on 'isInt'
   matrix_A = ...
   matrix_B = ...
   matrix_C = np.zeros((MATRIX_SIZE, MATRIX_SIZE), dtype=int) if isInt else np.zeros((MATRIX_SIZE, MATRIX_SIZE))
   data = (MATRIX_SIZE, matrix_A, matrix_B, matrix_C)
else:
   data = None

On the master process (rank 0), it initializes matrices A, B, and C, and packs the data into a tuple (data). Other processes receive None.

Broadcast Data to All Processes

data = comm.bcast(data, root=0)  # broadcast the data to all processes
MATRIX_SIZE, matrix_A, matrix_B, matrix_C = data  # unpack the data

All processes receive the data using MPI broadcast.

Matrix Multiplication Loop

start_time = perf_counter()
for x in range(MATRIX_SIZE):
   if rank == x % size:
       for i in range(MATRIX_SIZE):
           y = (x + i) % MATRIX_SIZE
           matrix_C[i] += matrix_A[i, y] * matrix_B[y]

Each process calculates a row of the matrix C. The rows are distributed among processes, and the multiplication is performed locally.

MPI Allreduce

comm.Allreduce(MPI.IN_PLACE, matrix_C, op=MPI.SUM)

MPI Allreduce function is used to sum the rows of matrix C calculated by each process.

Print Results

if rank == 0:
   print(perf_counter() - start_time)
   print(getrusage(RUSAGE_SELF).ru_maxrss) if isLinux else print(0)

The master process prints the execution time and the RAM memory usage if the operating system is Linux.

Time and Memory Measurement

This code measures the execution time and memory consumption using the time and resource modules. Let's break down how the time and memory measurements are implemented:

Time Measurement

The perf_counter function from the time module is used to measure the execution time.

start_time = perf_counter()
# ... (code to be measured)
print(perf_counter() - start_time)

start_time is recorded using perf_counter() before the code to be measured.
The execution time is calculated by subtracting start_time from the current time after the code has executed.
The result is printed, representing the time taken for the code to run.

Memory Consumption Measurement (RAM)

getrusage only works in linux/unix and FreeBSD operative systems based, therefore, this functionality is disabled in windows. For more compatibility information check the resource library docs.

The getrusage function from the resource module is used to measure the maximum resident set size, which is an indicator of memory consumption.

print(getrusage(RUSAGE_SELF).ru_maxrss) if isLinux else print(0)

getrusage(RUSAGE_SELF).ru_maxrss retrieves the maximum consumption of RAM used by the process.
This value is printed only if the operating system is Linux (isLinux is True); otherwise, it prints 0.

Charts Generator

We have used the matplotlib and manim library to represent the comparison of the execution times and memory usage of the Fox algorithm with matrices of either integers or floats.

Matplotlib

Calculate Mean Times and Memory Usage

times_MPI, times_SEC, memory_MPI, memory_SEC = data(min_e, max_e, isInt=isInt)

Calls the data function to obtain execution times and memory usage for sequential and parallel executions.

Prepare Data for Plotting

y_Time_MPI   = [np.mean(times_MPI [exponent]) for exponent in times_MPI ]
y_Time_SEC   = [np.mean(times_SEC [exponent]) for exponent in times_SEC ]
y_Memory_MPI = [np.mean(memory_MPI[exponent]) for exponent in memory_MPI]
y_Memory_SEC = [np.mean(memory_SEC[exponent]) for exponent in memory_SEC]

Calculates the mean times and memory usage for both MPI and sequential executions.

Configure Bar Chart Parameters

bar_w = 0.35
x_Time_MPI   = np.array([exponent - bar_w/2 for exponent in times_MPI])
x_Time_SEC   = np.array([exponent + bar_w/2 for exponent in times_SEC])
x_Memory_MPI = np.array([exponent - bar_w/2 for exponent in memory_MPI])
x_Memory_SEC = np.array([exponent + bar_w/2 for exponent in memory_SEC])

Sets parameters for the bar chart, including bar width and positions for both time and memory plots.

Create Matplotlib Figure

fig, (Time_ax, Memory_ax) = plt.subplots(1, 2, figsize=(20, 5)) if isLinux else plt.subplots(1, 1, figsize=(10, 5))

Creates a Matplotlib figure with one or two subplots, depending on whether the operating system is Linux.

Configure Plot Titles and Labels

fig.suptitle(f'Comparison of Fox Algorithm Executions with {num_type} numbers.\n{PROCESSOR} - {nthreads} Threads - {RAM}GB')
Time_ax.set_title('Sequential and Parallel Execution Time Comparison')
Time_ax.set_xlabel('Matrix Order')
Time_ax.set_ylabel('Execution Time (s)')

Sets titles and labels for the plots.

Plot Execution Time Bars

bars_Time_MPI = Time_ax.bar(x_Time_MPI, y_Time_MPI, bar_w, label='MPI', color='#EA75FA')
bars_Time_SEC = Time_ax.bar(x_Time_SEC, y_Time_SEC, bar_w, label='Sequential', color='#4590FA')

Plots bars for execution times for both MPI and sequential executions.

Plot Memory Usage Bars

In the background we use getrusage. This function only works in linux/unix and FreeBSD operative systems based, therefore, this functionality is disabled in windows. For more compatibility information check the resource library docs.

if isLinux:
   Memory_ax.set_title('Memory Usage (RAM) Comparison')
   Memory_ax.set_xlabel('Matrix Order')
   Memory_ax.set_ylabel('Memory Used (MB)')
   bars_Memory_MPI = Memory_ax.bar(x_Memory_MPI, y_Memory_MPI, bar_w, label='MPI', color='#EA75FA')
   bars_Memory_SEC = Memory_ax.bar(x_Memory_SEC, y_Memory_SEC, bar_w, label='Sequential', color='#4590FA')

Plots bars for memory usage for both MPI and sequential executions.

Display Labels and Legends

for bar in bars_Time_SEC + bars_Time_MPI:
   x_pos   = bar.get_x() + bar.get_width()/2.0
   y_value = bar.get_height()
   Time_ax.text(x_pos, y_value, f'{y_value:5.5}s', va='bottom', ha='center')
Time_ax.legend()

if isLinux:
   for bar in bars_Memory_SEC + bars_Memory_MPI:
       x_pos   = bar.get_x() + bar.get_width()/2.0
       y_value = bar.get_height()
       Memory_ax.text(x_pos, y_value, f'{(y_value/1024):5.5}MB', va='bottom', ha='center')
   Memory_ax.legend()

Displays labels and legends on the plots.

Save Plots

plt.tight_layout()
fig.savefig(create_dir('charts', f'{2**(min_e)}-{2**(max_e-1)}', 'int' if isInt else 'float'))

Saves the plots to a file.

Manim

This Manim script defines a scene that creates an animated bar chart using data from different matrix multiplication scenarios. The script loops over various parameters, generates animations, and saves the results in an organized directory structure.

Manim Scene Class

class chart(Scene):
   def construct(self):

Defines a Manim scene class named chart.

Configure Frame Dimensions

config["frame_width"] = 12
config["frame_height"] = 8

Sets the width and height of the animation frame.

Create Bar Chart

chart = BarChart(
   values = np.zeros(len(data)),
   y_range=[0, round(max_v + max_v/8,2), round(max_v/4,2)],
   x_length=16,
   y_length=6,
   bar_width=0.8,
   bar_names=[2**exp for exp in EXP],
   y_axis_config={"font_size": 24},
   x_axis_config={"font_size": 24},
).scale(0.7)

Creates a bar chart using Manim's BarChart class with specified configurations.

Create Axes Labels

x_label = chart.get_x_axis_label('Matrix Size', edge=DOWN, direction=DOWN, buff=0.05).scale(0.6)
y_l = 'Time (s)' if MEASURE == 'time_mean' else 'Memory (MB)'
y_label = chart.get_y_axis_label(y_l, edge=UP, direction=UP, buff=-1).scale(0.6).rotate(PI/2).next_to(chart, LEFT)

Adds labels for the x-axis and y-axis.

Create Group for Chart and Labels

gv = VGroup(chart, x_label, y_label).to_edge(DOWN)

Groups the chart and labels together.

Create Title and Architecture Information

Title = f'Comparison of RAM used by the Fox algorithm with {dt} numbers using the {meth} method' if MEASURE == 'memory_mean' else f'Comparison of execution time of the Fox algorithm with {dt} numbers using the {meth} method'
t_top = Tex(Title).scale(0.75).to_edge(UP)
architecture = Tex(f'{PROCESSOR} {THREADS} {RAM} DDR4 3200MHz', color='purple_a').scale(0.6).next_to(t_top, DOWN)

Creates a title and information about the system architecture.

Display Elements

self.play(Create(gv), Create(t_top), Create(architecture))

Displays the grouped chart, labels, title, and architecture information.

Animate Chart Bars

self.play(chart.animate.change_bar_values(data), run_time=1.5)

Animates the bars in the chart based on the data.

Display Bar Labels

self.play(Create(chart.get_bar_labels(font_size=30, color=WHITE)))

Displays labels for the bars in the chart.

Save and Display Animation

self.wait(3)

Waits for 3 seconds before completing the animation.

Main Execution Loop

if __name__ == '__main__':
    # Loop over different parameters (DTYPE, METHOD, EXP) and generate animations for each case.

Uses nested loops to iterate over different data parameters and create animations for each case.

Move Generated Files

shutil.move('manim/media/images/chart.png', f'{path}/{file}.png')
shutil.move('manim/media/videos/720p30/chart.mp4', f'{path}/{file}.mp4')

Moves the generated image and video files to a specified directory.

Results

AMD Ryzen 5 3600

Characteristics

6 Core Processor 12 Threads
16GB RAM DDR3
Windows 11

exponent	time_mean	t1	t2	t3	t4	t5
6 MPI	0.0030551800009561703	0.003443099994910881	0.003591700005927123	0.002645799992023967	0.0024062000011326745	0.0031891000107862055
7 MPI	0.012147580002783797	0.008701600003405474	0.007411400001728907	0.011636200011707842	0.012634700004127808	0.020353999992948957
8 MPI	0.04785070000216365	0.06328279999434017	0.06804290000582114	0.04427630000282079	0.031611700003850274	0.032039800003985874
9 MPI	0.15007663999858778	0.14042419999896083	0.14926969999214634	0.17817379999905825	0.13948559999698773	0.14302990000578575
10 MPI	0.7285293200024171	0.7056575000024168	0.7311181999975815	0.7229602000006707	0.741693200005102	0.7412175000063144
11 MPI	4.220099439998739	4.461288599995896	4.156596500004525	4.157651299989084	4.154624900009367	4.170335899994825
12 MPI	30.01896587999945	29.88082619999477	30.08790269999008	30.047352100009448	30.123827899995376	29.954920500007574
13 MPI	236.6750523399969	237.7994949999993	235.69214719999582	236.10693409999658	236.1462773999956	237.63040799999726
6 SEC	0.02938175999734085	0.029565399992861785	0.028534299999591894	0.028825000001234002	0.030017799988854676	0.029966300004161894
7 SEC	0.0688004399969941	0.06092390000412706	0.061088000002200715	0.07383909999043681	0.06472219999704976	0.08342899999115616
8 SEC	0.2322037400008412	0.2795788000075845	0.22014479999779724	0.2378789000067627	0.21173860000271816	0.21167759998934343
9 SEC	0.9152549999998882	0.919658899991191	0.9154974999983096	0.9178280000051018	0.9052672000107123	0.9180233999941265
10 SEC	4.241440019998118	4.196437299993704	4.21114800000214	4.252561899993452	4.347026300005382	4.200026599995908
11 SEC	22.600311799999325	22.48826499999268	22.77492300000449	22.643439200008288	22.542179099997156	22.552752699994016
12 SEC	133.52642085999832	133.1521122999984	133.94261509999342	133.40880069999548	133.72657189999882	133.4020043000055
13 SEC	867.6051043400017	861.1038434999937	863.3430384000094	882.7370500000106	865.974721999999	864.8668677999958

exponent	time_mean	t1	t2	t3	t4	t5
6 MPI	0.002689300000201911	0.0023728000014671125	0.0031310999984270893	0.003193299999111332	0.002304499997990206	0.002444800004013814
7 MPI	0.008129080002254341	0.007357000002230052	0.007035200003883801	0.010785200000100303	0.007353400003921706	0.008114600001135841
8 MPI	0.029078160000790377	0.03505410000070697	0.02448770000046352	0.02802230000088457	0.028654599998844787	0.029172100003052037
9 MPI	0.14346503999840934	0.13513329999841517	0.1567471999951522	0.1323597000009613	0.14503549999790266	0.14804949999961536
10 MPI	0.7103064599999925	0.7079091999985394	0.7368706999986898	0.7045027999993181	0.7158186000015121	0.6864310000019032
11 MPI	4.197381619999942	4.1438114999982645	4.2107729999988806	4.155224800000724	4.293355500005418	4.183743299996422
12 MPI	29.897894040001848	30.06699220000155	30.016745799999626	29.889804500002356	29.83930820000387	29.676619500001834
13 MPI	235.11144704000034	236.3504993000024	234.47165250000398	233.7182845999996	233.5796835000001	237.43711529999564
6 SEC	0.029282880001119338	0.030714400003489573	0.029131399998732377	0.029304000003321562	0.028636000002734363	0.028628599997318815
7 SEC	0.060333499999251215	0.06073800000012852	0.05998159999580821	0.06019219999870984	0.060492800002975855	0.06026289999863366
8 SEC	0.20414460000320106	0.20137200000317534	0.2014608000026783	0.20306750000600005	0.2043989000012516	0.21042380000289995
9 SEC	0.8934796200017445	0.8970133000038913	0.8909725000048638	0.8946256999988691	0.8927302000010968	0.8920564000000013
10 SEC	4.099762219999684	4.0990553999945405	4.08489499999996	4.108394100003352	4.1030172999962815	4.103449300004286
11 SEC	22.42618649999931	22.355210499998066	22.41438790000393	22.40389960000175	22.536025899993547	22.421408599999268
12 SEC	133.9727838799983	134.05730229999608	133.69151380000403	134.34823799999867	134.0702519999977	133.69661329999508
13 SEC	868.0411636400022	864.2142595000041	865.9321351000035	870.7643005000064	871.6217934000015	867.6733296999955

AMD Ryzen 3 5300U with Radeon Graphics

Characteristics

4 Threads
8 GB RAM
Ubuntu 20.01 LTS

exponent	time_mean	memory_mean	t1	t2	t3	t4	t5	m1	m2	m3	m4	m5
6 MPI	0.006882413999846904	44827.2	0.00616076799997245	0.007571267999992415	0.009120849999817437	0.005788312999357004	0.005770871000095212	44516.0	44976.0	44856.0	44984.0	44804.0
7 MPI	0.026216812000348	45505.6	0.0235308890005399	0.03057239500049036	0.030295870999907493	0.023232615000779333	0.023452290000022913	45340.0	45380.0	45916.0	45408.0	45484.0
8 MPI	0.07103344159986591	48777.6	0.0644259669998064	0.07458897000014986	0.06409100400014722	0.08247323999967193	0.06958802699955413	48832.0	48752.0	48840.0	48740.0	48724.0
9 MPI	0.31519364760006285	60998.4	0.3090453289996731	0.3105386739998721	0.323262332000013	0.328039964000709	0.305081939000047	60808.0	61292.0	61040.0	61132.0	60720.0
10 MPI	1.9657097786001032	110150.4	1.956600716000139	1.972460088999469	1.967714929000067	1.963982125000257	1.9677910340005837	110216.0	110316.0	109880.0	110424.0	109916.0
11 MPI	15.086349821199837	306739.2	15.097831608999513	15.06219048300045	15.081034732999797	15.11948033599947	15.071211944999959	306568.0	306676.0	306928.0	306700.0	306824.0
12 MPI	118.09601411939984	1093108.0	117.49792914299996	117.55301655800031	117.4915157659998	120.29619238499981	117.64141674499933	1093068.0	1093104.0	1093056.0	1093292.0	1093020.0
6 SEC	0.02014490380006464	134108.0	0.020155482000518532	0.02016766499946243	0.02030808099971182	0.0196520090003105	0.020441282000319916	134108.0	134108.0	134108.0	134108.0	134108.0
7 SEC	0.050219269600347616	134108.0	0.04954443600036029	0.04997883800024283	0.051060316000075545	0.05030692000036652	0.05020583800069289	134108.0	134108.0	134108.0	134108.0	134108.0
8 SEC	0.17468092960007198	134108.0	0.1734118699996543	0.17380072599917185	0.1739819490003356	0.17596849800065684	0.1762416050005413	134108.0	134108.0	134108.0	134108.0	134108.0
9 SEC	0.8327643823999097	134108.0	0.8214600789997348	0.8339379310000368	0.8375314009999784	0.8423835240000699	0.8285089769997285	134108.0	134108.0	134108.0	134108.0	134108.0
10 SEC	4.5236622611999335	134108.0	4.504509599000812	4.53639165200002	4.517755044999831	4.536735672999384	4.522919336999621	134108.0	134108.0	134108.0	134108.0	134108.0
11 SEC	25.990362968200134	134108.0	26.099206627000058	26.0074372450008	26.028763903000254	25.8560191459992	25.960387920000358	134108.0	134108.0	134108.0	134108.0	134108.0
12 SEC	168.52421300860004	426997.6	168.4287555890005	167.8469278450002	168.67190678299994	169.53767954900013	168.1357952769995	426988.0	426992.0	426944.0	426948.0	427116.0

exponent	time_mean	memory_mean	t1	t2	t3	t4	t5	m1	m2	m3	m4	m5
6 MPI	0.0055051958000149	44852.0	0.0031517399997937	0.0057380630000807	0.0057468400000288	0.0040065170001071	0.008882819000064	44136.0	44744.0	45052.0	45140.0	45188.0
7 MPI	0.0272898420001183	45420.8	0.0231502470001032	0.0234150100000078	0.0292151310000008	0.0348449370003436	0.0258238850001362	45544.0	44824.0	45760.0	45392.0	45584.0
8 MPI	0.0739616388000286	48340.0	0.0729102610002883	0.0811236619997544	0.0858247690002826	0.0632901849999143	0.0666593169999032	48452.0	48448.0	47740.0	48440.0	48620.0
9 MPI	0.3239013899999918	60753.6	0.3350273999999444	0.3262528649997875	0.3130012500000703	0.3336551950001194	0.3115702400000373	60692.0	60584.0	60768.0	60932.0	60792.0
10 MPI	1.9734662450000544	109984.8	1.9745565680000248	1.973893446000148	1.9708118759999709	1.969781122999848	1.97828821200028	109884.0	110200.0	109584.0	110072.0	110184.0
11 MPI	15.107547805800005	306612.8	15.11717991200021	15.073204916999655	15.11073724500011	15.112203008000051	15.124413946999994	306548.0	306824.0	306568.0	306516.0	306608.0
12 MPI	117.22914520040013	1093093.6	117.3144714	117.3090739270001	117.34079946900056	117.31188550499974	116.86949570100025	1093016.0	1092912.0	1093332.0	1093060.0	1093148.0
6 SEC	0.0196059197999602	90328.0	0.0189724369997748	0.0196790810000493	0.020375796999815	0.0194069730000592	0.0195953110001028	90328.0	90328.0	90328.0	90328.0	90328.0
7 SEC	0.0492170399999849	90328.0	0.0483742010001151	0.0493801659999917	0.0493045709999933	0.0496616390000781	0.0493646229997466	90328.0	90328.0	90328.0	90328.0	90328.0
8 SEC	0.1744864736000636	90328.0	0.1772551020003447	0.1733394259999841	0.1735761519998959	0.1739198039999792	0.174341884000114	90328.0	90328.0	90328.0	90328.0	90328.0
9 SEC	0.8318701096001859	90328.0	0.8306139000001167	0.8401869850004005	0.8381296229999862	0.8236760620002315	0.8267439780001951	90328.0	90328.0	90328.0	90328.0	90328.0
10 SEC	4.520915783400051	90328.0	4.518278625000221	4.515260805999787	4.517735109000114	4.488614104000135	4.5646902729999965	90328.0	90328.0	90328.0	90328.0	90328.0
11 SEC	26.019327671799875	131990.4	25.927749363999737	25.88356371500004	25.887511626000105	26.452818032999858	25.944995620999634	131952.0	131952.0	131952.0	132144.0	131952.0
12 SEC	169.48952086539992	426860.8	167.03305509500024	169.1147969169997	168.76519542999995	176.81053264399998	165.72402424099982	426904.0	426864.0	426868.0	426860.0	426808.0

Installation

This chapter explains how to test this implementation on both unix/linux and Windows based systems.

Linux Installation

For this installation we used:

Arch Linux 2023.11.1 and Ubuntu 20.04 LTS
Python 3.11.5
pip 23.3.1

First, clone the github repository:

git clone https://github.com/miguehm/fox-algorithm.git && cd fox-algorithm

or alternatively, download the zip file and extract it:

Then, create a virtual environment

python3 -m venv venv

activate the virtual environment

source ./venv/bin/activate # for linux

and install requirements

pip install -r requirements.txt

Windows installation

For this installation we used:

Windows 11 Nov 2023 update
Python 3.11.5
pip 23.3.1

You need to have MPI installed, you can download the installer from the microsoft official page.

First, clone the github repository:

git clone https://github.com/miguehm/fox-algorithm.git && cd fox-algorithm

or alternatively, download the zip file and extract it:

Then, create a virtual environment

python -m venv venv

activate the virtual environment

.\venv\Scripts\Activate

and install requirements

pip install -r requirements.txt

Usage

WARNING: The program will execute the algorithm for all the matrix sizes \(2^x\ \forall\ x \in [n, m]\). For values of \(n\) and \(m\) greater than 11, the execution time will be very long. For example, for \(n=11\) and \(m=12\), the execution time will be approximately 1 hour.

python3 is for linux/unix OS based, if you have windows use python instead.

Help

python3 src/main.py --help

Output

Usage: main.py [OPTIONS]                        
                                                 
╭─ Options ─────────────────────────────────────╮
│ --help          Show this message and exit.   │
╰───────────────────────────────────────────────╯
╭─ Matrix order range ──────────────────────────╮
│ --min-exp        INTEGER  Exponent of base 2  │
│                           matrix order (2^)   │
│                           [default: 6]        │
│ --max-exp        INTEGER  Exponent of base 2  │
│                           matrix order (2^)   │
│                           [default: 13]       │
╰───────────────────────────────────────────────╯
╭─ MPI options ─────────────────────────────────╮
│ --threads        INTEGER  Number of cores to  │
│                           use                 │
│                           [default: 4]        │
╰───────────────────────────────────────────────╯

Default execution

python3 src/main.py

The execution will be done for the matrix of size \(2^6\) with 4 threads.

Multiple matrix multiplication by range

python3 src/main.py --min-exp 6 --max-exp 13

The charts with the results will be created in the charts folder.

Specify threads number

python3 src/main.py --threads 4

You can find out the number of threads available on your computer using top command.

Fox MPI in python