# Structured Parallelism Group

## Research Overview and Activities

The Structured Parallelism Group is interested in the structure of parallel computation. In the sequential world, high level structuring and abstraction mechanisms have superseded simpler primitive concepts. We believe that analogous mechanisms and models must be developed to allow parallel computing to enjoy similar success.

In our work on algorithmic skeletons we investigate the idea that recurrent patterns of parallel computation and interaction can be abstracted as frameworks, templates or second order operations, parameterised by other simpler operations, thereby raising the conceptual level at which parallel programs are described and encouraging cost effective portability.

You can find more specific information on the eSkel and Enhance projects, or more general information on the area at the skeletal parallelism homepage.

## Members

## Former Members

- Anne Benoit
- Alex Collins
- David Crooke
- Horacio Gonzalez-Velez
- Luis Fabricio Goes
- Tim Harris
- Dominik Grewe
- Noman Javed
- Yasushi Hayashi
- Israel Hernandez
- Marcus Marr
- Zoe Leiper
- Thibaut Lutz
- Roopa Rangaswami
- Siddharth Mohanty
- Gagarine Yaikhom
- Azusa Yamaguchi

## Publications

**Towards Collaborative Performance Tuning of Algorithmic Skeletons**, Chris Cummins, Pavlos Petoumenos, Michel Steuwer and Hugh Leather, Workshop on High-Level Programming for Heterogeneous & Hierarchical Parallel Systems (HLPGPU), 2016.

**Autotuning OpenCL Workgroup Size for Stencil Patterns**, Chris Cummins, Pavlos Petoumenos, Michel Steuwer and Hugh Leather, International Workshop on Adaptive Self-tuning Computing Systems (ADAPT), 2016.

**Compositional Compilation for Sparse, Irregular Data Parallelism**, Adam Harries, Michel Steuwer, Murray Cole, Alan Gray and Christophe Dubach, Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU), 2016.

**LIRA: Adaptive Contention-Aware Thread Placement for Parallel Runtime Systems**, Alexander Collins, Tim Harris, Murray Cole and Christian Fensch, Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, Article No. 2, ACM, 2015.

**Helium: A Transparent Inter-kernel Optimizer for OpenCL**, Thibaut Lutz, Christian Fensch and Murray Cole, Proceedings of the 8th Workshop on General Purpose Processing Using GPUs (GPGPU15), ACM, pages 70-80, 2015.

**Cooperative auto-tuning of parallel skeletons**, Alexander Collins, PhD thesis, University of Edinburgh, 2015.

**Autotuning wavefront patterns for heterogeneous architectures**, Siddharth Mohanty, PhD thesis, University of Edinburgh, 2015.

**Enhancing productivity and performance portability of opencl applications on heterogeneous systems using runtime optimizations**, Thibaut Lutz, PhD thesis, University of Edinburgh, 2015.

**NOVA: A Functional Language for Data Parallelism**, Alex Collins, Dominik Grewe, Vinod Grover, Sean Lee and Adriana Susnea, Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pages 1-8, 2014.

**Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures**, Siddharth Mohanty and Murray Cole, International Workshop on
Programming Models and Applications for Multicores and Manycores (PMAM14), pages 1-9, ACM, 2014.

**Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications**, Luis Fabricio Wanderley Goes, Christiane Pousa Ribeiro, Marcio Castro, Jean-Francois Mehaut, Murray Cole and Marcelo Cintra, International Journal of Parallel Programming, 42(2):365-382, 2014.

**MaSiF: Machine Learning Guided Auto-tuning of Parallel Skeletons**, Alex Collins, Christian Fensch, Hugh Leather and Murray Cole, IEEE International Conference on High Performance Computing (HiPC13), pages 186-195, 2013.

**Partans: An Autotuning Framework for Stencil Computation on Multi-GPU Systems**, Thibaut Lutz, Christian Fensch and Murray Cole, ACM Transactions on Architecture and Code Optimization 9(4), (also at HiPEAC 2013), 2013.

**Resource Analyses for Parallel and Distributed Coordination**, P.W. Trinder, M. I. Cole, K. Hammond, H-W. Loidl and G.J. Michaelson, Concurrency and Computation: Practice and Experience 25(3)309-348, 2013.

**Distributed Computing Practice for Large-Scale Science & Engineering Applications**, Shantenu Jha, Murray Cole, Daniel S. Katz, Manish Parashar, Omer Rana and Jon Weissman, Concurrency and Computation: Practice and Experience 25(11):1559-1585, 2013.

**Auto-Tuning Parallel Skeletons**, Alexander Collins, Christian Fensch, and Hugh Leather, Parallel Processing Letters, 22(02):1240005, 2012.

**Automatic skeleton-driven performance optimizations for transactional memory**, Luis Fabricio Goes, PhD thesis, University of Edinburgh, 2012.

**Autotuning Wavefront Abstractions for Heterogenous Architectures**, Siddharth Mohanty and Murray Cole, Proceedings of the 3rd Workshop on Applications for Multi-Core Architectures at SBAC-PAD 2012.

**Autotuning Skeleton-Driven Optimizations for Transactional Worklist Applications**, Luis Wanderley Goes, Nikolas Ioannou, Polychronis Xekalakis, Murray Cole and Marcelo Cintra, IEEE Transactions on Parallel and Distributed Computing 23(12):2205-2218, 2012.

**Optimization Space Exploration of the FastFlow Parallel Framework**, Alexander Collins, Christian Fensch, Hugh Leather, presented at HLPUGPU'12 as part of HiPEAC'12, 2012.

**Parallel Skeletons**, Sergei Gorlatch and Murray Cole, in Encyclopedia of Parallel Computing (Ed. D. Padua), pages 1417-1422, Springer, 2011.

**A Machine Learning-Based Approach for Thread Mapping on Transactional Memory Applications**, Marcio Castro, Luis Fabricio Wanderley Goes, Christiane Pousa Ribeiro, Murray Cole, Marcelo Cintra and Jean-Francois Mehaut, Proceedings of 18th Annual International Conference on High Performance Computing (HiPC11), 1-10, 2011.

**Performance Database: Capturing Data for Optimizing Distributed Streaming Workflows**, Chee Sun Liew, Malcolm P. Atkinson, Radoslaw Ostrowski, Murray Cole, Jano I. van Hemert and Liangxiu Han, Phil. Trans. R. Soc. A 369(1949):3268-3284, 2011.

**Adaptive Statistical Scheduling of Divisible Workloads in Heterogeneous Systems,**H. Gonzalez-Velez and M. Cole, Journal of Scheduling, 2010.

**Adaptive Structured Parallelism for Distributed Heterogeneous Architectures: A Methodological Approach,**Horacio Gonzalez-Velez and M. Cole, Concurrency and Computation: Practice and Experience, 2010.

**Characterising Effective Resource Analyses for Parallel and Distributed Coordination**, P. W. Trinder, M. I. Cole, H-W. Loidl and G. J. Michaelson, International Workshop on Foundational and Practical Aspects of Resource Analysis (FOPARA '09), Eindhoven, The Netherlands, November 2009.

**Adaptive Structured Parallelism**, Horacio Gonzalez-Velez, PhD thesis, University of Edinburgh, 2008.

**Reactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systems**, Jesus Israel Hernandez, PhD thesis, University of Edinburgh, 2008.

**An Adaptive Parallel Pipeline Pattern for Grids**, Horacio Gonzalez-Velez and Murray Cole, IEEE International Parallel & Distributed Processing Symposium, pages 1-11, 2008.

**Scheduling DAGs on Grids with Copying and Migration**, Israel Hernandez and Murray Cole, Parallel Processing and Applied Mathematics 2007 (PPAM07), LNCS 4967, pages 1019-1028, Springer, 2008.

**A Structural Approach for Modelling Performance of Systems using Skeletons**, Gagarine Yaikhom, Murray Cole, Stephen Gilmore and Jane Hillston, Electronic Notes in Theoretical Computer Science 190(3), 167-183, Elsevier, 2007.

**Reliable DAG Scheduling with Rewinding and Migration**, Israel Hernandez and Murray Cole, First International Conference on Networks for Grid Applications (GridNets07), pages 1-8, ACM Press, 2007.

**Parallel stochastic simulation of macroscopic calcium currents**, Virginia Gonzalez-Velez and Horacio Gonzelez-Velez, Journ. of Bioinform. Comput. Biol. 5(3) pages 755-772, 2007.

**A structural approach for modelling performance of workflow systems**, Gagarine Yaikhom, Murray Cole, Stephen Gilmore and Jane Hillston, in Proceedings of 5th International Workshop on the Quantitative Aspects of Programming Languages (QAPL 07), 2007.

**Adaptive Structured Parallelism for Computational Grids**, Horacio Gonzalez-Velez and Murray Cole, PPoPP'07: 12th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming, pages 140-141, ACM Press, 2007.

**Reactive Grid Scheduling of DAG applications**, Israel Hernandez and Murray Cole, Parallel and Distributed Computing and Networks 2007, 25th IASTED International Multi-Conference on Applied Informatics, pages 92-97, ACTA Press, 2007.

**Towards fully adaptive pipeline parallelism for heterogeneous distributed environments**, Horacio Gonzalez-Velez and Murray Cole, ISPA 2006, LNCS 4330, pages 916-926, Springer-Verlag, 2006.

**Combining measurement and stochastic modelling to enhance scheduling decisions for a parallel Mean Value Analysis algorithm**, Gagarine Yaikhom, Murray Cole, and Stephen Gilmore, Proceedings ICCS 2006, LNCS 3992, pages 929--936, Springer Verlag, 2006.

**Self-adaptive skeletal task farm for computational grids**, Horacio Gonzalez-Velez, Parallel Computing 32(7-8), pages 479-490, 2006.

**Message Passing with Communication Structures**, Gagarine Yaikhom, PhD thesis, University of Edinburgh, 2006.

**Automatic mapping of ASSIST applications using process algebra**, Marco Aldinucci and Anne Benoit, Proceedings of the Third International Workshop on High-level Parallel Programming and Applications (HLPP 2005), July 2005.

**A grid-based stochastic simulation of unitary and membrane Ca2+ currents in spherical cells**, Virginia Gonzalez-Velez and Horacio Gonzalez-Velez, Proceedings of CBMS 2005, pages 171-176, IEEE Press, 2005.

**Structured parallelism for the grid**, Horacio Gonzalez-Velez, In Science and Supercomputing in Europe, P. Alberigo, G. Erbacci, and F. Garofano, Eds. CINECA, Bologna, 2005, pages 201-203. ISBN: 88-86037-15-5, 2005.

**An Adaptive Skeletal Task Farm for Grids**, Horacio Gonzalez-Velez, Proceedings of Euro-Par 2005, LNCS 3648, pages 401-410, Springer Verlag, 2005.

**Buffered Branching Channels with Rendezvous Message Passing**, Gagarine Yaikhom, in Fahringer, T. and Hamza, M., editors, Proceedings of the Intl. Conference on Parallel and Distributed Computing and Networks (PDCN), pp. 184-192. ACTA Press, 2005.

**Shared Message Buffering without Intermediate Memory Copy**, Gagarine Yaikhom, in Proceedings of the Third International Workshop on High-level Parallel Programming and Applications (HLPP 2005), July 2005.

**Integrating MPI-Skeletons with Web Services.**Jan Duennweber, Sergei Gorlatch, Anne Benoit and Murray Cole, in Proceedings of ParCo 2005. Malaga, 2005.

**Native Services for Grid Applications**, Jan Duennweber, Anne Benoit, Murray Cole, Sergei Gorlatch, Proceedings of ParCo 2005. Malaga, 2005.

**Using eSkel to implement the multiple baseline stereo application**, Anne Benoit, Murray Cole, Stephen Gilmore and Jane Hillston, Proceedings of ParCo 2005. Malaga, 2005.

**Flexible Skeletal Programming with eSkel**, Anne Benoit, Murray Cole, Stephen Gilmore and Jane Hillston, Proceedings of Euro-Par 2005. LNCS 3648, pages 761-770, Springer Verlag, 2005.

**Buffered Branching Channels with Rendezvous Message Passing**, Gagarine Yaikhom, in Parallel and Distributed Computing and Networks 2005, 23rd IASTED International Multi-Conference on Applied Informatics, pages 184-192, ACTA Press, 2005.

**On the Abstraction of Message-Passing Communications using Algorithmic Skeletons: A Case Study**, Horacio Gonzalez-Velez, ISSADS 2005, LNCS 3563, pages 43-50, Springer-Verlag, 2005.

**Component-based grid programming. A case study on wavelets**, Jan Duennweber, Sergei Gorlatch, Anne Benoit and Murray Cole, HPC-Europa project report, April 2005.

**Enhancing the effective utilisation of Grid clusters by exploiting on-line performability analysis**, Anne Benoit, Murray Cole Stephen Gilmore and Jane Hillston, Proceedings of CCGrid 2005, vol 1, pages 317-324, IEEE Press, 2005.

**Two fundamental concepts in skeletal parallel programming**, Anne Benoit and Murray Cole, ICCS 2005, Part II, LNCS 3515, pages 764-771, Springer Verlag, (as part of Practical Aspects of High-level Parallel Programming PAPP 2005), 2005.

**Scheduling skeleton-based grid applications using PEPA and NWS**, Anne Benoit, Murray Cole, Stephen Gilmore and Jane Hillston, The Computer Journal, 48(3):369-378, 2005.

**Evaluating the performance of pipeline-structured parallel programs with skeletons and process algebra**, Anne Benoit, Murray Cole, Stephen Gilmore and Jane Hillston, Scalable Computing: Practice and Experience 6(4), pages 1-16, 2005. A preliminary version appeared in Practical Aspects of High-level Parallel Programming, PAPP 2004.

**Evaluating the performance of skeleton-based high level parallel programs**, Anne Benoit, Murray Cole, Stephen Gilmore and Jane Hillston, Proceedings of The International Conference on Computational Science (ICCS 2004), Part III, LNCS vol 3038, pages 289-296, Springer Verlag, 2004.

**A comparative study of intrinsic parallel programming methodologies**, Gonzalez-Velez, H., de Luca, A., and Gonzalez-Velez, V, ICEEE: First Int Conf on Electrical and Electronics Engineering (Acapulco, Mexico, June 2004), IEEE, pp. 200-205, 2004.

**Why Skeletal Parallel Programming Matters**, Murray Cole, Proceedings of Euro-Par 2004, Danelutto, Vanneschi, Laforenza (Eds), LNCS vol 3149, page 37, Springer Verlag, 2004 (also see slides from this keynote talk).

**Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming**, Murray Cole, Parallel Computing 30(3) pages 389-406, 2004.

**The Integration of Task and Data Parallel Skeletons**, Herbert Kuchen and Murray Cole, Parallel Processing Letters 12(2), pages 141-156, 2002. (Draft appeared in Proceedings of CMPP02, Dagstuhl, 2002)

**Automated Cost Analysis of a Parallel Maximum Segment Sum Program Derivation**, Yasushi Hayashi and Murray Cole, Parallel Processing Letters 12(1), pages 95-112, 2002.

**Static Performance Prediction of Skeletal Programs**, Yasushi Hayashi and Murray Cole, Parallel Algorithms and Applications 17(1) pages 59-84, 2002.

**Shape-based Cost Analysis of Skeletal Parallel Programs**, Yasushi Hayashi, PhD thesis, University of Edinburgh, 2001.

**Coordinating Heterogeneous Parallel Systems with Skeletons and Activity Graphs**, Murray Cole and Andrea Zavanella, Journal of Systems Integration, 10(2), pages 127-143, 2001.

**Frame: An Imperative Coordination Language for Parallel Programming**, Murray Cole, Technical report EDI-INF-RR0026, 2000.

**Activity Graphs: A Model-Independent Intermediate Layer for Skeletal Co-ordination**, Murray Cole and Andrea Zavanella, Proceedings of ACM Symposium on Applied Computing, Vol 1, pages 255-261, 2000.

**BSP-based Cost Analysis of Skeletal Programs**, Yasushi Hayashi and Murray Cole, Proceedings of Scottish Workshop on Functional Programming 1999, pages 20-28, Intellect, 2000.

**Algorithmic Skeletons,**Murray Cole, which is chapter 13 (pages 289-303) of Research Directions in Parallel Functional Programming, K. Hammond & G. Michaelson (Eds.), Springer-Verlag, 1999.

**Practical Structured Parallelism using BMF**, David Crooke, PhD thesis, University of Edinburgh, 1999.

**Descriptive Simplicity in Parallel Computing**, Marcus Marr, PhD thesis, University of Edinburgh, 1998.

**A Monadic Calculus for Parallel Costing of a Functional Language of Arrays**C.B. Jay, M.I. Cole, M. Sekanina, and P.A. Steckler, Proceedings of Euro-Par 97, LNCS 1300, pages 650-661, Lengauer, Griebl, Gorlatch (Eds), Springer Verlag, 1997.

**On Dividing and Conquering Independently**, Murray Cole, Proceedings of Euro-Par 97, LNCS 1300, pages 634-637, Lengauer, Griebl, Gorlatch (Eds), Springer Verlag, 1997. (The link is to a longer version which appeared as a University of Edinburgh Computer Science technical report, ECS-CSG-31-97).

**Recursive 3D Mesh Indexing with Improved Locality**, George Chochia and Murray Cole, Proceedings of HPCN '97, Hertzberger, Sloot (Eds), pages 1014-1015, LNCS 1225, Springer Verlag, 1997.

**Compile-time Cost Analysis for Parallel Programming**, Roopa Rangaswami, presented at Euro-Par 96, Lyon (France).

**Robustness and Performance in Structured Parallelism**, David Crooke, in Abstract Machine Models for Parallel and Distributed Computing, Kara et al (eds), IOS Press 1996.

**HOPP - A Higher-Order Parallel Programming Model**, Roopa Rangaswami, in "Algorithms and Parallel VLSI Architectures", M.Moonen and F.Cathoor (Editors), Elsevier Science 1995.

**A Cost Analysis for a Higher-Order Parallel Programming Model**, Roopa Rangaswami, PhD thesis, University of Edinburgh, 1996.

**Synchronizing Arbitrary Processor Groups in Dynamically Partitioned 2D Meshes,**George Chochia, Murray Cole and Todd Heywood, which is CSG report ECS-CSG-25-96, July 1996.

**Lower Bounds on Average Time for Random Destination Mesh Routing and Their Utility as Performance Predictors for PRAM Simulation**, George Chochia, Murray Cole and Todd Heywood, Proceedings of the Workshop on Randomized Parallel Computing at the IPPS 96, 1996.

**Implementing the Hierarchical PRAM on the 2D Mesh: Analyses and Experiments**, George Chochia, Murray Cole and Todd Heywood, Proceedings IEEE SPDP 95, pages 587--595, 1995.

**Hierarchical Skeletons and ad-hoc Parallelism**, Marcus Marr and Murray Cole, Proceedings of ParCo 95, D'Hollander et al (Eds.), Advances in Parallel Computing, vol. 11, pages 673-676, Elsevier Press, 1996.

**Parallel Programming with List Homomorphisms**, Murray Cole, Parallel Processing Letters 5(2), pages 191-203, 1995.

**A survey of PRAM simulation techniques**, Tim J. Harris, ACM Computing Surveys 26(2), pages 187-206, 1994.

**List Homomorphic Parallel Algorithms for Bracket Matching**, Murray Cole, which is technical report CS-29-93, a preliminary version of the ParCo 93 paper above.

**Parallel Programming, List Homomorphisms and the Maximum Segment Sum Problem**, Murray Cole, which is technical report CS-25-93.

**Parallel Programming, List Homomorphisms and the Maximum Segment Sum Problem,**, Murray Cole,

*Proceedings of ParCo 93*pages 489-492, Trystram (Ed.), Elsevier Press, 1993. (A short conference paper based on the technical report above).

**The Parameterized PRAM**, Tim Harris and Murray Cole,

*Proceedings of the Workshop on Parallel and Distributed Processing,*Boyonov (Ed.), Elsevier Press, 1993.

**Parallel Software Paradigms**, Murray Cole, chapter 1 of

*Advances in Parallel Algorithms*L. Kronsjo & D. Shumsheruddin (Eds.), pp 1-25, Blackwell Scientific, 1992.

**Do-it-Yourself Shared Memory Instruction Sets in occam**, Murray Cole,

*Proceedings of Tools and Techniques for Transputer Applications*pages 1-10, S.J. Turner (Ed.), IOS Press, 1990.

**Algorithmic Skeletons: Structured Management of Parallel Computation**, Murray Cole,

*MIT Press & Pitman*, 1989. (Derived from my PhD thesis and now available on the web by kind permission of the publishers. Also see its entry on the MIT Press website. There are some minor formatting changes from the published version, since the original Latex style is no longer available to me. The text itself is unchanged.

**A Skeletal Approach to the Exploitation of Parallelism**, Murray Cole,

*Proceedings of CONPAR 88*, pages 667-675, C. Jesshope (Ed.), Cambridge University Press, 1989.

**Recursive Splitting as a General Purpose Skeleton for Parallel Computation**, Murray Cole,

*Proceedings of the Second International Conference on Supercomputing*, L.P. & S.I. Kartashev (Eds), vol 3, pp. 133-140, 1987.