Study and Design of a High-Performance Computing Infrastructure for Iranian Light Source Facility based on the Accelerator Physicists and Engineers’ Applications Requirements

Synchrotron design and operation are complex tasks that require a great deal of precise computation. As an example, we can mention the simulations performed to calculate the impedance budget of the machine, which require a notable amount of computational power. In this paper we review different HPC scenarios suitable for this purpose and then present our design of a suitable HPC system based on the needs of accelerator physicists and engineers. After going through different HPC scenarios such as shared-memory architectures, distributed-memory architectures, cluster, grid and cloud computing, we conclude that the implementation of a dedicated computing cluster is desirable for ILSF. Cluster computing provides the opportunity for easy and scalable scientific computation at ILSF; a further advantage is that its resources can also be used to run cloud or grid computing platforms.

Study of Different Solutions for Deployment of a Development Server at Iranian Light Source Facility (ILSF)

Abstract To create a software product, a development team of several programmers works together using multiple tools, languages and development environments. In order to maintain the integration and quality of the final product, we need to ensure that the tools used are compatible with each other. Therefore, it is common to use development servers which provide programmers with the tools they need to develop the software. These types of servers can be deployed as a service in the cloud, as virtual private servers, or as dedicated servers. These methods are reviewed in this paper; we then inspect the requirements for the design and deployment of a development server for the Iranian Light Source Facility using containers, each of which can provide development tools, a test environment, version control and bug tracking. Finally, the proposed architecture and its software, hardware and deployment cost estimation are presented.

Study of different Solutions for Implementing Bug Tracking System at Iranian Light Source Facility (ILSF)

Bug tracking and error correction are important steps in an application's life cycle. When a project is limited in its number of users and developers, and also in scale, ordinary applications such as spreadsheets can be used for this purpose; but for large-scale projects with numerous applications, as found in synchrotrons, a suitable bug tracking system is required. In this paper we review various bug tracking and application life cycle management tools, and then present the design of a suitable bug tracking system for the Iranian Light Source Facility based on the Jira software.

HPC for ILSF Beamlines

Abstract Data analysis is a very important step in conducting experiments at light sources, where multiple applications and software packages are used for this purpose. In this paper we review some of the software packages used for data analysis and design at the Iranian Light Source Facility; then, according to their processing needs and after considering different HPC scenarios, a suitable architecture for the deployment of the ILSF HPC is presented. The proposed architecture is a cluster of 64 computing nodes connected through Ethernet and InfiniBand networks, running a Linux operating system with support for the MPI parallel environment.
Keywords: HPC, Data Analysis Software, Cloud, Grid, Cluster

1. Introduction
To do research on the structure of different materials, we can study them at the atomic level. Active synchrotron light sources around the world study the structure of materials using X-rays in beamlines. The techniques used in beamline experiments can be divided into absorption, diffraction, emission or reflection, imaging, ion spectroscopy, lithography, photoelectron emission and scattering. The data generated and recorded through these experiments need to be analysed in order to obtain useful information from the raw data. Several applications have been developed to perform these analyses and are used by synchrotron light sources. Analysing this data is time consuming for researchers, depending on factors such as the hardware used and the amount of data processed.
Computing plays a key role in synchrotrons. In particular, advanced software plays an important and influential role in the performance and efficiency of the operations and experiments performed on beamlines at synchrotron facilities. According to previous studies in this field, as data production rates increase, the importance of using advanced software to provide timely results has become more apparent. The use of advanced, modern data analysis software, as well as the implementation of an appropriate computing infrastructure, is one of the important needs of users and researchers at synchrotron facilities [1].
Data analysis requires high-performance computing to achieve results in a short time, as well as the use of technologies such as multi-threaded processing and suitable graphics processors [2]. Accordingly, it is necessary to identify suitable software for analysing the data generated by the various experiments in beamlines, as well as the hardware requirements of each, including the required memory, number of processors, graphics card, etc. This paper examines the hardware requirements for analysing beamline data as well as some of the software used to analyse the data generated in ILSF beamlines.
2. Hardware and Computational Needs for Analysing the Data Produced in Experiments at ILSF Beamlines
In general, the analysis of data obtained through experiments at beam lines of the light source can be done in two ways. In the first method, the data obtained is first stored in external storage and then analysed at another time. In this method, due to the large volume of data generated, external storage with notable capacities is required. In the second method, which is done in real time, it is necessary to use supercomputers or processing systems with high computing power [3].
In general, there are four hardware components that should be considered for data analysis and simulation using the various software packages used at the beamlines, including Avizo, TomoPy, HKL-2000, FIT2D, HipGISAXS, XMAS and MIDAS. These four components are: graphics card, processor, main memory and external storage. The necessity of each of them is stated below [4].
2.1. Graphics Card
The volume of data produced at ILSF is increasing day by day. Due to extensive developments in the field of laboratory tools, including detectors, cameras, etc., the generated data have high quality and volume. The performance of various software in image processing depends, in addition to the CPU, on the graphics card used. Different GPUs may be used depending on the experiment performed and the software used. GPUs also differ in the amount of memory they have. The minimum recommended memory for a GPU is 1 GB, but the most powerful GPUs have between 12 and 16 GB of memory. Choosing a GPU with the right amount of memory depends on the amount of data being processed.
Some analysis software used in beamlines, such as Avizo, does not care much about the type of GPU, requiring only that the GPU fully implements OpenGL 2.1 or higher, whereas some other software requires specific GPU models.
2.2. Processor
As mentioned, computing plays a key role in the analysis of light source beamline data. The main processor is one of the hardware components that deserves attention. Depending on the different computational needs of data analysis software, processors with different specifications are used. Today, due to the high volume of data generated through various experiments, the need for real-time analysis and data reduction techniques is felt. With advances in laboratory equipment, including detectors, leading to high-quality, high-volume image data in beamlines, the use of a single processor, even a multi-core one, is not very efficient; this is exactly where the importance of multi-core multiprocessor systems, computing clusters and even grid computing systems emerges.
2.3. Memory
Another important hardware component in analysing the data generated by performing various experiments at beam lines is main memory and external storage. Various data analysis software packages need high-capacity main memory to minimize latency in order to be able to analyse large volumes of data very quickly. Also, after the data is analysed, memory with the appropriate capacity is needed to store the results of the analysis. When data processing and analysis is not real-time, high-speed and high-capacity storage is also needed to minimize latency due to data transfer between storage and main memory.
3. Data Analysis Software at ILSF Beamlines
Researchers try to publish the results of experiments performed at beamlines in the form of scientific research articles and thus contribute to the advancement of science. Therefore, the data generated from various experiments are placed in an analysis cycle, in which several different steps are taken to analyse the data, one of which is the use of software and data analysis tools [5]. According to the different needs of researchers, such as data processing and the components that affect data analysis, including data volume, modelling, etc., various analytical applications are used. Some software is general-purpose, in the sense that it is used to analyse the results of various experiments and by different synchrotron light sources, while other software is custom-developed for a specific task. Some analysis software is used simultaneously with data collection, while other software cannot process and analyse in real time and instead performs processing operations on stored data [5].
When the processing is not real-time and the processing operation is performed after storing the data obtained from various experiments, the data are first stored in external storage and analysed at another time. In this method, due to the large volume of data generated, external storage with large capacities is required. In this case, it is not possible for researchers to make online, results-based decisions to change the experiment parameters. If online processing is possible using fast processing systems, researchers can make immediate decisions [3].
Another criterion by which the analysis software used at beamlines can be classified is the type of experiment performed there.
In general, various analysis software packages have been developed for processing the data generated at beamlines; for example, (1) lists some of the software packages used in some beamlines of European synchrotrons, and (2) lists some software related to small-angle scattering. This section introduces only some of them.
3.1. TomoPy
By performing tomographic experiments, three-dimensional images of a sample can be created. Creating these images and analysing this data requires a powerful processing system. The faster images are generated and collected, the more the computational needs increase and the more powerful the computing tools required. TomoPy is one of the tools developed at APS in the United States to analyse images obtained from tomographic experiments. It is open source software; 35% of its code is written in C and C++ and the rest in Python, and it can be downloaded for free through GitHub. In addition to running on Windows, Linux and Mac operating systems, it runs well on supercomputers. Most synchrotron light sources in the United States use this software. The ability to use GPUs is also included. Testing this software on the Mira supercomputer with up to 32,000 processing cores shows that TomoPy has high scalability in reconstructing tomographic images. The results of running it on the IBM BG/Q supercomputer show that up to 8,000 processing cores, the rate of increase in execution speed is constant. The acceleration coefficient of the software reaches its maximum at 32,000 cores, where the reconstruction time of tomographic images is reduced by up to 95.4% compared to when one thousand processing cores are used. On average, if the time to reconstruct tomographic images using 256 processors is two hours, using 32,000 processors this time is reduced to one minute. Software such as STP, which has a graphical user interface, also uses TomoPy in its bottom layer.
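The reported timings can be checked with simple strong-scaling arithmetic. In the sketch below the helper functions are ours, and the timings are the ones quoted above (two hours on 256 cores versus one minute on 32,000 cores):

```python
# Illustrative strong-scaling arithmetic based on the TomoPy timings quoted
# above: ~2 hours on 256 cores versus ~1 minute on 32,000 cores.

def speedup(t_base: float, t_scaled: float) -> float:
    """Speedup of the scaled run relative to the baseline run."""
    return t_base / t_scaled

def parallel_efficiency(t_base: float, n_base: int,
                        t_scaled: float, n_scaled: int) -> float:
    """Achieved speedup divided by the ideal speedup (n_scaled / n_base)."""
    ideal = n_scaled / n_base
    return speedup(t_base, t_scaled) / ideal

t_256 = 120.0   # minutes on 256 cores
t_32k = 1.0     # minutes on 32,000 cores

print(f"speedup:    {speedup(t_256, t_32k):.0f}x")                         # 120x
print(f"efficiency: {parallel_efficiency(t_256, 256, t_32k, 32_000):.0%}")  # 96%
```

A 120x speedup against an ideal 125x (32,000 / 256) corresponds to roughly 96% parallel efficiency, which is consistent with the near-linear scaling described above.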

Figure 1 Usage of TomoPy at bottom Layer of STP

Figure 2 Sample GUI of the STP
3.2. HiSPoD
This software is a standalone MATLAB-based application that can be installed on any computer, including desktops and laptops, with various operating systems, including Windows, Linux, and Macintosh, on which MATLAB and the MATLAB Image Processing Toolbox are installed. It was developed to simulate and analyse the diffraction patterns obtained from experiments performed on crystal specimens at the APS light source in the United States. It is used to analyse the data obtained from imaging experiments as well as the data of scattering experiments. It is not possible to determine the exact hardware required for this software, because the hardware required for data analysis usually depends on the amount of data processed, the type of analysis, and, in the case of online analysis, the data production rate. However, at least 4 GB of GPU RAM and at least 16 GB of main memory are recommended.

Figure 3 Sample GUI of the HiSPoD
3.3. HipGISAXS
This software is used to analyse and simulate the data obtained from scattering experiments, including GISAXS. It can be run on a variety of computers including desktops, laptops, computing clusters and supercomputers, and is a handy tool used by scientists at light source facilities. Designed on the basis of parallel programming, the software provides high-performance computation by running on clusters with multi-core processors and GPUs. It runs on 32-bit and 64-bit Linux operating systems, including Ubuntu and Red Hat, as well as 32-bit and 64-bit Macintosh operating systems, including Lion. It is recommended that the computer running HipGISAXS have an Nvidia GPU with at least 4 GB of memory and a processor with at least 8 cores. Figure 4 shows the scalability of computational speed in this software as the number of nodes increases; there are 24 computational cores in each of these nodes. Figure 5 shows the same scalability as the number of GPUs increases; in this case, each computing node has one GPU. Software and hardware environments compatible with this software are listed in Table 1.

Figure 4 Computational speed scalability in HipGISAXS by increasing the number of computing nodes and cores [3].

Figure 5 Scalability of computing speed in HipGISAXS by increasing the number of nodes and GPUs [3].

Table 1 Software and hardware environments compatible with HipGISAXS software

Item                             Description
Operating System                 GNU/Linux x86_64: Ubuntu, Red Hat Linux, SUSE Linux, Cray Linux
                                 Environment (XK7, XE6, XC30, XC40).
                                 Darwin x86_64: Mac OS X (El Capitan, Yosemite).
                                 Any UNIX-based OS: generally, it should work.
Processor Architecture and Type  Intel/AMD processors (64-bit).
                                 NVidia GPUs with compute capability >= 2.0.
                                 Intel MIC architecture: support under development.
Computation Environment          Generic x86 laptop/desktop/server.
                                 Clusters/supercomputers based on x86 processors.
                                 Generic x86 laptop/desktop/server equipped with NVidia GPUs.
                                 Clusters/supercomputers equipped with NVidia GPUs as accelerators
                                 on each node.

3.4. HKL-2000
The HKL-2000 software package is one of the applications used at different synchrotrons around the world, such as ALS, APS and SSRL, for analysing and processing beamline data. It is one of the most popular data processing packages, based on developed versions of Denzo, Scalepack and XDisplayF, and has several different tabs for setting parameters, scaling, indexing and integration. It allows working with different data sets [6]. It can be run on the Red Hat, Fedora and Ubuntu versions of the Linux operating system and can also be installed on different versions of Mac OS. The minimum RAM for running the software is 512 MB, along with storage with a minimum capacity of 1.5 GB. It also requires a 16-bit graphics card. Summary information about the operating system and hardware required for this program is shown in Table 2. To use this software, depending on the type of use (commercial, educational or research), you need to purchase a license.
Table 2 Hardware and operating system requirements of HKL-2000 software

Item                              Description
Main Memory                       Minimum 512 MB
Storage                           Minimum 1.5 GB
Graphics                          Minimum 16-bit graphics card
Intel Macintosh operating system  Mac OS X El Capitan (10.11), Yosemite (10.10), Mavericks (10.9),
                                  Mountain Lion (10.8), Lion (10.7), Snow Leopard (10.6)
Linux operating system            Red Hat Enterprise 4, 5, 6, 7 and Red Hat (32-bit and 64-bit
                                  versions); Fedora 8 and above; Ubuntu (32-bit and 64-bit versions)

Figure 6 One of the HKL-2000 interface windows [4].

3.5. AutoPROC
AutoPROC is a tool used to automatically process X-ray diffraction data. It uses other programs such as XDS/XSCALE, CCP4, POINTLESS and AIMLESS to process and analyse data. The tool automates all steps related to data processing, including analysis of the generated image collections and the header files of each image, image indexing, precise determination of cell parameters, integration and aggregation of sets of images, and generation of files in various formats. This package is used in the beamlines of various synchrotron light sources such as ALBA in Spain, the ESRF in France and DLS in the UK. It is available for the Linux and Darwin operating systems.
This software can run in parallel and also in distributed mode; at ALBA, a grid-type high-performance computing system is used to run it, and at the ESRF it is available to users through a computing cluster. The package is able to process the data generated by Pilatus detectors. It can also process data in the MTZ format, a file format used to store reflection data. Use of this software for academic purposes is free if you meet the criteria mentioned on its website; otherwise you must purchase a license.
3.6. FIT2D
This software is used to analyse two-dimensional and one-dimensional data in many ESRF beamlines. It can be run on Windows, Linux and Mac operating systems. Correction and calibration of detector distortion is one of its noted capabilities. The software uses dynamic memory allocation to analyse the data obtained from beamline experiments. For image data, the amount of memory required depends on factors such as image size; the following equation [5] can be used to calculate the required memory.
Required memory = 9 * X_DIMENSION * Y_DIMENSION + (~10) Mbytes
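As a quick sketch, this rule of thumb can be wrapped in a small helper; the function name and the assumption that the 9 * X_DIMENSION * Y_DIMENSION term is in bytes (added to roughly 10 MB of overhead) are ours:

```python
def fit2d_required_memory_mb(x_dim: int, y_dim: int,
                             overhead_mb: float = 10.0) -> float:
    """Approximate FIT2D memory need following the rule of thumb above:
    9 bytes per pixel plus ~10 MB of overhead (assumed interpretation)."""
    return (9 * x_dim * y_dim) / 1_000_000 + overhead_mb

# A 2048 x 2048 detector image would need roughly 48 MB under this reading.
print(f"{fit2d_required_memory_mb(2048, 2048):.0f} MB")
```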

Figure 7 An example of the GUI of FIT2D software for powder diffraction data analysis [6].
3.7. QXRD
This software is based on C++ and uses the Qt libraries for its drawing components. It provides an integrated system for online data reading, data visualization and data reduction in X-ray synchrotron experiments. The visualization and data reduction features have been further developed for use in powder diffraction and/or SAXS experiments [7]. It can run on multiple cores, and using more cores can increase the processing speed. If the maximum readout speed of the detectors is used, fast hard disks are required due to the image production rate; in this case the use of RAID level zero or solid-state disks is recommended. In terms of supported operating systems, this software has Linux, Windows and Mac versions.

Figure 8 Sample of QXRD software environment [8].
3.8. DAWN
DAWN is open source software for visualizing and processing scientific data that has been developed specifically to analyse experimental data from synchrotron light sources. It supports various data formats such as NeXus, EDF, MAR, TIFF and HDF5. The main investor in this software is DLS. DAWN is based on Eclipse/RCP and is available to the public for free. It can be run on Linux, Windows and Mac operating systems. A lazy-loading capability is provided so that large data sets can be examined selectively without being fully loaded into memory. This eases the memory requirements somewhat, but it is still recommended to use high-speed processing systems to keep the processing speed in proportion to the amount of data being processed.

Figure 9 View of the DAWN software environment [9].
4. Different HPC Scenarios
There are basically three different architectures for high-performance computing environments: symmetric multiprocessors (SMP), clusters, and grids. Each of these three architectures is used to perform a specific set of calculations [10], as described below.
4.1. SMP
In symmetric multiprocessor systems, system resources are shared between processors, and no processor has priority over the others in accessing them. In this architecture, the problem is broken down into several parts, and each part runs on one processor at the same time. The task of breaking the problem into sub-problems and assigning them to processors is the responsibility of the operating system. SMP systems are more compatible with operating systems that use lightweight processes; therefore Windows NT and Linux, which have relatively small kernels, are suitable for these systems. When the number of processors is limited, these systems are relatively easy to build, but as the number of processors grows, building the system becomes difficult; the reason is the difficulty of coordinating access to memory and input/output for all processors. The architecture of such a system is shown in Figure 10, in which the system resources, including memory and input/output, are shared among all processors.

Figure 10 SMP Architecture
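The decomposition described above can be sketched in a few lines of Python. This is a conceptual illustration only (the function names are ours): one problem is split into sub-problems that run concurrently, and on an SMP machine the operating system schedules the workers onto the available processors.

```python
# Illustrative SMP-style decomposition: one problem (summing a range of
# integers) is split into sub-problems that run concurrently, with the OS
# responsible for scheduling the workers on the available processors.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    """Solve one sub-problem: sum the integers in [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    """Break [0, n) into `workers` chunks and combine the partial results."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # The parallel decomposition must agree with the serial answer.
    print(parallel_sum(1_000_000) == sum(range(1_000_000)))  # True
```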
4.2. Cluster Computing
Cluster computing is a type of parallel processing system in which several independent computers, each known as a node, are connected to each other over a network. This connection is such that the whole system appears to be a single system. Clusters are used to speed up and improve computational performance. Improved computational performance and increased processing speed are achieved through parallel programming, while fault tolerance can be increased through rapid local communication between the different nodes [11].
Clusters are generally divided into two categories: high-performance clusters and high-availability clusters. High-performance clusters are designed to achieve greater computing power for complex computations than a single computer can provide. High-availability clusters are designed to provide highly reliable services.
Standalone computers (including PCs, workstations and symmetric multiprocessor systems), operating systems, middleware, applications, parallel programming environments, and high-performance communication networks form the main components of a cluster. Figure 11 shows the general architecture of a cluster; in fact, this figure shows how the main components are connected to each other to form a cluster [12].

Figure 11 General cluster architecture [12]
4.2.1. Main Components of a Cluster
In general, the components of cluster construction are classified into two groups of hardware and software components, each of which is briefly described below.
A) Hardware components of cluster construction
Cluster hardware components include cluster computing node hardware and cluster network hardware.
• Cluster Computing Node Hardware
Nodes in a cluster can be personal computers or other computer systems that are connected to each other through the cluster network. Cluster nodes are generally divided into three categories: the lead node, login nodes and computational nodes. In small clusters, the login node can also serve as the lead node. The lead node's job is to manage resources and schedule tasks. The login node is responsible for handling user logins, software development, job submission, preprocessing and post-processing. Large clusters have several lead nodes and several separate login nodes. Computational nodes have the main task of processing and performing calculations. In selecting the computational nodes of a cluster, it is important to consider factors such as the number of processors, the number of cores per processor, the amount of main memory (RAM), local storage, and the graphics processor.
• Cluster Network Hardware
For a cluster to function well, it needs a fast internal connection between its nodes that supports high bandwidth and minimal latency using fast internal communication technologies. For this reason, network selection in a cluster is important. Choosing the right network infrastructure for the cluster depends on several factors including the operating system, performance and efficiency, price, and compatibility with the cluster hardware. Bandwidth and latency are the two criteria used to measure the performance and efficiency of internal communication in the cluster. Various technologies are used to communicate in clusters and create their internal networks; some clusters also use serial communication technologies to establish internal network communication [12]. Some of the most used technologies are:
i. Ten Gigabit Ethernet
ii. Infiniband FDR
iii. Aries interconnect
iv. Infiniband EDR
v. Intel Omni-Path
vi. Twenty Five Gigabit Ethernet
vii. Custom Interconnect
viii. Myrinet
ix. Giganet Clan
x. QsNet II
xi. Scalable coherent interface
The first seven are the technologies most used in the TOP500 list of high-performance computing systems [13]; in general, it can be said that Gigabit Ethernet is the most widely used technology for internal communication in cluster networks. Table 3 lists some of these technologies along with some of their features [12].
Table 3 Some internal communication technologies in a cluster




                                Gigabit     Giganet   Infiniband  Myrinet  QsNet II  Scalable Coherent
                                Ethernet    cLAN                                     Interface
Bandwidth (MB/s)                < 100       < 125     850         230      1064      < 320
Latency (microseconds)          < 100       7-10      < 7         10       < 3       1-2
Max nodes supported             1000's      1000's    > 1000's    1000's   4096      1000's
Linux support                   yes         yes       yes         yes      yes       yes
Virtual interface architecture  NT/Linux    NT/Linux  Software    Linux    None      Software
MPI support                     MVICH over  TCP       3rd party   MPI/Pro  Quadrics  3rd party
                                M-VIA

Also, in some clusters, combined technologies are used according to the needs for which the cluster is designed.
B) Cluster software components
One of the main software components of a cluster is the operating system. The programming model of the cluster and the middleware are two other important parts. This section describes these three components.
• Cluster Operating System
The operating system used in a cluster must provide resource management, system stability, efficient and optimal performance, and scalability. Operating systems are offered in both free and commercial forms; free operating systems include Linux and MOSIX. Linux is one of the most widely used operating systems for designing and building clusters [12]: 276 of the world's top 500 supercomputers, or about 53% of them, use the Linux operating system [13].
• Cluster Programming Model
The cluster programming model follows the parallel programming model, in which the program is broken into several parts that are executed simultaneously and in parallel. Two of the most common programming models for clusters are PVM and MPI. Using these programming models, messages can be sent and exchanged between the different cluster processors.
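The message-passing idea behind MPI can be illustrated with a small stand-in written with Python's standard multiprocessing module. On a real cluster one would use MPI itself (for example through mpi4py); the function names and the scatter/gather pattern below are ours and purely illustrative of the send/receive style.

```python
# Conceptual analogue of the MPI message-passing model: a coordinating
# process sends one chunk of work to each worker "rank" and then gathers
# the partial results, much like MPI_Send / MPI_Recv pairs.
from multiprocessing import Process, Pipe

def worker(conn, rank):
    data = conn.recv()            # like MPI_Recv: wait for a work message
    conn.send((rank, sum(data)))  # like MPI_Send: return the partial result
    conn.close()

def scatter_and_gather(chunks):
    """Send one chunk per worker rank, then collect {rank: partial_sum}."""
    parents, procs = [], []
    for rank, chunk in enumerate(chunks):
        parent, child = Pipe()
        p = Process(target=worker, args=(child, rank))
        p.start()
        parent.send(chunk)
        parents.append(parent)
        procs.append(p)
    results = [parent.recv() for parent in parents]
    for p in procs:
        p.join()
    return dict(results)

if __name__ == "__main__":
    print(scatter_and_gather([[1, 2], [3, 4], [5, 6]]))  # {0: 3, 1: 7, 2: 11}
```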
• Cluster Middleware
Cluster middleware provides the conditions for the system to appear integrated. The middleware is responsible for managing, organizing, and aggregating the various resources available in the cluster, and itself consists of several different components, including interfaces to the operating systems, hardware, parallel programming environments, various programs and subsystems, the resource management and scheduling system, and the application runtime management system. The resource management system provides the conditions for programs and tasks to be executed without the user being aware of the various complexities of their execution. In fact, it provides an integrated view, allowing the user to see the entire system seamlessly. Various resource management systems have been developed, including Condor, Libra, LoadLeveler and LSF [12].
4.2.2. Cluster Computing Summary
Clusters are one of the most widely used architectures for building high-performance systems. According to reports, 437 of the world’s top 500 supercomputers use cluster architecture [13].

4.3. Grid Computing
A grid is a collection of different computers with different computing power that are connected to each other through a network and together form a supercomputer with high computing power. The computers that are members of a grid can be located at different sites rather than in one geographical place. Grid systems are therefore heterogeneous systems, and the necessary coordination between the different systems within them is done through the grid middleware.
Grids have a layered architecture. In one of the architectures proposed for grids, there are four layers. In this layered architecture, the lowest layer, known as the fabric layer, contains all of the grid resources; it communicates through the grid middleware, which is the main part of the grid system, with the outermost layer of the architecture, the application layer, which includes users and applications. The grid middleware itself consists of two layers, which are placed between the fabric layer and the application layer. From bottom to top, these two layers are called the core middleware and the user-level middleware, respectively [14]. Each of these four layers is briefly described below; Figure 12 shows this layered architecture.

Figure 12 Grid multi-layered architecture
• Fabric Layer
The lowest layer in the grid layered architecture is the fabric layer. All the heterogeneous resources of the grid are located in this layer. These resources include computer systems and databases, each with its own local administrator.
• Core Middleware Layer
This layer behaves much like a distributed system: one of its functions is to hide the differences in access to similar but heterogeneous resources from the view of the upper layer. Its other tasks are establishing security in the grid system, as well as managing resource allocation and managing the registration and lookup of information in the grid.
• User-Level Middleware Layer
The main task of this layer is to manage the resources, tasks and requests of the user. Another task of this layer is to enable programmers to produce different applications; to this end it includes programming languages, compilers, libraries, and runtime environments and systems.
• Application Layer
The top layer of the grid multilayer architecture is the application layer, in which applications and portals are located.

As mentioned, one of the main parts of a grid is its middleware. Middleware is software that connects computers, software components, and applications to each other; it provides a set of services that allow different computational processes to run across several different machines.
So far, various grid middleware packages have been developed around the world, including Globus, gLite, dCache, and EMI.
5. Proposed HPC Design
In this section, according to the computational needs of the data analysis software and the expected data production rate of each beamline in the first phase of the Iranian Light Source Facility, a high-performance computing system is proposed to perform the calculations required by the data analysis software. In addition to periodic upgrades, the hardware and software of this system must be expanded and improved as the number of beamlines increases.
For the following reasons, the use of cluster computing is recommended:
• It offers a good ratio of performance to implementation cost.
• The price of the required hardware and software is reasonable and relatively low.
• The cost of operation and maintenance is low.
• The system can be developed and updated at a relatively reasonable cost.
• The system can easily be scaled out as needs grow.
• If necessary, the same resources can be used to run a grid or cloud computing platform.
• Most high-performance computing systems use this type of architecture.
• A number of synchrotrons, including Diamond Light Source and SESAME, also use cluster computing for their calculations.
5.1. Proposed Computing Cluster Hardware and Software
One type of cluster with applications in meteorology, seismography, and various other sciences is the Beowulf cluster, which offers relatively good performance. This type of cluster has a simple architecture consisting of a server (head) node, several computing nodes, and a network for communication between the nodes.
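The head-node/compute-node layout of a Beowulf cluster is typically described to the MPI launcher in a machine file. Below is a minimal sketch that generates one, assuming hypothetical hostnames `node01`…`node64` with 32 slots (cores) each, taken from the proposed design; the `slots=` notation follows Open MPI's hostfile convention.

```python
def make_machinefile(n_nodes=64, slots=32, prefix="node"):
    """One line per compute node: '<hostname> slots=<cores>'."""
    return [f"{prefix}{i:02d} slots={slots}" for i in range(1, n_nodes + 1)]

# Write the machine file the MPI launcher would consume:
lines = make_machinefile()
print(lines[0])    # node01 slots=32
print(len(lines))  # 64
```

The hostnames and the exact file format are assumptions for illustration; the actual naming scheme would be fixed at deployment time.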
The type of hardware of the server node and computing nodes, the type of network technology, the operating system and the type of programming model of the proposed cluster are given in Table 4.
Table 4 The type of hardware and software used in the proposed cluster

Item	Specification
Server type	Dell PowerEdge R710 (or a better/similar model available at implementation time)
Server CPU type	Intel Xeon (or a better/similar model available at implementation time)
Server memory	32 GB
GPU	NVIDIA Tesla K20 (or a better/similar model available at implementation time)
Number of computing nodes	64
Computing node CPU type	Intel Core i7 (or a better/similar model available at implementation time)
Memory per node	128 GB
Server hard disk capacity	2 TB
Hard disk capacity per node	1 TB
Network	10 Gigabit Ethernet + InfiniBand
Operating system	Linux
Parallel environment	MPI

In the following, a brief explanation is given about each of the selected hardware and software components for the proposed system and the reasons for choosing each one.

• Server and Computing Node
The processors used in the proposed cluster are made by Intel. The server node has an 8-core Xeon processor with good computing power, and 32 GB of main memory (RAM) is provided for it. In addition to the 8-core Xeon processor, the server is equipped with NVIDIA Tesla graphics cards. According to the type of experiments expected and the approximate estimate of the volume of data produced at the seven beamlines of the first phase of ILSF, 64 computing nodes are currently planned for this system. Naturally, the number of nodes can be changed as this information is updated. The current number of cores per node is 32, but thanks to technology upgrades and the falling cost of processors with more cores, this number can be increased at the time of actual implementation. In that case, assuming a constant budget, a trade-off can be made between the number of nodes and the number of cores per node. More nodes increase the redundancy of the computing system, while more cores per node allow data to be processed over the high-speed internal bus of that node, imposing less load on the cluster's Gigabit Ethernet or InfiniBand network. For proper use of processing resources and optimal performance, a minimum of 4 GB of RAM per core is recommended.
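As a sanity check on these figures, the totals implied by Table 4 and the 4 GB-per-core recommendation can be computed directly; a back-of-the-envelope sketch, not a specification:

```python
def cluster_totals(nodes=64, cores_per_node=32, ram_per_node_gb=128):
    """Return total cores, total RAM (GB), and RAM per core (GB)
    for the proposed cluster configuration."""
    total_cores = nodes * cores_per_node
    total_ram_gb = nodes * ram_per_node_gb
    ram_per_core_gb = ram_per_node_gb / cores_per_node
    return total_cores, total_ram_gb, ram_per_core_gb

cores, ram, per_core = cluster_totals()
print(cores, ram, per_core)  # 2048 8192 4.0
```

With 128 GB per 32-core node, each core gets exactly the recommended 4 GB minimum; trading nodes for cores per node would have to preserve this ratio.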

• Network Technology
Another key component of the computing cluster design is its network technology. Since the system must deliver acceptable computing power, the interconnect between the nodes, which can become the performance bottleneck, is of particular importance. Considering current needs, and to maintain flexibility and scalability in responding to new needs, there must be a fast internal connection between the cluster nodes. One of the most widely used networking technologies is Gigabit Ethernet, which is used in most computing clusters, including those at the Diamond Light Source, and offers a high data transfer rate. Adding InfiniBand connectivity provides more redundancy and lower-latency communication than Gigabit Ethernet; this feature is recommended for applications that place heavy storage and processing load on the network at the same time.
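To get a feel for what the interconnect means for beamline data, one can estimate the ideal (overhead-free) transfer time of a dataset over each link; a rough sketch, where the 100 GB dataset size and the 56 Gb/s FDR InfiniBand rate are illustrative assumptions:

```python
def ideal_transfer_time_s(size_gb, link_gbps):
    """Ideal transfer time in seconds, ignoring protocol overhead:
    gigabytes -> gigabits (x8), divided by link speed in Gb/s."""
    return size_gb * 8 / link_gbps

# A hypothetical 100 GB tomography dataset:
t_10gbe = ideal_transfer_time_s(100, 10)  # 10 Gigabit Ethernet
t_ib = ideal_transfer_time_s(100, 56)     # FDR InfiniBand (assumed 56 Gb/s)
print(round(t_10gbe, 1), round(t_ib, 1))  # 80.0 14.3
```

Real throughput will be lower than these ideal figures, but the ratio illustrates why InfiniBand is attractive when storage and processing traffic coincide.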

• Operating System
The proposed operating system for this cluster is Linux. Linux was chosen because it is open source and freely available, because most beamline data analysis software supports it, and because it is well supported on Gigabit Ethernet networks.

• Programming Model
MPI (Message Passing Interface) is a standardized message-passing interface designed by a team of academic and industrial researchers to work on a wide range of parallel computers. Because this message-passing standard is widely used in analysis software with parallel processing capability, supporting the development and execution of such programs requires an MPI implementation on the computing cluster.
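As an illustration of how beamline data could be spread across the cluster's cores with MPI, the sketch below uses the mpi4py binding (an assumption; any MPI implementation would serve). The `partition` helper splits N data frames as evenly as possible among the MPI ranks:

```python
def partition(n_items, n_ranks, rank):
    """Return the (start, stop) slice of n_items assigned to this rank,
    split as evenly as possible (lower ranks get one extra item)."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def main():
    # mpi4py is only needed when actually running under mpirun/mpiexec.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    start, stop = partition(1000, comm.Get_size(), comm.Get_rank())
    # ... each rank would analyse frames[start:stop] here ...
    print(f"rank {comm.Get_rank()}: frames {start}..{stop - 1}")
```

On the cluster, `main()` would be launched with something like `mpirun -np 2048 python analyse.py`; the frame count of 1000 is purely illustrative.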

Figure 13 General architecture of the proposed computational cluster system

Figure 14 Proposed hardware cluster configuration for data analysis of ILSF beamlines

6. Conclusion
For analysing the data produced in experiments at the ILSF beamlines, the general need includes hardware components such as graphics cards, processors, main memory, and external storage.
The relevant hardware components were considered for data analysis and simulation with the various software packages used at the beamlines, including Avizo, TomoPy, HKL-2000, FIT2D, HipGISAXS, XMAS, and MIDAS, which were briefly presented in this document. SMP (Symmetric Multi-Processing), cluster computing, and grid computing were studied as the different HPC (High-Performance Computing) scenarios. Finally, a suitable HPC design was proposed to respond to the beamline data computation requirements of ILSF's future beamline scientists.

[1]: PaNdata Software Catalogue. [Online].
[2]: Software. The home for Small Angle Scattering. [Online].
[3]: Chourou, Slim T. HipGISAXS: a high-performance computing code for simulating grazing-incidence X-ray scattering data. 2013.
[4]: HKL-2000 Manual, Edition 2.6. 2007.
[6]: Hammersley, A. P. FIT2D: An Introduction and Overview. 1997.
[7]: System Requirements. QXRD – Readout Software for Flat Panel X-Ray Detectors. [Online].
[8]: QXRD – Real-Time Readout, Visualization and Data Reduction for powder diffraction and SAXS. APS. [Online].
[9]: Basham, Mark. Data Analysis WorkbeNch (DAWN). Journal of Synchrotron Radiation, Vol. 22, 2015.
[11]: Vinayak Shinde, Amreen Shaikh, Chris Donald D'Souza. Study of Cluster, Grid and Cloud Computing. International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4, No. 10, 2015, pp. 445-448.
[12]: Chee Shin Yeo, Rajkumar Buyya, Hossein Pourreza, Rasit Eskicioglu, Peter Graham, Frank Sommers. Cluster Computing: High-Performance, High-Availability, and High-Throughput Processing on a Network of Computers. In: Handbook of Nature-Inspired and Innovative Computing: Integrating Classical Models with Emerging Technologies. New York: Springer, 2006, pp. 521-551.
[13]: TOP500 Supercomputer Sites. [Online].
[14]: APS Scientific Computing Strategy. The Advanced Photon Source, 2017.