Archive for the ‘regular update’ Category

More Images & a New MakeGalaxy Code

Sunday, February 5th, 2012

I have had some more time to struggle with practice using Sunrise now. I've managed to sucessfully run the radiative transfer on several GADGET-2 snapshots of merging systems. This required me to make a minor edit to the code.  There was a seemingly arbitrary speed limit in the code which caused it to exit if any particles moved faster than 1000 km/s. I had to increase the speed limit in order to prevent the code from exiting.  I've contacted Patrik to see if there was a technical reason for the 1000 km/s speed limit. I wouldn't want to break anything.  The code has now been recompiled with CUDA support.  The CUDA version runs noticeably faster, as expected.

In other news, T.J. Cox just made the MakeGalaxy code public on Bitbucket. Someone has uploaded a new flexible version of CombineGalaxies to Bitbucket as well. To download the Mercurial repositories use:

hg clone bitbucket.org/lutorm/makegalaxy
hg clone bitbucket.org/surftour/flexcombinegalaxies

So now all of the codes that I'm using are public except for P-GADGET-2.

Here are some images of two spiral galaxies in the process of merging. This is a single snapshot from three different viewpoints after the "first pass."  The color images were generated from the FITS file using Patrik's Python code as described here. I used scale=(1.002,0.647,1.1) for the RGB intensity scale and then I modified the output images by adding a very slight Gaussian blur. The fourth image below is a screenshot of the system monitor during a memory-intensive part of a high resolution run.

First Sunrise Images

Sunday, January 29th, 2012

After spending quite a bit of time moving to a new apartment (cleaning, packing, moving, more cleaning, and unpacking), I have had time to try using Patrik Jonsson's Sunrise code.  The instructions on the Sunrise wiki were quite helpful. I ran into very few problems in the process. The biggest "problem" was a minor bug that took two minutes or so to figure out; the code requires the environment variable HOST to be defined. The code tries to handle the possibility that the name of the host computer is stored in the variable HOSTNAME (the default situation in Ubuntu), but there's a problem with the implementation:

const char* h=getenv ("HOST");
if(h == 0)
  h = getenv ("HOSTNAME");
assert(h != 0);
string host (h);

First, h is declared as a const and initialized to the value stored in HOST. If this fails (i.e., if HOST is not defined), the code tries to set h to the value stored in HOSTNAME. Of course h is a constant, so it can't take on any other value! I'll see if the problem described in the previous post still exists. If it does, I'll report both issues to Patrik on the Sunrise mailing list.

Here are the first few images that I generated using Sunrise. The images are only 300x300 pixels and the merger remnant isn't centered properly, but it's a start!

Now I need to play with parameters and learn more about manipulating the FITS files and analyzing spectra.

Sunrise is Compiled!

Friday, December 30th, 2011

I have finally managed to compile the latest (default) version of Sunrise! I had to tinker with it periodically over the course of a few days, but I finally managed to do it.  The current version of the code in the Mercurial repository has evidently been designed to compile only with Arepo support, but I do not have Arepo and I will not be using Arepo for my project.  It seems that compiling the code without Arepo is planned, but not fully implemented in the current version.  There are several errors that occur when Arepo is absent.  In order to get the code to compile, I had to extensively edit the pre-processor directives.

There may be a cleaner  way of doing this, but this works:

First, in the config.h header, I had to uncomment the line:

#undef WITH_AREPO

Then, I wrapped each block of code containing the word "arepo" in

#ifdef WITH_AREPO

blah blah arepo blah

#endif

Some (but not enough) pieces of code were already wrapped in this way. In at least two source files, it was done incorrectly (ir_grid.cc and aux_grid.cc). Once I had properly prevented all Arepo-specific code from being compiled, the compilation was successful.  I now have the three executables sfrhist, mcrx, and broadband.  I haven't tried running the exucutables yet---except for the very simple test of seeing whether they ask for input files.  I also haven't attempted to compile with CUDA support, but both of these tasks should be completed soon and I'll find out if there are any run-time errors in this version...fun!

 

And Now, A Two-Point Cubic Spline

Friday, November 11th, 2011

The simple quadratic spline discussed my previous post seems to be sufficient for interpolating particle positions between two GADGET-2 snapshots for the purposes of making animations, but it's also possible to use a cubic spline for the task.  The advantages of using a cubic spline are (1) it can be more accurate---especially in cases where the acceleration changes sign once between snapshots, and (2) the derivatives at the end-points of the spline match the derivatives specified in the input data. In the case of the quadratic spline, if the acceleration (second derivative) changes sign even once during the interval, the interpolation is poor and the derivative of the spline at the end points is not guaranteed to match the specified value. Thus, in the case of the quadratic spline, the derivative is generally discontinuous when moving from one interval to another. This doesn't matter too much when making animations with reasonably small intervals, but it would be important for certain other situations. The disadvantage of the cubic spline is that it adds complexity.

Cubic spline interpolation seems to be quite popular. Derivations of the general method can be found in most textbooks on numerical methods. I'll just discuss the specific result that's useful for my application---the cubic spline in terms of the end-point positions and velocities. To derive the result, we just have to slightly modify the derivation given in the previous post. We add a jerk term \mathbf{j}(t) (third derivative of the position with respect to time) to the third and fourth Taylor expansions,

\mathbf{r}_{0}\approx\mathbf{r}(t)+(t_{0}-t)\mathbf{v}(t)+\frac{1}{2}(t_{0}-t)^2\mathbf{a}(t)

\mathbf{r}_{1}\approx\mathbf{r}(t)+(t_{1}-t)\mathbf{v}(t)+\frac{1}{2}(t_{1}-t)^2\mathbf{a}(t)

\mathbf{v}_{0}\approx\mathbf{v}(t)+(t_{0}-t)\mathbf{a}(t)+\frac{1}{2}(t_{1}-t)^2\mathbf{j}(t)

\mathbf{v}_{1}\approx\mathbf{v}(t)+(t_{1}-t)\mathbf{a}(t)+\frac{1}{2}(t_{1}-t)^2\mathbf{j}(t)

and solve the system of equations for \mathbf{r}(t). There are, of course, several ways of doing this. I'll skip the details and show the end result. I defined

\delta t \equiv t_1-t_0,\qquad \tau_0 \equiv t_0-t,\qquad \tau_1 \equiv t_1-t

In terms of these definitions,

\displaystyle\mathbf{r}(t)=\frac{\mathbf{r}_0\tau_1^2+\mathbf{r}_1\tau_0 ^2+\tau_0\tau_1(\mathbf{v}_1\tau_0 ^2-\mathbf{v}_0\tau_1^2)/\delta t}{\tau_1^2+\tau_0^2}

This can be written more compactly as

\displaystyle\mathbf{r}(t)=\frac{\mathbf{r}_0+\rho \mathbf{r}_1+(\rho \mathbf{v}_1-\mathbf{v}_0)\tau_1\tau_0/\delta t}{\rho+1}

where

\rho\equiv\frac{\tau_0^2}{\tau_1^2}=\frac{(t_0-t)^2}{(t_1-t)^2}

As an example, the cubic function generated by this technique when given values corresponding to \sin(x) at x=0 and x=2\pi is

\displaystyle s(x)=\frac{(2\pi-x)(\pi-x)x}{x^2-2\pi x+2\pi^2}

In the plot below, \sin(x) is the blue curve and s(x) is the dotted black curve.

Quadratic Spline Interpolation

Thursday, November 10th, 2011

While animating my first GADGET-2 galaxy mergers, I discovered that setting the frame rate to the common value of 25 frames per second resulted in movies that were too fast. The complexity of the merger process couldn't be appreciated--even though the snapshot interval was quite small (0.005 simulation time units). I had to interpolate the GADGET-2 snapshots in order to produce intermediate frames that could be used to slow the animation down. Linear interpolation didn't work well enough to meet my standards, so I looked for an efficient quadratic interpolation scheme. I wanted to perform a quadratic interpolation using only two snapshots at a time (a quadratic interpolation using two snapshots is possible in this case because each snapshot contains particle position and velocity information). The first thing that came to mind was simply using the familiar expression

\displaystyle\mathbf{r}(t)=\mathbf{r}_0+\mathbf{v}t

to take a half-step forward from the first snapshot and a half step backward from the second snapshot and then average the two results together to estimate the likely position of each particle in the snapshot like this:

\displaystyle\mathbf{r}_{1/2}=\frac{1}{2}(\mathbf{r}_0+\mathbf{r}_1)+\frac{1}{2}\left(\mathbf{v}_0\frac{\delta t}{2} - \mathbf{v}_1\frac{\delta t}{2}\right)

Where \delta t is the time interval between snapshots 0 and 1, \mathbf{r}_{1/2} is the position of a particle at time \delta t/2 after snapshot 0, \mathbf{r}_0\mathbf{r}_1, \mathbf{v}_0, and \mathbf{v}_1 are the positions and velocities of the particle at snapshot 0 and snapshot 1. This method is slightly better than linear interpolation, but not as good as using a quadratic spline. I tested it out and the results were disappointing. I searched my computational physics and numerical analysis textbooks and searched the web, but I couldn't find what I was looking for, so I derived the method myself. My final result was

\displaystyle\mathbf{r}(t)=\frac{1}{\delta t}\left[\mathbf{r}_0(t_1-t)-\mathbf{r}_1(t_0-t)\right]+\frac{1}{2\delta t}\left(\mathbf{v}_1-\mathbf{v}_0\right)(t_1-t)(t_0-t)


This is very closely related to a quadratic Bézier curve. The derivation follows:

The Taylor expansion of a function f(x) about the point x=x_0 is

\displaystyle f(x)=f(x_0)+(x-x_0)f'(x_0)+\frac{1}{2}(x-x_0)^2 f''(x_0)+\cdots

It is possible to approximate the value of the function at x_0 using a Taylor expansion by swapping x with x_0 in the expansion above and truncating the series:

\displaystyle f(x_0)\approx f(x)+(x_0-x)f'(x)+\frac{1}{2}(x_0-x)^2 f''(x)

Writing expansions for \mathbf{r}_0, \mathbf{r}_1, \mathbf{v}_0 and \mathbf{v_1},

\mathbf{r}_{0}\approx\mathbf{r}(t)+(t_{0}-t)\mathbf{v}(t)+\frac{1}{2}(t_{0}-t)^2\mathbf{a}(t)

\mathbf{r}_{1}\approx\mathbf{r}(t)+(t_{1}-t)\mathbf{v}(t)+\frac{1}{2}(t_{1}-t)^2\mathbf{a}(t)

\mathbf{v}_{0}\approx\mathbf{v}(t)+(t_{0}-t)\mathbf{a}(t)

\mathbf{v}_{1}\approx\mathbf{v}(t)+(t_{1}-t)\mathbf{a}(t)

Eliminating \mathbf{a}(t) using the third and fourth equations:

\displaystyle\mathbf{a}(t)\approx\frac{\mathbf{v}_{1}-\mathbf{v}(t)}{(t_{1}-t)}\approx\frac{\mathbf{v}_{0}-\mathbf{v}(t)}{(t_{0}-t)}

The first two equations then become

\mathbf{r}_{0}\approx\mathbf{r}(t)+\frac{1}{2}(t_{0}-t)[\mathbf{v}_{0}+\mathbf{v}(t)]

\mathbf{r}_{1}\approx\mathbf{r}(t)+\frac{1}{2}(t_{1}-t)[\mathbf{v}_{1}+\mathbf{v}(t)]

Now we eliminate \mathbf{v}(t) between the two equations

\frac{2\mathbf{r}_{0}-2\mathbf{r}(t)}{(t_{0}-t)}-\mathbf{v}_{0}=\frac{2\mathbf{r}_{1}-2\mathbf{r}(t)}{(t_{1}-t)}-\mathbf{v}_{1}

Solving for \mathbf{r}(t) and defining \delta t=t_1-t_0 yields the result stated above.

This is a decent approximation for the motion of each particle between snapshots. In fact, it's a pretty good approximation for any function with specified end point values and first derivatives as long as the sign of the second derivative doesn't change during the interval.

For demonstration purposes, the blue curve below is \sin(x). The black dashed curve is the interpolating parabola generated by the expression derived above for endpoints at x_0=0 and x_1=\pi. The red dashed curve is the interpolating parabola for endpoints at x_0=0 and x_1=\pi/2

Density profile calculator

Thursday, February 24th, 2011

I had to find an efficient way to plot the density profiles of spherical systems of point particles of varying mass.  Here's a summary:

First note that the mass enclosed in a sphere of radius r is

\displaystyle M(r)=4\pi\int_0^r\rho(r')r'^2 dr'

I'm trying to find \rho(r), so I'd like to get rid of the integral sign. Thus I differentiate with respect to r

\displaystyle \frac{dM(r)}{dr}=4\pi \rho(r)r^2

then 

\displaystyle \rho(r)=\frac{1}{4\pi r^2}\frac{dM(r)}{dr}.

The task has been reduced to the problem of computing {dM(r)}/{dr}, which is rather straightforward.  If a fast sorting algorithm, such as quicksort, is used to sort the particles by distance from the center of the spherical distribution (i.e. radius), then a mass array M(i) can be quickly constructed such that each entry i is the mass contained between r and r+i\delta r for 0\leq r\leq R, where R=n\delta r and n is the number of elements in the array. Then a finite differencing scheme can be used to compute the derivative {dM(r)}/{dr} at each point. Finish by properly handling the case for which the derivative at i is vastly different from the derivatives at the surrounding points i-1 and i+1. This problem can be mitigated by using (at least) a central differencing scheme and choosing an appropriate value of \delta r; it should be no smaller than the resolution limit of the particle distribution in the region where the density is being computed/plotted.

There are, of course, several other ways of computing the density profile.  In particular, if the potential has already been computed everywhere, then the density distribution can be inferred from that. The method described above seemed to be the best method for my particular code since the quicksort algorithm typically scales as \mathcal{O}(N\log N) and all of the other operations either scale as \mathcal{O}(N) or \mathcal{O}(n).

UPDATE:  Here's a plot of the density profile using only 10,000 particles and no smoothing (blue line) compared to the exact analytic density profile from which the distribution was built (red line).  Density is in arbitrary units and is scaled by the central density, so \rho = \mbox{const} \rho(r)/\rho_c.

When cloud-in-cell smoothing was added (green line below), the result was roughly the same as the un-smoothed version, but with less random variation and some systematic differences; the gradient in the particle number density caused a slight bias toward higher density at larger radii and lower central density. This is expected behavior because the particle distribution has been "blurred" in this case, which softens the density gradient.

Paper in preparation, P-GADGET-2, & Git

Monday, February 21st, 2011

Last week I finally obtained a copy of the full-physics version of GADGET-2.  According to Desika Narayanan, it's a version which has been verified to work with Sunrise. Once I'm no longer busy with other things, I'll start working with it.

I have almost finished writing a cycle 19 HST theory proposal. The deadline is in a few days.

I'm in the early stages of working on a paper describing my qualifier project.  After reading through the literature from the last 34 years and thinking about the deficiencies in my project as it stands, I've decided to further modify the code to include several more features.  Yesterday I wrote an energy tracking function and I've begun writing a density profile plotter.  I've also improved the performance of the code somewhat.  Functions for plotting the line-of-sight velocity distribution and for computing the skewness of the distribution will be added shortly.  I may also try writing a hierarchical time-stepping scheme to improve performance further.

I've decided to maintain the qualifier code using a revision control system.  I've used Subversion for projects in the past, but Mercurial and Git seemed to be much better options.  I ultimately decided to use Git.

Git Tutorial,  Git Manual

Git should also come in handy when collaborating on LaTeX documents.

Crunch

Monday, February 14th, 2011

Dr. Gillian Wilson, one of my thesis committee members, was kind enough to provide me with funding to build a workstation for my project. I needed a fast shared memory machine with at least 32 GB of RAM in order to run Sunrise efficiently. After configuring several systems on the HP, Dell, Apple, and other websites, I discovered that it would be considerably more cost-effective if I designed and built the machine myself. Ideally, I would have built a system consisting of 2 or 4 of Intel's latest Xeon processors and an Nvidia Tesla c2050 card, but I tried to keep the price as low as possible for the level of performance necessary.  I eventually obtained the following combination of components:

Motherboard: Asus KGPE-D16 Dual Socket G34 AMD SR5690 SSI EEB 3.61
Processors: 2x AMD Opteron 6172 Magny-Cours 2.1 GHz  (24 cores total)
Memory: 16x 4 GB DDR3 1333 unregistered DRAM (64 GB total)
Graphics: EVGA Nvidia GeForce GTX 580 (containing 512 CUDA cores and 1.5 GB GDDR5 )
System disk: 64 GB Crucial RealSSD C300
Data disks: 2x 1.0 TB Western Digital Caviar Black WD1002FAEX
Chassis: Intel 5U Server Chassis SC5650WSNA, with 1000W PSU

Note 1: The Magny-Cours Opteron processors are slower per clock cycle than the current Intel Xeons, however (1) the performance of the Magny-Cours systems scales better with core count than Xeon-based machines, (2) the codes that I will be using scale rather well with thread count, (3) the Opterons are considerably cheaper for a given performance level, and (4) The Magny-Cours Opteron processors allowed me to use a large quantity of inexpensive unregistered memory modules.  The only Xeons that support this amount of unregistered memory are approximately twice as expensive as the Opterons. To summarize, I could have built an equivalent machine with fewer Xeon cores running at a higher clock rate, but such a system would have been considerably more expensive.

Note 2: The GTX 580 graphics card is typically faster than the Tesla c2050 for CUDA as long as the problem can fit in its smaller memory.  The primary deficiency of the GTX is that it was not expressly designed for sustained HPC workloads, and thus it is likely to fail sooner under heavy workloads.  On the other hand, one could buy four GTX 580 cards for less than the price of one Tesla c2050.

I've tentatively named the system "Crunch".  It's running Ubuntu 10.04.2 LTS with the 2.6.38-rc4 Linux kernel.  I'll likely upgrade to the final 2.6.38 kernel when it's released and keep  the kernel version unchanged until I finish my Ph.D. work. I have installed all of the development tools, compilers, and libraries that I think I'll need, including the latest Nvidia driver and CUDA Toolkit.  I have successfully compiled and run sample CUDA programs from the CUDA SDK, as well as my qualifier project code, the ray-tracing code POV-Ray 3.7, and benchmarks from the Phoronix Test Suite. I also installed Apache2, PHP5, MySQL, and many modules so that the system can be used as a web server. The latest version of WordPress has been installed on the web server.  The server could potentially be used for data-sharing.

Here are the outputs of bandwidthTest, deviceQuery, hwinfo, lshw, lsmod, lspci, and  lsusb.

For a listing of all software installed from the Ubuntu repositories (i.e. all pre-compiled packages), click here.

Here's a screenshot from the machine's first parallel code execution:

Running my qualifier project code remotely and viewing the gnome-system-monitor using ssh -X.

Some images taken during the assembly process:

The processors (top-side)

Processors (under-side)

16 DIMMs

GTX 580

A populated server board

Heatsink and fan

Almost finished

The finished product

Beamer

Saturday, December 11th, 2010

I used the Beamer \LaTeX class to create the presentation file that I used for my advancement exam.  Beamer is an easy to use tool that makes very nice presentations.  It's much less of a hassle and yields better results than OpenOffice Impress or MS PowerPoint. The source to my presentation can be found here:

beamer-presentation-source.zip

The full source, including the videos that I showed during the presentation, can be found here:

beamer-presentation-full.zip

Hopefully these can be of use to someone who is learning to use Beamer. Having examples is a good way of learning. To compile the document, run the shell script called compile.sh or manually issue the commands that you find in the script. Alternatively, a \LaTeX editor such as Kile can be used to compile the document as long as you tell it to use pdflatex.

The output PDF will end up looking like this.

Simulation Lectures

Monday, May 3rd, 2010

I designed a series of three lectures for my fellow astronomy / astrophysics students here at UC Riverside.  The purpose of the lectures was two-fold:

  1. To introduce the students to the essentials of N-body methods and computational fluid dynamics.
  2. To motivate me to study the subject in more detail and breadth.

Having finished preparing for the lectures, I can say that the second goal has definitely been met.  Two of the three lectures were presented last week and the feedback has been positive, so it seems that the first goal was also met. Tomorrow is the third and last lecture of the series. After that, I will be able to spend time focusing on my research project once again!

The PDF presentation can be found at www.idius.net/lectures/N-body-methods.pdf.