Trix & Graphix

Columns to rows and vice versa in ASCII files

Imagine you have a table of data in an ASCII file. Let's assume your table is homogeneous (otherwise you can take a look at this post). You may want to turn it into just one row, or just one column. I'm going to explain how to do it in two different ways using two different Unix tools: awk and tr.

- tr version:

This is the easiest way. In order to turn the file into just one row, the "\n" characters have to be replaced by spaces. This can be done with the command:

tr "\n" " " < filein.asc > fileout.asc

Similarly, in order to turn the file into a column you can replace the spaces that separate the fields with "\n" characters:

tr " " "\n" < filein.asc > fileout.asc
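As a minimal sketch of both commands, here they are applied to a tiny made-up table (the file name demo.asc and its contents are assumptions for this illustration only). It also shows two details that are easy to miss: the one-row output ends with a space and has no final newline, and the -s option of tr squeezes repeated separators so that no empty lines appear in the column.

```shell
# Hypothetical sample table, created only for this illustration
printf '1 2 3\n4 5 6\n' > demo.asc

# Table -> one row: every newline becomes a space
# (the result ends with a space and has no final newline)
row=$(tr "\n" " " < demo.asc)
echo "[$row]"

# Table -> one column: every space becomes a newline
# (-s squeezes runs of repeated newlines, so double spaces give no empty lines)
col=$(tr -s " " "\n" < demo.asc)
echo "$col"
```

The square brackets in the first echo are there only to make the trailing space visible.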

- awk version

Focusing now on the awk version, you can use

awk '{for(i=1;i<=NF;i++) print $i}' filein.asc > fileout.asc

in order to turn the file into a column, or

awk '{for(i=1;i<=NF;i++) printf "%s ", $i}' filein.asc > fileout.asc

to turn the file into just one row.
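The one-row awk command above leaves a space after the last element and does not end the line. A possible variant, sketched here on a made-up file (demo.asc, column.asc and row.asc are hypothetical names), prints the separator before every element except the first one and adds a final newline:

```shell
# Hypothetical sample table, created only for this illustration
printf '1 2 3\n4 5 6\n' > demo.asc

# One column: print each field on its own line
awk '{ for (i = 1; i <= NF; i++) print $i }' demo.asc > column.asc

# One row: put a space before every field except the very first one,
# then finish the row with a single newline
awk '{ for (i = 1; i <= NF; i++) printf "%s%s", (n++ ? " " : ""), $i }
     END { printf "\n" }' demo.asc > row.asc

cat row.asc
```

The counter n is zero only for the first element, so that is the only one printed without a leading space.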

Easy, isn't it?

Tips for cleaning up an ASCII file

Suppose you have a table in an ASCII file such as this one:

element1x1 element1x2 element1x3

element2x2 element2x2 element2x3

element3x1, element3x2, element3x3

It's full of undesirable heterogeneities: tabs, commas instead of just spaces as column delimiters, undesired spaces, empty lines... How can you homogenize this file using simple command-line Unix tools?

Well, the first thing is to remove the tabs. This is easy using the tr tool, which replaces some characters with others. To turn the tabs into spaces just type

tr "\t" " " < file.asc

and this will write a copy of the file without tabs to the standard output. Similarly, to remove the "," symbols you can pipe the output of the last command into another tr:

tr "\t" " " < file.asc | tr "," " "

The next thing is to remove the empty lines and the undesired spaces. This may be done using awk:

tr "\t" " " < file.asc | tr "," " " | awk '{ if(NF>0) {for(i=1;i<=NF;i++) printf "%s ", $i; printf "\n"}}'

Finally, the last command adds an undesired space at the end of each line. It can be removed using sed:

tr "\t" " " < file.asc | tr "," " " | awk '{ if(NF>0) {for(i=1;i<=NF;i++) printf "%s ", $i; printf "\n"}}' | sed 's/ $//'

Well, that's all. The final output you get on the standard output after running this command is

element1x1 element1x2 element1x3
element2x2 element2x2 element2x3
element3x1 element3x2 element3x3
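For comparison, the same clean-up can probably be written in a shorter form. This is only a sketch of an alternative, not the pipeline explained above, and the input file messy.asc with its contents is made up for the illustration. It relies on two awk idioms: a bare NF as a condition skips the lines without fields, and assigning $1 to itself makes awk rebuild the line with single spaces between the fields.

```shell
# Hypothetical messy input: a tab, double spaces, an empty line,
# commas and leading/trailing blanks
printf 'a1\ta2  a3\n\n  b1, b2, b3 \n' > messy.asc

# tr turns tabs and commas into spaces in a single call;
# awk drops the empty lines (NF is 0 there) and rebuilds each
# remaining line with exactly one space between the fields
tr '\t,' '  ' < messy.asc | awk 'NF { $1 = $1; print }' > clean.asc

cat clean.asc
```

With this form there is no trailing space to remove, so the final sed is not needed.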

Yes, I know, I have used a lot of tools with few explanations. Maybe in some future post I will explain some of these tools in more detail...

Strings in BASH

This post is to show some really nice features I have just found regarding the management of strings in Bash scripts. How have I been able to live without this? :-)

String indexes
A string may be split using indexes. The syntax is something like ${variable:index:length}, where index is the position of the first character of the substring you want to get (the first character is numbered 0, as in C), and length is the number of characters you want to keep. Maybe it's easier to understand with a few examples:

echo ${var:0}

echo ${var:4}

echo ${var:4:1}

echo ${var:4:2}

Note that ${var:0} is equivalent to ${var}. A really interesting feature is that you may index from the right:
echo ${var: -2:2}
Note the blank space before the minus sign!!! Without it, Bash would read ":-" as the "use a default value" operator.
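The result of each of these commands depends on the content of var, so here is a small sketch with an assumed value (the string 0123456789 is just an example chosen so that each character shows its own position):

```shell
#!/bin/bash
var="0123456789"     # hypothetical example string

echo "${var:0}"      # whole string:                 0123456789
echo "${var:4}"      # from position 4 to the end:   456789
echo "${var:4:1}"    # one character at position 4:  4
echo "${var:4:2}"    # two characters from there:    45
echo "${var: -2:2}"  # the last two characters:      89
```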

Replacing elements (regular expressions-like)
Another interesting feature is that you can use Bash to replace some parts of a string. This is something people usually do with sed or awk. Nevertheless, for easy tasks the direct Bash form I'm going to explain may be useful. The syntax is ${variable/substringToBeReplaced/replacementString}. Let's see some examples:


echo ${var/123/XY}

echo ${var//123/XY}

echo ${var/23}
As you can see, the syntax is similar to the substitution command of sed. By default, only the first occurrence of the substring, reading from the left, is replaced. Using "//" you force the replacement of every occurrence of the substring in the main string. Finally, if only one string is specified, it's simply removed (replaced by a null string).
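Again, a sketch with an assumed value of var makes the three cases easier to follow (the string 0123-0123 is made up so that the substring 123 appears twice):

```shell
#!/bin/bash
var="0123-0123"        # hypothetical example string

echo "${var/123/XY}"   # only the first 123 is replaced:  0XY-0123
echo "${var//123/XY}"  # every 123 is replaced:           0XY-0XY
echo "${var/23}"       # the first 23 is removed:         01-0123
```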

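Finally, you can anchor the replacement to one end of the string: with "#" the substring is replaced only if it is found at the very beginning, and with "%" only if it is found at the very end: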


echo ${var/#123/XY}

echo ${var/%123/XY}

Note that the symbol "#" means "match only at the beginning of the string". It is not the same as the default behaviour, which replaces the first occurrence wherever it is found.
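A short sketch of the two anchors, using made-up strings (var and other are hypothetical values). In Bash, "#" only matches at the very beginning of the string and "%" only at the very end, so the last two lines show a case where the anchored form finds nothing to replace while the plain form does:

```shell
#!/bin/bash
var="123-123"            # hypothetical example string

echo "${var/#123/XY}"    # 123 at the beginning is replaced:  XY-123
echo "${var/%123/XY}"    # 123 at the end is replaced:        123-XY

other="0123"             # here 123 is not at the beginning
echo "${other/#123/XY}"  # no match, nothing changes:         0123
echo "${other/123/XY}"   # unanchored form still matches:     0XY
```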

Just two more examples. In order to get the name and the extension of a file:


echo "The filename is ${var/%.*}"
The filename is name

echo "The extension is ${var/#*.}"
The extension is ext

Here you can see that "*" matches any sequence of characters, so it can stand for the beginning as well as the end of the string.
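One caveat, sketched with a made-up file name that contains two dots: the replacement forms always take the longest possible match, so the "name" stops at the first dot. Bash also has dedicated operators for trimming (%, %% for the end and #, ## for the beginning) which let you choose between the shortest and the longest match. They are not the forms used above, just a related alternative:

```shell
#!/bin/bash
var="archive.tar.gz"  # hypothetical file name with two dots

echo "${var/%.*}"     # longest match at the end is removed:      archive
echo "${var/#*.}"     # longest match at the beginning removed:   gz

echo "${var%.*}"      # shortest match at the end is removed:     archive.tar
echo "${var%%.*}"     # longest match at the end is removed:      archive
echo "${var##*.}"     # longest match at the beginning removed:   gz
```

For a name with a single dot, such as name.ext, all the "name" forms give the same result.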

I hope it may be useful for you :-).

Removing the transparent background when converting formats in Imagemagick

For some reason, the version of Imagemagick I have on my workstation (6.4.3) sets the background to transparent when I convert EPS files to PNG format (for example the EPS files from GMT or gnuplot).

Although transparency is in general a desirable feature, sometimes you just want a plain white background. In order to get it, I have found a solution (there must be many others, but this one just works for me), which consists of adding this option:

convert -layers flatten figure.eps figure.png

Boundary annotations in GMT

In the last post, I tried to give a general description of GMT. In the next few posts I'm going to try to explain, by means of several examples, some details of the use of these tools.

In this first example, I'm going to use pscoast to create a simple map of Europe. Mostly I'm going to focus on how to customise the boundary annotations. In the next post I'll explain in more detail some flags of pscoast.

This is the output we want to get:

And this is the simple bash script which generates it

# Bash script for using GMT

# Some environment variables
gmtset PAPER_MEDIA a5+
gmtset PAGE_ORIENTATION portrait
gmtset PLOT_DEGREE_FORMAT ddd:mm:ssF
gmtset DEGREE_SYMBOL none

# The only command of this plot
pscoast -Jl15/35/30/60/1:20000000 -R-15/45/35/70 -A0/0/1 -Ba5g5/a5g5nSEW:."Europe": -Ggrey > map.eps

# To convert the EPS file into png
convert -density 100 map.eps map.png

The main task is carried out by pscoast, which uses 5 flags (-J, -R, -A, -B and -G). By the way, the order of these flags is not relevant. Before that, there are some options which control the general behaviour of GMT and the look of the final result, like font styles, colours, formats and so on. Finally, Imagemagick converts the EPS file to a more friendly png image.

The explanation of the flags is as follows:

J defines the projection, while R defines the region you want to plot. Of course, depending on the region you want to plot, you have to choose an adequate projection. I have chosen the Lambert Conic Conformal projection, in which you have to set the central coordinates, two standard parallels (the latitudes where the scale is true) and the scale. It's beyond the scope of this blog to explain the exact meaning of these parameters, sorry. I have chosen -Jl15/35/30/60/1:20000000 because 15 degrees East is the central longitude of the region I have chosen (-R-15/45/35/70), and 30 and 60 as the two standard parallels just because they roughly span the latitudes I want to plot.

A is to select whether or not you want to show small water regions, like lakes. It's a bit confusing and in fact I don't understand it very well. I just use it with those options because, as far as I have found out, it removes the lakes...

G tells GMT to fill the land areas. The option grey just sets the colour you want. You can also use the RGB model, for example -G255/0/0 is red.

Finally, B is to specify the boundaries of the plot. It's quite important in order to get nice maps. I have used -Ba5g5/a5g5nSEW:."Europe":, which may appear to be really complex, but don't worry, it isn't so hard ;-). a5 tells GMT to put a degree label every 5 degrees. g5 tells it to draw the inner grid lines across the map every 5 degrees. The black and white rectangles of the frame are controlled by f and, if f is not given, they follow the annotation interval. If you use f instead of g, you remove the inner grid. I use it twice, a5g5/a5g5, because the first one is for the horizontal boundaries and the second one for the vertical ones (if you only set it once, both horizontal and vertical share the same options). Later I add nSEW. This says on which axes you want degree annotations: lower case means no annotations, and upper case means annotations. Then, for example, NsWe sets annotations only on West and North. Finally, :."Europe": adds a title to the map.
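Since the -B string is just several pieces glued together, one way to keep it readable inside a script is to build it from variables. This is only a sketch: the variable names annot, grid, axes, title and bflag are made up for the illustration, and the pscoast call is left as a comment because it is simply the command of the script above with the flag swapped in.

```shell
#!/bin/bash
annot=5        # interval of the degree labels (the "a5" part)
grid=5         # interval of the inner grid lines (the "g5" part)
axes="nSEW"    # upper case: annotated axis, lower case: not annotated
title="Europe" # title of the map

# Horizontal settings, a slash, vertical settings, the axes and the title
bflag="-Ba${annot}g${grid}/a${annot}g${grid}${axes}:.${title}:"
echo "$bflag"

# It would then be used as (assuming GMT is installed):
# pscoast -Jl15/35/30/60/1:20000000 -R-15/45/35/70 -A0/0/1 "$bflag" -Ggrey > map.eps
```

The double quotes around Europe in the original command are removed by the shell before pscoast sees them, which is why they do not appear inside the variable; quoting "$bflag" as a whole keeps a title with spaces in one piece.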

A last comment regarding the general options of the map. GMT has a default value for each of them, and gmtset allows you to modify these values. DEGREE_SYMBOL none disables the degree symbol, and PLOT_DEGREE_FORMAT ddd:mm:ssF sets the longitude range to -180/180 and appends a W instead of using the minus sign for west coordinates. In the same way, HEADER_FONT_SIZE 18 would set the title font to this size, and so on.

That's all. I hope this post is not too confusing. In the next one I'll try to add some more options to pscoast in order to get a more complex map.

GMT: a very brief introduction

GMT is a set of tools (command line programs) for manipulating geographic data sets, and in particular for creating maps. This set of tools is a great example of the philosophy that all Unix programs should follow: very light, fast, and each tool does just one thing, but does it well. Here there are some examples of the kind of things that you can get with these tools. Maybe the biggest problem with GMT is how complex it is to use, with its thousands of options and its obscure syntax. In this blog I'll try to post some tips & tricks I have found out up to now... I still learn new things every day.

The general way of plotting a map with GMT is to use several tools sequentially. Each tool adds a new detail to the final result. For example, the coastlines are added with pscoast, shadings are added with grdimage, some symbols are included later with psxy and so on. When a command is called, the output of GMT is EPS code written to the standard output of the terminal. Thus, you have to redirect the output after each call to GMT to the same EPS file (which is, by the way, an ASCII file). As you can understand, in order to work successfully with GMT, bash scripting is mandatory. Otherwise, you will go mad after just 10 minutes using it.

The syntax is nevertheless quite complex. In the next posts I'll explain some details about it. Anyway, the best place to learn about GMT is, as could be expected of a Unix program, the man pages. Just as a general comment, the syntax is something like:
command -Flag1Options -Flag2Options > file.eps
where command is one of the many GMT tools. Flag1 and Flag2 can be -D, -O, -B,... there are tens of flags, which depend on the exact tool. Finally, Options are the many options you can use for each flag. It's important to note that there is no space between the flag and the option. For example you have to use -Aa0tf10 instead of -A a0 t f10.

Well, I know, it's very complicated. I'll put some examples in the next posts...

La Ciencia en España no necesita tijeras (Science in Spain doesn't need scissors)

This is my modest contribution to the initiative "La Ciencia en España no necesita tijeras", proposed by La aldea irreductible.

My argument will be of the kind politicians care about: ECONOMIC. Let's see, what the hell is the point of the state spending a lot of money on training PhD students if afterwards there is no serious scientific project in Spain to give these people a position? What happens if they cut the budget for new projects? That when I finish my PhD (and, like me, several thousand other PhD students) I am left with two options:

- I go abroad to continue my scientific career, in which case the state has spent a lot of money on training me only to send me away so that another country benefits from my training.

- I go on the dole. In this case I will keep being paid by the state anyway, with the difference that now I'll be doing sod all.

In any case, science in Spain doesn't need scissors, but sustained investment over a longer term than the one set by the damn four-year electoral cycles. Science is not done at the pace set by political or economic cycles, and trying to force that is strangling it.

Plots inside other plots with Gnuplot

This is another post in which I'm going to try to explain how to use the multiplot environment. In this case, I'm going to set the position of the plots "by hand" in order to get the effect of one plot inside another. This may be useful if you want to show some detail of a small part of the general plot. This is the final result:

Well, the overall idea is the same as in this other post, but in this case I won't use the automatic layout. In order to set the size and position of the plot manually, you have to use the parameters size and origin, respectively. You have to know that both are given as fractions of the size of the whole figure, and that the first number refers to the x axis. Then, for example, set size 1,1 means use the whole figure as the size of the plot. Analogously, set origin 0.5,0.5 puts the bottom left corner of the plot in the centre of the figure.

Another example: if you want to insert two plots, one above the other, you could set size 1,0.5 in order to reduce the vertical size to half the total, and then use set origin 0,0.5 before plotting the second graph.
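A minimal sketch of this two-plot case could look as follows. The file names stacked.gp and stacked.eps and the functions sin(x) and cos(x) are made up for the illustration; the script is written to a file first, and gnuplot is run only if it is found on the system.

```shell
#!/bin/bash
# Write a small gnuplot script (hypothetical file names)
cat > stacked.gp << 'TOEND'
set terminal postscript eps color enhanced
set output 'stacked.eps'
set multiplot

# Both plots use the full width and half of the height
set size 1,0.5

# First plot: lower half of the figure
set origin 0,0
plot sin(x) t ''

# Second plot: upper half of the figure
set origin 0,0.5
plot cos(x) t ''

unset multiplot
TOEND

# Run it only if gnuplot is available
if command -v gnuplot > /dev/null; then
    gnuplot stacked.gp
fi
```

Since size is not changed between the two plot commands, only origin has to be set again for the second plot.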

This is the code to generate the figure above. I have plotted also a rectangle and an arrow. You can skip that, it's not really important, but I just wanted to show some more capabilities of Gnuplot.

gnuplot << TOEND
set terminal postscript eps color enhanced
set output 'multiplot2.eps'

#set ytics 0.25
#set format y "%.2f"

set multiplot

# Bigger plot options
set yrange [-4:5]
set size 1,1
set origin 0,0
set title 'Whole plot'
set xlabel 'time/s'
set ylabel 'variable/m'

### This is to plot the square. You can skip this ###
set arrow from 1.1,-0.9 to 1.0,0.3 lw 1 back filled
set arrow from 0.9,-3 to 1.5,-3 lw 1 front nohead
set arrow from 0.9,-1 to 1.5,-1 lw 1 front nohead
set arrow from 0.9,-1 to 0.9,-3 lw 1 front nohead
set arrow from 1.5,-1 to 1.5,-3 lw 1 front nohead

# This plots the big plot
plot 'datos.dat' w l lt 1 lc 3 lw 3 t ''

# Now we set the options for the smaller plot
set size 0.6,0.4
set origin 0.2,0.5
set title 'Zoom'
set xrange [0.9:1.5]
set yrange [-3:-1]
set xlabel ""
set ylabel ""
unset arrow
set grid

# And finally let's plot the same set of data, but in the smaller plot
plot 'datos.dat' w l lt 1 lc 3 lw 3 t ''

# It's important to close the multiplot environment!!!
unset multiplot
TOEND


Array of plots in Gnuplot

In this post I will explain how to create an array of plots. This may be useful for example if you need to merge several plots in the same figure in a paper. Within this post, by figure I will mean a file which in general may contain several independent plots. The aim of this post is to explain how to get the next figure:

For this purpose, gnuplot has the multiplot environment. To use it, you just have to type set multiplot once you are already in gnuplot. From that moment on, each time you type the command plot, you will get a new plot added to the same figure. You have to take into account that every new plot is independent of the former ones. This means that you can (or you have to) set again every parameter you want to change for the next plot. If you don't, they will remain as in the previous call to the plot command. For example, if you set xlabel to "variable x", all plots will share this label for their x axis. Of course you can change this by just typing set xlabel "whatever" before the next plot command. For this reason, it's a good idea to set all the parameters common to every plot in the figure before starting the multiplot environment.

I think the next example is quite self-explanatory, so just one thing has to be remarked. In the multiplot environment you can use an automatic layout for the plots or a manual one. If you want to create a regular array of plots, it makes more sense to use the first option. In the next post I will put an example of how to use the manual approach for a more complex result.

gnuplot << TOEND
set terminal postscript eps color enhanced
set output 'multiplot.eps'

# Some common options
set xrange [-pi:pi]
set mxtics 2
set ytics 0.25
set xtics ("-{/Symbol p}" -pi, "-{/Symbol p}/2" -pi/2, "0" 0,"{/Symbol p}/2" pi/2, "{/Symbol p}" pi)
set format y "%.2f"

set grid

# This begins the multiplot environment. This example uses a 2x2 regular layout
set multiplot layout 2,2 title 'Using layout 2x2'

# Now, every time I use the plot command, I will get a new plot
# in the correct position according to the selected layout

set title 'Plot 1'
plot sin(1*x) w l lt 1 lc 1 lw 3 t ''

set title 'Plot 2'
plot sin(2*x) w l lt 1 lc 2 lw 3 t ''

set title 'Plot 3'
plot sin(3*x) w l lt 1 lc 3 lw 3 t ''

set title 'Plot 4'
plot sin(4*x) w l lt 1 lc 4 lw 3 t ''

# It is important to close the multiplot environment before leaving!!!
unset multiplot
TOEND