Archive for the 'Tools' Category

Introducing Statim

01/24/2011

I just pushed Statim, a very simple static site generator that I’ve worked on for the past couple of days. It is written in shell to limit run-time dependencies so that it can be used on pretty much any Unix server. I more or less wanted a simple site generator that I could trigger from a git hook to build a website where I was less concerned about looks and just wanted a quick way to push up information.

Currently running statim does the following:

  1. Copies all files from the content directory to the destination directory of your choice
  2. Runs any files ending in “.md” or “.markdown” through the “markdown” command and relocates the resulting html
  3. For each subdirectory it creates links based on file names and camel-casing conventions (like a wiki)
There’s nothing super complicated or revolutionary here; it’s just a simple tool for me because none of the others did what I wanted.
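Roughly, the first two steps above could be sketched in shell like this. This is a hypothetical illustration, not Statim’s actual source; the `content`/`public` directory names and the sample file are made up, and the wiki-style link generation is omitted:

```shell
# Made-up sample content directory, for illustration only
mkdir -p content
printf 'hello from statim\n' > content/hello.txt

src=content
dst=public

# Step 1: copy everything from the content directory to the destination
mkdir -p "$dst"
cp -R "$src"/. "$dst"/

# Step 2: run any .md/.markdown files through the `markdown` command
# (guarded, since `markdown` may not be installed everywhere)
if command -v markdown >/dev/null 2>&1; then
  find "$dst" -type f \( -name '*.md' -o -name '*.markdown' \) |
  while read -r f; do
    markdown "$f" > "${f%.*}.html" && rm "$f"
  done
fi
```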

Emacs File Loading

12/09/2010

I have grown a fairly large emacs configuration over the years, and I am constantly tweaking it. I test out new packages and remove old ones that I haven’t been using. My original method for adding a new package to my emacs configuration was just to follow the install instructions verbatim from the README, but as I’ve been learning more about emacs and programming emacs lisp, I decided it was time to look into the different methods of including elisp code that were available.

I noticed that some packages use load, some use require, and others use autoload and wondered what the differences were and which ones I should be using…

load & load-file

load and load-file are the most basic methods of including an elisp file in your configuration. Each one simply reads the file and executes the code. This means that if the load expression is executed more than once, the file contents will be executed more than once. There are times where this is useful, but I think it’s safe to say that this is not something I desire for the normal case.

The difference between load and load-file comes from the way the file is located on the system. load will search through each directory in the list load-path and if it finds the file there it will load it, while load-file just looks directly at the path that you have specified.
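For example, with a made-up file name my-stuff.el:

```elisp
;; `load' searches every directory on `load-path' for my-stuff.el
(add-to-list 'load-path "~/.emacs.d/vendor")
(load "my-stuff")

;; `load-file' takes an explicit path and looks nowhere else
(load-file "~/.emacs.d/vendor/my-stuff.el")
```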

require

require is a command that solves the problem of loading the same file more than once mentioned in the previous section. With require, emacs will search for the appropriate file only if it hasn’t yet been loaded and then execute it. To use this the file must use the provide function to define the feature that it includes. After this function has been executed (i.e. when the file has been loaded and the feature has been included), if require is called again emacs will not try to re-execute the file.

By default require expects that the feature will be provided by a file of the same name located within the load path. This can be overridden using an optional second parameter to require but I can’t really think of a reason I would want to do that very often.
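Sticking with the made-up my-stuff.el example, the file declares the feature and the configuration requires it:

```elisp
;; At the bottom of my-stuff.el:
(provide 'my-stuff)

;; In the emacs configuration -- finds my-stuff.el on `load-path',
;; but only executes it the first time:
(require 'my-stuff)
```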

autoload

autoload works a little differently from the other two functions, as it doesn’t load any file when it is called. Instead, its purpose is to delay the execution of the referenced file until the code within it is actually needed. You give autoload the name of a function and a file that contains that function and, when the function is called — if the file hasn’t yet been loaded — autoload will read and execute the lisp code from within the file. This is nice because it can greatly decrease your startup time by not loading packages that you are not planning to use.

On the other side of the issue, each autoload call only binds one function to a file. This means if you have 5 functions contained in a file that you want to trigger that file’s autoloading, you must call autoload 5 separate times in your configuration.
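So, for a hypothetical my-stuff.el with two entry points (the function names here are made up), that looks like:

```elisp
;; One declaration per function that should trigger loading my-stuff.el;
;; the final t marks the function as callable interactively
(autoload 'my-stuff-insert "my-stuff" "Insert some stuff." t)
(autoload 'my-stuff-clear "my-stuff" "Clear the stuff." t)
```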

eval-after-load

Another potential problem with autoload happens when you have some additional code you want to execute after the file has loaded (e.g. you want to change some of the keybindings defined within the file). If you were to set them at startup time, when one of the autoload functions is called the file will be read/evaluated and your bindings will be overridden back to the defaults.

Luckily, emacs has already thought of this problem and provides us with eval-after-load. This function allows us to define some additional code to evaluate after some specific code is loaded. In our previous scenario, if we define our keybindings using eval-after-load to execute the definitions after the autoload, our code will be executed second and override the file’s keybindings with the settings we want.
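Continuing the made-up example (my-stuff-mode-map and my-stuff-insert are hypothetical names), a keybinding set only after my-stuff.el has actually been loaded:

```elisp
;; The quoted form runs once my-stuff.el finishes loading,
;; whether that happens now or later via an autoload
(eval-after-load "my-stuff"
  '(define-key my-stuff-mode-map (kbd "C-c s") 'my-stuff-insert))
```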

Applying this knowledge

After clearing up my understanding I have decided on a few things:

  1. I want to remove pretty much all load and load-file calls from my configuration.
  2. I want to switch all of my modules that I can to use require.
  3. I want to use autoload only on rare occasions, when loading the file takes a lot of time/resources.

My first instinct was to move as much as possible to autoload but I realized that I don’t actually start emacs very frequently. I usually just leave it open and close and open buffers as necessary. For this reason I think I would find the sudden pause when loading a module that I am referencing for the first time more annoying than an increased startup time (YMMV).

Now that I’ve got a plan all that’s left is to actually do it.

Scripting with ruby

11/13/2010

Earlier I wrote about using Unix command line tools to manage text when the job at hand calls for a quick fix instead of a program that you plan to keep around. When the script that I’m writing is a longer one I will often reach for ruby, but ruby can also be quite useful for quick scripts. Specifically, the ruby executable provides several command line flags that are helpful when writing these quick scripts.

-e

-e is the first flag we’ll need for using ruby as our command line swiss army knife. If you call ruby with -e it will evaluate the string following it with the ruby interpreter. Example:

ruby -e 'puts "Hello, world!"'

Got it? Good, now let’s move on to more interesting options.

-n, -p, and -i

The -n flag causes ruby to loop over each line of the input. For example, if you want to uppercase all of the lines in a file (writing to stdout) you could do the following:

ruby -n -e 'puts $_.upcase' < original-file.txt > upcased-file.txt

Printing out something is so common that ruby provides another flag that will print out the value of $_ after each step. With -p the following example becomes:

ruby -p -e '$_.upcase!' < original-file.txt > upcased-file.txt

Notice that we are now using the destructive version of upcase (namely upcase!) so that the value of $_ is redefined before it is printed out. It turns out that taking a file, performing some operation on each line, printing the changed line, and then putting it in a new file is so common that ruby gives us yet another flag to help with this occasion. We can shorten our simple example even further with -i:

ruby -p -i -e '$_.upcase!' file.txt

The -i flag tells ruby to operate on the passed file in-place. This means rather than redirect the file into ruby and the output out of ruby, it will open the file itself and overwrite it with the modified lines. Obviously this isn’t quite the same result as the earlier examples in that the original file is no longer maintained. If you don’t want to lose the original (or you aren’t confident that your script is going to work as expected) you can pass -i a backup extension to make a copy of the original file.

You’ll notice this is similar to the -i flag of sed. I find myself using ruby with -i now whenever I might reach for sed because the -i flag of sed seems to work differently in Linux than in the BSD tools. With ruby I don’t have to worry about the cross-platform stuff as much.

Other Resources

Dave Thomas (of the Pragmatic Programmers) put together a list of handy one liners for ruby. This is old but still quite useful:

http://www.fepus.net/ruby1line.txt

And, as always, the man page has a lot of great information.

Learning Unix

11/01/2010

Programmers work with text. Lots of text. So much text that we need specialized tools to help us manage and navigate this text. There are a handful of relatively simple unix commands that when strung together can greatly increase your efficiency dealing with these massive amounts of text. Let’s take a look at a few of them, shall we?

Note: I’ll be looking at the POSIX versions of these tools. If you’re running Linux it is possible that some of these commands might vary slightly. See the manpages for a more definitive reference.

wc

wc is a tool to count characters, lines, words or bytes. Most commonly I use this tool to count the lines with the -l flag. For example to count the number of lines in a text file:

wc -l some_file.txt

Or to count the number of files in a directory:

ls | wc -l

sort

sort can be used to sort and merge lines of a text file. In the simplest case it can be used to sort a list of items alphabetically, but it can also sort by columns in a text file. For example to sort the files in a directory by size from largest to smallest:

ls -al | sort -k 5 -nr

The -k 5 specifies that it sorts by the 5th column when the line is split by whitespace characters. The -nr tells it to make a numeric sort in reverse order.

uniq

uniq is used to find and filter duplicate lines in a text file. It can also be used to count lines with the -c flag. Mixing this with the above sort command gives us an easy way to detect which line in a file appears most frequently:

sort my-file.txt | uniq -c | sort -nr

Note that we are sorting the file first so that uniq can count the total number of times each line appears, not just the number of times that it appears in a row. Alternatively, with wc we can easily count the number of unique lines in a file:

sort my-file.txt | uniq | wc -l

cut

We can use cut to pull only the specific parts of files that we care about for processing. The general model is you specify a delimiter and cut will split each line at the delimiter and then you can pick which fields you want from there. For example to pull out just the first and fourth columns from a csv you could use:

cut -d',' -f1,4

We can also use cut to select a series of characters from a line by position with the -c flag. To take the first 2 characters from every line and count the number of times they appear:

cut -c1-2 | sort | uniq -c

The point

The point of all this isn’t that any one of these commands is super useful by itself, but by knowing them you can often throw together a script very quickly to extract data from some file. If I find myself needing a larger script that I will want to maintain, I will almost always reach for Ruby or a similar programming language, but being able to write these quick scripts without thinking about it too much can save a lot of time.
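For instance, stringing a few of the commands above together to find the most common value in a column of a CSV (the file and its contents are made up for illustration):

```shell
# Made-up sample data: name,department
printf 'alice,eng\nbob,ops\ncarol,eng\n' > people.csv

# Most common department: cut out the 2nd field, count duplicates,
# sort by count descending, and keep the top line
cut -d',' -f2 people.csv | sort | uniq -c | sort -nr | head -n 1
```

This prints the winning department (`eng`) along with its count.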
