
Makefile split

Makefiles are great for parallelization: just call make with -j N and voilà, it'll run up to N jobs at once, executing whatever it can in parallel. But be careful: a missing dependency may never break a serial build, simply because of the time that passes between one task and the next (each target is finished before the next one starts). In parallel that gap can be zero (if you allow enough jobs), so be extra careful to declare every dependency explicitly.
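To see why explicit dependencies matter, here is a minimal sketch. The file names data.txt and report.txt are made up for illustration. report.txt is computed from data.txt, and the `report.txt: data.txt` prerequisite is what keeps `make -j` correct. If you drop that line, a serial `make` still happens to work, because `all` lists data.txt first. `make -j 2`, however, is free to start both recipes at once, and wc may then find no data.txt. The script writes the Makefile with printf so the required TABs survive copy and paste.

```shell
#!/usr/bin/env bash
# Minimal sketch: report.txt is built from data.txt (illustrative names).
# The "report.txt: data.txt" prerequisite is what makes `make -j` safe.
set -euo pipefail

run_demo() {
  local dir
  dir=$(mktemp -d)
  # printf turns each \t into the TAB that make requires before a recipe line
  printf 'all: data.txt report.txt\n\n' > "$dir/Makefile"
  printf 'data.txt:\n\tsleep 1; echo hello > $@\n\n' >> "$dir/Makefile"
  printf 'report.txt: data.txt\n\twc -l < data.txt > $@\n' >> "$dir/Makefile"
  make -j 2 -C "$dir" > /dev/null
  # strip the padding that some wc implementations add
  tr -d ' ' < "$dir/report.txt"
  rm -rf "$dir"
}

run_demo   # prints 1
```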

Splitting big files

When a file can be split and analysed in parallel (e.g. by a syntax checker), I bet you'd build your program using threads and check one chunk on each thread, right? But what if you don't have (or don't want) access to the syntax checker's internals? Split the file and use make!

Splitting is easy, but when you want to use the same Makefile for any number of chunks, it's hard to automate in Make, because Make has no arithmetic: “1 + 1” is not 2 in Make. Instead, you can do this:

$ make -j 10 -f split.make STEM=foo EXT=bar FILECOUNT=10 SPLIT=my_split_program CMD=my_syntax_check

and the Makefile like this:

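# Creates the string "1 2 3 ... N" ($(shell) turns seq's newlines into spaces;
# seq is not POSIX but ships with GNU and BSD systems, and unlike for ((...))
# it does not need bash, which matters because make runs /bin/sh)
ELEMENTS    = $(shell seq 1 $(FILECOUNT))
# Creates the string "STEM_1 STEM_2 ... STEM_N"
STEMSPLIT   = $(foreach suffix,$(ELEMENTS),$(STEM)_$(suffix))

.PHONY : all
all : $(STEM).$(EXT).out

# Concatenate all chunk outputs, in order, into the overall output file
$(STEM).$(EXT).out : $(STEMSPLIT:=.$(EXT).out)
	cat $^ > $@

# Run your program on one chunk and save its output
%.$(EXT).out : %.$(EXT)
	$(CMD) $< > $@

# Split the file into chunks. The stamp file makes the splitter run exactly once:
# with all chunks as targets of one ordinary rule, make treats them as independent
# targets, so under -j it could start the splitter once per chunk, concurrently.
$(STEM).split.stamp : $(STEM).$(EXT)
	$(SPLIT) $(FILECOUNT) $<
	touch $@

# The empty recipe (;) says the chunks appear as a side effect of the stamp rule
$(STEMSPLIT:=.$(EXT)) : $(STEM).split.stamp ;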
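A quick way to check what the ELEMENTS/STEMSPLIT trick expands to, without running any splitter or checker, is to feed a tiny Makefile to GNU make on standard input (`-f -`). This is a sketch that assumes GNU make and `seq`. expand_chunks is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the chunk file names the Makefile would expect.
# usage: expand_chunks STEM EXT FILECOUNT
set -euo pipefail

expand_chunks() {
  # The single quotes keep $(...) literal for make; printf expands \n and \t
  printf 'ELEMENTS = $(shell seq 1 $(FILECOUNT))\nSTEMSPLIT = $(foreach n,$(ELEMENTS),$(STEM)_$(n))\nshow:\n\t@echo $(STEMSPLIT:=.$(EXT))\n' |
    make -s -f - STEM="$1" EXT="$2" FILECOUNT="$3" show
}

expand_chunks foo bar 3   # prints: foo_1.bar foo_2.bar foo_3.bar
```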

Brief explanation

Make works backwards from the goal. To get the whole output ($(STEM).$(EXT).out) it needs the individual output files (%.$(EXT).out) generated by your program. Your program needs the chunks (%.$(EXT)) created by your splitter (use Unix split if you can) from the original file ($(STEM).$(EXT)). Since the per-chunk outputs don't depend on each other, make -j can build them all in parallel.
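Your splitter only has to honour the interface the Makefile assumes. It is called as `$(SPLIT) $(FILECOUNT) $<`, and it must leave STEM_1.EXT … STEM_N.EXT next to the input. Here is a minimal sketch of such a wrapper around Unix split. It assumes GNU coreutils split, whose `-n l/N` produces exactly N pieces without cutting lines. It also assumes N ≤ 676, so that split's default two-letter suffixes suffice. split_chunks is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical SPLIT program for split.make: split_chunks N FILE
# Turns FILE=foo.bar into foo_1.bar ... foo_N.bar, the names the Makefile expects.
# Assumes GNU coreutils split (-n l/N) and N <= 676 (two-letter suffixes).
set -euo pipefail

split_chunks() {
  local n=$1 file=$2
  local stem=${file%.*} ext=${file##*.}
  local tmp part i=1
  tmp=$(mktemp -d)
  # l/N: exactly N pieces, never cutting a line in half (some may be empty)
  split -n "l/$n" "$file" "$tmp/chunk."
  # split names the pieces chunk.aa, chunk.ab, ... so glob order is chunk order
  for part in "$tmp"/chunk.*; do
    mv "$part" "${stem}_${i}.${ext}"
    i=$((i + 1))
  done
  rmdir "$tmp"
}

# Run only when called as a program with exactly two arguments
if [ "$#" -eq 2 ]; then
  split_chunks "$1" "$2"
fi
```

Saved as, say, my_split_program and made executable, it can be passed to make as SPLIT=./my_split_program.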

Tips

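  • To make it even faster, have your splitter compute only the byte offsets A and B that delimit each chunk, instead of writing the chunks to disk; then, for each chunk, read the original file by fseek'ing to A and stopping at B, piping the output straight into your program.
  • If your program doesn't accept input from STDIN, use FIFOs (named pipes, created with mkfifo) and give it the FIFO's path as its input file.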
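Both tips can be combined in a sketch like the one below. It swaps the C-style fseek for `tail -c +N`, which starts at byte N counting from 1, piped into `head -c`, which stops after a given byte count. The bytes go through a FIFO whose path is handed to the command, for programs that insist on a file name.

Several details are assumptions of this sketch:
  • run_range is a hypothetical helper.
  • A and B are byte offsets, and B is exclusive.
  • `head -c` is supported by GNU and BSD head, but POSIX does not require it.

Plain byte offsets can cut a line in half, so your splitter should pick A and B on line (or record) boundaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run CMD on bytes [A, B) of FILE, passing CMD a FIFO path
# instead of writing a chunk file. usage: run_range FILE A B CMD [ARGS...]
# Assumes tail -c +N and head -c (GNU/BSD); no pipefail, so a SIGPIPE in tail
# after head has its bytes does not count as a failure.
set -eu

run_range() {
  local file=$1 start=$2 end=$3
  shift 3
  local dir
  dir=$(mktemp -d)
  mkfifo "$dir/chunk"
  # Writer: skip the first A bytes, then stop after B - A bytes.
  # Opening the FIFO for writing blocks until CMD opens it for reading.
  tail -c +"$((start + 1))" "$file" | head -c "$((end - start))" > "$dir/chunk" &
  # Reader: the command gets the FIFO path as its last argument
  "$@" "$dir/chunk"
  wait
  rm -rf "$dir"
}
```

For example, `run_range foo.bar 0 4096 my_syntax_check` would check only the first 4096 bytes of foo.bar.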

PS: Remember, when copying the Makefile from this page, replace the leading spaces of each recipe line (the indented command lines) with a TAB, or the Makefile won't work properly.



 
hacking/makefile_split.txt · Last modified: 05 09 2007 19:15 (external edit)
 