IMSLP talk:Scanning music scores

  • I've made a great discovery (haha) regarding why some people can make scans that are so tiny and yet high quality. The secret lies in a monochrome image compression algorithm called CCITT Group 4 (Group 3 is much inferior), which is commonly used for faxes. Not only does it have a very high compression ratio, it is also, surprisingly, lossless (wow). The catch is that it is designed for monochrome (1-bit) images only and is not suited to color images. Currently the only way I've succeeded in creating a CCITT Group 4 compressed PDF is via ImageMagick:
convert -compress Group4 input.bmp output.pdf
Of course, you can then combine the PDF files with pdftk. --Feldmahler 12:21, 29 September 2006 (EDT)
If you have a whole directory of files to convert, and you're using a Linux machine:
> find . -maxdepth 1 -name '*.png' -exec ./png2pdf.sh {} \;
> pdftk *.pdf output outfile.pdf
png2pdf.sh:
#!/bin/bash
echo "Converting $1 to pdf"
convert "$1" -compress Group4 -monochrome "$(basename "$1" .png).pdf"
  • It is also interesting to note that the average compression ratio of CCITT Group 4 is 4-5%, meaning that a 1000KB monochrome image will compress to around 40-50KB, all the while being lossless. CCITT performs best when there are many repeating pixels of the same color, hence the reason why it compresses monochrome well and not color. --Feldmahler 14:12, 7 October 2006 (EDT)
With most scanners and scanner software, once you choose TIFF as the output format you should be able to set the compression to CCITT-4; this usually gives even higher compression and is, as you point out, best with 1-bit images. For those who use or wish to use Acrobat Professional, it offers JBIG2 compression (an ITU-T/ISO standard rather than an Adobe format), which can be lossy and must be selected in the preferences; it compresses even better when compiling these CCITT-4 compressed TIFFs into your final PDF. Daphnis 10:20, 30 August 2007 (EDT)
  • The highest compression ratios nowadays are achieved with JPEG2000, a wavelet-based format supported in recent PDF versions (DjVu uses a related but different wavelet codec, IW44). However, this format is relatively hard to use, and achieving high ratios is not easy. Please share your experiences!
  • Indeed JPEG2000 is a good compression format, but it is a generic one. It is usually used in lossy mode, and it is not designed for monochrome images, so it will perform much better than CCITT Group 4 on color/grayscale images, but not as well on monochrome images. --Feldmahler 14:12, 7 October 2006 (EDT)
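
As a rough check of the 4-5% figure quoted above for CCITT Group 4, the uncompressed size of a 1-bit page follows directly from its pixel dimensions. The sketch below assumes an A4 page (8.27 x 11.69 in) scanned at 600 dpi; the function names are illustrative, and the 4.5% ratio is only the average mentioned in this thread, not a guarantee for any particular score.

```python
def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size of a scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8


def estimated_g4_size_bytes(raw_bytes, ratio=0.045):
    """Estimate the Group 4 size from the 4-5% average quoted above (an assumption)."""
    return raw_bytes * ratio


# Assumed example: A4 page, 600 dpi, 1 bit per pixel.
raw = raw_size_bytes(8.27, 11.69, 600)
print(f"raw 1-bit page: {raw / 1024:.0f} KB")
print(f"estimated Group 4 size: {estimated_g4_size_bytes(raw) / 1024:.0f} KB")
```

Dense pages (many notes, small print) will compress less well than sparse ones, so the estimate is only an order of magnitude.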


How to make PDF files

(This contains info that could be useful to some on making PDF files. Edit, add, delete, or modify as you see fit!)

How to make PDF files of scanned public domain sheet music

DETERMINE IF YOUR EDITION OF SHEET MUSIC IS PUBLIC DOMAIN OR NOT!!!

Procedure No.1: Using Freeware/Shareware programs

1) Obtain sheet music that is in the public domain, such as a volume of sheet music published before 1923 or a reprint of a public domain source. DETERMINE IF YOUR EDITION OF SHEET MUSIC IS PUBLIC DOMAIN OR NOT.

2) Using the Black and White setting, scan the sheet music into TIFF format with CCITT Group 4 (CCITT-4) compression. A program called Infothek 2000 scan (shareware from http://www.informatik.com) does this automatically if your scanning application lacks this feature.

3) Number each TIFF file sequentially with leading zeroes, i.e. the first file named 001.tif, the second named 002.tif, then 003.tif, etc. The Infothek program also does this automatically as you scan.

4) When finished scanning, place all of the TIFF files into a directory. Next, place in the same directory a freeware program called C42pdf.

5) Finally, if using Windows, open a command prompt window (Start, cmd.exe) and go to the directory with the TIFF files and the C42 program. Type the following command at the command prompt:

C42 *.tif

All of the sequentially numbered CCITT-4 TIFF files in the directory will be combined into a single PDF file (called 001.pdf) containing all of the pages in sequential order.

6) Go back to Windows, open the 001.pdf file in Adobe Acrobat Reader (available free from Adobe) and check that all of the pages are there, clear, not cut off, and in the proper order. If a page is missing, scan it and add the TIFF file to the directory with the other TIFF files, numbering it according to where it should appear in the file (for example, page 39 should be named 039.tif). If that means you must shift the number of every following TIFF file by 1 (for example, change 040.tif to 041.tif), use a batch rename program to do so, such as the freeware Rename program at http://www.1-4a.com/rename/ .

7) Once you are satisfied with your PDF file, give it a new file name and upload it to IMSLP, following the Score submission guide. This is a great technique for making PDF files: it is simple and quick, and the files are small, typically 100-120 KB per page for a 600-dpi scan.
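
The page check in step 6 can be partly automated before the PDF is even built. The following is a minimal sketch, assuming files named 001.tif, 002.tif, etc. as in step 3; the function name find_missing_pages is illustrative and not part of any of the tools mentioned above.

```python
import re


def find_missing_pages(filenames):
    """Return page numbers absent from a set of names like 001.tif, 002.tif."""
    pages = set()
    for name in filenames:
        match = re.fullmatch(r"(\d+)\.tif", name, flags=re.IGNORECASE)
        if match:
            pages.add(int(match.group(1)))
    if not pages:
        return []
    # Only gaps below the highest number found can be detected.
    return [n for n in range(1, max(pages) + 1) if n not in pages]


print(find_missing_pages(["001.tif", "002.tif", "004.tif", "006.tif"]))  # [3, 5]
```

In practice the list of names could come from os.listdir("."). A page missing at the very end of the score cannot be detected this way, so the visual check of the PDF is still needed.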

Procedure No.2: Using the Adobe Acrobat Suite

Buy the Adobe Acrobat Suite from Adobe.com or from your software dealer, and use the Adobe Acrobat Exchange program in the suite to scan public domain sheet music directly into PDF files. Use a resolution of at least 300 DPI (dots per inch); 400-600 DPI gives better results, but the file size will be larger. After making the PDF, you may use Adobe Acrobat Distiller (which comes with the Acrobat Suite) to reduce the size of the PDF file further.

Procedure No.3: Using Photoshop with the Adobe Acrobat Suite

Sometimes you may want to add text to your scanned sheet music PDF file. On the title page, for example, you might want to add your own title or other info.

1) To do so, scan the first page into Adobe Photoshop. Then, use the Erase tool to erase text or other things you do not want to see on the title page. Next, use the Text tool to write in your own text, and drag the text onto the title page.

2) After you are done with your editing, save the file in Photoshop EPS format.

3) Then, use Adobe Acrobat Distiller to change the EPS file into a pdf file.

4) Now, open that title page PDF file in Adobe Acrobat Exchange. Use the Exchange program to insert, after that first title page that you have edited, a PDF file containing the other pages of the sheet music. Save the file. You now have the complete sheet music PDF file with the edited title page.

Procedure No.4: Using the free programs ImageMagick and pdftk

Install ImageMagick and pdftk.

1) Do steps 1) and 2) of Procedure No.1. Finding the best monochrome conversion threshold for step 2) may require some care.

with leading zeroes:

2) Number each TIF file sequentially with leading zeroes if your scanning application hasn't done it, with the first one named p001.tif, the tenth named p010.tif, etc. At this stage you can crop margins, deskew pages, add text and remove pencil markings with IrfanView, GIMP, Photoshop, etc. if your scanning application lacks these features.

3) Building the score:
type magick p*.tif -density %[fx:w/8.26] yourscore.pdf

8.26 is the output page width in inches (A4 is about 8.27 in wide). Adapt it to other formats, e.g. 8.5 for US Letter portrait, or the page height for landscape.
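
The expression %[fx:w/8.26] simply divides the image width in pixels by the desired page width in inches, which gives the resolution that makes the image fill the page. A small sketch of the same arithmetic, with an illustrative function name and assumed pixel widths:

```python
def page_density(width_px, page_width_in=8.26):
    """Resolution (dpi) at which an image width_px wide spans page_width_in inches."""
    return width_px / page_width_in


# Assumed example widths: a 4956-pixel-wide scan on A4, a 5100-pixel-wide scan on US Letter.
print(round(page_density(4956)))       # 4956 / 8.26 -> 600
print(round(page_density(5100, 8.5)))  # 5100 / 8.5  -> 600
```

Because the density is computed per image, pages scanned at different resolutions still come out at the same physical width.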

or without leading zeroes:

2) Number each TIF file sequentially without leading zeroes if your scanning application hasn't done it, with the first one named p1.tif, the tenth named p10.tif etc.

3) Building the score:
Copy these lines into a file named tif2pdfbook.bat:
for /L %%a in (1, 1, %1) do (
magick p%%a.tif -bordercolor white -border 100 tempopage.tif
magick tempopage.tif -density %%[fx:w/8.26] p%%a.pdf)
rem making the command line to append all pdf's with pdftk:
set myorder=pdftk
for /l %%a in (1, 1, %1) do (
call set myorder=%%myorder%% p%%a.pdf)
set myorder=%myorder% output mybook.pdf
call echo %myorder% > makebook.bat
call makebook.bat
start " " mybook.pdf

When done, type tif2pdfbook.bat <number of pages>

The line magick p%%a.tif -bordercolor white -border 100 tempopage.tif is useful only if you have totally cropped the margins. Adjust the number 100 to the margins you want.

Procedure No.5: Using the free programs LaTeX and ImageMagick

Install a TeX distribution, ImageMagick and Ghostscript. Also install GSview if you want to view the ps and eps files.

1) Do steps 1) and 2) of Procedure No.1. Finding the best monochrome conversion threshold for step 2) may require some care.

2) Number each TIF file sequentially without leading zeroes if your scanning application hasn't done it, with the first one named p1.tif, the tenth named p10.tif, etc. At this stage you can totally crop margins, deskew pages, add text and remove pencil markings with IrfanView, GIMP, Photoshop, etc. if your scanning application lacks these features.

3) Convert the tif pages to eps files with ImageMagick: create a script named tif2eps.bat, containing:
FOR /L %%A IN (%1,1,%2) DO convert p%%A.tif -threshold 50%% p%%A.eps
Open a command-prompt window, move to the directory where the tifs are and type:
tif2eps <1st page number> <last page number>
If your scanning application provides only jpg files, you can use a similar ImageMagick conversion script, changing the threshold if necessary. Convert a few pages to tif first to find a suitable threshold. The same applies if you have imported a set of jpg's from a library.

4) Create a LaTeX file myscore.tex containing:
\documentclass[12pt]{article}
\usepackage{epsfig}
\usepackage{ifthen}
\setlength{\textwidth}{21cm}\setlength{\textheight}{29.7cm} \setlength{\hoffset}{-2cm}\setlength{\voffset}{-2cm}
\setlength{\evensidemargin}{0pt}\setlength{\oddsidemargin}{0pt}
\setlength{\topmargin}{0pt}\setlength{\parindent}{0cm} \setlength{\headheight}{0pt}\setlength{\headsep}{0pt}\setlength{\footskip}{0pt} \newcounter{mapage}\pagestyle{empty}
\begin{document}
\setcounter{mapage}{1}
\whiledo{\not\value{mapage}>\nbpage}
{\epsfig{figure=p\arabic{mapage}.eps, width=19.8cm}

\stepcounter{mapage}}
\end{document}

Replace \nbpage with the number of the last page, and change {1} in \setcounter if your first page is not p1. You can adjust the margins and the paper size if you want. You can also skip some pages if you want.

The settings are for A4 portrait format. For letter or landscape, or to change the margins, adjust \textwidth, \textheight, \voffset, \hoffset and width in the epsfig command.

5) Type latex myscore

6) If there are no errors type dvips myscore to obtain myscore.ps.

7) Type ps2pdf -sPAPERSIZE#a4 -r300 myscore.ps myscore.pdf to obtain myscore.pdf. The ps and eps files may be huge, but you can delete them once you have the pdf.

If the tifs were not saved with CCITT-4 compression, the final PDF will be compressed anyway, to about 100-120 KB per page. You can print it without using "fit to printable area".
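
Step 4 notes that pages can be skipped, but the \whiledo loop only handles one contiguous range. One possible workaround, sketched here in Python, is to generate explicit \epsfig lines and paste them between \begin{document} and \end{document} in place of the loop. The function name and the skip parameter are illustrative and not part of the original procedure.

```python
def epsfig_lines(first, last, skip=(), width_cm=19.8):
    """Build one \\epsfig line per page, leaving out the pages listed in skip."""
    lines = []
    for page in range(first, last + 1):
        if page in skip:
            continue
        lines.append(f"\\epsfig{{figure=p{page}.eps, width={width_cm}cm}}")
    # A blank line between figures ends the paragraph, as in the loop of step 4.
    return "\n\n".join(lines)


# Assumed example: pages 9 to 13, leaving out pages 10 and 11.
print(epsfig_lines(9, 13, skip={10, 11}))
```

The width of 19.8cm is the value used in step 4 for A4 portrait; change it together with \textwidth for other formats.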

Procedure No.6: Typeset your own

You can use a proprietary or free music notation program (scorewriter), such as MusiXTeX or LilyPond, to typeset public domain music. If the program outputs PostScript (.ps), also use the free Ghostscript (version 6.0 or above) to convert it into a PDF file. Also read IMSLP:Typesetting Guidelines and IMSLP:Typeset Music formats, and see this page from the Werner Icking Music Archive for information about MusiXTeX and its related programs if you choose that one.

How to correct a bad image name sequence

  • Suppose you have images p1.tif, p2.tif ... p40.tif but you forgot to scan 2 pages at positions p20-p21.
  • Or suppose the same, but p10.tif and p11.tif are copyrighted introduction pages and you want to skip them.

... then you need to do a selective page renumbering. Create a file SHIFTPAGE.BAT containing:

if %3 GTR 0 GOTO positif
for /L %%a in (%1, 1, %2) do (
set /a iplus="%%a+%3"
call rename p%%a.tif p%%iplus%%.tif)
goto end
:positif
for /L %%a in (%2, -1, %1) do (
set /a iplus="%%a+%3"
call rename p%%a.tif p%%iplus%%.tif)
:end

Open a command prompt window. In the first case type shiftpage 20 40 2 and the names p20.tif and p21.tif will be available for the 2 missing scans. In the second case delete p10.tif and p11.tif and type shiftpage 12 40 -2. This will give you the right set of pages with the right names.
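
The batch file processes the pages from last to first when shifting upward and from first to last when shifting downward, so that no rename ever lands on a file that has not been moved yet. The same ordering rule can be sketched in Python; shift_plan and its parameters are illustrative names, and the function only builds the list of renames without touching any files.

```python
def shift_plan(first, last, offset, prefix="p", ext=".tif"):
    """List (old, new) renames in an order that never overwrites a pending file."""
    if offset > 0:
        pages = range(last, first - 1, -1)  # shifting up: start from the last page
    else:
        pages = range(first, last + 1)      # shifting down: start from the first page
    return [(f"{prefix}{n}{ext}", f"{prefix}{n + offset}{ext}") for n in pages]


# Assumed example: make room for 2 pages in front of p20.tif (pages 20 to 22 only).
for old, new in shift_plan(20, 22, 2):
    print(old, "->", new)
```

To apply the plan, each pair could be passed to os.rename(old, new) in the order given.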

The conflict of leading-zeroes

Some applications like leading zeroes, i.e. image names such as p001.tif, p002.tif ... p010.tif ... Other applications prefer p1.tif, p2.tif ... p10.tif ... Page numbers from 100 upward are the same in both schemes.

You can convert a set of images from one naming scheme to the other with these command-prompt batch files:

To add leading zeroes, create ADDLEADZER.BAT containing:

FOR /L %%A IN (1,1,%2) DO call :routine %1 %%A
goto :eof
:routine
set/a num="%2+1000"
set num=%num:~-3%
copy %1%2.tif q%num%.tif

and type addleadzer <old name prefix> <last image number>, e.g. addleadzer p 40. The new images will be q001.tif, q002.tif ...

To delete leading zeroes, create DELLEADZER.BAT containing:

FOR /L %%A IN (1,1,%2) DO call :routine %1 %%A
goto :eof
:routine
set/a num="%2+1000"
set num=%num:~-3%
copy %1%num%.tif r%2.tif

and type delleadzer <old name prefix> <last image number>. The new images will be r1.tif, r2.tif ...
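
For those not on Windows, the two conversions can be sketched in Python. This is only an illustration of the naming rule, assuming names that end in a page number followed by .tif; unlike the batch files above, it keeps the original prefix instead of switching to q or r, and the function names are hypothetical.

```python
import re


def add_leading_zeroes(name, width=3):
    """p7.tif -> p007.tif: pad the page number to a fixed width."""
    return re.sub(r"(\d+)(?=\.tif$)", lambda m: m.group(1).zfill(width), name)


def strip_leading_zeroes(name):
    """p007.tif -> p7.tif: drop the padding from the page number."""
    return re.sub(r"(\d+)(?=\.tif$)", lambda m: str(int(m.group(1))), name)


print(add_leading_zeroes("p7.tif"))      # p007.tif
print(strip_leading_zeroes("p010.tif"))  # p10.tif
```

As with the batch files, it is safer to copy to the new names than to rename in place until the result has been checked.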