Thursday, July 24, 2014

LaTeX 'pdfpagelabels' turned off when using hyperref

I received the following warning when compiling a document that uses ACM's sig-alternate.cls class file with pdfLaTeX.

Warning:
Package hyperref Warning: Option `pdfpagelabels' is turned off
(hyperref) because \thepage is undefined.

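This seems to be caused by an update of the hyperref package. The workaround is to switch the option off explicitly.
Instead of relying only on:
\usepackage{hyperref}


Place this line before the point where hyperref is loaded (it may also go before \documentclass):
\PassOptionsToPackage{pdfpagelabels=false}{hyperref}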

References:
1. http://www.latex-community.org/forum/viewtopic.php?f=5&t=162

Wednesday, July 16, 2014

Gnuplot pdf terminal dashed lines

To use the PDF terminal in gnuplot, check whether the output of the command "print GPVAL_TERMINALS" lists "pdfcairo". If it does, you can send the output to a PDF file with the commands

set terminal pdf
set output 'out.pdf'

To enable dashed lines with the PDF terminal in gnuplot, as pointed out in reference [1], set the terminal with the commands below

set terminal pdf monochrome dashed
set output 'out.pdf'

References:
1. http://theletterpsi.blogspot.com/2010/11/setting-dashed-line-style-on-pdf.html
2. http://stackoverflow.com/questions/14004797/gnuplot-pdf-output

Wednesday, July 9, 2014

English Word Frequency Lists

Many people need a reasonably sized English word frequency list at some point. Here is a good, free set of word frequency lists based on the British National Corpus (BNC). This post is just a pointer to the real resource (Reference 1), but I have copied some text from the reference that describes the file structure.

-----------------------------------------------
These are all available in 6 forms:
  • sorted alphabetically ("al") or by frequency (highest frequency first) ("num");
  • the complete lists, or a smaller file containing only those items occurring over five times (suffix "o5");
  • all lists are available compressed using gzip (".gz"). The o5 lists are also available uncompressed (no suffix).
The frequencies are for <CLAWS-word, POS> pairs.
For a list and brief descriptions of CLAWS POS-tags, see the link on the page in Reference 1.

The format is: four fields, separated by spaces.
 1: frequency
 2: word
 3: pos
 4: number of files the word occurs in
For non-orthographic words, spaces are replaced by underscore, giving eg "in_spite_of".
Lists are provided for the complete BNC (all), and for three subsets, as below:
 cg       'context-governed' spoken material
          (eg meetings, lectures etc)    6.2M tokens,  79,906 types
 demog    'demographic' spoken material
          (eg conversation)              4.2M tokens,  54,652 types
 written                                89.7M tokens, 921,074 types
 all                                   100.1M tokens, 939,028 types
File sizes in MB ("al" and "num" variants all the same size) are:
         all uncompressed   .gz    o5     o5.gz
-------------------------------------------------------------
all      18.1               4.8    4.0    1.32
cg       1.4                0.39   0.43   0.15
demog    0.9                0.26   0.25   0.09
written  17.8               4.7    3.9    1.30
-------------------------------------------------------------
The following files are linked from the page in Reference 1:
 all:      all.al.gz, all.al.o5, all.al.o5.gz, all.num.gz, all.num.o5, all.num.o5.gz
 written:  written.al.gz, written.al.o5, written.al.o5.gz, written.num.gz, written.num.o5, written.num.o5.gz
 cg:       cg.al.gz, cg.al.o5, cg.al.o5.gz, cg.num.gz, cg.num.o5, cg.num.o5.gz
 demog:    demog.al.gz, demog.al.o5, demog.al.o5.gz, demog.num.gz, demog.num.o5, demog.num.o5.gz
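
The four-field format described above is simple to read programmatically. Below is a minimal Python sketch, not part of the original resource, that parses such lines into records. The names Entry, parse_lines and read_list are hypothetical, the latin-1 encoding is an assumption about the files, and the sample lines are invented for illustration rather than taken from the BNC lists.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """One line of a frequency list (hypothetical record type)."""
    frequency: int  # field 1: frequency
    word: str       # field 2: word, with "_" standing in for spaces
    pos: str        # field 3: CLAWS POS tag
    n_files: int    # field 4: number of files the word occurs in


def parse_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield an Entry for every line that has four space-separated fields."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        try:
            frequency, n_files = int(parts[0]), int(parts[3])
        except ValueError:
            continue  # skip lines whose counts are not integers
        yield Entry(frequency, parts[1], parts[2], n_files)


def read_list(path: str, encoding: str = "latin-1") -> Iterator[Entry]:
    """Read a list file; ".gz" variants are decompressed on the fly.

    The default encoding is an assumption, not documented in the source.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding=encoding) as handle:
        yield from parse_lines(handle)


# Invented sample lines in the documented format (not real BNC counts).
sample = ["120 in_spite_of prp 45", "7 example nn1 3", "malformed line"]
for entry in parse_lines(sample):
    # Restore the spaces that were replaced by underscores.
    print(entry.frequency, entry.word.replace("_", " "), entry.pos, entry.n_files)
```

With one of the downloaded files in the working directory, something like list(read_list("all.num.o5")) should give the entries in the order they appear in the file, which for the "num" variants is highest frequency first.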

References:
1. http://www.kilgarriff.co.uk/bnc-readme.html