Title: | LyX to MSWord etc |
---|---|
Description: | Tools for smooth LyX export via pandoc, currently to MSWord but potentially to other formats |
Authors: | Mark Bravington <[email protected]> |
Maintainer: | Mark Bravington <[email protected]> |
License: | GPL-2 |
Version: | 1.0.155 |
Built: | 2024-12-12 09:22:53 UTC |
Source: | https://github.com/markbravington/lyxport |
The lyxport R package is for exporting LyX documents to MSWord (which I sometimes have to do, under duress) and perhaps other formats. Unlike LyX's built-in “MS Word Open Office XML” export, lyxport does proper cross-referencing including tables, figures (correctly sized), lists, equations, appendices, and bibliography. Most of the heavy lifting is still done by Pandoc, as in LyX's built-in export option; but Pandoc (wonderful though it is!) doesn't get everything right even with well-known filters (as you have probably discovered yourself by now, else you mightn't be reading this). So the package contains a lot of behind-the-scenes fiddly code in order to save you lots of manual post-tinkering.
Once you've installed the package, run lyxprefhack to set things up for direct use from LyX. Then you should see an "MSWord (lyxport)" option in "File->Export", and a "lyxport" item at the end of "Help->Specific manuals", where the full package documentation lives (you may wish to consider at least opening it...). To just see the features, do "File->Open example" and filter for "lyx", to open "lyxport-demo.lyx". You can try exporting with LyX's built-in MSWord option, and with lyxport. I haven't tested every LyX feature; it's mostly just stuff I need. More things might get added.
The package currently has one main user-visible function, lyxprefhack, which you need to call (once) for setup. The conversion work is done by lyxzip2word, but you don't normally need to call it directly; lyxzip2word is normally called by LyX on your behalf, when you export to "MSWord (lyxport)" or perhaps some other format. There are also some helper utilities:
requote_lyx (qv) to help sort out quotation marks, e.g. in case your document includes imports from other formats such as plain-text.
tidy_initials (qv) and tex2utf8 to tweak bibliography files to be MSWord-ready. These are called automatically by lyxzip2word, unless you tell it not to, but you might also find them useful in their own right.
lyxprefhack makes a number of assumptions about your LyX config files (for good reasons). It works for me, but that could be dumb luck; if it goes wrong for you, be aware that it makes backups of your "preferences" and "ui/stdmenus.inc" files, so you can manually restore them. If you can't see "lyxport" as a help option, try typing "help-open lyxport-docu" in the minibuffer. And you can also see a PDF version of that documentation in R, via RShowDoc("lyxport-docu",package="lyxport").
This function should be called just after you install the lyxport package, and you probably won't need to call it again. It modifies your LyX "preferences" file to add better MSWord export, with shortcut "W" in File->Export; see Details. It also creates a LyX help file and an example that you can access straight from LyX, so you may never need to use this package again from R, since almost everything you need is accessible from LyX itself. (One exception is if you want to use requote_lyx (qv) to sort out quotation-mark problems. Many people will never need it.)
lyxprefhack( userdir)
userdir |
Where your config files live; see section 2 of LyX's "Customization" manual. R will prompt you for it if you don't set the parameter, so you can copy it from "Help->About" in LyX; single backslashes are OK. |
To enable automatic export from LyX, you need to add two Preference settings inside LyX, either manually or by editing the "preferences" file in your LyX Userdir. The function lyxprefhack will attempt to do the latter, at your own risk.
The two settings can instead be set manually inside LyX, from the "Tools->Preferences->File Handling" menu. First, define a new File Format, which should be a copy of LyX's built-in "MS Word Open Office XML" but with a different name. The only fields you absolutely need are:
Format name: MSWord (lyxport)
Tick the boxes for "Document Format", "Show in Export menu", and "Vector graphics format"
Short name: wordx
Extensions: docx
Shortcut: you don't need this, but I use "W" so I can export via "Alt-F E W"
Second, you need to define a Converter, as follows:
From: Lyx Archive (zip) - not from straight LyX!
To: MSWord (lyxport) - ie the name of the new Format
Converter: Rscript --no-save --no-restore --verbose -e lyxport::lyxzip2word(FROM_LYX=TRUE) $$i $$r $$p 1 > docxconv.log 2>&1
When the converter runs, it will write a logfile into LyX's temporary folder, which unfortunately is a bit hard to find if things go wrong (if they don't, you don't need to find it). Weirdly, if you try to put the logfile into the "main" folder (ie where source and export live), by using "$$r/<something>" after the redirect, then LyX says it can't execute the command...
Overwrites the "preferences" file (unless there's no change required), after backing up the old one to "old_preferences<N>" (guaranteed not to overwrite any existing backup). Adds a "lyxport" option to "Help->Specific manuals", in the file "ui/stdmenus.inc", again making a backup of the latter if there's any change. Copies one file from the R installation to LyX's "<userdir>/doc" ("lyxport-docu.lyx") and one set of files to LyX's "<userdir>/examples/lyxport" ("lyxport-demo.lyx" and associated files).
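The "guaranteed not to overwrite" backup naming can be pictured with the minimal sketch below. It is not the package's actual code: the helper name next_backup_path is hypothetical, and counting N upwards from 1 is an assumption about the numbering scheme.

```r
# Hypothetical helper, not part of lyxport: find the first "old_preferences<N>"
# that does not exist yet (assumes N counts upwards from 1).
next_backup_path <- function(dir, stem = "old_preferences") {
  n <- 1
  while (file.exists(file.path(dir, paste0(stem, n)))) {
    n <- n + 1
  }
  file.path(dir, paste0(stem, n))
}

# Demonstration in a throwaway folder, standing in for the LyX userdir
demo_dir <- tempfile("lyx_userdir_")
dir.create(demo_dir)
writeLines("dummy settings", file.path(demo_dir, "preferences"))

backup <- next_backup_path(demo_dir)
file.copy(file.path(demo_dir, "preferences"), backup)  # creates old_preferences1
second <- next_backup_path(demo_dir)                    # next free name: old_preferences2
```

Because the next free name is looked up before each copy, an existing backup is never replaced, which is the property you want when restoring by hand.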
## Not run:
lyxprefhack()
## End(Not run)
lyxzip2word starts from a "LyX archive" (zip export from LyX) and converts to MS Word (or potentially other formats) using various tools, mostly Pandoc. See RShowDoc("lyxport-docu",package="lyxport") for more information about normal use from LyX, and any requirements of your LyX source file (eg specifying the bibliography format).
You don't normally need to know anything about this function, since it is called automatically from LyX. However, I should document it for maintenance-type reasons. Also, if you want to experiment with exporting to other formats, you might want to use it direct from R, setting the outext and panoutopts arguments.
lyxzip2word(
  zipfile,
  outext= 'docx',
  panoutopts= outext,
  origdir= dirname( zipfile),
  tempdir= base::tempdir(),
  copy= FALSE,
  FROM_LYX= !interactive(),
  refdir= NULL,
  lyxdir= NULL,
  lyx_userdir= NULL,
  natnum_pandoc= FALSE, # devil or deep-blue-sea?
  crossref_pandoc= TRUE,
  verbose= FALSE,
  dbglyx= ''
)
zipfile |
Name of input file, normally a LyX-zip archive with a path. Extension is optional, but ".zip" is assumed. If (for experimentation only) the extension is ".lyx", then all the other necessary files had better be in |
outext |
File extension of output. |
panoutopts |
For pandoc's writer, to tell it what kind of output to produce, ie pandoc's "-t" argument. Normally the default of |
origdir |
where |
tempdir |
where to unpack the zipfile and create temporary files etc. |
copy |
whether to copy |
FROM_LYX |
set TRUE iff called from inside LyX by a Converter, in which case |
refdir |
Top of the folder-tree for bibliography-finding. It should have a folder |
lyxdir |
Where the LyX executable lives. However, you should probably make sure LyX is on the search path anyway, otherwise things may not work; if it is, then you can leave this NULL. |
lyx_userdir |
what you'd pass in the "-u <userdir>" option when starting LyX. However, as of Lyx 2.4.2.1, there's a bug which stops that working (in the particular context of |
natnum_pandoc |
whether to turn on pandoc's own numbering scheme (when reading and when writing). More trouble than it's worth so far, hence the default is FALSE. |
crossref_pandoc |
whether to use the "pandoc-crossref" filter when reading the Latex source. The default is TRUE, but |
verbose |
if TRUE |
dbglyx |
Only for debugging, obvs. Should be blank, or a positive integer as per "lyx -dbg". Any number will also cause R to print out various things, such as paths. |
The steps in the conversion process are (actually there's more than this, this list is out-of-date...):
Move & extract zip file
Generate Tex, by running "lyx --export"
Merge any input/include files
Add eqn labels: eqn_labels_for_word()
Check and prepare bibliography
Twiddle any appendices, so as to not confuse pandoc
Export Tex -> pandoc-native: pandoc
Fix labelled-eqn column widths: eqalignfix()
Perhaps move the bibliography to before appendices
Export pandoc-native -> docx: pandoc
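The two pandoc stages in the list above (Tex -> pandoc-native, then pandoc-native -> docx) can be pictured roughly as follows. This is only a sketch under assumptions: the file names are hypothetical, the exact options that lyxzip2word passes to pandoc are not shown here, and all the fix-up steps in between are omitted.

```r
# Hypothetical file names, for illustration only
tex_file    <- "mydoc.tex"
native_file <- "mydoc.native"
docx_file   <- "mydoc.docx"

# Stage 1: Tex -> pandoc-native; add the pandoc-crossref filter only if it is installed
read_args <- c("-f", "latex", "-t", "native", "-o", native_file)
if (nzchar(Sys.which("pandoc-crossref"))) {
  read_args <- c(read_args, "--filter", "pandoc-crossref")
}
read_args <- c(read_args, tex_file)

# Stage 2: pandoc-native -> docx
write_args <- c("-f", "native", "-t", "docx", "-o", docx_file, native_file)

# Only run if pandoc and the input file actually exist
if (nzchar(Sys.which("pandoc")) && file.exists(tex_file)) {
  system2("pandoc", read_args)
  # (lyxzip2word would apply its own fix-ups to the intermediate file here)
  system2("pandoc", write_args)
}
```

Going via the native format is what leaves room for fix-ups between reading and writing; a single direct pandoc call would not.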
Should produce a file "<zipfilename>.<outext>" in folder origdir. There will also be various files in tempdir (which will be LyX's session tempdir, if this was invoked from LyX itself), including a logfile "docxconv.log" which should/might contain useful error messages if there are any. Look carefully thru LyX's "View->Messages" window to see where that LyX tempdir is (it changes from one LyX session to the next). The formal R return-value of lyxzip2word is TRUE or FALSE according to whether it thinks everything worked.
## Not run:
# In LyX, open the "lyxport-demo.lyx" example, then File->Export->Lyxzip. Then try exporting
# to a non-MSWord format, via eg...
lyxzip2word( 'lyxport-demo.zip', outext='html')
## End(Not run)
This function tries to sensibly turn assorted types of quotation marks in a LyX document into LyX "dynamic quotes". The latter make it easy to render the document with any of a number of defined "nationalesque" quotation schemes, just by tweaking a single item in Document->Settings->Language. For example, you can get outer double/inner single quotes, outer single/inner double, guillemets, and so on; see "Quotation marks" in the LyX UserGuide (currently section 3.9.4.2). Without this functionality, a large LyX document can end up having multiple types of quotation marks (especially if it is multi-authored or includes excerpts of other documents), which can't easily be changed or searched for or made coherent.
Not all quotation marks should be changed: for example, straight quotes within listings or ERT should be left alone. requote_lyx tries to get that right. However, many funny-looking things come out OK when exported to Latex (which is a necessary step in producing eg nice MSWord documents, using the other functions in this package).
requote_lyx mainly aims at double-quotes (since IME these are the commonest defaults for normal quotation), but does some single-quote stuff too:
Apostrophes are left alone (deliberately; they are tricky!)
Any explicit single quotes are made dynamic, but their singleness is kept; it's assumed to be deliberate.
Hard-wired directional single-quote characters are turned into dynamic double quotes, just like hard-wired directional double-quotes. Coz that was probably the intention of an author who just prefers single-quotes for outer.
requote_lyx isn't aiming at perfection, and may well not be foolproof; there might be situations where it doesn't work properly, because LyX used some structuring that I hadn't anticipated. Sorry.
requote_lyx( filename = NULL, lyx = NULL, outfile = NULL)
filename |
optional name of file to read from |
lyx |
or you can pass the actual text in directly, as a character vector |
outfile |
optional filename to write the output to. |
The modified LyX text will be returned, invisibly. Also, if outfile is not NULL, the modified LyX text will be written to outfile.
Biblatex bibliography (dot-bib) files may contain legacy Latex/Bibtex representations of characters, such as "\c{G}" for "Ģ". These are normally fine (though some of them are technically incorrect, but may still work), but not always, e.g. when looking for consistent names, as in tidy_initials (qv). So you can try tex2utf8 to translate such representations into "native UTF8 codepoints". It might help you.
This might even work on more general Latex (ie not on a bib file) but you are on your own there...
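To give a flavour of what "translating legacy representations" means, here is a toy sketch based on a three-entry lookup table. It is not how tex2utf8 is implemented (the real thing relies on much larger tables and handles malformed input); the names tex_map and translate_tex are hypothetical.

```r
# Toy lookup table: Latex accent command -> UTF8 character (hypothetical subset)
tex_map <- c(
  "\\'{e}" = "\u00e9",  # e with acute accent
  '\\"{o}' = "\u00f6",  # o with umlaut
  "\\c{c}" = "\u00e7"   # c with cedilla
)

# Replace each known Latex representation by its UTF8 equivalent.
# fixed = TRUE, so backslashes and braces are matched literally.
translate_tex <- function(x, map = tex_map) {
  for (pat in names(map)) {
    x <- gsub(pat, map[[pat]], x, fixed = TRUE)
  }
  x
}

authors <- c("Fran\\c{c}ois M\\\"{o}ller", "Andr\\'{e} Psmith")
translate_tex(authors)
```

Anything not in the table (eg a composite such as "\k{n}") passes through unchanged, which mirrors the behaviour described for characters that tex2utf8 doesn't recognise.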
tex2utf8(tex, file = NULL, outfile = NULL, debrace = FALSE)
tex |
character vector containing the bibliography contents. Provide just one of |
file |
if supplied, this is used in place of |
outfile |
if supplied, the result will be written here. |
debrace |
whether to remove superfluous braces around single UTF8 characters. IME these are mostly legacy effects of Latex representation, rather than deliberate statements about upper/lower case (the only legit use I can think of). Extra braces are usually harmless if you are using |
This is harder than it sounds. The key was to find a couple of tables on WWW; see source code for details. tex2utf8 tries to fix up common Bibtex representation errors (i.e. semi-problems in my own master biblio file, mostly from WWW sources), but probably won't catch everything. And there may be "native UTF8 codepoints" for some characters that aren't in the (original?) Latex list, and won't be transformed. They can of course be produced in Latex by a composite (eg "\k{n}"; I have no idea whether that is a real character in some alphabet). They will stay composite in the output.
The modified contents.
lyxzip2word, tidy_initials
In a dot-bib file, the same author can appear with slightly different names in different papers: "A. Psmith", "Alan Psmith", "Alan B. Psmith", "A. Bertram Psmith", and so on. If you are not careful, your citations can come out funny as a result. For example, you might see "Psmith et al. (1999)" but "A.B. Psmith (2004)" even though Alan Bertram is the only Psmith you are citing. With Biblatex and PDFs, you can suppress such nonsense via "uniquename=false" and "uniquelist=false". But with CSL and MSWord etc, it seems to be harder— well, so does pretty much everything, actually.
To circumvent the problem, you can call tidy_initials on your bib file beforehand (which lyxzip2word does automatically by default, on a temporary copy of the dot-bib, unless you ask it not to). This will ensure all plausibly-identical authors have exactly the same bib entries (initials only, and the longest possible set), and should eliminate silly citations.
The merging rules, which depend slightly on the gungho argument, are as follows:
"Alan Psmith" and "A. Psmith" are always assumed to be the same (and "Alan Psmith" will be remapped to "A. Psmith", since only initials are retained when there's a discrepancy).
"Alan Psmith" and "Alicia Psmith" are never merged. That is: if there's a full name rather than initial, it's taken seriously.
Mismatched initials are never merged, eg "A.B. Psmith" and "A.C. Psmith".
If gungho=TRUE, then missing initials are ignored, so that "A. Psmith" gets merged with "A.B. Psmith". If there's also an "A.C. Psmith", then it's the luck of the draw as to which one "A. Psmith" will be merged with.
If gungho=FALSE, then merging only occurs when two authors also have the same number of initials; so "Alan Bertram Psmith" gets merged with "A.B. Psmith" but not with "A. Bertram Psmith".
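The rules above can be pictured with the toy sketch below, which only decides whether two sets of given names (for the same surname) could be merged under gungho=TRUE. It is an illustration, not the code inside tidy_initials: the helper names are hypothetical, the gungho=FALSE rule is deliberately not modelled, and awkward cases such as hyphenated names are ignored.

```r
# Split given names such as "A.B." or "Alan Bertram" into tokens
given_tokens <- function(given) {
  tokens <- strsplit(given, "[. ]+")[[1]]
  tokens[nzchar(tokens)]
}

# TRUE if the two sets of given names could belong to the same author,
# treating missing trailing initials as "don't care" (ie the gungho=TRUE reading)
could_merge <- function(given1, given2) {
  t1 <- given_tokens(given1)
  t2 <- given_tokens(given2)
  for (i in seq_len(min(length(t1), length(t2)))) {
    a <- t1[i]
    b <- t2[i]
    # Two different full names are taken seriously: never merged
    if (nchar(a) > 1 && nchar(b) > 1 && a != b) return(FALSE)
    # Mismatched initials are never merged
    if (substr(a, 1, 1) != substr(b, 1, 1)) return(FALSE)
  }
  TRUE
}

could_merge("Alan", "A.")      # same author: full name versus its initial
could_merge("Alan", "Alicia")  # different full names
could_merge("A.B.", "A.C.")    # mismatched initials
could_merge("A.", "A.B.")      # missing initial, ignored when gungho
```

A pairwise check like this also shows where the luck-of-the-draw comes from: "A." is compatible with both "A.B." and "A.C.", which are not compatible with each other.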
In order to make this surprisingly tricky programming task a bit easier, tidy_initials calls tex2utf8, which you might find useful in its own right.
tidy_initials( bib, gungho = TRUE)
bib |
character vector containing the bibliography contents. Use one of |
file |
if supplied, this is used in place of |
outfile |
if supplied, the result will be written here. |
gungho |
TRUE if you prefer to assume that middle initials are easy-come-easy-go. Saves pain, but in rare cases could merge genuinely different people. |
The modified contents, as a character vector. If outfile is supplied, that file will be created.