We were looking for a simple way to convert MS Office files to PDF without relying on online services. We tried OpenOffice, but the results were awful! There are many free and commercial PDF printers available, but they are built for desktop use and the user has to print manually. We wanted a solution that works without user interaction.
There is a very useful open source project called PDFCreator. It is also a printer, but it has a useful API. The API is exposed via COM, which is not the most convenient technology to use from Java, but it's not bad either.
We didn't find a ready-to-use solution for our idea. It also shouldn't cost money, and it had to be open source. We found some great commercial tools and SDKs, but none of them were cheap.
We spent a few hours and used PDFCreator, Jacob and some other open source tools to create an "online service for PDFCreator". The result of our work is toPDF.
What is toPDF?
It's a small library that converts files to PDF via PDFCreator. It's also a web application that offers remote conversion via HTTP. The application has a REST service and a simple servlet service.
Simply POST binary data in an HTTP request and receive a PDF in the response. The servlet accepts multipart form-data and plain application/octet-stream requests. The REST service accepts multipart form-data as well, and additionally supports JSON requests.
A short example:
// Open a connection to the servlet service
URL url = new URL(getServletService());
URLConnection ucon = url.openConnection();
ucon.setDoOutput(true);
ucon.setDoInput(true);
ucon.setUseCaches(false);
// Send the raw file content and pass the file name along
ucon.setRequestProperty("Content-Type", "application/octet-stream");
ucon.setRequestProperty("Content-Disposition", "attachment; filename=\"Forms.docx\";");
// Write the document to the request body
FileUtil.copy(ResourceUtil.getResourceAsStream("/com/sibvisions/topdf/Forms.docx"),
              ucon.getOutputStream());
// The response body is the converted PDF
byte[] byData = FileUtil.getContent(ucon.getInputStream());
Or as a multipart request:
MultipartUtil multipart = new MultipartUtil("UTF-8");
multipart.addDataPart("data", "Forms.docx",
                      ResourceUtil.getResourceAsStream("/com/sibvisions/topdf/Forms.docx"));
byte[] byData = multipart.post(getServletService());
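If MultipartUtil is not at hand, roughly the same request can be put together with plain JDK classes. The following is only a sketch of what a multipart form-data body with a single file part looks like on the wire. The class name MultipartSketch, its methods and the boundary string are made up for illustration, and the part name "data" is simply taken from the example above. It is not necessarily how MultipartUtil works internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of toPDF.
final class MultipartSketch {

    // Builds a multipart/form-data body that contains exactly one file part.
    static byte[] buildBody(String boundary, String partName, String fileName, byte[] content) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();

        // Part header: opening boundary line, part headers, then an empty line
        byte[] header = ("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + partName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n").getBytes(StandardCharsets.UTF_8);
        // Closing delimiter: the boundary followed by two extra dashes
        byte[] trailer = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);

        body.write(header, 0, header.length);
        body.write(content, 0, content.length);
        body.write(trailer, 0, trailer.length);
        return body.toByteArray();
    }

    // Posts the file to the given service URL and returns the raw response bytes.
    static byte[] post(String serviceUrl, String fileName, byte[] content) throws IOException {
        // Assumed boundary format; it only has to be a string that does not occur in the content
        String boundary = "----topdf" + Long.toHexString(System.nanoTime());
        byte[] body = buildBody(boundary, "data", fileName, content);

        HttpURLConnection con = (HttpURLConnection) new URL(serviceUrl).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setUseCaches(false);
        con.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);

        try (OutputStream out = con.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = con.getInputStream()) {
            return in.readAllBytes(); // requires Java 9 or newer
        }
    }
}
```

A call would then look like MultipartSketch.post(getServletService(), "Forms.docx", bytes), where bytes holds the content of the document.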
The conversion via PDFCreator works well, but not perfectly. There are problems with small page margins in Word documents, problems with OpenOffice documents, and so on.
The problem is not toPDF itself, because it works exactly as well as PDFCreator does. If PDFCreator can't convert a document, toPDF has no chance of converting it either.
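Because a failed PDFCreator conversion can't be repaired afterwards, a client may want to verify that the response really is a PDF before using it. How toPDF reports a failed conversion is not covered here, so the following is only a defensive sketch: it checks for the "%PDF-" signature that PDF files start with. The class name PdfCheck and the method looksLikePdf are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of toPDF.
final class PdfCheck {

    // PDF files start with this ASCII signature, followed by the version number
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the data starts with the PDF signature.
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the examples above, this would be a call like PdfCheck.looksLikePdf(byData) right after the response has been read. The check only looks at the first bytes and says nothing about whether the rest of the file is valid.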
We had problems converting simple images to PDF because the default Windows print dialog appeared, and we had not associated the image extensions with another tool. We solved this by converting images with iText instead of PDFCreator. Now it's possible to create PDFs from images very easily and without pop-ups.
License?
AGPL 3.0, because PDFCreator is licensed under GPL and iText is licensed under AGPL.
Used tools and libraries
toPDF is a mixture of different open source projects:
PDFCreator
iText
RESTlet
Jackson
JVx
Apache Commons FileUpload and IO
Jacob
PDFCreator4J
Installation?
- toPDF is written in Java, but installing it only makes sense on Windows (same requirements as PDFCreator).
- Install PDFCreator (default desktop installation, with COM).
- Deploy topdf.war on Tomcat, JBoss or your preferred Java application server. If your application server runs as a Windows service, make sure that the service runs as an OS user.