Extract Images from an Excel Document

Source: http://stackoverflow.com/questions/5503015/extract-images-from-an-excel-document

First, use unoconv to convert the .xls to .pdf:

http://dag.wieers.com/home-made/unoconv/

On Ubuntu 10.10 command line:

sudo apt-get install unoconv
unoconv -f pdf file.xls
Then extract the images from the pdf using pdfimages (which seems to come bundled with Ubuntu):

http://en.wikipedia.org/wiki/Pdfimages

Back on the command line:

pdfimages file.pdf fileimage
And done! All of the images in the .xls are now in separate files in the current directory. The same steps can be scripted on most Linux systems in your language of choice. In Python, for example:

import subprocess
# Convert the spreadsheet to PDF, then pull the embedded images out of the PDF.
subprocess.call(['unoconv', '-f', 'pdf', 'file.xls'])
subprocess.call(['pdfimages', 'file.pdf', 'fileimage'])

I would love to hear a simpler solution if somebody has one.
******************************************************************************************

If the Excel file is an .xlsx file, it is already a compressed (zip) archive, so you can simply unzip it:

$ unzip file.xlsx

All of the pictures are in xl/media/.
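
The same extraction can be done without a shell. The following is a minimal Python sketch using only the standard zipfile module; the function name extract_xlsx_media and the output directory are illustrative names, not part of the original note.

```python
import os
import zipfile


def extract_xlsx_media(xlsx_path, out_dir):
    """Copy every file stored under xl/media/ in the .xlsx archive into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside xl/media/.
            if not name.startswith("xl/media/") or name.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(name))
            with archive.open(name) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

Calling extract_xlsx_media("file.xlsx", "images") returns the list of files it wrote.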

ASP.NET: a PDF written to the response shows up garbled in IE

Writing a binary PDF file to the browser from ASP.NET displays correctly in Chrome, but IE cannot show it and renders garbled text instead. IE is strict about being told that the output stream has finished: if Response.End() is not called, IE shows garbage.

Code snippet

myConnection.Open();
SqlDataReader myReader = myCommand.ExecuteReader();

if (myReader.Read())
{
    //Response.ContentType = myReader["MIME"].ToString();
    Response.ContentType = "application/pdf";
    Response.BinaryWrite((byte[])myReader["isoebb010"]);
    // IE needs to be told that the stream is complete.
    Response.End();
}
else
{
    Response.Write("You do not have permission to view or print the selected document. If you have any questions, please contact the IT department staff.");
}

myReader.Close();
myConnection.Close();

pdfcrack – PDF files password cracker

PDFCRACK(1)
NAME
pdfcrack – PDF files password cracker

SYNOPSIS
pdfcrack -f filename [options]

DESCRIPTION
pdfcrack is a simple tool for recovering passwords from pdf-documents. It should be able to handle all
pdfs that uses the standard security handler but the pdf-parsing routines are a bit of a quick hack so
you might stumble across some pdfs where the parser needs to be fixed to handle.

OPTIONS
-b, --bench
Perform benchmark and exit.

-c, --charset=STRING
Use the characters in STRING as charset.

-m, --maxpw=INTEGER
Stop when reaching INTEGER as password length.

-n, --minpw=INTEGER
Skip trying passwords shorter than INTEGER.

-l, --loadState=FILE
Continue from the state saved in FILENAME.

-o, --owner
Work with the ownerpassword.

-p, --password=STRING
Uses STRING as userpassword to speed up breaking ownerpassword (implies -o).

-q, --quiet
Run quietly.

-s, --permutate
Try permutating the passwords (currently only supports switching first character to uppercase).

-u, --user
Work with the userpassword (default).

-v, --version
Print version and exit.

-w, --wordlist=FILE
Use FILE as source of passwords to try.

AUTHOR
Written by Nacho Barrientos Arias <chipi@criptonita.com> for the Debian GNU/Linux system (but may be
used by others).

BUGS
Report bugs to Henning Noren <confusion42@users.sourceforge.net>.

COPYRIGHT
Copyright © 2006, Henning Noren <confusion42@users.sourceforge.net> – All Rights Reserved. This program
is free software; you can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation, Inc.

http://sourceforge.net/projects/pdfcrack

 
pdfcrack 0.8 october 26, 2006 PDFCRACK(1)
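
As a rough illustration of how the options above combine, here is a small Python sketch that only builds the pdfcrack argument list from the short flags documented in the man page (-f, -w, -n, -m, -o). The function name pdfcrack_command and the file names in the comment are hypothetical.

```python
def pdfcrack_command(pdf_path, wordlist=None, min_len=None, max_len=None, owner=False):
    """Build an argument list for pdfcrack using flags from the man page above."""
    cmd = ["pdfcrack", "-f", pdf_path]
    if wordlist is not None:
        cmd += ["-w", wordlist]      # try passwords from this file
    if min_len is not None:
        cmd += ["-n", str(min_len)]  # skip passwords shorter than this
    if max_len is not None:
        cmd += ["-m", str(max_len)]  # stop at this password length
    if owner:
        cmd.append("-o")             # work on the owner password
    return cmd


# To actually run it (requires pdfcrack to be installed):
#   import subprocess
#   subprocess.call(pdfcrack_command("file.pdf", wordlist="words.txt"))
```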

Use pdftk to set a password on a PDF file from the command line

PDFTK(1) PDFTK(1)

NAME
pdftk – A handy tool for manipulating PDF

SYNOPSIS
pdftk <input PDF files | - | PROMPT>
[ input_pw <input PDF owner passwords | PROMPT> ]
[ <operation> <operation arguments> ]
[ output <output filename | - | PROMPT> ]
[ encrypt_40bit | encrypt_128bit ]
[ allow <permissions> ]
[ owner_pw <owner password | PROMPT> ]
[ user_pw <user password | PROMPT> ]
[ flatten ] [ compress | uncompress ]
[ keep_first_id | keep_final_id ] [ drop_xfa ]
[ verbose ] [ dont_ask | do_ask ]
Where:
<operation> may be empty, or:
[ cat | shuffle | burst |
generate_fdf | fill_form |
background | multibackground |
stamp | multistamp |
dump_data | dump_data_utf8 |
dump_data_fields | dump_data_fields_utf8 |
update_info | update_info_utf8 |
attach_files | unpack_files ]

For Complete Help: pdftk --help

DESCRIPTION
If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-
decoder-ring, and X-Ray-glasses. Pdftk is a simple tool for doing everyday things with PDF documents.
Use it to:

* Merge PDF Documents or Collate PDF Page Scans
* Split PDF Pages into a New Document
* Rotate PDF Documents or Pages
* Decrypt Input as Necessary (Password Required)
* Encrypt Output as Desired
* Fill PDF Forms with X/FDF Data and/or Flatten Forms
* Generate FDF Data Stencils from PDF Forms
* Apply a Background Watermark or a Foreground Stamp
* Report PDF Metrics such as Metadata and Bookmarks
* Update PDF Metadata
* Attach Files to PDF Pages or the PDF Document
* Unpack PDF Attachments
* Burst a PDF Document into Single Pages
* Uncompress and Re-Compress Page Streams
* Repair Corrupted PDF (Where Possible)

OPTIONS
A summary of options is included below.

--help, -h
Show summary of options.

<input PDF files | - | PROMPT>
A list of the input PDF files. If you plan to combine these PDFs (without using handles) then
list files in the order you want them combined. Use - to pass a single PDF into pdftk via
stdin. Input files can be associated with handles, where a handle is a single, upper-case
letter:

<input PDF handle>=<input PDF filename>

Handles are often omitted. They are useful when specifying PDF passwords or page ranges, later.

For example: A=input1.pdf B=input2.pdf

[input_pw <input PDF owner passwords | PROMPT>]
Input PDF owner passwords, if necessary, are associated with files by using their handles:

<input PDF handle>=<input PDF file owner password>

If handles are not given, then passwords are associated with input files by order.

Most pdftk features require that encrypted input PDF are accompanied by the owner password. If
the input PDF has no owner password, then the user password must be given, instead. If the
input PDF has no passwords, then no password should be given.

When running in do_ask mode, pdftk will prompt you for a password if the supplied password is
incorrect or none was given.

[<operation> <operation arguments>]
If this optional argument is omitted, then pdftk runs in ‘filter’ mode. Filter mode takes only
one PDF input and creates a new PDF after applying all of the output options, like encryption
and compression.

Available operations are: cat, shuffle, burst, generate_fdf, fill_form, background,
multibackground, stamp, multistamp, dump_data, dump_data_utf8, dump_data_fields, dump_data_fields_utf8,
update_info, update_info_utf8, attach_files, unpack_files. Some operations take additional
arguments, described below.

cat [<page ranges>]
Catenates pages from input PDFs to create a new PDF. Page order in the new PDF is specified
by the order of the given page ranges. Page ranges are described like this:

<input PDF handle>[<begin page number>[-<end page number>[<qualifier>]]][<page rotation>]

Where the handle identifies one of the input PDF files, and the beginning and ending page
numbers are one-based references to pages in the PDF file, and the qualifier can be even or
odd, and the page rotation can be N, S, E, W, L, R, or D.

If the handle is omitted from the page range, then the pages are taken from the first input
PDF.

The even qualifier causes pdftk to use only the even-numbered PDF pages, so 1-6even yields
pages 2, 4 and 6 in that order. 6-1even yields pages 6, 4 and 2 in that order.

The odd qualifier works similarly to the even.

The page rotation setting can cause pdftk to rotate pages and documents. Each option sets
the page rotation as follows (in degrees): N: 0, E: 90, S: 180, W: 270, L: -90, R: +90, D:
+180. L, R, and D make relative adjustments to a page’s rotation.
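
To make the difference between absolute and relative rotation codes concrete, here is a small Python sketch of the rule described above. The function name rotated is hypothetical, and normalising the result into the range 0 to 359 is an assumption of this sketch, not something the man page states.

```python
ABSOLUTE = {"N": 0, "E": 90, "S": 180, "W": 270}  # set the rotation outright
RELATIVE = {"L": -90, "R": 90, "D": 180}          # adjust the current rotation


def rotated(current_degrees, code):
    """Return the page rotation after applying one pdftk rotation code."""
    if code in ABSOLUTE:
        return ABSOLUTE[code]
    if code in RELATIVE:
        # Assumption: keep the result within 0..359 degrees.
        return (current_degrees + RELATIVE[code]) % 360
    raise ValueError("unknown rotation code: %r" % code)
```

For a page that is currently at 90 degrees, E leaves it at 90 while R moves it to 180.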

If no arguments are passed to cat, then pdftk combines all input PDFs in the order they were
given to create the output.

NOTES:
* <end page number> may be less than <begin page number>.
* The keyword end may be used to reference the final page of a document instead of a page
number.
* Reference a single page by omitting the ending page number.
* The handle may be used alone to represent the entire PDF document, e.g., B1-end is the same
as B.

Page Range Examples w/o Handles:
1-endE – rotate entire document 90 degrees
5 11 20 – take single pages from input PDF
5-25oddW – take odd pages in range, rotate 90 degrees
6-1 – reverse pages in range from input PDF

Page Range Examples Using Handles:
Say A=in1.pdf B=in2.pdf, then:
A1-21 – take range from in1.pdf
Bend-1odd – take all odd pages from in2.pdf in reverse order
A72 – take a single page from in1.pdf
A1-21 Beven A72 – assemble pages from both in1.pdf and in2.pdf
AW – rotate entire in1.pdf document 90 degrees
B – use all of in2.pdf
A2-30evenL – take the even pages from the range, remove 90 degrees from each page’s rotation
A A – catenate in1.pdf with in1.pdf
AevenW AoddE – apply rotations to even pages, odd pages from in1.pdf
AW BW BD – catenate rotated documents
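
The page-order rules above (reverse ranges, even and odd qualifiers) can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not pdftk's own code; expand_range is a hypothetical name, and the end keyword is not handled, so the caller passes real page numbers.

```python
def expand_range(begin, end, qualifier=None):
    """List the page numbers selected by <begin>-<end><qualifier>, in output order."""
    step = 1 if end >= begin else -1          # 6-1 walks backwards
    pages = list(range(begin, end + step, step))
    if qualifier == "even":
        pages = [p for p in pages if p % 2 == 0]
    elif qualifier == "odd":
        pages = [p for p in pages if p % 2 == 1]
    elif qualifier is not None:
        raise ValueError("qualifier must be 'even', 'odd' or None")
    return pages
```

With this sketch, expand_range(1, 6, "even") gives pages 2, 4 and 6, and expand_range(6, 1, "even") gives 6, 4 and 2, matching the man page.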

shuffle [<page ranges>]
Collates pages from input PDFs to create a new PDF. Works like the cat operation except that
it takes one page at a time from each page range to assemble the output PDF. If one range
runs out of pages, it continues with the remaining ranges. Ranges can use all of the
features described above for cat, like reverse page ranges, multiple ranges from a single PDF,
and page rotation. This feature was designed to help collate PDF pages after scanning paper
documents.
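
The collation order that shuffle is described as producing can be sketched as a round-robin over the page ranges. This Python sketch is an illustration under that reading of the man page; shuffle_pages and the page labels are hypothetical.

```python
def shuffle_pages(ranges):
    """Take one page at a time from each range; exhausted ranges drop out."""
    iterators = [iter(r) for r in ranges]
    output = []
    while iterators:
        still_active = []
        for it in iterators:
            try:
                output.append(next(it))
                still_active.append(it)
            except StopIteration:
                # This range ran out of pages; continue with the others.
                pass
        iterators = still_active
    return output
```

For example, shuffle_pages([["A1", "A2", "A3"], ["B1", "B2"]]) yields A1, B1, A2, B2, A3.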

burst Splits a single, input PDF document into individual pages. Also creates a report named
doc_data.txt which is the same as the output from dump_data. If the output section is
omitted, then PDF pages are named: pg_%04d.pdf, e.g.: pg_0001.pdf, pg_0002.pdf, etc. To name
these pages yourself, supply a printf-styled format string via the output section. For
example, if you want pages named: page_01.pdf, page_02.pdf, etc., pass output page_%02d.pdf to
pdftk. Encryption can be applied to the output by appending output options such as owner_pw,
e.g.:

pdftk in.pdf burst owner_pw foopass
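
To see what file names a given printf-style pattern produces, Python's % operator can be used as a stand-in, since it follows the same %04d convention. The helper name burst_names is hypothetical.

```python
def burst_names(page_count, pattern="pg_%04d.pdf"):
    """Return the file names burst would use for pages 1..page_count."""
    return [pattern % n for n in range(1, page_count + 1)]
```

burst_names(2) gives pg_0001.pdf and pg_0002.pdf, while burst_names(2, "page_%02d.pdf") gives page_01.pdf and page_02.pdf.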

generate_fdf
Reads a single, input PDF file and generates an FDF file suitable for fill_form out of it to
the given output filename or (if no output is given) to stdout. Does not create a new PDF.

fill_form <FDF data filename | XFDF data filename | - | PROMPT>
Fills the single input PDF’s form fields with the data from an FDF file, XFDF file or stdin.
Enter the data filename after fill_form, or use - to pass the data via stdin, like so:

pdftk form.pdf fill_form data.fdf output form.filled.pdf

After filling a form, the form fields remain interactive unless you also use the flatten
output option. flatten merges the form fields with the PDF pages. You can use flatten alone,
too, but only on a single PDF:

pdftk form.pdf fill_form data.fdf output out.pdf flatten

or:

pdftk form.filled.pdf output out.pdf flatten

If the input FDF file includes Rich Text formatted data in addition to plain text, then the
Rich Text data is packed into the form fields as well as the plain text. Pdftk also sets a
flag that cues Acrobat/Reader to generate new field appearances based on the Rich Text data.
That way, when the user opens the PDF, the viewer will create the Rich Text fields on the
spot. If the user’s PDF viewer does not support Rich Text, then the user will see the plain
text data instead. If you flatten this form before Acrobat has a chance to create (and save)
new field appearances, then the plain text field data is what you’ll see.

background <background PDF filename | - | PROMPT>
Applies a PDF watermark to the background of a single input PDF. Pass the background PDF’s
filename after background like so:

pdftk in.pdf background back.pdf output out.pdf

Pdftk uses only the first page from the background PDF and applies it to every page of the
input PDF. This page is scaled and rotated as needed to fit the input page. You can use -
to pass a background PDF into pdftk via stdin.

If the input PDF does not have a transparent background (such as a PDF created from page
scans) then the resulting background won’t be visible — use the stamp operation instead.

multibackground <background PDF filename | - | PROMPT>
Same as the background operation, but applies each page of the background PDF to the
corresponding page of the input PDF. If the input PDF has more pages than the background PDF, then the
final background page is repeated across these remaining pages in the input PDF.

stamp <stamp PDF filename | - | PROMPT>
This behaves just like the background operation except it overlays the stamp PDF page on top
of the input PDF document’s pages. This works best if the stamp PDF page has a transparent
background.

multistamp <stamp PDF filename | - | PROMPT>
Same as the stamp operation, but applies each page of the stamp PDF to the corresponding
page of the input PDF. If the input PDF has more pages than the stamp PDF, then the final
stamp page is repeated across these remaining pages in the input PDF.
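
The page-matching rule for multistamp (and multibackground) can be written down as a one-line mapping. This is a sketch of the rule as described above, using one-based page numbers; stamp_page_for is a hypothetical name.

```python
def stamp_page_for(input_page, stamp_page_count):
    """Return which stamp page is applied to a given input page (both one-based)."""
    # Pages beyond the end of the stamp PDF reuse its final page.
    return min(input_page, stamp_page_count)
```

With a 3-page stamp PDF and a 5-page input, the input pages receive stamp pages 1, 2, 3, 3, 3.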

dump_data
Reads a single, input PDF file and reports various statistics, metadata, bookmarks (a/k/a
outlines), and page labels to the given output filename or (if no output is given) to stdout.
Non-ASCII characters are encoded as XML numerical entities. Does not create a new PDF.

dump_data_utf8
Same as dump_data except that the output is encoded as UTF-8.

dump_data_fields
Reads a single, input PDF file and reports form field statistics to the given output filename
or (if no output is given) to stdout. Non-ASCII characters are encoded as XML numerical
entities. Does not create a new PDF.

dump_data_fields_utf8
Same as dump_data_fields except that the output is encoded as UTF-8.

update_info <info data filename | - | PROMPT>
Changes the metadata stored in a single PDF’s Info dictionary to match the input data file.
The input data file uses the same syntax as the output from dump_data. Non-ASCII characters
should be encoded as XML numerical entities. This does not change the metadata stored in the
PDF’s XMP stream, if it has one. For example:

pdftk in.pdf update_info in.info output out.pdf

update_info_utf8 <info data filename | - | PROMPT>
Same as update_info except that the input is encoded as UTF-8.

attach_files <attachment filenames | PROMPT> [to_page <page number | PROMPT>]
Packs arbitrary files into a PDF using PDF’s file attachment features. More than one
attachment may be listed after attach_files. Attachments are added at the document level unless the
optional to_page option is given, in which case the files are attached to the given page
number (the first page is 1, the final page is end). For example:

pdftk in.pdf attach_files table1.html table2.html to_page 6 output out.pdf

unpack_files
Copies all of the attachments from the input PDF into the current folder or to an output
directory given after output. For example:

pdftk report.pdf unpack_files output ~/atts/

or, interactively:

pdftk report.pdf unpack_files output PROMPT

[output <output filename | - | PROMPT>]
The output PDF filename may not be set to the name of an input filename. Use - to output to
stdout. When using the dump_data operation, use output to set the name of the output data file.
When using the unpack_files operation, use output to set the name of an output directory. When
using the burst operation, you can use output to control the resulting PDF page filenames
(described above).

[encrypt_40bit | encrypt_128bit]
If an output PDF user or owner password is given, output PDF encryption strength defaults to 128
bits. This can be overridden by specifying encrypt_40bit.

[allow <permissions>]
Permissions are applied to the output PDF only if an encryption strength is specified or an
owner or user password is given. If permissions are not specified, they default to ‘none,’
which means all of the following features are disabled.

The permissions section may include one or more of the following features:

Printing
Top Quality Printing

DegradedPrinting
Lower Quality Printing

ModifyContents
Also allows Assembly

Assembly

CopyContents
Also allows ScreenReaders

ScreenReaders

ModifyAnnotations
Also allows FillIn

FillIn

AllFeatures
Allows the user to perform all of the above, and top quality printing.
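
The "also allows" relationships in the list above can be expanded mechanically. The following Python sketch encodes only the implications stated in this section; effective_permissions is a hypothetical helper, and it expects the names spelled exactly as listed here.

```python
# Implications stated in the permissions list above.
IMPLIED = {
    "ModifyContents": {"Assembly"},
    "CopyContents": {"ScreenReaders"},
    "ModifyAnnotations": {"FillIn"},
}
ALL_FEATURES = {
    "Printing", "DegradedPrinting", "ModifyContents", "Assembly",
    "CopyContents", "ScreenReaders", "ModifyAnnotations", "FillIn",
}


def effective_permissions(allowed):
    """Expand a list of allow keywords into the full set of enabled features."""
    result = set()
    for name in allowed:
        if name == "AllFeatures":
            result |= ALL_FEATURES
        elif name in ALL_FEATURES:
            result.add(name)
            result |= IMPLIED.get(name, set())
        else:
            raise ValueError("unknown permission: %r" % name)
    return result
```

An empty list corresponds to the default of 'none': every feature stays disabled.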

[owner_pw <owner password | PROMPT>]

[user_pw <user password | PROMPT>]
If an encryption strength is given but no passwords are supplied, then the owner and user
passwords remain empty, which means that the resulting PDF may be opened and its security parameters
altered by anybody.

[compress | uncompress]
These are only useful when you want to edit PDF code in a text editor like vim or emacs. Remove
PDF page stream compression by applying the uncompress filter. Use the compress filter to
restore compression.

[flatten]
Use this option to merge an input PDF’s interactive form fields (and their data) with the PDF’s
pages. Only one input PDF may be given. Sometimes used with the fill_form operation.

[keep_first_id | keep_final_id]
When combining pages from multiple PDFs, use one of these options to copy the document ID from
either the first or final input document into the new output PDF. Otherwise pdftk creates a new
document ID for the output PDF. When no operation is given, pdftk always uses the ID from the
(single) input PDF.

[drop_xfa]
If your input PDF is a form created using Acrobat 7 or Adobe Designer, then it probably has XFA
data. Filling such a form using pdftk yields a PDF with data that fails to display in Acrobat 7
(and 6?). The workaround solution is to remove the form’s XFA data, either before you fill the
form using pdftk or at the time you fill the form. Using this option causes pdftk to omit the
XFA data from the output PDF form.

This option is only useful when running pdftk on a single input PDF. When assembling a PDF from
multiple inputs using pdftk, any XFA data in the input is automatically omitted.

[verbose]
By default, pdftk runs quietly. Append verbose to the end and it will speak up.

[dont_ask | do_ask]
Depending on the compile-time settings (see ASK_ABOUT_WARNINGS), pdftk might prompt you for
further input when it encounters a problem, such as a bad password. Override this default behavior
by adding dont_ask (so pdftk won’t ask you what to do) or do_ask (so pdftk will ask you what to
do).

When running in dont_ask mode, pdftk will over-write files with its output without notice.

EXAMPLES
Collate scanned pages
pdftk A=even.pdf B=odd.pdf shuffle A B output collated.pdf
or if odd.pdf is in reverse order:
pdftk A=even.pdf B=odd.pdf shuffle A Bend-1 output collated.pdf

Decrypt a PDF
pdftk secured.pdf input_pw foopass output unsecured.pdf

Encrypt a PDF using 128-bit strength (the default), withhold all permissions (the default)
pdftk 1.pdf output 1.128.pdf owner_pw foopass

Same as above, except password ‘baz’ must also be used to open output PDF
pdftk 1.pdf output 1.128.pdf owner_pw foo user_pw baz

Same as above, except printing is allowed (once the PDF is open)
pdftk 1.pdf output 1.128.pdf owner_pw foo user_pw baz allow printing

Join in1.pdf and in2.pdf into a new PDF, out1.pdf
pdftk in1.pdf in2.pdf cat output out1.pdf
or (using handles):
pdftk A=in1.pdf B=in2.pdf cat A B output out1.pdf
or (using wildcards):
pdftk *.pdf cat output combined.pdf

Remove ‘page 13’ from in1.pdf to create out1.pdf
pdftk in1.pdf cat 1-12 14-end output out1.pdf
or:
pdftk A=in1.pdf cat A1-12 A14-end output out1.pdf

Apply 40-bit encryption to output, revoking all permissions (the default). Set the owner PW to
‘foopass’.
pdftk 1.pdf 2.pdf cat output 3.pdf encrypt_40bit owner_pw foopass

Join two files, one of which requires the password ‘foopass’. The output is not encrypted.
pdftk A=secured.pdf 2.pdf input_pw A=foopass cat output 3.pdf

Uncompress PDF page streams for editing the PDF in a text editor (e.g., vim, emacs)
pdftk doc.pdf output doc.unc.pdf uncompress

Repair a PDF’s corrupted XREF table and stream lengths, if possible
pdftk broken.pdf output fixed.pdf

Burst a single PDF document into pages and dump its data to doc_data.txt
pdftk in.pdf burst

Burst a single PDF document into encrypted pages. Allow low-quality printing
pdftk in.pdf burst owner_pw foopass allow DegradedPrinting

Write a report on PDF document metadata and bookmarks to report.txt
pdftk in.pdf dump_data output report.txt

Rotate the first PDF page to 90 degrees clockwise
pdftk in.pdf cat 1E 2-end output out.pdf

Rotate an entire PDF document to 180 degrees
pdftk in.pdf cat 1-endS output out.pdf

NOTES
The pdftk home page permalink is:
http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
The easy-to-remember shortcut is: http://www.pdftk.com

AUTHOR
Sid Steward (sid.steward at pdflabs dot com) maintains pdftk. Please email him with questions or bug
reports. Include pdftk in the subject line to ensure successful delivery. Thank you.

 

October 28, 2010 PDFTK(1)
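
For the task in this note's title (setting a password on a PDF), the relevant pieces of the man page are output, encrypt_40bit, allow, owner_pw and user_pw. Below is a minimal Python sketch that assembles such a command in the order given in the SYNOPSIS. The function name pdftk_encrypt_command is hypothetical, and the file names and passwords are the placeholder values from the EXAMPLES section.

```python
def pdftk_encrypt_command(src, dst, owner_pw=None, user_pw=None, allow=(), forty_bit=False):
    """Build a pdftk argument list that writes an encrypted copy of src to dst."""
    if src == dst:
        # The man page says the output name may not be an input filename.
        raise ValueError("output filename must differ from the input filename")
    cmd = ["pdftk", src, "output", dst]
    if forty_bit:
        cmd.append("encrypt_40bit")   # default strength is 128 bits
    if allow:
        cmd += ["allow"] + list(allow)
    if owner_pw is not None:
        cmd += ["owner_pw", owner_pw]
    if user_pw is not None:
        cmd += ["user_pw", user_pw]
    return cmd


# To actually run it (requires pdftk to be installed):
#   import subprocess
#   subprocess.call(pdftk_encrypt_command("1.pdf", "1.128.pdf", owner_pw="foo", user_pw="baz"))
```

Passing the arguments as a list avoids shell quoting problems when a password contains spaces or special characters.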

How to fix a PDF digital signature shown as "validity unknown"

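How do you verify a digital signature? At work we often receive electronic documents, and PDF is one of the most widely used formats. For security and trust reasons, these files may be protected with a password or carry a digital signature. When you receive a document with a digital signature and have not yet set up trust for it, the signature is shown with a question-mark icon. The steps below explain how to turn that question mark into a check mark.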

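The interface differs slightly between older and newer versions, but the general approach is the same: manually add the digital signature to the trusted signatures.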

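1. After opening the PDF file, a question mark labelled "validity unknown" appears at the position of the digital signature.

2. Double-click the question mark, then click "Signature Properties".

3. Show Certificate
4. Add to Trusted Identities
5. Acrobat Security
6. Click OK in each window until the Signature Properties window is closed, then click the digital signature again and you will see the green check mark.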

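Note that trusting a signature this way is valid only for that signature's issuer. For signatures from other issuers, you must add each one to your computer by following the same procedure. The setting also applies only to the computer on which you configured it.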

To remove a manually added trusted signature certificate, go to "Documents and Settings\Administrator\Application Data\Adobe\Acrobat\9.0\Security" and delete the file "addressbook.acrodata".

Ubuntu: use google-chrome's libpdf.so as the Chromium PDF plugin

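By default, the Chromium installed on Ubuntu does not come with a plugin for viewing PDF files, so PDFs cannot be opened directly in the browser. We can install a PDF viewer plugin ourselves, and that plugin can be taken from the google-chrome installation package. Just copy the libpdf.so plugin provided by Google into Chromium's plugin directory (/usr/lib/chromium-browser/). Then restart Chromium, enter "about:plugins" in the address bar, and enable the newly added PDF plugin on the plugin settings page.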

The actual steps are as follows:

Source of the steps above: Chrome PDF Plugin in Ubuntu – How To Enable
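
The copy step described above can be sketched in Python as follows. This is only an illustration: install_pdf_plugin is a hypothetical helper, the location of libpdf.so inside the google-chrome package is not given in this note and must be supplied by the caller, and writing to /usr/lib/chromium-browser/ normally requires root privileges.

```python
import os
import shutil


def install_pdf_plugin(libpdf_path, plugin_dir="/usr/lib/chromium-browser/"):
    """Copy google-chrome's libpdf.so into Chromium's plugin directory."""
    if not os.path.isfile(libpdf_path):
        raise FileNotFoundError(libpdf_path)
    target = os.path.join(plugin_dir, "libpdf.so")
    shutil.copy2(libpdf_path, target)  # keeps the file's mode and timestamps
    return target
```

After the copy, restart Chromium and enable the plugin on the about:plugins page as described above.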

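Besides the method above, you can install the "gPDF" extension from the Chrome Web Store. It uses Google Docs Viewer to display PDF and other document files online in the browser. Note: the location of the document must be publicly accessible, otherwise Google Docs Viewer cannot open it.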

Feature description:
gPDF automatically sets links pointing to RTF, ODT, ODS, ODP, CSV, SXW, SXC and SXI files to open with Zoho Viewer; and links pointing to PDF, DOC, DOCX, XLS, XLSX, PPT and PPTX files to open with Google Docs Viewer.