Finding a given string in a PDF file with the shell

From: http://stackoverflow.com/questions/14449968/find-string-inside-pdf-with-shell

Find string inside pdf with shell

As Simon nicely pointed out, you can simply convert the PDF to plain text with pdftotext and then search that text for whatever you are looking for.

After conversion, you may use grep, bash regex, or any variation you want:

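# Read the converted text line by line; -r keeps backslashes literal.
while IFS= read -r line; do

    # Match a date of the form YYYY-MM-DD anywhere in the line.
    if [[ ${line} =~ [0-9]{4}(-[0-9]{2}){2} ]]; then
        echo ">>> Found date;"
    fi

# Note the space in "< <(...)": redirect from a process substitution.
done < <(pdftotext infile.pdf -)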
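For the grep route mentioned above, a minimal sketch could look like the following. The helper name find_dates is hypothetical (it is not part of the original answer), and the usage line assumes pdftotext is installed, for example from poppler-utils. The function reads plain text on standard input, so it works on any text stream, not only pdftotext output.

```shell
#!/usr/bin/env bash

# find_dates (hypothetical helper): read plain text on stdin and print
# every line that contains a YYYY-MM-DD style date, prefixed with its
# line number. Exits non-zero when no line matches, as grep does.
find_dates() {
    grep -nE '[0-9]{4}(-[0-9]{2}){2}'
}

# Example usage (assumes pdftotext is available on PATH):
#   pdftotext infile.pdf - | find_dates
```

Piping keeps everything in one pass and avoids a temporary text file. The same pattern works for any fixed string by swapping the regular expression for grep -F 'your string'.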
