PDF Indexing

Questions regarding MySQL and PHP may go here
carpens
I'm New!
Posts: 4
Joined: Sun Jan 24, 2010 12:26 am

PDF Indexing

Post by carpens » Sun Jan 24, 2010 12:38 am

Hello,

I am new to this forum.
Does anyone know whether the pdftotext tool is included in the Linux distribution that ships with Synology servers?
I'd like to index PDF files with a search engine like phpdig.
I am afraid the answer is no, which would mean cross compilation is necessary to get the binaries. Has anyone tried that before?

thanks

iugrifma
Versed
Posts: 279
Joined: Thu Apr 02, 2009 12:25 pm
Location: Hachau, Austria

Re: PDF Indexing

Post by iugrifma » Tue Feb 02, 2010 8:30 am

Hi carpens!

I had a quick look for you this morning, and pdftotext is NOT part of the base Syno distro.
BUT since you mention cross compilation, I guessed you might be a modder, so I had a look in the ipkg package list.
If you look at the wiki for pdftotext, it says:
The pdftotext program is provided by the Poppler PDF rendering library; on most Linux distributions it is included as part of the poppler-utils package.

The older package called xpdf (from which Poppler is derived) also includes an implementation of pdftotext.
AND in the ipkg list of applications you'll find 'xpdf' so I guess there's a good chance you'll find pdftotext in that package. Even so, it might not be the latest version.
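
For anyone who wants to run that check on their own box, here is a minimal shell sketch. It assumes ipkg (Optware) is already bootstrapped, and it takes the package name 'xpdf' from the list mentioned above; whether that package really ships pdftotext on your model is not verified here. The helper name have_cmd is made up for illustration.

```shell
# Report whether a command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftotext; then
    echo "pdftotext found at: $(command -v pdftotext)"
elif have_cmd ipkg; then
    # Assumption: the Optware feed offers a package named 'xpdf'.
    ipkg list | grep -i xpdf
    echo "If xpdf is listed, try: ipkg install xpdf"
else
    echo "neither pdftotext nor ipkg is on the PATH"
fi
```

If the first branch fires, the binary is already installed and there is nothing more to do.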

Grifffo. :wink:

* Model: RS-810+ * Firmware: 4.1-2567
* Model: DS-209+ * Firmware: 4.2-3246
* Modification(s) DS-209+ RAM 1Gb / RS810 RAM 3Gb / +Optware
* Drives Seagate DS209+: 2x ST31500341AS / RS810: 3x ST3000VN000-1H41.
* Network: 1000xBASE T, Full duplex. [Bonded on 810+]
* Services enabled: Most of them!

carpens
I'm New!
Posts: 4
Joined: Sun Jan 24, 2010 12:26 am

Re: PDF Indexing

Post by carpens » Wed Jun 30, 2010 5:13 pm

Hello

After a lot of forum reading and trial and error, I finally got it working. Here are some tips that may save readers of this post a lot of time.
I downloaded the pdftotext source code and installed the toolchain on my DS107; see this link:
http://forum.synology.com/wiki/index.ph ... #Bootstrap
Alternatively, you can use ipkg to install the xpdf package, which includes the pdftotext binary.
I installed the PHP search engine Sphider plus. It can be downloaded for free here:
http://code.google.com/p/sphiderplus/downloads/list
I followed the instructions in the readme very carefully, but I still could not get pdftotext to work.
The binary worked from the command line. I finally found out that the PHP function exec() could not execute it.
The reason is that safe_mode_exec_dir in php.ini is not empty (even if PHP safe mode is disabled)
http://www.swisscenter.co.uk/component/ ... /catid,10/
Once that is sorted out, PDF indexing works like a charm on my DS107.
Thanks for your help, grifffo!
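
For anyone retracing this step, here is a minimal shell sketch that shows what safe_mode_exec_dir is currently set to. The helper name show_exec_dir and the PHP_INI variable are made up for illustration, and the php.ini location is an assumption (it differs between DSM versions), so point PHP_INI at your own file.

```shell
# Print the value assigned to safe_mode_exec_dir in a php.ini file.
# Prints nothing if the directive is absent, commented out, or empty.
show_exec_dir() {
    sed -n 's/^[[:space:]]*safe_mode_exec_dir[[:space:]]*=[[:space:]]*//p' "$1" | tr -d '"'
}

# Assumption: PHP_INI holds the path to the php.ini used by the web server.
PHP_INI="${PHP_INI:-php.ini}"

if [ -f "$PHP_INI" ]; then
    value=$(show_exec_dir "$PHP_INI")
    if [ -n "$value" ]; then
        echo "safe_mode_exec_dir is set to: $value"
    else
        echo "safe_mode_exec_dir is empty or not set"
    fi
else
    echo "no php.ini found at $PHP_INI"
fi
```

If a directory is reported, that matches the situation described above, where exec() could not run pdftotext even though the binary worked from the command line.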

iugrifma
Versed
Posts: 279
Joined: Thu Apr 02, 2009 12:25 pm
Location: Hachau, Austria

Re: PDF Indexing

Post by iugrifma » Fri Aug 13, 2010 9:01 am

Great feedback on how you got on, carpens. Glad you got it working; I will try it myself now that you've done all the hard work :wink:

Griffo.


Danelicious
I'm New!
Posts: 4
Joined: Fri Mar 04, 2016 1:08 am

Re: PDF Indexing

Post by Danelicious » Fri Mar 04, 2016 1:25 am

Hi there,

I really appreciate your post, but I don't get the last part:
"The reason is that safe_mode_exec_dir in php.ini is not empty (even if PHP safe mode is disabled)"
How can I fix this? The link does not help, because the relevant post contains a broken link. Could you explain how you fixed this problem?

I don't get any error messages when I try to index a PDF link with Sphider Plus. I got some earlier, but they went away after I changed the path to the correct directory of the converter. Now I get no error message, but the PDFs still don't get indexed.

I would really appreciate help with this. Any ideas?

BTW: Model is DS215j

Thanks in advance.

Danelicious
I'm New!
Posts: 4
Joined: Fri Mar 04, 2016 1:08 am

Re: PDF Indexing

Post by Danelicious » Fri Mar 04, 2016 12:00 pm

I did it. Solution can be found here: https://forum.synology.com/enu/viewtopi ... 5&t=115095

syno.dustin
Sorcerer
Posts: 2244
Joined: Thu Oct 29, 2015 11:03 pm
Location: Seattle, WA

Re: PDF Indexing

Post by syno.dustin » Fri Mar 04, 2016 9:20 pm

The indexing in DSM 6.0 can search inside PDFs: https://www.synology.com/en-us/dsm/6.0b ... anagement/

Indexing was completely rebuilt so it's no longer a huge resource hog. It should only be heavy during the initial run (which is still faster) and should not use up more resources later. For those of you running a j-series model, I'd love to hear how it performs in real-life scenarios.
If you need technical support please use this form: https://account.synology.com/support/support_form.php
Synology does not consistently browse this forum for technical support, feature requests, or any other inquiries as it notes at the top of the page. Please use the proper channels when you need help from someone at Synology.
