WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Title, tags, lang - where are they in a PDF document? Beginning or end?

Number of posts in this thread: 3 (In chronological order)

From: Corrine Schoeb
Date: Wed, May 24 2017 7:38AM
Subject: Title, tags, lang - where are they in a PDF document? Beginning or end?

We are working on creating a scan of PDF documents, some of which are 100+
pages. Rather than scan the full document to find out if it is tagged, has
a title and language we thought we might be able to do the first 5-10 pages
but I'm not sure where the title, tag, lang data is stored in a PDF.

So my question is, are title, tag, lang attributes of a PDF stored at the
beginning of a PDF or at the end?

--

Corrine Schoeb
Technology Accessibility Coordinator, ITS
610-957-6208

*** Swarthmore College ITS will never ask you for your password, including
by email. Please keep your passwords private to protect yourself and the
security of our network.

To learn more about web security visit
http://www.swarthmore.edu/its/security

From: Corrine Schoeb
Date: Wed, May 24 2017 9:12AM
Subject: Re: Fwd: Title, tags, lang - where are they in a PDF document? Beginning or end?

Thank you to everyone who has responded so far.

I think I need to clarify - this is a scan using code, not a physical
scanner. We've developed a scan for our Moodle instance. It can recognize
text vs. an image of text but we are working on refining that scan
further. Large documents take up a lot of CPU/memory so we are thinking we
might be able to limit our scan to the first 5-10 pages to see if there is a
title, tags, etc. I'm just not sure where that data is stored - at the
beginning or at the end of the PDF.

I know this is a very technical question and maybe obscure but I figured this
might be the right group.


From: Duff Johnson
Date: Wed, May 24 2017 12:35PM
Subject: Re: Title, tags, lang - where are they in a PDF document? Beginning or end?

Hi Corrine,

> I think I need to clarify - this is a scan using code, not a physical
> scanner. We've developed a scan for our Moodle instance. It can recognize
> text vs. an image of text but we are working on refining that scan
> further. Large documents take up a lot of CPU/memory so we are thinking we
> might be able to limit our scan to the first 5-10 pages to see if there is a
> title, tags, etc. I'm just not sure where that data is stored - at the
> beginning or at the end of the PDF.

If the question is "how do I find out if this PDF is tagged?": the information denoting tags (structure elements) in the PDF is located in the body of the file. The nature of PDF is such that it's not easy to predict where in the file the information specific to structure elements (tags) sits.

If your tool can parse, even a little, you would do well to spend a few minutes with the PDF specification - ISO 32000-1. It's available for free here:

http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

The short version: you are looking for a 'structure element dictionary'. See clauses 14.7 and 14.8.
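
As a rough illustration of why "the first 5-10 pages" is not a reliable boundary, here is a minimal Python sketch using only the standard library. It is not taken from the specification: the function names are hypothetical, and it only searches the raw bytes for key names defined in ISO 32000-1 (/StructTreeRoot, /MarkInfo, /Lang, /Title). It will miss those keys when they sit inside compressed object streams, and /Title and /Lang can also appear on objects other than the document itself, so treat the result as a hint and use a real PDF parser for anything that matters. The second function shows that a reader starts from the end of the file, where the startxref keyword gives the byte offset of the cross-reference data.

```python
import re
from typing import Optional


def crude_pdf_markers(data: bytes) -> dict:
    """Heuristic byte search for accessibility-related keys.

    Assumption: the relevant dictionaries are stored uncompressed.
    Keys hidden inside compressed object streams will not be found.
    """
    return {
        # The catalog entry that points at the structure (tag) tree.
        "struct_tree_root": b"/StructTreeRoot" in data,
        # /MarkInfo << /Marked true >> declares the file as Tagged PDF.
        "marked": re.search(rb"/Marked\s+true", data) is not None,
        # /Lang followed by a literal or hex string; may also match
        # a /Lang set on an individual structure element.
        "lang": re.search(rb"/Lang\s*[(<]", data) is not None,
        # /Title followed by a string; may also match bookmark titles.
        "title": re.search(rb"/Title\s*[(<]", data) is not None,
    }


def startxref_offset(data: bytes) -> Optional[int]:
    """Return the byte offset that follows the last 'startxref' keyword.

    Only the final 1024 bytes are inspected, since the keyword sits
    just before the closing %%EOF marker.
    """
    tail = data[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    # Tiny hand-written fragment for illustration; not a valid full PDF.
    sample = (
        b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /MarkInfo << /Marked true >> "
        b"/StructTreeRoot 2 0 R /Lang (en-US) >>\nendobj\n"
        b"trailer\n<< /Root 1 0 R >>\nstartxref\n123\n%%EOF\n"
    )
    print(crude_pdf_markers(sample))
    print(startxref_offset(sample))
```

Because the catalog that carries these keys can be written anywhere in the body, a check like this has to look at the whole file (or follow the offsets from the end) rather than stop after a fixed number of pages.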

> I know this is a very technical question and maybe obscure but I figured this
> might be the right group.

Not really. If you are interested in developing PDF technology you might consider joining the PDF Association - see pdfa.org. Full disclosure: I'm the Executive Director of the PDF Association. Feel free to ask me any questions offlist.

Thanks,

Duff.
