🇬🇧 PID manager




Logo
Source : Twitter / CC BY

This page is for web desi­gners (inclu­ding myself) eager to disco­ver (or remem­ber) the design of this website with respect to Persistent IDentifiers (PID lien:o5wn).

PID is a concept worked out by mana­gers of repo­si­to­ries for the long-term preser­va­tion of digi­tal docu­ments. A well-known example in acade­mic circles is the “DOI” (Digital Object Identifier) assi­gned to every publi­ca­tion in a scien­ti­fic journal.

For instance, a paper listed by Frontiers in Neurology under URL
https://​www​.fron​tier​sin​.org/​a​r​t​i​c​l​e​s​/​1​0​.​3​3​8​9​/​f​n​e​u​r​.​2​0​1​8​.​0​0​9​5​2​/​f​ull has been assi­gned
“doi:10.3389/fneur.2018.00952”.

Following the link https://doi.org/10.3389/fneur.2018.00952 should redirect to the paper’s web page. Archive managers are aware that the target URL is bound to vary in the long term because Internet domains and paths leading to documents are not persistent. The advantage of linking via a DOI is that the registrant (Frontiers in Neurology) takes care of maintaining its redirection to a valid location. Thus, a DOI is a type of Persistent IDentifier: a long-lasting reference to a document, file, web page, or other object (lien:o5wn).

Another type of PID regis­tra­tion used for acade­mic archives is the Handle system (lien:8bby). Unlike DOI, Handles are free of charge and may be set up for proce­du­ral access to archi­val items in an archive.

A digi­tal object may be assi­gned seve­ral PIDs. Each one points at the object’s loca­tion or at another PID forwar­ding queries…


A ‘local’ PID service

DOIs are mana­ged by the DOI Foundation (lien:p7it) via a tech­ni­cal infra­struc­ture dedi­ca­ted to publi­ca­tions in acade­mic jour­nals. However, it makes sense to create and manage PIDs on any domain which claims to be ‘persistent’ and is used by a rela­ted device. This is the case of the leti​.lt domain asso­cia­ted with the LeBonheurEstPossible.org website, as both are hosted on the same account.

PID manager runs on the leti.lt domain, which lists paths to resources cited on the website. The core of this server is a table whose records each contain a unique identifier and its target URL. Every record also includes a brief description, displayed in the title field of HTML links, and optionally a complete bibliographical reference named the content. (This vocabulary is mine.)

PIDs on leti​.lt are 4‑character strings, for instance ‘dyig’ poin­ting at the Frontiers in Neurology paper mentio­ned above. Thanks to the PID resol­ver, link https://​leti​.lt/​d​yig should be redi­rec­ted to the paper. In plain text, this is summa­ri­zed as lien:dyig or link:dyig. (The word « lien » means “link” in French.)
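As a rough illustration, the table and the resolver described above could be modelled as follows. This is only a sketch in Python, not the code running on leti.lt: the names PidRecord, TABLE and resolve, the in-memory dictionary, the placeholder description, and the choice of the DOI link as the target of ‘dyig’ are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit


@dataclass
class PidRecord:
    pid: str                       # unique 4-character identifier
    target: str                    # URL the PID redirects to
    description: str               # shown in the title field of HTML links
    content: Optional[str] = None  # optional full bibliographic reference


# Hypothetical in-memory stand-in for the database table.
TABLE: Dict[str, PidRecord] = {
    "dyig": PidRecord(
        pid="dyig",
        target="https://doi.org/10.3389/fneur.2018.00952",  # assumed target
        description="Frontiers in Neurology paper (placeholder description)",
    ),
}


def resolve(link: str) -> Optional[str]:
    """Return the target URL for a link such as https://leti.lt/dyig."""
    pid = urlsplit(link).path.strip("/")
    record = TABLE.get(pid)
    return record.target if record else None
```

A real resolver would answer with an HTTP redirect to the returned URL rather than return a string.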

The complete list of PIDs is avai­lable on page https://​leti​.lt/​l​ist. Search for ‘dyig’ to find the article cited as an example.

Below are two entries in the data­base, ‘6ran’ and ‘rm0y’, the former without content and the latter with a content (tagged with a red disk 🔴):

Two entries in the PID database

In the left­most column, expres­sion (lien:6ran) can be copied and pasted to any text on a website. It is a good idea to include paren­theses so that it will not be trun­ca­ted. Depending on the context, PID mana­ger will decide whether these paren­theses need to be preser­ved. Bibliographic refe­rences can be more expli­cit, for instance (Blekkenhorst L et al., 2015 lien:6ran).

The targets of PIDs on leti​.lt should be DOIs whene­ver these are avai­lable. Sometimes I prefer to store the (rela­ti­vely stable) URL of a publi­ca­tion when it leads to a down­loa­dable version shared by the authors. It is also not certain that a DOI poin­ting at a preprint will remain the same after the publi­ca­tion. Adding to the confu­sion, archi­vists some­times bind a DOI to the PDF full text instead of the arti­cle’s descrip­tion, which is a bad practice.

The Broken Link Checker plugin takes care of flagging obsolete target URLs. On page https://leti.lt/list, admins have access to buttons for editing target URLs and PID descriptions, or even deleting a PID in a secure manner.

All links on page https://leti.lt/list contain title fields. Therefore, hovering the cursor over link ‘6ran’ displays its description: “Dietary saturated fat intake and atherosclerotic vascular disease mortality in elderly women: a prospective cohort study (version 2019-12-19)”. Unfortunately, this field is deleted when pasting the link into a page in the current WordPress editor, unless copy/paste is done in HTML code. PID manager takes care of restoring titles on final links.

The date “(version 2019-12-19)” appearing at the end of a title is the date on which the PID entry was created.

Creating a PID

At the bottom of page https://​leti​.lt/​l​ist, a link can be clicked to log in as an admi­nis­tra­tor of the PID service. Links will then appear leading to the PID crea­tion page. Suppose that we need a PID for “Jukka K. Korpela : IT and commu­ni­ca­tion” whose URL is http://​jkor​pela​.fi/. We enter the two parameters :

Entering parameters to create a PID

After clicking “Create PID” we see the result :

PID ‘1pbb’ has been created.

Note that the target access protocol is ‘http’, not ‘https’. The secure protocol ‘https’ has become the standard for websites, though it may not have been implemented on old ones. PID manager tries to open the URL under both protocols and compares the sizes of the two responses in bytes. Whenever ‘https’ yields an equal or larger size, the ‘https’ URL is recorded in the database. On page https://leti.lt/list, URLs which have not yet been accessed under ‘https’ are highlighted in yellow. The administrator needs to click an ‘update’ button from time to time until the ‘https’ protocol has been confirmed.
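The size comparison can be reduced to a small decision rule. The sketch below is an assumption about how such a rule might look, not the actual implementation: the function name preferred_protocol is hypothetical, the two sizes are supposed to have been measured beforehand, and None stands for a failed ‘https’ request.

```python
from typing import Optional


def preferred_protocol(http_bytes: int, https_bytes: Optional[int]) -> str:
    """Pick the protocol to record, given the response sizes in bytes.

    https_bytes is None when the 'https' request failed (assumption).
    """
    if https_bytes is not None and https_bytes >= http_bytes:
        return "https"
    return "http"
```

With this rule, a site that serves only a short error page over ‘https’ keeps its ‘http’ target until a later update confirms the secure version.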

Whatever the protocol, PID ‘1pbb’ has now been created for Jukka’s website, and (lien:1pbb) takes us to the target site. If the randomly generated code looks odd, click ‘Choose a different PID’ to get another one.

PID manager warns the admin if a PID linking to the same URL already exists. An option to change the description is offered. To achieve a suitable matching of identical targets, URLs are standardized. For instance, copying the URL of Wikipedia page « Corticoïde » makes it appear as:
https://fr.wikipedia.org/wiki/Cortico%C3%AFde

But PID mana­ger will store it as :
https://fr.wikipedia.org/wiki/Corticoïde
which is equi­va­lent and easier to read. This case occurs with every URL contai­ning Unicode charac­ters outside the English alphabet.

In addition, PID manager may cut off the end of a URL at the point where it is no longer significant. This is the case with pages ending with ‘?utm_source=…’ or ‘?fbclid=…’, both of which allow the site owner to trace whoever accessed their page!
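The two standardization steps can be sketched with Python’s standard urllib.parse module. This is an illustration under assumptions, not the code of PID manager: the function name standardize_url is hypothetical, and the list of tracking parameters is limited to the two names utm_source and fbclid.

```python
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that only serve tracking purposes.
TRACKING_PARAMS = {"utm_source", "fbclid"}


def standardize_url(url: str) -> str:
    """Drop tracking parameters, then decode percent-encoded characters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    cleaned = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return unquote(cleaned)
```

Decoding the whole URL is a simplification: a URL containing encoded reserved characters such as %2F or %3F would need more careful handling.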

Running PID manager

Once identified as an admin on the PID server, you gain access to the list of posts/pages, with a few action buttons:

  • “Add URL” will add a page/post of the website to the list;
  • “Delete” will remove a page/post from the list;
  • “Run PID manager” will launch PID manager on the page/post.
PID manager: List of posts/pages

No para­me­ter needs to be set for running PID mana­ger as all options have already been writ­ten on the page (read below). Click “Run PID mana­ger” close to the page that needs to be processed :

Running PID manager on the “detoxination” page

After running, PID manager displays the processed page in a new window and returns to the list of posts/pages. An “Undo” button has been created, making it possible to cancel the whole operation.

After running PID manager

Note that PID mana­ger works on pages and posts compliant with the current WordPress editor (Gutenberg). Warnings are displayed on attempts to run it on a Classic design.

Basic operation

Two types of text docu­ments may be publi­shed on WordPress websites, namely pages and posts. The diffe­rence lies in clas­si­fi­ca­tion systems. Posts use cate­go­ries and tend to be listed in a chro­no­lo­gi­cal order, whereas pages are suppo­sed to be perma­nent. These points bear no rele­vance to PID mana­ger as it only takes care of the content of the page or post. In what follows, the word “page” is used to desi­gnate either type.

Effect of running PID manager for the first time

Displaying PIDs in the body of a text makes reading a bit unplea­sant, as shown on the left side of the picture above. This was an incen­tive to imple­ment PID mana­ger which renders shor­ter links in the form of foot­note calls.

Let us take for instance the sentence :

There is an esca­la­ting debate over the value and vali­dity of memory-based dietary assess­ment methods (Archer E et al., 2018 lien:5ys0).

After being proces­sed by PID mana­ger, it will read :

There is an escalating debate over the value and validity of memory-based dietary assessment methods (Archer E et al., 2018<sup>N6</sup>).

This process takes care of (presumably) all syntactic variants, deciding whether to keep or remove parentheses and where superscript is appropriate for a compact and comprehensible occurrence of the PID in its context. Mixed levels are sometimes required, for instance:

Une autre pathologie qui n’est pas abordée dans cet article est l’anévrisme (N8 ; vidéo<sup>N9</sup>).

French typography is not bound by the convention of the Modern Language Association (lien:9t7t) stating that superscript numbers within the text should be placed outside any punctuation that might be present. This convention is followed by PID manager as an option for English text. A multilingual version is under study.

It is possible to run PID manager several times on the same page. No change will occur unless its content has been modified. Formally, if we call f the function of PID manager, we can write: f ○ f = f
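The idempotence property can be checked on a toy version of the transformation. The sketch below is not PID manager itself: the function render_calls, the <sup> markup and the numbering from N1 in order of first appearance are assumptions chosen for the example, and parentheses are left untouched.

```python
import re
from typing import Dict

# A call is the word "lien:" followed by a 4-character PID,
# together with any whitespace that precedes it.
CALL = re.compile(r"\s*\blien:([0-9a-z]{4})\b")


def render_calls(text: str) -> str:
    """Replace each 'lien:xxxx' call with a superscript footnote label."""
    numbers: Dict[str, int] = {}

    def repl(match: "re.Match[str]") -> str:
        pid = match.group(1)
        number = numbers.setdefault(pid, len(numbers) + 1)
        return f"<sup>N{number}</sup>"

    return CALL.sub(repl, text)
```

Running render_calls on its own output finds no remaining call, so the text is returned unchanged, which is the f ○ f = f behaviour stated above.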

Uncategorized links (notes)

By default, footnote calls are labelled N1, N2, etc. and listed at the bottom of the page along with their descriptions. See for example the bottom of page Vivre bien et longtemps. Let us agree that ‘N’ means ‘note’. These links are uncategorized. Different labels such as A1…, B1…, etc. are used in specific contexts (read below).

If you don’t want to display the list as footnotes, type _no_footnotes at the end of the page. This instruction will be picked up by PID manager and saved for further use as an invisible phrase: <div id="_no_footnotes"></div>. The phrase remains visible in the WordPress editor and can be deleted.

Links format­ted by PID mana­ger are user-friendly : they display their descrip­tions on “mouse-over” (the title fields of HTML links) and a new window is opened on clicking the link :

Clicking link ‘N186’ opened its target in a specific window

Opening new windows will work even with browsers set up to block pop-ups.

Categorized links (bibliographic entries)

Web pages should contain explicit bibliographic entries readable on paper prints. This is the case with page Soigner ses artères. Entries were initially typed as follows:

Bibliographic entries, source

After being proces­sed by PID mana­ger they appear as :

Bibliographic entries, processed

In this process, foot­note calls ‘A1’ and ‘A2’ have been crea­ted which are distinct from ‘N1’, ‘N2’ etc.

In the body of the text, all calls previously label­led lien:exjf have been repla­ced with A1, gene­rally displayed in super­script. Moving the mouse over ‘A1’ displays the descrip­tion (title field) and clicking it opens a window contai­ning its target.

The ‘✓’ sign next to the H2, H3 or H4 header tag of a bibliographic list (i.e. <h2>✓ or <h3>✓ or <h4>✓) tells PID manager that the list should be categorized.

The ✓ sign is a Unicode character, not a glyph. You may ignore the up-arrow sign, an optional link jumping back to the table of contents.

PID mana­ger does not only reshape foot­note calls and entries. It also stores the full entry (the content) into the PID data­base. It reads the entry in the biblio­gra­phy and compares its length with both the descrip­tion and any content already saved in the data­base. The longest content is saved if it is different from the descrip­tion.

Conversely, the content stored in the data­base will be copied to the biblio­gra­phic entry if it is longer than the current entry. This can be used to construct biblio­gra­phies in a very quick way. For instance, just type :

After running PID mana­ger we get full entries because the contents of ‘exjf’ and ‘3a5m’ had already been stored in the database :

  • A1 · exjf · Alehagen, U et al. (2015). Reduced Cardiovascular Mortality 10 Years after Supplementation with Selenium and Coenzyme Q10 for Four Years : Follow-Up Results of a Prospective Randomized Double-Blind Placebo-Controlled Trial in Elderly Citizens. PLOS (on line).
  • A2 · 3a5m · Allan, NJR (1990). Household Food Supply in Hunza Valley, Pakistan. Geographical Review 80, 4, Oct.: 399–415.

Note that the garbage text written after “lien:3a5m” in this demo should be shorter than the description of PID ‘3a5m’; otherwise it would be picked up and stored as new content.
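The two-way exchange between a bibliographic entry and the stored content can be summarized by a length comparison. The sketch below is one possible reading of that rule, not the actual code: the function name synchronize and its return convention (new stored content, new entry text) are assumptions.

```python
from typing import Optional, Tuple


def synchronize(
    description: str, stored: Optional[str], entry: str
) -> Tuple[Optional[str], str]:
    """Return (content to store, text to display) for one bibliographic entry."""
    stored_len = len(stored) if stored else 0
    # A typed entry longer than both the description and the stored
    # content becomes the new content.
    if len(entry) > max(len(description), stored_len) and entry != description:
        return entry, entry
    # Otherwise a longer stored content replaces the typed entry.
    if stored_len > len(entry):
        return stored, stored
    return stored, entry
```

This also shows why short garbage text is harmless: being shorter than the description, it is neither stored nor kept once a longer content exists.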

You can create several bibliographical lists on the same page for different categories of publications. For instance, on page Statines et médicaments anticholestérol there are 3 categories of publications assigned prefixes A, B and C. You may use ‘A’ for ‘articles’ and ‘B’ for ‘books’… Since ‘N’ is reserved for uncategorized PID entries, please contact me if you need to go beyond ‘M’!

Alphabetic order

By default, categorized bibliographic entries are kept in the order in which they were entered. There is an option for sorting a bibliographical list alphabetically. An example of sorted entries is page Pourquoi diminuer le cholestérol ?. Conversely, entries had been sorted chronologically in Faut-il jeter les enquêtes nutritionnelles ? and PID manager did not change their order.

To acti­vate this option, type “_alpha” in the headers of blocks contai­ning cate­go­ri­zed biblio­gra­phic entries that need to be sorted alpha­be­ti­cally, for instance :

<h2>✓ Ouvrages _alpha

This instruction will be read by PID manager and saved for further use as an invisible phrase such as <div id="_alphabetic_order_1"></div>. This phrase remains visible as an HTML block in the Gutenberg editor and can be deleted.

Shared bibliography

Bibliographic entries crea­ted on a page are auto­ma­ti­cally repro­du­ced as unca­te­go­ri­zed links (notes) in other pages mentio­ning the same PIDs.

Look for instance at page Cancer - conclusion et références which contains all refe­rences cited on seve­ral rela­ted pages. This page contains the follo­wing reference :

  • A8 · sfm0 · Blasco, MT et al. (2019). Complete Regression of Advanced Pancreatic Ductal Adenocarcinomas upon Combined Inhibition of EGFR and C‑RAF. Cancer Cell, 35, 4 : 573–587. doi:10.1016/j.ccell.2019.03.002.

A call to PID sfm0 is found on page Cancer - nouvelles pistes and label­led ‘N106’. PID mana­ger repro­du­ced this entry exactly at the bottom of the page :

  • N106 · sfm0 · Blasco, MT et al. (2019). Complete Regression of Advanced Pancreatic Ductal Adenocarcinomas upon Combined Inhibition of EGFR and C‑RAF. Cancer Cell, 35, 4 : 573–587. doi:10.1016/j.ccell.2019.03.002.

These biblio­gra­phic entries may be copied by hand on the same page to construct lists of cate­go­ri­zed links for articles (A), books (B) etc. These will be optio­nally sorted alpha­be­ti­cally. Once an entry is listed as a cate­go­ri­zed link, PID mana­ger no longer includes it as unca­te­go­ri­zed (N) — dupli­ca­ting foot­notes would be ugly…

The contents of these entries are synchro­ni­zed each time PID mana­ger is run on the pages on which they are displayed.

Cleaning-up references

By default, PID mana­ger cleans up refe­rences. Every cate­go­ri­zed link (biblio­gra­phic entry) is displayed as stri­ke­through text if it does not appear in the text of the page. For instance :

Struck-through bibliographic entries

To deactivate this clean-up, enter “_no_strike” anywhere on the page. This instruction will be picked up by PID manager and saved for further use as an invisible phrase: <div id="_no_strike"></div>. The phrase remains visible in the WordPress editor and can be deleted.

In addi­tion, if you want these biblio­gra­phic entries to remain unchan­ged on the page, type “_no_modify” anyw­here on the page. Example : https://​leti​.lt/​l​jg6.

Cutting-out labels

Labels on the list of categorized links may be omitted, notably when entries are not found in the text of the page. This will yield the following:

Categorized links without labels

To remove labels, enter “_no_label” anywhere on the page. This instruction will be picked up by PID manager and saved for further use as an invisible phrase: <div id="_no_label"></div>. The phrase remains visible in the WordPress editor and can be deleted.

No audio player

By default, PID manager inserts an audio player element in the page so that Meks Audio Player is displayed and fed by the sound file produced by SPEAKER (see page Lecture automatique TTS). Pages that are not subject to text-to-speech conversion can be marked with the identifier “_no_speaker”, which is converted to <div id="_no_speaker"></div>.

No recode + secure recovering of pages

Some pages/posts may not be eligible for being processed by PID manager. This one for instance. In order to protect them against unwanted use of the procedure, type “_no_recode” anywhere on the page. This instruction will be read by PID manager and saved for further use as an invisible phrase: <div id="_no_recode"></div>.
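All these page options follow the same pattern: a keyword typed in the text is turned into an empty, invisible <div> that is found again on later runs. The sketch below illustrates that pattern only; the function store_options, the list OPTIONS and the choice of appending the <div> elements at the end of the page are assumptions, not a description of the real program.

```python
from typing import List, Tuple

# Option keywords mentioned on this page.
OPTIONS = ("_no_footnotes", "_no_strike", "_no_label", "_no_speaker", "_no_recode")


def store_options(html: str) -> Tuple[List[str], str]:
    """Detect typed keywords or saved markers; return (options, updated page)."""
    found: List[str] = []
    for name in OPTIONS:
        marker = f'<div id="{name}"></div>'
        if marker in html or name in html:
            found.append(name)
            # Remove both forms, then re-append a single marker below.
            html = html.replace(marker, "").replace(name, "")
    markers = "".join(f'\n<div id="{name}"></div>' for name in found)
    return found, html.rstrip() + markers
```

Deleting the saved <div> in the editor switches the option off again, since neither form is then present on the next run.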

PID mana­ger modi­fies the text content of a page in WordPress without crea­ting a new version in the WordPress data­base. It does not even modify its date of last saving. This means that if the page has been mista­kenly proces­sed it cannot be reco­ve­red in the WordPress envi­ron­ment. Fortunately, the program stores a backup of its origi­nal version as a text file that can be retrie­ved simply by clicking the “Undo” button.

Beware that backups are overwritten each time a page is processed. It is therefore safer to check that the process has done what was expected before running it again. Even safer, keep the page open in edit mode while applying PID manager. If the result is not satisfactory, clicking the “Update” button in the editor will return to the pre-processing version.

Table of contents

This site constructs tables of contents using the designer-friendly plugin CM Table Of Contents Pro. This page is an example of the process. The plugin dynamically builds a table of contents based on the hierarchy of <h1>, <h2>, <h3> tags. A specific marker [cmtoc_…] needs to be placed in the text at the exact location where the table of contents should be displayed.

PID mana­ger looks for the [cmtoc_…] marker and performs two changes :

  1. It inserts a “Sommaire” line above the table of contents with anchor id="toc". This id is used both for returning to the “Sommaire” location and for formatting this word via the “toc” identifier in CSS.
  2. It inserts an up-arrow at the beginning of every <h…> header, linking back to the table of contents.

This table of contents feature is optio­nal : PID mana­ger only does this if the [cmtoc_…] marker has been found in the page.

Syntax of footnote calls

In most cases, foot­note calls appear as single occur­rences in a simple syntax, e.g.:

There is an esca­la­ting debate over the value and vali­dity of memory-based dietary assess­ment methods (Archer E et al., 2018 lien:5ys0).

However, multiple calls may occur, for instance :

There is an esca­la­ting debate over the value and vali­dity of memory-based dietary assess­ment methods (Archer E et al., 2018 lien:5ys0, lien:yhcg ; Young SS, Karr A, 2011 lien:5ep8).

After being proces­sed by PID mana­ger this sentence will be displayed as :

There is an escalating debate over the value and validity of memory-based dietary assessment methods (Archer E et al., 2018<sup>N9·N13</sup> ; Young SS, Karr A, 2011<sup>N14</sup>).

Multiple entries are grou­ped when linked by commas or spaces, such as

Voir Archer, E et al. (2018 lien:54ji, lien:aw3j, lien:cm76 ; 2017 lien:nxg2 ou encore 2015 lien:f4st lien:ohn3 lien:s9ks).

yiel­ding :

Voir Archer, E et al. (2018<sup>N13·N14·N15</sup> ; 2017<sup>N16</sup> ou encore 2015<sup>N17·N18·N19</sup>).

Many syntac­tic variants of multiple biblio­gra­phic calls are proces­sed by PID mana­ger, and a few mistakes such as an unwan­ted closing paren­the­sis may be auto­ma­ti­cally fixed. More cases will be inclu­ded in the imple­men­ta­tion whene­ver possible.
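Grouping of multiple calls can be illustrated with a regular expression that captures runs of calls separated only by spaces or a comma. This is a simplified sketch, not the implementation: the function group_calls is hypothetical, labels are numbered from N1 within the given text rather than across a whole page, and parentheses are not adjusted.

```python
import re
from typing import Dict

CALL = r"lien:[0-9a-z]{4}"
# A run of calls separated only by whitespace and/or a single comma.
GROUP = re.compile(rf"\s*{CALL}(?:\s*,?\s*{CALL})*")


def group_calls(text: str) -> str:
    """Merge each run of calls into one superscript joined by middle dots."""
    labels: Dict[str, str] = {}

    def repl(match: "re.Match[str]") -> str:
        pids = re.findall(r"lien:([0-9a-z]{4})", match.group(0))
        for pid in pids:
            labels.setdefault(pid, f"N{len(labels) + 1}")
        return "<sup>" + "·".join(labels[pid] for pid in pids) + "</sup>"

    return GROUP.sub(repl, text)
```

A semicolon or a word such as “ou encore” ends a run, so calls on either side of it land in separate superscripts.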

Faulty closing quote produced by WP-Typography

PID manager is not meant to fix typography. Still, it does its best to prepare the text for an automatic typography plugin such as WP-Typography. For instance, in the current version (May 2020) WP-Typography misses a closing quote in French typography when it is followed by a superscript (see the image above).

Typography fixed by PID manager

PID manager anticipates the problem and inserts the code required for correct processing (see the image “Typography fixed by PID manager”).

PID manager replaces all no-break spaces ‘&nbsp;’ with standard spaces in the body of the text. Then it recreates the ones following digits, such as « 10_000 » or « 3_meters » (the underscore stands for a no-break space). No-break spaces associated (in French) with some punctuation signs or inside French « quotes » are reconstructed by WP-Typography.
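The two-step treatment of no-break spaces can be sketched as follows. This is an illustration under assumptions rather than the actual code: the function name rebuild_nbsp is hypothetical, and “following digits” is interpreted here as a space between a digit and a letter or digit.

```python
import re


def rebuild_nbsp(text: str) -> str:
    """Flatten all no-break spaces, then restore those that follow a digit."""
    # Step 1: every no-break space (entity or U+00A0) becomes a plain space.
    text = text.replace("&nbsp;", " ").replace("\u00a0", " ")
    # Step 2: a single space between a digit and a word character
    # is turned back into a no-break space.
    return re.sub(r"(?<=\d) (?=\w)", "&nbsp;", text)
```

Other no-break spaces, such as those required by French punctuation, are deliberately left to the typography plugin.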

Many other ‘fixes’ can be implemented in PID manager while being careful not to mess up page contents in future versions of WordPress and widgets handled by the Gutenberg editor.

An attempt to use the narrow no-break space ‘&#8239;’ in place of ‘&nbsp;’ has been abandoned because this character is not (yet?) recognized by Safari. It is also an option of WP-Typography.

Security

Security procedures implemented in WordPress or added as plugins may object to modifications of the database made by a “foreign” script. For PID manager to work properly, it may be necessary to add exceptions to the protection system, for instance by appending the IP range of your DSL box to the whitelist of the protection device.

Current limitations

At present, PID mana­ger modi­fies the follo­wing HTML code :

  1. <i> tags are repla­ced with <em>
  2. <b> tags are repla­ced with <strong>
  3. Some no-break spaces ‘&nbsp;’ may be replaced with standard spaces (read above)
  4. Strings of spaces are repla­ced with a single space
  5. <span> instruc­tions inside headers are deleted
  6. id=“…” markers inside header tags H1, H2 etc. are deleted

The first two operations are standard in the WordPress editor. Multiple spaces are reduced to a single space by WP-Typography.
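Items 1, 2 and 4 of the list above are simple enough to be sketched with regular expressions. The function normalize_markup below is a hypothetical illustration, assuming tags written without attributes; it does not cover the deletions inside headers.

```python
import re


def normalize_markup(html: str) -> str:
    """Apply the <i>/<b> replacements and collapse runs of spaces."""
    html = re.sub(r"<(/?)i>", r"<\1em>", html)      # <i>…</i> → <em>…</em>
    html = re.sub(r"<(/?)b>", r"<\1strong>", html)  # <b>…</b> → <strong>…</strong>
    return re.sub(r" {2,}", " ", html)              # strings of spaces → one space
```

Because the patterns require the closing bracket right after the tag name, tags such as <img> or <br> are not affected.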

To conclude…

Dealing with PIDs was my domain of expertise when working in the field of Digital Humanities (lien:wvdl). DH is an area of scholarly activity including the systematic use of digital resources in the humanities, as well as the analysis of their application. I had taken part in a French pilot project for the implementation of repositories aiming at the long-term preservation and sharing of linguistic resources. This was later coordinated with the CLARIN and DARIAH European research infrastructures. For these reasons I take care in looking for the most reliable and useful bits of information and ensuring reliable access to them.

I invite readers and desi­gners to send sugges­tions for impro­ving PID mana­ger. Implementation on other sites is open to discus­sion. Use my contact page or write a public comment at the bottom of this page…
