🇬🇧 PID manager


This page is for web desi­gners (inclu­ding myself) eager to dis­co­ver (or remem­ber) the design of this web­site with res­pect to Persistent IDentifiers (PID lien:o5wn).

PID is a concept worked out by mana­gers of repo­si­to­ries for the long-term pre­ser­va­tion of digi­tal docu­ments. A well-known example in aca­de­mic circles is the “DOI” (Digital Object Identifier) assi­gned to every publi­ca­tion in a scien­ti­fic journal.

For ins­tance, a paper listed by Frontiers in Neurology under URL
https://​www​.fron​tier​sin​.org/​a​r​t​i​c​l​e​s​/​1​0​.​3​3​8​9​/​f​n​e​u​r​.​2​0​1​8​.​0​0​9​5​2​/​f​ull has been assi­gned
“doi:10.3389/fneur.2018.00952”.

Following the link https://doi.org/10.3389/fneur.2018.00952 should redirect to the paper’s web page. Archive managers are aware that the target URL is bound to vary in the long term because Internet domains and paths leading to documents are not persistent. The advantage of linking via a DOI is that the registrant (Frontiers in Neurology) takes care of maintaining its redirection to a valid location. Thus, a DOI is a type of Persistent IDentifier: a long-lasting reference to a document, file, web page, or other object (lien:o5wn).

Another type of PID regis­tra­tion used for aca­de­mic archives is the Handle system (lien:8bby). Unlike DOI, Handles are free of charge and may be set up for pro­ce­du­ral access to archi­val items in an archive.

A digi­tal object may be assi­gned seve­ral PIDs. Each one points at the object’s loca­tion or at ano­ther PID for­war­ding queries…


A ‘local’ PID service

DOIs are mana­ged by the DOI Foundation (lien:p7it) via a tech­ni­cal infra­struc­ture dedi­ca­ted to publi­ca­tions in aca­de­mic jour­nals. However, it makes sense to create and manage PIDs on any domain which claims to be ‘per­sistent’ and is used by a rela­ted device. This is the case of the leti​.lt domain asso­cia­ted with the LeBonheurEstPossible.org web­site, as both are hosted on the same account.

PID mana­ger is wor­king on the leti​.lt domain lis­ting paths to resources cited on the web­site. The core of this server is a table whose records contain the unique iden­ti­fier and its target URL. Every record also includes a brief des­crip­tion to be dis­played on the title field of HTML links, and optio­nally a com­plete biblio­gra­phi­cal refe­rence named the content. (This voca­bu­lary is mine.)

PIDs on leti​.lt are 4‑character strings, for ins­tance ‘dyig’ poin­ting at the Frontiers in Neurology paper men­tio­ned above. Thanks to the PID resol­ver, link https://​leti​.lt/​d​yig should be redi­rec­ted to the paper. In plain text, this is sum­ma­ri­zed as lien:dyig or link:dyig. (The word « lien » means “link” in French.)
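The table and the resolver described above can be pictured with a minimal Python sketch. Everything here is illustrative: the field names follow my vocabulary (description, content), but the class `PidRecord`, the in-memory `TABLE`, the function `resolve` and the placeholder description are assumptions, not the actual implementation on leti.lt. The target of ‘dyig’ is assumed to be the DOI link of the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidRecord:
    pid: str                       # 4-character identifier, e.g. "dyig"
    target: str                    # URL the PID redirects to
    description: str               # short text shown in the title field of links
    content: Optional[str] = None  # optional full bibliographic reference


# Hypothetical in-memory table standing in for the real database.
TABLE = {
    "dyig": PidRecord(
        pid="dyig",
        target="https://doi.org/10.3389/fneur.2018.00952",  # assumed target
        description="Paper in Frontiers in Neurology (placeholder description)",
    ),
}


def resolve(path: str) -> Optional[str]:
    """Return the redirect target for a request path such as '/dyig', or None."""
    record = TABLE.get(path.strip("/"))
    return record.target if record else None
```

A web server would answer a request for `/dyig` with an HTTP redirect to the returned URL, and with an error page when `resolve` returns `None`.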

The com­plete list of PIDs is avai­lable on page https://​leti​.lt/​l​ist. Search for ‘dyig’ to find the article cited as an example.

Below are two entries in the data­base, ‘6ran’ and ‘rm0y’, the former without content and the latter with a content (tagged with a red disk 🔴):

Two entries in the PID database

In the left­most column, expres­sion (lien:6ran) can be copied and pasted to any text on a web­site. It is a good idea to include paren­theses so that it will not be trun­ca­ted. Depending on the context, PID mana­ger will decide whe­ther these paren­theses need to be pre­ser­ved. Bibliographic refe­rences can be more expli­cit, for ins­tance (Blekkenhorst L et al., 2015 lien:6ran).

The tar­gets of PIDs on leti​.lt should be DOIs whe­ne­ver these are avai­lable. Sometimes I prefer to store the (rela­ti­vely stable) URL of a publi­ca­tion when it leads to a down­loa­dable ver­sion shared by the authors. It is also not cer­tain that a DOI poin­ting at a pre­print will remain the same after the publi­ca­tion. Adding to the confu­sion, archi­vists some­times bind a DOI to the PDF full text ins­tead of the arti­cle’s des­crip­tion, which is a bad practice.

The Broken Link Checker plugin takes care of signaling obsolete target URLs. On page https://leti.lt/list, admins have access to buttons for editing target URLs and PID descriptions, or even for deleting a PID in a secure manner.

All links of page https://​leti​.lt/​l​ist contain title fields. Therefore, drag­ging the cursor over link ‘6ran’ dis­plays its des­crip­tion : “Dietary satu­ra­ted fat intake and athe­ros­cle­ro­tic vas­cu­lar disease mor­ta­lity in elderly women : a pros­pec­tive cohort study (ver­sion 2019-12-19)”. Unfortunately, this field is dele­ted when pas­ting the link to a page in the cur­rent WordPress editor — unless copy/paste is done in HTML code… PID mana­ger takes care of res­to­ring titles on final links.

The date “(ver­sion 2019-12-19)” appea­ring at the end of a title is that of crea­ting the PID entry.

Creating a PID

At the bottom of page https://​leti​.lt/​l​ist, a link can be cli­cked to log in as an admi­nis­tra­tor of the PID ser­vice. Links will then appear lea­ding to the PID crea­tion page. Suppose that we need a PID for “Jukka K. Korpela : IT and com­mu­ni­ca­tion” whose URL is http://​jkor​pela​.fi/. We enter the two parameters :

Entering para­me­ters to create a PID

After cli­cking “Create PID” we see the result :

PID ‘1pbb’ has been created.

Note that the target access pro­to­col is ‘http’, not ‘https’. The secure pro­to­col ‘https’ has become the stan­dard for web­sites though it may not have been imple­men­ted on old ones. PID mana­ger tries to open the URL in both pro­to­cols and com­pares their out­comes in num­bers of bytes. Whenever ‘https’ yields an equal or larger size it is recor­ded in the data­base. On page https://​leti​.lt/​l​ist, URLs which have not yet been acces­sed under ‘https’ are high­ligh­ted in yellow. The admi­nis­tra­tor needs to click an ‘update’ button from time to time until the ‘https’ pro­to­col has been confirmed.
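The rule for choosing the protocol can be written as a small decision function. This is only a sketch of the rule as stated above: the name `choose_protocol` is hypothetical, the sizes are assumed to have been measured elsewhere, and a failed request is assumed to be reported as `None`.

```python
from typing import Optional


def choose_protocol(http_size: Optional[int], https_size: Optional[int]) -> str:
    """Pick the protocol to record, given response sizes in bytes.

    A size of None means the request failed (an assumption of this sketch).
    """
    if https_size is None:
        return "http"   # 'https' not reachable yet: keep 'http' and retry later
    if http_size is None or https_size >= http_size:
        return "https"  # equal or larger size under 'https': record it
    return "http"       # smaller 'https' response: probably an error page
```

Comparing sizes rather than contents is a cheap heuristic: a much smaller ‘https’ response usually signals an error or a parking page.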

Whatever the protocol, PID ‘1pbb’ has now been created for Jukka’s website. Now, (lien:1pbb) takes us to the target site. In case the randomly generated code sounds weird, click ‘Choose a different PID’ to get a different one.
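Random generation of a 4-character code might look like the sketch below. The alphabet (lowercase letters and digits) is only inferred from examples such as ‘dyig’ or ‘1pbb’, and the function `new_pid` is hypothetical.

```python
import secrets
import string
from typing import Set

# Assumed alphabet, inferred from examples such as 'dyig', '6ran' and '1pbb'.
ALPHABET = string.ascii_lowercase + string.digits


def new_pid(existing: Set[str], length: int = 4) -> str:
    """Draw random codes until one is found that is not already in use."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            return candidate
```

With 36 symbols and 4 positions there would be 36⁴ = 1,679,616 possible codes, so collisions with existing PIDs stay rare for a site-sized table. Calling the function again plays the role of ‘Choose a different PID’.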

PID manager warns the admin in case a PID linking to the same URL already exists. An option to change the description is offered. To achieve a suitable matching of identical targets, URLs are standardized. For instance, copying the URL of Wikipedia page « Corticoïde » makes it appear as:
https://fr.wikipedia.org/wiki/Cortico%C3%AFde

But PID mana­ger will store it as :
https://fr.wikipedia.org/wiki/Corticoïde
which is equi­va­lent and easier to read. This case occurs with every URL contai­ning Unicode cha­rac­ters out­side the English alphabet.

In addition, PID manager may cut off the end of a URL at the point where it is no longer significant. This is the case with pages ending with ‘?utm_source=…’ or ‘?fbclid=…’, both of which allow the site owner to trace whoever accessed their page!
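Both standardizations can be approximated with Python’s standard library. This is a sketch, not the code running on leti.lt: the function `normalize_url` and the list `TRACKING_PREFIXES` are assumptions, and instead of cutting the URL at a given point it removes only the tracking parameters, which keeps any other query parameter intact.

```python
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters; the real list may differ.
TRACKING_PREFIXES = ("utm_", "fbclid")


def normalize_url(url: str) -> str:
    """Decode percent-escapes in the path and drop tracking parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    path = unquote(parts.path)  # '%C3%AF' becomes 'ï'
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )
```

Note that decoding the path is safe for readability in cases like « Corticoïde », but a path containing an escaped slash (%2F) would change meaning, so a production version should be more careful.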

Running PID manager

After being iden­ti­fied as an admin on the PID server, access is gran­ted to the list of posts/pages with a few actio­nable buttons :

  • “Add URL” will add a page/post of the website to the list;
  • “Delete” will remove a page/post from the list;
  • “Run PID manager” will launch PID manager on the page/post.
PID mana­ger : List of posts/pages

No para­me­ter needs to be set for run­ning PID mana­ger as all options have already been writ­ten on the page (read below). Click “Run PID mana­ger” close to the page that needs to be processed :

Running PID mana­ger on the “detoxi­na­tion” page

After run­ning, PID mana­ger dis­plays the pro­ces­sed page on a new window and returns to the list of post/pages. An “Undo” button has been crea­ted, making it pos­sible to cancel the whole operation.

After run­ning PID manager

Note that PID mana­ger works on pages and posts com­pliant with the cur­rent WordPress editor (Gutenberg). Warnings are dis­played on attempts to run it on a Classic design.

Basic operation

Two types of text docu­ments may be publi­shed on WordPress web­sites, namely pages and posts. The dif­fe­rence lies in clas­si­fi­ca­tion sys­tems. Posts use cate­go­ries and tend to be listed in a chro­no­lo­gi­cal order, whe­reas pages are sup­po­sed to be per­ma­nent. These points bear no rele­vance to PID mana­ger as it only takes care of the content of the page or post. In what fol­lows, the word “page” is used to desi­gnate either type.

Effect of run­ning PID mana­ger for the first time

Displaying PIDs in the body of a text makes rea­ding a bit unplea­sant, as shown on the left side of the pic­ture above. This was an incen­tive to imple­ment PID mana­ger which ren­ders shor­ter links in the form of foot­note calls.

Let us take for ins­tance the sentence :

There is an esca­la­ting debate over the value and vali­dity of memory-based die­tary assess­ment methods (Archer E et al., 2018 lien:5ys0).

After being pro­ces­sed by PID mana­ger, it will read :

There is an escalating debate over the value and validity of memory-based dietary assessment methods (Archer E et al., 2018ᴺ⁶).

This pro­cess takes care of (pre­su­ma­bly) all syn­tac­tic variants, deci­ding to keep or remove paren­theses, and where super­script is appro­priate for a com­pact and com­pre­hen­sive occur­rence of the PID in its context. Mixed levels are some­times requi­red, such as for instance :

Une autre pathologie qui n’est pas abordée dans cet article est l’anévrisme (N8 ; vidéoᴺ⁹).

French typo­gra­phy is not bound by the conven­tion of Modern Language Association (lien:9t7t) sta­ting that super­script num­bers within the text should be placed out­side any punc­tua­tion that might be present. This conven­tion is fol­lo­wed by PID mana­ger as an option for English text. A mul­ti­lin­gual ver­sion is under study.

It is possible to run PID manager several times on the same page. No change will occur unless its content has been modified. Formally, if we call f the function of PID manager, we can write: f ○ f = f (in other words, f is idempotent).
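The property f ○ f = f can be checked on a toy version of the rewriting. The sketch below is a drastic simplification: it works on plain text, writes footnote calls as `[N1]` instead of superscript links, and the function `number_calls` is hypothetical.

```python
import re

# A call is 'lien:' followed by a 4-character PID (assumed: lowercase letters, digits).
CALL = re.compile(r"\s*\blien:([0-9a-z]{4})\b")


def number_calls(text):
    """Replace each 'lien:xxxx' with a label N1, N2... in order of appearance."""
    labels = {}

    def repl(match):
        pid = match.group(1)
        labels.setdefault(pid, "N%d" % (len(labels) + 1))
        return "[" + labels[pid] + "]"

    return CALL.sub(repl, text)
```

Once processed, the text no longer contains any ‘lien:’ call, so a second pass finds nothing to rewrite: `number_calls(number_calls(t)) == number_calls(t)` for any text `t`.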

Uncategorized links (notes)

By default, footnote calls are labelled N1, N2 etc. and listed at the bottom of the page along with their descriptions. See for example the bottom of page Vivre bien et longtemps. Let us agree that ‘N’ means ‘note’. These links are uncategorized. Different labelings such as A1…, B1… etc. are used in specific contexts (read below).

If you don’t want to display the list as footnotes, type _no_footnotes at the end of the page. This instruction will be picked up by PID manager and saved for further use as an invisible phrase: <div id="_no_footnotes"></div>. The phrase remains visible in the WordPress editor and can be deleted. A page without footnotes is for instance Covid-19 — ressources.

Links for­mat­ted by PID mana­ger are user-friendly : they dis­play their des­crip­tions on “mouse-over” (the title fields of HTML links) and a new window is opened on cli­cking the link :

Clicking link ‘N186’ opened its target in a spe­ci­fic window

Opening new windows will work even with browsers set up to block pop-ups.

Categorized links (bibliographic entries)

Web pages should contain expli­cit biblio­gra­phic entries rea­dable on paper prints. This is the case with page Soigner ses artères. Entries have been ini­tially edited as follows :

Bibliographic entries, source

After being pro­ces­sed by PID mana­ger they appear as :

Bibliographic entries, processed

In this pro­cess, foot­note calls ‘A1’ and ‘A2’ have been crea­ted which are dis­tinct from ‘N1’, ‘N2’ etc.

In the body of the text, all calls pre­viously label­led lien:exjf have been repla­ced with A1, gene­rally dis­played in super­script. Moving the mouse over ‘A1’ dis­plays the des­crip­tion (title field) and cli­cking it opens a window contai­ning its target.

The ‘✓’ sign close to the H2, H3 or H4 header tag of a bibliographic list (i.e. <h2>✓ or <h3>✓ or <h4>✓) tells PID manager that the list should be categorized.

The ‘✓’ sign is a Unicode character, not a glyph. You may ignore the up-arrow sign, an optional link jumping back to the table of contents.

PID mana­ger does not only reshape foot­note calls and entries. It also stores the full entry (the content) into the PID data­base. It reads the entry in the biblio­gra­phy and com­pares its length with both the des­crip­tion and any content already saved in the data­base. The lon­gest content is saved if it is dif­ferent from the des­crip­tion.

Conversely, the content stored in the data­base will be copied to the biblio­gra­phic entry if it is longer than the cur­rent entry. This can be used to construct biblio­gra­phies in a very quick way. For ins­tance, just type :

After run­ning PID mana­ger we get full entries because the contents of ‘exjf’ and ‘3a5m’ had already been stored in the database :

  • A1 · exjf · Alehagen, U et al. (2015). Reduced Cardiovascular Mortality 10 Years after Supplementation with Selenium and Coenzyme Q10 for Four Years : Follow-Up Results of a Prospective Randomized Double-Blind Placebo-Controlled Trial in Elderly Citizens. PLOS (on line).
  • A2 · 3a5m · Allan, NJR (1990). Household Food Supply in Hunza Valley, Pakistan. Geographical Review 80, 4, Oct.: 399–415.

Note that the garbage text written after “lien:3a5m” on this demo should be shorter than the description of PID ‘3a5m’, otherwise it would be picked up and stored as new content.
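The two-way exchange between a bibliographic entry and the content stored in the database boils down to a comparison of lengths. The sketch below is one possible reading of the rules stated above, not the actual code; the function `synchronize` and its arguments are hypothetical.

```python
from typing import Optional, Tuple


def synchronize(
    entry: str, description: str, stored: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (entry to display on the page, content to keep in the database)."""
    current = stored or ""
    if (
        len(entry) > len(current)
        and len(entry) > len(description)
        and entry != description
    ):
        # The page holds the fullest text: save it as the new content.
        return entry, entry
    if len(current) > len(entry):
        # The database holds the fullest text: copy it to the page.
        return current, stored
    return entry, stored
```

Under this reading, a short garbage text typed after the PID is replaced by the stored content, whereas a text longer than both the description and the stored content would overwrite the latter.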

You can create several bibliographical lists on the same page for different categories of publications. For instance, on page Statines et médicaments anticholestérol there are 3 categories of publications assigned prefixes A, B and C. You may use ‘A’ for ‘articles’ and ‘B’ for ‘books’… Since ‘N’ is reserved for uncategorized PID entries, please contact me if you need to go beyond ‘M’!

Alphabetic order

By default, categorized bibliographic entries are kept in the order they have been entered. There is an option for sorting a bibliographical list alphabetically. An example of sorted entries is page Pourquoi diminuer le cholestérol ?. Conversely, entries had been sorted chronologically in Faut-il jeter les enquêtes nutritionnelles ? and PID manager did not change their order.

To acti­vate this option, type “_alpha” in the hea­ders of blocks contai­ning cate­go­ri­zed biblio­gra­phic entries that need to be sorted alpha­be­ti­cally, for instance :

<h2>✓ Ouvrages _alpha

This instruction will be read by PID manager and saved for further use as an invisible phrase such as <div id="_alphabetic_order_1"></div>. This phrase remains visible as an HTML block in the Gutenberg editor and can be deleted.

Shared bibliography

Bibliographic entries crea­ted on a page are auto­ma­ti­cally repro­du­ced as unca­te­go­ri­zed links (notes) in other pages men­tio­ning the same PIDs.

Look for ins­tance at page Cancer - conclusion et références which contains all refe­rences cited on seve­ral rela­ted pages. This page contains the fol­lo­wing reference :

  • A8 · sfm0 · Blasco, MT et al. (2019). Complete Regression of Advanced Pancreatic Ductal Adenocarcinomas upon Combined Inhibition of EGFR and C‑RAF. Cancer Cell, 35, 4 : 573–587. doi:10.1016/j.ccell.2019.03.002.

A call to PID sfm0 is found on page Cancer - nouvelles pistes and label­led ‘N106’. PID mana­ger repro­du­ced this entry exactly at the bottom of the page :

  • N106 · sfm0 · Blasco, MT et al. (2019). Complete Regression of Advanced Pancreatic Ductal Adenocarcinomas upon Combined Inhibition of EGFR and C‑RAF. Cancer Cell, 35, 4 : 573–587. doi:10.1016/j.ccell.2019.03.002.

These biblio­gra­phic entries may be copied by hand on the same page to construct lists of cate­go­ri­zed links for articles (A), books (B) etc. These will be optio­nally sorted alpha­be­ti­cally. Once an entry is listed as a cate­go­ri­zed link, PID mana­ger no longer includes it as unca­te­go­ri­zed (N) — dupli­ca­ting foot­notes would be ugly…

The contents of these entries are syn­chro­ni­zed each time PID mana­ger is run on the pages on which they are displayed.

Cleaning-up references

By default, PID mana­ger cleans up refe­rences. Every cate­go­ri­zed link (biblio­gra­phic entry) is dis­played as stri­ke­through text if it does not appear in the text of the page. For instance :

Struck-through bibliographic entries

To deactivate this clean-up, enter “_no_strike” anywhere on the page. This instruction will be picked up by PID manager and saved for further use as an invisible phrase: <div id="_no_strike"></div>. The phrase remains visible in the WordPress editor and can be deleted.
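Finding the entries to be struck through amounts to a set difference between the PIDs listed in the bibliography and those called in the body of the page. A minimal sketch, assuming the body still contains calls in their source form (‘lien:xxxx’) and that entries are given as (PID, text) pairs; the function `uncited` is hypothetical.

```python
import re

CALL = re.compile(r"\blien:([0-9a-z]{4})\b")


def uncited(body, entries):
    """Return the PIDs of bibliographic entries never called in the body."""
    cited = set(CALL.findall(body))
    return [pid for pid, _text in entries if pid not in cited]
```

When “_no_strike” is active, this check would simply be skipped.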

Cutting out labels

Labels on the list of cate­go­ri­zed links may be igno­red, nota­bly when entries are not found in the text of the page. This will yield the following :

Categorized links without labels

To remove labels, enter “_no_label” anywhere on the page. This instruction will be picked up by PID manager and saved for further use as an invisible phrase: <div id="_no_label"></div>. The phrase remains visible in the WordPress editor and can be deleted.

No recode + secure recovering of pages

Some pages/posts should not be processed by PID manager, this one for instance. In order to protect them against unwanted use of the procedure, type “_no_recode” anywhere on the page. This instruction will be read by PID manager and saved for further use as an invisible phrase: <div id="_no_recode"></div>.
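The handling of typed instructions such as “_no_strike” or “_no_recode” follows the same pattern each time: detect the bare word, remember the option, and leave an invisible marker in its place. Here is a minimal sketch of that pattern; the function `collect_directives` is hypothetical and, as a simplifying assumption, the marker is always appended at the end of the page.

```python
import re

DIRECTIVES = ("_no_footnotes", "_no_strike", "_no_label", "_no_recode")


def collect_directives(page):
    """Return (page with typed directives turned into markers, active directives)."""
    active = set()
    for name in DIRECTIVES:
        marker = '<div id="%s"></div>' % name
        if marker in page:
            active.add(name)  # already saved by an earlier run
            continue
        # Bare word typed by the editor, not part of a longer identifier.
        pattern = re.compile(r'(?<![\w"])' + re.escape(name) + r'(?![\w"])')
        if pattern.search(page):
            active.add(name)
            page = pattern.sub("", page).rstrip() + "\n" + marker
    return page, active
```

Since the marker is recognized on later runs, processing the page again leaves it unchanged, and deleting the marker in the editor switches the option off.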

PID mana­ger modi­fies the text content of a page in WordPress without crea­ting a new ver­sion in the WordPress data­base. It does not even modify its date of last saving. This means that if the page has been mis­ta­kenly pro­ces­sed it cannot be reco­ve­red in the WordPress envi­ron­ment. Fortunately, the pro­gram stores a backup of its ori­gi­nal ver­sion as a text file that can be retrie­ved simply by cli­cking the “Undo” button.

Be aware that backups are overwritten each time a page is processed. It is therefore wise to check that the process has done what was expected before running it again. Even safer, keep the page open in edit mode while applying PID manager. If the result is not satisfactory, clicking the “Update” button in the editor will restore the pre-processing version.
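Why a second run makes the first backup unreachable is easier to see on a toy model. In this sketch the class `BackupStore` is a hypothetical stand-in for the text-file backups, and pages are kept in a plain dictionary.

```python
class BackupStore:
    """Keeps a single backup per page, as the text-file backups do."""

    def __init__(self):
        self._backups = {}

    def process(self, page_id, pages, transform):
        # Save the current text first; this overwrites any earlier backup.
        self._backups[page_id] = pages[page_id]
        pages[page_id] = transform(pages[page_id])

    def undo(self, page_id, pages):
        # Restore the version saved by the most recent run only.
        pages[page_id] = self._backups[page_id]
```

After two runs on a page that was edited in between, “Undo” brings back the text as it was just before the second run, not the original one.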

Table of contents

This site constructs tables of contents using the designer-friendly plugin CM Table Of Contents Pro. This page is an example of the pro­cess. The plugin builds dyna­mi­cally a table of contents based on the hie­rar­chy of <h1>, <h2>, <h3> tags. A spe­ci­fic marker [cmtoc_…] needs to be placed in the text at the very loca­tion the table of contents will be displayed.

PID mana­ger looks for the [cmtoc_…] marker and per­forms two changes :

  1. It inserts a “Sommaire” line above the table of contents with anchor id="toc". This id is used both for returning to the “Sommaire” location and for formatting this word via the “toc” identifier in CSS.
  2. It inserts an up-arrow at the beginning of every <h…> header linking back to the table of contents.

This table of contents fea­ture is optio­nal : PID mana­ger only does this if the [cmtoc_…] marker has been found in the page.

Syntax of footnote calls

In most cases, foot­note calls appear as single occur­rences in a simple syntax, e.g.:

There is an esca­la­ting debate over the value and vali­dity of memory-based die­tary assess­ment methods (Archer E et al., 2018 lien:5ys0).

However, mul­tiple calls may occur, for instance :

There is an esca­la­ting debate over the value and vali­dity of memory-based die­tary assess­ment methods (Archer E et al., 2018 lien:5ys0, lien:yhcg ; Young SS, Karr A, 2011 lien:5ep8).

After being pro­ces­sed by PID mana­ger this sen­tence will be dis­played as :

There is an escalating debate over the value and validity of memory-based dietary assessment methods (Archer E et al., 2018ᴺ⁹·ᴺ¹³ ; Young SS, Karr A, 2011ᴺ¹⁴).

Multiple entries are grou­ped when linked by commas or spaces, such as

Voir Archer, E et al. (2018 lien:54ji, lien:aw3j, lien:cm76 ; 2017 lien:nxg2 ou encore 2015 lien:f4st lien:ohn3 lien:s9ks).

yiel­ding :

Voir Archer, E et al. (2018ᴺ¹³·ᴺ¹⁴·ᴺ¹⁵ ; 2017ᴺ¹⁶ ou encore 2015ᴺ¹⁷·ᴺ¹⁸·ᴺ¹⁹).

Many syn­tac­tic variants of mul­tiple biblio­gra­phic calls are pro­ces­sed by PID mana­ger, and a few mis­takes such as an unwan­ted clo­sing paren­the­sis may be auto­ma­ti­cally fixed. More cases will be inclu­ded in the imple­men­ta­tion whe­ne­ver possible.
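Grouping of multiple calls can be sketched with a single regular expression that captures a run of ‘lien:xxxx’ items separated by commas or spaces. As before, this is a plain-text simplification: labels are numbered from N1 and written between brackets instead of superscript, and the function `group_calls` is hypothetical.

```python
import re

PID = r"lien:[0-9a-z]{4}\b"
# One call, then any number of further calls separated by a comma or spaces.
GROUP = re.compile(rf"\s*\b{PID}(?:(?:\s*,\s*|\s+){PID})*")


def group_calls(text):
    """Turn each run of calls into one bracketed group such as [N1·N2]."""
    labels = {}

    def repl(match):
        pids = re.findall(r"lien:([0-9a-z]{4})", match.group(0))
        for pid in pids:
            labels.setdefault(pid, "N%d" % (len(labels) + 1))
        return "[" + "·".join(labels[pid] for pid in pids) + "]"

    return GROUP.sub(repl, text)
```

A semicolon or any word between two calls ends the current group, which is why ‘2018 lien:54ji, lien:aw3j’ and ‘2017 lien:nxg2’ yield separate groups.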

Faulty clo­sing quote pro­du­ced by WP-Typography

PID mana­ger is not meant to fix typo­gra­phy. Still, it does its best to pre­pare the text for an auto­ma­tic typo­gra­phy plugin such as WP-Typography. For ins­tance, in the cur­rent ver­sion (May 2020) WP-Typography misses a clo­sing quote in French typo­gra­phy when fol­lo­wed with a super­script (see above image).

Typography fixed by PID manager

PID mana­ger anti­ci­pates the pro­blem and inserts requi­red code for a cor­rect pro­ces­sing (see side image).

PID manager replaces all no-break spaces ‘&nbsp;’ with standard spaces in the body of the text. Then it recreates the ones following digits, such as « 10_000 » or « 3_meters » (with ‘_’ marking the no-break space). No-break spaces associated (in French) with some punctuation signs or inside French « quotes » are reconstructed by WP-Typography.
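The two passes on no-break spaces can be sketched as follows. The function `rebuild_nbsp` is hypothetical, and the rule “a space following a digit becomes no-break” is applied literally, which is probably cruder than the real procedure.

```python
import re

NBSP = "\u00a0"


def rebuild_nbsp(text):
    """Flatten all no-break spaces, then recreate the ones following digits."""
    text = text.replace("&nbsp;", " ").replace(NBSP, " ")
    # A single space between a digit and a non-space character becomes no-break.
    return re.sub(r"(?<=\d) (?=\S)", NBSP, text)
```

Applied literally, the rule also binds a year to the following word (as in “2018 we”), which is harmless in most layouts.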

Many other ‘fixes’ can be implemented in PID manager while being careful not to mess up page contents in future versions of WordPress and widgets handled by the Gutenberg editor.

An attempt to use the narrow no-break space ‘&#8239;’ as a replacement for ‘&nbsp;’ has been abandoned because this character is not (yet?) recognized by Safari. It is also an option of WP-Typography.

Security

Security procedures implemented in WordPress or added as plugins may not agree with modifications of the database done by a “foreign” script. For PID manager to work properly, it may be necessary to add exceptions to the protection system, for instance by appending the IP range of your DSL box to the white list of the protection device.

Current limitations

At present, PID mana­ger modi­fies the fol­lo­wing HTML code :

  1. <i> tags are repla­ced with <em>
  2. <b> tags are repla­ced with <strong>
  3. Some no-break spaces ‘&nbsp;’ may be replaced with standard spaces (read above)
  4. Strings of spaces are repla­ced with a single space
  5. <span> ins­truc­tions inside hea­ders are deleted
  6. id=“…” mar­kers inside header tags H1, H2 etc. are deleted

The first two operations are standard in the WordPress editor. Multiple spaces are reduced to a single space by WP-Typography.
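Most of the modifications listed above can be imitated with a few regular expressions. This is a sketch under simplifying assumptions: the function `normalize_html` is hypothetical, only attribute-free <i> and <b> tags are handled, and regular expressions are a fragile way of editing HTML in general.

```python
import re


def normalize_html(html):
    """Apply tag replacements, space collapsing and header clean-up."""
    html = re.sub(r"<(/?)i>", r"<\1em>", html)      # <i> becomes <em>
    html = re.sub(r"<(/?)b>", r"<\1strong>", html)  # <b> becomes <strong>
    html = re.sub(r" {2,}", " ", html)              # strings of spaces

    def clean_header(match):
        open_tag, inner, close_tag = match.group(1), match.group(2), match.group(3)
        open_tag = re.sub(r'\s+id="[^"]*"', "", open_tag)  # drop id="…"
        inner = re.sub(r"</?span\b[^>]*>", "", inner)      # drop <span> wrappers
        return open_tag + inner + close_tag

    return re.sub(
        r"(<h[1-6]\b[^>]*>)(.*?)(</h[1-6]>)", clean_header, html, flags=re.S
    )
```

The text inside a deleted <span> is kept; only the tags themselves are removed.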

To conclude…

Dealing with PIDs was my domain of exper­tise when wor­king in the field of Digital Humanities (lien:wvdl). DH is an area of scho­larly acti­vity inclu­ding the sys­te­ma­tic use of digi­tal resources in the huma­ni­ties, as well as the ana­ly­sis of their appli­ca­tion. I had taken part in a French pilot pro­ject for the imple­men­ta­tion of repo­si­to­ries aiming at the long-term pre­ser­va­tion and sha­ring of lin­guis­tic resources. This was later arti­cu­la­ted with the CLARIN and DARIAH European research infra­struc­tures. For these rea­sons I take care in loo­king for the most reliable and useful bits of infor­ma­tion and ensu­ring a reliable access to the same.

I invite rea­ders and desi­gners to send sug­ges­tions for impro­ving PID mana­ger. Implementation on other sites is open to dis­cus­sion. Use my contact page or write a public com­ment at the bottom of this page…

Article created on 27/04/2020, modified on 11/06/2020 at 14:08
