[00:18] *** bitBaron has quit IRC (Quit: Bye.)
[00:19] *** eythian has joined #internetarchive
[00:34] *** Jopik has quit IRC (Remote host closed the connection)
[00:34] *** Jopik has joined #internetarchive
[01:01] *** jut has quit IRC (Read error: Connection reset by peer)
[01:02] *** jut has joined #internetarchive
[01:13] martini (if you read the logs): It's totally fine to run queries like: https://archive.org/search.php?query=mediatype:texts&sort=-publicdate
[01:13] That one is a direct link from the navigation menu (under Texts -> This Just In)
[03:02] *** deevious has quit IRC (Ping timeout: 252 seconds)
[03:02] *** Flashfire has quit IRC (Read error: Connection reset by peer)
[03:02] *** deevious has joined #internetarchive
[03:02] *** jut has quit IRC (Ping timeout: 252 seconds)
[03:03] *** Flashfire has joined #internetarchive
[03:03] *** kiska has quit IRC (Ping timeout: 252 seconds)
[03:03] *** kiska has joined #internetarchive
[03:04] *** jut has joined #internetarchive
[03:24] *** odemg has quit IRC (Ping timeout: 615 seconds)
[03:30] *** odemg has joined #internetarchive
[03:32] *** qw3rty119 has joined #internetarchive
[03:36] *** qw3rty118 has quit IRC (Read error: Operation timed out)
[05:24] *** Frogging has quit IRC (Read error: Operation timed out)
[05:24] *** Frogging has joined #internetarchive
[05:24] *** balrog has quit IRC (Read error: Operation timed out)
[05:24] *** ivan has quit IRC (Read error: Operation timed out)
[05:24] *** JAA has quit IRC (Read error: Operation timed out)
[05:25] *** balrog has joined #internetarchive
[05:25] *** ivan has joined #internetarchive
[05:25] *** simon816 has quit IRC (Ping timeout: 246 seconds)
[05:26] *** fredgido has quit IRC (Ping timeout: 600 seconds)
[05:26] *** swebb has quit IRC (Read error: Operation timed out)
[05:28] *** swebb has joined #internetarchive
[05:35] *** simon816 has joined #internetarchive
[05:38] *** JAA has joined #internetarchive
[05:39] *** bakJAA sets mode: +o JAA
[05:53] *** JAA has quit IRC (Read error: Operation timed out)
[05:55] *** simon816 has quit IRC (Read error: Operation timed out)
[05:58] *** simon816 has joined #internetarchive
[06:01] *** JAA has joined #internetarchive
[06:45] *** Jasjar has quit IRC (Ping timeout: 252 seconds)
[07:39] *** Jasjar has joined #internetarchive
[07:46] *** lenary has joined #internetarchive
[08:34] *** jesso has quit IRC (Quit: jesso)
[08:43] *** jesso has joined #internetarchive
[08:45] *** JAA has quit IRC (Reconnecting)
[08:45] *** JAA has joined #internetarchive
[08:46] *** bakJAA sets mode: +o JAA
[09:48] *** deevious has quit IRC (Read error: Connection reset by peer)
[09:54] *** atomotic has joined #internetarchive
[10:50] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:59] *** kiska1 has quit IRC (Ping timeout (120 seconds))
[11:59] *** kiska1 has joined #internetarchive
[12:19] JAA: I think the IA crawler saves PDFs, but I am not sure whether it scrapes their content to find URLs in the PDF text
[12:19] I read that bibliography URLs often break, and soon
[12:20] *** VADemon has quit IRC (Quit: left4dead)
[12:21] Imagine how critical this problem is for those who want to cite web pages in dissertations, legal opinions, or scientific research. A recent Harvard study found that 49% of the URLs referenced in U.S. Supreme Court decisions are dead now. Those decisions affect everyone in the U.S., but the evidence the opinions are based on is disappearing.
[12:21] https://blog.archive.org/2013/10/25/fixing-broken-links/
[12:24] Yeah
[12:25] I'm also curious what will happen in the next few years as some journals inevitably fold. The DOIs will certainly still resolve, but what happens to the journal website, peer review documents (if public), etc.?
[12:27] Or what happens when a journal redesigns its website and changes the URL structure without redirecting previous URLs?
I've seen many citations that used the journal website URL instead of a DOI (which could be adapted), and many journals also include URLs in the citation downloads instead of just using the DOI.
[12:28] Journal/conference sites often disappear quickly without being archived. Papers are sometimes saved by the IA crawler, but fortunately most of them are printed and held in some physical library, so they can be scanned in the future
[12:29] I have published some papers and upload PDF copies to my website myself; many researchers do the same
[12:32] That depends strongly on the field though. I know several journals which are definitely not printed anymore.
[12:33] So unless a library prints the article themselves, it doesn't exist in printed form.
[14:50] *** sivoais has quit IRC (Read error: Operation timed out)
[14:52] *** sivoais has joined #internetarchive
[15:27] *** bitspill has quit IRC (Quit: Connection closed for inactivity)
[18:51] *** VADemon has joined #internetarchive