#internetarchive 2019-04-10,Wed


Time	Nickname	Message
00:18 ^🔗		bitBaron has quit IRC (Quit: Bye.)
00:19 ^🔗		eythian has joined #internetarchive
00:34 ^🔗		Jopik has quit IRC (Remote host closed the connection)
00:34 ^🔗		Jopik has joined #internetarchive
01:01 ^🔗		jut has quit IRC (Read error: Connection reset by peer)
01:02 ^🔗		jut has joined #internetarchive
01:13 ^🔗	Somebody2	martini (if you read the logs): It's totally fine to run queries like: https://archive.org/search.php?query=mediatype:texts&sort=-publicdate
01:13 ^🔗	Somebody2	That one is a direct link from the navigation menu (under Texts -> This Just In)
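For anyone scripting the query Somebody2 links, here is a minimal Python sketch that rebuilds the same URL with the standard library. The helper name `build_search_url` is hypothetical. Reading the leading `-` in `sort=-publicdate` as "descending, newest first" is an assumption based on the link backing Texts -> This Just In. The sketch only constructs the URL and does not fetch anything.

```python
from urllib.parse import urlencode

# Base URL of the archive.org search page linked above.
SEARCH_BASE = "https://archive.org/search.php"

def build_search_url(query: str, sort: str = "-publicdate") -> str:
    """Hypothetical helper: build a search.php URL for a query and sort field.

    Assumption: a leading '-' on the sort field means descending order.
    """
    params = {"query": query, "sort": sort}
    # safe=":" leaves the colon in "mediatype:texts" unescaped, as in the hand-written link.
    return f"{SEARCH_BASE}?{urlencode(params, safe=':')}"

print(build_search_url("mediatype:texts"))
```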
03:02 ^🔗		deevious has quit IRC (Ping timeout: 252 seconds)
03:02 ^🔗		Flashfire has quit IRC (Read error: Connection reset by peer)
03:02 ^🔗		deevious has joined #internetarchive
03:02 ^🔗		jut has quit IRC (Ping timeout: 252 seconds)
03:03 ^🔗		Flashfire has joined #internetarchive
03:03 ^🔗		kiska has quit IRC (Ping timeout: 252 seconds)
03:03 ^🔗		kiska has joined #internetarchive
03:04 ^🔗		jut has joined #internetarchive
03:24 ^🔗		odemg has quit IRC (Ping timeout: 615 seconds)
03:30 ^🔗		odemg has joined #internetarchive
03:32 ^🔗		qw3rty119 has joined #internetarchive
03:36 ^🔗		qw3rty118 has quit IRC (Read error: Operation timed out)
05:24 ^🔗		Frogging has quit IRC (Read error: Operation timed out)
05:24 ^🔗		Frogging has joined #internetarchive
05:24 ^🔗		balrog has quit IRC (Read error: Operation timed out)
05:24 ^🔗		ivan has quit IRC (Read error: Operation timed out)
05:24 ^🔗		JAA has quit IRC (Read error: Operation timed out)
05:25 ^🔗		balrog has joined #internetarchive
05:25 ^🔗		ivan has joined #internetarchive
05:25 ^🔗		simon816 has quit IRC (Ping timeout: 246 seconds)
05:26 ^🔗		fredgido has quit IRC (Ping timeout: 600 seconds)
05:26 ^🔗		swebb has quit IRC (Read error: Operation timed out)
05:28 ^🔗		swebb has joined #internetarchive
05:35 ^🔗		simon816 has joined #internetarchive
05:38 ^🔗		JAA has joined #internetarchive
05:39 ^🔗		bakJAA sets mode: +o JAA
05:53 ^🔗		JAA has quit IRC (Read error: Operation timed out)
05:55 ^🔗		simon816 has quit IRC (Read error: Operation timed out)
05:58 ^🔗		simon816 has joined #internetarchive
06:01 ^🔗		JAA has joined #internetarchive
06:45 ^🔗		Jasjar has quit IRC (Ping timeout: 252 seconds)
07:39 ^🔗		Jasjar has joined #internetarchive
07:46 ^🔗		lenary has joined #internetarchive
08:34 ^🔗		jesso has quit IRC (Quit: jesso)
08:43 ^🔗		jesso has joined #internetarchive
08:45 ^🔗		JAA has quit IRC (Reconnecting)
08:45 ^🔗		JAA has joined #internetarchive
08:46 ^🔗		bakJAA sets mode: +o JAA
09:48 ^🔗		deevious has quit IRC (Read error: Connection reset by peer)
09:54 ^🔗		atomotic has joined #internetarchive
10:50 ^🔗		atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:59 ^🔗		kiska1 has quit IRC (Ping timeout (120 seconds))
11:59 ^🔗		kiska1 has joined #internetarchive
12:19 ^🔗	VoynichCr	JAA: I think the IA crawler saves PDFs, but I'm not sure whether it scrapes their content to find URLs in the PDF text
12:19 ^🔗	VoynichCr	I read that bibliography URLs break often, and quickly
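VoynichCr's question concerns finding URLs inside PDF text. The sketch below covers only the URL-scanning step and assumes the text has already been extracted from the PDF by some other tool (not shown). It makes no claim about what the IA crawler actually does. The regex and the trailing-punctuation rule are simplifying assumptions: URLs that the PDF layout splits across lines, or URLs that legitimately end in `)`, will come out wrong.

```python
import re

# Simplified assumption: a URL is "http(s)://" followed by anything up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def extract_urls(text: str) -> list[str]:
    """Return unique URLs found in already-extracted text, in order of first appearance."""
    found = []
    for match in URL_RE.finditer(text):
        # Trailing punctuation usually belongs to the surrounding sentence, not the URL.
        found.append(match.group(0).rstrip(".,;:)]"))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))

sample = "See https://blog.archive.org/2013/10/25/fixing-broken-links/. Cited (https://example.org/a)."
print(extract_urls(sample))
```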
12:20 ^🔗		VADemon has quit IRC (Quit: left4dead)
12:21 ^🔗	VoynichCr	Imagine how critical this problem is for those who want to cite web pages in dissertations, legal opinions, or scientific research. A recent Harvard study found that 49% of the URLs referenced in U.S. Supreme Court decisions are dead now. Those decisions affect everyone in the U.S., but the evidence the opinions are based on is disappearing.
12:21 ^🔗	VoynichCr	https://blog.archive.org/2013/10/25/fixing-broken-links/
12:24 ^🔗	JAA	Yeah
12:25 ^🔗	JAA	I'm also curious what will happen in the coming years as some journals inevitably fold. The DOIs will certainly still resolve, but what happens to the journal website, peer review documents (if public), etc.?
12:27 ^🔗	JAA	Or what happens when a journal redesigns their website and changes the URL structure without redirecting previous URLs? I've seen many citations that used the journal website URL instead of a DOI (which could be adapted), and many journals also include URLs in the citation downloads instead of just using the DOI.
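JAA points out that many citations carry a journal URL instead of a DOI. A DOI can always be turned into a link through the https://doi.org/ resolver, which keeps working when the publisher's own URL structure changes. The sketch below is minimal: the function name `doi_to_url` and the set of prefixes it strips are illustrative assumptions. It also does not percent-encode the unusual characters some DOIs contain.

```python
import re

DOI_RESOLVER = "https://doi.org/"
# Assumed input forms: an optional "doi:" label, or an old (dx.) or current doi.org resolver prefix.
_PREFIX_RE = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)

def doi_to_url(raw: str) -> str:
    """Hypothetical helper: normalize a DOI string into a doi.org resolver link."""
    doi = _PREFIX_RE.sub("", raw.strip())
    # Every DOI name starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return DOI_RESOLVER + doi

print(doi_to_url("doi:10.1000/182"))
print(doi_to_url("https://dx.doi.org/10.1000/182"))
```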
12:28 ^🔗	VoynichCr	Journal/conference sites die soon, unarchived. Papers are sometimes saved by the IA crawler, but fortunately most of them are printed and held in some physical library, so they can be scanned in the future
12:29 ^🔗	VoynichCr	I have published some papers and I upload PDF copies to my website myself; many researchers do the same
12:32 ^🔗	JAA	That depends strongly on the field though. I know several journals which are definitely not printed anymore.
12:33 ^🔗	JAA	So unless a library prints the article themselves, it doesn't exist in printed form.
14:50 ^🔗		sivoais has quit IRC (Read error: Operation timed out)
14:52 ^🔗		sivoais has joined #internetarchive
15:27 ^🔗		bitspill has quit IRC (Quit: Connection closed for inactivity)
18:51 ^🔗		VADemon has joined #internetarchive
