[00:21] godane: heh, that open directory with DJ sets and such got a little blasted with bandwidth usage
[00:21] and I think I'm seeing myself on their bandwidth graph...
[00:35] *** ZexaronS has joined #archiveteam-bs
[00:50] *** BlueMaxim has joined #archiveteam-bs
[01:32] *** schbirid has joined #archiveteam-bs
[01:34] *** username1 has quit IRC (Read error: Operation timed out)
[01:47] *** namibj_ has joined #archiveteam-bs
[01:49] Is there any effort being made toward tumblr archival? I can get a list of about the 10% most popular tumblr blogs/nicks
[01:51] *** plue has joined #archiveteam-bs
[01:58] *** Ravenloft has joined #archiveteam-bs
[02:05] *** ZexaronS has quit IRC (Leaving)
[02:34] namibj_: We've been grabbing random tumblr blogs as we come across them -- your list would be *VERY* welcome.
[02:34] Please post it on the wiki, or a pastebin, or somewhere.
[02:37] !d 3l37rv80rmmc92293j057ce7n 50 160
[02:37] oops
[02:42] Somebody2: direct the asking to plue, as he grabbed it (and got a nice letter telling him to stop hitting their private api so hard)
[02:42] he'll be back in about 10h, i guess.
[02:43] the letter: https://pastebin.com/xHnCygvq
[02:51] namibj_: cool, will do
[02:51] plue: When you get back, please post your list somewhere!
[02:52] *** Mayonaise has quit IRC (Read error: Operation timed out)
[03:02] *** Mayonaise has joined #archiveteam-bs
[03:03] *** Ravenloft has quit IRC (Ping timeout: 506 seconds)
[03:48] *** Ravenloft has joined #archiveteam-bs
[03:51] *** qw3rty114 has joined #archiveteam-bs
[03:54] i'm at 5612 items this month
[03:57] *** qw3rty113 has quit IRC (Ping timeout: 600 seconds)
[04:00] *** Ravenloft has quit IRC (Ping timeout: 260 seconds)
[04:11] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:17] *** Sk1d has joined #archiveteam-bs
[04:25] *** Aranje has joined #archiveteam-bs
[04:44] *** Odd0002 has quit IRC (ZNC - http://znc.in)
[04:46] *** Odd0002 has joined #archiveteam-bs
[04:53] *** Stilett0 has joined #archiveteam-bs
[05:34] *** Aranje has quit IRC (Three sheets to the wind)
[05:48] *** zyphlar has quit IRC (Quit: Connection closed for inactivity)
[06:32] SketchCow: Q: does the Wayback Machine automatically strip ?utm_* junk when parsing URLs from WARCs and user search input?
[06:37] NO idea
[06:45] *** mls has joined #archiveteam-bs
[06:57] *** Honno has joined #archiveteam-bs
[07:31] *** BlueMaxim has quit IRC (Ping timeout: 268 seconds)
[07:32] *** BlueMaxim has joined #archiveteam-bs
[08:08] *** Mateon1 has quit IRC (Remote host closed the connection)
[08:12] *** Mateon1 has joined #archiveteam-bs
[08:31] *** Mateon1 has quit IRC (Ping timeout: 245 seconds)
[08:31] *** Mateon1 has joined #archiveteam-bs
[08:52] *** drumstick has quit IRC (Ping timeout: 268 seconds)
[09:04] *** drumstick has joined #archiveteam-bs
[10:11] *** GLaDOS has quit IRC (Remote host closed the connection)
[10:20] *** Mayonaise has quit IRC (Read error: Operation timed out)
[10:58] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:19] *** drumstick has quit IRC (Ping timeout: 268 seconds)
[11:32] *** schbirid has quit IRC (Quit: Leaving)
[12:28] *** ZexaronS has joined #archiveteam-bs
[13:36] https://wikipedia.org/wiki/User_talk:84.251.84.243?diff=prev&oldid=770831446 ---- https://wikipedia.org/wiki/Wayback_Machine?diff=prev&oldid=765542495 ---- https://wikipedia.org/wiki/Help:Using_the_Wayback_Machine?diff=prev&oldid=765542701
[13:45] *** schbirid has joined #archiveteam-bs
[13:51] *** SimpBrain has quit IRC (west.us.hub irc.Prison.NET)
[13:51] *** brayden has quit IRC (west.us.hub irc.Prison.NET)
[13:51] *** zerkalo has quit IRC (west.us.hub irc.Prison.NET)
[13:51] *** greenie has quit IRC (west.us.hub irc.Prison.NET)
[13:57] *** kristian_ has joined #archiveteam-bs
[13:57] *** SimpBrain has joined #archiveteam-bs
[13:57] *** brayden has joined #archiveteam-bs
[13:57] *** zerkalo has joined #archiveteam-bs
[13:57] *** greenie has joined #archiveteam-bs
[13:57] *** irc.Prison.NET sets mode: +o brayden
[13:57] *** swebb sets mode: +o brayden
[14:18] *** schbirid has quit IRC (Quit: Leaving)
[14:20] *** RichardG_ has joined #archiveteam-bs
[14:21] *** RichardG has quit IRC (Read error: Connection reset by peer)
[14:26] *** drumstick has joined #archiveteam-bs
[14:44] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[14:44] *** zhongfu has joined #archiveteam-bs
[14:45] *** schbirid has joined #archiveteam-bs
[14:51] *** Mayonaise has joined #archiveteam-bs
[15:03] *** kristian_ has quit IRC (Quit: Leaving)
[15:11] *** drumstick has quit IRC (Ping timeout: 268 seconds)
[15:46] *** RichardG has joined #archiveteam-bs
[15:52] *** RichardG_ has quit IRC (Read error: Operation timed out)
[16:01] *** Ravenloft has joined #archiveteam-bs
[16:49] In "nobody was probably tracking this other than me" news, I intend to get the "screenshot the archivebot pages" thing running again soon.
[16:51] SketchCow: good!
[16:51] (I wasn't tracking it, but it's a neat thing, glad to see more of it happening)
[16:53] *** Ravenloft has quit IRC (Ping timeout: 633 seconds)
[17:26] "PDF viewer support will end September 30, 2017. Beginning October 1, 2017 all viewing will be through the JPG viewer." - newspaperarchive.com
[17:26] The fate of the PDF files is unknown. We might only have until that date to grab them all.
[17:35] Where are you seeing this announcement?
[17:36] *** Atros has quit IRC (Read error: Operation timed out)
[17:51] https://access.newspaperarchive.com/us/california/ontario/ontario-daily-report/1970/04-01/page-42
[17:52] https://usercontent.irccloud-cdn.com/file/U5RWfgeQ/Capture5.JPG
[18:18] *** antomatic has quit IRC (Read error: Connection reset by peer)
[18:18] *** antomatic has joined #archiveteam-bs
[18:18] *** swebb sets mode: +o antomatic
[19:09] ping arkiver jrwr: newsgrabber is fetching http://st03.dlf.de/dlf/03/128/mp3/stream.mp3, which ends up stalling the pipeline (until the connection is interrupted or the disk fills up)
[19:09] I feel that isn't intentional behaviour, so I'm bringing it to your attention
[19:10] I believe it shouldn't keep going indefinitely, but if the disk fills up before it aborts, then that's clearly a problem.
[19:10] oh
[19:10] it's a radio stream
[19:10] lol
[19:11] Yeah, according to the source code, it should stop after two days. I wonder why that value was chosen...
[19:11] I usually take 6 hours as the cutoff.
[19:12] Item newsbuddy:warrior_15153_1503646483 if you want to check it out
[19:12] Relevant line: https://github.com/ArchiveTeam/NewsGrabber-Warrior/blob/master/pipeline.py#L257
[19:17] wpull saves the response to a tmpfile and simultaneously saves it to the warc, so the disk space needed during the grab is doubled
[19:37] For the Newspaper Archive site, the PDF files can be accessed with URLs like this one: http://pdftojp2.newspaperarchive.com:8080/IIPViewerWeb/webresources/pdfdownload/100744470
[19:37] The number at the end can be changed to get a different file.
[19:39] I guess our options are to throw it at Archivebot and see how it goes, or use the Warrior for it.
[19:40] *** Kisikilli has joined #archiveteam-bs
[19:43] One thing I've noticed is that since Wikipedia started using an incredibly complicated module for archive links instead of a template, the spread of archived pages on wikis has really slowed down to a halt - it might be worth encouraging the use of multiple-link web archives to stuff on wikis, I tried to make a template on https://wiki.crygaia.org/view/Template:Webarchive?action=edit to copy http://wikipedia.org/wiki/Template:Webarchive ( http:/
[19:49] *** Kisikilli has quit IRC (Quit: http://www.okay.uz/)
[19:49] *** Kisiki has joined #archiveteam-bs
[19:49] If anyone replied, I DC'd, sorry
[19:50] Kisiki: I think your message got cut off. This is the last part we got: "to copy http://wikipedia.org/wiki/Template:Webarchive ( http:/"
[19:54] What was the first part?
[19:54] http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2017-08-27,Sun&sel=109#l105
[19:54] *** Stilett0 is now known as Stiletto
[19:55] Kisiki: No one replied before you re-joined
[19:56] thanks
[19:57] One thing I've noticed is that since Wikipedia started using an incredibly complicated module for archive links instead of a template, the spread of archived pages on wikis has really slowed down to a halt - it might be worth encouraging the use of multiple-link web archives to stuff on wikis, I tried to make a template on https://wiki.crygaia.org/view/Template:Webarchive?action=edit
[19:57] to copy http://wikipedia.org/wiki/Template:Webarchive ( http://wikipedia.org/wiki/Module:Webarchive ) but I'm not sure how to get it to work because I don't know the code very well
[20:08] I know crygaia is a really tiny wiki, but worldwide there doesn't seem to be any kind of template for encouraging people to save links that die later other than on Wikipedia itself, and because it's so complex, fan wikis and wikia etc. just don't bother
[20:15] so a lot of smaller wikis are graveyards of dead links just because there was nothing out there to make it easier for people/remind people to save the pages
[21:01] Well this is certainly new.
[21:02] from WatsonReborn sent 2 minutes ago
[21:02] Hey I noticed you posted in this thread ( https://www.reddit.com/r/IMDbFilmGeneral/comments/62sn11/imdb_message_boards_back_up/ ) regarding IMDb shutting down their message boards earlier this year. Wanted to let you know that there's a community that archived all of IMDb's message boards over at MovieChat.org ( https://moviechat.org/ ).
[21:02] Took archive, put ads on it, rehosted.
[21:02] (Note that the first link appears to be wrong, I think they meant https://www.reddit.com/r/scifi/comments/5s0iiv/imdb_message_boards_being_removed_long_running/
[21:03] )
[21:03] Yeah, I saw that just a few weeks after the official boards went down.
[21:03] Got the PM about five minutes ago on reddit.
[21:03] Tickled me.
[21:36] Just feels like when it comes to ideas like this that are small but could change a lot of websites/wikis and help so many people, no one really cares if it's not a big visible project :(
[21:42] *** SimpBrain has quit IRC (Ping timeout: 255 seconds)
[21:45] rocode: wait. they grabbed our archive of imdb boards, then just put ads on it? :/
[21:46] I'm gonna make sure the ads on that site get put in adblock filter lists.
[21:56] *** drumstick has joined #archiveteam-bs
[21:59] *** SimpBrain has joined #archiveteam-bs
[22:04] *** tuluu has quit IRC (hub.efnet.us hub.dk)
[22:04] *** kimmer has quit IRC (hub.efnet.us hub.dk)
[22:04] *** ndiddy has quit IRC (hub.efnet.us hub.dk)
[22:04] *** dboard2 has quit IRC (hub.efnet.us hub.dk)
[22:04] *** decay has quit IRC (hub.efnet.us hub.dk)
[22:04] *** espes__ has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Darkstar has quit IRC (hub.efnet.us hub.dk)
[22:04] *** pikhq_ has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Lord_Nigh has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Igloo has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Sue has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Fletcher_ has quit IRC (hub.efnet.us hub.dk)
[22:04] *** underscor has quit IRC (hub.efnet.us hub.dk)
[22:04] *** dxrt- has quit IRC (hub.efnet.us hub.dk)
[22:04] *** phuzion has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Baljem_ has quit IRC (hub.efnet.us hub.dk)
[22:04] *** closure_ has quit IRC (hub.efnet.us hub.dk)
[22:04] *** Yurume has quit IRC (hub.efnet.us hub.dk)
[22:04] *** joepie91 has quit IRC (hub.efnet.us hub.dk)
[22:04] *** klg has quit IRC (hub.efnet.us hub.dk)
[22:04] *** LordNigh2 has joined #archiveteam-bs
[22:07] *** espes___ has joined #archiveteam-bs
[22:07] *** joepie91_ has joined #archiveteam-bs
[22:07] *** Yurume_ has joined #archiveteam-bs
[22:07] *** Igloo_ has joined #archiveteam-bs
[22:08] *** pikhq has joined #archiveteam-bs
[22:08] *** decay_ has joined #archiveteam-bs
[22:08] *** Dark_Star has joined #archiveteam-bs
[22:09] *** phuzion_ has joined #archiveteam-bs
[22:10] *** tuluu_ has joined #archiveteam-bs
[22:11] *** db420 has joined #archiveteam-bs
[22:11] *** db420 has quit IRC (Read error: Connection reset by peer)
[22:12] *** ndiddy-pi has joined #archiveteam-bs
[22:12] *** schbirid has quit IRC (Quit: Leaving)
[22:13] *** Fletcher- has joined #archiveteam-bs
[22:13] *** Baljem has joined #archiveteam-bs
[22:16] *** closure has joined #archiveteam-bs
[22:16] *** midas sets mode: +o closure
[22:16] *** db420 has joined #archiveteam-bs
[22:17] *** db420 is now known as dboard
[22:19] *** underscor has joined #archiveteam-bs
[22:19] *** swebb sets mode: +o underscor
[22:19] *** decay_ is now known as decay
[22:19] *** LordNigh2 is now known as Lord_Nigh
[22:19] *** dboard is now known as dboard2
[22:29] Somebody2: https://a.pomf.cat/prmxhz.xz
[22:29] there you go
[22:29] please note... this data is already over 1.5 years old
[23:04] *** Honno has quit IRC (Read error: Operation timed out)
[23:09] plue: thanks, will look
[23:09] 1.5 years old is better than what we had
[23:12] plue: looks like it is 15,047,214 entries; 208MB uncompressed.
[23:15] It'd be nice if someone (not me) wanted to dump that onto IA.
[23:20] *** atrocity has joined #archiveteam-bs
[23:25] Somebody2: are just usernames really worth it? :/
[23:26] i'd really be up for helping with some organized tumblr archiving event... but it's a shitload of data
[23:31] *** drumstick has quit IRC (Ping timeout: 268 seconds)
[23:34] if we ever have to save tumblr
[23:34] sweet jesus
[23:42] *** michciope has quit IRC (Quit: ZNC 1.6.3+deb1 - http://znc.in)
[23:44] *** Kisiki has quit IRC (Quit: http://www.okay.uz/)
[23:44] *** Kisikilli has joined #archiveteam-bs
[23:45] I am going AFK and might idle out/DC, but if anyone replies to my message (please please), I will be checking back on the channel log
[23:45] *** michciope has joined #archiveteam-bs
[23:59] *** Sue has joined #archiveteam-bs