#archiveteam-bs 2017-08-27,Sun


Time Nickname Message
00:21 🔗 joepie91 godane: heh, that open directory with DJ sets and such got a little blasted with bandwidth usage
00:21 🔗 joepie91 and I think I'm seeing myself on their bandwidth graph...
00:35 🔗 ZexaronS has joined #archiveteam-bs
00:50 🔗 BlueMaxim has joined #archiveteam-bs
01:32 🔗 schbirid has joined #archiveteam-bs
01:34 🔗 username1 has quit IRC (Read error: Operation timed out)
01:47 🔗 namibj_ has joined #archiveteam-bs
01:49 🔗 namibj_ Is there any effort made regarding tumblr archival? I can get a list of about the 10% most popular tumblr blogs/nicks
01:51 🔗 plue has joined #archiveteam-bs
01:58 🔗 Ravenloft has joined #archiveteam-bs
02:05 🔗 ZexaronS has quit IRC (Leaving)
02:34 🔗 Somebody2 namibj_: We've been grabbing random tumblr blogs as we come across them -- your list would be *VERY* welcome.
02:34 🔗 Somebody2 Please post it on the wiki, or a pastebin, or somewhere.
02:37 🔗 hook54321 !d 3l37rv80rmmc92293j057ce7n 50 160
02:37 🔗 hook54321 oops
02:42 🔗 namibj_ Somebody2: direct the question to plue, as he grabbed it (and got a nice letter telling him to stop hitting their private api so hard)
02:42 🔗 namibj_ he'll be back in about 10h, i guess.
02:43 🔗 namibj_ the letter: https://pastebin.com/xHnCygvq
02:51 🔗 Somebody2 namibj_: cool, will do
02:51 🔗 Somebody2 plue: When you get back, please post your list somewhere!
02:52 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
03:02 🔗 Mayonaise has joined #archiveteam-bs
03:03 🔗 Ravenloft has quit IRC (Ping timeout: 506 seconds)
03:48 🔗 Ravenloft has joined #archiveteam-bs
03:51 🔗 qw3rty114 has joined #archiveteam-bs
03:54 🔗 godane i'm at 5612 items this month
03:57 🔗 qw3rty113 has quit IRC (Ping timeout: 600 seconds)
04:00 🔗 Ravenloft has quit IRC (Ping timeout: 260 seconds)
04:11 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:17 🔗 Sk1d has joined #archiveteam-bs
04:25 🔗 Aranje has joined #archiveteam-bs
04:44 🔗 Odd0002 has quit IRC (ZNC - http://znc.in)
04:46 🔗 Odd0002 has joined #archiveteam-bs
04:53 🔗 Stilett0 has joined #archiveteam-bs
05:34 🔗 Aranje has quit IRC (Three sheets to the wind)
05:48 🔗 zyphlar has quit IRC (Quit: Connection closed for inactivity)
06:32 🔗 joepie91 SketchCow: Q: does the Wayback Machine automatically strip ?utm_* junk when parsing URLs from WARCs and user search input?
06:37 🔗 SketchCow NO idea
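For reference, a minimal Python sketch of what "stripping ?utm_* junk" from a URL could look like. It only illustrates the idea in joepie91's question; it says nothing about how the Wayback Machine actually canonicalizes URLs, and the function name strip_utm and the example.com URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_utm(url: str) -> str:
    """Return url with every query parameter whose name starts with 'utm_' removed."""
    parts = urlsplit(url)
    # Keep every parameter except the utm_* tracking ones (hypothetical rule).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Illustrative URL, not taken from the log.
print(strip_utm("http://example.com/page?id=7&utm_source=irc&utm_medium=chat"))
```

Note that urlencode re-encodes the remaining parameters, so a URL with unusual escaping may not come back byte-for-byte identical apart from the removed parameters.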
06:45 🔗 mls has joined #archiveteam-bs
06:57 🔗 Honno has joined #archiveteam-bs
07:31 🔗 BlueMaxim has quit IRC (Ping timeout: 268 seconds)
07:32 🔗 BlueMaxim has joined #archiveteam-bs
08:08 🔗 Mateon1 has quit IRC (Remote host closed the connection)
08:12 🔗 Mateon1 has joined #archiveteam-bs
08:31 🔗 Mateon1 has quit IRC (Ping timeout: 245 seconds)
08:31 🔗 Mateon1 has joined #archiveteam-bs
08:52 🔗 drumstick has quit IRC (Ping timeout: 268 seconds)
09:04 🔗 drumstick has joined #archiveteam-bs
10:11 🔗 GLaDOS has quit IRC (Remote host closed the connection)
10:20 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
10:58 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:19 🔗 drumstick has quit IRC (Ping timeout: 268 seconds)
11:32 🔗 schbirid has quit IRC (Quit: Leaving)
12:28 🔗 ZexaronS has joined #archiveteam-bs
13:36 🔗 kittymeow https://wikipedia.org/wiki/User_talk:84.251.84.243?diff=prev&oldid=770831446 ---- https://wikipedia.org/wiki/Wayback_Machine?diff=prev&oldid=765542495 ---- https://wikipedia.org/wiki/Help:Using_the_Wayback_Machine?diff=prev&oldid=765542701
13:45 🔗 schbirid has joined #archiveteam-bs
13:51 🔗 SimpBrain has quit IRC (west.us.hub irc.Prison.NET)
13:51 🔗 brayden has quit IRC (west.us.hub irc.Prison.NET)
13:51 🔗 zerkalo has quit IRC (west.us.hub irc.Prison.NET)
13:51 🔗 greenie has quit IRC (west.us.hub irc.Prison.NET)
13:57 🔗 kristian_ has joined #archiveteam-bs
13:57 🔗 SimpBrain has joined #archiveteam-bs
13:57 🔗 brayden has joined #archiveteam-bs
13:57 🔗 zerkalo has joined #archiveteam-bs
13:57 🔗 greenie has joined #archiveteam-bs
13:57 🔗 irc.Prison.NET sets mode: +o brayden
13:57 🔗 swebb sets mode: +o brayden
14:18 🔗 schbirid has quit IRC (Quit: Leaving)
14:20 🔗 RichardG_ has joined #archiveteam-bs
14:21 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
14:26 🔗 drumstick has joined #archiveteam-bs
14:44 🔗 zhongfu has quit IRC (Ping timeout: 260 seconds)
14:44 🔗 zhongfu has joined #archiveteam-bs
14:45 🔗 schbirid has joined #archiveteam-bs
14:51 🔗 Mayonaise has joined #archiveteam-bs
15:03 🔗 kristian_ has quit IRC (Quit: Leaving)
15:11 🔗 drumstick has quit IRC (Ping timeout: 268 seconds)
15:46 🔗 RichardG has joined #archiveteam-bs
15:52 🔗 RichardG_ has quit IRC (Read error: Operation timed out)
16:01 🔗 Ravenloft has joined #archiveteam-bs
16:49 🔗 SketchCow In "probably nobody was tracking this other than me" news, I intend to get the "screenshot the archivebot pages" thing running again soon.
16:51 🔗 Somebody2 SketchCow: good!
16:51 🔗 Somebody2 (I wasn't tracking it, but it's a neat thing, glad to see more of it happening)
16:53 🔗 Ravenloft has quit IRC (Ping timeout: 633 seconds)
17:26 🔗 hook54321 "PDF viewer support will end September 30, 2017. Beginning October 1, 2017 all viewing will be through the JPG viewer." - newspaperarchive.com
17:26 🔗 hook54321 Fate of the PDF files is unknown. We might only have until that date to grab them all.
17:35 🔗 SketchCow Where are you seeing this announcement?
17:36 🔗 Atros has quit IRC (Read error: Operation timed out)
17:51 🔗 hook54321 https://access.newspaperarchive.com/us/california/ontario/ontario-daily-report/1970/04-01/page-42
17:52 🔗 hook54321 https://usercontent.irccloud-cdn.com/file/U5RWfgeQ/Capture5.JPG
18:18 🔗 antomatic has quit IRC (Read error: Connection reset by peer)
18:18 🔗 antomatic has joined #archiveteam-bs
18:18 🔗 swebb sets mode: +o antomatic
19:09 🔗 t2t2 ping arkiver jrwr: newsgrabber is fetching http://st03.dlf.de/dlf/03/128/mp3/stream.mp3 which ends up stalling the pipeline (until the connection interrupts or disk fills up)
19:09 🔗 t2t2 I feel that isn't intentional behaviour so I'm bringing it to your attention
19:10 🔗 JAA I believe it shouldn't keep going indefinitely, but if the disk fills up before it aborts, then that's clearly a problem.
19:10 🔗 jrwr oh
19:10 🔗 jrwr it's a radio stream
19:10 🔗 jrwr lol
19:11 🔗 JAA Yeah, according to the source code, it should stop after two days. I wonder why that value was chosen...
19:11 🔗 JAA I usually take 6 hours as the cutoff.
19:12 🔗 t2t2 Item newsbuddy:warrior_15153_1503646483 if you want to check it out
19:12 🔗 JAA Relevant line: https://github.com/ArchiveTeam/NewsGrabber-Warrior/blob/master/pipeline.py#L257
19:17 🔗 t2t2 wpull saves the response to a tmpfile and simultaneously saves to the warc, so the disk space needed during the grab is doubled
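The problem described above (an endless radio stream that runs until a time cutoff or until the disk fills) can be sketched as a copy loop with both a time limit and a size limit. This is a standalone Python illustration, not code from wpull or the NewsGrabber pipeline; the function name copy_with_limits and the 10 KiB budget are assumptions, and the six-hour value is just the cutoff JAA mentions.

```python
import io
import itertools
import time


def copy_with_limits(chunks, out, max_seconds, max_bytes, clock=time.monotonic):
    """Copy chunks to out until the source ends or a limit is hit.

    Returns (bytes_written, reason), where reason is one of
    "complete", "time limit" or "size limit".
    """
    deadline = clock() + max_seconds
    written = 0
    for chunk in chunks:
        if clock() >= deadline:
            return written, "time limit"
        if written + len(chunk) > max_bytes:
            return written, "size limit"
        out.write(chunk)
        written += len(chunk)
    return written, "complete"


# Stand-in for a never-ending stream: 1 KiB chunks forever.
endless = itertools.repeat(b"\x00" * 1024)
sink = io.BytesIO()
print(copy_with_limits(endless, sink, max_seconds=6 * 3600, max_bytes=10 * 1024))
```

Since t2t2 notes the response is written to both a temporary file and the WARC, a size budget in a real setup would presumably need to be set to roughly half of the free disk space.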
19:37 🔗 hook54321 For the Newspaper Archive site, the PDF files can be accessed with these URLs. http://pdftojp2.newspaperarchive.com:8080/IIPViewerWeb/webresources/pdfdownload/100744470
19:37 🔗 hook54321 The number at the end can be changed to get a different file.
19:39 🔗 hook54321 I guess our options are to throw it at Archivebot and see how it goes, or use the Warrior for it.
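Since the number at the end of the URL selects the file, a URL list for either option could be generated along these lines. This is a rough Python sketch: the base URL is the one hook54321 posted, but the ID range used here is purely hypothetical, as the log does not say which IDs are valid.

```python
# Base URL as posted in the channel; the trailing number selects the PDF.
BASE = "http://pdftojp2.newspaperarchive.com:8080/IIPViewerWeb/webresources/pdfdownload/"


def pdf_urls(first_id, last_id):
    """Yield one download URL per ID in the inclusive range first_id..last_id."""
    for number in range(first_id, last_id + 1):
        yield f"{BASE}{number}"


# Hypothetical range starting at the ID from the log; the real bounds are unknown.
for url in pdf_urls(100744470, 100744472):
    print(url)
```

A generator keeps memory use flat even for a very large ID range, and the output can be written to a text file, one URL per line.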
19:40 🔗 Kisikilli has joined #archiveteam-bs
19:43 🔗 Kisikilli One thing I've noticed is that since Wikipedia started using an incredibly complicated module for archive links instead of a template, the spread of archived pages on wikis has really slowed down to a halt - it might be worth encouraging the use of multiple-link web archives to stuff on wikis, I tried to make a template on https://wiki.crygaia.org/view/Template:Webarchive?action=edit to copy http://wikipedia.org/wiki/Template:Webarchive ( http:/
19:49 🔗 Kisikilli has quit IRC (Quit: http://www.okay.uz/)
19:49 🔗 Kisiki has joined #archiveteam-bs
19:49 🔗 Kisiki If anyone replied I DC'd sorry
19:50 🔗 JAA Kisiki: I think your message got cut off. This is the last part we got: "to copy http://wikipedia.org/wiki/Template:Webarchive ( http:/"
19:54 🔗 Kisiki What was the first part
19:54 🔗 JAA http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2017-08-27,Sun&sel=109#l105
19:54 🔗 Stilett0 is now known as Stiletto
19:55 🔗 hook54321 Kisiki: No one replied before you re-joined
19:56 🔗 Kisiki thanks
19:57 🔗 Kisiki One thing I've noticed is that since Wikipedia started using an incredibly complicated module for archive links instead of a template, the spread of archived pages on wikis has really slowed down to a halt - it might be worth encouraging the use of multiple-link web archives to stuff on wikis, I tried to make a template on https://wiki.crygaia.org/view/Template:Webarchive?action=edit
19:57 🔗 Kisiki to copy http://wikipedia.org/wiki/Template:Webarchive ( http://wikipedia.org/wiki/Module:Webarchive ) but I'm not sure how to get it to work because I don't know the code very well
20:08 🔗 Kisiki I know crygaia is a really tiny wiki but worldwide there doesn't seem to be any kind of template for encouraging people to save links that die later other than on Wikipedia itself, because it's so complex fan wikis and wikia etc just don't bother
20:15 🔗 Kisiki so a lot of smaller wikis are graveyards of dead links just because there was nothing out there to make it easier for people/remind people to save the pages
21:01 🔗 rocode Well this is certainly new.
21:02 🔗 rocode from WatsonReborn sent 2 minutes ago
21:02 🔗 rocode Hey I noticed you posted in this thread ( https://www.reddit.com/r/IMDbFilmGeneral/comments/62sn11/imdb_message_boards_back_up/ ) regarding IMDb shutting down their message boards earlier this year. Wanted to let you know that there's a community that archived all of IMDb's message boards over at MovieChat.org ( https://moviechat.org/ ).
21:02 🔗 rocode Took archive, put ads on it, rehosted.
21:02 🔗 rocode (Note that the first link appears to be wrong, I think they meant https://www.reddit.com/r/scifi/comments/5s0iiv/imdb_message_boards_being_removed_long_running/
21:03 🔗 rocode )
21:03 🔗 JAA Yeah, I saw that just a few weeks after the official boards went down.
21:03 🔗 rocode Got the PM about five minutes ago on reddit.
21:03 🔗 rocode Tickled me.
21:36 🔗 Kisiki Just feels like when it comes to ideas like this that are small but could change a lot of websites/wikis and help so many people just no one really cares if it's not a big visible project :(
21:42 🔗 SimpBrain has quit IRC (Ping timeout: 255 seconds)
21:45 🔗 hook54321 rocode: wait. they grabbed our archive of imdb boards, then just put ads on it? :/
21:46 🔗 hook54321 I'm gonna make sure the ads on that site get put in adblock filter lists.
21:56 🔗 drumstick has joined #archiveteam-bs
21:59 🔗 SimpBrain has joined #archiveteam-bs
22:04 🔗 tuluu has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 kimmer has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 ndiddy has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 dboard2 has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 decay has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 espes__ has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Darkstar has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 pikhq_ has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Lord_Nigh has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Igloo has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Sue has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Fletcher_ has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 underscor has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 dxrt- has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 phuzion has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Baljem_ has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 closure_ has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 Yurume has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 joepie91 has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 klg has quit IRC (hub.efnet.us hub.dk)
22:04 🔗 LordNigh2 has joined #archiveteam-bs
22:07 🔗 espes___ has joined #archiveteam-bs
22:07 🔗 joepie91_ has joined #archiveteam-bs
22:07 🔗 Yurume_ has joined #archiveteam-bs
22:07 🔗 Igloo_ has joined #archiveteam-bs
22:08 🔗 pikhq has joined #archiveteam-bs
22:08 🔗 decay_ has joined #archiveteam-bs
22:08 🔗 Dark_Star has joined #archiveteam-bs
22:09 🔗 phuzion_ has joined #archiveteam-bs
22:10 🔗 tuluu_ has joined #archiveteam-bs
22:11 🔗 db420 has joined #archiveteam-bs
22:11 🔗 db420 has quit IRC (Read error: Connection reset by peer)
22:12 🔗 ndiddy-pi has joined #archiveteam-bs
22:12 🔗 schbirid has quit IRC (Quit: Leaving)
22:13 🔗 Fletcher- has joined #archiveteam-bs
22:13 🔗 Baljem has joined #archiveteam-bs
22:16 🔗 closure has joined #archiveteam-bs
22:16 🔗 midas sets mode: +o closure
22:16 🔗 db420 has joined #archiveteam-bs
22:17 🔗 db420 is now known as dboard
22:19 🔗 underscor has joined #archiveteam-bs
22:19 🔗 swebb sets mode: +o underscor
22:19 🔗 decay_ is now known as decay
22:19 🔗 LordNigh2 is now known as Lord_Nigh
22:19 🔗 dboard is now known as dboard2
22:29 🔗 plue Somebody2: https://a.pomf.cat/prmxhz.xz
22:29 🔗 plue there you go
22:29 🔗 plue please note... this data is over 1.5 years old already
23:04 🔗 Honno has quit IRC (Read error: Operation timed out)
23:09 🔗 Somebody2 plue: thanks, will look
23:09 🔗 Somebody2 1.5 years old is better than what we had
23:12 🔗 Somebody2 plue: looks like it is 15,047,214 entries; 208MB uncompressed.
23:15 🔗 Somebody2 It'd be nice if someone (not me) wanted to dump that onto IA.
23:20 🔗 atrocity has joined #archiveteam-bs
23:25 🔗 plue Somebody2: just usernames really worth it? :/
23:26 🔗 plue i'd really be up to help on some organized tumblr archiving event... but it's a shitload of data
23:31 🔗 drumstick has quit IRC (Ping timeout: 268 seconds)
23:34 🔗 jrwr if we ever have to save tumblr
23:34 🔗 jrwr sweet jesus
23:42 🔗 michciope has quit IRC (Quit: ZNC 1.6.3+deb1 - http://znc.in)
23:44 🔗 Kisiki has quit IRC (Quit: http://www.okay.uz/)
23:44 🔗 Kisikilli has joined #archiveteam-bs
23:45 🔗 Kisikilli I am going AFK and might idle out/dc but if anyone replies to my message (please please) I will be checking back on the channel log
23:45 🔗 michciope has joined #archiveteam-bs
23:59 🔗 Sue has joined #archiveteam-bs
