#archiveteam-bs 2017-08-27,Sun


WhoWhatWhen
joepie91godane: heh, that open directory with DJ sets and such got a little blasted with bandwidth usage
and I think I'm seeing myself on their bandwidth graph...
[00:21]
***ZexaronS has joined #archiveteam-bs [00:35]
.... (idle for 15mn)
BlueMaxim has joined #archiveteam-bs [00:50]
......... (idle for 42mn)
schbirid has joined #archiveteam-bs
username1 has quit IRC (Read error: Operation timed out)
[01:32]
namibj_ has joined #archiveteam-bs [01:47]
namibj_Is there any effort made regarding tumblr archival? I can get a list of about the 10% most popular tumblr blogs/nicks [01:49]
***plue has joined #archiveteam-bs [01:51]
Ravenloft has joined #archiveteam-bs [01:58]
ZexaronS has quit IRC (Leaving) [02:05]
...... (idle for 29mn)
Somebody2namibj_: We've been grabbing random tumblr blogs as we come across them -- your list would be *VERY* welcome.
Please post it on the wiki, or a pastebin, or somewhere.
[02:34]
hook54321!d 3l37rv80rmmc92293j057ce7n 50 160
oops
[02:37]
namibj_Somebody2: direct the asking to plue, as he grabbed it (and got a nice letter telling him to stop hitting their private api so hard)
he'll be back in about 10h, i guess.
the letter: https://pastebin.com/xHnCygvq
[02:42]
Somebody2namibj_: cool, will do
plue: When you get back, please post your list somewhere!
[02:51]
***Mayonaise has quit IRC (Read error: Operation timed out) [02:52]
Mayonaise has joined #archiveteam-bs
Ravenloft has quit IRC (Ping timeout: 506 seconds)
[03:02]
.......... (idle for 45mn)
Ravenloft has joined #archiveteam-bs
qw3rty114 has joined #archiveteam-bs
[03:48]
godanei'm at 5612 items this month [03:54]
***qw3rty113 has quit IRC (Ping timeout: 600 seconds)
Ravenloft has quit IRC (Ping timeout: 260 seconds)
[03:57]
Sk1d has quit IRC (Ping timeout: 194 seconds) [04:11]
Sk1d has joined #archiveteam-bs [04:17]
Aranje has joined #archiveteam-bs [04:25]
.... (idle for 19mn)
Odd0002 has quit IRC (ZNC - http://znc.in)
Odd0002 has joined #archiveteam-bs
[04:44]
Stilett0 has joined #archiveteam-bs [04:53]
......... (idle for 41mn)
Aranje has quit IRC (Three sheets to the wind) [05:34]
zyphlar has quit IRC (Quit: Connection closed for inactivity) [05:48]
......... (idle for 44mn)
joepie91SketchCow: Q: does the Wayback Machine automatically strip ?utm_* junk when parsing URLs from WARCs and user search input? [06:32]
SketchCowNO idea [06:37]
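Nobody in the channel knew whether the Wayback Machine does this, so the following does not describe its behaviour. It is a minimal standard-library sketch of stripping `utm_*` tracking parameters yourself before searching. `strip_utm` is a hypothetical helper name. Re-encoding with `urlencode` may change how the remaining parameters are percent-escaped (for example, spaces become `+`).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_utm(url: str) -> str:
    """Hypothetical helper: drop query parameters whose names start with
    'utm_' (case-insensitive) and keep all others in their original order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    # An empty query string makes urlunsplit omit the '?' entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_utm("https://example.com/a?id=7&utm_source=tw&utm_medium=social"))
    # → https://example.com/a?id=7
```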
***mls has joined #archiveteam-bs [06:45]
Honno has joined #archiveteam-bs [06:57]
....... (idle for 34mn)
BlueMaxim has quit IRC (Ping timeout: 268 seconds)
BlueMaxim has joined #archiveteam-bs
[07:31]
........ (idle for 36mn)
Mateon1 has quit IRC (Remote host closed the connection)
Mateon1 has joined #archiveteam-bs
[08:08]
.... (idle for 19mn)
Mateon1 has quit IRC (Ping timeout: 245 seconds)
Mateon1 has joined #archiveteam-bs
[08:31]
..... (idle for 21mn)
drumstick has quit IRC (Ping timeout: 268 seconds) [08:52]
drumstick has joined #archiveteam-bs [09:04]
.............. (idle for 1h7mn)
GLaDOS has quit IRC (Remote host closed the connection) [10:11]
Mayonaise has quit IRC (Read error: Operation timed out) [10:20]
........ (idle for 38mn)
BlueMaxim has quit IRC (Quit: Leaving) [10:58]
..... (idle for 21mn)
drumstick has quit IRC (Ping timeout: 268 seconds) [11:19]
schbirid has quit IRC (Quit: Leaving) [11:32]
............ (idle for 56mn)
ZexaronS has joined #archiveteam-bs [12:28]
.............. (idle for 1h8mn)
kittymeowhttps://wikipedia.org/wiki/User_talk:84.251.84.243?diff=prev&oldid=770831446 ---- https://wikipedia.org/wiki/Wayback_Machine?diff=prev&oldid=765542495 ---- https://wikipedia.org/wiki/Help:Using_the_Wayback_Machine?diff=prev&oldid=765542701 [13:36]
***schbirid has joined #archiveteam-bs [13:45]
SimpBrain has quit IRC (west.us.hub irc.Prison.NET)
brayden has quit IRC (west.us.hub irc.Prison.NET)
zerkalo has quit IRC (west.us.hub irc.Prison.NET)
greenie has quit IRC (west.us.hub irc.Prison.NET)
[13:51]
kristian_ has joined #archiveteam-bs
SimpBrain has joined #archiveteam-bs
brayden has joined #archiveteam-bs
zerkalo has joined #archiveteam-bs
greenie has joined #archiveteam-bs
irc.Prison.NET sets mode: +o brayden
swebb sets mode: +o brayden
[13:57]
..... (idle for 21mn)
schbirid has quit IRC (Quit: Leaving)
RichardG_ has joined #archiveteam-bs
RichardG has quit IRC (Read error: Connection reset by peer)
[14:18]
drumstick has joined #archiveteam-bs [14:26]
.... (idle for 18mn)
zhongfu has quit IRC (Ping timeout: 260 seconds)
zhongfu has joined #archiveteam-bs
schbirid has joined #archiveteam-bs
[14:44]
Mayonaise has joined #archiveteam-bs [14:51]
kristian_ has quit IRC (Quit: Leaving) [15:03]
drumstick has quit IRC (Ping timeout: 268 seconds) [15:11]
........ (idle for 35mn)
RichardG has joined #archiveteam-bs [15:46]
RichardG_ has quit IRC (Read error: Operation timed out) [15:52]
Ravenloft has joined #archiveteam-bs [16:01]
.......... (idle for 48mn)
SketchCowIn "nobody other than me was probably tracking this" news, I intend to get the "screenshot the archivebot pages" thing running again soon. [16:49]
Somebody2SketchCow: good!
(I wasn't tracking it, but it's a neat thing, glad to see more of it happening)
[16:51]
***Ravenloft has quit IRC (Ping timeout: 633 seconds) [16:53]
....... (idle for 33mn)
hook54321"PDF viewer support will end September 30, 2017. Beginning October 1, 2017 all viewing will be through the JPG viewer." - newspaperarchive.com
Fate of the PDF files is unknown. We might only have until that date to grab them all.
[17:26]
SketchCowWhere are you seeing this announcement? [17:35]
***Atros has quit IRC (Read error: Operation timed out) [17:36]
.... (idle for 15mn)
hook54321https://access.newspaperarchive.com/us/california/ontario/ontario-daily-report/1970/04-01/page-42
https://usercontent.irccloud-cdn.com/file/U5RWfgeQ/Capture5.JPG
[17:51]
...... (idle for 26mn)
***antomatic has quit IRC (Read error: Connection reset by peer)
antomatic has joined #archiveteam-bs
swebb sets mode: +o antomatic
[18:18]
........... (idle for 51mn)
t2t2ping arkiver jrwr: newsgrabber is fetching http://st03.dlf.de/dlf/03/128/mp3/stream.mp3 which ends up stalling the pipeline (until the connection interrupts or disk fills up)
I feel that isn't intentional behaviour so I'm bringing it up to your attention
[19:09]
JAAI believe it shouldn't keep going indefinitely, but if the disk fills up before it aborts, then that's clearly a problem. [19:10]
jrwroh
its a radio stream
lol
[19:10]
JAAYeah, according to the source code, it should stop after two days. I wonder why that value was chosen...
I usually take 6 hours as the cutoff.
[19:11]
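The log says only that the pipeline should stop after two days and that JAA prefers a 6-hour cutoff. The NewsGrabber/wpull code is not reproduced here. The following is a generic sketch of capping an endless download (such as a radio stream) by both elapsed time and bytes written. The byte cap addresses the "disk fills up" case t2t2 raised. `copy_with_limits` is a hypothetical name, and the chunk source is any iterable of bytes.

```python
import time
from typing import BinaryIO, Callable, Iterable, Tuple


def copy_with_limits(chunks: Iterable[bytes], out: BinaryIO,
                     max_seconds: float, max_bytes: int,
                     clock: Callable[[], float] = time.monotonic) -> Tuple[int, str]:
    """Hypothetical helper: copy chunks to `out` until the source ends or a cap is hit.

    Returns (bytes_written, reason), where reason is "complete",
    "time_limit" or "size_limit".
    """
    start = clock()
    written = 0
    for chunk in chunks:
        # Checked between chunks only, so a read that blocks forever
        # also needs a socket timeout on the underlying connection.
        if clock() - start > max_seconds:
            return written, "time_limit"
        if written + len(chunk) > max_bytes:
            return written, "size_limit"
        out.write(chunk)
        written += len(chunk)
    return written, "complete"


if __name__ == "__main__":
    import io
    import itertools

    endless = itertools.repeat(b"\x00" * 1024)  # stands in for a never-ending stream
    buf = io.BytesIO()
    print(copy_with_limits(endless, buf, max_seconds=6 * 3600, max_bytes=10 * 1024))
    # → (10240, 'size_limit')
```

With a real HTTP response, something like `iter(lambda: resp.read(64 * 1024), b"")` turns it into a chunk iterator. Pass a `timeout=` to `urllib.request.urlopen` so a stalled stream cannot block between checks.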
t2t2Item newsbuddy:warrior_15153_1503646483 if you want to check it out [19:12]
JAARelevant line: https://github.com/ArchiveTeam/NewsGrabber-Warrior/blob/master/pipeline.py#L257 [19:12]
t2t2wpull saves the response to a tmpfile and simultaneously saves to the warc, so the disk space needed during the grab is doubled [19:17]
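To put the two-day versus 6-hour cutoff and t2t2's "doubled" point into numbers, here is rough arithmetic. It ASSUMES the stream is a constant 128 kbit/s; that figure is only a guess from the `/128/` path segment in the URL. It also ignores WARC record headers and gzip, and counts two copies (temp file plus WARC) held at once, as described above. `stream_disk_bytes` is a hypothetical name.

```python
def stream_disk_bytes(kbit_per_s: float, hours: float, copies: int = 2) -> int:
    """Bytes of disk needed to record a constant-bitrate stream for `hours`.

    ASSUMPTION: constant bitrate, no WARC overhead; copies=2 models the
    temp file and the WARC existing at the same time.
    """
    bytes_per_s = kbit_per_s * 1000 / 8
    return int(bytes_per_s * hours * 3600 * copies)


if __name__ == "__main__":
    for label, hours in [("6 hours", 6), ("2 days", 48)]:
        gb = stream_disk_bytes(128, hours) / 1e9
        print(f"{label}: {gb:.2f} GB")
    # 6 hours: 0.69 GB
    # 2 days: 5.53 GB
```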
..... (idle for 20mn)
hook54321For the Newspaper Archive site, the PDF files can be accessed with these URLs. http://pdftojp2.newspaperarchive.com:8080/IIPViewerWeb/webresources/pdfdownload/100744470
The number at the end can be changed to get a different file.
I guess our options are to throw it at Archivebot and see how it goes, or use the Warrior for it.
[19:37]
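Because the last path segment is just a number, an item list for either ArchiveBot or the Warrior could be generated mechanically. This sketch only builds URLs from the pattern hook54321 posted. It is NOT known whether the IDs are contiguous or what the valid range is, so many generated IDs may not exist. `pdf_urls` is a hypothetical helper.

```python
from typing import Iterator

# URL prefix copied from the log; only the trailing number varies.
BASE = "http://pdftojp2.newspaperarchive.com:8080/IIPViewerWeb/webresources/pdfdownload/"


def pdf_urls(start_id: int, count: int) -> Iterator[str]:
    """Hypothetical helper: yield URLs for `count` consecutive IDs from start_id.
    ASSUMPTION: IDs are plain integers; contiguity is unverified."""
    for item_id in range(start_id, start_id + count):
        yield f"{BASE}{item_id}"


if __name__ == "__main__":
    for url in pdf_urls(100744470, 3):
        print(url)
```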
***Kisikilli has joined #archiveteam-bs [19:40]
KisikilliOne thing I've noticed is that since Wikipedia started using an incredibly complicated module for archive links instead of a template, the spread of archived pages on wikis has really slowed down to a halt - it might be worth encouraging the use of multiple-link web archives to stuff on wikis, I tried to make a template on https://wiki.crygaia.org/view/Template:Webarchive?action=edit to copy http://wikipedia.org/wiki/Template:Webarchive ( http:/ [19:43]
***Kisikilli has quit IRC (Quit: http://www.okay.uz/)
Kisiki has joined #archiveteam-bs
[19:49]
KisikiIf anyone replied I DC'd sorry [19:49]
JAAKisiki: I think your message got cut off. This is the last part we got: "to copy http://wikipedia.org/wiki/Template:Webarchive ( http:/" [19:50]
KisikiWhat was the first part [19:54]
JAAhttp://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2017-08-27,Sun&sel=109#l105 [19:54]
***Stilett0 is now known as Stiletto [19:54]
hook54321Kisiki: No one replied before you re-joined [19:55]
Kisikithanks
One thing I've noticed is that since Wikipedia started using an incredibly complicated module for archive links instead of a template, the spread of archived pages on wikis has really slowed down to a halt - it might be worth encouraging the use of multiple-link web archives to stuff on wikis, I tried to make a template on https://wiki.crygaia.org/view/Template:Webarchive?action=edit
to copy http://wikipedia.org/wiki/Template:Webarchive ( http://wikipedia.org/wiki/Module:Webarchive ) but I'm not sure how to get it to work because I don't know the code very well
[19:56]
I know crygaia is a really tiny wiki, but worldwide there doesn't seem to be any kind of template for encouraging people to save links that die later other than on Wikipedia itself, and because it's so complex, fan wikis and Wikia etc. just don't bother [20:08]
so a lot of smaller wikis are graveyards of dead links just because there was nothing out there to make it easier for people/remind people to save the pages [20:15]
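A MediaWiki template would be written in wikitext or Lua, not Python. This sketch only shows the Wayback Machine replay URL shape such a template would build: `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`. If there is no capture at that exact time, the Wayback Machine redirects to the nearest one. The sketch also shows a plain wikitext link pair that any wiki could use without a module. The output format of `cite_with_archive` is made up for illustration and is NOT what Wikipedia's Module:Webarchive produces. Both function names are hypothetical.

```python
from datetime import datetime


def wayback_url(original: str, when: datetime) -> str:
    """Wayback Machine replay URL for `original` at a 14-digit timestamp."""
    return f"https://web.archive.org/web/{when:%Y%m%d%H%M%S}/{original}"


def cite_with_archive(original: str, title: str, when: datetime) -> str:
    """Hypothetical wikitext: the live external link followed by an archived copy."""
    archived = wayback_url(original, when)
    return f"[{original} {title}] ([{archived} archived] {when:%Y-%m-%d})"


if __name__ == "__main__":
    print(cite_with_archive("http://example.com/page", "Example page",
                            datetime(2017, 8, 27, 19, 56)))
```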
.......... (idle for 46mn)
rocodeWell this is certainly new.
from WatsonReborn sent 2 minutes ago
Hey I noticed you posted in this thread ( https://www.reddit.com/r/IMDbFilmGeneral/comments/62sn11/imdb_message_boards_back_up/ ) regarding IMDb shutting down their message boards earlier this year. Wanted to let you know that there's a community that archived all of IMDb's message boards over at MovieChat.org ( https://moviechat.org/ ).
Took archive, put ads on it, rehosted.
(Note that the first link appears to be wrong; I think they meant https://www.reddit.com/r/scifi/comments/5s0iiv/imdb_message_boards_being_removed_long_running/ )
[21:01]
JAAYeah, I saw that just a few weeks after the official boards went down. [21:03]
rocodeGot the PM about five minutes ago on reddit.
Tickled me.
[21:03]
....... (idle for 33mn)
KisikiJust feels like when it comes to ideas like this, that are small but could change a lot of websites/wikis and help so many people, no one really cares if it's not a big visible project :( [21:36]
***SimpBrain has quit IRC (Ping timeout: 255 seconds) [21:42]
hook54321rocode: wait. they grabbed our archive of imdb boards, then just put ads on it? :/
I'm gonna make sure the ads on that site get put in adblock filter lists.
[21:45]
***drumstick has joined #archiveteam-bs
SimpBrain has joined #archiveteam-bs
[21:56]
tuluu has quit IRC (hub.efnet.us hub.dk)
kimmer has quit IRC (hub.efnet.us hub.dk)
ndiddy has quit IRC (hub.efnet.us hub.dk)
dboard2 has quit IRC (hub.efnet.us hub.dk)
decay has quit IRC (hub.efnet.us hub.dk)
espes__ has quit IRC (hub.efnet.us hub.dk)
Darkstar has quit IRC (hub.efnet.us hub.dk)
pikhq_ has quit IRC (hub.efnet.us hub.dk)
Lord_Nigh has quit IRC (hub.efnet.us hub.dk)
Igloo has quit IRC (hub.efnet.us hub.dk)
Sue has quit IRC (hub.efnet.us hub.dk)
Fletcher_ has quit IRC (hub.efnet.us hub.dk)
underscor has quit IRC (hub.efnet.us hub.dk)
dxrt- has quit IRC (hub.efnet.us hub.dk)
phuzion has quit IRC (hub.efnet.us hub.dk)
Baljem_ has quit IRC (hub.efnet.us hub.dk)
closure_ has quit IRC (hub.efnet.us hub.dk)
Yurume has quit IRC (hub.efnet.us hub.dk)
joepie91 has quit IRC (hub.efnet.us hub.dk)
klg has quit IRC (hub.efnet.us hub.dk)
LordNigh2 has joined #archiveteam-bs
espes___ has joined #archiveteam-bs
joepie91_ has joined #archiveteam-bs
Yurume_ has joined #archiveteam-bs
Igloo_ has joined #archiveteam-bs
pikhq has joined #archiveteam-bs
decay_ has joined #archiveteam-bs
Dark_Star has joined #archiveteam-bs
phuzion_ has joined #archiveteam-bs
tuluu_ has joined #archiveteam-bs
db420 has joined #archiveteam-bs
db420 has quit IRC (Read error: Connection reset by peer)
ndiddy-pi has joined #archiveteam-bs
schbirid has quit IRC (Quit: Leaving)
Fletcher- has joined #archiveteam-bs
Baljem has joined #archiveteam-bs
closure has joined #archiveteam-bs
midas sets mode: +o closure
db420 has joined #archiveteam-bs
db420 is now known as dboard
underscor has joined #archiveteam-bs
swebb sets mode: +o underscor
decay_ is now known as decay
LordNigh2 is now known as Lord_Nigh
dboard is now known as dboard2
[22:04]
plueSomebody2: https://a.pomf.cat/prmxhz.xz
there you go
please note... this data is over 1.5 years old already
[22:29]
........ (idle for 35mn)
***Honno has quit IRC (Read error: Operation timed out) [23:04]
Somebody2plue: thanks, will look
1.5 years old is better than what we had
plue: looks like it is 15,047,214 entries; 208MB uncompressed.
It'd be nice if someone (not me) wanted to dump that onto IA.
[23:09]
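plue's list arrived as an `.xz` file (15,047,214 entries, 208MB uncompressed, per Somebody2). Here is a minimal sketch of streaming it with Python's `lzma` module without unpacking it to disk first. It ASSUMES one username per line in UTF-8 (the log doesn't state the format) and ASSUMES blogs live at `<name>.tumblr.com`. `iter_usernames` and `blog_url` are hypothetical names.

```python
import lzma
import sys
from typing import Iterator


def iter_usernames(path: str) -> Iterator[str]:
    """Stream non-empty lines from an .xz file.
    ASSUMPTION: one username per line, UTF-8."""
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            name = line.strip()
            if name:
                yield name


def blog_url(name: str) -> str:
    """ASSUMPTION: subdomain-style Tumblr blog address."""
    return f"https://{name}.tumblr.com/"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Count entries in the file given on the command line.
    print(sum(1 for _ in iter_usernames(sys.argv[1])))
```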
***atrocity has joined #archiveteam-bs [23:20]
plueSomebody2: just usernames really worth it? :/
i'd really be up to help on some organized tumblr archiving event... but it's a shitload of data
[23:25]
***drumstick has quit IRC (Ping timeout: 268 seconds) [23:31]
jrwrif we ever have to save tumblr
sweet jesus
[23:34]
***michciope has quit IRC (Quit: ZNC 1.6.3+deb1 - http://znc.in)
Kisiki has quit IRC (Quit: http://www.okay.uz/)
Kisikilli has joined #archiveteam-bs
[23:42]
KisikilliI am going AFK and might idle out/dc but if anyone replies to my message (please please) I will be checking back on the channel log [23:45]
***michciope has joined #archiveteam-bs [23:45]
Sue has joined #archiveteam-bs [23:59]
