#archiveteam 2016-06-18,Sat


Time Nickname Message
00:10 BlueMaxim has joined #archiveteam
00:12 tomwsmf-a has joined #archiveteam
00:18 anjacks0n has joined #archiveteam
00:21 anjacks0n has quit IRC (Ping timeout: 190 seconds)
00:29 j08nY has quit IRC (Ping timeout: 633 seconds)
00:31 JesseW has joined #archiveteam
00:56 _ris has quit IRC ()
00:57 DoomTay has joined #archiveteam
00:57 DoomTay Sorry about that. What'd I miss?
00:58 kken has joined #archiveteam
00:58 kken has quit IRC (Client Quit)
01:01 Pudsey has joined #archiveteam
01:03 JesseW has quit IRC (Ping timeout: 370 seconds)
01:03 Pudsey Anyone used the wayback API? I get one result via the website but none via the API. I've tried encoding the URL but still no
01:04 DoomTay What's the URL?
01:11 Pudsey https://blip.tv/file/get/ActorAJ-DeerLakesEnding728.wmv one result from 2014, but even if you pass the timestamp in the API request it still doesn't know about it
01:13 DoomTay ....yeah...
01:13 DoomTay http://web.archive.org/web/*/https://blip.tv/file/get/ActorAJ-DeerLakesEnding728.wmv
01:15 Pudsey Well that's odd I just got one result a few minutes ago.
01:16 DoomTay The weird thing is accessing the robots.txt file itself yields a 523
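
[Editor's note] The log does not say which Wayback endpoint Pudsey was calling. The sketch below assumes it was the public availability endpoint (https://archive.org/wayback/available) and only shows how a percent-encoded query with an optional timestamp could be built and how the JSON reply could be read. The function names are hypothetical, and no network request is made here.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback "availability" endpoint, not necessarily the API used in the log.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(target_url, timestamp=None):
    """Return the query URL, letting urlencode percent-encode the target URL."""
    params = {"url": target_url}
    if timestamp is not None:
        # Timestamp in YYYYMMDDhhmmss form; a prefix such as "2014" is also accepted.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the closest snapshot URL from a JSON reply, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

An empty "archived_snapshots" object in the reply means the endpoint reports no capture, which matches the "none via the API" symptom described above.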
01:27 kristian_ has quit IRC (Leaving)
01:31 Stiletto has quit IRC (Ping timeout: 246 seconds)
01:40 Pudsey has quit IRC (Remote host closed the connection)
01:57 VADemon has quit IRC (Quit: left4dead)
01:58 redlob has quit IRC (Read error: Operation timed out)
02:00 schbirid has quit IRC (Read error: Connection refused)
02:02 Stiletto has joined #archiveteam
02:10 SketchCow I am getting hammered on coursera
02:10 SketchCow arkiver: we need all gawker and all related properties like io9
02:10 redlob has joined #archiveteam
02:11 BlueMaxim has quit IRC (Read error: Operation timed out)
02:13 DoomTay Maybe assemble a warrior project for the gawker business?
02:13 schbirid has joined #archiveteam
02:33 BartoCH has quit IRC (Ping timeout: 260 seconds)
02:40 BartoCH has joined #archiveteam
02:46 godane i'm still doing the sitemap grabs
03:05 godane i'm starting to upload my collection of LDS Magazines i found
03:06 godane they have pdfs going back to 2001
03:08 Aranje has quit IRC (Quit: Three sheets to the wind)
03:12 Emcy has quit IRC (Read error: Operation timed out)
03:32 Whopper Arto project is about to run out of jobs but there are 25,000 out. Does someone start expiring older 'out' jobs at this point?
03:45 DoomTay What do you mean "run out of jobs"?
03:50 bugcat has joined #archiveteam
03:50 bugcat has quit IRC (Client Quit)
03:58 yipdw Whopper: yes
04:21 Whopper cool
04:22 Whopper DoomTay: in the to-do part there are only < 200 jobs but in the out part there are 25,588. so there could be lots of clients sitting around doing nothing while valid jobs are assigned to inactive clients
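
[Editor's note] The idea Whopper describes, handing long-claimed 'out' jobs back to the to-do queue, can be sketched roughly as follows. This is not the tracker's actual code: the Claim record, the requeue_stale function and the age threshold are all hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical record of one job handed out to a client."""
    item: str
    client: str
    claimed_at: float  # seconds since the epoch


def requeue_stale(todo, out, now, max_age_seconds):
    """Move claims older than max_age_seconds back onto todo.

    Appends the expired items to the todo list in place and returns the
    list of claims that are still considered active.
    """
    still_out = []
    for claim in out:
        if now - claim.claimed_at > max_age_seconds:
            todo.append(claim.item)  # give the job to someone else
        else:
            still_out.append(claim)
    return still_out
```

With a threshold of a few hours, jobs held by inactive clients would flow back to the idle ones, at the cost of occasionally duplicating work a slow client later finishes.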
04:25 Coderjoe has quit IRC (Read error: Operation timed out)
04:34 Coderjoe has joined #archiveteam
04:48 devrandom has joined #archiveteam
04:49 devrandom Is anyone working on downloading Coursera? They are going to delete 472 courses on June 30.
04:49 devrandom http://makemeflow.org/advice/2016/06/how-to-download-courseras-courses-before-theyre-gone-forever/
04:53 devrandom has quit IRC (Client Quit)
04:54 DoomTay ....
04:54 DoomTay Did he just pop in, drop a message, then leave?
04:57 DoomTay has quit IRC (Quit: Page closed)
05:08 DoomTay has joined #archiveteam
05:10 redlob has quit IRC (Read error: Operation timed out)
05:11 redlob has joined #archiveteam
05:20 BlueMaxim has joined #archiveteam
05:23 anjacks0n has joined #archiveteam
05:25 Coderjoe http://www.openculture.com/2016/06/a-handy-guide-on-how-to-download-old-coursera-courses-before-they-disappear.html
05:25 Coderjoe well, I see someone else has mentioned it already
05:26 anjacks0n has quit IRC (Ping timeout: 190 seconds)
05:26 Coderjoe DoomTay: well, he did wait 5 minutes between joining and leaving. I guess he expects that channels with 199 people will have more activity.
05:27 DoomTay That's pretty much how I felt when I first joined
05:39 JesseW has joined #archiveteam
06:07 DoomTay has quit IRC (Quit: Page closed)
06:36 JesseW has quit IRC (Ping timeout: 370 seconds)
06:44 xmc we know
06:44 xmc you're impatient and not very understanding
06:45 anjacks0n has joined #archiveteam
06:48 anjacks0n has quit IRC (Ping timeout: 190 seconds)
07:05 anjacks0n has joined #archiveteam
07:08 Honno_ has joined #archiveteam
07:33 anjacks0n has quit IRC (anjacks0n)
07:54 tomwsmf-a has quit IRC (Read error: Operation timed out)
08:34 SDragon has joined #archiveteam
08:34 SDragon hello #archiveteam, 2 queries:
08:36 schbirid warning: query has timed out
08:36 SDragon 1, I have an internal corp wiki, that is essentially the better half of my brain, started writing it in 2004, expanded considerably in 2011, contains ~1.2MB of plaintext, and ~15K URLs. I have been browsing ancient pages of these, and to my dismay, ~40% of the links are nowhere to be found; 50% of which isn't even on archive.org . This is very not good. What tool would you recommend, to
08:36 SDragon which I can plug a list of URLs, and it will archive all the pages as comprehensively as possible?
08:37 SDragon have tried so far: curl, httrack, wkhtmltopdf
08:37 SDragon curl won't pull all resources; httrack does, but configuring it to limit to one specific page is tricky; wkhtmltopdf crashes on some pages with ads
08:39 tomwsmf-a has joined #archiveteam
08:44 dxrt https://github.com/ludios/grab-site
08:44 dxrt should be perfect
08:46 SDragon dxrt, okay. How can I open a page saved in WARC format?
08:47 dxrt You can use something like webarchiveplayer
08:48 dxrt and you can also extract the contents of WARCs using various tools
08:48 dxrt http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
08:49 SDragon okay, yeah, I now know what my weekend will look like
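
[Editor's note] For SDragon's "list of URLs" case, one possible approach is to run grab-site once per URL. The sketch below only prepares the command lines. It assumes the --1 option described in the grab-site README (grab just the given URL and its page requisites, without recursing); check the README of your installed version before relying on it. The helper names load_urls and build_commands are hypothetical.

```python
def load_urls(lines):
    """Return unique URLs in first-seen order, skipping blanks and # comments."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def build_commands(urls):
    """Build one argv list per URL; '--1' is assumed from the grab-site README."""
    return [["grab-site", "--1", url] for url in urls]
```

Each argv list could then be passed to subprocess.run(cmd, check=True); with ~15K URLs it is probably worth running them in small batches and logging failures.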
08:50 SDragon #2:
08:51 SDragon There exists ~20 groups, and ~15 persons of high interest (to me), who are publishing original content to facebook. That is, content not accessible anywhere else on the Internet. This is very, very not good, for all the obvious reasons archiveteam exists (and I contribute to). What tool would you recommend, to which I can plug a list of Facebook pages / FBIDs (and prolly my own auth token,
08:51 SDragon as they're "shared to friends only"), and it will archive all of the posts belonging to the group / person?
08:53 SDragon have tried so far: fb's "export" tool (only pulls my own), archiving via tools above (except WARC; none of them can do paging)
08:53 tomwsmf-a has quit IRC (Read error: Operation timed out)
08:56 PurpleSym grab-site supports phantom-js. Not sure if that’s enough to grab Facebook though.
09:00 PurpleSym Otherwise use a WARC recording HTTP proxy and scroll through the site manually.
09:01 SDragon that..... might take a while
09:01 SDragon also, it's not automagic. Which means it will have a high false-negative rate ( stuff needs to be pulled, but won't, because it's manual)
09:26 Emcy has joined #archiveteam
09:30 arkiver SketchCow: yeah, we're being spammed about coursera
09:30 arkiver I plan on setting that up this weekend so the grab can be done next week
09:31 HCross we need a bot that picks up any mention of "Coursera" and just goes "We're on it. Quit bugging us"
09:32 arkiver Okay, I'll set that up
09:32 HCross xD
09:33 SDragon YES! that was my #3 as well
09:34 SDragon .torrent balls of coursera, or even better: pirate edx instances of their courses would get my infomorph salivating
09:35 arkiver We'll get it as WARCs. After that they'll be uploaded as items to IA too
09:43 PurpleSym changes topic to: Archive Team: We're not archive.org | http://archiveteam.org/ | Coursera grab starting soon | lengthy/off-topic in #archiveteam-bs
09:43 PurpleSym HCross: ^
09:44 HCross :)
09:48 coursebug has joined #archiveteam
09:48 arkiver coursera
09:48 coursebug arkiver: We're working it.
09:50 arkiver HCross ^
09:51 HCross haha. awesome :)
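
[Editor's note] The behaviour coursebug shows above can be approximated by a small matching function. This is a guess at the logic, not arkiver's actual bot: the reply_for name, the regular expression and the self-reply guard are assumptions, and the IRC connection itself is left out.

```python
import re

# Assumption: any case-insensitive whole-word mention of "coursera" triggers the reply.
KEYWORD = re.compile(r"\bcoursera\b", re.IGNORECASE)


def reply_for(nick, message, own_nick="coursebug"):
    """Return the canned reply for a channel message, or None if no reply is due."""
    if nick == own_nick:
        return None  # never answer ourselves, to avoid reply loops
    if KEYWORD.search(message):
        return f"{nick}: We're working it."
    return None
```

For example, reply_for("arkiver", "coursera") yields the same line the bot posted at 09:48.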
09:55 _ris has joined #archiveteam
10:42 _ris is now known as ris
10:43 * ris ponders the practicality of backing up mapillary's content http://www.mapillary.com (it is CC licensed)
10:43 ris it would be *huge* though
10:50 schbirid iirc they would be supportive
10:55 ris it would just be gigs & gigs & gigs & gigs
11:00 Honno__ has joined #archiveteam
11:00 ris (i guess it depends how "supportive" they are - if they're supportive enough would they just mail an hd to archive.org?)
11:03 Honno_ has quit IRC (Ping timeout: 492 seconds)
11:06 metalcamp has joined #archiveteam
11:22 xhdr has quit IRC (Quit: ZNC - http://znc.in)
11:25 xhdr has joined #archiveteam
11:25 xhdr has quit IRC (Excess Flood)
11:28 xhdr has joined #archiveteam
11:28 xhdr has quit IRC (Excess Flood)
11:30 xhdr has joined #archiveteam
11:30 xhdr has quit IRC (Excess Flood)
11:33 xhdr has joined #archiveteam
11:33 xhdr has quit IRC (Excess Flood)
11:36 BlueMaxim has quit IRC (Quit: Leaving)
11:38 xhdr has joined #archiveteam
11:38 xhdr has quit IRC (Excess Flood)
11:42 xhdr has joined #archiveteam
11:42 xhdr has quit IRC (Excess Flood)
11:47 xhdr has joined #archiveteam
11:47 xhdr has quit IRC (Excess Flood)
11:51 xhdr has joined #archiveteam
11:51 xhdr has quit IRC (Excess Flood)
11:56 xhdr has joined #archiveteam
11:56 xhdr has quit IRC (Excess Flood)
12:00 xhdr has joined #archiveteam
12:00 xhdr has quit IRC (Excess Flood)
12:05 xhdr has joined #archiveteam
12:05 xhdr has quit IRC (Excess Flood)
12:07 j08nY has joined #archiveteam
12:09 xhdr has joined #archiveteam
12:09 xhdr has quit IRC (Excess Flood)
12:14 xhdr has joined #archiveteam
12:14 xhdr has quit IRC (Excess Flood)
12:15 PurpleSym sets mode: +b *!~xhdr@static.182.114.9.176.clients.your-server.de
12:17 j08nY has quit IRC (Quit: Leaving)
12:35 j08nY has joined #archiveteam
12:51 metalcamp has quit IRC (Read error: Connection reset by peer)
13:03 dashcloud has quit IRC (Read error: Operation timed out)
13:06 dashcloud has joined #archiveteam
13:08 Atom-- has joined #archiveteam
13:13 Atom__ has quit IRC (Read error: Operation timed out)
13:30 coursebug has quit IRC (Remote host closed the connection)
13:52 ris has quit IRC (Read error: Operation timed out)
14:35 zino I have dragged my feet long enough, time to complete Gamefront. I sent you a mail with the proposed config changes for processing, arkiver.
14:49 WinterFox has quit IRC (Remote host closed the connection)
14:49 DigDug has joined #archiveteam
14:53 Atluxity gj zino
15:08 SDragon does archiveteam maintain a repo of their own, or is everything going to archive.org, and private repos only?
15:09 Atluxity containing what?
15:19 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
15:21 RichardG has joined #archiveteam
15:38 ris has joined #archiveteam
15:56 Legionof7 has joined #archiveteam
15:57 Legionof7 I hope feelthebern.org gets archived.
15:59 joepie91 Legionof7: it does now
15:59 joepie91 :p
15:59 Legionof7 Wait wait
15:59 Legionof7 Joepie
15:59 Legionof7 I think I know you from somewhere
15:59 joepie91 very possible
16:00 Legionof7 Have you ever gone onto anonops?
16:00 Legionof7 or #reddit ?
16:00 joepie91 Legionof7: yes, anonops, but let's move to #archiveteam-bs for off-topic discussion :P
16:26 Coderjoe has quit IRC (Read error: Connection reset by peer)
16:26 Coderjoe has joined #archiveteam
16:27 Legionof7 has quit IRC (Quit: Page closed)
16:43 kristian_ has joined #archiveteam
16:44 Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
16:51 mr-b has quit IRC (Ping timeout: 246 seconds)
16:58 mr-b has joined #archiveteam
17:11 VADemon has joined #archiveteam
17:14 JesseW has joined #archiveteam
17:27 anjacks0n has joined #archiveteam
17:28 anjacks0n has quit IRC (Client Quit)
17:29 ris has quit IRC ()
17:29 DoomTay has joined #archiveteam
17:45 JesseW has quit IRC (Ping timeout: 370 seconds)
17:47 dashcloud has quit IRC (Ping timeout: 244 seconds)
17:47 dashcloud has joined #archiveteam
18:23 Honno__ has quit IRC (Ping timeout: 492 seconds)
18:30 ris has joined #archiveteam
18:40 JesseW has joined #archiveteam
19:00 JesseW SDragon: By "a repo" do you mean a copy of the content we grab, or a version-control repository of programs we use to grab it? We do have multiple version-control repos with various programs in them, but no, we don't have a complete 3rd copy of the content we grab -- it's just at archive.org (and random pieces in various other places).
19:08 JesseW ris: regarding mapillary -- please join #archiveteam-bs to discuss this further
19:18 MMovie has joined #archiveteam
19:19 anjacks0n has joined #archiveteam
19:20 Aranje has joined #archiveteam
19:21 kris33 has joined #archiveteam
19:22 bwn has quit IRC (Ping timeout: 244 seconds)
19:23 DoomTay I know you're aware of the issue with Coursera right now, but did you know someone made a script to help with downloading?
19:23 DoomTay https://github.com/Chillee/coursera-dl-all
19:25 JesseW DoomTay: yes, yes are we. :-)
19:26 JesseW DoomTay: btw, this channel *is* logged (and the logs are searchable) -- you can check this sort of thing. :-)
19:27 JesseW hm, it does look like *that* particular script hadn't been mentioned before, though
19:28 arkiver hmm
19:28 arkiver oh, bot was down
19:29 arkiver Who wants to run the Coursera 'We're on it' bot?
19:29 DoomTay Maybe make a project page on it?
19:29 schbirid please do
19:29 arkiver go ahead
19:29 coursebug has joined #archiveteam
19:30 kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
19:30 anjacks0n has quit IRC (anjacks0n)
19:32 bwn has joined #archiveteam
19:40 tomwsmf-a has joined #archiveteam
19:48 DoomTay Well, I got a basic page going
19:48 DoomTay I think I better log off for a bit. It's getting stormy here
19:48 DoomTay has quit IRC (Quit: Page closed)
19:54 kristian_ has quit IRC (Leaving)
19:55 DoomTay has joined #archiveteam
19:57 PurpleSym sets mode: -b *!~xhdr@static.182.114.9.176.clients.your-server.de
19:57 xhdr has joined #archiveteam
20:03 nwf has joined #archiveteam
20:06 metalcamp has joined #archiveteam
20:13 zxtx has joined #archiveteam
20:36 JesseW has quit IRC (Read error: Operation timed out)
20:47 JesseW has joined #archiveteam
20:47 DoomTay has quit IRC (Quit: Page closed)
20:56 DoomTay has joined #archiveteam
21:08 JesseW has quit IRC (Ping timeout: 370 seconds)
21:24 Rye has quit IRC (Quit: ZNC - http://znc.in)
21:25 metalcamp has quit IRC (Ping timeout: 244 seconds)
21:27 Rye has joined #archiveteam
22:04 tomwsmf-a has quit IRC (Read error: Connection reset by peer)
22:11 DoomTay has quit IRC (Quit: Page closed)
22:18 DoomTay has joined #archiveteam
22:21 tomwsmf-a has joined #archiveteam
22:24 dashcloud has quit IRC (Remote host closed the connection)
22:26 Pudsey has joined #archiveteam
22:28 Pudsey has quit IRC (Remote host closed the connection)
22:34 BartoCH has quit IRC (Ping timeout: 260 seconds)
22:34 dashcloud has joined #archiveteam
22:41 BartoCH has joined #archiveteam
22:48 ohhdemgir has joined #archiveteam
23:17 mutoso_ has joined #archiveteam
23:19 mutoso has quit IRC (Read error: Operation timed out)
23:41 ariscop has quit IRC (Quit: Leaving)
23:50 ohhdemgir has quit IRC (Read error: Operation timed out)
23:58 BlueMaxim has joined #archiveteam
23:59 ohhdemgir has joined #archiveteam
