#archiveteam-bs 2016-02-25,Thu

Time Nickname Message
00:00 🔗 HCross https://github.com/HarryC145/PythonBits/blob/master/al_uploader.py there we go
00:00 🔗 snape Grumble grumble kids these days grumble grumble when we used to want to share code we had to break out the carbon paper grumble grumble get off my lawn :p
00:01 🔗 HCross Haha. Copy and paste is a glorious thing
00:02 🔗 HCross Feel free to tell me how rubbish I am at Python etc
00:07 🔗 tomwsmf-a (~tomwsmfa@[redacted]) has joined #archiveteam-bs
00:10 🔗 VADemon has quit (Read error: Operation timed out)
00:28 🔗 SimpBrain has quit (Read error: Operation timed out)
00:45 🔗 RichardG (richardg86@[redacted]) has joined #archiveteam-bs
00:50 🔗 HCross I've just realised, Al Jazeera use AWS, so are paying through the teeth for my video downloads
01:18 🔗 SketchCow S'ok
01:18 🔗 SketchCow The king can take it
01:20 🔗 pikhq Are you sure their videos are hosted on AWS?
01:20 🔗 pikhq Like, Netflix uses AWS for most of their infrastructure, but they don't really host video streaming from it.
01:26 🔗 icedice (MangaReade@[redacted]) has joined #archiveteam-bs
01:32 🔗 HCross2 Yeah, their video cdn is
01:33 🔗 ivan` grab-site users who run multi-month crawls are encouraged to upgrade for really convoluted reasons involving removing a port listener in the future
01:34 🔗 MrRadar Can grab-site crawls be paused and resumed? I know wpull supports resumption and grab-site builds on it
01:35 🔗 ivan` they cannot. does wpull resumption work for you? I tried it once on a big crawl; took 10h30m to resume
01:35 🔗 MrRadar Hmm... I've never tried it on anything big
01:35 🔗 ivan` maybe it's better now, I forgot if that was before or after the sqlalchemy fix
01:36 🔗 ivan` I have a really terrible pause/resume solution in https://github.com/ludios/grab-site/issues/58#issuecomment-186730028
01:36 🔗 ivan` you have to fight a program called CRIU until it stops spitting errors
01:37 🔗 HCross2 Isn't there the script that pauses it on low disk space
01:37 🔗 ivan` https://github.com/ludios/grab-site/blob/master/extra_docs/pause_resume_grab_sites.sh
01:38 🔗 ivan` I thought you were talking about being able to resume a crawl after rebooting
01:38 🔗 HCross2 Ah yes. I was playing with that earlier, but I run it all on a different disk
01:39 🔗 HCross2 Should be a simple change
01:39 🔗 ivan` yes
01:55 🔗 SimpBrain (~SimpleBra@[redacted]) has joined #archiveteam-bs
02:05 🔗 Start_ is now known as Start
02:09 🔗 icedice has quit (Quit: Leaving)
02:28 🔗 JesseW (~jesse@[redacted]) has joined #archiveteam-bs
02:36 🔗 JesseW has quit (Quit: Leaving.)
02:38 🔗 dashcloud has quit (Read error: Operation timed out)
02:40 🔗 JesseW (~jesse@[redacted]) has joined #archiveteam-bs
02:41 🔗 JesseW has quit (Client Quit)
02:49 🔗 dashcloud (~quassel@[redacted]) has joined #archiveteam-bs
03:31 🔗 SimpBrain has quit (Read error: Operation timed out)
03:48 🔗 FalconK (nobody@[redacted]) has joined #archiveteam-bs
04:01 🔗 dashcloud has quit (Read error: Operation timed out)
04:04 🔗 dashcloud (~quassel@[redacted]) has joined #archiveteam-bs
04:06 🔗 JesseW (~jesse@[redacted]) has joined #archiveteam-bs
04:30 🔗 snape opencongress.org shutting down March 1st, per here: https://twitter.com/knowladgeispwr/status/702671846609539072
04:36 🔗 MrRadar Thanks, put it in Archivebot
04:36 🔗 MrRadar *I put it
04:38 🔗 snape K, TYVM.
04:42 🔗 tomwsmf-a has quit (Ping timeout: 258 seconds)
04:46 🔗 MrRadar Wow, that site is *huge*
04:46 🔗 MrRadar I don't know if ArchiveBot would be able to finish it in even a month, let alone less than a week
04:47 🔗 MrRadar Can someone with a standalone grab-site instance also hit it? Grab-site runs like 10x faster than ArchiveBot
04:48 🔗 bwn has quit (Read error: Operation timed out)
04:51 🔗 snape Ugh, that might require more than grab-site to get in six days. I didn't realize quite how extensive it is...
05:00 🔗 SimpBrain (~SimpleBra@[redacted]) has joined #archiveteam-bs
05:01 🔗 JetBalsa has quit (Read error: Connection reset by peer)
05:03 🔗 yipdw I suggest emailing the operators of that site
05:03 🔗 yipdw they likely have a much faster way to get data
05:04 🔗 vitzli (~vitzli@[redacted]) has joined #archiveteam-bs
05:17 🔗 vitzli has quit (Quit: Leaving)
05:25 🔗 Sk1d has quit (Ping timeout: 250 seconds)
05:26 🔗 karen has quit (Quit: leaving)
05:33 🔗 JesseW I've emailed the Sunlight Foundation, the folks who run OpenCongress.org, asking them to keep the site up long enough for it to be copied into the Wayback Machine, and pointing them at the #archiveteam channel if they have questions. Hopefully I'll get a useful response.
05:35 🔗 Sk1d (~Sk1d@[redacted]) has joined #archiveteam-bs
05:35 🔗 MrRadar I'm scraping the site with wpull at home, so far it's found over 60k URLs
05:53 🔗 MrRadar Now over 80k from 10k retrieved URLs
06:24 🔗 logan2 has quit (Read error: Connection reset by peer)
06:25 🔗 logan (~a@[redacted]) has joined #archiveteam-bs
06:26 🔗 metalcamp (~metalcamp@[redacted]) has joined #archiveteam-bs
07:44 🔗 JesseW is examining 3,458 IA identifiers that aren't darked, but that I didn't get data from in the census.
07:54 🔗 bwn (~bwn@[redacted]) has joined #archiveteam-bs
07:58 🔗 JesseW MrRadar: got a response from Clayton at the Sunlight Foundation, about OpenCongress.org !
07:59 🔗 JesseW They are very willing to facilitate a scrape for the wayback machine. They said "We don't plan to complete the shut down until early-mid March." and asked how to request a scrape.
07:59 🔗 JesseW Could you get in contact with them and discuss things further?
08:00 🔗 Sk2d (~Sk1d@[redacted]) has joined #archiveteam-bs
08:01 🔗 metalcamp has quit (Ping timeout: 252 seconds)
08:05 🔗 Sk1d has quit (hub.se irc.du.se)
08:12 🔗 JesseW Interesting -- http://cryptobin.org appears to be down
08:13 🔗 JesseW https://www.riskbasedsecurity.com/2016/02/cryptobin-down-after-dhs-fbi-leaks/
08:14 🔗 kvieta has quit (Read error: Operation timed out)
08:14 🔗 JesseW MrRadar: here's what I wrote back to the Sunlight Foundation person: https://paste.ubuntu.com/15195515/
08:14 🔗 rduser has quit (Read error: Operation timed out)
08:15 🔗 mr-b has quit (Read error: Operation timed out)
08:15 🔗 beardicus has quit (Read error: Operation timed out)
08:16 🔗 botpie91 has quit (Read error: Operation timed out)
08:16 🔗 closure has quit (Read error: Operation timed out)
08:16 🔗 kvieta (~kvieta@[redacted]) has joined #archiveteam-bs
08:16 🔗 botpie91 (~botpie91@[redacted]) has joined #archiveteam-bs
08:16 🔗 remsen has quit (Read error: Operation timed out)
08:17 🔗 closure (~lambda@[redacted]) has joined #archiveteam-bs
08:17 🔗 beardicus (~beardicus@[redacted]) has joined #archiveteam-bs
08:19 🔗 remsen (~remsen@[redacted]) has joined #archiveteam-bs
08:19 🔗 JesseW Ah, I'd gotten cryptobin and 0bin confused...
08:19 🔗 JesseW 0bin is still doing just fine, apparently.
08:20 🔗 Sk2d is now known as Sk1d
08:21 🔗 mr-b (~mr-b@[redacted]) has joined #archiveteam-bs
08:29 🔗 JesseW has quit (Ping timeout: 252 seconds)
08:29 🔗 rduser (~rduser@[redacted]) has joined #archiveteam-bs
08:29 🔗 schbirid (~schbirid4@[redacted]) has joined #archiveteam-bs
08:30 🔗 toad1 has quit (Read error: Operation timed out)
08:39 🔗 toad1 (~toad@[redacted]) has joined #archiveteam-bs
08:50 🔗 kvieta has quit (Read error: Operation timed out)
08:50 🔗 mr-b has quit (Read error: Operation timed out)
08:52 🔗 beardicus has quit (Read error: Operation timed out)
08:52 🔗 botpie91 has quit (Read error: Operation timed out)
08:52 🔗 remsen has quit (Read error: Operation timed out)
08:53 🔗 closure has quit (Read error: Operation timed out)
08:54 🔗 botpie91 (~botpie91@[redacted]) has joined #archiveteam-bs
08:56 🔗 closure (~lambda@[redacted]) has joined #archiveteam-bs
08:56 🔗 kvieta (~kvieta@[redacted]) has joined #archiveteam-bs
08:57 🔗 beardicus (~beardicus@[redacted]) has joined #archiveteam-bs
08:59 🔗 remsen (~remsen@[redacted]) has joined #archiveteam-bs
09:00 🔗 mr-b (~mr-b@[redacted]) has joined #archiveteam-bs
09:02 🔗 godane has quit (Quit: Leaving.)
09:29 🔗 snape has quit (Remote host closed the connection)
09:45 🔗 bwn has quit (Ping timeout: 499 seconds)
10:27 🔗 godane (~slacker@[redacted]) has joined #archiveteam-bs
10:40 🔗 bwn (~bwn@[redacted]) has joined #archiveteam-bs
10:42 🔗 bwn_ (~bwn@[redacted]) has joined #archiveteam-bs
10:53 🔗 bwn has quit (Read error: Operation timed out)
10:59 🔗 xmc has quit (Read error: Operation timed out)
11:01 🔗 achip has quit (Ping timeout: 258 seconds)
11:01 🔗 godane has quit (Ping timeout: 258 seconds)
11:01 🔗 schbirid has quit (Ping timeout: 258 seconds)
11:03 🔗 schbirid (~schbirid4@[redacted]) has joined #archiveteam-bs
11:04 🔗 vtyl has quit (Read error: Connection reset by peer)
11:04 🔗 xmc (~chronomex@[redacted]) has joined #archiveteam-bs
11:05 🔗 swebb gives channel operator status to xmc
11:05 🔗 espes___ has quit (Remote host closed the connection)
11:06 🔗 godane (~slacker@[redacted]) has joined #archiveteam-bs
11:08 🔗 achip (~thechip@[redacted]) has joined #archiveteam-bs
11:18 🔗 espes__ (~espes@[redacted]) has joined #archiveteam-bs
11:19 🔗 lytv (~lytv@[redacted]) has joined #archiveteam-bs
13:51 🔗 vitzli (~vitzli@[redacted]) has joined #archiveteam-bs
13:51 🔗 vitzli and it only took 10 minutes to connect to EFNet :) SlySoft.com closed, but forum is still up
13:55 🔗 VADemon (~VADemon@[redacted]) has joined #archiveteam-bs
14:40 🔗 godane SketchCow: this is the one with a mp3 at 72+ hours: https://archive.org/details/kpfa-archives-radio-podcast-2009-09-29
15:19 🔗 VADemon has quit (Read error: Connection reset by peer)
15:31 🔗 brayden has quit (Quit: Leaving)
15:34 🔗 snape (~snape@[redacted]) has joined #archiveteam-bs
16:00 🔗 bauruine has quit (Ping timeout: 260 seconds)
16:04 🔗 bauruine (~bauruine@[redacted]) has joined #archiveteam-bs
16:10 🔗 brayden (~brayden@[redacted]) has joined #archiveteam-bs
16:10 🔗 swebb gives channel operator status to brayden
16:26 🔗 JesseW (~jesse@[redacted]) has joined #archiveteam-bs
16:29 🔗 midas has quit (Quit: WeeChat 1.3)
16:30 🔗 midas (~midas@[redacted]) has joined #archiveteam-bs
16:43 🔗 JesseW has quit (Quit: Leaving.)
16:45 🔗 xXx_ndidd (~Nathan@[redacted]) has joined #archiveteam-bs
16:45 🔗 vitzli has quit (Quit: Leaving)
16:47 🔗 logchfoo1 starts logging #archiveteam-bs at Thu Feb 25 16:47:23 2016
16:47 🔗 logchfoo1 has joined #archiveteam-bs
16:47 🔗 bwn_ has quit IRC (Read error: Operation timed out)
16:49 🔗 metalcamp has joined #archiveteam-bs
16:58 🔗 ndiddy has quit IRC (Read error: Operation timed out)
17:01 🔗 tomwsmf-a has joined #archiveteam-bs
17:24 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
17:29 🔗 mismatch_ has quit IRC (Ping timeout: 260 seconds)
17:36 🔗 mismatch_ has joined #archiveteam-bs
17:53 🔗 matthusby has joined #archiveteam-bs
17:54 🔗 SadDM has joined #archiveteam-bs
17:54 🔗 swebb sets mode: +o SadDM
17:56 🔗 jspiros has joined #archiveteam-bs
18:12 🔗 xmc is now known as butts
18:13 🔗 butts is now known as xmc
18:59 🔗 lytv has quit IRC (Read error: Operation timed out)
19:02 🔗 lytv has joined #archiveteam-bs
19:03 🔗 JW_work1 has quit IRC (Read error: Operation timed out)
19:20 🔗 JW_work has joined #archiveteam-bs
20:12 🔗 Protab has joined #archiveteam-bs
20:51 🔗 xXx_ndidd has quit IRC (Ping timeout: 252 seconds)
21:13 🔗 schbirid SIGH https://mta.openssl.org/pipermail/openssl-announce/2016-February/000063.html
21:32 🔗 JW_work1 has joined #archiveteam-bs
21:33 🔗 JW_work has quit IRC (Read error: Operation timed out)
21:37 🔗 Famicoma1 has joined #archiveteam-bs
21:37 🔗 Famicoma1 has quit IRC (Client Quit)
21:38 🔗 bwn has joined #archiveteam-bs
21:40 🔗 JW_work1 has quit IRC (Ping timeout: 362 seconds)
21:44 🔗 zino The last year has been a good year not running on OpenSSL based SSL.
21:46 🔗 FalconK the changelog for 1.0.2 on that release is light on the vuln fixes unless it's just not up to date yet
21:55 🔗 metalcamp has quit IRC (Ping timeout: 252 seconds)
21:57 🔗 godane 67 hours in one mp3: https://archive.org/details/kpfa-archives-radio-podcast-2009-10-09
21:58 🔗 godane it's 'cause it was another fund drive special
21:58 🔗 godane https://kpfa.org/archives/2009/10/9/
22:05 🔗 schbirid has quit IRC (Quit: Leaving)
22:10 🔗 Famicoma1 has joined #archiveteam-bs
22:15 🔗 godane SketchCow: i'm up to 2009-10-31 with kpfa
22:17 🔗 ndiddy has joined #archiveteam-bs
22:25 🔗 Boppen has joined #archiveteam-bs
22:30 🔗 JW_work has joined #archiveteam-bs
22:31 🔗 JW_work1 has joined #archiveteam-bs
22:36 🔗 JW_work has quit IRC (Ping timeout: 362 seconds)
22:50 🔗 Chorca1 has quit IRC (Read error: Operation timed out)
22:51 🔗 yipdw this feels weird to say, but gitlab releases too fast for me :?
22:51 🔗 tomwsmf-a has joined #archiveteam-bs
22:52 🔗 yipdw like I installed 8.5 not too long ago and now they have a patch release
22:52 🔗 yipdw I mean I really like the fact that they're on the ball
22:52 🔗 yipdw it's just like "wat"
22:52 🔗 yipdw er 8.4 that is
22:54 🔗 Famicoma1 has quit IRC (Quit: leaving)
22:54 🔗 Famicoma1 has joined #archiveteam-bs
22:54 🔗 Chorca has joined #archiveteam-bs
23:08 🔗 xXx_ndidd has joined #archiveteam-bs
23:10 🔗 HCross2 swebb: just an FYI - Al Jazeera seem to be throttling me, I can't get more than 4 videos at once, and then they are also limiting me to a max of 60Mbps outbound
23:10 🔗 xmc what monsters
23:11 🔗 HCross2 there is a lot of content to get
23:11 🔗 snape Sketchcow, did you see the stuff this morning about OpenCongress? Goes away in a week, folks who run it were contacted and I guess they're quite amenable to enduring a deep crawl for posterity. Looks like it might be 200-300k URLs, all told. Not sure who's got what planned, if anything, though...
23:12 🔗 xmc oh yeah for those of you who were panicking last week, ia brought 2.5P of new disk online https://archive.org/~tracey/mrtg/df.html
23:13 🔗 xmc right on cue
23:16 🔗 SketchCow Well, have at
23:20 🔗 ndiddy has quit IRC (Read error: Operation timed out)
23:23 🔗 JW_work1 snape, SketchCow: yeah, I suggested they run a crawler locally, then upload it to IA, and if so, to contact SketchCow about getting it into the Wayback Machine. Otherwise, I warned them about archivebot working on it, but being slow.
23:36 🔗 godane SketchCow: i'm uploading some higher res copies of EGM
23:37 🔗 godane for example issue 025 is only 60mb in your copy
23:37 🔗 godane but i have one that's close to 230mb
